* [patch 00/20] Generic Ring Buffer Library
@ 2010-07-09 22:57 Mathieu Desnoyers
  2010-07-09 22:57   ` Mathieu Desnoyers
                   ` (19 more replies)
  0 siblings, 20 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen


This patchset implements a generic ring buffer library, which provides a very
efficient, yet flexible, API to both tracers and drivers to move large amounts
of data within and outside of kernel-space.

It comes as a response to Linus' mandate from the 2008 Kernel Summit. In May
2010, Steven Rostedt, author of the Ftrace ring buffer, came forward and asked
me if I could handle this task, which resulted in this patchset. In addition to
coming up with a common ring buffer, this patchset takes into account the
pressing industry need for a blazingly fast, and reliable, ring buffer.
Tracing, as we know, is a very resource-hungry activity, which should be kept
to a small percentage of the system's resources in order to be useful on heavy
workloads, which are the most likely to reproduce bugs and performance
problems.

It is derived from the LTTng ring buffer, heavily cleaned up and librarified to
become a generic kernel ring buffer. The flexibility provided by this library
does _not_ come at the expense of performance, because each library client
provides its own constant "configuration" structure passed along to each
fast-path inline function, therefore letting the compiler perform code selection
statically. The slow paths are shared amongst all clients, which allows overall
code size savings as the number of library clients increases.
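
As an illustration only (a minimal sketch with made-up names, not the actual
library API), the "constant client configuration" pattern described above
looks roughly like this: the client hands a compile-time constant structure to
the fast-path inline, so the compiler drops the branches that do not apply to
that client.

#include <linux/kernel.h>
#include <linux/types.h>

struct rb_client_config {
	int use_timestamps;	/* known at compile time for each client */
	size_t event_align;	/* alignment enforced on event payloads */
};

static inline size_t rb_event_size(const struct rb_client_config *config,
				   size_t payload_size)
{
	size_t size = payload_size;

	if (config->use_timestamps)	/* folded away when the constant is 0 */
		size += sizeof(u64);
	return ALIGN(size, config->event_align);
}

/* Each client defines its configuration as a constant... */
static const struct rb_client_config my_client_config = {
	.use_timestamps	= 0,
	.event_align	= sizeof(long),
};

/* ...so this call compiles down to a plain size round-up, no branch left. */
static inline size_t my_client_event_size(size_t payload_size)
{
	return rb_event_size(&my_client_config, payload_size);
}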


* History

As far back as May 2005, LTTng implemented its ring buffer from scratch,
learning lessons from K42, RelayFS and LTT. It became lock-less in October 2005.
It has been widely used by the industry and shipped with many embedded and
real-time distributions. Since then, Ftrace (2008, lock-less in 2009) and Perf
(2010) implemented their own ring buffer.

The Ftrace ring buffer offers lock-less, per-cpu buffers with good performance
(at least on x86; however, Tim Bird reported that the amount of local_t
operations on the fast path made it perform poorly on ARM). The main problems
seen with this ring buffer are linked to its complexity level, mainly caused by
the lack of abstraction between the ring buffer format, locking, memory
backend, and time-stamping. Steven finally asked me to try coming forward with
a generic ring buffer in this post: http://lwn.net/Articles/389199/

The Perf ring buffer has lately seen some speed improvements following
criticism from Steven and myself. The Perf ring-buffer scheme was benchmarked
by Steven Rostedt, showing it was about 4 times slower than Ftrace and LTTng at
that point. Perf performance has improved since then, but the monolithic nature
of its ring buffer, being inherently tied to the tracer, and the fact that it
does not implement a flight recorder mode required significant effort to come
up with numbers comparable with the other ring buffers.

The state of the Perf ring buffer code is what we could expect from code that
has been heavily modified in a short amount of time (a quick glance at the code
shows at least one clearly missing memory barrier in perf_output_put_handle() in
the 2.6.35-rc4-tip tree). The Perf user-space ABI comes as a pain point, as it
ties the ring buffer implementation to the control ABI exported to user-space
through a mmap'd page.  The user-space perf tool therefore interacts with the
kernel through reads and writes in a shared memory region without using system
calls.  This direct link between the kernel data structures and the user-space
ABI makes most abstractions impracticable and heavily constrains kernel-level
ring buffer design.


* Benchmarks

  * Throughput

These results show the time it takes to write an entry to each ring buffer
implementation (generic library, Ftrace, Perf). The test is an adaptation of
kernel/trace/ring_buffer_benchmark.c, which stress-tests the ring buffer by
writing and reading data to/from it for 10 seconds.

  Setup:
  - Clock source:
    - trace_clock_local() for Generic Ring Buffer Library and Ftrace
    - perf_clock() for Perf
  - 1MB per-cpu buffers
  - 4 byte payload/event (contains the cpu id)
  - A single producer thread
  - On a Xeon 2.0GHz E5405, dual-socket, 4 cores/socket (8 cores total)
  - Using Ftrace ring_buffer_benchmark (adapted to the Ring Buffer Library)
    - producer_fifo=10
  - Kernel: 2.6.35-rc4-tip

    * Overwrite (flight-recorder) mode

Ring Buffer Library:       83 ns/entry (512kB sub-buffers, no reader)
                           84 ns/entry (128kB sub-buffers, no reader)
                           86 ns/entry (4kB sub-buffers,   no reader)
                           89 ns/entry (512kB sub-buffers: read 0.3M entries/s)
                          111 ns/entry (128kB sub-buffers: read 1.9M entries/s)
                          199 ns/entry (4kB sub-buffers:   read 4.8M entries/s)
   Reader wake-up: performed by a per-cpu timer every 100ms.

Ftrace Ring Buffer:       103 ns/entry (no reader)
                          148 ns/entry (read by page:      read 6.6M entries/s)
                          187 ns/entry (read by event:     read 0.4M entries/s)
   Reader wake-up: every 100 events written (arbitrarily chosen by the benchmark)

Perf record               (flight recorder mode unavailable)


    * Discard (producer-consumer) mode

Generic Ring Buffer Library:           (128kB sub-buffers: read 2.8M entries/s)

(in 10s)
Written:              28020143
Lost:                 28995757
Read:                 28017426

                           96 ns/entry discarded
                          257 ns/entry written

Perf record:
    (using patch from post http://lkml.org/lkml/2010/5/20/313)
# modprobe ring_buffer_benchmark producer_fifo=10 trace_event=1
# perf record -e rb_bench:rb_benchmark -c1 -a sleep 30

[ perf record: Woken up 169 times to write data ]
[ perf record: Captured and wrote 1623.336 MB perf.data (~70924640 samples) ]

# cat /debug/tracing/trace
   Note: output of the benchmark module is incorrect, because it does not take
into account events discarded by Perf.

Using the perf output approximation of 70924640 entries written in 30 seconds
leads to:
                         423 ns/entry (read: 2.3M entries/s)

Note that these numbers are based on the perf event approximation output (based
on a 24 bytes/entry estimation) rather than the benchmark module count due to
the inaccuracy discussed earlier.


  * Scalability

    * Generic Ring Buffer Library

I modified the ring buffer benchmark module for the "basic API" of the ring
buffer library (pre-built clients) to support multiple writer threads. The
following test uses 1MB per-cpu buffers (128kB sub-buffers) with local per-cpu
read iterators. It does not use any time-stamp; we notice that the numbers are
quite close to those of the throughput benchmark section above, meaning that the
extra overhead of the basic API compensates for the removal of
trace_clock_local() calls.

1 writer thread :  83 ns CPU time/record
2 writer threads: 116 ns CPU time/record
4 writer threads: 116 ns CPU time/record
8 writer threads: 118 ns CPU time/record

So basically the write side scales almost linearly with the number of cores,
apart from an L2 cache sharing effect: because we are using 1MB per-core
buffers, we hit the L2 cache shared amongst pairs of cores.

Saving a time-stamp taken with trace_clock() with each event (generic library
per-cpu buffers with support for a channel-wide iterator) moves the scalability
drop point to 4 writer threads. The higher overhead and the scalability change
are caused by the use of trace_clock() rather than trace_clock_local():

1 writer thread : 191 ns CPU time/record
2 writer threads: 189 ns CPU time/record
4 writer threads: 260 ns CPU time/record
8 writer threads: 265 ns CPU time/record


    * Ftrace

With a similar benchmark module, with time-stamps taken with
trace_clock_local(), Ftrace gives:

1 writer thread : 104 ns CPU time/record
2 writer threads: 165 ns CPU time/record
4 writer threads: 146 ns CPU time/record
8 writer threads: 153 ns CPU time/record


  * Formal Verification with Model-Checking

In addition to thorough testing, the Ring Buffer Library lock-less buffering
algorithm has been modeled and checked for races using the SPIN verifier, on a
model detecting concurrent memory accesses across all execution paths. This
model covers all the ring-buffer corner cases. Note that this model assumes a
sequentially consistent machine; therefore memory barriers should be carefully
reviewed. I plan on enhancing this model with memory ordering awareness in the
future, but just did not have the time on my hands to tackle this task yet.

Even though it does not take away the risk of a discrepancy between the model
and the actual implementation and behavior on real hardware, this additional
level of formal verification provides a good level of confidence in the
reliability of the ring buffer algorithm.


Comments on this code would be very welcome,

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* [patch 01/20] Create generic alignment API (v8)
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
@ 2010-07-09 22:57   ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 02/20] notifier atomic call chain notrace Mathieu Desnoyers
                     ` (18 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen, Alexander Shishkin,
	Russell King - ARM Linux, linux-arm-kernel, Imre Deak,
	Jamie Lokier, Alexey Dobriyan

[-- Attachment #1: create-generic-alignment-api.patch --]
[-- Type: text/plain, Size: 6537 bytes --]

Rather than re-doing the "alignment on a type size" trick all over again at
different levels, import the "ltt_align" code from LTTng into kernel.h and make
it available to everyone, renaming to:

- object_align()
- object_align_floor()
- offset_align()
- offset_align_floor()

Changelog since v7:
- Add missing include/linux/Kconfig header-y.

Changelog since v6:
- Adapt to changes introduced by
  commit a79ff731a1b277d0e92d9453bdf374e04cec717a
- Use __alignof__() instead of sizeof() to support compound types.

Changelog since v5:
- moved alignment apis to a separate header file so that it is possible
  to use them from other header files which are, for example, included
  from kernel.h.

Changelog since v4:
- add missing ( ) around parameters within object_align() and
  object_align_floor().
- More coding style cleanups to ALIGN() (checkpatch.pl is happy now).

Changelog since v3:
- optimize object_align*() so fewer instructions are needed for alignment of
  addresses known dynamically. Use the (already existing) "ALIGN()", and create
  the "ALIGN_FLOOR()" macro.
- While we are there, let's clean up the ALIGN() macros wrt coding style. e.g.
  missing parenthesis around the first use of the "x" parameter in ALIGN().

Changelog since v2:
- Fix object_align*(): should use object size alignment, not pointer alignment.

Changelog since v1:
- Align on the object natural alignment
    (rather than min(arch word alignment, natural alignment))

The advantage of separating the API into "object alignment" and "offset
alignment" is that it gives more freedom to play with offset alignment. Very
useful to implement tracer ring-buffer alignment. (hint hint)

Typical users will use "object alignment", but infrastructures like tracers,
which need to perform alignment of statically known base+offsets, will
typically use "offset alignment", because it allows aligning with respect to a
base rather than passing an absolute address.

We use "sizeof(object)" rather than "__alignof__()" object because alignof
returns "recommended" object alignment for the architecture, which can be
sub-optimal on some architectures. By ensuring alignment on the object size, we
are sure to make the right choice.
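
For illustration, a hedged usage sketch (hypothetical ring-buffer reservation
helper, not part of this patch): a writer can keep only a running write offset
and use offset_align() to compute the padding needed before a payload, assuming
the buffer base itself is sufficiently aligned.

#include <linux/align.h>

static size_t reserve_record(size_t write_offset, size_t header_size,
			     size_t payload_size, size_t payload_alignment)
{
	size_t offset = write_offset;

	offset += header_size;
	/* padding so the payload starts on its natural alignment */
	offset += offset_align(offset, payload_alignment);
	offset += payload_size;
	return offset;			/* new write offset */
}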

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: Russell King - ARM Linux <linux@arm.linux.org.uk>
CC: linux-arm-kernel@lists.infradead.org
CC: Imre Deak <imre.deak@nokia.com>
CC: Jamie Lokier <jamie@shareable.org>
CC: rostedt@goodmis.org
CC: mingo@elte.hu
CC: Alexey Dobriyan <adobriyan@gmail.com>
---
 include/linux/Kbuild   |    1 
 include/linux/align.h  |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/kernel.h |    8 -------
 3 files changed, 58 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/align.h

Index: linux.trees.git/include/linux/align.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/align.h	2010-07-01 11:54:43.000000000 -0400
@@ -0,0 +1,56 @@
+#ifndef _LINUX_ALIGN_H
+#define _LINUX_ALIGN_H
+
+#define __ALIGN_KERNEL(x, a)	__ALIGN_KERNEL_MASK((x), (typeof(x))(a) - 1)
+#define __ALIGN_KERNEL_MASK(x, mask) \
+				(((x) + (mask)) & ~(mask))
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+
+#define ALIGN(x, a)		__ALIGN_KERNEL((x), (a))
+#define __ALIGN_MASK(x, mask)	__ALIGN_KERNEL_MASK((x), (mask))
+#define PTR_ALIGN(p, a)		((typeof(p)) ALIGN((unsigned long) (p), (a)))
+#define ALIGN_FLOOR(x, a)	__ALIGN_FLOOR_MASK((x), (typeof(x)) (a) - 1)
+#define __ALIGN_FLOOR_MASK(x, mask)	((x) & ~(mask))
+#define PTR_ALIGN_FLOOR(p, a) \
+			((typeof(p)) ALIGN_FLOOR((unsigned long) (p), (a)))
+#define IS_ALIGNED(x, a)	(((x) & ((typeof(x)) (a) - 1)) == 0)
+
+/*
+ * Align pointer on natural object alignment. Object size must be power of two.
+ */
+#define object_align(obj)	PTR_ALIGN((obj), __alignof__(*(obj)))
+#define object_align_floor(obj)	PTR_ALIGN_FLOOR((obj), __alignof__(*(obj)))
+
+/**
+ * offset_align - Calculate the offset needed to align an object on its natural
+ *                alignment towards higher addresses.
+ * @align_drift:  object offset from an "alignment"-aligned address.
+ * @alignment:    natural object alignment. Must be non-zero, power of 2.
+ *
+ * Returns the offset that must be added to align towards higher
+ * addresses.
+ */
+static inline size_t offset_align(size_t align_drift, size_t alignment)
+{
+	return (alignment - align_drift) & (alignment - 1);
+}
+
+/**
+ * offset_align_floor - Calculate the offset needed to align an object
+ *                      on its natural alignment towards lower addresses.
+ * @align_drift:  object offset from an "alignment"-aligned address.
+ * @alignment:    natural object alignment. Must be non-zero, power of 2.
+ *
+ * Returns the offset that must be substracted to align towards lower addresses.
+ */
+static inline size_t offset_align_floor(size_t align_drift, size_t alignment)
+{
+	return (align_drift - alignment) & (alignment - 1);
+}
+
+#endif /* __KERNEL__ */
+
+#endif
Index: linux.trees.git/include/linux/kernel.h
===================================================================
--- linux.trees.git.orig/include/linux/kernel.h	2010-07-01 11:54:43.000000000 -0400
+++ linux.trees.git/include/linux/kernel.h	2010-07-01 12:29:05.000000000 -0400
@@ -4,8 +4,7 @@
 /*
  * 'kernel.h' contains some often-used function prototypes etc
  */
-#define __ALIGN_KERNEL(x, a)		__ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
-#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))
+#include <linux/align.h>
 
 #ifdef __KERNEL__
 
@@ -39,11 +38,6 @@ extern const char linux_proc_banner[];
 
 #define STACK_MAGIC	0xdeadbeef
 
-#define ALIGN(x, a)		__ALIGN_KERNEL((x), (a))
-#define __ALIGN_MASK(x, mask)	__ALIGN_KERNEL_MASK((x), (mask))
-#define PTR_ALIGN(p, a)		((typeof(p))ALIGN((unsigned long)(p), (a)))
-#define IS_ALIGNED(x, a)		(((x) & ((typeof(x))(a) - 1)) == 0)
-
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
 
 /*
Index: linux.trees.git/include/linux/Kbuild
===================================================================
--- linux.trees.git.orig/include/linux/Kbuild	2010-07-01 12:29:13.000000000 -0400
+++ linux.trees.git/include/linux/Kbuild	2010-07-01 12:29:40.000000000 -0400
@@ -18,6 +18,7 @@ header-y += usb/
 
 header-y += affs_hardblocks.h
 header-y += aio_abi.h
+header-y += align.h
 header-y += arcfb.h
 header-y += atmapi.h
 header-y += atmarp.h



* [patch 01/20] Create generic alignment API (v8)
@ 2010-07-09 22:57   ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: create-generic-alignment-api.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20100709/c715dd62/attachment.ksh>


* [patch 02/20] notifier atomic call chain notrace
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
  2010-07-09 22:57   ` Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 03/20] idle notifier standardization Mathieu Desnoyers
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen, Mathieu Desnoyers, Jason Baron

[-- Attachment #1: notifier-atomic-call-chain-notrace.patch --]
[-- Type: text/plain, Size: 1424 bytes --]

Being able to use the atomic notifier from cpu idle entry, to ensure the tracer
flushes the last events in the current subbuffer, requires the rcu read-side to
be marked "notrace".

This also applies to the die notifier.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@redhat.com>
CC: Jason Baron <jbaron@redhat.com>
CC: mingo@elte.hu
---
 kernel/notifier.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng/kernel/notifier.c
===================================================================
--- linux-2.6-lttng.orig/kernel/notifier.c	2009-11-12 17:58:56.000000000 -0500
+++ linux-2.6-lttng/kernel/notifier.c	2009-11-12 18:03:28.000000000 -0500
@@ -148,7 +148,7 @@ int atomic_notifier_chain_unregister(str
 	spin_lock_irqsave(&nh->lock, flags);
 	ret = notifier_chain_unregister(&nh->head, n);
 	spin_unlock_irqrestore(&nh->lock, flags);
-	synchronize_rcu();
+	synchronize_sched();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(atomic_notifier_chain_unregister);
@@ -178,9 +178,9 @@ int __kprobes __atomic_notifier_call_cha
 {
 	int ret;
 
-	rcu_read_lock();
+	rcu_read_lock_sched_notrace();
 	ret = notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls);
-	rcu_read_unlock();
+	rcu_read_unlock_sched_notrace();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(__atomic_notifier_call_chain);



* [patch 03/20] idle notifier standardization
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
  2010-07-09 22:57   ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 02/20] notifier atomic call chain notrace Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 04/20] idle notifier standardization x86_32 Mathieu Desnoyers
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: idle-notifier-standardize.patch --]
[-- Type: text/plain, Size: 5409 bytes --]

Move the idle notifiers into arch-agnostic code. Adapt x86_64 accordingly to
call the new architecture-agnostic notifiers rather than its own. Other
architectures are still "todo".
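
For illustration, a minimal sketch of a client of the new arch-agnostic API
(hypothetical tracer module, not part of this patch): flush per-cpu buffers
when the CPU enters idle.

#include <linux/idle.h>
#include <linux/notifier.h>
#include <linux/module.h>

static int my_tracer_idle_notify(struct notifier_block *nb,
				 unsigned long val, void *unused)
{
	switch (val) {
	case IDLE_START:
		/* e.g. flush the current sub-buffer before the CPU sleeps */
		break;
	case IDLE_END:
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block my_tracer_idle_nb = {
	.notifier_call = my_tracer_idle_notify,
};

static int __init my_tracer_init(void)
{
	register_idle_notifier(&my_tracer_idle_nb);
	return 0;
}

static void __exit my_tracer_exit(void)
{
	unregister_idle_notifier(&my_tracer_idle_nb);
}

module_init(my_tracer_init);
module_exit(my_tracer_exit);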

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 arch/x86/include/asm/idle.h  |    7 -------
 arch/x86/kernel/process_64.c |   19 +++----------------
 drivers/idle/i7300_idle.c    |    5 +++--
 include/linux/idle.h         |   19 +++++++++++++++++++
 kernel/notifier.c            |   25 +++++++++++++++++++++++++
 5 files changed, 50 insertions(+), 25 deletions(-)

Index: linux.trees.git/arch/x86/kernel/process_64.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_64.c	2010-06-25 15:12:56.000000000 -0400
+++ linux.trees.git/arch/x86/kernel/process_64.c	2010-06-25 15:12:59.000000000 -0400
@@ -35,6 +35,7 @@
 #include <linux/tick.h>
 #include <linux/prctl.h>
 #include <linux/uaccess.h>
+#include <linux/idle.h>
 #include <linux/io.h>
 #include <linux/ftrace.h>
 
@@ -58,31 +59,17 @@ asmlinkage extern void ret_from_fork(voi
 DEFINE_PER_CPU(unsigned long, old_rsp);
 static DEFINE_PER_CPU(unsigned char, is_idle);
 
-static ATOMIC_NOTIFIER_HEAD(idle_notifier);
-
-void idle_notifier_register(struct notifier_block *n)
-{
-	atomic_notifier_chain_register(&idle_notifier, n);
-}
-EXPORT_SYMBOL_GPL(idle_notifier_register);
-
-void idle_notifier_unregister(struct notifier_block *n)
-{
-	atomic_notifier_chain_unregister(&idle_notifier, n);
-}
-EXPORT_SYMBOL_GPL(idle_notifier_unregister);
-
 void enter_idle(void)
 {
 	percpu_write(is_idle, 1);
-	atomic_notifier_call_chain(&idle_notifier, IDLE_START, NULL);
+	notify_idle(IDLE_START);
 }
 
 static void __exit_idle(void)
 {
 	if (x86_test_and_clear_bit_percpu(0, is_idle) == 0)
 		return;
-	atomic_notifier_call_chain(&idle_notifier, IDLE_END, NULL);
+	notify_idle(IDLE_END);
 }
 
 /* Called from interrupts to signify idle end */
Index: linux.trees.git/arch/x86/include/asm/idle.h
===================================================================
--- linux.trees.git.orig/arch/x86/include/asm/idle.h	2010-06-25 15:12:56.000000000 -0400
+++ linux.trees.git/arch/x86/include/asm/idle.h	2010-06-25 15:12:59.000000000 -0400
@@ -1,13 +1,6 @@
 #ifndef _ASM_X86_IDLE_H
 #define _ASM_X86_IDLE_H
 
-#define IDLE_START 1
-#define IDLE_END 2
-
-struct notifier_block;
-void idle_notifier_register(struct notifier_block *n);
-void idle_notifier_unregister(struct notifier_block *n);
-
 #ifdef CONFIG_X86_64
 void enter_idle(void);
 void exit_idle(void);
Index: linux.trees.git/include/linux/idle.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/idle.h	2010-06-25 15:12:59.000000000 -0400
@@ -0,0 +1,19 @@
+/*
+ * include/linux/idle.h - generic idle definition
+ *
+ */
+#ifndef _LINUX_IDLE_H_
+#define _LINUX_IDLE_H_
+
+#include <linux/notifier.h>
+
+enum idle_val {
+	IDLE_START = 1,
+	IDLE_END = 2,
+};
+
+int notify_idle(enum idle_val val);
+void register_idle_notifier(struct notifier_block *n);
+void unregister_idle_notifier(struct notifier_block *n);
+
+#endif /* _LINUX_IDLE_H_ */
Index: linux.trees.git/kernel/notifier.c
===================================================================
--- linux.trees.git.orig/kernel/notifier.c	2010-06-25 15:12:59.000000000 -0400
+++ linux.trees.git/kernel/notifier.c	2010-06-25 15:12:59.000000000 -0400
@@ -5,6 +5,7 @@
 #include <linux/rcupdate.h>
 #include <linux/vmalloc.h>
 #include <linux/reboot.h>
+#include <linux/idle.h>
 
 /*
  *	Notifier list for kernel code which wants to be called
@@ -584,3 +585,27 @@ int unregister_die_notifier(struct notif
 	return atomic_notifier_chain_unregister(&die_chain, nb);
 }
 EXPORT_SYMBOL_GPL(unregister_die_notifier);
+
+static ATOMIC_NOTIFIER_HEAD(idle_notifier);
+
+/*
+ * Trace last event before calling notifiers. Notifiers flush data from buffers
+ * before going to idle.
+ */
+int notrace notify_idle(enum idle_val val)
+{
+	return atomic_notifier_call_chain(&idle_notifier, val, NULL);
+}
+EXPORT_SYMBOL_GPL(notify_idle);
+
+void register_idle_notifier(struct notifier_block *n)
+{
+	atomic_notifier_chain_register(&idle_notifier, n);
+}
+EXPORT_SYMBOL_GPL(register_idle_notifier);
+
+void unregister_idle_notifier(struct notifier_block *n)
+{
+	atomic_notifier_chain_unregister(&idle_notifier, n);
+}
+EXPORT_SYMBOL_GPL(unregister_idle_notifier);
Index: linux.trees.git/drivers/idle/i7300_idle.c
===================================================================
--- linux.trees.git.orig/drivers/idle/i7300_idle.c	2010-06-19 16:08:34.000000000 -0400
+++ linux.trees.git/drivers/idle/i7300_idle.c	2010-06-25 15:13:11.000000000 -0400
@@ -27,6 +27,7 @@
 #include <linux/debugfs.h>
 #include <linux/stop_machine.h>
 #include <linux/i7300_idle.h>
+#include <linux/idle.h>
 
 #include <asm/idle.h>
 
@@ -583,7 +584,7 @@ static int __init i7300_idle_init(void)
 		}
 	}
 
-	idle_notifier_register(&i7300_idle_nb);
+	register_idle_notifier(&i7300_idle_nb);
 
 	printk(KERN_INFO "i7300_idle: loaded v%s\n", I7300_IDLE_DRIVER_VERSION);
 	return 0;
@@ -591,7 +592,7 @@ static int __init i7300_idle_init(void)
 
 static void __exit i7300_idle_exit(void)
 {
-	idle_notifier_unregister(&i7300_idle_nb);
+	unregister_idle_notifier(&i7300_idle_nb);
 	free_cpumask_var(idle_cpumask);
 
 	if (debugfs_dir) {



* [patch 04/20] idle notifier standardization x86_32
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2010-07-09 22:57 ` [patch 03/20] idle notifier standardization Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 05/20] Poll : add poll_wait_set_exclusive Mathieu Desnoyers
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: idle-notifier-x86_32.patch --]
[-- Type: text/plain, Size: 3333 bytes --]

Add idle notifier callback to x86_32.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 arch/x86/include/asm/idle.h  |    5 -----
 arch/x86/kernel/apm_32.c     |    6 ++++++
 arch/x86/kernel/process_32.c |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 5 deletions(-)

Index: linux.trees.git/arch/x86/kernel/process_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_32.c	2010-06-19 16:09:48.000000000 -0400
+++ linux.trees.git/arch/x86/kernel/process_32.c	2010-06-19 16:16:53.000000000 -0400
@@ -38,6 +38,8 @@
 #include <linux/uaccess.h>
 #include <linux/io.h>
 #include <linux/kdebug.h>
+#include <linux/notifier.h>
+#include <linux/idle.h>
 
 #include <asm/pgtable.h>
 #include <asm/system.h>
@@ -61,6 +63,30 @@
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
+static DEFINE_PER_CPU(unsigned char, is_idle);
+
+void enter_idle(void)
+{
+	percpu_write(is_idle, 1);
+	notify_idle(IDLE_START);
+}
+
+static void __exit_idle(void)
+{
+	if (x86_test_and_clear_bit_percpu(0, is_idle) == 0)
+		return;
+	notify_idle(IDLE_END);
+}
+
+/* Called from interrupts to signify idle end */
+void exit_idle(void)
+{
+	/* idle loop has pid 0 */
+	if (current->pid)
+		return;
+	__exit_idle();
+}
+
 /*
  * Return saved PC of a blocked thread.
  */
@@ -109,10 +135,17 @@ void cpu_idle(void)
 				play_dead();
 
 			local_irq_disable();
+			enter_idle();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
+			/*
+			 * In many cases the interrupt that ended idle
+			 * has already called exit_idle. But some idle loops can
+			 * be woken up without interrupt.
+			 */
+			__exit_idle();
 
 			trace_power_end(smp_processor_id());
 		}
Index: linux.trees.git/arch/x86/include/asm/idle.h
===================================================================
--- linux.trees.git.orig/arch/x86/include/asm/idle.h	2010-06-19 16:15:29.000000000 -0400
+++ linux.trees.git/arch/x86/include/asm/idle.h	2010-06-19 16:15:34.000000000 -0400
@@ -1,13 +1,8 @@
 #ifndef _ASM_X86_IDLE_H
 #define _ASM_X86_IDLE_H
 
-#ifdef CONFIG_X86_64
 void enter_idle(void);
 void exit_idle(void);
-#else /* !CONFIG_X86_64 */
-static inline void enter_idle(void) { }
-static inline void exit_idle(void) { }
-#endif /* CONFIG_X86_64 */
 
 void c1e_remove_cpu(int cpu);
 
Index: linux.trees.git/arch/x86/kernel/apm_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/apm_32.c	2010-06-19 16:08:32.000000000 -0400
+++ linux.trees.git/arch/x86/kernel/apm_32.c	2010-06-19 16:15:34.000000000 -0400
@@ -227,6 +227,7 @@
 #include <linux/suspend.h>
 #include <linux/kthread.h>
 #include <linux/jiffies.h>
+#include <linux/idle.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -947,10 +948,15 @@ recalc:
 				break;
 			}
 		}
+		enter_idle();
 		if (original_pm_idle)
 			original_pm_idle();
 		else
 			default_idle();
+		/* In many cases the interrupt that ended idle
+		   has already called exit_idle. But some idle
+		   loops can be woken up without interrupt. */
+		__exit_idle();
 		local_irq_disable();
 		jiffies_since_last_check = jiffies - last_jiffies;
 		if (jiffies_since_last_check > idle_period)



* [patch 05/20] Poll : add poll_wait_set_exclusive
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2010-07-09 22:57 ` [patch 04/20] idle notifier standardization x86_32 Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 06/20] prio_heap: heap_remove(), heap_maximum(), heap_replace() and heap_cherrypick() Mathieu Desnoyers
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen, Mathieu Desnoyers,
	William Lee Irwin III

[-- Attachment #1: poll-wait-exclusive.patch --]
[-- Type: text/plain, Size: 4238 bytes --]

Executive summary:

poll_wait_set_exclusive : set poll wait queue to exclusive

Sets up a poll wait queue to use exclusive wakeups. This is useful to wake up
only one waiter at each wakeup, and is used to work around the "thundering
herd" problem.

Detail:

* Problem description :

In the ring buffer poll() implementation, a typical multithreaded user-space
buffer reader polls all per-cpu buffer descriptors for data.  The number of
reader threads can be user-defined; the motivation for permitting this is that
there are typical workloads where a single CPU is producing most of the tracing
data and all other CPUs are idle, available to consume data. It therefore makes
sense not to tie those threads to specific buffers. However, when the number of
threads grows, we face a "thundering herd" problem where many threads can be
woken up and put back to sleep, leaving only a single thread doing useful work.

* Solution :

Introduce a poll_wait_set_exclusive() primitive in the poll API, so the code
which implements the poll file operation can specify that only a single waiter
must be woken up.
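
For illustration, a sketch of how a poll file operation would use it (the
buffer structure and buf_has_data() helper are hypothetical, not part of this
patch):

#include <linux/poll.h>
#include <linux/wait.h>
#include <linux/fs.h>

static DECLARE_WAIT_QUEUE_HEAD(buf_read_wait);

/* hypothetical stub: would check whether the per-cpu buffer has data */
static int buf_has_data(void *buf)
{
	return 0;
}

static unsigned int buf_poll(struct file *filp, poll_table *wait)
{
	unsigned int mask = 0;

	poll_wait_set_exclusive(wait);		/* wake only one waiter */
	poll_wait(filp, &buf_read_wait, wait);
	if (buf_has_data(filp->private_data))
		mask |= POLLIN | POLLRDNORM;
	return mask;
}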

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: William Lee Irwin III <wli@holomorphy.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 fs/select.c          |   41 ++++++++++++++++++++++++++++++++++++++---
 include/linux/poll.h |    2 ++
 2 files changed, 40 insertions(+), 3 deletions(-)

Index: linux.trees.git/fs/select.c
===================================================================
--- linux.trees.git.orig/fs/select.c	2010-07-09 15:59:00.000000000 -0400
+++ linux.trees.git/fs/select.c	2010-07-09 16:03:24.000000000 -0400
@@ -112,6 +112,9 @@ struct poll_table_page {
  */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 		       poll_table *p);
+static void __pollwait_exclusive(struct file *filp,
+				 wait_queue_head_t *wait_address,
+				 poll_table *p);
 
 void poll_initwait(struct poll_wqueues *pwq)
 {
@@ -152,6 +155,20 @@ void poll_freewait(struct poll_wqueues *
 }
 EXPORT_SYMBOL(poll_freewait);
 
+/**
+ * poll_wait_set_exclusive - set poll wait queue to exclusive
+ *
+ * Sets up a poll wait queue to use exclusive wakeups. This is useful to
+ * wake up only one waiter at each wakeup. Used to work-around "thundering herd"
+ * problem.
+ */
+void poll_wait_set_exclusive(poll_table *p)
+{
+	if (p)
+		init_poll_funcptr(p, __pollwait_exclusive);
+}
+EXPORT_SYMBOL(poll_wait_set_exclusive);
+
 static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
 {
 	struct poll_table_page *table = p->table;
@@ -213,8 +230,10 @@ static int pollwake(wait_queue_t *wait,
 }
 
 /* Add a new entry */
-static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-				poll_table *p)
+static void __pollwait_common(struct file *filp,
+			      wait_queue_head_t *wait_address,
+			      poll_table *p,
+			      int exclusive)
 {
 	struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
 	struct poll_table_entry *entry = poll_get_entry(pwq);
@@ -226,7 +245,23 @@ static void __pollwait(struct file *filp
 	entry->key = p->key;
 	init_waitqueue_func_entry(&entry->wait, pollwake);
 	entry->wait.private = pwq;
-	add_wait_queue(wait_address, &entry->wait);
+	if (!exclusive)
+		add_wait_queue(wait_address, &entry->wait);
+	else
+		add_wait_queue_exclusive(wait_address, &entry->wait);
+}
+
+static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
+				poll_table *p)
+{
+	__pollwait_common(filp, wait_address, p, 0);
+}
+
+static void __pollwait_exclusive(struct file *filp,
+				 wait_queue_head_t *wait_address,
+				 poll_table *p)
+{
+	__pollwait_common(filp, wait_address, p, 1);
 }
 
 int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
Index: linux.trees.git/include/linux/poll.h
===================================================================
--- linux.trees.git.orig/include/linux/poll.h	2010-07-09 15:59:00.000000000 -0400
+++ linux.trees.git/include/linux/poll.h	2010-07-09 16:03:24.000000000 -0400
@@ -79,6 +79,8 @@ static inline int poll_schedule(struct p
 	return poll_schedule_timeout(pwq, state, NULL, 0);
 }
 
+extern void poll_wait_set_exclusive(poll_table *p);
+
 /*
  * Scaleable version of the fd_set.
  */



* [patch 06/20] prio_heap: heap_remove(), heap_maximum(), heap_replace() and heap_cherrypick()
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2010-07-09 22:57 ` [patch 05/20] Poll : add poll_wait_set_exclusive Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 07/20] kthread_kill_stop() Mathieu Desnoyers
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen, Paul Menage, Paul Jackson,
	David Rientjes, Nick Piggin, Peter Zijlstra, Balbir Singh,
	Cedric Le Goater, Eric W. Biederman, Serge Hallyn

[-- Attachment #1: prio-heap-remove-maximum-replace-cherrypick.patch --]
[-- Type: text/plain, Size: 6058 bytes --]

These added interfaces let prio_heap users look up the top-of-heap item without
performing any insertion, remove the topmost heap entry, and replace the
topmost heap entry. This is useful if one needs the result of the lookup to
determine whether the current maximum should simply be removed or be replaced.
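
For illustration, the lookup-then-decide pattern described above, using a
hypothetical per-cpu iterator type (the my_iter structure and its helpers are
made up, not part of this patch):

#include <linux/prio_heap.h>

struct my_iter {
	int remaining;			/* hypothetical iterator state */
};

static void my_iter_consume(struct my_iter *iter)
{
	iter->remaining--;		/* hypothetical: consume one element */
}

static int my_iter_has_next(struct my_iter *iter)
{
	return iter->remaining > 0;	/* hypothetical */
}

static void consume_topmost(struct ptr_heap *heap)
{
	struct my_iter *iter = heap_maximum(heap);	/* peek, no change */

	if (!iter)
		return;					/* heap empty */

	my_iter_consume(iter);

	if (my_iter_has_next(iter))
		heap_replace_max(heap, iter);	/* single rebalance */
	else
		heap_remove(heap);		/* drop exhausted iterator */
}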

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/prio_heap.h |   44 ++++++++++++++++++++++
 lib/prio_heap.c           |   91 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 116 insertions(+), 19 deletions(-)

Index: linux.trees.git/include/linux/prio_heap.h
===================================================================
--- linux.trees.git.orig/include/linux/prio_heap.h	2010-07-06 14:25:29.000000000 -0400
+++ linux.trees.git/include/linux/prio_heap.h	2010-07-07 10:04:33.000000000 -0400
@@ -23,6 +23,18 @@ struct ptr_heap {
 };
 
 /**
+ * heap_maximum - return the largest element in the heap
+ * @heap: the heap to be operated on
+ *
+ * Returns the largest element in the heap, without performing any modification
+ * to the heap structure. Returns NULL if the heap is empty.
+ */
+static inline void *heap_maximum(const struct ptr_heap *heap)
+{
+	return heap->size ? heap->ptrs[0] : NULL;
+}
+
+/**
  * heap_init - initialize an empty heap with a given memory size
  * @heap: the heap structure to be initialized
  * @size: amount of memory to use in bytes
@@ -53,6 +65,38 @@ void heap_free(struct ptr_heap *heap);
  */
 extern void *heap_insert(struct ptr_heap *heap, void *p);
 
+/**
+ * heap_remove - remove the largest element from the heap
+ * @heap: the heap to be operated on
+ *
+ * Returns the largest element in the heap. It removes this element from the
+ * heap. Returns NULL if the heap is empty.
+ */
+extern void *heap_remove(struct ptr_heap *heap);
 
+/**
+ * heap_cherrypick - remove a given element from the heap
+ * @heap: the heap to be operated on
+ * @p: the element
+ *
+ * Remove the given element from the heap. Return the element if present, else
+ * return NULL. This algorithm has a complexity of O(n), which is higher than
+ * O(log(n)) provided by the rest of this API.
+ */
+extern void *heap_cherrypick(struct ptr_heap *heap, void *p);
+
+/**
+ * heap_replace_max - replace the the largest element from the heap
+ * @heap: the heap to be operated on
+ * @p: the pointer to be inserted as topmost element replacement
+ *
+ * Returns the largest element in the heap. It removes this element from the
+ * heap. The heap is rebalanced only once after the insertion. Returns NULL if
+ * the heap is empty.
+ *
+ * This is the equivalent of calling heap_remove() and then heap_insert(), but
+ * it only rebalances the heap once.
+ */
+extern void *heap_replace_max(struct ptr_heap *heap, void *p);
 
 #endif /* _LINUX_PRIO_HEAP_H */
Index: linux.trees.git/lib/prio_heap.c
===================================================================
--- linux.trees.git.orig/lib/prio_heap.c	2010-07-06 14:25:29.000000000 -0400
+++ linux.trees.git/lib/prio_heap.c	2010-07-07 10:18:32.000000000 -0400
@@ -23,12 +23,49 @@ void heap_free(struct ptr_heap *heap)
 	kfree(heap->ptrs);
 }
 
-void *heap_insert(struct ptr_heap *heap, void *p)
+static void heapify(struct ptr_heap *heap, void **ptrs, void *p, int pos)
+{
+	while (1) {
+		int left = 2 * pos + 1;
+		int right = 2 * pos + 2;
+		int largest = pos;
+		if (left < heap->size && heap->gt(ptrs[left], p))
+			largest = left;
+		if (right < heap->size && heap->gt(ptrs[right], ptrs[largest]))
+			largest = right;
+		if (largest == pos)
+			break;
+		/* Push p down the heap one level and bump one up */
+		ptrs[pos] = ptrs[largest];
+		ptrs[largest] = p;
+		pos = largest;
+	}
+}
+
+void *heap_replace_max(struct ptr_heap *heap, void *p)
 {
 	void *res;
 	void **ptrs = heap->ptrs;
 	int pos;
 
+	if (!heap->size) {
+		ptrs[heap->size++] = p;
+		return NULL;
+	}
+
+	/* Replace the current max and heapify */
+	res = ptrs[0];
+	ptrs[0] = p;
+	pos = 0;
+	heapify(heap, ptrs, p, pos);
+	return res;
+}
+
+void *heap_insert(struct ptr_heap *heap, void *p)
+{
+	void **ptrs = heap->ptrs;
+	int pos;
+
 	if (heap->size < heap->max) {
 		/* Heap insertion */
 		pos = heap->size++;
@@ -47,24 +84,40 @@ void *heap_insert(struct ptr_heap *heap,
 		return p;
 
 	/* Replace the current max and heapify */
-	res = ptrs[0];
-	ptrs[0] = p;
-	pos = 0;
+	return heap_replace_max(heap, p);
+}
 
-	while (1) {
-		int left = 2 * pos + 1;
-		int right = 2 * pos + 2;
-		int largest = pos;
-		if (left < heap->size && heap->gt(ptrs[left], p))
-			largest = left;
-		if (right < heap->size && heap->gt(ptrs[right], ptrs[largest]))
-			largest = right;
-		if (largest == pos)
-			break;
-		/* Push p down the heap one level and bump one up */
-		ptrs[pos] = ptrs[largest];
-		ptrs[largest] = p;
-		pos = largest;
+void *heap_remove(struct ptr_heap *heap)
+{
+	void **ptrs = heap->ptrs;
+
+	switch (heap->size) {
+	case 0:
+		return NULL;
+	case 1:
+		return ptrs[--heap->size];
 	}
-	return res;
+
+	/* Shrink, replace the current max by previous last entry and heapify */
+	return heap_replace_max(heap, ptrs[--heap->size]);
+}
+
+void *heap_cherrypick(struct ptr_heap *heap, void *p)
+{
+	void **ptrs = heap->ptrs;
+	size_t pos, size = heap->size;
+
+	for (pos = 0; pos < size; pos++)
+		if (ptrs[pos] == p)
+			goto found;
+	return NULL;
+found:
+	if (heap->size == 1)
+		return ptrs[--heap->size];
+	/*
+	 * Replace p with previous last entry and heapify.
+	 */
+	ptrs[pos] = ptrs[--heap->size];
+	heapify(heap, ptrs, ptrs[pos], pos);
+	return p;
 }



* [patch 07/20] kthread_kill_stop()
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2010-07-09 22:57 ` [patch 06/20] prio_heap: heap_remove(), heap_maximum(), heap_replace() and heap_cherrypick() Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 08/20] inline memcpy Mathieu Desnoyers
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: kthread_kill_stop.patch --]
[-- Type: text/plain, Size: 2513 bytes --]

Allow use of "interruptible" functions in kernel threads by creating this
kthread_kill_stop() variant. Instead of just waking up the thread, it also
sends a signal after setting the "must exit" variable to 1.
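
For illustration, a sketch of the intended usage (hypothetical worker thread,
not part of this patch): the thread sleeps in interruptible calls, and the
signal sent by kthread_kill_stop() breaks it out of them so it can notice
kthread_should_stop() and exit.

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/signal.h>

static struct task_struct *my_task;

static int my_worker(void *data)
{
	while (!kthread_should_stop()) {
		/* interruptible sleep, cut short by the signal */
		msleep_interruptible(1000);
	}
	return 0;
}

static int my_start(void)
{
	my_task = kthread_run(my_worker, NULL, "my_worker");
	return IS_ERR(my_task) ? PTR_ERR(my_task) : 0;
}

static void my_stop(void)
{
	/* sets should_stop, sends SIGTERM, waits for the thread to exit */
	kthread_kill_stop(my_task, SIGTERM);
}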

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/kthread.h |    1 +
 kernel/kthread.c        |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

Index: linux.trees.git/include/linux/kthread.h
===================================================================
--- linux.trees.git.orig/include/linux/kthread.h	2010-06-18 18:37:11.000000000 -0400
+++ linux.trees.git/include/linux/kthread.h	2010-06-18 18:37:27.000000000 -0400
@@ -29,6 +29,7 @@ struct task_struct *kthread_create(int (
 
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 int kthread_stop(struct task_struct *k);
+int kthread_kill_stop(struct task_struct *k, int signo);
 int kthread_should_stop(void);
 
 int kthreadd(void *unused);
Index: linux.trees.git/kernel/kthread.c
===================================================================
--- linux.trees.git.orig/kernel/kthread.c	2010-06-18 18:32:21.000000000 -0400
+++ linux.trees.git/kernel/kthread.c	2010-06-18 18:37:57.000000000 -0400
@@ -211,6 +211,46 @@ int kthread_stop(struct task_struct *k)
 }
 EXPORT_SYMBOL(kthread_stop);
 
+/**
+ * kthread_kill_stop - kill and stop a thread created by kthread_create().
+ * @k: thread created by kthread_create().
+ * @signo: signal number to send.
+ *
+ * Sets kthread_should_stop() for @k to return true, sends a signal, and
+ * waits for it to exit. This can also be called after kthread_create()
+ * instead of calling wake_up_process(): the thread will exit without
+ * calling threadfn().
+ *
+ * If threadfn() may call do_exit() itself, the caller must ensure
+ * task_struct can't go away.
+ *
+ * Returns the result of threadfn(), or %-EINTR if wake_up_process()
+ * was never called.
+ */
+int kthread_kill_stop(struct task_struct *k, int signo)
+{
+	struct kthread *kthread;
+	int ret;
+
+	trace_sched_kthread_stop(k);
+	get_task_struct(k);
+
+	kthread = to_kthread(k);
+	barrier(); /* it might have exited */
+	if (k->vfork_done != NULL) {
+		kthread->should_stop = 1;
+		force_sig(signo, k);
+		wait_for_completion(&kthread->exited);
+	}
+	ret = k->exit_code;
+
+	put_task_struct(k);
+	trace_sched_kthread_stop_ret(ret);
+
+	return ret;
+}
+EXPORT_SYMBOL(kthread_kill_stop);
+
 int kthreadd(void *unused)
 {
 	struct task_struct *tsk = current;



* [patch 08/20] inline memcpy
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2010-07-09 22:57 ` [patch 07/20] kthread_kill_stop() Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 09/20] x86 " Mathieu Desnoyers
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: inline-memcpy.patch --]
[-- Type: text/plain, Size: 816 bytes --]

Support __HAVE_ARCH_INLINE_MEMCPY. Start with a fall-back to memcpy().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/string.h |    3 +++
 1 file changed, 3 insertions(+)

Index: linux.trees.git/include/linux/string.h
===================================================================
--- linux.trees.git.orig/include/linux/string.h	2010-06-21 18:23:20.000000000 -0400
+++ linux.trees.git/include/linux/string.h	2010-06-21 18:26:10.000000000 -0400
@@ -102,6 +102,9 @@ extern void * memset(void *,int,__kernel
 #ifndef __HAVE_ARCH_MEMCPY
 extern void * memcpy(void *,const void *,__kernel_size_t);
 #endif
+#ifndef __HAVE_ARCH_INLINE_MEMCPY
+#define inline_memcpy memcpy
+#endif
 #ifndef __HAVE_ARCH_MEMMOVE
 extern void * memmove(void *,const void *,__kernel_size_t);
 #endif



* [patch 09/20] x86 inline memcpy
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2010-07-09 22:57 ` [patch 08/20] inline memcpy Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 10/20] Trace clock - build standalone Mathieu Desnoyers
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: x86-inline-memcpy.patch --]
[-- Type: text/plain, Size: 1672 bytes --]

Export an inline_memcpy() API. It is useful when the memcpy size is unknown at
compile time but the caller cannot afford the cost of a function call, i.e. for
very frequent memcpy callers.
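
For illustration, a sketch of the kind of caller this targets (hypothetical
event-copy helper, not part of this patch): the copy size is only known at run
time, but the call is frequent enough that the overhead of an out-of-line
memcpy() matters.

#include <linux/string.h>

static inline void copy_event_payload(void *dest, const void *src, size_t len)
{
	/* expands to the arch inline on x86, falls back to memcpy() elsewhere */
	inline_memcpy(dest, src, len);
}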

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 arch/x86/include/asm/string_32.h |    7 +++++++
 arch/x86/include/asm/string_64.h |    7 +++++++
 2 files changed, 14 insertions(+)

Index: linux.trees.git/arch/x86/include/asm/string_32.h
===================================================================
--- linux.trees.git.orig/arch/x86/include/asm/string_32.h	2010-06-21 17:50:21.000000000 -0400
+++ linux.trees.git/arch/x86/include/asm/string_32.h	2010-06-21 17:56:49.000000000 -0400
@@ -44,6 +44,13 @@ static __always_inline void *__memcpy(vo
 	return to;
 }
 
+#define __HAVE_ARCH_INLINE_MEMCPY
+static __always_inline void *inline_memcpy(void *to, const void *from,
+					   size_t n)
+{
+	return __memcpy(to, from, n);
+}
+
 /*
  * This looks ugly, but the compiler can optimize it totally,
  * as the count is constant.
Index: linux.trees.git/arch/x86/include/asm/string_64.h
===================================================================
--- linux.trees.git.orig/arch/x86/include/asm/string_64.h	2010-06-21 17:50:21.000000000 -0400
+++ linux.trees.git/arch/x86/include/asm/string_64.h	2010-06-21 17:57:13.000000000 -0400
@@ -23,6 +23,13 @@ static __always_inline void *__inline_me
 	return to;
 }
 
+#define __HAVE_ARCH_INLINE_MEMCPY
+static __always_inline void *inline_memcpy(void *to, const void *from,
+					   size_t n)
+{
+	return __inline_memcpy(to, from, n);
+}
+
 /* Even with __builtin_ the compiler may decide to use the out of line
    function. */
 



* [patch 10/20] Trace clock - build standalone
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  2010-07-09 22:57 ` [patch 09/20] x86 " Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 11/20] Ftrace ring buffer renaming Mathieu Desnoyers
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: trace-clock-build-standalone.patch --]
[-- Type: text/plain, Size: 2341 bytes --]

Building the trace clock without CONFIG_TRACING enabled is useful for the ring
buffer library sample client.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 kernel/Makefile            |    1 +
 kernel/trace/Kconfig       |    3 +++
 kernel/trace/trace_clock.c |    3 +++
 3 files changed, 7 insertions(+)

Index: linux.trees.git/kernel/trace/trace_clock.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_clock.c	2010-06-28 08:15:25.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_clock.c	2010-06-28 08:32:08.000000000 -0400
@@ -44,6 +44,7 @@ u64 notrace trace_clock_local(void)
 
 	return clock;
 }
+EXPORT_SYMBOL_GPL(trace_clock_local);
 
 /*
  * trace_clock(): 'inbetween' trace clock. Not completely serialized,
@@ -57,6 +58,7 @@ u64 notrace trace_clock(void)
 {
 	return local_clock();
 }
+EXPORT_SYMBOL_GPL(trace_clock);
 
 
 /*
@@ -113,3 +115,4 @@ u64 notrace trace_clock_global(void)
 
 	return now;
 }
+EXPORT_SYMBOL_GPL(trace_clock_global);
Index: linux.trees.git/kernel/trace/Kconfig
===================================================================
--- linux.trees.git.orig/kernel/trace/Kconfig	2010-06-28 08:15:25.000000000 -0400
+++ linux.trees.git/kernel/trace/Kconfig	2010-06-28 08:31:35.000000000 -0400
@@ -73,6 +73,9 @@ config RING_BUFFER_ALLOW_SWAP
 	 Allow the use of ring_buffer_swap_cpu.
 	 Adds a very slight overhead to tracing when enabled.
 
+config TRACE_CLOCK_STANDALONE
+	bool
+
 # All tracer options should select GENERIC_TRACER. For those options that are
 # enabled by all tracers (context switch and event tracer) they select TRACING.
 # This allows those options to appear when no other tracer is selected. But the
Index: linux.trees.git/kernel/Makefile
===================================================================
--- linux.trees.git.orig/kernel/Makefile	2010-06-28 08:15:25.000000000 -0400
+++ linux.trees.git/kernel/Makefile	2010-06-28 08:15:25.000000000 -0400
@@ -98,6 +98,7 @@ obj-$(CONFIG_FUNCTION_TRACER) += trace/
 obj-$(CONFIG_TRACING) += trace/
 obj-$(CONFIG_X86_DS) += trace/
 obj-$(CONFIG_RING_BUFFER) += trace/
+obj-$(CONFIG_TRACE_CLOCK_STANDALONE) += trace/
 obj-$(CONFIG_SMP) += sched_cpupri.o
 obj-$(CONFIG_SLOW_WORK) += slow-work.o
 obj-$(CONFIG_SLOW_WORK_DEBUG) += slow-work-debugfs.o



* [patch 11/20] Ftrace ring buffer renaming
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (9 preceding siblings ...)
  2010-07-09 22:57 ` [patch 10/20] Trace clock - build standalone Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 12/20] ring buffer backend Mathieu Desnoyers
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ftrace-ring-buffer.patch --]
[-- Type: text/plain, Size: 317870 bytes --]

Rename ring_buffer_* to ftrace_ring_buffer_* everywhere. This is a first step in
the conversion of the ftrace ring buffer into a client of the generic ring
buffer library.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 drivers/oprofile/cpu_buffer.c               |   30 
 drivers/oprofile/cpu_buffer.h               |    2 
 include/linux/ftrace_event.h                |   24 
 include/linux/ftrace_ring_buffer.h          |  196 +
 include/linux/kernel.h                      |    2 
 include/linux/oprofile.h                    |    2 
 include/linux/ring_buffer.h                 |  196 -
 include/trace/ftrace.h                      |   12 
 kernel/trace/Kconfig                        |   12 
 kernel/trace/Makefile                       |    4 
 kernel/trace/blktrace.c                     |   12 
 kernel/trace/ftrace_ring_buffer.c           | 4022 ++++++++++++++++++++++++++++
 kernel/trace/ftrace_ring_buffer_benchmark.c |  488 +++
 kernel/trace/ring_buffer.c                  | 4022 ----------------------------
 kernel/trace/ring_buffer_benchmark.c        |  488 ---
 kernel/trace/trace.c                        |  262 -
 kernel/trace/trace.h                        |   30 
 kernel/trace/trace_branch.c                 |    8 
 kernel/trace/trace_events.c                 |   12 
 kernel/trace/trace_functions.c              |    2 
 kernel/trace/trace_functions_graph.c        |   30 
 kernel/trace/trace_kprobe.c                 |   12 
 kernel/trace/trace_ksym.c                   |    6 
 kernel/trace/trace_mmiotrace.c              |   14 
 kernel/trace/trace_sched_switch.c           |   14 
 kernel/trace/trace_selftest.c               |    8 
 kernel/trace/trace_syscalls.c               |   12 
 27 files changed, 4961 insertions(+), 4961 deletions(-)

Index: linux.trees.git/include/linux/ftrace_event.h
===================================================================
--- linux.trees.git.orig/include/linux/ftrace_event.h	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/include/linux/ftrace_event.h	2010-07-09 18:08:47.000000000 -0400
@@ -1,7 +1,7 @@
 #ifndef _LINUX_FTRACE_EVENT_H
 #define _LINUX_FTRACE_EVENT_H
 
-#include <linux/ring_buffer.h>
+#include <linux/ftrace_ring_buffer.h>
 #include <linux/trace_seq.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
@@ -55,7 +55,7 @@ struct trace_iterator {
 	void			*private;
 	int			cpu_file;
 	struct mutex		mutex;
-	struct ring_buffer_iter	*buffer_iter[NR_CPUS];
+	struct ftrace_ring_buffer_iter	*buffer_iter[NR_CPUS];
 	unsigned long		iter_flags;
 
 	/* The below is zeroed out in pipe_read */
@@ -106,18 +106,18 @@ enum print_line_t {
 void tracing_generic_entry_update(struct trace_entry *entry,
 				  unsigned long flags,
 				  int pc);
-struct ring_buffer_event *
-trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
+struct ftrace_ring_buffer_event *
+trace_current_buffer_lock_reserve(struct ftrace_ring_buffer **current_buffer,
 				  int type, unsigned long len,
 				  unsigned long flags, int pc);
-void trace_current_buffer_unlock_commit(struct ring_buffer *buffer,
-					struct ring_buffer_event *event,
+void trace_current_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+					struct ftrace_ring_buffer_event *event,
 					unsigned long flags, int pc);
-void trace_nowake_buffer_unlock_commit(struct ring_buffer *buffer,
-				       struct ring_buffer_event *event,
+void trace_nowake_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+				       struct ftrace_ring_buffer_event *event,
 					unsigned long flags, int pc);
-void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
-					 struct ring_buffer_event *event);
+void trace_current_buffer_discard_commit(struct ftrace_ring_buffer *buffer,
+					 struct ftrace_ring_buffer_event *event);
 
 void tracing_record_cmdline(struct task_struct *tsk);
 
@@ -199,10 +199,10 @@ struct ftrace_event_call {
 
 extern void destroy_preds(struct ftrace_event_call *call);
 extern int filter_match_preds(struct event_filter *filter, void *rec);
-extern int filter_current_check_discard(struct ring_buffer *buffer,
+extern int filter_current_check_discard(struct ftrace_ring_buffer *buffer,
 					struct ftrace_event_call *call,
 					void *rec,
-					struct ring_buffer_event *event);
+					struct ftrace_ring_buffer_event *event);
 
 enum {
 	FILTER_OTHER = 0,
Index: linux.trees.git/include/linux/ftrace_ring_buffer.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ftrace_ring_buffer.h	2010-07-09 18:08:47.000000000 -0400
@@ -0,0 +1,196 @@
+#ifndef _LINUX_FTRACE_RING_BUFFER_H
+#define _LINUX_FTRACE_RING_BUFFER_H
+
+#include <linux/kmemcheck.h>
+#include <linux/mm.h>
+#include <linux/seq_file.h>
+
+struct ftrace_ring_buffer;
+struct ftrace_ring_buffer_iter;
+
+/*
+ * Don't refer to this struct directly, use functions below.
+ */
+struct ftrace_ring_buffer_event {
+	kmemcheck_bitfield_begin(bitfield);
+	u32		type_len:5, time_delta:27;
+	kmemcheck_bitfield_end(bitfield);
+
+	u32		array[];
+};
+
+/**
+ * enum ftrace_ring_buffer_type - internal ring buffer types
+ *
+ * @RINGBUF_TYPE_PADDING:	Left over page padding or discarded event
+ *				 If time_delta is 0:
+ *				  array is ignored
+ *				  size is variable depending on how much
+ *				  padding is needed
+ *				 If time_delta is non zero:
+ *				  array[0] holds the actual length
+ *				  size = 4 + length (bytes)
+ *
+ * @RINGBUF_TYPE_TIME_EXTEND:	Extend the time delta
+ *				 array[0] = time delta (28 .. 59)
+ *				 size = 8 bytes
+ *
+ * @RINGBUF_TYPE_TIME_STAMP:	Sync time stamp with external clock
+ *				 array[0]    = tv_nsec
+ *				 array[1..2] = tv_sec
+ *				 size = 16 bytes
+ *
+ * <= @RINGBUF_TYPE_DATA_TYPE_LEN_MAX:
+ *				Data record
+ *				 If type_len is zero:
+ *				  array[0] holds the actual length
+ *				  array[1..(length+3)/4] holds data
+ *				  size = 4 + length (bytes)
+ *				 else
+ *				  length = type_len << 2
+ *				  array[0..(length+3)/4-1] holds data
+ *				  size = 4 + length (bytes)
+ */
+enum ftrace_ring_buffer_type {
+	RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
+	RINGBUF_TYPE_PADDING,
+	RINGBUF_TYPE_TIME_EXTEND,
+	/* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
+	RINGBUF_TYPE_TIME_STAMP,
+};
+
+unsigned ftrace_ring_buffer_event_length(struct ftrace_ring_buffer_event *event);
+void *ftrace_ring_buffer_event_data(struct ftrace_ring_buffer_event *event);
+
+/**
+ * ftrace_ring_buffer_event_time_delta - return the delta timestamp of the event
+ * @event: the event to get the delta timestamp of
+ *
+ * The delta timestamp is the 27 bit timestamp since the last event.
+ */
+static inline unsigned
+ftrace_ring_buffer_event_time_delta(struct ftrace_ring_buffer_event *event)
+{
+	return event->time_delta;
+}
+
+/*
+ * ftrace_ring_buffer_discard_commit will remove an event that has not
+ *   been committed yet. If this is used, then ftrace_ring_buffer_unlock_commit
+ *   must not be called on the discarded event. This function
+ *   will try to remove the event from the ring buffer completely
+ *   if another event has not been written after it.
+ *
+ * Example use:
+ *
+ *  if (some_condition)
+ *    ftrace_ring_buffer_discard_commit(buffer, event);
+ *  else
+ *    ftrace_ring_buffer_unlock_commit(buffer, event);
+ */
+void ftrace_ring_buffer_discard_commit(struct ftrace_ring_buffer *buffer,
+				struct ftrace_ring_buffer_event *event);
+
+/*
+ * size is in bytes for each per CPU buffer.
+ */
+struct ftrace_ring_buffer *
+__ftrace_ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key);
+
+/*
+ * Because the ring buffer is generic, if other users of the ring buffer get
+ * traced by ftrace, it can produce lockdep warnings. We need to keep each
+ * ring buffer's lock class separate.
+ */
+#define ftrace_ring_buffer_alloc(size, flags)			\
+({							\
+	static struct lock_class_key __key;		\
+	__ftrace_ring_buffer_alloc((size), (flags), &__key);	\
+})
+
+void ftrace_ring_buffer_free(struct ftrace_ring_buffer *buffer);
+
+int ftrace_ring_buffer_resize(struct ftrace_ring_buffer *buffer, unsigned long size);
+
+struct ftrace_ring_buffer_event *ftrace_ring_buffer_lock_reserve(struct ftrace_ring_buffer *buffer,
+						   unsigned long length);
+int ftrace_ring_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+			      struct ftrace_ring_buffer_event *event);
+int ftrace_ring_buffer_write(struct ftrace_ring_buffer *buffer,
+		      unsigned long length, void *data);
+
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_peek(struct ftrace_ring_buffer *buffer, int cpu, u64 *ts,
+		 unsigned long *lost_events);
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_consume(struct ftrace_ring_buffer *buffer, int cpu, u64 *ts,
+		    unsigned long *lost_events);
+
+struct ftrace_ring_buffer_iter *
+ftrace_ring_buffer_read_prepare(struct ftrace_ring_buffer *buffer, int cpu);
+void ftrace_ring_buffer_read_prepare_sync(void);
+void ftrace_ring_buffer_read_start(struct ftrace_ring_buffer_iter *iter);
+void ftrace_ring_buffer_read_finish(struct ftrace_ring_buffer_iter *iter);
+
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_iter_peek(struct ftrace_ring_buffer_iter *iter, u64 *ts);
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_read(struct ftrace_ring_buffer_iter *iter, u64 *ts);
+void ftrace_ring_buffer_iter_reset(struct ftrace_ring_buffer_iter *iter);
+int ftrace_ring_buffer_iter_empty(struct ftrace_ring_buffer_iter *iter);
+
+unsigned long ftrace_ring_buffer_size(struct ftrace_ring_buffer *buffer);
+
+void ftrace_ring_buffer_reset_cpu(struct ftrace_ring_buffer *buffer, int cpu);
+void ftrace_ring_buffer_reset(struct ftrace_ring_buffer *buffer);
+
+#ifdef CONFIG_FTRACE_RING_BUFFER_ALLOW_SWAP
+int ftrace_ring_buffer_swap_cpu(struct ftrace_ring_buffer *buffer_a,
+			 struct ftrace_ring_buffer *buffer_b, int cpu);
+#else
+static inline int
+ftrace_ring_buffer_swap_cpu(struct ftrace_ring_buffer *buffer_a,
+		     struct ftrace_ring_buffer *buffer_b, int cpu)
+{
+	return -ENODEV;
+}
+#endif
+
+int ftrace_ring_buffer_empty(struct ftrace_ring_buffer *buffer);
+int ftrace_ring_buffer_empty_cpu(struct ftrace_ring_buffer *buffer, int cpu);
+
+void ftrace_ring_buffer_record_disable(struct ftrace_ring_buffer *buffer);
+void ftrace_ring_buffer_record_enable(struct ftrace_ring_buffer *buffer);
+void ftrace_ring_buffer_record_disable_cpu(struct ftrace_ring_buffer *buffer, int cpu);
+void ftrace_ring_buffer_record_enable_cpu(struct ftrace_ring_buffer *buffer, int cpu);
+
+unsigned long ftrace_ring_buffer_entries(struct ftrace_ring_buffer *buffer);
+unsigned long ftrace_ring_buffer_overruns(struct ftrace_ring_buffer *buffer);
+unsigned long ftrace_ring_buffer_entries_cpu(struct ftrace_ring_buffer *buffer, int cpu);
+unsigned long ftrace_ring_buffer_overrun_cpu(struct ftrace_ring_buffer *buffer, int cpu);
+unsigned long ftrace_ring_buffer_commit_overrun_cpu(struct ftrace_ring_buffer *buffer, int cpu);
+
+u64 ftrace_ring_buffer_time_stamp(struct ftrace_ring_buffer *buffer, int cpu);
+void ftrace_ring_buffer_normalize_time_stamp(struct ftrace_ring_buffer *buffer,
+				      int cpu, u64 *ts);
+void ftrace_ring_buffer_set_clock(struct ftrace_ring_buffer *buffer,
+			   u64 (*clock)(void));
+
+size_t ftrace_ring_buffer_page_len(void *page);
+
+
+void *ftrace_ring_buffer_alloc_read_page(struct ftrace_ring_buffer *buffer);
+void ftrace_ring_buffer_free_read_page(struct ftrace_ring_buffer *buffer, void *data);
+int ftrace_ring_buffer_read_page(struct ftrace_ring_buffer *buffer, void **data_page,
+			  size_t len, int cpu, int full);
+
+struct trace_seq;
+
+int ftrace_ring_buffer_print_entry_header(struct trace_seq *s);
+int ftrace_ring_buffer_print_page_header(struct trace_seq *s);
+
+enum ftrace_ring_buffer_flags {
+	RB_FL_OVERWRITE		= 1 << 0,
+};
+
+#endif /* _LINUX_FTRACE_RING_BUFFER_H */
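
For reference only, a minimal read-side sketch against the declarations above
(hypothetical consumer, not part of this patch):

	/* Illustration only: hypothetical consumer, not from the kernel tree. */
	static void example_drain_cpu(struct ftrace_ring_buffer *buffer, int cpu)
	{
		struct ftrace_ring_buffer_event *event;
		unsigned long lost_events;
		u64 ts;

		/* Consume events from one per-cpu buffer until it is empty. */
		while ((event = ftrace_ring_buffer_consume(buffer, cpu, &ts,
							   &lost_events))) {
			void *data = ftrace_ring_buffer_event_data(event);
			unsigned int len = ftrace_ring_buffer_event_length(event);

			/* A real client would decode 'data' here. */
			pr_info("cpu %d: %u-byte event, ts %llu, lost %lu\n",
				cpu, len, (unsigned long long)ts, lost_events);
			(void)data;
		}
	}
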
Index: linux.trees.git/include/linux/ring_buffer.h
===================================================================
--- linux.trees.git.orig/include/linux/ring_buffer.h	2010-07-09 18:08:14.000000000 -0400
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,196 +0,0 @@
-#ifndef _LINUX_RING_BUFFER_H
-#define _LINUX_RING_BUFFER_H
-
-#include <linux/kmemcheck.h>
-#include <linux/mm.h>
-#include <linux/seq_file.h>
-
-struct ring_buffer;
-struct ring_buffer_iter;
-
-/*
- * Don't refer to this struct directly, use functions below.
- */
-struct ring_buffer_event {
-	kmemcheck_bitfield_begin(bitfield);
-	u32		type_len:5, time_delta:27;
-	kmemcheck_bitfield_end(bitfield);
-
-	u32		array[];
-};
-
-/**
- * enum ring_buffer_type - internal ring buffer types
- *
- * @RINGBUF_TYPE_PADDING:	Left over page padding or discarded event
- *				 If time_delta is 0:
- *				  array is ignored
- *				  size is variable depending on how much
- *				  padding is needed
- *				 If time_delta is non zero:
- *				  array[0] holds the actual length
- *				  size = 4 + length (bytes)
- *
- * @RINGBUF_TYPE_TIME_EXTEND:	Extend the time delta
- *				 array[0] = time delta (28 .. 59)
- *				 size = 8 bytes
- *
- * @RINGBUF_TYPE_TIME_STAMP:	Sync time stamp with external clock
- *				 array[0]    = tv_nsec
- *				 array[1..2] = tv_sec
- *				 size = 16 bytes
- *
- * <= @RINGBUF_TYPE_DATA_TYPE_LEN_MAX:
- *				Data record
- *				 If type_len is zero:
- *				  array[0] holds the actual length
- *				  array[1..(length+3)/4] holds data
- *				  size = 4 + length (bytes)
- *				 else
- *				  length = type_len << 2
- *				  array[0..(length+3)/4-1] holds data
- *				  size = 4 + length (bytes)
- */
-enum ring_buffer_type {
-	RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
-	RINGBUF_TYPE_PADDING,
-	RINGBUF_TYPE_TIME_EXTEND,
-	/* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
-	RINGBUF_TYPE_TIME_STAMP,
-};
-
-unsigned ring_buffer_event_length(struct ring_buffer_event *event);
-void *ring_buffer_event_data(struct ring_buffer_event *event);
-
-/**
- * ring_buffer_event_time_delta - return the delta timestamp of the event
- * @event: the event to get the delta timestamp of
- *
- * The delta timestamp is the 27 bit timestamp since the last event.
- */
-static inline unsigned
-ring_buffer_event_time_delta(struct ring_buffer_event *event)
-{
-	return event->time_delta;
-}
-
-/*
- * ring_buffer_discard_commit will remove an event that has not
- *   ben committed yet. If this is used, then ring_buffer_unlock_commit
- *   must not be called on the discarded event. This function
- *   will try to remove the event from the ring buffer completely
- *   if another event has not been written after it.
- *
- * Example use:
- *
- *  if (some_condition)
- *    ring_buffer_discard_commit(buffer, event);
- *  else
- *    ring_buffer_unlock_commit(buffer, event);
- */
-void ring_buffer_discard_commit(struct ring_buffer *buffer,
-				struct ring_buffer_event *event);
-
-/*
- * size is in bytes for each per CPU buffer.
- */
-struct ring_buffer *
-__ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key);
-
-/*
- * Because the ring buffer is generic, if other users of the ring buffer get
- * traced by ftrace, it can produce lockdep warnings. We need to keep each
- * ring buffer's lock class separate.
- */
-#define ring_buffer_alloc(size, flags)			\
-({							\
-	static struct lock_class_key __key;		\
-	__ring_buffer_alloc((size), (flags), &__key);	\
-})
-
-void ring_buffer_free(struct ring_buffer *buffer);
-
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
-
-struct ring_buffer_event *ring_buffer_lock_reserve(struct ring_buffer *buffer,
-						   unsigned long length);
-int ring_buffer_unlock_commit(struct ring_buffer *buffer,
-			      struct ring_buffer_event *event);
-int ring_buffer_write(struct ring_buffer *buffer,
-		      unsigned long length, void *data);
-
-struct ring_buffer_event *
-ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts,
-		 unsigned long *lost_events);
-struct ring_buffer_event *
-ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts,
-		    unsigned long *lost_events);
-
-struct ring_buffer_iter *
-ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu);
-void ring_buffer_read_prepare_sync(void);
-void ring_buffer_read_start(struct ring_buffer_iter *iter);
-void ring_buffer_read_finish(struct ring_buffer_iter *iter);
-
-struct ring_buffer_event *
-ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts);
-struct ring_buffer_event *
-ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
-void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
-int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
-
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
-
-void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
-void ring_buffer_reset(struct ring_buffer *buffer);
-
-#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
-int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
-			 struct ring_buffer *buffer_b, int cpu);
-#else
-static inline int
-ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
-		     struct ring_buffer *buffer_b, int cpu)
-{
-	return -ENODEV;
-}
-#endif
-
-int ring_buffer_empty(struct ring_buffer *buffer);
-int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu);
-
-void ring_buffer_record_disable(struct ring_buffer *buffer);
-void ring_buffer_record_enable(struct ring_buffer *buffer);
-void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
-void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
-
-unsigned long ring_buffer_entries(struct ring_buffer *buffer);
-unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
-unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
-unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
-unsigned long ring_buffer_commit_overrun_cpu(struct ring_buffer *buffer, int cpu);
-
-u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu);
-void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
-				      int cpu, u64 *ts);
-void ring_buffer_set_clock(struct ring_buffer *buffer,
-			   u64 (*clock)(void));
-
-size_t ring_buffer_page_len(void *page);
-
-
-void *ring_buffer_alloc_read_page(struct ring_buffer *buffer);
-void ring_buffer_free_read_page(struct ring_buffer *buffer, void *data);
-int ring_buffer_read_page(struct ring_buffer *buffer, void **data_page,
-			  size_t len, int cpu, int full);
-
-struct trace_seq;
-
-int ring_buffer_print_entry_header(struct trace_seq *s);
-int ring_buffer_print_page_header(struct trace_seq *s);
-
-enum ring_buffer_flags {
-	RB_FL_OVERWRITE		= 1 << 0,
-};
-
-#endif /* _LINUX_RING_BUFFER_H */
Index: linux.trees.git/kernel/trace/Kconfig
===================================================================
--- linux.trees.git.orig/kernel/trace/Kconfig	2010-07-09 18:08:46.000000000 -0400
+++ linux.trees.git/kernel/trace/Kconfig	2010-07-09 18:08:47.000000000 -0400
@@ -52,7 +52,7 @@ config HAVE_SYSCALL_TRACEPOINTS
 config TRACER_MAX_TRACE
 	bool
 
-config RING_BUFFER
+config FTRACE_RING_BUFFER
 	bool
 
 config FTRACE_NMI_ENTER
@@ -67,7 +67,7 @@ config EVENT_TRACING
 config CONTEXT_SWITCH_TRACER
 	bool
 
-config RING_BUFFER_ALLOW_SWAP
+config FTRACE_RING_BUFFER_ALLOW_SWAP
 	bool
 	help
 	 Allow the use of ring_buffer_swap_cpu.
@@ -86,7 +86,7 @@ config TRACE_CLOCK_STANDALONE
 config TRACING
 	bool
 	select DEBUG_FS
-	select RING_BUFFER
+	select FTRACE_RING_BUFFER
 	select STACKTRACE if STACKTRACE_SUPPORT
 	select TRACEPOINTS
 	select NOP_TRACER
@@ -160,7 +160,7 @@ config IRQSOFF_TRACER
 	select TRACE_IRQFLAGS
 	select GENERIC_TRACER
 	select TRACER_MAX_TRACE
-	select RING_BUFFER_ALLOW_SWAP
+	select FTRACE_RING_BUFFER_ALLOW_SWAP
 	help
 	  This option measures the time spent in irqs-off critical
 	  sections, with microsecond accuracy.
@@ -182,7 +182,7 @@ config PREEMPT_TRACER
 	depends on PREEMPT
 	select GENERIC_TRACER
 	select TRACER_MAX_TRACE
-	select RING_BUFFER_ALLOW_SWAP
+	select FTRACE_RING_BUFFER_ALLOW_SWAP
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
@@ -498,7 +498,7 @@ config MMIOTRACE_TEST
 
 config RING_BUFFER_BENCHMARK
 	tristate "Ring buffer benchmark stress tester"
-	depends on RING_BUFFER
+	depends on FTRACE_RING_BUFFER
 	help
 	  This option creates a test to stress the ring buffer and benchmark it.
 	  It creates its own ring buffer such that it will not interfere with
Index: linux.trees.git/kernel/trace/Makefile
===================================================================
--- linux.trees.git.orig/kernel/trace/Makefile	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/Makefile	2010-07-09 18:08:47.000000000 -0400
@@ -22,8 +22,8 @@ endif
 obj-y += trace_clock.o
 
 obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
-obj-$(CONFIG_RING_BUFFER) += ring_buffer.o
-obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o
+obj-$(CONFIG_FTRACE_RING_BUFFER) += ftrace_ring_buffer.o
+obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ftrace_ring_buffer_benchmark.o
 
 obj-$(CONFIG_TRACING) += trace.o
 obj-$(CONFIG_TRACING) += trace_output.o
Index: linux.trees.git/kernel/trace/ftrace_ring_buffer.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/ftrace_ring_buffer.c	2010-07-09 18:08:47.000000000 -0400
@@ -0,0 +1,4022 @@
+/*
+ * Generic ring buffer
+ *
+ * Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com>
+ */
+#include <linux/ftrace_ring_buffer.h>
+#include <linux/trace_clock.h>
+#include <linux/ftrace_irq.h>
+#include <linux/spinlock.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+#include <linux/hardirq.h>
+#include <linux/kmemcheck.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/hash.h>
+#include <linux/list.h>
+#include <linux/cpu.h>
+#include <linux/fs.h>
+
+#include <asm/local.h>
+#include "trace.h"
+
+/*
+ * The ring buffer header is special. We must keep it up to date manually.
+ */
+int ftrace_ring_buffer_print_entry_header(struct trace_seq *s)
+{
+	int ret;
+
+	ret = trace_seq_printf(s, "# compressed entry header\n");
+	ret = trace_seq_printf(s, "\ttype_len    :    5 bits\n");
+	ret = trace_seq_printf(s, "\ttime_delta  :   27 bits\n");
+	ret = trace_seq_printf(s, "\tarray       :   32 bits\n");
+	ret = trace_seq_printf(s, "\n");
+	ret = trace_seq_printf(s, "\tpadding     : type == %d\n",
+			       RINGBUF_TYPE_PADDING);
+	ret = trace_seq_printf(s, "\ttime_extend : type == %d\n",
+			       RINGBUF_TYPE_TIME_EXTEND);
+	ret = trace_seq_printf(s, "\tdata max type_len  == %d\n",
+			       RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
+
+	return ret;
+}
+
+/*
+ * The ring buffer is made up of a list of pages. A separate list of pages is
+ * allocated for each CPU. A writer may only write to a buffer that is
+ * associated with the CPU it is currently executing on.  A reader may read
+ * from any per cpu buffer.
+ *
+ * The reader is special. For each per cpu buffer, the reader has its own
+ * reader page. When a reader has read the entire reader page, this reader
+ * page is swapped with another page in the ring buffer.
+ *
+ * Now, as long as the writer is off the reader page, the reader can do what
+ * ever it wants with that page. The writer will never write to that page
+ * again (as long as it is out of the ring buffer).
+ *
+ * Here's some silly ASCII art.
+ *
+ *   +------+
+ *   |reader|          RING BUFFER
+ *   |page  |
+ *   +------+        +---+   +---+   +---+
+ *                   |   |-->|   |-->|   |
+ *                   +---+   +---+   +---+
+ *                     ^               |
+ *                     |               |
+ *                     +---------------+
+ *
+ *
+ *   +------+
+ *   |reader|          RING BUFFER
+ *   |page  |------------------v
+ *   +------+        +---+   +---+   +---+
+ *                   |   |-->|   |-->|   |
+ *                   +---+   +---+   +---+
+ *                     ^               |
+ *                     |               |
+ *                     +---------------+
+ *
+ *
+ *   +------+
+ *   |reader|          RING BUFFER
+ *   |page  |------------------v
+ *   +------+        +---+   +---+   +---+
+ *      ^            |   |-->|   |-->|   |
+ *      |            +---+   +---+   +---+
+ *      |                              |
+ *      |                              |
+ *      +------------------------------+
+ *
+ *
+ *   +------+
+ *   |buffer|          RING BUFFER
+ *   |page  |------------------v
+ *   +------+        +---+   +---+   +---+
+ *      ^            |   |   |   |-->|   |
+ *      |   New      +---+   +---+   +---+
+ *      |  Reader------^               |
+ *      |   page                       |
+ *      +------------------------------+
+ *
+ *
+ * After we make this swap, the reader can hand this page off to the splice
+ * code and be done with it. It can even allocate a new page if it needs to
+ * and swap that into the ring buffer.
+ *
+ * We will be using cmpxchg soon to make all this lockless.
+ *
+ */
+
+/*
+ * A fast way to enable or disable all ring buffers is to
+ * call tracing_on or tracing_off. Turning off the ring buffers
+ * prevents all ring buffers from being recorded to.
+ * Turning this switch on makes it OK to write to the
+ * ring buffer, if the ring buffer is enabled itself.
+ *
+ * There are three layers that must be on in order to write
+ * to the ring buffer.
+ *
+ * 1) This global flag must be set.
+ * 2) The ring buffer must be enabled for recording.
+ * 3) The per cpu buffer must be enabled for recording.
+ *
+ * In case of an anomaly, this global flag has a bit set that
+ * will permanently disable all ring buffers.
+ */
+
+/*
+ * Global flag to disable all recording to ring buffers
+ *  This has two bits: ON, DISABLED
+ *
+ *  ON   DISABLED
+ * ---- ----------
+ *   0      0        : ring buffers are off
+ *   1      0        : ring buffers are on
+ *   X      1        : ring buffers are permanently disabled
+ */
+
+enum {
+	RB_BUFFERS_ON_BIT	= 0,
+	RB_BUFFERS_DISABLED_BIT	= 1,
+};
+
+enum {
+	RB_BUFFERS_ON		= 1 << RB_BUFFERS_ON_BIT,
+	RB_BUFFERS_DISABLED	= 1 << RB_BUFFERS_DISABLED_BIT,
+};
+
+static unsigned long ftrace_ring_buffer_flags __read_mostly = RB_BUFFERS_ON;
+
+#define BUF_PAGE_HDR_SIZE offsetof(struct buffer_data_page, data)
+
+/**
+ * tracing_on - enable all tracing buffers
+ *
+ * This function enables all tracing buffers that may have been
+ * disabled with tracing_off.
+ */
+void tracing_on(void)
+{
+	set_bit(RB_BUFFERS_ON_BIT, &ftrace_ring_buffer_flags);
+}
+EXPORT_SYMBOL_GPL(tracing_on);
+
+/**
+ * tracing_off - turn off all tracing buffers
+ *
+ * This function stops all tracing buffers from recording data.
+ * It does not disable any overhead the tracers themselves may
+ * be causing. This function simply causes all recording to
+ * the ring buffers to fail.
+ */
+void tracing_off(void)
+{
+	clear_bit(RB_BUFFERS_ON_BIT, &ftrace_ring_buffer_flags);
+}
+EXPORT_SYMBOL_GPL(tracing_off);
+
+/**
+ * tracing_off_permanent - permanently disable ring buffers
+ *
+ * This function, once called, will disable all ring buffers
+ * permanently.
+ */
+void tracing_off_permanent(void)
+{
+	set_bit(RB_BUFFERS_DISABLED_BIT, &ftrace_ring_buffer_flags);
+}
+
+/**
+ * tracing_is_on - show state of ring buffers enabled
+ */
+int tracing_is_on(void)
+{
+	return ftrace_ring_buffer_flags == RB_BUFFERS_ON;
+}
+EXPORT_SYMBOL_GPL(tracing_is_on);
+
+#define RB_EVNT_HDR_SIZE (offsetof(struct ftrace_ring_buffer_event, array))
+#define RB_ALIGNMENT		4U
+#define RB_MAX_SMALL_DATA	(RB_ALIGNMENT * RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
+#define RB_EVNT_MIN_SIZE	8U	/* two 32bit words */
+
+#if !defined(CONFIG_64BIT) || defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+# define RB_FORCE_8BYTE_ALIGNMENT	0
+# define RB_ARCH_ALIGNMENT		RB_ALIGNMENT
+#else
+# define RB_FORCE_8BYTE_ALIGNMENT	1
+# define RB_ARCH_ALIGNMENT		8U
+#endif
+
+/* define RINGBUF_TYPE_DATA for 'case RINGBUF_TYPE_DATA:' */
+#define RINGBUF_TYPE_DATA 0 ... RINGBUF_TYPE_DATA_TYPE_LEN_MAX
+
+enum {
+	RB_LEN_TIME_EXTEND = 8,
+	RB_LEN_TIME_STAMP = 16,
+};
+
+static inline int rb_null_event(struct ftrace_ring_buffer_event *event)
+{
+	return event->type_len == RINGBUF_TYPE_PADDING && !event->time_delta;
+}
+
+static void rb_event_set_padding(struct ftrace_ring_buffer_event *event)
+{
+	/* padding has a NULL time_delta */
+	event->type_len = RINGBUF_TYPE_PADDING;
+	event->time_delta = 0;
+}
+
+static unsigned
+rb_event_data_length(struct ftrace_ring_buffer_event *event)
+{
+	unsigned length;
+
+	if (event->type_len)
+		length = event->type_len * RB_ALIGNMENT;
+	else
+		length = event->array[0];
+	return length + RB_EVNT_HDR_SIZE;
+}
+
+/* inline for ring buffer fast paths */
+static unsigned
+rb_event_length(struct ftrace_ring_buffer_event *event)
+{
+	switch (event->type_len) {
+	case RINGBUF_TYPE_PADDING:
+		if (rb_null_event(event))
+			/* undefined */
+			return -1;
+		return  event->array[0] + RB_EVNT_HDR_SIZE;
+
+	case RINGBUF_TYPE_TIME_EXTEND:
+		return RB_LEN_TIME_EXTEND;
+
+	case RINGBUF_TYPE_TIME_STAMP:
+		return RB_LEN_TIME_STAMP;
+
+	case RINGBUF_TYPE_DATA:
+		return rb_event_data_length(event);
+	default:
+		BUG();
+	}
+	/* not hit */
+	return 0;
+}
+
+/**
+ * ftrace_ring_buffer_event_length - return the length of the event
+ * @event: the event to get the length of
+ */
+unsigned ftrace_ring_buffer_event_length(struct ftrace_ring_buffer_event *event)
+{
+	unsigned length = rb_event_length(event);
+	if (event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
+		return length;
+	length -= RB_EVNT_HDR_SIZE;
+	if (length > RB_MAX_SMALL_DATA + sizeof(event->array[0]))
+                length -= sizeof(event->array[0]);
+	return length;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_event_length);
+
+/* inline for ring buffer fast paths */
+static void *
+rb_event_data(struct ftrace_ring_buffer_event *event)
+{
+	BUG_ON(event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
+	/* If length is in len field, then array[0] has the data */
+	if (event->type_len)
+		return (void *)&event->array[0];
+	/* Otherwise length is in array[0] and array[1] has the data */
+	return (void *)&event->array[1];
+}
+
+/**
+ * ftrace_ring_buffer_event_data - return the data of the event
+ * @event: the event to get the data from
+ */
+void *ftrace_ring_buffer_event_data(struct ftrace_ring_buffer_event *event)
+{
+	return rb_event_data(event);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_event_data);
+
+#define for_each_buffer_cpu(buffer, cpu)		\
+	for_each_cpu(cpu, buffer->cpumask)
+
+#define TS_SHIFT	27
+#define TS_MASK		((1ULL << TS_SHIFT) - 1)
+#define TS_DELTA_TEST	(~TS_MASK)
+
+/* Flag when events were overwritten */
+#define RB_MISSED_EVENTS	(1 << 31)
+/* Missed count stored at end */
+#define RB_MISSED_STORED	(1 << 30)
+
+struct buffer_data_page {
+	u64		 time_stamp;	/* page time stamp */
+	local_t		 commit;	/* write committed index */
+	unsigned char	 data[];	/* data of buffer page */
+};
+
+/*
+ * Note, the buffer_page list must be first. The buffer pages
+ * are allocated in cache lines, which means that each buffer
+ * page will be at the beginning of a cache line, and thus
+ * the least significant bits will be zero. We use this to
+ * add flags in the list struct pointers, to make the ring buffer
+ * lockless.
+ */
+struct buffer_page {
+	struct list_head list;		/* list of buffer pages */
+	local_t		 write;		/* index for next write */
+	unsigned	 read;		/* index for next read */
+	local_t		 entries;	/* entries on this page */
+	unsigned long	 real_end;	/* real end of data */
+	struct buffer_data_page *page;	/* Actual data page */
+};
+
+/*
+ * The buffer page counters, write and entries, must be reset
+ * atomically when crossing page boundaries. To synchronize this
+ * update, two counters are inserted into the number. One is
+ * the actual counter for the write position or count on the page.
+ *
+ * The other is a counter of updaters. Before an update happens
+ * the update partition of the counter is incremented. This will
+ * allow the updater to update the counter atomically.
+ *
+ * The counter is 20 bits, and the state data is 12.
+ */
+#define RB_WRITE_MASK		0xfffff
+#define RB_WRITE_INTCNT		(1 << 20)
+
+static void rb_init_page(struct buffer_data_page *bpage)
+{
+	local_set(&bpage->commit, 0);
+}
+
+/**
+ * ftrace_ring_buffer_page_len - the size of data on the page.
+ * @page: The page to read
+ *
+ * Returns the amount of data on the page, including buffer page header.
+ */
+size_t ftrace_ring_buffer_page_len(void *page)
+{
+	return local_read(&((struct buffer_data_page *)page)->commit)
+		+ BUF_PAGE_HDR_SIZE;
+}
+
+/*
+ * Also stolen from mm/slob.c. Thanks to Mathieu Desnoyers for pointing
+ * this issue out.
+ */
+static void free_buffer_page(struct buffer_page *bpage)
+{
+	free_page((unsigned long)bpage->page);
+	kfree(bpage);
+}
+
+/*
+ * We need to fit the time_stamp delta into 27 bits.
+ */
+static inline int test_time_stamp(u64 delta)
+{
+	if (delta & TS_DELTA_TEST)
+		return 1;
+	return 0;
+}
+
+#define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE)
+
+/* Max payload is BUF_PAGE_SIZE - header (8bytes) */
+#define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
+
+/* Max number of timestamps that can fit on a page */
+#define RB_TIMESTAMPS_PER_PAGE	(BUF_PAGE_SIZE / RB_LEN_TIME_STAMP)
+
+int ftrace_ring_buffer_print_page_header(struct trace_seq *s)
+{
+	struct buffer_data_page field;
+	int ret;
+
+	ret = trace_seq_printf(s, "\tfield: u64 timestamp;\t"
+			       "offset:0;\tsize:%u;\tsigned:%u;\n",
+			       (unsigned int)sizeof(field.time_stamp),
+			       (unsigned int)is_signed_type(u64));
+
+	ret = trace_seq_printf(s, "\tfield: local_t commit;\t"
+			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
+			       (unsigned int)offsetof(typeof(field), commit),
+			       (unsigned int)sizeof(field.commit),
+			       (unsigned int)is_signed_type(long));
+
+	ret = trace_seq_printf(s, "\tfield: int overwrite;\t"
+			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
+			       (unsigned int)offsetof(typeof(field), commit),
+			       1,
+			       (unsigned int)is_signed_type(long));
+
+	ret = trace_seq_printf(s, "\tfield: char data;\t"
+			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
+			       (unsigned int)offsetof(typeof(field), data),
+			       (unsigned int)BUF_PAGE_SIZE,
+			       (unsigned int)is_signed_type(char));
+
+	return ret;
+}
+
+/*
+ * head_page == tail_page && head == tail then buffer is empty.
+ */
+struct ftrace_ring_buffer_per_cpu {
+	int				cpu;
+	struct ftrace_ring_buffer		*buffer;
+	spinlock_t			reader_lock;	/* serialize readers */
+	arch_spinlock_t			lock;
+	struct lock_class_key		lock_key;
+	struct list_head		*pages;
+	struct buffer_page		*head_page;	/* read from head */
+	struct buffer_page		*tail_page;	/* write to tail */
+	struct buffer_page		*commit_page;	/* committed pages */
+	struct buffer_page		*reader_page;
+	unsigned long			lost_events;
+	unsigned long			last_overrun;
+	local_t				commit_overrun;
+	local_t				overrun;
+	local_t				entries;
+	local_t				committing;
+	local_t				commits;
+	unsigned long			read;
+	u64				write_stamp;
+	u64				read_stamp;
+	atomic_t			record_disabled;
+};
+
+struct ftrace_ring_buffer {
+	unsigned			pages;
+	unsigned			flags;
+	int				cpus;
+	atomic_t			record_disabled;
+	cpumask_var_t			cpumask;
+
+	struct lock_class_key		*reader_lock_key;
+
+	struct mutex			mutex;
+
+	struct ftrace_ring_buffer_per_cpu	**buffers;
+
+#ifdef CONFIG_HOTPLUG_CPU
+	struct notifier_block		cpu_notify;
+#endif
+	u64				(*clock)(void);
+};
+
+struct ftrace_ring_buffer_iter {
+	struct ftrace_ring_buffer_per_cpu	*cpu_buffer;
+	unsigned long			head;
+	struct buffer_page		*head_page;
+	struct buffer_page		*cache_reader_page;
+	unsigned long			cache_read;
+	u64				read_stamp;
+};
+
+/* buffer may be either ftrace_ring_buffer or ftrace_ring_buffer_per_cpu */
+#define RB_WARN_ON(b, cond)						\
+	({								\
+		int _____ret = unlikely(cond);				\
+		if (_____ret) {						\
+			if (__same_type(*(b), struct ftrace_ring_buffer_per_cpu)) { \
+				struct ftrace_ring_buffer_per_cpu *__b =	\
+					(void *)b;			\
+				atomic_inc(&__b->buffer->record_disabled); \
+			} else						\
+				atomic_inc(&b->record_disabled);	\
+			WARN_ON(1);					\
+		}							\
+		_____ret;						\
+	})
+
+/* Up this if you want to test the TIME_EXTENTS and normalization */
+#define DEBUG_SHIFT 0
+
+static inline u64 rb_time_stamp(struct ftrace_ring_buffer *buffer)
+{
+	/* shift to debug/test normalization and TIME_EXTENTS */
+	return buffer->clock() << DEBUG_SHIFT;
+}
+
+u64 ftrace_ring_buffer_time_stamp(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	u64 time;
+
+	preempt_disable_notrace();
+	time = rb_time_stamp(buffer);
+	preempt_enable_no_resched_notrace();
+
+	return time;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_time_stamp);
+
+void ftrace_ring_buffer_normalize_time_stamp(struct ftrace_ring_buffer *buffer,
+				      int cpu, u64 *ts)
+{
+	/* Just stupid testing the normalize function and deltas */
+	*ts >>= DEBUG_SHIFT;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_normalize_time_stamp);
+
+/*
+ * Making the ring buffer lockless makes things tricky.
+ * Writes only happen on the CPU that they are on, and they
+ * only need to worry about interrupts. Reads, however, can
+ * happen on any CPU.
+ *
+ * The reader page is always off the ring buffer, but when the
+ * reader finishes with a page, it needs to swap its page with
+ * a new one from the buffer. The reader needs to take from
+ * the head (writes go to the tail). But if a writer is in overwrite
+ * mode and wraps, it must push the head page forward.
+ *
+ * Here lies the problem.
+ *
+ * The reader must be careful to replace only the head page, and
+ * not another one. As described at the top of the file in the
+ * ASCII art, the reader sets its old page to point to the next
+ * page after head. It then sets the page after head to point to
+ * the old reader page. But if the writer moves the head page
+ * during this operation, the reader could end up with the tail.
+ *
+ * We use cmpxchg to help prevent this race. We also do something
+ * special with the page before head. We set the LSB to 1.
+ *
+ * When the writer must push the page forward, it will clear the
+ * bit that points to the head page, move the head, and then set
+ * the bit that points to the new head page.
+ *
+ * We also don't want an interrupt coming in and moving the head
+ * page on another writer. Thus we use the second LSB to catch
+ * that too. Thus:
+ *
+ * head->list->prev->next        bit 1          bit 0
+ *                              -------        -------
+ * Normal page                     0              0
+ * Points to head page             0              1
+ * New head page                   1              0
+ *
+ * Note we can not trust the prev pointer of the head page, because:
+ *
+ * +----+       +-----+        +-----+
+ * |    |------>|  T  |---X--->|  N  |
+ * |    |<------|     |        |     |
+ * +----+       +-----+        +-----+
+ *   ^                           ^ |
+ *   |          +-----+          | |
+ *   +----------|  R  |----------+ |
+ *              |     |<-----------+
+ *              +-----+
+ *
+ * Key:  ---X-->  HEAD flag set in pointer
+ *         T      Tail page
+ *         R      Reader page
+ *         N      Next page
+ *
+ * (see __rb_reserve_next() to see where this happens)
+ *
+ *  What the above shows is that the reader just swapped out
+ *  the reader page with a page in the buffer, but before it
+ *  could make the new header point back to the new page added
+ *  it was preempted by a writer. The writer moved forward onto
+ *  the new page added by the reader and is about to move forward
+ *  again.
+ *
+ *  You can see, it is legitimate for the previous pointer of
+ *  the head (or any page) not to point back to itself. But only
+ *  temporarily.
+ */
+
+#define RB_PAGE_NORMAL		0UL
+#define RB_PAGE_HEAD		1UL
+#define RB_PAGE_UPDATE		2UL
+
+
+#define RB_FLAG_MASK		3UL
+
+/* PAGE_MOVED is not part of the mask */
+#define RB_PAGE_MOVED		4UL
+
+/*
+ * rb_list_head - remove any bit
+ */
+static struct list_head *rb_list_head(struct list_head *list)
+{
+	unsigned long val = (unsigned long)list;
+
+	return (struct list_head *)(val & ~RB_FLAG_MASK);
+}
+
+/*
+ * rb_is_head_page - test if the given page is the head page
+ *
+ * Because the reader may move the head_page pointer, we can
+ * not trust what the head page is (it may be pointing to
+ * the reader page). But if the next page is a header page,
+ * its flags will be non zero.
+ */
+static inline int
+rb_is_head_page(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		struct buffer_page *page, struct list_head *list)
+{
+	unsigned long val;
+
+	val = (unsigned long)list->next;
+
+	if ((val & ~RB_FLAG_MASK) != (unsigned long)&page->list)
+		return RB_PAGE_MOVED;
+
+	return val & RB_FLAG_MASK;
+}
+
+/*
+ * rb_is_reader_page
+ *
+ * The unique thing about the reader page, is that, if the
+ * writer is ever on it, the previous pointer never points
+ * back to the reader page.
+ */
+static int rb_is_reader_page(struct buffer_page *page)
+{
+	struct list_head *list = page->list.prev;
+
+	return rb_list_head(list->next) != &page->list;
+}
+
+/*
+ * rb_set_list_to_head - set a list_head to be pointing to head.
+ */
+static void rb_set_list_to_head(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+				struct list_head *list)
+{
+	unsigned long *ptr;
+
+	ptr = (unsigned long *)&list->next;
+	*ptr |= RB_PAGE_HEAD;
+	*ptr &= ~RB_PAGE_UPDATE;
+}
+
+/*
+ * rb_head_page_activate - sets up head page
+ */
+static void rb_head_page_activate(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct buffer_page *head;
+
+	head = cpu_buffer->head_page;
+	if (!head)
+		return;
+
+	/*
+	 * Set the previous list pointer to have the HEAD flag.
+	 */
+	rb_set_list_to_head(cpu_buffer, head->list.prev);
+}
+
+static void rb_list_head_clear(struct list_head *list)
+{
+	unsigned long *ptr = (unsigned long *)&list->next;
+
+	*ptr &= ~RB_FLAG_MASK;
+}
+
+/*
+ * rb_head_page_deactivate - clears head page ptr (for free list)
+ */
+static void
+rb_head_page_deactivate(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct list_head *hd;
+
+	/* Go through the whole list and clear any pointers found. */
+	rb_list_head_clear(cpu_buffer->pages);
+
+	list_for_each(hd, cpu_buffer->pages)
+		rb_list_head_clear(hd);
+}
+
+static int rb_head_page_set(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+			    struct buffer_page *head,
+			    struct buffer_page *prev,
+			    int old_flag, int new_flag)
+{
+	struct list_head *list;
+	unsigned long val = (unsigned long)&head->list;
+	unsigned long ret;
+
+	list = &prev->list;
+
+	val &= ~RB_FLAG_MASK;
+
+	ret = cmpxchg((unsigned long *)&list->next,
+		      val | old_flag, val | new_flag);
+
+	/* check if the reader took the page */
+	if ((ret & ~RB_FLAG_MASK) != val)
+		return RB_PAGE_MOVED;
+
+	return ret & RB_FLAG_MASK;
+}
+
+static int rb_head_page_set_update(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+				   struct buffer_page *head,
+				   struct buffer_page *prev,
+				   int old_flag)
+{
+	return rb_head_page_set(cpu_buffer, head, prev,
+				old_flag, RB_PAGE_UPDATE);
+}
+
+static int rb_head_page_set_head(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+				 struct buffer_page *head,
+				 struct buffer_page *prev,
+				 int old_flag)
+{
+	return rb_head_page_set(cpu_buffer, head, prev,
+				old_flag, RB_PAGE_HEAD);
+}
+
+static int rb_head_page_set_normal(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+				   struct buffer_page *head,
+				   struct buffer_page *prev,
+				   int old_flag)
+{
+	return rb_head_page_set(cpu_buffer, head, prev,
+				old_flag, RB_PAGE_NORMAL);
+}
+
+static inline void rb_inc_page(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+			       struct buffer_page **bpage)
+{
+	struct list_head *p = rb_list_head((*bpage)->list.next);
+
+	*bpage = list_entry(p, struct buffer_page, list);
+}
+
+static struct buffer_page *
+rb_set_head_page(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct buffer_page *head;
+	struct buffer_page *page;
+	struct list_head *list;
+	int i;
+
+	if (RB_WARN_ON(cpu_buffer, !cpu_buffer->head_page))
+		return NULL;
+
+	/* sanity check */
+	list = cpu_buffer->pages;
+	if (RB_WARN_ON(cpu_buffer, rb_list_head(list->prev->next) != list))
+		return NULL;
+
+	page = head = cpu_buffer->head_page;
+	/*
+	 * It is possible that the writer moves the header behind
+	 * where we started, and we miss in one loop.
+	 * A second loop should grab the header, but we'll do
+	 * three loops just because I'm paranoid.
+	 */
+	for (i = 0; i < 3; i++) {
+		do {
+			if (rb_is_head_page(cpu_buffer, page, page->list.prev)) {
+				cpu_buffer->head_page = page;
+				return page;
+			}
+			rb_inc_page(cpu_buffer, &page);
+		} while (page != head);
+	}
+
+	RB_WARN_ON(cpu_buffer, 1);
+
+	return NULL;
+}
+
+static int rb_head_page_replace(struct buffer_page *old,
+				struct buffer_page *new)
+{
+	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
+	unsigned long val;
+	unsigned long ret;
+
+	val = *ptr & ~RB_FLAG_MASK;
+	val |= RB_PAGE_HEAD;
+
+	ret = cmpxchg(ptr, val, (unsigned long)&new->list);
+
+	return ret == val;
+}
+
+/*
+ * rb_tail_page_update - move the tail page forward
+ *
+ * Returns 1 if moved tail page, 0 if someone else did.
+ */
+static int rb_tail_page_update(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+			       struct buffer_page *tail_page,
+			       struct buffer_page *next_page)
+{
+	struct buffer_page *old_tail;
+	unsigned long old_entries;
+	unsigned long old_write;
+	int ret = 0;
+
+	/*
+	 * The tail page now needs to be moved forward.
+	 *
+	 * We need to reset the tail page, but without messing
+	 * with possible erasing of data brought in by interrupts
+	 * that have moved the tail page and are currently on it.
+	 *
+	 * We add a counter to the write field to denote this.
+	 */
+	old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
+	old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
+
+	/*
+	 * Just make sure we have seen our old_write and synchronize
+	 * with any interrupts that come in.
+	 */
+	barrier();
+
+	/*
+	 * If the tail page is still the same as what we think
+	 * it is, then it is up to us to update the tail
+	 * pointer.
+	 */
+	if (tail_page == cpu_buffer->tail_page) {
+		/* Zero the write counter */
+		unsigned long val = old_write & ~RB_WRITE_MASK;
+		unsigned long eval = old_entries & ~RB_WRITE_MASK;
+
+		/*
+		 * This will only succeed if an interrupt did
+		 * not come in and change it. In which case, we
+		 * do not want to modify it.
+		 *
+		 * We add (void) to let the compiler know that we do not care
+		 * about the return value of these functions. We use the
+		 * cmpxchg to only update if an interrupt did not already
+		 * do it for us. If the cmpxchg fails, we don't care.
+		 */
+		(void)local_cmpxchg(&next_page->write, old_write, val);
+		(void)local_cmpxchg(&next_page->entries, old_entries, eval);
+
+		/*
+		 * No need to worry about races with clearing out the commit.
+		 * it only can increment when a commit takes place. But that
+		 * only happens in the outer most nested commit.
+		 */
+		local_set(&next_page->page->commit, 0);
+
+		old_tail = cmpxchg(&cpu_buffer->tail_page,
+				   tail_page, next_page);
+
+		if (old_tail == tail_page)
+			ret = 1;
+	}
+
+	return ret;
+}
+
+static int rb_check_bpage(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+			  struct buffer_page *bpage)
+{
+	unsigned long val = (unsigned long)bpage;
+
+	if (RB_WARN_ON(cpu_buffer, val & RB_FLAG_MASK))
+		return 1;
+
+	return 0;
+}
+
+/**
+ * rb_check_list - make sure a pointer to a list has the last bits zero
+ */
+static int rb_check_list(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+			 struct list_head *list)
+{
+	if (RB_WARN_ON(cpu_buffer, rb_list_head(list->prev) != list->prev))
+		return 1;
+	if (RB_WARN_ON(cpu_buffer, rb_list_head(list->next) != list->next))
+		return 1;
+	return 0;
+}
+
+/**
+ * check_pages - integrity check of buffer pages
+ * @cpu_buffer: CPU buffer with pages to test
+ *
+ * As a safety measure we check to make sure the data pages have not
+ * been corrupted.
+ */
+static int rb_check_pages(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct list_head *head = cpu_buffer->pages;
+	struct buffer_page *bpage, *tmp;
+
+	rb_head_page_deactivate(cpu_buffer);
+
+	if (RB_WARN_ON(cpu_buffer, head->next->prev != head))
+		return -1;
+	if (RB_WARN_ON(cpu_buffer, head->prev->next != head))
+		return -1;
+
+	if (rb_check_list(cpu_buffer, head))
+		return -1;
+
+	list_for_each_entry_safe(bpage, tmp, head, list) {
+		if (RB_WARN_ON(cpu_buffer,
+			       bpage->list.next->prev != &bpage->list))
+			return -1;
+		if (RB_WARN_ON(cpu_buffer,
+			       bpage->list.prev->next != &bpage->list))
+			return -1;
+		if (rb_check_list(cpu_buffer, &bpage->list))
+			return -1;
+	}
+
+	rb_head_page_activate(cpu_buffer);
+
+	return 0;
+}
+
+static int rb_allocate_pages(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	struct buffer_page *bpage, *tmp;
+	unsigned long addr;
+	LIST_HEAD(pages);
+	unsigned i;
+
+	WARN_ON(!nr_pages);
+
+	for (i = 0; i < nr_pages; i++) {
+		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
+				    GFP_KERNEL, cpu_to_node(cpu_buffer->cpu));
+		if (!bpage)
+			goto free_pages;
+
+		rb_check_bpage(cpu_buffer, bpage);
+
+		list_add(&bpage->list, &pages);
+
+		addr = __get_free_page(GFP_KERNEL);
+		if (!addr)
+			goto free_pages;
+		bpage->page = (void *)addr;
+		rb_init_page(bpage->page);
+	}
+
+	/*
+	 * The ring buffer page list is a circular list that does not
+	 * start and end with a list head. All page list items point to
+	 * other pages.
+	 */
+	cpu_buffer->pages = pages.next;
+	list_del(&pages);
+
+	rb_check_pages(cpu_buffer);
+
+	return 0;
+
+ free_pages:
+	list_for_each_entry_safe(bpage, tmp, &pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+	return -ENOMEM;
+}
+
+static struct ftrace_ring_buffer_per_cpu *
+rb_allocate_cpu_buffer(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	struct buffer_page *bpage;
+	unsigned long addr;
+	int ret;
+
+	cpu_buffer = kzalloc_node(ALIGN(sizeof(*cpu_buffer), cache_line_size()),
+				  GFP_KERNEL, cpu_to_node(cpu));
+	if (!cpu_buffer)
+		return NULL;
+
+	cpu_buffer->cpu = cpu;
+	cpu_buffer->buffer = buffer;
+	spin_lock_init(&cpu_buffer->reader_lock);
+	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
+	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
+	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
+			    GFP_KERNEL, cpu_to_node(cpu));
+	if (!bpage)
+		goto fail_free_buffer;
+
+	rb_check_bpage(cpu_buffer, bpage);
+
+	cpu_buffer->reader_page = bpage;
+	addr = __get_free_page(GFP_KERNEL);
+	if (!addr)
+		goto fail_free_reader;
+	bpage->page = (void *)addr;
+	rb_init_page(bpage->page);
+
+	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+
+	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	if (ret < 0)
+		goto fail_free_reader;
+
+	cpu_buffer->head_page
+		= list_entry(cpu_buffer->pages, struct buffer_page, list);
+	cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page;
+
+	rb_head_page_activate(cpu_buffer);
+
+	return cpu_buffer;
+
+ fail_free_reader:
+	free_buffer_page(cpu_buffer->reader_page);
+
+ fail_free_buffer:
+	kfree(cpu_buffer);
+	return NULL;
+}
+
+static void rb_free_cpu_buffer(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct list_head *head = cpu_buffer->pages;
+	struct buffer_page *bpage, *tmp;
+
+	free_buffer_page(cpu_buffer->reader_page);
+
+	rb_head_page_deactivate(cpu_buffer);
+
+	if (head) {
+		list_for_each_entry_safe(bpage, tmp, head, list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+		bpage = list_entry(head, struct buffer_page, list);
+		free_buffer_page(bpage);
+	}
+
+	kfree(cpu_buffer);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int rb_cpu_notify(struct notifier_block *self,
+			 unsigned long action, void *hcpu);
+#endif
+
+/**
+ * ftrace_ring_buffer_alloc - allocate a new ftrace_ring_buffer
+ * @size: the size in bytes per cpu that is needed.
+ * @flags: attributes to set for the ring buffer.
+ *
+ * Currently the only flag that is available is the RB_FL_OVERWRITE
+ * flag. This flag means that the buffer will overwrite old data
+ * when the buffer wraps. If this flag is not set, the buffer will
+ * drop data when the tail hits the head.
+ */
+struct ftrace_ring_buffer *__ftrace_ring_buffer_alloc(unsigned long size, unsigned flags,
+					struct lock_class_key *key)
+{
+	struct ftrace_ring_buffer *buffer;
+	int bsize;
+	int cpu;
+
+	/* keep it in its own cache line */
+	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
+			 GFP_KERNEL);
+	if (!buffer)
+		return NULL;
+
+	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
+		goto fail_free_buffer;
+
+	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	buffer->flags = flags;
+	buffer->clock = trace_clock_local;
+	buffer->reader_lock_key = key;
+
+	/* need at least two pages */
+	if (buffer->pages < 2)
+		buffer->pages = 2;
+
+	/*
+	 * In case of non-hotplug cpu, if the ring-buffer is allocated
+	 * in early initcall, it will not be notified of secondary cpus.
+	 * In that case, we need to allocate for all possible cpus.
+	 */
+#ifdef CONFIG_HOTPLUG_CPU
+	get_online_cpus();
+	cpumask_copy(buffer->cpumask, cpu_online_mask);
+#else
+	cpumask_copy(buffer->cpumask, cpu_possible_mask);
+#endif
+	buffer->cpus = nr_cpu_ids;
+
+	bsize = sizeof(void *) * nr_cpu_ids;
+	buffer->buffers = kzalloc(ALIGN(bsize, cache_line_size()),
+				  GFP_KERNEL);
+	if (!buffer->buffers)
+		goto fail_free_cpumask;
+
+	for_each_buffer_cpu(buffer, cpu) {
+		buffer->buffers[cpu] =
+			rb_allocate_cpu_buffer(buffer, cpu);
+		if (!buffer->buffers[cpu])
+			goto fail_free_buffers;
+	}
+
+#ifdef CONFIG_HOTPLUG_CPU
+	buffer->cpu_notify.notifier_call = rb_cpu_notify;
+	buffer->cpu_notify.priority = 0;
+	register_cpu_notifier(&buffer->cpu_notify);
+#endif
+
+	put_online_cpus();
+	mutex_init(&buffer->mutex);
+
+	return buffer;
+
+ fail_free_buffers:
+	for_each_buffer_cpu(buffer, cpu) {
+		if (buffer->buffers[cpu])
+			rb_free_cpu_buffer(buffer->buffers[cpu]);
+	}
+	kfree(buffer->buffers);
+
+ fail_free_cpumask:
+	free_cpumask_var(buffer->cpumask);
+	put_online_cpus();
+
+ fail_free_buffer:
+	kfree(buffer);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(__ftrace_ring_buffer_alloc);
+
+/**
+ * ftrace_ring_buffer_free - free a ring buffer.
+ * @buffer: the buffer to free.
+ */
+void
+ftrace_ring_buffer_free(struct ftrace_ring_buffer *buffer)
+{
+	int cpu;
+
+	get_online_cpus();
+
+#ifdef CONFIG_HOTPLUG_CPU
+	unregister_cpu_notifier(&buffer->cpu_notify);
+#endif
+
+	for_each_buffer_cpu(buffer, cpu)
+		rb_free_cpu_buffer(buffer->buffers[cpu]);
+
+	put_online_cpus();
+
+	kfree(buffer->buffers);
+	free_cpumask_var(buffer->cpumask);
+
+	kfree(buffer);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_free);
+
+void ftrace_ring_buffer_set_clock(struct ftrace_ring_buffer *buffer,
+			   u64 (*clock)(void))
+{
+	buffer->clock = clock;
+}
+
+static void rb_reset_cpu(struct ftrace_ring_buffer_per_cpu *cpu_buffer);
+
+static void
+rb_remove_pages(struct ftrace_ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+{
+	struct buffer_page *bpage;
+	struct list_head *p;
+	unsigned i;
+
+	spin_lock_irq(&cpu_buffer->reader_lock);
+	rb_head_page_deactivate(cpu_buffer);
+
+	for (i = 0; i < nr_pages; i++) {
+		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
+			goto out;
+		p = cpu_buffer->pages->next;
+		bpage = list_entry(p, struct buffer_page, list);
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
+		goto out;
+
+	rb_reset_cpu(cpu_buffer);
+	rb_check_pages(cpu_buffer);
+
+out:
+	spin_unlock_irq(&cpu_buffer->reader_lock);
+}
+
+static void
+rb_insert_pages(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		struct list_head *pages, unsigned nr_pages)
+{
+	struct buffer_page *bpage;
+	struct list_head *p;
+	unsigned i;
+
+	spin_lock_irq(&cpu_buffer->reader_lock);
+	rb_head_page_deactivate(cpu_buffer);
+
+	for (i = 0; i < nr_pages; i++) {
+		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
+			goto out;
+		p = pages->next;
+		bpage = list_entry(p, struct buffer_page, list);
+		list_del_init(&bpage->list);
+		list_add_tail(&bpage->list, cpu_buffer->pages);
+	}
+	rb_reset_cpu(cpu_buffer);
+	rb_check_pages(cpu_buffer);
+
+out:
+	spin_unlock_irq(&cpu_buffer->reader_lock);
+}
+
+/**
+ * ftrace_ring_buffer_resize - resize the ring buffer
+ * @buffer: the buffer to resize.
+ * @size: the new size.
+ *
+ * Minimum size is 2 * BUF_PAGE_SIZE.
+ *
+ * Returns -1 on failure.
+ */
+int ftrace_ring_buffer_resize(struct ftrace_ring_buffer *buffer, unsigned long size)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned nr_pages, rm_pages, new_pages;
+	struct buffer_page *bpage, *tmp;
+	unsigned long buffer_size;
+	unsigned long addr;
+	LIST_HEAD(pages);
+	int i, cpu;
+
+	/*
+	 * Always succeed at resizing a non-existent buffer:
+	 */
+	if (!buffer)
+		return size;
+
+	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	size *= BUF_PAGE_SIZE;
+	buffer_size = buffer->pages * BUF_PAGE_SIZE;
+
+	/* we need a minimum of two pages */
+	if (size < BUF_PAGE_SIZE * 2)
+		size = BUF_PAGE_SIZE * 2;
+
+	if (size == buffer_size)
+		return size;
+
+	atomic_inc(&buffer->record_disabled);
+
+	/* Make sure all writers are done with this buffer. */
+	synchronize_sched();
+
+	mutex_lock(&buffer->mutex);
+	get_online_cpus();
+
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+
+	if (size < buffer_size) {
+
+		/* easy case, just free pages */
+		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
+			goto out_fail;
+
+		rm_pages = buffer->pages - nr_pages;
+
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			rb_remove_pages(cpu_buffer, rm_pages);
+		}
+		goto out;
+	}
+
+	/*
+	 * This is a bit more difficult. We only want to add pages
+	 * when we can allocate enough for all CPUs. We do this
+	 * by allocating all the pages and storing them on a local
+	 * linked list. If we succeed in our allocation, then we
+	 * add these pages to the cpu_buffers. Otherwise we just free
+	 * them all and return -ENOMEM.
+	 */
+	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
+		goto out_fail;
+
+	new_pages = nr_pages - buffer->pages;
+
+	for_each_buffer_cpu(buffer, cpu) {
+		for (i = 0; i < new_pages; i++) {
+			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
+						  cache_line_size()),
+					    GFP_KERNEL, cpu_to_node(cpu));
+			if (!bpage)
+				goto free_pages;
+			list_add(&bpage->list, &pages);
+			addr = __get_free_page(GFP_KERNEL);
+			if (!addr)
+				goto free_pages;
+			bpage->page = (void *)addr;
+			rb_init_page(bpage->page);
+		}
+	}
+
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+		rb_insert_pages(cpu_buffer, &pages, new_pages);
+	}
+
+	if (RB_WARN_ON(buffer, !list_empty(&pages)))
+		goto out_fail;
+
+ out:
+	buffer->pages = nr_pages;
+	put_online_cpus();
+	mutex_unlock(&buffer->mutex);
+
+	atomic_dec(&buffer->record_disabled);
+
+	return size;
+
+ free_pages:
+	list_for_each_entry_safe(bpage, tmp, &pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+	put_online_cpus();
+	mutex_unlock(&buffer->mutex);
+	atomic_dec(&buffer->record_disabled);
+	return -ENOMEM;
+
+	/*
+	 * Something went totally wrong, and we are too paranoid
+	 * to even clean up the mess.
+	 */
+ out_fail:
+	put_online_cpus();
+	mutex_unlock(&buffer->mutex);
+	atomic_dec(&buffer->record_disabled);
+	return -1;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_resize);
+
+static inline void *
+__rb_data_page_index(struct buffer_data_page *bpage, unsigned index)
+{
+	return bpage->data + index;
+}
+
+static inline void *__rb_page_index(struct buffer_page *bpage, unsigned index)
+{
+	return bpage->page->data + index;
+}
+
+static inline struct ftrace_ring_buffer_event *
+rb_reader_event(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	return __rb_page_index(cpu_buffer->reader_page,
+			       cpu_buffer->reader_page->read);
+}
+
+static inline struct ftrace_ring_buffer_event *
+rb_iter_head_event(struct ftrace_ring_buffer_iter *iter)
+{
+	return __rb_page_index(iter->head_page, iter->head);
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
+static inline unsigned rb_page_commit(struct buffer_page *bpage)
+{
+	return local_read(&bpage->page->commit);
+}
+
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+/* Size is determined by what has been committed */
+static inline unsigned rb_page_size(struct buffer_page *bpage)
+{
+	return rb_page_commit(bpage);
+}
+
+static inline unsigned
+rb_commit_index(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	return rb_page_commit(cpu_buffer->commit_page);
+}
+
+static inline unsigned
+rb_event_index(struct ftrace_ring_buffer_event *event)
+{
+	unsigned long addr = (unsigned long)event;
+
+	return (addr & ~PAGE_MASK) - BUF_PAGE_HDR_SIZE;
+}
+
+static inline int
+rb_event_is_commit(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		   struct ftrace_ring_buffer_event *event)
+{
+	unsigned long addr = (unsigned long)event;
+	unsigned long index;
+
+	index = rb_event_index(event);
+	addr &= PAGE_MASK;
+
+	return cpu_buffer->commit_page->page == (void *)addr &&
+		rb_commit_index(cpu_buffer) == index;
+}
+
+static void
+rb_set_commit_to_write(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	unsigned long max_count;
+
+	/*
+	 * We only race with interrupts and NMIs on this CPU.
+	 * If we own the commit event, then we can commit
+	 * all others that interrupted us, since the interruptions
+	 * are in stack format (they finish before they come
+	 * back to us). This allows us to do a simple loop to
+	 * assign the commit to the tail.
+	 */
+ again:
+	max_count = cpu_buffer->buffer->pages * 100;
+
+	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
+		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
+			return;
+		if (RB_WARN_ON(cpu_buffer,
+			       rb_is_reader_page(cpu_buffer->tail_page)))
+			return;
+		local_set(&cpu_buffer->commit_page->page->commit,
+			  rb_page_write(cpu_buffer->commit_page));
+		rb_inc_page(cpu_buffer, &cpu_buffer->commit_page);
+		cpu_buffer->write_stamp =
+			cpu_buffer->commit_page->page->time_stamp;
+		/* add barrier to keep gcc from optimizing too much */
+		barrier();
+	}
+	while (rb_commit_index(cpu_buffer) !=
+	       rb_page_write(cpu_buffer->commit_page)) {
+
+		local_set(&cpu_buffer->commit_page->page->commit,
+			  rb_page_write(cpu_buffer->commit_page));
+		RB_WARN_ON(cpu_buffer,
+			   local_read(&cpu_buffer->commit_page->page->commit) &
+			   ~RB_WRITE_MASK);
+		barrier();
+	}
+
+	/* again, keep gcc from optimizing */
+	barrier();
+
+	/*
+	 * If an interrupt came in just after the first while loop
+	 * and pushed the tail page forward, we will be left with
+	 * a dangling commit that will never go forward.
+	 */
+	if (unlikely(cpu_buffer->commit_page != cpu_buffer->tail_page))
+		goto again;
+}
+
+static void rb_reset_reader_page(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	cpu_buffer->read_stamp = cpu_buffer->reader_page->page->time_stamp;
+	cpu_buffer->reader_page->read = 0;
+}
+
+static void rb_inc_iter(struct ftrace_ring_buffer_iter *iter)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+
+	/*
+	 * The iterator could be on the reader page (it starts there).
+	 * But the head could have moved, since the reader was
+	 * found. Check for this case and assign the iterator
+	 * to the head page instead of next.
+	 */
+	if (iter->head_page == cpu_buffer->reader_page)
+		iter->head_page = rb_set_head_page(cpu_buffer);
+	else
+		rb_inc_page(cpu_buffer, &iter->head_page);
+
+	iter->read_stamp = iter->head_page->page->time_stamp;
+	iter->head = 0;
+}
+
+/**
+ * rb_update_event - update event type and data
+ * @event: the event to update
+ * @type: the type of event
+ * @length: the size of the event field in the ring buffer
+ *
+ * Update the type and data fields of the event. The length
+ * is the actual size that is written to the ring buffer,
+ * and with this, we can determine what to place into the
+ * data field.
+ */
+static void
+rb_update_event(struct ftrace_ring_buffer_event *event,
+			 unsigned type, unsigned length)
+{
+	event->type_len = type;
+
+	switch (type) {
+
+	case RINGBUF_TYPE_PADDING:
+	case RINGBUF_TYPE_TIME_EXTEND:
+	case RINGBUF_TYPE_TIME_STAMP:
+		break;
+
+	case 0:
+		length -= RB_EVNT_HDR_SIZE;
+		if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT)
+			event->array[0] = length;
+		else
+			event->type_len = DIV_ROUND_UP(length, RB_ALIGNMENT);
+		break;
+	default:
+		BUG();
+	}
+}
+
+/*
+ * rb_handle_head_page - writer hit the head page
+ *
+ * Returns: +1 to retry page
+ *           0 to continue
+ *          -1 on error
+ */
+static int
+rb_handle_head_page(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		    struct buffer_page *tail_page,
+		    struct buffer_page *next_page)
+{
+	struct buffer_page *new_head;
+	int entries;
+	int type;
+	int ret;
+
+	entries = rb_page_entries(next_page);
+
+	/*
+	 * The hard part is here. We need to move the head
+	 * forward, and protect against both readers on
+	 * other CPUs and writers coming in via interrupts.
+	 */
+	type = rb_head_page_set_update(cpu_buffer, next_page, tail_page,
+				       RB_PAGE_HEAD);
+
+	/*
+	 * type can be one of four:
+	 *  NORMAL - an interrupt already moved it for us
+	 *  HEAD   - we are the first to get here.
+	 *  UPDATE - we are the interrupt interrupting
+	 *           a current move.
+	 *  MOVED  - a reader on another CPU moved the next
+	 *           pointer to its reader page. Give up
+	 *           and try again.
+	 */
+
+	switch (type) {
+	case RB_PAGE_HEAD:
+		/*
+		 * We changed the head to UPDATE, thus
+		 * it is our responsibility to update
+		 * the counters.
+		 */
+		local_add(entries, &cpu_buffer->overrun);
+
+		/*
+		 * The entries will be zeroed out when we move the
+		 * tail page.
+		 */
+
+		/* still more to do */
+		break;
+
+	case RB_PAGE_UPDATE:
+		/*
+		 * This is an interrupt that interrupted the
+		 * previous update. Still more to do.
+		 */
+		break;
+	case RB_PAGE_NORMAL:
+		/*
+		 * An interrupt came in before the update
+		 * and processed this for us.
+		 * Nothing left to do.
+		 */
+		return 1;
+	case RB_PAGE_MOVED:
+		/*
+		 * The reader is on another CPU and just did
+		 * a swap with our next_page.
+		 * Try again.
+		 */
+		return 1;
+	default:
+		RB_WARN_ON(cpu_buffer, 1); /* WTF??? */
+		return -1;
+	}
+
+	/*
+	 * Now that we are here, the old head pointer is
+	 * set to UPDATE. This will keep the reader from
+	 * swapping the head page with the reader page.
+	 * The reader (on another CPU) will spin till
+	 * we are finished.
+	 *
+	 * We just need to protect against interrupts
+	 * doing the job. We will set the next pointer
+	 * to HEAD. After that, we set the old pointer
+	 * to NORMAL, but only if it was HEAD before.
+	 * Otherwise we are an interrupt, and only
+	 * want the outermost commit to reset it.
+	 */
+	new_head = next_page;
+	rb_inc_page(cpu_buffer, &new_head);
+
+	ret = rb_head_page_set_head(cpu_buffer, new_head, next_page,
+				    RB_PAGE_NORMAL);
+
+	/*
+	 * Valid returns are:
+	 *  HEAD   - an interrupt came in and already set it.
+	 *  NORMAL - One of two things:
+	 *            1) We really set it.
+	 *            2) A bunch of interrupts came in and moved
+	 *               the page forward again.
+	 */
+	switch (ret) {
+	case RB_PAGE_HEAD:
+	case RB_PAGE_NORMAL:
+		/* OK */
+		break;
+	default:
+		RB_WARN_ON(cpu_buffer, 1);
+		return -1;
+	}
+
+	/*
+	 * It is possible that an interrupt came in,
+	 * set the head up, then more interrupts came in
+	 * and moved it again. When we get back here,
+	 * the page would have been set to NORMAL but we
+	 * just set it back to HEAD.
+	 *
+	 * How do you detect this? Well, if that happened
+	 * the tail page would have moved.
+	 */
+	if (ret == RB_PAGE_NORMAL) {
+		/*
+		 * If the tail had moved past next, then we need
+		 * to reset the pointer.
+		 */
+		if (cpu_buffer->tail_page != tail_page &&
+		    cpu_buffer->tail_page != next_page)
+			rb_head_page_set_normal(cpu_buffer, new_head,
+						next_page,
+						RB_PAGE_HEAD);
+	}
+
+	/*
+	 * If this was the outer most commit (the one that
+	 * changed the original pointer from HEAD to UPDATE),
+	 * then it is up to us to reset it to NORMAL.
+	 */
+	if (type == RB_PAGE_HEAD) {
+		ret = rb_head_page_set_normal(cpu_buffer, next_page,
+					      tail_page,
+					      RB_PAGE_UPDATE);
+		if (RB_WARN_ON(cpu_buffer,
+			       ret != RB_PAGE_UPDATE))
+			return -1;
+	}
+
+	return 0;
+}
+
+static unsigned rb_calculate_event_length(unsigned length)
+{
+	struct ftrace_ring_buffer_event event; /* Used only for sizeof array */
+
+	/* zero length can cause confusion */
+	if (!length)
+		length = 1;
+
+	if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT)
+		length += sizeof(event.array[0]);
+
+	length += RB_EVNT_HDR_SIZE;
+	length = ALIGN(length, RB_ARCH_ALIGNMENT);
+
+	return length;
+}
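+
+/*
+ * Worked example (illustrative only, assuming a 4-byte event header,
+ * RB_ARCH_ALIGNMENT of 4 and a payload small enough for the compact
+ * encoding -- the real constants are arch and config dependent): a
+ * 6-byte payload reserves ALIGN(6 + 4, 4) = 12 bytes in the page, and
+ * a zero-byte request is first bumped to 1 so the reserved space is
+ * never just the header.
+ */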
+
+static inline void
+rb_reset_tail(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+	      struct buffer_page *tail_page,
+	      unsigned long tail, unsigned long length)
+{
+	struct ftrace_ring_buffer_event *event;
+
+	/*
+	 * Only the event that crossed the page boundary
+	 * must fill the old tail_page with padding.
+	 */
+	if (tail >= BUF_PAGE_SIZE) {
+		/*
+		 * If the page was filled, then we still need
+		 * to update the real_end. Reset it to zero
+		 * and the reader will ignore it.
+		 */
+		if (tail == BUF_PAGE_SIZE)
+			tail_page->real_end = 0;
+
+		local_sub(length, &tail_page->write);
+		return;
+	}
+
+	event = __rb_page_index(tail_page, tail);
+	kmemcheck_annotate_bitfield(event, bitfield);
+
+	/*
+	 * Save the original length in the meta data.
+	 * This will be used by the reader to update the
+	 * lost-event counter.
+	 */
+	tail_page->real_end = tail;
+
+	/*
+	 * If this event is bigger than the minimum size, then
+	 * we need to be careful that we don't subtract the
+	 * write counter enough to allow another writer to slip
+	 * in on this page.
+	 * We put in a discarded commit instead, to make sure
+	 * that this space is not used again.
+	 *
+	 * If we are less than the minimum size, we don't need to
+	 * worry about it.
+	 */
+	if (tail > (BUF_PAGE_SIZE - RB_EVNT_MIN_SIZE)) {
+		/* No room for any events */
+
+		/* Mark the rest of the page with padding */
+		rb_event_set_padding(event);
+
+		/* Set the write back to the previous setting */
+		local_sub(length, &tail_page->write);
+		return;
+	}
+
+	/* Put in a discarded event */
+	event->array[0] = (BUF_PAGE_SIZE - tail) - RB_EVNT_HDR_SIZE;
+	event->type_len = RINGBUF_TYPE_PADDING;
+	/* time delta must be non zero */
+	event->time_delta = 1;
+
+	/* Set write to end of buffer */
+	length = (tail + length) - BUF_PAGE_SIZE;
+	local_sub(length, &tail_page->write);
+}
+
+static struct ftrace_ring_buffer_event *
+rb_move_tail(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+	     unsigned long length, unsigned long tail,
+	     struct buffer_page *tail_page, u64 *ts)
+{
+	struct buffer_page *commit_page = cpu_buffer->commit_page;
+	struct ftrace_ring_buffer *buffer = cpu_buffer->buffer;
+	struct buffer_page *next_page;
+	int ret;
+
+	next_page = tail_page;
+
+	rb_inc_page(cpu_buffer, &next_page);
+
+	/*
+	 * If for some reason, we had an interrupt storm that made
+	 * it all the way around the buffer, bail, and warn
+	 * about it.
+	 */
+	if (unlikely(next_page == commit_page)) {
+		local_inc(&cpu_buffer->commit_overrun);
+		goto out_reset;
+	}
+
+	/*
+	 * This is where the fun begins!
+	 *
+	 * We are fighting against races between a reader that
+	 * could be on another CPU trying to swap its reader
+	 * page with the buffer head.
+	 *
+	 * We are also fighting against interrupts coming in and
+	 * moving the head or tail on us as well.
+	 *
+	 * If the next page is the head page then we have filled
+	 * the buffer, unless the commit page is still on the
+	 * reader page.
+	 */
+	if (rb_is_head_page(cpu_buffer, next_page, &tail_page->list)) {
+
+		/*
+		 * If the commit is not on the reader page, then
+		 * move the header page.
+		 */
+		if (!rb_is_reader_page(cpu_buffer->commit_page)) {
+			/*
+			 * If we are not in overwrite mode,
+			 * this is easy, just stop here.
+			 */
+			if (!(buffer->flags & RB_FL_OVERWRITE))
+				goto out_reset;
+
+			ret = rb_handle_head_page(cpu_buffer,
+						  tail_page,
+						  next_page);
+			if (ret < 0)
+				goto out_reset;
+			if (ret)
+				goto out_again;
+		} else {
+			/*
+			 * We need to be careful here too. The
+			 * commit page could still be on the reader
+			 * page. We could have a small buffer, and
+			 * have filled up the buffer with events
+			 * from interrupts and such, and wrapped.
+			 *
+			 * Note, if the tail page is also on the
+			 * reader_page, we let it move out.
+			 */
+			if (unlikely((cpu_buffer->commit_page !=
+				      cpu_buffer->tail_page) &&
+				     (cpu_buffer->commit_page ==
+				      cpu_buffer->reader_page))) {
+				local_inc(&cpu_buffer->commit_overrun);
+				goto out_reset;
+			}
+		}
+	}
+
+	ret = rb_tail_page_update(cpu_buffer, tail_page, next_page);
+	if (ret) {
+		/*
+		 * Nested commits always have zero deltas, so
+		 * just reread the time stamp
+		 */
+		*ts = rb_time_stamp(buffer);
+		next_page->page->time_stamp = *ts;
+	}
+
+ out_again:
+
+	rb_reset_tail(cpu_buffer, tail_page, tail, length);
+
+	/* fail and let the caller try again */
+	return ERR_PTR(-EAGAIN);
+
+ out_reset:
+	/* reset write */
+	rb_reset_tail(cpu_buffer, tail_page, tail, length);
+
+	return NULL;
+}
+
+static struct ftrace_ring_buffer_event *
+__rb_reserve_next(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		  unsigned type, unsigned long length, u64 *ts)
+{
+	struct buffer_page *tail_page;
+	struct ftrace_ring_buffer_event *event;
+	unsigned long tail, write;
+
+	tail_page = cpu_buffer->tail_page;
+	write = local_add_return(length, &tail_page->write);
+
+	/* set write to only the index of the write */
+	write &= RB_WRITE_MASK;
+	tail = write - length;
+
+	/* See if we shot past the end of this buffer page */
+	if (write > BUF_PAGE_SIZE)
+		return rb_move_tail(cpu_buffer, length, tail,
+				    tail_page, ts);
+
+	/* We reserved something on the buffer */
+
+	event = __rb_page_index(tail_page, tail);
+	kmemcheck_annotate_bitfield(event, bitfield);
+	rb_update_event(event, type, length);
+
+	/* The passed in type is zero for DATA */
+	if (likely(!type))
+		local_inc(&tail_page->entries);
+
+	/*
+	 * If this is the first commit on the page, then update
+	 * its timestamp.
+	 */
+	if (!tail)
+		tail_page->page->time_stamp = *ts;
+
+	return event;
+}
+
+static inline int
+rb_try_to_discard(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		  struct ftrace_ring_buffer_event *event)
+{
+	unsigned long new_index, old_index;
+	struct buffer_page *bpage;
+	unsigned long index;
+	unsigned long addr;
+
+	new_index = rb_event_index(event);
+	old_index = new_index + rb_event_length(event);
+	addr = (unsigned long)event;
+	addr &= PAGE_MASK;
+
+	bpage = cpu_buffer->tail_page;
+
+	if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
+		unsigned long write_mask =
+			local_read(&bpage->write) & ~RB_WRITE_MASK;
+		/*
+		 * This is on the tail page. It is possible that
+		 * a write could come in and move the tail page
+		 * and write to the next page. That is fine
+		 * because we just shorten what is on this page.
+		 */
+		old_index += write_mask;
+		new_index += write_mask;
+		index = local_cmpxchg(&bpage->write, old_index, new_index);
+		if (index == old_index)
+			return 1;
+	}
+
+	/* could not discard */
+	return 0;
+}
+
+static int
+rb_add_time_stamp(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		  u64 *ts, u64 *delta)
+{
+	struct ftrace_ring_buffer_event *event;
+	int ret;
+
+	WARN_ONCE(*delta > (1ULL << 59),
+		  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
+		  (unsigned long long)*delta,
+		  (unsigned long long)*ts,
+		  (unsigned long long)cpu_buffer->write_stamp);
+
+	/*
+	 * The delta is too big; we need to add a
+	 * new timestamp.
+	 */
+	event = __rb_reserve_next(cpu_buffer,
+				  RINGBUF_TYPE_TIME_EXTEND,
+				  RB_LEN_TIME_EXTEND,
+				  ts);
+	if (!event)
+		return -EBUSY;
+
+	if (PTR_ERR(event) == -EAGAIN)
+		return -EAGAIN;
+
+	/* Only a committed time event can update the write stamp */
+	if (rb_event_is_commit(cpu_buffer, event)) {
+		/*
+		 * If this is the first on the page, then it was
+		 * updated with the page itself. Try to discard it
+		 * and if we can't just make it zero.
+		 */
+		if (rb_event_index(event)) {
+			event->time_delta = *delta & TS_MASK;
+			event->array[0] = *delta >> TS_SHIFT;
+		} else {
+			/* try to discard, since we do not need this */
+			if (!rb_try_to_discard(cpu_buffer, event)) {
+				/* nope, just zero it */
+				event->time_delta = 0;
+				event->array[0] = 0;
+			}
+		}
+		cpu_buffer->write_stamp = *ts;
+		/* let the caller know this was the commit */
+		ret = 1;
+	} else {
+		/* Try to discard the event */
+		if (!rb_try_to_discard(cpu_buffer, event)) {
+			/* Darn, this is just wasted space */
+			event->time_delta = 0;
+			event->array[0] = 0;
+		}
+		ret = 0;
+	}
+
+	*delta = 0;
+
+	return ret;
+}
+
+static void rb_start_commit(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	local_inc(&cpu_buffer->committing);
+	local_inc(&cpu_buffer->commits);
+}
+
+static void rb_end_commit(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	unsigned long commits;
+
+	if (RB_WARN_ON(cpu_buffer,
+		       !local_read(&cpu_buffer->committing)))
+		return;
+
+ again:
+	commits = local_read(&cpu_buffer->commits);
+	/* synchronize with interrupts */
+	barrier();
+	if (local_read(&cpu_buffer->committing) == 1)
+		rb_set_commit_to_write(cpu_buffer);
+
+	local_dec(&cpu_buffer->committing);
+
+	/* synchronize with interrupts */
+	barrier();
+
+	/*
+	 * Need to account for interrupts coming in between the
+	 * updating of the commit page and the clearing of the
+	 * committing counter.
+	 */
+	if (unlikely(local_read(&cpu_buffer->commits) != commits) &&
+	    !local_read(&cpu_buffer->committing)) {
+		local_inc(&cpu_buffer->committing);
+		goto again;
+	}
+}
+
+static struct ftrace_ring_buffer_event *
+rb_reserve_next_event(struct ftrace_ring_buffer *buffer,
+		      struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		      unsigned long length)
+{
+	struct ftrace_ring_buffer_event *event;
+	u64 ts, delta = 0;
+	int commit = 0;
+	int nr_loops = 0;
+
+	rb_start_commit(cpu_buffer);
+
+#ifdef CONFIG_FTRACE_RING_BUFFER_ALLOW_SWAP
+	/*
+	 * Because a cpu buffer can be swapped out of a buffer,
+	 * it is possible it was swapped before we committed
+	 * (committing stops a swap). We check for it here and,
+	 * if it happened, we have to fail the write.
+	 */
+	barrier();
+	if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
+		local_dec(&cpu_buffer->committing);
+		local_dec(&cpu_buffer->commits);
+		return NULL;
+	}
+#endif
+
+	length = rb_calculate_event_length(length);
+ again:
+	/*
+	 * We allow for interrupts to reenter here and do a trace.
+	 * If one does, it will cause this original code to loop
+	 * back here. Even with heavy interrupts happening, this
+	 * should only happen a few times in a row. If this happens
+	 * 1000 times in a row, there must be either an interrupt
+	 * storm or we have something buggy.
+	 * Bail!
+	 */
+	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
+		goto out_fail;
+
+	ts = rb_time_stamp(cpu_buffer->buffer);
+
+	/*
+	 * Only the first commit can update the timestamp.
+	 * Yes there is a race here. If an interrupt comes in
+	 * just after the conditional and it traces too, then it
+	 * will also check the deltas. More than one timestamp may
+	 * also be made. But only the entry that did the actual
+	 * commit will be something other than zero.
+	 */
+	if (likely(cpu_buffer->tail_page == cpu_buffer->commit_page &&
+		   rb_page_write(cpu_buffer->tail_page) ==
+		   rb_commit_index(cpu_buffer))) {
+		u64 diff;
+
+		diff = ts - cpu_buffer->write_stamp;
+
+		/* make sure this diff is calculated here */
+		barrier();
+
+		/* Did the write stamp get updated already? */
+		if (unlikely(ts < cpu_buffer->write_stamp))
+			goto get_event;
+
+		delta = diff;
+		if (unlikely(test_time_stamp(delta))) {
+
+			commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
+			if (commit == -EBUSY)
+				goto out_fail;
+
+			if (commit == -EAGAIN)
+				goto again;
+
+			RB_WARN_ON(cpu_buffer, commit < 0);
+		}
+	}
+
+ get_event:
+	event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
+	if (unlikely(PTR_ERR(event) == -EAGAIN))
+		goto again;
+
+	if (!event)
+		goto out_fail;
+
+	if (!rb_event_is_commit(cpu_buffer, event))
+		delta = 0;
+
+	event->time_delta = delta;
+
+	return event;
+
+ out_fail:
+	rb_end_commit(cpu_buffer);
+	return NULL;
+}
+
+#ifdef CONFIG_TRACING
+
+#define TRACE_RECURSIVE_DEPTH 16
+
+static int trace_recursive_lock(void)
+{
+	current->trace_recursion++;
+
+	if (likely(current->trace_recursion < TRACE_RECURSIVE_DEPTH))
+		return 0;
+
+	/* Disable all tracing before we do anything else */
+	tracing_off_permanent();
+
+	printk_once(KERN_WARNING "Tracing recursion: depth[%ld]:"
+		    "HC[%lu]:SC[%lu]:NMI[%lu]\n",
+		    current->trace_recursion,
+		    hardirq_count() >> HARDIRQ_SHIFT,
+		    softirq_count() >> SOFTIRQ_SHIFT,
+		    in_nmi());
+
+	WARN_ON_ONCE(1);
+	return -1;
+}
+
+static void trace_recursive_unlock(void)
+{
+	WARN_ON_ONCE(!current->trace_recursion);
+
+	current->trace_recursion--;
+}
+
+#else
+
+#define trace_recursive_lock()		(0)
+#define trace_recursive_unlock()	do { } while (0)
+
+#endif
+
+/**
+ * ftrace_ring_buffer_lock_reserve - reserve a part of the buffer
+ * @buffer: the ring buffer to reserve from
+ * @length: the length of the data to reserve (excluding event header)
+ *
+ * Returns a reserved event on the ring buffer to copy data directly into.
+ * The user of this interface will need to get the body to write into
+ * and can use the ftrace_ring_buffer_event_data() interface.
+ *
+ * The length is the length of the data needed, not the event length
+ * which also includes the event header.
+ *
+ * Must be paired with ftrace_ring_buffer_unlock_commit, unless NULL is returned.
+ * If NULL is returned, then nothing has been allocated or locked.
+ */
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_lock_reserve(struct ftrace_ring_buffer *buffer, unsigned long length)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	struct ftrace_ring_buffer_event *event;
+	int cpu;
+
+	if (ftrace_ring_buffer_flags != RB_BUFFERS_ON)
+		return NULL;
+
+	/* If we are tracing schedule, we don't want to recurse */
+	preempt_disable_notrace();
+
+	if (atomic_read(&buffer->record_disabled))
+		goto out_nocheck;
+
+	if (trace_recursive_lock())
+		goto out_nocheck;
+
+	cpu = raw_smp_processor_id();
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		goto out;
+
+	cpu_buffer = buffer->buffers[cpu];
+
+	if (atomic_read(&cpu_buffer->record_disabled))
+		goto out;
+
+	if (length > BUF_MAX_DATA_SIZE)
+		goto out;
+
+	event = rb_reserve_next_event(buffer, cpu_buffer, length);
+	if (!event)
+		goto out;
+
+	return event;
+
+ out:
+	trace_recursive_unlock();
+
+ out_nocheck:
+	preempt_enable_notrace();
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_lock_reserve);
+
+static void
+rb_update_write_stamp(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		      struct ftrace_ring_buffer_event *event)
+{
+	/*
+	 * The event first in the commit queue updates the
+	 * time stamp.
+	 */
+	if (rb_event_is_commit(cpu_buffer, event))
+		cpu_buffer->write_stamp += event->time_delta;
+}
+
+static void rb_commit(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		      struct ftrace_ring_buffer_event *event)
+{
+	local_inc(&cpu_buffer->entries);
+	rb_update_write_stamp(cpu_buffer, event);
+	rb_end_commit(cpu_buffer);
+}
+
+/**
+ * ftrace_ring_buffer_unlock_commit - commit a reserved event
+ * @buffer: The buffer to commit to
+ * @event: The event pointer to commit.
+ *
+ * This commits the data to the ring buffer, and releases any locks held.
+ *
+ * Must be paired with ftrace_ring_buffer_lock_reserve.
+ */
+int ftrace_ring_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+			      struct ftrace_ring_buffer_event *event)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	int cpu = raw_smp_processor_id();
+
+	cpu_buffer = buffer->buffers[cpu];
+
+	rb_commit(cpu_buffer, event);
+
+	trace_recursive_unlock();
+
+	preempt_enable_notrace();
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_unlock_commit);
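+
+/*
+ * Illustrative usage sketch (comment only): the reserve/commit pair is
+ * meant to be used around a direct write into the reserved slot.  The
+ * "struct my_event" payload and the surrounding context are made up for
+ * the example; the accessors are the ones documented above.
+ *
+ *	struct ftrace_ring_buffer_event *event;
+ *	struct my_event *entry;
+ *
+ *	event = ftrace_ring_buffer_lock_reserve(buffer, sizeof(*entry));
+ *	if (!event)
+ *		return;		// disabled, full, or recursion limit hit
+ *	entry = ftrace_ring_buffer_event_data(event);
+ *	entry->value = 42;
+ *	ftrace_ring_buffer_unlock_commit(buffer, event);
+ */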
+
+static inline void rb_event_discard(struct ftrace_ring_buffer_event *event)
+{
+	/* array[0] holds the actual length for the discarded event */
+	event->array[0] = rb_event_data_length(event) - RB_EVNT_HDR_SIZE;
+	event->type_len = RINGBUF_TYPE_PADDING;
+	/* time delta must be non zero */
+	if (!event->time_delta)
+		event->time_delta = 1;
+}
+
+/*
+ * Decrement the entry count of the page that an event is on.
+ * The event does not even need to exist, only the pointer
+ * to the page it is on. This may only be called before the commit
+ * takes place.
+ */
+static inline void
+rb_decrement_entry(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		   struct ftrace_ring_buffer_event *event)
+{
+	unsigned long addr = (unsigned long)event;
+	struct buffer_page *bpage = cpu_buffer->commit_page;
+	struct buffer_page *start;
+
+	addr &= PAGE_MASK;
+
+	/* Do the likely case first */
+	if (likely(bpage->page == (void *)addr)) {
+		local_dec(&bpage->entries);
+		return;
+	}
+
+	/*
+	 * Because the commit page may be on the reader page, we
+	 * start with the next page and end the loop there.
+	 */
+	rb_inc_page(cpu_buffer, &bpage);
+	start = bpage;
+	do {
+		if (bpage->page == (void *)addr) {
+			local_dec(&bpage->entries);
+			return;
+		}
+		rb_inc_page(cpu_buffer, &bpage);
+	} while (bpage != start);
+
+	/* commit not part of this buffer?? */
+	RB_WARN_ON(cpu_buffer, 1);
+}
+
+/**
+ * ftrace_ring_buffer_discard_commit - discard an event that has not been committed
+ * @buffer: the ring buffer
+ * @event: non committed event to discard
+ *
+ * Sometimes an event that is in the ring buffer needs to be ignored.
+ * This function lets the user discard an event in the ring buffer
+ * and then that event will not be read later.
+ *
+ * This function only works if it is called before the item has been
+ * committed. It will try to free the event from the ring buffer
+ * if another event has not been added behind it.
+ *
+ * If another event has been added behind it, it will set the event
+ * up as discarded, and perform the commit.
+ *
+ * If this function is called, do not call ftrace_ring_buffer_unlock_commit on
+ * the event.
+ */
+void ftrace_ring_buffer_discard_commit(struct ftrace_ring_buffer *buffer,
+				struct ftrace_ring_buffer_event *event)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	int cpu;
+
+	/* The event is discarded regardless */
+	rb_event_discard(event);
+
+	cpu = smp_processor_id();
+	cpu_buffer = buffer->buffers[cpu];
+
+	/*
+	 * This must only be called if the event has not been
+	 * committed yet. Thus we can assume that preemption
+	 * is still disabled.
+	 */
+	RB_WARN_ON(buffer, !local_read(&cpu_buffer->committing));
+
+	rb_decrement_entry(cpu_buffer, event);
+	if (rb_try_to_discard(cpu_buffer, event))
+		goto out;
+
+	/*
+	 * The commit is still visible by the reader, so we
+	 * must still update the timestamp.
+	 */
+	rb_update_write_stamp(cpu_buffer, event);
+ out:
+	rb_end_commit(cpu_buffer);
+
+	trace_recursive_unlock();
+
+	preempt_enable_notrace();
+
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_discard_commit);
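+
+/*
+ * Illustrative sketch (comment only): a client that reserves an event
+ * and then decides not to record it (the filter check here is a
+ * hypothetical example) discards it instead of committing:
+ *
+ *	event = ftrace_ring_buffer_lock_reserve(buffer, sizeof(*entry));
+ *	if (event) {
+ *		entry = ftrace_ring_buffer_event_data(event);
+ *		if (fill_entry(entry))		// hypothetical helper
+ *			ftrace_ring_buffer_unlock_commit(buffer, event);
+ *		else
+ *			ftrace_ring_buffer_discard_commit(buffer, event);
+ *	}
+ */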
+
+/**
+ * ftrace_ring_buffer_write - write data to the buffer without reserving
+ * @buffer: The ring buffer to write to.
+ * @length: The length of the data being written (excluding the event header)
+ * @data: The data to write to the buffer.
+ *
+ * This is like ftrace_ring_buffer_lock_reserve and ftrace_ring_buffer_unlock_commit as
+ * one function. If you already have the data to write to the buffer, it
+ * may be easier to simply call this function.
+ *
+ * Note, like ftrace_ring_buffer_lock_reserve, the length is the length of the data
+ * and not the length of the event which would hold the header.
+ */
+int ftrace_ring_buffer_write(struct ftrace_ring_buffer *buffer,
+			unsigned long length,
+			void *data)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	struct ftrace_ring_buffer_event *event;
+	void *body;
+	int ret = -EBUSY;
+	int cpu;
+
+	if (ftrace_ring_buffer_flags != RB_BUFFERS_ON)
+		return -EBUSY;
+
+	preempt_disable_notrace();
+
+	if (atomic_read(&buffer->record_disabled))
+		goto out;
+
+	cpu = raw_smp_processor_id();
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		goto out;
+
+	cpu_buffer = buffer->buffers[cpu];
+
+	if (atomic_read(&cpu_buffer->record_disabled))
+		goto out;
+
+	if (length > BUF_MAX_DATA_SIZE)
+		goto out;
+
+	event = rb_reserve_next_event(buffer, cpu_buffer, length);
+	if (!event)
+		goto out;
+
+	body = rb_event_data(event);
+
+	memcpy(body, data, length);
+
+	rb_commit(cpu_buffer, event);
+
+	ret = 0;
+ out:
+	preempt_enable_notrace();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_write);
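+
+/*
+ * Illustrative sketch (comment only): when the payload already exists,
+ * ftrace_ring_buffer_write() replaces the reserve/commit pair above:
+ *
+ *	u64 value = 42;
+ *
+ *	if (ftrace_ring_buffer_write(buffer, sizeof(value), &value))
+ *		return;		// -EBUSY: buffers off, disabled or no room
+ */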
+
+static int rb_per_cpu_empty(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct buffer_page *reader = cpu_buffer->reader_page;
+	struct buffer_page *head = rb_set_head_page(cpu_buffer);
+	struct buffer_page *commit = cpu_buffer->commit_page;
+
+	/* In case of error, head will be NULL */
+	if (unlikely(!head))
+		return 1;
+
+	return reader->read == rb_page_commit(reader) &&
+		(commit == reader ||
+		 (commit == head &&
+		  head->read == rb_page_commit(commit)));
+}
+
+/**
+ * ftrace_ring_buffer_record_disable - stop all writes into the buffer
+ * @buffer: The ring buffer to stop writes to.
+ *
+ * This prevents all writes to the buffer. Any attempt to write
+ * to the buffer after this will fail and return NULL.
+ *
+ * The caller should call synchronize_sched() after this.
+ */
+void ftrace_ring_buffer_record_disable(struct ftrace_ring_buffer *buffer)
+{
+	atomic_inc(&buffer->record_disabled);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_record_disable);
+
+/**
+ * ftrace_ring_buffer_record_enable - enable writes to the buffer
+ * @buffer: The ring buffer to enable writes
+ *
+ * Note, multiple disables will need the same number of enables
+ * to truly enable the writing (much like preempt_disable).
+ */
+void ftrace_ring_buffer_record_enable(struct ftrace_ring_buffer *buffer)
+{
+	atomic_dec(&buffer->record_disabled);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_record_enable);
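+
+/*
+ * Illustrative sketch (comment only): the disable/enable calls nest, and
+ * as the kernel-doc above notes, a synchronize_sched() after disabling
+ * lets in-flight writers on other CPUs finish before the buffer is
+ * inspected:
+ *
+ *	ftrace_ring_buffer_record_disable(buffer);
+ *	synchronize_sched();
+ *	// ... read or reset the buffer without new writes racing in ...
+ *	ftrace_ring_buffer_record_enable(buffer);
+ */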
+
+/**
+ * ftrace_ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
+ * @buffer: The ring buffer to stop writes to.
+ * @cpu: The CPU buffer to stop
+ *
+ * This prevents all writes to the buffer. Any attempt to write
+ * to the buffer after this will fail and return NULL.
+ *
+ * The caller should call synchronize_sched() after this.
+ */
+void ftrace_ring_buffer_record_disable_cpu(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return;
+
+	cpu_buffer = buffer->buffers[cpu];
+	atomic_inc(&cpu_buffer->record_disabled);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_record_disable_cpu);
+
+/**
+ * ftrace_ring_buffer_record_enable_cpu - enable writes to the buffer
+ * @buffer: The ring buffer to enable writes
+ * @cpu: The CPU to enable.
+ *
+ * Note, multiple disables will need the same number of enables
+ * to truly enable the writing (much like preempt_disable).
+ */
+void ftrace_ring_buffer_record_enable_cpu(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return;
+
+	cpu_buffer = buffer->buffers[cpu];
+	atomic_dec(&cpu_buffer->record_disabled);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_record_enable_cpu);
+
+/**
+ * ftrace_ring_buffer_entries_cpu - get the number of entries in a cpu buffer
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to get the entries from.
+ */
+unsigned long ftrace_ring_buffer_entries_cpu(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	ret = (local_read(&cpu_buffer->entries) - local_read(&cpu_buffer->overrun))
+		- cpu_buffer->read;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_entries_cpu);
+
+/**
+ * ftrace_ring_buffer_overrun_cpu - get the number of overruns in a cpu_buffer
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to get the number of overruns from
+ */
+unsigned long ftrace_ring_buffer_overrun_cpu(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	ret = local_read(&cpu_buffer->overrun);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_overrun_cpu);
+
+/**
+ * ftrace_ring_buffer_commit_overrun_cpu - get the number of overruns caused by commits
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to get the number of overruns from
+ */
+unsigned long
+ftrace_ring_buffer_commit_overrun_cpu(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	ret = local_read(&cpu_buffer->commit_overrun);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_commit_overrun_cpu);
+
+/**
+ * ftrace_ring_buffer_entries - get the number of entries in a buffer
+ * @buffer: The ring buffer
+ *
+ * Returns the total number of entries in the ring buffer
+ * (all CPU entries)
+ */
+unsigned long ftrace_ring_buffer_entries(struct ftrace_ring_buffer *buffer)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long entries = 0;
+	int cpu;
+
+	/* if you care about this being correct, lock the buffer */
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+		entries += (local_read(&cpu_buffer->entries) -
+			    local_read(&cpu_buffer->overrun)) - cpu_buffer->read;
+	}
+
+	return entries;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_entries);
+
+/**
+ * ftrace_ring_buffer_overruns - get the number of overruns in buffer
+ * @buffer: The ring buffer
+ *
+ * Returns the total number of overruns in the ring buffer
+ * (all CPU entries)
+ */
+unsigned long ftrace_ring_buffer_overruns(struct ftrace_ring_buffer *buffer)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long overruns = 0;
+	int cpu;
+
+	/* if you care about this being correct, lock the buffer */
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+		overruns += local_read(&cpu_buffer->overrun);
+	}
+
+	return overruns;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_overruns);
+
+static void rb_iter_reset(struct ftrace_ring_buffer_iter *iter)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+
+	/* Iterator usage is expected to have record disabled */
+	if (list_empty(&cpu_buffer->reader_page->list)) {
+		iter->head_page = rb_set_head_page(cpu_buffer);
+		if (unlikely(!iter->head_page))
+			return;
+		iter->head = iter->head_page->read;
+	} else {
+		iter->head_page = cpu_buffer->reader_page;
+		iter->head = cpu_buffer->reader_page->read;
+	}
+	if (iter->head)
+		iter->read_stamp = cpu_buffer->read_stamp;
+	else
+		iter->read_stamp = iter->head_page->page->time_stamp;
+	iter->cache_reader_page = cpu_buffer->reader_page;
+	iter->cache_read = cpu_buffer->read;
+}
+
+/**
+ * ftrace_ring_buffer_iter_reset - reset an iterator
+ * @iter: The iterator to reset
+ *
+ * Resets the iterator, so that it will start from the beginning
+ * again.
+ */
+void ftrace_ring_buffer_iter_reset(struct ftrace_ring_buffer_iter *iter)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long flags;
+
+	if (!iter)
+		return;
+
+	cpu_buffer = iter->cpu_buffer;
+
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+	rb_iter_reset(iter);
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_iter_reset);
+
+/**
+ * ftrace_ring_buffer_iter_empty - check if an iterator has no more to read
+ * @iter: The iterator to check
+ */
+int ftrace_ring_buffer_iter_empty(struct ftrace_ring_buffer_iter *iter)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+
+	cpu_buffer = iter->cpu_buffer;
+
+	return iter->head_page == cpu_buffer->commit_page &&
+		iter->head == rb_commit_index(cpu_buffer);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_iter_empty);
+
+static void
+rb_update_read_stamp(struct ftrace_ring_buffer_per_cpu *cpu_buffer,
+		     struct ftrace_ring_buffer_event *event)
+{
+	u64 delta;
+
+	switch (event->type_len) {
+	case RINGBUF_TYPE_PADDING:
+		return;
+
+	case RINGBUF_TYPE_TIME_EXTEND:
+		delta = event->array[0];
+		delta <<= TS_SHIFT;
+		delta += event->time_delta;
+		cpu_buffer->read_stamp += delta;
+		return;
+
+	case RINGBUF_TYPE_TIME_STAMP:
+		/* FIXME: not implemented */
+		return;
+
+	case RINGBUF_TYPE_DATA:
+		cpu_buffer->read_stamp += event->time_delta;
+		return;
+
+	default:
+		BUG();
+	}
+	return;
+}
+
+static void
+rb_update_iter_read_stamp(struct ftrace_ring_buffer_iter *iter,
+			  struct ftrace_ring_buffer_event *event)
+{
+	u64 delta;
+
+	switch (event->type_len) {
+	case RINGBUF_TYPE_PADDING:
+		return;
+
+	case RINGBUF_TYPE_TIME_EXTEND:
+		delta = event->array[0];
+		delta <<= TS_SHIFT;
+		delta += event->time_delta;
+		iter->read_stamp += delta;
+		return;
+
+	case RINGBUF_TYPE_TIME_STAMP:
+		/* FIXME: not implemented */
+		return;
+
+	case RINGBUF_TYPE_DATA:
+		iter->read_stamp += event->time_delta;
+		return;
+
+	default:
+		BUG();
+	}
+	return;
+}
+
+static struct buffer_page *
+rb_get_reader_page(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct buffer_page *reader = NULL;
+	unsigned long overwrite;
+	unsigned long flags;
+	int nr_loops = 0;
+	int ret;
+
+	local_irq_save(flags);
+	arch_spin_lock(&cpu_buffer->lock);
+
+ again:
+	/*
+	 * This should normally only loop twice. But because the
+	 * start of the reader inserts an empty page, it causes
+	 * a case where we will loop three times. There should be no
+	 * reason to loop four times (that I know of).
+	 */
+	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3)) {
+		reader = NULL;
+		goto out;
+	}
+
+	reader = cpu_buffer->reader_page;
+
+	/* If there's more to read, return this page */
+	if (cpu_buffer->reader_page->read < rb_page_size(reader))
+		goto out;
+
+	/* Never should we have an index greater than the size */
+	if (RB_WARN_ON(cpu_buffer,
+		       cpu_buffer->reader_page->read > rb_page_size(reader)))
+		goto out;
+
+	/* check if we caught up to the tail */
+	reader = NULL;
+	if (cpu_buffer->commit_page == cpu_buffer->reader_page)
+		goto out;
+
+	/*
+	 * Reset the reader page to size zero.
+	 */
+	local_set(&cpu_buffer->reader_page->write, 0);
+	local_set(&cpu_buffer->reader_page->entries, 0);
+	local_set(&cpu_buffer->reader_page->page->commit, 0);
+	cpu_buffer->reader_page->real_end = 0;
+
+ spin:
+	/*
+	 * Splice the empty reader page into the list around the head.
+	 */
+	reader = rb_set_head_page(cpu_buffer);
+	cpu_buffer->reader_page->list.next = rb_list_head(reader->list.next);
+	cpu_buffer->reader_page->list.prev = reader->list.prev;
+
+	/*
+	 * cpu_buffer->pages just needs to point to the buffer, it
+	 * has no specific buffer page to point to. Let's move it out
+	 * of our way so we don't accidentally swap it.
+	 */
+	cpu_buffer->pages = reader->list.prev;
+
+	/* The reader page will be pointing to the new head */
+	rb_set_list_to_head(cpu_buffer, &cpu_buffer->reader_page->list);
+
+	/*
+	 * We want to make sure we read the overruns after we set up our
+	 * pointers to the next object. The writer side does a
+	 * cmpxchg to cross pages which acts as the mb on the writer
+	 * side. Note, the reader will constantly fail the swap
+	 * while the writer is updating the pointers, so this
+	 * guarantees that the overwrite recorded here is the one we
+	 * want to compare with the last_overrun.
+	 */
+	smp_mb();
+	overwrite = local_read(&(cpu_buffer->overrun));
+
+	/*
+	 * Here's the tricky part.
+	 *
+	 * We need to move the pointer past the header page.
+	 * But we can only do that if a writer is not currently
+	 * moving it. The page before the header page has the
+	 * flag bit '1' set if it is pointing to the page we want,
+	 * but if the writer is in the process of moving it
+	 * then it will be '2', or '0' if it has already moved.
+	 */
+
+	ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
+
+	/*
+	 * If we did not convert it, then we must try again.
+	 */
+	if (!ret)
+		goto spin;
+
+	/*
+	 * Yeah! We succeeded in replacing the page.
+	 *
+	 * Now make the new head point back to the reader page.
+	 */
+	rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
+	rb_inc_page(cpu_buffer, &cpu_buffer->head_page);
+
+	/* Finally update the reader page to the new head */
+	cpu_buffer->reader_page = reader;
+	rb_reset_reader_page(cpu_buffer);
+
+	if (overwrite != cpu_buffer->last_overrun) {
+		cpu_buffer->lost_events = overwrite - cpu_buffer->last_overrun;
+		cpu_buffer->last_overrun = overwrite;
+	}
+
+	goto again;
+
+ out:
+	arch_spin_unlock(&cpu_buffer->lock);
+	local_irq_restore(flags);
+
+	return reader;
+}
+
+static void rb_advance_reader(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	struct ftrace_ring_buffer_event *event;
+	struct buffer_page *reader;
+	unsigned length;
+
+	reader = rb_get_reader_page(cpu_buffer);
+
+	/* This function should not be called when buffer is empty */
+	if (RB_WARN_ON(cpu_buffer, !reader))
+		return;
+
+	event = rb_reader_event(cpu_buffer);
+
+	if (event->type_len <= RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
+		cpu_buffer->read++;
+
+	rb_update_read_stamp(cpu_buffer, event);
+
+	length = rb_event_length(event);
+	cpu_buffer->reader_page->read += length;
+}
+
+static void rb_advance_iter(struct ftrace_ring_buffer_iter *iter)
+{
+	struct ftrace_ring_buffer *buffer;
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	struct ftrace_ring_buffer_event *event;
+	unsigned length;
+
+	cpu_buffer = iter->cpu_buffer;
+	buffer = cpu_buffer->buffer;
+
+	/*
+	 * Check if we are at the end of the buffer.
+	 */
+	if (iter->head >= rb_page_size(iter->head_page)) {
+		/* discarded commits can make the page empty */
+		if (iter->head_page == cpu_buffer->commit_page)
+			return;
+		rb_inc_iter(iter);
+		return;
+	}
+
+	event = rb_iter_head_event(iter);
+
+	length = rb_event_length(event);
+
+	/*
+	 * This should not be called to advance the header if we are
+	 * at the tail of the buffer.
+	 */
+	if (RB_WARN_ON(cpu_buffer,
+		       (iter->head_page == cpu_buffer->commit_page) &&
+		       (iter->head + length > rb_commit_index(cpu_buffer))))
+		return;
+
+	rb_update_iter_read_stamp(iter, event);
+
+	iter->head += length;
+
+	/* check for end of page padding */
+	if ((iter->head >= rb_page_size(iter->head_page)) &&
+	    (iter->head_page != cpu_buffer->commit_page))
+		rb_advance_iter(iter);
+}
+
+static int rb_lost_events(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	return cpu_buffer->lost_events;
+}
+
+static struct ftrace_ring_buffer_event *
+rb_buffer_peek(struct ftrace_ring_buffer_per_cpu *cpu_buffer, u64 *ts,
+	       unsigned long *lost_events)
+{
+	struct ftrace_ring_buffer_event *event;
+	struct buffer_page *reader;
+	int nr_loops = 0;
+
+ again:
+	/*
+	 * We repeat when a timestamp is encountered. It is possible
+	 * to get multiple timestamps from an interrupt entering just
+	 * as one timestamp is about to be written, or from discarded
+	 * commits. The most that we can have is the number on a single page.
+	 */
+	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
+		return NULL;
+
+	reader = rb_get_reader_page(cpu_buffer);
+	if (!reader)
+		return NULL;
+
+	event = rb_reader_event(cpu_buffer);
+
+	switch (event->type_len) {
+	case RINGBUF_TYPE_PADDING:
+		if (rb_null_event(event))
+			RB_WARN_ON(cpu_buffer, 1);
+		/*
+		 * Because the writer could be discarding every
+		 * event it creates (which would probably be bad)
+		 * if we were to go back to "again" then we may never
+		 * catch up, and will trigger the warn on, or lock
+		 * the box. Return the padding, and we will release
+		 * the current locks, and try again.
+		 */
+		return event;
+
+	case RINGBUF_TYPE_TIME_EXTEND:
+		/* Internal data, OK to advance */
+		rb_advance_reader(cpu_buffer);
+		goto again;
+
+	case RINGBUF_TYPE_TIME_STAMP:
+		/* FIXME: not implemented */
+		rb_advance_reader(cpu_buffer);
+		goto again;
+
+	case RINGBUF_TYPE_DATA:
+		if (ts) {
+			*ts = cpu_buffer->read_stamp + event->time_delta;
+			ftrace_ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
+							 cpu_buffer->cpu, ts);
+		}
+		if (lost_events)
+			*lost_events = rb_lost_events(cpu_buffer);
+		return event;
+
+	default:
+		BUG();
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_peek);
+
+static struct ftrace_ring_buffer_event *
+rb_iter_peek(struct ftrace_ring_buffer_iter *iter, u64 *ts)
+{
+	struct ftrace_ring_buffer *buffer;
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	struct ftrace_ring_buffer_event *event;
+	int nr_loops = 0;
+
+	cpu_buffer = iter->cpu_buffer;
+	buffer = cpu_buffer->buffer;
+
+	/*
+	 * Check if someone performed a consuming read to
+	 * the buffer. A consuming read invalidates the iterator
+	 * and we need to reset the iterator in this case.
+	 */
+	if (unlikely(iter->cache_read != cpu_buffer->read ||
+		     iter->cache_reader_page != cpu_buffer->reader_page))
+		rb_iter_reset(iter);
+
+ again:
+	if (ftrace_ring_buffer_iter_empty(iter))
+		return NULL;
+
+	/*
+	 * We repeat when a timestamp is encountered.
+	 * We can get multiple timestamps by nested interrupts or also
+	 * if filtering is on (discarding commits). Since discarding
+	 * commits can be frequent we can get a lot of timestamps.
+	 * But we limit them by not adding timestamps if they begin
+	 * at the start of a page.
+	 */
+	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
+		return NULL;
+
+	if (rb_per_cpu_empty(cpu_buffer))
+		return NULL;
+
+	if (iter->head >= local_read(&iter->head_page->page->commit)) {
+		rb_inc_iter(iter);
+		goto again;
+	}
+
+	event = rb_iter_head_event(iter);
+
+	switch (event->type_len) {
+	case RINGBUF_TYPE_PADDING:
+		if (rb_null_event(event)) {
+			rb_inc_iter(iter);
+			goto again;
+		}
+		rb_advance_iter(iter);
+		return event;
+
+	case RINGBUF_TYPE_TIME_EXTEND:
+		/* Internal data, OK to advance */
+		rb_advance_iter(iter);
+		goto again;
+
+	case RINGBUF_TYPE_TIME_STAMP:
+		/* FIXME: not implemented */
+		rb_advance_iter(iter);
+		goto again;
+
+	case RINGBUF_TYPE_DATA:
+		if (ts) {
+			*ts = iter->read_stamp + event->time_delta;
+			ftrace_ring_buffer_normalize_time_stamp(buffer,
+							 cpu_buffer->cpu, ts);
+		}
+		return event;
+
+	default:
+		BUG();
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_iter_peek);
+
+static inline int rb_ok_to_lock(void)
+{
+	/*
+	 * If an NMI die dump is dumping out the content of the ring
+	 * buffer, do not grab locks. We also permanently disable the
+	 * ring buffer. A one-time deal is all you get from reading
+	 * the ring buffer from an NMI.
+	 */
+	if (likely(!in_nmi()))
+		return 1;
+
+	tracing_off_permanent();
+	return 0;
+}
+
+/**
+ * ftrace_ring_buffer_peek - peek at the next event to be read
+ * @buffer: The ring buffer to read
+ * @cpu: The cpu to peek at
+ * @ts: The timestamp counter of this event.
+ * @lost_events: a variable to store if events were lost (may be NULL)
+ *
+ * This will return the event that will be read next, but does
+ * not consume the data.
+ */
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_peek(struct ftrace_ring_buffer *buffer, int cpu, u64 *ts,
+		 unsigned long *lost_events)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+	struct ftrace_ring_buffer_event *event;
+	unsigned long flags;
+	int dolock;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return NULL;
+
+	dolock = rb_ok_to_lock();
+ again:
+	local_irq_save(flags);
+	if (dolock)
+		spin_lock(&cpu_buffer->reader_lock);
+	event = rb_buffer_peek(cpu_buffer, ts, lost_events);
+	if (event && event->type_len == RINGBUF_TYPE_PADDING)
+		rb_advance_reader(cpu_buffer);
+	if (dolock)
+		spin_unlock(&cpu_buffer->reader_lock);
+	local_irq_restore(flags);
+
+	if (event && event->type_len == RINGBUF_TYPE_PADDING)
+		goto again;
+
+	return event;
+}
+
+/**
+ * ftrace_ring_buffer_iter_peek - peek at the next event to be read
+ * @iter: The ring buffer iterator
+ * @ts: The timestamp counter of this event.
+ *
+ * This will return the event that will be read next, but does
+ * not increment the iterator.
+ */
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_iter_peek(struct ftrace_ring_buffer_iter *iter, u64 *ts)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+	struct ftrace_ring_buffer_event *event;
+	unsigned long flags;
+
+ again:
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+	event = rb_iter_peek(iter, ts);
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+	if (event && event->type_len == RINGBUF_TYPE_PADDING)
+		goto again;
+
+	return event;
+}
+
+/**
+ * ftrace_ring_buffer_consume - return an event and consume it
+ * @buffer: The ring buffer to get the next event from
+ * @cpu: the cpu to read the buffer from
+ * @ts: a variable to store the timestamp (may be NULL)
+ * @lost_events: a variable to store if events were lost (may be NULL)
+ *
+ * Returns the next event in the ring buffer, and that event is consumed.
+ * Meaning that sequential reads will keep returning a different event,
+ * and eventually empty the ring buffer if the producer is slower.
+ */
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_consume(struct ftrace_ring_buffer *buffer, int cpu, u64 *ts,
+		    unsigned long *lost_events)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	struct ftrace_ring_buffer_event *event = NULL;
+	unsigned long flags;
+	int dolock;
+
+	dolock = rb_ok_to_lock();
+
+ again:
+	/* might be called in atomic */
+	preempt_disable();
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		goto out;
+
+	cpu_buffer = buffer->buffers[cpu];
+	local_irq_save(flags);
+	if (dolock)
+		spin_lock(&cpu_buffer->reader_lock);
+
+	event = rb_buffer_peek(cpu_buffer, ts, lost_events);
+	if (event) {
+		cpu_buffer->lost_events = 0;
+		rb_advance_reader(cpu_buffer);
+	}
+
+	if (dolock)
+		spin_unlock(&cpu_buffer->reader_lock);
+	local_irq_restore(flags);
+
+ out:
+	preempt_enable();
+
+	if (event && event->type_len == RINGBUF_TYPE_PADDING)
+		goto again;
+
+	return event;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_consume);
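+
+/*
+ * Illustrative sketch (comment only): a consuming reader drains one CPU
+ * buffer until it is empty; process() is a hypothetical consumer
+ * callback and buffer/cpu are assumed to be in scope:
+ *
+ *	struct ftrace_ring_buffer_event *event;
+ *	unsigned long lost;
+ *	u64 ts;
+ *
+ *	while ((event = ftrace_ring_buffer_consume(buffer, cpu, &ts, &lost)))
+ *		process(ftrace_ring_buffer_event_data(event), ts, lost);
+ */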
+
+/**
+ * ftrace_ring_buffer_read_prepare - Prepare for a non consuming read of the buffer
+ * @buffer: The ring buffer to read from
+ * @cpu: The cpu buffer to iterate over
+ *
+ * This performs the initial preparations necessary to iterate
+ * through the buffer.  Memory is allocated, buffer recording
+ * is disabled, and the iterator pointer is returned to the caller.
+ *
+ * Disabling buffer recording prevents the reading from being
+ * corrupted. This is not a consuming read, so a producer is not
+ * expected.
+ *
+ * After a sequence of ftrace_ring_buffer_read_prepare calls, the user is
+ * expected to make at least one call to ftrace_ring_buffer_read_prepare_sync.
+ * Afterwards, ftrace_ring_buffer_read_start is invoked to get things going
+ * for real.
+ *
+ * This overall must be paired with ftrace_ring_buffer_read_finish.
+ */
+struct ftrace_ring_buffer_iter *
+ftrace_ring_buffer_read_prepare(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	struct ftrace_ring_buffer_iter *iter;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return NULL;
+
+	iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+	if (!iter)
+		return NULL;
+
+	cpu_buffer = buffer->buffers[cpu];
+
+	iter->cpu_buffer = cpu_buffer;
+
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	return iter;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_read_prepare);
+
+/**
+ * ftrace_ring_buffer_read_prepare_sync - Synchronize a set of prepare calls
+ *
+ * All previously invoked ftrace_ring_buffer_read_prepare calls to prepare
+ * iterators will be synchronized.  Afterwards, ftrace_ring_buffer_read_start
+ * calls on those iterators are allowed.
+ */
+void
+ftrace_ring_buffer_read_prepare_sync(void)
+{
+	synchronize_sched();
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_read_prepare_sync);
+
+/**
+ * ftrace_ring_buffer_read_start - start a non consuming read of the buffer
+ * @iter: The iterator returned by ftrace_ring_buffer_read_prepare
+ *
+ * This finalizes the startup of an iteration through the buffer.
+ * The iterator comes from a call to ftrace_ring_buffer_read_prepare and
+ * an intervening ftrace_ring_buffer_read_prepare_sync must have been
+ * performed.
+ *
+ * Must be paired with ftrace_ring_buffer_read_finish.
+ */
+void
+ftrace_ring_buffer_read_start(struct ftrace_ring_buffer_iter *iter)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long flags;
+
+	if (!iter)
+		return;
+
+	cpu_buffer = iter->cpu_buffer;
+
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+	arch_spin_lock(&cpu_buffer->lock);
+	rb_iter_reset(iter);
+	arch_spin_unlock(&cpu_buffer->lock);
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_read_start);
+
+/**
+ * ftrace_ring_buffer_read_finish - finish reading the iterator of the buffer
+ * @iter: The iterator retrieved by ftrace_ring_buffer_read_start
+ *
+ * This re-enables the recording to the buffer, and frees the
+ * iterator.
+ */
+void
+ftrace_ring_buffer_read_finish(struct ftrace_ring_buffer_iter *iter)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+
+	atomic_dec(&cpu_buffer->record_disabled);
+	kfree(iter);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_read_finish);
+
+/**
+ * ftrace_ring_buffer_read - read the next item in the ring buffer by the iterator
+ * @iter: The ring buffer iterator
+ * @ts: The time stamp of the event read.
+ *
+ * This reads the next event in the ring buffer and increments the iterator.
+ */
+struct ftrace_ring_buffer_event *
+ftrace_ring_buffer_read(struct ftrace_ring_buffer_iter *iter, u64 *ts)
+{
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ again:
+	event = rb_iter_peek(iter, ts);
+	if (!event)
+		goto out;
+
+	if (event->type_len == RINGBUF_TYPE_PADDING)
+		goto again;
+
+	rb_advance_iter(iter);
+ out:
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+	return event;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_read);
+
+/**
+ * ftrace_ring_buffer_size - return the size of the ring buffer (in bytes)
+ * @buffer: The ring buffer.
+ */
+unsigned long ftrace_ring_buffer_size(struct ftrace_ring_buffer *buffer)
+{
+	return BUF_PAGE_SIZE * buffer->pages;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_size);
+
+static void
+rb_reset_cpu(struct ftrace_ring_buffer_per_cpu *cpu_buffer)
+{
+	rb_head_page_deactivate(cpu_buffer);
+
+	cpu_buffer->head_page
+		= list_entry(cpu_buffer->pages, struct buffer_page, list);
+	local_set(&cpu_buffer->head_page->write, 0);
+	local_set(&cpu_buffer->head_page->entries, 0);
+	local_set(&cpu_buffer->head_page->page->commit, 0);
+
+	cpu_buffer->head_page->read = 0;
+
+	cpu_buffer->tail_page = cpu_buffer->head_page;
+	cpu_buffer->commit_page = cpu_buffer->head_page;
+
+	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	local_set(&cpu_buffer->reader_page->write, 0);
+	local_set(&cpu_buffer->reader_page->entries, 0);
+	local_set(&cpu_buffer->reader_page->page->commit, 0);
+	cpu_buffer->reader_page->read = 0;
+
+	local_set(&cpu_buffer->commit_overrun, 0);
+	local_set(&cpu_buffer->overrun, 0);
+	local_set(&cpu_buffer->entries, 0);
+	local_set(&cpu_buffer->committing, 0);
+	local_set(&cpu_buffer->commits, 0);
+	cpu_buffer->read = 0;
+
+	cpu_buffer->write_stamp = 0;
+	cpu_buffer->read_stamp = 0;
+
+	cpu_buffer->lost_events = 0;
+	cpu_buffer->last_overrun = 0;
+
+	rb_head_page_activate(cpu_buffer);
+}
+
+/**
+ * ftrace_ring_buffer_reset_cpu - reset a ring buffer per CPU buffer
+ * @buffer: The ring buffer to reset a per cpu buffer of
+ * @cpu: The CPU buffer to be reset
+ */
+void ftrace_ring_buffer_reset_cpu(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+	unsigned long flags;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return;
+
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
+		goto out;
+
+	arch_spin_lock(&cpu_buffer->lock);
+
+	rb_reset_cpu(cpu_buffer);
+
+	arch_spin_unlock(&cpu_buffer->lock);
+
+ out:
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+	atomic_dec(&cpu_buffer->record_disabled);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_reset_cpu);
+
+/**
+ * ftrace_ring_buffer_reset - reset a ring buffer
+ * @buffer: The ring buffer whose cpu buffers are to be reset
+ */
+void ftrace_ring_buffer_reset(struct ftrace_ring_buffer *buffer)
+{
+	int cpu;
+
+	for_each_buffer_cpu(buffer, cpu)
+		ftrace_ring_buffer_reset_cpu(buffer, cpu);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_reset);
+
+/**
+ * ftrace_ring_buffer_empty - is the ring buffer empty?
+ * @buffer: The ring buffer to test
+ */
+int ftrace_ring_buffer_empty(struct ftrace_ring_buffer *buffer)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long flags;
+	int dolock;
+	int cpu;
+	int ret;
+
+	dolock = rb_ok_to_lock();
+
+	/* yes this is racy, but if you don't like the race, lock the buffer */
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+		local_irq_save(flags);
+		if (dolock)
+			spin_lock(&cpu_buffer->reader_lock);
+		ret = rb_per_cpu_empty(cpu_buffer);
+		if (dolock)
+			spin_unlock(&cpu_buffer->reader_lock);
+		local_irq_restore(flags);
+
+		if (!ret)
+			return 0;
+	}
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_empty);
+
+/**
+ * ftrace_ring_buffer_empty_cpu - is a cpu buffer of a ring buffer empty?
+ * @buffer: The ring buffer
+ * @cpu: The CPU buffer to test
+ */
+int ftrace_ring_buffer_empty_cpu(struct ftrace_ring_buffer *buffer, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer;
+	unsigned long flags;
+	int dolock;
+	int ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 1;
+
+	dolock = rb_ok_to_lock();
+
+	cpu_buffer = buffer->buffers[cpu];
+	local_irq_save(flags);
+	if (dolock)
+		spin_lock(&cpu_buffer->reader_lock);
+	ret = rb_per_cpu_empty(cpu_buffer);
+	if (dolock)
+		spin_unlock(&cpu_buffer->reader_lock);
+	local_irq_restore(flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_empty_cpu);
+
+#ifdef CONFIG_FTRACE_RING_BUFFER_ALLOW_SWAP
+/**
+ * ftrace_ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
+ * @buffer_a: One buffer to swap with
+ * @buffer_b: The other buffer to swap with
+ * @cpu: The per-CPU buffer to swap between the two ring buffers
+ *
+ * This function is useful for tracers that want to take a "snapshot"
+ * of a CPU buffer and have another backup buffer lying around.
+ * It is expected that the tracer handles the cpu buffer not being
+ * used at the moment.
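+ *
+ * For example (a rough sketch; "snap_buffer" is assumed to be a spare
+ * buffer allocated by the caller with the same number of pages, and
+ * read_out_snapshot() stands in for the caller's own read-out path):
+ *
+ *	if (!ftrace_ring_buffer_swap_cpu(live_buffer, snap_buffer, cpu))
+ *		read_out_snapshot(snap_buffer, cpu);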
+ */
+int ftrace_ring_buffer_swap_cpu(struct ftrace_ring_buffer *buffer_a,
+			 struct ftrace_ring_buffer *buffer_b, int cpu)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer_a;
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer_b;
+	int ret = -EINVAL;
+
+	if (!cpumask_test_cpu(cpu, buffer_a->cpumask) ||
+	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
+		goto out;
+
+	/* At least make sure the two buffers are somewhat the same */
+	if (buffer_a->pages != buffer_b->pages)
+		goto out;
+
+	ret = -EAGAIN;
+
+	if (ftrace_ring_buffer_flags != RB_BUFFERS_ON)
+		goto out;
+
+	if (atomic_read(&buffer_a->record_disabled))
+		goto out;
+
+	if (atomic_read(&buffer_b->record_disabled))
+		goto out;
+
+	cpu_buffer_a = buffer_a->buffers[cpu];
+	cpu_buffer_b = buffer_b->buffers[cpu];
+
+	if (atomic_read(&cpu_buffer_a->record_disabled))
+		goto out;
+
+	if (atomic_read(&cpu_buffer_b->record_disabled))
+		goto out;
+
+	/*
+	 * We can't do a synchronize_sched here because this
+	 * function can be called in atomic context.
+	 * Normally this will be called from the same CPU as cpu.
+	 * If not it's up to the caller to protect this.
+	 */
+	atomic_inc(&cpu_buffer_a->record_disabled);
+	atomic_inc(&cpu_buffer_b->record_disabled);
+
+	ret = -EBUSY;
+	if (local_read(&cpu_buffer_a->committing))
+		goto out_dec;
+	if (local_read(&cpu_buffer_b->committing))
+		goto out_dec;
+
+	buffer_a->buffers[cpu] = cpu_buffer_b;
+	buffer_b->buffers[cpu] = cpu_buffer_a;
+
+	cpu_buffer_b->buffer = buffer_a;
+	cpu_buffer_a->buffer = buffer_b;
+
+	ret = 0;
+
+out_dec:
+	atomic_dec(&cpu_buffer_a->record_disabled);
+	atomic_dec(&cpu_buffer_b->record_disabled);
+out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_swap_cpu);
+#endif /* CONFIG_FTRACE_RING_BUFFER_ALLOW_SWAP */
+
+/**
+ * ftrace_ring_buffer_alloc_read_page - allocate a page to read from buffer
+ * @buffer: the buffer to allocate for.
+ *
+ * This function is used in conjunction with ftrace_ring_buffer_read_page.
+ * When reading a full page from the ring buffer, these functions
+ * can be used to speed up the process. The calling function should
+ * allocate a few pages first with this function. Then when it
+ * needs to get pages from the ring buffer, it passes the result
+ * of this function into ftrace_ring_buffer_read_page, which will swap
+ * the page that was allocated, with the read page of the buffer.
+ *
+ * Returns:
+ *  The page allocated, or NULL on error.
+ */
+void *ftrace_ring_buffer_alloc_read_page(struct ftrace_ring_buffer *buffer)
+{
+	struct buffer_data_page *bpage;
+	unsigned long addr;
+
+	addr = __get_free_page(GFP_KERNEL);
+	if (!addr)
+		return NULL;
+
+	bpage = (void *)addr;
+
+	rb_init_page(bpage);
+
+	return bpage;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_alloc_read_page);
+
+/**
+ * ftrace_ring_buffer_free_read_page - free an allocated read page
+ * @buffer: the buffer the page was allocated for
+ * @data: the page to free
+ *
+ * Free a page allocated from ftrace_ring_buffer_alloc_read_page.
+ */
+void ftrace_ring_buffer_free_read_page(struct ftrace_ring_buffer *buffer, void *data)
+{
+	free_page((unsigned long)data);
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_free_read_page);
+
+/**
+ * ftrace_ring_buffer_read_page - extract a page from the ring buffer
+ * @buffer: buffer to extract from
+ * @data_page: the page to use allocated from ftrace_ring_buffer_alloc_read_page
+ * @len: amount to extract
+ * @cpu: the cpu of the buffer to extract
+ * @full: whether the extraction should only happen when the page is full.
+ *
+ * This function will pull out a page from the ring buffer and consume it.
+ * @data_page must be the address of the variable that was returned
+ * from ftrace_ring_buffer_alloc_read_page. This is because the page might be used
+ * to swap with a page in the ring buffer.
+ *
+ * for example:
+ *	rpage = ftrace_ring_buffer_alloc_read_page(buffer);
+ *	if (!rpage)
+ *		return error;
+ *	ret = ftrace_ring_buffer_read_page(buffer, &rpage, len, cpu, 0);
+ *	if (ret >= 0)
+ *		process_page(rpage, ret);
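+ *
+ * Once the caller is done with the data, the page (whichever one
+ * @data_page ends up pointing to after the call) should be handed back
+ * with ftrace_ring_buffer_free_read_page, e.g.:
+ *
+ *	ftrace_ring_buffer_free_read_page(buffer, rpage);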
+ *
+ * When @full is set, the function will not succeed unless
+ * the writer is off the reader page.
+ *
+ * Note: it is up to the calling functions to handle sleeps and wakeups.
+ *  The ring buffer can be used anywhere in the kernel and can not
+ *  blindly call wake_up. The layer that uses the ring buffer must be
+ *  responsible for that.
+ *
+ * Returns:
+ *  >=0 if data has been transferred, returns the offset of consumed data.
+ *  <0 if no data has been transferred.
+ */
+int ftrace_ring_buffer_read_page(struct ftrace_ring_buffer *buffer,
+			  void **data_page, size_t len, int cpu, int full)
+{
+	struct ftrace_ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+	struct ftrace_ring_buffer_event *event;
+	struct buffer_data_page *bpage;
+	struct buffer_page *reader;
+	unsigned long missed_events;
+	unsigned long flags;
+	unsigned int commit;
+	unsigned int read;
+	u64 save_timestamp;
+	int ret = -1;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		goto out;
+
+	/*
+	 * If len is not big enough to hold the page header, then
+	 * we can not copy anything.
+	 */
+	if (len <= BUF_PAGE_HDR_SIZE)
+		goto out;
+
+	len -= BUF_PAGE_HDR_SIZE;
+
+	if (!data_page)
+		goto out;
+
+	bpage = *data_page;
+	if (!bpage)
+		goto out;
+
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
+	reader = rb_get_reader_page(cpu_buffer);
+	if (!reader)
+		goto out_unlock;
+
+	event = rb_reader_event(cpu_buffer);
+
+	read = reader->read;
+	commit = rb_page_commit(reader);
+
+	/* Check if any events were dropped */
+	missed_events = cpu_buffer->lost_events;
+
+	/*
+	 * If this page has been partially read or
+	 * if len is not big enough to read the rest of the page or
+	 * a writer is still on the page, then
+	 * we must copy the data from the page to the buffer.
+	 * Otherwise, we can simply swap the page with the one passed in.
+	 */
+	if (read || (len < (commit - read)) ||
+	    cpu_buffer->reader_page == cpu_buffer->commit_page) {
+		struct buffer_data_page *rpage = cpu_buffer->reader_page->page;
+		unsigned int rpos = read;
+		unsigned int pos = 0;
+		unsigned int size;
+
+		if (full)
+			goto out_unlock;
+
+		if (len > (commit - read))
+			len = (commit - read);
+
+		size = rb_event_length(event);
+
+		if (len < size)
+			goto out_unlock;
+
+		/* save the current timestamp, since the user will need it */
+		save_timestamp = cpu_buffer->read_stamp;
+
+		/* Need to copy one event at a time */
+		do {
+			memcpy(bpage->data + pos, rpage->data + rpos, size);
+
+			len -= size;
+
+			rb_advance_reader(cpu_buffer);
+			rpos = reader->read;
+			pos += size;
+
+			event = rb_reader_event(cpu_buffer);
+			size = rb_event_length(event);
+		} while (len > size);
+
+		/* update bpage */
+		local_set(&bpage->commit, pos);
+		bpage->time_stamp = save_timestamp;
+
+		/* we copied everything to the beginning */
+		read = 0;
+	} else {
+		/* update the entry counter */
+		cpu_buffer->read += rb_page_entries(reader);
+
+		/* swap the pages */
+		rb_init_page(bpage);
+		bpage = reader->page;
+		reader->page = *data_page;
+		local_set(&reader->write, 0);
+		local_set(&reader->entries, 0);
+		reader->read = 0;
+		*data_page = bpage;
+
+		/*
+		 * Use the real_end for the data size.  This gives us
+		 * a chance to store the lost events on the page.
+		 */
+		if (reader->real_end)
+			local_set(&bpage->commit, reader->real_end);
+	}
+	ret = read;
+
+	cpu_buffer->lost_events = 0;
+
+	commit = local_read(&bpage->commit);
+	/*
+	 * Set a flag in the commit field if we lost events
+	 */
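+	/*
+	 * A reader of the exported page can detect the loss by testing
+	 * RB_MISSED_EVENTS in the commit field; if RB_MISSED_STORED is
+	 * also set, the missed-event count is appended right after the
+	 * data, at the offset given by the commit count with the flag
+	 * bits masked off.
+	 */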
+	if (missed_events) {
+		/*
+		 * If there is room at the end of the page to save the
+		 * missed events, then record them there.
+		 */
+		if (BUF_PAGE_SIZE - commit >= sizeof(missed_events)) {
+			memcpy(&bpage->data[commit], &missed_events,
+			       sizeof(missed_events));
+			local_add(RB_MISSED_STORED, &bpage->commit);
+			commit += sizeof(missed_events);
+		}
+		local_add(RB_MISSED_EVENTS, &bpage->commit);
+	}
+
+	/*
+	 * This page may be off to user land. Zero it out here.
+	 */
+	if (commit < BUF_PAGE_SIZE)
+		memset(&bpage->data[commit], 0, BUF_PAGE_SIZE - commit);
+
+ out_unlock:
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+ out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ftrace_ring_buffer_read_page);
+
+#ifdef CONFIG_TRACING
+static ssize_t
+rb_simple_read(struct file *filp, char __user *ubuf,
+	       size_t cnt, loff_t *ppos)
+{
+	unsigned long *p = filp->private_data;
+	char buf[64];
+	int r;
+
+	if (test_bit(RB_BUFFERS_DISABLED_BIT, p))
+		r = sprintf(buf, "permanently disabled\n");
+	else
+		r = sprintf(buf, "%d\n", test_bit(RB_BUFFERS_ON_BIT, p));
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
+rb_simple_write(struct file *filp, const char __user *ubuf,
+		size_t cnt, loff_t *ppos)
+{
+	unsigned long *p = filp->private_data;
+	char buf[64];
+	unsigned long val;
+	int ret;
+
+	if (cnt >= sizeof(buf))
+		return -EINVAL;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	ret = strict_strtoul(buf, 10, &val);
+	if (ret < 0)
+		return ret;
+
+	if (val)
+		set_bit(RB_BUFFERS_ON_BIT, p);
+	else
+		clear_bit(RB_BUFFERS_ON_BIT, p);
+
+	(*ppos)++;
+
+	return cnt;
+}
+
+static const struct file_operations rb_simple_fops = {
+	.open		= tracing_open_generic,
+	.read		= rb_simple_read,
+	.write		= rb_simple_write,
+};
+
+static __init int rb_init_debugfs(void)
+{
+	struct dentry *d_tracer;
+
+	d_tracer = tracing_init_dentry();
+
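+	/*
+	 * Expose the global ring buffer switch as "tracing_on" under the
+	 * tracing debugfs directory (typically
+	 * /sys/kernel/debug/tracing/tracing_on); writes of 0/1 flip
+	 * RB_BUFFERS_ON_BIT in ftrace_ring_buffer_flags via
+	 * rb_simple_write().
+	 */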
+	trace_create_file("tracing_on", 0644, d_tracer,
+			    &ftrace_ring_buffer_flags, &rb_simple_fops);
+
+	return 0;
+}
+
+fs_initcall(rb_init_debugfs);
+#endif
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int rb_cpu_notify(struct notifier_block *self,
+			 unsigned long action, void *hcpu)
+{
+	struct ftrace_ring_buffer *buffer =
+		container_of(self, struct ftrace_ring_buffer, cpu_notify);
+	long cpu = (long)hcpu;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		if (cpumask_test_cpu(cpu, buffer->cpumask))
+			return NOTIFY_OK;
+
+		buffer->buffers[cpu] =
+			rb_allocate_cpu_buffer(buffer, cpu);
+		if (!buffer->buffers[cpu]) {
+			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
+			     cpu);
+			return NOTIFY_OK;
+		}
+		smp_wmb();
+		cpumask_set_cpu(cpu, buffer->cpumask);
+		break;
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+		/*
+		 * Do nothing.
+		 *  If we were to free the buffer, then the user would
+		 *  lose any trace that was in the buffer.
+		 */
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+#endif
Index: linux.trees.git/kernel/trace/ring_buffer.c
===================================================================
--- linux.trees.git.orig/kernel/trace/ring_buffer.c	2010-07-09 18:08:14.000000000 -0400
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,4022 +0,0 @@
-/*
- * Generic ring buffer
- *
- * Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com>
- */
-#include <linux/ring_buffer.h>
-#include <linux/trace_clock.h>
-#include <linux/ftrace_irq.h>
-#include <linux/spinlock.h>
-#include <linux/debugfs.h>
-#include <linux/uaccess.h>
-#include <linux/hardirq.h>
-#include <linux/kmemcheck.h>
-#include <linux/module.h>
-#include <linux/percpu.h>
-#include <linux/mutex.h>
-#include <linux/slab.h>
-#include <linux/init.h>
-#include <linux/hash.h>
-#include <linux/list.h>
-#include <linux/cpu.h>
-#include <linux/fs.h>
-
-#include <asm/local.h>
-#include "trace.h"
-
-/*
- * The ring buffer header is special. We must manually up keep it.
- */
-int ring_buffer_print_entry_header(struct trace_seq *s)
-{
-	int ret;
-
-	ret = trace_seq_printf(s, "# compressed entry header\n");
-	ret = trace_seq_printf(s, "\ttype_len    :    5 bits\n");
-	ret = trace_seq_printf(s, "\ttime_delta  :   27 bits\n");
-	ret = trace_seq_printf(s, "\tarray       :   32 bits\n");
-	ret = trace_seq_printf(s, "\n");
-	ret = trace_seq_printf(s, "\tpadding     : type == %d\n",
-			       RINGBUF_TYPE_PADDING);
-	ret = trace_seq_printf(s, "\ttime_extend : type == %d\n",
-			       RINGBUF_TYPE_TIME_EXTEND);
-	ret = trace_seq_printf(s, "\tdata max type_len  == %d\n",
-			       RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
-
-	return ret;
-}
-
-/*
- * The ring buffer is made up of a list of pages. A separate list of pages is
- * allocated for each CPU. A writer may only write to a buffer that is
- * associated with the CPU it is currently executing on.  A reader may read
- * from any per cpu buffer.
- *
- * The reader is special. For each per cpu buffer, the reader has its own
- * reader page. When a reader has read the entire reader page, this reader
- * page is swapped with another page in the ring buffer.
- *
- * Now, as long as the writer is off the reader page, the reader can do what
- * ever it wants with that page. The writer will never write to that page
- * again (as long as it is out of the ring buffer).
- *
- * Here's some silly ASCII art.
- *
- *   +------+
- *   |reader|          RING BUFFER
- *   |page  |
- *   +------+        +---+   +---+   +---+
- *                   |   |-->|   |-->|   |
- *                   +---+   +---+   +---+
- *                     ^               |
- *                     |               |
- *                     +---------------+
- *
- *
- *   +------+
- *   |reader|          RING BUFFER
- *   |page  |------------------v
- *   +------+        +---+   +---+   +---+
- *                   |   |-->|   |-->|   |
- *                   +---+   +---+   +---+
- *                     ^               |
- *                     |               |
- *                     +---------------+
- *
- *
- *   +------+
- *   |reader|          RING BUFFER
- *   |page  |------------------v
- *   +------+        +---+   +---+   +---+
- *      ^            |   |-->|   |-->|   |
- *      |            +---+   +---+   +---+
- *      |                              |
- *      |                              |
- *      +------------------------------+
- *
- *
- *   +------+
- *   |buffer|          RING BUFFER
- *   |page  |------------------v
- *   +------+        +---+   +---+   +---+
- *      ^            |   |   |   |-->|   |
- *      |   New      +---+   +---+   +---+
- *      |  Reader------^               |
- *      |   page                       |
- *      +------------------------------+
- *
- *
- * After we make this swap, the reader can hand this page off to the splice
- * code and be done with it. It can even allocate a new page if it needs to
- * and swap that into the ring buffer.
- *
- * We will be using cmpxchg soon to make all this lockless.
- *
- */
-
-/*
- * A fast way to enable or disable all ring buffers is to
- * call tracing_on or tracing_off. Turning off the ring buffers
- * prevents all ring buffers from being recorded to.
- * Turning this switch on, makes it OK to write to the
- * ring buffer, if the ring buffer is enabled itself.
- *
- * There's three layers that must be on in order to write
- * to the ring buffer.
- *
- * 1) This global flag must be set.
- * 2) The ring buffer must be enabled for recording.
- * 3) The per cpu buffer must be enabled for recording.
- *
- * In case of an anomaly, this global flag has a bit set that
- * will permantly disable all ring buffers.
- */
-
-/*
- * Global flag to disable all recording to ring buffers
- *  This has two bits: ON, DISABLED
- *
- *  ON   DISABLED
- * ---- ----------
- *   0      0        : ring buffers are off
- *   1      0        : ring buffers are on
- *   X      1        : ring buffers are permanently disabled
- */
-
-enum {
-	RB_BUFFERS_ON_BIT	= 0,
-	RB_BUFFERS_DISABLED_BIT	= 1,
-};
-
-enum {
-	RB_BUFFERS_ON		= 1 << RB_BUFFERS_ON_BIT,
-	RB_BUFFERS_DISABLED	= 1 << RB_BUFFERS_DISABLED_BIT,
-};
-
-static unsigned long ring_buffer_flags __read_mostly = RB_BUFFERS_ON;
-
-#define BUF_PAGE_HDR_SIZE offsetof(struct buffer_data_page, data)
-
-/**
- * tracing_on - enable all tracing buffers
- *
- * This function enables all tracing buffers that may have been
- * disabled with tracing_off.
- */
-void tracing_on(void)
-{
-	set_bit(RB_BUFFERS_ON_BIT, &ring_buffer_flags);
-}
-EXPORT_SYMBOL_GPL(tracing_on);
-
-/**
- * tracing_off - turn off all tracing buffers
- *
- * This function stops all tracing buffers from recording data.
- * It does not disable any overhead the tracers themselves may
- * be causing. This function simply causes all recording to
- * the ring buffers to fail.
- */
-void tracing_off(void)
-{
-	clear_bit(RB_BUFFERS_ON_BIT, &ring_buffer_flags);
-}
-EXPORT_SYMBOL_GPL(tracing_off);
-
-/**
- * tracing_off_permanent - permanently disable ring buffers
- *
- * This function, once called, will disable all ring buffers
- * permanently.
- */
-void tracing_off_permanent(void)
-{
-	set_bit(RB_BUFFERS_DISABLED_BIT, &ring_buffer_flags);
-}
-
-/**
- * tracing_is_on - show state of ring buffers enabled
- */
-int tracing_is_on(void)
-{
-	return ring_buffer_flags == RB_BUFFERS_ON;
-}
-EXPORT_SYMBOL_GPL(tracing_is_on);
-
-#define RB_EVNT_HDR_SIZE (offsetof(struct ring_buffer_event, array))
-#define RB_ALIGNMENT		4U
-#define RB_MAX_SMALL_DATA	(RB_ALIGNMENT * RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
-#define RB_EVNT_MIN_SIZE	8U	/* two 32bit words */
-
-#if !defined(CONFIG_64BIT) || defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
-# define RB_FORCE_8BYTE_ALIGNMENT	0
-# define RB_ARCH_ALIGNMENT		RB_ALIGNMENT
-#else
-# define RB_FORCE_8BYTE_ALIGNMENT	1
-# define RB_ARCH_ALIGNMENT		8U
-#endif
-
-/* define RINGBUF_TYPE_DATA for 'case RINGBUF_TYPE_DATA:' */
-#define RINGBUF_TYPE_DATA 0 ... RINGBUF_TYPE_DATA_TYPE_LEN_MAX
-
-enum {
-	RB_LEN_TIME_EXTEND = 8,
-	RB_LEN_TIME_STAMP = 16,
-};
-
-static inline int rb_null_event(struct ring_buffer_event *event)
-{
-	return event->type_len == RINGBUF_TYPE_PADDING && !event->time_delta;
-}
-
-static void rb_event_set_padding(struct ring_buffer_event *event)
-{
-	/* padding has a NULL time_delta */
-	event->type_len = RINGBUF_TYPE_PADDING;
-	event->time_delta = 0;
-}
-
-static unsigned
-rb_event_data_length(struct ring_buffer_event *event)
-{
-	unsigned length;
-
-	if (event->type_len)
-		length = event->type_len * RB_ALIGNMENT;
-	else
-		length = event->array[0];
-	return length + RB_EVNT_HDR_SIZE;
-}
-
-/* inline for ring buffer fast paths */
-static unsigned
-rb_event_length(struct ring_buffer_event *event)
-{
-	switch (event->type_len) {
-	case RINGBUF_TYPE_PADDING:
-		if (rb_null_event(event))
-			/* undefined */
-			return -1;
-		return  event->array[0] + RB_EVNT_HDR_SIZE;
-
-	case RINGBUF_TYPE_TIME_EXTEND:
-		return RB_LEN_TIME_EXTEND;
-
-	case RINGBUF_TYPE_TIME_STAMP:
-		return RB_LEN_TIME_STAMP;
-
-	case RINGBUF_TYPE_DATA:
-		return rb_event_data_length(event);
-	default:
-		BUG();
-	}
-	/* not hit */
-	return 0;
-}
-
-/**
- * ring_buffer_event_length - return the length of the event
- * @event: the event to get the length of
- */
-unsigned ring_buffer_event_length(struct ring_buffer_event *event)
-{
-	unsigned length = rb_event_length(event);
-	if (event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
-		return length;
-	length -= RB_EVNT_HDR_SIZE;
-	if (length > RB_MAX_SMALL_DATA + sizeof(event->array[0]))
-                length -= sizeof(event->array[0]);
-	return length;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_event_length);
-
-/* inline for ring buffer fast paths */
-static void *
-rb_event_data(struct ring_buffer_event *event)
-{
-	BUG_ON(event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
-	/* If length is in len field, then array[0] has the data */
-	if (event->type_len)
-		return (void *)&event->array[0];
-	/* Otherwise length is in array[0] and array[1] has the data */
-	return (void *)&event->array[1];
-}
-
-/**
- * ring_buffer_event_data - return the data of the event
- * @event: the event to get the data from
- */
-void *ring_buffer_event_data(struct ring_buffer_event *event)
-{
-	return rb_event_data(event);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_event_data);
-
-#define for_each_buffer_cpu(buffer, cpu)		\
-	for_each_cpu(cpu, buffer->cpumask)
-
-#define TS_SHIFT	27
-#define TS_MASK		((1ULL << TS_SHIFT) - 1)
-#define TS_DELTA_TEST	(~TS_MASK)
-
-/* Flag when events were overwritten */
-#define RB_MISSED_EVENTS	(1 << 31)
-/* Missed count stored at end */
-#define RB_MISSED_STORED	(1 << 30)
-
-struct buffer_data_page {
-	u64		 time_stamp;	/* page time stamp */
-	local_t		 commit;	/* write committed index */
-	unsigned char	 data[];	/* data of buffer page */
-};
-
-/*
- * Note, the buffer_page list must be first. The buffer pages
- * are allocated in cache lines, which means that each buffer
- * page will be at the beginning of a cache line, and thus
- * the least significant bits will be zero. We use this to
- * add flags in the list struct pointers, to make the ring buffer
- * lockless.
- */
-struct buffer_page {
-	struct list_head list;		/* list of buffer pages */
-	local_t		 write;		/* index for next write */
-	unsigned	 read;		/* index for next read */
-	local_t		 entries;	/* entries on this page */
-	unsigned long	 real_end;	/* real end of data */
-	struct buffer_data_page *page;	/* Actual data page */
-};
-
-/*
- * The buffer page counters, write and entries, must be reset
- * atomically when crossing page boundaries. To synchronize this
- * update, two counters are inserted into the number. One is
- * the actual counter for the write position or count on the page.
- *
- * The other is a counter of updaters. Before an update happens
- * the update partition of the counter is incremented. This will
- * allow the updater to update the counter atomically.
- *
- * The counter is 20 bits, and the state data is 12.
- */
-#define RB_WRITE_MASK		0xfffff
-#define RB_WRITE_INTCNT		(1 << 20)
-
-static void rb_init_page(struct buffer_data_page *bpage)
-{
-	local_set(&bpage->commit, 0);
-}
-
-/**
- * ring_buffer_page_len - the size of data on the page.
- * @page: The page to read
- *
- * Returns the amount of data on the page, including buffer page header.
- */
-size_t ring_buffer_page_len(void *page)
-{
-	return local_read(&((struct buffer_data_page *)page)->commit)
-		+ BUF_PAGE_HDR_SIZE;
-}
-
-/*
- * Also stolen from mm/slob.c. Thanks to Mathieu Desnoyers for pointing
- * this issue out.
- */
-static void free_buffer_page(struct buffer_page *bpage)
-{
-	free_page((unsigned long)bpage->page);
-	kfree(bpage);
-}
-
-/*
- * We need to fit the time_stamp delta into 27 bits.
- */
-static inline int test_time_stamp(u64 delta)
-{
-	if (delta & TS_DELTA_TEST)
-		return 1;
-	return 0;
-}
-
-#define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE)
-
-/* Max payload is BUF_PAGE_SIZE - header (8bytes) */
-#define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
-
-/* Max number of timestamps that can fit on a page */
-#define RB_TIMESTAMPS_PER_PAGE	(BUF_PAGE_SIZE / RB_LEN_TIME_STAMP)
-
-int ring_buffer_print_page_header(struct trace_seq *s)
-{
-	struct buffer_data_page field;
-	int ret;
-
-	ret = trace_seq_printf(s, "\tfield: u64 timestamp;\t"
-			       "offset:0;\tsize:%u;\tsigned:%u;\n",
-			       (unsigned int)sizeof(field.time_stamp),
-			       (unsigned int)is_signed_type(u64));
-
-	ret = trace_seq_printf(s, "\tfield: local_t commit;\t"
-			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
-			       (unsigned int)offsetof(typeof(field), commit),
-			       (unsigned int)sizeof(field.commit),
-			       (unsigned int)is_signed_type(long));
-
-	ret = trace_seq_printf(s, "\tfield: int overwrite;\t"
-			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
-			       (unsigned int)offsetof(typeof(field), commit),
-			       1,
-			       (unsigned int)is_signed_type(long));
-
-	ret = trace_seq_printf(s, "\tfield: char data;\t"
-			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
-			       (unsigned int)offsetof(typeof(field), data),
-			       (unsigned int)BUF_PAGE_SIZE,
-			       (unsigned int)is_signed_type(char));
-
-	return ret;
-}
-
-/*
- * head_page == tail_page && head == tail then buffer is empty.
- */
-struct ring_buffer_per_cpu {
-	int				cpu;
-	struct ring_buffer		*buffer;
-	spinlock_t			reader_lock;	/* serialize readers */
-	arch_spinlock_t			lock;
-	struct lock_class_key		lock_key;
-	struct list_head		*pages;
-	struct buffer_page		*head_page;	/* read from head */
-	struct buffer_page		*tail_page;	/* write to tail */
-	struct buffer_page		*commit_page;	/* committed pages */
-	struct buffer_page		*reader_page;
-	unsigned long			lost_events;
-	unsigned long			last_overrun;
-	local_t				commit_overrun;
-	local_t				overrun;
-	local_t				entries;
-	local_t				committing;
-	local_t				commits;
-	unsigned long			read;
-	u64				write_stamp;
-	u64				read_stamp;
-	atomic_t			record_disabled;
-};
-
-struct ring_buffer {
-	unsigned			pages;
-	unsigned			flags;
-	int				cpus;
-	atomic_t			record_disabled;
-	cpumask_var_t			cpumask;
-
-	struct lock_class_key		*reader_lock_key;
-
-	struct mutex			mutex;
-
-	struct ring_buffer_per_cpu	**buffers;
-
-#ifdef CONFIG_HOTPLUG_CPU
-	struct notifier_block		cpu_notify;
-#endif
-	u64				(*clock)(void);
-};
-
-struct ring_buffer_iter {
-	struct ring_buffer_per_cpu	*cpu_buffer;
-	unsigned long			head;
-	struct buffer_page		*head_page;
-	struct buffer_page		*cache_reader_page;
-	unsigned long			cache_read;
-	u64				read_stamp;
-};
-
-/* buffer may be either ring_buffer or ring_buffer_per_cpu */
-#define RB_WARN_ON(b, cond)						\
-	({								\
-		int _____ret = unlikely(cond);				\
-		if (_____ret) {						\
-			if (__same_type(*(b), struct ring_buffer_per_cpu)) { \
-				struct ring_buffer_per_cpu *__b =	\
-					(void *)b;			\
-				atomic_inc(&__b->buffer->record_disabled); \
-			} else						\
-				atomic_inc(&b->record_disabled);	\
-			WARN_ON(1);					\
-		}							\
-		_____ret;						\
-	})
-
-/* Up this if you want to test the TIME_EXTENTS and normalization */
-#define DEBUG_SHIFT 0
-
-static inline u64 rb_time_stamp(struct ring_buffer *buffer)
-{
-	/* shift to debug/test normalization and TIME_EXTENTS */
-	return buffer->clock() << DEBUG_SHIFT;
-}
-
-u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu)
-{
-	u64 time;
-
-	preempt_disable_notrace();
-	time = rb_time_stamp(buffer);
-	preempt_enable_no_resched_notrace();
-
-	return time;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_time_stamp);
-
-void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
-				      int cpu, u64 *ts)
-{
-	/* Just stupid testing the normalize function and deltas */
-	*ts >>= DEBUG_SHIFT;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_normalize_time_stamp);
-
-/*
- * Making the ring buffer lockless makes things tricky.
- * Although writes only happen on the CPU that they are on,
- * and they only need to worry about interrupts. Reads can
- * happen on any CPU.
- *
- * The reader page is always off the ring buffer, but when the
- * reader finishes with a page, it needs to swap its page with
- * a new one from the buffer. The reader needs to take from
- * the head (writes go to the tail). But if a writer is in overwrite
- * mode and wraps, it must push the head page forward.
- *
- * Here lies the problem.
- *
- * The reader must be careful to replace only the head page, and
- * not another one. As described at the top of the file in the
- * ASCII art, the reader sets its old page to point to the next
- * page after head. It then sets the page after head to point to
- * the old reader page. But if the writer moves the head page
- * during this operation, the reader could end up with the tail.
- *
- * We use cmpxchg to help prevent this race. We also do something
- * special with the page before head. We set the LSB to 1.
- *
- * When the writer must push the page forward, it will clear the
- * bit that points to the head page, move the head, and then set
- * the bit that points to the new head page.
- *
- * We also don't want an interrupt coming in and moving the head
- * page on another writer. Thus we use the second LSB to catch
- * that too. Thus:
- *
- * head->list->prev->next        bit 1          bit 0
- *                              -------        -------
- * Normal page                     0              0
- * Points to head page             0              1
- * New head page                   1              0
- *
- * Note we can not trust the prev pointer of the head page, because:
- *
- * +----+       +-----+        +-----+
- * |    |------>|  T  |---X--->|  N  |
- * |    |<------|     |        |     |
- * +----+       +-----+        +-----+
- *   ^                           ^ |
- *   |          +-----+          | |
- *   +----------|  R  |----------+ |
- *              |     |<-----------+
- *              +-----+
- *
- * Key:  ---X-->  HEAD flag set in pointer
- *         T      Tail page
- *         R      Reader page
- *         N      Next page
- *
- * (see __rb_reserve_next() to see where this happens)
- *
- *  What the above shows is that the reader just swapped out
- *  the reader page with a page in the buffer, but before it
- *  could make the new header point back to the new page added
- *  it was preempted by a writer. The writer moved forward onto
- *  the new page added by the reader and is about to move forward
- *  again.
- *
- *  You can see, it is legitimate for the previous pointer of
- *  the head (or any page) not to point back to itself. But only
- *  temporarially.
- */
-
-#define RB_PAGE_NORMAL		0UL
-#define RB_PAGE_HEAD		1UL
-#define RB_PAGE_UPDATE		2UL
-
-
-#define RB_FLAG_MASK		3UL
-
-/* PAGE_MOVED is not part of the mask */
-#define RB_PAGE_MOVED		4UL
-
-/*
- * rb_list_head - remove any bit
- */
-static struct list_head *rb_list_head(struct list_head *list)
-{
-	unsigned long val = (unsigned long)list;
-
-	return (struct list_head *)(val & ~RB_FLAG_MASK);
-}
-
-/*
- * rb_is_head_page - test if the given page is the head page
- *
- * Because the reader may move the head_page pointer, we can
- * not trust what the head page is (it may be pointing to
- * the reader page). But if the next page is a header page,
- * its flags will be non zero.
- */
-static int inline
-rb_is_head_page(struct ring_buffer_per_cpu *cpu_buffer,
-		struct buffer_page *page, struct list_head *list)
-{
-	unsigned long val;
-
-	val = (unsigned long)list->next;
-
-	if ((val & ~RB_FLAG_MASK) != (unsigned long)&page->list)
-		return RB_PAGE_MOVED;
-
-	return val & RB_FLAG_MASK;
-}
-
-/*
- * rb_is_reader_page
- *
- * The unique thing about the reader page, is that, if the
- * writer is ever on it, the previous pointer never points
- * back to the reader page.
- */
-static int rb_is_reader_page(struct buffer_page *page)
-{
-	struct list_head *list = page->list.prev;
-
-	return rb_list_head(list->next) != &page->list;
-}
-
-/*
- * rb_set_list_to_head - set a list_head to be pointing to head.
- */
-static void rb_set_list_to_head(struct ring_buffer_per_cpu *cpu_buffer,
-				struct list_head *list)
-{
-	unsigned long *ptr;
-
-	ptr = (unsigned long *)&list->next;
-	*ptr |= RB_PAGE_HEAD;
-	*ptr &= ~RB_PAGE_UPDATE;
-}
-
-/*
- * rb_head_page_activate - sets up head page
- */
-static void rb_head_page_activate(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct buffer_page *head;
-
-	head = cpu_buffer->head_page;
-	if (!head)
-		return;
-
-	/*
-	 * Set the previous list pointer to have the HEAD flag.
-	 */
-	rb_set_list_to_head(cpu_buffer, head->list.prev);
-}
-
-static void rb_list_head_clear(struct list_head *list)
-{
-	unsigned long *ptr = (unsigned long *)&list->next;
-
-	*ptr &= ~RB_FLAG_MASK;
-}
-
-/*
- * rb_head_page_dactivate - clears head page ptr (for free list)
- */
-static void
-rb_head_page_deactivate(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct list_head *hd;
-
-	/* Go through the whole list and clear any pointers found. */
-	rb_list_head_clear(cpu_buffer->pages);
-
-	list_for_each(hd, cpu_buffer->pages)
-		rb_list_head_clear(hd);
-}
-
-static int rb_head_page_set(struct ring_buffer_per_cpu *cpu_buffer,
-			    struct buffer_page *head,
-			    struct buffer_page *prev,
-			    int old_flag, int new_flag)
-{
-	struct list_head *list;
-	unsigned long val = (unsigned long)&head->list;
-	unsigned long ret;
-
-	list = &prev->list;
-
-	val &= ~RB_FLAG_MASK;
-
-	ret = cmpxchg((unsigned long *)&list->next,
-		      val | old_flag, val | new_flag);
-
-	/* check if the reader took the page */
-	if ((ret & ~RB_FLAG_MASK) != val)
-		return RB_PAGE_MOVED;
-
-	return ret & RB_FLAG_MASK;
-}
-
-static int rb_head_page_set_update(struct ring_buffer_per_cpu *cpu_buffer,
-				   struct buffer_page *head,
-				   struct buffer_page *prev,
-				   int old_flag)
-{
-	return rb_head_page_set(cpu_buffer, head, prev,
-				old_flag, RB_PAGE_UPDATE);
-}
-
-static int rb_head_page_set_head(struct ring_buffer_per_cpu *cpu_buffer,
-				 struct buffer_page *head,
-				 struct buffer_page *prev,
-				 int old_flag)
-{
-	return rb_head_page_set(cpu_buffer, head, prev,
-				old_flag, RB_PAGE_HEAD);
-}
-
-static int rb_head_page_set_normal(struct ring_buffer_per_cpu *cpu_buffer,
-				   struct buffer_page *head,
-				   struct buffer_page *prev,
-				   int old_flag)
-{
-	return rb_head_page_set(cpu_buffer, head, prev,
-				old_flag, RB_PAGE_NORMAL);
-}
-
-static inline void rb_inc_page(struct ring_buffer_per_cpu *cpu_buffer,
-			       struct buffer_page **bpage)
-{
-	struct list_head *p = rb_list_head((*bpage)->list.next);
-
-	*bpage = list_entry(p, struct buffer_page, list);
-}
-
-static struct buffer_page *
-rb_set_head_page(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct buffer_page *head;
-	struct buffer_page *page;
-	struct list_head *list;
-	int i;
-
-	if (RB_WARN_ON(cpu_buffer, !cpu_buffer->head_page))
-		return NULL;
-
-	/* sanity check */
-	list = cpu_buffer->pages;
-	if (RB_WARN_ON(cpu_buffer, rb_list_head(list->prev->next) != list))
-		return NULL;
-
-	page = head = cpu_buffer->head_page;
-	/*
-	 * It is possible that the writer moves the header behind
-	 * where we started, and we miss in one loop.
-	 * A second loop should grab the header, but we'll do
-	 * three loops just because I'm paranoid.
-	 */
-	for (i = 0; i < 3; i++) {
-		do {
-			if (rb_is_head_page(cpu_buffer, page, page->list.prev)) {
-				cpu_buffer->head_page = page;
-				return page;
-			}
-			rb_inc_page(cpu_buffer, &page);
-		} while (page != head);
-	}
-
-	RB_WARN_ON(cpu_buffer, 1);
-
-	return NULL;
-}
-
-static int rb_head_page_replace(struct buffer_page *old,
-				struct buffer_page *new)
-{
-	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
-	unsigned long val;
-	unsigned long ret;
-
-	val = *ptr & ~RB_FLAG_MASK;
-	val |= RB_PAGE_HEAD;
-
-	ret = cmpxchg(ptr, val, (unsigned long)&new->list);
-
-	return ret == val;
-}
-
-/*
- * rb_tail_page_update - move the tail page forward
- *
- * Returns 1 if moved tail page, 0 if someone else did.
- */
-static int rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
-			       struct buffer_page *tail_page,
-			       struct buffer_page *next_page)
-{
-	struct buffer_page *old_tail;
-	unsigned long old_entries;
-	unsigned long old_write;
-	int ret = 0;
-
-	/*
-	 * The tail page now needs to be moved forward.
-	 *
-	 * We need to reset the tail page, but without messing
-	 * with possible erasing of data brought in by interrupts
-	 * that have moved the tail page and are currently on it.
-	 *
-	 * We add a counter to the write field to denote this.
-	 */
-	old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
-	old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
-
-	/*
-	 * Just make sure we have seen our old_write and synchronize
-	 * with any interrupts that come in.
-	 */
-	barrier();
-
-	/*
-	 * If the tail page is still the same as what we think
-	 * it is, then it is up to us to update the tail
-	 * pointer.
-	 */
-	if (tail_page == cpu_buffer->tail_page) {
-		/* Zero the write counter */
-		unsigned long val = old_write & ~RB_WRITE_MASK;
-		unsigned long eval = old_entries & ~RB_WRITE_MASK;
-
-		/*
-		 * This will only succeed if an interrupt did
-		 * not come in and change it. In which case, we
-		 * do not want to modify it.
-		 *
-		 * We add (void) to let the compiler know that we do not care
-		 * about the return value of these functions. We use the
-		 * cmpxchg to only update if an interrupt did not already
-		 * do it for us. If the cmpxchg fails, we don't care.
-		 */
-		(void)local_cmpxchg(&next_page->write, old_write, val);
-		(void)local_cmpxchg(&next_page->entries, old_entries, eval);
-
-		/*
-		 * No need to worry about races with clearing out the commit.
-		 * it only can increment when a commit takes place. But that
-		 * only happens in the outer most nested commit.
-		 */
-		local_set(&next_page->page->commit, 0);
-
-		old_tail = cmpxchg(&cpu_buffer->tail_page,
-				   tail_page, next_page);
-
-		if (old_tail == tail_page)
-			ret = 1;
-	}
-
-	return ret;
-}
-
-static int rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer,
-			  struct buffer_page *bpage)
-{
-	unsigned long val = (unsigned long)bpage;
-
-	if (RB_WARN_ON(cpu_buffer, val & RB_FLAG_MASK))
-		return 1;
-
-	return 0;
-}
-
-/**
- * rb_check_list - make sure a pointer to a list has the last bits zero
- */
-static int rb_check_list(struct ring_buffer_per_cpu *cpu_buffer,
-			 struct list_head *list)
-{
-	if (RB_WARN_ON(cpu_buffer, rb_list_head(list->prev) != list->prev))
-		return 1;
-	if (RB_WARN_ON(cpu_buffer, rb_list_head(list->next) != list->next))
-		return 1;
-	return 0;
-}
-
-/**
- * check_pages - integrity check of buffer pages
- * @cpu_buffer: CPU buffer with pages to test
- *
- * As a safety measure we check to make sure the data pages have not
- * been corrupted.
- */
-static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct list_head *head = cpu_buffer->pages;
-	struct buffer_page *bpage, *tmp;
-
-	rb_head_page_deactivate(cpu_buffer);
-
-	if (RB_WARN_ON(cpu_buffer, head->next->prev != head))
-		return -1;
-	if (RB_WARN_ON(cpu_buffer, head->prev->next != head))
-		return -1;
-
-	if (rb_check_list(cpu_buffer, head))
-		return -1;
-
-	list_for_each_entry_safe(bpage, tmp, head, list) {
-		if (RB_WARN_ON(cpu_buffer,
-			       bpage->list.next->prev != &bpage->list))
-			return -1;
-		if (RB_WARN_ON(cpu_buffer,
-			       bpage->list.prev->next != &bpage->list))
-			return -1;
-		if (rb_check_list(cpu_buffer, &bpage->list))
-			return -1;
-	}
-
-	rb_head_page_activate(cpu_buffer);
-
-	return 0;
-}
-
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
-{
-	struct buffer_page *bpage, *tmp;
-	unsigned long addr;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
-
-	for (i = 0; i < nr_pages; i++) {
-		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
-				    GFP_KERNEL, cpu_to_node(cpu_buffer->cpu));
-		if (!bpage)
-			goto free_pages;
-
-		rb_check_bpage(cpu_buffer, bpage);
-
-		list_add(&bpage->list, &pages);
-
-		addr = __get_free_page(GFP_KERNEL);
-		if (!addr)
-			goto free_pages;
-		bpage->page = (void *)addr;
-		rb_init_page(bpage->page);
-	}
-
-	/*
-	 * The ring buffer page list is a circular list that does not
-	 * start and end with a list head. All page list items point to
-	 * other pages.
-	 */
-	cpu_buffer->pages = pages.next;
-	list_del(&pages);
-
-	rb_check_pages(cpu_buffer);
-
-	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
-}
-
-static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	struct buffer_page *bpage;
-	unsigned long addr;
-	int ret;
-
-	cpu_buffer = kzalloc_node(ALIGN(sizeof(*cpu_buffer), cache_line_size()),
-				  GFP_KERNEL, cpu_to_node(cpu));
-	if (!cpu_buffer)
-		return NULL;
-
-	cpu_buffer->cpu = cpu;
-	cpu_buffer->buffer = buffer;
-	spin_lock_init(&cpu_buffer->reader_lock);
-	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
-	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
-
-	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
-			    GFP_KERNEL, cpu_to_node(cpu));
-	if (!bpage)
-		goto fail_free_buffer;
-
-	rb_check_bpage(cpu_buffer, bpage);
-
-	cpu_buffer->reader_page = bpage;
-	addr = __get_free_page(GFP_KERNEL);
-	if (!addr)
-		goto fail_free_reader;
-	bpage->page = (void *)addr;
-	rb_init_page(bpage->page);
-
-	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
-
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
-	if (ret < 0)
-		goto fail_free_reader;
-
-	cpu_buffer->head_page
-		= list_entry(cpu_buffer->pages, struct buffer_page, list);
-	cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page;
-
-	rb_head_page_activate(cpu_buffer);
-
-	return cpu_buffer;
-
- fail_free_reader:
-	free_buffer_page(cpu_buffer->reader_page);
-
- fail_free_buffer:
-	kfree(cpu_buffer);
-	return NULL;
-}
-
-static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct list_head *head = cpu_buffer->pages;
-	struct buffer_page *bpage, *tmp;
-
-	free_buffer_page(cpu_buffer->reader_page);
-
-	rb_head_page_deactivate(cpu_buffer);
-
-	if (head) {
-		list_for_each_entry_safe(bpage, tmp, head, list) {
-			list_del_init(&bpage->list);
-			free_buffer_page(bpage);
-		}
-		bpage = list_entry(head, struct buffer_page, list);
-		free_buffer_page(bpage);
-	}
-
-	kfree(cpu_buffer);
-}
-
-#ifdef CONFIG_HOTPLUG_CPU
-static int rb_cpu_notify(struct notifier_block *self,
-			 unsigned long action, void *hcpu);
-#endif
-
-/**
- * ring_buffer_alloc - allocate a new ring_buffer
- * @size: the size in bytes per cpu that is needed.
- * @flags: attributes to set for the ring buffer.
- *
- * Currently the only flag that is available is the RB_FL_OVERWRITE
- * flag. This flag means that the buffer will overwrite old data
- * when the buffer wraps. If this flag is not set, the buffer will
- * drop data when the tail hits the head.
- */
-struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
-					struct lock_class_key *key)
-{
-	struct ring_buffer *buffer;
-	int bsize;
-	int cpu;
-
-	/* keep it in its own cache line */
-	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
-			 GFP_KERNEL);
-	if (!buffer)
-		return NULL;
-
-	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
-		goto fail_free_buffer;
-
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
-	buffer->flags = flags;
-	buffer->clock = trace_clock_local;
-	buffer->reader_lock_key = key;
-
-	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
-
-	/*
-	 * In case of non-hotplug cpu, if the ring-buffer is allocated
-	 * in early initcall, it will not be notified of secondary cpus.
-	 * In that off case, we need to allocate for all possible cpus.
-	 */
-#ifdef CONFIG_HOTPLUG_CPU
-	get_online_cpus();
-	cpumask_copy(buffer->cpumask, cpu_online_mask);
-#else
-	cpumask_copy(buffer->cpumask, cpu_possible_mask);
-#endif
-	buffer->cpus = nr_cpu_ids;
-
-	bsize = sizeof(void *) * nr_cpu_ids;
-	buffer->buffers = kzalloc(ALIGN(bsize, cache_line_size()),
-				  GFP_KERNEL);
-	if (!buffer->buffers)
-		goto fail_free_cpumask;
-
-	for_each_buffer_cpu(buffer, cpu) {
-		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
-		if (!buffer->buffers[cpu])
-			goto fail_free_buffers;
-	}
-
-#ifdef CONFIG_HOTPLUG_CPU
-	buffer->cpu_notify.notifier_call = rb_cpu_notify;
-	buffer->cpu_notify.priority = 0;
-	register_cpu_notifier(&buffer->cpu_notify);
-#endif
-
-	put_online_cpus();
-	mutex_init(&buffer->mutex);
-
-	return buffer;
-
- fail_free_buffers:
-	for_each_buffer_cpu(buffer, cpu) {
-		if (buffer->buffers[cpu])
-			rb_free_cpu_buffer(buffer->buffers[cpu]);
-	}
-	kfree(buffer->buffers);
-
- fail_free_cpumask:
-	free_cpumask_var(buffer->cpumask);
-	put_online_cpus();
-
- fail_free_buffer:
-	kfree(buffer);
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(__ring_buffer_alloc);
-
-/**
- * ring_buffer_free - free a ring buffer.
- * @buffer: the buffer to free.
- */
-void
-ring_buffer_free(struct ring_buffer *buffer)
-{
-	int cpu;
-
-	get_online_cpus();
-
-#ifdef CONFIG_HOTPLUG_CPU
-	unregister_cpu_notifier(&buffer->cpu_notify);
-#endif
-
-	for_each_buffer_cpu(buffer, cpu)
-		rb_free_cpu_buffer(buffer->buffers[cpu]);
-
-	put_online_cpus();
-
-	kfree(buffer->buffers);
-	free_cpumask_var(buffer->cpumask);
-
-	kfree(buffer);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_free);
-
-void ring_buffer_set_clock(struct ring_buffer *buffer,
-			   u64 (*clock)(void))
-{
-	buffer->clock = clock;
-}
-
-static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
-
-static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
-{
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
-
-	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
-
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
-
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
-
-out:
-	spin_unlock_irq(&cpu_buffer->reader_lock);
-}
-
-static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
-{
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
-
-	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
-
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
-	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
-
-out:
-	spin_unlock_irq(&cpu_buffer->reader_lock);
-}
-
-/**
- * ring_buffer_resize - resize the ring buffer
- * @buffer: the buffer to resize.
- * @size: the new size.
- *
- * Minimum size is 2 * BUF_PAGE_SIZE.
- *
- * Returns -1 on failure.
- */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	unsigned long addr;
-	LIST_HEAD(pages);
-	int i, cpu;
-
-	/*
-	 * Always succeed at resizing a non-existent buffer:
-	 */
-	if (!buffer)
-		return size;
-
-	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
-	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
-
-	/* we need a minimum of two pages */
-	if (size < BUF_PAGE_SIZE * 2)
-		size = BUF_PAGE_SIZE * 2;
-
-	if (size == buffer_size)
-		return size;
-
-	atomic_inc(&buffer->record_disabled);
-
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
-
-	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
-
-	if (size < buffer_size) {
-
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
-
-		rm_pages = buffer->pages - nr_pages;
-
-		for_each_buffer_cpu(buffer, cpu) {
-			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
-		}
-		goto out;
-	}
-
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
-
-	new_pages = nr_pages - buffer->pages;
-
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL, cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			addr = __get_free_page(GFP_KERNEL);
-			if (!addr)
-				goto free_pages;
-			bpage->page = (void *)addr;
-			rb_init_page(bpage->page);
-		}
-	}
-
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
-	}
-
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
-
- out:
-	buffer->pages = nr_pages;
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
-	return size;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_resize);
-
-static inline void *
-__rb_data_page_index(struct buffer_data_page *bpage, unsigned index)
-{
-	return bpage->data + index;
-}
-
-static inline void *__rb_page_index(struct buffer_page *bpage, unsigned index)
-{
-	return bpage->page->data + index;
-}
-
-static inline struct ring_buffer_event *
-rb_reader_event(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	return __rb_page_index(cpu_buffer->reader_page,
-			       cpu_buffer->reader_page->read);
-}
-
-static inline struct ring_buffer_event *
-rb_iter_head_event(struct ring_buffer_iter *iter)
-{
-	return __rb_page_index(iter->head_page, iter->head);
-}
-
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
-static inline unsigned rb_page_commit(struct buffer_page *bpage)
-{
-	return local_read(&bpage->page->commit);
-}
-
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
-/* Size is determined by what has been commited */
-static inline unsigned rb_page_size(struct buffer_page *bpage)
-{
-	return rb_page_commit(bpage);
-}
-
-static inline unsigned
-rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	return rb_page_commit(cpu_buffer->commit_page);
-}
-
-static inline unsigned
-rb_event_index(struct ring_buffer_event *event)
-{
-	unsigned long addr = (unsigned long)event;
-
-	return (addr & ~PAGE_MASK) - BUF_PAGE_HDR_SIZE;
-}
-
-static inline int
-rb_event_is_commit(struct ring_buffer_per_cpu *cpu_buffer,
-		   struct ring_buffer_event *event)
-{
-	unsigned long addr = (unsigned long)event;
-	unsigned long index;
-
-	index = rb_event_index(event);
-	addr &= PAGE_MASK;
-
-	return cpu_buffer->commit_page->page == (void *)addr &&
-		rb_commit_index(cpu_buffer) == index;
-}
-
-static void
-rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	unsigned long max_count;
-
-	/*
-	 * We only race with interrupts and NMIs on this CPU.
-	 * If we own the commit event, then we can commit
-	 * all others that interrupted us, since the interruptions
-	 * are in stack format (they finish before they come
-	 * back to us). This allows us to do a simple loop to
-	 * assign the commit to the tail.
-	 */
- again:
-	max_count = cpu_buffer->buffer->pages * 100;
-
-	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
-		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
-			return;
-		if (RB_WARN_ON(cpu_buffer,
-			       rb_is_reader_page(cpu_buffer->tail_page)))
-			return;
-		local_set(&cpu_buffer->commit_page->page->commit,
-			  rb_page_write(cpu_buffer->commit_page));
-		rb_inc_page(cpu_buffer, &cpu_buffer->commit_page);
-		cpu_buffer->write_stamp =
-			cpu_buffer->commit_page->page->time_stamp;
-		/* add barrier to keep gcc from optimizing too much */
-		barrier();
-	}
-	while (rb_commit_index(cpu_buffer) !=
-	       rb_page_write(cpu_buffer->commit_page)) {
-
-		local_set(&cpu_buffer->commit_page->page->commit,
-			  rb_page_write(cpu_buffer->commit_page));
-		RB_WARN_ON(cpu_buffer,
-			   local_read(&cpu_buffer->commit_page->page->commit) &
-			   ~RB_WRITE_MASK);
-		barrier();
-	}
-
-	/* again, keep gcc from optimizing */
-	barrier();
-
-	/*
-	 * If an interrupt came in just after the first while loop
-	 * and pushed the tail page forward, we will be left with
-	 * a dangling commit that will never go forward.
-	 */
-	if (unlikely(cpu_buffer->commit_page != cpu_buffer->tail_page))
-		goto again;
-}
-
-static void rb_reset_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	cpu_buffer->read_stamp = cpu_buffer->reader_page->page->time_stamp;
-	cpu_buffer->reader_page->read = 0;
-}
-
-static void rb_inc_iter(struct ring_buffer_iter *iter)
-{
-	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
-
-	/*
-	 * The iterator could be on the reader page (it starts there).
-	 * But the head could have moved, since the reader was
-	 * found. Check for this case and assign the iterator
-	 * to the head page instead of next.
-	 */
-	if (iter->head_page == cpu_buffer->reader_page)
-		iter->head_page = rb_set_head_page(cpu_buffer);
-	else
-		rb_inc_page(cpu_buffer, &iter->head_page);
-
-	iter->read_stamp = iter->head_page->page->time_stamp;
-	iter->head = 0;
-}
-
-/**
- * ring_buffer_update_event - update event type and data
- * @event: the event to update
- * @type: the type of event
- * @length: the size of the event field in the ring buffer
- *
- * Update the type and data fields of the event. The length
- * is the actual size that is written to the ring buffer,
- * and with this, we can determine what to place into the
- * data field.
- */
-static void
-rb_update_event(struct ring_buffer_event *event,
-			 unsigned type, unsigned length)
-{
-	event->type_len = type;
-
-	switch (type) {
-
-	case RINGBUF_TYPE_PADDING:
-	case RINGBUF_TYPE_TIME_EXTEND:
-	case RINGBUF_TYPE_TIME_STAMP:
-		break;
-
-	case 0:
-		length -= RB_EVNT_HDR_SIZE;
-		if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT)
-			event->array[0] = length;
-		else
-			event->type_len = DIV_ROUND_UP(length, RB_ALIGNMENT);
-		break;
-	default:
-		BUG();
-	}
-}
-
-/*
- * rb_handle_head_page - writer hit the head page
- *
- * Returns: +1 to retry page
- *           0 to continue
- *          -1 on error
- */
-static int
-rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer,
-		    struct buffer_page *tail_page,
-		    struct buffer_page *next_page)
-{
-	struct buffer_page *new_head;
-	int entries;
-	int type;
-	int ret;
-
-	entries = rb_page_entries(next_page);
-
-	/*
-	 * The hard part is here. We need to move the head
-	 * forward, and protect against both readers on
-	 * other CPUs and writers coming in via interrupts.
-	 */
-	type = rb_head_page_set_update(cpu_buffer, next_page, tail_page,
-				       RB_PAGE_HEAD);
-
-	/*
-	 * type can be one of four:
-	 *  NORMAL - an interrupt already moved it for us
-	 *  HEAD   - we are the first to get here.
-	 *  UPDATE - we are the interrupt interrupting
-	 *           a current move.
-	 *  MOVED  - a reader on another CPU moved the next
-	 *           pointer to its reader page. Give up
-	 *           and try again.
-	 */
-
-	switch (type) {
-	case RB_PAGE_HEAD:
-		/*
-		 * We changed the head to UPDATE, thus
-		 * it is our responsibility to update
-		 * the counters.
-		 */
-		local_add(entries, &cpu_buffer->overrun);
-
-		/*
-		 * The entries will be zeroed out when we move the
-		 * tail page.
-		 */
-
-		/* still more to do */
-		break;
-
-	case RB_PAGE_UPDATE:
-		/*
-		 * This is an interrupt that interrupted the
-		 * previous update. Still more to do.
-		 */
-		break;
-	case RB_PAGE_NORMAL:
-		/*
-		 * An interrupt came in before the update
-		 * and processed this for us.
-		 * Nothing left to do.
-		 */
-		return 1;
-	case RB_PAGE_MOVED:
-		/*
-		 * The reader is on another CPU and just did
-		 * a swap with our next_page.
-		 * Try again.
-		 */
-		return 1;
-	default:
-		RB_WARN_ON(cpu_buffer, 1); /* WTF??? */
-		return -1;
-	}
-
-	/*
-	 * Now that we are here, the old head pointer is
-	 * set to UPDATE. This will keep the reader from
-	 * swapping the head page with the reader page.
-	 * The reader (on another CPU) will spin till
-	 * we are finished.
-	 *
-	 * We just need to protect against interrupts
-	 * doing the job. We will set the next pointer
-	 * to HEAD. After that, we set the old pointer
-	 * to NORMAL, but only if it was HEAD before.
-	 * otherwise we are an interrupt, and only
-	 * want the outer most commit to reset it.
-	 */
-	new_head = next_page;
-	rb_inc_page(cpu_buffer, &new_head);
-
-	ret = rb_head_page_set_head(cpu_buffer, new_head, next_page,
-				    RB_PAGE_NORMAL);
-
-	/*
-	 * Valid returns are:
-	 *  HEAD   - an interrupt came in and already set it.
-	 *  NORMAL - One of two things:
-	 *            1) We really set it.
-	 *            2) A bunch of interrupts came in and moved
-	 *               the page forward again.
-	 */
-	switch (ret) {
-	case RB_PAGE_HEAD:
-	case RB_PAGE_NORMAL:
-		/* OK */
-		break;
-	default:
-		RB_WARN_ON(cpu_buffer, 1);
-		return -1;
-	}
-
-	/*
-	 * It is possible that an interrupt came in,
-	 * set the head up, then more interrupts came in
-	 * and moved it again. When we get back here,
-	 * the page would have been set to NORMAL but we
-	 * just set it back to HEAD.
-	 *
-	 * How do you detect this? Well, if that happened
-	 * the tail page would have moved.
-	 */
-	if (ret == RB_PAGE_NORMAL) {
-		/*
-		 * If the tail had moved past next, then we need
-		 * to reset the pointer.
-		 */
-		if (cpu_buffer->tail_page != tail_page &&
-		    cpu_buffer->tail_page != next_page)
-			rb_head_page_set_normal(cpu_buffer, new_head,
-						next_page,
-						RB_PAGE_HEAD);
-	}
-
-	/*
-	 * If this was the outer most commit (the one that
-	 * changed the original pointer from HEAD to UPDATE),
-	 * then it is up to us to reset it to NORMAL.
-	 */
-	if (type == RB_PAGE_HEAD) {
-		ret = rb_head_page_set_normal(cpu_buffer, next_page,
-					      tail_page,
-					      RB_PAGE_UPDATE);
-		if (RB_WARN_ON(cpu_buffer,
-			       ret != RB_PAGE_UPDATE))
-			return -1;
-	}
-
-	return 0;
-}
-
-static unsigned rb_calculate_event_length(unsigned length)
-{
-	struct ring_buffer_event event; /* Used only for sizeof array */
-
-	/* zero length can cause confusions */
-	if (!length)
-		length = 1;
-
-	if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT)
-		length += sizeof(event.array[0]);
-
-	length += RB_EVNT_HDR_SIZE;
-	length = ALIGN(length, RB_ARCH_ALIGNMENT);
-
-	return length;
-}
-
-static inline void
-rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
-	      struct buffer_page *tail_page,
-	      unsigned long tail, unsigned long length)
-{
-	struct ring_buffer_event *event;
-
-	/*
-	 * Only the event that crossed the page boundary
-	 * must fill the old tail_page with padding.
-	 */
-	if (tail >= BUF_PAGE_SIZE) {
-		/*
-		 * If the page was filled, then we still need
-		 * to update the real_end. Reset it to zero
-		 * and the reader will ignore it.
-		 */
-		if (tail == BUF_PAGE_SIZE)
-			tail_page->real_end = 0;
-
-		local_sub(length, &tail_page->write);
-		return;
-	}
-
-	event = __rb_page_index(tail_page, tail);
-	kmemcheck_annotate_bitfield(event, bitfield);
-
-	/*
-	 * Save the original length to the meta data.
-	 * This will be used by the reader to add the lost
-	 * event counter.
-	 */
-	tail_page->real_end = tail;
-
-	/*
-	 * If this event is bigger than the minimum size, then
-	 * we need to be careful that we don't subtract the
-	 * write counter enough to allow another writer to slip
-	 * in on this page.
-	 * We put in a discarded commit instead, to make sure
-	 * that this space is not used again.
-	 *
-	 * If we are less than the minimum size, we don't need to
-	 * worry about it.
-	 */
-	if (tail > (BUF_PAGE_SIZE - RB_EVNT_MIN_SIZE)) {
-		/* No room for any events */
-
-		/* Mark the rest of the page with padding */
-		rb_event_set_padding(event);
-
-		/* Set the write back to the previous setting */
-		local_sub(length, &tail_page->write);
-		return;
-	}
-
-	/* Put in a discarded event */
-	event->array[0] = (BUF_PAGE_SIZE - tail) - RB_EVNT_HDR_SIZE;
-	event->type_len = RINGBUF_TYPE_PADDING;
-	/* time delta must be non zero */
-	event->time_delta = 1;
-
-	/* Set write to end of buffer */
-	length = (tail + length) - BUF_PAGE_SIZE;
-	local_sub(length, &tail_page->write);
-}
-
-static struct ring_buffer_event *
-rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
-	     unsigned long length, unsigned long tail,
-	     struct buffer_page *tail_page, u64 *ts)
-{
-	struct buffer_page *commit_page = cpu_buffer->commit_page;
-	struct ring_buffer *buffer = cpu_buffer->buffer;
-	struct buffer_page *next_page;
-	int ret;
-
-	next_page = tail_page;
-
-	rb_inc_page(cpu_buffer, &next_page);
-
-	/*
-	 * If for some reason, we had an interrupt storm that made
-	 * it all the way around the buffer, bail, and warn
-	 * about it.
-	 */
-	if (unlikely(next_page == commit_page)) {
-		local_inc(&cpu_buffer->commit_overrun);
-		goto out_reset;
-	}
-
-	/*
-	 * This is where the fun begins!
-	 *
-	 * We are fighting against races between a reader that
-	 * could be on another CPU trying to swap its reader
-	 * page with the buffer head.
-	 *
-	 * We are also fighting against interrupts coming in and
-	 * moving the head or tail on us as well.
-	 *
-	 * If the next page is the head page then we have filled
-	 * the buffer, unless the commit page is still on the
-	 * reader page.
-	 */
-	if (rb_is_head_page(cpu_buffer, next_page, &tail_page->list)) {
-
-		/*
-		 * If the commit is not on the reader page, then
-		 * move the header page.
-		 */
-		if (!rb_is_reader_page(cpu_buffer->commit_page)) {
-			/*
-			 * If we are not in overwrite mode,
-			 * this is easy, just stop here.
-			 */
-			if (!(buffer->flags & RB_FL_OVERWRITE))
-				goto out_reset;
-
-			ret = rb_handle_head_page(cpu_buffer,
-						  tail_page,
-						  next_page);
-			if (ret < 0)
-				goto out_reset;
-			if (ret)
-				goto out_again;
-		} else {
-			/*
-			 * We need to be careful here too. The
-			 * commit page could still be on the reader
-			 * page. We could have a small buffer, and
-			 * have filled up the buffer with events
-			 * from interrupts and such, and wrapped.
-			 *
-			 * Note, if the tail page is also on the
-			 * reader_page, we let it move out.
-			 */
-			if (unlikely((cpu_buffer->commit_page !=
-				      cpu_buffer->tail_page) &&
-				     (cpu_buffer->commit_page ==
-				      cpu_buffer->reader_page))) {
-				local_inc(&cpu_buffer->commit_overrun);
-				goto out_reset;
-			}
-		}
-	}
-
-	ret = rb_tail_page_update(cpu_buffer, tail_page, next_page);
-	if (ret) {
-		/*
-		 * Nested commits always have zero deltas, so
-		 * just reread the time stamp
-		 */
-		*ts = rb_time_stamp(buffer);
-		next_page->page->time_stamp = *ts;
-	}
-
- out_again:
-
-	rb_reset_tail(cpu_buffer, tail_page, tail, length);
-
-	/* fail and let the caller try again */
-	return ERR_PTR(-EAGAIN);
-
- out_reset:
-	/* reset write */
-	rb_reset_tail(cpu_buffer, tail_page, tail, length);
-
-	return NULL;
-}
-
-static struct ring_buffer_event *
-__rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
-		  unsigned type, unsigned long length, u64 *ts)
-{
-	struct buffer_page *tail_page;
-	struct ring_buffer_event *event;
-	unsigned long tail, write;
-
-	tail_page = cpu_buffer->tail_page;
-	write = local_add_return(length, &tail_page->write);
-
-	/* set write to only the index of the write */
-	write &= RB_WRITE_MASK;
-	tail = write - length;
-
-	/* See if we shot past the end of this buffer page */
-	if (write > BUF_PAGE_SIZE)
-		return rb_move_tail(cpu_buffer, length, tail,
-				    tail_page, ts);
-
-	/* We reserved something on the buffer */
-
-	event = __rb_page_index(tail_page, tail);
-	kmemcheck_annotate_bitfield(event, bitfield);
-	rb_update_event(event, type, length);
-
-	/* The passed in type is zero for DATA */
-	if (likely(!type))
-		local_inc(&tail_page->entries);
-
-	/*
-	 * If this is the first commit on the page, then update
-	 * its timestamp.
-	 */
-	if (!tail)
-		tail_page->page->time_stamp = *ts;
-
-	return event;
-}
-
-static inline int
-rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
-		  struct ring_buffer_event *event)
-{
-	unsigned long new_index, old_index;
-	struct buffer_page *bpage;
-	unsigned long index;
-	unsigned long addr;
-
-	new_index = rb_event_index(event);
-	old_index = new_index + rb_event_length(event);
-	addr = (unsigned long)event;
-	addr &= PAGE_MASK;
-
-	bpage = cpu_buffer->tail_page;
-
-	if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
-		unsigned long write_mask =
-			local_read(&bpage->write) & ~RB_WRITE_MASK;
-		/*
-		 * This is on the tail page. It is possible that
-		 * a write could come in and move the tail page
-		 * and write to the next page. That is fine
-		 * because we just shorten what is on this page.
-		 */
-		old_index += write_mask;
-		new_index += write_mask;
-		index = local_cmpxchg(&bpage->write, old_index, new_index);
-		if (index == old_index)
-			return 1;
-	}
-
-	/* could not discard */
-	return 0;
-}
-
-static int
-rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
-		  u64 *ts, u64 *delta)
-{
-	struct ring_buffer_event *event;
-	int ret;
-
-	WARN_ONCE(*delta > (1ULL << 59),
-		  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
-		  (unsigned long long)*delta,
-		  (unsigned long long)*ts,
-		  (unsigned long long)cpu_buffer->write_stamp);
-
-	/*
-	 * The delta is too big, we need to add a
-	 * new timestamp.
-	 */
-	event = __rb_reserve_next(cpu_buffer,
-				  RINGBUF_TYPE_TIME_EXTEND,
-				  RB_LEN_TIME_EXTEND,
-				  ts);
-	if (!event)
-		return -EBUSY;
-
-	if (PTR_ERR(event) == -EAGAIN)
-		return -EAGAIN;
-
-	/* Only a committed time event can update the write stamp */
-	if (rb_event_is_commit(cpu_buffer, event)) {
-		/*
-		 * If this is the first on the page, then it was
-		 * updated with the page itself. Try to discard it
-		 * and if we can't just make it zero.
-		 */
-		if (rb_event_index(event)) {
-			event->time_delta = *delta & TS_MASK;
-			event->array[0] = *delta >> TS_SHIFT;
-		} else {
-			/* try to discard, since we do not need this */
-			if (!rb_try_to_discard(cpu_buffer, event)) {
-				/* nope, just zero it */
-				event->time_delta = 0;
-				event->array[0] = 0;
-			}
-		}
-		cpu_buffer->write_stamp = *ts;
-		/* let the caller know this was the commit */
-		ret = 1;
-	} else {
-		/* Try to discard the event */
-		if (!rb_try_to_discard(cpu_buffer, event)) {
-			/* Darn, this is just wasted space */
-			event->time_delta = 0;
-			event->array[0] = 0;
-		}
-		ret = 0;
-	}
-
-	*delta = 0;
-
-	return ret;
-}
-
-static void rb_start_commit(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	local_inc(&cpu_buffer->committing);
-	local_inc(&cpu_buffer->commits);
-}
-
-static void rb_end_commit(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	unsigned long commits;
-
-	if (RB_WARN_ON(cpu_buffer,
-		       !local_read(&cpu_buffer->committing)))
-		return;
-
- again:
-	commits = local_read(&cpu_buffer->commits);
-	/* synchronize with interrupts */
-	barrier();
-	if (local_read(&cpu_buffer->committing) == 1)
-		rb_set_commit_to_write(cpu_buffer);
-
-	local_dec(&cpu_buffer->committing);
-
-	/* synchronize with interrupts */
-	barrier();
-
-	/*
-	 * Need to account for interrupts coming in between the
-	 * updating of the commit page and the clearing of the
-	 * committing counter.
-	 */
-	if (unlikely(local_read(&cpu_buffer->commits) != commits) &&
-	    !local_read(&cpu_buffer->committing)) {
-		local_inc(&cpu_buffer->committing);
-		goto again;
-	}
-}
-
-static struct ring_buffer_event *
-rb_reserve_next_event(struct ring_buffer *buffer,
-		      struct ring_buffer_per_cpu *cpu_buffer,
-		      unsigned long length)
-{
-	struct ring_buffer_event *event;
-	u64 ts, delta = 0;
-	int commit = 0;
-	int nr_loops = 0;
-
-	rb_start_commit(cpu_buffer);
-
-#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
-	/*
-	 * Due to the ability to swap a cpu buffer from a buffer
-	 * it is possible it was swapped before we committed.
-	 * (committing stops a swap). We check for it here and
-	 * if it happened, we have to fail the write.
-	 */
-	barrier();
-	if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
-		local_dec(&cpu_buffer->committing);
-		local_dec(&cpu_buffer->commits);
-		return NULL;
-	}
-#endif
-
-	length = rb_calculate_event_length(length);
- again:
-	/*
-	 * We allow for interrupts to reenter here and do a trace.
-	 * If one does, it will cause this original code to loop
-	 * back here. Even with heavy interrupts happening, this
-	 * should only happen a few times in a row. If this happens
-	 * 1000 times in a row, there must be either an interrupt
-	 * storm or we have something buggy.
-	 * Bail!
-	 */
-	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
-		goto out_fail;
-
-	ts = rb_time_stamp(cpu_buffer->buffer);
-
-	/*
-	 * Only the first commit can update the timestamp.
-	 * Yes there is a race here. If an interrupt comes in
-	 * just after the conditional and it traces too, then it
-	 * will also check the deltas. More than one timestamp may
-	 * also be made. But only the entry that did the actual
-	 * commit will be something other than zero.
-	 */
-	if (likely(cpu_buffer->tail_page == cpu_buffer->commit_page &&
-		   rb_page_write(cpu_buffer->tail_page) ==
-		   rb_commit_index(cpu_buffer))) {
-		u64 diff;
-
-		diff = ts - cpu_buffer->write_stamp;
-
-		/* make sure this diff is calculated here */
-		barrier();
-
-		/* Did the write stamp get updated already? */
-		if (unlikely(ts < cpu_buffer->write_stamp))
-			goto get_event;
-
-		delta = diff;
-		if (unlikely(test_time_stamp(delta))) {
-
-			commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
-			if (commit == -EBUSY)
-				goto out_fail;
-
-			if (commit == -EAGAIN)
-				goto again;
-
-			RB_WARN_ON(cpu_buffer, commit < 0);
-		}
-	}
-
- get_event:
-	event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
-	if (unlikely(PTR_ERR(event) == -EAGAIN))
-		goto again;
-
-	if (!event)
-		goto out_fail;
-
-	if (!rb_event_is_commit(cpu_buffer, event))
-		delta = 0;
-
-	event->time_delta = delta;
-
-	return event;
-
- out_fail:
-	rb_end_commit(cpu_buffer);
-	return NULL;
-}
-
-#ifdef CONFIG_TRACING
-
-#define TRACE_RECURSIVE_DEPTH 16
-
-static int trace_recursive_lock(void)
-{
-	current->trace_recursion++;
-
-	if (likely(current->trace_recursion < TRACE_RECURSIVE_DEPTH))
-		return 0;
-
-	/* Disable all tracing before we do anything else */
-	tracing_off_permanent();
-
-	printk_once(KERN_WARNING "Tracing recursion: depth[%ld]:"
-		    "HC[%lu]:SC[%lu]:NMI[%lu]\n",
-		    current->trace_recursion,
-		    hardirq_count() >> HARDIRQ_SHIFT,
-		    softirq_count() >> SOFTIRQ_SHIFT,
-		    in_nmi());
-
-	WARN_ON_ONCE(1);
-	return -1;
-}
-
-static void trace_recursive_unlock(void)
-{
-	WARN_ON_ONCE(!current->trace_recursion);
-
-	current->trace_recursion--;
-}
-
-#else
-
-#define trace_recursive_lock()		(0)
-#define trace_recursive_unlock()	do { } while (0)
-
-#endif
-
-/**
- * ring_buffer_lock_reserve - reserve a part of the buffer
- * @buffer: the ring buffer to reserve from
- * @length: the length of the data to reserve (excluding event header)
- *
- * Returns a reserved event on the ring buffer to copy directly to.
- * The user of this interface will need to get the body to write into
- * and can use the ring_buffer_event_data() interface.
- *
- * The length is the length of the data needed, not the event length
- * which also includes the event header.
- *
- * Must be paired with ring_buffer_unlock_commit, unless NULL is returned.
- * If NULL is returned, then nothing has been allocated or locked.
- */
-struct ring_buffer_event *
-ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	struct ring_buffer_event *event;
-	int cpu;
-
-	if (ring_buffer_flags != RB_BUFFERS_ON)
-		return NULL;
-
-	/* If we are tracing schedule, we don't want to recurse */
-	preempt_disable_notrace();
-
-	if (atomic_read(&buffer->record_disabled))
-		goto out_nocheck;
-
-	if (trace_recursive_lock())
-		goto out_nocheck;
-
-	cpu = raw_smp_processor_id();
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		goto out;
-
-	cpu_buffer = buffer->buffers[cpu];
-
-	if (atomic_read(&cpu_buffer->record_disabled))
-		goto out;
-
-	if (length > BUF_MAX_DATA_SIZE)
-		goto out;
-
-	event = rb_reserve_next_event(buffer, cpu_buffer, length);
-	if (!event)
-		goto out;
-
-	return event;
-
- out:
-	trace_recursive_unlock();
-
- out_nocheck:
-	preempt_enable_notrace();
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_lock_reserve);
-
-static void
-rb_update_write_stamp(struct ring_buffer_per_cpu *cpu_buffer,
-		      struct ring_buffer_event *event)
-{
-	/*
-	 * The first event in the commit queue updates the
-	 * time stamp.
-	 */
-	if (rb_event_is_commit(cpu_buffer, event))
-		cpu_buffer->write_stamp += event->time_delta;
-}
-
-static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
-		      struct ring_buffer_event *event)
-{
-	local_inc(&cpu_buffer->entries);
-	rb_update_write_stamp(cpu_buffer, event);
-	rb_end_commit(cpu_buffer);
-}
-
-/**
- * ring_buffer_unlock_commit - commit a reserved event
- * @buffer: The buffer to commit to
- * @event: The event pointer to commit.
- *
- * This commits the data to the ring buffer, and releases any locks held.
- *
- * Must be paired with ring_buffer_lock_reserve.
- */
-int ring_buffer_unlock_commit(struct ring_buffer *buffer,
-			      struct ring_buffer_event *event)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	int cpu = raw_smp_processor_id();
-
-	cpu_buffer = buffer->buffers[cpu];
-
-	rb_commit(cpu_buffer, event);
-
-	trace_recursive_unlock();
-
-	preempt_enable_notrace();
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_unlock_commit);
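
For reference, a minimal sketch of the reserve/commit pairing documented above,
as a client would use it; struct my_payload and my_trace_value() are
hypothetical, only the ring_buffer_*() calls belong to this API:

#include <linux/ring_buffer.h>

struct my_payload {			/* hypothetical event payload */
	unsigned long	ip;
	unsigned long	value;
};

/* Reserve space, fill the body in place, then commit. */
static int my_trace_value(struct ring_buffer *buffer,
			  unsigned long ip, unsigned long value)
{
	struct ring_buffer_event *event;
	struct my_payload *entry;

	event = ring_buffer_lock_reserve(buffer, sizeof(*entry));
	if (!event)
		return -EBUSY;	/* recording disabled, recursion, or buffer full */

	entry = ring_buffer_event_data(event);
	entry->ip = ip;
	entry->value = value;

	/* releases the per-cpu reservation taken by the reserve call */
	return ring_buffer_unlock_commit(buffer, event);
}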
-
-static inline void rb_event_discard(struct ring_buffer_event *event)
-{
-	/* array[0] holds the actual length for the discarded event */
-	event->array[0] = rb_event_data_length(event) - RB_EVNT_HDR_SIZE;
-	event->type_len = RINGBUF_TYPE_PADDING;
-	/* time delta must be non zero */
-	if (!event->time_delta)
-		event->time_delta = 1;
-}
-
-/*
- * Decrement the entry count of the page that an event is on.
- * The event does not even need to exist, only the pointer
- * to the page it is on. This may only be called before the commit
- * takes place.
- */
-static inline void
-rb_decrement_entry(struct ring_buffer_per_cpu *cpu_buffer,
-		   struct ring_buffer_event *event)
-{
-	unsigned long addr = (unsigned long)event;
-	struct buffer_page *bpage = cpu_buffer->commit_page;
-	struct buffer_page *start;
-
-	addr &= PAGE_MASK;
-
-	/* Do the likely case first */
-	if (likely(bpage->page == (void *)addr)) {
-		local_dec(&bpage->entries);
-		return;
-	}
-
-	/*
-	 * Because the commit page may be on the reader page we
-	 * start with the next page and check the end loop there.
-	 */
-	rb_inc_page(cpu_buffer, &bpage);
-	start = bpage;
-	do {
-		if (bpage->page == (void *)addr) {
-			local_dec(&bpage->entries);
-			return;
-		}
-		rb_inc_page(cpu_buffer, &bpage);
-	} while (bpage != start);
-
-	/* commit not part of this buffer?? */
-	RB_WARN_ON(cpu_buffer, 1);
-}
-
-/**
- * ring_buffer_discard_commit - discard an event that has not been committed
- * @buffer: the ring buffer
- * @event: non committed event to discard
- *
- * Sometimes an event that is in the ring buffer needs to be ignored.
- * This function lets the user discard an event in the ring buffer
- * and then that event will not be read later.
- *
- * This function only works if it is called before the item has been
- * committed. It will try to free the event from the ring buffer
- * if another event has not been added behind it.
- *
- * If another event has been added behind it, it will set the event
- * up as discarded, and perform the commit.
- *
- * If this function is called, do not call ring_buffer_unlock_commit on
- * the event.
- */
-void ring_buffer_discard_commit(struct ring_buffer *buffer,
-				struct ring_buffer_event *event)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	int cpu;
-
-	/* The event is discarded regardless */
-	rb_event_discard(event);
-
-	cpu = smp_processor_id();
-	cpu_buffer = buffer->buffers[cpu];
-
-	/*
-	 * This must only be called if the event has not been
-	 * committed yet. Thus we can assume that preemption
-	 * is still disabled.
-	 */
-	RB_WARN_ON(buffer, !local_read(&cpu_buffer->committing));
-
-	rb_decrement_entry(cpu_buffer, event);
-	if (rb_try_to_discard(cpu_buffer, event))
-		goto out;
-
-	/*
-	 * The commit is still visible to the reader, so we
-	 * must still update the timestamp.
-	 */
-	rb_update_write_stamp(cpu_buffer, event);
- out:
-	rb_end_commit(cpu_buffer);
-
-	trace_recursive_unlock();
-
-	preempt_enable_notrace();
-
-}
-EXPORT_SYMBOL_GPL(ring_buffer_discard_commit);
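
A short sketch of the discard path documented above: reserve as usual, then
drop the event instead of committing it when a (hypothetical) filter rejects
the value:

#include <linux/ring_buffer.h>

/* Sketch: discard a reserved event instead of committing it. */
static void my_trace_filtered(struct ring_buffer *buffer, unsigned long value)
{
	struct ring_buffer_event *event;
	unsigned long *entry;

	event = ring_buffer_lock_reserve(buffer, sizeof(*entry));
	if (!event)
		return;

	entry = ring_buffer_event_data(event);
	*entry = value;

	if (!value)
		/* hypothetical filter: drop zeroes; no unlock_commit after this */
		ring_buffer_discard_commit(buffer, event);
	else
		ring_buffer_unlock_commit(buffer, event);
}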
-
-/**
- * ring_buffer_write - write data to the buffer without reserving
- * @buffer: The ring buffer to write to.
- * @length: The length of the data being written (excluding the event header)
- * @data: The data to write to the buffer.
- *
- * This is like ring_buffer_lock_reserve and ring_buffer_unlock_commit as
- * one function. If you already have the data to write to the buffer, it
- * may be easier to simply call this function.
- *
- * Note, like ring_buffer_lock_reserve, the length is the length of the data
- * and not the length of the event which would hold the header.
- */
-int ring_buffer_write(struct ring_buffer *buffer,
-			unsigned long length,
-			void *data)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	struct ring_buffer_event *event;
-	void *body;
-	int ret = -EBUSY;
-	int cpu;
-
-	if (ring_buffer_flags != RB_BUFFERS_ON)
-		return -EBUSY;
-
-	preempt_disable_notrace();
-
-	if (atomic_read(&buffer->record_disabled))
-		goto out;
-
-	cpu = raw_smp_processor_id();
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		goto out;
-
-	cpu_buffer = buffer->buffers[cpu];
-
-	if (atomic_read(&cpu_buffer->record_disabled))
-		goto out;
-
-	if (length > BUF_MAX_DATA_SIZE)
-		goto out;
-
-	event = rb_reserve_next_event(buffer, cpu_buffer, length);
-	if (!event)
-		goto out;
-
-	body = rb_event_data(event);
-
-	memcpy(body, data, length);
-
-	rb_commit(cpu_buffer, event);
-
-	ret = 0;
- out:
-	preempt_enable_notrace();
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_write);
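
When the record is already assembled, the one-shot write above is sufficient;
a minimal sketch with a hypothetical fixed-size record:

#include <linux/ring_buffer.h>
#include <linux/string.h>

struct my_record {			/* hypothetical fixed-size record */
	u64	cookie;
	char	msg[32];
};

/* Sketch: copy an already-built record into the buffer in one call. */
static int my_log_record(struct ring_buffer *buffer, u64 cookie, const char *msg)
{
	struct my_record rec = { .cookie = cookie };

	strncpy(rec.msg, msg, sizeof(rec.msg) - 1);
	/* 0 on success, -EBUSY if recording is disabled or reservation failed */
	return ring_buffer_write(buffer, sizeof(rec), &rec);
}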
-
-static int rb_per_cpu_empty(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct buffer_page *reader = cpu_buffer->reader_page;
-	struct buffer_page *head = rb_set_head_page(cpu_buffer);
-	struct buffer_page *commit = cpu_buffer->commit_page;
-
-	/* In case of error, head will be NULL */
-	if (unlikely(!head))
-		return 1;
-
-	return reader->read == rb_page_commit(reader) &&
-		(commit == reader ||
-		 (commit == head &&
-		  head->read == rb_page_commit(commit)));
-}
-
-/**
- * ring_buffer_record_disable - stop all writes into the buffer
- * @buffer: The ring buffer to stop writes to.
- *
- * This prevents all writes to the buffer. Any attempt to write
- * to the buffer after this will fail and return NULL.
- *
- * The caller should call synchronize_sched() after this.
- */
-void ring_buffer_record_disable(struct ring_buffer *buffer)
-{
-	atomic_inc(&buffer->record_disabled);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_record_disable);
-
-/**
- * ring_buffer_record_enable - enable writes to the buffer
- * @buffer: The ring buffer to enable writes
- *
- * Note, multiple disables will need the same number of enables
- * to truly enable the writing (much like preempt_disable).
- */
-void ring_buffer_record_enable(struct ring_buffer *buffer)
-{
-	atomic_dec(&buffer->record_disabled);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_record_enable);
-
-/**
- * ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
- * @buffer: The ring buffer to stop writes to.
- * @cpu: The CPU buffer to stop
- *
- * This prevents all writes to the buffer. Any attempt to write
- * to the buffer after this will fail and return NULL.
- *
- * The caller should call synchronize_sched() after this.
- */
-void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return;
-
-	cpu_buffer = buffer->buffers[cpu];
-	atomic_inc(&cpu_buffer->record_disabled);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_record_disable_cpu);
-
-/**
- * ring_buffer_record_enable_cpu - enable writes to the buffer
- * @buffer: The ring buffer to enable writes
- * @cpu: The CPU to enable.
- *
- * Note, multiple disables will need the same number of enables
- * to truly enable the writing (much like preempt_disable).
- */
-void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return;
-
-	cpu_buffer = buffer->buffers[cpu];
-	atomic_dec(&cpu_buffer->record_disabled);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_record_enable_cpu);
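
The disable/enable calls above nest like preempt_disable(); a sketch of the
usual quiesce pattern, with the buffer inspection step left as a placeholder:

#include <linux/rcupdate.h>
#include <linux/ring_buffer.h>

/* Sketch: stop writers, wait for in-flight reservations, then re-enable. */
static void my_quiesce_buffer(struct ring_buffer *buffer)
{
	ring_buffer_record_disable(buffer);
	synchronize_sched();		/* as the kerneldoc above recommends */

	/* ... inspect, reset or swap the buffer here ... */

	ring_buffer_record_enable(buffer);
}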
-
-/**
- * ring_buffer_entries_cpu - get the number of entries in a cpu buffer
- * @buffer: The ring buffer
- * @cpu: The per CPU buffer to get the entries from.
- */
-unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long ret;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return 0;
-
-	cpu_buffer = buffer->buffers[cpu];
-	ret = (local_read(&cpu_buffer->entries) - local_read(&cpu_buffer->overrun))
-		- cpu_buffer->read;
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_entries_cpu);
-
-/**
- * ring_buffer_overrun_cpu - get the number of overruns in a cpu_buffer
- * @buffer: The ring buffer
- * @cpu: The per CPU buffer to get the number of overruns from
- */
-unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long ret;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return 0;
-
-	cpu_buffer = buffer->buffers[cpu];
-	ret = local_read(&cpu_buffer->overrun);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_overrun_cpu);
-
-/**
- * ring_buffer_commit_overrun_cpu - get the number of overruns caused by commits
- * @buffer: The ring buffer
- * @cpu: The per CPU buffer to get the number of overruns from
- */
-unsigned long
-ring_buffer_commit_overrun_cpu(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long ret;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return 0;
-
-	cpu_buffer = buffer->buffers[cpu];
-	ret = local_read(&cpu_buffer->commit_overrun);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_commit_overrun_cpu);
-
-/**
- * ring_buffer_entries - get the number of entries in a buffer
- * @buffer: The ring buffer
- *
- * Returns the total number of entries in the ring buffer
- * (all CPU entries)
- */
-unsigned long ring_buffer_entries(struct ring_buffer *buffer)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long entries = 0;
-	int cpu;
-
-	/* if you care about this being correct, lock the buffer */
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		entries += (local_read(&cpu_buffer->entries) -
-			    local_read(&cpu_buffer->overrun)) - cpu_buffer->read;
-	}
-
-	return entries;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_entries);
-
-/**
- * ring_buffer_overruns - get the number of overruns in buffer
- * @buffer: The ring buffer
- *
- * Returns the total number of overruns in the ring buffer
- * (all CPU entries)
- */
-unsigned long ring_buffer_overruns(struct ring_buffer *buffer)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long overruns = 0;
-	int cpu;
-
-	/* if you care about this being correct, lock the buffer */
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		overruns += local_read(&cpu_buffer->overrun);
-	}
-
-	return overruns;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_overruns);
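
The per-cpu counters above combine into simple statistics; a sketch that
prints them for each online CPU (racy without locking, as the comments above
note):

#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/ring_buffer.h>

/* Sketch: dump racy but useful per-cpu statistics. */
static void my_print_stats(struct ring_buffer *buffer)
{
	int cpu;

	for_each_online_cpu(cpu)
		printk(KERN_INFO "cpu%d: %lu entries, %lu overruns, %lu commit overruns\n",
		       cpu,
		       ring_buffer_entries_cpu(buffer, cpu),
		       ring_buffer_overrun_cpu(buffer, cpu),
		       ring_buffer_commit_overrun_cpu(buffer, cpu));
}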
-
-static void rb_iter_reset(struct ring_buffer_iter *iter)
-{
-	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
-
-	/* Iterator usage is expected to have record disabled */
-	if (list_empty(&cpu_buffer->reader_page->list)) {
-		iter->head_page = rb_set_head_page(cpu_buffer);
-		if (unlikely(!iter->head_page))
-			return;
-		iter->head = iter->head_page->read;
-	} else {
-		iter->head_page = cpu_buffer->reader_page;
-		iter->head = cpu_buffer->reader_page->read;
-	}
-	if (iter->head)
-		iter->read_stamp = cpu_buffer->read_stamp;
-	else
-		iter->read_stamp = iter->head_page->page->time_stamp;
-	iter->cache_reader_page = cpu_buffer->reader_page;
-	iter->cache_read = cpu_buffer->read;
-}
-
-/**
- * ring_buffer_iter_reset - reset an iterator
- * @iter: The iterator to reset
- *
- * Resets the iterator, so that it will start from the beginning
- * again.
- */
-void ring_buffer_iter_reset(struct ring_buffer_iter *iter)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long flags;
-
-	if (!iter)
-		return;
-
-	cpu_buffer = iter->cpu_buffer;
-
-	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-	rb_iter_reset(iter);
-	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_iter_reset);
-
-/**
- * ring_buffer_iter_empty - check if an iterator has no more to read
- * @iter: The iterator to check
- */
-int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-
-	cpu_buffer = iter->cpu_buffer;
-
-	return iter->head_page == cpu_buffer->commit_page &&
-		iter->head == rb_commit_index(cpu_buffer);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_iter_empty);
-
-static void
-rb_update_read_stamp(struct ring_buffer_per_cpu *cpu_buffer,
-		     struct ring_buffer_event *event)
-{
-	u64 delta;
-
-	switch (event->type_len) {
-	case RINGBUF_TYPE_PADDING:
-		return;
-
-	case RINGBUF_TYPE_TIME_EXTEND:
-		delta = event->array[0];
-		delta <<= TS_SHIFT;
-		delta += event->time_delta;
-		cpu_buffer->read_stamp += delta;
-		return;
-
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
-		return;
-
-	case RINGBUF_TYPE_DATA:
-		cpu_buffer->read_stamp += event->time_delta;
-		return;
-
-	default:
-		BUG();
-	}
-	return;
-}
-
-static void
-rb_update_iter_read_stamp(struct ring_buffer_iter *iter,
-			  struct ring_buffer_event *event)
-{
-	u64 delta;
-
-	switch (event->type_len) {
-	case RINGBUF_TYPE_PADDING:
-		return;
-
-	case RINGBUF_TYPE_TIME_EXTEND:
-		delta = event->array[0];
-		delta <<= TS_SHIFT;
-		delta += event->time_delta;
-		iter->read_stamp += delta;
-		return;
-
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
-		return;
-
-	case RINGBUF_TYPE_DATA:
-		iter->read_stamp += event->time_delta;
-		return;
-
-	default:
-		BUG();
-	}
-	return;
-}
-
-static struct buffer_page *
-rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct buffer_page *reader = NULL;
-	unsigned long overwrite;
-	unsigned long flags;
-	int nr_loops = 0;
-	int ret;
-
-	local_irq_save(flags);
-	arch_spin_lock(&cpu_buffer->lock);
-
- again:
-	/*
-	 * This should normally only loop twice. But because the
-	 * start of the reader inserts an empty page, it causes
-	 * a case where we will loop three times. There should be no
-	 * reason to loop four times (that I know of).
-	 */
-	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3)) {
-		reader = NULL;
-		goto out;
-	}
-
-	reader = cpu_buffer->reader_page;
-
-	/* If there's more to read, return this page */
-	if (cpu_buffer->reader_page->read < rb_page_size(reader))
-		goto out;
-
-	/* Never should we have an index greater than the size */
-	if (RB_WARN_ON(cpu_buffer,
-		       cpu_buffer->reader_page->read > rb_page_size(reader)))
-		goto out;
-
-	/* check if we caught up to the tail */
-	reader = NULL;
-	if (cpu_buffer->commit_page == cpu_buffer->reader_page)
-		goto out;
-
-	/*
-	 * Reset the reader page to size zero.
-	 */
-	local_set(&cpu_buffer->reader_page->write, 0);
-	local_set(&cpu_buffer->reader_page->entries, 0);
-	local_set(&cpu_buffer->reader_page->page->commit, 0);
-	cpu_buffer->reader_page->real_end = 0;
-
- spin:
-	/*
-	 * Splice the empty reader page into the list around the head.
-	 */
-	reader = rb_set_head_page(cpu_buffer);
-	cpu_buffer->reader_page->list.next = rb_list_head(reader->list.next);
-	cpu_buffer->reader_page->list.prev = reader->list.prev;
-
-	/*
-	 * cpu_buffer->pages just needs to point to the buffer, it
-	 *  has no specific buffer page to point to. Let's move it out
-	 *  of our way so we don't accidentally swap it.
-	 */
-	cpu_buffer->pages = reader->list.prev;
-
-	/* The reader page will be pointing to the new head */
-	rb_set_list_to_head(cpu_buffer, &cpu_buffer->reader_page->list);
-
-	/*
-	 * We want to make sure we read the overruns after we set up our
-	 * pointers to the next object. The writer side does a
-	 * cmpxchg to cross pages which acts as the mb on the writer
-	 * side. Note, the reader will constantly fail the swap
-	 * while the writer is updating the pointers, so this
-	 * guarantees that the overwrite recorded here is the one we
-	 * want to compare with the last_overrun.
-	 */
-	smp_mb();
-	overwrite = local_read(&(cpu_buffer->overrun));
-
-	/*
-	 * Here's the tricky part.
-	 *
-	 * We need to move the pointer past the header page.
-	 * But we can only do that if a writer is not currently
-	 * moving it. The page before the header page has the
-	 * flag bit '1' set if it is pointing to the page we want.
-	 * But if the writer is in the process of moving it,
-	 * then it will be '2', or '0' if it has already moved.
-	 */
-
-	ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
-
-	/*
-	 * If we did not convert it, then we must try again.
-	 */
-	if (!ret)
-		goto spin;
-
-	/*
-	 * Yeah! We succeeded in replacing the page.
-	 *
-	 * Now make the new head point back to the reader page.
-	 */
-	rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
-	rb_inc_page(cpu_buffer, &cpu_buffer->head_page);
-
-	/* Finally update the reader page to the new head */
-	cpu_buffer->reader_page = reader;
-	rb_reset_reader_page(cpu_buffer);
-
-	if (overwrite != cpu_buffer->last_overrun) {
-		cpu_buffer->lost_events = overwrite - cpu_buffer->last_overrun;
-		cpu_buffer->last_overrun = overwrite;
-	}
-
-	goto again;
-
- out:
-	arch_spin_unlock(&cpu_buffer->lock);
-	local_irq_restore(flags);
-
-	return reader;
-}
-
-static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	struct ring_buffer_event *event;
-	struct buffer_page *reader;
-	unsigned length;
-
-	reader = rb_get_reader_page(cpu_buffer);
-
-	/* This function should not be called when buffer is empty */
-	if (RB_WARN_ON(cpu_buffer, !reader))
-		return;
-
-	event = rb_reader_event(cpu_buffer);
-
-	if (event->type_len <= RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
-		cpu_buffer->read++;
-
-	rb_update_read_stamp(cpu_buffer, event);
-
-	length = rb_event_length(event);
-	cpu_buffer->reader_page->read += length;
-}
-
-static void rb_advance_iter(struct ring_buffer_iter *iter)
-{
-	struct ring_buffer *buffer;
-	struct ring_buffer_per_cpu *cpu_buffer;
-	struct ring_buffer_event *event;
-	unsigned length;
-
-	cpu_buffer = iter->cpu_buffer;
-	buffer = cpu_buffer->buffer;
-
-	/*
-	 * Check if we are at the end of the buffer.
-	 */
-	if (iter->head >= rb_page_size(iter->head_page)) {
-		/* discarded commits can make the page empty */
-		if (iter->head_page == cpu_buffer->commit_page)
-			return;
-		rb_inc_iter(iter);
-		return;
-	}
-
-	event = rb_iter_head_event(iter);
-
-	length = rb_event_length(event);
-
-	/*
-	 * This should not be called to advance the header if we are
-	 * at the tail of the buffer.
-	 */
-	if (RB_WARN_ON(cpu_buffer,
-		       (iter->head_page == cpu_buffer->commit_page) &&
-		       (iter->head + length > rb_commit_index(cpu_buffer))))
-		return;
-
-	rb_update_iter_read_stamp(iter, event);
-
-	iter->head += length;
-
-	/* check for end of page padding */
-	if ((iter->head >= rb_page_size(iter->head_page)) &&
-	    (iter->head_page != cpu_buffer->commit_page))
-		rb_advance_iter(iter);
-}
-
-static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	return cpu_buffer->lost_events;
-}
-
-static struct ring_buffer_event *
-rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts,
-	       unsigned long *lost_events)
-{
-	struct ring_buffer_event *event;
-	struct buffer_page *reader;
-	int nr_loops = 0;
-
- again:
-	/*
-	 * We repeat when a timestamp is encountered. It is possible
-	 * to get multiple timestamps from an interrupt entering just
-	 * as one timestamp is about to be written, or from discarded
-	 * commits. The most that we can have is the number on a single page.
-	 */
-	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
-		return NULL;
-
-	reader = rb_get_reader_page(cpu_buffer);
-	if (!reader)
-		return NULL;
-
-	event = rb_reader_event(cpu_buffer);
-
-	switch (event->type_len) {
-	case RINGBUF_TYPE_PADDING:
-		if (rb_null_event(event))
-			RB_WARN_ON(cpu_buffer, 1);
-		/*
-		 * Because the writer could be discarding every
-		 * event it creates (which would probably be bad)
-		 * if we were to go back to "again" then we may never
-		 * catch up, and will trigger the warn on, or lock
-		 * the box. Return the padding, and we will release
-		 * the current locks, and try again.
-		 */
-		return event;
-
-	case RINGBUF_TYPE_TIME_EXTEND:
-		/* Internal data, OK to advance */
-		rb_advance_reader(cpu_buffer);
-		goto again;
-
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
-		rb_advance_reader(cpu_buffer);
-		goto again;
-
-	case RINGBUF_TYPE_DATA:
-		if (ts) {
-			*ts = cpu_buffer->read_stamp + event->time_delta;
-			ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
-							 cpu_buffer->cpu, ts);
-		}
-		if (lost_events)
-			*lost_events = rb_lost_events(cpu_buffer);
-		return event;
-
-	default:
-		BUG();
-	}
-
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_peek);
-
-static struct ring_buffer_event *
-rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
-{
-	struct ring_buffer *buffer;
-	struct ring_buffer_per_cpu *cpu_buffer;
-	struct ring_buffer_event *event;
-	int nr_loops = 0;
-
-	cpu_buffer = iter->cpu_buffer;
-	buffer = cpu_buffer->buffer;
-
-	/*
-	 * Check if someone performed a consuming read to
-	 * the buffer. A consuming read invalidates the iterator
-	 * and we need to reset the iterator in this case.
-	 */
-	if (unlikely(iter->cache_read != cpu_buffer->read ||
-		     iter->cache_reader_page != cpu_buffer->reader_page))
-		rb_iter_reset(iter);
-
- again:
-	if (ring_buffer_iter_empty(iter))
-		return NULL;
-
-	/*
-	 * We repeat when a timestamp is encountered.
-	 * We can get multiple timestamps by nested interrupts or also
-	 * if filtering is on (discarding commits). Since discarding
-	 * commits can be frequent we can get a lot of timestamps.
-	 * But we limit them by not adding timestamps if they begin
-	 * at the start of a page.
-	 */
-	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
-		return NULL;
-
-	if (rb_per_cpu_empty(cpu_buffer))
-		return NULL;
-
-	if (iter->head >= local_read(&iter->head_page->page->commit)) {
-		rb_inc_iter(iter);
-		goto again;
-	}
-
-	event = rb_iter_head_event(iter);
-
-	switch (event->type_len) {
-	case RINGBUF_TYPE_PADDING:
-		if (rb_null_event(event)) {
-			rb_inc_iter(iter);
-			goto again;
-		}
-		rb_advance_iter(iter);
-		return event;
-
-	case RINGBUF_TYPE_TIME_EXTEND:
-		/* Internal data, OK to advance */
-		rb_advance_iter(iter);
-		goto again;
-
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
-		rb_advance_iter(iter);
-		goto again;
-
-	case RINGBUF_TYPE_DATA:
-		if (ts) {
-			*ts = iter->read_stamp + event->time_delta;
-			ring_buffer_normalize_time_stamp(buffer,
-							 cpu_buffer->cpu, ts);
-		}
-		return event;
-
-	default:
-		BUG();
-	}
-
-	return NULL;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_iter_peek);
-
-static inline int rb_ok_to_lock(void)
-{
-	/*
-	 * If an NMI die handler dumps out the content of the ring buffer,
-	 * do not grab locks. We also permanently disable the ring
-	 * buffer. A one time deal is all you get from reading
-	 * the ring buffer from an NMI.
-	 */
-	if (likely(!in_nmi()))
-		return 1;
-
-	tracing_off_permanent();
-	return 0;
-}
-
-/**
- * ring_buffer_peek - peek at the next event to be read
- * @buffer: The ring buffer to read
- * @cpu: The cpu to peek at
- * @ts: The timestamp counter of this event.
- * @lost_events: a variable to store if events were lost (may be NULL)
- *
- * This will return the event that will be read next, but does
- * not consume the data.
- */
-struct ring_buffer_event *
-ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts,
-		 unsigned long *lost_events)
-{
-	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
-	struct ring_buffer_event *event;
-	unsigned long flags;
-	int dolock;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return NULL;
-
-	dolock = rb_ok_to_lock();
- again:
-	local_irq_save(flags);
-	if (dolock)
-		spin_lock(&cpu_buffer->reader_lock);
-	event = rb_buffer_peek(cpu_buffer, ts, lost_events);
-	if (event && event->type_len == RINGBUF_TYPE_PADDING)
-		rb_advance_reader(cpu_buffer);
-	if (dolock)
-		spin_unlock(&cpu_buffer->reader_lock);
-	local_irq_restore(flags);
-
-	if (event && event->type_len == RINGBUF_TYPE_PADDING)
-		goto again;
-
-	return event;
-}
-
-/**
- * ring_buffer_iter_peek - peek at the next event to be read
- * @iter: The ring buffer iterator
- * @ts: The timestamp counter of this event.
- *
- * This will return the event that will be read next, but does
- * not increment the iterator.
- */
-struct ring_buffer_event *
-ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
-{
-	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
-	struct ring_buffer_event *event;
-	unsigned long flags;
-
- again:
-	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-	event = rb_iter_peek(iter, ts);
-	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-
-	if (event && event->type_len == RINGBUF_TYPE_PADDING)
-		goto again;
-
-	return event;
-}
-
-/**
- * ring_buffer_consume - return an event and consume it
- * @buffer: The ring buffer to get the next event from
- * @cpu: the cpu to read the buffer from
- * @ts: a variable to store the timestamp (may be NULL)
- * @lost_events: a variable to store if events were lost (may be NULL)
- *
- * Returns the next event in the ring buffer, and that event is consumed.
- * Meaning, that sequential reads will keep returning a different event,
- * and eventually empty the ring buffer if the producer is slower.
- */
-struct ring_buffer_event *
-ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts,
-		    unsigned long *lost_events)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	struct ring_buffer_event *event = NULL;
-	unsigned long flags;
-	int dolock;
-
-	dolock = rb_ok_to_lock();
-
- again:
-	/* might be called in atomic */
-	preempt_disable();
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		goto out;
-
-	cpu_buffer = buffer->buffers[cpu];
-	local_irq_save(flags);
-	if (dolock)
-		spin_lock(&cpu_buffer->reader_lock);
-
-	event = rb_buffer_peek(cpu_buffer, ts, lost_events);
-	if (event) {
-		cpu_buffer->lost_events = 0;
-		rb_advance_reader(cpu_buffer);
-	}
-
-	if (dolock)
-		spin_unlock(&cpu_buffer->reader_lock);
-	local_irq_restore(flags);
-
- out:
-	preempt_enable();
-
-	if (event && event->type_len == RINGBUF_TYPE_PADDING)
-		goto again;
-
-	return event;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_consume);
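
A minimal consuming-read loop built on ring_buffer_consume(); handle_event()
is a hypothetical consumer callback:

#include <linux/kernel.h>
#include <linux/ring_buffer.h>

/* Hypothetical consumer callback. */
extern void handle_event(void *data, unsigned int len, u64 ts);

/* Sketch: drain one CPU buffer, tracking the timestamp and lost events. */
static void my_drain_cpu(struct ring_buffer *buffer, int cpu)
{
	struct ring_buffer_event *event;
	unsigned long lost_events;
	u64 ts;

	while ((event = ring_buffer_consume(buffer, cpu, &ts, &lost_events))) {
		if (lost_events)
			printk(KERN_WARNING "cpu%d lost %lu events\n",
			       cpu, lost_events);
		handle_event(ring_buffer_event_data(event),
			     ring_buffer_event_length(event), ts);
	}
}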
-
-/**
- * ring_buffer_read_prepare - Prepare for a non consuming read of the buffer
- * @buffer: The ring buffer to read from
- * @cpu: The cpu buffer to iterate over
- *
- * This performs the initial preparations necessary to iterate
- * through the buffer.  Memory is allocated, buffer recording
- * is disabled, and the iterator pointer is returned to the caller.
- *
- * Disabling buffer recording prevents the reading from being
- * corrupted. This is not a consuming read, so a producer is not
- * expected.
- *
- * After a sequence of ring_buffer_read_prepare calls, the user is
- * expected to make at least one call to ring_buffer_prepare_sync.
- * Afterwards, ring_buffer_read_start is invoked to get things going
- * for real.
- *
- * This overall must be paired with ring_buffer_read_finish.
- */
-struct ring_buffer_iter *
-ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	struct ring_buffer_iter *iter;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return NULL;
-
-	iter = kmalloc(sizeof(*iter), GFP_KERNEL);
-	if (!iter)
-		return NULL;
-
-	cpu_buffer = buffer->buffers[cpu];
-
-	iter->cpu_buffer = cpu_buffer;
-
-	atomic_inc(&cpu_buffer->record_disabled);
-
-	return iter;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_read_prepare);
-
-/**
- * ring_buffer_read_prepare_sync - Synchronize a set of prepare calls
- *
- * All previously invoked ring_buffer_read_prepare calls to prepare
- * iterators will be synchronized.  Afterwards, ring_buffer_read_start
- * calls on those iterators are allowed.
- */
-void
-ring_buffer_read_prepare_sync(void)
-{
-	synchronize_sched();
-}
-EXPORT_SYMBOL_GPL(ring_buffer_read_prepare_sync);
-
-/**
- * ring_buffer_read_start - start a non consuming read of the buffer
- * @iter: The iterator returned by ring_buffer_read_prepare
- *
- * This finalizes the startup of an iteration through the buffer.
- * The iterator comes from a call to ring_buffer_read_prepare and
- * an intervening ring_buffer_read_prepare_sync must have been
- * performed.
- *
- * Must be paired with ring_buffer_read_finish.
- */
-void
-ring_buffer_read_start(struct ring_buffer_iter *iter)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long flags;
-
-	if (!iter)
-		return;
-
-	cpu_buffer = iter->cpu_buffer;
-
-	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-	arch_spin_lock(&cpu_buffer->lock);
-	rb_iter_reset(iter);
-	arch_spin_unlock(&cpu_buffer->lock);
-	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_read_start);
-
-/**
- * ring_buffer_read_finish - finish reading the iterator of the buffer
- * @iter: The iterator retrieved by ring_buffer_read_start
- *
- * This re-enables the recording to the buffer, and frees the
- * iterator.
- */
-void
-ring_buffer_read_finish(struct ring_buffer_iter *iter)
-{
-	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
-
-	atomic_dec(&cpu_buffer->record_disabled);
-	kfree(iter);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_read_finish);
-
-/**
- * ring_buffer_read - read the next item in the ring buffer by the iterator
- * @iter: The ring buffer iterator
- * @ts: The time stamp of the event read.
- *
- * This reads the next event in the ring buffer and increments the iterator.
- */
-struct ring_buffer_event *
-ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts)
-{
-	struct ring_buffer_event *event;
-	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
-	unsigned long flags;
-
-	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- again:
-	event = rb_iter_peek(iter, ts);
-	if (!event)
-		goto out;
-
-	if (event->type_len == RINGBUF_TYPE_PADDING)
-		goto again;
-
-	rb_advance_iter(iter);
- out:
-	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-
-	return event;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_read);
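
Putting the iterator calls above together, a sketch of one complete
non-consuming pass over a single CPU buffer:

#include <linux/ring_buffer.h>

/* Sketch: one non-consuming pass over a single CPU buffer. */
static void my_dump_cpu(struct ring_buffer *buffer, int cpu)
{
	struct ring_buffer_iter *iter;
	struct ring_buffer_event *event;
	u64 ts;

	iter = ring_buffer_read_prepare(buffer, cpu);
	if (!iter)
		return;
	ring_buffer_read_prepare_sync();	/* waits for writers already in flight */
	ring_buffer_read_start(iter);

	while ((event = ring_buffer_read(iter, &ts)))
		;	/* inspect ring_buffer_event_data(event) here */

	ring_buffer_read_finish(iter);		/* re-enables recording, frees iter */
}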
-
-/**
- * ring_buffer_size - return the size of the ring buffer (in bytes)
- * @buffer: The ring buffer.
- */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
-{
-	return BUF_PAGE_SIZE * buffer->pages;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_size);
-
-static void
-rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
-{
-	rb_head_page_deactivate(cpu_buffer);
-
-	cpu_buffer->head_page
-		= list_entry(cpu_buffer->pages, struct buffer_page, list);
-	local_set(&cpu_buffer->head_page->write, 0);
-	local_set(&cpu_buffer->head_page->entries, 0);
-	local_set(&cpu_buffer->head_page->page->commit, 0);
-
-	cpu_buffer->head_page->read = 0;
-
-	cpu_buffer->tail_page = cpu_buffer->head_page;
-	cpu_buffer->commit_page = cpu_buffer->head_page;
-
-	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
-	local_set(&cpu_buffer->reader_page->write, 0);
-	local_set(&cpu_buffer->reader_page->entries, 0);
-	local_set(&cpu_buffer->reader_page->page->commit, 0);
-	cpu_buffer->reader_page->read = 0;
-
-	local_set(&cpu_buffer->commit_overrun, 0);
-	local_set(&cpu_buffer->overrun, 0);
-	local_set(&cpu_buffer->entries, 0);
-	local_set(&cpu_buffer->committing, 0);
-	local_set(&cpu_buffer->commits, 0);
-	cpu_buffer->read = 0;
-
-	cpu_buffer->write_stamp = 0;
-	cpu_buffer->read_stamp = 0;
-
-	cpu_buffer->lost_events = 0;
-	cpu_buffer->last_overrun = 0;
-
-	rb_head_page_activate(cpu_buffer);
-}
-
-/**
- * ring_buffer_reset_cpu - reset a ring buffer per CPU buffer
- * @buffer: The ring buffer to reset a per cpu buffer of
- * @cpu: The CPU buffer to be reset
- */
-void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
-	unsigned long flags;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return;
-
-	atomic_inc(&cpu_buffer->record_disabled);
-
-	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-
-	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
-		goto out;
-
-	arch_spin_lock(&cpu_buffer->lock);
-
-	rb_reset_cpu(cpu_buffer);
-
-	arch_spin_unlock(&cpu_buffer->lock);
-
- out:
-	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-
-	atomic_dec(&cpu_buffer->record_disabled);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
-
-/**
- * ring_buffer_reset - reset a ring buffer
- * @buffer: The ring buffer to reset all cpu buffers
- */
-void ring_buffer_reset(struct ring_buffer *buffer)
-{
-	int cpu;
-
-	for_each_buffer_cpu(buffer, cpu)
-		ring_buffer_reset_cpu(buffer, cpu);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_reset);
-
-/**
- * ring_buffer_empty - is the ring buffer empty?
- * @buffer: The ring buffer to test
- */
-int ring_buffer_empty(struct ring_buffer *buffer)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long flags;
-	int dolock;
-	int cpu;
-	int ret;
-
-	dolock = rb_ok_to_lock();
-
-	/* yes this is racy, but if you don't like the race, lock the buffer */
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		local_irq_save(flags);
-		if (dolock)
-			spin_lock(&cpu_buffer->reader_lock);
-		ret = rb_per_cpu_empty(cpu_buffer);
-		if (dolock)
-			spin_unlock(&cpu_buffer->reader_lock);
-		local_irq_restore(flags);
-
-		if (!ret)
-			return 0;
-	}
-
-	return 1;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_empty);
-
-/**
- * ring_buffer_empty_cpu - is a cpu buffer of a ring buffer empty?
- * @buffer: The ring buffer
- * @cpu: The CPU buffer to test
- */
-int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long flags;
-	int dolock;
-	int ret;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		return 1;
-
-	dolock = rb_ok_to_lock();
-
-	cpu_buffer = buffer->buffers[cpu];
-	local_irq_save(flags);
-	if (dolock)
-		spin_lock(&cpu_buffer->reader_lock);
-	ret = rb_per_cpu_empty(cpu_buffer);
-	if (dolock)
-		spin_unlock(&cpu_buffer->reader_lock);
-	local_irq_restore(flags);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_empty_cpu);
-
-#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
-/**
- * ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
- * @buffer_a: One buffer to swap with
- * @buffer_b: The other buffer to swap with
- *
- * This function is useful for tracers that want to take a "snapshot"
- * of a CPU buffer and has another backup buffer lying around.
- * It is expected that the tracer handles the cpu buffer not being
- * used at the moment.
- */
-int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
-			 struct ring_buffer *buffer_b, int cpu)
-{
-	struct ring_buffer_per_cpu *cpu_buffer_a;
-	struct ring_buffer_per_cpu *cpu_buffer_b;
-	int ret = -EINVAL;
-
-	if (!cpumask_test_cpu(cpu, buffer_a->cpumask) ||
-	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
-		goto out;
-
-	/* At least make sure the two buffers are somewhat the same */
-	if (buffer_a->pages != buffer_b->pages)
-		goto out;
-
-	ret = -EAGAIN;
-
-	if (ring_buffer_flags != RB_BUFFERS_ON)
-		goto out;
-
-	if (atomic_read(&buffer_a->record_disabled))
-		goto out;
-
-	if (atomic_read(&buffer_b->record_disabled))
-		goto out;
-
-	cpu_buffer_a = buffer_a->buffers[cpu];
-	cpu_buffer_b = buffer_b->buffers[cpu];
-
-	if (atomic_read(&cpu_buffer_a->record_disabled))
-		goto out;
-
-	if (atomic_read(&cpu_buffer_b->record_disabled))
-		goto out;
-
-	/*
-	 * We can't do a synchronize_sched here because this
-	 * function can be called in atomic context.
-	 * Normally this will be called from the same CPU as cpu.
-	 * If not it's up to the caller to protect this.
-	 */
-	atomic_inc(&cpu_buffer_a->record_disabled);
-	atomic_inc(&cpu_buffer_b->record_disabled);
-
-	ret = -EBUSY;
-	if (local_read(&cpu_buffer_a->committing))
-		goto out_dec;
-	if (local_read(&cpu_buffer_b->committing))
-		goto out_dec;
-
-	buffer_a->buffers[cpu] = cpu_buffer_b;
-	buffer_b->buffers[cpu] = cpu_buffer_a;
-
-	cpu_buffer_b->buffer = buffer_a;
-	cpu_buffer_a->buffer = buffer_b;
-
-	ret = 0;
-
-out_dec:
-	atomic_dec(&cpu_buffer_a->record_disabled);
-	atomic_dec(&cpu_buffer_b->record_disabled);
-out:
-	return ret;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_swap_cpu);
-#endif /* CONFIG_RING_BUFFER_ALLOW_SWAP */
-
-/**
- * ring_buffer_alloc_read_page - allocate a page to read from buffer
- * @buffer: the buffer to allocate for.
- *
- * This function is used in conjunction with ring_buffer_read_page.
- * When reading a full page from the ring buffer, these functions
- * can be used to speed up the process. The calling function should
- * allocate a few pages first with this function. Then when it
- * needs to get pages from the ring buffer, it passes the result
- * of this function into ring_buffer_read_page, which will swap
- * the page that was allocated, with the read page of the buffer.
- *
- * Returns:
- *  The page allocated, or NULL on error.
- */
-void *ring_buffer_alloc_read_page(struct ring_buffer *buffer)
-{
-	struct buffer_data_page *bpage;
-	unsigned long addr;
-
-	addr = __get_free_page(GFP_KERNEL);
-	if (!addr)
-		return NULL;
-
-	bpage = (void *)addr;
-
-	rb_init_page(bpage);
-
-	return bpage;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_alloc_read_page);
-
-/**
- * ring_buffer_free_read_page - free an allocated read page
- * @buffer: the buffer the page was allocate for
- * @data: the page to free
- *
- * Free a page allocated from ring_buffer_alloc_read_page.
- */
-void ring_buffer_free_read_page(struct ring_buffer *buffer, void *data)
-{
-	free_page((unsigned long)data);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
-
-/**
- * ring_buffer_read_page - extract a page from the ring buffer
- * @buffer: buffer to extract from
- * @data_page: the page to use allocated from ring_buffer_alloc_read_page
- * @len: amount to extract
- * @cpu: the cpu of the buffer to extract
- * @full: should the extraction only happen when the page is full.
- *
- * This function will pull out a page from the ring buffer and consume it.
- * @data_page must be the address of the variable that was returned
- * from ring_buffer_alloc_read_page. This is because the page might be used
- * to swap with a page in the ring buffer.
- *
- * for example:
- *	rpage = ring_buffer_alloc_read_page(buffer);
- *	if (!rpage)
- *		return error;
- *	ret = ring_buffer_read_page(buffer, &rpage, len, cpu, 0);
- *	if (ret >= 0)
- *		process_page(rpage, ret);
- *
- * When @full is set, the function will not return true unless
- * the writer is off the reader page.
- *
- * Note: it is up to the calling functions to handle sleeps and wakeups.
- *  The ring buffer can be used anywhere in the kernel and can not
- *  blindly call wake_up. The layer that uses the ring buffer must be
- *  responsible for that.
- *
- * Returns:
- *  >=0 if data has been transferred, returns the offset of consumed data.
- *  <0 if no data has been transferred.
- */
-int ring_buffer_read_page(struct ring_buffer *buffer,
-			  void **data_page, size_t len, int cpu, int full)
-{
-	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
-	struct ring_buffer_event *event;
-	struct buffer_data_page *bpage;
-	struct buffer_page *reader;
-	unsigned long missed_events;
-	unsigned long flags;
-	unsigned int commit;
-	unsigned int read;
-	u64 save_timestamp;
-	int ret = -1;
-
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
-		goto out;
-
-	/*
-	 * If len is not big enough to hold the page header, then
-	 * we can not copy anything.
-	 */
-	if (len <= BUF_PAGE_HDR_SIZE)
-		goto out;
-
-	len -= BUF_PAGE_HDR_SIZE;
-
-	if (!data_page)
-		goto out;
-
-	bpage = *data_page;
-	if (!bpage)
-		goto out;
-
-	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-
-	reader = rb_get_reader_page(cpu_buffer);
-	if (!reader)
-		goto out_unlock;
-
-	event = rb_reader_event(cpu_buffer);
-
-	read = reader->read;
-	commit = rb_page_commit(reader);
-
-	/* Check if any events were dropped */
-	missed_events = cpu_buffer->lost_events;
-
-	/*
-	 * If this page has been partially read or
-	 * if len is not big enough to read the rest of the page or
-	 * a writer is still on the page, then
-	 * we must copy the data from the page to the buffer.
-	 * Otherwise, we can simply swap the page with the one passed in.
-	 */
-	if (read || (len < (commit - read)) ||
-	    cpu_buffer->reader_page == cpu_buffer->commit_page) {
-		struct buffer_data_page *rpage = cpu_buffer->reader_page->page;
-		unsigned int rpos = read;
-		unsigned int pos = 0;
-		unsigned int size;
-
-		if (full)
-			goto out_unlock;
-
-		if (len > (commit - read))
-			len = (commit - read);
-
-		size = rb_event_length(event);
-
-		if (len < size)
-			goto out_unlock;
-
-		/* save the current timestamp, since the user will need it */
-		save_timestamp = cpu_buffer->read_stamp;
-
-		/* Need to copy one event at a time */
-		do {
-			memcpy(bpage->data + pos, rpage->data + rpos, size);
-
-			len -= size;
-
-			rb_advance_reader(cpu_buffer);
-			rpos = reader->read;
-			pos += size;
-
-			event = rb_reader_event(cpu_buffer);
-			size = rb_event_length(event);
-		} while (len > size);
-
-		/* update bpage */
-		local_set(&bpage->commit, pos);
-		bpage->time_stamp = save_timestamp;
-
-		/* we copied everything to the beginning */
-		read = 0;
-	} else {
-		/* update the entry counter */
-		cpu_buffer->read += rb_page_entries(reader);
-
-		/* swap the pages */
-		rb_init_page(bpage);
-		bpage = reader->page;
-		reader->page = *data_page;
-		local_set(&reader->write, 0);
-		local_set(&reader->entries, 0);
-		reader->read = 0;
-		*data_page = bpage;
-
-		/*
-		 * Use the real_end for the data size,
-		 * This gives us a chance to store the lost events
-		 * on the page.
-		 */
-		if (reader->real_end)
-			local_set(&bpage->commit, reader->real_end);
-	}
-	ret = read;
-
-	cpu_buffer->lost_events = 0;
-
-	commit = local_read(&bpage->commit);
-	/*
-	 * Set a flag in the commit field if we lost events
-	 */
-	if (missed_events) {
-		/* If there is room at the end of the page to save the
-		 * missed events, then record it there.
-		 */
-		if (BUF_PAGE_SIZE - commit >= sizeof(missed_events)) {
-			memcpy(&bpage->data[commit], &missed_events,
-			       sizeof(missed_events));
-			local_add(RB_MISSED_STORED, &bpage->commit);
-			commit += sizeof(missed_events);
-		}
-		local_add(RB_MISSED_EVENTS, &bpage->commit);
-	}
-
-	/*
-	 * This page may be off to user land. Zero it out here.
-	 */
-	if (commit < BUF_PAGE_SIZE)
-		memset(&bpage->data[commit], 0, BUF_PAGE_SIZE - commit);
-
- out_unlock:
-	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-
- out:
-	return ret;
-}
-EXPORT_SYMBOL_GPL(ring_buffer_read_page);
-
-#ifdef CONFIG_TRACING
-static ssize_t
-rb_simple_read(struct file *filp, char __user *ubuf,
-	       size_t cnt, loff_t *ppos)
-{
-	unsigned long *p = filp->private_data;
-	char buf[64];
-	int r;
-
-	if (test_bit(RB_BUFFERS_DISABLED_BIT, p))
-		r = sprintf(buf, "permanently disabled\n");
-	else
-		r = sprintf(buf, "%d\n", test_bit(RB_BUFFERS_ON_BIT, p));
-
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
-}
-
-static ssize_t
-rb_simple_write(struct file *filp, const char __user *ubuf,
-		size_t cnt, loff_t *ppos)
-{
-	unsigned long *p = filp->private_data;
-	char buf[64];
-	unsigned long val;
-	int ret;
-
-	if (cnt >= sizeof(buf))
-		return -EINVAL;
-
-	if (copy_from_user(&buf, ubuf, cnt))
-		return -EFAULT;
-
-	buf[cnt] = 0;
-
-	ret = strict_strtoul(buf, 10, &val);
-	if (ret < 0)
-		return ret;
-
-	if (val)
-		set_bit(RB_BUFFERS_ON_BIT, p);
-	else
-		clear_bit(RB_BUFFERS_ON_BIT, p);
-
-	(*ppos)++;
-
-	return cnt;
-}
-
-static const struct file_operations rb_simple_fops = {
-	.open		= tracing_open_generic,
-	.read		= rb_simple_read,
-	.write		= rb_simple_write,
-};
-
-
-static __init int rb_init_debugfs(void)
-{
-	struct dentry *d_tracer;
-
-	d_tracer = tracing_init_dentry();
-
-	trace_create_file("tracing_on", 0644, d_tracer,
-			    &ring_buffer_flags, &rb_simple_fops);
-
-	return 0;
-}
-
-fs_initcall(rb_init_debugfs);
-#endif
-
-#ifdef CONFIG_HOTPLUG_CPU
-static int rb_cpu_notify(struct notifier_block *self,
-			 unsigned long action, void *hcpu)
-{
-	struct ring_buffer *buffer =
-		container_of(self, struct ring_buffer, cpu_notify);
-	long cpu = (long)hcpu;
-
-	switch (action) {
-	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
-		if (cpumask_test_cpu(cpu, buffer->cpumask))
-			return NOTIFY_OK;
-
-		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
-		if (!buffer->buffers[cpu]) {
-			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
-			     cpu);
-			return NOTIFY_OK;
-		}
-		smp_wmb();
-		cpumask_set_cpu(cpu, buffer->cpumask);
-		break;
-	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
-		/*
-		 * Do nothing.
-		 *  If we were to free the buffer, then the user would
-		 *  lose any trace that was in the buffer.
-		 */
-		break;
-	default:
-		break;
-	}
-	return NOTIFY_OK;
-}
-#endif
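
For readers unfamiliar with the snapshot use of the swap call deleted above
(ftrace_ring_buffer_swap_cpu() after this series' rename): a tracer that keeps a
spare buffer of the same size swaps one CPU's buffer into it and then reads the
snapshot at leisure, which is what update_max_tr_single() in kernel/trace/trace.c
does. A minimal sketch, using the pre-rename names shown above; the helper and the
spare-buffer handling are illustrative only, not something added by this series:

	#include <linux/ring_buffer.h>

	static struct ring_buffer *spare;	/* allocated with the same size as @live */

	static int snapshot_cpu(struct ring_buffer *live, int cpu)
	{
		int ret;

		/*
		 * Swap live->buffers[cpu] with spare->buffers[cpu].  Fails with
		 * -EINVAL if @cpu is not present in both buffers or their sizes
		 * differ, -EAGAIN if recording is disabled, or -EBUSY if a
		 * writer is mid-commit on that CPU.
		 */
		ret = ring_buffer_swap_cpu(spare, live, cpu);
		if (ret)
			return ret;

		/* The events that were live on @cpu can now be read from @spare. */
		return 0;
	}
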
Index: linux.trees.git/include/linux/oprofile.h
===================================================================
--- linux.trees.git.orig/include/linux/oprofile.h	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/include/linux/oprofile.h	2010-07-09 18:08:47.000000000 -0400
@@ -172,7 +172,7 @@ void oprofile_cpu_buffer_inc_smpl_lost(v
 struct op_sample;
 
 struct op_entry {
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct op_sample *sample;
 	unsigned long size;
 	unsigned long *data;
Index: linux.trees.git/include/trace/ftrace.h
===================================================================
--- linux.trees.git.orig/include/trace/ftrace.h	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/include/trace/ftrace.h	2010-07-09 18:08:47.000000000 -0400
@@ -399,9 +399,9 @@ static inline notrace int ftrace_get_off
  * {
  *	struct ftrace_event_call *event_call = __data;
  *	struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
- *	struct ring_buffer_event *event;
+ *	struct ftrace_ring_buffer_event *event;
  *	struct ftrace_raw_<call> *entry; <-- defined in stage 1
- *	struct ring_buffer *buffer;
+ *	struct ftrace_ring_buffer *buffer;
  *	unsigned long irq_flags;
  *	int __data_size;
  *	int pc;
@@ -417,7 +417,7 @@ static inline notrace int ftrace_get_off
  *				  irq_flags, pc);
  *	if (!event)
  *		return;
- *	entry	= ring_buffer_event_data(event);
+ *	entry	= ftrace_ring_buffer_event_data(event);
  *
  *	{ <assign>; }  <-- Here we assign the entries by the __field and
  *			   __array macros.
@@ -501,9 +501,9 @@ ftrace_raw_event_##call(void *__data, pr
 {									\
 	struct ftrace_event_call *event_call = __data;			\
 	struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
-	struct ring_buffer_event *event;				\
+	struct ftrace_ring_buffer_event *event;				\
 	struct ftrace_raw_##call *entry;				\
-	struct ring_buffer *buffer;					\
+	struct ftrace_ring_buffer *buffer;					\
 	unsigned long irq_flags;					\
 	int __data_size;						\
 	int pc;								\
@@ -519,7 +519,7 @@ ftrace_raw_event_##call(void *__data, pr
 				 irq_flags, pc);			\
 	if (!event)							\
 		return;							\
-	entry	= ring_buffer_event_data(event);			\
+	entry	= ftrace_ring_buffer_event_data(event);			\
 									\
 	tstruct								\
 									\
Index: linux.trees.git/kernel/trace/blktrace.c
===================================================================
--- linux.trees.git.orig/kernel/trace/blktrace.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/blktrace.c	2010-07-09 18:08:47.000000000 -0400
@@ -65,8 +65,8 @@ static void trace_note(struct blk_trace
 		       const void *data, size_t len)
 {
 	struct blk_io_trace *t;
-	struct ring_buffer_event *event = NULL;
-	struct ring_buffer *buffer = NULL;
+	struct ftrace_ring_buffer_event *event = NULL;
+	struct ftrace_ring_buffer *buffer = NULL;
 	int pc = 0;
 	int cpu = smp_processor_id();
 	bool blk_tracer = blk_tracer_enabled;
@@ -79,7 +79,7 @@ static void trace_note(struct blk_trace
 						  0, pc);
 		if (!event)
 			return;
-		t = ring_buffer_event_data(event);
+		t = ftrace_ring_buffer_event_data(event);
 		goto record_it;
 	}
 
@@ -181,8 +181,8 @@ static void __blk_add_trace(struct blk_t
 		     int rw, u32 what, int error, int pdu_len, void *pdu_data)
 {
 	struct task_struct *tsk = current;
-	struct ring_buffer_event *event = NULL;
-	struct ring_buffer *buffer = NULL;
+	struct ftrace_ring_buffer_event *event = NULL;
+	struct ftrace_ring_buffer *buffer = NULL;
 	struct blk_io_trace *t;
 	unsigned long flags = 0;
 	unsigned long *sequence;
@@ -215,7 +215,7 @@ static void __blk_add_trace(struct blk_t
 						  0, pc);
 		if (!event)
 			return;
-		t = ring_buffer_event_data(event);
+		t = ftrace_ring_buffer_event_data(event);
 		goto record_it;
 	}
 
Index: linux.trees.git/kernel/trace/ftrace_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/ftrace_ring_buffer_benchmark.c	2010-07-09 18:08:47.000000000 -0400
@@ -0,0 +1,488 @@
+/*
+ * ftrace ring buffer tester and benchmark
+ *
+ * Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
+ */
+#include <linux/ftrace_ring_buffer.h>
+#include <linux/completion.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/time.h>
+#include <asm/local.h>
+
+struct rb_page {
+	u64		ts;
+	local_t		commit;
+	char		data[4080];
+};
+
+/* run time and sleep time in seconds */
+#define RUN_TIME	10
+#define SLEEP_TIME	10
+
+/* number of events for writer to wake up the reader */
+static int wakeup_interval = 100;
+
+static int reader_finish;
+static struct completion read_start;
+static struct completion read_done;
+
+static struct ftrace_ring_buffer *buffer;
+static struct task_struct *producer;
+static struct task_struct *consumer;
+static unsigned long read;
+
+static int disable_reader;
+module_param(disable_reader, uint, 0644);
+MODULE_PARM_DESC(disable_reader, "only run producer");
+
+static int write_iteration = 50;
+module_param(write_iteration, uint, 0644);
+MODULE_PARM_DESC(write_iteration, "# of writes between timestamp readings");
+
+static int producer_nice = 19;
+static int consumer_nice = 19;
+
+static int producer_fifo = -1;
+static int consumer_fifo = -1;
+
+module_param(producer_nice, uint, 0644);
+MODULE_PARM_DESC(producer_nice, "nice prio for producer");
+
+module_param(consumer_nice, uint, 0644);
+MODULE_PARM_DESC(consumer_nice, "nice prio for consumer");
+
+module_param(producer_fifo, uint, 0644);
+MODULE_PARM_DESC(producer_fifo, "fifo prio for producer");
+
+module_param(consumer_fifo, uint, 0644);
+MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
+
+static int read_events;
+
+static int kill_test;
+
+#define KILL_TEST()				\
+	do {					\
+		if (!kill_test) {		\
+			kill_test = 1;		\
+			WARN_ON(1);		\
+		}				\
+	} while (0)
+
+enum event_status {
+	EVENT_FOUND,
+	EVENT_DROPPED,
+};
+
+static enum event_status read_event(int cpu)
+{
+	struct ftrace_ring_buffer_event *event;
+	int *entry;
+	u64 ts;
+
+	event = ftrace_ring_buffer_consume(buffer, cpu, &ts, NULL);
+	if (!event)
+		return EVENT_DROPPED;
+
+	entry = ftrace_ring_buffer_event_data(event);
+	if (*entry != cpu) {
+		KILL_TEST();
+		return EVENT_DROPPED;
+	}
+
+	read++;
+	return EVENT_FOUND;
+}
+
+static enum event_status read_page(int cpu)
+{
+	struct ftrace_ring_buffer_event *event;
+	struct rb_page *rpage;
+	unsigned long commit;
+	void *bpage;
+	int *entry;
+	int ret;
+	int inc;
+	int i;
+
+	bpage = ftrace_ring_buffer_alloc_read_page(buffer);
+	if (!bpage)
+		return EVENT_DROPPED;
+
+	ret = ftrace_ring_buffer_read_page(buffer, &bpage, PAGE_SIZE, cpu, 1);
+	if (ret >= 0) {
+		rpage = bpage;
+		/* The commit may have missed event flags set, clear them */
+		commit = local_read(&rpage->commit) & 0xfffff;
+		for (i = 0; i < commit && !kill_test; i += inc) {
+
+			if (i >= (PAGE_SIZE - offsetof(struct rb_page, data))) {
+				KILL_TEST();
+				break;
+			}
+
+			inc = -1;
+			event = (void *)&rpage->data[i];
+			switch (event->type_len) {
+			case RINGBUF_TYPE_PADDING:
+				/* failed writes may be discarded events */
+				if (!event->time_delta)
+					KILL_TEST();
+				inc = event->array[0] + 4;
+				break;
+			case RINGBUF_TYPE_TIME_EXTEND:
+				inc = 8;
+				break;
+			case 0:
+				entry = ftrace_ring_buffer_event_data(event);
+				if (*entry != cpu) {
+					KILL_TEST();
+					break;
+				}
+				read++;
+				if (!event->array[0]) {
+					KILL_TEST();
+					break;
+				}
+				inc = event->array[0] + 4;
+				break;
+			default:
+				entry = ftrace_ring_buffer_event_data(event);
+				if (*entry != cpu) {
+					KILL_TEST();
+					break;
+				}
+				read++;
+				inc = ((event->type_len + 1) * 4);
+			}
+			if (kill_test)
+				break;
+
+			if (inc <= 0) {
+				KILL_TEST();
+				break;
+			}
+		}
+	}
+	ftrace_ring_buffer_free_read_page(buffer, bpage);
+
+	if (ret < 0)
+		return EVENT_DROPPED;
+	return EVENT_FOUND;
+}
+
+static void ftrace_ring_buffer_consumer(void)
+{
+	/* toggle between reading pages and events */
+	read_events ^= 1;
+
+	read = 0;
+	while (!reader_finish && !kill_test) {
+		int found;
+
+		do {
+			int cpu;
+
+			found = 0;
+			for_each_online_cpu(cpu) {
+				enum event_status stat;
+
+				if (read_events)
+					stat = read_event(cpu);
+				else
+					stat = read_page(cpu);
+
+				if (kill_test)
+					break;
+				if (stat == EVENT_FOUND)
+					found = 1;
+			}
+		} while (found && !kill_test);
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (reader_finish)
+			break;
+
+		schedule();
+		__set_current_state(TASK_RUNNING);
+	}
+	reader_finish = 0;
+	complete(&read_done);
+}
+
+static void ftrace_ring_buffer_producer(void)
+{
+	struct timeval start_tv;
+	struct timeval end_tv;
+	unsigned long long time;
+	unsigned long long entries;
+	unsigned long long overruns;
+	unsigned long missed = 0;
+	unsigned long hit = 0;
+	unsigned long avg;
+	int cnt = 0;
+
+	/*
+	 * Hammer the buffer for 10 secs (this may
+	 * make the system stall)
+	 */
+	trace_printk("Starting ring buffer hammer\n");
+	do_gettimeofday(&start_tv);
+	do {
+		struct ftrace_ring_buffer_event *event;
+		int *entry;
+		int i;
+
+		for (i = 0; i < write_iteration; i++) {
+			event = ftrace_ring_buffer_lock_reserve(buffer, 10);
+			if (!event) {
+				missed++;
+			} else {
+				hit++;
+				entry = ftrace_ring_buffer_event_data(event);
+				*entry = smp_processor_id();
+				ftrace_ring_buffer_unlock_commit(buffer, event);
+			}
+		}
+		do_gettimeofday(&end_tv);
+
+		cnt++;
+		if (consumer && !(cnt % wakeup_interval))
+			wake_up_process(consumer);
+
+#ifndef CONFIG_PREEMPT
+		/*
+		 * If we are a non preempt kernel, the 10 second run will
+		 * stop everything while it runs. Instead, we will call
+		 * cond_resched and also add any time that was lost by a
+		 * reschedule.
+		 *
+		 * Do a cond resched at the same frequency we would wake up
+		 * the reader.
+		 */
+		if (cnt % wakeup_interval)
+			cond_resched();
+#endif
+
+	} while (end_tv.tv_sec < (start_tv.tv_sec + RUN_TIME) && !kill_test);
+	trace_printk("End ring buffer hammer\n");
+
+	if (consumer) {
+		/* Init both completions here to avoid races */
+		init_completion(&read_start);
+		init_completion(&read_done);
+		/* the completions must be visible before the finish var */
+		smp_wmb();
+		reader_finish = 1;
+		/* finish var visible before waking up the consumer */
+		smp_wmb();
+		wake_up_process(consumer);
+		wait_for_completion(&read_done);
+	}
+
+	time = end_tv.tv_sec - start_tv.tv_sec;
+	time *= USEC_PER_SEC;
+	time += (long long)((long)end_tv.tv_usec - (long)start_tv.tv_usec);
+
+	entries = ftrace_ring_buffer_entries(buffer);
+	overruns = ftrace_ring_buffer_overruns(buffer);
+
+	if (kill_test)
+		trace_printk("ERROR!\n");
+
+	if (!disable_reader) {
+		if (consumer_fifo < 0)
+			trace_printk("Running Consumer at nice: %d\n",
+				     consumer_nice);
+		else
+			trace_printk("Running Consumer at SCHED_FIFO %d\n",
+				     consumer_fifo);
+	}
+	if (producer_fifo < 0)
+		trace_printk("Running Producer at nice: %d\n",
+			     producer_nice);
+	else
+		trace_printk("Running Producer at SCHED_FIFO %d\n",
+			     producer_fifo);
+
+	/* Let the user know that the test is running at low priority */
+	if (producer_fifo < 0 && consumer_fifo < 0 &&
+	    producer_nice == 19 && consumer_nice == 19)
+		trace_printk("WARNING!!! This test is running at lowest priority.\n");
+
+	trace_printk("Time:     %lld (usecs)\n", time);
+	trace_printk("Overruns: %lld\n", overruns);
+	if (disable_reader)
+		trace_printk("Read:     (reader disabled)\n");
+	else
+		trace_printk("Read:     %ld  (by %s)\n", read,
+			read_events ? "events" : "pages");
+	trace_printk("Entries:  %lld\n", entries);
+	trace_printk("Total:    %lld\n", entries + overruns + read);
+	trace_printk("Missed:   %ld\n", missed);
+	trace_printk("Hit:      %ld\n", hit);
+
+	/* Convert time from usecs to millisecs */
+	do_div(time, USEC_PER_MSEC);
+	if (time)
+		hit /= (long)time;
+	else
+		trace_printk("TIME IS ZERO??\n");
+
+	trace_printk("Entries per millisec: %ld\n", hit);
+
+	if (hit) {
+		/* Calculate the average time in nanosecs */
+		avg = NSEC_PER_MSEC / hit;
+		trace_printk("%ld ns per entry\n", avg);
+	}
+
+	if (missed) {
+		if (time)
+			missed /= (long)time;
+
+		trace_printk("Total iterations per millisec: %ld\n",
+			     hit + missed);
+
+		/* it is possible that hit + missed will overflow and be zero */
+		if (!(hit + missed)) {
+			trace_printk("hit + missed overflowed and totalled zero!\n");
+			hit--; /* make it non zero */
+		}
+
+		/* Calculate the average time in nanosecs */
+		avg = NSEC_PER_MSEC / (hit + missed);
+		trace_printk("%ld ns per entry\n", avg);
+	}
+}
+
+static void wait_to_die(void)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+}
+
+static int ftrace_ring_buffer_consumer_thread(void *arg)
+{
+	while (!kthread_should_stop() && !kill_test) {
+		complete(&read_start);
+
+		ftrace_ring_buffer_consumer();
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (kthread_should_stop() || kill_test)
+			break;
+
+		schedule();
+		__set_current_state(TASK_RUNNING);
+	}
+	__set_current_state(TASK_RUNNING);
+
+	if (kill_test)
+		wait_to_die();
+
+	return 0;
+}
+
+static int ftrace_ring_buffer_producer_thread(void *arg)
+{
+	init_completion(&read_start);
+
+	while (!kthread_should_stop() && !kill_test) {
+		ftrace_ring_buffer_reset(buffer);
+
+		if (consumer) {
+			smp_wmb();
+			wake_up_process(consumer);
+			wait_for_completion(&read_start);
+		}
+
+		ftrace_ring_buffer_producer();
+
+		trace_printk("Sleeping for 10 secs\n");
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(HZ * SLEEP_TIME);
+		__set_current_state(TASK_RUNNING);
+	}
+
+	if (kill_test)
+		wait_to_die();
+
+	return 0;
+}
+
+static int __init ftrace_ring_buffer_benchmark_init(void)
+{
+	int ret;
+
+	/* make a one meg buffer in overwrite mode */
+	buffer = ftrace_ring_buffer_alloc(1000000, RB_FL_OVERWRITE);
+	if (!buffer)
+		return -ENOMEM;
+
+	if (!disable_reader) {
+		consumer = kthread_create(ftrace_ring_buffer_consumer_thread,
+					  NULL, "rb_consumer");
+		ret = PTR_ERR(consumer);
+		if (IS_ERR(consumer))
+			goto out_fail;
+	}
+
+	producer = kthread_run(ftrace_ring_buffer_producer_thread,
+			       NULL, "rb_producer");
+	ret = PTR_ERR(producer);
+
+	if (IS_ERR(producer))
+		goto out_kill;
+
+	/*
+	 * Run them as low-prio background tasks by default:
+	 */
+	if (!disable_reader) {
+		if (consumer_fifo >= 0) {
+			struct sched_param param = {
+				.sched_priority = consumer_fifo
+			};
+			sched_setscheduler(consumer, SCHED_FIFO, &param);
+		} else
+			set_user_nice(consumer, consumer_nice);
+	}
+
+	if (producer_fifo >= 0) {
+		struct sched_param param = {
+			.sched_priority = producer_fifo
+		};
+		sched_setscheduler(producer, SCHED_FIFO, &param);
+	} else
+		set_user_nice(producer, producer_nice);
+
+	return 0;
+
+ out_kill:
+	if (consumer)
+		kthread_stop(consumer);
+
+ out_fail:
+	ftrace_ring_buffer_free(buffer);
+	return ret;
+}
+
+static void __exit ftrace_ring_buffer_benchmark_exit(void)
+{
+	kthread_stop(producer);
+	if (consumer)
+		kthread_stop(consumer);
+	ftrace_ring_buffer_free(buffer);
+}
+
+module_init(ftrace_ring_buffer_benchmark_init);
+module_exit(ftrace_ring_buffer_benchmark_exit);
+
+MODULE_AUTHOR("Steven Rostedt");
+MODULE_DESCRIPTION("ftrace_ring_buffer_benchmark");
+MODULE_LICENSE("GPL");
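
As a sanity check on the figures the renamed benchmark prints: hit is first
converted to entries per millisecond (do_div(time, USEC_PER_MSEC); hit /= time),
and the per-entry cost is then NSEC_PER_MSEC / hit. So a hypothetical run that
commits 10,000,000 events during the 10 s hammer phase reports roughly 1,000
entries per millisec and 1,000,000 / 1,000 = 1,000 ns per entry; these figures
are illustrative, not measured results.
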
Index: linux.trees.git/kernel/trace/ring_buffer_benchmark.c
===================================================================
--- linux.trees.git.orig/kernel/trace/ring_buffer_benchmark.c	2010-07-09 18:08:14.000000000 -0400
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,488 +0,0 @@
-/*
- * ring buffer tester and benchmark
- *
- * Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
- */
-#include <linux/ring_buffer.h>
-#include <linux/completion.h>
-#include <linux/kthread.h>
-#include <linux/module.h>
-#include <linux/time.h>
-#include <asm/local.h>
-
-struct rb_page {
-	u64		ts;
-	local_t		commit;
-	char		data[4080];
-};
-
-/* run time and sleep time in seconds */
-#define RUN_TIME	10
-#define SLEEP_TIME	10
-
-/* number of events for writer to wake up the reader */
-static int wakeup_interval = 100;
-
-static int reader_finish;
-static struct completion read_start;
-static struct completion read_done;
-
-static struct ring_buffer *buffer;
-static struct task_struct *producer;
-static struct task_struct *consumer;
-static unsigned long read;
-
-static int disable_reader;
-module_param(disable_reader, uint, 0644);
-MODULE_PARM_DESC(disable_reader, "only run producer");
-
-static int write_iteration = 50;
-module_param(write_iteration, uint, 0644);
-MODULE_PARM_DESC(write_iteration, "# of writes between timestamp readings");
-
-static int producer_nice = 19;
-static int consumer_nice = 19;
-
-static int producer_fifo = -1;
-static int consumer_fifo = -1;
-
-module_param(producer_nice, uint, 0644);
-MODULE_PARM_DESC(producer_nice, "nice prio for producer");
-
-module_param(consumer_nice, uint, 0644);
-MODULE_PARM_DESC(consumer_nice, "nice prio for consumer");
-
-module_param(producer_fifo, uint, 0644);
-MODULE_PARM_DESC(producer_fifo, "fifo prio for producer");
-
-module_param(consumer_fifo, uint, 0644);
-MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
-
-static int read_events;
-
-static int kill_test;
-
-#define KILL_TEST()				\
-	do {					\
-		if (!kill_test) {		\
-			kill_test = 1;		\
-			WARN_ON(1);		\
-		}				\
-	} while (0)
-
-enum event_status {
-	EVENT_FOUND,
-	EVENT_DROPPED,
-};
-
-static enum event_status read_event(int cpu)
-{
-	struct ring_buffer_event *event;
-	int *entry;
-	u64 ts;
-
-	event = ring_buffer_consume(buffer, cpu, &ts, NULL);
-	if (!event)
-		return EVENT_DROPPED;
-
-	entry = ring_buffer_event_data(event);
-	if (*entry != cpu) {
-		KILL_TEST();
-		return EVENT_DROPPED;
-	}
-
-	read++;
-	return EVENT_FOUND;
-}
-
-static enum event_status read_page(int cpu)
-{
-	struct ring_buffer_event *event;
-	struct rb_page *rpage;
-	unsigned long commit;
-	void *bpage;
-	int *entry;
-	int ret;
-	int inc;
-	int i;
-
-	bpage = ring_buffer_alloc_read_page(buffer);
-	if (!bpage)
-		return EVENT_DROPPED;
-
-	ret = ring_buffer_read_page(buffer, &bpage, PAGE_SIZE, cpu, 1);
-	if (ret >= 0) {
-		rpage = bpage;
-		/* The commit may have missed event flags set, clear them */
-		commit = local_read(&rpage->commit) & 0xfffff;
-		for (i = 0; i < commit && !kill_test; i += inc) {
-
-			if (i >= (PAGE_SIZE - offsetof(struct rb_page, data))) {
-				KILL_TEST();
-				break;
-			}
-
-			inc = -1;
-			event = (void *)&rpage->data[i];
-			switch (event->type_len) {
-			case RINGBUF_TYPE_PADDING:
-				/* failed writes may be discarded events */
-				if (!event->time_delta)
-					KILL_TEST();
-				inc = event->array[0] + 4;
-				break;
-			case RINGBUF_TYPE_TIME_EXTEND:
-				inc = 8;
-				break;
-			case 0:
-				entry = ring_buffer_event_data(event);
-				if (*entry != cpu) {
-					KILL_TEST();
-					break;
-				}
-				read++;
-				if (!event->array[0]) {
-					KILL_TEST();
-					break;
-				}
-				inc = event->array[0] + 4;
-				break;
-			default:
-				entry = ring_buffer_event_data(event);
-				if (*entry != cpu) {
-					KILL_TEST();
-					break;
-				}
-				read++;
-				inc = ((event->type_len + 1) * 4);
-			}
-			if (kill_test)
-				break;
-
-			if (inc <= 0) {
-				KILL_TEST();
-				break;
-			}
-		}
-	}
-	ring_buffer_free_read_page(buffer, bpage);
-
-	if (ret < 0)
-		return EVENT_DROPPED;
-	return EVENT_FOUND;
-}
-
-static void ring_buffer_consumer(void)
-{
-	/* toggle between reading pages and events */
-	read_events ^= 1;
-
-	read = 0;
-	while (!reader_finish && !kill_test) {
-		int found;
-
-		do {
-			int cpu;
-
-			found = 0;
-			for_each_online_cpu(cpu) {
-				enum event_status stat;
-
-				if (read_events)
-					stat = read_event(cpu);
-				else
-					stat = read_page(cpu);
-
-				if (kill_test)
-					break;
-				if (stat == EVENT_FOUND)
-					found = 1;
-			}
-		} while (found && !kill_test);
-
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (reader_finish)
-			break;
-
-		schedule();
-		__set_current_state(TASK_RUNNING);
-	}
-	reader_finish = 0;
-	complete(&read_done);
-}
-
-static void ring_buffer_producer(void)
-{
-	struct timeval start_tv;
-	struct timeval end_tv;
-	unsigned long long time;
-	unsigned long long entries;
-	unsigned long long overruns;
-	unsigned long missed = 0;
-	unsigned long hit = 0;
-	unsigned long avg;
-	int cnt = 0;
-
-	/*
-	 * Hammer the buffer for 10 secs (this may
-	 * make the system stall)
-	 */
-	trace_printk("Starting ring buffer hammer\n");
-	do_gettimeofday(&start_tv);
-	do {
-		struct ring_buffer_event *event;
-		int *entry;
-		int i;
-
-		for (i = 0; i < write_iteration; i++) {
-			event = ring_buffer_lock_reserve(buffer, 10);
-			if (!event) {
-				missed++;
-			} else {
-				hit++;
-				entry = ring_buffer_event_data(event);
-				*entry = smp_processor_id();
-				ring_buffer_unlock_commit(buffer, event);
-			}
-		}
-		do_gettimeofday(&end_tv);
-
-		cnt++;
-		if (consumer && !(cnt % wakeup_interval))
-			wake_up_process(consumer);
-
-#ifndef CONFIG_PREEMPT
-		/*
-		 * If we are a non preempt kernel, the 10 second run will
-		 * stop everything while it runs. Instead, we will call
-		 * cond_resched and also add any time that was lost by a
-		 * rescedule.
-		 *
-		 * Do a cond resched at the same frequency we would wake up
-		 * the reader.
-		 */
-		if (cnt % wakeup_interval)
-			cond_resched();
-#endif
-
-	} while (end_tv.tv_sec < (start_tv.tv_sec + RUN_TIME) && !kill_test);
-	trace_printk("End ring buffer hammer\n");
-
-	if (consumer) {
-		/* Init both completions here to avoid races */
-		init_completion(&read_start);
-		init_completion(&read_done);
-		/* the completions must be visible before the finish var */
-		smp_wmb();
-		reader_finish = 1;
-		/* finish var visible before waking up the consumer */
-		smp_wmb();
-		wake_up_process(consumer);
-		wait_for_completion(&read_done);
-	}
-
-	time = end_tv.tv_sec - start_tv.tv_sec;
-	time *= USEC_PER_SEC;
-	time += (long long)((long)end_tv.tv_usec - (long)start_tv.tv_usec);
-
-	entries = ring_buffer_entries(buffer);
-	overruns = ring_buffer_overruns(buffer);
-
-	if (kill_test)
-		trace_printk("ERROR!\n");
-
-	if (!disable_reader) {
-		if (consumer_fifo < 0)
-			trace_printk("Running Consumer at nice: %d\n",
-				     consumer_nice);
-		else
-			trace_printk("Running Consumer at SCHED_FIFO %d\n",
-				     consumer_fifo);
-	}
-	if (producer_fifo < 0)
-		trace_printk("Running Producer at nice: %d\n",
-			     producer_nice);
-	else
-		trace_printk("Running Producer at SCHED_FIFO %d\n",
-			     producer_fifo);
-
-	/* Let the user know that the test is running at low priority */
-	if (producer_fifo < 0 && consumer_fifo < 0 &&
-	    producer_nice == 19 && consumer_nice == 19)
-		trace_printk("WARNING!!! This test is running at lowest priority.\n");
-
-	trace_printk("Time:     %lld (usecs)\n", time);
-	trace_printk("Overruns: %lld\n", overruns);
-	if (disable_reader)
-		trace_printk("Read:     (reader disabled)\n");
-	else
-		trace_printk("Read:     %ld  (by %s)\n", read,
-			read_events ? "events" : "pages");
-	trace_printk("Entries:  %lld\n", entries);
-	trace_printk("Total:    %lld\n", entries + overruns + read);
-	trace_printk("Missed:   %ld\n", missed);
-	trace_printk("Hit:      %ld\n", hit);
-
-	/* Convert time from usecs to millisecs */
-	do_div(time, USEC_PER_MSEC);
-	if (time)
-		hit /= (long)time;
-	else
-		trace_printk("TIME IS ZERO??\n");
-
-	trace_printk("Entries per millisec: %ld\n", hit);
-
-	if (hit) {
-		/* Calculate the average time in nanosecs */
-		avg = NSEC_PER_MSEC / hit;
-		trace_printk("%ld ns per entry\n", avg);
-	}
-
-	if (missed) {
-		if (time)
-			missed /= (long)time;
-
-		trace_printk("Total iterations per millisec: %ld\n",
-			     hit + missed);
-
-		/* it is possible that hit + missed will overflow and be zero */
-		if (!(hit + missed)) {
-			trace_printk("hit + missed overflowed and totalled zero!\n");
-			hit--; /* make it non zero */
-		}
-
-		/* Caculate the average time in nanosecs */
-		avg = NSEC_PER_MSEC / (hit + missed);
-		trace_printk("%ld ns per entry\n", avg);
-	}
-}
-
-static void wait_to_die(void)
-{
-	set_current_state(TASK_INTERRUPTIBLE);
-	while (!kthread_should_stop()) {
-		schedule();
-		set_current_state(TASK_INTERRUPTIBLE);
-	}
-	__set_current_state(TASK_RUNNING);
-}
-
-static int ring_buffer_consumer_thread(void *arg)
-{
-	while (!kthread_should_stop() && !kill_test) {
-		complete(&read_start);
-
-		ring_buffer_consumer();
-
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (kthread_should_stop() || kill_test)
-			break;
-
-		schedule();
-		__set_current_state(TASK_RUNNING);
-	}
-	__set_current_state(TASK_RUNNING);
-
-	if (kill_test)
-		wait_to_die();
-
-	return 0;
-}
-
-static int ring_buffer_producer_thread(void *arg)
-{
-	init_completion(&read_start);
-
-	while (!kthread_should_stop() && !kill_test) {
-		ring_buffer_reset(buffer);
-
-		if (consumer) {
-			smp_wmb();
-			wake_up_process(consumer);
-			wait_for_completion(&read_start);
-		}
-
-		ring_buffer_producer();
-
-		trace_printk("Sleeping for 10 secs\n");
-		set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout(HZ * SLEEP_TIME);
-		__set_current_state(TASK_RUNNING);
-	}
-
-	if (kill_test)
-		wait_to_die();
-
-	return 0;
-}
-
-static int __init ring_buffer_benchmark_init(void)
-{
-	int ret;
-
-	/* make a one meg buffer in overwite mode */
-	buffer = ring_buffer_alloc(1000000, RB_FL_OVERWRITE);
-	if (!buffer)
-		return -ENOMEM;
-
-	if (!disable_reader) {
-		consumer = kthread_create(ring_buffer_consumer_thread,
-					  NULL, "rb_consumer");
-		ret = PTR_ERR(consumer);
-		if (IS_ERR(consumer))
-			goto out_fail;
-	}
-
-	producer = kthread_run(ring_buffer_producer_thread,
-			       NULL, "rb_producer");
-	ret = PTR_ERR(producer);
-
-	if (IS_ERR(producer))
-		goto out_kill;
-
-	/*
-	 * Run them as low-prio background tasks by default:
-	 */
-	if (!disable_reader) {
-		if (consumer_fifo >= 0) {
-			struct sched_param param = {
-				.sched_priority = consumer_fifo
-			};
-			sched_setscheduler(consumer, SCHED_FIFO, &param);
-		} else
-			set_user_nice(consumer, consumer_nice);
-	}
-
-	if (producer_fifo >= 0) {
-		struct sched_param param = {
-			.sched_priority = consumer_fifo
-		};
-		sched_setscheduler(producer, SCHED_FIFO, &param);
-	} else
-		set_user_nice(producer, producer_nice);
-
-	return 0;
-
- out_kill:
-	if (consumer)
-		kthread_stop(consumer);
-
- out_fail:
-	ring_buffer_free(buffer);
-	return ret;
-}
-
-static void __exit ring_buffer_benchmark_exit(void)
-{
-	kthread_stop(producer);
-	if (consumer)
-		kthread_stop(consumer);
-	ring_buffer_free(buffer);
-}
-
-module_init(ring_buffer_benchmark_init);
-module_exit(ring_buffer_benchmark_exit);
-
-MODULE_AUTHOR("Steven Rostedt");
-MODULE_DESCRIPTION("ring_buffer_benchmark");
-MODULE_LICENSE("GPL");
Index: linux.trees.git/kernel/trace/trace.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace.c	2010-07-09 18:08:47.000000000 -0400
@@ -11,7 +11,7 @@
  *  Copyright (C) 2004-2006 Ingo Molnar
  *  Copyright (C) 2004 William Lee Irwin III
  */
-#include <linux/ring_buffer.h>
+#include <linux/ftrace_ring_buffer.h>
 #include <generated/utsrelease.h>
 #include <linux/stacktrace.h>
 #include <linux/writeback.h>
@@ -48,7 +48,7 @@
  * On boot up, the ring buffer is set to the minimum size, so that
  * we do not waste memory on systems that are not using tracing.
  */
-int ring_buffer_expanded;
+int ftrace_ring_buffer_expanded;
 
 /*
  * We need to change this state when a selftest is running.
@@ -135,7 +135,7 @@ static int __init set_cmdline_ftrace(cha
 	strncpy(bootup_tracer_buf, str, MAX_TRACER_SIZE);
 	default_bootup_tracer = bootup_tracer_buf;
 	/* We are using ftrace early, expand it */
-	ring_buffer_expanded = 1;
+	ftrace_ring_buffer_expanded = 1;
 	return 1;
 }
 __setup("ftrace=", set_cmdline_ftrace);
@@ -179,9 +179,9 @@ static struct trace_array	global_trace;
 
 static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);
 
-int filter_current_check_discard(struct ring_buffer *buffer,
+int filter_current_check_discard(struct ftrace_ring_buffer *buffer,
 				 struct ftrace_event_call *call, void *rec,
-				 struct ring_buffer_event *event)
+				 struct ftrace_ring_buffer_event *event)
 {
 	return filter_check_discard(call, rec, buffer, event);
 }
@@ -195,8 +195,8 @@ cycle_t ftrace_now(int cpu)
 	if (!global_trace.buffer)
 		return trace_clock_local();
 
-	ts = ring_buffer_time_stamp(global_trace.buffer, cpu);
-	ring_buffer_normalize_time_stamp(global_trace.buffer, cpu, &ts);
+	ts = ftrace_ring_buffer_time_stamp(global_trace.buffer, cpu);
+	ftrace_ring_buffer_normalize_time_stamp(global_trace.buffer, cpu, &ts);
 
 	return ts;
 }
@@ -260,7 +260,7 @@ static DEFINE_MUTEX(trace_types_lock);
  * serialize the access of the ring buffer
  *
  * ring buffer serializes readers, but it is low level protection.
- * The validity of the events (which returns by ring_buffer_peek() ..etc)
+ * The validity of the events (which returns by ftrace_ring_buffer_peek() ..etc)
  * are not protected by ring buffer.
  *
  * The content of events may become garbage if we allow other process consumes
@@ -653,7 +653,7 @@ __update_max_tr(struct trace_array *tr,
 void
 update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
 {
-	struct ring_buffer *buf = tr->buffer;
+	struct ftrace_ring_buffer *buf = tr->buffer;
 
 	if (trace_stop_count)
 		return;
@@ -689,7 +689,7 @@ update_max_tr_single(struct trace_array
 
 	ftrace_disable_cpu();
 
-	ret = ring_buffer_swap_cpu(max_tr.buffer, tr->buffer, cpu);
+	ret = ftrace_ring_buffer_swap_cpu(max_tr.buffer, tr->buffer, cpu);
 
 	if (ret == -EBUSY) {
 		/*
@@ -852,32 +852,32 @@ out:
 	mutex_unlock(&trace_types_lock);
 }
 
-static void __tracing_reset(struct ring_buffer *buffer, int cpu)
+static void __tracing_reset(struct ftrace_ring_buffer *buffer, int cpu)
 {
 	ftrace_disable_cpu();
-	ring_buffer_reset_cpu(buffer, cpu);
+	ftrace_ring_buffer_reset_cpu(buffer, cpu);
 	ftrace_enable_cpu();
 }
 
 void tracing_reset(struct trace_array *tr, int cpu)
 {
-	struct ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
 
-	ring_buffer_record_disable(buffer);
+	ftrace_ring_buffer_record_disable(buffer);
 
 	/* Make sure all commits have finished */
 	synchronize_sched();
 	__tracing_reset(buffer, cpu);
 
-	ring_buffer_record_enable(buffer);
+	ftrace_ring_buffer_record_enable(buffer);
 }
 
 void tracing_reset_online_cpus(struct trace_array *tr)
 {
-	struct ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
 	int cpu;
 
-	ring_buffer_record_disable(buffer);
+	ftrace_ring_buffer_record_disable(buffer);
 
 	/* Make sure all commits have finished */
 	synchronize_sched();
@@ -887,7 +887,7 @@ void tracing_reset_online_cpus(struct tr
 	for_each_online_cpu(cpu)
 		__tracing_reset(buffer, cpu);
 
-	ring_buffer_record_enable(buffer);
+	ftrace_ring_buffer_record_enable(buffer);
 }
 
 void tracing_reset_current(int cpu)
@@ -946,7 +946,7 @@ void ftrace_off_permanent(void)
  */
 void tracing_start(void)
 {
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer *buffer;
 	unsigned long flags;
 
 	if (tracing_disabled)
@@ -967,11 +967,11 @@ void tracing_start(void)
 
 	buffer = global_trace.buffer;
 	if (buffer)
-		ring_buffer_record_enable(buffer);
+		ftrace_ring_buffer_record_enable(buffer);
 
 	buffer = max_tr.buffer;
 	if (buffer)
-		ring_buffer_record_enable(buffer);
+		ftrace_ring_buffer_record_enable(buffer);
 
 	arch_spin_unlock(&ftrace_max_lock);
 
@@ -988,7 +988,7 @@ void tracing_start(void)
  */
 void tracing_stop(void)
 {
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer *buffer;
 	unsigned long flags;
 
 	ftrace_stop();
@@ -1001,11 +1001,11 @@ void tracing_stop(void)
 
 	buffer = global_trace.buffer;
 	if (buffer)
-		ring_buffer_record_disable(buffer);
+		ftrace_ring_buffer_record_disable(buffer);
 
 	buffer = max_tr.buffer;
 	if (buffer)
-		ring_buffer_record_disable(buffer);
+		ftrace_ring_buffer_record_disable(buffer);
 
 	arch_spin_unlock(&ftrace_max_lock);
 
@@ -1117,17 +1117,17 @@ tracing_generic_entry_update(struct trac
 }
 EXPORT_SYMBOL_GPL(tracing_generic_entry_update);
 
-struct ring_buffer_event *
-trace_buffer_lock_reserve(struct ring_buffer *buffer,
+struct ftrace_ring_buffer_event *
+trace_buffer_lock_reserve(struct ftrace_ring_buffer *buffer,
 			  int type,
 			  unsigned long len,
 			  unsigned long flags, int pc)
 {
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 
-	event = ring_buffer_lock_reserve(buffer, len);
+	event = ftrace_ring_buffer_lock_reserve(buffer, len);
 	if (event != NULL) {
-		struct trace_entry *ent = ring_buffer_event_data(event);
+		struct trace_entry *ent = ftrace_ring_buffer_event_data(event);
 
 		tracing_generic_entry_update(ent, flags, pc);
 		ent->type = type;
@@ -1137,12 +1137,12 @@ trace_buffer_lock_reserve(struct ring_bu
 }
 
 static inline void
-__trace_buffer_unlock_commit(struct ring_buffer *buffer,
-			     struct ring_buffer_event *event,
+__trace_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+			     struct ftrace_ring_buffer_event *event,
 			     unsigned long flags, int pc,
 			     int wake)
 {
-	ring_buffer_unlock_commit(buffer, event);
+	ftrace_ring_buffer_unlock_commit(buffer, event);
 
 	ftrace_trace_stack(buffer, flags, 6, pc);
 	ftrace_trace_userstack(buffer, flags, pc);
@@ -1151,15 +1151,15 @@ __trace_buffer_unlock_commit(struct ring
 		trace_wake_up();
 }
 
-void trace_buffer_unlock_commit(struct ring_buffer *buffer,
-				struct ring_buffer_event *event,
+void trace_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+				struct ftrace_ring_buffer_event *event,
 				unsigned long flags, int pc)
 {
 	__trace_buffer_unlock_commit(buffer, event, flags, pc, 1);
 }
 
-struct ring_buffer_event *
-trace_current_buffer_lock_reserve(struct ring_buffer **current_rb,
+struct ftrace_ring_buffer_event *
+trace_current_buffer_lock_reserve(struct ftrace_ring_buffer **current_rb,
 				  int type, unsigned long len,
 				  unsigned long flags, int pc)
 {
@@ -1169,26 +1169,26 @@ trace_current_buffer_lock_reserve(struct
 }
 EXPORT_SYMBOL_GPL(trace_current_buffer_lock_reserve);
 
-void trace_current_buffer_unlock_commit(struct ring_buffer *buffer,
-					struct ring_buffer_event *event,
+void trace_current_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+					struct ftrace_ring_buffer_event *event,
 					unsigned long flags, int pc)
 {
 	__trace_buffer_unlock_commit(buffer, event, flags, pc, 1);
 }
 EXPORT_SYMBOL_GPL(trace_current_buffer_unlock_commit);
 
-void trace_nowake_buffer_unlock_commit(struct ring_buffer *buffer,
-				       struct ring_buffer_event *event,
+void trace_nowake_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+				       struct ftrace_ring_buffer_event *event,
 				       unsigned long flags, int pc)
 {
 	__trace_buffer_unlock_commit(buffer, event, flags, pc, 0);
 }
 EXPORT_SYMBOL_GPL(trace_nowake_buffer_unlock_commit);
 
-void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
-					 struct ring_buffer_event *event)
+void trace_current_buffer_discard_commit(struct ftrace_ring_buffer *buffer,
+					 struct ftrace_ring_buffer_event *event)
 {
-	ring_buffer_discard_commit(buffer, event);
+	ftrace_ring_buffer_discard_commit(buffer, event);
 }
 EXPORT_SYMBOL_GPL(trace_current_buffer_discard_commit);
 
@@ -1198,8 +1198,8 @@ trace_function(struct trace_array *tr,
 	       int pc)
 {
 	struct ftrace_event_call *call = &event_function;
-	struct ring_buffer *buffer = tr->buffer;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer_event *event;
 	struct ftrace_entry *entry;
 
 	/* If we are reading the ring buffer, don't trace */
@@ -1210,12 +1210,12 @@ trace_function(struct trace_array *tr,
 					  flags, pc);
 	if (!event)
 		return;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->ip			= ip;
 	entry->parent_ip		= parent_ip;
 
 	if (!filter_check_discard(call, entry, buffer, event))
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 }
 
 void
@@ -1228,12 +1228,12 @@ ftrace(struct trace_array *tr, struct tr
 }
 
 #ifdef CONFIG_STACKTRACE
-static void __ftrace_trace_stack(struct ring_buffer *buffer,
+static void __ftrace_trace_stack(struct ftrace_ring_buffer *buffer,
 				 unsigned long flags,
 				 int skip, int pc)
 {
 	struct ftrace_event_call *call = &event_kernel_stack;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct stack_entry *entry;
 	struct stack_trace trace;
 
@@ -1241,7 +1241,7 @@ static void __ftrace_trace_stack(struct
 					  sizeof(*entry), flags, pc);
 	if (!event)
 		return;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	memset(&entry->caller, 0, sizeof(entry->caller));
 
 	trace.nr_entries	= 0;
@@ -1251,10 +1251,10 @@ static void __ftrace_trace_stack(struct
 
 	save_stack_trace(&trace);
 	if (!filter_check_discard(call, entry, buffer, event))
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 }
 
-void ftrace_trace_stack(struct ring_buffer *buffer, unsigned long flags,
+void ftrace_trace_stack(struct ftrace_ring_buffer *buffer, unsigned long flags,
 			int skip, int pc)
 {
 	if (!(trace_flags & TRACE_ITER_STACKTRACE))
@@ -1286,10 +1286,10 @@ void trace_dump_stack(void)
 }
 
 void
-ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
+ftrace_trace_userstack(struct ftrace_ring_buffer *buffer, unsigned long flags, int pc)
 {
 	struct ftrace_event_call *call = &event_user_stack;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct userstack_entry *entry;
 	struct stack_trace trace;
 
@@ -1307,7 +1307,7 @@ ftrace_trace_userstack(struct ring_buffe
 					  sizeof(*entry), flags, pc);
 	if (!event)
 		return;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 
 	entry->tgid		= current->tgid;
 	memset(&entry->caller, 0, sizeof(entry->caller));
@@ -1319,7 +1319,7 @@ ftrace_trace_userstack(struct ring_buffe
 
 	save_stack_trace_user(&trace);
 	if (!filter_check_discard(call, entry, buffer, event))
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 }
 
 #ifdef UNUSED
@@ -1337,16 +1337,16 @@ ftrace_trace_special(void *__tr,
 		     int pc)
 {
 	struct ftrace_event_call *call = &event_special;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct trace_array *tr = __tr;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
 	struct special_entry *entry;
 
 	event = trace_buffer_lock_reserve(buffer, TRACE_SPECIAL,
 					  sizeof(*entry), 0, pc);
 	if (!event)
 		return;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->arg1			= arg1;
 	entry->arg2			= arg2;
 	entry->arg3			= arg3;
@@ -1397,8 +1397,8 @@ int trace_vbprintk(unsigned long ip, con
 	static u32 trace_buf[TRACE_BUF_SIZE];
 
 	struct ftrace_event_call *call = &event_bprint;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer;
 	struct trace_array *tr = &global_trace;
 	struct trace_array_cpu *data;
 	struct bprint_entry *entry;
@@ -1435,13 +1435,13 @@ int trace_vbprintk(unsigned long ip, con
 					  flags, pc);
 	if (!event)
 		goto out_unlock;
-	entry = ring_buffer_event_data(event);
+	entry = ftrace_ring_buffer_event_data(event);
 	entry->ip			= ip;
 	entry->fmt			= fmt;
 
 	memcpy(entry->buf, trace_buf, sizeof(u32) * len);
 	if (!filter_check_discard(call, entry, buffer, event)) {
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 		ftrace_trace_stack(buffer, flags, 6, pc);
 	}
 
@@ -1480,8 +1480,8 @@ int trace_array_vprintk(struct trace_arr
 	static char trace_buf[TRACE_BUF_SIZE];
 
 	struct ftrace_event_call *call = &event_print;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer;
 	struct trace_array_cpu *data;
 	int cpu, len = 0, size, pc;
 	struct print_entry *entry;
@@ -1511,13 +1511,13 @@ int trace_array_vprintk(struct trace_arr
 					  irq_flags, pc);
 	if (!event)
 		goto out_unlock;
-	entry = ring_buffer_event_data(event);
+	entry = ftrace_ring_buffer_event_data(event);
 	entry->ip = ip;
 
 	memcpy(&entry->buf, trace_buf, len);
 	entry->buf[len] = '\0';
 	if (!filter_check_discard(call, entry, buffer, event)) {
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 		ftrace_trace_stack(buffer, irq_flags, 6, pc);
 	}
 
@@ -1550,7 +1550,7 @@ static void trace_iterator_increment(str
 
 	iter->idx++;
 	if (iter->buffer_iter[iter->cpu])
-		ring_buffer_read(iter->buffer_iter[iter->cpu], NULL);
+		ftrace_ring_buffer_read(iter->buffer_iter[iter->cpu], NULL);
 
 	ftrace_enable_cpu();
 }
@@ -1559,28 +1559,28 @@ static struct trace_entry *
 peek_next_entry(struct trace_iterator *iter, int cpu, u64 *ts,
 		unsigned long *lost_events)
 {
-	struct ring_buffer_event *event;
-	struct ring_buffer_iter *buf_iter = iter->buffer_iter[cpu];
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer_iter *buf_iter = iter->buffer_iter[cpu];
 
 	/* Don't allow ftrace to trace into the ring buffers */
 	ftrace_disable_cpu();
 
 	if (buf_iter)
-		event = ring_buffer_iter_peek(buf_iter, ts);
+		event = ftrace_ring_buffer_iter_peek(buf_iter, ts);
 	else
-		event = ring_buffer_peek(iter->tr->buffer, cpu, ts,
+		event = ftrace_ring_buffer_peek(iter->tr->buffer, cpu, ts,
 					 lost_events);
 
 	ftrace_enable_cpu();
 
-	return event ? ring_buffer_event_data(event) : NULL;
+	return event ? ftrace_ring_buffer_event_data(event) : NULL;
 }
 
 static struct trace_entry *
 __find_next_entry(struct trace_iterator *iter, int *ent_cpu,
 		  unsigned long *missing_events, u64 *ent_ts)
 {
-	struct ring_buffer *buffer = iter->tr->buffer;
+	struct ftrace_ring_buffer *buffer = iter->tr->buffer;
 	struct trace_entry *ent, *next = NULL;
 	unsigned long lost_events = 0, next_lost = 0;
 	int cpu_file = iter->cpu_file;
@@ -1593,7 +1593,7 @@ __find_next_entry(struct trace_iterator
 	 * all cpu and peek directly.
 	 */
 	if (cpu_file > TRACE_PIPE_ALL_CPU) {
-		if (ring_buffer_empty_cpu(buffer, cpu_file))
+		if (ftrace_ring_buffer_empty_cpu(buffer, cpu_file))
 			return NULL;
 		ent = peek_next_entry(iter, cpu_file, ent_ts, missing_events);
 		if (ent_cpu)
@@ -1604,7 +1604,7 @@ __find_next_entry(struct trace_iterator
 
 	for_each_tracing_cpu(cpu) {
 
-		if (ring_buffer_empty_cpu(buffer, cpu))
+		if (ftrace_ring_buffer_empty_cpu(buffer, cpu))
 			continue;
 
 		ent = peek_next_entry(iter, cpu, &ts, &lost_events);
@@ -1655,7 +1655,7 @@ static void trace_consume(struct trace_i
 {
 	/* Don't allow ftrace to trace into the ring buffers */
 	ftrace_disable_cpu();
-	ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts,
+	ftrace_ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts,
 			    &iter->lost_events);
 	ftrace_enable_cpu();
 }
@@ -1690,8 +1690,8 @@ static void *s_next(struct seq_file *m,
 static void tracing_iter_reset(struct trace_iterator *iter, int cpu)
 {
 	struct trace_array *tr = iter->tr;
-	struct ring_buffer_event *event;
-	struct ring_buffer_iter *buf_iter;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer_iter *buf_iter;
 	unsigned long entries = 0;
 	u64 ts;
 
@@ -1701,18 +1701,18 @@ static void tracing_iter_reset(struct tr
 		return;
 
 	buf_iter = iter->buffer_iter[cpu];
-	ring_buffer_iter_reset(buf_iter);
+	ftrace_ring_buffer_iter_reset(buf_iter);
 
 	/*
 	 * We could have the case with the max latency tracers
 	 * that a reset never took place on a cpu. This is evident
 	 * by the timestamp being before the start of the buffer.
 	 */
-	while ((event = ring_buffer_iter_peek(buf_iter, &ts))) {
+	while ((event = ftrace_ring_buffer_iter_peek(buf_iter, &ts))) {
 		if (ts >= iter->tr->time_start)
 			break;
 		entries++;
-		ring_buffer_read(buf_iter, NULL);
+		ftrace_ring_buffer_read(buf_iter, NULL);
 	}
 
 	tr->data[cpu]->skipped_entries = entries;
@@ -1825,7 +1825,7 @@ print_trace_header(struct seq_file *m, s
 
 
 	for_each_tracing_cpu(cpu) {
-		count = ring_buffer_entries_cpu(tr->buffer, cpu);
+		count = ftrace_ring_buffer_entries_cpu(tr->buffer, cpu);
 		/*
 		 * If this buffer has skipped entries, then we hold all
 		 * entries for the trace and we need to ignore the
@@ -1837,7 +1837,7 @@ print_trace_header(struct seq_file *m, s
 			total += count;
 		} else
 			total += count +
-				ring_buffer_overrun_cpu(tr->buffer, cpu);
+				ftrace_ring_buffer_overrun_cpu(tr->buffer, cpu);
 		entries += count;
 	}
 
@@ -2025,10 +2025,10 @@ int trace_empty(struct trace_iterator *i
 	if (iter->cpu_file != TRACE_PIPE_ALL_CPU) {
 		cpu = iter->cpu_file;
 		if (iter->buffer_iter[cpu]) {
-			if (!ring_buffer_iter_empty(iter->buffer_iter[cpu]))
+			if (!ftrace_ring_buffer_iter_empty(iter->buffer_iter[cpu]))
 				return 0;
 		} else {
-			if (!ring_buffer_empty_cpu(iter->tr->buffer, cpu))
+			if (!ftrace_ring_buffer_empty_cpu(iter->tr->buffer, cpu))
 				return 0;
 		}
 		return 1;
@@ -2036,10 +2036,10 @@ int trace_empty(struct trace_iterator *i
 
 	for_each_tracing_cpu(cpu) {
 		if (iter->buffer_iter[cpu]) {
-			if (!ring_buffer_iter_empty(iter->buffer_iter[cpu]))
+			if (!ftrace_ring_buffer_iter_empty(iter->buffer_iter[cpu]))
 				return 0;
 		} else {
-			if (!ring_buffer_empty_cpu(iter->tr->buffer, cpu))
+			if (!ftrace_ring_buffer_empty_cpu(iter->tr->buffer, cpu))
 				return 0;
 		}
 	}
@@ -2193,7 +2193,7 @@ __tracing_open(struct inode *inode, stru
 		iter->trace->open(iter);
 
 	/* Annotate start of buffers if we had overruns */
-	if (ring_buffer_overruns(iter->tr->buffer))
+	if (ftrace_ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
 	/* stop the trace while dumping */
@@ -2202,19 +2202,19 @@ __tracing_open(struct inode *inode, stru
 	if (iter->cpu_file == TRACE_PIPE_ALL_CPU) {
 		for_each_tracing_cpu(cpu) {
 			iter->buffer_iter[cpu] =
-				ring_buffer_read_prepare(iter->tr->buffer, cpu);
+				ftrace_ring_buffer_read_prepare(iter->tr->buffer, cpu);
 		}
-		ring_buffer_read_prepare_sync();
+		ftrace_ring_buffer_read_prepare_sync();
 		for_each_tracing_cpu(cpu) {
-			ring_buffer_read_start(iter->buffer_iter[cpu]);
+			ftrace_ring_buffer_read_start(iter->buffer_iter[cpu]);
 			tracing_iter_reset(iter, cpu);
 		}
 	} else {
 		cpu = iter->cpu_file;
 		iter->buffer_iter[cpu] =
-			ring_buffer_read_prepare(iter->tr->buffer, cpu);
-		ring_buffer_read_prepare_sync();
-		ring_buffer_read_start(iter->buffer_iter[cpu]);
+			ftrace_ring_buffer_read_prepare(iter->tr->buffer, cpu);
+		ftrace_ring_buffer_read_prepare_sync();
+		ftrace_ring_buffer_read_start(iter->buffer_iter[cpu]);
 		tracing_iter_reset(iter, cpu);
 	}
 
@@ -2234,7 +2234,7 @@ __tracing_open(struct inode *inode, stru
  fail_buffer:
 	for_each_tracing_cpu(cpu) {
 		if (iter->buffer_iter[cpu])
-			ring_buffer_read_finish(iter->buffer_iter[cpu]);
+			ftrace_ring_buffer_read_finish(iter->buffer_iter[cpu]);
 	}
 	free_cpumask_var(iter->started);
 	tracing_start();
@@ -2269,7 +2269,7 @@ static int tracing_release(struct inode
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
 		if (iter->buffer_iter[cpu])
-			ring_buffer_read_finish(iter->buffer_iter[cpu]);
+			ftrace_ring_buffer_read_finish(iter->buffer_iter[cpu]);
 	}
 
 	if (iter->trace && iter->trace->close)
@@ -2782,7 +2782,7 @@ int tracer_init(struct tracer *t, struct
 	return t->init(tr);
 }
 
-static int tracing_resize_ring_buffer(unsigned long size)
+static int tracing_resize_ftrace_ring_buffer(unsigned long size)
 {
 	int ret;
 
@@ -2791,17 +2791,17 @@ static int tracing_resize_ring_buffer(un
 	 * we use the size that was given, and we can forget about
 	 * expanding it later.
 	 */
-	ring_buffer_expanded = 1;
+	ftrace_ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ftrace_ring_buffer_resize(global_trace.buffer, size);
 	if (ret < 0)
 		return ret;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ftrace_ring_buffer_resize(max_tr.buffer, size);
 	if (ret < 0) {
 		int r;
 
-		r = ring_buffer_resize(global_trace.buffer,
+		r = ftrace_ring_buffer_resize(global_trace.buffer,
 				       global_trace.entries);
 		if (r < 0) {
 			/*
@@ -2844,8 +2844,8 @@ int tracing_update_buffers(void)
 	int ret = 0;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		ret = tracing_resize_ring_buffer(trace_buf_size);
+	if (!ftrace_ring_buffer_expanded)
+		ret = tracing_resize_ftrace_ring_buffer(trace_buf_size);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -2868,8 +2868,8 @@ static int tracing_set_tracer(const char
 
 	mutex_lock(&trace_types_lock);
 
-	if (!ring_buffer_expanded) {
-		ret = tracing_resize_ring_buffer(trace_buf_size);
+	if (!ftrace_ring_buffer_expanded) {
+		ret = tracing_resize_ftrace_ring_buffer(trace_buf_size);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3404,7 +3404,7 @@ tracing_entries_read(struct file *filp,
 	int r;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
+	if (!ftrace_ring_buffer_expanded)
 		r = sprintf(buf, "%lu (expanded: %lu)\n",
 			    tr->entries >> 10,
 			    trace_buf_size >> 10);
@@ -3455,7 +3455,7 @@ tracing_entries_write(struct file *filp,
 	val <<= 10;
 
 	if (val != global_trace.entries) {
-		ret = tracing_resize_ring_buffer(val);
+		ret = tracing_resize_ftrace_ring_buffer(val);
 		if (ret < 0) {
 			cnt = ret;
 			goto out;
@@ -3567,9 +3567,9 @@ static ssize_t tracing_clock_write(struc
 
 	mutex_lock(&trace_types_lock);
 
-	ring_buffer_set_clock(global_trace.buffer, trace_clocks[i].func);
+	ftrace_ring_buffer_set_clock(global_trace.buffer, trace_clocks[i].func);
 	if (max_tr.buffer)
-		ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
+		ftrace_ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
 	mutex_unlock(&trace_types_lock);
 
@@ -3672,7 +3672,7 @@ tracing_buffers_read(struct file *filp,
 		return 0;
 
 	if (!info->spare)
-		info->spare = ring_buffer_alloc_read_page(info->tr->buffer);
+		info->spare = ftrace_ring_buffer_alloc_read_page(info->tr->buffer);
 	if (!info->spare)
 		return -ENOMEM;
 
@@ -3683,7 +3683,7 @@ tracing_buffers_read(struct file *filp,
 	info->read = 0;
 
 	trace_access_lock(info->cpu);
-	ret = ring_buffer_read_page(info->tr->buffer,
+	ret = ftrace_ring_buffer_read_page(info->tr->buffer,
 				    &info->spare,
 				    count,
 				    info->cpu, 0);
@@ -3712,14 +3712,14 @@ static int tracing_buffers_release(struc
 	struct ftrace_buffer_info *info = file->private_data;
 
 	if (info->spare)
-		ring_buffer_free_read_page(info->tr->buffer, info->spare);
+		ftrace_ring_buffer_free_read_page(info->tr->buffer, info->spare);
 	kfree(info);
 
 	return 0;
 }
 
 struct buffer_ref {
-	struct ring_buffer	*buffer;
+	struct ftrace_ring_buffer	*buffer;
 	void			*page;
 	int			ref;
 };
@@ -3732,7 +3732,7 @@ static void buffer_pipe_buf_release(stru
 	if (--ref->ref)
 		return;
 
-	ring_buffer_free_read_page(ref->buffer, ref->page);
+	ftrace_ring_buffer_free_read_page(ref->buffer, ref->page);
 	kfree(ref);
 	buf->private = 0;
 }
@@ -3774,7 +3774,7 @@ static void buffer_spd_release(struct sp
 	if (--ref->ref)
 		return;
 
-	ring_buffer_free_read_page(ref->buffer, ref->page);
+	ftrace_ring_buffer_free_read_page(ref->buffer, ref->page);
 	kfree(ref);
 	spd->partial[i].private = 0;
 }
@@ -3817,7 +3817,7 @@ tracing_buffers_splice_read(struct file
 	}
 
 	trace_access_lock(info->cpu);
-	entries = ring_buffer_entries_cpu(info->tr->buffer, info->cpu);
+	entries = ftrace_ring_buffer_entries_cpu(info->tr->buffer, info->cpu);
 
 	for (i = 0; i < pipe->buffers && len && entries; i++, len -= PAGE_SIZE) {
 		struct page *page;
@@ -3829,16 +3829,16 @@ tracing_buffers_splice_read(struct file
 
 		ref->ref = 1;
 		ref->buffer = info->tr->buffer;
-		ref->page = ring_buffer_alloc_read_page(ref->buffer);
+		ref->page = ftrace_ring_buffer_alloc_read_page(ref->buffer);
 		if (!ref->page) {
 			kfree(ref);
 			break;
 		}
 
-		r = ring_buffer_read_page(ref->buffer, &ref->page,
+		r = ftrace_ring_buffer_read_page(ref->buffer, &ref->page,
 					  len, info->cpu, 1);
 		if (r < 0) {
-			ring_buffer_free_read_page(ref->buffer,
+			ftrace_ring_buffer_free_read_page(ref->buffer,
 						   ref->page);
 			kfree(ref);
 			break;
@@ -3848,7 +3848,7 @@ tracing_buffers_splice_read(struct file
 		 * zero out any left over data, this is going to
 		 * user land.
 		 */
-		size = ring_buffer_page_len(ref->page);
+		size = ftrace_ring_buffer_page_len(ref->page);
 		if (size < PAGE_SIZE)
 			memset(ref->page + size, 0, PAGE_SIZE - size);
 
@@ -3861,7 +3861,7 @@ tracing_buffers_splice_read(struct file
 		spd.nr_pages++;
 		*ppos += PAGE_SIZE;
 
-		entries = ring_buffer_entries_cpu(info->tr->buffer, info->cpu);
+		entries = ftrace_ring_buffer_entries_cpu(info->tr->buffer, info->cpu);
 	}
 
 	trace_access_unlock(info->cpu);
@@ -3906,13 +3906,13 @@ tracing_stats_read(struct file *filp, ch
 
 	trace_seq_init(s);
 
-	cnt = ring_buffer_entries_cpu(tr->buffer, cpu);
+	cnt = ftrace_ring_buffer_entries_cpu(tr->buffer, cpu);
 	trace_seq_printf(s, "entries: %ld\n", cnt);
 
-	cnt = ring_buffer_overrun_cpu(tr->buffer, cpu);
+	cnt = ftrace_ring_buffer_overrun_cpu(tr->buffer, cpu);
 	trace_seq_printf(s, "overrun: %ld\n", cnt);
 
-	cnt = ring_buffer_commit_overrun_cpu(tr->buffer, cpu);
+	cnt = ftrace_ring_buffer_commit_overrun_cpu(tr->buffer, cpu);
 	trace_seq_printf(s, "commit overrun: %ld\n", cnt);
 
 	count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->len);
@@ -4554,7 +4554,7 @@ __init static int tracer_alloc_buffers(v
 		goto out_free_buffer_mask;
 
 	/* To save memory, keep the ring buffer size to its minimum */
-	if (ring_buffer_expanded)
+	if (ftrace_ring_buffer_expanded)
 		ring_buf_size = trace_buf_size;
 	else
 		ring_buf_size = 1;
@@ -4563,26 +4563,26 @@ __init static int tracer_alloc_buffers(v
 	cpumask_copy(tracing_cpumask, cpu_all_mask);
 
 	/* TODO: make the number of buffers hot pluggable with CPUS */
-	global_trace.buffer = ring_buffer_alloc(ring_buf_size,
+	global_trace.buffer = ftrace_ring_buffer_alloc(ring_buf_size,
 						   TRACE_BUFFER_FLAGS);
 	if (!global_trace.buffer) {
 		printk(KERN_ERR "tracer: failed to allocate ring buffer!\n");
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
+	global_trace.entries = ftrace_ring_buffer_size(global_trace.buffer);
 
 
 #ifdef CONFIG_TRACER_MAX_TRACE
-	max_tr.buffer = ring_buffer_alloc(ring_buf_size,
+	max_tr.buffer = ftrace_ring_buffer_alloc(ring_buf_size,
 					     TRACE_BUFFER_FLAGS);
 	if (!max_tr.buffer) {
 		printk(KERN_ERR "tracer: failed to allocate max ring buffer!\n");
 		WARN_ON(1);
-		ring_buffer_free(global_trace.buffer);
+		ftrace_ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = ring_buffer_size(max_tr.buffer);
+	max_tr.entries = ftrace_ring_buffer_size(max_tr.buffer);
 	WARN_ON(max_tr.entries != global_trace.entries);
 #endif
 
Index: linux.trees.git/kernel/trace/trace.h
===================================================================
--- linux.trees.git.orig/kernel/trace/trace.h	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace.h	2010-07-09 18:08:47.000000000 -0400
@@ -5,7 +5,7 @@
 #include <asm/atomic.h>
 #include <linux/sched.h>
 #include <linux/clocksource.h>
-#include <linux/ring_buffer.h>
+#include <linux/ftrace_ring_buffer.h>
 #include <linux/mmiotrace.h>
 #include <linux/tracepoint.h>
 #include <linux/ftrace.h>
@@ -147,7 +147,7 @@ struct trace_array_cpu {
  * They have on/off state as well:
  */
 struct trace_array {
-	struct ring_buffer	*buffer;
+	struct ftrace_ring_buffer	*buffer;
 	unsigned long		entries;
 	int			cpu;
 	cycle_t			time_start;
@@ -300,16 +300,16 @@ struct dentry *trace_create_file(const c
 struct dentry *tracing_init_dentry(void);
 void init_tracer_sysprof_debugfs(struct dentry *d_tracer);
 
-struct ring_buffer_event;
+struct ftrace_ring_buffer_event;
 
-struct ring_buffer_event *
-trace_buffer_lock_reserve(struct ring_buffer *buffer,
+struct ftrace_ring_buffer_event *
+trace_buffer_lock_reserve(struct ftrace_ring_buffer *buffer,
 			  int type,
 			  unsigned long len,
 			  unsigned long flags,
 			  int pc);
-void trace_buffer_unlock_commit(struct ring_buffer *buffer,
-				struct ring_buffer_event *event,
+void trace_buffer_unlock_commit(struct ftrace_ring_buffer *buffer,
+				struct ftrace_ring_buffer_event *event,
 				unsigned long flags, int pc);
 
 struct trace_entry *tracing_get_trace_entry(struct trace_array *tr,
@@ -376,21 +376,21 @@ void update_max_tr_single(struct trace_a
 #endif /* CONFIG_TRACER_MAX_TRACE */
 
 #ifdef CONFIG_STACKTRACE
-void ftrace_trace_stack(struct ring_buffer *buffer, unsigned long flags,
+void ftrace_trace_stack(struct ftrace_ring_buffer *buffer, unsigned long flags,
 			int skip, int pc);
 
-void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags,
+void ftrace_trace_userstack(struct ftrace_ring_buffer *buffer, unsigned long flags,
 			    int pc);
 
 void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
 		   int pc);
 #else
-static inline void ftrace_trace_stack(struct ring_buffer *buffer,
+static inline void ftrace_trace_stack(struct ftrace_ring_buffer *buffer,
 				      unsigned long flags, int skip, int pc)
 {
 }
 
-static inline void ftrace_trace_userstack(struct ring_buffer *buffer,
+static inline void ftrace_trace_userstack(struct ftrace_ring_buffer *buffer,
 					  unsigned long flags, int pc)
 {
 }
@@ -411,7 +411,7 @@ extern unsigned long ftrace_update_tot_c
 extern int DYN_FTRACE_TEST_NAME(void);
 #endif
 
-extern int ring_buffer_expanded;
+extern int ftrace_ring_buffer_expanded;
 extern bool tracing_selftest_disabled;
 DECLARE_PER_CPU(int, ftrace_cpu_disabled);
 
@@ -717,12 +717,12 @@ trace_get_fields(struct ftrace_event_cal
 
 static inline int
 filter_check_discard(struct ftrace_event_call *call, void *rec,
-		     struct ring_buffer *buffer,
-		     struct ring_buffer_event *event)
+		     struct ftrace_ring_buffer *buffer,
+		     struct ftrace_ring_buffer_event *event)
 {
 	if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) &&
 	    !filter_match_preds(call->filter, rec)) {
-		ring_buffer_discard_commit(buffer, event);
+		ftrace_ring_buffer_discard_commit(buffer, event);
 		return 1;
 	}
 
Index: linux.trees.git/kernel/trace/trace_branch.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_branch.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_branch.c	2010-07-09 18:08:47.000000000 -0400
@@ -32,9 +32,9 @@ probe_likely_condition(struct ftrace_bra
 {
 	struct ftrace_event_call *call = &event_branch;
 	struct trace_array *tr = branch_tracer;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct trace_branch *entry;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer *buffer;
 	unsigned long flags;
 	int cpu, pc;
 	const char *p;
@@ -61,7 +61,7 @@ probe_likely_condition(struct ftrace_bra
 	if (!event)
 		goto out;
 
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 
 	/* Strip off the path, only save the file */
 	p = f->file + strlen(f->file);
@@ -77,7 +77,7 @@ probe_likely_condition(struct ftrace_bra
 	entry->correct = val == expect;
 
 	if (!filter_check_discard(call, entry, buffer, event))
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 
  out:
 	atomic_dec(&tr->data[cpu]->disabled);
Index: linux.trees.git/kernel/trace/trace_events.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_events.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_events.c	2010-07-09 18:08:47.000000000 -0400
@@ -1277,7 +1277,7 @@ static char bootup_event_buf[COMMAND_LIN
 static __init int setup_trace_event(char *str)
 {
 	strlcpy(bootup_event_buf, str, COMMAND_LINE_SIZE);
-	ring_buffer_expanded = 1;
+	ftrace_ring_buffer_expanded = 1;
 	tracing_selftest_disabled = 1;
 
 	return 1;
@@ -1318,11 +1318,11 @@ static __init int event_trace_init(void)
 
 	/* ring buffer internal formats */
 	trace_create_file("header_page", 0444, d_events,
-			  ring_buffer_print_page_header,
+			  ftrace_ring_buffer_print_page_header,
 			  &ftrace_show_header_fops);
 
 	trace_create_file("header_event", 0444, d_events,
-			  ring_buffer_print_entry_header,
+			  ftrace_ring_buffer_print_entry_header,
 			  &ftrace_show_header_fops);
 
 	trace_create_file("enable", 0644, d_events,
@@ -1517,8 +1517,8 @@ static DEFINE_PER_CPU(atomic_t, ftrace_t
 static void
 function_test_events_call(unsigned long ip, unsigned long parent_ip)
 {
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer;
 	struct ftrace_entry *entry;
 	unsigned long flags;
 	long disabled;
@@ -1540,7 +1540,7 @@ function_test_events_call(unsigned long
 						  flags, pc);
 	if (!event)
 		goto out;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->ip			= ip;
 	entry->parent_ip		= parent_ip;
 
Index: linux.trees.git/kernel/trace/trace_functions_graph.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_functions_graph.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_functions_graph.c	2010-07-09 18:08:47.000000000 -0400
@@ -185,8 +185,8 @@ int __trace_graph_entry(struct trace_arr
 				int pc)
 {
 	struct ftrace_event_call *call = &event_funcgraph_entry;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
 	struct ftrace_graph_ent_entry *entry;
 
 	if (unlikely(__this_cpu_read(ftrace_cpu_disabled)))
@@ -196,10 +196,10 @@ int __trace_graph_entry(struct trace_arr
 					  sizeof(*entry), flags, pc);
 	if (!event)
 		return 0;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->graph_ent			= *trace;
 	if (!filter_current_check_discard(buffer, call, entry, event))
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 
 	return 1;
 }
@@ -252,8 +252,8 @@ void __trace_graph_return(struct trace_a
 				int pc)
 {
 	struct ftrace_event_call *call = &event_funcgraph_exit;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
 	struct ftrace_graph_ret_entry *entry;
 
 	if (unlikely(__this_cpu_read(ftrace_cpu_disabled)))
@@ -263,10 +263,10 @@ void __trace_graph_return(struct trace_a
 					  sizeof(*entry), flags, pc);
 	if (!event)
 		return;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->ret				= *trace;
 	if (!filter_current_check_discard(buffer, call, entry, event))
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 }
 
 void trace_graph_return(struct ftrace_graph_ret *trace)
@@ -467,8 +467,8 @@ get_return_for_leaf(struct trace_iterato
 		struct ftrace_graph_ent_entry *curr)
 {
 	struct fgraph_data *data = iter->private;
-	struct ring_buffer_iter *ring_iter = NULL;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_iter *ring_iter = NULL;
+	struct ftrace_ring_buffer_event *event;
 	struct ftrace_graph_ret_entry *next;
 
 	/*
@@ -484,22 +484,22 @@ get_return_for_leaf(struct trace_iterato
 
 		/* First peek to compare current entry and the next one */
 		if (ring_iter)
-			event = ring_buffer_iter_peek(ring_iter, NULL);
+			event = ftrace_ring_buffer_iter_peek(ring_iter, NULL);
 		else {
 			/*
 			 * We need to consume the current entry to see
 			 * the next one.
 			 */
-			ring_buffer_consume(iter->tr->buffer, iter->cpu,
+			ftrace_ring_buffer_consume(iter->tr->buffer, iter->cpu,
 					    NULL, NULL);
-			event = ring_buffer_peek(iter->tr->buffer, iter->cpu,
+			event = ftrace_ring_buffer_peek(iter->tr->buffer, iter->cpu,
 						 NULL, NULL);
 		}
 
 		if (!event)
 			return NULL;
 
-		next = ring_buffer_event_data(event);
+		next = ftrace_ring_buffer_event_data(event);
 
 		if (data) {
 			/*
@@ -520,7 +520,7 @@ get_return_for_leaf(struct trace_iterato
 
 	/* this is a leaf, now advance the iterator */
 	if (ring_iter)
-		ring_buffer_read(ring_iter, NULL);
+		ftrace_ring_buffer_read(ring_iter, NULL);
 
 	return next;
 }
Index: linux.trees.git/kernel/trace/trace_kprobe.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_kprobe.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_kprobe.c	2010-07-09 18:08:47.000000000 -0400
@@ -1264,8 +1264,8 @@ static __kprobes void kprobe_trace_func(
 {
 	struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
 	struct kprobe_trace_entry_head *entry;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer;
 	int size, dsize, pc;
 	unsigned long irq_flags;
 	struct ftrace_event_call *call = &tp->call;
@@ -1283,7 +1283,7 @@ static __kprobes void kprobe_trace_func(
 	if (!event)
 		return;
 
-	entry = ring_buffer_event_data(event);
+	entry = ftrace_ring_buffer_event_data(event);
 	entry->ip = (unsigned long)kp->addr;
 	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
 
@@ -1297,8 +1297,8 @@ static __kprobes void kretprobe_trace_fu
 {
 	struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
 	struct kretprobe_trace_entry_head *entry;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer;
 	int size, pc, dsize;
 	unsigned long irq_flags;
 	struct ftrace_event_call *call = &tp->call;
@@ -1314,7 +1314,7 @@ static __kprobes void kretprobe_trace_fu
 	if (!event)
 		return;
 
-	entry = ring_buffer_event_data(event);
+	entry = ftrace_ring_buffer_event_data(event);
 	entry->func = (unsigned long)tp->rp.kp.addr;
 	entry->ret_ip = (unsigned long)ri->ret_addr;
 	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
Index: linux.trees.git/kernel/trace/trace_ksym.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_ksym.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_ksym.c	2010-07-09 18:08:47.000000000 -0400
@@ -77,9 +77,9 @@ void ksym_hbp_handler(struct perf_event
 		      struct perf_sample_data *data,
 		      struct pt_regs *regs)
 {
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct ksym_trace_entry *entry;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer *buffer;
 	int pc;
 
 	if (!ksym_tracing_enabled)
@@ -94,7 +94,7 @@ void ksym_hbp_handler(struct perf_event
 	if (!event)
 		return;
 
-	entry		= ring_buffer_event_data(event);
+	entry		= ftrace_ring_buffer_event_data(event);
 	entry->ip	= instruction_pointer(regs);
 	entry->type	= hw_breakpoint_type(hbp);
 	entry->addr	= hw_breakpoint_addr(hbp);
Index: linux.trees.git/kernel/trace/trace_mmiotrace.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_mmiotrace.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_mmiotrace.c	2010-07-09 18:08:47.000000000 -0400
@@ -128,7 +128,7 @@ static void mmio_close(struct trace_iter
 static unsigned long count_overruns(struct trace_iterator *iter)
 {
 	unsigned long cnt = atomic_xchg(&dropped_count, 0);
-	unsigned long over = ring_buffer_overruns(iter->tr->buffer);
+	unsigned long over = ftrace_ring_buffer_overruns(iter->tr->buffer);
 
 	if (over > prev_overruns)
 		cnt += over - prev_overruns;
@@ -309,8 +309,8 @@ static void __trace_mmiotrace_rw(struct
 				struct mmiotrace_rw *rw)
 {
 	struct ftrace_event_call *call = &event_mmiotrace_rw;
-	struct ring_buffer *buffer = tr->buffer;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer_event *event;
 	struct trace_mmiotrace_rw *entry;
 	int pc = preempt_count();
 
@@ -320,7 +320,7 @@ static void __trace_mmiotrace_rw(struct
 		atomic_inc(&dropped_count);
 		return;
 	}
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->rw			= *rw;
 
 	if (!filter_check_discard(call, entry, buffer, event))
@@ -339,8 +339,8 @@ static void __trace_mmiotrace_map(struct
 				struct mmiotrace_map *map)
 {
 	struct ftrace_event_call *call = &event_mmiotrace_map;
-	struct ring_buffer *buffer = tr->buffer;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer_event *event;
 	struct trace_mmiotrace_map *entry;
 	int pc = preempt_count();
 
@@ -350,7 +350,7 @@ static void __trace_mmiotrace_map(struct
 		atomic_inc(&dropped_count);
 		return;
 	}
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->map			= *map;
 
 	if (!filter_check_discard(call, entry, buffer, event))
Index: linux.trees.git/kernel/trace/trace_sched_switch.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_sched_switch.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_sched_switch.c	2010-07-09 18:08:47.000000000 -0400
@@ -28,15 +28,15 @@ tracing_sched_switch_trace(struct trace_
 			   unsigned long flags, int pc)
 {
 	struct ftrace_event_call *call = &event_context_switch;
-	struct ring_buffer *buffer = tr->buffer;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer_event *event;
 	struct ctx_switch_entry *entry;
 
 	event = trace_buffer_lock_reserve(buffer, TRACE_CTX,
 					  sizeof(*entry), flags, pc);
 	if (!event)
 		return;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->prev_pid			= prev->pid;
 	entry->prev_prio		= prev->prio;
 	entry->prev_state		= prev->state;
@@ -84,15 +84,15 @@ tracing_sched_wakeup_trace(struct trace_
 			   unsigned long flags, int pc)
 {
 	struct ftrace_event_call *call = &event_wakeup;
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct ctx_switch_entry *entry;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ftrace_ring_buffer *buffer = tr->buffer;
 
 	event = trace_buffer_lock_reserve(buffer, TRACE_WAKE,
 					  sizeof(*entry), flags, pc);
 	if (!event)
 		return;
-	entry	= ring_buffer_event_data(event);
+	entry	= ftrace_ring_buffer_event_data(event);
 	entry->prev_pid			= curr->pid;
 	entry->prev_prio		= curr->prio;
 	entry->prev_state		= curr->state;
@@ -102,7 +102,7 @@ tracing_sched_wakeup_trace(struct trace_
 	entry->next_cpu			= task_cpu(wakee);
 
 	if (!filter_check_discard(call, entry, buffer, event))
-		ring_buffer_unlock_commit(buffer, event);
+		ftrace_ring_buffer_unlock_commit(buffer, event);
 	ftrace_trace_stack(tr->buffer, flags, 6, pc);
 	ftrace_trace_userstack(tr->buffer, flags, pc);
 }
Index: linux.trees.git/kernel/trace/trace_selftest.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_selftest.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_selftest.c	2010-07-09 18:08:47.000000000 -0400
@@ -25,12 +25,12 @@ static inline int trace_valid_entry(stru
 
 static int trace_test_buffer_cpu(struct trace_array *tr, int cpu)
 {
-	struct ring_buffer_event *event;
+	struct ftrace_ring_buffer_event *event;
 	struct trace_entry *entry;
 	unsigned int loops = 0;
 
-	while ((event = ring_buffer_consume(tr->buffer, cpu, NULL, NULL))) {
-		entry = ring_buffer_event_data(event);
+	while ((event = ftrace_ring_buffer_consume(tr->buffer, cpu, NULL, NULL))) {
+		entry = ftrace_ring_buffer_event_data(event);
 
 		/*
 		 * The ring buffer is a size of trace_buf_size, if
@@ -69,7 +69,7 @@ static int trace_test_buffer(struct trac
 	local_irq_save(flags);
 	arch_spin_lock(&ftrace_max_lock);
 
-	cnt = ring_buffer_entries(tr->buffer);
+	cnt = ftrace_ring_buffer_entries(tr->buffer);
 
 	/*
 	 * The trace_test_buffer_cpu runs a while loop to consume all data.
Index: linux.trees.git/kernel/trace/trace_syscalls.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_syscalls.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_syscalls.c	2010-07-09 18:08:47.000000000 -0400
@@ -298,8 +298,8 @@ void ftrace_syscall_enter(void *ignore,
 {
 	struct syscall_trace_enter *entry;
 	struct syscall_metadata *sys_data;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer;
 	int size;
 	int syscall_nr;
 
@@ -320,7 +320,7 @@ void ftrace_syscall_enter(void *ignore,
 	if (!event)
 		return;
 
-	entry = ring_buffer_event_data(event);
+	entry = ftrace_ring_buffer_event_data(event);
 	entry->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
 
@@ -333,8 +333,8 @@ void ftrace_syscall_exit(void *ignore, s
 {
 	struct syscall_trace_exit *entry;
 	struct syscall_metadata *sys_data;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
+	struct ftrace_ring_buffer_event *event;
+	struct ftrace_ring_buffer *buffer;
 	int syscall_nr;
 
 	syscall_nr = syscall_get_nr(current, regs);
@@ -352,7 +352,7 @@ void ftrace_syscall_exit(void *ignore, s
 	if (!event)
 		return;
 
-	entry = ring_buffer_event_data(event);
+	entry = ftrace_ring_buffer_event_data(event);
 	entry->nr = syscall_nr;
 	entry->ret = syscall_get_return_value(current, regs);
 
Index: linux.trees.git/drivers/oprofile/cpu_buffer.c
===================================================================
--- linux.trees.git.orig/drivers/oprofile/cpu_buffer.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/drivers/oprofile/cpu_buffer.c	2010-07-09 18:08:47.000000000 -0400
@@ -30,7 +30,7 @@
 
 #define OP_BUFFER_FLAGS	0
 
-static struct ring_buffer *op_ring_buffer;
+static struct ftrace_ring_buffer *op_ftrace_ring_buffer;
 DEFINE_PER_CPU(struct oprofile_cpu_buffer, op_cpu_buffer);
 
 static void wq_sync_buffer(struct work_struct *work);
@@ -52,9 +52,9 @@ void oprofile_cpu_buffer_inc_smpl_lost(v
 
 void free_cpu_buffers(void)
 {
-	if (op_ring_buffer)
-		ring_buffer_free(op_ring_buffer);
-	op_ring_buffer = NULL;
+	if (op_ftrace_ring_buffer)
+		ftrace_ring_buffer_free(op_ftrace_ring_buffer);
+	op_ftrace_ring_buffer = NULL;
 }
 
 #define RB_EVENT_HDR_SIZE 4
@@ -67,8 +67,8 @@ int alloc_cpu_buffers(void)
 	unsigned long byte_size = buffer_size * (sizeof(struct op_sample) +
 						 RB_EVENT_HDR_SIZE);
 
-	op_ring_buffer = ring_buffer_alloc(byte_size, OP_BUFFER_FLAGS);
-	if (!op_ring_buffer)
+	op_ftrace_ring_buffer = ftrace_ring_buffer_alloc(byte_size, OP_BUFFER_FLAGS);
+	if (!op_ftrace_ring_buffer)
 		goto fail;
 
 	for_each_possible_cpu(i) {
@@ -139,12 +139,12 @@ void end_cpu_work(void)
 struct op_sample
 *op_cpu_buffer_write_reserve(struct op_entry *entry, unsigned long size)
 {
-	entry->event = ring_buffer_lock_reserve
-		(op_ring_buffer, sizeof(struct op_sample) +
+	entry->event = ftrace_ring_buffer_lock_reserve
+		(op_ftrace_ring_buffer, sizeof(struct op_sample) +
 		 size * sizeof(entry->sample->data[0]));
 	if (!entry->event)
 		return NULL;
-	entry->sample = ring_buffer_event_data(entry->event);
+	entry->sample = ftrace_ring_buffer_event_data(entry->event);
 	entry->size = size;
 	entry->data = entry->sample->data;
 
@@ -153,19 +153,19 @@ struct op_sample
 
 int op_cpu_buffer_write_commit(struct op_entry *entry)
 {
-	return ring_buffer_unlock_commit(op_ring_buffer, entry->event);
+	return ftrace_ring_buffer_unlock_commit(op_ftrace_ring_buffer, entry->event);
 }
 
 struct op_sample *op_cpu_buffer_read_entry(struct op_entry *entry, int cpu)
 {
-	struct ring_buffer_event *e;
-	e = ring_buffer_consume(op_ring_buffer, cpu, NULL, NULL);
+	struct ftrace_ring_buffer_event *e;
+	e = ftrace_ring_buffer_consume(op_ftrace_ring_buffer, cpu, NULL, NULL);
 	if (!e)
 		return NULL;
 
 	entry->event = e;
-	entry->sample = ring_buffer_event_data(e);
-	entry->size = (ring_buffer_event_length(e) - sizeof(struct op_sample))
+	entry->sample = ftrace_ring_buffer_event_data(e);
+	entry->size = (ftrace_ring_buffer_event_length(e) - sizeof(struct op_sample))
 		/ sizeof(entry->sample->data[0]);
 	entry->data = entry->sample->data;
 	return entry->sample;
@@ -173,7 +173,7 @@ struct op_sample *op_cpu_buffer_read_ent
 
 unsigned long op_cpu_buffer_entries(int cpu)
 {
-	return ring_buffer_entries_cpu(op_ring_buffer, cpu);
+	return ftrace_ring_buffer_entries_cpu(op_ftrace_ring_buffer, cpu);
 }
 
 static int
Index: linux.trees.git/drivers/oprofile/cpu_buffer.h
===================================================================
--- linux.trees.git.orig/drivers/oprofile/cpu_buffer.h	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/drivers/oprofile/cpu_buffer.h	2010-07-09 18:08:47.000000000 -0400
@@ -16,7 +16,7 @@
 #include <linux/workqueue.h>
 #include <linux/cache.h>
 #include <linux/sched.h>
-#include <linux/ring_buffer.h>
+#include <linux/ftrace_ring_buffer.h>
 
 struct task_struct;
 
Index: linux.trees.git/include/linux/kernel.h
===================================================================
--- linux.trees.git.orig/include/linux/kernel.h	2010-07-09 18:08:28.000000000 -0400
+++ linux.trees.git/include/linux/kernel.h	2010-07-09 18:08:47.000000000 -0400
@@ -485,7 +485,7 @@ extern int hex_to_bin(char ch);
  *
  * Most likely, you want to use tracing_on/tracing_off.
  */
-#ifdef CONFIG_RING_BUFFER
+#ifdef CONFIG_FTRACE_RING_BUFFER
 void tracing_on(void);
 void tracing_off(void);
 /* trace_off_permanent stops recording with no way to bring it back */
Index: linux.trees.git/kernel/trace/trace_functions.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_functions.c	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/kernel/trace/trace_functions.c	2010-07-09 18:08:47.000000000 -0400
@@ -9,7 +9,7 @@
  *  Copyright (C) 2004-2006 Ingo Molnar
  *  Copyright (C) 2004 William Lee Irwin III
  */
-#include <linux/ring_buffer.h>
+#include <linux/ftrace_ring_buffer.h>
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>
 #include <linux/ftrace.h>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 12/20] ring buffer backend
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (10 preceding siblings ...)
  2010-07-09 22:57 ` [patch 11/20] Ftrace ring buffer renaming Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 13/20] ring buffer frontend Mathieu Desnoyers
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring_buffer_backend.patch --]
[-- Type: text/plain, Size: 45633 bytes --]

This patch adds the ring buffer backend: page allocation, a data read/write
API, and CPU hotplug management.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/ringbuffer/backend.h          |  141 +++++
 include/linux/ringbuffer/backend_internal.h |  418 +++++++++++++++
 include/linux/ringbuffer/backend_types.h    |   80 ++
 lib/ringbuffer/Makefile                     |    1 
 lib/ringbuffer/ring_buffer_backend.c        |  755 ++++++++++++++++++++++++++++
 5 files changed, 1395 insertions(+)

Index: linux.trees.git/lib/ringbuffer/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/Makefile	2010-07-09 18:09:01.000000000 -0400
@@ -0,0 +1 @@
+obj-y += ring_buffer_backend.o
Index: linux.trees.git/include/linux/ringbuffer/backend.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/backend.h	2010-07-09 18:11:39.000000000 -0400
@@ -0,0 +1,141 @@
+#ifndef _LINUX_RING_BUFFER_BACKEND_H
+#define _LINUX_RING_BUFFER_BACKEND_H
+
+/*
+ * linux/ringbuffer/backend.h
+ *
+ * Copyright (C) 2008-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer backend (API).
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ *
+ * Credits to Steven Rostedt for proposing to use an extra-subbuffer owned by
+ * the reader in flight recorder mode.
+ */
+
+#include <linux/types.h>
+#include <linux/sched.h>
+#include <linux/timer.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
+#include <linux/list.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+
+/* Internal helpers */
+#include <linux/ringbuffer/backend_internal.h>
+#include <linux/ringbuffer/frontend_internal.h>
+
+/* Ring buffer backend API */
+
+/* Ring buffer backend access (read/write) */
+
+extern size_t ring_buffer_read(struct ring_buffer_backend *bufb,
+			       size_t offset, void *dest, size_t len);
+
+extern int __ring_buffer_copy_to_user(struct ring_buffer_backend *bufb,
+				      size_t offset, void __user *dest,
+				      size_t len);
+
+extern int ring_buffer_read_cstr(struct ring_buffer_backend *bufb,
+				 size_t offset, void *dest, size_t len);
+
+extern struct page **
+ring_buffer_read_get_page(struct ring_buffer_backend *bufb, size_t offset,
+			  void ***virt);
+
+/*
+ * Return the address where a given offset is located.
+ * Should be used to get the current subbuffer header pointer. Given we know
+ * it's never on a page boundary, it's safe to write directly to this address,
+ * as long as the write is never bigger than a page size.
+ */
+extern void *
+ring_buffer_offset_address(struct ring_buffer_backend *bufb,
+			   size_t offset);
+extern void *
+ring_buffer_read_offset_address(struct ring_buffer_backend *bufb,
+				size_t offset);
+
+/**
+ * ring_buffer_write - write data to a buffer backend
+ * @config: ring buffer instance configuration
+ * @ctx: ring buffer context. (input arguments only)
+ * @src: source pointer to copy from
+ * @len: length of data to copy
+ *
+ * This function copies "len" bytes of data from a source pointer to a buffer
+ * backend, at the current context offset. This is more or less a buffer
+ * backend-specific memcpy() operation. Calls the slow path (_ring_buffer_write)
+ * if the copy crosses a page boundary.
+ */
+static inline
+void ring_buffer_write(const struct ring_buffer_config *config,
+		       struct ring_buffer_ctx *ctx,
+		       const void *src, size_t len)
+{
+	struct ring_buffer_backend *bufb = &ctx->buf->backend;
+	struct channel_backend *chanb = &ctx->chan->backend;
+	size_t sbidx, index;
+	size_t offset = ctx->buf_offset;
+	ssize_t pagecpy;
+	struct ring_buffer_backend_pages *rpages;
+	unsigned long sb_bindex, id;
+
+	offset &= chanb->buf_size - 1;
+	sbidx = offset >> chanb->subbuf_size_order;
+	index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+	pagecpy = min_t(size_t, len, (-offset) & ~PAGE_MASK);
+	id = bufb->buf_wsb[sbidx].id;
+	sb_bindex = subbuffer_id_get_index(config, id);
+	rpages = bufb->array[sb_bindex];
+	CHAN_WARN_ON(ctx->chan,
+		     config->mode == RING_BUFFER_OVERWRITE
+		     && subbuffer_id_is_noref(config, id));
+	if (likely(pagecpy == len))
+		ring_buffer_do_copy(config,
+				    rpages->p[index].virt
+				    + (offset & ~PAGE_MASK),
+				    src, len);
+	else
+		_ring_buffer_write(bufb, offset, src, len, 0);
+	ctx->buf_offset += len;
+}
+
+/*
+ * This accessor counts the number of unread records in a buffer.
+ * It only provides a consistent value if no reads or writes are performed
+ * concurrently.
+ */
+static inline
+unsigned long ring_buffer_get_records_unread(
+				const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	struct ring_buffer_backend *bufb = &buf->backend;
+	struct ring_buffer_backend_pages *pages;
+	unsigned long records_unread = 0, sb_bindex, id;
+	unsigned int i;
+
+	for (i = 0; i < bufb->chan->backend.num_subbuf; i++) {
+		id = bufb->buf_wsb[i].id;
+		sb_bindex = subbuffer_id_get_index(config, id);
+		pages = bufb->array[sb_bindex];
+		records_unread += v_read(config, &pages->records_unread);
+	}
+	if (config->mode == RING_BUFFER_OVERWRITE) {
+		id = bufb->buf_rsb.id;
+		sb_bindex = subbuffer_id_get_index(config, id);
+		pages = bufb->array[sb_bindex];
+		records_unread += v_read(config, &pages->records_unread);
+	}
+	return records_unread;
+}
+
+ssize_t ring_buffer_file_splice_read(struct file *in, loff_t *ppos,
+				     struct pipe_inode_info *pipe, size_t len,
+				     unsigned int flags);
+loff_t ring_buffer_no_llseek(struct file *file, loff_t offset, int origin);
+
+#endif /* _LINUX_RING_BUFFER_BACKEND_H */
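
As a usage illustration only (not part of the patch): a minimal sketch of a
client write fast path built on the API above, assuming the frontend (added
later in this series) has already reserved space and filled in a
struct ring_buffer_ctx. The event structure and function name below are
hypothetical.

/* Hypothetical client fast path -- usage sketch only. */
struct my_event {
	unsigned long ip;
	unsigned long parent_ip;
};

static void my_client_write_event(const struct ring_buffer_config *config,
				  struct ring_buffer_ctx *ctx,
				  unsigned long ip, unsigned long parent_ip)
{
	struct my_event ev = {
		.ip		= ip,
		.parent_ip	= parent_ip,
	};

	/*
	 * Copies sizeof(ev) bytes at the current context offset and advances
	 * ctx->buf_offset; the slow path is taken only when the copy crosses
	 * a page boundary.
	 */
	ring_buffer_write(config, ctx, &ev, sizeof(ev));
}
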
Index: linux.trees.git/include/linux/ringbuffer/backend_internal.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/backend_internal.h	2010-07-09 18:11:58.000000000 -0400
@@ -0,0 +1,418 @@
+#ifndef _LINUX_RING_BUFFER_BACKEND_INTERNAL_H
+#define _LINUX_RING_BUFFER_BACKEND_INTERNAL_H
+
+/*
+ * linux/ringbuffer/backend_internal.h
+ *
+ * Copyright (C) 2008-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer backend (internal helpers).
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/backend_types.h>
+#include <linux/ringbuffer/frontend_types.h>
+#include <linux/string.h>
+
+/* Ring buffer backend API presented to the frontend */
+
+/* Ring buffer and channel backend create/free */
+
+int ring_buffer_backend_create(struct ring_buffer_backend *bufb,
+			       struct channel_backend *chan,
+			       int cpu);
+void channel_backend_unregister_notifiers(struct channel_backend *chanb);
+void ring_buffer_backend_free(struct ring_buffer_backend *bufb);
+int channel_backend_init(struct channel_backend *chanb,
+			 const char *name,
+			 const struct ring_buffer_config *config,
+			 void *priv, size_t subbuf_size,
+			 size_t num_subbuf);
+void channel_backend_free(struct channel_backend *chanb);
+
+void ring_buffer_backend_reset(struct ring_buffer_backend *bufb);
+void channel_backend_reset(struct channel_backend *chanb);
+
+int ring_buffer_backend_init(void);
+void ring_buffer_backend_exit(void);
+
+extern void _ring_buffer_write(struct ring_buffer_backend *bufb,
+			       size_t offset, const void *src, size_t len,
+			       ssize_t pagecpy);
+
+/*
+ * Subbuffer ID bits for overwrite mode. Need to fit within a single word to be
+ * exchanged atomically.
+ *
+ * Top half word, except lowest bit, belongs to "offset", which is used to
+ * count the produced buffers.  For overwrite mode, this provides the
+ * consumer with the capacity to read subbuffers in order, handling the
+ * situation where producers would write up to 2^15 buffers (or 2^31 for 64-bit
+ * systems) concurrently with a single execution of get_subbuf (between offset
+ * sampling and subbuffer ID exchange).
+ */
+
+#define HALF_ULONG_BITS		(BITS_PER_LONG >> 1)
+
+#define SB_ID_OFFSET_SHIFT	(HALF_ULONG_BITS + 1)
+#define SB_ID_OFFSET_COUNT	(1UL << SB_ID_OFFSET_SHIFT)
+#define SB_ID_OFFSET_MASK	(~(SB_ID_OFFSET_COUNT - 1))
+/*
+ * Lowest bit of top word half belongs to noref. Used only for overwrite mode.
+ */
+#define SB_ID_NOREF_SHIFT	(SB_ID_OFFSET_SHIFT - 1)
+#define SB_ID_NOREF_COUNT	(1UL << SB_ID_NOREF_SHIFT)
+#define SB_ID_NOREF_MASK	SB_ID_NOREF_COUNT
+/*
+ * In overwrite mode: lowest half of word is used for index.
+ * Limit of 2^16 subbuffers per buffer on 32-bit, 2^32 on 64-bit.
+ * In producer-consumer mode: whole word used for index.
+ */
+#define SB_ID_INDEX_SHIFT	0
+#define SB_ID_INDEX_COUNT	(1UL << SB_ID_INDEX_SHIFT)
+#define SB_ID_INDEX_MASK	(SB_ID_NOREF_COUNT - 1)
+
+/*
+ * Construct the subbuffer id from offset, index and noref. Use only the index
+ * for producer-consumer mode (offset and noref are only used in overwrite
+ * mode).
+ */
+static inline
+unsigned long subbuffer_id(const struct ring_buffer_config *config,
+			   unsigned long offset, unsigned long noref,
+			   unsigned long index)
+{
+	if (config->mode == RING_BUFFER_OVERWRITE)
+		return (offset << SB_ID_OFFSET_SHIFT)
+		       | (noref << SB_ID_NOREF_SHIFT)
+		       | index;
+	else
+		return index;
+}
+
+/*
+ * Compare offset with the offset contained within id. Return 1 if the offset
+ * bits are identical, else 0.
+ */
+static inline
+int subbuffer_id_compare_offset(const struct ring_buffer_config *config,
+				unsigned long id, unsigned long offset)
+{
+	return (id & SB_ID_OFFSET_MASK) == (offset << SB_ID_OFFSET_SHIFT);
+}
+
+static inline
+unsigned long subbuffer_id_get_index(const struct ring_buffer_config *config,
+				     unsigned long id)
+{
+	if (config->mode == RING_BUFFER_OVERWRITE)
+		return id & SB_ID_INDEX_MASK;
+	else
+		return id;
+}
+
+static inline
+unsigned long subbuffer_id_is_noref(const struct ring_buffer_config *config,
+				    unsigned long id)
+{
+	if (config->mode == RING_BUFFER_OVERWRITE)
+		return !!(id & SB_ID_NOREF_MASK);
+	else
+		return 1;
+}
+
+/*
+ * Only used by reader on subbuffer ID it has exclusive access to. No volatile
+ * needed.
+ */
+static inline
+void subbuffer_id_set_noref(const struct ring_buffer_config *config,
+			    unsigned long *id)
+{
+	if (config->mode == RING_BUFFER_OVERWRITE)
+		*id |= SB_ID_NOREF_MASK;
+}
+
+static inline
+void subbuffer_id_set_noref_offset(const struct ring_buffer_config *config,
+				   unsigned long *id, unsigned long offset)
+{
+	unsigned long tmp;
+
+	if (config->mode == RING_BUFFER_OVERWRITE) {
+		tmp = *id;
+		tmp &= ~SB_ID_OFFSET_MASK;
+		tmp |= offset << SB_ID_OFFSET_SHIFT;
+		tmp |= SB_ID_NOREF_MASK;
+		/* Volatile store, read concurrently by readers. */
+		ACCESS_ONCE(*id) = tmp;
+	}
+}
+
+/* No volatile access, since already used locally */
+static inline
+void subbuffer_id_clear_noref(const struct ring_buffer_config *config,
+			      unsigned long *id)
+{
+	if (config->mode == RING_BUFFER_OVERWRITE)
+		*id &= ~SB_ID_NOREF_MASK;
+}
+
+/*
+ * For overwrite mode, cap the number of subbuffers per buffer to:
+ * 2^16 on 32-bit architectures
+ * 2^32 on 64-bit architectures
+ * This is required to fit in the index part of the ID. Return 0 on success,
+ * -EPERM on failure.
+ */
+static inline
+int subbuffer_id_check_index(const struct ring_buffer_config *config,
+			     unsigned long num_subbuf)
+{
+	if (config->mode == RING_BUFFER_OVERWRITE)
+		return (num_subbuf > (1UL << HALF_ULONG_BITS)) ? -EPERM : 0;
+	else
+		return 0;
+}
+
+static inline
+void subbuffer_count_record(const struct ring_buffer_config *config,
+			    struct ring_buffer_backend *bufb,
+			    unsigned long idx)
+{
+	unsigned long sb_bindex;
+
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_wsb[idx].id);
+	v_inc(config, &bufb->array[sb_bindex]->records_commit);
+}
+
+/*
+ * Reader has exclusive subbuffer access for record consumption. No need to
+ * perform the decrement atomically.
+ */
+static inline
+void subbuffer_consume_record(const struct ring_buffer_config *config,
+			      struct ring_buffer_backend *bufb)
+{
+	unsigned long sb_bindex;
+
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_rsb.id);
+	CHAN_WARN_ON(bufb->chan,
+		     !v_read(config, &bufb->array[sb_bindex]->records_unread));
+	/* Non-atomic decrement protected by exclusive subbuffer access */
+	_v_dec(config, &bufb->array[sb_bindex]->records_unread);
+	v_inc(config, &bufb->records_read);
+}
+
+static inline
+unsigned long subbuffer_get_records_count(
+				const struct ring_buffer_config *config,
+				struct ring_buffer_backend *bufb,
+				unsigned long idx)
+{
+	unsigned long sb_bindex;
+
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_wsb[idx].id);
+	return v_read(config, &bufb->array[sb_bindex]->records_commit);
+}
+
+/*
+ * Must be executed at subbuffer delivery when the writer has _exclusive_
+ * subbuffer access. See ring_buffer_check_deliver() for details.
+ * subbuffer_get_records_count() must be called to get the records count
+ * before this function, because it resets the records_commit count.
+ */
+static inline
+unsigned long subbuffer_count_records_overrun(
+				const struct ring_buffer_config *config,
+				struct ring_buffer_backend *bufb,
+				unsigned long idx)
+{
+	struct ring_buffer_backend_pages *pages;
+	unsigned long overruns, sb_bindex;
+
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_wsb[idx].id);
+	pages = bufb->array[sb_bindex];
+	overruns = v_read(config, &pages->records_unread);
+	v_set(config, &pages->records_unread,
+	      v_read(config, &pages->records_commit));
+	v_set(config, &pages->records_commit, 0);
+
+	return overruns;
+}
+
+static inline
+void subbuffer_set_data_size(const struct ring_buffer_config *config,
+			     struct ring_buffer_backend *bufb,
+			     unsigned long idx,
+			     unsigned long data_size)
+{
+	struct ring_buffer_backend_pages *pages;
+	unsigned long sb_bindex;
+
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_wsb[idx].id);
+	pages = bufb->array[sb_bindex];
+	pages->data_size = data_size;
+}
+
+static inline
+unsigned long subbuffer_get_read_data_size(
+				const struct ring_buffer_config *config,
+				struct ring_buffer_backend *bufb)
+{
+	struct ring_buffer_backend_pages *pages;
+	unsigned long sb_bindex;
+
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_rsb.id);
+	pages = bufb->array[sb_bindex];
+	return pages->data_size;
+}
+
+static inline
+unsigned long subbuffer_get_data_size(
+				const struct ring_buffer_config *config,
+				struct ring_buffer_backend *bufb,
+				unsigned long idx)
+{
+	struct ring_buffer_backend_pages *pages;
+	unsigned long sb_bindex;
+
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_wsb[idx].id);
+	pages = bufb->array[sb_bindex];
+	return pages->data_size;
+}
+
+/**
+ * ring_buffer_clear_noref - Clear the noref subbuffer flag, called by writer.
+ */
+static inline
+void ring_buffer_clear_noref(const struct ring_buffer_config *config,
+			     struct ring_buffer_backend *bufb,
+			     unsigned long idx)
+{
+	unsigned long id, new_id;
+
+	if (config->mode != RING_BUFFER_OVERWRITE)
+		return;
+
+	/*
+	 * Performing a volatile access to read the sb_pages, because we want to
+	 * read a coherent version of the pointer and the associated noref flag.
+	 */
+	id = ACCESS_ONCE(bufb->buf_wsb[idx].id);
+	for (;;) {
+		/* This check is called on the fast path for each record. */
+		if (likely(!subbuffer_id_is_noref(config, id))) {
+			/*
+			 * Store after load dependency ordering the writes to
+			 * the subbuffer after load and test of the noref flag
+			 * matches the memory barrier implied by the cmpxchg()
+			 * in update_read_sb_index().
+			 */
+			return;	/* Already writing to this buffer */
+		}
+		new_id = id;
+		subbuffer_id_clear_noref(config, &new_id);
+		new_id = cmpxchg(&bufb->buf_wsb[idx].id, id, new_id);
+		if (likely(new_id == id))
+			break;
+		id = new_id;
+	}
+}
+
+/**
+ * ring_buffer_set_noref_offset - Set the noref subbuffer flag and offset,
+ *                                called by writer.
+ */
+static inline
+void ring_buffer_set_noref_offset(const struct ring_buffer_config *config,
+				  struct ring_buffer_backend *bufb,
+				  unsigned long idx,
+				  unsigned long offset)
+{
+	if (config->mode != RING_BUFFER_OVERWRITE)
+		return;
+
+	/*
+	 * Because ring_buffer_set_noref_offset() is only called by a single
+	 * thread (the one which updated the cc_sb value), there are no concurrent
+	 * updates to take care of: other writers have not updated cc_sb, so
+	 * they cannot set the noref flag, and concurrent readers cannot modify
+	 * the pointer because the noref flag is not set yet.
+	 * The smp_wmb() in ring_buffer_commit() takes care of ordering writes
+	 * to the subbuffer before this set noref operation.
+	 * subbuffer_id_set_noref_offset() uses a volatile store to deal with
+	 * concurrent readers of the noref flag.
+	 */
+	CHAN_WARN_ON(bufb->chan,
+		     subbuffer_id_is_noref(config, bufb->buf_wsb[idx].id));
+	/*
+	 * Memory barrier that ensures counter stores are ordered before set
+	 * noref and offset.
+	 */
+	smp_mb();
+	subbuffer_id_set_noref_offset(config, &bufb->buf_wsb[idx].id, offset);
+}
+
+/**
+ * update_read_sb_index - Read-side subbuffer index update.
+ */
+static inline
+int update_read_sb_index(const struct ring_buffer_config *config,
+			 struct ring_buffer_backend *bufb,
+			 struct channel_backend *chanb,
+			 unsigned long consumed_idx,
+			 unsigned long consumed_count)
+{
+	unsigned long old_id, new_id;
+
+	if (config->mode == RING_BUFFER_OVERWRITE) {
+		/*
+		 * Exchange the target writer subbuffer with our own unused
+		 * subbuffer. No need to use ACCESS_ONCE() here to read the
+		 * old_wpage, because the value read will be confirmed by the
+		 * following cmpxchg().
+		 */
+		old_id = bufb->buf_wsb[consumed_idx].id;
+		if (unlikely(!subbuffer_id_is_noref(config, old_id)))
+			return -EAGAIN;
+		/*
+		 * Make sure the offset count we are expecting matches the one
+		 * indicated by the writer.
+		 */
+		if (unlikely(!subbuffer_id_compare_offset(config, old_id,
+							  consumed_count)))
+			return -EAGAIN;
+		CHAN_WARN_ON(bufb->chan,
+			     !subbuffer_id_is_noref(config, bufb->buf_rsb.id));
+		new_id = cmpxchg(&bufb->buf_wsb[consumed_idx].id, old_id,
+				bufb->buf_rsb.id);
+		if (unlikely(old_id != new_id))
+			return -EAGAIN;
+		bufb->buf_rsb.id = new_id;
+		subbuffer_id_clear_noref(config, &bufb->buf_rsb.id);
+	} else {
+		/* No page exchange, use the writer page directly */
+		bufb->buf_rsb.id = bufb->buf_wsb[consumed_idx].id;
+		subbuffer_id_clear_noref(config, &bufb->buf_rsb.id);
+	}
+	return 0;
+}
+
+/*
+ * Use the architecture-specific memcpy implementation for constant-sized
+ * inputs, and rely on an inline memcpy when the length is not statically
+ * known: a function call to memcpy() is way too expensive on the fast path.
+ */
+#define ring_buffer_do_copy(config, dest, src, len)		\
+do {								\
+	size_t __len = (len);					\
+	if (__builtin_constant_p(len))				\
+		memcpy((dest), (src), __len);			\
+	else							\
+		inline_memcpy((dest), (src), __len);		\
+} while (0)
+
+#endif /* _LINUX_RING_BUFFER_BACKEND_INTERNAL_H */
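
A worked example of the subbuffer ID layout defined above (illustration only,
not part of the patch): on a 32-bit machine, HALF_ULONG_BITS is 16, so in
overwrite mode the index lives in bits 0-15, the noref flag in bit 16, and the
offset count in bits 17-31. Hence subbuffer_id(config, 5, 1, 3) returns
(5 << 17) | (1 << 16) | 3 = 0xb0003; subbuffer_id_get_index() recovers 3 by
masking with SB_ID_INDEX_MASK, and subbuffer_id_is_noref() reads the flag back
from bit 16. The 15 offset bits are what bound the 2^15-buffer wrap-around
window mentioned in the comment (2^31 on 64-bit).
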
Index: linux.trees.git/lib/ringbuffer/ring_buffer_backend.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_backend.c	2010-07-09 18:13:38.000000000 -0400
@@ -0,0 +1,755 @@
+/*
+ * ring_buffer_backend.c
+ *
+ * Copyright (C) 2005-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/vmalloc.h>
+#include <linux/stddef.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/bitops.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/cpu.h>
+#include <linux/mm.h>
+
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/backend.h>
+#include <linux/ringbuffer/frontend.h>
+
+/**
+ * ring_buffer_backend_allocate - allocate a channel buffer
+ * @config: ring buffer instance configuration
+ * @bufb: the ring buffer backend struct
+ * @size: total size of the buffer
+ * @num_subbuf: number of subbuffers
+ * @extra_reader_sb: need extra subbuffer for reader
+ */
+static
+int ring_buffer_backend_allocate(const struct ring_buffer_config *config,
+				 struct ring_buffer_backend *bufb,
+				 size_t size, size_t num_subbuf,
+				 int extra_reader_sb)
+{
+	struct channel_backend *chanb = &bufb->chan->backend;
+	unsigned long j, num_pages, num_pages_per_subbuf, page_idx = 0;
+	unsigned long subbuf_size, mmap_offset = 0;
+	unsigned long num_subbuf_alloc;
+	struct page **pages;
+	void **virt;
+	unsigned long i;
+
+	num_pages = size >> PAGE_SHIFT;
+	num_pages_per_subbuf = num_pages >> get_count_order(num_subbuf);
+	subbuf_size = chanb->subbuf_size;
+	num_subbuf_alloc = num_subbuf;
+
+	if (extra_reader_sb) {
+		num_pages += num_pages_per_subbuf; /* Add pages for reader */
+		num_subbuf_alloc++;
+	}
+
+	pages = kmalloc_node(ALIGN(sizeof(*pages) * num_pages,
+				   1 << INTERNODE_CACHE_SHIFT),
+			GFP_KERNEL, cpu_to_node(max(bufb->cpu, 0)));
+	if (unlikely(!pages))
+		goto pages_error;
+
+	virt = kmalloc_node(ALIGN(sizeof(*virt) * num_pages,
+				  1 << INTERNODE_CACHE_SHIFT),
+			GFP_KERNEL, cpu_to_node(max(bufb->cpu, 0)));
+	if (unlikely(!virt))
+		goto virt_error;
+
+	bufb->array = kmalloc_node(ALIGN(sizeof(*bufb->array)
+					 * num_subbuf_alloc,
+				  1 << INTERNODE_CACHE_SHIFT),
+			GFP_KERNEL, cpu_to_node(max(bufb->cpu, 0)));
+	if (unlikely(!bufb->array))
+		goto array_error;
+
+	for (i = 0; i < num_pages; i++) {
+		pages[i] = alloc_pages_node(cpu_to_node(max(bufb->cpu, 0)),
+					    GFP_KERNEL | __GFP_ZERO, 0);
+		if (unlikely(!pages[i]))
+			goto depopulate;
+		virt[i] = page_address(pages[i]);
+	}
+	bufb->num_pages_per_subbuf = num_pages_per_subbuf;
+
+	/* Allocate backend pages array elements */
+	for (i = 0; i < num_subbuf_alloc; i++) {
+		bufb->array[i] =
+			kzalloc_node(ALIGN(
+				sizeof(struct ring_buffer_backend_pages) +
+				sizeof(struct ring_buffer_backend_page)
+				* num_pages_per_subbuf,
+				1 << INTERNODE_CACHE_SHIFT),
+				GFP_KERNEL, cpu_to_node(max(bufb->cpu, 0)));
+		if (!bufb->array[i])
+			goto free_array;
+	}
+
+	/* Allocate write-side subbuffer table */
+	bufb->buf_wsb = kzalloc_node(ALIGN(
+				sizeof(struct ring_buffer_backend_subbuffer)
+				* num_subbuf,
+				1 << INTERNODE_CACHE_SHIFT),
+				GFP_KERNEL, cpu_to_node(max(bufb->cpu, 0)));
+	if (unlikely(!bufb->buf_wsb))
+		goto free_array;
+
+	for (i = 0; i < num_subbuf; i++)
+		bufb->buf_wsb[i].id = subbuffer_id(config, 0, 1, i);
+
+	/* Assign read-side subbuffer table */
+	if (extra_reader_sb)
+		bufb->buf_rsb.id = subbuffer_id(config, 0, 1,
+						num_subbuf_alloc - 1);
+	else
+		bufb->buf_rsb.id = subbuffer_id(config, 0, 1, 0);
+
+	/* Assign pages to page index */
+	for (i = 0; i < num_subbuf_alloc; i++) {
+		for (j = 0; j < num_pages_per_subbuf; j++) {
+			CHAN_WARN_ON(chanb, page_idx > num_pages);
+			bufb->array[i]->p[j].virt = virt[page_idx];
+			bufb->array[i]->p[j].page = pages[page_idx];
+			page_idx++;
+		}
+		if (config->output == RING_BUFFER_MMAP) {
+			bufb->array[i]->mmap_offset = mmap_offset;
+			mmap_offset += subbuf_size;
+		}
+	}
+
+	/*
+	 * If kmalloc ever uses vmalloc underneath, make sure the buffer pages
+	 * will not fault.
+	 */
+	vmalloc_sync_all();
+	kfree(virt);
+	kfree(pages);
+	return 0;
+
+free_array:
+	for (i = 0; (i < num_subbuf_alloc && bufb->array[i]); i++)
+		kfree(bufb->array[i]);
+depopulate:
+	/* Free all allocated pages */
+	for (i = 0; (i < num_pages && pages[i]); i++)
+		__free_page(pages[i]);
+	kfree(bufb->array);
+array_error:
+	kfree(virt);
+virt_error:
+	kfree(pages);
+pages_error:
+	return -ENOMEM;
+}
+
+int ring_buffer_backend_create(struct ring_buffer_backend *bufb,
+				     struct channel_backend *chanb,
+				     int cpu)
+{
+	const struct ring_buffer_config *config = chanb->config;
+
+	bufb->chan = container_of(chanb, struct channel, backend);
+	bufb->cpu = cpu;
+
+	return ring_buffer_backend_allocate(config, bufb, chanb->buf_size,
+					   chanb->num_subbuf,
+					   chanb->extra_reader_sb);
+}
+
+void ring_buffer_backend_free(struct ring_buffer_backend *bufb)
+{
+	struct channel_backend *chanb = &bufb->chan->backend;
+	unsigned long i, j, num_subbuf_alloc;
+
+	num_subbuf_alloc = chanb->num_subbuf;
+	if (chanb->extra_reader_sb)
+		num_subbuf_alloc++;
+
+	kfree(bufb->buf_wsb);
+	for (i = 0; i < num_subbuf_alloc; i++) {
+		for (j = 0; j < bufb->num_pages_per_subbuf; j++)
+			__free_page(bufb->array[i]->p[j].page);
+		kfree(bufb->array[i]);
+	}
+	kfree(bufb->array);
+	bufb->allocated = 0;
+}
+
+void ring_buffer_backend_reset(struct ring_buffer_backend *bufb)
+{
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	unsigned long num_subbuf_alloc;
+	unsigned int i;
+
+	num_subbuf_alloc = chanb->num_subbuf;
+	if (chanb->extra_reader_sb)
+		num_subbuf_alloc++;
+
+	for (i = 0; i < chanb->num_subbuf; i++)
+		bufb->buf_wsb[i].id = subbuffer_id(config, 0, 1, i);
+	if (chanb->extra_reader_sb)
+		bufb->buf_rsb.id = subbuffer_id(config, 0, 1,
+						num_subbuf_alloc - 1);
+	else
+		bufb->buf_rsb.id = subbuffer_id(config, 0, 1, 0);
+
+	for (i = 0; i < num_subbuf_alloc; i++) {
+		/* Don't reset mmap_offset */
+		v_set(config, &bufb->array[i]->records_commit, 0);
+		v_set(config, &bufb->array[i]->records_unread, 0);
+		bufb->array[i]->data_size = 0;
+		/* Don't reset backend page and virt addresses */
+	}
+	/* Don't reset num_pages_per_subbuf, cpu, allocated */
+	v_set(config, &bufb->records_read, 0);
+}
+
+/*
+ * The frontend is responsible for also calling ring_buffer_backend_reset for
+ * each buffer when calling channel_backend_reset.
+ */
+void channel_backend_reset(struct channel_backend *chanb)
+{
+	struct channel *chan = container_of(chanb, struct channel, backend);
+	const struct ring_buffer_config *config = chanb->config;
+
+	/*
+	 * Don't reset buf_size, subbuf_size, subbuf_size_order,
+	 * num_subbuf_order, buf_size_order, extra_reader_sb, num_subbuf,
+	 * priv, notifiers, config, cpumask and name.
+	 */
+	chanb->start_tsc = config->cb.ring_buffer_clock_read(chan);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+/**
+ *	ring_buffer_cpu_hp_callback - CPU hotplug callback
+ *	@nb: notifier block
+ *	@action: hotplug action to take
+ *	@hcpu: CPU number
+ *
+ *	Returns the success/failure of the operation. (%NOTIFY_OK, %NOTIFY_BAD)
+ */
+static
+int __cpuinit ring_buffer_cpu_hp_callback(struct notifier_block *nb,
+					  unsigned long action,
+					  void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	struct channel_backend *chanb = container_of(nb, struct channel_backend,
+						     cpu_hp_notifier);
+	const struct ring_buffer_config *config = chanb->config;
+	struct ring_buffer *buf;
+	int ret;
+
+	CHAN_WARN_ON(chanb, config->alloc == RING_BUFFER_ALLOC_GLOBAL);
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		buf = per_cpu_ptr(chanb->buf, cpu);
+		ret = ring_buffer_create(buf, chanb, cpu);
+		if (ret) {
+			printk(KERN_ERR
+			  "ring_buffer_cpu_hp_callback: cpu %d "
+			  "buffer creation failed\n", cpu);
+			return NOTIFY_BAD;
+		}
+		break;
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		/*
+		 * No need to do a buffer switch here, because it will happen
+		 * when tracing is stopped, or will be done by the switch timer
+		 * CPU_DEAD callback.
+		 */
+		break;
+	}
+	return NOTIFY_OK;
+}
+#endif
+
+/**
+ * channel_backend_init - initialize a channel backend
+ * @chanb: channel backend
+ * @name: channel name
+ * @config: client ring buffer configuration
+ * @priv: client private data
+ * @subbuf_size: size of sub-buffers (at least PAGE_SIZE, power of 2)
+ * @num_subbuf: number of sub-buffers (power of 2)
+ *
+ * Returns 0 on success, a negative error value otherwise.
+ *
+ * Creates the channel buffers (per-cpu or global, depending on the client
+ * configuration) using the sizes and attributes specified.
+ *
+ * Called with CPU hotplug disabled.
+ */
+int channel_backend_init(struct channel_backend *chanb,
+				const char *name,
+				const struct ring_buffer_config *config,
+				void *priv, size_t subbuf_size,
+				size_t num_subbuf)
+{
+	struct channel *chan = container_of(chanb, struct channel, backend);
+	unsigned int i;
+	int ret;
+
+	if (!name)
+		return -EPERM;
+
+	if (!(subbuf_size && num_subbuf))
+		return -EPERM;
+
+	/* Check that the subbuffer size is at least a page. */
+	CHAN_WARN_ON(chanb, subbuf_size < PAGE_SIZE);
+
+	/*
+	 * Make sure the number of subbuffers and the subbuffer size are
+	 * powers of 2.
+	 */
+	CHAN_WARN_ON(chanb, hweight32(subbuf_size) != 1);
+	CHAN_WARN_ON(chanb, hweight32(num_subbuf) != 1);
+
+	ret = subbuffer_id_check_index(config, num_subbuf);
+	if (ret)
+		return ret;
+
+	chanb->priv = priv;
+	chanb->buf_size = num_subbuf * subbuf_size;
+	chanb->subbuf_size = subbuf_size;
+	chanb->buf_size_order = get_count_order(chanb->buf_size);
+	chanb->subbuf_size_order = get_count_order(subbuf_size);
+	chanb->num_subbuf_order = get_count_order(num_subbuf);
+	chanb->extra_reader_sb =
+			(config->mode == RING_BUFFER_OVERWRITE) ? 1 : 0;
+	chanb->num_subbuf = num_subbuf;
+	strlcpy(chanb->name, name, NAME_MAX);
+	chanb->config = config;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		if (!zalloc_cpumask_var(&chanb->cpumask, GFP_KERNEL))
+			return -ENOMEM;
+	}
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		/* Allocating the buffer per-cpu structures */
+		chanb->buf = alloc_percpu(struct ring_buffer);
+		if (!chanb->buf)
+			goto free_cpumask;
+
+		/*
+		 * When CPU hotplug is not available, a ring buffer allocated
+		 * from an early initcall will not be notified of secondary
+		 * CPUs coming online. In that case, allocate buffers for all
+		 * possible CPUs.
+		 */
+#ifdef CONFIG_HOTPLUG_CPU
+		/*
+		 * buf->backend.allocated test takes care of concurrent CPU
+		 * hotplug.
+		 * Priority higher than frontend, so we create the ring buffer
+		 * before we start the timer.
+		 */
+		chanb->cpu_hp_notifier.notifier_call =
+				ring_buffer_cpu_hp_callback;
+		chanb->cpu_hp_notifier.priority = 5;
+		register_hotcpu_notifier(&chanb->cpu_hp_notifier);
+
+		get_online_cpus();
+		for_each_online_cpu(i) {
+			ret = ring_buffer_create(per_cpu_ptr(chanb->buf, i),
+						 chanb, i);
+			if (ret)
+				goto free_bufs;	/* cpu hotplug locked */
+		}
+		put_online_cpus();
+#else
+		for_each_possible_cpu(i) {
+			ret = ring_buffer_create(per_cpu_ptr(chanb->buf, i),
+						 chanb, i);
+			if (ret)
+				goto free_bufs;	/* cpu hotplug locked */
+		}
+#endif
+	} else {
+		chanb->buf = kzalloc(sizeof(struct ring_buffer), GFP_KERNEL);
+		if (!chanb->buf)
+			goto free_cpumask;
+		ret = ring_buffer_create(chanb->buf, chanb, -1);
+		if (ret)
+			goto free_bufs;
+	}
+	chanb->start_tsc = config->cb.ring_buffer_clock_read(chan);
+
+	return 0;
+
+free_bufs:
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		for_each_possible_cpu(i) {
+			struct ring_buffer *buf = per_cpu_ptr(chanb->buf, i);
+
+			if (!buf->backend.allocated)
+				continue;
+			ring_buffer_free(buf);
+		}
+#ifdef CONFIG_HOTPLUG_CPU
+		put_online_cpus();
+#endif
+		free_percpu(chanb->buf);
+	} else
+		kfree(chanb->buf);
+free_cpumask:
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		free_cpumask_var(chanb->cpumask);
+	return -ENOMEM;
+}
+
+/**
+ * channel_backend_unregister_notifiers - unregister notifiers
+ * @chanb: the channel backend
+ *
+ * Holds CPU hotplug.
+ */
+void channel_backend_unregister_notifiers(struct channel_backend *chanb)
+{
+	const struct ring_buffer_config *config = chanb->config;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		unregister_hotcpu_notifier(&chanb->cpu_hp_notifier);
+}
+
+/**
+ * channel_backend_free - destroy the channel backend
+ * @chanb: the channel backend
+ *
+ * Destroy all channel backend buffers.
+ */
+void channel_backend_free(struct channel_backend *chanb)
+{
+	const struct ring_buffer_config *config = chanb->config;
+	unsigned int i;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		for_each_possible_cpu(i) {
+			struct ring_buffer *buf = per_cpu_ptr(chanb->buf, i);
+
+			if (!buf->backend.allocated)
+				continue;
+			ring_buffer_free(buf);
+		}
+		free_cpumask_var(chanb->cpumask);
+		free_percpu(chanb->buf);
+	} else {
+		struct ring_buffer *buf = chanb->buf;
+
+		CHAN_WARN_ON(chanb, !buf->backend.allocated);
+		ring_buffer_free(buf);
+		kfree(buf);
+	}
+}
+
+/**
+ * _ring_buffer_write - write data to a ring_buffer buffer.
+ * @bufb : buffer backend
+ * @offset : offset within the buffer
+ * @src : source address
+ * @len : length to write
+ * @pagecpy : number of bytes already copied within the current page
+ */
+void _ring_buffer_write(struct ring_buffer_backend *bufb, size_t offset,
+			const void *src, size_t len, ssize_t pagecpy)
+{
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	size_t sbidx, index;
+	struct ring_buffer_backend_pages *rpages;
+	unsigned long sb_bindex, id;
+
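+	/*
+	 * The first @pagecpy bytes of this write have already been copied
+	 * into the current backend page; skip over them, then copy the
+	 * remainder one backend page at a time until @len is exhausted.
+	 */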
+	do {
+		len -= pagecpy;
+		src += pagecpy;
+		offset += pagecpy;
+		sbidx = offset >> chanb->subbuf_size_order;
+		index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+
+		/*
+		 * Underlying layer should never ask for writes across
+		 * subbuffers.
+		 */
+		CHAN_WARN_ON(chanb, offset >= chanb->buf_size);
+
+		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
+		id = bufb->buf_wsb[sbidx].id;
+		sb_bindex = subbuffer_id_get_index(config, id);
+		rpages = bufb->array[sb_bindex];
+		CHAN_WARN_ON(chanb, config->mode == RING_BUFFER_OVERWRITE
+			     && subbuffer_id_is_noref(config, id));
+		ring_buffer_do_copy(config,
+				    rpages->p[index].virt
+					+ (offset & ~PAGE_MASK),
+				    src, pagecpy);
+	} while (unlikely(len != pagecpy));
+}
+EXPORT_SYMBOL_GPL(_ring_buffer_write);
+
+/**
+ * ring_buffer_read - read data from a ring_buffer buffer.
+ * @bufb : buffer backend
+ * @offset : offset within the buffer
+ * @dest : destination address
+ * @len : length to copy to destination
+ *
+ * Should be protected by get_subbuf/put_subbuf.
+ * Returns the length copied.
+ */
+size_t ring_buffer_read(struct ring_buffer_backend *bufb, size_t offset,
+			void *dest, size_t len)
+{
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	size_t index;
+	ssize_t pagecpy, orig_len;
+	struct ring_buffer_backend_pages *rpages;
+	unsigned long sb_bindex, id;
+
+	orig_len = len;
+	offset &= chanb->buf_size - 1;
+	index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+	if (unlikely(!len))
+		return 0;
+	for (;;) {
+		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
+		id = bufb->buf_rsb.id;
+		sb_bindex = subbuffer_id_get_index(config, id);
+		rpages = bufb->array[sb_bindex];
+		CHAN_WARN_ON(chanb, config->mode == RING_BUFFER_OVERWRITE
+			     && subbuffer_id_is_noref(config, id));
+		memcpy(dest, rpages->p[index].virt + (offset & ~PAGE_MASK),
+		       pagecpy);
+		len -= pagecpy;
+		if (likely(!len))
+			break;
+		dest += pagecpy;
+		offset += pagecpy;
+		index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+		/*
+		 * Underlying layer should never ask for reads across
+		 * subbuffers.
+		 */
+		CHAN_WARN_ON(chanb, offset >= chanb->buf_size);
+	}
+	return orig_len;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_read);
+
+/**
+ * __ring_buffer_copy_to_user - read data from a ring_buffer buffer to userspace
+ * @bufb : buffer backend
+ * @offset : offset within the buffer
+ * @dest : destination userspace address
+ * @len : length to copy to destination
+ *
+ * Should be protected by get_subbuf/put_subbuf.
+ * access_ok() must have been performed on the dest addresses prior to calling
+ * this function.
+ * Returns -EFAULT on error, 0 if ok.
+ */
+int __ring_buffer_copy_to_user(struct ring_buffer_backend *bufb,
+			       size_t offset, void __user *dest, size_t len)
+{
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	size_t index;
+	ssize_t pagecpy, orig_len;
+	struct ring_buffer_backend_pages *rpages;
+	unsigned long sb_bindex, id;
+
+	orig_len = len;
+	offset &= chanb->buf_size - 1;
+	index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+	if (unlikely(!len))
+		return 0;
+	for (;;) {
+		pagecpy = min_t(size_t, len, PAGE_SIZE - (offset & ~PAGE_MASK));
+		id = bufb->buf_rsb.id;
+		sb_bindex = subbuffer_id_get_index(config, id);
+		rpages = bufb->array[sb_bindex];
+		CHAN_WARN_ON(chanb, config->mode == RING_BUFFER_OVERWRITE
+			     && subbuffer_id_is_noref(config, id));
+		if (__copy_to_user(dest,
+			       rpages->p[index].virt + (offset & ~PAGE_MASK),
+			       pagecpy))
+			return -EFAULT;
+		len -= pagecpy;
+		if (likely(!len))
+			break;
+		dest += pagecpy;
+		offset += pagecpy;
+		index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+		/*
+		 * Underlying layer should never ask for reads across
+		 * subbuffers.
+		 */
+		CHAN_WARN_ON(chanb, offset >= chanb->buf_size);
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__ring_buffer_copy_to_user);
+
+/**
+ * ring_buffer_read_cstr - read a C-style string from a ring_buffer buffer.
+ * @bufb : buffer backend
+ * @offset : offset within the buffer
+ * @dest : destination address
+ * @len : destination's length
+ *
+ * Returns the string's length.
+ * Should be protected by get_subbuf/put_subbuf.
+ */
+int ring_buffer_read_cstr(struct ring_buffer_backend *bufb, size_t offset,
+			  void *dest, size_t len)
+{
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	size_t index;
+	ssize_t pagecpy, pagelen, strpagelen, orig_offset;
+	char *str;
+	struct ring_buffer_backend_pages *rpages;
+	unsigned long sb_bindex, id;
+
+	offset &= chanb->buf_size - 1;
+	index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+	orig_offset = offset;
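+	/*
+	 * Scan the string one backend page at a time: strnlen() is bounded by
+	 * the bytes remaining in the current page, so a string crossing page
+	 * boundaries is handled by iterating until the terminating NUL is
+	 * found within a page.
+	 */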
+	for (;;) {
+		id = bufb->buf_rsb.id;
+		sb_bindex = subbuffer_id_get_index(config, id);
+		rpages = bufb->array[sb_bindex];
+		CHAN_WARN_ON(chanb, config->mode == RING_BUFFER_OVERWRITE
+			     && subbuffer_id_is_noref(config, id));
+		str = (char *)rpages->p[index].virt + (offset & ~PAGE_MASK);
+		pagelen = PAGE_SIZE - (offset & ~PAGE_MASK);
+		strpagelen = strnlen(str, pagelen);
+		if (len) {
+			pagecpy = min_t(size_t, len, strpagelen);
+			if (dest) {
+				memcpy(dest, str, pagecpy);
+				dest += pagecpy;
+			}
+			len -= pagecpy;
+		}
+		offset += strpagelen;
+		index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+		if (strpagelen < pagelen)
+			break;
+		/*
+		 * Underlying layer should never ask for reads across
+		 * subbuffers.
+		 */
+		CHAN_WARN_ON(chanb, offset >= chanb->buf_size);
+	}
+	if (dest && len)
+		((char *)dest)[0] = 0;
+	return offset - orig_offset;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_read_cstr);
+
+/**
+ * ring_buffer_read_get_page - Get a whole page to read from
+ * @bufb : buffer backend
+ * @offset : offset within the buffer
+ * @virt : pointer to page address (output)
+ *
+ * Should be protected by get_subbuf/put_subbuf.
+ * Returns a pointer to the struct page pointer for that page.
+ */
+struct page **ring_buffer_read_get_page(struct ring_buffer_backend *bufb,
+					size_t offset, void ***virt)
+{
+	size_t index;
+	struct ring_buffer_backend_pages *rpages;
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	unsigned long sb_bindex, id;
+
+	offset &= chanb->buf_size - 1;
+	index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+	id = bufb->buf_rsb.id;
+	sb_bindex = subbuffer_id_get_index(config, id);
+	rpages = bufb->array[sb_bindex];
+	CHAN_WARN_ON(chanb, config->mode == RING_BUFFER_OVERWRITE
+		     && subbuffer_id_is_noref(config, id));
+	*virt = &rpages->p[index].virt;
+	return &rpages->p[index].page;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_read_get_page);
+
+/**
+ * ring_buffer_read_offset_address - get address of a location within the buffer
+ * @bufb : buffer backend
+ * @offset : offset within the buffer.
+ *
+ * Return the address where a given offset is located (for read).
+ * Should be used to get the current subbuffer header pointer. Given we know
+ * it's never on a page boundary, it's safe to write directly to this address,
+ * as long as the write is never bigger than a page size.
+ */
+void *ring_buffer_read_offset_address(struct ring_buffer_backend *bufb,
+				      size_t offset)
+{
+	size_t index;
+	struct ring_buffer_backend_pages *rpages;
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	unsigned long sb_bindex, id;
+
+	offset &= chanb->buf_size - 1;
+	index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+	id = bufb->buf_rsb.id;
+	sb_bindex = subbuffer_id_get_index(config, id);
+	rpages = bufb->array[sb_bindex];
+	CHAN_WARN_ON(chanb, config->mode == RING_BUFFER_OVERWRITE
+		     && subbuffer_id_is_noref(config, id));
+	return rpages->p[index].virt + (offset & ~PAGE_MASK);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_read_offset_address);
+
+/**
+ * ring_buffer_offset_address - get address of a location within the buffer
+ * @bufb : buffer backend
+ * @offset : offset within the buffer.
+ *
+ * Return the address where a given offset is located.
+ * Should be used to get the current subbuffer header pointer. Given we know
+ * it's always at the beginning of a page, it's safe to write directly to this
+ * address, as long as the write is never bigger than a page size.
+ */
+void *ring_buffer_offset_address(struct ring_buffer_backend *bufb,
+				 size_t offset)
+{
+	size_t sbidx, index;
+	struct ring_buffer_backend_pages *rpages;
+	struct channel_backend *chanb = &bufb->chan->backend;
+	const struct ring_buffer_config *config = chanb->config;
+	unsigned long sb_bindex, id;
+
+	offset &= chanb->buf_size - 1;
+	sbidx = offset >> chanb->subbuf_size_order;
+	index = (offset & (chanb->subbuf_size - 1)) >> PAGE_SHIFT;
+	id = bufb->buf_wsb[sbidx].id;
+	sb_bindex = subbuffer_id_get_index(config, id);
+	rpages = bufb->array[sb_bindex];
+	CHAN_WARN_ON(chanb, config->mode == RING_BUFFER_OVERWRITE
+		     && subbuffer_id_is_noref(config, id));
+	return rpages->p[index].virt + (offset & ~PAGE_MASK);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_offset_address);
Index: linux.trees.git/include/linux/ringbuffer/backend_types.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/backend_types.h	2010-07-09 18:09:01.000000000 -0400
@@ -0,0 +1,80 @@
+#ifndef _LINUX_RING_BUFFER_BACKEND_TYPES_H
+#define _LINUX_RING_BUFFER_BACKEND_TYPES_H
+
+/*
+ * linux/ringbuffer/backend_types.h
+ *
+ * Copyright (C) 2008-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer backend (types).
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/types.h>
+
+struct ring_buffer_backend_page {
+	void *virt;			/* page virtual address (cached) */
+	struct page *page;		/* pointer to page structure */
+};
+
+struct ring_buffer_backend_pages {
+	unsigned long mmap_offset;	/* offset of the subbuffer in mmap */
+	union v_atomic records_commit;	/* current records committed count */
+	union v_atomic records_unread;	/* records to read */
+	unsigned long data_size;	/* Amount of data to read from subbuf */
+	struct ring_buffer_backend_page p[];
+};
+
+struct ring_buffer_backend_subbuffer {
+	/* Identifier for subbuf backend pages. Exchanged atomically. */
+	unsigned long id;		/* backend subbuffer identifier */
+};
+
+/*
+ * Forward declaration of frontend-specific channel and ring_buffer.
+ */
+struct channel;
+struct ring_buffer;
+
+struct ring_buffer_backend {
+	/* Array of ring_buffer_backend_subbuffer for writer */
+	struct ring_buffer_backend_subbuffer *buf_wsb;
+	/* ring_buffer_backend_subbuffer for reader */
+	struct ring_buffer_backend_subbuffer buf_rsb;
+	/*
+	 * Pointer array of backend pages, for whole buffer.
+	 * Indexed by ring_buffer_backend_subbuffer identifier (id) index.
+	 */
+	struct ring_buffer_backend_pages **array;
+	unsigned int num_pages_per_subbuf;
+
+	struct channel *chan;		/* Associated channel */
+	int cpu;			/* This buffer's cpu. -1 if global. */
+	union v_atomic records_read;	/* Number of records read */
+	unsigned int allocated:1;	/* Bool: is buffer allocated ? */
+};
+
+struct channel_backend {
+	unsigned long buf_size;		/* Size of the buffer */
+	unsigned long subbuf_size;	/* Sub-buffer size */
+	unsigned int subbuf_size_order;	/* Order of sub-buffer size */
+	unsigned int num_subbuf_order;	/*
+					 * Order of number of sub-buffers/buffer
+					 * for writer.
+					 */
+	unsigned int buf_size_order;	/* Order of buffer size */
+	int extra_reader_sb:1;		/* Bool: has extra reader subbuffer */
+	struct ring_buffer *buf;	/* Channel per-cpu buffers */
+
+	unsigned long num_subbuf;	/* Number of sub-buffers for writer */
+	u64 start_tsc;			/* Channel creation TSC value */
+	void *priv;			/* Client-specific information */
+	struct notifier_block cpu_hp_notifier;	 /* CPU hotplug notifier */
+	const struct ring_buffer_config *config; /* Ring buffer configuration */
+	cpumask_var_t cpumask;		/* Allocated per-cpu buffers cpumask */
+	char name[NAME_MAX];		/* Channel name */
+};
+
+#endif /* _LINUX_RING_BUFFER_BACKEND_TYPES_H */


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 13/20] ring buffer frontend
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (11 preceding siblings ...)
  2010-07-09 22:57 ` [patch 12/20] ring buffer backend Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 14/20] Ring buffer library - documentation Mathieu Desnoyers
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring_buffer_frontend.patch --]
[-- Type: text/plain, Size: 101890 bytes --]

Wait-free ring buffer reader/writer synchronization. The frontend inherits from
a parent backend that holds the memory buffers and accessors. The backend can be
replaced, so the same frontend (synchronization) code can be compiled against
various backends.

The frontend inherits from the backend because it needs to call the backend for
the flight-recorder "sub-buffer exchange" routine and to clear/set the subbuffer
noref flag.

However, the backend (the parent) does not need to know anything specific about
the frontend. It is the user (client) which calls the frontend to synchronize
and the backend to manipulate buffer data.

This frontend/backend separation makes it possible to use the same ring buffer
synchronization code to write data to kernel pages, to video memory, to serial
ports, etc., without having to deal with different synchronization schemes. A
minimal reader loop built on this API is sketched below.

The frontend also deals with CPU hotplug, CPU idle and periodic "flush"
deferrable timers.
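
For reference, here is a minimal reader-side sketch built on the API introduced
by this series (it assumes the headers added by these patches, e.g.
linux/ringbuffer/frontend.h and backend.h, are included). It is only an
illustration: subbuf_data_size() is a hypothetical accessor used for brevity,
and a real reader would sleep on the wait queues instead of spinning on -EAGAIN.

	static int drain_subbuffers(struct ring_buffer *buf, void *dest)
	{
		unsigned long consumed;
		int ret;

		ret = ring_buffer_open_read(buf);	/* -EBUSY if already opened */
		if (ret)
			return ret;

		for (;;) {
			ret = ring_buffer_get_subbuf(buf, &consumed);
			if (ret == -ENODATA)
				break;		/* finalized and fully consumed */
			if (ret == -EAGAIN)
				continue;	/* no data yet; a real reader
						 * would block on read_wait */

			/*
			 * Copy one sub-buffer worth of data. subbuf_data_size()
			 * is a placeholder, not part of this patch series.
			 */
			ring_buffer_read(&buf->backend, consumed, dest,
					 subbuf_data_size(buf, consumed));

			ring_buffer_put_subbuf(buf, consumed);
		}

		ring_buffer_release_read(buf);
		return 0;
	}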

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/ringbuffer/api.h               |   25 
 include/linux/ringbuffer/config.h            |  309 +++++
 include/linux/ringbuffer/frontend.h          |  191 +++
 include/linux/ringbuffer/frontend_api.h      |  352 ++++++
 include/linux/ringbuffer/frontend_internal.h |  424 +++++++
 include/linux/ringbuffer/frontend_types.h    |  158 ++
 include/linux/ringbuffer/vatomic.h           |   85 +
 lib/Kconfig                                  |   12 
 lib/Makefile                                 |    2 
 lib/ringbuffer/Makefile                      |    1 
 lib/ringbuffer/ring_buffer_frontend.c        | 1510 +++++++++++++++++++++++++++
 11 files changed, 3069 insertions(+)

Index: linux.trees.git/lib/ringbuffer/ring_buffer_frontend.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_frontend.c	2010-07-09 18:23:52.000000000 -0400
@@ -0,0 +1,1510 @@
+/*
+ * ring_buffer_frontend.c
+ *
+ * (C) Copyright 2005-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer wait-free buffer synchronization. Producer-consumer and flight
+ * recorder (overwrite) modes. See thesis:
+ *
+ * Desnoyers, Mathieu (2009), "Low-Impact Operating System Tracing", Ph.D.
+ * dissertation, Ecole Polytechnique de Montreal.
+ * http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf
+ *
+ * - Algorithm presentation in Chapter 5:
+ *     "Lockless Multi-Core High-Throughput Buffering".
+ * - Algorithm formal verification in Section 8.6:
+ *     "Formal verification of LTTng"
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Inspired from LTT and RelayFS:
+ *  Karim Yaghmour <karim@opersys.com>
+ *  Tom Zanussi <zanussi@us.ibm.com>
+ *  Bob Wisniewski <bob@watson.ibm.com>
+ * And from K42 :
+ *  Bob Wisniewski <bob@watson.ibm.com>
+ *
+ * Buffer reader semantics:
+ *
+ * - get_subbuf_size
+ * while buffer is not finalized and empty
+ *   - get_subbuf
+ *     - if return value != 0, continue
+ *   - splice one subbuffer worth of data to a pipe
+ *   - splice the data from pipe to disk/network
+ *   - put_subbuf
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/idle.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/backend.h>
+#include <linux/ringbuffer/frontend.h>
+#include <linux/ringbuffer/iterator.h>
+
+/*
+ * Internal structure representing offsets to use at a sub-buffer switch.
+ */
+struct switch_offsets {
+	unsigned long begin, end, old;
+	size_t pre_header_padding, size;
+	unsigned int switch_new_start:1, switch_new_end:1, switch_old_start:1,
+		     switch_old_end:1;
+};
+
+DEFINE_PER_CPU(unsigned int, ring_buffer_nesting);
+EXPORT_PER_CPU_SYMBOL(ring_buffer_nesting);
+
+static
+void ring_buffer_print_errors(struct channel *chan,
+			      struct ring_buffer *buf,
+			      int cpu);
+
+static const struct file_operations ring_buffer_file_operations;
+
+/*
+ * Must be called under cpu hotplug protection.
+ */
+void ring_buffer_free(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+
+	ring_buffer_print_errors(chan, buf, buf->backend.cpu);
+	kfree(buf->commit_hot);
+	kfree(buf->commit_cold);
+
+	ring_buffer_backend_free(&buf->backend);
+}
+
+/**
+ * ring_buffer_reset - Reset ring buffer to initial values.
+ * @buf: Ring buffer.
+ *
+ * Effectively empty the ring buffer. Should be called when the buffer is not
+ * used for writing. The ring buffer can be opened for reading, but the reader
+ * should not be using the iterator concurrently with reset. The previous
+ * current iterator record is reset.
+ */
+void ring_buffer_reset(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned int i;
+
+	/*
+	 * Reset iterator first. It will put the subbuffer if it currently holds
+	 * it.
+	 */
+	ring_buffer_iterator_reset(buf);
+	v_set(config, &buf->offset, 0);
+	for (i = 0; i < chan->backend.num_subbuf; i++) {
+		v_set(config, &buf->commit_hot[i].cc, 0);
+		v_set(config, &buf->commit_hot[i].seq, 0);
+		v_set(config, &buf->commit_cold[i].cc_sb, 0);
+	}
+	atomic_long_set(&buf->consumed, 0);
+	atomic_set(&buf->record_disabled, 0);
+	v_set(config, &buf->last_tsc, 0);
+	ring_buffer_backend_reset(&buf->backend);
+	/* Don't reset number of active readers */
+	v_set(config, &buf->records_lost_full, 0);
+	v_set(config, &buf->records_lost_wrap, 0);
+	v_set(config, &buf->records_lost_big, 0);
+	v_set(config, &buf->records_count, 0);
+	v_set(config, &buf->records_overrun, 0);
+	buf->finalized = 0;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_reset);
+
+/**
+ * channel_reset - Reset channel to initial values.
+ * @chan: Channel.
+ *
+ * Effectively empty the channel. Should be called when the channel is not used
+ * for writing. The channel can be opened for reading, but the reader should not
+ * be using the iterator concurrently with reset. The previous current iterator
+ * record is reset.
+ */
+void channel_reset(struct channel *chan)
+{
+	/*
+	 * Reset iterators first. Will put the subbuffer if held for reading.
+	 */
+	channel_iterator_reset(chan);
+	atomic_set(&chan->record_disabled, 0);
+	/* Don't reset commit_count_mask, still valid */
+	channel_backend_reset(&chan->backend);
+	/* Don't reset switch/read timer interval */
+	/* Don't reset notifiers and notifier enable bits */
+	/* Don't reset reader reference count */
+}
+EXPORT_SYMBOL_GPL(channel_reset);
+
+/*
+ * Must be called under cpu hotplug protection.
+ */
+int ring_buffer_create(struct ring_buffer *buf,
+		       struct channel_backend *chanb,
+		       int cpu)
+{
+	const struct ring_buffer_config *config = chanb->config;
+	struct channel *chan = container_of(chanb, struct channel, backend);
+	void *priv = chanb->priv;
+	unsigned int j, num_subbuf;
+	size_t subbuf_header_size;
+	u64 tsc;
+	int ret;
+
+	/* Test for cpu hotplug */
+	if (buf->backend.allocated)
+		return 0;
+
+	ret = ring_buffer_backend_create(&buf->backend, &chan->backend, cpu);
+	if (ret)
+		return ret;
+
+	buf->commit_hot =
+		kzalloc_node(ALIGN(sizeof(*buf->commit_hot)
+				   * chan->backend.num_subbuf,
+				   1 << INTERNODE_CACHE_SHIFT),
+			GFP_KERNEL, cpu_to_node(max(cpu, 0)));
+	if (!buf->commit_hot) {
+		ret = -ENOMEM;
+		goto free_chanbuf;
+	}
+
+	buf->commit_cold =
+		kzalloc_node(ALIGN(sizeof(*buf->commit_cold)
+				   * chan->backend.num_subbuf,
+				   1 << INTERNODE_CACHE_SHIFT),
+			GFP_KERNEL, cpu_to_node(max(cpu, 0)));
+	if (!buf->commit_cold) {
+		ret = -ENOMEM;
+		goto free_commit;
+	}
+
+	atomic_long_set(&buf->consumed, 0);
+	atomic_long_set(&buf->active_readers, 0);
+	num_subbuf = chan->backend.num_subbuf;
+	for (j = 0; j < num_subbuf; j++) {
+		v_set(config, &buf->commit_hot[j].cc, 0);
+		v_set(config, &buf->commit_hot[j].seq, 0);
+		v_set(config, &buf->commit_cold[j].cc_sb, 0);
+	}
+	init_waitqueue_head(&buf->read_wait);
+	raw_spin_lock_init(&buf->raw_idle_spinlock);
+
+	v_set(config, &buf->records_lost_full, 0);
+	v_set(config, &buf->records_lost_wrap, 0);
+	v_set(config, &buf->records_lost_big, 0);
+	v_set(config, &buf->records_count, 0);
+	v_set(config, &buf->records_overrun, 0);
+	buf->finalized = 0;
+
+	/*
+	 * Write the subbuffer header for the first subbuffer, so we know the
+	 * total duration of data gathering.
+	 */
+	subbuf_header_size = config->cb.subbuffer_header_size();
+	v_set(config, &buf->offset, subbuf_header_size);
+	subbuffer_id_clear_noref(config, &buf->backend.buf_wsb[0].id);
+	tsc = config->cb.ring_buffer_clock_read(buf->backend.chan);
+	config->cb.buffer_begin(buf, tsc, 0);
+	v_add(config, subbuf_header_size, &buf->commit_hot[0].cc);
+
+	if (config->cb.buffer_create) {
+		ret = config->cb.buffer_create(buf, priv, cpu, chanb->name);
+		if (ret)
+			goto free_init;
+	}
+
+	/*
+	 * Ensure the buffer is ready before setting it to allocated and setting
+	 * the cpumask.
+	 * Used for cpu hotplug vs cpumask iteration.
+	 */
+	smp_wmb();
+	buf->backend.allocated = 1;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		CHAN_WARN_ON(chan, cpumask_test_cpu(cpu,
+			     chan->backend.cpumask));
+		cpumask_set_cpu(cpu, chan->backend.cpumask);
+	}
+
+	return 0;
+
+	/* Error handling */
+free_init:
+	kfree(buf->commit_cold);
+free_commit:
+	kfree(buf->commit_hot);
+free_chanbuf:
+	ring_buffer_backend_free(&buf->backend);
+	return ret;
+}
+
+static void switch_buffer_timer(unsigned long data)
+{
+	struct ring_buffer *buf = (struct ring_buffer *)data;
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	/*
+	 * Only flush buffers periodically if readers are active.
+	 */
+	if (atomic_long_read(&buf->active_readers))
+		ring_buffer_switch_slow(buf, SWITCH_ACTIVE);
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		mod_timer_pinned(&buf->switch_timer,
+				 jiffies + chan->switch_timer_interval);
+	else
+		mod_timer(&buf->switch_timer,
+			  jiffies + chan->switch_timer_interval);
+}
+
+static void ring_buffer_start_switch_timer(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (!chan->switch_timer_interval)
+		return;
+
+	init_timer_deferrable(&buf->switch_timer);
+	buf->switch_timer.function = switch_buffer_timer;
+	buf->switch_timer.expires = jiffies + chan->switch_timer_interval;
+	buf->switch_timer.data = (unsigned long)buf;
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		add_timer_on(&buf->switch_timer, buf->backend.cpu);
+	else
+		add_timer(&buf->switch_timer);
+}
+
+static void ring_buffer_stop_switch_timer(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+
+	if (!chan->switch_timer_interval)
+		return;
+
+	del_timer_sync(&buf->switch_timer);
+}
+
+/*
+ * Polling timer to check the channels for data.
+ */
+static void read_buffer_timer(unsigned long data)
+{
+	struct ring_buffer *buf = (struct ring_buffer *)data;
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	CHAN_WARN_ON(chan, !buf->backend.allocated);
+
+	if (atomic_long_read(&buf->active_readers)
+	    && ring_buffer_poll_deliver(config, buf, chan)) {
+		wake_up_interruptible(&buf->read_wait);
+		wake_up_interruptible(&chan->read_wait);
+	}
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		mod_timer_pinned(&buf->read_timer,
+				 jiffies + chan->read_timer_interval);
+	else
+		mod_timer(&buf->read_timer,
+			  jiffies + chan->read_timer_interval);
+}
+
+static void ring_buffer_start_read_timer(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (config->wakeup != RING_BUFFER_WAKEUP_BY_TIMER
+	    || !chan->read_timer_interval)
+		return;
+
+	init_timer_deferrable(&buf->read_timer);
+	buf->read_timer.function = read_buffer_timer;
+	buf->read_timer.expires = jiffies + chan->read_timer_interval;
+	buf->read_timer.data = (unsigned long)buf;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		add_timer_on(&buf->read_timer, buf->backend.cpu);
+	else
+		add_timer(&buf->read_timer);
+}
+
+static void ring_buffer_stop_read_timer(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (config->wakeup != RING_BUFFER_WAKEUP_BY_TIMER
+	    || !chan->read_timer_interval)
+		return;
+
+	del_timer_sync(&buf->read_timer);
+	/*
+	 * do one more check to catch data that has been written in the last
+	 * timer period.
+	 */
+	if (ring_buffer_poll_deliver(config, buf, chan)) {
+		wake_up_interruptible(&buf->read_wait);
+		wake_up_interruptible(&chan->read_wait);
+	}
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+/**
+ *	ring_buffer_cpu_hp_callback - CPU hotplug callback
+ *	@nb: notifier block
+ *	@action: hotplug action to take
+ *	@hcpu: CPU number
+ *
+ *	Returns the success/failure of the operation. (%NOTIFY_OK, %NOTIFY_BAD)
+ */
+static
+int __cpuinit ring_buffer_cpu_hp_callback(struct notifier_block *nb,
+					  unsigned long action,
+					  void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	struct channel *chan = container_of(nb, struct channel,
+					    cpu_hp_notifier);
+	struct ring_buffer *buf = per_cpu_ptr(chan->backend.buf, cpu);
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (!chan->cpu_hp_enable)
+		return NOTIFY_DONE;
+
+	CHAN_WARN_ON(chan, config->alloc == RING_BUFFER_ALLOC_GLOBAL);
+
+	switch (action) {
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		ring_buffer_start_switch_timer(buf);
+		ring_buffer_start_read_timer(buf);
+		return NOTIFY_OK;
+
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+		ring_buffer_stop_switch_timer(buf);
+		ring_buffer_stop_read_timer(buf);
+		return NOTIFY_OK;
+
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		/*
+		 * Performing a buffer switch on a remote CPU. Performed by
+		 * the CPU responsible for doing the hotunplug after the target
+		 * CPU stopped running completely. Ensures that all data
+		 * from that remote CPU is flushed.
+		 */
+		ring_buffer_switch_slow(buf, SWITCH_ACTIVE);
+		return NOTIFY_OK;
+
+	default:
+		return NOTIFY_DONE;
+	}
+}
+#endif
+
+
+/*
+ * For per-cpu buffers, call the reader wakeups before switching the buffer, so
+ * that wake-up-tracing generated events are flushed before going idle. We test
+ * if the spinlock is locked to deal with the race where readers try to sample
+ * the ring buffer before we perform the switch. We let the readers retry in
+ * that case. If there is data in the buffer, the wake-up will prevent the
+ * CPU running the reader thread from going idle.
+ *
+ * For a global buffer, if the client requested a reader timer, then chances are
+ * we are going to keep the system from going idle anyway, so just bite the
+ * bullet and do the wake up. We have no way to know if we are the last CPU
+ * going to idle, so just switch the buffer. Use a spinlock to ensure we send
+ * the wakeup before performing the buffer switch, in case wakeup is
+ * instrumented and writes data in the buffer.
+ */
+static int ring_buffer_idle_callback(struct notifier_block *nb,
+				  unsigned long val,
+				  void *data)
+{
+	struct channel *chan = container_of(nb, struct channel,
+					    idle_notifier);
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer *buf;
+
+	if (val != IDLE_START)
+		return 0;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		buf = channel_get_ring_buffer(config, chan, smp_processor_id());
+	else
+		buf = channel_get_ring_buffer(config, chan, 0);
+
+	raw_spin_lock(&buf->raw_idle_spinlock);
+	if (config->wakeup == RING_BUFFER_WAKEUP_BY_TIMER
+	    && chan->read_timer_interval
+	    && atomic_long_read(&buf->active_readers)
+	    && (ring_buffer_poll_deliver(config, buf, chan)
+		|| ring_buffer_pending_data(config, buf, chan))) {
+		wake_up_interruptible(&buf->read_wait);
+		wake_up_interruptible(&chan->read_wait);
+	}
+	if (chan->switch_timer_interval)
+		ring_buffer_switch_slow(buf, SWITCH_ACTIVE);
+	raw_spin_unlock(&buf->raw_idle_spinlock);
+
+	return 0;
+}
+
+/*
+ * Holds CPU hotplug.
+ */
+static void channel_unregister_notifiers(struct channel *chan)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	int cpu;
+
+	channel_iterator_unregister_notifiers(chan);
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+#ifdef CONFIG_HOTPLUG_CPU
+		get_online_cpus();
+		chan->cpu_hp_enable = 0;
+		for_each_online_cpu(cpu) {
+			struct ring_buffer *buf = per_cpu_ptr(chan->backend.buf,
+							      cpu);
+			ring_buffer_stop_switch_timer(buf);
+			ring_buffer_stop_read_timer(buf);
+		}
+		put_online_cpus();
+		unregister_cpu_notifier(&chan->cpu_hp_notifier);
+#else
+		for_each_possible_cpu(cpu) {
+			struct ring_buffer *buf = per_cpu_ptr(chan->backend.buf,
+							      cpu);
+			ring_buffer_stop_switch_timer(buf);
+			ring_buffer_stop_read_timer(buf);
+		}
+#endif
+	} else {
+		struct ring_buffer *buf = chan->backend.buf;
+
+		ring_buffer_stop_switch_timer(buf);
+		ring_buffer_stop_read_timer(buf);
+	}
+	unregister_idle_notifier(&chan->idle_notifier);
+	channel_backend_unregister_notifiers(&chan->backend);
+}
+
+static void channel_free(struct channel *chan)
+{
+	channel_iterator_free(chan);
+	channel_backend_free(&chan->backend);
+	kfree(chan);
+}
+
+/**
+ * channel_create - Create channel.
+ * @config: ring buffer instance configuration
+ * @name: name of the channel
+ * @priv: ring buffer client private data
+ * @buf_addr: pointer to the beginning of the preallocated buffer contiguous
+ *            address mapping. It is used only by RING_BUFFER_STATIC
+ *            configuration. It can be set to NULL for other backends.
+ * @subbuf_size: subbuffer size
+ * @num_subbuf: number of subbuffers
+ * @switch_timer_interval: Time interval (in us) to fill sub-buffers with
+ *                         padding to let readers get those sub-buffers.
+ *                         Used for live streaming.
+ * @read_timer_interval: Time interval (in us) to wake up pending readers.
+ *
+ * Holds cpu hotplug.
+ * Returns NULL on failure.
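+ *
+ * Illustrative call (the parameter values and the client-defined config
+ * symbol are examples only, not defined by this patch):
+ *
+ *   chan = channel_create(&client_config, "chan", priv, NULL,
+ *                         subbuf_size, num_subbuf,
+ *                         switch_timer_us, read_timer_us);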
+ */
+struct channel *channel_create(const struct ring_buffer_config *config,
+		   const char *name, void *priv, void *buf_addr,
+		   size_t subbuf_size,
+		   size_t num_subbuf, unsigned int switch_timer_interval,
+		   unsigned int read_timer_interval)
+{
+	int ret, cpu;
+	struct channel *chan;
+
+	if (ring_buffer_check_config(config,
+				     switch_timer_interval,
+				     read_timer_interval))
+		return NULL;
+
+	chan = kzalloc(sizeof(struct channel), GFP_KERNEL);
+	if (!chan)
+		return NULL;
+
+	ret = channel_backend_init(&chan->backend, name, config, priv,
+				   subbuf_size, num_subbuf);
+	if (ret)
+		goto error;
+
+	ret = channel_iterator_init(chan);
+	if (ret)
+		goto error_free_backend;
+
+	chan->commit_count_mask = (~0UL >> chan->backend.num_subbuf_order);
+	chan->switch_timer_interval = usecs_to_jiffies(switch_timer_interval);
+	chan->read_timer_interval = usecs_to_jiffies(read_timer_interval);
+	init_waitqueue_head(&chan->read_wait);
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		/*
+		 * When CPU hotplug is not available, a ring buffer allocated
+		 * from an early initcall will not be notified of secondary
+		 * CPUs coming online. In that case, handle all possible CPUs.
+		 */
+#ifdef CONFIG_HOTPLUG_CPU
+		chan->cpu_hp_notifier.notifier_call =
+				ring_buffer_cpu_hp_callback;
+		chan->cpu_hp_notifier.priority = 6;
+		register_cpu_notifier(&chan->cpu_hp_notifier);
+
+		get_online_cpus();
+		for_each_online_cpu(cpu) {
+			struct ring_buffer *buf = per_cpu_ptr(chan->backend.buf,
+							       cpu);
+			ring_buffer_start_switch_timer(buf);
+			ring_buffer_start_read_timer(buf);
+		}
+		chan->cpu_hp_enable = 1;
+		put_online_cpus();
+#else
+		for_each_possible_cpu(cpu) {
+			struct ring_buffer *buf = per_cpu_ptr(chan->backend.buf,
+							      cpu);
+			ring_buffer_start_switch_timer(buf);
+			ring_buffer_start_read_timer(buf);
+		}
+#endif
+	} else {
+		struct ring_buffer *buf = chan->backend.buf;
+
+		ring_buffer_start_switch_timer(buf);
+		ring_buffer_start_read_timer(buf);
+	}
+
+	chan->idle_notifier.notifier_call = ring_buffer_idle_callback;
+	/*
+	 * smallest prio, run after any tracing activity, right before sleeping.
+	 */
+	chan->idle_notifier.priority = ~0U;
+	register_idle_notifier(&chan->idle_notifier);
+
+	return chan;
+
+error_free_backend:
+	channel_backend_free(&chan->backend);
+error:
+	kfree(chan);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(channel_create);
+
+/**
+ * channel_destroy - Finalize, wait for q.s. and destroy channel.
+ * @chan: channel to destroy
+ *
+ * Holds cpu hotplug.
+ * Call "destroy" callback, finalize channels, wait for readers to release their
+ * reference, then destroy ring buffer data. Note that when readers have
+ * completed data consumption of finalized channels, get_subbuf() will return
+ * -ENODATA. They should release their handle at that point.
+ * Returns the private data pointer.
+ */
+void *channel_destroy(struct channel *chan)
+{
+	int cpu;
+	const struct ring_buffer_config *config = chan->backend.config;
+	void *priv;
+
+	channel_unregister_notifiers(chan);
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		/*
+		 * No need to hold cpu hotplug, because all notifiers have been
+		 * unregistered.
+		 */
+		for_each_channel_cpu(cpu, chan) {
+			struct ring_buffer *buf = per_cpu_ptr(chan->backend.buf,
+							      cpu);
+
+			if (config->cb.buffer_finalize)
+				config->cb.buffer_finalize(buf,
+							   chan->backend.priv,
+							   cpu);
+			if (buf->backend.allocated)
+				ring_buffer_switch_slow(buf, SWITCH_FLUSH);
+			/*
+			 * Perform flush before writing to finalized.
+			 */
+			smp_wmb();
+			ACCESS_ONCE(buf->finalized) = 1;
+			wake_up_interruptible(&buf->read_wait);
+		}
+	} else {
+		struct ring_buffer *buf = chan->backend.buf;
+
+		if (config->cb.buffer_finalize)
+			config->cb.buffer_finalize(buf, chan->backend.priv, -1);
+		if (buf->backend.allocated)
+			ring_buffer_switch_slow(buf, SWITCH_FLUSH);
+		/*
+		 * Perform flush before writing to finalized.
+		 */
+		smp_wmb();
+		ACCESS_ONCE(buf->finalized) = 1;
+		wake_up_interruptible(&buf->read_wait);
+	}
+	wake_up_interruptible(&chan->read_wait);
+
+	while (atomic_long_read(&chan->read_ref) > 0)
+		msleep(100);
+	/* Finish waiting for refcount before free */
+	smp_mb();
+	priv = chan->backend.priv;
+	channel_free(chan);
+	return priv;
+}
+EXPORT_SYMBOL_GPL(channel_destroy);
+
+struct ring_buffer *channel_get_ring_buffer(
+					const struct ring_buffer_config *config,
+					struct channel *chan, int cpu)
+{
+	if (config->alloc == RING_BUFFER_ALLOC_GLOBAL)
+		return chan->backend.buf;
+	else
+		return per_cpu_ptr(chan->backend.buf, cpu);
+}
+EXPORT_SYMBOL_GPL(channel_get_ring_buffer);
+
+int ring_buffer_open_read(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+
+	if (!atomic_long_add_unless(&buf->active_readers, 1, 1))
+		return -EBUSY;
+	atomic_long_inc(&chan->read_ref);
+	smp_mb__after_atomic_inc();
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_open_read);
+
+void ring_buffer_release_read(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+
+	CHAN_WARN_ON(chan, atomic_long_read(&buf->active_readers) != 1);
+	smp_mb__before_atomic_dec();
+	atomic_long_dec(&chan->read_ref);
+	atomic_long_dec(&buf->active_readers);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_release_read);
+
+/*
+ * Promote compiler barrier to a smp_mb().
+ * For the specific ring buffer case, this IPI call should be removed if the
+ * architecture does not reorder writes.  This should eventually be provided by
+ * a separate architecture-specific infrastructure.
+ */
+static void remote_mb(void *info)
+{
+	smp_mb();
+}
+
+/**
+ * ring_buffer_get_subbuf - get exclusive access to subbuffer for reading
+ * @buf: ring buffer
+ * @consumed: pointer where to save the consumed count value (output)
+ *
+ * Returns -ENODATA if buffer is finalized, -EAGAIN if there is currently no
+ * data ready, or 0 if the get operation succeeds.
+ * Busy-loop trying to get data if the idle spinlock is held.
+ */
+int ring_buffer_get_subbuf(struct ring_buffer *buf, unsigned long *consumed)
+{
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long consumed_old, consumed_idx, commit_count, write_offset;
+	int ret;
+	int finalized;
+
+retry:
+	finalized = ACCESS_ONCE(buf->finalized);
+	/*
+	 * Read finalized before counters.
+	 */
+	smp_rmb();
+	consumed_old = atomic_long_read(&buf->consumed);
+	consumed_idx = subbuf_index(consumed_old, chan);
+	commit_count = v_read(config, &buf->commit_cold[consumed_idx].cc_sb);
+	/*
+	 * Make sure we read the commit count before reading the buffer
+	 * data and the write offset. Correct consumed offset ordering
+	 * wrt commit count is ensured by the use of cmpxchg to update
+	 * the consumed offset.
+	 * smp_call_function_single can fail if the remote CPU is offline,
+	 * this is OK because then there is no wmb to execute there.
+	 * If our thread is executing on the same CPU as the one the buffer
+	 * belongs to, we don't have to synchronize it at all. If we are
+	 * migrated, the scheduler will take care of the memory barriers.
+	 * Normally, smp_call_function_single() should ensure program order when
+	 * executing the remote function, which implies that it surrounds the
+	 * function execution with:
+	 * smp_mb()
+	 * send IPI
+	 * csd_lock_wait
+	 *                recv IPI
+	 *                smp_mb()
+	 *                exec. function
+	 *                smp_mb()
+	 *                csd unlock
+	 * smp_mb()
+	 *
+	 * However, smp_call_function_single() does not seem to clearly execute
+	 * such barriers. It depends on spinlock semantic to provide the barrier
+	 * before executing the IPI and, when busy-looping, csd_lock_wait only
+	 * executes smp_mb() when it has to wait for the other CPU.
+	 *
+	 * I don't trust this code. Therefore, let's add the smp_mb() sequence
+	 * required ourselves, even if duplicated. It has no performance impact
+	 * anyway.
+	 *
+	 * smp_mb() is needed because smp_rmb() and smp_wmb() only order read vs
+	 * read and write vs write. They do not ensure core synchronization. We
+	 * really have to ensure total order between the 3 barriers running on
+	 * the 2 CPUs.
+	 */
+	if (config->ipi == RING_BUFFER_IPI_BARRIER) {
+		if (config->sync == RING_BUFFER_SYNC_PER_CPU
+		    && config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+			if (raw_smp_processor_id() != buf->backend.cpu) {
+				/* Total order with IPI handler smp_mb() */
+				smp_mb();
+				smp_call_function_single(buf->backend.cpu,
+							 remote_mb, NULL, 1);
+				/* Total order with IPI handler smp_mb() */
+				smp_mb();
+			}
+		} else {
+			/* Total order with IPI handler smp_mb() */
+			smp_mb();
+			smp_call_function(remote_mb, NULL, 1);
+			/* Total order with IPI handler smp_mb() */
+			smp_mb();
+		}
+	} else {
+		/*
+		 * Local rmb to match the remote wmb to read the commit count
+		 * before the buffer data and the write offset.
+		 */
+		smp_rmb();
+	}
+
+	write_offset = v_read(config, &buf->offset);
+
+	/*
+	 * Check that the subbuffer we are trying to consume has been
+	 * already fully committed.
+	 */
+	if (((commit_count - chan->backend.subbuf_size)
+	     & chan->commit_count_mask)
+	    - (buf_trunc(consumed_old, chan)
+	       >> chan->backend.num_subbuf_order)
+	    != 0)
+		goto nodata;
+
+	/*
+	 * Check that we are not about to read the same subbuffer in
+	 * which the writer head is.
+	 */
+	if ((subbuf_trunc(write_offset, chan)
+	   - subbuf_trunc(consumed_old, chan))
+	   == 0)
+		goto nodata;
+
+	/*
+	 * Failure to get the subbuffer causes a busy-loop retry without going
+	 * to a wait queue. These are caused by short-lived race windows where
+	 * the writer is getting access to a subbuffer we were trying to get
+	 * access to. Also checks that the "consumed" buffer count we are
+	 * looking for matches the one contained in the subbuffer id.
+	 */
+	ret = update_read_sb_index(config, &buf->backend, &chan->backend,
+				   consumed_idx,
+				   buf_trunc_val(consumed_old, chan));
+	if (ret)
+		goto retry;
+
+	*consumed = consumed_old;
+	return 0;
+
+nodata:
+	/*
+	 * The memory barriers __wait_event()/wake_up_interruptible() take care
+	 * of "raw_spin_is_locked" memory ordering.
+	 */
+	if (finalized)
+		return -ENODATA;
+	else if (raw_spin_is_locked(&buf->raw_idle_spinlock))
+		goto retry;
+	else
+		return -EAGAIN;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_get_subbuf);
+
+/**
+ * ring_buffer_put_subbuf - release exclusive subbuffer access
+ * @buf: ring buffer
+ * @consumed: consumed count value, as returned by ring_buffer_get_subbuf()
+ */
+
+void ring_buffer_put_subbuf(struct ring_buffer *buf, unsigned long consumed)
+{
+	struct ring_buffer_backend *bufb = &buf->backend;
+	struct channel *chan = bufb->chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long consumed_new, consumed_old, sb_bindex;
+
+	CHAN_WARN_ON(chan, atomic_long_read(&buf->active_readers) != 1);
+
+	consumed_old = consumed;
+	consumed_new = subbuf_align(consumed_old, chan);
+
+	/*
+	 * Clear the records_unread counter. (overruns counter)
+	 * Can still be non-zero if a file reader simply grabbed the data
+	 * without using iterators.
+	 */
+	sb_bindex = subbuffer_id_get_index(config, bufb->buf_rsb.id);
+	v_add(config, v_read(config, &bufb->array[sb_bindex]->records_unread),
+	      &bufb->records_read);
+	v_set(config, &bufb->array[sb_bindex]->records_unread, 0);
+	CHAN_WARN_ON(chan, config->mode == RING_BUFFER_OVERWRITE
+		     && subbuffer_id_is_noref(config, bufb->buf_rsb.id));
+	subbuffer_id_set_noref(config, &bufb->buf_rsb.id);
+
+	/*
+	 * We exchange the subbuffer pages. No corruption possible even if the
+	 * writer did push us, because our subbuffer pages were owned by the
+	 * reader. If the consumed cmpxchg fails, this is because we have been
+	 * pushed by the writer in flight recorder mode. Don't retry, just keep
+	 * the new consumed offset given by the writer.
+	 */
+	atomic_long_cmpxchg(&buf->consumed, consumed_old, consumed_new);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_put_subbuf);
+
+/*
+ * cons_offset is an iterator on all subbuffer offsets between the reader
+ * position and the writer position (inclusive).
+ */
+static
+void ring_buffer_print_subbuffer_errors(struct ring_buffer *buf,
+					struct channel *chan,
+					unsigned long cons_offset,
+					int cpu)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long cons_idx, commit_count, commit_count_sb;
+
+	cons_idx = subbuf_index(cons_offset, chan);
+	commit_count = v_read(config, &buf->commit_hot[cons_idx].cc);
+	commit_count_sb = v_read(config, &buf->commit_cold[cons_idx].cc_sb);
+
+	if (subbuf_offset(commit_count, chan) != 0)
+		printk(KERN_WARNING
+		       "ring buffer %s, cpu %d: "
+		       "commit count in subbuffer %lu,\n"
+		       "expecting multiples of %lu bytes\n"
+		       "  [ %lu bytes committed, %lu bytes reader-visible ]\n",
+		       chan->backend.name, cpu, cons_idx,
+		       chan->backend.subbuf_size,
+		       commit_count, commit_count_sb);
+
+	printk(KERN_DEBUG "ring buffer: %s, cpu %d: %lu bytes committed\n",
+	       chan->backend.name, cpu, commit_count);
+}
+
+static
+void ring_buffer_print_buffer_errors(struct ring_buffer *buf,
+				     struct channel *chan,
+				     void *priv, int cpu)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long write_offset, cons_offset;
+
+	/*
+	 * Can be called in the error path of allocation, when the channel is
+	 * not set yet.
+	 */
+	if (!chan)
+		return;
+	/*
+	 * No need to order commit_count, write_offset and cons_offset reads
+	 * because we execute at teardown when no more writer nor reader
+	 * references are left.
+	 */
+	write_offset = v_read(config, &buf->offset);
+	cons_offset = atomic_long_read(&buf->consumed);
+	if (write_offset != cons_offset)
+		printk(KERN_WARNING
+		       "ring buffer %s, cpu %d: "
+		       "non-consumed data\n"
+		       "  [ %lu bytes written, %lu bytes read ]\n",
+		       chan->backend.name, cpu, write_offset, cons_offset);
+
+	for (cons_offset = atomic_long_read(&buf->consumed);
+	     (long) (subbuf_trunc((unsigned long) v_read(config, &buf->offset),
+				  chan)
+		     - cons_offset) > 0;
+	     cons_offset = subbuf_align(cons_offset, chan))
+		ring_buffer_print_subbuffer_errors(buf, chan, cons_offset,
+						   cpu);
+}
+
+static
+void ring_buffer_print_errors(struct channel *chan,
+			      struct ring_buffer *buf,
+			      int cpu)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	void *priv = chan->backend.priv;
+
+	printk(KERN_DEBUG "ring buffer %s, cpu %d: %lu records written, "
+			  "%lu records overrun\n",
+			  chan->backend.name, cpu,
+			  v_read(config, &buf->records_count),
+			  v_read(config, &buf->records_overrun));
+
+	if (v_read(config, &buf->records_lost_full)
+	    || v_read(config, &buf->records_lost_wrap)
+	    || v_read(config, &buf->records_lost_big))
+		printk(KERN_WARNING
+		       "ring buffer %s, cpu %d: records were lost. Caused by:\n"
+		       "  [ %lu buffer full, %lu nest buffer wrap-around, "
+		       "%lu event too big ]\n",
+		       chan->backend.name, cpu,
+		       v_read(config, &buf->records_lost_full),
+		       v_read(config, &buf->records_lost_wrap),
+		       v_read(config, &buf->records_lost_big));
+
+	ring_buffer_print_buffer_errors(buf, chan, priv, cpu);
+}
+
+/*
+ * ring_buffer_switch_old_start: Populate old subbuffer header.
+ *
+ * Only executed when the buffer is finalized, in SWITCH_FLUSH.
+ */
+static
+void ring_buffer_switch_old_start(struct ring_buffer *buf,
+				  struct channel *chan,
+				  struct switch_offsets *offsets,
+				  u64 tsc)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long oldidx = subbuf_index(offsets->old, chan);
+	unsigned long commit_count;
+
+	config->cb.buffer_begin(buf, tsc, oldidx);
+
+	/*
+	 * Order all writes to buffer before the commit count update that will
+	 * determine that the subbuffer is full.
+	 */
+	if (config->ipi == RING_BUFFER_IPI_BARRIER) {
+		/*
+		 * Must write slot data before incrementing commit count.  This
+		 * compiler barrier is upgraded into a smp_mb() by the IPI sent
+		 * by get_subbuf().
+		 */
+		barrier();
+	} else
+		smp_wmb();
+	v_add(config, config->cb.subbuffer_header_size(),
+	      &buf->commit_hot[oldidx].cc);
+	commit_count = v_read(config, &buf->commit_hot[oldidx].cc);
+	/* Check if the written buffer has to be delivered */
+	ring_buffer_check_deliver(config, buf, chan, offsets->old, commit_count,
+				  oldidx);
+	ring_buffer_write_commit_counter(config, buf, chan, oldidx,
+					 offsets->old, commit_count,
+					 config->cb.subbuffer_header_size());
+}
+
+/*
+ * ring_buffer_switch_old_end: switch old subbuffer
+ *
+ * Note: offset_old should never be 0 here. This is OK, because we never
+ * perform a buffer switch on an empty subbuffer in SWITCH_ACTIVE mode. The
+ * caller increments the offset_old value when doing a SWITCH_FLUSH on an empty
+ * subbuffer.
+ */
+static
+void ring_buffer_switch_old_end(struct ring_buffer *buf,
+					   struct channel *chan,
+					   struct switch_offsets *offsets,
+					   u64 tsc)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long oldidx = subbuf_index(offsets->old - 1, chan);
+	unsigned long commit_count, padding_size, data_size;
+
+	data_size = subbuf_offset(offsets->old - 1, chan) + 1;
+	padding_size = chan->backend.subbuf_size - data_size;
+	subbuffer_set_data_size(config, &buf->backend, oldidx, data_size);
+
+	/*
+	 * Order all writes to buffer before the commit count update that will
+	 * determine that the subbuffer is full.
+	 */
+	if (config->ipi == RING_BUFFER_IPI_BARRIER) {
+		/*
+		 * Must write slot data before incrementing commit count.  This
+		 * compiler barrier is upgraded into a smp_mb() by the IPI sent
+		 * by get_subbuf().
+		 */
+		barrier();
+	} else
+		smp_wmb();
+	v_add(config, padding_size, &buf->commit_hot[oldidx].cc);
+	commit_count = v_read(config, &buf->commit_hot[oldidx].cc);
+	ring_buffer_check_deliver(config, buf, chan, offsets->old - 1,
+				  commit_count, oldidx);
+	ring_buffer_write_commit_counter(config, buf, chan, oldidx,
+					 offsets->old, commit_count,
+					 padding_size);
+}
+
+/*
+ * ring_buffer_switch_new_start: Populate new subbuffer.
+ *
+ * This code can be executed unordered: writers may already have written to the
+ * sub-buffer before this code gets executed, so caution is required. The commit
+ * makes sure that this code is executed before the delivery of this sub-buffer.
+ */
+static
+void ring_buffer_switch_new_start(struct ring_buffer *buf,
+					   struct channel *chan,
+					   struct switch_offsets *offsets,
+					   u64 tsc)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long beginidx = subbuf_index(offsets->begin, chan);
+	unsigned long commit_count;
+
+	config->cb.buffer_begin(buf, tsc, beginidx);
+
+	/*
+	 * Order all writes to buffer before the commit count update that will
+	 * determine that the subbuffer is full.
+	 */
+	if (config->ipi == RING_BUFFER_IPI_BARRIER) {
+		/*
+		 * Must write slot data before incrementing commit count.  This
+		 * compiler barrier is upgraded into a smp_mb() by the IPI sent
+		 * by get_subbuf().
+		 */
+		barrier();
+	} else
+		smp_wmb();
+	v_add(config, config->cb.subbuffer_header_size(),
+	      &buf->commit_hot[beginidx].cc);
+	commit_count = v_read(config, &buf->commit_hot[beginidx].cc);
+	/* Check if the written buffer has to be delivered */
+	ring_buffer_check_deliver(config, buf, chan, offsets->begin,
+				  commit_count, beginidx);
+	ring_buffer_write_commit_counter(config, buf, chan, beginidx,
+					 offsets->begin, commit_count,
+					 config->cb.subbuffer_header_size());
+}
+
+/*
+ * ring_buffer_switch_new_end: finish switching current subbuffer
+ *
+ * The only remaining threads could be the ones with pending commits. They will
+ * have to perform the delivery themselves.
+ */
+static
+void ring_buffer_switch_new_end(struct ring_buffer *buf,
+					    struct channel *chan,
+					    struct switch_offsets *offsets,
+					    u64 tsc)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long endidx = subbuf_index(offsets->end - 1, chan);
+	unsigned long commit_count, padding_size, data_size;
+
+	data_size = subbuf_offset(offsets->end - 1, chan) + 1;
+	padding_size = chan->backend.subbuf_size - data_size;
+	subbuffer_set_data_size(config, &buf->backend, endidx, data_size);
+
+	/*
+	 * Order all writes to buffer before the commit count update that will
+	 * determine that the subbuffer is full.
+	 */
+	if (config->ipi == RING_BUFFER_IPI_BARRIER) {
+		/*
+		 * Must write slot data before incrementing commit count.  This
+		 * compiler barrier is upgraded into a smp_mb() by the IPI sent
+		 * by get_subbuf().
+		 */
+		barrier();
+	} else
+		smp_wmb();
+	v_add(config, padding_size, &buf->commit_hot[endidx].cc);
+	commit_count = v_read(config, &buf->commit_hot[endidx].cc);
+	ring_buffer_check_deliver(config, buf, chan, offsets->end - 1,
+				  commit_count, endidx);
+	ring_buffer_write_commit_counter(config, buf, chan, endidx,
+					 offsets->end, commit_count,
+					 padding_size);
+}
+
+/*
+ * Returns :
+ * 0 if ok
+ * !0 if execution must be aborted.
+ */
+static
+int ring_buffer_try_switch_slow(enum switch_mode mode,
+				struct ring_buffer *buf,
+				struct channel *chan,
+				struct switch_offsets *offsets,
+				u64 *tsc)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long off;
+
+	offsets->begin = v_read(config, &buf->offset);
+	offsets->old = offsets->begin;
+	offsets->switch_old_start = 0;
+	off = subbuf_offset(offsets->begin, chan);
+
+	*tsc = config->cb.ring_buffer_clock_read(chan);
+
+	/*
+	 * Ensure we flush the header of an empty subbuffer when doing the
+	 * finalize (SWITCH_FLUSH). This ensures that we end up knowing the
+	 * total data gathering duration even if there were no records saved
+	 * after the last buffer switch.
+	 * In SWITCH_ACTIVE mode, switch the buffer when it contains events.
+	 * SWITCH_ACTIVE only flushes the current subbuffer, dealing with end of
+	 * subbuffer header as appropriate.
+	 * The next record that reserves space will be responsible for
+	 * populating the following subbuffer header. We choose not to populate
+	 * the next subbuffer header here because we want to be able to use
+	 * SWITCH_ACTIVE for periodic buffer flush and CPU idle entry buffer
+	 * flush, which must guarantee that all the buffer content (records and
+	 * header timestamps) are visible to the reader. This is required for
+	 * quiescence guarantees for the fusion merge.
+	 */
+	if (mode == SWITCH_FLUSH || off > 0) {
+		if (unlikely(off == 0)) {
+			/*
+			 * The client does not save any header information.
+			 * Don't switch empty subbuffer on finalize, because it
+			 * is invalid to deliver a completely empty subbuffer.
+			 */
+			if (!config->cb.subbuffer_header_size())
+				return -1;
+			/*
+			 * Need to write the subbuffer start header on finalize.
+			 */
+			offsets->switch_old_start = 1;
+		}
+		offsets->begin = subbuf_align(offsets->begin, chan);
+	} else
+		return -1;	/* we do not have to switch : buffer is empty */
+	/* Note: old points to the next subbuf at offset 0 */
+	offsets->end = offsets->begin;
+	return 0;
+}
+
+/*
+ * Force a sub-buffer switch. This operation is completely reentrant: it can be
+ * called while tracing is active with absolutely no lock held.
+ *
+ * Note, however, that as a v_cmpxchg is used for some atomic
+ * operations, this function must be called from the CPU which owns the buffer
+ * for an ACTIVE flush.
+ */
+void ring_buffer_switch_slow(struct ring_buffer *buf, enum switch_mode mode)
+{
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct switch_offsets offsets;
+	unsigned long oldidx;
+	u64 tsc;
+
+	offsets.size = 0;
+
+	/*
+	 * Perform retryable operations.
+	 */
+	do {
+		if (ring_buffer_try_switch_slow(mode, buf, chan, &offsets,
+						&tsc))
+			return;	/* Switch not needed */
+	} while (v_cmpxchg(config, &buf->offset, offsets.old, offsets.end)
+		 != offsets.old);
+
+	/*
+	 * Atomically update last_tsc. This update races against concurrent
+	 * atomic updates, but the race will always cause supplementary full TSC
+	 * records, never the opposite (missing a full TSC record when it would
+	 * be needed).
+	 */
+	save_last_tsc(config, buf, tsc);
+
+	/*
+	 * Push the reader if necessary
+	 */
+	ring_buffer_reserve_push_reader(buf, chan, offsets.old);
+
+	oldidx = subbuf_index(offsets.old, chan);
+	ring_buffer_clear_noref(config, &buf->backend, oldidx);
+
+	/*
+	 * May need to populate header start on SWITCH_FLUSH.
+	 */
+	if (offsets.switch_old_start) {
+		ring_buffer_switch_old_start(buf, chan, &offsets, tsc);
+		offsets.old += config->cb.subbuffer_header_size();
+	}
+
+	/*
+	 * Switch old subbuffer.
+	 */
+	ring_buffer_switch_old_end(buf, chan, &offsets, tsc);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_switch_slow);
+
+/*
+ * Returns :
+ * 0 if ok
+ * !0 if execution must be aborted.
+ */
+static
+int ring_buffer_try_reserve_slow(struct ring_buffer *buf, struct channel *chan,
+				 struct switch_offsets *offsets,
+				 struct ring_buffer_ctx *ctx)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned long reserve_commit_diff;
+
+	offsets->begin = v_read(config, &buf->offset);
+	offsets->old = offsets->begin;
+	offsets->switch_new_start = 0;
+	offsets->switch_new_end = 0;
+	offsets->switch_old_end = 0;
+	offsets->pre_header_padding = 0;
+
+	ctx->tsc = config->cb.ring_buffer_clock_read(chan);
+
+	if (last_tsc_overflow(config, buf, ctx->tsc))
+		ctx->rflags = RING_BUFFER_RFLAG_FULL_TSC;
+
+	if (unlikely(subbuf_offset(offsets->begin, ctx->chan) == 0)) {
+		offsets->switch_new_start = 1;		/* For offsets->begin */
+	} else {
+		offsets->size = config->cb.record_header_size(config, chan,
+						offsets->begin,
+						ctx->data_size,
+						&offsets->pre_header_padding,
+						ctx->rflags, ctx);
+		offsets->size +=
+			ring_buffer_align(config,
+					  offsets->begin + offsets->size,
+					  ctx->largest_align)
+			+ ctx->data_size;
+		if (unlikely((subbuf_offset(offsets->begin, chan) +
+			     offsets->size) > chan->backend.subbuf_size)) {
+			offsets->switch_old_end = 1;	/* For offsets->old */
+			offsets->switch_new_start = 1;	/* For offsets->begin */
+		}
+	}
+	if (unlikely(offsets->switch_new_start)) {
+		unsigned long sb_index;
+
+		/*
+		 * We are typically not filling the previous buffer completely.
+		 */
+		if (likely(offsets->switch_old_end))
+			offsets->begin = subbuf_align(offsets->begin, chan);
+		offsets->begin = offsets->begin
+				 + config->cb.subbuffer_header_size();
+		/* Test new buffer integrity */
+		sb_index = subbuf_index(offsets->begin, chan);
+		reserve_commit_diff =
+		  (buf_trunc(offsets->begin, chan)
+		   >> chan->backend.num_subbuf_order)
+		  - ((unsigned long) v_read(config,
+					    &buf->commit_cold[sb_index].cc_sb)
+		     & chan->commit_count_mask);
+		if (likely(reserve_commit_diff == 0)) {
+			/* Next subbuffer not being written to. */
+			if (unlikely(config->mode != RING_BUFFER_OVERWRITE &&
+				(subbuf_trunc(offsets->begin, chan)
+				 - subbuf_trunc((unsigned long)
+				     atomic_long_read(&buf->consumed), chan))
+				>= chan->backend.buf_size)) {
+				/*
+				 * We do not overwrite non-consumed buffers
+				 * and the buffer is full: the record is lost.
+				 */
+				v_inc(config, &buf->records_lost_full);
+				return -1;
+			} else {
+				/*
+				 * Next subbuffer not being written to, and we
+				 * are either in overwrite mode or the buffer is
+				 * not full. It's safe to write in this new
+				 * subbuffer.
+				 */
+			}
+		} else {
+			/*
+			 * Next subbuffer reserve offset does not match the
+			 * commit offset. Drop record in producer-consumer and
+			 * overwrite mode. Caused by either a writer OOPS or too
+			 * many nested writes over a reserve/commit pair.
+			 */
+			v_inc(config, &buf->records_lost_wrap);
+			return -1;
+		}
+		offsets->size =
+			config->cb.record_header_size(config, chan,
+						offsets->begin,
+						ctx->data_size,
+						&offsets->pre_header_padding,
+						ctx->rflags, ctx);
+		offsets->size +=
+			ring_buffer_align(config,
+					  offsets->begin + offsets->size,
+					  ctx->largest_align)
+			+ ctx->data_size;
+		if (unlikely((subbuf_offset(offsets->begin, chan)
+			     + offsets->size) > chan->backend.subbuf_size)) {
+			/*
+			 * Record too big for subbuffers, report error, don't
+			 * complete the sub-buffer switch.
+			 */
+			v_inc(config, &buf->records_lost_big);
+			return -1;
+		} else {
+			/*
+			 * We just made a successful buffer switch and the
+			 * record fits in the new subbuffer. Let's write.
+			 */
+		}
+	} else {
+		/*
+		 * Record fits in the current buffer and we are not on a switch
+		 * boundary. It's safe to write.
+		 */
+	}
+	offsets->end = offsets->begin + offsets->size;
+
+	if (unlikely((subbuf_offset(offsets->end, chan)) == 0)) {
+		/*
+		 * The offset_end will fall at the very beginning of the next
+		 * subbuffer.
+		 */
+		offsets->switch_new_end = 1;	/* For offsets->begin */
+	}
+	return 0;
+}
+
+/**
+ * ring_buffer_reserve_slow - Atomic slot reservation in a buffer.
+ * @ctx: ring buffer context.
+ *
+ * Return : -ENOSPC if not enough space, else returns 0.
+ * It will take care of sub-buffer switching.
+ */
+int ring_buffer_reserve_slow(struct ring_buffer_ctx *ctx)
+{
+	struct channel *chan = ctx->chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer *buf;
+	struct switch_offsets offsets;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		buf = per_cpu_ptr(chan->backend.buf, ctx->cpu);
+	else
+		buf = chan->backend.buf;
+	ctx->buf = buf;
+
+	offsets.size = 0;
+
+	do {
+		if (unlikely(ring_buffer_try_reserve_slow(buf, chan, &offsets,
+							  ctx)))
+			return -ENOSPC;
+	} while (unlikely(v_cmpxchg(config, &buf->offset, offsets.old,
+				    offsets.end)
+			  != offsets.old));
+
+	/*
+	 * Atomically update last_tsc. This update races against concurrent
+	 * atomic updates, but the race will always cause supplementary full TSC
+	 * records, never the opposite (missing a full TSC record when it would
+	 * be needed).
+	 */
+	save_last_tsc(config, buf, ctx->tsc);
+
+	/*
+	 * Push the reader if necessary
+	 */
+	ring_buffer_reserve_push_reader(buf, chan, offsets.end - 1);
+
+	/*
+	 * Clear noref flag for this subbuffer.
+	 */
+	ring_buffer_clear_noref(config, &buf->backend,
+				subbuf_index(offsets.end - 1, chan));
+
+	/*
+	 * Switch old subbuffer if needed.
+	 */
+	if (unlikely(offsets.switch_old_end)) {
+		ring_buffer_clear_noref(config, &buf->backend,
+					subbuf_index(offsets.old - 1, chan));
+		ring_buffer_switch_old_end(buf, chan, &offsets, ctx->tsc);
+	}
+
+	/*
+	 * Populate new subbuffer.
+	 */
+	if (unlikely(offsets.switch_new_start))
+		ring_buffer_switch_new_start(buf, chan, &offsets, ctx->tsc);
+
+	if (unlikely(offsets.switch_new_end))
+		ring_buffer_switch_new_end(buf, chan, &offsets, ctx->tsc);
+
+	ctx->slot_size = offsets.size;
+	ctx->pre_offset = offsets.begin;
+	ctx->buf_offset = offsets.begin + offsets.pre_header_padding;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_reserve_slow);
Index: linux.trees.git/lib/ringbuffer/Makefile
===================================================================
--- linux.trees.git.orig/lib/ringbuffer/Makefile	2010-07-09 18:09:01.000000000 -0400
+++ linux.trees.git/lib/ringbuffer/Makefile	2010-07-09 18:13:53.000000000 -0400
@@ -1 +1,2 @@
 obj-y += ring_buffer_backend.o
+obj-y += ring_buffer_frontend.o
Index: linux.trees.git/lib/Kconfig
===================================================================
--- linux.trees.git.orig/lib/Kconfig	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/lib/Kconfig	2010-07-09 18:13:53.000000000 -0400
@@ -83,6 +83,18 @@ config LIBCRC32C
 	  require M here.  See Castagnoli93.
 	  Module will be libcrc32c.
 
+config LIB_RING_BUFFER
+	bool "Ring Buffer"
+	help
+	  This option provides a generic ring buffer.
+
+config LIB_RING_BUFFER_CLIENTS
+	tristate "Ring Buffer Clients"
+	help
+	  This option provides three generic ring buffer clients: global
+	  buffers, per-cpu buffers with global iterators, and per-cpu buffers
+	  with local per-cpu iterators.
+
 config AUDIT_GENERIC
 	bool
 	depends on AUDIT && !AUDIT_ARCH
Index: linux.trees.git/lib/Makefile
===================================================================
--- linux.trees.git.orig/lib/Makefile	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/lib/Makefile	2010-07-09 18:13:53.000000000 -0400
@@ -107,6 +107,8 @@ obj-$(CONFIG_GENERIC_ATOMIC64) += atomic
 
 obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o
 
+obj-$(CONFIG_LIB_RING_BUFFER) += ringbuffer/
+
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
 
Index: linux.trees.git/include/linux/ringbuffer/api.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/api.h	2010-07-09 18:13:53.000000000 -0400
@@ -0,0 +1,25 @@
+#ifndef _LINUX_RING_BUFFER_API_H
+#define _LINUX_RING_BUFFER_API_H
+
+/*
+ * linux/ringbuffer/api.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring Buffer API.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/backend.h>
+#include <linux/ringbuffer/frontend.h>
+#include <linux/ringbuffer/vfs.h>
+
+/*
+ * ring_buffer_frontend_api.h contains static inline functions that depend on
+ * client static inlines. Hence the inclusion of this "api" header only
+ * within the client.
+ */
+#include <linux/ringbuffer/frontend_api.h>
+
+#endif /* _LINUX_RING_BUFFER_API_H */
Index: linux.trees.git/include/linux/ringbuffer/config.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/config.h	2010-07-09 18:24:22.000000000 -0400
@@ -0,0 +1,309 @@
+#ifndef _LINUX_RING_BUFFER_CONFIG_H
+#define _LINUX_RING_BUFFER_CONFIG_H
+
+/*
+ * linux/ringbuffer/config.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer configuration header. Note: after declaring the standard inline
+ * functions, clients should also include linux/ringbuffer/api.h.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/types.h>
+#include <linux/percpu.h>
+
+struct ring_buffer;
+struct channel;
+struct ring_buffer_config;
+struct ring_buffer_ctx;
+
+/*
+ * Ring buffer client callbacks. Only used by slow path, never on fast path.
+ * For the fast path, record_header_size(), ring_buffer_clock_read() should be
+ * provided as inline functions too.  These may simply return 0 if not used by
+ * the client.
+ */
+struct ring_buffer_client_cb {
+	/* Mandatory callbacks */
+
+	/* A static inline version is also required for fast path */
+	u64 (*ring_buffer_clock_read) (struct channel *chan);
+	size_t (*record_header_size) (const struct ring_buffer_config *config,
+				      struct channel *chan, size_t offset,
+				      size_t data_size,
+				      size_t *pre_header_padding,
+				      unsigned int rflags,
+				      struct ring_buffer_ctx *ctx);
+
+	/* Slow path only, at subbuffer switch */
+	size_t (*subbuffer_header_size) (void);
+	void (*buffer_begin) (struct ring_buffer *buf, u64 tsc,
+			      unsigned int subbuf_idx);
+	void (*buffer_end) (struct ring_buffer *buf, u64 tsc,
+			    unsigned int subbuf_idx, unsigned long data_size);
+
+	/* Optional callbacks (can be set to NULL) */
+
+	/* Called at buffer creation/finalize */
+	int (*buffer_create) (struct ring_buffer *buf, void *priv,
+			      int cpu, const char *name);
+	/*
+	 * Clients should guarantee that no new reader handle can be opened
+	 * after finalize.
+	 */
+	void (*buffer_finalize) (struct ring_buffer *buf, void *priv, int cpu);
+
+	/*
+	 * Extract header length, payload length and timestamp from event
+	 * record. Used by buffer iterators. Timestamp is only used by channel
+	 * iterator.
+	 */
+	void (*record_get) (const struct ring_buffer_config *config,
+			    struct channel *chan, struct ring_buffer *buf,
+			    size_t offset, size_t *header_len,
+			    size_t *payload_len, u64 *timestamp);
+};
+
+/*
+ * Ring buffer instance configuration.
+ *
+ * Declare as "static const" within the client object to ensure the inline fast
+ * paths can be optimized.
+ *
+ * alloc/sync pairs:
+ *
+ * RING_BUFFER_ALLOC_PER_CPU and RING_BUFFER_SYNC_PER_CPU :
+ *   Per-cpu buffers with per-cpu synchronization. Tracing must be performed
+ *   with preemption disabled (ring_buffer_get_cpu() and ring_buffer_put_cpu()).
+ *
+ * RING_BUFFER_ALLOC_PER_CPU and RING_BUFFER_SYNC_GLOBAL :
+ *   Per-cpu buffer with global synchronization. Tracing can be performed with
+ *   preemption enabled; writes statistically stay on the local buffer.
+ *
+ * RING_BUFFER_ALLOC_GLOBAL and RING_BUFFER_SYNC_PER_CPU :
+ *   Should only be used for buffers belonging to a single thread or protected
+ *   by mutual exclusion by the client. Note that periodic sub-buffer switch
+ *   should be disabled in this kind of configuration.
+ *
+ * RING_BUFFER_ALLOC_GLOBAL and RING_BUFFER_SYNC_GLOBAL :
+ *   Global shared buffer with global synchronization.
+ *
+ * wakeup:
+ *
+ * RING_BUFFER_WAKEUP_BY_TIMER uses per-cpu deferrable timers to poll the
+ * buffers and wake up readers if data is ready. Mainly useful for tracers which
+ * don't want to call into the wakeup code on the tracing path. Use in
+ * combination with "read_timer_interval" channel_create() argument.
+ *
+ * RING_BUFFER_WAKEUP_BY_WRITER directly wakes up readers when a subbuffer is
+ * ready to read. Lower latencies before the reader is woken up. Mainly suitable
+ * for drivers.
+ *
+ * RING_BUFFER_WAKEUP_NONE does not perform any wakeup whatsoever. The client
+ * has the responsibility to perform wakeups.
+ */
+struct ring_buffer_config {
+	enum {
+		RING_BUFFER_ALLOC_PER_CPU,
+		RING_BUFFER_ALLOC_GLOBAL,
+	} alloc;
+	enum {
+		RING_BUFFER_SYNC_PER_CPU,	/* Wait-free */
+		RING_BUFFER_SYNC_GLOBAL,	/* Lock-free */
+	} sync;
+	enum {
+		RING_BUFFER_OVERWRITE,		/* Overwrite when buffer full */
+		RING_BUFFER_DISCARD,		/* Discard when buffer full */
+	} mode;
+	enum {
+		RING_BUFFER_NATURAL,
+		RING_BUFFER_PACKED,
+	} align;
+	enum {
+		RING_BUFFER_SPLICE,
+		RING_BUFFER_MMAP,
+		RING_BUFFER_READ,		/* TODO */
+		RING_BUFFER_ITERATOR,
+		RING_BUFFER_NONE,
+	} output;
+	enum {
+		RING_BUFFER_PAGE,
+		RING_BUFFER_VMAP,		/* TODO */
+		RING_BUFFER_STATIC,		/* TODO */
+	} backend;
+	enum {
+		RING_BUFFER_NO_OOPS_CONSISTENCY,
+		RING_BUFFER_OOPS_CONSISTENCY,
+	} oops;
+	enum {
+		RING_BUFFER_IPI_BARRIER,
+		RING_BUFFER_NO_IPI_BARRIER,
+	} ipi;
+	enum {
+		RING_BUFFER_WAKEUP_BY_TIMER,	/* wake up performed by timer */
+		RING_BUFFER_WAKEUP_BY_WRITER,	/*
+						 * writer wakes up reader,
+						 * not lock-free
+						 * (takes spinlock).
+						 */
+	} wakeup;
+	/*
+	 * tsc_bits: timestamp bits saved at each record.
+	 *   0 and 64 disable the timestamp compression scheme.
+	 */
+	unsigned int tsc_bits;
+	struct ring_buffer_client_cb cb;
+};
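+
+/*
+ * Illustrative sketch (not part of the library): a client would typically
+ * declare its configuration as "static const" in its own object so that the
+ * fast-path inlines get specialized at compile time. The callback names and
+ * the tsc_bits value below are placeholders, not symbols defined by this
+ * header:
+ *
+ *	static const struct ring_buffer_config client_config = {
+ *		.cb = {
+ *			.ring_buffer_clock_read	= client_clock_read,
+ *			.record_header_size	= client_record_header_size,
+ *			.subbuffer_header_size	= client_packet_header_size,
+ *			.buffer_begin		= client_buffer_begin,
+ *			.buffer_end		= client_buffer_end,
+ *		},
+ *		.tsc_bits	= 27,
+ *		.alloc		= RING_BUFFER_ALLOC_PER_CPU,
+ *		.sync		= RING_BUFFER_SYNC_PER_CPU,
+ *		.mode		= RING_BUFFER_DISCARD,
+ *		.align		= RING_BUFFER_NATURAL,
+ *		.output		= RING_BUFFER_SPLICE,
+ *		.backend	= RING_BUFFER_PAGE,
+ *		.oops		= RING_BUFFER_OOPS_CONSISTENCY,
+ *		.ipi		= RING_BUFFER_IPI_BARRIER,
+ *		.wakeup		= RING_BUFFER_WAKEUP_BY_TIMER,
+ *	};
+ */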
+
+/*
+ * ring buffer context
+ *
+ * Context passed to ring_buffer_reserve(), ring_buffer_commit(),
+ * ring_buffer_try_discard_reserve(), ring_buffer_align_ctx() and
+ * ring_buffer_write().
+ */
+struct ring_buffer_ctx {
+	/* input received by ring_buffer_reserve(), saved here. */
+	struct channel *chan;		/* channel */
+	void *priv;			/* client private data */
+	size_t data_size;		/* size of payload */
+	int largest_align;		/*
+					 * alignment of the largest element
+					 * in the payload
+					 */
+	int cpu;			/* processor id */
+
+	/* output from ring_buffer_reserve() */
+	struct ring_buffer *buf;	/*
+					 * buffer corresponding to processor id
+					 * for this channel
+					 */
+	size_t slot_size;		/* size of the reserved slot */
+	unsigned long buf_offset;	/* offset following the record header */
+	unsigned long pre_offset;	/*
+					 * Initial offset position _before_
+					 * the record is written. Positioned
+					 * prior to record header alignment
+					 * padding.
+					 */
+	u64 tsc;			/* time-stamp counter value */
+	unsigned int rflags;		/* reservation flags */
+};
+
+/**
+ * ring_buffer_ctx_init - initialize ring buffer context
+ * @ctx: ring buffer context to initialize
+ * @chan: channel
+ * @priv: client private data
+ * @data_size: size of record data payload
+ * @largest_align: largest alignment within data payload types
+ * @cpu: processor id
+ */
+static inline
+void ring_buffer_ctx_init(struct ring_buffer_ctx *ctx,
+			  struct channel *chan, void *priv,
+			  size_t data_size, int largest_align,
+			  int cpu)
+{
+	ctx->chan = chan;
+	ctx->priv = priv;
+	ctx->data_size = data_size;
+	ctx->largest_align = largest_align;
+	ctx->cpu = cpu;
+}
+
+/*
+ * Reservation flags.
+ *
+ * RING_BUFFER_RFLAG_FULL_TSC
+ *
+ * This flag is passed to record_header_size() and to the primitive used to
+ * write the record header. It indicates that the full 64-bit time value is
+ * needed in the record header. If this flag is not set, the record header only
+ * needs to contain "tsc_bits" bits of the time value.
+ *
+ * Reservation flags can be added by the client, starting from
+ * "(RING_BUFFER_FLAGS_END << 0)". It can be used to pass information from
+ * record_header_size() to ring_buffer_write_record_header().
+ */
+#define	RING_BUFFER_RFLAG_FULL_TSC		(1U << 0)
+#define RING_BUFFER_RFLAG_END			(1U << 1)
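+
+/*
+ * Example (illustrative only): a client-specific flag would be allocated on
+ * top of the library flags, e.g.
+ *
+ *	#define CLIENT_RFLAG_LARGE_PAYLOAD	(RING_BUFFER_RFLAG_END << 0)
+ *
+ * and carried from the client's record_header_size() to its record header
+ * writer through ctx->rflags. CLIENT_RFLAG_LARGE_PAYLOAD is a made-up name.
+ */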
+
+/*
+ * We need to define RING_BUFFER_ALIGN_ATTR so it is known early at
+ * compile-time. We have to duplicate the "config->align" information and the
+ * definition here because config->align is used both in the slow and fast
+ * paths, but RING_BUFFER_ALIGN_ATTR is only available for the client code.
+ */
+#ifdef RING_BUFFER_ALIGN
+# define RING_BUFFER_ALIGN_ATTR		/* Default arch alignment */
+#else
+# define RING_BUFFER_ALIGN_ATTR __attribute__((packed))
+#endif
+
+/*
+ * Calculate the offset needed to align the type.
+ * size_of_type must be non-zero.
+ */
+static inline
+unsigned int ring_buffer_align(const struct ring_buffer_config *config,
+			       size_t align_drift, size_t size_of_type)
+{
+	switch (config->align) {
+	case RING_BUFFER_NATURAL:
+		return offset_align(align_drift, min(sizeof(void *),
+						     size_of_type));
+	case RING_BUFFER_PACKED:
+	default:
+		return 0;
+	}
+}
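+
+/*
+ * Example (illustrative, assuming offset_align() returns the padding needed to
+ * reach the requested alignment): with RING_BUFFER_NATURAL alignment on a
+ * 64-bit kernel, ring_buffer_align(config, 13, sizeof(u32)) returns 3, so a
+ * 4-byte field written at offset 13 actually starts at offset 16.
+ */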
+
+static inline
+int ring_buffer_get_alignment(const struct ring_buffer_config *config)
+{
+	switch (config->align) {
+	case RING_BUFFER_NATURAL:
+		return sizeof(void *);
+	case RING_BUFFER_PACKED:
+	default:
+		return 0;
+	}
+}
+
+/**
+ * ring_buffer_align_ctx - Align context offset on "alignment"
+ * @config: ring buffer instance configuration.
+ * @ctx: ring buffer context.
+ */
+static inline
+void ring_buffer_align_ctx(const struct ring_buffer_config *config,
+			   struct ring_buffer_ctx *ctx,
+			   size_t alignment)
+{
+	ctx->buf_offset += ring_buffer_align(config, ctx->buf_offset,
+					     alignment);
+}
+
+/*
+ * ring_buffer_check_config() returns 0 on success.
+ * Used internally to check for valid configurations at channel creation.
+ */
+static inline
+int ring_buffer_check_config(const struct ring_buffer_config *config,
+			     unsigned int switch_timer_interval,
+			     unsigned int read_timer_interval)
+{
+	if (config->alloc == RING_BUFFER_ALLOC_GLOBAL
+	    && config->sync == RING_BUFFER_SYNC_PER_CPU
+	    && switch_timer_interval)
+		return -EINVAL;
+	return 0;
+}
+
+#include <linux/ringbuffer/vatomic.h>
+
+#endif /* _LINUX_RING_BUFFER_CONFIG_H */
Index: linux.trees.git/include/linux/ringbuffer/frontend_api.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/frontend_api.h	2010-07-09 18:25:28.000000000 -0400
@@ -0,0 +1,352 @@
+#ifndef _LINUX_RING_BUFFER_FRONTEND_API_H
+#define _LINUX_RING_BUFFER_FRONTEND_API_H
+
+/*
+ * linux/ringbuffer/frontend_api.h
+ *
+ * (C) Copyright 2005-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring Buffer Library Synchronization Header (buffer write API).
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * See ring_buffer_frontend.c for more information on wait-free algorithms.
+ * See linux/ringbuffer/frontend.h for channel allocation and read-side API.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/frontend.h>
+#include <linux/errno.h>
+
+/**
+ * ring_buffer_get_cpu - Precedes ring buffer reserve/commit.
+ *
+ * Disables preemption (acts as an RCU read-side critical section) and keeps a
+ * ring buffer nesting count as a supplementary safety net to ensure tracer
+ * client code will never trigger an endless recursion. Returns the processor
+ * ID on success, -EPERM on failure (nesting count too high).
+ *
+ * The barrier() (an asm volatile with a "memory" clobber) prevents the compiler
+ * from moving instructions out of the ring buffer nesting count update. This is
+ * required to ensure that probe side-effects which can cause recursion (e.g.
+ * unforeseen traps, divisions by 0, ...) are triggered within the incremented
+ * nesting count section.
+ */
+static inline
+int ring_buffer_get_cpu(const struct ring_buffer_config *config)
+{
+	int cpu, nesting;
+
+	rcu_read_lock_sched_notrace();
+	cpu = smp_processor_id();
+	nesting = ++per_cpu(ring_buffer_nesting, cpu);
+	barrier();
+
+	if (unlikely(nesting > 4)) {
+		WARN_ON_ONCE(1);
+		per_cpu(ring_buffer_nesting, cpu)--;
+		rcu_read_unlock_sched_notrace();
+		return -EPERM;
+	} else
+		return cpu;
+}
+
+/**
+ * ring_buffer_put_cpu - Follows ring buffer reserve/commit.
+ */
+static inline
+void ring_buffer_put_cpu(const struct ring_buffer_config *config)
+{
+	barrier();
+	__get_cpu_var(ring_buffer_nesting)--;
+	rcu_read_unlock_sched_notrace();
+}
+
+/*
+ * ring_buffer_try_reserve is called by ring_buffer_reserve(). It is not part of
+ * the API per se.
+ *
+ * returns 0 if reserve ok, or 1 if the slow path must be taken.
+ */
+static inline
+int ring_buffer_try_reserve(const struct ring_buffer_config *config,
+			    struct ring_buffer_ctx *ctx,
+			    unsigned long *o_begin, unsigned long *o_end,
+			    unsigned long *o_old, size_t *before_hdr_pad)
+{
+	struct channel *chan = ctx->chan;
+	struct ring_buffer *buf = ctx->buf;
+	*o_begin = v_read(config, &buf->offset);
+	*o_old = *o_begin;
+
+	ctx->tsc = ring_buffer_clock_read(chan);
+
+	/*
+	 * Prefetch cacheline for read because we have to read the previous
+	 * commit counter to increment it and commit seq value to compare it to
+	 * the commit counter.
+	 */
+	prefetch(&buf->commit_hot[subbuf_index(*o_begin, chan)]);
+
+	if (last_tsc_overflow(config, buf, ctx->tsc))
+		ctx->rflags = RING_BUFFER_RFLAG_FULL_TSC;
+
+	if (unlikely(subbuf_offset(*o_begin, chan) == 0))
+		return 1;
+
+	ctx->slot_size = record_header_size(config, chan, *o_begin,
+					    ctx->data_size, before_hdr_pad,
+					    ctx->rflags, ctx);
+	ctx->slot_size +=
+		ring_buffer_align(config, *o_begin + ctx->slot_size,
+				  ctx->largest_align) + ctx->data_size;
+	if (unlikely((subbuf_offset(*o_begin, chan) + ctx->slot_size)
+		     > chan->backend.subbuf_size))
+		return 1;
+
+	/*
+	 * Record fits in the current buffer and we are not on a switch
+	 * boundary. It's safe to write.
+	 */
+	*o_end = *o_begin + ctx->slot_size;
+
+	if (unlikely((subbuf_offset(*o_end, chan)) == 0))
+		/*
+		 * The offset_end will fall at the very beginning of the next
+		 * subbuffer.
+		 */
+		return 1;
+
+	return 0;
+}
+
+/**
+ * ring_buffer_reserve - Reserve space in a ring buffer.
+ * @config: ring buffer instance configuration.
+ * @ctx: ring buffer context. (input and output) Must be already initialized.
+ *
+ * Atomic wait-free slot reservation. The reserved space starts at the context
+ * "pre_offset". Its length is "slot_size". The associated time-stamp is "tsc".
+ *
+ * Return : -ENOSPC if not enough space, -EAGAIN if the channel is disabled.
+ *          Returns 0 on success.
+ */
+
+static inline
+int ring_buffer_reserve(const struct ring_buffer_config *config,
+			struct ring_buffer_ctx *ctx)
+{
+	struct channel *chan = ctx->chan;
+	struct ring_buffer *buf;
+	unsigned long o_begin, o_end, o_old;
+	size_t before_hdr_pad = 0;
+
+	if (atomic_read(&chan->record_disabled))
+		return -EAGAIN;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		buf = per_cpu_ptr(chan->backend.buf, ctx->cpu);
+	else
+		buf = chan->backend.buf;
+	if (atomic_read(&buf->record_disabled))
+		return -EAGAIN;
+	ctx->buf = buf;
+
+	/*
+	 * Perform retryable operations.
+	 */
+	if (unlikely(ring_buffer_try_reserve(config, ctx, &o_begin,
+					     &o_end, &o_old, &before_hdr_pad)))
+		goto slow_path;
+
+	if (unlikely(v_cmpxchg(config, &ctx->buf->offset, o_old, o_end)
+		     != o_old))
+		goto slow_path;
+
+	/*
+	 * Atomically update last_tsc. This update races against concurrent
+	 * atomic updates, but the race will always cause supplementary full TSC
+	 * record headers, never the opposite (missing a full TSC record header
+	 * when it would be needed).
+	 */
+	save_last_tsc(config, ctx->buf, ctx->tsc);
+
+	/*
+	 * Push the reader if necessary
+	 */
+	ring_buffer_reserve_push_reader(ctx->buf, chan, o_end - 1);
+
+	/*
+	 * Clear noref flag for this subbuffer.
+	 */
+	ring_buffer_clear_noref(config, &ctx->buf->backend,
+				subbuf_index(o_end - 1, chan));
+
+	ctx->pre_offset = o_begin;
+	ctx->buf_offset = o_begin + before_hdr_pad;
+	return 0;
+slow_path:
+	return ring_buffer_reserve_slow(ctx);
+}
+
+/**
+ * ring_buffer_switch - Perform a sub-buffer switch for a per-cpu buffer.
+ * @config: ring buffer instance configuration.
+ * @buf: buffer
+ * @mode: buffer switch mode (SWITCH_ACTIVE or SWITCH_FLUSH)
+ *
+ * This operation is completely reentrant: it can be called while tracing is
+ * active with absolutely no lock held.
+ *
+ * Note, however, that as a v_cmpxchg is used for some atomic operations and
+ * must be executed locally for per-CPU buffers, this function must be called
+ * from the CPU which owns the buffer for an ACTIVE flush, with preemption
+ * disabled, for the RING_BUFFER_SYNC_PER_CPU configuration.
+ */
+static inline
+void ring_buffer_switch(const struct ring_buffer_config *config,
+			struct ring_buffer *buf, enum switch_mode mode)
+{
+	ring_buffer_switch_slow(buf, mode);
+}
+
+/* See ring_buffer_frontend_api.h for ring_buffer_reserve(). */
+
+/**
+ * ring_buffer_commit - Commit a record.
+ * @config: ring buffer instance configuration.
+ * @ctx: ring buffer context. (input arguments only)
+ *
+ * Atomic unordered slot commit. Increments the commit count in the
+ * specified sub-buffer, and delivers it if necessary.
+ */
+static inline
+void ring_buffer_commit(const struct ring_buffer_config *config,
+			const struct ring_buffer_ctx *ctx)
+{
+	struct channel *chan = ctx->chan;
+	struct ring_buffer *buf = ctx->buf;
+	unsigned long offset_end = ctx->buf_offset;
+	unsigned long endidx = subbuf_index(offset_end - 1, chan);
+	unsigned long commit_count;
+
+	/*
+	 * Must count record before incrementing the commit count.
+	 */
+	subbuffer_count_record(config, &buf->backend, endidx);
+
+	/*
+	 * Order all writes to buffer before the commit count update that will
+	 * determine that the subbuffer is full.
+	 */
+	if (config->ipi == RING_BUFFER_IPI_BARRIER) {
+		/*
+		 * Must write slot data before incrementing commit count.  This
+		 * compiler barrier is upgraded into a smp_mb() by the IPI sent
+		 * by get_subbuf().
+		 */
+		barrier();
+	} else
+		smp_wmb();
+
+	v_add(config, ctx->slot_size, &buf->commit_hot[endidx].cc);
+
+	/*
+	 * A commit count read can race with concurrent out-of-order (OOO)
+	 * commit count updates.
+	 * This is only needed for ring_buffer_check_deliver (for non-polling
+	 * delivery only) and for ring_buffer_write_commit_counter. The race can
+	 * only cause the counter to be read with the same value more than once,
+	 * which could cause :
+	 * - Multiple delivery for the same sub-buffer (which is handled
+	 *   gracefully by the reader code) if the value is for a full
+	 *   sub-buffer. It's important that we can never miss a sub-buffer
+	 *   delivery. Re-reading the value after the v_add ensures this.
+	 * - Reading a commit_count with a higher value than what was actually
+	 *   added to it for the ring_buffer_write_commit_counter call (again
+	 *   caused by a concurrent committer). It does not matter, because this
+	 *   function is interested in the fact that the commit count reaches
+	 *   back the reserve offset for a specific sub-buffer, which is
+	 *   completely independent of the order.
+	 */
+	commit_count = v_read(config, &buf->commit_hot[endidx].cc);
+
+	ring_buffer_check_deliver(config, buf, chan, offset_end - 1,
+				  commit_count, endidx);
+	/*
+	 * Update used size at each commit. It's needed only for extracting
+	 * ring_buffer buffers from vmcore, after crash.
+	 */
+	ring_buffer_write_commit_counter(config, buf, chan, endidx,
+					 ctx->buf_offset, commit_count,
+					 ctx->slot_size);
+}
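+
+/*
+ * Illustrative write-side sketch (not part of the library): "client_config" is
+ * the client's static const configuration, "chan", "payload" and "len" come
+ * from the caller, and the ring_buffer_write() signature is assumed from the
+ * backend API rather than defined in this header:
+ *
+ *	struct ring_buffer_ctx ctx;
+ *	int ret, cpu;
+ *
+ *	cpu = ring_buffer_get_cpu(&client_config);
+ *	if (cpu < 0)
+ *		return cpu;
+ *	ring_buffer_ctx_init(&ctx, chan, NULL, len, sizeof(long), cpu);
+ *	ret = ring_buffer_reserve(&client_config, &ctx);
+ *	if (ret) {
+ *		ring_buffer_put_cpu(&client_config);
+ *		return ret;
+ *	}
+ *	... write the client record header at ctx.buf_offset ...
+ *	ring_buffer_align_ctx(&client_config, &ctx, sizeof(long));
+ *	ring_buffer_write(&client_config, &ctx, payload, len);
+ *	ring_buffer_commit(&client_config, &ctx);
+ *	ring_buffer_put_cpu(&client_config);
+ */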
+
+/**
+ * ring_buffer_try_discard_reserve - Try discarding a record.
+ * @config: ring buffer instance configuration.
+ * @ctx: ring buffer context. (input arguments only)
+ *
+ * Only succeeds if no other record has been written after the record to
+ * discard. If discard fails, the record must be committed to the buffer.
+ *
+ * Returns 0 upon success, -EPERM if the record cannot be discarded.
+ */
+static inline
+int ring_buffer_try_discard_reserve(const struct ring_buffer_config *config,
+				    const struct ring_buffer_ctx *ctx)
+{
+	struct ring_buffer *buf = ctx->buf;
+	unsigned long end_offset = ctx->pre_offset + ctx->slot_size;
+
+	/*
+	 * We need to ensure that if the cmpxchg succeeds and discards the
+	 * record, the next record will record a full TSC, because it cannot
+	 * rely on the last_tsc associated with the discarded record to detect
+	 * overflows. The only way to ensure this is to set the last_tsc to 0
+	 * (assuming no 64-bit TSC overflow), which forces the next record to
+	 * carry a full 64-bit timestamp.
+	 *
+	 * Note: if discard fails, we must leave the TSC in the record header.
+	 * It is needed to keep track of TSC overflows for the following
+	 * records.
+	 */
+	save_last_tsc(config, buf, 0ULL);
+
+	if (likely(v_cmpxchg(config, &buf->offset, end_offset, ctx->pre_offset)
+		   != end_offset))
+		return -EPERM;
+	else
+		return 0;
+}
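+
+/*
+ * Example (illustrative): a client that decides to drop a record after
+ * reserving it first tries to discard it, and commits the slot anyway if the
+ * discard fails, as required by the API:
+ *
+ *	if (ring_buffer_try_discard_reserve(config, &ctx))
+ *		ring_buffer_commit(config, &ctx);
+ */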
+
+static inline
+void channel_record_disable(const struct ring_buffer_config *config,
+			    struct channel *chan)
+{
+	atomic_inc(&chan->record_disabled);
+}
+
+static inline
+void channel_record_enable(const struct ring_buffer_config *config,
+			   struct channel *chan)
+{
+	atomic_dec(&chan->record_disabled);
+}
+
+static inline
+void ring_buffer_record_disable(const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	atomic_inc(&buf->record_disabled);
+}
+
+static inline
+void ring_buffer_record_enable(const struct ring_buffer_config *config,
+			       struct ring_buffer *buf)
+{
+	atomic_dec(&buf->record_disabled);
+}
+
+#endif /* _LINUX_RING_BUFFER_FRONTEND_API_H */
Index: linux.trees.git/include/linux/ringbuffer/frontend_internal.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/frontend_internal.h	2010-07-09 18:27:00.000000000 -0400
@@ -0,0 +1,424 @@
+#ifndef _LINUX_RING_BUFFER_FRONTEND_INTERNAL_H
+#define _LINUX_RING_BUFFER_FRONTEND_INTERNAL_H
+
+/*
+ * linux/ringbuffer/frontend_internal.h
+ *
+ * (C) Copyright 2005-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring Buffer Library Synchronization Header (internal helpers).
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * See ring_buffer_frontend.c for more information on wait-free algorithms.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/backend_types.h>
+#include <linux/ringbuffer/frontend_types.h>
+#include <linux/prio_heap.h>	/* For per-CPU read-side iterator */
+
+/* Buffer offset macros */
+
+/* buf_trunc mask selects only the buffer number. */
+static inline
+unsigned long buf_trunc(unsigned long offset, struct channel *chan)
+{
+	return (offset) & (~((chan)->backend.buf_size - 1));
+}
+
+/* Select the buffer number value (counter). */
+static inline
+unsigned long buf_trunc_val(unsigned long offset, struct channel *chan)
+{
+	return buf_trunc(offset, chan) >> (chan)->backend.buf_size_order;
+}
+
+/* buf_offset mask selects only the offset within the current buffer. */
+static inline
+unsigned long buf_offset(unsigned long offset, struct channel *chan)
+{
+	return (offset) & ((chan)->backend.buf_size - 1);
+}
+
+/* subbuf_offset mask selects the offset within the current subbuffer. */
+static inline
+unsigned long subbuf_offset(unsigned long offset, struct channel *chan)
+{
+	return (offset) & ((chan)->backend.subbuf_size - 1);
+}
+
+/* subbuf_trunc mask selects the subbuffer number. */
+static inline
+unsigned long subbuf_trunc(unsigned long offset, struct channel *chan)
+{
+	return (offset) & (~((chan)->backend.subbuf_size - 1));
+}
+
+/* subbuf_align aligns the offset to the next subbuffer. */
+static inline
+unsigned long subbuf_align(unsigned long offset, struct channel *chan)
+{
+	return ((offset) + (chan)->backend.subbuf_size)
+	       & (~((chan)->backend.subbuf_size - 1));
+}
+
+/* subbuf_index returns the index of the current subbuffer within the buffer. */
+static inline
+unsigned long subbuf_index(unsigned long offset, struct channel *chan)
+{
+	return buf_offset((offset), chan) >> (chan)->backend.subbuf_size_order;
+}
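+
+/*
+ * Worked example (illustrative values): with subbuf_size = 4096 (order 12),
+ * num_subbuf = 4 and therefore buf_size = 16384, an offset of 21000 yields:
+ *
+ *	buf_trunc()     = 16384	(buffer number mask applied)
+ *	buf_offset()    =  4616	(offset within the buffer)
+ *	subbuf_trunc()  = 20480	(subbuffer number mask applied)
+ *	subbuf_offset() =   520	(offset within the current subbuffer)
+ *	subbuf_align()  = 24576	(start of the next subbuffer)
+ *	subbuf_index()  =     1	(second subbuffer of the buffer)
+ */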
+
+/*
+ * Last TSC comparison functions. Check if the current TSC overflows tsc_bits
+ * bits from the last TSC read. When overflows are detected, the full 64-bit
+ * timestamp counter should be written in the record header. Reads and writes
+ * last_tsc atomically.
+ */
+
+#if (BITS_PER_LONG == 32)
+static inline
+void save_last_tsc(const struct ring_buffer_config *config,
+		   struct ring_buffer *buf, u64 tsc)
+{
+	if (config->tsc_bits == 0 || config->tsc_bits == 64)
+		return;
+
+	/*
+	 * Ensure the compiler performs this update in a single instruction.
+	 */
+	v_set(config, &buf->last_tsc, (unsigned long)(tsc >> config->tsc_bits));
+}
+
+static inline
+int last_tsc_overflow(const struct ring_buffer_config *config,
+		      struct ring_buffer *buf, u64 tsc)
+{
+	unsigned long tsc_shifted;
+
+	if (config->tsc_bits == 0 || config->tsc_bits == 64)
+		return 0;
+
+	tsc_shifted = (unsigned long)(tsc >> config->tsc_bits);
+	if (unlikely((tsc_shifted
+		      - (unsigned long)v_read(config, &buf->last_tsc))))
+		return 1;
+	else
+		return 0;
+}
+#else
+static inline
+void save_last_tsc(const struct ring_buffer_config *config,
+		   struct ring_buffer *buf, u64 tsc)
+{
+	if (config->tsc_bits == 0 || config->tsc_bits == 64)
+		return;
+
+	v_set(config, &buf->last_tsc, (unsigned long)tsc);
+}
+
+static inline
+int last_tsc_overflow(const struct ring_buffer_config *config,
+		      struct ring_buffer *buf, u64 tsc)
+{
+	if (config->tsc_bits == 0 || config->tsc_bits == 64)
+		return 0;
+
+	if (unlikely((tsc - v_read(config, &buf->last_tsc))
+		     >> config->tsc_bits))
+		return 1;
+	else
+		return 0;
+}
+#endif
+
+extern
+int ring_buffer_reserve_slow(struct ring_buffer_ctx *ctx);
+
+extern
+void ring_buffer_switch_slow(struct ring_buffer *buf,
+			     enum switch_mode mode);
+
+/* Buffer write helpers */
+
+static inline
+void ring_buffer_reserve_push_reader(struct ring_buffer *buf,
+				     struct channel *chan,
+				     unsigned long offset)
+{
+	unsigned long consumed_old, consumed_new;
+
+	do {
+		consumed_old = atomic_long_read(&buf->consumed);
+		/*
+		 * If buffer is in overwrite mode, push the reader consumed
+		 * count if the write position has reached it and we are not
+		 * at the first iteration (don't push the reader farther than
+		 * the writer). This operation can be done concurrently by many
+		 * writers in the same buffer, the writer being at the farthest
+		 * write position sub-buffer index in the buffer being the one
+		 * which will win this loop.
+		 */
+		if (unlikely((subbuf_trunc(offset, chan)
+			      - subbuf_trunc(consumed_old, chan))
+			     >= chan->backend.buf_size))
+			consumed_new = subbuf_align(consumed_old, chan);
+		else
+			return;
+	} while (unlikely(atomic_long_cmpxchg(&buf->consumed, consumed_old,
+					      consumed_new) != consumed_old));
+}
+
+static inline
+void ring_buffer_vmcore_check_deliver(const struct ring_buffer_config *config,
+				      struct ring_buffer *buf,
+				      unsigned long commit_count,
+				      unsigned long idx)
+{
+	if (config->oops == RING_BUFFER_OOPS_CONSISTENCY)
+		v_set(config, &buf->commit_hot[idx].seq, commit_count);
+}
+
+static inline
+int ring_buffer_poll_deliver(const struct ring_buffer_config *config,
+			     struct ring_buffer *buf,
+			     struct channel *chan)
+{
+	unsigned long consumed_old, consumed_idx, commit_count, write_offset;
+
+	consumed_old = atomic_long_read(&buf->consumed);
+	consumed_idx = subbuf_index(consumed_old, chan);
+	commit_count = v_read(config, &buf->commit_cold[consumed_idx].cc_sb);
+	/*
+	 * No memory barrier here, since we are only interested
+	 * in a statistically correct polling result. The next poll will
+	 * get the data if we are racing. The mb() that ensures correct
+	 * memory order is in get_subbuf.
+	 */
+	write_offset = v_read(config, &buf->offset);
+
+	/*
+	 * Check that the subbuffer we are trying to consume has been
+	 * already fully committed.
+	 */
+
+	if (((commit_count - chan->backend.subbuf_size)
+	     & chan->commit_count_mask)
+	    - (buf_trunc(consumed_old, chan)
+	       >> chan->backend.num_subbuf_order)
+	    != 0)
+		return 0;
+
+	/*
+	 * Check that we are not about to read the same subbuffer in
+	 * which the writer head is.
+	 */
+	if ((subbuf_trunc(write_offset, chan)
+	   - subbuf_trunc(consumed_old, chan))
+	   == 0)
+		return 0;
+
+	return 1;
+}
+
+static inline
+int ring_buffer_pending_data(const struct ring_buffer_config *config,
+			     struct ring_buffer *buf,
+			     struct channel *chan)
+{
+	return !!subbuf_offset(v_read(config, &buf->offset), chan);
+}
+
+static inline
+unsigned long ring_buffer_get_data_size(const struct ring_buffer_config *config,
+					struct ring_buffer *buf,
+					unsigned long idx)
+{
+	return subbuffer_get_data_size(config, &buf->backend, idx);
+}
+
+/*
+ * Check if all space reservations in a buffer have been committed. This helps
+ * knowing whether an execution context is nested (for per-cpu buffers only).
+ * This is a very specific ftrace use-case, so we keep this as an "internal"
+ * API.
+ */
+static inline
+int ring_buffer_reserve_committed(const struct ring_buffer_config *config,
+				  struct ring_buffer *buf,
+				  struct channel *chan)
+{
+	unsigned long offset, idx, commit_count;
+
+	CHAN_WARN_ON(chan, config->alloc != RING_BUFFER_ALLOC_PER_CPU);
+	CHAN_WARN_ON(chan, config->sync != RING_BUFFER_SYNC_PER_CPU);
+
+	/*
+	 * Read offset and commit count in a loop so they are both read
+	 * atomically wrt interrupts. We deal with interrupt concurrency by
+	 * restarting both reads if the offset has been pushed. Note that given
+	 * we only have to deal with interrupt concurrency here, an interrupt
+	 * modifying the commit count will also modify "offset", so it is safe
+	 * to only check for offset modifications.
+	 */
+	do {
+		offset = v_read(config, &buf->offset);
+		idx = subbuf_index(offset, chan);
+		commit_count = v_read(config, &buf->commit_hot[idx].cc);
+	} while (offset != v_read(config, &buf->offset));
+
+	return ((buf_trunc(offset, chan) >> chan->backend.num_subbuf_order)
+		     - (commit_count & chan->commit_count_mask) == 0);
+}
+
+static inline
+void ring_buffer_check_deliver(const struct ring_buffer_config *config,
+			       struct ring_buffer *buf,
+			       struct channel *chan,
+			       unsigned long offset, unsigned long commit_count,
+			       unsigned long idx)
+{
+	unsigned long old_commit_count = commit_count
+					 - chan->backend.subbuf_size;
+	u64 tsc;
+
+	/* Check if all commits have been done */
+	if (unlikely((buf_trunc(offset, chan) >> chan->backend.num_subbuf_order)
+		     - (old_commit_count & chan->commit_count_mask) == 0)) {
+		/*
+		 * If we succeeded at updating cc_sb below, we are the subbuffer
+		 * writer delivering the subbuffer. This deals with concurrent
+		 * updates of the "cc" value without adding an add_return atomic
+		 * operation to the fast path.
+		 *
+		 * We are doing the delivery in two steps:
+		 * - First, we cmpxchg() cc_sb to the new value
+		 *   old_commit_count + 1. This ensures that we are the only
+		 *   subbuffer user successfully filling the subbuffer, but we
+		 *   do _not_ set the cc_sb value to "commit_count" yet.
+		 *   Therefore, other writers that would wrap around the ring
+		 *   buffer and try to start writing to our subbuffer would
+		 *   have to drop records, because it would appear as
+		 *   non-filled.
+		 *   We therefore have exclusive access to the subbuffer control
+		 *   structures.  This mutual exclusion with other writers is
+		 *   crucially important to count record overruns locklessly in
+		 *   flight recorder mode.
+		 * - When we are ready to release the subbuffer (either for
+		 *   reading or for overrun by other writers), we simply set the
+		 *   cc_sb value to "commit_count" and perform delivery.
+		 *
+		 * The subbuffer size is at least 2 bytes (minimum size: 1 page).
+		 * This guarantees that old_commit_count + 1 != commit_count.
+		 */
+		if (likely(v_cmpxchg(config, &buf->commit_cold[idx].cc_sb,
+					 old_commit_count, old_commit_count + 1)
+			   == old_commit_count)) {
+			/*
+			 * Start of exclusive subbuffer access. We are
+			 * guaranteed to be the last writer in this subbuffer
+			 * and any other writer trying to access this subbuffer
+			 * in this state is required to drop records.
+			 */
+			tsc = config->cb.ring_buffer_clock_read(chan);
+			v_add(config,
+			      subbuffer_get_records_count(config,
+							  &buf->backend, idx),
+			      &buf->records_count);
+			v_add(config,
+			      subbuffer_count_records_overrun(config,
+							      &buf->backend,
+							      idx),
+			      &buf->records_overrun);
+			config->cb.buffer_end(buf, tsc, idx,
+					      ring_buffer_get_data_size(config,
+									buf,
+									idx));
+
+			/*
+			 * Set noref flag and offset for this subbuffer id.
+			 * Contains a memory barrier that ensures counter stores
+			 * are ordered before set noref and offset.
+			 */
+			ring_buffer_set_noref_offset(config, &buf->backend, idx,
+						buf_trunc_val(offset, chan));
+
+			/*
+			 * Order set_noref and record counter updates before the
+			 * end of subbuffer exclusive access. Orders with
+			 * respect to writers coming into the subbuffer after
+			 * wrap around, and also order wrt concurrent readers.
+			 */
+			smp_mb();
+			/* End of exclusive subbuffer access */
+			v_set(config, &buf->commit_cold[idx].cc_sb,
+			      commit_count);
+			ring_buffer_vmcore_check_deliver(config, buf,
+							 commit_count, idx);
+
+			/*
+			 * RING_BUFFER_WAKEUP_BY_WRITER wakeup is not lock-free.
+			 */
+			if (config->wakeup == RING_BUFFER_WAKEUP_BY_WRITER
+			    && atomic_long_read(&buf->active_readers)
+			    && ring_buffer_poll_deliver(config, buf, chan)) {
+				wake_up_interruptible(&buf->read_wait);
+				wake_up_interruptible(&chan->read_wait);
+			}
+
+		}
+	}
+}
+
+/*
+ * ring_buffer_write_commit_counter
+ *
+ * For flight recording: must be called after commit.
+ * This function increments the subbuffer's commit_seq counter each time the
+ * commit count reaches back the reserve offset (modulo subbuffer size). It is
+ * useful for crash dump.
+ */
+static inline
+void ring_buffer_write_commit_counter(const struct ring_buffer_config *config,
+				      struct ring_buffer *buf,
+				      struct channel *chan,
+				      unsigned long idx,
+				      unsigned long buf_offset,
+				      unsigned long commit_count,
+				      size_t slot_size)
+{
+	unsigned long offset, commit_seq_old;
+
+	if (config->oops != RING_BUFFER_OOPS_CONSISTENCY)
+		return;
+
+	offset = buf_offset + slot_size;
+
+	/*
+	 * subbuf_offset includes commit_count_mask. We can simply
+	 * compare the offsets within the subbuffer without caring about
+	 * buffer full/empty mismatch because offset is never zero here
+	 * (subbuffer header and record headers have non-zero length).
+	 */
+	if (unlikely(subbuf_offset(offset - commit_count, chan)))
+		return;
+
+	commit_seq_old = v_read(config, &buf->commit_hot[idx].seq);
+	while ((long) (commit_seq_old - commit_count) < 0)
+		commit_seq_old = v_cmpxchg(config, &buf->commit_hot[idx].seq,
+					 commit_seq_old, commit_count);
+}
+
+extern int ring_buffer_create(struct ring_buffer *buf,
+			      struct channel_backend *chanb, int cpu);
+extern void ring_buffer_free(struct ring_buffer *buf);
+
+/* Keep track of trap nesting inside ring buffer code */
+DECLARE_PER_CPU(unsigned int, ring_buffer_nesting);
+
+#endif /* _LINUX_RING_BUFFER_FRONTEND_INTERNAL_H */
Index: linux.trees.git/include/linux/ringbuffer/frontend.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/frontend.h	2010-07-09 18:27:45.000000000 -0400
@@ -0,0 +1,191 @@
+#ifndef _LINUX_RING_BUFFER_FRONTEND_H
+#define _LINUX_RING_BUFFER_FRONTEND_H
+
+/*
+ * linux/ringbuffer/frontend.h
+ *
+ * (C) Copyright 2005-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring Buffer Library Synchronization Header (API).
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * See ring_buffer_frontend.c for more information on wait-free algorithms.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/pipe_fs_i.h>
+#include <linux/rcupdate.h>
+#include <linux/smp_lock.h>
+#include <linux/cpumask.h>
+#include <linux/module.h>
+#include <linux/bitops.h>
+#include <linux/splice.h>
+#include <linux/string.h>
+#include <linux/timer.h>
+#include <linux/sched.h>
+#include <linux/cache.h>
+#include <linux/time.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/cpu.h>
+#include <linux/fs.h>
+
+#include <asm/atomic.h>
+#include <asm/local.h>
+
+/* Internal helpers */
+#include <linux/ringbuffer/frontend_internal.h>
+
+/* Buffer creation/removal and setup operations */
+
+/*
+ * switch_timer_interval is the time interval (in us) to fill sub-buffers with
+ * padding to let readers get those sub-buffers.  Used for live streaming.
+ *
+ * read_timer_interval is the time interval (in us) to wake up pending readers.
+ *
+ * buf_addr is a pointer to the beginning of the preallocated buffer contiguous
+ * address mapping. It is used only by RING_BUFFER_STATIC configuration. It can
+ * be set to NULL for other backends.
+ */
+
+extern
+struct channel *channel_create(const struct ring_buffer_config *config,
+			       const char *name, void *priv,
+			       void *buf_addr,
+			       size_t subbuf_size, size_t num_subbuf,
+			       unsigned int switch_timer_interval,
+			       unsigned int read_timer_interval);
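+
+/*
+ * Example (illustrative; "client_config", the channel name and the sizes are
+ * placeholders chosen for this sketch):
+ *
+ *	chan = channel_create(&client_config, "chan", priv, NULL,
+ *			      4096, 16, 100000, 200000);
+ *
+ * creates a channel of 16 sub-buffers of 4 kB each per buffer, flushing
+ * partially filled sub-buffers every 100 ms and waking up pending readers
+ * every 200 ms (both intervals are given in microseconds).
+ */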
+
+/*
+ * channel_destroy returns the private data pointer. It finalizes all channel's
+ * buffers, waits for readers to release all references, and destroys the
+ * channel.
+ */
+extern
+void *channel_destroy(struct channel *chan);
+
+
+/* Buffer read operations */
+
+/*
+ * Iteration on channel cpumask needs to issue a read barrier to match the write
+ * barrier in cpu hotplug. It orders the cpumask read before read of per-cpu
+ * buffer data. The per-cpu buffer is never removed by cpu hotplug; teardown is
+ * only performed at channel destruction.
+ */
+#define for_each_channel_cpu(cpu, chan)					\
+	for ((cpu) = -1;						\
+		({ (cpu) = cpumask_next((cpu), (chan)->backend.cpumask);\
+		   smp_read_barrier_depends(); (cpu) < nr_cpu_ids; });)
+
+extern struct ring_buffer *channel_get_ring_buffer(
+					const struct ring_buffer_config *config,
+					struct channel *chan, int cpu);
+extern int ring_buffer_open_read(struct ring_buffer *buf);
+extern void ring_buffer_release_read(struct ring_buffer *buf);
+extern int ring_buffer_get_subbuf(struct ring_buffer *buf,
+				  unsigned long *consumed);
+extern void ring_buffer_put_subbuf(struct ring_buffer *buf,
+				   unsigned long consumed);
+
+extern void channel_reset(struct channel *chan);
+extern void ring_buffer_reset(struct ring_buffer *buf);
+
+static inline
+unsigned long ring_buffer_get_offset(const struct ring_buffer_config *config,
+				     struct ring_buffer *buf)
+{
+	return v_read(config, &buf->offset);
+}
+
+static inline
+unsigned long ring_buffer_get_consumed(const struct ring_buffer_config *config,
+				       struct ring_buffer *buf)
+{
+	return atomic_long_read(&buf->consumed);
+}
+
+/*
+ * Must call ring_buffer_is_finalized before reading counters (memory ordering
+ * enforced with respect to trace teardown).
+ */
+static inline
+int ring_buffer_is_finalized(const struct ring_buffer_config *config,
+			     struct ring_buffer *buf)
+{
+	int finalized = ACCESS_ONCE(buf->finalized);
+	/*
+	 * Read finalized before counters.
+	 */
+	smp_rmb();
+	return finalized;
+}
+
+static inline
+unsigned long ring_buffer_get_read_data_size(
+					const struct ring_buffer_config *config,
+					struct ring_buffer *buf)
+{
+	return subbuffer_get_read_data_size(config, &buf->backend);
+}
+
+static inline
+unsigned long ring_buffer_get_records_count(
+				const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	return v_read(config, &buf->records_count);
+}
+
+static inline
+unsigned long ring_buffer_get_records_overrun(
+				const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	return v_read(config, &buf->records_overrun);
+}
+
+static inline
+unsigned long ring_buffer_get_records_lost_full(
+				const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	return v_read(config, &buf->records_lost_full);
+}
+
+static inline
+unsigned long ring_buffer_get_records_lost_wrap(
+				const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	return v_read(config, &buf->records_lost_wrap);
+}
+
+static inline
+unsigned long ring_buffer_get_records_lost_big(
+				const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	return v_read(config, &buf->records_lost_big);
+}
+
+static inline
+unsigned long ring_buffer_get_records_read(
+				const struct ring_buffer_config *config,
+				struct ring_buffer *buf)
+{
+	return v_read(config, &buf->backend.records_read);
+}
+
+static inline
+void *channel_get_private(struct channel *chan)
+{
+	return chan->backend.priv;
+}
+
+#endif /* _LINUX_RING_BUFFER_FRONTEND_H */
Index: linux.trees.git/include/linux/ringbuffer/frontend_types.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/frontend_types.h	2010-07-09 18:13:53.000000000 -0400
@@ -0,0 +1,158 @@
+#ifndef _LINUX_RING_BUFFER_FRONTEND_TYPES_H
+#define _LINUX_RING_BUFFER_FRONTEND_TYPES_H
+
+/*
+ * linux/ringbuffer/frontend_types.h
+ *
+ * (C) Copyright 2005-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring Buffer Library Synchronization Header (types).
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * See ring_buffer_frontend.c for more information on wait-free algorithms.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/backend_types.h>
+#include <linux/prio_heap.h>	/* For per-CPU read-side iterator */
+
+/*
+ * A switch is done during tracing or as a final flush after tracing (so it
+ * won't write in the new sub-buffer).
+ */
+enum switch_mode { SWITCH_ACTIVE, SWITCH_FLUSH };
+
+/* channel-level read-side iterator */
+struct channel_iter {
+	/* Prio heap of buffers. Lowest timestamps at the top. */
+	struct ptr_heap heap;		/* Heap of struct ring_buffer ptrs */
+	struct list_head empty_head;	/* Empty buffers linked-list head */
+	int read_open;			/* Opened for reading ? */
+	u64 last_qs;			/* Last quiescent state timestamp */
+	u64 last_timestamp;		/* Last timestamp (for WARN_ON) */
+	int last_cpu;			/* Last timestamp cpu */
+	/*
+	 * read() file operation state.
+	 */
+	unsigned long len_left;
+};
+
+/* channel: collection of per-cpu ring buffers. */
+struct channel {
+	atomic_t record_disabled;
+	unsigned long commit_count_mask;	/*
+						 * Commit count mask, removing
+						 * the MSBs corresponding to
+						 * bits used to represent the
+						 * subbuffer index.
+						 */
+
+	struct channel_backend backend;		/* Associated backend */
+
+	unsigned long switch_timer_interval;	/* Buffer flush (jiffies) */
+	unsigned long read_timer_interval;	/* Reader wakeup (jiffies) */
+	struct notifier_block cpu_hp_notifier;	/* CPU hotplug notifier */
+	struct notifier_block idle_notifier;	/* CPU idle notifier */
+	struct notifier_block hp_iter_notifier;	/* hotplug iterator notifier */
+	unsigned int cpu_hp_enable:1;		/* Enable CPU hotplug notif. */
+	unsigned int hp_iter_enable:1;		/* Enable hp iter notif. */
+	wait_queue_head_t read_wait;		/* reader wait queue */
+	struct channel_iter iter;		/* Channel read-side iterator */
+	atomic_long_t read_ref;			/* Reader reference count */
+};
+
+/* Per-subbuffer commit counters used on the hot path */
+struct commit_counters_hot {
+	union v_atomic cc;		/* Commit counter */
+	union v_atomic seq;		/* Consecutive commits */
+};
+
+/* Per-subbuffer commit counters used only on cold paths */
+struct commit_counters_cold {
+	union v_atomic cc_sb;		/* Incremented _once_ at sb switch */
+};
+
+/* Per-buffer read iterator */
+struct ring_buffer_iter {
+	u64 timestamp;			/* Current record timestamp */
+	size_t header_len;		/* Current record header length */
+	size_t payload_len;		/* Current record payload length */
+
+	struct list_head empty_node;	/* Linked list of empty buffers */
+	unsigned long consumed, read_offset, data_size;
+	enum {
+		ITER_GET_SUBBUF = 0,
+		ITER_TEST_RECORD,
+		ITER_NEXT_RECORD,
+		ITER_PUT_SUBBUF,
+	} state;
+	unsigned int allocated:1;
+	unsigned int read_open:1;	/* Opened for reading? */
+};
+
+/* ring buffer state */
+struct ring_buffer {
+	/* First 32 bytes cache-hot cacheline */
+	union v_atomic offset;		/* Current offset in the buffer */
+	struct commit_counters_hot *commit_hot;
+					/* Commit count per sub-buffer */
+	atomic_long_t consumed;		/*
+					 * Current offset in the buffer
+					 * standard atomic access (shared)
+					 */
+	atomic_t record_disabled;
+	/* End of first 32 bytes cacheline */
+	union v_atomic last_tsc;	/*
+					 * Last timestamp written in the buffer.
+					 */
+
+	struct ring_buffer_backend backend;	/* Associated backend */
+
+	struct commit_counters_cold *commit_cold;
+					/* Commit count per sub-buffer */
+	atomic_long_t active_readers;	/*
+					 * Active readers count
+					 * standard atomic access (shared)
+					 */
+					/* Dropped records */
+	union v_atomic records_lost_full;	/* Buffer full */
+	union v_atomic records_lost_wrap;	/* Nested wrap-around */
+	union v_atomic records_lost_big;	/* Events too big */
+	union v_atomic records_count;	/* Number of records written */
+	union v_atomic records_overrun;	/* Number of overwritten records */
+	wait_queue_head_t read_wait;	/* reader buffer-level wait queue */
+	int finalized;			/* buffer has been finalized */
+	struct timer_list switch_timer;	/* timer for periodical switch */
+	struct timer_list read_timer;	/* timer for read poll */
+	raw_spinlock_t raw_idle_spinlock;	/* Idle entry lock/trylock */
+	struct ring_buffer_iter iter;	/* read-side iterator */
+};
+
+/*
+ * Issue warnings and disable channels upon internal error.
+ * Can receive struct ring_buffer or struct ring_buffer_backend parameters.
+ */
+#define CHAN_WARN_ON(c, cond)						\
+	({								\
+		struct channel *__chan;					\
+		int _____ret = unlikely(cond);				\
+		if (_____ret) {						\
+			if (__same_type(*(c), struct channel_backend))	\
+				__chan = container_of((void *) (c),	\
+							struct channel, \
+							backend);	\
+			else if (__same_type(*(c), struct channel))	\
+				__chan = (void *) (c);			\
+			else						\
+				BUG_ON(1);				\
+			atomic_inc(&__chan->record_disabled);		\
+			WARN_ON(1);					\
+		}							\
+		_____ret;						\
+	})
+
+#endif /* _LINUX_RING_BUFFER_FRONTEND_TYPES_H */
Index: linux.trees.git/include/linux/ringbuffer/vatomic.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/vatomic.h	2010-07-09 18:28:22.000000000 -0400
@@ -0,0 +1,85 @@
+#ifndef _LINUX_RING_BUFFER_VATOMIC_H
+#define _LINUX_RING_BUFFER_VATOMIC_H
+
+/*
+ * linux/ringbuffer/vatomic.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <asm/atomic.h>
+#include <asm/local.h>
+
+/*
+ * Same data type (long) accessed differently depending on configuration.
+ * v field is for non-atomic access (protected by mutual exclusion).
+ * In the fast-path, the ring_buffer_config structure is constant, so the
+ * compiler can statically select the appropriate branch.
+ * local_t is used for per-cpu and per-thread buffers.
+ * atomic_long_t is used for globally shared buffers.
+ */
+union v_atomic {
+	local_t l;
+	atomic_long_t a;
+	long v;
+};
+
+static inline
+long v_read(const struct ring_buffer_config *config, union v_atomic *v_a)
+{
+	if (config->sync == RING_BUFFER_SYNC_PER_CPU)
+		return local_read(&v_a->l);
+	else
+		return atomic_long_read(&v_a->a);
+}
+
+static inline
+void v_set(const struct ring_buffer_config *config, union v_atomic *v_a,
+	   long v)
+{
+	if (config->sync == RING_BUFFER_SYNC_PER_CPU)
+		local_set(&v_a->l, v);
+	else
+		atomic_long_set(&v_a->a, v);
+}
+
+static inline
+void v_add(const struct ring_buffer_config *config, long v, union v_atomic *v_a)
+{
+	if (config->sync == RING_BUFFER_SYNC_PER_CPU)
+		local_add(v, &v_a->l);
+	else
+		atomic_long_add(v, &v_a->a);
+}
+
+static inline
+void v_inc(const struct ring_buffer_config *config, union v_atomic *v_a)
+{
+	if (config->sync == RING_BUFFER_SYNC_PER_CPU)
+		local_inc(&v_a->l);
+	else
+		atomic_long_inc(&v_a->a);
+}
+
+/*
+ * Non-atomic decrement. Only used by the reader; applies to the reader-owned
+ * subbuffer.
+ */
+static inline
+void _v_dec(const struct ring_buffer_config *config, union v_atomic *v_a)
+{
+	--v_a->v;
+}
+
+static inline
+long v_cmpxchg(const struct ring_buffer_config *config, union v_atomic *v_a,
+	       long old, long _new)
+{
+	if (config->sync == RING_BUFFER_SYNC_PER_CPU)
+		return local_cmpxchg(&v_a->l, old, _new);
+	else
+		return atomic_long_cmpxchg(&v_a->a, old, _new);
+}
+
+#endif /* _LINUX_RING_BUFFER_VATOMIC_H */


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 14/20] Ring buffer library - documentation
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (12 preceding siblings ...)
  2010-07-09 22:57 ` [patch 13/20] ring buffer frontend Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 15/20] Ring buffer library - VFS operations Mathieu Desnoyers
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring-buffer-documentation.patch --]
[-- Type: text/plain, Size: 16345 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 Documentation/ring-buffer/ring-buffer-design.txt |   78 ++++++
 Documentation/ring-buffer/ring-buffer-usage.txt  |  260 +++++++++++++++++++++++
 2 files changed, 338 insertions(+)

Index: linux.trees.git/Documentation/ring-buffer/ring-buffer-design.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/Documentation/ring-buffer/ring-buffer-design.txt	2010-07-02 12:34:02.000000000 -0400
@@ -0,0 +1,78 @@
+                        Ring Buffer Library Design
+
+                            Mathieu Desnoyers
+
+
+This document explains the Linux Kernel Ring Buffer Library.
+
+
+* Purpose of the ring buffer library
+
+Tracing: the main purpose of the ring buffer library is to perform tracing
+efficiently by providing an efficient ring buffer to transport trace data.
+
+Fast FIFO queue for drivers: this library is meant to be generic enough to meet
+the requirements of audio, video and other drivers, providing an easy-to-use,
+yet efficient, buffering API.
+
+Lock-free write-side: the main advantage of this ring buffer implementation is
+that it provides non-blocking synchronization for the writer context. It
+furthermore provides a bounded write-side execution time for real-time
+applications. The per-CPU buffer configuration is wait-free. The global buffer
+configuration is lock-free. (wait-free is a stronger progress guarantee than
+lock-free.)
+
+
+* Semantics
+
+The execution context writing to the ring buffer is hereby called "producer" (or
+writer) and the thread reading the ring buffer content is called "consumer" (or
+reader). Each instance of either per-cpu or global ring buffers is called a
+"channel". A buffer is divided into subbuffers, which are synchronization points
+in the buffers (sometimes referred to as periods in the audio world). Each item
+stored in the ring buffer is called a "record". Both subbuffers and records
+may start with a "header". Records can also contain a variable-sized payload.
+
+The ring buffer supports two write modes. The "discard" mode drops data when the
+ring buffer is full. The "overwrite" (a.k.a. flight recorder) mode overwrites
+the oldest information when the ring buffer is full.
+
+Iterators are one way to consume data from the ring buffer. They allow a reader
+thread to read records one by one in the order they were written, either on a
+per-buffer or per-channel basis. Other ways to consume data are by using file
+descriptors which provide access to raw subbuffer content through, e.g.,
+splice() or mmap().
+
+
+* Programmer Interfaces
+
+The library presents a high-level interface that allows programmers to easily
+create and use a ring buffer instance. It also provides a more advanced client
+configuration API for clients with more elaborate needs (e.g. tracers).
+
+
+* Advanced client configuration options
+
+The options listed in the linux/ringbuffer/config.h header are tailored for ring
+buffer "clients" (a kernel object using the ring buffer library through its
+advanced options API) with more specific needs. The clients must set up a
+"static const" ring_buffer_config structure in which all options are spelled
+out. Given that this structure is known to be immutable, compiler optimizations
+can optimize away all the unneeded code from the library inline fast paths. The
+slow paths, however, dynamically select the correct code depending on the
+ring_buffer_config structure received as parameter. This saves space by sharing
+the slow path code between all ring buffer clients.
+
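+A minimal sketch (based on the vatomic.h helpers in this patchset): the client
+declares its immutable configuration once,
+
+    static const struct ring_buffer_config client_config = {
+        .sync = RING_BUFFER_SYNC_PER_CPU,
+        /* ... other options ... */
+    };
+
+and a fast-path helper such as v_read() then reduces to a single statically
+selected branch:
+
+    if (config->sync == RING_BUFFER_SYNC_PER_CPU)
+        return local_read(&v_a->l);        /* kept by the compiler */
+    else
+        return atomic_long_read(&v_a->a);  /* optimized away */
+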
+
+* Frontend/backend layered design
+
+The ring buffer is made of two main layers: a frontend and a backend. The
+"frontend" locklessly manages space reservation within the buffer. It also
+manages timers, idle and cpu hotplug. The "backend" manages the memory backend
+used to allocate the buffers. It deals with subbuffer exchanges between the
+consumer and the producer in overwrite mode. Currently, only a page-based
+backend is implemented (RING_BUFFER_PAGE), but other backends are planned for
+the future: statically allocated backends (RING_BUFFER_STATIC) and vmap-based
+backends (RING_BUFFER_VMAP). These will allow, for instance, tracers to write
+trace data in a physically contiguous memory region allocated at boot time, or
+to write data in video card memory for crash reports.
Index: linux.trees.git/Documentation/ring-buffer/ring-buffer-usage.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/Documentation/ring-buffer/ring-buffer-usage.txt	2010-07-02 12:35:20.000000000 -0400
@@ -0,0 +1,260 @@
+		        Ring Buffer Library Usage
+
+			    Mathieu Desnoyers
+
+
+This document explains how to use the Linux Kernel Ring Buffer Library.
+
+The library presents a high-level interface that allows programmers to easily
+create and use a ring buffer instance. It also provides a more advanced client
+configuration API for clients with more elaborate needs (e.g. tracers).
+
+
+* Basic ring buffer configurations
+
+  The basic high-level configurations offered are pre-built clients with the
+following configuration selections under include/linux/ringbuffer/.
+
+  * The write-side (data producer) APIs are available in:
+
+    - global_overwrite.h:
+        global buffer, overwrite mode, channel-wide record iterator
+
+    - global_discard.h:
+        global buffer, discard mode, channel-wide record iterator
+
+    - percpu_overwrite.h:
+        per-cpu buffers, overwrite mode, channel-wide record iterator
+
+    - percpu_discard.h:
+        per-cpu buffers, discard mode, channel-wide record iterator
+
+    - percpu_local_overwrite.h:
+        per-cpu buffers, overwrite mode, per-cpu buffer record iterator
+
+    - percpu_local_discard.h:
+        per-cpu buffers, discard mode, per-cpu buffer record iterator
+
+  Typical use-case of the ring buffer write-side:
+
+    1) create
+    2) multiple calls to the write primitive
+    3) destroy
+
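+  A minimal write-side sketch following these steps, using the advanced API
+  names shown in the client sample later in this series:
+
+    chan = channel_create(&client_config, "sample", priv, NULL,
+                          subbuf_size, num_subbuf,
+                          switch_timer_interval, read_timer_interval);
+    /* ... many calls to the write primitive (reserve/write/commit) ... */
+    priv = channel_destroy(chan);
+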
+
+  * The read-side (data consumer) iterator APIs are available in:
+
+  - iterator.h
+
+    These iterators allow a reader to iterate over records on either a per-cpu
+    buffer or a channel-wide basis.
+
+    Typical life-span of a reader using the file descriptor read() iterator:
+
+    (in user-space)
+    # cat /path_to_file/filename
+
+    Typical life-span of a reader using the in-kernel API:
+
+    1) iterator_open()
+    2) get_next_record and read_current_record until get_next_record returns
+       -ENODATA. -EAGAIN means there is currently no data, but there might be
+       more data coming in the future.
+    3) iterator_close()
+
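+    A corresponding in-kernel reader sketch (function names follow the steps
+    above and are illustrative; see iterator.h for the actual prototypes):
+
+    iterator_open(...);
+    for (;;) {
+        ret = get_next_record(...);
+        if (ret == -ENODATA)
+            break;          /* no more data will come */
+        if (ret == -EAGAIN)
+            break;          /* no data for now; retry later */
+        read_current_record(...);
+    }
+    iterator_close(...);
+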
+
+* Advanced client configurations
+
+  * Advanced client configuration options
+
+  More options are available for clients with more advanced needs. These options
+are listed in the linux/ringbuffer/config.h header. A ring buffer "client" (a
+kernel object using the ring buffer library through its advanced options API)
+must set up a "static const" ring_buffer_config structure in which all options
+are spelled out.
+
+The pre-built basic configurations presented above set these advanced
+configuration options to values typically suitable for driver use.
+
+A client using the advanced configuration options must first include
+linux/ringbuffer/config.h, declare its configuration structure, declare the
+required static inline functions used by the fast-paths, and then include
+linux/ringbuffer/api.h.
+
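+A sketch of that inclusion order (callback bodies omitted; see the client
+sample later in this series for a complete example):
+
+    #include <linux/ringbuffer/config.h>
+
+    static const struct ring_buffer_config client_config = { /* ... */ };
+    /* static inline fast-path helpers required by the configuration ... */
+
+    #include <linux/ringbuffer/api.h>
+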
+The struct ring_buffer_config options are:
+
+  * alloc: RING_BUFFER_ALLOC_PER_CPU / RING_BUFFER_ALLOC_GLOBAL
+
+    Selects either global buffer or per-cpu ring buffers.
+
+  * sync: RING_BUFFER_SYNC_PER_CPU / RING_BUFFER_SYNC_GLOBAL
+
+    Selects which synchronization primitives must be used. Either expect
+    concurrency from other processors, or expect to only have concurrency with
+    the local processor. Separated from the "alloc" option because per-thread
+    buffers would fit in the "global alloc, per-cpu sync" category. Similarly,
+    per-cpu buffers written to with preemption enabled would fit in the
+    "per-cpu alloc, global sync" category, because migration could lead to a
+    concurrent write into a remote cpu buffer.
+
+  * mode: RING_BUFFER_OVERWRITE / RING_BUFFER_DISCARD
+
+    Either overwrite oldest subbuffers when buffer is full, or discard events.
+
+  * align: RING_BUFFER_NATURAL / RING_BUFFER_PACKED
+
+    Natural alignment aligns record headers on their natural alignment on the
+    architecture. It also aligns record payloads on their natural alignment
+    (similarly to a C structure). The packed option does not perform any
+    alignment for record header and payloads. It corresponds to the "packed" gcc
+    type attribute.
+
+  * output:
+
+      RING_BUFFER_SPLICE:   Output raw subbuffers through per-buffer file
+                            descriptors with splice(). The read-side
+                            synchronization needed to select the current
+                            subbuffer is performed with ioctl().
+
+      RING_BUFFER_MMAP:     Output raw subbuffers through per-buffer memory
+                            mapped file descriptors. Read-side synchronization
+                            to select the current subbuffer is performed with
+                            ioctl().
+
+      RING_BUFFER_READ:     Output raw subbuffers through per-buffer file
+                            descriptors with read(). The read-side
+                            synchronization needed to select the current
+                            subbuffer is performed with ioctl().
+                            (unimplemented)
+
+      RING_BUFFER_ITERATOR: Iterators allow a reader thread to read records one
+                            by one in the order they were written, either on a
+                            per-buffer or per-channel basis.
+
+      RING_BUFFER_NONE:     No output provided by the library is used.
+
+  * backend:
+
+      RING_BUFFER_PAGE:     The memory backend used to hold the ring buffers is
+                            made of non-contiguous pages. A software-controlled
+                            "subbuffer table" indexes the pages. It allows
+                            sub-buffer exchange between the producer and
+                            consumer in overwrite mode.
+
+      RING_BUFFER_VMAP:     A vmap'd virtually contiguous memory area is used as
+                            memory backend. (unimplemented)
+
+      RING_BUFFER_STATIC:   A physically contiguous memory area is used as
+                            memory backend. e.g. memory allocated at early boot,
+                            or video card memory. (unimplemented)
+
+  * oops:
+        Select "oops" consistency if you plan to read from the ring buffer
+        after a kernel oops occurred. This is useful if you plan to use the
+        ring buffer data in a crash report. Adds a slight performance overhead
+        to keep track of how much contiguous data has been written in the
+        current subbuffer.
+
+  * ipi:
+        The IPI_BARRIER scheme issues IPIs when the consumer needs to grab a
+        sub-buffer. It issues the appropriate memory barriers on the writer
+        CPU(s). It is therefore possible to turn the memory barrier in the
+        commit fast-path into a simple compiler barrier, thus improving
+        performances. This scheme is recommended when both per-cpu allocation
+        and synchronization are used. This scheme is not recommended for
+        "global" buffers, because it would involve sending IPIs to all
+        processors.
+
+  * wakeup:
+        The option "RING_BUFFER_WAKEUP_BY_TIMER" reduces intrusiveness in
+        the writer code and guarantees wait-free/lock-free write primitives
+        by performing lazy reader wakeups in a periodic deferrable timer and
+        hooking into cpu idle notifiers. This option makes tracer code more
+        robust at the expense of additional data delivery delay.
+        Use in combination with "read_timer_interval" channel_create()
+        argument.
+                - Note: CPU idle notifiers are not implemented for all
+                  architectures at the moment. The deferrable timer delays can
+                  only be expected to be met by architectures with idle
+                  notifiers.
+        The RING_BUFFER_WAKEUP_BY_WRITER option specifies that the ring buffer
+        write-side must perform reader wakeups at each sub-buffer boundary.
+        RING_BUFFER_WAKEUP_NONE does not perform any wakeup whatsoever. The
+        client has the responsibility to perform wakeups.
+
+  * tsc_bits:
+        Timestamp compression scheme setting. 0 means that no timestamps
+        are used; 64 means that full 64-bit timestamps are written with
+        each record. For any value between 1 and 63, the ring buffer
+        library will set the RING_BUFFER_RFLAG_FULL_TSC bit in the
+        "rflags" ring_buffer_ctx field, which is also passed as parameter
+        passed to the "record_header_size()" callback to inform the client
+        that a full 64-bit timestamp is needed due to a "tsc_bits"
+        overflow since the last record.
+
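+        As a sketch, a client's record_header_size() callback can test this
+        flag to choose the timestamp encoding (the sizes are illustrative):
+
+            if (rflags & RING_BUFFER_RFLAG_FULL_TSC)
+                size += sizeof(u64);           /* full 64-bit timestamp */
+            else
+                size += compressed_tsc_bytes;  /* "tsc_bits"-bit timestamp */
+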
+Some options are passed as parameters to channel_create():
+
+  * subbuf_size:
+        Size of a sub-buffer within a ring buffer. Extra synchronization is
+        performed when the data producer crosses sub-buffer boundaries. This
+        corresponds to "periods" in audio buffers. The maximum record size is
+        limited by the sub-buffer size. The minimum sub-buffer size is 1 page.
+
+  * num_subbuf:
+        Number of sub-buffers per buffer. Typically, using at least 2
+        sub-buffers is recommended to minimize record discards.
+
+  * switch_timer_interval:
+        The switch timer interval configures the periodic deferrable timer
+        which handles periodic buffer switches. It is used to make data
+        readily available for consumption at regular intervals for live data
+        streaming. A buffer switch is a synchronization point between the
+        data producers and the consumer.
+
+  * read_timer_interval:
+        The read timer interval is the time interval (in us) to wake up pending
+        readers.
+
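+  For instance, the client sample later in this series creates a channel with
+  1 MiB sub-buffers, 2 sub-buffers per buffer, a 100000 us switch timer
+  interval and a 1000 us read timer interval:
+
+    chan = channel_create(&client_config, "sample", &channel_priv, NULL,
+                          1048576, 2, 100000, 1000);
+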
+* Advanced client callbacks
+
+  These callbacks are configured by the cb field of the ring_buffer_config
+structure. They are provided to the ring buffer by the client. For both
+ring_buffer_clock_read() and record_header_size(), inline versions must also be
+provided before inclusion of linux/ringbuffer/api.h.
+
+  * ring_buffer_clock_read():
+        Returns the current ring buffer clock source time (64-bit value).
+
+  * record_header_size():
+        Returns the size of the current record, including the record header
+        size. It uses the "rflags" parameter to determine if a full 64-bit
+        timestamp is required or if "tsc_bits" bits are enough to represent the
+        current time and detect "tsc_bits"-bit overflow. The offset received as
+        parameter is relative to a page boundary, which allows alignment
+        calculation. data_size is the size of the event payload.
+        "pre_header_padding" can be set by record_header_size() to the amount of
+        padding required to align the record header (considered to be 0 if
+        unset).
+
+  * subbuffer_header_size():
+        Returns the size of the subbuffer header.
+
+  * buffer_begin():
+        Callback executed when crossing a sub-buffer boundary, when starting to
+        write into the sub-buffer.
+
+  * buffer_end():
+        Callback executed when crossing a sub-buffer boundary, before delivering
+        a sub-buffer. Has exclusive sub-buffer access when called, meaning that
+        no concurrent commits are pending, no reader can access the sub-buffer,
+        and no concurrent writers are allowed to overwrite it.
+
+  * buffer_create():
+        This callback is executed upon creation of a buffer, either at channel
+        creation, or at CPU hotplug.
+
+  * buffer_finalize():
+        Callback executed upon channel finalize, performed by channel_destroy().
+
+  * record_get():
+        Reader helper provided by the client, which can be used to extract the
+        record header from a record in the buffer.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 15/20] Ring buffer library - VFS operations
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (13 preceding siblings ...)
  2010-07-09 22:57 ` [patch 14/20] Ring buffer library - documentation Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 16/20] Ring buffer library - client sample Mathieu Desnoyers
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring-buffer-vfs.patch --]
[-- Type: text/plain, Size: 19659 bytes --]

File operation supports for ring buffer reader. splice() and mmap().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 Documentation/ioctl/ioctl-number.txt |    2 
 include/linux/ringbuffer/vfs.h       |   57 +++++++
 lib/ringbuffer/Makefile              |    3 
 lib/ringbuffer/ring_buffer_mmap.c    |  115 +++++++++++++++
 lib/ringbuffer/ring_buffer_splice.c  |  190 +++++++++++++++++++++++++
 lib/ringbuffer/ring_buffer_vfs.c     |  257 +++++++++++++++++++++++++++++++++++
 6 files changed, 624 insertions(+)

Index: linux.trees.git/lib/ringbuffer/Makefile
===================================================================
--- linux.trees.git.orig/lib/ringbuffer/Makefile	2010-07-09 18:13:53.000000000 -0400
+++ linux.trees.git/lib/ringbuffer/Makefile	2010-07-09 18:29:10.000000000 -0400
@@ -1,2 +1,5 @@
 obj-y += ring_buffer_backend.o
 obj-y += ring_buffer_frontend.o
+obj-y += ring_buffer_vfs.o
+obj-y += ring_buffer_splice.o
+obj-y += ring_buffer_mmap.o
Index: linux.trees.git/lib/ringbuffer/ring_buffer_splice.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_splice.c	2010-07-09 18:30:04.000000000 -0400
@@ -0,0 +1,190 @@
+/*
+ * ring_buffer_splice.c
+ *
+ * Copyright (C) 2002-2005 - Tom Zanussi <zanussi@us.ibm.com>, IBM Corp
+ * Copyright (C) 1999-2005 - Karim Yaghmour <karim@opersys.com>
+ * Copyright (C) 2008-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Re-using content from kernel/relay.c.
+ *
+ * This file is released under the GPL v2.
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+
+#include <linux/ringbuffer/backend.h>
+#include <linux/ringbuffer/frontend.h>
+#include <linux/ringbuffer/vfs.h>
+
+#if 0
+#define printk_dbg(fmt, args...) printk(fmt, args)
+#else
+#define printk_dbg(fmt, args...)
+#endif
+
+loff_t ring_buffer_no_llseek(struct file *file, loff_t offset, int origin)
+{
+	return -ESPIPE;
+}
+
+/*
+ * Release pages from the buffer so splice pipe_to_file can move them.
+ * Called after the pipe has been populated with buffer pages.
+ */
+static void ring_buffer_pipe_buf_release(struct pipe_inode_info *pipe,
+					 struct pipe_buffer *pbuf)
+{
+	__free_page(pbuf->page);
+}
+
+static const struct pipe_buf_operations ring_buffer_pipe_buf_ops = {
+	.can_merge = 0,
+	.map = generic_pipe_buf_map,
+	.unmap = generic_pipe_buf_unmap,
+	.confirm = generic_pipe_buf_confirm,
+	.release = ring_buffer_pipe_buf_release,
+	.steal = generic_pipe_buf_steal,
+	.get = generic_pipe_buf_get,
+};
+
+/*
+ * Page release operation after splice pipe_to_file ends.
+ */
+static void ring_buffer_page_release(struct splice_pipe_desc *spd,
+				     unsigned int i)
+{
+	__free_page(spd->pages[i]);
+}
+
+/*
+ *	subbuf_splice_actor - splice up to one subbuf's worth of data
+ */
+static int subbuf_splice_actor(struct file *in,
+			       loff_t *ppos,
+			       struct pipe_inode_info *pipe,
+			       size_t len,
+			       unsigned int flags)
+{
+	struct ring_buffer *buf = in->private_data;
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	unsigned int poff, subbuf_pages, nr_pages;
+	struct page *pages[PIPE_DEF_BUFFERS];
+	struct partial_page partial[PIPE_DEF_BUFFERS];
+	struct splice_pipe_desc spd = {
+		.pages = pages,
+		.nr_pages = 0,
+		.partial = partial,
+		.flags = flags,
+		.ops = &ring_buffer_pipe_buf_ops,
+		.spd_release = ring_buffer_page_release,
+	};
+	unsigned long consumed_old, consumed_idx, roffset;
+	unsigned long bytes_avail;
+
+	/*
+	 * Check that a GET_SUBBUF ioctl has been done before.
+	 */
+	WARN_ON(atomic_long_read(&buf->active_readers) != 1);
+	consumed_old = ring_buffer_get_consumed(config, buf);
+	consumed_old += *ppos;
+	consumed_idx = subbuf_index(consumed_old, chan);
+
+	/*
+	 * Adjust read len, if longer than what is available.
+	 * Max read size is 1 subbuffer due to get_subbuf/put_subbuf for
+	 * protection.
+	 */
+	bytes_avail = chan->backend.subbuf_size;
+	WARN_ON(bytes_avail > chan->backend.buf_size);
+	len = min_t(size_t, len, bytes_avail);
+	subbuf_pages = bytes_avail >> PAGE_SHIFT;
+	nr_pages = min_t(unsigned int, subbuf_pages, PIPE_DEF_BUFFERS);
+	roffset = consumed_old & PAGE_MASK;
+	poff = consumed_old & ~PAGE_MASK;
+	printk_dbg(KERN_DEBUG "SPLICE actor len %zu pos %zd write_pos %ld\n",
+		   len, (ssize_t)*ppos, ring_buffer_get_offset(config, buf));
+
+	for (; spd.nr_pages < nr_pages; spd.nr_pages++) {
+		unsigned int this_len;
+		struct page **page, *new_page;
+		void **virt;
+
+		if (!len)
+			break;
+		printk_dbg(KERN_DEBUG "SPLICE actor loop len %zu roffset %ld\n",
+			   len, roffset);
+
+		/*
+		 * We have to replace the page we are moving into the splice
+		 * pipe.
+		 */
+		new_page = alloc_pages_node(cpu_to_node(max(buf->backend.cpu,
+							    0)),
+					    GFP_KERNEL | __GFP_ZERO, 0);
+		if (!new_page)
+			break;
+
+		this_len = PAGE_SIZE - poff;
+		page = ring_buffer_read_get_page(&buf->backend, roffset, &virt);
+		spd.pages[spd.nr_pages] = *page;
+		*page = new_page;
+		*virt = page_address(new_page);
+		spd.partial[spd.nr_pages].offset = poff;
+		spd.partial[spd.nr_pages].len = this_len;
+
+		poff = 0;
+		roffset += PAGE_SIZE;
+		len -= this_len;
+	}
+
+	if (!spd.nr_pages)
+		return 0;
+
+	return splice_to_pipe(pipe, &spd);
+}
+
+ssize_t ring_buffer_splice_read(struct file *in, loff_t *ppos,
+				struct pipe_inode_info *pipe, size_t len,
+				unsigned int flags)
+{
+	struct ring_buffer *buf = in->private_data;
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	ssize_t spliced;
+	int ret;
+
+	if (config->output != RING_BUFFER_SPLICE)
+		return -EINVAL;
+
+	ret = 0;
+	spliced = 0;
+
+	printk_dbg(KERN_DEBUG "SPLICE read len %zu pos %zd\n", len,
+		   (ssize_t)*ppos);
+	while (len && !spliced) {
+		ret = subbuf_splice_actor(in, ppos, pipe, len, flags);
+		printk_dbg(KERN_DEBUG "SPLICE read loop ret %d\n", ret);
+		if (ret < 0)
+			break;
+		else if (!ret) {
+			if (flags & SPLICE_F_NONBLOCK)
+				ret = -EAGAIN;
+			break;
+		}
+
+		*ppos += ret;
+		if (ret > len)
+			len = 0;
+		else
+			len -= ret;
+		spliced += ret;
+	}
+
+	if (spliced)
+		return spliced;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_splice_read);
Index: linux.trees.git/Documentation/ioctl/ioctl-number.txt
===================================================================
--- linux.trees.git.orig/Documentation/ioctl/ioctl-number.txt	2010-07-09 18:08:14.000000000 -0400
+++ linux.trees.git/Documentation/ioctl/ioctl-number.txt	2010-07-09 18:29:10.000000000 -0400
@@ -320,4 +320,6 @@ Code  Seq#(hex)	Include File		Comments
 					<mailto:thomas@winischhofer.net>
 0xF4	00-1F	video/mbxfb.h		mbxfb
 					<mailto:raph@8d.com>
+0xF6	00-3F	lib/ringbuffer/ring_buffer_vfs.h	Ring Buffer Library
+					<mailto:mathieu.desnoyers@efficios.com>
 0xFD	all	linux/dm-ioctl.h
Index: linux.trees.git/lib/ringbuffer/ring_buffer_mmap.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_mmap.c	2010-07-09 18:29:10.000000000 -0400
@@ -0,0 +1,115 @@
+/*
+ * ring_buffer_mmap.c
+ *
+ * Copyright (C) 2002-2005 - Tom Zanussi <zanussi@us.ibm.com>, IBM Corp
+ * Copyright (C) 1999-2005 - Karim Yaghmour <karim@opersys.com>
+ * Copyright (C) 2008-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Re-using content from kernel/relay.c.
+ *
+ * This file is released under the GPL v2.
+ */
+
+#include <linux/module.h>
+#include <linux/mm.h>
+
+#include <linux/ringbuffer/backend.h>
+#include <linux/ringbuffer/frontend.h>
+#include <linux/ringbuffer/vfs.h>
+
+/*
+ * fault() vm_op implementation for ring buffer file mapping.
+ */
+static int ring_buffer_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct ring_buffer *buf = vma->vm_private_data;
+	struct channel *chan;
+	const struct ring_buffer_config *config;
+	pgoff_t pgoff = vmf->pgoff;
+	struct page **page;
+	void **virt;
+	unsigned long offset, sb_bindex;
+
+	if (!buf)
+		return VM_FAULT_OOM;
+
+	chan = buf->backend.chan;
+	config = chan->backend.config;
+
+	/*
+	 * Verify that faults are only done on the range of pages owned by the
+	 * reader.
+	 */
+	offset = pgoff << PAGE_SHIFT;
+	sb_bindex = subbuffer_id_get_index(config, buf->backend.buf_rsb.id);
+	if (!(offset >= buf->backend.array[sb_bindex]->mmap_offset
+	      && offset < buf->backend.array[sb_bindex]->mmap_offset +
+			  buf->backend.chan->backend.subbuf_size))
+		return VM_FAULT_SIGBUS;
+	/*
+	 * ring_buffer_read_get_page() gets the page in the current reader's
+	 * pages.
+	 */
+	page = ring_buffer_read_get_page(&buf->backend, offset, &virt);
+	if (!*page)
+		return VM_FAULT_SIGBUS;
+	get_page(*page);
+	vmf->page = *page;
+
+	return 0;
+}
+
+/*
+ * vm_ops for relay file mappings.
+ */
+static const struct vm_operations_struct ring_buffer_mmap_ops = {
+	.fault = ring_buffer_fault,
+};
+
+/**
+ *	ring_buffer_mmap_buf - mmap channel buffer to process address space
+ *	@buf: ring buffer to map
+ *	@vma: vm_area_struct describing memory to be mapped
+ *
+ *	Returns 0 if ok, negative on error
+ *
+ *	Caller should already have grabbed mmap_sem.
+ */
+static int ring_buffer_mmap_buf(struct ring_buffer *buf,
+				struct vm_area_struct *vma)
+{
+	unsigned long length = vma->vm_end - vma->vm_start;
+	struct channel *chan;
+	const struct ring_buffer_config *config;
+	unsigned long mmap_buf_len;
+
+	if (!buf)
+		return -EBADF;
+
+	chan = buf->backend.chan;
+	config = chan->backend.config;
+
+	if (config->output != RING_BUFFER_MMAP)
+		return -EINVAL;
+
+	mmap_buf_len = chan->backend.buf_size;
+	if (chan->backend.extra_reader_sb)
+		mmap_buf_len += chan->backend.subbuf_size;
+
+	if (length != mmap_buf_len)
+		return -EINVAL;
+
+	vma->vm_ops = &ring_buffer_mmap_ops;
+	vma->vm_flags |= VM_DONTEXPAND;
+	vma->vm_private_data = buf;
+
+	return 0;
+}
+
+/**
+ *	ring_buffer_mmap - mmap file op for ring buffer files
+ *	@filp: the file
+ *	@vma: the vma describing what to map
+ *
+ *	Calls upon ring_buffer_mmap_buf() to map the file into user space.
+ */
+int ring_buffer_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct ring_buffer *buf = filp->private_data;
+	return ring_buffer_mmap_buf(buf, vma);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_mmap);
Index: linux.trees.git/include/linux/ringbuffer/vfs.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/vfs.h	2010-07-09 18:29:10.000000000 -0400
@@ -0,0 +1,57 @@
+#ifndef _LINUX_RING_BUFFER_VFS_H
+#define _LINUX_RING_BUFFER_VFS_H
+
+/*
+ * linux/ringbuffer/vfs.h
+ *
+ * (C) Copyright 2005-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Wait-free ring buffer VFS file operations.
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/fs.h>
+#include <linux/poll.h>
+
+/* VFS API */
+
+extern const struct file_operations ring_buffer_file_operations;
+
+/*
+ * Internal file operations.
+ */
+
+int ring_buffer_open(struct inode *inode, struct file *file);
+int ring_buffer_release(struct inode *inode, struct file *file);
+unsigned int ring_buffer_poll(struct file *filp, poll_table *wait);
+ssize_t ring_buffer_splice_read(struct file *in, loff_t *ppos,
+				struct pipe_inode_info *pipe, size_t len,
+				unsigned int flags);
+int ring_buffer_mmap(struct file *filp, struct vm_area_struct *vma);
+
+/* Ring Buffer ioctl() and ioctl numbers */
+int ring_buffer_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+		      unsigned long arg);
+#ifdef CONFIG_COMPAT
+long ring_buffer_compat_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long arg);
+#endif
+
+/* Get the next sub-buffer that can be read. */
+#define RING_BUFFER_GET_SUBBUF			_IOR(0xF6, 0x00, __u32)
+/* Release the oldest reserved (by "get") sub-buffer. */
+#define RING_BUFFER_PUT_SUBBUF			_IOW(0xF6, 0x01, __u32)
+/* returns the size of the current sub-buffer. */
+#define RING_BUFFER_GET_SUBBUF_SIZE		_IOR(0xF6, 0x02, __u32)
+/* returns the maximum size for sub-buffers. */
+#define RING_BUFFER_GET_MAX_SUBBUF_SIZE		_IOR(0xF6, 0x03, __u32)
+/* returns the length to mmap. */
+#define RING_BUFFER_GET_MMAP_LEN		_IOR(0xF6, 0x04, __u32)
+/* returns the offset of the subbuffer belonging to the mmap reader. */
+#define RING_BUFFER_GET_MMAP_READ_OFFSET	_IOR(0xF6, 0x05, __u32)
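+
+/*
+ * Typical user-space consumer flow (a sketch, error handling omitted):
+ *
+ *	ioctl(fd, RING_BUFFER_GET_SUBBUF, &consumed);
+ *	ioctl(fd, RING_BUFFER_GET_SUBBUF_SIZE, &len);
+ *	splice(fd, NULL, pipe_fd, NULL, len, 0);  (or mmap() + memcpy())
+ *	ioctl(fd, RING_BUFFER_PUT_SUBBUF, &consumed);
+ */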
+
+#endif /* _LINUX_RING_BUFFER_VFS_H */
Index: linux.trees.git/lib/ringbuffer/ring_buffer_vfs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_vfs.c	2010-07-09 18:30:33.000000000 -0400
@@ -0,0 +1,257 @@
+/*
+ * ring_buffer_vfs.c
+ *
+ * Copyright (C) 2009-2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring Buffer VFS file operations.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+
+#include <linux/ringbuffer/backend.h>
+#include <linux/ringbuffer/frontend.h>
+#include <linux/ringbuffer/vfs.h>
+
+/**
+ *	ring_buffer_open - ring buffer open file operation
+ *	@inode: opened inode
+ *	@file: opened file
+ *
+ *	Open implementation. Makes sure only one open instance of a buffer is
+ *	done at a given moment.
+ */
+int ring_buffer_open(struct inode *inode, struct file *file)
+{
+	struct ring_buffer *buf = inode->i_private;
+	int ret;
+
+	ret = ring_buffer_open_read(buf);
+	if (ret)
+		return ret;
+
+	file->private_data = buf;
+	ret = nonseekable_open(inode, file);
+	if (ret)
+		goto release_read;
+	return 0;
+
+release_read:
+	ring_buffer_release_read(buf);
+	return ret;
+}
+
+/**
+ *	ring_buffer_release - ring buffer release file operation
+ *	@inode: opened inode
+ *	@file: opened file
+ *
+ *	Release implementation.
+ */
+int ring_buffer_release(struct inode *inode, struct file *file)
+{
+	struct ring_buffer *buf = inode->i_private;
+
+	ring_buffer_release_read(buf);
+
+	return 0;
+}
+
+/**
+ *	ring_buffer_poll - ring buffer poll file operation
+ *	@filp: the file
+ *	@wait: poll table
+ *
+ *	Poll implementation.
+ */
+unsigned int ring_buffer_poll(struct file *filp, poll_table *wait)
+{
+	unsigned int mask = 0;
+	struct inode *inode = filp->f_dentry->d_inode;
+	struct ring_buffer *buf = inode->i_private;
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	int finalized;
+
+	if (filp->f_mode & FMODE_READ) {
+		poll_wait_set_exclusive(wait);
+		poll_wait(filp, &buf->read_wait, wait);
+
+		finalized = ring_buffer_is_finalized(config, buf);
+		/*
+		 * ring_buffer_is_finalized() contains a smp_rmb() ordering
+		 * finalized load before offsets loads.
+		 */
+
+		WARN_ON(atomic_long_read(&buf->active_readers) != 1);
+retry:
+		if (subbuf_trunc(ring_buffer_get_offset(config, buf), chan)
+		  - subbuf_trunc(ring_buffer_get_consumed(config, buf), chan)
+		  == 0) {
+			if (finalized)
+				return POLLHUP;
+			else {
+				/*
+				 * The memory barriers
+				 * __wait_event()/wake_up_interruptible() take
+				 * care of "raw_spin_is_locked" memory ordering.
+				 */
+				if (raw_spin_is_locked(&buf->raw_idle_spinlock))
+					goto retry;
+				else
+					return 0;
+			}
+		} else {
+			if (subbuf_trunc(ring_buffer_get_offset(config, buf),
+					 chan)
+			  - subbuf_trunc(ring_buffer_get_consumed(config, buf),
+					 chan)
+			  >= chan->backend.buf_size)
+				return POLLPRI | POLLRDBAND;
+			else
+				return POLLIN | POLLRDNORM;
+		}
+	}
+	return mask;
+}
+
+/**
+ *	ring_buffer_ioctl - control ring buffer reader synchronization
+ *
+ *	@inode: the inode
+ *	@filp: the file
+ *	@cmd: the command
+ *	@arg: command arg
+ *
+ *	This ioctl implements commands necessary for producer/consumer
+ *	and flight recorder reader interaction :
+ *	RING_BUFFER_GET_SUBBUF
+ *		Get the next sub-buffer that can be read. It never blocks.
+ *	RING_BUFFER_PUT_SUBBUF
+ *		Release the currently read sub-buffer. Parameter is the last
+ *		put subbuffer (returned by GET_SUBBUF).
+ *	RING_BUFFER_GET_SUBBUF_SIZE
+ *		returns the size of the current sub-buffer.
+ *	RING_BUFFER_GET_MAX_SUBBUF_SIZE
+ *		returns the maximum size for sub-buffers.
+ *	RING_BUFFER_GET_MMAP_LEN
+ *		returns the length to mmap the per-cpu buffer (for mmap
+ *		clients).
+ *	RING_BUFFER_GET_MMAP_READ_OFFSET
+ *		returns the offset of the subbuffer belonging to the reader.
+ *		Should only be used by mmap clients.
+ */
+int ring_buffer_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+		      unsigned long arg)
+{
+	struct ring_buffer *buf = inode->i_private;
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	u32 __user *argp = (u32 __user *)arg;
+
+	switch (cmd) {
+	case RING_BUFFER_GET_SUBBUF:
+	{
+		unsigned long consumed;
+		int ret;
+
+		ret = ring_buffer_get_subbuf(buf, &consumed);
+		if (ret)
+			return ret;
+		else
+			return put_user((u32)consumed, argp);
+		break;
+	}
+	case RING_BUFFER_PUT_SUBBUF:
+	{
+		u32 uconsumed_old;
+		int ret;
+		long consumed_old;
+
+		ret = get_user(uconsumed_old, argp);
+		if (ret)
+			return ret; /* will return -EFAULT */
+
+		consumed_old = ring_buffer_get_consumed(config, buf);
+		consumed_old = consumed_old & (~0xFFFFFFFFL);
+		consumed_old = consumed_old | uconsumed_old;
+		ring_buffer_put_subbuf(buf, consumed_old);
+		break;
+	}
+	case RING_BUFFER_GET_SUBBUF_SIZE:
+		return put_user(ring_buffer_get_read_data_size(config, buf),
+				argp);
+		break;
+	case RING_BUFFER_GET_MAX_SUBBUF_SIZE:
+		return put_user((u32)chan->backend.subbuf_size, argp);
+		break;
+	/*
+	 * TODO: mmap length is currently limited to 4GB, even on 64-bit
+	 * architectures. We should be more clever in dealing with ioctl
+	 * compatibility here. Using a u32 is probably not what we want.
+	 */
+	case RING_BUFFER_GET_MMAP_LEN:
+	{
+		unsigned long mmap_buf_len;
+
+		if (config->output != RING_BUFFER_MMAP)
+			return -EINVAL;
+		mmap_buf_len = chan->backend.buf_size;
+		if (chan->backend.extra_reader_sb)
+			mmap_buf_len += chan->backend.subbuf_size;
+		if (mmap_buf_len > INT_MAX)
+			return -EFBIG;
+		return put_user((u32)mmap_buf_len, argp);
+		break;
+	}
+	case RING_BUFFER_GET_MMAP_READ_OFFSET:
+	{
+		unsigned long sb_bindex;
+
+		if (config->output != RING_BUFFER_MMAP)
+			return -EINVAL;
+		sb_bindex = subbuffer_id_get_index(config,
+						  buf->backend.buf_rsb.id);
+		return put_user((u32)buf->backend.array[sb_bindex]->mmap_offset,
+				 argp);
+		break;
+	}
+	default:
+		return -ENOIOCTLCMD;
+	}
+	return 0;
+}
+
+#ifdef CONFIG_COMPAT
+long ring_buffer_compat_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	long ret = -ENOIOCTLCMD;
+
+	lock_kernel();
+	ret = ring_buffer_ioctl(file->f_dentry->d_inode, file, cmd, arg);
+	unlock_kernel();
+
+	return ret;
+}
+#endif
+
+const struct file_operations ring_buffer_file_operations = {
+	.open = ring_buffer_open,
+	.release = ring_buffer_release,
+	.poll = ring_buffer_poll,
+	.splice_read = ring_buffer_splice_read,
+	.mmap = ring_buffer_mmap,
+	.ioctl = ring_buffer_ioctl,
+	.llseek = ring_buffer_no_llseek,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = ring_buffer_compat_ioctl,
+#endif
+};
+EXPORT_SYMBOL_GPL(ring_buffer_file_operations);
+
+MODULE_LICENSE("GPL and additional rights");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Ring Buffer Library VFS");


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 16/20] Ring buffer library - client sample
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (14 preceding siblings ...)
  2010-07-09 22:57 ` [patch 15/20] Ring buffer library - VFS operations Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 17/20] Ring buffer benchmark library Mathieu Desnoyers
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring-buffer-client-sample.patch --]
[-- Type: text/plain, Size: 13046 bytes --]

Example ring buffer library client.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 samples/Kconfig                                      |    9 
 samples/Makefile                                     |    2 
 samples/ring_buffer_user/Makefile                    |    4 
 samples/ring_buffer_user/ring_buffer_template_user.c |  273 +++++++++++++++++++
 samples/ring_buffer_user/ring_buffer_template_user.h |  100 ++++++
 5 files changed, 387 insertions(+), 1 deletion(-)

Index: linux.trees.git/samples/Makefile
===================================================================
--- linux.trees.git.orig/samples/Makefile	2010-07-09 18:30:15.000000000 -0400
+++ linux.trees.git/samples/Makefile	2010-07-09 18:31:11.000000000 -0400
@@ -1,4 +1,4 @@
 # Makefile for Linux samples code
 
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ tracepoints/ trace_events/ \
-			   hw_breakpoint/
+			   hw_breakpoint/ ring_buffer_user/
Index: linux.trees.git/samples/ring_buffer_user/ring_buffer_template_user.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/samples/ring_buffer_user/ring_buffer_template_user.c	2010-07-09 18:31:52.000000000 -0400
@@ -0,0 +1,273 @@
+/*
+ * ring_buffer_template_user.c
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer template instance code example.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/module.h>
+#include <linux/debugfs.h>
+
+#include "ring_buffer_template_user.h"
+
+struct ring_buffer_priv {
+	struct dentry *dentry;
+};
+
+struct channel_priv {
+	struct ring_buffer_priv *buf;
+};
+
+static struct channel *chan;
+static struct channel_priv channel_priv;
+static const struct ring_buffer_config client_config;
+
+/* Client callbacks */
+
+static u64 client_ring_buffer_clock_read(struct channel *chan)
+{
+	return ring_buffer_clock_read(chan);
+}
+
+size_t client_record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size,
+				 size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	return record_header_size(config, chan, offset, data_size,
+				  pre_header_padding, rflags, ctx);
+}
+
+/**
+ * client_subbuffer_header_size - called on buffer-switch to a new sub-buffer
+ *
+ * Return header size without padding after the structure. Don't use packed
+ * structure because gcc generates inefficient code on some architectures
+ * (powerpc, mips..)
+ */
+static size_t client_subbuffer_header_size(void)
+{
+	return offsetof(struct subbuffer_header, header_end);
+}
+
+/**
+ * write_trace_header - Write trace header
+ * @priv: Client private data
+ * @header: Memory address where the information must be written to
+ */
+static inline
+void write_trace_header(void *priv, struct subbuffer_header *header)
+{
+	header->magic_number = 0x12345678;
+	header->alignment = ring_buffer_get_alignment(&client_config);
+	/* ... */
+}
+
+static void client_buffer_begin(struct ring_buffer *buf, u64 tsc,
+			      unsigned int subbuf_idx)
+{
+	struct channel *chan = buf->backend.chan;
+	struct subbuffer_header *header =
+		(struct subbuffer_header *)
+			ring_buffer_offset_address(&buf->backend,
+				subbuf_idx * chan->backend.subbuf_size);
+
+	header->cycle_count_begin = tsc;
+	header->data_size = 0xFFFFFFFF; /* for debugging */
+	write_trace_header(chan->backend.priv, header);
+}
+
+/*
+ * offset is assumed to never be 0 here: never deliver a completely empty
+ * subbuffer. data_size is between 1 and subbuf_size.
+ */
+static void client_buffer_end(struct ring_buffer *buf, u64 tsc,
+			    unsigned int subbuf_idx, unsigned long data_size)
+{
+	struct channel *chan = buf->backend.chan;
+	struct subbuffer_header *header =
+		(struct subbuffer_header *)
+			ring_buffer_offset_address(&buf->backend,
+				subbuf_idx * chan->backend.subbuf_size);
+	unsigned long records_lost = 0;
+
+	header->data_size = data_size;
+	header->subbuf_size = PAGE_ALIGN(data_size);
+	header->cycle_count_end = tsc;
+	records_lost += ring_buffer_get_records_lost_full(&client_config, buf);
+	records_lost += ring_buffer_get_records_lost_wrap(&client_config, buf);
+	records_lost += ring_buffer_get_records_lost_big(&client_config, buf);
+	header->records_lost = records_lost;
+}
+
+static int client_buffer_create(struct ring_buffer *buf, void *priv,
+				int cpu, const char *name)
+{
+	struct channel_priv *chan_priv = priv;
+	struct ring_buffer_priv *buf_priv;
+	char *tmpname;
+	int ret = 0;
+
+	if (client_config.alloc == RING_BUFFER_ALLOC_PER_CPU)
+		buf_priv = per_cpu_ptr(chan_priv->buf, cpu);
+	else
+		buf_priv = chan_priv->buf;
+
+	tmpname = kzalloc(NAME_MAX + 1, GFP_KERNEL);
+	if (!tmpname) {
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	snprintf(tmpname, NAME_MAX, "%s%s_%d",
+		 (client_config.mode == RING_BUFFER_OVERWRITE) ? "flight-" : "",
+		 name, cpu);
+
+	buf_priv->dentry = debugfs_create_file(tmpname, S_IRUSR, NULL, buf,
+					       &ring_buffer_file_operations);
+	if (!buf_priv->dentry) {
+		ret = -ENOMEM;
+		goto free_name;
+	}
+free_name:
+	kfree(tmpname);
+end:
+	return ret;
+}
+
+static void client_buffer_finalize(struct ring_buffer *buf, void *priv, int cpu)
+{
+	struct channel_priv *chan_priv = priv;
+	struct ring_buffer_priv *buf_priv;
+
+	if (client_config.alloc == RING_BUFFER_ALLOC_PER_CPU)
+		buf_priv = per_cpu_ptr(chan_priv->buf, cpu);
+	else
+		buf_priv = chan_priv->buf;
+
+	debugfs_remove(buf_priv->dentry);
+}
+
+static const struct ring_buffer_config client_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = client_buffer_create,
+	.cb.buffer_finalize = client_buffer_finalize,
+
+	.tsc_bits = 64,
+	.alloc = RING_BUFFER_ALLOC_PER_CPU,
+	.sync = RING_BUFFER_SYNC_PER_CPU,
+	.mode = RING_BUFFER_OVERWRITE,
+	.align = RING_BUFFER_NATURAL,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_SPLICE,
+	.oops = RING_BUFFER_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_TIMER,
+};
+
+static void write_event_header(const struct ring_buffer_config *config,
+			       struct ring_buffer_ctx *ctx)
+{
+	ring_buffer_write(config, ctx, &ctx->tsc, sizeof(ctx->tsc));
+}
+
+static noinline void trace_event(unsigned long data1, short data2)
+{
+	struct ring_buffer_ctx ctx;
+	int ret, cpu;
+
+	cpu = ring_buffer_get_cpu(&client_config);
+	if (cpu < 0)
+		return;
+	ring_buffer_ctx_init(&ctx, chan, &channel_priv,
+			     sizeof(struct payload),
+			     sizeof(unsigned long),
+			     cpu);
+
+	ret = ring_buffer_reserve(&client_config, &ctx);
+	if (ret)
+		goto put;
+
+	write_event_header(&client_config, &ctx);
+
+	ring_buffer_align_ctx(&client_config, &ctx, sizeof(data1));
+	ring_buffer_write(&client_config, &ctx, &data1, sizeof(data1));
+	ring_buffer_align_ctx(&client_config, &ctx, sizeof(data2));
+	ring_buffer_write(&client_config, &ctx, &data2, sizeof(data2));
+
+	ring_buffer_commit(&client_config, &ctx);
+
+put:
+	ring_buffer_put_cpu(&client_config);
+}
+
+static noinline void use_ring_buffer(void)
+{
+	unsigned long i;
+
+	for (i = 0; i < 1000000; i++)
+		trace_event(i, (short)i);
+}
+
+static int __init ring_buffer_client_init(void)
+{
+	int ret;
+
+	printk(KERN_DEBUG "Ring buffer client init begin\n");
+
+	if (client_config.alloc == RING_BUFFER_ALLOC_PER_CPU)
+		channel_priv.buf = alloc_percpu(struct ring_buffer_priv);
+	else
+		channel_priv.buf = kzalloc(sizeof(struct ring_buffer_priv),
+					    GFP_KERNEL);
+	if (!channel_priv.buf)
+		return -ENOMEM;
+
+	chan = channel_create(&client_config, "sample", &channel_priv, NULL,
+			     1048576, 2,
+			     100000, 1000);
+	if (!chan) {
+		ret = -EINVAL;
+		goto error_create;
+	}
+
+	printk(KERN_DEBUG "Ring buffer client init end\n");
+
+	use_ring_buffer();
+
+	return 0;
+
+error_create:
+	if (client_config.alloc == RING_BUFFER_ALLOC_PER_CPU)
+		free_percpu(channel_priv.buf);
+	else
+		kfree(channel_priv.buf);
+	return ret;
+}
+
+static void __exit ring_buffer_client_exit(void)
+{
+	printk(KERN_DEBUG "Ring buffer client exit begin\n");
+	channel_destroy(chan);
+	if (client_config.alloc == RING_BUFFER_ALLOC_PER_CPU)
+		free_percpu(channel_priv.buf);
+	else
+		kfree(channel_priv.buf);
+	printk(KERN_DEBUG "Ring buffer client exit end\n");
+}
+
+module_init(ring_buffer_client_init);
+module_exit(ring_buffer_client_exit);
+
+MODULE_LICENSE("GPL and additional rights");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Ring Buffer Client Template");
Index: linux.trees.git/samples/ring_buffer_user/ring_buffer_template_user.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/samples/ring_buffer_user/ring_buffer_template_user.h	2010-07-09 18:32:01.000000000 -0400
@@ -0,0 +1,100 @@
+#ifndef _RING_BUFFER_TEMPLATE_USER_H
+#define _RING_BUFFER_TEMPLATE_USER_H
+
+/*
+ * ring_buffer_template_user.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer template instance code example.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/types.h>
+#include <linux/trace_clock.h>
+
+/* Align data on its natural alignment */
+#define RING_BUFFER_ALIGN
+
+#include <linux/ringbuffer/config.h>
+
+struct subbuffer_header {
+	uint64_t cycle_count_begin;	/* Cycle count at subbuffer start */
+	uint64_t cycle_count_end;	/* Cycle count at subbuffer end */
+	uint32_t magic_number;		/*
+					 * Trace magic number;
+					 * contains endianness information.
+					 */
+	uint8_t major_version;
+	uint8_t minor_version;
+	uint8_t arch_size;		/* Architecture pointer size */
+	uint8_t alignment;		/* ring buffer data alignment */
+	uint64_t start_time_sec;	/* NTP-corrected start time */
+	uint64_t start_time_usec;
+	uint64_t start_freq;		/*
+					 * Frequency at trace start,
+					 * used all along the trace.
+					 */
+	uint32_t freq_scale;		/* Frequency scaling (divisor) */
+	uint32_t data_size;		/* Size of data in subbuffer */
+	uint32_t subbuf_size;		/* Subbuffer size (includes padding) */
+	uint32_t records_lost;		/*
+					 * Records lost in this subbuffer since
+					 * the beginning of the trace.
+					 * (may overflow)
+					 */
+	uint8_t header_end[0];		/* End of header */
+};
+
+struct event_header {
+	u64 tsc;
+};
+
+struct payload {
+	unsigned long field1;
+	short field2;
+} RING_BUFFER_ALIGN_ATTR;
+
+/*
+ * Using the trace_clock_global() as an example.
+ */
+static inline notrace u64 ring_buffer_clock_read(struct channel *chan)
+{
+	return trace_clock_global();
+}
+
+/*
+ * record_header_size - Calculate the header size and padding necessary.
+ * @config: ring buffer instance configuration
+ * @chan: channel
+ * @offset: offset in the write buffer
+ * @data_size: size of the payload
+ * @pre_header_padding: padding to add before the header (output)
+ * @rflags: reservation flags
+ * @ctx: reservation context
+ *
+ * Returns the event header size (including padding).
+ */
+static inline notrace
+unsigned char record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size, size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	size_t orig_offset = offset;
+	size_t padding;
+
+	padding = ring_buffer_align(config, offset,
+				    sizeof(struct event_header));
+	offset += padding;
+	offset += sizeof(struct event_header);
+
+	*pre_header_padding = padding;
+	return offset - orig_offset;
+}
+
+#include <linux/ringbuffer/api.h>
+
+#endif /* _RING_BUFFER_TEMPLATE_USER_H */
Index: linux.trees.git/samples/Kconfig
===================================================================
--- linux.trees.git.orig/samples/Kconfig	2010-07-09 18:30:15.000000000 -0400
+++ linux.trees.git/samples/Kconfig	2010-07-09 18:31:11.000000000 -0400
@@ -44,4 +44,13 @@ config SAMPLE_HW_BREAKPOINT
 	help
 	  This builds kernel hardware breakpoint example modules.
 
+config SAMPLE_LIB_RING_BUFFER_TEMPLATE
+	tristate "Build ring buffer template user -- loadable modules only"
+	default m
+	depends on LIB_RING_BUFFER
+	#depends on !TRACING	#name conflict with old ring buffer API
+	select TRACE_CLOCK_STANDALONE
+	help
+	  This builds the ring buffer template user.
+
 endif # SAMPLES
Index: linux.trees.git/samples/ring_buffer_user/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/samples/ring_buffer_user/Makefile	2010-07-09 18:31:11.000000000 -0400
@@ -0,0 +1,4 @@
+# builds the ring buffer template user example kernel modules;
+# then to use one (as root):  insmod <module_name.ko>
+
+obj-$(CONFIG_SAMPLE_LIB_RING_BUFFER_TEMPLATE) += ring_buffer_template_user.o


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 17/20] Ring buffer benchmark library
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (15 preceding siblings ...)
  2010-07-09 22:57 ` [patch 16/20] Ring buffer library - client sample Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 18/20] Ring Buffer Record Iterator Mathieu Desnoyers
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring-buffer-benchmark-lib.patch --]
[-- Type: text/plain, Size: 20103 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 kernel/trace/Kconfig                     |    1 
 kernel/trace/Makefile                    |    1 
 kernel/trace/lib_ring_buffer_benchmark.c |  658 +++++++++++++++++++++++++++++++
 3 files changed, 660 insertions(+)

Index: linux.trees.git/kernel/trace/lib_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/lib_ring_buffer_benchmark.c	2010-07-09 18:34:37.000000000 -0400
@@ -0,0 +1,658 @@
+/*
+ * ring buffer library tester and benchmark
+ *
+ * Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RING_BUFFER_ALIGN
+
+#include <linux/ringbuffer/config.h>
+#include <linux/trace_clock.h>
+#include <linux/completion.h>
+#include <linux/kmemcheck.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/time.h>
+#include <asm/local.h>
+
+struct subbuffer_header {
+	u64		ts;
+	unsigned long	commit;
+	uint8_t		header_end[0];
+};
+
+/*
+ * Only 27-bit tsc support, TODO: extended header to support 27-bit overflow.
+ */
+struct event_header {
+	kmemcheck_bitfield_begin(bitfield);
+	u32		type_len:5, tsc:27;
+	kmemcheck_bitfield_end(bitfield);
+	uint8_t		header_end[0];
+};
+
+struct payload {
+	int cpuid;
+} RING_BUFFER_ALIGN_ATTR;
+
+static inline notrace u64 ring_buffer_clock_read(struct channel *chan)
+{
+	return trace_clock_local();
+}
+
+static inline notrace
+unsigned char record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size, size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	size_t orig_offset = offset;
+	size_t padding;
+
+	padding = ring_buffer_align(config, offset,
+				    offsetof(struct event_header, header_end));
+	offset += padding;
+	offset += sizeof(struct event_header);
+
+	*pre_header_padding = padding;
+	return offset - orig_offset;
+}
+
+#include <linux/ringbuffer/api.h>
+
+static struct channel *channel;
+static const struct ring_buffer_config client_config;
+
+static u64 client_ring_buffer_clock_read(struct channel *chan)
+{
+	return ring_buffer_clock_read(chan);
+}
+
+static
+size_t client_record_header_size(const struct ring_buffer_config *config,
+				struct channel *chan, size_t offset,
+				size_t data_size,
+				size_t *pre_header_padding,
+				unsigned int rflags,
+				struct ring_buffer_ctx *ctx)
+{
+	return record_header_size(config, chan, offset, data_size,
+				  pre_header_padding, rflags, ctx);
+}
+
+static size_t client_subbuffer_header_size(void)
+{
+	return offsetof(struct subbuffer_header, header_end);
+}
+
+static void client_buffer_begin(struct ring_buffer *buf, u64 tsc,
+				unsigned int subbuf_idx)
+{
+	struct channel *chan = buf->backend.chan;
+	struct subbuffer_header *header =
+		(struct subbuffer_header *)
+			ring_buffer_offset_address(&buf->backend,
+				subbuf_idx * chan->backend.subbuf_size);
+	header->ts = tsc;
+}
+
+static void client_buffer_end(struct ring_buffer *buf, u64 tsc,
+			    unsigned int subbuf_idx, unsigned long data_size)
+{
+	struct channel *chan = buf->backend.chan;
+	struct subbuffer_header *header =
+		(struct subbuffer_header *)
+			ring_buffer_offset_address(&buf->backend,
+				subbuf_idx * chan->backend.subbuf_size);
+	header->commit = data_size;
+}
+
+static const struct ring_buffer_config client_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = NULL,
+	.cb.buffer_finalize = NULL,
+
+	.tsc_bits = 27,
+	.alloc = RING_BUFFER_ALLOC_PER_CPU,
+	/* .alloc = RING_BUFFER_ALLOC_GLOBAL, */
+	.sync = RING_BUFFER_SYNC_PER_CPU,
+	/* .sync = RING_BUFFER_SYNC_GLOBAL, */
+	.mode = RING_BUFFER_OVERWRITE,
+	/* .mode = RING_BUFFER_DISCARD, */
+	.align = RING_BUFFER_NATURAL,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_NONE,
+	.oops = RING_BUFFER_NO_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_TIMER,
+};
+
+static void write_event_header(const struct ring_buffer_config *config,
+			       struct ring_buffer_ctx *ctx)
+{
+	struct event_header header;
+
+	header.type_len = ctx->data_size;
+	header.tsc = ctx->tsc;
+	/*
+	 * eventually check rflags to know if a 27 bit tsc overflow is detected
+	 * between consecutive events.
+	 */
+	ring_buffer_write(config, ctx, &header, sizeof(header));
+}
+
+/* run time and sleep time in seconds */
+#define RUN_TIME	10
+#define SLEEP_TIME	10
+
+#ifndef CONFIG_PREEMPT
+/* number of events before the writer gives up the cpu */
+static int resched_interval = 5000;
+#endif
+
+static struct completion read_start;
+
+static struct task_struct *producer;
+static struct task_struct *consumer;
+static unsigned long iter_read;
+static unsigned long long global_read;
+
+static int disable_reader;
+module_param(disable_reader, uint, 0644);
+MODULE_PARM_DESC(disable_reader, "only run producer");
+
+static int write_iteration = 50;
+module_param(write_iteration, uint, 0644);
+MODULE_PARM_DESC(write_iteration, "# of writes between timestamp readings");
+
+static int producer_nice = 19;
+static int consumer_nice = 19;
+
+static int producer_fifo = -1;
+static int consumer_fifo = -1;
+
+module_param(producer_nice, uint, 0644);
+MODULE_PARM_DESC(producer_nice, "nice prio for producer");
+
+module_param(consumer_nice, uint, 0644);
+MODULE_PARM_DESC(consumer_nice, "nice prio for consumer");
+
+module_param(producer_fifo, uint, 0644);
+MODULE_PARM_DESC(producer_fifo, "fifo prio for producer");
+
+module_param(consumer_fifo, uint, 0644);
+MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
+
+static int kill_test;
+
+#define KILL_TEST()				\
+	do {					\
+		if (!kill_test) {		\
+			kill_test = 1;		\
+			WARN_ON(1);		\
+		}				\
+	} while (0)
+
+enum event_status {
+	EVENT_FOUND,
+	EVENT_DROPPED,
+};
+
+static int read_subbuffer(struct ring_buffer *buf, int cpu)
+{
+	unsigned long consumed, offset, data_size;
+	int ret;
+
+	ret = ring_buffer_get_subbuf(buf, &consumed);
+	if (ret && !ACCESS_ONCE(buf->finalized)
+	    && client_config.alloc == RING_BUFFER_ALLOC_GLOBAL) {
+		/*
+		 * Use "pull" scheme for global buffers. The reader
+		 * itself flushes the buffer to "pull" data not visible
+		 * to readers yet. Flush current subbuffer and re-try.
+		 *
+		 * Per-CPU buffers rather use a "push" scheme because
+		 * the IPI needed to flush all CPU's buffers is too
+		 * costly. In the "push" scheme, the reader waits for
+		 * the writer periodic deferrable timer to flush the
+		 * buffers (keeping track of a quiescent state
+		 * timestamp). Therefore, the writer "pushes" data out
+		 * of the buffers rather than letting the reader "pull"
+		 * data from the buffer.
+		 */
+		ring_buffer_switch(&client_config, buf, SWITCH_ACTIVE);
+		ret = ring_buffer_get_subbuf(buf, &consumed);
+	}
+	if (ret)
+		goto get_fail;
+
+	data_size = ring_buffer_get_read_data_size(&client_config, buf);
+	offset = consumed;
+
+	/* Skip header */
+	offset += client_subbuffer_header_size();
+
+	/* Read events */
+	while (offset < consumed + data_size) {
+		unsigned int cpuid;
+
+		offset += ring_buffer_align(&client_config, offset,
+					    offsetof(struct event_header,
+						     header_end));
+		offset += offsetof(struct event_header, header_end);
+		offset += ring_buffer_align(&client_config, offset,
+					    sizeof(cpuid));
+		ring_buffer_read(&buf->backend, offset, &cpuid, sizeof(cpuid));
+		if (client_config.alloc == RING_BUFFER_ALLOC_GLOBAL)
+			WARN_ON_ONCE(cpuid > num_online_cpus());
+		else
+			WARN_ON_ONCE(cpuid != cpu);
+		offset += sizeof(cpuid);
+		iter_read++;
+		global_read++;
+	}
+
+	/* Put subbuffer */
+	ring_buffer_put_subbuf(buf, consumed);
+get_fail:
+
+	return ret;
+}
+
+static int read_channel_events(void)
+{
+	int cpu;
+	int all_finalized = 1;
+	int all_empty = 1;
+
+	for_each_channel_cpu(cpu, channel) {
+		struct ring_buffer *buf;
+		int ret;
+
+		buf = channel_get_ring_buffer(&client_config, channel, cpu);
+		ret = read_subbuffer(buf, cpu);
+		if (ret != -ENODATA)
+			all_finalized = 0;
+		if (ret == 0)
+			all_empty = 0;
+	}
+
+	if (all_finalized)
+		return -ENODATA;
+	if (all_empty)
+		return -EAGAIN;
+	else
+		return 0;
+}
+
+static void ring_buffer_consumer(void)
+{
+	int ret;
+
+	do {
+		if (client_config.alloc == RING_BUFFER_ALLOC_GLOBAL) {
+			struct ring_buffer *buf;
+
+			buf = channel_get_ring_buffer(
+				&client_config, channel, 0);
+			ret = read_subbuffer(buf, 0);
+			if (ret == -EAGAIN)
+				wait_event_interruptible(channel->read_wait,
+					(ret = read_subbuffer(buf, 0),
+					 ret != -EAGAIN));
+		} else {
+			ret = read_channel_events();
+			if (ret == -EAGAIN)
+				wait_event_interruptible(channel->read_wait,
+					(ret = read_channel_events(),
+					 ret != -EAGAIN));
+		}
+	} while (!kill_test && ret != -ENODATA);
+}
+
+static void ring_buffer_producer(void)
+{
+	struct timeval start_tv;
+	struct timeval end_tv;
+	unsigned long long time;
+
+	unsigned long long hit = 0;
+	unsigned long long missed = 0;
+
+	unsigned long long written = 0;
+	unsigned long long lost_full = 0, lost_wrap = 0, lost_big = 0;
+	unsigned long long entries = 0;
+	unsigned long long overruns = 0;
+	unsigned long long read = 0;
+	unsigned long avg;
+	int cnt = 0;
+	int ret, cpu;
+	struct ring_buffer_ctx ctx;
+
+	/*
+	 * Hammer the buffer for 10 secs (this may
+	 * make the system stall)
+	 */
+	trace_printk("Starting ring buffer hammer\n");
+	do_gettimeofday(&start_tv);
+	do {
+		int i;
+
+		for (i = 0; i < write_iteration; i++) {
+			cpu = ring_buffer_get_cpu(&client_config);
+			if (cpu < 0)
+				continue;
+			ring_buffer_ctx_init(&ctx, channel, NULL,
+					     sizeof(struct payload),
+					     sizeof(int), cpu);
+			ret = ring_buffer_reserve(&client_config, &ctx);
+			if (ret) {
+				missed++;
+			} else {
+				int cpuid = smp_processor_id();
+
+				hit++;
+				write_event_header(&client_config, &ctx);
+				ring_buffer_align_ctx(&client_config, &ctx,
+						      sizeof(cpuid));
+				ring_buffer_write(&client_config, &ctx,
+						  &cpuid, sizeof(cpuid));
+				ring_buffer_commit(&client_config, &ctx);
+			}
+			ring_buffer_put_cpu(&client_config);
+		}
+		do_gettimeofday(&end_tv);
+
+		cnt++;
+
+#ifndef CONFIG_PREEMPT
+		/*
+		 * On a non-preempt kernel, the 10 second run will
+		 * stall everything while it runs. Instead, we call
+		 * cond_resched and also add back any time that was lost
+		 * to a reschedule.
+		 *
+		 * Do a cond resched at the same frequency we would wake up
+		 * the reader.
+		 */
+		if (cnt % resched_interval)
+			cond_resched();
+#endif
+
+	} while (end_tv.tv_sec < (start_tv.tv_sec + RUN_TIME) && !kill_test);
+	trace_printk("End ring buffer hammer\n");
+
+	time = end_tv.tv_sec - start_tv.tv_sec;
+	time *= USEC_PER_SEC;
+	time += (long long)((long)end_tv.tv_usec - (long)start_tv.tv_usec);
+
+	if (client_config.alloc == RING_BUFFER_ALLOC_GLOBAL) {
+		struct ring_buffer *buf =
+			channel_get_ring_buffer(&client_config,
+						channel, 0);
+
+		/*
+		 * These values only take into account flushed subbuffers.
+		 */
+		written += ring_buffer_get_records_count(&client_config, buf);
+		lost_full += ring_buffer_get_records_lost_full(&client_config, buf);
+		lost_wrap += ring_buffer_get_records_lost_wrap(&client_config, buf);
+		lost_big += ring_buffer_get_records_lost_big(&client_config, buf);
+		overruns += ring_buffer_get_records_overrun(&client_config, buf);
+		entries += ring_buffer_get_records_unread(&client_config, buf);
+		read += ring_buffer_get_records_read(&client_config, buf);
+	} else {
+		for_each_channel_cpu(cpu, channel) {
+			struct ring_buffer *buf =
+				channel_get_ring_buffer(&client_config,
+							channel, cpu);
+
+			/*
+			 * These values only take into account flushed
+			 * subbuffers.
+			 */
+			written += ring_buffer_get_records_count(&client_config, buf);
+			lost_full += ring_buffer_get_records_lost_full(&client_config, buf);
+			lost_wrap += ring_buffer_get_records_lost_wrap(&client_config, buf);
+			lost_big += ring_buffer_get_records_lost_big(&client_config, buf);
+			overruns += ring_buffer_get_records_overrun(&client_config, buf);
+			entries += ring_buffer_get_records_unread(&client_config, buf);
+			read += ring_buffer_get_records_read(&client_config, buf);
+		}
+	}
+
+	if (kill_test)
+		trace_printk("ERROR!\n");
+
+	if (!disable_reader) {
+		if (consumer_fifo < 0)
+			trace_printk("Running Consumer at nice: %d\n",
+				     consumer_nice);
+		else
+			trace_printk("Running Consumer at SCHED_FIFO %d\n",
+				     consumer_fifo);
+	}
+	if (producer_fifo < 0)
+		trace_printk("Running Producer at nice: %d\n",
+			     producer_nice);
+	else
+		trace_printk("Running Producer at SCHED_FIFO %d\n",
+			     producer_fifo);
+
+	/* Let the user know that the test is running at low priority */
+	if (producer_fifo < 0 && consumer_fifo < 0 &&
+	    producer_nice == 19 && consumer_nice == 19)
+		trace_printk("WARNING!!! This test is running at lowest priority.\n");
+
+	trace_printk("This iteration:           %llu (usecs)\n", time);
+	trace_printk("  Time:                   %llu (usecs)\n", time);
+	trace_printk("  Data production:\n");
+	trace_printk("    Written:              %llu\n", hit);
+	trace_printk("    Lost:                 %llu\n", missed);
+	trace_printk("  Data consumption:\n");
+	if (disable_reader)
+		trace_printk("    Read:                 (reader disabled)\n");
+	else
+		trace_printk("    Read:                 %lu\n",
+			     iter_read);
+	trace_printk("\n");
+	trace_printk("Global (only flushed subbuffers):\n");
+	trace_printk("  Data production:\n");
+	trace_printk("    Written:              %llu\n", written);
+	trace_printk("    Lost (buffer full)    %llu\n", lost_full);
+	trace_printk("    Lost (wrap around)    %llu\n", lost_wrap);
+	trace_printk("    Lost (event too big)  %llu\n", lost_big);
+	trace_printk("  Data consumption:\n");
+	if (disable_reader)
+		trace_printk("    Read:                 (reader disabled)\n");
+	else
+		trace_printk("    Read:                 %llu (%llu read-side)\n",
+			     read, global_read);
+	trace_printk("    Overruns:             %llu\n", overruns);
+	trace_printk("    Non-consumed entries: %llu\n", entries);
+	trace_printk("    Consumption total:    %llu\n",
+		     entries + overruns + read);
+
+	/* Convert time from usecs to millisecs */
+	do_div(time, USEC_PER_MSEC);
+	if (time)
+		hit /= (long)time;
+	else
+		trace_printk("TIME IS ZERO??\n");
+
+	trace_printk("Entries per millisec: %llu\n", hit);
+
+	if (hit) {
+		/* Calculate the average time in nanosecs */
+		avg = NSEC_PER_MSEC / hit;
+		trace_printk("%lu ns per entry written\n", avg);
+	}
+
+	if (missed) {
+		if (time)
+			missed /= (long)time;
+
+		trace_printk("Total iterations per millisec: %llu\n",
+			     hit + missed);
+
+		/* it is possible that hit + missed will overflow and be zero */
+		if (!(hit + missed)) {
+			trace_printk("hit + missed overflowed and totalled zero!\n");
+			hit--; /* make it non zero */
+		}
+
+		/* Calculate the average time in nanosecs */
+		avg = NSEC_PER_MSEC / (hit + missed);
+		trace_printk("%lu ns per entry (written+lost)\n", avg);
+	}
+	iter_read = 0;
+}
+
+static void wait_to_die(void)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+}
+
+static int ring_buffer_consumer_thread(void *arg)
+{
+	struct ring_buffer *buf;
+	int cpu;
+
+	wait_for_completion_interruptible(&read_start);
+
+	if (client_config.alloc == RING_BUFFER_ALLOC_GLOBAL) {
+		buf = channel_get_ring_buffer(&client_config, channel, 0);
+		WARN_ON_ONCE(ring_buffer_open_read(buf));
+	} else {
+		/* TODO: benchmark does not take cpu hotplug into account. */
+		for_each_channel_cpu(cpu, channel) {
+			buf = channel_get_ring_buffer(&client_config, channel,
+						      cpu);
+			WARN_ON_ONCE(ring_buffer_open_read(buf));
+		}
+	}
+
+	ring_buffer_consumer();
+
+	if (client_config.alloc == RING_BUFFER_ALLOC_GLOBAL)
+		ring_buffer_release_read(buf);
+	else {
+		/* TODO: benchmark does not take cpu hotplug into account. */
+		for_each_channel_cpu(cpu, channel) {
+			buf = channel_get_ring_buffer(&client_config, channel,
+						      cpu);
+			ring_buffer_release_read(buf);
+		}
+	}
+
+	wait_to_die();
+
+	return 0;
+}
+
+static int ring_buffer_producer_thread(void *arg)
+{
+	if (consumer)
+		complete(&read_start);
+
+	while (!kthread_should_stop() && !kill_test) {
+		ring_buffer_producer();
+
+		trace_printk("Sleeping for 10 secs\n");
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(HZ * SLEEP_TIME);
+		__set_current_state(TASK_RUNNING);
+	}
+
+	if (kill_test)
+		wait_to_die();
+
+	return 0;
+}
+
+static int __init ring_buffer_benchmark_init(void)
+{
+	int ret;
+
+	/* make a one meg buffer in overwrite mode */
+	/* alternative, ftrace-equivalent sizing: 4k subbuffers: 4096 * 256. */
+	channel = channel_create(&client_config, "benchmark", NULL, NULL,
+				 /* 4096, 256, */
+				 131072, 8,
+				 /* 524288, 2, */
+				 100000, 100000);
+	if (!channel)
+		return -EINVAL;
+
+	if (!disable_reader) {
+		init_completion(&read_start);
+		consumer = kthread_run(ring_buffer_consumer_thread,
+				       NULL, "rb_consumer");
+		ret = PTR_ERR(consumer);
+		if (IS_ERR(consumer))
+			goto out_fail;
+	}
+
+	producer = kthread_run(ring_buffer_producer_thread,
+			       NULL, "rb_producer");
+	ret = PTR_ERR(producer);
+
+	if (IS_ERR(producer))
+		goto out_kill;
+
+	/*
+	 * Run them as low-prio background tasks by default:
+	 */
+	if (!disable_reader) {
+		if (consumer_fifo >= 0) {
+			struct sched_param param = {
+				.sched_priority = consumer_fifo
+			};
+			sched_setscheduler(consumer, SCHED_FIFO, &param);
+		} else
+			set_user_nice(consumer, consumer_nice);
+	}
+
+	if (producer_fifo >= 0) {
+		struct sched_param param = {
+			.sched_priority = producer_fifo
+		};
+		sched_setscheduler(producer, SCHED_FIFO, &param);
+	} else
+		set_user_nice(producer, producer_nice);
+
+	return 0;
+
+out_kill:
+	if (consumer)
+		kthread_kill_stop(consumer, SIGKILL);
+out_fail:
+	channel_destroy(channel);
+	return ret;
+}
+
+static void __exit ring_buffer_benchmark_exit(void)
+{
+	kthread_kill_stop(producer, SIGKILL);
+	channel_destroy(channel);
+	if (consumer)
+		kthread_kill_stop(consumer, SIGKILL);
+}
+
+module_init(ring_buffer_benchmark_init);
+module_exit(ring_buffer_benchmark_exit);
+
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("lib_ring_buffer_benchmark");
+MODULE_LICENSE("GPL");
Index: linux.trees.git/kernel/trace/Kconfig
===================================================================
--- linux.trees.git.orig/kernel/trace/Kconfig	2010-07-09 18:30:14.000000000 -0400
+++ linux.trees.git/kernel/trace/Kconfig	2010-07-09 18:32:08.000000000 -0400
@@ -499,6 +499,7 @@ config MMIOTRACE_TEST
 config RING_BUFFER_BENCHMARK
 	tristate "Ring buffer benchmark stress tester"
 	depends on FTRACE_RING_BUFFER
+	depends on LIB_RING_BUFFER
 	help
 	  This option creates a test to stress the ring buffer and benchmark it.
 	  It creates its own ring buffer such that it will not interfere with
Index: linux.trees.git/kernel/trace/Makefile
===================================================================
--- linux.trees.git.orig/kernel/trace/Makefile	2010-07-09 18:30:14.000000000 -0400
+++ linux.trees.git/kernel/trace/Makefile	2010-07-09 18:32:08.000000000 -0400
@@ -24,6 +24,7 @@ obj-y += trace_clock.o
 obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
 obj-$(CONFIG_FTRACE_RING_BUFFER) += ftrace_ring_buffer.o
 obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ftrace_ring_buffer_benchmark.o
+obj-$(CONFIG_RING_BUFFER_BENCHMARK) += lib_ring_buffer_benchmark.o
 
 obj-$(CONFIG_TRACING) += trace.o
 obj-$(CONFIG_TRACING) += trace_output.o


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 18/20] Ring Buffer Record Iterator
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (16 preceding siblings ...)
  2010-07-09 22:57 ` [patch 17/20] Ring buffer benchmark library Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 19/20] Ring Buffer: Basic API Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 20/20] Ring buffer: benchmark simple API Mathieu Desnoyers
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring-buffer-record-iterator.patch --]
[-- Type: text/plain, Size: 26648 bytes --]

Implements a per-cpu-local iterator and a channel-wide iterator, along with a
read() file operation built on top of them. Ring buffer clients can use either
the iterators directly or the read() file operation.

The channel-wide iterator performs a timestamp-ordered fusion merge of the
per-cpu buffers using a priority heap.
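
As a rough illustration of how a client could drain a per-cpu channel with the
channel-wide iterator, here is a minimal sketch. The drain_channel() name, the
256-byte payload bound and the busy-wait on -EAGAIN are made up for the example;
a real client would rather block on chan->read_wait as the read() implementation
in this patch does, and the channel must have been created with the
RING_BUFFER_ITERATOR output type.

static int drain_channel(struct channel *chan)
{
	struct ring_buffer *buf;
	char payload[256];	/* assumed upper bound on payload size */
	ssize_t len;
	int ret;

	ret = channel_iterator_open(chan);
	if (ret)
		return ret;
	for (;;) {
		/* Fusion merge: returns the record with the lowest timestamp. */
		len = channel_get_next_record(chan, &buf);
		if (len == -ENODATA)
			break;		/* all buffers empty and finalized */
		if (len == -EAGAIN)
			continue;	/* temporarily empty; real code would wait */
		if (len < 0 || len > (ssize_t)sizeof(payload))
			break;
		/* Copy the current record payload out of the buffer. */
		read_current_record(buf, payload);
		/* consume "len" bytes of payload here */
	}
	channel_iterator_release(chan);
	return 0;
}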

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/ringbuffer/iterator.h   |   70 ++
 lib/ringbuffer/Makefile               |    1 
 lib/ringbuffer/ring_buffer_iterator.c |  795 ++++++++++++++++++++++++++++++++++
 3 files changed, 866 insertions(+)

Index: linux.trees.git/lib/ringbuffer/ring_buffer_iterator.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_iterator.c	2010-07-09 18:35:12.000000000 -0400
@@ -0,0 +1,795 @@
+/*
+ * ring_buffer_iterator.c
+ *
+ * (C) Copyright 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer and channel iterators. Get each event of a channel in order. Uses
+ * a prio heap for per-cpu buffers, giving a O(log(NR_CPUS)) algorithmic
+ * complexity for the "get next event" operation.
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/iterator.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+
+/*
+ * Safety factor taking into account internal kernel interrupt latency.
+ * Assuming 250ms worst-case latency.
+ */
+#define MAX_SYSTEM_LATENCY	250
+
+/*
+ * Maximum delta expected between trace clocks. At most 1 jiffy delta.
+ */
+#define MAX_CLOCK_DELTA		(jiffies_to_usecs(1) * 1000)
+
+/**
+ * ring_buffer_get_next_record - Get the next record in a buffer.
+ * @chan: channel
+ * @buf: buffer
+ *
+ * Returns the size of the event read, -EAGAIN if buffer is empty, -ENODATA if
+ * buffer is empty and finalized. The buffer must already be opened for reading.
+ */
+ssize_t ring_buffer_get_next_record(struct channel *chan,
+				    struct ring_buffer *buf)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer_iter *iter = &buf->iter;
+	int ret;
+
+restart:
+	switch (iter->state) {
+	case ITER_GET_SUBBUF:
+		ret = ring_buffer_get_subbuf(buf, &iter->consumed);
+		if (ret && !ACCESS_ONCE(buf->finalized)
+		    && config->alloc == RING_BUFFER_ALLOC_GLOBAL) {
+			/*
+			 * Use "pull" scheme for global buffers. The reader
+			 * itself flushes the buffer to "pull" data not visible
+			 * to readers yet. Flush current subbuffer and re-try.
+			 *
+			 * Per-CPU buffers rather use a "push" scheme because
+			 * the IPI needed to flush all CPU's buffers is too
+			 * costly. In the "push" scheme, the reader waits for
+			 * the writer periodic deferrable timer to flush the
+			 * buffers (keeping track of a quiescent state
+			 * timestamp). Therefore, the writer "pushes" data out
+			 * of the buffers rather than letting the reader "pull"
+			 * data from the buffer.
+			 */
+			ring_buffer_switch_slow(buf, SWITCH_ACTIVE);
+			ret = ring_buffer_get_subbuf(buf, &iter->consumed);
+		}
+		if (ret)
+			return ret;
+		iter->data_size = ring_buffer_get_read_data_size(config, buf);
+		iter->read_offset = iter->consumed;
+		/* skip header */
+		iter->read_offset += config->cb.subbuffer_header_size();
+		iter->state = ITER_TEST_RECORD;
+		goto restart;
+	case ITER_TEST_RECORD:
+		if (iter->read_offset - iter->consumed >= iter->data_size) {
+			iter->state = ITER_PUT_SUBBUF;
+		} else {
+			CHAN_WARN_ON(chan, !config->cb.record_get);
+			config->cb.record_get(config, chan, buf,
+					      iter->read_offset,
+					      &iter->header_len,
+					      &iter->payload_len,
+					      &iter->timestamp);
+			iter->read_offset += iter->header_len;
+			subbuffer_consume_record(config, &buf->backend);
+			iter->state = ITER_NEXT_RECORD;
+			return iter->payload_len;
+		}
+		goto restart;
+	case ITER_NEXT_RECORD:
+		iter->read_offset += iter->payload_len;
+		iter->state = ITER_TEST_RECORD;
+		goto restart;
+	case ITER_PUT_SUBBUF:
+		ring_buffer_put_subbuf(buf, iter->consumed);
+		iter->state = ITER_GET_SUBBUF;
+		goto restart;
+	default:
+		CHAN_WARN_ON(chan, 1);	/* Should not happen */
+		return -EPERM;
+	}
+}
+EXPORT_SYMBOL_GPL(ring_buffer_get_next_record);
+
+static int buf_is_higher(void *a, void *b)
+{
+	struct ring_buffer *bufa = a;
+	struct ring_buffer *bufb = b;
+
+	/* Consider lowest timestamps to be at the top of the heap */
+	return (bufa->iter.timestamp < bufb->iter.timestamp);
+}
+
+static
+void ring_buffer_get_empty_buf_records(const struct ring_buffer_config *config,
+				       struct channel *chan)
+{
+	struct ptr_heap *heap = &chan->iter.heap;
+	struct ring_buffer *buf, *tmp;
+	ssize_t len;
+
+	list_for_each_entry_safe(buf, tmp, &chan->iter.empty_head,
+				 iter.empty_node) {
+		len = ring_buffer_get_next_record(chan, buf);
+
+		/*
+		 * Deal with -EAGAIN and -ENODATA.
+		 * len >= 0 means record contains data.
+		 * -EBUSY should never happen, because we support only one
+		 * reader.
+		 */
+		switch (len) {
+		case -EAGAIN:
+			/* Keep node in empty list */
+			break;
+		case -ENODATA:
+			/*
+			 * Buffer is finalized. Don't add it to the list of empty
+			 * buffers, because it has no more data to provide, ever.
+			 */
+			list_del(&buf->iter.empty_node);
+			break;
+		case -EBUSY:
+			CHAN_WARN_ON(chan, 1);
+			break;
+		default:
+			/*
+			 * Insert buffer into the heap, remove from empty buffer
+			 * list. The heap should never overflow.
+			 */
+			CHAN_WARN_ON(chan, len < 0);
+			list_del(&buf->iter.empty_node);
+			CHAN_WARN_ON(chan, heap_insert(heap, buf) != NULL);
+		}
+	}
+}
+
+static
+void ring_buffer_wait_for_qs(const struct ring_buffer_config *config,
+			     struct channel *chan)
+{
+	u64 timestamp_qs;
+	unsigned long wait_msecs;
+
+	/*
+	 * No need to wait if no empty buffers are present.
+	 */
+	if (list_empty(&chan->iter.empty_head))
+		return;
+
+	timestamp_qs = config->cb.ring_buffer_clock_read(chan);
+	/*
+	 * We need to consider previously empty buffers.
+	 * Do a get-next-record on each of them. Add them to
+	 * the heap if they have data. If at least one of them
+	 * does not have data, we need to wait for
+	 * switch_timer_interval + MAX_SYSTEM_LATENCY (so we are sure the
+	 * buffers have been switched either by the timer or idle entry) and
+	 * check them again, adding them if they have data.
+	 */
+	ring_buffer_get_empty_buf_records(config, chan);
+
+	/*
+	 * No need to wait if no empty buffers are present.
+	 */
+	if (list_empty(&chan->iter.empty_head))
+		return;
+
+	/*
+	 * We need to wait for the buffer switch timer to run. If the
+	 * CPU is idle, idle entry performed the switch.
+	 * TODO: we could optimize further by skipping the sleep if all
+	 * empty buffers belong to idle or offline cpus.
+	 */
+	wait_msecs = jiffies_to_msecs(chan->switch_timer_interval);
+	wait_msecs += MAX_SYSTEM_LATENCY;
+	msleep(wait_msecs);
+	ring_buffer_get_empty_buf_records(config, chan);
+	/*
+	 * Any buffer still in the empty list here cannot possibly
+	 * contain an event with a timestamp prior to "timestamp_qs".
+	 * The new quiescent state timestamp is the one we grabbed
+	 * before waiting for buffer data.  It is therefore safe to
+	 * ignore empty buffers up to last_qs timestamp for fusion
+	 * merge.
+	 */
+	chan->iter.last_qs = timestamp_qs;
+}
+
+/**
+ * channel_get_next_record - Get the next record in a channel.
+ * @chan: channel
+ * @ret_buf: the buffer in which the event is located (output)
+ *
+ * Returns the size of new current event, -EAGAIN if all buffers are empty,
+ * -ENODATA if all buffers are empty and finalized. The channel must already be
+ * opened for reading.
+ */
+
+ssize_t channel_get_next_record(struct channel *chan,
+				struct ring_buffer **ret_buf)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer *buf;
+	struct ptr_heap *heap;
+	ssize_t len;
+
+	if (config->alloc == RING_BUFFER_ALLOC_GLOBAL) {
+		*ret_buf = channel_get_ring_buffer(config, chan, 0);
+		return ring_buffer_get_next_record(chan, *ret_buf);
+	}
+
+	heap = &chan->iter.heap;
+
+	/*
+	 * get next record for topmost buffer.
+	 */
+	buf = heap_maximum(heap);
+	if (buf) {
+		len = ring_buffer_get_next_record(chan, buf);
+		/*
+		 * Deal with -EAGAIN and -ENODATA.
+		 * len >= 0 means record contains data.
+		 */
+		switch (len) {
+		case -EAGAIN:
+			buf->iter.timestamp = 0;
+			list_add(&buf->iter.empty_node, &chan->iter.empty_head);
+			/* Remove topmost buffer from the heap */
+			CHAN_WARN_ON(chan, heap_remove(heap) != buf);
+			break;
+		case -ENODATA:
+			/*
+			 * Buffer is finalized. Remove buffer from heap and
+			 * don't add it to the list of empty buffers, because it has no
+			 * more data to provide, ever.
+			 */
+			CHAN_WARN_ON(chan, heap_remove(heap) != buf);
+			break;
+		case -EBUSY:
+			CHAN_WARN_ON(chan, 1);
+			break;
+		default:
+			/*
+			 * Reinsert buffer into the heap. Note that heap can be
+			 * partially empty, so we need to use
+			 * heap_replace_max().
+			 */
+			CHAN_WARN_ON(chan, len < 0);
+			CHAN_WARN_ON(chan, heap_replace_max(heap, buf) != buf);
+			break;
+		}
+	}
+
+	buf = heap_maximum(heap);
+	if (!buf || buf->iter.timestamp > chan->iter.last_qs) {
+		/*
+		 * Deal with buffers previously showing no data.
+		 * Add buffers containing data to the heap, update
+		 * last_qs.
+		 */
+		ring_buffer_wait_for_qs(config, chan);
+	}
+
+	*ret_buf = buf = heap_maximum(heap);
+	if (buf) {
+		/*
+		 * If this warning triggers, you probably need to check your
+		 * system interrupt latency. Typical cause: too much printk()
+		 * output going to a serial console with interrupts off.
+		 * Allow for MAX_CLOCK_DELTA ns timestamp delta going backward.
+		 * Observed on SMP KVM setups with trace_clock().
+		 */
+		if (chan->iter.last_timestamp
+		    > (buf->iter.timestamp + MAX_CLOCK_DELTA)) {
+			printk(KERN_WARNING "ring_buffer: timestamps going "
+			       "backward. Last time %llu ns, cpu %d, "
+			       "current time %llu ns, cpu %d, "
+			       "delta %llu ns.\n",
+			       chan->iter.last_timestamp, chan->iter.last_cpu,
+			       buf->iter.timestamp, buf->backend.cpu,
+			       chan->iter.last_timestamp - buf->iter.timestamp);
+			CHAN_WARN_ON(chan, 1);
+		}
+		chan->iter.last_timestamp = buf->iter.timestamp;
+		chan->iter.last_cpu = buf->backend.cpu;
+		return buf->iter.payload_len;
+	} else {
+		/* Heap is empty */
+		if (list_empty(&chan->iter.empty_head))
+			return -ENODATA;	/* All buffers finalized */
+		else
+			return -EAGAIN;		/* Temporarily empty */
+	}
+}
+EXPORT_SYMBOL_GPL(channel_get_next_record);
+
+static
+void ring_buffer_iterator_init(struct channel *chan, struct ring_buffer *buf)
+{
+	if (buf->iter.allocated)
+		return;
+
+	buf->iter.allocated = 1;
+	if (chan->iter.read_open && !buf->iter.read_open) {
+		CHAN_WARN_ON(chan, ring_buffer_open_read(buf) != 0);
+		buf->iter.read_open = 1;
+	}
+
+	/* Add to list of buffers without any current record */
+	if (chan->backend.config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		list_add(&buf->iter.empty_node, &chan->iter.empty_head);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static
+int __cpuinit channel_iterator_cpu_hotplug(struct notifier_block *nb,
+					   unsigned long action,
+					   void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	struct channel *chan = container_of(nb, struct channel,
+					    hp_iter_notifier);
+	struct ring_buffer *buf = per_cpu_ptr(chan->backend.buf, cpu);
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (!chan->hp_iter_enable)
+		return NOTIFY_DONE;
+
+	CHAN_WARN_ON(chan, config->alloc == RING_BUFFER_ALLOC_GLOBAL);
+
+	switch (action) {
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		ring_buffer_iterator_init(chan, buf);
+		return NOTIFY_OK;
+	default:
+		return NOTIFY_DONE;
+	}
+}
+#endif
+
+int channel_iterator_init(struct channel *chan)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer *buf;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		int cpu, ret;
+
+		INIT_LIST_HEAD(&chan->iter.empty_head);
+		ret = heap_init(&chan->iter.heap,
+				num_possible_cpus()
+				* sizeof(struct ring_buffer *),
+				GFP_KERNEL, buf_is_higher);
+		if (ret)
+			return ret;
+		/*
+		 * Without cpu hotplug, a ring buffer allocated from an early
+		 * initcall will not be notified of secondary cpus. In that
+		 * case, we initialize iterators for all possible cpus.
+		 */
+#ifdef CONFIG_HOTPLUG_CPU
+		chan->hp_iter_notifier.notifier_call =
+			channel_iterator_cpu_hotplug;
+		chan->hp_iter_notifier.priority = 10;
+		register_cpu_notifier(&chan->hp_iter_notifier);
+		get_online_cpus();
+		for_each_online_cpu(cpu) {
+			buf = per_cpu_ptr(chan->backend.buf, cpu);
+			ring_buffer_iterator_init(chan, buf);
+		}
+		chan->hp_iter_enable = 1;
+		put_online_cpus();
+#else
+		for_each_possible_cpu(cpu) {
+			buf = per_cpu_ptr(chan->backend.buf, cpu);
+			ring_buffer_iterator_init(chan, buf);
+		}
+#endif
+	} else {
+		buf = channel_get_ring_buffer(config, chan, 0);
+		ring_buffer_iterator_init(chan, buf);
+	}
+	return 0;
+}
+
+void channel_iterator_unregister_notifiers(struct channel *chan)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		chan->hp_iter_enable = 0;
+		unregister_cpu_notifier(&chan->hp_iter_notifier);
+	}
+}
+
+void channel_iterator_free(struct channel *chan)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		heap_free(&chan->iter.heap);
+}
+
+int ring_buffer_iterator_open(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+	const struct ring_buffer_config *config = chan->backend.config;
+	CHAN_WARN_ON(chan, config->output != RING_BUFFER_ITERATOR);
+	return ring_buffer_open_read(buf);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_iterator_open);
+
+/*
+ * Note: Iterators must not be mixed with other types of outputs, because an
+ * iterator can leave the buffer in "GET" state, which is not consistent with
+ * other types of output (mmap, splice, raw data read).
+ */
+void ring_buffer_iterator_release(struct ring_buffer *buf)
+{
+	ring_buffer_release_read(buf);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_iterator_release);
+
+int channel_iterator_open(struct channel *chan)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer *buf;
+	int ret = 0, cpu;
+
+	CHAN_WARN_ON(chan, config->output != RING_BUFFER_ITERATOR);
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		get_online_cpus();
+		/* Allow CPU hotplug to keep track of opened reader */
+		chan->iter.read_open = 1;
+		for_each_channel_cpu(cpu, chan) {
+			buf = channel_get_ring_buffer(config, chan, cpu);
+			ret = ring_buffer_iterator_open(buf);
+			if (ret)
+				goto error;
+			buf->iter.read_open = 1;
+		}
+		put_online_cpus();
+	} else {
+		buf = channel_get_ring_buffer(config, chan, 0);
+		ret = ring_buffer_iterator_open(buf);
+	}
+	return ret;
+error:
+	/* Error should always happen on CPU 0, hence no close is required. */
+	CHAN_WARN_ON(chan, cpu != 0);
+	put_online_cpus();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(channel_iterator_open);
+
+void channel_iterator_release(struct channel *chan)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer *buf;
+	int cpu;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU) {
+		get_online_cpus();
+		for_each_channel_cpu(cpu, chan) {
+			buf = channel_get_ring_buffer(config, chan, cpu);
+			if (buf->iter.read_open) {
+				ring_buffer_iterator_release(buf);
+				buf->iter.read_open = 0;
+			}
+		}
+		chan->iter.read_open = 0;
+		put_online_cpus();
+	} else {
+		buf = channel_get_ring_buffer(config, chan, 0);
+		ring_buffer_iterator_release(buf);
+	}
+}
+EXPORT_SYMBOL_GPL(channel_iterator_release);
+
+void ring_buffer_iterator_reset(struct ring_buffer *buf)
+{
+	struct channel *chan = buf->backend.chan;
+
+	if (buf->iter.state != ITER_GET_SUBBUF)
+		ring_buffer_put_subbuf(buf, buf->iter.consumed);
+	buf->iter.state = ITER_GET_SUBBUF;
+	/* Remove from heap (if present). */
+	if (heap_cherrypick(&chan->iter.heap, buf))
+		list_add(&buf->iter.empty_node, &chan->iter.empty_head);
+	buf->iter.timestamp = 0;
+	buf->iter.header_len = 0;
+	buf->iter.payload_len = 0;
+	buf->iter.consumed = 0;
+	buf->iter.read_offset = 0;
+	buf->iter.data_size = 0;
+	/* Don't reset allocated and read_open */
+}
+
+void channel_iterator_reset(struct channel *chan)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	struct ring_buffer *buf;
+	int cpu;
+
+	/* Empty heap, put into empty_head */
+	while ((buf = heap_remove(&chan->iter.heap)) != NULL)
+		list_add(&buf->iter.empty_node, &chan->iter.empty_head);
+
+	for_each_channel_cpu(cpu, chan) {
+		buf = channel_get_ring_buffer(config, chan, cpu);
+		ring_buffer_iterator_reset(buf);
+	}
+	/* Don't reset read_open */
+	chan->iter.last_qs = 0;
+	chan->iter.last_timestamp = 0;
+	chan->iter.last_cpu = 0;
+	chan->iter.len_left = 0;
+}
+
+/*
+ * Ring buffer payload extraction read() implementation.
+ */
+static
+ssize_t channel_ring_buffer_file_read(struct file *filp,
+				      char __user *user_buf,
+				      size_t count,
+				      loff_t *ppos,
+				      struct channel *chan,
+				      struct ring_buffer *buf,
+				      int fusionmerge)
+{
+	const struct ring_buffer_config *config = chan->backend.config;
+	size_t read_count = 0, read_offset;
+	ssize_t len;
+
+	might_sleep();
+	if (!access_ok(VERIFY_WRITE, user_buf, count))
+		return -EFAULT;
+
+	/* Finish copy of previous record */
+	if (*ppos != 0) {
+		if (read_count < count) {
+			len = chan->iter.len_left;
+			read_offset = *ppos;
+			if (config->alloc == RING_BUFFER_ALLOC_PER_CPU
+			    && fusionmerge)
+				buf = heap_maximum(&chan->iter.heap);
+			CHAN_WARN_ON(chan, !buf);
+			goto skip_get_next;
+		}
+	}
+
+	while (read_count < count) {
+		size_t copy_len, space_left;
+
+		if (fusionmerge)
+			len = channel_get_next_record(chan, &buf);
+		else
+			len = ring_buffer_get_next_record(chan, buf);
+len_test:
+		if (len < 0) {
+			/*
+			 * Check if buffer is finalized (end of file).
+			 */
+			if (len == -ENODATA) {
+				/* A 0 read_count will tell about end of file */
+				goto nodata;
+			}
+			if (filp->f_flags & O_NONBLOCK) {
+				if (!read_count)
+					read_count = -EAGAIN;
+				goto nodata;
+			} else {
+				int error;
+
+				/*
+				 * No data available at the moment, return what
+				 * we got.
+				 */
+				if (read_count)
+					goto nodata;
+
+				/*
+				 * Wait for returned len to be >= 0 or -ENODATA.
+				 */
+				if (fusionmerge)
+					error = wait_event_interruptible(
+					  chan->read_wait,
+					  ((len = channel_get_next_record(chan,
+						&buf)), len != -EAGAIN));
+				else
+					error = wait_event_interruptible(
+					  buf->read_wait,
+					  ((len = ring_buffer_get_next_record(
+						  chan, buf)), len != -EAGAIN));
+				CHAN_WARN_ON(chan, len == -EBUSY);
+				if (error) {
+					read_count = error;
+					goto nodata;
+				}
+				CHAN_WARN_ON(chan, len < 0 && len != -ENODATA);
+				goto len_test;
+			}
+		}
+		read_offset = buf->iter.read_offset;
+skip_get_next:
+		space_left = count - read_count;
+		if (len <= space_left) {
+			copy_len = len;
+			chan->iter.len_left = 0;
+			*ppos = 0;
+		} else {
+			copy_len = space_left;
+			chan->iter.len_left = len - copy_len;
+			*ppos = read_offset + copy_len;
+		}
+		if (__ring_buffer_copy_to_user(&buf->backend, read_offset,
+					       &user_buf[read_count],
+					       copy_len)) {
+			/*
+			 * Leave the len_left and ppos values at their current
+			 * state, as we currently have a valid event to read.
+			 */
+			return -EFAULT;
+		}
+		read_count += copy_len;
+	}
+	return read_count;
+
+nodata:
+	*ppos = 0;
+	chan->iter.len_left = 0;
+	return read_count;
+}
+
+/**
+ * ring_buffer_file_read - Read buffer record payload.
+ * @filp: file structure pointer.
+ * @user_buf: user buffer to read data into.
+ * @count: number of bytes to read.
+ * @ppos: file read position.
+ *
+ * Returns a negative value on error, or the number of bytes read on success.
+ * ppos is used to save the position _within the current record_ between calls
+ * to read().
+ */
+static
+ssize_t ring_buffer_file_read(struct file *filp,
+			      char __user *user_buf,
+			      size_t count,
+			      loff_t *ppos)
+{
+	struct inode *inode = filp->f_dentry->d_inode;
+	struct ring_buffer *buf = inode->i_private;
+	struct channel *chan = buf->backend.chan;
+
+	return channel_ring_buffer_file_read(filp, user_buf, count, ppos,
+					     chan, buf, 0);
+}
+
+/**
+ * channel_file_read - Read channel record payload.
+ * @filp: file structure pointer.
+ * @user_buf: user buffer to read data into.
+ * @count: number of bytes to read.
+ * @ppos: file read position.
+ *
+ * Returns a negative value on error, or the number of bytes read on success.
+ * ppos is used to save the position _within the current record_ between calls
+ * to read().
+ */
+static
+ssize_t channel_file_read(struct file *filp,
+			  char __user *user_buf,
+			  size_t count,
+			  loff_t *ppos)
+{
+	struct inode *inode = filp->f_dentry->d_inode;
+	struct channel *chan = inode->i_private;
+	const struct ring_buffer_config *config = chan->backend.config;
+
+	if (config->alloc == RING_BUFFER_ALLOC_PER_CPU)
+		return channel_ring_buffer_file_read(filp, user_buf, count,
+						     ppos, chan, NULL, 1);
+	else {
+		struct ring_buffer *buf =
+			channel_get_ring_buffer(config, chan, 0);
+		return channel_ring_buffer_file_read(filp, user_buf, count,
+						     ppos, chan, buf, 0);
+	}
+}
+
+static
+int ring_buffer_file_open(struct inode *inode, struct file *file)
+{
+	struct ring_buffer *buf = inode->i_private;
+	int ret;
+
+	ret = ring_buffer_iterator_open(buf);
+	if (ret)
+		return ret;
+
+	file->private_data = buf;
+	ret = nonseekable_open(inode, file);
+	if (ret)
+		goto release_iter;
+	return 0;
+
+release_iter:
+	ring_buffer_iterator_release(buf);
+	return ret;
+}
+
+static
+int ring_buffer_file_release(struct inode *inode, struct file *file)
+{
+	struct ring_buffer *buf = inode->i_private;
+
+	ring_buffer_iterator_release(buf);
+	return 0;
+}
+
+static
+int channel_file_open(struct inode *inode, struct file *file)
+{
+	struct channel *chan = inode->i_private;
+	int ret;
+
+	ret = channel_iterator_open(chan);
+	if (ret)
+		return ret;
+
+	file->private_data = chan;
+	ret = nonseekable_open(inode, file);
+	if (ret)
+		goto release_iter;
+	return 0;
+
+release_iter:
+	channel_iterator_release(chan);
+	return ret;
+}
+
+static
+int channel_file_release(struct inode *inode, struct file *file)
+{
+	struct channel *chan = inode->i_private;
+
+	channel_iterator_release(chan);
+	return 0;
+}
+
+const struct file_operations channel_payload_file_operations = {
+	.open = channel_file_open,
+	.release = channel_file_release,
+	.read = channel_file_read,
+	.llseek = ring_buffer_no_llseek,
+};
+EXPORT_SYMBOL_GPL(channel_payload_file_operations);
+
+const struct file_operations ring_buffer_payload_file_operations = {
+	.open = ring_buffer_file_open,
+	.release = ring_buffer_file_release,
+	.read = ring_buffer_file_read,
+	.llseek = ring_buffer_no_llseek,
+};
+EXPORT_SYMBOL_GPL(ring_buffer_payload_file_operations);
Index: linux.trees.git/include/linux/ringbuffer/iterator.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/iterator.h	2010-07-09 18:34:50.000000000 -0400
@@ -0,0 +1,70 @@
+#ifndef _LINUX_RING_BUFFER_ITERATOR_H
+#define _LINUX_RING_BUFFER_ITERATOR_H
+
+/*
+ * linux/ringbuffer/iterator.h
+ *
+ * (C) Copyright 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer and channel iterators.
+ *
+ * Author:
+ *	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/backend.h>
+#include <linux/ringbuffer/frontend.h>
+
+/*
+ * ring_buffer_get_next_record advances the buffer read position to the next
+ * record. It returns either the size of the next record, -EAGAIN if there is
+ * currently no data available, or -ENODATA if no data is available and buffer
+ * is finalized.
+ */
+extern ssize_t ring_buffer_get_next_record(struct channel *chan,
+					   struct ring_buffer *buf);
+
+/*
+ * channel_get_next_record advances the buffer read position to the next record.
+ * It returns either the size of the next record, -EAGAIN if there is currently
+ * no data available, or -ENODATA if no data is available and buffer is
+ * finalized.
+ * Returns the current buffer in ret_buf.
+ */
+extern ssize_t channel_get_next_record(struct channel *chan,
+				       struct ring_buffer **ret_buf);
+
+/**
+ * read_current_record - copy the buffer current record into dest.
+ * @buf: ring buffer
+ * @dest: destination where the record should be copied
+ *
+ * dest should be large enough to contain the record. Returns the number of
+ * bytes copied.
+ */
+static inline size_t read_current_record(struct ring_buffer *buf, void *dest)
+{
+	return ring_buffer_read(&buf->backend, buf->iter.read_offset,
+				dest, buf->iter.payload_len);
+}
+
+extern int ring_buffer_iterator_open(struct ring_buffer *buf);
+extern void ring_buffer_iterator_release(struct ring_buffer *buf);
+extern int channel_iterator_open(struct channel *chan);
+extern void channel_iterator_release(struct channel *chan);
+
+extern const struct file_operations channel_payload_file_operations;
+extern const struct file_operations ring_buffer_payload_file_operations;
+
+/*
+ * Used internally.
+ */
+int channel_iterator_init(struct channel *chan);
+void channel_iterator_unregister_notifiers(struct channel *chan);
+void channel_iterator_free(struct channel *chan);
+void channel_iterator_reset(struct channel *chan);
+void ring_buffer_iterator_reset(struct ring_buffer *buf);
+
+#endif /* _LINUX_RING_BUFFER_ITERATOR_H */
Index: linux.trees.git/lib/ringbuffer/Makefile
===================================================================
--- linux.trees.git.orig/lib/ringbuffer/Makefile	2010-07-09 18:31:10.000000000 -0400
+++ linux.trees.git/lib/ringbuffer/Makefile	2010-07-09 18:34:50.000000000 -0400
@@ -1,5 +1,6 @@
 obj-y += ring_buffer_backend.o
 obj-y += ring_buffer_frontend.o
+obj-y += ring_buffer_iterator.o
 obj-y += ring_buffer_vfs.o
 obj-y += ring_buffer_splice.o
 obj-y += ring_buffer_mmap.o


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 19/20] Ring Buffer: Basic API
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (17 preceding siblings ...)
  2010-07-09 22:57 ` [patch 18/20] Ring Buffer Record Iterator Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  2010-07-09 22:57 ` [patch 20/20] Ring buffer: benchmark simple API Mathieu Desnoyers
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring-buffer-simple-api.patch --]
[-- Type: text/plain, Size: 45079 bytes --]

Offer basic APIs for pre-built ring buffer clients:

- Global Overwrite
- Global Discard

- Per-cpu Overwrite with channel-wide iterator
- Per-cpu Discard with channel-wide iterator

- Per-cpu Overwrite with buffer-local iterator
- Per-cpu Discard with buffer-local iterator
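
As an illustration only, a client of the global discard flavour could be as
small as the sketch below. The my_event layout, the 2 MB buffer size and the
init/exit function names are made up for the example; error handling is
trimmed, and the usual module includes plus <linux/trace_clock.h> and the new
global_discard.h header are assumed.

static struct channel *chan;

struct my_event {
	u64 ts;
	int value;
};

static int __init my_client_init(void)
{
	struct my_event ev;
	int ret;

	/* Buffer size is rounded up to a power of 2 (floor 2*PAGE_SIZE). */
	chan = ring_buffer_global_discard_create(2 * 1024 * 1024);
	if (!chan)
		return -ENOMEM;

	ev.ts = trace_clock_global();
	ev.value = 42;
	/* Returns 0 on success, a negative value if the record is dropped. */
	ret = ring_buffer_global_discard_write(chan, &ev, sizeof(ev));
	if (ret)
		pr_debug("record dropped (%d)\n", ret);
	return 0;
}

static void __exit my_client_exit(void)
{
	ring_buffer_global_discard_destroy(chan);
}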

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/ringbuffer/global_discard.h         |   39 ++
 include/linux/ringbuffer/global_overwrite.h       |   40 ++
 include/linux/ringbuffer/percpu_discard.h         |   39 ++
 include/linux/ringbuffer/percpu_local_discard.h   |   47 ++
 include/linux/ringbuffer/percpu_local_overwrite.h |   47 ++
 include/linux/ringbuffer/percpu_overwrite.h       |   40 ++
 lib/ringbuffer/Makefile                           |   15 
 lib/ringbuffer/ring_buffer_global.c               |  317 ++++++++++++++++++
 lib/ringbuffer/ring_buffer_percpu.c               |  334 +++++++++++++++++++
 lib/ringbuffer/ring_buffer_percpu_local.c         |  371 ++++++++++++++++++++++
 10 files changed, 1283 insertions(+), 6 deletions(-)

Index: linux.trees.git/lib/ringbuffer/Makefile
===================================================================
--- linux.trees.git.orig/lib/ringbuffer/Makefile	2010-07-09 18:34:50.000000000 -0400
+++ linux.trees.git/lib/ringbuffer/Makefile	2010-07-09 18:35:16.000000000 -0400
@@ -1,6 +1,9 @@
-obj-y += ring_buffer_backend.o
-obj-y += ring_buffer_frontend.o
-obj-y += ring_buffer_iterator.o
-obj-y += ring_buffer_vfs.o
-obj-y += ring_buffer_splice.o
-obj-y += ring_buffer_mmap.o
+ring_buffer-objs := ring_buffer_backend.o ring_buffer_frontend.o \
+		ring_buffer_iterator.o ring_buffer_vfs.o \
+		ring_buffer_splice.o ring_buffer_mmap.o
+
+obj-$(CONFIG_LIB_RING_BUFFER) += ring_buffer.o
+
+obj-$(CONFIG_LIB_RING_BUFFER_CLIENTS) += ring_buffer_global.o
+obj-$(CONFIG_LIB_RING_BUFFER_CLIENTS) += ring_buffer_percpu.o
+obj-$(CONFIG_LIB_RING_BUFFER_CLIENTS) += ring_buffer_percpu_local.o
Index: linux.trees.git/include/linux/ringbuffer/global_discard.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/global_discard.h	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,39 @@
+#ifndef _RING_BUFFER_GLOBAL_DISCARD_H
+#define _RING_BUFFER_GLOBAL_DISCARD_H
+
+/*
+ * ring_buffer_global_discard.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer global discard API. Drops records when buffer is full.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/iterator.h>
+#include <linux/types.h>
+
+/*
+ * ring_buffer_global_discard_create creates a global discard ring buffer.
+ * buf_size is the buffer size, which will be rounded up to the next power of 2
+ * (the floor value is 2*PAGE_SIZE). Returns the ring buffer channel address on
+ * success, NULL on error.
+ */
+extern struct channel *ring_buffer_global_discard_create(size_t buf_size);
+
+/*
+ * ring_buffer_global_discard_destroy finalizes the channel's buffers, waits
+ * for readers to release all references, and destroys the channel.
+ */
+extern void ring_buffer_global_discard_destroy(struct channel *chan);
+
+/*
+ * ring_buffer_global_discard_write writes a record into the ring buffer. The
+ * record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+extern int ring_buffer_global_discard_write(struct channel *chan,
+					    const void *src, size_t len);
+
+#endif /* _RING_BUFFER_GLOBAL_DISCARD_H */
Index: linux.trees.git/include/linux/ringbuffer/global_overwrite.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/global_overwrite.h	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,40 @@
+#ifndef _RING_BUFFER_GLOBAL_OVERWRITE_H
+#define _RING_BUFFER_GLOBAL_OVERWRITE_H
+
+/*
+ * ring_buffer_global_overwrite.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer global overwrite API. Overwrites oldest records when buffer is
+ * full.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/iterator.h>
+#include <linux/types.h>
+
+/*
+ * ring_buffer_global_overwrite_create creates a global overwrite ring buffer.
+ * buf_size is the buffer size, which will be rounded up to the next power of 2
+ * (the floor value is 2*PAGE_SIZE). Returns the ring buffer channel address on
+ * success, NULL on error.
+ */
+extern struct channel *ring_buffer_global_overwrite_create(size_t buf_size);
+
+/*
+ * ring_buffer_global_overwrite_destroy finalizes the channel's buffers, waits
+ * for readers to release all references, and destroys the channel.
+ */
+extern void ring_buffer_global_overwrite_destroy(struct channel *chan);
+
+/*
+ * ring_buffer_global_overwrite_write writes a record into the ring buffer. The
+ * record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+extern int ring_buffer_global_overwrite_write(struct channel *chan,
+					      const void *src, size_t len);
+
+#endif /* _RING_BUFFER_GLOBAL_OVERWRITE_H */
Index: linux.trees.git/include/linux/ringbuffer/percpu_discard.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/percpu_discard.h	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,39 @@
+#ifndef _RING_BUFFER_PERCPU_DISCARD_H
+#define _RING_BUFFER_PERCPU_DISCARD_H
+
+/*
+ * ring_buffer_percpu_discard.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Per-CPU ring buffer discard API. Drops records when buffer is full.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/iterator.h>
+#include <linux/types.h>
+
+/*
+ * ring_buffer_percpu_discard_create creates a per-cpu producer-consumer ring
+ * buffer.  buf_size is the buffer size, which will be rounded up to the next
+ * power of 2 (the floor value is 2*PAGE_SIZE). Returns the ring buffer channel
+ * address on success, NULL on error.
+ */
+extern struct channel *ring_buffer_percpu_discard_create(size_t buf_size);
+
+/*
+ * ring_buffer_percpu_discard_destroy finalizes the channel's buffers, waits
+ * for readers to release all references, and destroys the channel.
+ */
+extern void ring_buffer_percpu_discard_destroy(struct channel *chan);
+
+/*
+ * ring_buffer_percpu_discard_write writes a record into the ring buffer. The
+ * record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+extern int ring_buffer_percpu_discard_write(struct channel *chan,
+					    const void *src, size_t len);
+
+#endif /* _RING_BUFFER_PERCPU_DISCARD_H */
Index: linux.trees.git/include/linux/ringbuffer/percpu_local_discard.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/percpu_local_discard.h	2010-07-09 18:35:43.000000000 -0400
@@ -0,0 +1,47 @@
+#ifndef _RING_BUFFER_PERCPU_LOCAL_DISCARD_H
+#define _RING_BUFFER_PERCPU_LOCAL_DISCARD_H
+
+/*
+ * ring_buffer_percpu_local_discard.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Per-CPU local ring buffer discard API. Drops records when buffer is full.
+ * Presents per-cpu-buffer iterators.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/iterator.h>
+#include <linux/types.h>
+
+/*
+ * ring_buffer_percpu_local_discard_create creates a per-cpu producer-consumer
+ * ring buffer with local iterators. buf_size is the buffer size, which will be
+ * rounded up to the next power of 2 (the floor value is 2*PAGE_SIZE). Returns
+ * the ring buffer channel address on success, NULL on error.
+ * on_buffer_create and on_buffer_finalize are callbacks called whenever a
+ * per-cpu buffer is created or finalized. on_buffer_create returns 0 on
+ * success.
+ */
+extern
+struct channel *ring_buffer_percpu_local_discard_create(size_t buf_size,
+		int (*on_buffer_create)(struct ring_buffer *buf, int cpu),
+		void (*on_buffer_finalize)(struct ring_buffer *buf, int cpu));
+
+/*
+ * ring_buffer_percpu_local_discard_destroy finalizes the channel's buffers,
+ * waits for readers to release all references, and destroys the channel.
+ */
+extern void ring_buffer_percpu_local_discard_destroy(struct channel *chan);
+
+/*
+ * ring_buffer_percpu_local_discard_write writes a record into the ring buffer.
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+extern
+int ring_buffer_percpu_local_discard_write(struct channel *chan,
+					   const void *src, size_t len);
+
+#endif /* _RING_BUFFER_PERCPU_LOCAL_DISCARD_H */
Index: linux.trees.git/include/linux/ringbuffer/percpu_local_overwrite.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/percpu_local_overwrite.h	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,47 @@
+#ifndef _RING_BUFFER_PERCPU_LOCAL_OVERWRITE_H
+#define _RING_BUFFER_PERCPU_LOCAL_OVERWRITE_H
+
+/*
+ * ring_buffer_percpu_local_overwrite.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Per-CPU local ring buffer overwrite API. Overwrites oldest records
+ * when buffer is full. Presents per-cpu-buffer iterators.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/iterator.h>
+#include <linux/types.h>
+
+/*
+ * ring_buffer_percpu_local_overwrite_create creates a per-cpu overwrite ring
+ * buffer with local iterators. buf_size is the buffer size, which will be
+ * rounded up to the next power of 2 (the floor value is 2*PAGE_SIZE). Returns
+ * the ring buffer channel address on success, NULL on error.
+ * on_buffer_create and on_buffer_finalize are callbacks called whenever a
+ * per-cpu buffer is created or finalized. on_buffer_create returns 0 on
+ * success.
+ */
+extern
+struct channel *ring_buffer_percpu_local_overwrite_create(size_t buf_size,
+		int (*on_buffer_create)(struct ring_buffer *buf, int cpu),
+		void (*on_buffer_finalize)(struct ring_buffer *buf, int cpu));
+
+/*
+ * ring_buffer_percpu_local_overwrite_destroy finalizes the channel's buffers,
+ * waits for readers to release all references, and destroys the channel.
+ */
+extern void ring_buffer_percpu_local_overwrite_destroy(struct channel *chan);
+
+/*
+ * ring_buffer_percpu_local_overwrite_write writes a record into the ring
+ * buffer. The record starts at the "src" address and is "len" bytes long.
+ * Returns 0 on success, else it returns a negative error value.
+ */
+extern
+int ring_buffer_percpu_local_overwrite_write(struct channel *chan,
+					     const void *src, size_t len);
+
+#endif /* _RING_BUFFER_PERCPU_LOCAL_OVERWRITE_H */
Index: linux.trees.git/include/linux/ringbuffer/percpu_overwrite.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/include/linux/ringbuffer/percpu_overwrite.h	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,40 @@
+#ifndef _RING_BUFFER_PERCPU_OVERWRITE_H
+#define _RING_BUFFER_PERCPU_OVERWRITE_H
+
+/*
+ * ring_buffer_percpu_overwrite.h
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Per-CPU ring buffer overwrite API. Overwrites oldest records when
+ * buffer is full.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/ringbuffer/iterator.h>
+#include <linux/types.h>
+
+/*
+ * ring_buffer_percpu_overwrite_create creates a per-cpu overwrite ring buffer.
+ * buf_size is the buffer size, which will be rounded up to the next power of 2
+ * (the floor value is 2*PAGE_SIZE). Returns the ring buffer channel address on
+ * success, NULL on error.
+ */
+extern struct channel *ring_buffer_percpu_overwrite_create(size_t buf_size);
+
+/*
+ * ring_buffer_percpu_overwrite_destroy finalizes the channel's buffers, waits
+ * for readers to release all references, and destroys the channel.
+ */
+extern void ring_buffer_percpu_overwrite_destroy(struct channel *chan);
+
+/*
+ * ring_buffer_percpu_overwrite_write writes a record into the ring buffer. The
+ * record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+extern int ring_buffer_percpu_overwrite_write(struct channel *chan,
+					      const void *src, size_t len);
+
+#endif /* _RING_BUFFER_PERCPU_OVERWRITE_H */
Index: linux.trees.git/lib/ringbuffer/ring_buffer_global.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_global.c	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,317 @@
+/*
+ * ring_buffer_global.c
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Ring buffer global library implementation.
+ * Creates instances of both overwrite and discard modes.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/global_overwrite.h>
+#include <linux/ringbuffer/global_discard.h>
+#include <linux/ringbuffer/vfs.h>
+
+struct subbuffer_header {
+	uint8_t header_end[0];		/* End of header */
+};
+
+struct record_header {
+	uint32_t len;			/* Size of record payload */
+	uint8_t header_end[0];		/* End of header */
+};
+
+static inline
+u64 ring_buffer_clock_read(struct channel *chan)
+{
+	return 0;
+}
+
+static inline
+unsigned char record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size, size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	return offsetof(struct record_header, header_end);
+}
+
+#include <linux/ringbuffer/api.h>
+
+static
+u64 client_ring_buffer_clock_read(struct channel *chan)
+{
+	return 0;
+}
+
+static
+size_t client_record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size,
+				 size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	return record_header_size(config, chan, offset, data_size,
+				  pre_header_padding, rflags, ctx);
+}
+
+static
+size_t client_subbuffer_header_size(void)
+{
+	return offsetof(struct subbuffer_header, header_end);
+}
+
+static
+void client_buffer_begin(struct ring_buffer *buf, u64 tsc,
+			 unsigned int subbuf_idx)
+{
+}
+
+static
+void client_buffer_end(struct ring_buffer *buf, u64 tsc,
+		       unsigned int subbuf_idx, unsigned long data_size)
+{
+}
+
+static
+int client_buffer_create(struct ring_buffer *buf, void *priv,
+			 int cpu, const char *name)
+{
+	return 0;
+}
+
+static
+void client_buffer_finalize(struct ring_buffer *buf, void *priv, int cpu)
+{
+}
+
+static
+void client_record_get(const struct ring_buffer_config *config,
+		       struct channel *chan, struct ring_buffer *buf,
+		       size_t offset, size_t *header_len,
+		       size_t *payload_len, u64 *timestamp)
+{
+	struct record_header header;
+	int ret;
+
+	ret = ring_buffer_read(&buf->backend, offset, &header,
+			       offsetof(struct record_header, header_end));
+	CHAN_WARN_ON(chan, ret != offsetof(struct record_header, header_end));
+	*header_len = offsetof(struct record_header, header_end);
+	*payload_len = header.len;
+	*timestamp = 0;
+}
+
+/*
+ * Typically 8 subbuffers of variable size.
+ * Maximum subbuffer size is 4GB. Allocate more subbuffers if more space is
+ * requested.
+ * Periodical buffer switch deferrable timer set to 100ms.
+ * Periodical reader wakeup delivery timer is disabled. It is useless because
+ * RING_BUFFER_WAKEUP_BY_WRITER is set.
+ */
+#define SG_SUBBUF_NUM_ORDER	3
+#define SG_SUBBUF_NUM		(1 << SG_SUBBUF_NUM_ORDER)
+#define SG_SWITCH_INTERVAL_MS	100U
+#define SG_SWITCH_INTERVAL_US	(SG_SWITCH_INTERVAL_MS * 1000)
+#define SG_READ_INTERVAL_US	0
+#define SG_U32_MAX		4294967295U	/* 2^32 - 1 */
+
+static const struct ring_buffer_config global_overwrite_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = client_buffer_create,
+	.cb.buffer_finalize = client_buffer_finalize,
+	.cb.record_get = client_record_get,
+
+	.tsc_bits = 0,
+	.alloc = RING_BUFFER_ALLOC_GLOBAL,
+	.sync = RING_BUFFER_SYNC_GLOBAL,
+	.mode = RING_BUFFER_OVERWRITE,
+	.align = RING_BUFFER_PACKED,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_ITERATOR,
+	.oops = RING_BUFFER_NO_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_NO_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_WRITER,
+};
+
+static const struct ring_buffer_config global_discard_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = client_buffer_create,
+	.cb.buffer_finalize = client_buffer_finalize,
+	.cb.record_get = client_record_get,
+
+	.tsc_bits = 0,
+	.alloc = RING_BUFFER_ALLOC_GLOBAL,
+	.sync = RING_BUFFER_SYNC_GLOBAL,
+	.mode = RING_BUFFER_DISCARD,
+	.align = RING_BUFFER_PACKED,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_ITERATOR,
+	.oops = RING_BUFFER_NO_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_NO_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_WRITER,
+};
+
+/* Wrapper library API */
+
+static
+struct channel *ring_buffer_global_create(
+				const struct ring_buffer_config *config,
+				size_t buf_size)
+{
+	size_t subbuf_size, subbuf_size_order;
+	unsigned int subbuf_num = SG_SUBBUF_NUM;
+
+	/* Typically use 8 subbuffers, minimum of PAGE_SIZE size each */
+	buf_size = max_t(size_t, buf_size, PAGE_SIZE << SG_SUBBUF_NUM_ORDER);
+	subbuf_size = buf_size >> SG_SUBBUF_NUM_ORDER;
+	/*
+	 * Ensure the event payload size fits on u32 event header.
+	 * Maximum subbuffer size is therefore 4GB.
+	 */
+	subbuf_size = min_t(size_t, SG_U32_MAX, subbuf_size);
+
+	/* Allocate more than 8 subbuffers if necessary. */
+	if (subbuf_size < (buf_size >> SG_SUBBUF_NUM_ORDER)) {
+		subbuf_size_order = get_count_order(subbuf_size);
+		subbuf_num = buf_size >> subbuf_size_order;
+	}
+
+	return channel_create(config, "sg", NULL, NULL,
+			      subbuf_size, subbuf_num,
+			      SG_SWITCH_INTERVAL_US, SG_READ_INTERVAL_US);
+}
+
+/**
+ * ring_buffer_global_overwrite_create - creates a global overwrite ring buffer.
+ * @buf_size: the buffer size
+ *
+ * Returns the ring buffer channel address on success, NULL on error.
+ */
+struct channel *ring_buffer_global_overwrite_create(size_t buf_size)
+{
+	return ring_buffer_global_create(&global_overwrite_config, buf_size);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_global_overwrite_create);
+
+/**
+ * ring_buffer_global_discard_create - creates a global discard ring buffer.
+ * @buf_size: the buffer size
+ *
+ * Returns the ring buffer channel address on success, NULL on error.
+ */
+
+struct channel *ring_buffer_global_discard_create(size_t buf_size)
+{
+	return ring_buffer_global_create(&global_discard_config, buf_size);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_global_discard_create);
+
+static
+void ring_buffer_global_destroy(struct channel *chan)
+{
+	channel_destroy(chan);
+}
+
+/**
+ * ring_buffer_global_overwrite_destroy - teardown global overwrite ring buffer.
+ * @chan: ring buffer channel
+ */
+void ring_buffer_global_overwrite_destroy(struct channel *chan)
+{
+	ring_buffer_global_destroy(chan);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_global_overwrite_destroy);
+
+/**
+ * ring_buffer_global_discard_destroy - teardown global discard ring buffer.
+ * @chan: ring buffer channel
+ */
+void ring_buffer_global_discard_destroy(struct channel *chan)
+{
+	ring_buffer_global_destroy(chan);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_global_discard_destroy);
+
+/**
+ * ring_buffer_global_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+static
+int ring_buffer_global_write(const struct ring_buffer_config *config,
+			     struct channel *chan, const void *src, size_t len)
+{
+	struct record_header header;
+	struct ring_buffer_ctx ctx;
+	int ret;
+
+	ring_buffer_ctx_init(&ctx, chan, NULL, len, 0, 0);
+	ret = ring_buffer_reserve(config, &ctx);
+	if (ret)
+		goto end;
+	header.len = len;
+	ring_buffer_write(config, &ctx, &header,
+			  offsetof(struct record_header, header_end));
+	ring_buffer_write(config, &ctx, src, len);
+	ring_buffer_commit(config, &ctx);
+end:
+	return ret;
+
+}
+
+/**
+ * ring_buffer_global_overwrite_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+int ring_buffer_global_overwrite_write(struct channel *chan, const void *src,
+				       size_t len)
+{
+	return ring_buffer_global_write(&global_overwrite_config, chan, src,
+					len);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_global_overwrite_write);
+
+/**
+ * ring_buffer_global_discard_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+int ring_buffer_global_discard_write(struct channel *chan, const void *src,
+				     size_t len)
+{
+	return ring_buffer_global_write(&global_discard_config, chan, src, len);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_global_discard_write);
+
+MODULE_LICENSE("GPL and additional rights");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Ring Buffer Library Global Client");
Index: linux.trees.git/lib/ringbuffer/ring_buffer_percpu.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_percpu.c	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,334 @@
+/*
+ * ring_buffer_percpu.c
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Per-CPU ring buffer library implementation.
+ * Creates instances of both overwrite and discard modes.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/percpu_overwrite.h>
+#include <linux/ringbuffer/percpu_discard.h>
+#include <linux/ringbuffer/vfs.h>
+#include <linux/prio_heap.h>
+#include <linux/trace_clock.h>
+
+struct subbuffer_header {
+	uint8_t header_end[0];		/* End of header */
+};
+
+struct record_header {
+	uint64_t timestamp;		/* Record timestamp */
+	uint32_t len;			/* Size of record payload */
+	uint8_t header_end[0];		/* End of header */
+};
+
+static inline
+u64 ring_buffer_clock_read(struct channel *chan)
+{
+	return trace_clock();
+}
+
+static inline
+unsigned char record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size, size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	return offsetof(struct record_header, header_end);
+}
+
+#include <linux/ringbuffer/api.h>
+
+static
+u64 client_ring_buffer_clock_read(struct channel *chan)
+{
+	return ring_buffer_clock_read(chan);
+}
+
+static
+size_t client_record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size,
+				 size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	return record_header_size(config, chan, offset, data_size,
+				  pre_header_padding, rflags, ctx);
+}
+
+static
+size_t client_subbuffer_header_size(void)
+{
+	return offsetof(struct subbuffer_header, header_end);
+}
+
+static
+void client_buffer_begin(struct ring_buffer *buf, u64 tsc,
+			 unsigned int subbuf_idx)
+{
+}
+
+static
+void client_buffer_end(struct ring_buffer *buf, u64 tsc,
+		       unsigned int subbuf_idx, unsigned long data_size)
+{
+}
+
+static
+int client_buffer_create(struct ring_buffer *buf, void *priv,
+			 int cpu, const char *name)
+{
+	return 0;
+}
+
+static
+void client_buffer_finalize(struct ring_buffer *buf, void *priv, int cpu)
+{
+}
+
+static
+void client_record_get(const struct ring_buffer_config *config,
+		       struct channel *chan, struct ring_buffer *buf,
+		       size_t offset, size_t *header_len,
+		       size_t *payload_len, u64 *timestamp)
+{
+	int ret;
+	struct record_header header;
+
+	ret = ring_buffer_read(&buf->backend, offset, &header,
+			       offsetof(struct record_header, header_end));
+	CHAN_WARN_ON(chan, ret != offsetof(struct record_header, header_end));
+	*header_len = offsetof(struct record_header, header_end);
+	*payload_len = header.len;
+	*timestamp = header.timestamp;
+}
+
+/*
+ * Typically 8 subbuffers of variable size per CPU. We allocate more than 2
+ * subbuffers per cpu to provide room for the merge-sort.
+ * Maximum subbuffer size is 4GB. Allocate more subbuffers if more space is
+ * requested.
+ * Periodical buffer switch deferrable timer is set to 100ms. This will wake up
+ * blocking reads when partially filled subbuffers are ready for reading.
+ * Periodical reader wakeup delivery timer is disabled. It is useless because
+ * RING_BUFFER_WAKEUP_BY_WRITER is set.
+ */
+#define SP_SUBBUF_NUM_ORDER	3
+#define SP_SUBBUF_NUM		(1 << SP_SUBBUF_NUM_ORDER)
+#define SP_SWITCH_INTERVAL_MS	100U
+#define SP_SWITCH_INTERVAL_US	(SP_SWITCH_INTERVAL_MS * 1000)
+#define SP_READ_INTERVAL_US	0
+#define SP_U32_MAX		4294967295U	/* 2^32 - 1 */
+
+static const struct ring_buffer_config percpu_overwrite_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = client_buffer_create,
+	.cb.buffer_finalize = client_buffer_finalize,
+	.cb.record_get = client_record_get,
+
+	.tsc_bits = 64,
+	.alloc = RING_BUFFER_ALLOC_PER_CPU,
+	.sync = RING_BUFFER_SYNC_PER_CPU,
+	.mode = RING_BUFFER_OVERWRITE,
+	.align = RING_BUFFER_PACKED,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_ITERATOR,
+	.oops = RING_BUFFER_NO_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_WRITER,
+};
+
+static const struct ring_buffer_config percpu_discard_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = client_buffer_create,
+	.cb.buffer_finalize = client_buffer_finalize,
+	.cb.record_get = client_record_get,
+
+	.tsc_bits = 64,
+	.alloc = RING_BUFFER_ALLOC_PER_CPU,
+	.sync = RING_BUFFER_SYNC_PER_CPU,
+	.mode = RING_BUFFER_DISCARD,
+	.align = RING_BUFFER_PACKED,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_ITERATOR,
+	.oops = RING_BUFFER_NO_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_WRITER,
+};
+
+/* Wrapper library API */
+
+static
+struct channel *ring_buffer_percpu_create(
+				const struct ring_buffer_config *config,
+				size_t buf_size)
+{
+	size_t subbuf_size, subbuf_size_order;
+	unsigned int subbuf_num = SP_SUBBUF_NUM;
+
+	/* Typically use 8 subbuffers, minimum of PAGE_SIZE size each */
+	buf_size = max_t(size_t, buf_size, PAGE_SIZE << SP_SUBBUF_NUM_ORDER);
+	subbuf_size = buf_size >> SP_SUBBUF_NUM_ORDER;
+	/*
+	 * Ensure the event payload size fits on u32 event header.
+	 * Maximum subbuffer size is therefore 4GB.
+	 */
+	subbuf_size = min_t(size_t, SP_U32_MAX, subbuf_size);
+
+	/* Allocate more than 8 subbuffers if necessary. */
+	if (subbuf_size < (buf_size >> SP_SUBBUF_NUM_ORDER)) {
+		subbuf_size_order = get_count_order(subbuf_size);
+		subbuf_num = buf_size >> subbuf_size_order;
+	}
+
+	return channel_create(config, "sp", NULL, NULL,
+			      subbuf_size, subbuf_num,
+			      SP_SWITCH_INTERVAL_US, SP_READ_INTERVAL_US);
+}
+
+/**
+ * ring_buffer_percpu_overwrite_create - creates a per-cpu overwrite ring
+ *                                       buffer.
+ * @buf_size: the buffer size
+ *
+ * Returns the ring buffer channel address on success, NULL on error.
+ */
+struct channel *ring_buffer_percpu_overwrite_create(size_t buf_size)
+{
+	return ring_buffer_percpu_create(&percpu_overwrite_config, buf_size);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_overwrite_create);
+
+/**
+ * ring_buffer_percpu_discard_create - creates a per-cpu discard ring buffer.
+ * @buf_size: the buffer size
+ *
+ * Returns the ring buffer channel address on success, NULL on error.
+ */
+
+struct channel *ring_buffer_percpu_discard_create(size_t buf_size)
+{
+	return ring_buffer_percpu_create(&percpu_discard_config, buf_size);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_discard_create);
+
+static
+void ring_buffer_percpu_destroy(struct channel *chan)
+{
+	channel_destroy(chan);
+}
+
+/**
+ * ring_buffer_percpu_overwrite_destroy - deletes a per-cpu
+ *                                        overwrite ring buffer.
+ * @chan: ring buffer channel
+ */
+void ring_buffer_percpu_overwrite_destroy(struct channel *chan)
+{
+	ring_buffer_percpu_destroy(chan);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_overwrite_destroy);
+
+/**
+ * ring_buffer_percpu_discard_destroy - deletes a per-cpu
+ *                                      discard ring buffer.
+ * @chan: ring buffer channel
+ */
+void ring_buffer_percpu_discard_destroy(struct channel *chan)
+{
+	ring_buffer_percpu_destroy(chan);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_discard_destroy);
+
+/**
+ * ring_buffer_percpu_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+static
+int ring_buffer_percpu_write(const struct ring_buffer_config *config,
+			     struct channel *chan, const void *src, size_t len)
+{
+	struct percpu_private *priv = channel_get_private(chan);
+	struct record_header header;
+	struct ring_buffer_ctx ctx;
+	int ret, cpu;
+
+	cpu = ring_buffer_get_cpu(config);
+	if (cpu < 0) {
+		ret = cpu;
+		goto end;
+	}
+	ring_buffer_ctx_init(&ctx, chan, priv, len, 0, cpu);
+	ret = ring_buffer_reserve(config, &ctx);
+	if (ret)
+		goto put;
+	header.timestamp = ctx.tsc;
+	header.len = len;
+	ring_buffer_write(config, &ctx, &header,
+			  offsetof(struct record_header, header_end));
+	ring_buffer_write(config, &ctx, src, len);
+	ring_buffer_commit(config, &ctx);
+put:
+	ring_buffer_put_cpu(config);
+end:
+	return ret;
+}
+
+/**
+ * ring_buffer_percpu_overwrite_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+int ring_buffer_percpu_overwrite_write(struct channel *chan, const void *src,
+				       size_t len)
+{
+	return ring_buffer_percpu_write(&percpu_overwrite_config, chan, src,
+					len);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_overwrite_write);
+
+/**
+ * ring_buffer_percpu_discard_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+int ring_buffer_percpu_discard_write(struct channel *chan, const void *src,
+				     size_t len)
+{
+	return ring_buffer_percpu_write(&percpu_discard_config, chan, src,
+					len);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_discard_write);
+
+MODULE_LICENSE("GPL and additional rights");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Ring Buffer Library Per-CPU Client");
Index: linux.trees.git/lib/ringbuffer/ring_buffer_percpu_local.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/lib/ringbuffer/ring_buffer_percpu_local.c	2010-07-09 18:35:16.000000000 -0400
@@ -0,0 +1,371 @@
+/*
+ * ring_buffer_percpu_local.c
+ *
+ * Copyright (C) 2010 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * Per-CPU ring buffer library implementation.
+ * Creates instances of both overwrite and discard modes.
+ * Presents per-cpu-buffer iterators.
+ *
+ * Dual LGPL v2.1/GPL v2 license.
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/ringbuffer/config.h>
+#include <linux/ringbuffer/percpu_local_overwrite.h>
+#include <linux/ringbuffer/percpu_local_discard.h>
+#include <linux/ringbuffer/vfs.h>
+#include <linux/prio_heap.h>
+
+struct channel_priv {
+	/* Returns 0 on success */
+	int (*on_buffer_create)(struct ring_buffer *buf, int cpu);
+	void (*on_buffer_finalize)(struct ring_buffer *buf, int cpu);
+};
+
+struct subbuffer_header {
+	uint8_t header_end[0];		/* End of header */
+};
+
+struct record_header {
+	uint32_t len;			/* Size of record payload */
+	uint8_t header_end[0];		/* End of header */
+};
+
+static inline
+u64 ring_buffer_clock_read(struct channel *chan)
+{
+	return 0;
+}
+
+static inline
+unsigned char record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size, size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	return offsetof(struct record_header, header_end);
+}
+
+#include <linux/ringbuffer/api.h>
+
+static
+u64 client_ring_buffer_clock_read(struct channel *chan)
+{
+	return ring_buffer_clock_read(chan);
+}
+
+static
+size_t client_record_header_size(const struct ring_buffer_config *config,
+				 struct channel *chan, size_t offset,
+				 size_t data_size,
+				 size_t *pre_header_padding,
+				 unsigned int rflags,
+				 struct ring_buffer_ctx *ctx)
+{
+	return record_header_size(config, chan, offset, data_size,
+				  pre_header_padding, rflags, ctx);
+}
+
+static
+size_t client_subbuffer_header_size(void)
+{
+	return offsetof(struct subbuffer_header, header_end);
+}
+
+static
+void client_buffer_begin(struct ring_buffer *buf, u64 tsc,
+			 unsigned int subbuf_idx)
+{
+}
+
+static
+void client_buffer_end(struct ring_buffer *buf, u64 tsc,
+		       unsigned int subbuf_idx, unsigned long data_size)
+{
+}
+
+static
+int client_buffer_create(struct ring_buffer *buf, void *priv,
+			 int cpu, const char *name)
+{
+	struct channel_priv *chan_priv = priv;
+	return chan_priv->on_buffer_create(buf, cpu);
+}
+
+static
+void client_buffer_finalize(struct ring_buffer *buf, void *priv, int cpu)
+{
+	struct channel_priv *chan_priv = priv;
+	chan_priv->on_buffer_finalize(buf, cpu);
+}
+
+static
+void client_record_get(const struct ring_buffer_config *config,
+		       struct channel *chan, struct ring_buffer *buf,
+		       size_t offset, size_t *header_len,
+		       size_t *payload_len, u64 *timestamp)
+{
+	int ret;
+	struct record_header header;
+
+	ret = ring_buffer_read(&buf->backend, offset, &header,
+			       offsetof(struct record_header, header_end));
+	CHAN_WARN_ON(chan, ret != offsetof(struct record_header, header_end));
+	*header_len = offsetof(struct record_header, header_end);
+	*payload_len = header.len;
+	/* Timestamp is left unset. We don't use channel iterators. */
+}
+
+/*
+ * Typically 8 subbuffers of variable size per CPU.
+ * Maximum subbuffer size is 4GB. Allocate more subbuffers if more space is
+ * requested.
+ * Periodical buffer switch deferrable timer is set to 100ms. This will wake up
+ * blocking reads when partially filled subbuffers are ready for reading.
+ * Periodical reader wakeup delivery timer is disabled. It is useless because
+ * RING_BUFFER_WAKEUP_BY_WRITER is set.
+ */
+#define SP_SUBBUF_NUM_ORDER	3
+#define SP_SUBBUF_NUM		(1 << SP_SUBBUF_NUM_ORDER)
+#define SP_SWITCH_INTERVAL_MS	100U
+#define SP_SWITCH_INTERVAL_US	(SP_SWITCH_INTERVAL_MS * 1000)
+#define SP_READ_INTERVAL_US	0
+#define SP_U32_MAX		4294967295U	/* 2^32 - 1 */
+
+static const struct ring_buffer_config percpu_local_overwrite_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = client_buffer_create,
+	.cb.buffer_finalize = client_buffer_finalize,
+	.cb.record_get = client_record_get,
+
+	.tsc_bits = 64,
+	.alloc = RING_BUFFER_ALLOC_PER_CPU,
+	.sync = RING_BUFFER_SYNC_PER_CPU,
+	.mode = RING_BUFFER_OVERWRITE,
+	.align = RING_BUFFER_PACKED,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_ITERATOR,
+	.oops = RING_BUFFER_NO_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_WRITER,
+};
+
+static const struct ring_buffer_config percpu_local_discard_config = {
+	.cb.ring_buffer_clock_read = client_ring_buffer_clock_read,
+	.cb.record_header_size = client_record_header_size,
+	.cb.subbuffer_header_size = client_subbuffer_header_size,
+	.cb.buffer_begin = client_buffer_begin,
+	.cb.buffer_end = client_buffer_end,
+	.cb.buffer_create = client_buffer_create,
+	.cb.buffer_finalize = client_buffer_finalize,
+	.cb.record_get = client_record_get,
+
+	.tsc_bits = 64,
+	.alloc = RING_BUFFER_ALLOC_PER_CPU,
+	.sync = RING_BUFFER_SYNC_PER_CPU,
+	.mode = RING_BUFFER_DISCARD,
+	.align = RING_BUFFER_PACKED,
+	.backend = RING_BUFFER_PAGE,
+	.output = RING_BUFFER_ITERATOR,
+	.oops = RING_BUFFER_NO_OOPS_CONSISTENCY,
+	.ipi = RING_BUFFER_IPI_BARRIER,
+	.wakeup = RING_BUFFER_WAKEUP_BY_WRITER,
+};
+
+/* Wrapper library API */
+
+static
+struct channel *ring_buffer_spl_create(const struct ring_buffer_config *config,
+		size_t buf_size,
+		int (*on_buffer_create)(struct ring_buffer *buf, int cpu),
+		void (*on_buffer_finalize)(struct ring_buffer *buf, int cpu))
+{
+	struct channel *chan;
+	size_t subbuf_size, subbuf_size_order;
+	unsigned int subbuf_num = SP_SUBBUF_NUM;
+	struct channel_priv *priv;
+
+	/* Typically use 8 subbuffers, minimum of PAGE_SIZE size each */
+	buf_size = max_t(size_t, buf_size, PAGE_SIZE << SP_SUBBUF_NUM_ORDER);
+	subbuf_size = buf_size >> SP_SUBBUF_NUM_ORDER;
+	/*
+	 * Ensure the event payload size fits on u32 event header.
+	 * Maximum subbuffer size is therefore 4GB.
+	 */
+	subbuf_size = min_t(size_t, SP_U32_MAX, subbuf_size);
+
+	/* Allocate more than 8 subbuffers if necessary. */
+	if (subbuf_size < (buf_size >> SP_SUBBUF_NUM_ORDER)) {
+		subbuf_size_order = get_count_order(subbuf_size);
+		subbuf_num = buf_size >> subbuf_size_order;
+	}
+
+	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return NULL;
+	priv->on_buffer_create = on_buffer_create;
+	priv->on_buffer_finalize = on_buffer_finalize;
+
+	chan = channel_create(config, "spl", priv, NULL,
+			      subbuf_size, subbuf_num,
+			      SP_SWITCH_INTERVAL_US, SP_READ_INTERVAL_US);
+	if (!chan)
+		goto free_priv;
+	return chan;
+
+free_priv:
+	kfree(priv);
+	return NULL;
+}
+
+/**
+ * ring_buffer_percpu_local_overwrite_create - creates a per-cpu overwrite
+ *                                             ring buffer.
+ * @buf_size: the buffer size
+ * @on_buffer_create: callback to be called on per-cpu buffer creation
+ * @on_buffer_finalize: callback to be called on per-cpu buffer finalize
+ *
+ * Returns the ring buffer channel address on success, NULL on error.
+ */
+struct channel *ring_buffer_percpu_local_overwrite_create(size_t buf_size,
+		int (*on_buffer_create)(struct ring_buffer *buf, int cpu),
+		void (*on_buffer_finalize)(struct ring_buffer *buf, int cpu))
+
+{
+	return ring_buffer_spl_create(&percpu_local_overwrite_config, buf_size,
+				      on_buffer_create, on_buffer_finalize);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_local_overwrite_create);
+
+/**
+ * ring_buffer_percpu_local_discard_create - creates a per-cpu discard ring
+ *                                           buffer.
+ * @buf_size: the buffer size
+ * @on_buffer_create: callback to be called on per-cpu buffer creation
+ * @on_buffer_finalize: callback to be called on per-cpu buffer finalize
+ *
+ * Returns the ring buffer channel address on success, NULL on error.
+ */
+
+struct channel *ring_buffer_percpu_local_discard_create(size_t buf_size,
+		int (*on_buffer_create)(struct ring_buffer *buf, int cpu),
+		void (*on_buffer_finalize)(struct ring_buffer *buf, int cpu))
+{
+	return ring_buffer_spl_create(&percpu_local_discard_config, buf_size,
+				      on_buffer_create, on_buffer_finalize);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_local_discard_create);
+
+static
+void ring_buffer_percpu_local_destroy(struct channel *chan)
+{
+	struct channel_priv *priv;
+
+	priv = channel_destroy(chan);
+	kfree(priv);
+}
+
+/**
+ * ring_buffer_percpu_local_overwrite_destroy - deletes a per-cpu overwrite
+ *                                              ring buffer.
+ * @chan: ring buffer channel
+ */
+void ring_buffer_percpu_local_overwrite_destroy(struct channel *chan)
+{
+	ring_buffer_percpu_local_destroy(chan);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_local_overwrite_destroy);
+/**
+ * ring_buffer_percpu_local_discard_destroy - deletes a per-cpu discard
+ *                                            ring buffer.
+ * @chan: ring buffer channel
+ */
+void ring_buffer_percpu_local_discard_destroy(struct channel *chan)
+{
+	ring_buffer_percpu_local_destroy(chan);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_local_discard_destroy);
+
+/**
+ * ring_buffer_percpu_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+static
+int ring_buffer_percpu_write(const struct ring_buffer_config *config,
+			 struct channel *chan, const void *src, size_t len)
+{
+	struct percpu_private *priv = channel_get_private(chan);
+	struct record_header header;
+	struct ring_buffer_ctx ctx;
+	int ret, cpu;
+
+	cpu = ring_buffer_get_cpu(config);
+	if (cpu < 0) {
+		ret = cpu;
+		goto end;
+	}
+	ring_buffer_ctx_init(&ctx, chan, priv, len, 0, cpu);
+	ret = ring_buffer_reserve(config, &ctx);
+	if (ret)
+		goto put;
+	header.len = len;
+	ring_buffer_write(config, &ctx, &header,
+			  offsetof(struct record_header, header_end));
+	ring_buffer_write(config, &ctx, src, len);
+	ring_buffer_commit(config, &ctx);
+put:
+	ring_buffer_put_cpu(config);
+end:
+	return ret;
+}
+
+/**
+ * ring_buffer_percpu_local_overwrite_write - writes a record into the ring
+ *                                            buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+int ring_buffer_percpu_local_overwrite_write(struct channel *chan,
+					     const void *src, size_t len)
+{
+	return ring_buffer_percpu_write(&percpu_local_overwrite_config, chan,
+					src, len);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_local_overwrite_write);
+
+/**
+ * ring_buffer_percpu_local_discard_write - writes a record into the ring buffer.
+ * @chan: ring buffer channel
+ * @src: start of input to copy from
+ * @len: length of record
+ *
+ * The record starts at the "src" address and is "len" bytes long. Returns 0 on
+ * success, else it returns a negative error value.
+ */
+int ring_buffer_percpu_local_discard_write(struct channel *chan,
+					   const void *src, size_t len)
+{
+	return ring_buffer_percpu_write(&percpu_local_discard_config, chan, src,
+					len);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_percpu_local_discard_write);
+
+MODULE_LICENSE("GPL and additional rights");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("Ring Buffer Library Per-CPU Local Client");


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 20/20] Ring buffer: benchmark simple API
  2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
                   ` (18 preceding siblings ...)
  2010-07-09 22:57 ` [patch 19/20] Ring Buffer: Basic API Mathieu Desnoyers
@ 2010-07-09 22:57 ` Mathieu Desnoyers
  19 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-07-09 22:57 UTC (permalink / raw)
  To: Steven Rostedt, LKML
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig,
	Mathieu Desnoyers, Li Zefan, Lai Jiangshan, Johannes Berg,
	Masami Hiramatsu, Arnaldo Carvalho de Melo, Tom Zanussi,
	KOSAKI Motohiro, Andi Kleen

[-- Attachment #1: ring-buffer-benchmark-simple-api.patch --]
[-- Type: text/plain, Size: 26060 bytes --]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 kernel/trace/Makefile                                           |    8 
 kernel/trace/lib_global_discard_ring_buffer_benchmark.c         |   14 
 kernel/trace/lib_global_overwrite_ring_buffer_benchmark.c       |   14 
 kernel/trace/lib_percpu_discard_ring_buffer_benchmark.c         |   14 
 kernel/trace/lib_percpu_local_discard_ring_buffer_benchmark.c   |   14 
 kernel/trace/lib_percpu_local_overwrite_ring_buffer_benchmark.c |   14 
 kernel/trace/lib_percpu_overwrite_ring_buffer_benchmark.c       |   14 
 kernel/trace/ring_buffer_benchmark_template.h                   |  725 ++++++++++
 8 files changed, 816 insertions(+), 1 deletion(-)

Index: linux.trees.git/kernel/trace/Makefile
===================================================================
--- linux.trees.git.orig/kernel/trace/Makefile	2010-07-09 18:32:08.000000000 -0400
+++ linux.trees.git/kernel/trace/Makefile	2010-07-09 18:35:53.000000000 -0400
@@ -24,7 +24,13 @@ obj-y += trace_clock.o
 obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
 obj-$(CONFIG_FTRACE_RING_BUFFER) += ftrace_ring_buffer.o
 obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ftrace_ring_buffer_benchmark.o
-obj-$(CONFIG_RING_BUFFER_BENCHMARK) += lib_ring_buffer_benchmark.o
+obj-$(CONFIG_RING_BUFFER_BENCHMARK) += lib_ring_buffer_benchmark.o \
+	lib_global_overwrite_ring_buffer_benchmark.o \
+	lib_global_discard_ring_buffer_benchmark.o \
+	lib_percpu_overwrite_ring_buffer_benchmark.o \
+	lib_percpu_discard_ring_buffer_benchmark.o \
+	lib_percpu_local_overwrite_ring_buffer_benchmark.o \
+	lib_percpu_local_discard_ring_buffer_benchmark.o
 
 obj-$(CONFIG_TRACING) += trace.o
 obj-$(CONFIG_TRACING) += trace_output.o
Index: linux.trees.git/kernel/trace/ring_buffer_benchmark_template.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/ring_buffer_benchmark_template.h	2010-07-09 18:36:24.000000000 -0400
@@ -0,0 +1,725 @@
+/*
+ * ring buffer tester and benchmark template
+ *
+ * This template file is meant to be included in respective tester variant
+ * modules with appropriate definitions.
+ *
+ * Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#include <linux/trace_clock.h>
+#include <linux/completion.h>
+#include <linux/kmemcheck.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/time.h>
+#include <linux/debugfs.h>
+#include <linux/stringify.h>
+#include <asm/local.h>
+
+#include <linux/ringbuffer/config.h>		/* only for counter access */
+#include <linux/ringbuffer/backend.h>		/* only for counter access */
+#include <linux/ringbuffer/frontend.h>		/* only for counter access */
+
+#define BUFFER_SIZE	1048576			/* Use 1MB buffers */
+
+#ifdef RING_BUFFER_GLOBAL_TMPL
+# ifdef RING_BUFFER_OVERWRITE_TMPL
+#  include <linux/ringbuffer/global_overwrite.h>
+#  define RING_BUFFER_TMPL(x)	ring_buffer_global_overwrite_##x
+# else
+#  include <linux/ringbuffer/global_discard.h>
+#  define RING_BUFFER_TMPL(x)	ring_buffer_global_discard_##x
+# endif
+#elif defined(RING_BUFFER_PER_CPU_TMPL)
+# ifdef RING_BUFFER_OVERWRITE_TMPL
+#  include <linux/ringbuffer/percpu_overwrite.h>
+#  define RING_BUFFER_TMPL(x)	ring_buffer_percpu_overwrite_##x
+# else
+#  include <linux/ringbuffer/percpu_discard.h>
+#  define RING_BUFFER_TMPL(x)	ring_buffer_percpu_discard_##x
+# endif
+#elif defined(RING_BUFFER_PER_CPU_LOCAL_TMPL)
+# ifdef RING_BUFFER_OVERWRITE_TMPL
+#  include <linux/ringbuffer/percpu_local_overwrite.h>
+#  define RING_BUFFER_TMPL(x)	ring_buffer_percpu_local_overwrite_##x
+# else
+#  include <linux/ringbuffer/percpu_local_discard.h>
+#  define RING_BUFFER_TMPL(x)	ring_buffer_percpu_local_discard_##x
+# endif
+#else
+#error "Please define one type of ring buffer template"
+#endif
+
+#if (!defined(RING_BUFFER_OVERWRITE_TMPL) \
+	&& !defined(RING_BUFFER_DISCARD_TMPL))
+#error "Please define one mode of ring buffer template"
+#endif
+
+#ifndef RING_BUFFER_NAME_TMPL
+#error "Please define the ring buffer template name"
+#endif
+
+static struct channel *channel;
+
+#ifdef RING_BUFFER_PER_CPU_LOCAL_TMPL
+static struct dentry *dentry[NR_CPUS];
+#else
+static struct dentry *dentry;
+#endif
+
+/* run time and sleep time in seconds */
+#define RUN_TIME	10
+#define SLEEP_TIME	10
+
+#ifndef CONFIG_PREEMPT
+/* number of events for writer to give up the cpu */
+static int resched_interval = 5000;
+#endif
+
+static struct completion read_start;
+
+static struct task_struct *consumer;
+static unsigned long iter_read;
+static unsigned long long global_read;
+
+static int *writer_finish;
+static struct task_struct *producer;	/* Dispatch thread */
+static struct task_struct **producers;
+static struct completion *write_start;
+static struct completion *write_done;
+static atomic_long_t tot_hit;
+static atomic_long_t tot_missed;
+
+static int disable_reader;
+module_param(disable_reader, uint, 0644);
+MODULE_PARM_DESC(disable_reader, "only run producer");
+
+static int file_reader;
+module_param(file_reader, uint, 0644);
+MODULE_PARM_DESC(file_reader, "open debugfs file for read()");
+
+static unsigned int nr_producers = 1;
+module_param(nr_producers, uint, 0644);
+MODULE_PARM_DESC(nr_producers, "number of producer threads");
+
+static int writer_delay;
+module_param(writer_delay, uint, 0644);
+MODULE_PARM_DESC(writer_delay, "delay between writes, in ms");
+
+static int producer_nice = 19;
+static int consumer_nice = 19;
+
+static int producer_fifo = -1;
+static int consumer_fifo = -1;
+
+module_param(producer_nice, uint, 0644);
+MODULE_PARM_DESC(producer_nice, "nice prio for producer");
+
+module_param(consumer_nice, uint, 0644);
+MODULE_PARM_DESC(consumer_nice, "nice prio for consumer");
+
+module_param(producer_fifo, uint, 0644);
+MODULE_PARM_DESC(producer_fifo, "fifo prio for producer");
+
+module_param(consumer_fifo, uint, 0644);
+MODULE_PARM_DESC(consumer_fifo, "fifo prio for consumer");
+
+static int kill_test;
+
+#define KILL_TEST()				\
+	do {					\
+		if (!kill_test) {		\
+			kill_test = 1;		\
+			WARN_ON(1);		\
+		}				\
+	} while (0)
+
+enum event_status {
+	EVENT_FOUND,
+	EVENT_DROPPED,
+};
+
+static char payload[] = "Ring buffer test data.\n";
+
+static void wait_to_die(void)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+}
+
+#ifdef RING_BUFFER_PER_CPU_LOCAL_TMPL
+
+static int read_event(struct ring_buffer *buf)
+{
+	ssize_t len;
+	char *read_payload;
+
+	len = ring_buffer_get_next_record(channel, buf);
+	if (len < 0)
+		return len;
+	WARN_ON_ONCE(len != sizeof(payload) - 1);
+	read_payload = kmalloc(len + 1, GFP_KERNEL);
+	if (!read_payload)
+		return -ENOMEM;
+
+	read_current_record(buf, read_payload);
+	read_payload[len] = '\0';
+
+	WARN_ON_ONCE(strcmp(read_payload, payload));
+	iter_read++;
+	global_read++;
+	kfree(read_payload);
+
+	return 0;
+}
+
+static int read_channel_events(void)
+{
+	int cpu;
+	int all_finalized = 1;
+	int all_empty = 1;
+
+	for_each_channel_cpu(cpu, channel) {
+		struct ring_buffer *buf;
+		int found = 1;
+
+		buf = channel_get_ring_buffer(channel->backend.config, channel,
+					      cpu);
+		do {
+			int ret;
+
+			ret = read_event(buf);
+			if (ret != -ENODATA)
+				all_finalized = 0;
+			if (ret == 0)
+				all_empty = 0;
+			else
+				found = 0;
+		} while (found && !kill_test);
+	}
+
+	if (all_finalized)
+		return -ENODATA;
+	if (all_empty)
+		return -EAGAIN;
+	else
+		return 0;
+}
+
+static void ring_buffer_consumer(void)
+{
+	int ret;
+
+	do {
+		ret = read_channel_events();
+		if (ret == -EAGAIN)
+			wait_event_interruptible(channel->read_wait,
+				(ret = read_channel_events(),
+				 ret != -EAGAIN));
+	} while (!kill_test && ret != -ENODATA);
+}
+
+#else /* !RING_BUFFER_PER_CPU_LOCAL_TMPL */
+
+static int read_event(struct channel *chan)
+{
+	struct ring_buffer *buf;
+	ssize_t len;
+	char *read_payload;
+
+	len = channel_get_next_record(chan, &buf);
+	if (len < 0)
+		return len;
+	WARN_ON_ONCE(len != sizeof(payload) - 1);
+	read_payload = kmalloc(len + 1, GFP_KERNEL);
+	if (!read_payload)
+		return -ENOMEM;
+
+	read_current_record(buf, read_payload);
+	read_payload[len] = '\0';
+
+	WARN_ON_ONCE(strcmp(read_payload, payload));
+	iter_read++;
+	global_read++;
+	kfree(read_payload);
+
+	return 0;
+}
+
+static void ring_buffer_consumer(void)
+{
+	int ret;
+
+	do {
+		ret = read_event(channel);
+		if (ret == -EAGAIN)
+			wait_event_interruptible(channel->read_wait,
+				(ret = read_event(channel),
+				 ret != -EAGAIN));
+	} while (!kill_test && ret != -ENODATA);
+}
+
+#endif /* !RING_BUFFER_PER_CPU_LOCAL_TMPL */
+
+static int ring_buffer_consumer_thread(void *arg)
+{
+	WARN_ON_ONCE(channel_iterator_open(channel));
+
+	wait_for_completion_interruptible(&read_start);
+
+	ring_buffer_consumer();
+
+	channel_iterator_release(channel);
+
+	wait_to_die();
+
+	return 0;
+}
+
+static void ring_buffer_producer(unsigned int writer_id)
+{
+	unsigned long long hit = 0;
+	unsigned long long missed = 0;
+	int cnt = 0;
+	int ret;
+
+	/*
+	 * Hammer the buffer for 10 secs (this may make the system stall)
+	 */
+	while (!writer_finish[writer_id] && !kill_test) {
+		ret = RING_BUFFER_TMPL(write)(channel, payload,
+					      sizeof(payload) - 1);
+		if (ret)
+			missed++;
+		else
+			hit++;
+		cnt++;
+
+		if (writer_delay)
+			msleep(writer_delay);
+
+#ifndef CONFIG_PREEMPT
+		/*
+		 * On a non-preempt kernel, the 10 second run will
+		 * stop everything while it runs. Instead, we will call
+		 * cond_resched and also add any time that was lost by a
+		 * reschedule.
+		 */
+		if (!(cnt % resched_interval))
+			cond_resched();
+#endif
+
+	}
+	writer_finish[writer_id] = 0;
+	atomic_long_add(hit, &tot_hit);
+	atomic_long_add(missed, &tot_missed);
+	complete(&write_done[writer_id]);
+}
+
+static void ring_buffer_report(unsigned long long time)
+{
+	unsigned long long hit, missed;
+	unsigned long long written = 0;
+	unsigned long long lost_full = 0, lost_wrap = 0, lost_big = 0;
+	unsigned long long entries = 0;
+	unsigned long long overruns = 0;
+	unsigned long long read = 0;
+	unsigned long avg;
+	struct ring_buffer *buf;
+	const struct ring_buffer_config *config;
+#ifndef RING_BUFFER_GLOBAL_TMPL
+	int cpu;
+#endif
+
+	config = channel->backend.config;
+	buf = channel_get_ring_buffer(config, channel, 0);
+
+	hit = atomic_long_read(&tot_hit);
+	missed = atomic_long_read(&tot_missed);
+
+	/*
+	 * These values only take into account flushed subbuffers.
+	 */
+#ifdef RING_BUFFER_GLOBAL_TMPL
+	written += ring_buffer_get_records_count(config, buf);
+	lost_full += ring_buffer_get_records_lost_full(config, buf);
+	lost_wrap += ring_buffer_get_records_lost_wrap(config, buf);
+	lost_big += ring_buffer_get_records_lost_big(config, buf);
+	overruns += ring_buffer_get_records_overrun(config, buf);
+	entries += ring_buffer_get_records_unread(config, buf);
+	read += ring_buffer_get_records_read(config, buf);
+#else
+	for_each_channel_cpu(cpu, channel) {
+		struct ring_buffer *buf =
+			channel_get_ring_buffer(config, channel, cpu);
+
+		written += ring_buffer_get_records_count(config, buf);
+		lost_full += ring_buffer_get_records_lost_full(config, buf);
+		lost_wrap += ring_buffer_get_records_lost_wrap(config, buf);
+		lost_big += ring_buffer_get_records_lost_big(config, buf);
+		overruns += ring_buffer_get_records_overrun(config, buf);
+		entries += ring_buffer_get_records_unread(config, buf);
+		read += ring_buffer_get_records_read(config, buf);
+	}
+#endif
+
+	trace_printk("Report for %s\n", RING_BUFFER_NAME_TMPL);
+
+	if (kill_test)
+		trace_printk("ERROR!\n");
+
+	if (!disable_reader) {
+		if (consumer_fifo < 0)
+			trace_printk("Running Consumer at nice: %d\n",
+				     consumer_nice);
+		else
+			trace_printk("Running Consumer at SCHED_FIFO %d\n",
+				     consumer_fifo);
+	}
+	if (producer_fifo < 0)
+		trace_printk("Running Producer at nice: %d\n",
+			     producer_nice);
+	else
+		trace_printk("Running Producer at SCHED_FIFO %d\n",
+			     producer_fifo);
+
+	/* Let the user know that the test is running at low priority */
+	if (producer_fifo < 0 && consumer_fifo < 0 &&
+	    producer_nice == 19 && consumer_nice == 19)
+		trace_printk("WARNING!!! This test is running at lowest "
+			     "priority.\n");
+
+	trace_printk("This iteration:           %llu (usecs)\n", time);
+	trace_printk("  Time:                   %llu (usecs)\n", time);
+	trace_printk("  Data production:\n");
+	trace_printk("    Written:              %llu\n", hit);
+	trace_printk("    Lost:                 %llu\n", missed);
+	trace_printk("  Data consumption:\n");
+	if (disable_reader)
+		trace_printk("    Read:                 (reader disabled)\n");
+	else
+		trace_printk("    Read:                 %lu\n",
+			     iter_read);
+	trace_printk("\n");
+	trace_printk("Global (only flushed subbuffers):\n");
+	trace_printk("  Data production:\n");
+	trace_printk("    Written:              %llu\n", written);
+	trace_printk("    Lost (buffer full)    %llu\n", lost_full);
+	trace_printk("    Lost (wrap around)    %llu\n", lost_wrap);
+	trace_printk("    Lost (event too big)  %llu\n", lost_big);
+	trace_printk("  Data consumption:\n");
+	if (disable_reader)
+		trace_printk("    Read:                 (reader disabled)\n");
+	else
+		trace_printk("    Read:                 %llu (%llu read-side)\n",
+			     read, global_read);
+	trace_printk("    Overruns:             %llu\n", overruns);
+	trace_printk("    Non-consumed entries: %llu\n", entries);
+	trace_printk("    Consumption total:    %llu\n",
+		     entries + overruns + read);
+
+	/* Convert time from usecs to CPU time millisecs */
+	time *= nr_producers;
+	do_div(time, USEC_PER_MSEC);
+	if (time)
+		hit /= (long)time;
+	else
+		trace_printk("TIME IS ZERO??\n");
+
+	trace_printk("Entries per millisec: %llu\n", hit);
+
+	if (hit) {
+		/* Calculate the average time in nanosecs */
+		avg = NSEC_PER_MSEC / hit;
+		trace_printk("%lu ns CPU time per entry written "
+			     "(%u writer threads)\n",
+			     avg, nr_producers);
+	}
+
+	if (missed) {
+		if (time)
+			missed /= (long)time;
+
+		trace_printk("Total iterations per millisec: %llu\n",
+			     hit + missed);
+
+		/* it is possible that hit + missed will overflow and be zero */
+		if (!(hit + missed)) {
+			trace_printk("hit + missed overflowed and "
+				     "totalled zero!\n");
+			hit--; /* make it non zero */
+		}
+
+		/* Calculate the average time in nanosecs */
+		avg = NSEC_PER_MSEC / (hit + missed);
+		trace_printk("%lu ns CPU time per entry (written+lost) "
+			     "(%u writer threads)\n",
+			     avg, nr_producers);
+	}
+	iter_read = 0;
+}
+
+static int ring_buffer_producer_thread(void *arg)
+{
+	unsigned int writer_id = (unsigned long) arg;
+
+	wait_for_completion_interruptible(&write_start[writer_id]);
+	while (!kthread_should_stop() && !kill_test) {
+		ring_buffer_producer(writer_id);
+		wait_for_completion_interruptible(&write_start[writer_id]);
+	}
+	__set_current_state(TASK_RUNNING);
+
+	if (kill_test)
+		wait_to_die();
+
+	return 0;
+}
+
+static int ring_buffer_main_producer_thread(void *arg)
+{
+	struct timeval start_tv;
+	struct timeval end_tv;
+	unsigned long long time;
+	int i;
+
+	if (consumer)
+		complete(&read_start);
+
+	while (!kthread_should_stop() && !kill_test) {
+		trace_printk("Starting ring buffer hammer\n");
+		do_gettimeofday(&start_tv);
+		/* Wake up producers */
+		for (i = 0; i < nr_producers; i++)
+			complete(&write_start[i]);
+
+		/* the completions must be visible before the finish var */
+		smp_wmb();
+
+		/* Wait for RUN_TIME  */
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(HZ * RUN_TIME);
+		__set_current_state(TASK_RUNNING);
+
+		/* Stop producers */
+		for (i = 0; i < nr_producers; i++)
+			writer_finish[i] = 1;
+		/* finish var visible before waking up */
+		smp_wmb();
+
+		/* Wait for producers to complete */
+		for (i = 0; i < nr_producers; i++)
+			wake_up_process(producers[i]);
+		for (i = 0; i < nr_producers; i++)
+			wait_for_completion_interruptible(&write_done[i]);
+		do_gettimeofday(&end_tv);
+		trace_printk("End ring buffer hammer\n");
+
+		time = end_tv.tv_sec - start_tv.tv_sec;
+		time *= USEC_PER_SEC;
+		time += (long long)((long)end_tv.tv_usec
+				    - (long)start_tv.tv_usec);
+
+		if (kthread_should_stop() || kill_test)
+			break;
+
+		/* Print report */
+		ring_buffer_report(time);
+		atomic_long_set(&tot_hit, 0);
+		atomic_long_set(&tot_missed, 0);
+
+		trace_printk("Sleeping for 10 secs\n");
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(HZ * SLEEP_TIME);
+		__set_current_state(TASK_RUNNING);
+	}
+
+	if (kill_test)
+		wait_to_die();
+
+	return 0;
+}
+
+#ifdef RING_BUFFER_PER_CPU_LOCAL_TMPL
+
+static int on_buffer_create(struct ring_buffer *buf, int cpu)
+{
+	int ret = 0;
+	char *tmpname;
+
+	if (!file_reader)
+		return 0;
+
+	tmpname = kzalloc(NAME_MAX + 1, GFP_KERNEL);
+	if (!tmpname) {
+		ret = -ENOMEM;
+		goto end;
+	}
+
+	snprintf(tmpname, NAME_MAX, "%s_%d",
+		 __stringify(RING_BUFFER_TMPL(benchmark)), cpu);
+	dentry[cpu] = debugfs_create_file(tmpname, S_IRUSR,
+					  NULL, buf,
+					  &ring_buffer_payload_file_operations);
+	if (!dentry[cpu])
+		ret = -ENOMEM;
+	kfree(tmpname);
+end:
+	return ret;
+}
+
+static void on_buffer_finalize(struct ring_buffer *buf, int cpu)
+{
+	if (file_reader)
+		debugfs_remove(dentry[cpu]);
+}
+
+#endif /* RING_BUFFER_PER_CPU_LOCAL_TMPL */
+
+static int __init ring_buffer_benchmark_init(void)
+{
+	int ret;
+	unsigned int i;
+
+#ifdef RING_BUFFER_PER_CPU_LOCAL_TMPL
+	channel = RING_BUFFER_TMPL(create)(BUFFER_SIZE, on_buffer_create,
+					   on_buffer_finalize);
+#else
+	channel = RING_BUFFER_TMPL(create)(BUFFER_SIZE);
+#endif
+	if (!channel)
+		return -EINVAL;
+
+	if (file_reader && !disable_reader) {
+		printk(KERN_WARNING "Forcefully disabling reader; file "
+				    "descriptor available for reading.\n");
+		disable_reader = 1;
+	}
+
+	if (!disable_reader) {
+		init_completion(&read_start);
+		consumer = kthread_run(ring_buffer_consumer_thread,
+				       NULL, "rb_consumer");
+		ret = PTR_ERR(consumer);
+		if (IS_ERR(consumer))
+			goto out_fail;
+	}
+
+#ifndef RING_BUFFER_PER_CPU_LOCAL_TMPL
+	if (file_reader) {
+		dentry = debugfs_create_file(
+				__stringify(RING_BUFFER_TMPL(benchmark)),
+				S_IRUSR, NULL, channel,
+				&channel_payload_file_operations);
+		WARN_ON(!dentry);
+	}
+#endif
+
+	producers = kzalloc(sizeof(struct task_struct *) * nr_producers,
+			    GFP_KERNEL);
+	writer_finish = kzalloc(sizeof(int) * nr_producers, GFP_KERNEL);
+	write_start = kzalloc(sizeof(struct completion) * nr_producers,
+			      GFP_KERNEL);
+	write_done = kzalloc(sizeof(struct completion) * nr_producers,
+			     GFP_KERNEL);
+	if (!producers || !writer_finish || !write_start || !write_done) {
+		ret = -ENOMEM;
+		goto out_free_writer_structures;
+	}
+
+	for (i = 0; i < nr_producers; i++) {
+		init_completion(&write_start[i]);
+		init_completion(&write_done[i]);
+		producers[i] = kthread_run(ring_buffer_producer_thread,
+					   (void *)(unsigned long)i,
+					   "rb_producer");
+		ret = PTR_ERR(producers[i]);
+
+		if (IS_ERR(producers[i]))
+			goto out_kill_producers;
+	}
+
+	producer = kthread_run(ring_buffer_main_producer_thread,
+			       NULL, "rb_main_producer");
+	ret = PTR_ERR(producer);
+
+	if (IS_ERR(producer))
+		goto out_kill_producers;
+
+	/*
+	 * Run them as low-prio background tasks by default:
+	 */
+	if (!disable_reader) {
+		if (consumer_fifo >= 0) {
+			struct sched_param param = {
+				.sched_priority = consumer_fifo
+			};
+			sched_setscheduler(consumer, SCHED_FIFO, &param);
+		} else
+			set_user_nice(consumer, consumer_nice);
+	}
+
+	if (producer_fifo >= 0) {
+		struct sched_param param = {
+			.sched_priority = producer_fifo
+		};
+		sched_setscheduler(producer, SCHED_FIFO, &param);
+		for (i = 0; i < nr_producers; i++)
+			sched_setscheduler(producers[i], SCHED_FIFO, &param);
+	} else {
+		set_user_nice(producer, producer_nice);
+		for (i = 0; i < nr_producers; i++)
+			set_user_nice(producers[i], producer_nice);
+	}
+
+	return 0;
+
+out_kill_producers:
+	for (i = 0; i < nr_producers; i++)
+		if (producers[i] && !IS_ERR(producers[i]))
+			kthread_kill_stop(producers[i], SIGKILL);
+out_free_writer_structures:
+	kfree(write_done);
+	kfree(write_start);
+	kfree(writer_finish);
+	kfree(producers);
+#ifndef RING_BUFFER_PER_CPU_LOCAL_TMPL
+	if (file_reader)
+		debugfs_remove(dentry);
+#endif
+	if (consumer)
+		kthread_kill_stop(consumer, SIGKILL);
+out_fail:
+	RING_BUFFER_TMPL(destroy)(channel);
+	return ret;
+}
+
+static void __exit ring_buffer_benchmark_exit(void)
+{
+	unsigned int i;
+
+	kthread_kill_stop(producer, SIGKILL);
+	for (i = 0; i < nr_producers; i++)
+		kthread_kill_stop(producers[i], SIGKILL);
+#ifndef RING_BUFFER_PER_CPU_LOCAL_TMPL
+	if (file_reader)
+		debugfs_remove(dentry);
+#endif
+	if (consumer)
+		kthread_kill_stop(consumer, SIGKILL);
+	RING_BUFFER_TMPL(destroy)(channel);
+	kfree(write_done);
+	kfree(write_start);
+	kfree(writer_finish);
+	kfree(producers);
+}
+
+module_init(ring_buffer_benchmark_init);
+module_exit(ring_buffer_benchmark_exit);
Index: linux.trees.git/kernel/trace/lib_global_discard_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/lib_global_discard_ring_buffer_benchmark.c	2010-07-09 18:35:53.000000000 -0400
@@ -0,0 +1,14 @@
+/*
+ * ring buffer global discard library tester and benchmark
+ *
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RING_BUFFER_GLOBAL_TMPL
+#define RING_BUFFER_DISCARD_TMPL
+#define RING_BUFFER_NAME_TMPL "ring buffer global discard"
+#include "ring_buffer_benchmark_template.h"
+
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION(RING_BUFFER_NAME_TMPL " test and benchmark");
+MODULE_LICENSE("GPL");
Index: linux.trees.git/kernel/trace/lib_global_overwrite_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/lib_global_overwrite_ring_buffer_benchmark.c	2010-07-09 18:35:53.000000000 -0400
@@ -0,0 +1,14 @@
+/*
+ * ring buffer global overwrite library tester and benchmark
+ *
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RING_BUFFER_GLOBAL_TMPL
+#define RING_BUFFER_OVERWRITE_TMPL
+#define RING_BUFFER_NAME_TMPL "ring buffer global overwrite"
+#include "ring_buffer_benchmark_template.h"
+
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION(RING_BUFFER_NAME_TMPL " test and benchmark");
+MODULE_LICENSE("GPL");
Index: linux.trees.git/kernel/trace/lib_percpu_discard_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/lib_percpu_discard_ring_buffer_benchmark.c	2010-07-09 18:35:53.000000000 -0400
@@ -0,0 +1,14 @@
+/*
+ * ring buffer per-cpu discard library tester and benchmark
+ *
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RING_BUFFER_PER_CPU_TMPL
+#define RING_BUFFER_DISCARD_TMPL
+#define RING_BUFFER_NAME_TMPL "ring buffer per-cpu discard"
+#include "ring_buffer_benchmark_template.h"
+
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION(RING_BUFFER_NAME_TMPL " test and benchmark");
+MODULE_LICENSE("GPL");
Index: linux.trees.git/kernel/trace/lib_percpu_local_discard_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/lib_percpu_local_discard_ring_buffer_benchmark.c	2010-07-09 18:35:53.000000000 -0400
@@ -0,0 +1,14 @@
+/*
+ * ring buffer per-cpu local discard library tester and benchmark
+ *
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RING_BUFFER_PER_CPU_LOCAL_TMPL
+#define RING_BUFFER_DISCARD_TMPL
+#define RING_BUFFER_NAME_TMPL "ring buffer per-cpu local discard"
+#include "ring_buffer_benchmark_template.h"
+
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION(RING_BUFFER_NAME_TMPL " test and benchmark");
+MODULE_LICENSE("GPL");
Index: linux.trees.git/kernel/trace/lib_percpu_local_overwrite_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/lib_percpu_local_overwrite_ring_buffer_benchmark.c	2010-07-09 18:35:53.000000000 -0400
@@ -0,0 +1,14 @@
+/*
+ * ring buffer per-cpu local overwrite library tester and benchmark
+ *
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RING_BUFFER_PER_CPU_LOCAL_TMPL
+#define RING_BUFFER_OVERWRITE_TMPL
+#define RING_BUFFER_NAME_TMPL "ring buffer per-cpu local overwrite"
+#include "ring_buffer_benchmark_template.h"
+
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION(RING_BUFFER_NAME_TMPL " test and benchmark");
+MODULE_LICENSE("GPL");
Index: linux.trees.git/kernel/trace/lib_percpu_overwrite_ring_buffer_benchmark.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/kernel/trace/lib_percpu_overwrite_ring_buffer_benchmark.c	2010-07-09 18:35:53.000000000 -0400
@@ -0,0 +1,14 @@
+/*
+ * ring buffer per-cpu overwrite library tester and benchmark
+ *
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ */
+
+#define RING_BUFFER_PER_CPU_TMPL
+#define RING_BUFFER_OVERWRITE_TMPL
+#define RING_BUFFER_NAME_TMPL "ring buffer per-cpu overwrite"
+#include "ring_buffer_benchmark_template.h"
+
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION(RING_BUFFER_NAME_TMPL " test and benchmark");
+MODULE_LICENSE("GPL");

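[Note on the pattern: each of the six modules above only sets a handful of
RING_BUFFER_*_TMPL defines and then includes ring_buffer_benchmark_template.h,
which instantiates the shared benchmark body -- including the calls to
RING_BUFFER_TMPL(create), RING_BUFFER_TMPL(write) and RING_BUFFER_TMPL(destroy)
seen above -- against the matching ring buffer client. As a rough sketch of how
such name templating can work (the helper names RB_TMPL_SUFFIX and _RB_PASTE
below are hypothetical, not taken from the patch), token pasting along these
lines would do:]

/*
 * Hypothetical sketch of RING_BUFFER_TMPL() name generation; the real
 * template header in this patchset may do it differently.
 */
#if defined(RING_BUFFER_GLOBAL_TMPL) && defined(RING_BUFFER_DISCARD_TMPL)
#define RB_TMPL_SUFFIX	global_discard
#elif defined(RING_BUFFER_GLOBAL_TMPL) && defined(RING_BUFFER_OVERWRITE_TMPL)
#define RB_TMPL_SUFFIX	global_overwrite
#elif defined(RING_BUFFER_PER_CPU_TMPL) && defined(RING_BUFFER_DISCARD_TMPL)
#define RB_TMPL_SUFFIX	percpu_discard
#elif defined(RING_BUFFER_PER_CPU_TMPL) && defined(RING_BUFFER_OVERWRITE_TMPL)
#define RB_TMPL_SUFFIX	percpu_overwrite
#elif defined(RING_BUFFER_PER_CPU_LOCAL_TMPL) && defined(RING_BUFFER_DISCARD_TMPL)
#define RB_TMPL_SUFFIX	percpu_local_discard
#else
#define RB_TMPL_SUFFIX	percpu_local_overwrite
#endif

/* Double expansion so the suffix macro is expanded before pasting. */
#define __RB_PASTE(suffix, op)	ring_buffer_##suffix##_##op
#define _RB_PASTE(suffix, op)	__RB_PASTE(suffix, op)
#define RING_BUFFER_TMPL(op)	_RB_PASTE(RB_TMPL_SUFFIX, op)

/*
 * With the defines from lib_percpu_discard_ring_buffer_benchmark.c,
 * RING_BUFFER_TMPL(write) expands to ring_buffer_percpu_discard_write,
 * so each benchmark module exercises its own client fast path while
 * sharing the benchmark body verbatim.
 */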

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 01/20] Create generic alignment API (v8)
  2010-07-09 22:57   ` Mathieu Desnoyers
  (?)
@ 2010-08-06 11:41   ` Alexander Shishkin
  2010-08-06 14:48       ` Mathieu Desnoyers
  -1 siblings, 1 reply; 25+ messages in thread
From: Alexander Shishkin @ 2010-08-06 11:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker,
	Thomas Gleixner, Christoph Hellwig, Li Zefan, Lai Jiangshan,
	Johannes Berg, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Tom Zanussi, KOSAKI Motohiro, Andi Kleen,
	Russell King - ARM Linux, linux-arm-kernel, Imre Deak,
	Jamie Lokier, Alexey Dobriyan, Alexander Shishkin

On Fri, Jul 09, 2010 at 06:57:28 -0400, Mathieu Desnoyers wrote:
> Rather than re-doing the "alignment on a type size" trick all over again at
> different levels, import the "ltt_align" from LTTng into kernel.h and make this
> available to everyone. Renaming to:
> 
> - object_align()
> - object_align_floor()
> - offset_align()
> - offset_align_floor()

I was just wondering if this patch makes any progress anywhere?

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 01/20] Create generic alignment API (v8)
  2010-08-06 11:41   ` Alexander Shishkin
@ 2010-08-06 14:48       ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2010-08-06 14:48 UTC (permalink / raw)
  To: Steven Rostedt, LKML, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Ingo Molnar, Frederic Weisbecker,
	Thomas Gleixner, Christoph Hellwig, Li Zefan, Lai Jiangshan,
	Johannes Berg, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Tom Zanussi, KOSAKI Motohiro, Andi Kleen,
	Russell King - ARM Linux, linux-arm-kernel, Imre Deak,
	Jamie Lokier, Alexey Dobriyan

* Alexander Shishkin (virtuoso@slind.org) wrote:
> On Fri, Jul 09, 2010 at 06:57:28 -0400, Mathieu Desnoyers wrote:
> > Rather than re-doing the "alignment on a type size" trick all over again at
> > different levels, import the "ltt_align" from LTTng into kernel.h and make this
> > available to everyone. Renaming to:
> > 
> > - object_align()
> > - object_align_floor()
> > - offset_align()
> > - offset_align_floor()
> 
> I was just wondering if this patch makes any progress anywhere?

I'm using it in my generic ring buffer code (I'm posting it as part of the
patchset). This ring buffer has only been posted at the RFC stage so far.

Other than that, there has been no objection, so the first user of this patch
that gets merged will probably bring the patch along with it.
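
For reference, the four helpers listed above come down to standard
power-of-two alignment arithmetic. A minimal sketch of that arithmetic
(illustrative only; the exact kernel.h definitions proposed in the patch may
differ, e.g. offset_align() may return the padding to insert rather than the
rounded-up offset):

/*
 * Illustrative only -- not the patch's definitions.  Assumes "alignment"
 * is a power of two, as it is when aligning on a type's natural size.
 */
#define object_align(obj, alignment)					\
	((typeof(obj))(((unsigned long)(obj) + ((alignment) - 1))	\
		       & ~((unsigned long)(alignment) - 1)))
#define object_align_floor(obj, alignment)				\
	((typeof(obj))((unsigned long)(obj)				\
		       & ~((unsigned long)(alignment) - 1)))
#define offset_align(offset, alignment)					\
	(((offset) + ((alignment) - 1)) & ~((unsigned long)(alignment) - 1))
#define offset_align_floor(offset, alignment)				\
	((offset) & ~((unsigned long)(alignment) - 1))

/*
 * Typical ring buffer use: align the write offset on the natural
 * alignment of the event header before reserving space, e.g.
 *
 *	offset = offset_align(offset, __alignof__(struct event_header));
 */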

Thanks,

Mathieu

> 
> Regards,
> --
> Alex

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-08-06 14:48 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-09 22:57 [patch 00/20] Generic Ring Buffer Library Mathieu Desnoyers
2010-07-09 22:57 ` [patch 01/20] Create generic alignment API (v8) Mathieu Desnoyers
2010-07-09 22:57   ` Mathieu Desnoyers
2010-08-06 11:41   ` Alexander Shishkin
2010-08-06 14:48     ` Mathieu Desnoyers
2010-08-06 14:48       ` Mathieu Desnoyers
2010-07-09 22:57 ` [patch 02/20] notifier atomic call chain notrace Mathieu Desnoyers
2010-07-09 22:57 ` [patch 03/20] idle notifier standardization Mathieu Desnoyers
2010-07-09 22:57 ` [patch 04/20] idle notifier standardization x86_32 Mathieu Desnoyers
2010-07-09 22:57 ` [patch 05/20] Poll : add poll_wait_set_exclusive Mathieu Desnoyers
2010-07-09 22:57 ` [patch 06/20] prio_heap: heap_remove(), heap_maximum(), heap_replace() and heap_cherrypick() Mathieu Desnoyers
2010-07-09 22:57 ` [patch 07/20] kthread_kill_stop() Mathieu Desnoyers
2010-07-09 22:57 ` [patch 08/20] inline memcpy Mathieu Desnoyers
2010-07-09 22:57 ` [patch 09/20] x86 " Mathieu Desnoyers
2010-07-09 22:57 ` [patch 10/20] Trace clock - build standalone Mathieu Desnoyers
2010-07-09 22:57 ` [patch 11/20] Ftrace ring buffer renaming Mathieu Desnoyers
2010-07-09 22:57 ` [patch 12/20] ring buffer backend Mathieu Desnoyers
2010-07-09 22:57 ` [patch 13/20] ring buffer frontend Mathieu Desnoyers
2010-07-09 22:57 ` [patch 14/20] Ring buffer library - documentation Mathieu Desnoyers
2010-07-09 22:57 ` [patch 15/20] Ring buffer library - VFS operations Mathieu Desnoyers
2010-07-09 22:57 ` [patch 16/20] Ring buffer library - client sample Mathieu Desnoyers
2010-07-09 22:57 ` [patch 17/20] Ring buffer benchmark library Mathieu Desnoyers
2010-07-09 22:57 ` [patch 18/20] Ring Buffer Record Iterator Mathieu Desnoyers
2010-07-09 22:57 ` [patch 19/20] Ring Buffer: Basic API Mathieu Desnoyers
2010-07-09 22:57 ` [patch 20/20] Ring buffer: benchmark simple API Mathieu Desnoyers
