linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5] [GIT PULL] ring-buffer: Various optimizations
@ 2010-10-19 17:54 Steven Rostedt
  2010-10-19 17:54 ` [PATCH 1/5] ring-buffer: Make write slow path out of line Steven Rostedt
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Steven Rostedt @ 2010-10-19 17:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker


Ingo,

I based this tree on Linus's latest, since it removes the macro
that was updated in rc8. This patch set supplies a better fix,
albeit a somewhat more complex change, hence it is not going into the rc.

Please pull the latest tip/perf/ringbuffer tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/perf/ringbuffer


Steven Rostedt (5):
      ring-buffer: Make write slow path out of line
      ring-buffer: Bind time extend and data events together
      ring-buffer: Remove condition to add timestamp in fast path
      ring-buffer: Micro-optimize with some strategic inlining
      ring-buffer: Remove unused macro RB_TIMESTAMPS_PER_PAGE

----
 include/linux/ring_buffer.h |   12 --
 kernel/trace/ring_buffer.c  |  344 ++++++++++++++++++++++--------------------
 2 files changed, 180 insertions(+), 176 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] ring-buffer: Make write slow path out of line
  2010-10-19 17:54 [PATCH 0/5] [GIT PULL] ring-buffer: Various optimizations Steven Rostedt
@ 2010-10-19 17:54 ` Steven Rostedt
  2010-10-19 17:54 ` [PATCH 2/5] ring-buffer: Bind time extend and data events together Steven Rostedt
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2010-10-19 17:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Steven Rostedt

[-- Attachment #1: 0001-ring-buffer-Make-write-slow-path-out-of-line.patch --]
[-- Type: text/plain, Size: 1427 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Gcc inlines the slow path of the ring buffer write, which can
hurt performance. This patch forces the slow-path function
rb_move_tail() to always remain out of line.

The ring_buffer_benchmark module with reader_disabled=1 shows that
this patch reduces the time to record an event from 135 ns to
132 ns (a 3 ns, or 2.22%, improvement).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index bca9637..0b88df8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1823,7 +1823,10 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	local_sub(length, &tail_page->write);
 }
 
-static struct ring_buffer_event *
+/*
+ * This is the slow path, force gcc not to inline it.
+ */
+static noinline struct ring_buffer_event *
 rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	     unsigned long length, unsigned long tail,
 	     struct buffer_page *tail_page, u64 *ts)
@@ -1943,7 +1946,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	tail = write - length;
 
 	/* See if we shot pass the end of this buffer page */
-	if (write > BUF_PAGE_SIZE)
+	if (unlikely(write > BUF_PAGE_SIZE))
 		return rb_move_tail(cpu_buffer, length, tail,
 				    tail_page, ts);
 
-- 
1.7.1




* [PATCH 2/5] ring-buffer: Bind time extend and data events together
  2010-10-19 17:54 [PATCH 0/5] [GIT PULL] ring-buffer: Various optimizations Steven Rostedt
  2010-10-19 17:54 ` [PATCH 1/5] ring-buffer: Make write slow path out of line Steven Rostedt
@ 2010-10-19 17:54 ` Steven Rostedt
  2010-10-19 20:46   ` Thomas Gleixner
  2010-10-19 17:54 ` [PATCH 3/5] ring-buffer: Remove condition to add timestamp in fast path Steven Rostedt
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2010-10-19 17:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt

[-- Attachment #1: 0002-ring-buffer-Bind-time-extend-and-data-events-togethe.patch --]
[-- Type: text/plain, Size: 17600 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

When the time between two timestamps is greater than
2^27 nanosecs (~134 ms) a time extend event is added that extends
the time difference to 59 bits (~18 years). This is due to
events only having a 27 bit field to store time.

Currently this time extend is a separate event. We add it just before
the event data that is being written to the buffer. But before
the event data is committed, the event data can also be discarded (as
with the case of filters). But because the time extend has already been
committed, it will stay in the buffer.

If lots of events are being filtered and no event is being
written, then every 134ms a time extend can be added to the buffer
without any data attached. To keep from filling the entire buffer
with time extends, a time extend will never be the first event
in a page because the page timestamp can be used. Time extends can
only fill the rest of a page with some data at the beginning.

This patch binds the time extend with the data. The difference here
is that the time extend is not committed before the data is added.
Instead, when a time extend is needed, the space reserved on
the ring buffer is the time extend + the data event size. The
time extend is added to the first part of the reserved block and
the data is added to the second. The time extend event is passed
back to the reserver, but since the reserver also uses a function
to find the data portion of the reserved block, no changes to the
ring buffer interface need to be made.

When a commit is discarded, we now remove both the time extend and
the event. With this approach no more than one time extend can
be in the buffer in a row. Data must always follow a time extend.

Thanks to Mathieu Desnoyers for suggesting this idea.

This patch also removes the static inline function
ring_buffer_event_time_delta() since it had no users. I was
going to update it to handle this change, but found that to be
a waste of time, and just removed the function instead.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ring_buffer.h |   12 --
 kernel/trace/ring_buffer.c  |  283 +++++++++++++++++++++++--------------------
 2 files changed, 154 insertions(+), 141 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 25b4f68..8d3a248 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -62,18 +62,6 @@ enum ring_buffer_type {
 unsigned ring_buffer_event_length(struct ring_buffer_event *event);
 void *ring_buffer_event_data(struct ring_buffer_event *event);
 
-/**
- * ring_buffer_event_time_delta - return the delta timestamp of the event
- * @event: the event to get the delta timestamp of
- *
- * The delta timestamp is the 27 bit timestamp since the last event.
- */
-static inline unsigned
-ring_buffer_event_time_delta(struct ring_buffer_event *event)
-{
-	return event->time_delta;
-}
-
 /*
  * ring_buffer_discard_commit will remove an event that has not
  *   ben committed yet. If this is used, then ring_buffer_unlock_commit
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0b88df8..4fc1577 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -224,6 +224,9 @@ enum {
 	RB_LEN_TIME_STAMP = 16,
 };
 
+#define skip_time_extend(event) \
+	((struct ring_buffer_event *)((char *)event + RB_LEN_TIME_EXTEND))
+
 static inline int rb_null_event(struct ring_buffer_event *event)
 {
 	return event->type_len == RINGBUF_TYPE_PADDING && !event->time_delta;
@@ -250,7 +253,7 @@ rb_event_data_length(struct ring_buffer_event *event)
 
 /* inline for ring buffer fast paths */
 static unsigned
-rb_event_length(struct ring_buffer_event *event)
+__rb_event_length(struct ring_buffer_event *event)
 {
 	switch (event->type_len) {
 	case RINGBUF_TYPE_PADDING:
@@ -274,13 +277,52 @@ rb_event_length(struct ring_buffer_event *event)
 	return 0;
 }
 
+/*
+ * Return total length of time extend and data,
+ *   or just the event length for all other events.
+ */
+static unsigned
+rb_event_ts_length(struct ring_buffer_event *event)
+{
+	unsigned len = 0;
+
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
+		/* time extends include the data event after it */
+		len = RB_LEN_TIME_EXTEND;
+		event = skip_time_extend(event);
+	}
+	return len + __rb_event_length(event);
+}
+
+/*
+ * Return the length of the given event. Will return
+ * the length of the time extend if the event is a
+ * time extend.
+ */
+static unsigned
+rb_event_real_length(struct ring_buffer_event *event)
+{
+	return __rb_event_length(event);
+}
+
 /**
  * ring_buffer_event_length - return the length of the event
  * @event: the event to get the length of
+ *
+ * Returns the size of the data load of a data event.
+ * If the event is something other than a data event, it
+ * returns the size of the event itself. With the exception
+ * of a TIME EXTEND, where it still returns the size of the
+ * data load of the data event after it.
  */
 unsigned ring_buffer_event_length(struct ring_buffer_event *event)
 {
-	unsigned length = rb_event_length(event);
+	unsigned length;
+
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+		event = skip_time_extend(event);
+
+	length = rb_event_real_length(event);
 	if (event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
 		return length;
 	length -= RB_EVNT_HDR_SIZE;
@@ -294,6 +336,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_length);
 static void *
 rb_event_data(struct ring_buffer_event *event)
 {
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+		event = skip_time_extend(event);
 	BUG_ON(event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
 	/* If length is in len field, then array[0] has the data */
 	if (event->type_len)
@@ -1546,6 +1590,25 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
 	iter->head = 0;
 }
 
+/* Slow path, do not inline */
+static noinline struct ring_buffer_event *
+rb_add_time_stamp(struct ring_buffer_event *event, u64 delta)
+{
+	event->type_len = RINGBUF_TYPE_TIME_EXTEND;
+
+	/* Not the first event on the page? */
+	if (rb_event_index(event)) {
+		event->time_delta = delta & TS_MASK;
+		event->array[0] = delta >> TS_SHIFT;
+	} else {
+		/* nope, just zero it */
+		event->time_delta = 0;
+		event->array[0] = 0;
+	}
+
+	return skip_time_extend(event);
+}
+
 /**
  * ring_buffer_update_event - update event type and data
  * @event: the even to update
@@ -1558,28 +1621,31 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
  * data field.
  */
 static void
-rb_update_event(struct ring_buffer_event *event,
-			 unsigned type, unsigned length)
+rb_update_event(struct ring_buffer_per_cpu *cpu_buffer,
+		struct ring_buffer_event *event, unsigned length,
+		int add_timestamp, u64 delta)
 {
-	event->type_len = type;
-
-	switch (type) {
-
-	case RINGBUF_TYPE_PADDING:
-	case RINGBUF_TYPE_TIME_EXTEND:
-	case RINGBUF_TYPE_TIME_STAMP:
-		break;
+	/* Only a commit updates the timestamp */
+	if (unlikely(!rb_event_is_commit(cpu_buffer, event)))
+		delta = 0;
 
-	case 0:
-		length -= RB_EVNT_HDR_SIZE;
-		if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT)
-			event->array[0] = length;
-		else
-			event->type_len = DIV_ROUND_UP(length, RB_ALIGNMENT);
-		break;
-	default:
-		BUG();
+	/*
+	 * If we need to add a timestamp, then we
+	 * add it to the start of the reserved space.
+	 */
+	if (unlikely(add_timestamp)) {
+		event = rb_add_time_stamp(event, delta);
+		length -= RB_LEN_TIME_EXTEND;
+		delta = 0;
 	}
+
+	event->time_delta = delta;
+	length -= RB_EVNT_HDR_SIZE;
+	if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT) {
+		event->type_len = 0;
+		event->array[0] = length;
+	} else
+		event->type_len = DIV_ROUND_UP(length, RB_ALIGNMENT);
 }
 
 /*
@@ -1829,7 +1895,7 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 static noinline struct ring_buffer_event *
 rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	     unsigned long length, unsigned long tail,
-	     struct buffer_page *tail_page, u64 *ts)
+	     struct buffer_page *tail_page, u64 ts)
 {
 	struct buffer_page *commit_page = cpu_buffer->commit_page;
 	struct ring_buffer *buffer = cpu_buffer->buffer;
@@ -1912,8 +1978,8 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 		 * Nested commits always have zero deltas, so
 		 * just reread the time stamp
 		 */
-		*ts = rb_time_stamp(buffer);
-		next_page->page->time_stamp = *ts;
+		ts = rb_time_stamp(buffer);
+		next_page->page->time_stamp = ts;
 	}
 
  out_again:
@@ -1932,12 +1998,21 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 
 static struct ring_buffer_event *
 __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
-		  unsigned type, unsigned long length, u64 *ts)
+		  unsigned long length, u64 ts,
+		  u64 delta, int add_timestamp)
 {
 	struct buffer_page *tail_page;
 	struct ring_buffer_event *event;
 	unsigned long tail, write;
 
+	/*
+	 * If the time delta since the last event is too big to
+	 * hold in the time field of the event, then we append a
+	 * TIME EXTEND event ahead of the data event.
+	 */
+	if (unlikely(add_timestamp))
+		length += RB_LEN_TIME_EXTEND;
+
 	tail_page = cpu_buffer->tail_page;
 	write = local_add_return(length, &tail_page->write);
 
@@ -1954,18 +2029,16 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 
 	event = __rb_page_index(tail_page, tail);
 	kmemcheck_annotate_bitfield(event, bitfield);
-	rb_update_event(event, type, length);
+	rb_update_event(cpu_buffer, event, length, add_timestamp, delta);
 
-	/* The passed in type is zero for DATA */
-	if (likely(!type))
-		local_inc(&tail_page->entries);
+	local_inc(&tail_page->entries);
 
 	/*
 	 * If this is the first commit on the page, then update
 	 * its timestamp.
 	 */
 	if (!tail)
-		tail_page->page->time_stamp = *ts;
+		tail_page->page->time_stamp = ts;
 
 	return event;
 }
@@ -1980,7 +2053,7 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
 	unsigned long addr;
 
 	new_index = rb_event_index(event);
-	old_index = new_index + rb_event_length(event);
+	old_index = new_index + rb_event_ts_length(event);
 	addr = (unsigned long)event;
 	addr &= PAGE_MASK;
 
@@ -2006,69 +2079,6 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
 	return 0;
 }
 
-static int
-rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
-		  u64 *ts, u64 *delta)
-{
-	struct ring_buffer_event *event;
-	int ret;
-
-	WARN_ONCE(*delta > (1ULL << 59),
-		  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
-		  (unsigned long long)*delta,
-		  (unsigned long long)*ts,
-		  (unsigned long long)cpu_buffer->write_stamp);
-
-	/*
-	 * The delta is too big, we to add a
-	 * new timestamp.
-	 */
-	event = __rb_reserve_next(cpu_buffer,
-				  RINGBUF_TYPE_TIME_EXTEND,
-				  RB_LEN_TIME_EXTEND,
-				  ts);
-	if (!event)
-		return -EBUSY;
-
-	if (PTR_ERR(event) == -EAGAIN)
-		return -EAGAIN;
-
-	/* Only a commited time event can update the write stamp */
-	if (rb_event_is_commit(cpu_buffer, event)) {
-		/*
-		 * If this is the first on the page, then it was
-		 * updated with the page itself. Try to discard it
-		 * and if we can't just make it zero.
-		 */
-		if (rb_event_index(event)) {
-			event->time_delta = *delta & TS_MASK;
-			event->array[0] = *delta >> TS_SHIFT;
-		} else {
-			/* try to discard, since we do not need this */
-			if (!rb_try_to_discard(cpu_buffer, event)) {
-				/* nope, just zero it */
-				event->time_delta = 0;
-				event->array[0] = 0;
-			}
-		}
-		cpu_buffer->write_stamp = *ts;
-		/* let the caller know this was the commit */
-		ret = 1;
-	} else {
-		/* Try to discard the event */
-		if (!rb_try_to_discard(cpu_buffer, event)) {
-			/* Darn, this is just wasted space */
-			event->time_delta = 0;
-			event->array[0] = 0;
-		}
-		ret = 0;
-	}
-
-	*delta = 0;
-
-	return ret;
-}
-
 static void rb_start_commit(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	local_inc(&cpu_buffer->committing);
@@ -2113,9 +2123,9 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 		      unsigned long length)
 {
 	struct ring_buffer_event *event;
-	u64 ts, delta = 0;
-	int commit = 0;
+	u64 ts, delta;
 	int nr_loops = 0;
+	int add_timestamp;
 
 	rb_start_commit(cpu_buffer);
 
@@ -2136,6 +2146,9 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 
 	length = rb_calculate_event_length(length);
  again:
+	add_timestamp = 0;
+	delta = 0;
+
 	/*
 	 * We allow for interrupts to reenter here and do a trace.
 	 * If one does, it will cause this original code to loop
@@ -2174,31 +2187,24 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 
 		delta = diff;
 		if (unlikely(test_time_stamp(delta))) {
-
-			commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
-			if (commit == -EBUSY)
-				goto out_fail;
-
-			if (commit == -EAGAIN)
-				goto again;
-
-			RB_WARN_ON(cpu_buffer, commit < 0);
+			WARN_ONCE(delta > (1ULL << 59),
+				  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
+				  (unsigned long long)delta,
+				  (unsigned long long)ts,
+				  (unsigned long long)cpu_buffer->write_stamp);
+			add_timestamp = 1;
 		}
 	}
 
  get_event:
-	event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
+	event = __rb_reserve_next(cpu_buffer, length, ts,
+				  delta, add_timestamp);
 	if (unlikely(PTR_ERR(event) == -EAGAIN))
 		goto again;
 
 	if (!event)
 		goto out_fail;
 
-	if (!rb_event_is_commit(cpu_buffer, event))
-		delta = 0;
-
-	event->time_delta = delta;
-
 	return event;
 
  out_fail:
@@ -2311,12 +2317,28 @@ static void
 rb_update_write_stamp(struct ring_buffer_per_cpu *cpu_buffer,
 		      struct ring_buffer_event *event)
 {
+	u64 delta;
+
 	/*
 	 * The event first in the commit queue updates the
 	 * time stamp.
 	 */
-	if (rb_event_is_commit(cpu_buffer, event))
-		cpu_buffer->write_stamp += event->time_delta;
+	if (rb_event_is_commit(cpu_buffer, event)) {
+		/*
+		 * A commit event that is first on a page
+		 * updates the write timestamp with the page stamp
+		 */
+		if (!rb_event_index(event))
+			cpu_buffer->write_stamp =
+				cpu_buffer->commit_page->page->time_stamp;
+		else if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
+			delta = event->array[0];
+			delta <<= TS_SHIFT;
+			delta += event->time_delta;
+			cpu_buffer->write_stamp += delta;
+		} else
+			cpu_buffer->write_stamp += event->time_delta;
+	}
 }
 
 static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
@@ -2356,6 +2378,9 @@ EXPORT_SYMBOL_GPL(ring_buffer_unlock_commit);
 
 static inline void rb_event_discard(struct ring_buffer_event *event)
 {
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+		event = skip_time_extend(event);
+
 	/* array[0] holds the actual length for the discarded event */
 	event->array[0] = rb_event_data_length(event) - RB_EVNT_HDR_SIZE;
 	event->type_len = RINGBUF_TYPE_PADDING;
@@ -2982,7 +3007,7 @@ static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
 
 	rb_update_read_stamp(cpu_buffer, event);
 
-	length = rb_event_length(event);
+	length = rb_event_real_length(event);
 	cpu_buffer->reader_page->read += length;
 }
 
@@ -3007,7 +3032,7 @@ static void rb_advance_iter(struct ring_buffer_iter *iter)
 
 	event = rb_iter_head_event(iter);
 
-	length = rb_event_length(event);
+	length = rb_event_real_length(event);
 
 	/*
 	 * This should not be called to advance the header if we are
@@ -3043,12 +3068,12 @@ rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts,
 
  again:
 	/*
-	 * We repeat when a timestamp is encountered. It is possible
-	 * to get multiple timestamps from an interrupt entering just
-	 * as one timestamp is about to be written, or from discarded
-	 * commits. The most that we can have is the number on a single page.
+	 * We repeat when a time extend is encountered.
+	 * Since the time extend is always attached to a data event,
+	 * we should never loop more than once.
+	 * (We never hit the following condition more than twice).
 	 */
-	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
+	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 2))
 		return NULL;
 
 	reader = rb_get_reader_page(cpu_buffer);
@@ -3124,14 +3149,12 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
 		return NULL;
 
 	/*
-	 * We repeat when a timestamp is encountered.
-	 * We can get multiple timestamps by nested interrupts or also
-	 * if filtering is on (discarding commits). Since discarding
-	 * commits can be frequent we can get a lot of timestamps.
-	 * But we limit them by not adding timestamps if they begin
-	 * at the start of a page.
+	 * We repeat when a time extend is encountered.
+	 * Since the time extend is always attached to a data event,
+	 * we should never loop more than once.
+	 * (We never hit the following condition more than twice).
 	 */
-	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
+	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 2))
 		return NULL;
 
 	if (rb_per_cpu_empty(cpu_buffer))
@@ -3829,7 +3852,8 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
 		if (len > (commit - read))
 			len = (commit - read);
 
-		size = rb_event_length(event);
+		/* Always keep the time extend and data together */
+		size = rb_event_ts_length(event);
 
 		if (len < size)
 			goto out_unlock;
@@ -3851,7 +3875,8 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
 				break;
 
 			event = rb_reader_event(cpu_buffer);
-			size = rb_event_length(event);
+			/* Always keep the time extend and data together */
+			size = rb_event_ts_length(event);
 		} while (len > size);
 
 		/* update bpage */
-- 
1.7.1




* [PATCH 3/5] ring-buffer: Remove condition to add timestamp in fast path
  2010-10-19 17:54 [PATCH 0/5] [GIT PULL] ring-buffer: Various optimizations Steven Rostedt
  2010-10-19 17:54 ` [PATCH 1/5] ring-buffer: Make write slow path out of line Steven Rostedt
  2010-10-19 17:54 ` [PATCH 2/5] ring-buffer: Bind time extend and data events together Steven Rostedt
@ 2010-10-19 17:54 ` Steven Rostedt
  2010-10-19 17:54 ` [PATCH 4/5] ring-buffer: Micro-optimize with some strategic inlining Steven Rostedt
  2010-10-19 17:54 ` [PATCH 5/5] ring-buffer: Remove unused macro RB_TIMESTAMPS_PER_PAGE Steven Rostedt
  4 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2010-10-19 17:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Steven Rostedt

[-- Attachment #1: 0003-ring-buffer-Remove-condition-to-add-timestamp-in-fas.patch --]
[-- Type: text/plain, Size: 2446 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

There's a condition in the fast path that checks whether we should
add a time extend. But this condition is racy (in the sense
that we can add an unnecessary time extend, but nothing that
can break anything). We later check, race-free, whether the time
delta should be zero or hold real data, making this first check
redundant.

This check may occasionally save space, but it is really not
worth the hassle of trying to save space that is wasted at most
once every 134 ms.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   28 ++++++----------------------
 1 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 4fc1577..9f55330 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2126,6 +2126,7 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 	u64 ts, delta;
 	int nr_loops = 0;
 	int add_timestamp;
+	u64 diff;
 
 	rb_start_commit(cpu_buffer);
 
@@ -2162,29 +2163,13 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 		goto out_fail;
 
 	ts = rb_time_stamp(cpu_buffer->buffer);
+	diff = ts - cpu_buffer->write_stamp;
 
-	/*
-	 * Only the first commit can update the timestamp.
-	 * Yes there is a race here. If an interrupt comes in
-	 * just after the conditional and it traces too, then it
-	 * will also check the deltas. More than one timestamp may
-	 * also be made. But only the entry that did the actual
-	 * commit will be something other than zero.
-	 */
-	if (likely(cpu_buffer->tail_page == cpu_buffer->commit_page &&
-		   rb_page_write(cpu_buffer->tail_page) ==
-		   rb_commit_index(cpu_buffer))) {
-		u64 diff;
-
-		diff = ts - cpu_buffer->write_stamp;
-
-		/* make sure this diff is calculated here */
-		barrier();
-
-		/* Did the write stamp get updated already? */
-		if (unlikely(ts < cpu_buffer->write_stamp))
-			goto get_event;
+	/* make sure this diff is calculated here */
+	barrier();
 
+	/* Did the write stamp get updated already? */
+	if (likely(ts >= cpu_buffer->write_stamp)) {
 		delta = diff;
 		if (unlikely(test_time_stamp(delta))) {
 			WARN_ONCE(delta > (1ULL << 59),
@@ -2196,7 +2181,6 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 		}
 	}
 
- get_event:
 	event = __rb_reserve_next(cpu_buffer, length, ts,
 				  delta, add_timestamp);
 	if (unlikely(PTR_ERR(event) == -EAGAIN))
-- 
1.7.1




* [PATCH 4/5] ring-buffer: Micro-optimize with some strategic inlining
  2010-10-19 17:54 [PATCH 0/5] [GIT PULL] ring-buffer: Various optimizations Steven Rostedt
                   ` (2 preceding siblings ...)
  2010-10-19 17:54 ` [PATCH 3/5] ring-buffer: Remove condition to add timestamp in fast path Steven Rostedt
@ 2010-10-19 17:54 ` Steven Rostedt
  2010-10-19 17:54 ` [PATCH 5/5] ring-buffer: Remove unused macro RB_TIMESTAMPS_PER_PAGE Steven Rostedt
  4 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2010-10-19 17:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Steven Rostedt

[-- Attachment #1: 0004-ring-buffer-Micro-optimize-with-some-strategic-inlin.patch --]
[-- Type: text/plain, Size: 1684 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

By using inline and noinline, we are able to make the fast path of
recording an event 4% faster.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9f55330..fda009a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2085,7 +2085,7 @@ static void rb_start_commit(struct ring_buffer_per_cpu *cpu_buffer)
 	local_inc(&cpu_buffer->commits);
 }
 
-static void rb_end_commit(struct ring_buffer_per_cpu *cpu_buffer)
+static inline void rb_end_commit(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	unsigned long commits;
 
@@ -2200,13 +2200,9 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 
 #define TRACE_RECURSIVE_DEPTH 16
 
-static int trace_recursive_lock(void)
+/* Keep this code out of the fast path cache */
+static noinline void trace_recursive_fail(void)
 {
-	current->trace_recursion++;
-
-	if (likely(current->trace_recursion < TRACE_RECURSIVE_DEPTH))
-		return 0;
-
 	/* Disable all tracing before we do anything else */
 	tracing_off_permanent();
 
@@ -2218,10 +2214,21 @@ static int trace_recursive_lock(void)
 		    in_nmi());
 
 	WARN_ON_ONCE(1);
+}
+
+static inline int trace_recursive_lock(void)
+{
+	current->trace_recursion++;
+
+	if (likely(current->trace_recursion < TRACE_RECURSIVE_DEPTH))
+		return 0;
+
+	trace_recursive_fail();
+
 	return -1;
 }
 
-static void trace_recursive_unlock(void)
+static inline void trace_recursive_unlock(void)
 {
 	WARN_ON_ONCE(!current->trace_recursion);
 
-- 
1.7.1




* [PATCH 5/5] ring-buffer: Remove unused macro RB_TIMESTAMPS_PER_PAGE
  2010-10-19 17:54 [PATCH 0/5] [GIT PULL] ring-buffer: Various optimizations Steven Rostedt
                   ` (3 preceding siblings ...)
  2010-10-19 17:54 ` [PATCH 4/5] ring-buffer: Micro-optimize with some strategic inlining Steven Rostedt
@ 2010-10-19 17:54 ` Steven Rostedt
  4 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2010-10-19 17:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Steven Rostedt

[-- Attachment #1: 0005-ring-buffer-Remove-unused-macro-RB_TIMESTAMPS_PER_PA.patch --]
[-- Type: text/plain, Size: 883 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

With the binding of time extends to events we no longer need to use
the macro RB_TIMESTAMPS_PER_PAGE. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fda009a..bd31d97 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -448,9 +448,6 @@ static inline int test_time_stamp(u64 delta)
 /* Max payload is BUF_PAGE_SIZE - header (8bytes) */
 #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
 
-/* Max number of timestamps that can fit on a page */
-#define RB_TIMESTAMPS_PER_PAGE	(BUF_PAGE_SIZE / RB_LEN_TIME_EXTEND)
-
 int ring_buffer_print_page_header(struct trace_seq *s)
 {
 	struct buffer_data_page field;
-- 
1.7.1




* Re: [PATCH 2/5] ring-buffer: Bind time extend and data events together
  2010-10-19 17:54 ` [PATCH 2/5] ring-buffer: Bind time extend and data events together Steven Rostedt
@ 2010-10-19 20:46   ` Thomas Gleixner
  2010-10-20 13:06     ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2010-10-19 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker

On Tue, 19 Oct 2010, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> When the time between two timestamps is greater than
> 2^27 nanosecs (~134 ms) a time extend event is added that extends
> the time difference to 59 bits (~18 years). This is due to
> events only having a 27 bit field to store time.
> 
> Currently this time extend is a separate event. We add it just before
> the event data that is being written to the buffer. But before
> the event data is committed, the event data can also be discarded (as
> with the case of filters). But because the time extend has already been
> committed, it will stay in the buffer.
> 
> If lots of events are being filtered and no event is being
> written, then every 134ms a time extend can be added to the buffer
> without any data attached. To keep from filling the entire buffer
> with time extends, a time extend will never be the first event
> in a page because the page timestamp can be used. Time extends can
> only fill the rest of a page with some data at the beginning.
> 
> This patch binds the time extend with the data. The difference here
> is that the time extend is not committed before the data is added.
> Instead, when a time extend is needed, the space reserved on
> the ring buffer is the time extend + the data event size. The
> time extend is added to the first part of the reserved block and
> the data is added to the second. The time extend event is passed
> back to the reserver, but since the reserver also uses a function
> to find the data portion of the reserved block, no changes to the
> ring buffer interface need to be made.
> 
> When a commit is discarded, we now remove both the time extend and
> the event. With this approach no more than one time extend can
> be in the buffer in a row. Data must always follow a time extend.
> 
> Thanks to Mathieu Desnoyers for suggesting this idea.
> 
> This patch also removes the static inline function
> ring_buffer_event_time_delta() since it had no users. I was
> going to update it to handle this change, but found that to be
> a waste of time, and just removed the function instead.
> 
> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  include/linux/ring_buffer.h |   12 --
>  kernel/trace/ring_buffer.c  |  283 +++++++++++++++++++++++--------------------
>  2 files changed, 154 insertions(+), 141 deletions(-)

Is there no way to make this change a bit more finegrained,
i.e. incremental patches ?

Thanks,

	tglx

> 
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 25b4f68..8d3a248 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -62,18 +62,6 @@ enum ring_buffer_type {
>  unsigned ring_buffer_event_length(struct ring_buffer_event *event);
>  void *ring_buffer_event_data(struct ring_buffer_event *event);
>  
> -/**
> - * ring_buffer_event_time_delta - return the delta timestamp of the event
> - * @event: the event to get the delta timestamp of
> - *
> - * The delta timestamp is the 27 bit timestamp since the last event.
> - */
> -static inline unsigned
> -ring_buffer_event_time_delta(struct ring_buffer_event *event)
> -{
> -	return event->time_delta;
> -}
> -
>  /*
>   * ring_buffer_discard_commit will remove an event that has not
>   *   ben committed yet. If this is used, then ring_buffer_unlock_commit
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 0b88df8..4fc1577 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -224,6 +224,9 @@ enum {
>  	RB_LEN_TIME_STAMP = 16,
>  };
>  
> +#define skip_time_extend(event) \
> +	((struct ring_buffer_event *)((char *)event + RB_LEN_TIME_EXTEND))
> +
>  static inline int rb_null_event(struct ring_buffer_event *event)
>  {
>  	return event->type_len == RINGBUF_TYPE_PADDING && !event->time_delta;
> @@ -250,7 +253,7 @@ rb_event_data_length(struct ring_buffer_event *event)
>  
>  /* inline for ring buffer fast paths */
>  static unsigned
> -rb_event_length(struct ring_buffer_event *event)
> +__rb_event_length(struct ring_buffer_event *event)
>  {
>  	switch (event->type_len) {
>  	case RINGBUF_TYPE_PADDING:
> @@ -274,13 +277,52 @@ rb_event_length(struct ring_buffer_event *event)
>  	return 0;
>  }
>  
> +/*
> + * Return total length of time extend and data,
> + *   or just the event length for all other events.
> + */
> +static unsigned
> +rb_event_ts_length(struct ring_buffer_event *event)
> +{
> +	unsigned len = 0;
> +
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
> +		/* time extends include the data event after it */
> +		len = RB_LEN_TIME_EXTEND;
> +		event = skip_time_extend(event);
> +	}
> +	return len + __rb_event_length(event);
> +}
> +
> +/*
> + * Return the length of the given event. Will return
> + * the length of the time extend if the event is a
> + * time extend.
> + */
> +static unsigned
> +rb_event_real_length(struct ring_buffer_event *event)
> +{
> +	return __rb_event_length(event);
> +}
> +
>  /**
>   * ring_buffer_event_length - return the length of the event
>   * @event: the event to get the length of
> + *
> + * Returns the size of the data load of a data event.
> + * If the event is something other than a data event, it
> + * returns the size of the event itself. With the exception
> + * of a TIME EXTEND, where it still returns the size of the
> + * data load of the data event after it.
>   */
>  unsigned ring_buffer_event_length(struct ring_buffer_event *event)
>  {
> -	unsigned length = rb_event_length(event);
> +	unsigned length;
> +
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
> +		event = skip_time_extend(event);
> +
> +	length = rb_event_real_length(event);
>  	if (event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
>  		return length;
>  	length -= RB_EVNT_HDR_SIZE;
> @@ -294,6 +336,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_length);
>  static void *
>  rb_event_data(struct ring_buffer_event *event)
>  {
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
> +		event = skip_time_extend(event);
>  	BUG_ON(event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
>  	/* If length is in len field, then array[0] has the data */
>  	if (event->type_len)
> @@ -1546,6 +1590,25 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
>  	iter->head = 0;
>  }
>  
> +/* Slow path, do not inline */
> +static noinline struct ring_buffer_event *
> +rb_add_time_stamp(struct ring_buffer_event *event, u64 delta)
> +{
> +	event->type_len = RINGBUF_TYPE_TIME_EXTEND;
> +
> +	/* Not the first event on the page? */
> +	if (rb_event_index(event)) {
> +		event->time_delta = delta & TS_MASK;
> +		event->array[0] = delta >> TS_SHIFT;
> +	} else {
> +		/* nope, just zero it */
> +		event->time_delta = 0;
> +		event->array[0] = 0;
> +	}
> +
> +	return skip_time_extend(event);
> +}
> +
>  /**
>   * ring_buffer_update_event - update event type and data
>   * @event: the even to update
> @@ -1558,28 +1621,31 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
>   * data field.
>   */
>  static void
> -rb_update_event(struct ring_buffer_event *event,
> -			 unsigned type, unsigned length)
> +rb_update_event(struct ring_buffer_per_cpu *cpu_buffer,
> +		struct ring_buffer_event *event, unsigned length,
> +		int add_timestamp, u64 delta)
>  {
> -	event->type_len = type;
> -
> -	switch (type) {
> -
> -	case RINGBUF_TYPE_PADDING:
> -	case RINGBUF_TYPE_TIME_EXTEND:
> -	case RINGBUF_TYPE_TIME_STAMP:
> -		break;
> +	/* Only a commit updates the timestamp */
> +	if (unlikely(!rb_event_is_commit(cpu_buffer, event)))
> +		delta = 0;
>  
> -	case 0:
> -		length -= RB_EVNT_HDR_SIZE;
> -		if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT)
> -			event->array[0] = length;
> -		else
> -			event->type_len = DIV_ROUND_UP(length, RB_ALIGNMENT);
> -		break;
> -	default:
> -		BUG();
> +	/*
> +	 * If we need to add a timestamp, then we
> +	 * add it to the start of the reserved space.
> +	 */
> +	if (unlikely(add_timestamp)) {
> +		event = rb_add_time_stamp(event, delta);
> +		length -= RB_LEN_TIME_EXTEND;
> +		delta = 0;
>  	}
> +
> +	event->time_delta = delta;
> +	length -= RB_EVNT_HDR_SIZE;
> +	if (length > RB_MAX_SMALL_DATA || RB_FORCE_8BYTE_ALIGNMENT) {
> +		event->type_len = 0;
> +		event->array[0] = length;
> +	} else
> +		event->type_len = DIV_ROUND_UP(length, RB_ALIGNMENT);
>  }
>  
>  /*
> @@ -1829,7 +1895,7 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  static noinline struct ring_buffer_event *
>  rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  	     unsigned long length, unsigned long tail,
> -	     struct buffer_page *tail_page, u64 *ts)
> +	     struct buffer_page *tail_page, u64 ts)
>  {
>  	struct buffer_page *commit_page = cpu_buffer->commit_page;
>  	struct ring_buffer *buffer = cpu_buffer->buffer;
> @@ -1912,8 +1978,8 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  		 * Nested commits always have zero deltas, so
>  		 * just reread the time stamp
>  		 */
> -		*ts = rb_time_stamp(buffer);
> -		next_page->page->time_stamp = *ts;
> +		ts = rb_time_stamp(buffer);
> +		next_page->page->time_stamp = ts;
>  	}
>  
>   out_again:
> @@ -1932,12 +1998,21 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
>  
>  static struct ring_buffer_event *
>  __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
> -		  unsigned type, unsigned long length, u64 *ts)
> +		  unsigned long length, u64 ts,
> +		  u64 delta, int add_timestamp)
>  {
>  	struct buffer_page *tail_page;
>  	struct ring_buffer_event *event;
>  	unsigned long tail, write;
>  
> +	/*
> +	 * If the time delta since the last event is too big to
> +	 * hold in the time field of the event, then we append a
> +	 * TIME EXTEND event ahead of the data event.
> +	 */
> +	if (unlikely(add_timestamp))
> +		length += RB_LEN_TIME_EXTEND;
> +
>  	tail_page = cpu_buffer->tail_page;
>  	write = local_add_return(length, &tail_page->write);
>  
> @@ -1954,18 +2029,16 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
>  
>  	event = __rb_page_index(tail_page, tail);
>  	kmemcheck_annotate_bitfield(event, bitfield);
> -	rb_update_event(event, type, length);
> +	rb_update_event(cpu_buffer, event, length, add_timestamp, delta);
>  
> -	/* The passed in type is zero for DATA */
> -	if (likely(!type))
> -		local_inc(&tail_page->entries);
> +	local_inc(&tail_page->entries);
>  
>  	/*
>  	 * If this is the first commit on the page, then update
>  	 * its timestamp.
>  	 */
>  	if (!tail)
> -		tail_page->page->time_stamp = *ts;
> +		tail_page->page->time_stamp = ts;
>  
>  	return event;
>  }
> @@ -1980,7 +2053,7 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
>  	unsigned long addr;
>  
>  	new_index = rb_event_index(event);
> -	old_index = new_index + rb_event_length(event);
> +	old_index = new_index + rb_event_ts_length(event);
>  	addr = (unsigned long)event;
>  	addr &= PAGE_MASK;
>  
> @@ -2006,69 +2079,6 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
>  	return 0;
>  }
>  
> -static int
> -rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
> -		  u64 *ts, u64 *delta)
> -{
> -	struct ring_buffer_event *event;
> -	int ret;
> -
> -	WARN_ONCE(*delta > (1ULL << 59),
> -		  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
> -		  (unsigned long long)*delta,
> -		  (unsigned long long)*ts,
> -		  (unsigned long long)cpu_buffer->write_stamp);
> -
> -	/*
> -	 * The delta is too big, we to add a
> -	 * new timestamp.
> -	 */
> -	event = __rb_reserve_next(cpu_buffer,
> -				  RINGBUF_TYPE_TIME_EXTEND,
> -				  RB_LEN_TIME_EXTEND,
> -				  ts);
> -	if (!event)
> -		return -EBUSY;
> -
> -	if (PTR_ERR(event) == -EAGAIN)
> -		return -EAGAIN;
> -
> -	/* Only a commited time event can update the write stamp */
> -	if (rb_event_is_commit(cpu_buffer, event)) {
> -		/*
> -		 * If this is the first on the page, then it was
> -		 * updated with the page itself. Try to discard it
> -		 * and if we can't just make it zero.
> -		 */
> -		if (rb_event_index(event)) {
> -			event->time_delta = *delta & TS_MASK;
> -			event->array[0] = *delta >> TS_SHIFT;
> -		} else {
> -			/* try to discard, since we do not need this */
> -			if (!rb_try_to_discard(cpu_buffer, event)) {
> -				/* nope, just zero it */
> -				event->time_delta = 0;
> -				event->array[0] = 0;
> -			}
> -		}
> -		cpu_buffer->write_stamp = *ts;
> -		/* let the caller know this was the commit */
> -		ret = 1;
> -	} else {
> -		/* Try to discard the event */
> -		if (!rb_try_to_discard(cpu_buffer, event)) {
> -			/* Darn, this is just wasted space */
> -			event->time_delta = 0;
> -			event->array[0] = 0;
> -		}
> -		ret = 0;
> -	}
> -
> -	*delta = 0;
> -
> -	return ret;
> -}
> -
>  static void rb_start_commit(struct ring_buffer_per_cpu *cpu_buffer)
>  {
>  	local_inc(&cpu_buffer->committing);
> @@ -2113,9 +2123,9 @@ rb_reserve_next_event(struct ring_buffer *buffer,
>  		      unsigned long length)
>  {
>  	struct ring_buffer_event *event;
> -	u64 ts, delta = 0;
> -	int commit = 0;
> +	u64 ts, delta;
>  	int nr_loops = 0;
> +	int add_timestamp;
>  
>  	rb_start_commit(cpu_buffer);
>  
> @@ -2136,6 +2146,9 @@ rb_reserve_next_event(struct ring_buffer *buffer,
>  
>  	length = rb_calculate_event_length(length);
>   again:
> +	add_timestamp = 0;
> +	delta = 0;
> +
>  	/*
>  	 * We allow for interrupts to reenter here and do a trace.
>  	 * If one does, it will cause this original code to loop
> @@ -2174,31 +2187,24 @@ rb_reserve_next_event(struct ring_buffer *buffer,
>  
>  		delta = diff;
>  		if (unlikely(test_time_stamp(delta))) {
> -
> -			commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
> -			if (commit == -EBUSY)
> -				goto out_fail;
> -
> -			if (commit == -EAGAIN)
> -				goto again;
> -
> -			RB_WARN_ON(cpu_buffer, commit < 0);
> +			WARN_ONCE(delta > (1ULL << 59),
> +				  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
> +				  (unsigned long long)delta,
> +				  (unsigned long long)ts,
> +				  (unsigned long long)cpu_buffer->write_stamp);
> +			add_timestamp = 1;
>  		}
>  	}
>  
>   get_event:
> -	event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
> +	event = __rb_reserve_next(cpu_buffer, length, ts,
> +				  delta, add_timestamp);
>  	if (unlikely(PTR_ERR(event) == -EAGAIN))
>  		goto again;
>  
>  	if (!event)
>  		goto out_fail;
>  
> -	if (!rb_event_is_commit(cpu_buffer, event))
> -		delta = 0;
> -
> -	event->time_delta = delta;
> -
>  	return event;
>  
>   out_fail:
> @@ -2311,12 +2317,28 @@ static void
>  rb_update_write_stamp(struct ring_buffer_per_cpu *cpu_buffer,
>  		      struct ring_buffer_event *event)
>  {
> +	u64 delta;
> +
>  	/*
>  	 * The event first in the commit queue updates the
>  	 * time stamp.
>  	 */
> -	if (rb_event_is_commit(cpu_buffer, event))
> -		cpu_buffer->write_stamp += event->time_delta;
> +	if (rb_event_is_commit(cpu_buffer, event)) {
> +		/*
> +		 * A commit event that is first on a page
> +		 * updates the write timestamp with the page stamp
> +		 */
> +		if (!rb_event_index(event))
> +			cpu_buffer->write_stamp =
> +				cpu_buffer->commit_page->page->time_stamp;
> +		else if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
> +			delta = event->array[0];
> +			delta <<= TS_SHIFT;
> +			delta += event->time_delta;
> +			cpu_buffer->write_stamp += delta;
> +		} else
> +			cpu_buffer->write_stamp += event->time_delta;
> +	}
>  }
>  
>  static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
> @@ -2356,6 +2378,9 @@ EXPORT_SYMBOL_GPL(ring_buffer_unlock_commit);
>  
>  static inline void rb_event_discard(struct ring_buffer_event *event)
>  {
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
> +		event = skip_time_extend(event);
> +
>  	/* array[0] holds the actual length for the discarded event */
>  	event->array[0] = rb_event_data_length(event) - RB_EVNT_HDR_SIZE;
>  	event->type_len = RINGBUF_TYPE_PADDING;
> @@ -2982,7 +3007,7 @@ static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
>  
>  	rb_update_read_stamp(cpu_buffer, event);
>  
> -	length = rb_event_length(event);
> +	length = rb_event_real_length(event);
>  	cpu_buffer->reader_page->read += length;
>  }
>  
> @@ -3007,7 +3032,7 @@ static void rb_advance_iter(struct ring_buffer_iter *iter)
>  
>  	event = rb_iter_head_event(iter);
>  
> -	length = rb_event_length(event);
> +	length = rb_event_real_length(event);
>  
>  	/*
>  	 * This should not be called to advance the header if we are
> @@ -3043,12 +3068,12 @@ rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts,
>  
>   again:
>  	/*
> -	 * We repeat when a timestamp is encountered. It is possible
> -	 * to get multiple timestamps from an interrupt entering just
> -	 * as one timestamp is about to be written, or from discarded
> -	 * commits. The most that we can have is the number on a single page.
> +	 * We repeat when a time extend is encountered.
> +	 * Since the time extend is always attached to a data event,
> +	 * we should never loop more than once.
> +	 * (We never hit the following condition more than twice).
>  	 */
> -	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
> +	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 2))
>  		return NULL;
>  
>  	reader = rb_get_reader_page(cpu_buffer);
> @@ -3124,14 +3149,12 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
>  		return NULL;
>  
>  	/*
> -	 * We repeat when a timestamp is encountered.
> -	 * We can get multiple timestamps by nested interrupts or also
> -	 * if filtering is on (discarding commits). Since discarding
> -	 * commits can be frequent we can get a lot of timestamps.
> -	 * But we limit them by not adding timestamps if they begin
> -	 * at the start of a page.
> +	 * We repeat when a time extend is encountered.
> +	 * Since the time extend is always attached to a data event,
> +	 * we should never loop more than once.
> +	 * (We never hit the following condition more than twice).
>  	 */
> -	if (RB_WARN_ON(cpu_buffer, ++nr_loops > RB_TIMESTAMPS_PER_PAGE))
> +	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 2))
>  		return NULL;
>  
>  	if (rb_per_cpu_empty(cpu_buffer))
> @@ -3829,7 +3852,8 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
>  		if (len > (commit - read))
>  			len = (commit - read);
>  
> -		size = rb_event_length(event);
> +		/* Always keep the time extend and data together */
> +		size = rb_event_ts_length(event);
>  
>  		if (len < size)
>  			goto out_unlock;
> @@ -3851,7 +3875,8 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
>  				break;
>  
>  			event = rb_reader_event(cpu_buffer);
> -			size = rb_event_length(event);
> +			/* Always keep the time extend and data together */
> +			size = rb_event_ts_length(event);
>  		} while (len > size);
>  
>  		/* update bpage */
> -- 
> 1.7.1
> 
> 


* Re: [PATCH 2/5] ring-buffer: Bind time extend and data events together
  2010-10-19 20:46   ` Thomas Gleixner
@ 2010-10-20 13:06     ` Steven Rostedt
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2010-10-20 13:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Frederic Weisbecker

On Tue, 2010-10-19 at 22:46 +0200, Thomas Gleixner wrote:
> On Tue, 19 Oct 2010, Steven Rostedt wrote:

> > Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  include/linux/ring_buffer.h |   12 --
> >  kernel/trace/ring_buffer.c  |  283 +++++++++++++++++++++++--------------------
> >  2 files changed, 154 insertions(+), 141 deletions(-)
> 
> Is there no way to make this change a bit more finegrained,
> i.e. incremental patches ?
> 
> Thanks,

There's probably a little I can do incrementally, but the main change
here is that the ring buffer can now return either a data event or a
time extend event (which has a data event appended to it).

Luckily, the interface to the ring buffer has all been encapsulated, so
the only changes needed are in ring_buffer.[ch]. Every interface that
takes an event type needs to check whether it is a data event or a time
extend, which is why the patch looks so big. But these changes are all
contained in this file.

But there are some cleanups this patch did that I can pull out and
submit separately.

Thanks,

-- Steve




end of thread, other threads:[~2010-10-20 13:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-19 17:54 [PATCH 0/5] [GIT PULL] ring-buffer: Various optimizations Steven Rostedt
2010-10-19 17:54 ` [PATCH 1/5] ring-buffer: Make write slow path out of line Steven Rostedt
2010-10-19 17:54 ` [PATCH 2/5] ring-buffer: Bind time extend and data events together Steven Rostedt
2010-10-19 20:46   ` Thomas Gleixner
2010-10-20 13:06     ` Steven Rostedt
2010-10-19 17:54 ` [PATCH 3/5] ring-buffer: Remove condition to add timestamp in fast path Steven Rostedt
2010-10-19 17:54 ` [PATCH 4/5] ring-buffer: Micro-optimize with some strategic inlining Steven Rostedt
2010-10-19 17:54 ` [PATCH 5/5] ring-buffer: Remove unused macro RB_TIMESTAMPS_PER_PAGE Steven Rostedt
