linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5] Add dynamic updates to trace ring buffer
@ 2011-07-26 22:59 Vaibhav Nagarnaik
  2011-07-26 22:59 ` [PATCH 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
                   ` (10 more replies)
  0 siblings, 11 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-26 22:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

These patches address the fact that events are sometimes generated at a
much higher rate on some CPUs than on others. This makes it inefficient
to allocate the same amount of memory to each of the per-cpu ring
buffers.

This patch series adds three things to achieve this:
* A way to measure the rate at which events are generated on a CPU. This
  is part of patch#2, which makes the 'stats' file print out the number
  of bytes in the ring buffer, the oldest time stamp ("head ts"), and
  the current time stamp ("now ts"). The rate is measured as:
  bytes / (now - head)

* The next patch, patch#3, adds the flexibility to assign different
  sizes to individual per-cpu ring buffers. This is done by adding a
  "buffer_size_kb" debugfs file entry under the per_cpu/* directories.

* The final two patches provide a way to change the size of the ring
  buffer while events are being added to it. Patch#4 adds the
  functionality to remove pages from the ring buffer and patch#5 adds
  the functionality to add pages to the ring buffer.

Patch#1 adds a debugfs entry "buffer_total_size_kb" which provides the
total memory allocated for the ring buffer.

This makes it easy for a user process to monitor the rate at which the
ring buffers are filling up and to update the individual per-cpu ring
buffer sizes in response.
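
For illustration, below is a minimal userspace sketch of such a monitor
(not part of this series). It assumes debugfs is mounted at
/sys/kernel/debug, that the per-cpu buffer_size_kb entries accept
writes, and uses an arbitrary 10-second poll interval and resize policy;
it only relies on the "bytes:", "head ts:" and "now ts:" fields added by
patch#2 and the per_cpu/*/buffer_size_kb files added by patch#3.

#include <stdio.h>
#include <unistd.h>

#define TRACE_DIR "/sys/kernel/debug/tracing"

/* Parse the "bytes:", "head ts:" and "now ts:" fields from the per-cpu
 * stats file. Timestamps are in seconds (sec.usec). */
static int read_stats(int cpu, double *bytes, double *head_ts, double *now_ts)
{
	char path[256], line[256];
	FILE *f;

	snprintf(path, sizeof(path), TRACE_DIR "/per_cpu/cpu%d/stats", cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		sscanf(line, "bytes: %lf", bytes);
		sscanf(line, "head ts: %lf", head_ts);
		sscanf(line, "now ts: %lf", now_ts);
	}
	fclose(f);
	return 0;
}

/* Resize one CPU's ring buffer (size in KB) through its per-cpu entry. */
static int resize_cpu(int cpu, unsigned long size_kb)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 TRACE_DIR "/per_cpu/cpu%d/buffer_size_kb", cpu);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%lu\n", size_kb);
	fclose(f);
	return 0;
}

int main(void)
{
	int cpu, ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	for (;;) {
		for (cpu = 0; cpu < ncpus; cpu++) {
			double bytes = 0, head = 0, now = 0, rate;

			if (read_stats(cpu, &bytes, &head, &now) || now <= head)
				continue;

			/* rate = bytes / (now ts - head ts), in bytes/sec */
			rate = bytes / (now - head);

			/* Arbitrary example policy: give a busy CPU 4 MB. */
			if (rate > 1024 * 1024)
				resize_cpu(cpu, 4096);
		}
		sleep(10);
	}
	return 0;
}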

Vaibhav Nagarnaik (5):
  trace: Add a new readonly entry to report total buffer size
  trace: Add ring buffer stats to measure rate of events
  trace: Add per_cpu ring buffer control files
  trace: Make removal of ring buffer pages atomic
  trace: Make addition of pages in ring buffer atomic

 include/linux/ring_buffer.h |    8 +-
 kernel/trace/ring_buffer.c  |  562 +++++++++++++++++++++++++++++++------------
 kernel/trace/trace.c        |  241 ++++++++++++++----
 kernel/trace/trace.h        |    2 +-
 4 files changed, 603 insertions(+), 210 deletions(-)

-- 
1.7.3.1



* [PATCH 1/5] trace: Add a new readonly entry to report total buffer size
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
@ 2011-07-26 22:59 ` Vaibhav Nagarnaik
  2011-07-29 18:01   ` Steven Rostedt
  2011-07-26 22:59 ` [PATCH 2/5] trace: Add ring buffer stats to measure rate of events Vaibhav Nagarnaik
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-26 22:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

The current file "buffer_size_kb" reports the size of a single per-cpu
buffer and not the overall memory allocated, which can be misleading. A
new file "buffer_total_size_kb" adds up the sizes of all the enabled CPU
buffers and reports the total. It is a read-only entry.
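
For example (illustrative numbers), on a 4-CPU machine where every
per-cpu buffer is 1408 KB, buffer_size_kb still reads 1408 while the new
buffer_total_size_kb reads 4 * 1408 = 5632.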

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/trace.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e5df02c..ce57c55 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3569,6 +3569,24 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 }
 
 static ssize_t
+tracing_total_entries_read(struct file *filp, char __user *ubuf,
+				size_t cnt, loff_t *ppos)
+{
+	struct trace_array *tr = filp->private_data;
+	char buf[64];
+	int r, cpu;
+	unsigned long size = 0;
+
+	mutex_lock(&trace_types_lock);
+	for_each_tracing_cpu(cpu)
+		size += tr->entries >> 10;
+	mutex_unlock(&trace_types_lock);
+
+	r = sprintf(buf, "%lu\n", size);
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
 tracing_free_buffer_write(struct file *filp, const char __user *ubuf,
 			  size_t cnt, loff_t *ppos)
 {
@@ -3739,6 +3757,12 @@ static const struct file_operations tracing_entries_fops = {
 	.llseek		= generic_file_llseek,
 };
 
+static const struct file_operations tracing_total_entries_fops = {
+	.open		= tracing_open_generic,
+	.read		= tracing_total_entries_read,
+	.llseek		= generic_file_llseek,
+};
+
 static const struct file_operations tracing_free_buffer_fops = {
 	.write		= tracing_free_buffer_write,
 	.release	= tracing_free_buffer_release,
@@ -4450,6 +4474,9 @@ static __init int tracer_init_debugfs(void)
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
 			&global_trace, &tracing_entries_fops);
 
+	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
+			&global_trace, &tracing_total_entries_fops);
+
 	trace_create_file("free_buffer", 0644, d_tracer,
 			&global_trace, &tracing_free_buffer_fops);
 
-- 
1.7.3.1



* [PATCH 2/5] trace: Add ring buffer stats to measure rate of events
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
  2011-07-26 22:59 ` [PATCH 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
@ 2011-07-26 22:59 ` Vaibhav Nagarnaik
  2011-07-29 18:10   ` Steven Rostedt
  2011-07-26 22:59 ` [PATCH 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-26 22:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

The stats file under the per_cpu folder provides the number of entries,
overruns and other statistics about the CPU ring buffer. However, those
numbers give no indication of how full the ring buffer is in bytes
compared to its overall size in bytes. It is also helpful to know the
rate at which the CPU buffer is filling up.

This patch adds a "bytes: " entry to the printed stats for the per-cpu
ring buffer, reporting the actual number of bytes consumed in the ring
buffer. This field includes the bytes used by recorded events and the
padding bytes added when the tail pointer is moved to the next page.

It also adds the following time stamps:
"head ts:" - the oldest timestamp in the ring buffer
"now ts:"  - the timestamp at the time of reading

The field "now ts" provides a consistent time snapshot to the userspace
when being read. This is read from the same trace clock used by tracing
event timestamps.

Together, these values provide the rate at which the buffer is filling
up, from the formula:
bytes / (now_ts - head_ts)
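
For example (hypothetical values), if a per-cpu stats read shows
"bytes: 1048576", "head ts: 100.000000" and "now ts: 102.000000", the
buffer has accumulated 1 MB of events in 2 seconds, i.e. it is filling
up at roughly 512 KB per second.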

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 include/linux/ring_buffer.h |    2 +
 kernel/trace/ring_buffer.c  |   70 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/trace.c        |   13 ++++++++
 3 files changed, 84 insertions(+), 1 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b891de9..af635cf 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -154,6 +154,8 @@ void ring_buffer_record_enable(struct ring_buffer *buffer);
 void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
 
+unsigned long ring_buffer_head_ts(struct ring_buffer *buffer, int cpu);
+unsigned long ring_buffer_bytes_cpu(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_entries(struct ring_buffer *buffer);
 unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
 unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 731201b..3e49dde 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -488,12 +488,14 @@ struct ring_buffer_per_cpu {
 	struct buffer_page		*reader_page;
 	unsigned long			lost_events;
 	unsigned long			last_overrun;
+	local_t				entries_bytes;
 	local_t				commit_overrun;
 	local_t				overrun;
 	local_t				entries;
 	local_t				committing;
 	local_t				commits;
 	unsigned long			read;
+	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
 };
@@ -1708,6 +1710,7 @@ rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer,
 		 * the counters.
 		 */
 		local_add(entries, &cpu_buffer->overrun);
+		local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
 
 		/*
 		 * The entries will be zeroed out when we move the
@@ -1863,6 +1866,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	event = __rb_page_index(tail_page, tail);
 	kmemcheck_annotate_bitfield(event, bitfield);
 
+	/* account for padding bytes */
+	local_add(BUF_PAGE_SIZE - tail, &cpu_buffer->entries_bytes);
+
 	/*
 	 * Save the original length to the meta data.
 	 * This will be used by the reader to add lost event
@@ -2054,6 +2060,9 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	if (!tail)
 		tail_page->page->time_stamp = ts;
 
+	/* account for these added bytes */
+	local_add(length, &cpu_buffer->entries_bytes);
+
 	return event;
 }
 
@@ -2076,6 +2085,7 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
 	if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
 		unsigned long write_mask =
 			local_read(&bpage->write) & ~RB_WRITE_MASK;
+		unsigned long event_length = rb_event_length(event);
 		/*
 		 * This is on the tail page. It is possible that
 		 * a write could come in and move the tail page
@@ -2085,8 +2095,11 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
 		old_index += write_mask;
 		new_index += write_mask;
 		index = local_cmpxchg(&bpage->write, old_index, new_index);
-		if (index == old_index)
+		if (index == old_index) {
+			/* update counters */
+			local_sub(event_length, &cpu_buffer->entries_bytes);
 			return 1;
+		}
 	}
 
 	/* could not discard */
@@ -2661,6 +2674,58 @@ rb_num_of_entries(struct ring_buffer_per_cpu *cpu_buffer)
 }
 
 /**
+ * ring_buffer_head_ts - get the oldest event timestamp from the buffer
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to read from.
+ */
+unsigned long ring_buffer_head_ts(struct ring_buffer *buffer, int cpu)
+{
+	unsigned long flags;
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct buffer_page *bpage;
+	unsigned long ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+	/*
+	 * if the tail is on reader_page, oldest time stamp is on the reader
+	 * page
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		bpage = cpu_buffer->reader_page;
+	else
+		bpage = rb_set_head_page(cpu_buffer);
+	ret = bpage->page->time_stamp;
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_head_ts);
+
+/**
+ * ring_buffer_bytes_cpu - get the number of bytes consumed in a cpu buffer
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to read from.
+ */
+unsigned long ring_buffer_bytes_cpu(struct ring_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	unsigned long ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	ret = local_read(&cpu_buffer->entries_bytes) - cpu_buffer->read_bytes;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_bytes_cpu);
+
+/**
  * ring_buffer_entries_cpu - get the number of entries in a cpu buffer
  * @buffer: The ring buffer
  * @cpu: The per CPU buffer to get the entries from.
@@ -3527,11 +3592,13 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->reader_page->read = 0;
 
 	local_set(&cpu_buffer->commit_overrun, 0);
+	local_set(&cpu_buffer->entries_bytes, 0);
 	local_set(&cpu_buffer->overrun, 0);
 	local_set(&cpu_buffer->entries, 0);
 	local_set(&cpu_buffer->committing, 0);
 	local_set(&cpu_buffer->commits, 0);
 	cpu_buffer->read = 0;
+	cpu_buffer->read_bytes = 0;
 
 	cpu_buffer->write_stamp = 0;
 	cpu_buffer->read_stamp = 0;
@@ -3918,6 +3985,7 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
 	} else {
 		/* update the entry counter */
 		cpu_buffer->read += rb_page_entries(reader);
+		cpu_buffer->read_bytes += BUF_PAGE_SIZE;
 
 		/* swap the pages */
 		rb_init_page(bpage);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ce57c55..f5b95e9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4050,6 +4050,8 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
 	struct trace_array *tr = &global_trace;
 	struct trace_seq *s;
 	unsigned long cnt;
+	unsigned long long t;
+	unsigned long usec_rem;
 
 	s = kmalloc(sizeof(*s), GFP_KERNEL);
 	if (!s)
@@ -4066,6 +4068,17 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
 	cnt = ring_buffer_commit_overrun_cpu(tr->buffer, cpu);
 	trace_seq_printf(s, "commit overrun: %ld\n", cnt);
 
+	cnt = ring_buffer_bytes_cpu(tr->buffer, cpu);
+	trace_seq_printf(s, "bytes: %ld\n", cnt);
+
+	t = ns2usecs(ring_buffer_head_ts(tr->buffer, cpu));
+	usec_rem = do_div(t, USEC_PER_SEC);
+	trace_seq_printf(s, "head ts: %5llu.%06lu\n", t, usec_rem);
+
+	t = ns2usecs(ring_buffer_time_stamp(tr->buffer, cpu));
+	usec_rem = do_div(t, USEC_PER_SEC);
+	trace_seq_printf(s, "now ts: %5llu.%06lu\n", t, usec_rem);
+
 	count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->len);
 
 	kfree(s);
-- 
1.7.3.1



* [PATCH 3/5] trace: Add per_cpu ring buffer control files
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
  2011-07-26 22:59 ` [PATCH 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
  2011-07-26 22:59 ` [PATCH 2/5] trace: Add ring buffer stats to measure rate of events Vaibhav Nagarnaik
@ 2011-07-26 22:59 ` Vaibhav Nagarnaik
  2011-07-29 18:14   ` Steven Rostedt
  2011-07-26 22:59 ` [PATCH 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-26 22:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

Add a debugfs entry called buffer_size_kb under the per_cpu/ folder for
each CPU, to control the ring buffer size of each CPU independently.

If the global buffer_size_kb file is used to set the size, all the
individual ring buffers are adjusted to the given size, and
buffer_size_kb reports that common size to maintain backward
compatibility.

If the buffer_size_kb file under a per_cpu/ directory is used to change
the buffer size of a specific CPU, only that CPU's ring buffer is
resized. When tracing/buffer_size_kb is read, it reports the ring buffer
sizes of all the CPUs at that point.
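
For example (illustrative values), writing 4096 to
tracing/per_cpu/cpu1/buffer_size_kb resizes only CPU 1's ring buffer to
4 MB, while writing 1024 to the top-level tracing/buffer_size_kb brings
every CPU's ring buffer back to a common 1 MB size.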

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 include/linux/ring_buffer.h |    6 +-
 kernel/trace/ring_buffer.c  |  221 +++++++++++++++++++++++--------------------
 kernel/trace/trace.c        |  185 +++++++++++++++++++++++++++++-------
 kernel/trace/trace.h        |    2 +-
 4 files changed, 272 insertions(+), 142 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index af635cf..1e6ce34 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+#define RING_BUFFER_ALL_CPUS -1
+
 void ring_buffer_free(struct ring_buffer *buffer);
 
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
 
 void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
 
@@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
 void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
 int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
 
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
 
 void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_reset(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3e49dde..83450c9 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
 	spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
+	unsigned int			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
 	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
+	/* ring buffer pages to update, > 0 to add, < 0 to remove */
+	int				nr_pages_to_update;
+	struct list_head		new_pages; /* new pages to add */
 };
 
 struct ring_buffer {
-	unsigned			pages;
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
@@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
+static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
 {
+	int i;
 	struct buffer_page *bpage, *tmp;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL | __GFP_NORETRY,
-				    cpu_to_node(cpu_buffer->cpu));
+				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
 
-		rb_check_bpage(cpu_buffer, bpage);
-
-		list_add(&bpage->list, &pages);
+		list_add(&bpage->list, pages);
 
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
 		if (!page)
 			goto free_pages;
@@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_init_page(bpage->page);
 	}
 
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+
+	return -ENOMEM;
+}
+
+static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	LIST_HEAD(pages);
+
+	WARN_ON(!nr_pages);
+
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+		return -ENOMEM;
+
 	/*
 	 * The ring buffer page list is a circular list that does not
 	 * start and end with a list head. All page list items point to
@@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);
 
+	cpu_buffer->nr_pages = nr_pages;
+
 	rb_check_pages(cpu_buffer);
 
 	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
 }
 
 static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
+rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
@@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	ret = rb_allocate_pages(cpu_buffer, nr_pages);
 	if (ret < 0)
 		goto fail_free_reader;
 
@@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 {
 	struct ring_buffer *buffer;
 	int bsize;
-	int cpu;
+	int cpu, nr_pages;
 
 	/* keep it in its own cache line */
 	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
 
 	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
+	if (nr_pages < 2)
+		nr_pages = 2;
 
 	/*
 	 * In case of non-hotplug cpu, if the ring-buffer is allocated
@@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 
 	for_each_buffer_cpu(buffer, cpu) {
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu])
 			goto fail_free_buffers;
 	}
@@ -1308,6 +1321,17 @@ out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
+static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (cpu_buffer->nr_pages_to_update > 0)
+		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
+				cpu_buffer->nr_pages_to_update);
+	else
+		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+	/* reset this value */
+	cpu_buffer->nr_pages_to_update = 0;
+}
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -1317,14 +1341,12 @@ out:
  *
  * Returns -1 on failure.
  */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
+			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	LIST_HEAD(pages);
-	int i, cpu;
+	unsigned nr_pages;
+	int cpu;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1334,15 +1356,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
 
 	/* we need a minimum of two pages */
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	if (size == buffer_size)
-		return size;
-
 	atomic_inc(&buffer->record_disabled);
 
 	/* Make sure all writers are done with this buffer. */
@@ -1353,68 +1371,59 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	if (size < buffer_size) {
+	if (cpu_id == RING_BUFFER_ALL_CPUS) {
+		/* calculate the pages to update */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+
+			cpu_buffer->nr_pages_to_update = nr_pages -
+							cpu_buffer->nr_pages;
 
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
+			/*
+			 * nothing more to do for removing pages or no update
+			 */
+			if (cpu_buffer->nr_pages_to_update <= 0)
+				continue;
 
-		rm_pages = buffer->pages - nr_pages;
+			/*
+			 * to add pages, make sure all new pages can be
+			 * allocated without receiving ENOMEM
+			 */
+			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu))
+				/* not enough memory for new pages */
+				goto no_mem;
+		}
 
+		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
+			if (cpu_buffer->nr_pages_to_update) {
+				update_pages_handler(cpu_buffer);
+				cpu_buffer->nr_pages = nr_pages;
+			}
 		}
-		goto out;
-	}
+	} else {
+		cpu_buffer = buffer->buffers[cpu_id];
+		if (nr_pages == cpu_buffer->nr_pages)
+			goto out;
 
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
+		cpu_buffer->nr_pages_to_update = nr_pages -
+						cpu_buffer->nr_pages;
 
-	new_pages = nr_pages - buffer->pages;
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (cpu_buffer->nr_pages_to_update > 0 &&
+			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu_id))
+			goto no_mem;
 
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			struct page *page;
-			/*
-			 * __GFP_NORETRY flag makes sure that the allocation
-			 * fails gracefully without invoking oom-killer and
-			 * the system is not destabilized.
-			 */
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL | __GFP_NORETRY,
-					    cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			page = alloc_pages_node(cpu_to_node(cpu),
-						GFP_KERNEL | __GFP_NORETRY, 0);
-			if (!page)
-				goto free_pages;
-			bpage->page = page_address(page);
-			rb_init_page(bpage->page);
-		}
-	}
+		update_pages_handler(cpu_buffer);
 
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
+		cpu_buffer->nr_pages = nr_pages;
 	}
 
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
-
  out:
-	buffer->pages = nr_pages;
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 
@@ -1422,25 +1431,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	return size;
 
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+ no_mem:
+	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_page *bpage, *tmp;
+		cpu_buffer = buffer->buffers[cpu];
+		/* reset this number regardless */
+		cpu_buffer->nr_pages_to_update = 0;
+		if (list_empty(&cpu_buffer->new_pages))
+			continue;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1542,7 +1550,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 	 * assign the commit to the tail.
 	 */
  again:
-	max_count = cpu_buffer->buffer->pages * 100;
+	max_count = cpu_buffer->nr_pages * 100;
 
 	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
 		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
@@ -3563,9 +3571,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
  * ring_buffer_size - return the size of the ring buffer (in bytes)
  * @buffer: The ring buffer.
  */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
 {
-	return BUF_PAGE_SIZE * buffer->pages;
+	/*
+	 * Earlier, this method returned
+	 *	BUF_PAGE_SIZE * buffer->nr_pages
+	 * Since the nr_pages field is now removed, we have converted this to
+	 * return the per cpu buffer value.
+	 */
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f5b95e9..8ea48e0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2853,7 +2853,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
 	return t->init(tr);
 }
 
-static int __tracing_resize_ring_buffer(unsigned long size)
+static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+{
+	int cpu;
+	for_each_tracing_cpu(cpu)
+		tr->data[cpu]->entries = val;
+}
+
+static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 {
 	int ret;
 
@@ -2864,19 +2871,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 	 */
 	ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
 	if (!current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
 	if (ret < 0) {
-		int r;
+		int r = 0;
+
+		if (cpu == RING_BUFFER_ALL_CPUS) {
+			int i;
+			for_each_tracing_cpu(i) {
+				r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[i]->entries,
+						i);
+				if (r < 0)
+					break;
+			}
+		} else {
+			r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+		}
 
-		r = ring_buffer_resize(global_trace.buffer,
-				       global_trace.entries);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
@@ -2898,14 +2918,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 		return ret;
 	}
 
-	max_tr.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&max_tr, size);
+	else
+		max_tr.data[cpu]->entries = size;
+
  out:
-	global_trace.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&global_trace, size);
+	else
+		global_trace.data[cpu]->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size)
+static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
 	int cpu, ret = size;
 
@@ -2921,12 +2948,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
 			atomic_inc(&max_tr.data[cpu]->disabled);
 	}
 
-	if (size != global_trace.entries)
-		ret = __tracing_resize_ring_buffer(size);
+	if (cpu_id != RING_BUFFER_ALL_CPUS) {
+		/* make sure, this cpu is enabled in the mask */
+		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
 
+	ret = __tracing_resize_ring_buffer(size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
+out:
 	for_each_tracing_cpu(cpu) {
 		if (global_trace.data[cpu])
 			atomic_dec(&global_trace.data[cpu]->disabled);
@@ -2957,7 +2991,8 @@ int tracing_update_buffers(void)
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -2981,7 +3016,8 @@ static int tracing_set_tracer(const char *buf)
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3007,8 +3043,8 @@ static int tracing_set_tracer(const char *buf)
 		 * The max_tr ring buffer has some state (e.g. ring->clock) and
 		 * we want preserve it.
 		 */
-		ring_buffer_resize(max_tr.buffer, 1);
-		max_tr.entries = 1;
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
 	}
 	destroy_trace_option_files(topts);
 
@@ -3016,10 +3052,17 @@ static int tracing_set_tracer(const char *buf)
 
 	topts = create_trace_option_files(current_trace);
 	if (current_trace->use_max_tr) {
-		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
-		if (ret < 0)
-			goto out;
-		max_tr.entries = global_trace.entries;
+		int cpu;
+		/* we need to make per cpu buffer sizes equivalent */
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				goto out;
+			max_tr.data[cpu]->entries =
+					global_trace.data[cpu]->entries;
+		}
 	}
 
 	if (t->init) {
@@ -3521,30 +3564,82 @@ out_err:
 	goto out;
 }
 
+struct ftrace_entries_info {
+	struct trace_array	*tr;
+	int			cpu;
+};
+
+static int tracing_entries_open(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info;
+
+	if (tracing_disabled)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->tr = &global_trace;
+	info->cpu = (unsigned long)inode->i_private;
+
+	filp->private_data = info;
+
+	return 0;
+}
+
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct trace_array *tr = filp->private_data;
-	char buf[96];
-	int r;
+	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_array *tr = info->tr;
+	char buf[64];
+	int r = 0;
+	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		r = sprintf(buf, "%lu (expanded: %lu)\n",
-			    tr->entries >> 10,
-			    trace_buf_size >> 10);
-	else
-		r = sprintf(buf, "%lu\n", tr->entries >> 10);
+
+	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+		int cpu, buf_size_same;
+		unsigned long size;
+
+		size = 0;
+		buf_size_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_tracing_cpu(cpu) {
+			/* fill in the size from first enabled cpu */
+			if (size == 0)
+				size = tr->data[cpu]->entries;
+			if (size != tr->data[cpu]->entries) {
+				buf_size_same = 0;
+				break;
+			}
+		}
+
+		if (buf_size_same) {
+			if (!ring_buffer_expanded)
+				r = sprintf(buf, "%lu (expanded: %lu)\n",
+					    size >> 10,
+					    trace_buf_size >> 10);
+			else
+				r = sprintf(buf, "%lu\n", size >> 10);
+		} else
+			r = sprintf(buf, "X\n");
+	} else
+		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+
 	mutex_unlock(&trace_types_lock);
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	return ret;
 }
 
 static ssize_t
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
+	struct ftrace_entries_info *info = filp->private_data;
 	unsigned long val;
 	int ret;
 
@@ -3559,7 +3654,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val);
+	ret = tracing_resize_ring_buffer(val, info->cpu);
 	if (ret < 0)
 		return ret;
 
@@ -3568,6 +3663,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static int
+tracing_entries_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info = filp->private_data;
+
+	kfree(info);
+
+	return 0;
+}
+
 static ssize_t
 tracing_total_entries_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos)
@@ -3579,7 +3684,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu)
-		size += tr->entries >> 10;
+		size += tr->data[cpu]->entries >> 10;
 	mutex_unlock(&trace_types_lock);
 
 	r = sprintf(buf, "%lu\n", size);
@@ -3607,7 +3712,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
 		tracing_off();
 	/* resize the ring buffer to 0 */
-	tracing_resize_ring_buffer(0);
+	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
 
 	return 0;
 }
@@ -3751,9 +3856,10 @@ static const struct file_operations tracing_pipe_fops = {
 };
 
 static const struct file_operations tracing_entries_fops = {
-	.open		= tracing_open_generic,
+	.open		= tracing_entries_open,
 	.read		= tracing_entries_read,
 	.write		= tracing_entries_write,
+	.release	= tracing_entries_release,
 	.llseek		= generic_file_llseek,
 };
 
@@ -4205,6 +4311,9 @@ static void tracing_init_debugfs_percpu(long cpu)
 
 	trace_create_file("stats", 0444, d_cpu,
 			(void *) cpu, &tracing_stats_fops);
+
+	trace_create_file("buffer_size_kb", 0444, d_cpu,
+			(void *) cpu, &tracing_entries_fops);
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
@@ -4485,7 +4594,7 @@ static __init int tracer_init_debugfs(void)
 			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
 
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
-			&global_trace, &tracing_entries_fops);
+			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
 
 	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
 			&global_trace, &tracing_total_entries_fops);
@@ -4731,8 +4840,6 @@ __init static int tracer_alloc_buffers(void)
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
-
 
 #ifdef CONFIG_TRACER_MAX_TRACE
 	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
@@ -4742,7 +4849,6 @@ __init static int tracer_alloc_buffers(void)
 		ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = 1;
 #endif
 
 	/* Allocate the first page for all buffers */
@@ -4751,6 +4857,11 @@ __init static int tracer_alloc_buffers(void)
 		max_tr.data[i] = &per_cpu(max_tr_data, i);
 	}
 
+	set_buffer_entries(&global_trace, ring_buf_size);
+#ifdef CONFIG_TRACER_MAX_TRACE
+	set_buffer_entries(&max_tr, 1);
+#endif
+
 	trace_init_cmdlines();
 
 	register_tracer(&nop_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3f381d0..97ee70f 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -125,6 +125,7 @@ struct trace_array_cpu {
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
 
+	unsigned long		entries;
 	unsigned long		saved_latency;
 	unsigned long		critical_start;
 	unsigned long		critical_end;
@@ -146,7 +147,6 @@ struct trace_array_cpu {
  */
 struct trace_array {
 	struct ring_buffer	*buffer;
-	unsigned long		entries;
 	int			cpu;
 	cycle_t			time_start;
 	struct task_struct	*waiter;
-- 
1.7.3.1



* [PATCH 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (2 preceding siblings ...)
  2011-07-26 22:59 ` [PATCH 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
@ 2011-07-26 22:59 ` Vaibhav Nagarnaik
  2011-07-29 21:23   ` Steven Rostedt
  2011-07-26 22:59 ` [PATCH 5/5] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-26 22:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to remove pages from a ring buffer
atomically while write operations are going on. This makes it possible
to reduce the ring buffer size without losing the latest events in the
ring buffer.

This is done by removing the page that follows the tail page. This
ensures that all the empty pages in the ring buffer are removed first.
If the page after the tail page happens to be the head page, then that
page is removed and the head page is advanced to the next page. This
drops the oldest data from the ring buffer while keeping the latest data
around to be read.
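
As a rough standalone illustration of the removal step (a simplified
model, not the kernel code, which additionally has to handle the HEAD
flag stored in the low bits of the next pointer and the reader page),
unlinking the page that follows the tail with a cmpxchg and a bounded
number of retries looks roughly like this:

#include <stdio.h>

/* Simplified circular page list. The real buffer_page also carries the
 * data page and encodes list flags in the low bits of the next pointer. */
struct page_node {
	struct page_node *next;
	struct page_node *prev;
	int id;
};

/*
 * Try to unlink the page that follows 'tail'. A concurrent writer may move
 * tail->next under us, in which case the cmpxchg fails and we retry with a
 * fresh snapshot, giving up after three attempts.
 */
static struct page_node *remove_page_after(struct page_node *tail)
{
	int retries = 3;

	while (retries--) {
		struct page_node *to_remove = tail->next;
		struct page_node *next_page = to_remove->next;

		/* Unlink only if nobody changed tail->next in the meantime. */
		if (__sync_bool_compare_and_swap(&tail->next,
						 to_remove, next_page)) {
			next_page->prev = tail;	/* complete the unlink */
			return to_remove;	/* caller frees the page */
		}
	}
	return NULL;	/* too much contention, give up */
}

int main(void)
{
	struct page_node pages[4];
	struct page_node *removed;
	int i;

	/* Build a 4-page circular list: 0 -> 1 -> 2 -> 3 -> 0 */
	for (i = 0; i < 4; i++) {
		pages[i].id = i;
		pages[i].next = &pages[(i + 1) % 4];
		pages[i].prev = &pages[(i + 3) % 4];
	}

	/* Treat page 0 as the tail page: page 1 is the one removed. */
	removed = remove_page_after(&pages[0]);
	if (removed)
		printf("removed page %d, tail now points to page %d\n",
		       removed->id, pages[0].next->id);
	return 0;
}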

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/ring_buffer.c |  214 ++++++++++++++++++++++++++++++++++---------
 kernel/trace/trace.c       |   20 +----
 2 files changed, 170 insertions(+), 64 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 83450c9..0b43758 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -502,6 +504,8 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
@@ -1080,6 +1084,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1267,29 +1273,142 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
 rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *tail_page;
+	unsigned int nr_removed;
+	int retries, page_entries;
+	struct list_head *to_remove, *next_page, *ret;
+	struct buffer_page *to_remove_page, *next_buffer_page;
 
 	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock.
+	 * The only race would be with the writer in 2 conditions:
+	 * 1. When moving to the new page to write (not head)
+	 * 2. When moving to the head page
+	 * In both these cases, we make sure that if we get any failures, we
+	 * pick the next page available and continue the delete operation.
+	 */
+	nr_removed = 0;
+	retries = 3;
+	while (nr_removed < nr_pages) {
 
-	for (i = 0; i < nr_pages; i++) {
 		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
 			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
 
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
+		/*
+		 * Always get the fresh copy, the writer might have moved the
+		 * tail page while we are in this operation
+		 */
+		tail_page = &cpu_buffer->tail_page->list;
+		/*
+		 * tail page might be on reader page, we remove the next page
+		 * from the ring buffer
+		 */
+		if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+			tail_page = rb_list_head(tail_page->next);
+		to_remove = tail_page->next;
+		next_page = rb_list_head(to_remove)->next;
+
+		ret = NULL;
+		if (((unsigned long)to_remove & RB_FLAG_MASK) == RB_PAGE_HEAD) {
+			/*
+			 * this is a head page, we have to set RB_PAGE_HEAD
+			 * flag while updating the next pointer
+			 */
+			unsigned long tmp = (unsigned long)next_page |
+							RB_PAGE_HEAD;
+			ret = cmpxchg(&tail_page->next, to_remove,
+					(struct list_head *) tmp);
+
+		} else if (((unsigned long)to_remove & ~RB_PAGE_HEAD) ==
+					(unsigned long)to_remove) {
+
+			/* not a head page, just update the next pointer */
+			ret = cmpxchg(&tail_page->next, to_remove, next_page);
+
+		} else {
+			/*
+			 * this means that this page is being operated on
+			 * try the next page in the list
+			 */
+		}
+
+		if (ret != to_remove) {
+			/*
+			 * Well, try again with the next page.
+			 * If we cannot move the page in 3 retries, there are
+			 * lot of interrupts on this cpu and probably causing
+			 * some weird behavior. Warn in this case and stop
+			 * tracing
+			 */
+			if (RB_WARN_ON(cpu_buffer, !retries--))
+				break;
+			else
+				continue;
+		}
+
+		/*
+		 * point the next_page->prev to skip the to_remove page to
+		 * complete the removal process
+		 */
+		rb_list_head(next_page)->prev = rb_list_head(to_remove)->prev;
+
+		/* yay, we removed the page */
+		nr_removed++;
+		/* for the next page to remove, reset retry counter */
+		retries = 3;
+
+		to_remove_page = list_entry(rb_list_head(to_remove),
+					struct buffer_page, list);
+		next_buffer_page = list_entry(rb_list_head(next_page),
+					struct buffer_page, list);
+
+		if (cpu_buffer->head_page == to_remove_page) {
+			/*
+			 * update head page and change read pointer to make
+			 * sure any read iterators reset themselves
+			 */
+			cpu_buffer->head_page = next_buffer_page;
+			cpu_buffer->read = 0;
+		}
+		/* Also, update the start of cpu_buffer to keep it valid */
+		cpu_buffer->pages = rb_list_head(next_page);
+
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		free_buffer_page(to_remove_page);
+	}
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
 
 out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
@@ -1303,6 +1422,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	struct list_head *p;
 	unsigned i;
 
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	/* Make sure all writers are done with this buffer. */
+	synchronize_sched();
+
 	spin_lock_irq(&cpu_buffer->reader_lock);
 	rb_head_page_deactivate(cpu_buffer);
 
@@ -1319,17 +1444,21 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
+	atomic_dec(&cpu_buffer->record_disabled);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void update_pages_handler(struct work_struct *work)
 {
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1361,15 +1490,10 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
-
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
@@ -1396,13 +1520,23 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 				goto no_mem;
 		}
 
+		/* fire off all the required work handlers */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
+		}
+
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-				cpu_buffer->nr_pages = nr_pages;
-			}
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			wait_for_completion(&cpu_buffer->update_completion);
+			cpu_buffer->nr_pages = nr_pages;
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
@@ -1418,36 +1552,36 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 						&cpu_buffer->new_pages, cpu_id))
 			goto no_mem;
 
-		update_pages_handler(cpu_buffer);
+		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
+		wait_for_completion(&cpu_buffer->update_completion);
 
 		cpu_buffer->nr_pages = nr_pages;
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
  no_mem:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
 		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
@@ -1487,21 +1621,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8ea48e0..72894c1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2934,20 +2934,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -2961,14 +2951,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
-- 
1.7.3.1



* [PATCH 5/5] trace: Make addition of pages in ring buffer atomic
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (3 preceding siblings ...)
  2011-07-26 22:59 ` [PATCH 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
@ 2011-07-26 22:59 ` Vaibhav Nagarnaik
  2011-08-16 21:46 ` [PATCH v2 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-26 22:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.
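
As with the removal patch, a simplified standalone model of the insert
step (again ignoring the HEAD flag encoded in the low bits of the next
pointer and the retry loop) splices the new pages in with a single
cmpxchg on the previous page's next pointer, for example:

/* Same simplified page list node as in the sketch for patch 4. */
struct page_node {
	struct page_node *next;
	struct page_node *prev;
};

/*
 * Splice the pages queued on 'new_pages' (a list head whose ->next is the
 * first new page and ->prev the last) between 'head' and the page before
 * it. Returns 1 on success, 0 if a concurrent writer changed prev->next
 * and the cmpxchg lost the race (the caller may retry a few times).
 */
static int insert_pages_before_head(struct page_node *head,
				    struct page_node *new_pages)
{
	struct page_node *prev  = head->prev;
	struct page_node *first = new_pages->next;
	struct page_node *last  = new_pages->prev;

	/* 1. Point both ends of the new list at the existing ring. */
	last->next  = head;
	first->prev = prev;

	/* 2. Atomically redirect prev->next from the head to the new pages. */
	if (!__sync_bool_compare_and_swap(&prev->next, head, first))
		return 0;

	/* 3. Fix up head->prev to finish the splice. */
	head->prev = last;
	return 1;
}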

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/ring_buffer.c |  109 +++++++++++++++++++++++++++++++------------
 1 files changed, 78 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0b43758..aecef65 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1415,36 +1415,68 @@ out:
 }
 
 static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
+	spin_lock_irq(&cpu_buffer->reader_lock);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 3 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 3;
+	success = 0;
+	while (retries--) {
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *head_page_with_bit;
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
 
-	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+		first_page = pages->next;
+		last_page  = pages->prev;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+
+		head_page_with_bit = (struct list_head *)
+				((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next  = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	spin_unlock_irq(&cpu_buffer->reader_lock);
-	atomic_dec(&cpu_buffer->record_disabled);
 }
 
 static void update_pages_handler(struct work_struct *work)
@@ -1453,8 +1485,7 @@ static void update_pages_handler(struct work_struct *work)
 			struct ring_buffer_per_cpu, update_pages_work);
 
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		rb_insert_pages(cpu_buffer);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
 
@@ -1475,7 +1506,7 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1500,6 +1531,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
+			if (atomic_read(&cpu_buffer->record_disabled)) {
+				err = -EBUSY;
+				goto out_err;
+			}
+
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 
@@ -1515,9 +1551,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
 		}
 
 		/* fire off all the required work handlers */
@@ -1540,6 +1578,12 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+
+		if (atomic_read(&cpu_buffer->record_disabled)) {
+			err = -EBUSY;
+			goto out_err;
+		}
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1549,8 +1593,10 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					&cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
 
 		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
 		wait_for_completion(&cpu_buffer->update_completion);
@@ -1564,7 +1610,7 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	mutex_unlock(&buffer->mutex);
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
 
@@ -1582,7 +1628,7 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		}
 	}
 	mutex_unlock(&buffer->mutex);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -3727,6 +3773,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/5] trace: Add a new readonly entry to report total buffer size
  2011-07-26 22:59 ` [PATCH 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
@ 2011-07-29 18:01   ` Steven Rostedt
  2011-07-29 19:09     ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-07-29 18:01 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
> The current file "buffer_size_kb" reports the size of per-cpu buffer and
> not the overall memory allocated which could be misleading. A new file
> "buffer_total_size_kb" adds up all the enabled CPU buffer sizes and
> reports it. This is only a readonly entry.
> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---
>  kernel/trace/trace.c |   27 +++++++++++++++++++++++++++
>  1 files changed, 27 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index e5df02c..ce57c55 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -3569,6 +3569,24 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>  }
>  
>  static ssize_t
> +tracing_total_entries_read(struct file *filp, char __user *ubuf,
> +				size_t cnt, loff_t *ppos)
> +{
> +	struct trace_array *tr = filp->private_data;
> +	char buf[64];
> +	int r, cpu;
> +	unsigned long size = 0;
> +
> +	mutex_lock(&trace_types_lock);
> +	for_each_tracing_cpu(cpu)
> +		size += tr->entries >> 10;

Could you make this consistent with buffer_size_kb as well. That is, if
the buffer is "shrunk", could you have the expanded size printed as
well.

Thanks,

-- Steve

> +	mutex_unlock(&trace_types_lock);
> +
> +	r = sprintf(buf, "%lu\n", size);
> +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +}
> +
> +static ssize_t
>  tracing_free_buffer_write(struct file *filp, const char __user *ubuf,
>  			  size_t cnt, loff_t *ppos)
>  {
> @@ -3739,6 +3757,12 @@ static const struct file_operations tracing_entries_fops = {
>  	.llseek		= generic_file_llseek,
>  };
>  
> +static const struct file_operations tracing_total_entries_fops = {
> +	.open		= tracing_open_generic,
> +	.read		= tracing_total_entries_read,
> +	.llseek		= generic_file_llseek,
> +};
> +
>  static const struct file_operations tracing_free_buffer_fops = {
>  	.write		= tracing_free_buffer_write,
>  	.release	= tracing_free_buffer_release,
> @@ -4450,6 +4474,9 @@ static __init int tracer_init_debugfs(void)
>  	trace_create_file("buffer_size_kb", 0644, d_tracer,
>  			&global_trace, &tracing_entries_fops);
>  
> +	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
> +			&global_trace, &tracing_total_entries_fops);
> +
>  	trace_create_file("free_buffer", 0644, d_tracer,
>  			&global_trace, &tracing_free_buffer_fops);
>  



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/5] trace: Add ring buffer stats to measure rate of events
  2011-07-26 22:59 ` [PATCH 2/5] trace: Add ring buffer stats to measure rate of events Vaibhav Nagarnaik
@ 2011-07-29 18:10   ` Steven Rostedt
  2011-07-29 19:10     ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-07-29 18:10 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
> The stats file under per_cpu folder provides the number of entries,
> overruns and other statistics about the CPU ring buffer. However, the
> numbers do not provide any indication of how full the ring buffer is in
> bytes compared to the overall size in bytes. Also, it is helpful to know
> the rate at which the cpu buffer is filling up.
> 
> This patch adds an entry "bytes: " in printed stats for per_cpu ring
> buffer which provides the actual bytes consumed in the ring buffer. This
> field includes the number of bytes used by recorded events and the
> padding bytes added when moving the tail pointer to next page.
> 
> It also adds the following time stamps:
> "head ts:" - the oldest timestamp in the ring buffer


I hate the name of "head_ts", as it really is meaningless. The head of
our ring buffer does indeed hold the oldest events, but other ring buffers
treat the head as the newest. This is an internal name that should not be used
outside of the ring buffer code itself. Maybe call it "oldest_ts", or
even more verbose (and what it actually is), "oldest_event_ts".


> "now ts:"  - the timestamp at the time of reading
> 
> The field "now ts" provides a consistent time snapshot to the userspace
> when being read. This is read from the same trace clock used by tracing
> event timestamps.
> 
> Together, these values provide the rate at which the buffer is filling
> up, from the formula:
> bytes / (now_ts - head_ts)
> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---

>  /**
> + * ring_buffer_head_ts - get the oldest event timestamp from the buffer
> + * @buffer: The ring buffer
> + * @cpu: The per CPU buffer to read from.
> + */
> +unsigned long ring_buffer_head_ts(struct ring_buffer *buffer, int cpu)

Hence, replace head_ts, with something else here.

Thanks,

-- Steve

> +{
> +	unsigned long flags;
> +	struct ring_buffer_per_cpu *cpu_buffer;
> +	struct buffer_page *bpage;
> +	unsigned long ret;
> +
> +	if (!cpumask_test_cpu(cpu, buffer->cpumask))
> +		return 0;
> +
> +	cpu_buffer = buffer->buffers[cpu];
> +	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +	/*
> +	 * if the tail is on reader_page, oldest time stamp is on the reader
> +	 * page
> +	 */
> +	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
> +		bpage = cpu_buffer->reader_page;
> +	else
> +		bpage = rb_set_head_page(cpu_buffer);
> +	ret = bpage->page->time_stamp;
> +	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> +
> +	return ret;
> +}



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 3/5] trace: Add per_cpu ring buffer control files
  2011-07-26 22:59 ` [PATCH 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
@ 2011-07-29 18:14   ` Steven Rostedt
  2011-07-29 19:13     ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-07-29 18:14 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
> Add a debugfs entry under per_cpu/ folder for each cpu called
> buffer_size_kb to control the ring buffer size for each CPU
> independently.
> 
> If the global file buffer_size_kb is used to set size, the individual
> ring buffers will be adjusted to the given size. The buffer_size_kb will
> report the common size to maintain backward compatibility.
> 
> If the buffer_size_kb file under the per_cpu/ directory is used to
> change buffer size for a specific CPU, only the size of the respective
> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
> the ring buffer sizes of all the CPUs at that point.

No, buffer_size_kb should not change what it reports. This is why
you have a buffer_total_size_kb. Use that. If the per_cpu buffers are
changed, then this should just report "various" or something to that
effect. This will be a good way to know if the per_cpu buffers are the
same or not.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/5] trace: Add a new readonly entry to report total buffer size
  2011-07-29 18:01   ` Steven Rostedt
@ 2011-07-29 19:09     ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-29 19:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Fri, Jul 29, 2011 at 11:01 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
>> The current file "buffer_size_kb" reports the size of per-cpu buffer and
>> not the overall memory allocated which could be misleading. A new file
>> "buffer_total_size_kb" adds up all the enabled CPU buffer sizes and
>> reports it. This is only a readonly entry.
>>
>> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
>> ---
>>  kernel/trace/trace.c |   27 +++++++++++++++++++++++++++
>>  1 files changed, 27 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index e5df02c..ce57c55 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -3569,6 +3569,24 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>>  }
>>
>>  static ssize_t
>> +tracing_total_entries_read(struct file *filp, char __user *ubuf,
>> +                             size_t cnt, loff_t *ppos)
>> +{
>> +     struct trace_array *tr = filp->private_data;
>> +     char buf[64];
>> +     int r, cpu;
>> +     unsigned long size = 0;
>> +
>> +     mutex_lock(&trace_types_lock);
>> +     for_each_tracing_cpu(cpu)
>> +             size += tr->entries >> 10;
>
> Could you make this consistent with buffer_size_kb as well. That is, if
> the buffer is "shrunk", could you have the expanded size printed as
> well.
>
> Thanks,
>
> -- Steve
>

Sure. I forgot about that.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/5] trace: Add ring buffer stats to measure rate of events
  2011-07-29 18:10   ` Steven Rostedt
@ 2011-07-29 19:10     ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-29 19:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Fri, Jul 29, 2011 at 11:10 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
>> The stats file under per_cpu folder provides the number of entries,
>> overruns and other statistics about the CPU ring buffer. However, the
>> numbers do not provide any indication of how full the ring buffer is in
>> bytes compared to the overall size in bytes. Also, it is helpful to know
>> the rate at which the cpu buffer is filling up.
>>
>> This patch adds an entry "bytes: " in printed stats for per_cpu ring
>> buffer which provides the actual bytes consumed in the ring buffer. This
>> field includes the number of bytes used by recorded events and the
>> padding bytes added when moving the tail pointer to next page.
>>
>> It also adds the following time stamps:
>> "head ts:" - the oldest timestamp in the ring buffer
>
>
> I hate the name of "head_ts", as it really is meaningless. The head of
> our ring buffer does indeed hold the oldest events, but other ring buffers
> treat the head as the newest. This is an internal name that should not be used
> outside of the ring buffer code itself. Maybe call it "oldest_ts", or
> even more verbose (and what it actually is), "oldest_event_ts".
>
>
>> "now ts:"  - the timestamp at the time of reading
>>
>> The field "now ts" provides a consistent time snapshot to the userspace
>> when being read. This is read from the same trace clock used by tracing
>> event timestamps.
>>
>> Together, these values provide the rate at which the buffer is filling
>> up, from the formula:
>> bytes / (now_ts - head_ts)
>>
>> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
>> ---
>
>>  /**
>> + * ring_buffer_head_ts - get the oldest event timestamp from the buffer
>> + * @buffer: The ring buffer
>> + * @cpu: The per CPU buffer to read from.
>> + */
>> +unsigned long ring_buffer_head_ts(struct ring_buffer *buffer, int cpu)
>
> Hence, replace head_ts, with something else here.
>
> Thanks,
>
> -- Steve
>

'oldest_event_ts' sounds good.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 3/5] trace: Add per_cpu ring buffer control files
  2011-07-29 18:14   ` Steven Rostedt
@ 2011-07-29 19:13     ` Vaibhav Nagarnaik
  2011-07-29 21:25       ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-29 19:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Fri, Jul 29, 2011 at 11:14 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
>> Add a debugfs entry under per_cpu/ folder for each cpu called
>> buffer_size_kb to control the ring buffer size for each CPU
>> independently.
>>
>> If the global file buffer_size_kb is used to set size, the individual
>> ring buffers will be adjusted to the given size. The buffer_size_kb will
>> report the common size to maintain backward compatibility.
>>
>> If the buffer_size_kb file under the per_cpu/ directory is used to
>> change buffer size for a specific CPU, only the size of the respective
>> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
>> the ring buffer sizes of all the CPUs at that point.
>
> No, buffer_size_kb should not change what it reports. This is why
> you have a buffer_total_size_kb. Use that. If the per_cpu buffers are
> changed, then this should just report "various" or something to that
> effect. This will be a good way to know if the per_cpu buffers are the
> same or not.

Aargh.

I updated the code to return 'X' when individual cpu buffers don't match
up in size and forgot to update the changelog, I will do it now.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-26 22:59 ` [PATCH 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
@ 2011-07-29 21:23   ` Steven Rostedt
  2011-07-29 23:30     ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-07-29 21:23 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:

> +		ret = NULL;
> +		if (((unsigned long)to_remove & RB_FLAG_MASK) == RB_PAGE_HEAD) {
> +			/*
> +			 * this is a head page, we have to set RB_PAGE_HEAD
> +			 * flag while updating the next pointer
> +			 */
> +			unsigned long tmp = (unsigned long)next_page |
> +							RB_PAGE_HEAD;
> +			ret = cmpxchg(&tail_page->next, to_remove,
> +					(struct list_head *) tmp);

This is fine, it will work.

> +
> +		} else if (((unsigned long)to_remove & ~RB_PAGE_HEAD) ==
> +					(unsigned long)to_remove) {
> +
> +			/* not a head page, just update the next pointer */
> +			ret = cmpxchg(&tail_page->next, to_remove, next_page);

This is not; it won't work.

You can *only* remove the HEAD from the ring buffer without causing
issues.

As you probably know, the trick is done with the list pointers. We or
the pointer with 1 for head, and the writer will or it with 2 when it
updates the page.

This only works if we have a 1 or 2 here. Now if we try to do what you
suggest, by starting with a 0, and ending with 0, we may fail. Between
the  to_remove = tail_page->next and the cmpxchg(), the writer could
easily move to the tail page, and you would never know it.

Now we just removed the tail page with no idea that the writer is on it.
The writer could have also moved on to the next page, and we just
removed the most recently recorded data.

The only way to really make this work is to always get it from the HEAD
page. If there's data there, we could just store it separately, so that
the read_page can read from it first. We will still need to be careful
with the writer on the page. But I think this is doable.

That is, read the pages from head, if there's no data on it, simply
remove the pages. If there is data, we store it off later. If the writer
happens to be on the page, we will can check that. We could even
continue to get pages, because we will be moving the header page with
the cmpxchg, and the writer does that too. It will take some serious
thought, but it is possible to do this.
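
For illustration, here is a minimal userspace toy model of the head-flag
trick described above. It is not the kernel implementation (the real
head-page handling lives in kernel/trace/ring_buffer.c); PAGE_HEAD,
FLAG_MASK and claim_head() are stand-ins chosen for the sketch. The point
is just that a page can only be claimed by cmpxchg-ing the tagged
prev->next pointer, so a racing writer makes the claim fail instead of us
silently stealing a page it is using:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_HEAD 1UL			/* stand-in for RB_PAGE_HEAD */
#define FLAG_MASK 3UL			/* stand-in for RB_FLAG_MASK */

struct page {
	_Atomic(uintptr_t) next;	/* tagged pointer to the next page */
	int id;
};

static struct page *untag(uintptr_t v)
{
	return (struct page *)(v & ~FLAG_MASK);
}

static uintptr_t tag_head(struct page *p)
{
	return (uintptr_t)p | PAGE_HEAD;
}

/* Try to pull the current head page out of the list; NULL if we raced. */
static struct page *claim_head(struct page *prev)
{
	uintptr_t old = atomic_load(&prev->next);

	/* the writer moved the head (or is updating it): give up */
	if ((old & FLAG_MASK) != PAGE_HEAD)
		return NULL;

	struct page *head = untag(old);
	uintptr_t new = tag_head(untag(atomic_load(&head->next)));

	/* fails if prev->next (and thus the flag) changed under us */
	if (!atomic_compare_exchange_strong(&prev->next, &old, new))
		return NULL;

	return head;			/* safe to park or free */
}

int main(void)
{
	struct page a = { .id = 0 }, b = { .id = 1 }, c = { .id = 2 };

	/* ring: a -> b (HEAD) -> c -> a */
	atomic_store(&a.next, tag_head(&b));
	atomic_store(&b.next, (uintptr_t)&c);
	atomic_store(&c.next, (uintptr_t)&a);

	struct page *victim = claim_head(&a);
	printf("claimed page %d\n", victim ? victim->id : -1);
	return 0;
}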

-- Steve



> +
> +		} else {
> +			/*
> +			 * this means that this page is being operated on
> +			 * try the next page in the list
> +			 */
> +		}
> +
> +		if (ret != to_remove) {
> +			/*
> +			 * Well, try again with the next page.
> +			 * If we cannot move the page in 3 retries, there are
> +			 * lot of interrupts on this cpu and probably causing
> +			 * some weird behavior. Warn in this case and stop
> +			 * tracing
> +			 */
> +			if (RB_WARN_ON(cpu_buffer, !retries--))
> +				break;
> +			else
> +				continue;



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 3/5] trace: Add per_cpu ring buffer control files
  2011-07-29 19:13     ` Vaibhav Nagarnaik
@ 2011-07-29 21:25       ` Steven Rostedt
  0 siblings, 0 replies; 80+ messages in thread
From: Steven Rostedt @ 2011-07-29 21:25 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Fri, 2011-07-29 at 12:13 -0700, Vaibhav Nagarnaik wrote:
> On Fri, Jul 29, 2011 at 11:14 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
> >> Add a debugfs entry under per_cpu/ folder for each cpu called
> >> buffer_size_kb to control the ring buffer size for each CPU
> >> independently.
> >>
> >> If the global file buffer_size_kb is used to set size, the individual
> >> ring buffers will be adjusted to the given size. The buffer_size_kb will
> >> report the common size to maintain backward compatibility.
> >>
> >> If the buffer_size_kb file under the per_cpu/ directory is used to
> >> change buffer size for a specific CPU, only the size of the respective
> >> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
> >> the ring buffer sizes of all the CPUs at that point.
> >
> > No, buffer_size_kb should not change what it reports. This is why
> > you have a buffer_total_size_kb. Use that. If the per_cpu buffers are
> > changed, then this should just report "various" or something to that
> > effect. This will be a good way to know if the per_cpu buffers are the
> > same or not.
> 
> Aargh.
> 
> I updated the code to return 'X' when individual cpu buffers don't match
> up in size and forgot to update the changelog, I will do it now.
> 

Heh, I only looked at the changelog. I was going to say 'X' but then
thought that 'various' would be good too. But since we already use 'X'
for the filters and enabled files, it is better to stay consistent. So
'X' is fine.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-29 21:23   ` Steven Rostedt
@ 2011-07-29 23:30     ` Vaibhav Nagarnaik
  2011-07-30  1:12       ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-07-29 23:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Fri, Jul 29, 2011 at 2:23 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:
>
>> +
>> +             } else if (((unsigned long)to_remove & ~RB_PAGE_HEAD) ==
>> +                                     (unsigned long)to_remove) {
>> +
>> +                     /* not a head page, just update the next pointer */
>> +                     ret = cmpxchg(&tail_page->next, to_remove, next_page);
>
> This is not; it won't work.
>
> You can *only* remove the HEAD from the ring buffer without causing
> issues.
>
> As you probably know, the trick is done with the list pointers. We or
> the pointer with 1 for head, and the writer will or it with 2 when it
> updates the page.
>
> This only works if we have a 1 or 2 here. Now if we try to do what you
> suggest, by starting with a 0, and ending with 0, we may fail. Between
> the  to_remove = tail_page->next and the cmpxchg(), the writer could
> easily move to the tail page, and you would never know it.
>
> Now we just removed the tail page with no idea that the writer is on it.
> The writer could have also moved on to the next page, and we just
> removed the most recently recorded data.
>
> The only way to really make this work is to always get it from the HEAD
> page. If there's data there, we could just store it separately, so that
> the read_page can read from it first. We will still need to be careful
> with the writer on the page. But I think this is doable.
>
> That is, read the pages from head, if there's no data on it, simply
> remove the pages. If there is data, we store it off later. If the writer
> happens to be on the page, we can check that. We could even
> continue to get pages, because we will be moving the header page with
> the cmpxchg, and the writer does that too. It will take some serious
> thought, but it is possible to do this.
>
> -- Steve

There should only be IRQs and NMIs that preempt this operation since
the removal operation of a cpu ring buffer is scheduled on keventd of
the same CPU. But you're right there is a race between reading the
to_remove pointer and cmpxchg() operation.

While we are trying to remove the head page, the writer could move to
the head page. Additionally, we will be adding complexity to manage data
from all the removed pages for read_page.

I discussed with David and here are some ways we thought to address
this:
1. After the cmpxchg(), if we see that the tail page has moved to
   to_remove page, then revert the cmpxchg() operation and try with the
   next page. This might add some more complexity and doesn't work with
   an interrupt storm coming in.
2. Disable/enable IRQs while removing pages. This won't stop traced NMIs
   though and we are now affecting the system behavior.
3. David didn't like this, but we could increment
   cpu_buffer->record_disabled to prevent writer from moving any pages
   for the duration of this process. If we combine this with disabling
   preemption, we would be losing traces from an IRQ/NMI context, but we
   would be safe from races while this operation is going on.

The reason we want to remove the pages after tail is to give priority to
empty pages first before touching any data pages. Also according to your
suggestion, I am not sure how to manage the data pages once they are
removed, since they cannot be freed and the reader might not be present
which will make the pages stay resident, a form of memory leak.
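
To make option 3 above concrete, here is a rough sketch of what it could
look like. This is not a real patch: rb_unlink_one_page() is a
hypothetical placeholder for the actual page-removal code, and it assumes
the removal still runs via schedule_work_on() on the CPU that owns the
buffer, as in this series, so only IRQ/NMI writers can interleave and
those are dropped while record_disabled is non-zero:

static void rb_remove_pages_quiesced(struct ring_buffer_per_cpu *cpu_buffer,
				     unsigned int nr_pages)
{
	/* writers bail out of the reserve path while this is set */
	atomic_inc(&cpu_buffer->record_disabled);

	preempt_disable();
	while (nr_pages--)
		rb_unlink_one_page(cpu_buffer);	/* hypothetical helper */
	preempt_enable();

	atomic_dec(&cpu_buffer->record_disabled);
}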



Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-29 23:30     ` Vaibhav Nagarnaik
@ 2011-07-30  1:12       ` Steven Rostedt
  2011-07-30  1:50         ` David Sharp
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-07-30  1:12 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Fri, 2011-07-29 at 16:30 -0700, Vaibhav Nagarnaik wrote:
> On Fri, Jul 29, 2011 at 2:23 PM, Steven Rostedt <rostedt@goodmis.org> wrote:

> There should only be IRQs and NMIs that preempt this operation since
> the removal operation of a cpu ring buffer is scheduled on keventd of
> the same CPU. But you're right there is a race between reading the
> to_remove pointer and cmpxchg() operation.

Bah, this is what I get for reviewing patches and doing other work at
the same time. I saw the work/completion set up, but it didn't register
to me that this was calling schedule_work_on(cpu..). 

But that said, I'm not sure I really like that. This still seems a bit
too complex.

> 
> While we are trying to remove the head page, the writer could move to
> the head page. Additionally, we will be adding complexity to manage data
> from all the removed pages for read_page.
> 
> I discussed with David and here are some ways we thought to address
> this:
> 1. After the cmpxchg(), if we see that the tail page has moved to
>    to_remove page, then revert the cmpxchg() operation and try with the
>    next page. This might add some more complexity and doesn't work with
>    an interrupt storm coming in.

Egad no. That will just make things more complex, and harder to verify
is correct.

> 2. Disable/enable IRQs while removing pages. This won't stop traced NMIs
>    though and we are now affecting the system behavior.
> 3. David didn't like this, but we could increment
>    cpu_buffer->record_disabled to prevent writer from moving any pages
>    for the duration of this process. If we combine this with disabling
>    preemption, we would be losing traces from an IRQ/NMI context, but we
>    would be safe from races while this operation is going on.
> 
> The reason we want to remove the pages after tail is to give priority to
> empty pages first before touching any data pages. Also according to your
> suggestion, I am not sure how to manage the data pages once they are
> removed, since they cannot be freed and the reader might not be present
> which will make the pages stay resident, a form of memory leak.

They will be freed when they are eventually read. Right, if there's no
reader, then they will not be freed, but that isn't really a true memory
leak. It is basically just like we didn't remove the pages, but I do not
consider this a memory leak. The pages are just waiting to be reclaimed,
and will be freed on any reset of the ring buffer.

Anyway, the choices are:

* Remove from the HEAD and use the existing algorithm that we've been
using since 2008. This requires a bit of accounting on the reader side,
but nothing too complex.

Pros: Should not have any major race conditions. Requires no
schedule_work_on() calls. Uses existing algorithm

Cons: Can keep pages around if no reader is present, and ring buffer is
not reset.

* Read from tail. Modify the already complex but tried and true lockless
algorithm.

Pros: Removes empty pages first.

Cons: Adds a lot more complexity to a complex system that has been
working since 2008.


The above makes me lean towards just taking from HEAD.

If you are worried about leaked pages, we could even have a debugfs file
that lets us monitor the pages that are pending read, and have the user
(or application) be able to flush them if they see the ring buffer is
full anyway.

-- Steve




^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-30  1:12       ` Steven Rostedt
@ 2011-07-30  1:50         ` David Sharp
  2011-07-30  2:43           ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: David Sharp @ 2011-07-30  1:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vaibhav Nagarnaik, Frederic Weisbecker, Ingo Molnar,
	Michael Rubin, linux-kernel

On Fri, Jul 29, 2011 at 6:12 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 2011-07-29 at 16:30 -0700, Vaibhav Nagarnaik wrote:
>> On Fri, Jul 29, 2011 at 2:23 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> There should only be IRQs and NMIs that preempt this operation since
>> the removal operation of a cpu ring buffer is scheduled on keventd of
>> the same CPU. But you're right there is a race between reading the
>> to_remove pointer and cmpxchg() operation.
>
> Bah, this is what I get for reviewing patches and doing other work at
> the same time. I saw the work/completion set up, but it didn't register
> to me that this was calling schedule_work_on(cpu..).
>
> But that said, I'm not sure I really like that. This still seems a bit
> too complex.

What is it that you don't like? The work/completion, the reliance on
running on the same cpu, or just the complexity of the procedure?

>> While we are trying to remove the head page, the writer could move to
>> the head page. Additionally, we will be adding complexity to manage data
>> from all the removed pages for read_page.
>>
>> I discussed with David and here are some ways we thought to address
>> this:
>> 1. After the cmpxchg(), if we see that the tail page has moved to
>>    to_remove page, then revert the cmpxchg() operation and try with the
>>    next page. This might add some more complexity and doesn't work with
>>    an interrupt storm coming in.
>
> Egad no. That will just make things more complex, and harder to verify
> is correct.
>
>> 2. Disable/enable IRQs while removing pages. This won't stop traced NMIs
>>    though and we are now affecting the system behavior.
>> 3. David didn't like this, but we could increment
>>    cpu_buffer->record_disabled to prevent writer from moving any pages
>>    for the duration of this process. If we combine this with disabling
>>    preemption, we would be losing traces from an IRQ/NMI context, but we
>>    would be safe from races while this operation is going on.
>>
>> The reason we want to remove the pages after tail is to give priority to
>> empty pages first before touching any data pages. Also according to your
>> suggestion, I am not sure how to manage the data pages once they are
>> removed, since they cannot be freed and the reader might not be present
>> which will make the pages stay resident, a form of memory leak.
>
> They will be freed when they are eventually read. Right, if there's no
> reader, then they will not be freed, but that isn't really a true memory
> leak. It is basically just like we didn't remove the pages, but I do not
> consider this a memory leak. The pages are just waiting to be reclaimed,
> and will be freed on any reset of the ring buffer.
>
> Anyway, the choices are:
>
> * Remove from the HEAD and use the existing algorithm that we've been
> using since 2008. This requires a bit of accounting on the reader side,
> but nothing too complex.
>
> Pros: Should not have any major race conditions. Requires no
> schedule_work_on() calls. Uses existing algorithm
>
> Cons: Can keep pages around if no reader is present, and ring buffer is
> not reset.

Con: by definition, removes valid trace data from the ring buffer,
even if it is not full. I think that's a pretty big con for the
usability of the feature.

>
> * Read from tail. Modify the already complex but tried and true lockless
> algorithm.
>
> Pros: Removes empty pages first.
>
> Cons: Adds a lot more complexity to a complex system that has been
> working since 2008.
>
>
> The above makes me lean towards just taking from HEAD.
>
> If you are worried about leaked pages, we could even have a debugfs file
> that lets us monitor the pages that are pending read, and have the user
> (or application) be able to flush them if they see the ring buffer is
> full anyway.

The reason we want per-cpu dynamic resizing is to increase memory
utilization, so leaking pages would make me sad.

Let us mull it over this weekend... maybe we'll come up with something
that works more simply.

>
> -- Steve
>
>
>
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-30  1:50         ` David Sharp
@ 2011-07-30  2:43           ` Steven Rostedt
  2011-07-30  3:44             ` David Sharp
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-07-30  2:43 UTC (permalink / raw)
  To: David Sharp
  Cc: Vaibhav Nagarnaik, Frederic Weisbecker, Ingo Molnar,
	Michael Rubin, linux-kernel

On Fri, 2011-07-29 at 18:50 -0700, David Sharp wrote:
> On Fri, Jul 29, 2011 at 6:12 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Fri, 2011-07-29 at 16:30 -0700, Vaibhav Nagarnaik wrote:
> >> On Fri, Jul 29, 2011 at 2:23 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >> There should only be IRQs and NMIs that preempt this operation since
> >> the removal operation of a cpu ring buffer is scheduled on keventd of
> >> the same CPU. But you're right there is a race between reading the
> >> to_remove pointer and cmpxchg() operation.
> >
> > Bah, this is what I get for reviewing patches and doing other work at
> > the same time. I saw the work/completion set up, but it didn't register
> > to me that this was calling schedule_work_on(cpu..).
> >
> > But that said, I'm not sure I really like that. This still seems a bit
> > too complex.
> 
> What is it that you don't like? The work/completion, the reliance on
> running on the same cpu, or just the complexity of the procedure?

The added complexity. This is complex enough, we don't need to make it
more so.

> 
> >> While we are trying to remove the head page, the writer could move to
> >> the head page. Additionally, we will be adding complexity to manage data
> >> from all the removed pages for read_page.
> >>
> >> I discussed with David and here are some ways we thought to address
> >> this:
> >> 1. After the cmpxchg(), if we see that the tail page has moved to
> >>    to_remove page, then revert the cmpxchg() operation and try with the
> >>    next page. This might add some more complexity and doesn't work with
> >>    an interrupt storm coming in.
> >
> > Egad no. That will just make things more complex, and harder to verify
> > is correct.
> >
> >> 2. Disable/enable IRQs while removing pages. This won't stop traced NMIs
> >>    though and we are now affecting the system behavior.
> >> 3. David didn't like this, but we could increment
> >>    cpu_buffer->record_disabled to prevent writer from moving any pages
> >>    for the duration of this process. If we combine this with disabling
> >>    preemption, we would be losing traces from an IRQ/NMI context, but we
> >>    would be safe from races while this operation is going on.
> >>
> >> The reason we want to remove the pages after tail is to give priority to
> >> empty pages first before touching any data pages. Also according to your
> >> suggestion, I am not sure how to manage the data pages once they are
> >> removed, since they cannot be freed and the reader might not be present
> >> which will make the pages stay resident, a form of memory leak.
> >
> > They will be freed when they are eventually read. Right, if there's no
> > reader, then they will not be freed, but that isn't really a true memory
> > leak. It is basically just like we didn't remove the pages, but I do not
> > consider this a memory leak. The pages are just waiting to be reclaimed,
> > and will be freed on any reset of the ring buffer.
> >
> > Anyway, the choices are:
> >
> > * Remove from the HEAD and use the existing algorithm that we've been
> > using since 2008. This requires a bit of accounting on the reader side,
> > but nothing too complex.
> >
> > Pros: Should not have any major race conditions. Requires no
> > schedule_work_on() calls. Uses existing algorithm
> >
> > Cons: Can keep pages around if no reader is present, and ring buffer is
> > not reset.
> 
> Con: by definition, removes valid trace data from the ring buffer,
> even if it is not full. I think that's a pretty big con for the
> usability of the feature.

Um, how does it remove valid trace data? We don't free it, we offload
it. Think of it as "extended reader pages". That is, they are held off
until the user asks to read these pages. Then they will get the data
again. What is a con about that?

> 
> >
> > * Read from tail. Modify the already complex but tried and true lockless
> > algorithm.
> >
> > Pros: Removes empty pages first.
> >
> > Cons: Adds a lot more complexity to a complex system that has been
> > working since 2008.
> >
> >
> > The above makes me lean towards just taking from HEAD.
> >
> > If you are worried about leaked pages, we could even have a debugfs file
> > that lets us monitor the pages that are pending read, and have the user
> > (or application) be able to flush them if they see the ring buffer is
> > full anyway.
> 
> The reason we want per-cpu dynamic resizing is to increase memory
> utilization, so leaking pages would make me sad.

Shouldn't be too leaky, especially if something can read it. Perhaps we
could figure out a way to swap them back in.

> 
> Let us mull it over this weekend... maybe we'll come up with something
> that works more simply.

Hmm, actually, we could take an idea that Mathieu used for his ring
buffer. He couldn't swap out a page if the writer was on it, so he would
send out ipi's to push the writer off the page and just pad the rest.

We could do the same thing here. Use the writer logic to make the
change. That would require starting a commit, perhaps just writing
padding somehow. If we fail the reserve, we just try again. The writers
are set up to sync with each other per cpu. We would need a way that the
NMIs and interrupts (if it doesn't work with interrupts enabled, it won't
work for NMIs, so I will not accept disabling interrupts) can work
together in this effort too.
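
Just to sketch the shape of that idea, and assuming nothing about how the
padding would actually be written: rb_pad_rest_of_commit_page() below is
purely hypothetical, standing in for whatever would reuse the normal
reserve/commit path to fill out the writer's current page.

static void rb_push_writer_off_page(void *data)
{
	struct ring_buffer_per_cpu *cpu_buffer = data;
	int retries = 3;

	while (retries--) {
		/*
		 * hypothetical: pads out the rest of the commit page via
		 * the reserve/commit path, returns 0 if a concurrent
		 * writer (IRQ/NMI) beat us to it
		 */
		if (rb_pad_rest_of_commit_page(cpu_buffer))
			return;
	}
	RB_WARN_ON(cpu_buffer, 1);
}

static void rb_quiesce_commit_page(struct ring_buffer_per_cpu *cpu_buffer,
				   int cpu)
{
	/* wait == 1: don't touch the page list until the writer is off it */
	smp_call_function_single(cpu, rb_push_writer_off_page, cpu_buffer, 1);
}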

But I'm working on other things right now and don't have time to think
about it. But perhaps you can come up with some ideas too.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-30  2:43           ` Steven Rostedt
@ 2011-07-30  3:44             ` David Sharp
  0 siblings, 0 replies; 80+ messages in thread
From: David Sharp @ 2011-07-30  3:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Vaibhav Nagarnaik, Frederic Weisbecker, Ingo Molnar,
	Michael Rubin, linux-kernel

On Fri, Jul 29, 2011 at 7:43 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 2011-07-29 at 18:50 -0700, David Sharp wrote:
>> On Fri, Jul 29, 2011 at 6:12 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > On Fri, 2011-07-29 at 16:30 -0700, Vaibhav Nagarnaik wrote:

>> What is it that you don't like? the work/completion, the reliance on
>> running on the same cpu, or just the complexity of procedure?
>
> The added complexity. This is complex enough, we don't need to make it
> more so.

Sure, complexity should be part of the cost-benefit analysis. I think
this will be a pretty powerful feature, though. Let's see how it goes;
maybe Vaibhav and I can come up with something simpler, or using
established protocols.

>> > Anyway, the choices are:
>> >
>> > * Remove from the HEAD and use the existing algorithm that we've been
>> > using since 2008. This requires a bit of accounting on the reader side,
>> > but nothing too complex.
>> >
>> > Pros: Should not have any major race conditions. Requires no
>> > schedule_work_on() calls. Uses existing algorithm
>> >
>> > Cons: Can keep pages around if no reader is present, and ring buffer is
>> > not reset.
>>
>> Con: by definition, removes valid trace data from the ring buffer,
>> even if it is not full. I think that's a pretty big con for the
>> usability of the feature.
>
> Um, how does it remove valid trace data? We don't free it, we offload
> it. Think of it as "extended reader pages". That is, they are held off
> until the user asks to read these pages. Then they will get the data
> again. What is a con about that?

I think we're talking about different things. You're talking about
keeping the "removed" pages around for the reader if it wants it,
whereas I'm talking about trying to free the pages. In our use case,
we're trying to free up memory, so we would want to immediately use
the "flush the extended reader pages" control file you suggested
below. So, in effect, for us it really is the same as removing valid
trace data, even if there are empty pages. "Offloading" the pages
isn't really good enough. It's another interesting use case, but
doesn't meet the goal of this patch series.

Maybe the use case hasn't been stated coherently: We're in overwrite
mode (but perhaps still before overflow has happened), not reading the
trace yet, waiting for something interesting to happen. In the
meantime, CPUs have varying rates of events occurring on them, and we
have only so much memory on the system set aside for tracing. In order
to efficiently use that memory, we want to adjust the sizes of the
per-cpu buffers in flight so that each CPU has approximately the same
time span, and for as far back as possible within our memory
allocation. Therefore, we want to free empty pages first, and then the
pages with the oldest data.
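
To make the intended userspace side of this concrete, here is a minimal
monitoring sketch. It assumes debugfs is mounted at /sys/kernel/debug,
the "bytes:"/"oldest event ts:"/"now ts:" stats from patch 2 and the
per-cpu buffer_size_kb file from patch 3, a fixed four CPUs, and an
arbitrary 10-second target span; the proportional resize policy is only
an example, not part of the series:

#include <stdio.h>

#define TRACEFS			"/sys/kernel/debug/tracing"
#define TARGET_SPAN_SECS	10.0	/* keep ~10s of history per CPU */
#define NR_CPUS			4	/* assumed for the sketch */

/* fill rate of one CPU buffer in bytes/sec, 0.0 on error */
static double cpu_fill_rate(int cpu)
{
	char path[256], line[256];
	double bytes = 0.0, oldest = 0.0, now = 0.0;
	FILE *f;

	snprintf(path, sizeof(path), TRACEFS "/per_cpu/cpu%d/stats", cpu);
	f = fopen(path, "r");
	if (!f)
		return 0.0;

	while (fgets(line, sizeof(line), f)) {
		sscanf(line, "bytes: %lf", &bytes);
		sscanf(line, "oldest event ts: %lf", &oldest);
		sscanf(line, "now ts: %lf", &now);
	}
	fclose(f);

	return (now > oldest) ? bytes / (now - oldest) : 0.0;
}

/* resize one CPU buffer so it holds roughly TARGET_SPAN_SECS of data */
static void resize_cpu_buffer(int cpu, double rate)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 TRACEFS "/per_cpu/cpu%d/buffer_size_kb", cpu);
	f = fopen(path, "w");
	if (!f)
		return;
	fprintf(f, "%.0f\n", rate * TARGET_SPAN_SECS / 1024.0);
	fclose(f);
}

int main(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		double rate = cpu_fill_rate(cpu);

		printf("cpu%d: %.0f bytes/sec\n", cpu, rate);
		resize_cpu_buffer(cpu, rate);
	}
	return 0;
}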

>
>>
>> >
>> > * Read from tail. Modify the already complex but tried and true lockless
>> > algorithm.
>> >
>> > Pros: Removes empty pages first.
>> >
>> > Cons: Adds a lot more complexity to a complex system that has been
>> > working since 2008.
>> >
>> >
>> > The above makes me lean towards just taking from HEAD.
>> >
>> > If you are worried about leaked pages, we could even have a debugfs file
>> > that lets us monitor the pages that are pending read, and have the user
>> > (or application) be able to flush them if they see the ring buffer is
>> > full anyway.
>>
>> The reason we want per-cpu dynamic resizing is to increase memory
>> utilization, so leaking pages would make me sad.
>
> Shouldn't be too leaky, especially if something can read it. Perhaps we
> could figure out a way to swap them back in.
>
>>
>> Let us mull it over this weekend... maybe we'll come up with something
>> that works more simply.
>
> Hmm, actually, we could take an idea that Mathieu used for his ring
> buffer. He couldn't swap out a page if the writer was on it, so he would
> send out ipi's to push the writer off the page and just pad the rest.

hmm, I'm not seeing how we could use that technique without dropping
recent events. Dropping older events is preferable, in which case we
might as well remove from the head page. I'll add it to my toolbox
though, as I think about it.

> (if it doesn't work with interrupts enabled, it won't
> work for NMIs, so I will not accept disabling interrupts)

I agree, we should not disable interrupts.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v2 0/5] Add dynamic updates to trace ring buffer
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (4 preceding siblings ...)
  2011-07-26 22:59 ` [PATCH 5/5] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
@ 2011-08-16 21:46 ` Vaibhav Nagarnaik
  2011-08-16 21:46 ` [PATCH v2 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-16 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

These patches are in response to the fact that sometimes there is a
higher rate of events generated on some CPUs than on others. This makes
it inefficient to have equal memory size allocated to each of the
per-cpu ring buffers.

This patch series adds 3 things to achieve this:
* Add a way to measure the rate of events generated on a CPU. This is a
  part of patch#2 which makes the 'stats' files print out the number of
  bytes in the ring buffer, the oldest time stamp ("head ts"), and the
  current time stamp ("now ts"). The rate is measured as: bytes /
  (now-head)

* The next patch#3 adds the flexibility to assign different sizes to
  individual per-cpu ring buffers. This is done by adding a
  "buffer_size_kb" debugfs file entry under per_cpu/* directories.

* The final two patches provide a way to change the size of ring
  buffer concurrent to events being added to the ring buffer. Patch#4
  adds functionality to remove pages from the ring buffer and patch#5
  adds functionality to add pages to the ring buffer.

Patch#1 adds a debugfs entry "buffer_total_size_kb" which provides the
total memory allocated for the ring buffer.

This makes it easy for a user process to monitor the rate at which the
ring buffers are being filled up and update the individual per-cpu ring
buffer sizes in response to it.

Changelog v2-v1:
* This changes the logic of page removal from the ring buffer, based on
  comments from Steven Rostedt about the racy behavior of the earlier code.
* Some other changes to the functionality and variable names as requested by
  Steven Rostedt.

Vaibhav Nagarnaik (5):
  trace: Add a new readonly entry to report total buffer size
  trace: Add ring buffer stats to measure rate of events
  trace: Add per_cpu ring buffer control files
  trace: Make removal of ring buffer pages atomic
  trace: Make addition of pages in ring buffer atomic

 include/linux/ring_buffer.h |    8 +-
 kernel/trace/ring_buffer.c  |  543 ++++++++++++++++++++++++++++++-------------
 kernel/trace/trace.c        |  247 +++++++++++++++-----
 kernel/trace/trace.h        |    2 +-
 4 files changed, 583 insertions(+), 217 deletions(-)

-- 
1.7.3.1


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v2 1/5] trace: Add a new readonly entry to report total buffer size
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (5 preceding siblings ...)
  2011-08-16 21:46 ` [PATCH v2 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
@ 2011-08-16 21:46 ` Vaibhav Nagarnaik
  2011-08-16 21:46 ` [PATCH v2 2/5] trace: Add ring buffer stats to measure rate of events Vaibhav Nagarnaik
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-16 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

The current file "buffer_size_kb" reports the size of per-cpu buffer and
not the overall memory allocated which could be misleading. A new file
"buffer_total_size_kb" adds up all the enabled CPU buffer sizes and
reports it. This is only a readonly entry.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/trace.c |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e5df02c..0117678 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3569,6 +3569,30 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 }
 
 static ssize_t
+tracing_total_entries_read(struct file *filp, char __user *ubuf,
+				size_t cnt, loff_t *ppos)
+{
+	struct trace_array *tr = filp->private_data;
+	char buf[64];
+	int r, cpu;
+	unsigned long size = 0, expanded_size = 0;
+
+	mutex_lock(&trace_types_lock);
+	for_each_tracing_cpu(cpu) {
+		size += tr->entries >> 10;
+		if (!ring_buffer_expanded)
+			expanded_size += trace_buf_size >> 10;
+	}
+	if (ring_buffer_expanded)
+		r = sprintf(buf, "%lu\n", size);
+	else
+		r = sprintf(buf, "%lu (expanded: %lu)\n", size, expanded_size);
+	mutex_unlock(&trace_types_lock);
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
 tracing_free_buffer_write(struct file *filp, const char __user *ubuf,
 			  size_t cnt, loff_t *ppos)
 {
@@ -3739,6 +3763,12 @@ static const struct file_operations tracing_entries_fops = {
 	.llseek		= generic_file_llseek,
 };
 
+static const struct file_operations tracing_total_entries_fops = {
+	.open		= tracing_open_generic,
+	.read		= tracing_total_entries_read,
+	.llseek		= generic_file_llseek,
+};
+
 static const struct file_operations tracing_free_buffer_fops = {
 	.write		= tracing_free_buffer_write,
 	.release	= tracing_free_buffer_release,
@@ -4450,6 +4480,9 @@ static __init int tracer_init_debugfs(void)
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
 			&global_trace, &tracing_entries_fops);
 
+	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
+			&global_trace, &tracing_total_entries_fops);
+
 	trace_create_file("free_buffer", 0644, d_tracer,
 			&global_trace, &tracing_free_buffer_fops);
 
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 2/5] trace: Add ring buffer stats to measure rate of events
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (6 preceding siblings ...)
  2011-08-16 21:46 ` [PATCH v2 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
@ 2011-08-16 21:46 ` Vaibhav Nagarnaik
  2011-08-16 21:46 ` [PATCH v2 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-16 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

The stats file under per_cpu folder provides the number of entries,
overruns and other statistics about the CPU ring buffer. However, the
numbers do not provide any indication of how full the ring buffer is in
bytes compared to the overall size in bytes. Also, it is helpful to know
the rate at which the cpu buffer is filling up.

This patch adds an entry "bytes: " in printed stats for per_cpu ring
buffer which provides the actual bytes consumed in the ring buffer. This
field includes the number of bytes used by recorded events and the
padding bytes added when moving the tail pointer to next page.

It also adds the following time stamps:
"oldest event ts:" - the oldest timestamp in the ring buffer
"now ts:"  - the timestamp at the time of reading

The field "now ts" provides a consistent time snapshot to the userspace
when being read. This is read from the same trace clock used by tracing
event timestamps.

Together, these values provide the rate at which the buffer is filling
up, from the formula:
bytes / (now_ts - oldest_event_ts)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 include/linux/ring_buffer.h |    2 +
 kernel/trace/ring_buffer.c  |   70 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/trace.c        |   13 ++++++++
 3 files changed, 84 insertions(+), 1 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b891de9..67be037 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -154,6 +154,8 @@ void ring_buffer_record_enable(struct ring_buffer *buffer);
 void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
 
+unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu);
+unsigned long ring_buffer_bytes_cpu(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_entries(struct ring_buffer *buffer);
 unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
 unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 731201b..acf6b68 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -488,12 +488,14 @@ struct ring_buffer_per_cpu {
 	struct buffer_page		*reader_page;
 	unsigned long			lost_events;
 	unsigned long			last_overrun;
+	local_t				entries_bytes;
 	local_t				commit_overrun;
 	local_t				overrun;
 	local_t				entries;
 	local_t				committing;
 	local_t				commits;
 	unsigned long			read;
+	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
 };
@@ -1708,6 +1710,7 @@ rb_handle_head_page(struct ring_buffer_per_cpu *cpu_buffer,
 		 * the counters.
 		 */
 		local_add(entries, &cpu_buffer->overrun);
+		local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
 
 		/*
 		 * The entries will be zeroed out when we move the
@@ -1863,6 +1866,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
 	event = __rb_page_index(tail_page, tail);
 	kmemcheck_annotate_bitfield(event, bitfield);
 
+	/* account for padding bytes */
+	local_add(BUF_PAGE_SIZE - tail, &cpu_buffer->entries_bytes);
+
 	/*
 	 * Save the original length to the meta data.
 	 * This will be used by the reader to add lost event
@@ -2054,6 +2060,9 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	if (!tail)
 		tail_page->page->time_stamp = ts;
 
+	/* account for these added bytes */
+	local_add(length, &cpu_buffer->entries_bytes);
+
 	return event;
 }
 
@@ -2076,6 +2085,7 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
 	if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
 		unsigned long write_mask =
 			local_read(&bpage->write) & ~RB_WRITE_MASK;
+		unsigned long event_length = rb_event_length(event);
 		/*
 		 * This is on the tail page. It is possible that
 		 * a write could come in and move the tail page
@@ -2085,8 +2095,11 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
 		old_index += write_mask;
 		new_index += write_mask;
 		index = local_cmpxchg(&bpage->write, old_index, new_index);
-		if (index == old_index)
+		if (index == old_index) {
+			/* update counters */
+			local_sub(event_length, &cpu_buffer->entries_bytes);
 			return 1;
+		}
 	}
 
 	/* could not discard */
@@ -2661,6 +2674,58 @@ rb_num_of_entries(struct ring_buffer_per_cpu *cpu_buffer)
 }
 
 /**
+ * ring_buffer_oldest_event_ts - get the oldest event timestamp from the buffer
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to read from.
+ */
+unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
+{
+	unsigned long flags;
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct buffer_page *bpage;
+	unsigned long ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+	/*
+	 * if the tail is on reader_page, oldest time stamp is on the reader
+	 * page
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		bpage = cpu_buffer->reader_page;
+	else
+		bpage = rb_set_head_page(cpu_buffer);
+	ret = bpage->page->time_stamp;
+	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_oldest_event_ts);
+
+/**
+ * ring_buffer_bytes_cpu - get the number of bytes consumed in a cpu buffer
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to read from.
+ */
+unsigned long ring_buffer_bytes_cpu(struct ring_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	unsigned long ret;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	ret = local_read(&cpu_buffer->entries_bytes) - cpu_buffer->read_bytes;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_bytes_cpu);
+
+/**
  * ring_buffer_entries_cpu - get the number of entries in a cpu buffer
  * @buffer: The ring buffer
  * @cpu: The per CPU buffer to get the entries from.
@@ -3527,11 +3592,13 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->reader_page->read = 0;
 
 	local_set(&cpu_buffer->commit_overrun, 0);
+	local_set(&cpu_buffer->entries_bytes, 0);
 	local_set(&cpu_buffer->overrun, 0);
 	local_set(&cpu_buffer->entries, 0);
 	local_set(&cpu_buffer->committing, 0);
 	local_set(&cpu_buffer->commits, 0);
 	cpu_buffer->read = 0;
+	cpu_buffer->read_bytes = 0;
 
 	cpu_buffer->write_stamp = 0;
 	cpu_buffer->read_stamp = 0;
@@ -3918,6 +3985,7 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
 	} else {
 		/* update the entry counter */
 		cpu_buffer->read += rb_page_entries(reader);
+		cpu_buffer->read_bytes += BUF_PAGE_SIZE;
 
 		/* swap the pages */
 		rb_init_page(bpage);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0117678..b419070 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4056,6 +4056,8 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
 	struct trace_array *tr = &global_trace;
 	struct trace_seq *s;
 	unsigned long cnt;
+	unsigned long long t;
+	unsigned long usec_rem;
 
 	s = kmalloc(sizeof(*s), GFP_KERNEL);
 	if (!s)
@@ -4072,6 +4074,17 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
 	cnt = ring_buffer_commit_overrun_cpu(tr->buffer, cpu);
 	trace_seq_printf(s, "commit overrun: %ld\n", cnt);
 
+	cnt = ring_buffer_bytes_cpu(tr->buffer, cpu);
+	trace_seq_printf(s, "bytes: %ld\n", cnt);
+
+	t = ns2usecs(ring_buffer_oldest_event_ts(tr->buffer, cpu));
+	usec_rem = do_div(t, USEC_PER_SEC);
+	trace_seq_printf(s, "oldest event ts: %5llu.%06lu\n", t, usec_rem);
+
+	t = ns2usecs(ring_buffer_time_stamp(tr->buffer, cpu));
+	usec_rem = do_div(t, USEC_PER_SEC);
+	trace_seq_printf(s, "now ts: %5llu.%06lu\n", t, usec_rem);
+
 	count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->len);
 
 	kfree(s);
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 3/5] trace: Add per_cpu ring buffer control files
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (7 preceding siblings ...)
  2011-08-16 21:46 ` [PATCH v2 2/5] trace: Add ring buffer stats to measure rate of events Vaibhav Nagarnaik
@ 2011-08-16 21:46 ` Vaibhav Nagarnaik
  2011-08-22 20:29   ` Steven Rostedt
                     ` (2 more replies)
  2011-08-16 21:46 ` [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
  2011-08-16 21:46 ` [PATCH v2 " Vaibhav Nagarnaik
  10 siblings, 3 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-16 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

Add a debugfs entry called buffer_size_kb under the per_cpu/ folder
for each CPU to control its ring buffer size independently.

If the global file buffer_size_kb is used to set size, the individual
ring buffers will be adjusted to the given size. The buffer_size_kb will
report the common size to maintain backward compatibility.

If the buffer_size_kb file under the per_cpu/ directory is used to
change buffer size for a specific CPU, only the size of the respective
ring buffer is updated. When tracing/buffer_size_kb is read, it reports
'X' to indicate that sizes of per_cpu ring buffers are not equivalent.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
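Note (illustrative only, not part of the patch): a minimal userspace
sketch of how the new entries might be read. It assumes debugfs is
mounted at /sys/kernel/debug and simply walks per_cpu/cpu<N>/ until a
CPU directory is missing; writing a new size works the same way, with
an fprintf() to the per-cpu or global buffer_size_kb file.

#include <stdio.h>
#include <string.h>

#define TRACE_DIR "/sys/kernel/debug/tracing"

static int read_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

int main(void)
{
	char path[256], val[64];
	int cpu;

	/* reads "X" once the per-cpu sizes are no longer uniform */
	if (!read_line(TRACE_DIR "/buffer_size_kb", val, sizeof(val)))
		printf("global buffer_size_kb: %s\n", val);

	for (cpu = 0; ; cpu++) {
		snprintf(path, sizeof(path),
			 TRACE_DIR "/per_cpu/cpu%d/buffer_size_kb", cpu);
		if (read_line(path, val, sizeof(val)))
			break;	/* no such CPU directory, stop */
		printf("cpu%d buffer_size_kb: %s\n", cpu, val);
	}
	return 0;
}
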
 include/linux/ring_buffer.h |    6 +-
 kernel/trace/ring_buffer.c  |  221 +++++++++++++++++++++++--------------------
 kernel/trace/trace.c        |  185 +++++++++++++++++++++++++++++-------
 kernel/trace/trace.h        |    2 +-
 4 files changed, 272 insertions(+), 142 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 67be037..ad36702 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+#define RING_BUFFER_ALL_CPUS -1
+
 void ring_buffer_free(struct ring_buffer *buffer);
 
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
 
 void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
 
@@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
 void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
 int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
 
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
 
 void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_reset(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index acf6b68..a627680 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
 	spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
+	unsigned int			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
 	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
+	/* ring buffer pages to update, > 0 to add, < 0 to remove */
+	int				nr_pages_to_update;
+	struct list_head		new_pages; /* new pages to add */
 };
 
 struct ring_buffer {
-	unsigned			pages;
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
@@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
+static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
 {
+	int i;
 	struct buffer_page *bpage, *tmp;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL | __GFP_NORETRY,
-				    cpu_to_node(cpu_buffer->cpu));
+				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
 
-		rb_check_bpage(cpu_buffer, bpage);
-
-		list_add(&bpage->list, &pages);
+		list_add(&bpage->list, pages);
 
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
 		if (!page)
 			goto free_pages;
@@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_init_page(bpage->page);
 	}
 
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+
+	return -ENOMEM;
+}
+
+static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	LIST_HEAD(pages);
+
+	WARN_ON(!nr_pages);
+
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+		return -ENOMEM;
+
 	/*
 	 * The ring buffer page list is a circular list that does not
 	 * start and end with a list head. All page list items point to
@@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);
 
+	cpu_buffer->nr_pages = nr_pages;
+
 	rb_check_pages(cpu_buffer);
 
 	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
 }
 
 static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
+rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
@@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	ret = rb_allocate_pages(cpu_buffer, nr_pages);
 	if (ret < 0)
 		goto fail_free_reader;
 
@@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 {
 	struct ring_buffer *buffer;
 	int bsize;
-	int cpu;
+	int cpu, nr_pages;
 
 	/* keep it in its own cache line */
 	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
 
 	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
+	if (nr_pages < 2)
+		nr_pages = 2;
 
 	/*
 	 * In case of non-hotplug cpu, if the ring-buffer is allocated
@@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 
 	for_each_buffer_cpu(buffer, cpu) {
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu])
 			goto fail_free_buffers;
 	}
@@ -1308,6 +1321,17 @@ out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
+static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (cpu_buffer->nr_pages_to_update > 0)
+		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
+				cpu_buffer->nr_pages_to_update);
+	else
+		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+	/* reset this value */
+	cpu_buffer->nr_pages_to_update = 0;
+}
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -1317,14 +1341,12 @@ out:
  *
  * Returns -1 on failure.
  */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
+			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	LIST_HEAD(pages);
-	int i, cpu;
+	unsigned nr_pages;
+	int cpu;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1334,15 +1356,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
 
 	/* we need a minimum of two pages */
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	if (size == buffer_size)
-		return size;
-
 	atomic_inc(&buffer->record_disabled);
 
 	/* Make sure all writers are done with this buffer. */
@@ -1353,68 +1371,59 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	if (size < buffer_size) {
+	if (cpu_id == RING_BUFFER_ALL_CPUS) {
+		/* calculate the pages to update */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+
+			cpu_buffer->nr_pages_to_update = nr_pages -
+							cpu_buffer->nr_pages;
 
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
+			/*
+			 * nothing more to do for removing pages or no update
+			 */
+			if (cpu_buffer->nr_pages_to_update <= 0)
+				continue;
 
-		rm_pages = buffer->pages - nr_pages;
+			/*
+			 * to add pages, make sure all new pages can be
+			 * allocated without receiving ENOMEM
+			 */
+			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu))
+				/* not enough memory for new pages */
+				goto no_mem;
+		}
 
+		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
+			if (cpu_buffer->nr_pages_to_update) {
+				update_pages_handler(cpu_buffer);
+				cpu_buffer->nr_pages = nr_pages;
+			}
 		}
-		goto out;
-	}
+	} else {
+		cpu_buffer = buffer->buffers[cpu_id];
+		if (nr_pages == cpu_buffer->nr_pages)
+			goto out;
 
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
+		cpu_buffer->nr_pages_to_update = nr_pages -
+						cpu_buffer->nr_pages;
 
-	new_pages = nr_pages - buffer->pages;
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (cpu_buffer->nr_pages_to_update > 0 &&
+			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu_id))
+			goto no_mem;
 
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			struct page *page;
-			/*
-			 * __GFP_NORETRY flag makes sure that the allocation
-			 * fails gracefully without invoking oom-killer and
-			 * the system is not destabilized.
-			 */
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL | __GFP_NORETRY,
-					    cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			page = alloc_pages_node(cpu_to_node(cpu),
-						GFP_KERNEL | __GFP_NORETRY, 0);
-			if (!page)
-				goto free_pages;
-			bpage->page = page_address(page);
-			rb_init_page(bpage->page);
-		}
-	}
+		update_pages_handler(cpu_buffer);
 
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
+		cpu_buffer->nr_pages = nr_pages;
 	}
 
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
-
  out:
-	buffer->pages = nr_pages;
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 
@@ -1422,25 +1431,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	return size;
 
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+ no_mem:
+	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_page *bpage, *tmp;
+		cpu_buffer = buffer->buffers[cpu];
+		/* reset this number regardless */
+		cpu_buffer->nr_pages_to_update = 0;
+		if (list_empty(&cpu_buffer->new_pages))
+			continue;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1542,7 +1550,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 	 * assign the commit to the tail.
 	 */
  again:
-	max_count = cpu_buffer->buffer->pages * 100;
+	max_count = cpu_buffer->nr_pages * 100;
 
 	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
 		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
@@ -3563,9 +3571,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
  * ring_buffer_size - return the size of the ring buffer (in bytes)
  * @buffer: The ring buffer.
  */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
 {
-	return BUF_PAGE_SIZE * buffer->pages;
+	/*
+	 * Earlier, this method returned
+	 *	BUF_PAGE_SIZE * buffer->nr_pages
+	 * Since the nr_pages field is now removed, we have converted this to
+	 * return the per cpu buffer value.
+	 */
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b419070..305832a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2853,7 +2853,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
 	return t->init(tr);
 }
 
-static int __tracing_resize_ring_buffer(unsigned long size)
+static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+{
+	int cpu;
+	for_each_tracing_cpu(cpu)
+		tr->data[cpu]->entries = val;
+}
+
+static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 {
 	int ret;
 
@@ -2864,19 +2871,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 	 */
 	ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
 	if (!current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
 	if (ret < 0) {
-		int r;
+		int r = 0;
+
+		if (cpu == RING_BUFFER_ALL_CPUS) {
+			int i;
+			for_each_tracing_cpu(i) {
+				r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[i]->entries,
+						i);
+				if (r < 0)
+					break;
+			}
+		} else {
+			r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+		}
 
-		r = ring_buffer_resize(global_trace.buffer,
-				       global_trace.entries);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
@@ -2898,14 +2918,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 		return ret;
 	}
 
-	max_tr.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&max_tr, size);
+	else
+		max_tr.data[cpu]->entries = size;
+
  out:
-	global_trace.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&global_trace, size);
+	else
+		global_trace.data[cpu]->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size)
+static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
 	int cpu, ret = size;
 
@@ -2921,12 +2948,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
 			atomic_inc(&max_tr.data[cpu]->disabled);
 	}
 
-	if (size != global_trace.entries)
-		ret = __tracing_resize_ring_buffer(size);
+	if (cpu_id != RING_BUFFER_ALL_CPUS) {
+		/* make sure, this cpu is enabled in the mask */
+		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
 
+	ret = __tracing_resize_ring_buffer(size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
+out:
 	for_each_tracing_cpu(cpu) {
 		if (global_trace.data[cpu])
 			atomic_dec(&global_trace.data[cpu]->disabled);
@@ -2957,7 +2991,8 @@ int tracing_update_buffers(void)
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -2981,7 +3016,8 @@ static int tracing_set_tracer(const char *buf)
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3007,8 +3043,8 @@ static int tracing_set_tracer(const char *buf)
 		 * The max_tr ring buffer has some state (e.g. ring->clock) and
 		 * we want preserve it.
 		 */
-		ring_buffer_resize(max_tr.buffer, 1);
-		max_tr.entries = 1;
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
 	}
 	destroy_trace_option_files(topts);
 
@@ -3016,10 +3052,17 @@ static int tracing_set_tracer(const char *buf)
 
 	topts = create_trace_option_files(current_trace);
 	if (current_trace->use_max_tr) {
-		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
-		if (ret < 0)
-			goto out;
-		max_tr.entries = global_trace.entries;
+		int cpu;
+		/* we need to make per cpu buffer sizes equivalent */
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				goto out;
+			max_tr.data[cpu]->entries =
+					global_trace.data[cpu]->entries;
+		}
 	}
 
 	if (t->init) {
@@ -3521,30 +3564,82 @@ out_err:
 	goto out;
 }
 
+struct ftrace_entries_info {
+	struct trace_array	*tr;
+	int			cpu;
+};
+
+static int tracing_entries_open(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info;
+
+	if (tracing_disabled)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->tr = &global_trace;
+	info->cpu = (unsigned long)inode->i_private;
+
+	filp->private_data = info;
+
+	return 0;
+}
+
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct trace_array *tr = filp->private_data;
-	char buf[96];
-	int r;
+	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_array *tr = info->tr;
+	char buf[64];
+	int r = 0;
+	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		r = sprintf(buf, "%lu (expanded: %lu)\n",
-			    tr->entries >> 10,
-			    trace_buf_size >> 10);
-	else
-		r = sprintf(buf, "%lu\n", tr->entries >> 10);
+
+	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+		int cpu, buf_size_same;
+		unsigned long size;
+
+		size = 0;
+		buf_size_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_tracing_cpu(cpu) {
+			/* fill in the size from first enabled cpu */
+			if (size == 0)
+				size = tr->data[cpu]->entries;
+			if (size != tr->data[cpu]->entries) {
+				buf_size_same = 0;
+				break;
+			}
+		}
+
+		if (buf_size_same) {
+			if (!ring_buffer_expanded)
+				r = sprintf(buf, "%lu (expanded: %lu)\n",
+					    size >> 10,
+					    trace_buf_size >> 10);
+			else
+				r = sprintf(buf, "%lu\n", size >> 10);
+		} else
+			r = sprintf(buf, "X\n");
+	} else
+		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+
 	mutex_unlock(&trace_types_lock);
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	return ret;
 }
 
 static ssize_t
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
+	struct ftrace_entries_info *info = filp->private_data;
 	unsigned long val;
 	int ret;
 
@@ -3559,7 +3654,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val);
+	ret = tracing_resize_ring_buffer(val, info->cpu);
 	if (ret < 0)
 		return ret;
 
@@ -3568,6 +3663,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static int
+tracing_entries_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info = filp->private_data;
+
+	kfree(info);
+
+	return 0;
+}
+
 static ssize_t
 tracing_total_entries_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos)
@@ -3579,7 +3684,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
-		size += tr->entries >> 10;
+		size += tr->data[cpu]->entries >> 10;
 		if (!ring_buffer_expanded)
 			expanded_size += trace_buf_size >> 10;
 	}
@@ -3613,7 +3718,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
 		tracing_off();
 	/* resize the ring buffer to 0 */
-	tracing_resize_ring_buffer(0);
+	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
 
 	return 0;
 }
@@ -3757,9 +3862,10 @@ static const struct file_operations tracing_pipe_fops = {
 };
 
 static const struct file_operations tracing_entries_fops = {
-	.open		= tracing_open_generic,
+	.open		= tracing_entries_open,
 	.read		= tracing_entries_read,
 	.write		= tracing_entries_write,
+	.release	= tracing_entries_release,
 	.llseek		= generic_file_llseek,
 };
 
@@ -4211,6 +4317,9 @@ static void tracing_init_debugfs_percpu(long cpu)
 
 	trace_create_file("stats", 0444, d_cpu,
 			(void *) cpu, &tracing_stats_fops);
+
+	trace_create_file("buffer_size_kb", 0444, d_cpu,
+			(void *) cpu, &tracing_entries_fops);
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
@@ -4491,7 +4600,7 @@ static __init int tracer_init_debugfs(void)
 			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
 
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
-			&global_trace, &tracing_entries_fops);
+			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
 
 	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
 			&global_trace, &tracing_total_entries_fops);
@@ -4737,8 +4846,6 @@ __init static int tracer_alloc_buffers(void)
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
-
 
 #ifdef CONFIG_TRACER_MAX_TRACE
 	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
@@ -4748,7 +4855,6 @@ __init static int tracer_alloc_buffers(void)
 		ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = 1;
 #endif
 
 	/* Allocate the first page for all buffers */
@@ -4757,6 +4863,11 @@ __init static int tracer_alloc_buffers(void)
 		max_tr.data[i] = &per_cpu(max_tr_data, i);
 	}
 
+	set_buffer_entries(&global_trace, ring_buf_size);
+#ifdef CONFIG_TRACER_MAX_TRACE
+	set_buffer_entries(&max_tr, 1);
+#endif
+
 	trace_init_cmdlines();
 
 	register_tracer(&nop_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 616846b..126d333 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -125,6 +125,7 @@ struct trace_array_cpu {
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
 
+	unsigned long		entries;
 	unsigned long		saved_latency;
 	unsigned long		critical_start;
 	unsigned long		critical_end;
@@ -146,7 +147,6 @@ struct trace_array_cpu {
  */
 struct trace_array {
 	struct ring_buffer	*buffer;
-	unsigned long		entries;
 	int			cpu;
 	cycle_t			time_start;
 	struct task_struct	*waiter;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (8 preceding siblings ...)
  2011-08-16 21:46 ` [PATCH v2 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
@ 2011-08-16 21:46 ` Vaibhav Nagarnaik
  2011-08-23  3:27   ` Steven Rostedt
                     ` (2 more replies)
  2011-08-16 21:46 ` [PATCH v2 " Vaibhav Nagarnaik
  10 siblings, 3 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-16 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing pages starting after the tail page, which
ensures that the empty pages in the ring buffer are removed first. If
the head page is among the pages to be removed, the page following the
removed range becomes the new head page. This drops the oldest data
from the ring buffer while keeping the latest data around to be read.

To do this without races, tracing is stopped for a very short time
while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed only after tracing is restarted, to
minimize the time tracing has to stay stopped.

The removal of pages from a per-cpu ring buffer runs in a context on
the respective CPU, so the only events that go untraced are those
generated from NMI context.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v2-v1:
* The earlier patch removed pages after the tail page by using cmpxchg()
  operations, which Steven Rostedt identified as racy. The logic is now
  changed to stop tracing until all the pages are identified and
  unlinked, which removes the race with the writer.

 kernel/trace/ring_buffer.c |  207 +++++++++++++++++++++++++++++++++-----------
 kernel/trace/trace.c       |   20 +----
 2 files changed, 156 insertions(+), 71 deletions(-)
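
Note (illustrative only, not kernel code; all names below are invented
for the sketch): a toy model of the list manipulation described above,
on a plain circular list of pages. It shows the removal order -- unlink
nr_pages starting after the tail page, and if the head page falls
inside the removed range, the page following that range becomes the new
head. The reader lock, the HEAD flag bit and the byte accounting are
all left out.

#include <stdio.h>
#include <stdlib.h>

struct toy_page {
	int id;
	struct toy_page *next, *prev;
};

/* build a circular list of n pages, ids 0..n-1 */
static struct toy_page *make_ring(int n)
{
	struct toy_page *first = NULL, *prev = NULL;
	int i;

	for (i = 0; i < n; i++) {
		struct toy_page *p = malloc(sizeof(*p));

		p->id = i;
		if (!first)
			first = p;
		else {
			prev->next = p;
			p->prev = prev;
		}
		prev = p;
	}
	prev->next = first;
	first->prev = prev;
	return first;
}

/* unlink nr_pages starting after tail; return the unlinked range */
static struct toy_page *remove_after_tail(struct toy_page *tail,
					  struct toy_page **head,
					  int nr_pages)
{
	struct toy_page *first = tail->next;
	struct toy_page *last = first;
	struct toy_page *p;
	int i;

	for (i = 1; i < nr_pages; i++)
		last = last->next;

	/* splice the range out of the ring */
	tail->next = last->next;
	last->next->prev = tail;

	/* if the head was removed, the page after the range is the new head */
	for (i = 0, p = first; i < nr_pages; i++, p = p->next)
		if (p == *head)
			*head = tail->next;

	return first;	/* the caller frees these pages */
}

int main(void)
{
	struct toy_page *head = make_ring(8);
	struct toy_page *tail = head->next->next; /* pretend the tail is page 2 */
	struct toy_page *p, *removed = remove_after_tail(tail, &head, 3);
	int i;

	printf("head is now page %d\n", head->id);	/* still page 0 */
	for (i = 0, p = removed; i < 3; i++) {
		struct toy_page *next = p->next;

		printf("removed page %d\n", p->id);	/* pages 3, 4, 5 */
		free(p);
		p = next;
	}
	return 0;
}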

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a627680..1c86065 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -502,6 +504,8 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
@@ -1080,6 +1084,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1267,32 +1273,107 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	unsigned int nr_removed;
+	int page_entries;
+	struct list_head *tail_page, *to_remove, *next_page;
+	unsigned long head_bit;
+	struct buffer_page *last_page, *first_page;
+	struct buffer_page *to_remove_page, *tmp_iter_page;
 
+	head_bit = 0;
 	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
-
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+	atomic_inc(&cpu_buffer->record_disabled);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock. We also don't race with writers after disabling recording.
+	 * This makes it easy to figure out the first and the last page to be
+	 * removed from the list. We remove all the pages in between including
+	 * the first and last pages. This is done in a busy loop so that we
+	 * lose the least number of traces.
+	 * The pages are freed after we restart recording and unlock readers.
+	 */
+	tail_page = &cpu_buffer->tail_page->list;
+	/*
+	 * tail page might be on reader page, we remove the next page
+	 * from the ring buffer
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		tail_page = rb_list_head(tail_page->next);
+	to_remove = tail_page;
+
+	/* start of pages to remove */
+	first_page = list_entry(rb_list_head(to_remove->next),
+				struct buffer_page, list);
+	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
+		to_remove = rb_list_head(to_remove)->next;
+		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
 	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
-
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
+	next_page = rb_list_head(to_remove)->next;
+	/* now we remove all pages between tail_page and next_page */
+	tail_page->next = (struct list_head *)((unsigned long)next_page |
+						head_bit);
+	next_page = rb_list_head(next_page);
+	next_page->prev = tail_page;
+	/* make sure pages points to a valid page in the ring buffer */
+	cpu_buffer->pages = next_page;
+	/* update head page */
+	if (head_bit)
+		cpu_buffer->head_page = list_entry(next_page,
+						struct buffer_page, list);
+	/*
+	 * change read pointer to make sure any read iterators reset
+	 * themselves
+	 */
+	cpu_buffer->read = 0;
+	/* pages are removed, resume tracing and then free the pages */
+	atomic_dec(&cpu_buffer->record_disabled);
 	spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
+
+	/* last buffer page to remove */
+	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
+				list);
+	tmp_iter_page = first_page;
+	do {
+		to_remove_page = tmp_iter_page;
+		rb_inc_page(cpu_buffer, &tmp_iter_page);
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		nr_removed--;
+		free_buffer_page(to_remove_page);
+	} while (to_remove_page != last_page);
+
+	RB_WARN_ON(cpu_buffer, nr_removed);
 }
 
 static void
@@ -1303,6 +1384,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	struct list_head *p;
 	unsigned i;
 
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	/* Make sure all writers are done with this buffer. */
+	synchronize_sched();
+
 	spin_lock_irq(&cpu_buffer->reader_lock);
 	rb_head_page_deactivate(cpu_buffer);
 
@@ -1319,17 +1406,21 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
+	atomic_dec(&cpu_buffer->record_disabled);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void update_pages_handler(struct work_struct *work)
 {
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1339,7 +1430,7 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
  *
  * Minimum size is 2 * BUF_PAGE_SIZE.
  *
- * Returns -1 on failure.
+ * Returns 0 on success and < 0 on failure.
  */
 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			int cpu_id)
@@ -1361,21 +1452,28 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
-
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
+	/*
+	 * Don't succeed if recording is disabled globally, as a reader might
+	 * be manipulating the ring buffer and is expecting a sane state while
+	 * this is true.
+	 */
+	if (atomic_read(&buffer->record_disabled))
+		return -EBUSY;
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
+			if (atomic_read(&cpu_buffer->record_disabled)) {
+				err = -EBUSY;
+				goto out_err;
+			}
+
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 
@@ -1396,16 +1494,31 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 				goto no_mem;
 		}
 
+		/* fire off all the required work handlers */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
+		}
+
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-				cpu_buffer->nr_pages = nr_pages;
-			}
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			wait_for_completion(&cpu_buffer->update_completion);
+			cpu_buffer->nr_pages = nr_pages;
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+		if (atomic_read(&cpu_buffer->record_disabled)) {
+			err = -EBUSY;
+			goto out_err;
+		}
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1418,36 +1531,36 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 						&cpu_buffer->new_pages, cpu_id))
 			goto no_mem;
 
-		update_pages_handler(cpu_buffer);
+		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
+		wait_for_completion(&cpu_buffer->update_completion);
 
 		cpu_buffer->nr_pages = nr_pages;
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
  no_mem:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
 		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
@@ -1487,21 +1600,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 305832a..908cecc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2934,20 +2934,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -2961,14 +2951,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v2 5/5] trace: Make addition of pages in ring buffer atomic
  2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
                   ` (9 preceding siblings ...)
  2011-08-16 21:46 ` [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
@ 2011-08-16 21:46 ` Vaibhav Nagarnaik
  10 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-16 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
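Note (illustrative only, not kernel code; the names node, insert_pages
and PAGE_HEAD are invented for the sketch): a self-contained model of
the splice described above. The HEAD flag lives in bit 0 of the
previous page's next pointer, and the new pages are linked in with one
cmpxchg on that pointer, so a writer that has moved the head page in
the meantime makes the cmpxchg fail and the caller retries. The retry
loop, reader lock and the kernel list helpers are left out; the cmpxchg
is done here with the GCC/Clang __sync builtin.

#include <stdio.h>
#include <stdint.h>

#define PAGE_HEAD	1UL

struct node {
	int id;
	struct node *next, *prev;
};

static struct node *untag(struct node *p)
{
	return (struct node *)((uintptr_t)p & ~PAGE_HEAD);
}

static struct node *tag(struct node *p)
{
	return (struct node *)((uintptr_t)p | PAGE_HEAD);
}

/* splice the new list first..last in between the head page and its prev */
static int insert_pages(struct node *head, struct node *first,
			struct node *last)
{
	struct node *prev = head->prev;
	struct node *head_with_bit = tag(head);
	struct node *r;

	last->next = head_with_bit;
	first->prev = prev;

	/* only succeeds if prev->next still points to the flagged head */
	r = __sync_val_compare_and_swap(&prev->next, head_with_bit, first);
	if (r != head_with_bit)
		return 0;	/* head moved under us, the caller would retry */

	head->prev = last;
	return 1;
}

int main(void)
{
	struct node a = { 0 }, b = { 1 }, c = { 2 };	/* existing ring */
	struct node x = { 3 }, y = { 4 };		/* new pages */
	struct node *p;
	int i;

	/* ring a -> b -> c -> a, with "a" as the head page */
	a.next = &b;  b.next = &c;  c.next = tag(&a);
	a.prev = &c;  b.prev = &a;  c.prev = &b;
	x.next = &y;  y.prev = &x;			/* new pages x..y */

	if (insert_pages(&a, &x, &y))
		printf("spliced in front of the head page\n");

	/* one pass around the ring now visits a, b, c, x, y */
	for (i = 0, p = &a; i < 5; i++, p = untag(p->next))
		printf("page %d\n", p->id);
	return 0;
}
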
 kernel/trace/ring_buffer.c |   97 ++++++++++++++++++++++++++++++--------------
 1 files changed, 66 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1c86065..90605d1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1377,36 +1377,67 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 }
 
 static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
+	spin_lock_irq(&cpu_buffer->reader_lock);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 10 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 10;
+	success = 0;
+	while (retries--) {
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *head_page_with_bit;
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
 
-	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+		first_page = pages->next;
+		last_page  = pages->prev;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+		head_page_with_bit = (struct list_head *)
+				((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next  = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	spin_unlock_irq(&cpu_buffer->reader_lock);
-	atomic_dec(&cpu_buffer->record_disabled);
 }
 
 static void update_pages_handler(struct work_struct *work)
@@ -1415,8 +1446,7 @@ static void update_pages_handler(struct work_struct *work)
 			struct ring_buffer_per_cpu, update_pages_work);
 
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		rb_insert_pages(cpu_buffer);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
 
@@ -1437,7 +1467,7 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1489,9 +1519,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
 		}
 
 		/* fire off all the required work handlers */
@@ -1528,8 +1560,10 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					&cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
 
 		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
 		wait_for_completion(&cpu_buffer->update_completion);
@@ -1543,7 +1577,7 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	mutex_unlock(&buffer->mutex);
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
 
@@ -1561,7 +1595,7 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		}
 	}
 	mutex_unlock(&buffer->mutex);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -3706,6 +3740,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 3/5] trace: Add per_cpu ring buffer control files
  2011-08-16 21:46 ` [PATCH v2 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
@ 2011-08-22 20:29   ` Steven Rostedt
  2011-08-22 20:36     ` Vaibhav Nagarnaik
  2011-08-22 22:09   ` [PATCH v3] " Vaibhav Nagarnaik
  2011-08-23  1:17   ` Vaibhav Nagarnaik
  2 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-08-22 20:29 UTC (permalink / raw)
  To: Vaibhav Nagarnaik; +Cc: Michael Rubin, David Sharp, linux-kernel

On Tue, 2011-08-16 at 14:46 -0700, Vaibhav Nagarnaik wrote:
> Add a debugfs entry called buffer_size_kb under the per_cpu/ folder
> for each CPU to control its ring buffer size independently.
> 
> If the global file buffer_size_kb is used to set size, the individual
> ring buffers will be adjusted to the given size. The buffer_size_kb will
> report the common size to maintain backward compatibility.
> 
> If the buffer_size_kb file under the per_cpu/ directory is used to
> change buffer size for a specific CPU, only the size of the respective
> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
> 'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>

This patch wasn't tested with any of the latency tracers enabled, or
with CPU hotplug:


/home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c: In function 'ring_buffer_swap_cpu':
/home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:3761:14: error: 'struct ring_buffer' has no member named 'pages'
/home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:3761:33: error: 'struct ring_buffer' has no member named 'pages'
  CC      kernel/trace/trace_syscalls.o
/home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c: In function 'rb_cpu_notify':
/home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:4136:4: error: too few arguments to function 'rb_allocate_cpu_buffer'
/home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:1066:1: note: declared here


-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 3/5] trace: Add per_cpu ring buffer control files
  2011-08-22 20:29   ` Steven Rostedt
@ 2011-08-22 20:36     ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-22 20:36 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Michael Rubin, David Sharp, linux-kernel

On Mon, Aug 22, 2011 at 1:29 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2011-08-16 at 14:46 -0700, Vaibhav Nagarnaik wrote:
>> Add a debugfs entry called buffer_size_kb under the per_cpu/ folder
>> for each CPU to control its ring buffer size independently.
>>
>> If the global file buffer_size_kb is used to set size, the individual
>> ring buffers will be adjusted to the given size. The buffer_size_kb will
>> report the common size to maintain backward compatibility.
>>
>> If the buffer_size_kb file under the per_cpu/ directory is used to
>> change buffer size for a specific CPU, only the size of the respective
>> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
>> 'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
>>
>> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
>
> This patch wasn't tested with any of the latency tracers enabled, or
> with CPU hotplug:
>
>
> /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c: In function 'ring_buffer_swap_cpu':
> /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:3761:14: error: 'struct ring_buffer' has no member named 'pages'
> /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:3761:33: error: 'struct ring_buffer' has no member named 'pages'
>  CC      kernel/trace/trace_syscalls.o
> /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c: In function 'rb_cpu_notify':
> /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:4136:4: error: too few arguments to function 'rb_allocate_cpu_buffer'
> /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/ring_buffer.c:1066:1: note: declared here
>
>
> -- Steve

Oops, sorry about that. I will send a respin of the patch.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v3] trace: Add per_cpu ring buffer control files
  2011-08-16 21:46 ` [PATCH v2 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
  2011-08-22 20:29   ` Steven Rostedt
@ 2011-08-22 22:09   ` Vaibhav Nagarnaik
  2011-08-23  0:49     ` Steven Rostedt
  2011-08-23  1:17   ` Vaibhav Nagarnaik
  2 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-22 22:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

Add a debugfs entry called buffer_size_kb under the per_cpu/ folder
for each CPU to control its ring buffer size independently.

If the global file buffer_size_kb is used to set size, the individual
ring buffers will be adjusted to the given size. The buffer_size_kb will
report the common size to maintain backward compatibility.

If the buffer_size_kb file under the per_cpu/ directory is used to
change buffer size for a specific CPU, only the size of the respective
ring buffer is updated. When tracing/buffer_size_kb is read, it reports
'X' to indicate that sizes of per_cpu ring buffers are not equivalent.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v3-v2:
* Fix compilation errors when using allyesconfig.

 include/linux/ring_buffer.h |    6 +-
 kernel/trace/ring_buffer.c  |  250 ++++++++++++++++++++++++------------------
 kernel/trace/trace.c        |  191 ++++++++++++++++++++++++++-------
 kernel/trace/trace.h        |    2 +-
 4 files changed, 300 insertions(+), 149 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 67be037..ad36702 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+#define RING_BUFFER_ALL_CPUS -1
+
 void ring_buffer_free(struct ring_buffer *buffer);
 
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
 
 void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
 
@@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
 void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
 int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
 
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
 
 void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_reset(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index acf6b68..cc11be3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
 	spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
+	unsigned int			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
 	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
+	/* ring buffer pages to update, > 0 to add, < 0 to remove */
+	int				nr_pages_to_update;
+	struct list_head		new_pages; /* new pages to add */
 };
 
 struct ring_buffer {
-	unsigned			pages;
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
@@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
+static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
 {
+	int i;
 	struct buffer_page *bpage, *tmp;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL | __GFP_NORETRY,
-				    cpu_to_node(cpu_buffer->cpu));
+				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
 
-		rb_check_bpage(cpu_buffer, bpage);
+		list_add(&bpage->list, pages);
 
-		list_add(&bpage->list, &pages);
-
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
 		if (!page)
 			goto free_pages;
@@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_init_page(bpage->page);
 	}
 
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+
+	return -ENOMEM;
+}
+
+static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	LIST_HEAD(pages);
+
+	WARN_ON(!nr_pages);
+
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+		return -ENOMEM;
+
 	/*
 	 * The ring buffer page list is a circular list that does not
 	 * start and end with a list head. All page list items point to
@@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);
 
+	cpu_buffer->nr_pages = nr_pages;
+
 	rb_check_pages(cpu_buffer);
 
 	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
 }
 
 static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
+rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
@@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	ret = rb_allocate_pages(cpu_buffer, nr_pages);
 	if (ret < 0)
 		goto fail_free_reader;
 
@@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 {
 	struct ring_buffer *buffer;
 	int bsize;
-	int cpu;
+	int cpu, nr_pages;
 
 	/* keep it in its own cache line */
 	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
 
 	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
+	if (nr_pages < 2)
+		nr_pages = 2;
 
 	/*
 	 * In case of non-hotplug cpu, if the ring-buffer is allocated
@@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 
 	for_each_buffer_cpu(buffer, cpu) {
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu])
 			goto fail_free_buffers;
 	}
@@ -1308,6 +1321,17 @@ out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
+static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (cpu_buffer->nr_pages_to_update > 0)
+		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
+				cpu_buffer->nr_pages_to_update);
+	else
+		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+	/* reset this value */
+	cpu_buffer->nr_pages_to_update = 0;
+}
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -1317,14 +1341,12 @@ out:
  *
  * Returns -1 on failure.
  */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
+			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	LIST_HEAD(pages);
-	int i, cpu;
+	unsigned nr_pages;
+	int cpu;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1334,15 +1356,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
 
 	/* we need a minimum of two pages */
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	if (size == buffer_size)
-		return size;
-
 	atomic_inc(&buffer->record_disabled);
 
 	/* Make sure all writers are done with this buffer. */
@@ -1353,68 +1371,59 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	if (size < buffer_size) {
+	if (cpu_id == RING_BUFFER_ALL_CPUS) {
+		/* calculate the pages to update */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
 
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
+			cpu_buffer->nr_pages_to_update = nr_pages -
+							cpu_buffer->nr_pages;
 
-		rm_pages = buffer->pages - nr_pages;
+			/*
+			 * nothing more to do for removing pages or no update
+			 */
+			if (cpu_buffer->nr_pages_to_update <= 0)
+				continue;
 
+			/*
+			 * to add pages, make sure all new pages can be
+			 * allocated without receiving ENOMEM
+			 */
+			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu))
+				/* not enough memory for new pages */
+				goto no_mem;
+		}
+
+		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
+			if (cpu_buffer->nr_pages_to_update) {
+				update_pages_handler(cpu_buffer);
+				cpu_buffer->nr_pages = nr_pages;
+			}
 		}
-		goto out;
-	}
+	} else {
+		cpu_buffer = buffer->buffers[cpu_id];
+		if (nr_pages == cpu_buffer->nr_pages)
+			goto out;
 
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
+		cpu_buffer->nr_pages_to_update = nr_pages -
+						cpu_buffer->nr_pages;
 
-	new_pages = nr_pages - buffer->pages;
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (cpu_buffer->nr_pages_to_update > 0 &&
+			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu_id))
+			goto no_mem;
 
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			struct page *page;
-			/*
-			 * __GFP_NORETRY flag makes sure that the allocation
-			 * fails gracefully without invoking oom-killer and
-			 * the system is not destabilized.
-			 */
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL | __GFP_NORETRY,
-					    cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			page = alloc_pages_node(cpu_to_node(cpu),
-						GFP_KERNEL | __GFP_NORETRY, 0);
-			if (!page)
-				goto free_pages;
-			bpage->page = page_address(page);
-			rb_init_page(bpage->page);
-		}
-	}
+		update_pages_handler(cpu_buffer);
 
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
+		cpu_buffer->nr_pages = nr_pages;
 	}
 
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
-
  out:
-	buffer->pages = nr_pages;
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 
@@ -1422,25 +1431,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	return size;
 
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+ no_mem:
+	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_page *bpage, *tmp;
+		cpu_buffer = buffer->buffers[cpu];
+		/* reset this number regardless */
+		cpu_buffer->nr_pages_to_update = 0;
+		if (list_empty(&cpu_buffer->new_pages))
+			continue;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1542,7 +1550,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 	 * assign the commit to the tail.
 	 */
  again:
-	max_count = cpu_buffer->buffer->pages * 100;
+	max_count = cpu_buffer->nr_pages * 100;
 
 	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
 		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
@@ -3563,9 +3571,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
  * ring_buffer_size - return the size of the ring buffer (in bytes)
  * @buffer: The ring buffer.
  */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
 {
-	return BUF_PAGE_SIZE * buffer->pages;
+	/*
+	 * Earlier, this method returned
+	 *	BUF_PAGE_SIZE * buffer->pages
+	 * Since the pages field is now removed, we have converted this to
+	 * return the per cpu buffer value.
+	 */
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
@@ -3740,10 +3757,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
 		goto out;
 
-	/* At least make sure the two buffers are somewhat the same */
-	if (buffer_a->pages != buffer_b->pages)
-		goto out;
-
 	ret = -EAGAIN;
 
 	if (ring_buffer_flags != RB_BUFFERS_ON)
@@ -3758,6 +3771,12 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	cpu_buffer_a = buffer_a->buffers[cpu];
 	cpu_buffer_b = buffer_b->buffers[cpu];
 
+	ret = -EINVAL;
+	/* At least make sure the two buffers are somewhat the same */
+	if (cpu_buffer_a->pages != cpu_buffer_b->pages)
+		goto out;
+
+	ret = -EAGAIN;
 	if (atomic_read(&cpu_buffer_a->record_disabled))
 		goto out;
 
@@ -4108,6 +4127,8 @@ static int rb_cpu_notify(struct notifier_block *self,
 	struct ring_buffer *buffer =
 		container_of(self, struct ring_buffer, cpu_notify);
 	long cpu = (long)hcpu;
+	int cpu_i, nr_pages_same;
+	unsigned int nr_pages;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -4115,8 +4136,23 @@ static int rb_cpu_notify(struct notifier_block *self,
 		if (cpumask_test_cpu(cpu, buffer->cpumask))
 			return NOTIFY_OK;
 
+		nr_pages = 0;
+		nr_pages_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_buffer_cpu(buffer, cpu_i) {
+			/* fill in the size from first enabled cpu */
+			if (nr_pages == 0)
+				nr_pages = buffer->buffers[cpu_i]->nr_pages;
+			if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
+				nr_pages_same = 0;
+				break;
+			}
+		}
+		/* allocate minimum pages, user can later expand it */
+		if (!nr_pages_same)
+			nr_pages = 2;
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu]) {
 			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
 			     cpu);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b419070..bb3c867 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -784,7 +784,8 @@ __acquires(kernel_lock)
 
 		/* If we expanded the buffers, make sure the max is expanded too */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, trace_buf_size);
+			ring_buffer_resize(max_tr.buffer, trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 
 		/* the test is responsible for initializing and enabling */
 		pr_info("Testing tracer %s: ", type->name);
@@ -800,7 +801,8 @@ __acquires(kernel_lock)
 
 		/* Shrink the max buffer again */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, 1);
+			ring_buffer_resize(max_tr.buffer, 1,
+						RING_BUFFER_ALL_CPUS);
 
 		printk(KERN_CONT "PASSED\n");
 	}
@@ -2853,7 +2855,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
 	return t->init(tr);
 }
 
-static int __tracing_resize_ring_buffer(unsigned long size)
+static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+{
+	int cpu;
+	for_each_tracing_cpu(cpu)
+		tr->data[cpu]->entries = val;
+}
+
+static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 {
 	int ret;
 
@@ -2864,19 +2873,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 	 */
 	ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
 	if (!current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
 	if (ret < 0) {
-		int r;
+		int r = 0;
+
+		if (cpu == RING_BUFFER_ALL_CPUS) {
+			int i;
+			for_each_tracing_cpu(i) {
+				r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[i]->entries,
+						i);
+				if (r < 0)
+					break;
+			}
+		} else {
+			r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+		}
 
-		r = ring_buffer_resize(global_trace.buffer,
-				       global_trace.entries);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
@@ -2898,14 +2920,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 		return ret;
 	}
 
-	max_tr.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&max_tr, size);
+	else
+		max_tr.data[cpu]->entries = size;
+
  out:
-	global_trace.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&global_trace, size);
+	else
+		global_trace.data[cpu]->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size)
+static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
 	int cpu, ret = size;
 
@@ -2921,12 +2950,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
 			atomic_inc(&max_tr.data[cpu]->disabled);
 	}
 
-	if (size != global_trace.entries)
-		ret = __tracing_resize_ring_buffer(size);
+	if (cpu_id != RING_BUFFER_ALL_CPUS) {
+		/* make sure, this cpu is enabled in the mask */
+		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
 
+	ret = __tracing_resize_ring_buffer(size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
+out:
 	for_each_tracing_cpu(cpu) {
 		if (global_trace.data[cpu])
 			atomic_dec(&global_trace.data[cpu]->disabled);
@@ -2957,7 +2993,8 @@ int tracing_update_buffers(void)
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -2981,7 +3018,8 @@ static int tracing_set_tracer(const char *buf)
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3007,8 +3045,8 @@ static int tracing_set_tracer(const char *buf)
 		 * The max_tr ring buffer has some state (e.g. ring->clock) and
 		 * we want preserve it.
 		 */
-		ring_buffer_resize(max_tr.buffer, 1);
-		max_tr.entries = 1;
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
 	}
 	destroy_trace_option_files(topts);
 
@@ -3016,10 +3054,17 @@ static int tracing_set_tracer(const char *buf)
 
 	topts = create_trace_option_files(current_trace);
 	if (current_trace->use_max_tr) {
-		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
-		if (ret < 0)
-			goto out;
-		max_tr.entries = global_trace.entries;
+		int cpu;
+		/* we need to make per cpu buffer sizes equivalent */
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				goto out;
+			max_tr.data[cpu]->entries =
+					global_trace.data[cpu]->entries;
+		}
 	}
 
 	if (t->init) {
@@ -3521,30 +3566,82 @@ out_err:
 	goto out;
 }
 
+struct ftrace_entries_info {
+	struct trace_array	*tr;
+	int			cpu;
+};
+
+static int tracing_entries_open(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info;
+
+	if (tracing_disabled)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->tr = &global_trace;
+	info->cpu = (unsigned long)inode->i_private;
+
+	filp->private_data = info;
+
+	return 0;
+}
+
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct trace_array *tr = filp->private_data;
-	char buf[96];
-	int r;
+	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_array *tr = info->tr;
+	char buf[64];
+	int r = 0;
+	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		r = sprintf(buf, "%lu (expanded: %lu)\n",
-			    tr->entries >> 10,
-			    trace_buf_size >> 10);
-	else
-		r = sprintf(buf, "%lu\n", tr->entries >> 10);
+
+	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+		int cpu, buf_size_same;
+		unsigned long size;
+
+		size = 0;
+		buf_size_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_tracing_cpu(cpu) {
+			/* fill in the size from first enabled cpu */
+			if (size == 0)
+				size = tr->data[cpu]->entries;
+			if (size != tr->data[cpu]->entries) {
+				buf_size_same = 0;
+				break;
+			}
+		}
+
+		if (buf_size_same) {
+			if (!ring_buffer_expanded)
+				r = sprintf(buf, "%lu (expanded: %lu)\n",
+					    size >> 10,
+					    trace_buf_size >> 10);
+			else
+				r = sprintf(buf, "%lu\n", size >> 10);
+		} else
+			r = sprintf(buf, "X\n");
+	} else
+		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+
 	mutex_unlock(&trace_types_lock);
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	return ret;
 }
 
 static ssize_t
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
+	struct ftrace_entries_info *info = filp->private_data;
 	unsigned long val;
 	int ret;
 
@@ -3559,7 +3656,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val);
+	ret = tracing_resize_ring_buffer(val, info->cpu);
 	if (ret < 0)
 		return ret;
 
@@ -3568,6 +3665,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static int
+tracing_entries_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info = filp->private_data;
+
+	kfree(info);
+
+	return 0;
+}
+
 static ssize_t
 tracing_total_entries_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos)
@@ -3579,7 +3686,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
-		size += tr->entries >> 10;
+		size += tr->data[cpu]->entries >> 10;
 		if (!ring_buffer_expanded)
 			expanded_size += trace_buf_size >> 10;
 	}
@@ -3613,7 +3720,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
 		tracing_off();
 	/* resize the ring buffer to 0 */
-	tracing_resize_ring_buffer(0);
+	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
 
 	return 0;
 }
@@ -3757,9 +3864,10 @@ static const struct file_operations tracing_pipe_fops = {
 };
 
 static const struct file_operations tracing_entries_fops = {
-	.open		= tracing_open_generic,
+	.open		= tracing_entries_open,
 	.read		= tracing_entries_read,
 	.write		= tracing_entries_write,
+	.release	= tracing_entries_release,
 	.llseek		= generic_file_llseek,
 };
 
@@ -4211,6 +4319,9 @@ static void tracing_init_debugfs_percpu(long cpu)
 
 	trace_create_file("stats", 0444, d_cpu,
 			(void *) cpu, &tracing_stats_fops);
+
+	trace_create_file("buffer_size_kb", 0444, d_cpu,
+			(void *) cpu, &tracing_entries_fops);
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
@@ -4491,7 +4602,7 @@ static __init int tracer_init_debugfs(void)
 			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
 
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
-			&global_trace, &tracing_entries_fops);
+			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
 
 	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
 			&global_trace, &tracing_total_entries_fops);
@@ -4737,8 +4848,6 @@ __init static int tracer_alloc_buffers(void)
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
-
 
 #ifdef CONFIG_TRACER_MAX_TRACE
 	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
@@ -4748,7 +4857,6 @@ __init static int tracer_alloc_buffers(void)
 		ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = 1;
 #endif
 
 	/* Allocate the first page for all buffers */
@@ -4757,6 +4865,11 @@ __init static int tracer_alloc_buffers(void)
 		max_tr.data[i] = &per_cpu(max_tr_data, i);
 	}
 
+	set_buffer_entries(&global_trace, ring_buf_size);
+#ifdef CONFIG_TRACER_MAX_TRACE
+	set_buffer_entries(&max_tr, 1);
+#endif
+
 	trace_init_cmdlines();
 
 	register_tracer(&nop_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 616846b..126d333 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -125,6 +125,7 @@ struct trace_array_cpu {
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
 
+	unsigned long		entries;
 	unsigned long		saved_latency;
 	unsigned long		critical_start;
 	unsigned long		critical_end;
@@ -146,7 +147,6 @@ struct trace_array_cpu {
  */
 struct trace_array {
 	struct ring_buffer	*buffer;
-	unsigned long		entries;
 	int			cpu;
 	cycle_t			time_start;
 	struct task_struct	*waiter;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v3] trace: Add per_cpu ring buffer control files
  2011-08-22 22:09   ` [PATCH v3] " Vaibhav Nagarnaik
@ 2011-08-23  0:49     ` Steven Rostedt
  2011-08-23  1:16       ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-08-23  0:49 UTC (permalink / raw)
  To: Vaibhav Nagarnaik; +Cc: Michael Rubin, David Sharp, linux-kernel

On Mon, 2011-08-22 at 15:09 -0700, Vaibhav Nagarnaik wrote:
>  
> @@ -3740,10 +3757,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>  	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
>  		goto out;
>  
> -	/* At least make sure the two buffers are somewhat the same */
> -	if (buffer_a->pages != buffer_b->pages)
> -		goto out;
> -
>  	ret = -EAGAIN;
>  
>  	if (ring_buffer_flags != RB_BUFFERS_ON)
> @@ -3758,6 +3771,12 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>  	cpu_buffer_a = buffer_a->buffers[cpu];
>  	cpu_buffer_b = buffer_b->buffers[cpu];

I would move the cpu_buffer_(ab) assignments up instead, as I want the
EINVAL to have dominance. The record_disable just says that we are
temporarily out of commission, when in fact it could be invalid. Thus
the invalid conditions need to be treated first.
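
Concretely, the suggested order of checks would look something like
this (just a sketch, comparing the per-cpu nr_pages count):

	cpu_buffer_a = buffer_a->buffers[cpu];
	cpu_buffer_b = buffer_b->buffers[cpu];

	/* invalid condition first: the buffers must be the same size */
	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
		goto out;	/* ret is still -EINVAL here */

	/* transient "try again" conditions after that */
	ret = -EAGAIN;
	if (ring_buffer_flags != RB_BUFFERS_ON)
		goto out;
	if (atomic_read(&cpu_buffer_a->record_disabled))
		goto out;
	/* ... remaining record_disabled checks unchanged ... */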

-- Steve

>  
> +	ret = -EINVAL;
> +	/* At least make sure the two buffers are somewhat the same */
> +	if (cpu_buffer_a->pages != cpu_buffer_b->pages)
> +		goto out;
> +
> +	ret = -EAGAIN;
>  	if (atomic_read(&cpu_buffer_a->record_disabled))
>  		goto out;
>  



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3] trace: Add per_cpu ring buffer control files
  2011-08-23  0:49     ` Steven Rostedt
@ 2011-08-23  1:16       ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-23  1:16 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Michael Rubin, David Sharp, linux-kernel

On Mon, Aug 22, 2011 at 5:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2011-08-22 at 15:09 -0700, Vaibhav Nagarnaik wrote:
>>
>> @@ -3740,10 +3757,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>>           !cpumask_test_cpu(cpu, buffer_b->cpumask))
>>               goto out;
>>
>> -     /* At least make sure the two buffers are somewhat the same */
>> -     if (buffer_a->pages != buffer_b->pages)
>> -             goto out;
>> -
>>       ret = -EAGAIN;
>>
>>       if (ring_buffer_flags != RB_BUFFERS_ON)
>> @@ -3758,6 +3771,12 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>>       cpu_buffer_a = buffer_a->buffers[cpu];
>>       cpu_buffer_b = buffer_b->buffers[cpu];
>
> I would move the cpu_buffer_(ab) assignments up instead, as I want the
> EINVAL to have dominance. The record_disable just says that we are
> temporarily out of commission, when in fact it could be invalid. Thus
> the invalid conditions need to be treated first.
>
> -- Steve
>

Sure. I am sending an updated patch.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v3] trace: Add per_cpu ring buffer control files
  2011-08-16 21:46 ` [PATCH v2 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
  2011-08-22 20:29   ` Steven Rostedt
  2011-08-22 22:09   ` [PATCH v3] " Vaibhav Nagarnaik
@ 2011-08-23  1:17   ` Vaibhav Nagarnaik
  2011-09-03  2:45     ` Steven Rostedt
                       ` (4 more replies)
  2 siblings, 5 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-23  1:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

Add a debugfs file called buffer_size_kb under each per_cpu/cpu*
directory to control the size of that CPU's ring buffer independently.

If the global buffer_size_kb file is used to set the size, all the
per-cpu ring buffers are adjusted to the given size, and the global
file keeps reporting that common size to maintain backward
compatibility.

If the buffer_size_kb file under a per_cpu/ directory is used to change
the size for a specific CPU, only that CPU's ring buffer is resized.
When tracing/buffer_size_kb is then read, it reports 'X' to indicate
that the per-cpu ring buffer sizes are no longer equivalent.
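
For example, a minimal user-space sketch of the intended usage (not
part of this patch; it assumes debugfs is mounted at /sys/kernel/debug,
must be run as root, and the 2048 value is only illustrative):

#include <stdio.h>
#include <stdlib.h>

#define TRACING "/sys/kernel/debug/tracing"

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(EXIT_FAILURE);
	}
	fputs(val, f);
	fclose(f);
}

static void show(const char *path)
{
	char line[64];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		exit(EXIT_FAILURE);
	}
	if (fgets(line, sizeof(line), f))
		printf("%s: %s", path, line);
	fclose(f);
}

int main(void)
{
	/* resize only cpu0's ring buffer to 2MB */
	write_str(TRACING "/per_cpu/cpu0/buffer_size_kb", "2048");

	/* the per-cpu file reports the new size ... */
	show(TRACING "/per_cpu/cpu0/buffer_size_kb");

	/* ... while the global file reports 'X' once the sizes differ */
	show(TRACING "/buffer_size_kb");

	return 0;
}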

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v3-v2:
* Fix compilation errors when using allyesconfig.

 include/linux/ring_buffer.h |    6 +-
 kernel/trace/ring_buffer.c  |  248 ++++++++++++++++++++++++-------------------
 kernel/trace/trace.c        |  191 ++++++++++++++++++++++++++-------
 kernel/trace/trace.h        |    2 +-
 4 files changed, 298 insertions(+), 149 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 67be037..ad36702 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+#define RING_BUFFER_ALL_CPUS -1
+
 void ring_buffer_free(struct ring_buffer *buffer);
 
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
 
 void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
 
@@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
 void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
 int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
 
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
 
 void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_reset(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index acf6b68..bb0ffdd 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
 	spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
+	unsigned int			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
 	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
+	/* ring buffer pages to update, > 0 to add, < 0 to remove */
+	int				nr_pages_to_update;
+	struct list_head		new_pages; /* new pages to add */
 };
 
 struct ring_buffer {
-	unsigned			pages;
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
@@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
+static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
 {
+	int i;
 	struct buffer_page *bpage, *tmp;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL | __GFP_NORETRY,
-				    cpu_to_node(cpu_buffer->cpu));
+				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
 
-		rb_check_bpage(cpu_buffer, bpage);
-
-		list_add(&bpage->list, &pages);
+		list_add(&bpage->list, pages);
 
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
 		if (!page)
 			goto free_pages;
@@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_init_page(bpage->page);
 	}
 
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+
+	return -ENOMEM;
+}
+
+static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	LIST_HEAD(pages);
+
+	WARN_ON(!nr_pages);
+
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+		return -ENOMEM;
+
 	/*
 	 * The ring buffer page list is a circular list that does not
 	 * start and end with a list head. All page list items point to
@@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);
 
+	cpu_buffer->nr_pages = nr_pages;
+
 	rb_check_pages(cpu_buffer);
 
 	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
 }
 
 static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
+rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
@@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	ret = rb_allocate_pages(cpu_buffer, nr_pages);
 	if (ret < 0)
 		goto fail_free_reader;
 
@@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 {
 	struct ring_buffer *buffer;
 	int bsize;
-	int cpu;
+	int cpu, nr_pages;
 
 	/* keep it in its own cache line */
 	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
 
 	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
+	if (nr_pages < 2)
+		nr_pages = 2;
 
 	/*
 	 * In case of non-hotplug cpu, if the ring-buffer is allocated
@@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 
 	for_each_buffer_cpu(buffer, cpu) {
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu])
 			goto fail_free_buffers;
 	}
@@ -1308,6 +1321,17 @@ out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
+static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (cpu_buffer->nr_pages_to_update > 0)
+		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
+				cpu_buffer->nr_pages_to_update);
+	else
+		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+	/* reset this value */
+	cpu_buffer->nr_pages_to_update = 0;
+}
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -1317,14 +1341,12 @@ out:
  *
  * Returns -1 on failure.
  */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
+			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	LIST_HEAD(pages);
-	int i, cpu;
+	unsigned nr_pages;
+	int cpu;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1334,15 +1356,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
 
 	/* we need a minimum of two pages */
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	if (size == buffer_size)
-		return size;
-
 	atomic_inc(&buffer->record_disabled);
 
 	/* Make sure all writers are done with this buffer. */
@@ -1353,68 +1371,59 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	if (size < buffer_size) {
+	if (cpu_id == RING_BUFFER_ALL_CPUS) {
+		/* calculate the pages to update */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+
+			cpu_buffer->nr_pages_to_update = nr_pages -
+							cpu_buffer->nr_pages;
 
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
+			/*
+			 * nothing more to do for removing pages or no update
+			 */
+			if (cpu_buffer->nr_pages_to_update <= 0)
+				continue;
 
-		rm_pages = buffer->pages - nr_pages;
+			/*
+			 * to add pages, make sure all new pages can be
+			 * allocated without receiving ENOMEM
+			 */
+			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu))
+				/* not enough memory for new pages */
+				goto no_mem;
+		}
 
+		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
+			if (cpu_buffer->nr_pages_to_update) {
+				update_pages_handler(cpu_buffer);
+				cpu_buffer->nr_pages = nr_pages;
+			}
 		}
-		goto out;
-	}
+	} else {
+		cpu_buffer = buffer->buffers[cpu_id];
+		if (nr_pages == cpu_buffer->nr_pages)
+			goto out;
 
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
+		cpu_buffer->nr_pages_to_update = nr_pages -
+						cpu_buffer->nr_pages;
 
-	new_pages = nr_pages - buffer->pages;
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (cpu_buffer->nr_pages_to_update > 0 &&
+			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu_id))
+			goto no_mem;
 
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			struct page *page;
-			/*
-			 * __GFP_NORETRY flag makes sure that the allocation
-			 * fails gracefully without invoking oom-killer and
-			 * the system is not destabilized.
-			 */
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL | __GFP_NORETRY,
-					    cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			page = alloc_pages_node(cpu_to_node(cpu),
-						GFP_KERNEL | __GFP_NORETRY, 0);
-			if (!page)
-				goto free_pages;
-			bpage->page = page_address(page);
-			rb_init_page(bpage->page);
-		}
-	}
+		update_pages_handler(cpu_buffer);
 
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
+		cpu_buffer->nr_pages = nr_pages;
 	}
 
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
-
  out:
-	buffer->pages = nr_pages;
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 
@@ -1422,25 +1431,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	return size;
 
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+ no_mem:
+	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_page *bpage, *tmp;
+		cpu_buffer = buffer->buffers[cpu];
+		/* reset this number regardless */
+		cpu_buffer->nr_pages_to_update = 0;
+		if (list_empty(&cpu_buffer->new_pages))
+			continue;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1542,7 +1550,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 	 * assign the commit to the tail.
 	 */
  again:
-	max_count = cpu_buffer->buffer->pages * 100;
+	max_count = cpu_buffer->nr_pages * 100;
 
 	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
 		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
@@ -3563,9 +3571,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
  * ring_buffer_size - return the size of the ring buffer (in bytes)
  * @buffer: The ring buffer.
  */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
 {
-	return BUF_PAGE_SIZE * buffer->pages;
+	/*
+	 * Earlier, this method returned
+	 *	BUF_PAGE_SIZE * buffer->pages
+	 * Since the pages field is now removed, we have converted this to
+	 * return the per cpu buffer value.
+	 */
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
@@ -3740,8 +3757,11 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
 		goto out;
 
+	cpu_buffer_a = buffer_a->buffers[cpu];
+	cpu_buffer_b = buffer_b->buffers[cpu];
+
 	/* At least make sure the two buffers are somewhat the same */
-	if (buffer_a->pages != buffer_b->pages)
+	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
 		goto out;
 
 	ret = -EAGAIN;
@@ -3755,9 +3775,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	if (atomic_read(&buffer_b->record_disabled))
 		goto out;
 
-	cpu_buffer_a = buffer_a->buffers[cpu];
-	cpu_buffer_b = buffer_b->buffers[cpu];
-
 	if (atomic_read(&cpu_buffer_a->record_disabled))
 		goto out;
 
@@ -4108,6 +4125,8 @@ static int rb_cpu_notify(struct notifier_block *self,
 	struct ring_buffer *buffer =
 		container_of(self, struct ring_buffer, cpu_notify);
 	long cpu = (long)hcpu;
+	int cpu_i, nr_pages_same;
+	unsigned int nr_pages;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -4115,8 +4134,23 @@ static int rb_cpu_notify(struct notifier_block *self,
 		if (cpumask_test_cpu(cpu, buffer->cpumask))
 			return NOTIFY_OK;
 
+		nr_pages = 0;
+		nr_pages_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_buffer_cpu(buffer, cpu_i) {
+			/* fill in the size from first enabled cpu */
+			if (nr_pages == 0)
+				nr_pages = buffer->buffers[cpu_i]->nr_pages;
+			if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
+				nr_pages_same = 0;
+				break;
+			}
+		}
+		/* allocate minimum pages, user can later expand it */
+		if (!nr_pages_same)
+			nr_pages = 2;
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu]) {
 			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
 			     cpu);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b419070..bb3c867 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -784,7 +784,8 @@ __acquires(kernel_lock)
 
 		/* If we expanded the buffers, make sure the max is expanded too */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, trace_buf_size);
+			ring_buffer_resize(max_tr.buffer, trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 
 		/* the test is responsible for initializing and enabling */
 		pr_info("Testing tracer %s: ", type->name);
@@ -800,7 +801,8 @@ __acquires(kernel_lock)
 
 		/* Shrink the max buffer again */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, 1);
+			ring_buffer_resize(max_tr.buffer, 1,
+						RING_BUFFER_ALL_CPUS);
 
 		printk(KERN_CONT "PASSED\n");
 	}
@@ -2853,7 +2855,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
 	return t->init(tr);
 }
 
-static int __tracing_resize_ring_buffer(unsigned long size)
+static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+{
+	int cpu;
+	for_each_tracing_cpu(cpu)
+		tr->data[cpu]->entries = val;
+}
+
+static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 {
 	int ret;
 
@@ -2864,19 +2873,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 	 */
 	ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
 	if (!current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
 	if (ret < 0) {
-		int r;
+		int r = 0;
+
+		if (cpu == RING_BUFFER_ALL_CPUS) {
+			int i;
+			for_each_tracing_cpu(i) {
+				r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[i]->entries,
+						i);
+				if (r < 0)
+					break;
+			}
+		} else {
+			r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+		}
 
-		r = ring_buffer_resize(global_trace.buffer,
-				       global_trace.entries);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
@@ -2898,14 +2920,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 		return ret;
 	}
 
-	max_tr.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&max_tr, size);
+	else
+		max_tr.data[cpu]->entries = size;
+
  out:
-	global_trace.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&global_trace, size);
+	else
+		global_trace.data[cpu]->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size)
+static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
 	int cpu, ret = size;
 
@@ -2921,12 +2950,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
 			atomic_inc(&max_tr.data[cpu]->disabled);
 	}
 
-	if (size != global_trace.entries)
-		ret = __tracing_resize_ring_buffer(size);
+	if (cpu_id != RING_BUFFER_ALL_CPUS) {
+		/* make sure, this cpu is enabled in the mask */
+		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
 
+	ret = __tracing_resize_ring_buffer(size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
+out:
 	for_each_tracing_cpu(cpu) {
 		if (global_trace.data[cpu])
 			atomic_dec(&global_trace.data[cpu]->disabled);
@@ -2957,7 +2993,8 @@ int tracing_update_buffers(void)
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -2981,7 +3018,8 @@ static int tracing_set_tracer(const char *buf)
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3007,8 +3045,8 @@ static int tracing_set_tracer(const char *buf)
 		 * The max_tr ring buffer has some state (e.g. ring->clock) and
 		 * we want preserve it.
 		 */
-		ring_buffer_resize(max_tr.buffer, 1);
-		max_tr.entries = 1;
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
 	}
 	destroy_trace_option_files(topts);
 
@@ -3016,10 +3054,17 @@ static int tracing_set_tracer(const char *buf)
 
 	topts = create_trace_option_files(current_trace);
 	if (current_trace->use_max_tr) {
-		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
-		if (ret < 0)
-			goto out;
-		max_tr.entries = global_trace.entries;
+		int cpu;
+		/* we need to make per cpu buffer sizes equivalent */
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				goto out;
+			max_tr.data[cpu]->entries =
+					global_trace.data[cpu]->entries;
+		}
 	}
 
 	if (t->init) {
@@ -3521,30 +3566,82 @@ out_err:
 	goto out;
 }
 
+struct ftrace_entries_info {
+	struct trace_array	*tr;
+	int			cpu;
+};
+
+static int tracing_entries_open(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info;
+
+	if (tracing_disabled)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->tr = &global_trace;
+	info->cpu = (unsigned long)inode->i_private;
+
+	filp->private_data = info;
+
+	return 0;
+}
+
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct trace_array *tr = filp->private_data;
-	char buf[96];
-	int r;
+	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_array *tr = info->tr;
+	char buf[64];
+	int r = 0;
+	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		r = sprintf(buf, "%lu (expanded: %lu)\n",
-			    tr->entries >> 10,
-			    trace_buf_size >> 10);
-	else
-		r = sprintf(buf, "%lu\n", tr->entries >> 10);
+
+	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+		int cpu, buf_size_same;
+		unsigned long size;
+
+		size = 0;
+		buf_size_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_tracing_cpu(cpu) {
+			/* fill in the size from first enabled cpu */
+			if (size == 0)
+				size = tr->data[cpu]->entries;
+			if (size != tr->data[cpu]->entries) {
+				buf_size_same = 0;
+				break;
+			}
+		}
+
+		if (buf_size_same) {
+			if (!ring_buffer_expanded)
+				r = sprintf(buf, "%lu (expanded: %lu)\n",
+					    size >> 10,
+					    trace_buf_size >> 10);
+			else
+				r = sprintf(buf, "%lu\n", size >> 10);
+		} else
+			r = sprintf(buf, "X\n");
+	} else
+		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+
 	mutex_unlock(&trace_types_lock);
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	return ret;
 }
 
 static ssize_t
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
+	struct ftrace_entries_info *info = filp->private_data;
 	unsigned long val;
 	int ret;
 
@@ -3559,7 +3656,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val);
+	ret = tracing_resize_ring_buffer(val, info->cpu);
 	if (ret < 0)
 		return ret;
 
@@ -3568,6 +3665,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static int
+tracing_entries_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info = filp->private_data;
+
+	kfree(info);
+
+	return 0;
+}
+
 static ssize_t
 tracing_total_entries_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos)
@@ -3579,7 +3686,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
-		size += tr->entries >> 10;
+		size += tr->data[cpu]->entries >> 10;
 		if (!ring_buffer_expanded)
 			expanded_size += trace_buf_size >> 10;
 	}
@@ -3613,7 +3720,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
 		tracing_off();
 	/* resize the ring buffer to 0 */
-	tracing_resize_ring_buffer(0);
+	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
 
 	return 0;
 }
@@ -3757,9 +3864,10 @@ static const struct file_operations tracing_pipe_fops = {
 };
 
 static const struct file_operations tracing_entries_fops = {
-	.open		= tracing_open_generic,
+	.open		= tracing_entries_open,
 	.read		= tracing_entries_read,
 	.write		= tracing_entries_write,
+	.release	= tracing_entries_release,
 	.llseek		= generic_file_llseek,
 };
 
@@ -4211,6 +4319,9 @@ static void tracing_init_debugfs_percpu(long cpu)
 
 	trace_create_file("stats", 0444, d_cpu,
 			(void *) cpu, &tracing_stats_fops);
+
+	trace_create_file("buffer_size_kb", 0444, d_cpu,
+			(void *) cpu, &tracing_entries_fops);
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
@@ -4491,7 +4602,7 @@ static __init int tracer_init_debugfs(void)
 			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
 
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
-			&global_trace, &tracing_entries_fops);
+			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
 
 	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
 			&global_trace, &tracing_total_entries_fops);
@@ -4737,8 +4848,6 @@ __init static int tracer_alloc_buffers(void)
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
-
 
 #ifdef CONFIG_TRACER_MAX_TRACE
 	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
@@ -4748,7 +4857,6 @@ __init static int tracer_alloc_buffers(void)
 		ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = 1;
 #endif
 
 	/* Allocate the first page for all buffers */
@@ -4757,6 +4865,11 @@ __init static int tracer_alloc_buffers(void)
 		max_tr.data[i] = &per_cpu(max_tr_data, i);
 	}
 
+	set_buffer_entries(&global_trace, ring_buf_size);
+#ifdef CONFIG_TRACER_MAX_TRACE
+	set_buffer_entries(&max_tr, 1);
+#endif
+
 	trace_init_cmdlines();
 
 	register_tracer(&nop_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 616846b..126d333 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -125,6 +125,7 @@ struct trace_array_cpu {
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
 
+	unsigned long		entries;
 	unsigned long		saved_latency;
 	unsigned long		critical_start;
 	unsigned long		critical_end;
@@ -146,7 +147,6 @@ struct trace_array_cpu {
  */
 struct trace_array {
 	struct ring_buffer	*buffer;
-	unsigned long		entries;
 	int			cpu;
 	cycle_t			time_start;
 	struct task_struct	*waiter;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic
  2011-08-16 21:46 ` [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
@ 2011-08-23  3:27   ` Steven Rostedt
  2011-08-23 18:55     ` Vaibhav Nagarnaik
  2011-08-23 18:55   ` [PATCH v3 " Vaibhav Nagarnaik
  2011-08-23 18:55   ` [PATCH v3 5/5] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
  2 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-08-23  3:27 UTC (permalink / raw)
  To: Vaibhav Nagarnaik; +Cc: Michael Rubin, David Sharp, linux-kernel

On Tue, 2011-08-16 at 14:46 -0700, Vaibhav Nagarnaik wrote:
> This patch adds the capability to remove pages from a ring buffer
> without destroying any existing data in it.
> 
> This is done by removing the pages after the tail page. This makes sure
> that first all the empty pages in the ring buffer are removed. If the
> head page is one in the list of pages to be removed, then the page after
> the removed ones is made the head page. This removes the oldest data
> from the ring buffer and keeps the latest data around to be read.
> 
> To do this in a non-racey manner, tracing is stopped for a very short
> time while the pages to be removed are identified and unlinked from the
> ring buffer. The pages are freed after the tracing is restarted to
> minimize the time needed to stop tracing.
> 
> The removal of pages from a per-cpu ring buffer runs on the respective
> CPU. This limits the events that go untraced during the update to those
> coming from NMI context.

Could you do the same with this patch, as this one fails to build as
well. And probably should check patch 5 while you're at it.

-- Steve

> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---
> Changelog v2-v1:
> * The earlier patch removed pages after the tail page by using cmpxchg()
>   operations, which were identified as racey by Steven Rostedt. Now, the
>   logic is changed to stop tracing till all the pages are identified and
>   unlinked, to remove the race with the writer.
> 
>  kernel/trace/ring_buffer.c |  207 +++++++++++++++++++++++++++++++++-----------
>  kernel/trace/trace.c       |   20 +----
>  2 files changed, 156 insertions(+), 71 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index a627680..1c86065 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -23,6 +23,8 @@
>  #include <asm/local.h>
>  #include "trace.h"
>  
> +static void update_pages_handler(struct work_struct *work);
> +
>  /*
>   * The ring buffer header is special. We must manually up keep it.
>   */
> @@ -502,6 +504,8 @@ struct ring_buffer_per_cpu {
>  	/* ring buffer pages to update, > 0 to add, < 0 to remove */
>  	int				nr_pages_to_update;
>  	struct list_head		new_pages; /* new pages to add */
> +	struct work_struct		update_pages_work;
> +	struct completion		update_completion;
>  };
>  
>  struct ring_buffer {
> @@ -1080,6 +1084,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>  	spin_lock_init(&cpu_buffer->reader_lock);
>  	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
>  	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
> +	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
> +	init_completion(&cpu_buffer->update_completion);
>  
>  	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>  			    GFP_KERNEL, cpu_to_node(cpu));
> @@ -1267,32 +1273,107 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
>  
>  static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
>  
> +static inline unsigned long rb_page_entries(struct buffer_page *bpage)
> +{
> +	return local_read(&bpage->entries) & RB_WRITE_MASK;
> +}
> +
> +static inline unsigned long rb_page_write(struct buffer_page *bpage)
> +{
> +	return local_read(&bpage->write) & RB_WRITE_MASK;
> +}
> +
>  static void
> -rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
> +rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
>  {
> -	struct buffer_page *bpage;
> -	struct list_head *p;
> -	unsigned i;
> +	unsigned int nr_removed;
> +	int page_entries;
> +	struct list_head *tail_page, *to_remove, *next_page;
> +	unsigned long head_bit;
> +	struct buffer_page *last_page, *first_page;
> +	struct buffer_page *to_remove_page, *tmp_iter_page;
>  
> +	head_bit = 0;
>  	spin_lock_irq(&cpu_buffer->reader_lock);
> -	rb_head_page_deactivate(cpu_buffer);
> -
> -	for (i = 0; i < nr_pages; i++) {
> -		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
> -			goto out;
> -		p = cpu_buffer->pages->next;
> -		bpage = list_entry(p, struct buffer_page, list);
> -		list_del_init(&bpage->list);
> -		free_buffer_page(bpage);
> +	atomic_inc(&cpu_buffer->record_disabled);
> +	/*
> +	 * We don't race with the readers since we have acquired the reader
> +	 * lock. We also don't race with writers after disabling recording.
> +	 * This makes it easy to figure out the first and the last page to be
> +	 * removed from the list. We remove all the pages in between including
> +	 * the first and last pages. This is done in a busy loop so that we
> +	 * lose the least number of traces.
> +	 * The pages are freed after we restart recording and unlock readers.
> +	 */
> +	tail_page = &cpu_buffer->tail_page->list;
> +	/*
> +	 * tail page might be on reader page, we remove the next page
> +	 * from the ring buffer
> +	 */
> +	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
> +		tail_page = rb_list_head(tail_page->next);
> +	to_remove = tail_page;
> +
> +	/* start of pages to remove */
> +	first_page = list_entry(rb_list_head(to_remove->next),
> +				struct buffer_page, list);
> +	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
> +		to_remove = rb_list_head(to_remove)->next;
> +		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
>  	}
> -	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
> -		goto out;
> -
> -	rb_reset_cpu(cpu_buffer);
> -	rb_check_pages(cpu_buffer);
>  
> -out:
> +	next_page = rb_list_head(to_remove)->next;
> +	/* now we remove all pages between tail_page and next_page */
> +	tail_page->next = (struct list_head *)((unsigned long)next_page |
> +						head_bit);
> +	next_page = rb_list_head(next_page);
> +	next_page->prev = tail_page;
> +	/* make sure pages points to a valid page in the ring buffer */
> +	cpu_buffer->pages = next_page;
> +	/* update head page */
> +	if (head_bit)
> +		cpu_buffer->head_page = list_entry(next_page,
> +						struct buffer_page, list);
> +	/*
> +	 * change read pointer to make sure any read iterators reset
> +	 * themselves
> +	 */
> +	cpu_buffer->read = 0;
> +	/* pages are removed, resume tracing and then free the pages */
> +	atomic_dec(&cpu_buffer->record_disabled);
>  	spin_unlock_irq(&cpu_buffer->reader_lock);
> +
> +	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
> +
> +	/* last buffer page to remove */
> +	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
> +				list);
> +	tmp_iter_page = first_page;
> +	do {
> +		to_remove_page = tmp_iter_page;
> +		rb_inc_page(cpu_buffer, &tmp_iter_page);
> +		/* update the counters */
> +		page_entries = rb_page_entries(to_remove_page);
> +		if (page_entries) {
> +			/*
> +			 * If something was added to this page, it was full
> +			 * since it is not the tail page. So we deduct the
> +			 * bytes consumed in ring buffer from here.
> +			 * No need to update overruns, since this page is
> +			 * deleted from ring buffer and its entries are
> +			 * already accounted for.
> +			 */
> +			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
> +		}
> +		/*
> +		 * We have already removed references to this list item, just
> +		 * free up the buffer_page and its page
> +		 */
> +		nr_removed--;
> +		free_buffer_page(to_remove_page);
> +	} while (to_remove_page != last_page);
> +
> +	RB_WARN_ON(cpu_buffer, nr_removed);
>  }
>  
>  static void
> @@ -1303,6 +1384,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
>  	struct list_head *p;
>  	unsigned i;
>  
> +	/* stop the writers while inserting pages */
> +	atomic_inc(&cpu_buffer->record_disabled);
> +
> +	/* Make sure all writers are done with this buffer. */
> +	synchronize_sched();
> +
>  	spin_lock_irq(&cpu_buffer->reader_lock);
>  	rb_head_page_deactivate(cpu_buffer);
>  
> @@ -1319,17 +1406,21 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
>  
>  out:
>  	spin_unlock_irq(&cpu_buffer->reader_lock);
> +	atomic_dec(&cpu_buffer->record_disabled);
>  }
>  
> -static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
> +static void update_pages_handler(struct work_struct *work)
>  {
> +	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
> +			struct ring_buffer_per_cpu, update_pages_work);
> +
>  	if (cpu_buffer->nr_pages_to_update > 0)
>  		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
>  				cpu_buffer->nr_pages_to_update);
>  	else
>  		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
> -	/* reset this value */
> -	cpu_buffer->nr_pages_to_update = 0;
> +
> +	complete(&cpu_buffer->update_completion);
>  }
>  
>  /**
> @@ -1339,7 +1430,7 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
>   *
>   * Minimum size is 2 * BUF_PAGE_SIZE.
>   *
> - * Returns -1 on failure.
> + * Returns 0 on success and < 0 on failure.
>   */
>  int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>  			int cpu_id)
> @@ -1361,21 +1452,28 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>  	if (size < BUF_PAGE_SIZE * 2)
>  		size = BUF_PAGE_SIZE * 2;
>  
> -	atomic_inc(&buffer->record_disabled);
> -
> -	/* Make sure all writers are done with this buffer. */
> -	synchronize_sched();
> +	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>  
> +	/*
> +	 * Don't succeed if recording is disabled globally, as a reader might
> +	 * be manipulating the ring buffer and is expecting a sane state while
> +	 * this is true.
> +	 */
> +	if (atomic_read(&buffer->record_disabled))
> +		return -EBUSY;
> +	/* prevent another thread from changing buffer sizes */
>  	mutex_lock(&buffer->mutex);
> -	get_online_cpus();
> -
> -	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>  
>  	if (cpu_id == RING_BUFFER_ALL_CPUS) {
>  		/* calculate the pages to update */
>  		for_each_buffer_cpu(buffer, cpu) {
>  			cpu_buffer = buffer->buffers[cpu];
>  
> +			if (atomic_read(&cpu_buffer->record_disabled)) {
> +				err = -EBUSY;
> +				goto out_err;
> +			}
> +
>  			cpu_buffer->nr_pages_to_update = nr_pages -
>  							cpu_buffer->nr_pages;
>  
> @@ -1396,16 +1494,31 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>  				goto no_mem;
>  		}
>  
> +		/* fire off all the required work handlers */
> +		for_each_buffer_cpu(buffer, cpu) {
> +			cpu_buffer = buffer->buffers[cpu];
> +			if (!cpu_buffer->nr_pages_to_update)
> +				continue;
> +			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
> +		}
> +
>  		/* wait for all the updates to complete */
>  		for_each_buffer_cpu(buffer, cpu) {
>  			cpu_buffer = buffer->buffers[cpu];
> -			if (cpu_buffer->nr_pages_to_update) {
> -				update_pages_handler(cpu_buffer);
> -				cpu_buffer->nr_pages = nr_pages;
> -			}
> +			if (!cpu_buffer->nr_pages_to_update)
> +				continue;
> +			wait_for_completion(&cpu_buffer->update_completion);
> +			cpu_buffer->nr_pages = nr_pages;
> +			/* reset this value */
> +			cpu_buffer->nr_pages_to_update = 0;
>  		}
>  	} else {
>  		cpu_buffer = buffer->buffers[cpu_id];
> +		if (atomic_read(&cpu_buffer->record_disabled)) {
> +			err = -EBUSY;
> +			goto out_err;
> +		}
> +
>  		if (nr_pages == cpu_buffer->nr_pages)
>  			goto out;
>  
> @@ -1418,36 +1531,36 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>  						&cpu_buffer->new_pages, cpu_id))
>  			goto no_mem;
>  
> -		update_pages_handler(cpu_buffer);
> +		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
> +		wait_for_completion(&cpu_buffer->update_completion);
>  
>  		cpu_buffer->nr_pages = nr_pages;
> +		/* reset this value */
> +		cpu_buffer->nr_pages_to_update = 0;
>  	}
>  
>   out:
> -	put_online_cpus();
>  	mutex_unlock(&buffer->mutex);
> -
> -	atomic_dec(&buffer->record_disabled);
> -
>  	return size;
>  
>   no_mem:
>  	for_each_buffer_cpu(buffer, cpu) {
>  		struct buffer_page *bpage, *tmp;
> +
>  		cpu_buffer = buffer->buffers[cpu];
>  		/* reset this number regardless */
>  		cpu_buffer->nr_pages_to_update = 0;
> +
>  		if (list_empty(&cpu_buffer->new_pages))
>  			continue;
> +
>  		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
>  					list) {
>  			list_del_init(&bpage->list);
>  			free_buffer_page(bpage);
>  		}
>  	}
> -	put_online_cpus();
>  	mutex_unlock(&buffer->mutex);
> -	atomic_dec(&buffer->record_disabled);
>  	return -ENOMEM;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_resize);
> @@ -1487,21 +1600,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
>  	return __rb_page_index(iter->head_page, iter->head);
>  }
>  
> -static inline unsigned long rb_page_write(struct buffer_page *bpage)
> -{
> -	return local_read(&bpage->write) & RB_WRITE_MASK;
> -}
> -
>  static inline unsigned rb_page_commit(struct buffer_page *bpage)
>  {
>  	return local_read(&bpage->page->commit);
>  }
>  
> -static inline unsigned long rb_page_entries(struct buffer_page *bpage)
> -{
> -	return local_read(&bpage->entries) & RB_WRITE_MASK;
> -}
> -
>  /* Size is determined by what has been committed */
>  static inline unsigned rb_page_size(struct buffer_page *bpage)
>  {
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 305832a..908cecc 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2934,20 +2934,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
>  
>  static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
>  {
> -	int cpu, ret = size;
> +	int ret = size;
>  
>  	mutex_lock(&trace_types_lock);
>  
> -	tracing_stop();
> -
> -	/* disable all cpu buffers */
> -	for_each_tracing_cpu(cpu) {
> -		if (global_trace.data[cpu])
> -			atomic_inc(&global_trace.data[cpu]->disabled);
> -		if (max_tr.data[cpu])
> -			atomic_inc(&max_tr.data[cpu]->disabled);
> -	}
> -
>  	if (cpu_id != RING_BUFFER_ALL_CPUS) {
>  		/* make sure, this cpu is enabled in the mask */
>  		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
> @@ -2961,14 +2951,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
>  		ret = -ENOMEM;
>  
>  out:
> -	for_each_tracing_cpu(cpu) {
> -		if (global_trace.data[cpu])
> -			atomic_dec(&global_trace.data[cpu]->disabled);
> -		if (max_tr.data[cpu])
> -			atomic_dec(&max_tr.data[cpu]->disabled);
> -	}
> -
> -	tracing_start();
>  	mutex_unlock(&trace_types_lock);
>  
>  	return ret;



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic
  2011-08-23  3:27   ` Steven Rostedt
@ 2011-08-23 18:55     ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-23 18:55 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Michael Rubin, David Sharp, linux-kernel

On Mon, Aug 22, 2011 at 8:27 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2011-08-16 at 14:46 -0700, Vaibhav Nagarnaik wrote:
>> This patch adds the capability to remove pages from a ring buffer
>> without destroying any existing data in it.
>>
>> This is done by removing the pages after the tail page. This makes sure
>> that first all the empty pages in the ring buffer are removed. If the
>> head page is one in the list of pages to be removed, then the page after
>> the removed ones is made the head page. This removes the oldest data
>> from the ring buffer and keeps the latest data around to be read.
>>
>> To do this in a non-racey manner, tracing is stopped for a very short
>> time while the pages to be removed are identified and unlinked from the
>> ring buffer. The pages are freed after the tracing is restarted to
>> minimize the time needed to stop tracing.
>>
>> The context in which the pages from the per-cpu ring buffer are removed
>> runs on the respective CPU. This minimizes the events not traced to only
>> NMI trace contexts.
>
> Could you do the same with this patch, as this one fails to build as
> well. And you should probably check patch 5 while you're at it.
>
> -- Steve
>

I have corrected both patches. I will also add 'allyesconfig' and
'allnoconfig' build targets to my workflow to avoid such issues in the
future.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v3 4/5] trace: Make removal of ring buffer pages atomic
  2011-08-16 21:46 ` [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
  2011-08-23  3:27   ` Steven Rostedt
@ 2011-08-23 18:55   ` Vaibhav Nagarnaik
  2011-08-23 19:16     ` David Sharp
  2011-08-23 18:55   ` [PATCH v3 5/5] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
  2 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-23 18:55 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Frederic Weisbecker
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing pages after the tail page, which ensures that
all the empty pages in the ring buffer are removed first. If the head
page is among the pages to be removed, the page after the removed ones
becomes the new head page. This removes the oldest data from the ring
buffer while keeping the latest data around to be read.

To do this in a race-free manner, tracing is stopped for a very short
time while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed only after tracing is restarted, to
minimize the time for which tracing has to be stopped.

The context in which the pages from the per-cpu ring buffer are removed
runs on the respective CPU. This minimizes the events not traced to only
NMI trace contexts.
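
For reference, the per-CPU hand-off boils down to the following
(condensed from the resize path in the diff below; the helper name is
made up for illustration):

static void rb_resize_one_cpu_sketch(struct ring_buffer *buffer, int cpu,
				     int nr_pages_to_update)
{
	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];

	/* tell the per-cpu worker how many pages to add (>0) or remove (<0) */
	cpu_buffer->nr_pages_to_update = nr_pages_to_update;

	/* queue the update on the target CPU and wait for it to finish */
	schedule_work_on(cpu, &cpu_buffer->update_pages_work);
	wait_for_completion(&cpu_buffer->update_completion);

	cpu_buffer->nr_pages_to_update = 0;
}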

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v3-v2:
* Fix compile errors

 kernel/trace/ring_buffer.c |  225 ++++++++++++++++++++++++++++++++------------
 kernel/trace/trace.c       |   20 +----
 2 files changed, 167 insertions(+), 78 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index bb0ffdd..f10e439 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -502,6 +504,8 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
@@ -1080,6 +1084,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1267,32 +1273,107 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	unsigned int nr_removed;
+	int page_entries;
+	struct list_head *tail_page, *to_remove, *next_page;
+	unsigned long head_bit;
+	struct buffer_page *last_page, *first_page;
+	struct buffer_page *to_remove_page, *tmp_iter_page;
 
+	head_bit = 0;
 	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
-
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+	atomic_inc(&cpu_buffer->record_disabled);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock. We also don't race with writers after disabling recording.
+	 * This makes it easy to figure out the first and the last page to be
+	 * removed from the list. We remove all the pages in between including
+	 * the first and last pages. This is done in a busy loop so that we
+	 * lose the least number of traces.
+	 * The pages are freed after we restart recording and unlock readers.
+	 */
+	tail_page = &cpu_buffer->tail_page->list;
+	/*
+	 * tail page might be on reader page, we remove the next page
+	 * from the ring buffer
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		tail_page = rb_list_head(tail_page->next);
+	to_remove = tail_page;
+
+	/* start of pages to remove */
+	first_page = list_entry(rb_list_head(to_remove->next),
+				struct buffer_page, list);
+	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
+		to_remove = rb_list_head(to_remove)->next;
+		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
 	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
 
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
-
-out:
+	next_page = rb_list_head(to_remove)->next;
+	/* now we remove all pages between tail_page and next_page */
+	tail_page->next = (struct list_head *)((unsigned long)next_page |
+						head_bit);
+	next_page = rb_list_head(next_page);
+	next_page->prev = tail_page;
+	/* make sure pages points to a valid page in the ring buffer */
+	cpu_buffer->pages = next_page;
+	/* update head page */
+	if (head_bit)
+		cpu_buffer->head_page = list_entry(next_page,
+						struct buffer_page, list);
+	/*
+	 * change read pointer to make sure any read iterators reset
+	 * themselves
+	 */
+	cpu_buffer->read = 0;
+	/* pages are removed, resume tracing and then free the pages */
+	atomic_dec(&cpu_buffer->record_disabled);
 	spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
+
+	/* last buffer page to remove */
+	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
+				list);
+	tmp_iter_page = first_page;
+	do {
+		to_remove_page = tmp_iter_page;
+		rb_inc_page(cpu_buffer, &tmp_iter_page);
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		nr_removed--;
+		free_buffer_page(to_remove_page);
+	} while (to_remove_page != last_page);
+
+	RB_WARN_ON(cpu_buffer, nr_removed);
 }
 
 static void
@@ -1303,6 +1384,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	struct list_head *p;
 	unsigned i;
 
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	/* Make sure all writers are done with this buffer. */
+	synchronize_sched();
+
 	spin_lock_irq(&cpu_buffer->reader_lock);
 	rb_head_page_deactivate(cpu_buffer);
 
@@ -1319,17 +1406,21 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
+	atomic_dec(&cpu_buffer->record_disabled);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void update_pages_handler(struct work_struct *work)
 {
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1339,14 +1430,14 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
  *
  * Minimum size is 2 * BUF_PAGE_SIZE.
  *
- * Returns -1 on failure.
+ * Returns 0 on success and < 0 on failure.
  */
 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1361,21 +1452,28 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
-
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
+	/*
+	 * Don't succeed if recording is disabled globally, as a reader might
+	 * be manipulating the ring buffer and is expecting a sane state while
+	 * this is true.
+	 */
+	if (atomic_read(&buffer->record_disabled))
+		return -EBUSY;
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
+			if (atomic_read(&cpu_buffer->record_disabled)) {
+				err = -EBUSY;
+				goto out_err;
+			}
+
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 
@@ -1391,21 +1489,38 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
+		}
+
+		/* fire off all the required work handlers */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
 		}
 
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-				cpu_buffer->nr_pages = nr_pages;
-			}
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			wait_for_completion(&cpu_buffer->update_completion);
+			cpu_buffer->nr_pages = nr_pages;
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+		if (atomic_read(&cpu_buffer->record_disabled)) {
+			err = -EBUSY;
+			goto out_err;
+		}
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1415,40 +1530,42 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					&cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
 
-		update_pages_handler(cpu_buffer);
+		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
+		wait_for_completion(&cpu_buffer->update_completion);
 
 		cpu_buffer->nr_pages = nr_pages;
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
 		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1487,21 +1604,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bb3c867..736518f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2936,20 +2936,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -2963,14 +2953,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v3 5/5] trace: Make addition of pages in ring buffer atomic
  2011-08-16 21:46 ` [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
  2011-08-23  3:27   ` Steven Rostedt
  2011-08-23 18:55   ` [PATCH v3 " Vaibhav Nagarnaik
@ 2011-08-23 18:55   ` Vaibhav Nagarnaik
  2 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-23 18:55 UTC (permalink / raw)
  To: Steven Rostedt, Ingo Molnar, Frederic Weisbecker
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.
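
Condensed, the splice looks like this (taken from the hunk below with
the retry loop and error handling trimmed, and wrapped in a made-up
helper name for illustration):

static int rb_splice_new_pages_once(struct ring_buffer_per_cpu *cpu_buffer)
{
	struct list_head *pages = &cpu_buffer->new_pages;
	struct list_head *head_page, *prev_page;
	struct list_head *first_page, *last_page, *head_page_with_bit;

	head_page = &rb_set_head_page(cpu_buffer)->list;
	prev_page = head_page->prev;
	first_page = pages->next;
	last_page  = pages->prev;

	/* the head pointer carries the RB_PAGE_HEAD flag in its low bits */
	head_page_with_bit = (struct list_head *)
			((unsigned long)head_page | RB_PAGE_HEAD);

	/* pre-link the new pages so they already point into the ring */
	last_page->next  = head_page_with_bit;
	first_page->prev = prev_page;

	/* atomically swing prev_page->next from the head page to the new list */
	if (cmpxchg(&prev_page->next, head_page_with_bit, first_page) !=
	    head_page_with_bit)
		return 0;	/* raced with a writer moving the head; retry */

	head_page->prev = last_page;	/* close the ring around the new pages */
	return 1;
}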

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v3-v2:
* Fix compile errors

 kernel/trace/ring_buffer.c |   79 ++++++++++++++++++++++++++++++-------------
 1 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f10e439..5381168 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1377,36 +1377,67 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 }
 
 static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
+	spin_lock_irq(&cpu_buffer->reader_lock);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 10 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 10;
+	success = 0;
+	while (retries--) {
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *head_page_with_bit;
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
 
-	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+		first_page = pages->next;
+		last_page  = pages->prev;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+		head_page_with_bit = (struct list_head *)
+				((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next  = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	spin_unlock_irq(&cpu_buffer->reader_lock);
-	atomic_dec(&cpu_buffer->record_disabled);
 }
 
 static void update_pages_handler(struct work_struct *work)
@@ -1415,8 +1446,7 @@ static void update_pages_handler(struct work_struct *work)
 			struct ring_buffer_per_cpu, update_pages_work);
 
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		rb_insert_pages(cpu_buffer);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
 
@@ -3710,6 +3740,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 4/5] trace: Make removal of ring buffer pages atomic
  2011-08-23 18:55   ` [PATCH v3 " Vaibhav Nagarnaik
@ 2011-08-23 19:16     ` David Sharp
  2011-08-23 19:20       ` Vaibhav Nagarnaik
  2011-08-23 19:24       ` Steven Rostedt
  0 siblings, 2 replies; 80+ messages in thread
From: David Sharp @ 2011-08-23 19:16 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Steven Rostedt, Ingo Molnar, Frederic Weisbecker, Michael Rubin,
	linux-kernel

On Tue, Aug 23, 2011 at 11:55 AM, Vaibhav Nagarnaik
<vnagarnaik@google.com> wrote:
> This patch adds the capability to remove pages from a ring buffer
> without destroying any existing data in it.
>
> This is done by removing the pages after the tail page. This makes sure
> that first all the empty pages in the ring buffer are removed. If the
> head page is one in the list of pages to be removed, then the page after
> the removed ones is made the head page. This removes the oldest data
> from the ring buffer and keeps the latest data around to be read.
>
> To do this in a non-racey manner, tracing is stopped for a very short
> time while the pages to be removed are identified and unlinked from the
> ring buffer. The pages are freed after the tracing is restarted to
> minimize the time needed to stop tracing.
>
> The context in which the pages from the per-cpu ring buffer are removed
> runs on the respective CPU. This minimizes the events not traced to only
> NMI trace contexts.

"interrupt contexts". We're not disabling interrupts.

>
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---
> Changelog v3-v2:
> * Fix compile errors
>
>  kernel/trace/ring_buffer.c |  225 ++++++++++++++++++++++++++++++++------------
>  kernel/trace/trace.c       |   20 +----
>  2 files changed, 167 insertions(+), 78 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index bb0ffdd..f10e439 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -23,6 +23,8 @@
>  #include <asm/local.h>
>  #include "trace.h"
>
> +static void update_pages_handler(struct work_struct *work);
> +
>  /*
>  * The ring buffer header is special. We must manually up keep it.
>  */
> @@ -502,6 +504,8 @@ struct ring_buffer_per_cpu {
>        /* ring buffer pages to update, > 0 to add, < 0 to remove */
>        int                             nr_pages_to_update;
>        struct list_head                new_pages; /* new pages to add */
> +       struct work_struct              update_pages_work;
> +       struct completion               update_completion;
>  };
>
>  struct ring_buffer {
> @@ -1080,6 +1084,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>        spin_lock_init(&cpu_buffer->reader_lock);
>        lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
>        cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
> +       INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
> +       init_completion(&cpu_buffer->update_completion);
>
>        bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>                            GFP_KERNEL, cpu_to_node(cpu));
> @@ -1267,32 +1273,107 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
>
>  static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
>
> +static inline unsigned long rb_page_entries(struct buffer_page *bpage)
> +{
> +       return local_read(&bpage->entries) & RB_WRITE_MASK;
> +}
> +
> +static inline unsigned long rb_page_write(struct buffer_page *bpage)
> +{
> +       return local_read(&bpage->write) & RB_WRITE_MASK;
> +}
> +
>  static void
> -rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
> +rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
>  {
> -       struct buffer_page *bpage;
> -       struct list_head *p;
> -       unsigned i;
> +       unsigned int nr_removed;
> +       int page_entries;
> +       struct list_head *tail_page, *to_remove, *next_page;
> +       unsigned long head_bit;
> +       struct buffer_page *last_page, *first_page;
> +       struct buffer_page *to_remove_page, *tmp_iter_page;
>
> +       head_bit = 0;
>        spin_lock_irq(&cpu_buffer->reader_lock);
> -       rb_head_page_deactivate(cpu_buffer);
> -
> -       for (i = 0; i < nr_pages; i++) {
> -               if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
> -                       goto out;
> -               p = cpu_buffer->pages->next;
> -               bpage = list_entry(p, struct buffer_page, list);
> -               list_del_init(&bpage->list);
> -               free_buffer_page(bpage);
> +       atomic_inc(&cpu_buffer->record_disabled);
> +       /*
> +        * We don't race with the readers since we have acquired the reader
> +        * lock. We also don't race with writers after disabling recording.
> +        * This makes it easy to figure out the first and the last page to be
> +        * removed from the list. We remove all the pages in between including
> +        * the first and last pages. This is done in a busy loop so that we
> +        * lose the least number of traces.
> +        * The pages are freed after we restart recording and unlock readers.
> +        */
> +       tail_page = &cpu_buffer->tail_page->list;
> +       /*
> +        * tail page might be on reader page, we remove the next page
> +        * from the ring buffer
> +        */
> +       if (cpu_buffer->tail_page == cpu_buffer->reader_page)
> +               tail_page = rb_list_head(tail_page->next);
> +       to_remove = tail_page;
> +
> +       /* start of pages to remove */
> +       first_page = list_entry(rb_list_head(to_remove->next),
> +                               struct buffer_page, list);
> +       for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
> +               to_remove = rb_list_head(to_remove)->next;
> +               head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
>        }
> -       if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
> -               goto out;
>
> -       rb_reset_cpu(cpu_buffer);
> -       rb_check_pages(cpu_buffer);
> -
> -out:
> +       next_page = rb_list_head(to_remove)->next;
> +       /* now we remove all pages between tail_page and next_page */
> +       tail_page->next = (struct list_head *)((unsigned long)next_page |
> +                                               head_bit);
> +       next_page = rb_list_head(next_page);
> +       next_page->prev = tail_page;
> +       /* make sure pages points to a valid page in the ring buffer */
> +       cpu_buffer->pages = next_page;
> +       /* update head page */
> +       if (head_bit)
> +               cpu_buffer->head_page = list_entry(next_page,
> +                                               struct buffer_page, list);
> +       /*
> +        * change read pointer to make sure any read iterators reset
> +        * themselves
> +        */
> +       cpu_buffer->read = 0;
> +       /* pages are removed, resume tracing and then free the pages */
> +       atomic_dec(&cpu_buffer->record_disabled);
>        spin_unlock_irq(&cpu_buffer->reader_lock);
> +
> +       RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
> +
> +       /* last buffer page to remove */
> +       last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
> +                               list);
> +       tmp_iter_page = first_page;
> +       do {
> +               to_remove_page = tmp_iter_page;
> +               rb_inc_page(cpu_buffer, &tmp_iter_page);
> +               /* update the counters */
> +               page_entries = rb_page_entries(to_remove_page);
> +               if (page_entries) {
> +                       /*
> +                        * If something was added to this page, it was full
> +                        * since it is not the tail page. So we deduct the
> +                        * bytes consumed in ring buffer from here.
> +                        * No need to update overruns, since this page is
> +                        * deleted from ring buffer and its entries are
> +                        * already accounted for.
> +                        */
> +                       local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
> +               }
> +               /*
> +                * We have already removed references to this list item, just
> +                * free up the buffer_page and its page
> +                */
> +               nr_removed--;
> +               free_buffer_page(to_remove_page);
> +       } while (to_remove_page != last_page);
> +
> +       RB_WARN_ON(cpu_buffer, nr_removed);
>  }
>
>  static void
> @@ -1303,6 +1384,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
>        struct list_head *p;
>        unsigned i;
>
> +       /* stop the writers while inserting pages */
> +       atomic_inc(&cpu_buffer->record_disabled);
> +
> +       /* Make sure all writers are done with this buffer. */
> +       synchronize_sched();
> +
>        spin_lock_irq(&cpu_buffer->reader_lock);
>        rb_head_page_deactivate(cpu_buffer);
>
> @@ -1319,17 +1406,21 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
>
>  out:
>        spin_unlock_irq(&cpu_buffer->reader_lock);
> +       atomic_dec(&cpu_buffer->record_disabled);
>  }
>
> -static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
> +static void update_pages_handler(struct work_struct *work)
>  {
> +       struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
> +                       struct ring_buffer_per_cpu, update_pages_work);
> +
>        if (cpu_buffer->nr_pages_to_update > 0)
>                rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
>                                cpu_buffer->nr_pages_to_update);
>        else
>                rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
> -       /* reset this value */
> -       cpu_buffer->nr_pages_to_update = 0;
> +
> +       complete(&cpu_buffer->update_completion);
>  }
>
>  /**
> @@ -1339,14 +1430,14 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
>  *
>  * Minimum size is 2 * BUF_PAGE_SIZE.
>  *
> - * Returns -1 on failure.
> + * Returns 0 on success and < 0 on failure.
>  */
>  int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>                        int cpu_id)
>  {
>        struct ring_buffer_per_cpu *cpu_buffer;
>        unsigned nr_pages;
> -       int cpu;
> +       int cpu, err = 0;
>
>        /*
>         * Always succeed at resizing a non-existent buffer:
> @@ -1361,21 +1452,28 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>        if (size < BUF_PAGE_SIZE * 2)
>                size = BUF_PAGE_SIZE * 2;
>
> -       atomic_inc(&buffer->record_disabled);
> -
> -       /* Make sure all writers are done with this buffer. */
> -       synchronize_sched();
> +       nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>
> +       /*
> +        * Don't succeed if recording is disabled globally, as a reader might
> +        * be manipulating the ring buffer and is expecting a sane state while
> +        * this is true.
> +        */
> +       if (atomic_read(&buffer->record_disabled))
> +               return -EBUSY;
> +       /* prevent another thread from changing buffer sizes */
>        mutex_lock(&buffer->mutex);
> -       get_online_cpus();
> -
> -       nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>
>        if (cpu_id == RING_BUFFER_ALL_CPUS) {
>                /* calculate the pages to update */
>                for_each_buffer_cpu(buffer, cpu) {
>                        cpu_buffer = buffer->buffers[cpu];
>
> +                       if (atomic_read(&cpu_buffer->record_disabled)) {
> +                               err = -EBUSY;
> +                               goto out_err;
> +                       }
> +
>                        cpu_buffer->nr_pages_to_update = nr_pages -
>                                                        cpu_buffer->nr_pages;
>
> @@ -1391,21 +1489,38 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>                         */
>                        INIT_LIST_HEAD(&cpu_buffer->new_pages);
>                        if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> -                                               &cpu_buffer->new_pages, cpu))
> +                                               &cpu_buffer->new_pages, cpu)) {
>                                /* not enough memory for new pages */
> -                               goto no_mem;
> +                               err = -ENOMEM;
> +                               goto out_err;
> +                       }
> +               }
> +
> +               /* fire off all the required work handlers */
> +               for_each_buffer_cpu(buffer, cpu) {
> +                       cpu_buffer = buffer->buffers[cpu];
> +                       if (!cpu_buffer->nr_pages_to_update)
> +                               continue;
> +                       schedule_work_on(cpu, &cpu_buffer->update_pages_work);
>                }
>
>                /* wait for all the updates to complete */
>                for_each_buffer_cpu(buffer, cpu) {
>                        cpu_buffer = buffer->buffers[cpu];
> -                       if (cpu_buffer->nr_pages_to_update) {
> -                               update_pages_handler(cpu_buffer);
> -                               cpu_buffer->nr_pages = nr_pages;
> -                       }
> +                       if (!cpu_buffer->nr_pages_to_update)
> +                               continue;
> +                       wait_for_completion(&cpu_buffer->update_completion);
> +                       cpu_buffer->nr_pages = nr_pages;
> +                       /* reset this value */
> +                       cpu_buffer->nr_pages_to_update = 0;
>                }
>        } else {
>                cpu_buffer = buffer->buffers[cpu_id];
> +               if (atomic_read(&cpu_buffer->record_disabled)) {
> +                       err = -EBUSY;
> +                       goto out_err;
> +               }
> +
>                if (nr_pages == cpu_buffer->nr_pages)
>                        goto out;
>
> @@ -1415,40 +1530,42 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>                INIT_LIST_HEAD(&cpu_buffer->new_pages);
>                if (cpu_buffer->nr_pages_to_update > 0 &&
>                        __rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> -                                               &cpu_buffer->new_pages, cpu_id))
> -                       goto no_mem;
> +                                       &cpu_buffer->new_pages, cpu_id)) {
> +                       err = -ENOMEM;
> +                       goto out_err;
> +               }
>
> -               update_pages_handler(cpu_buffer);
> +               schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
> +               wait_for_completion(&cpu_buffer->update_completion);
>
>                cpu_buffer->nr_pages = nr_pages;
> +               /* reset this value */
> +               cpu_buffer->nr_pages_to_update = 0;
>        }
>
>  out:
> -       put_online_cpus();
>        mutex_unlock(&buffer->mutex);
> -
> -       atomic_dec(&buffer->record_disabled);
> -
>        return size;
>
> - no_mem:
> + out_err:
>        for_each_buffer_cpu(buffer, cpu) {
>                struct buffer_page *bpage, *tmp;
> +
>                cpu_buffer = buffer->buffers[cpu];
>                /* reset this number regardless */
>                cpu_buffer->nr_pages_to_update = 0;
> +
>                if (list_empty(&cpu_buffer->new_pages))
>                        continue;
> +
>                list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
>                                        list) {
>                        list_del_init(&bpage->list);
>                        free_buffer_page(bpage);
>                }
>        }
> -       put_online_cpus();
>        mutex_unlock(&buffer->mutex);
> -       atomic_dec(&buffer->record_disabled);
> -       return -ENOMEM;
> +       return err;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_resize);
>
> @@ -1487,21 +1604,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
>        return __rb_page_index(iter->head_page, iter->head);
>  }
>
> -static inline unsigned long rb_page_write(struct buffer_page *bpage)
> -{
> -       return local_read(&bpage->write) & RB_WRITE_MASK;
> -}
> -
>  static inline unsigned rb_page_commit(struct buffer_page *bpage)
>  {
>        return local_read(&bpage->page->commit);
>  }
>
> -static inline unsigned long rb_page_entries(struct buffer_page *bpage)
> -{
> -       return local_read(&bpage->entries) & RB_WRITE_MASK;
> -}
> -
>  /* Size is determined by what has been committed */
>  static inline unsigned rb_page_size(struct buffer_page *bpage)
>  {
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index bb3c867..736518f 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2936,20 +2936,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
>
>  static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
>  {
> -       int cpu, ret = size;
> +       int ret = size;
>
>        mutex_lock(&trace_types_lock);
>
> -       tracing_stop();
> -
> -       /* disable all cpu buffers */
> -       for_each_tracing_cpu(cpu) {
> -               if (global_trace.data[cpu])
> -                       atomic_inc(&global_trace.data[cpu]->disabled);
> -               if (max_tr.data[cpu])
> -                       atomic_inc(&max_tr.data[cpu]->disabled);
> -       }
> -
>        if (cpu_id != RING_BUFFER_ALL_CPUS) {
>                /* make sure, this cpu is enabled in the mask */
>                if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
> @@ -2963,14 +2953,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
>                ret = -ENOMEM;
>
>  out:
> -       for_each_tracing_cpu(cpu) {
> -               if (global_trace.data[cpu])
> -                       atomic_dec(&global_trace.data[cpu]->disabled);
> -               if (max_tr.data[cpu])
> -                       atomic_dec(&max_tr.data[cpu]->disabled);
> -       }
> -
> -       tracing_start();
>        mutex_unlock(&trace_types_lock);
>
>        return ret;
> --
> 1.7.3.1
>
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 4/5] trace: Make removal of ring buffer pages atomic
  2011-08-23 19:16     ` David Sharp
@ 2011-08-23 19:20       ` Vaibhav Nagarnaik
  2011-08-23 19:24       ` Steven Rostedt
  1 sibling, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-08-23 19:20 UTC (permalink / raw)
  To: David Sharp
  Cc: Steven Rostedt, Ingo Molnar, Frederic Weisbecker, Michael Rubin,
	linux-kernel

On Tue, Aug 23, 2011 at 12:16 PM, David Sharp <dhsharp@google.com> wrote:
> On Tue, Aug 23, 2011 at 11:55 AM, Vaibhav Nagarnaik
> <vnagarnaik@google.com> wrote:
>> This patch adds the capability to remove pages from a ring buffer
>> without destroying any existing data in it.
>>
>> This is done by removing the pages after the tail page. This makes sure
>> that first all the empty pages in the ring buffer are removed. If the
>> head page is one in the list of pages to be removed, then the page after
>> the removed ones is made the head page. This removes the oldest data
>> from the ring buffer and keeps the latest data around to be read.
>>
>> To do this in a non-racey manner, tracing is stopped for a very short
>> time while the pages to be removed are identified and unlinked from the
>> ring buffer. The pages are freed after the tracing is restarted to
>> minimize the time needed to stop tracing.
>>
>> The context in which the pages from the per-cpu ring buffer are removed
>> runs on the respective CPU. This minimizes the events not traced to only
>> NMI trace contexts.
>
> "interrupt contexts". We're not disabling interrupts.
>

Since the reader_lock is held using spin_lock_irq(), no IRQs can come
in on that CPU while the pages are unlinked; only NMIs can.
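
That is, the removal path is essentially (condensed from
rb_remove_pages() in the patch, not a verbatim excerpt):

spin_lock_irq(&cpu_buffer->reader_lock);   /* readers out, local IRQs off */
atomic_inc(&cpu_buffer->record_disabled);  /* writers held off as well */
/*
 * ... identify and unlink the pages; IRQs cannot fire here, so only
 * events generated in NMI context can be dropped ...
 */
atomic_dec(&cpu_buffer->record_disabled);
spin_unlock_irq(&cpu_buffer->reader_lock); /* IRQs enabled again */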


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 4/5] trace: Make removal of ring buffer pages atomic
  2011-08-23 19:16     ` David Sharp
  2011-08-23 19:20       ` Vaibhav Nagarnaik
@ 2011-08-23 19:24       ` Steven Rostedt
  1 sibling, 0 replies; 80+ messages in thread
From: Steven Rostedt @ 2011-08-23 19:24 UTC (permalink / raw)
  To: David Sharp
  Cc: Vaibhav Nagarnaik, Ingo Molnar, Frederic Weisbecker,
	Michael Rubin, linux-kernel

On Tue, 2011-08-23 at 12:16 -0700, David Sharp wrote:
> On Tue, Aug 23, 2011 at 11:55 AM, Vaibhav Nagarnaik
> <vnagarnaik@google.com> wrote:
> > This patch adds the capability to remove pages from a ring buffer
> > without destroying any existing data in it.
> >
> > This is done by removing the pages after the tail page. This makes sure
> > that first all the empty pages in the ring buffer are removed. If the
> > head page is one in the list of pages to be removed, then the page after
> > the removed ones is made the head page. This removes the oldest data
> > from the ring buffer and keeps the latest data around to be read.
> >
> > To do this in a non-racey manner, tracing is stopped for a very short
> > time while the pages to be removed are identified and unlinked from the
> > ring buffer. The pages are freed after the tracing is restarted to
> > minimize the time needed to stop tracing.
> >
> > The context in which the pages from the per-cpu ring buffer are removed
> > runs on the respective CPU. This minimizes the events not traced to only
> > NMI trace contexts.
> 
> "interrupt contexts". We're not disabling interrupts.

Hi David,

Just a note on LKML etiquette: when you make a single comment in an
email, please delete the rest of the email, or at the very least sign
it with your name, so those reading your reply won't have to go looking
through the email for anything else you wrote.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3] trace: Add per_cpu ring buffer control files
  2011-08-23  1:17   ` Vaibhav Nagarnaik
@ 2011-09-03  2:45     ` Steven Rostedt
  2011-09-06 18:56       ` Vaibhav Nagarnaik
  2011-10-12  1:20     ` [PATCH v4 1/4] " Vaibhav Nagarnaik
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2011-09-03  2:45 UTC (permalink / raw)
  To: Vaibhav Nagarnaik; +Cc: Michael Rubin, David Sharp, linux-kernel

On Mon, 2011-08-22 at 18:17 -0700, Vaibhav Nagarnaik wrote:
> Add a debugfs entry under per_cpu/ folder for each cpu called
> buffer_size_kb to control the ring buffer size for each CPU
> independently.
> 
> If the global file buffer_size_kb is used to set size, the individual
> ring buffers will be adjusted to the given size. The buffer_size_kb will
> report the common size to maintain backward compatibility.
> 
> If the buffer_size_kb file under the per_cpu/ directory is used to
> change buffer size for a specific CPU, only the size of the respective
> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
> 'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---
> Changelog v3-v2:
> * Fix compilation errors when using allyesconfig.
> 
>  include/linux/ring_buffer.h |    6 +-
>  kernel/trace/ring_buffer.c  |  248 ++++++++++++++++++++++++-------------------
>  kernel/trace/trace.c        |  191 ++++++++++++++++++++++++++-------
>  kernel/trace/trace.h        |    2 +-
>  4 files changed, 298 insertions(+), 149 deletions(-)
> 
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 67be037..ad36702 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
>  	__ring_buffer_alloc((size), (flags), &__key);	\
>  })
>  
> +#define RING_BUFFER_ALL_CPUS -1
> +
>  void ring_buffer_free(struct ring_buffer *buffer);
>  
> -int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
> +int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
>  
>  void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
>  
> @@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
>  void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
>  int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
>  
> -unsigned long ring_buffer_size(struct ring_buffer *buffer);
> +unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
>  
>  void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
>  void ring_buffer_reset(struct ring_buffer *buffer);
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index acf6b68..bb0ffdd 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
>  	spinlock_t			reader_lock;	/* serialize readers */
>  	arch_spinlock_t			lock;
>  	struct lock_class_key		lock_key;
> +	unsigned int			nr_pages;
>  	struct list_head		*pages;
>  	struct buffer_page		*head_page;	/* read from head */
>  	struct buffer_page		*tail_page;	/* write to tail */
> @@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
>  	unsigned long			read_bytes;
>  	u64				write_stamp;
>  	u64				read_stamp;
> +	/* ring buffer pages to update, > 0 to add, < 0 to remove */
> +	int				nr_pages_to_update;
> +	struct list_head		new_pages; /* new pages to add */

There's no reason for the added new_pages. And I'm not sure I even like
the 'nr_pages_to_update' either. These are only used for resizing and
are just wasting space otherwise.

You could allocate an array of numbers for the nr_pages_to_update and
use that instead. As for the list, heck, you can still use a single list
and pass that around like the original code did.
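
One way to read that (hypothetical sketch, not code from this thread;
'size' is the requested total size in bytes as passed to
ring_buffer_resize(), and the temporary array is purely illustrative):

int *nr_pages_to_update;		/* lives only for the resize */
LIST_HEAD(new_pages);			/* one shared list, as before */
unsigned nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
int cpu;

nr_pages_to_update = kcalloc(nr_cpu_ids, sizeof(int), GFP_KERNEL);
if (!nr_pages_to_update)
	return -ENOMEM;

for_each_buffer_cpu(buffer, cpu)
	nr_pages_to_update[cpu] = nr_pages -
				  buffer->buffers[cpu]->nr_pages;

/* ... allocate into new_pages, update each cpu_buffer, then ... */

kfree(nr_pages_to_update);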


>  };
>  
>  struct ring_buffer {
> -	unsigned			pages;
>  	unsigned			flags;
>  	int				cpus;
>  	atomic_t			record_disabled;
> @@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
>  	return 0;
>  }
>  
> -static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
> -			     unsigned nr_pages)
> +static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
>  {
> +	int i;
>  	struct buffer_page *bpage, *tmp;
> -	LIST_HEAD(pages);
> -	unsigned i;
> -
> -	WARN_ON(!nr_pages);
>  
>  	for (i = 0; i < nr_pages; i++) {
>  		struct page *page;
> @@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>  		 */
>  		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>  				    GFP_KERNEL | __GFP_NORETRY,
> -				    cpu_to_node(cpu_buffer->cpu));
> +				    cpu_to_node(cpu));
>  		if (!bpage)
>  			goto free_pages;
>  
> -		rb_check_bpage(cpu_buffer, bpage);
> -
> -		list_add(&bpage->list, &pages);
> +		list_add(&bpage->list, pages);
>  
> -		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
> +		page = alloc_pages_node(cpu_to_node(cpu),
>  					GFP_KERNEL | __GFP_NORETRY, 0);
>  		if (!page)
>  			goto free_pages;
> @@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>  		rb_init_page(bpage->page);
>  	}
>  
> +	return 0;
> +
> +free_pages:
> +	list_for_each_entry_safe(bpage, tmp, pages, list) {
> +		list_del_init(&bpage->list);
> +		free_buffer_page(bpage);
> +	}
> +
> +	return -ENOMEM;
> +}
> +
> +static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
> +			     unsigned nr_pages)
> +{
> +	LIST_HEAD(pages);
> +
> +	WARN_ON(!nr_pages);
> +
> +	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
> +		return -ENOMEM;
> +
>  	/*
>  	 * The ring buffer page list is a circular list that does not
>  	 * start and end with a list head. All page list items point to
> @@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>  	cpu_buffer->pages = pages.next;
>  	list_del(&pages);
>  
> +	cpu_buffer->nr_pages = nr_pages;
> +
>  	rb_check_pages(cpu_buffer);
>  
>  	return 0;
> -
> - free_pages:
> -	list_for_each_entry_safe(bpage, tmp, &pages, list) {
> -		list_del_init(&bpage->list);
> -		free_buffer_page(bpage);
> -	}
> -	return -ENOMEM;
>  }
>  
>  static struct ring_buffer_per_cpu *
> -rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
> +rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>  {
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	struct buffer_page *bpage;
> @@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
>  
>  	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
>  
> -	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
> +	ret = rb_allocate_pages(cpu_buffer, nr_pages);
>  	if (ret < 0)
>  		goto fail_free_reader;
>  
> @@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>  {
>  	struct ring_buffer *buffer;
>  	int bsize;
> -	int cpu;
> +	int cpu, nr_pages;
>  
>  	/* keep it in its own cache line */
>  	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
> @@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>  	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
>  		goto fail_free_buffer;
>  
> -	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
> +	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>  	buffer->flags = flags;
>  	buffer->clock = trace_clock_local;
>  	buffer->reader_lock_key = key;
>  
>  	/* need at least two pages */
> -	if (buffer->pages < 2)
> -		buffer->pages = 2;
> +	if (nr_pages < 2)
> +		nr_pages = 2;
>  
>  	/*
>  	 * In case of non-hotplug cpu, if the ring-buffer is allocated
> @@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>  
>  	for_each_buffer_cpu(buffer, cpu) {
>  		buffer->buffers[cpu] =
> -			rb_allocate_cpu_buffer(buffer, cpu);
> +			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
>  		if (!buffer->buffers[cpu])
>  			goto fail_free_buffers;
>  	}
> @@ -1308,6 +1321,17 @@ out:
>  	spin_unlock_irq(&cpu_buffer->reader_lock);
>  }
>  
> +static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
> +{
> +	if (cpu_buffer->nr_pages_to_update > 0)
> +		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
> +				cpu_buffer->nr_pages_to_update);
> +	else
> +		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
> +	/* reset this value */
> +	cpu_buffer->nr_pages_to_update = 0;
> +}
> +
>  /**
>   * ring_buffer_resize - resize the ring buffer
>   * @buffer: the buffer to resize.
> @@ -1317,14 +1341,12 @@ out:
>   *
>   * Returns -1 on failure.
>   */
> -int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
> +int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
> +			int cpu_id)
>  {
>  	struct ring_buffer_per_cpu *cpu_buffer;
> -	unsigned nr_pages, rm_pages, new_pages;
> -	struct buffer_page *bpage, *tmp;
> -	unsigned long buffer_size;
> -	LIST_HEAD(pages);
> -	int i, cpu;
> +	unsigned nr_pages;
> +	int cpu;
>  
>  	/*
>  	 * Always succeed at resizing a non-existent buffer:
> @@ -1334,15 +1356,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>  
>  	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>  	size *= BUF_PAGE_SIZE;
> -	buffer_size = buffer->pages * BUF_PAGE_SIZE;
>  
>  	/* we need a minimum of two pages */
>  	if (size < BUF_PAGE_SIZE * 2)
>  		size = BUF_PAGE_SIZE * 2;
>  
> -	if (size == buffer_size)
> -		return size;
> -
>  	atomic_inc(&buffer->record_disabled);
>  
>  	/* Make sure all writers are done with this buffer. */
> @@ -1353,68 +1371,59 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>  
>  	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>  
> -	if (size < buffer_size) {
> +	if (cpu_id == RING_BUFFER_ALL_CPUS) {
> +		/* calculate the pages to update */
> +		for_each_buffer_cpu(buffer, cpu) {
> +			cpu_buffer = buffer->buffers[cpu];
> +
> +			cpu_buffer->nr_pages_to_update = nr_pages -
> +							cpu_buffer->nr_pages;
>  
> -		/* easy case, just free pages */
> -		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
> -			goto out_fail;
> +			/*
> +			 * nothing more to do for removing pages or no update
> +			 */
> +			if (cpu_buffer->nr_pages_to_update <= 0)
> +				continue;
>  
> -		rm_pages = buffer->pages - nr_pages;
> +			/*
> +			 * to add pages, make sure all new pages can be
> +			 * allocated without receiving ENOMEM
> +			 */
> +			INIT_LIST_HEAD(&cpu_buffer->new_pages);
> +			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> +						&cpu_buffer->new_pages, cpu))
> +				/* not enough memory for new pages */
> +				goto no_mem;
> +		}
>  
> +		/* wait for all the updates to complete */
>  		for_each_buffer_cpu(buffer, cpu) {
>  			cpu_buffer = buffer->buffers[cpu];
> -			rb_remove_pages(cpu_buffer, rm_pages);
> +			if (cpu_buffer->nr_pages_to_update) {
> +				update_pages_handler(cpu_buffer);
> +				cpu_buffer->nr_pages = nr_pages;

The two places that call update_pages_handler() also update
cpu_buffer->nr_pages. Move that to the update_pages_handler() as well.


> +			}
>  		}
> -		goto out;
> -	}
> +	} else {
> +		cpu_buffer = buffer->buffers[cpu_id];
> +		if (nr_pages == cpu_buffer->nr_pages)
> +			goto out;
>  
> -	/*
> -	 * This is a bit more difficult. We only want to add pages
> -	 * when we can allocate enough for all CPUs. We do this
> -	 * by allocating all the pages and storing them on a local
> -	 * link list. If we succeed in our allocation, then we
> -	 * add these pages to the cpu_buffers. Otherwise we just free
> -	 * them all and return -ENOMEM;
> -	 */
> -	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
> -		goto out_fail;
> +		cpu_buffer->nr_pages_to_update = nr_pages -
> +						cpu_buffer->nr_pages;
>  
> -	new_pages = nr_pages - buffer->pages;
> +		INIT_LIST_HEAD(&cpu_buffer->new_pages);
> +		if (cpu_buffer->nr_pages_to_update > 0 &&
> +			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> +						&cpu_buffer->new_pages, cpu_id))
> +			goto no_mem;
>  
> -	for_each_buffer_cpu(buffer, cpu) {
> -		for (i = 0; i < new_pages; i++) {
> -			struct page *page;
> -			/*
> -			 * __GFP_NORETRY flag makes sure that the allocation
> -			 * fails gracefully without invoking oom-killer and
> -			 * the system is not destabilized.
> -			 */
> -			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
> -						  cache_line_size()),
> -					    GFP_KERNEL | __GFP_NORETRY,
> -					    cpu_to_node(cpu));
> -			if (!bpage)
> -				goto free_pages;
> -			list_add(&bpage->list, &pages);
> -			page = alloc_pages_node(cpu_to_node(cpu),
> -						GFP_KERNEL | __GFP_NORETRY, 0);
> -			if (!page)
> -				goto free_pages;
> -			bpage->page = page_address(page);
> -			rb_init_page(bpage->page);
> -		}
> -	}
> +		update_pages_handler(cpu_buffer);
>  
> -	for_each_buffer_cpu(buffer, cpu) {
> -		cpu_buffer = buffer->buffers[cpu];
> -		rb_insert_pages(cpu_buffer, &pages, new_pages);
> +		cpu_buffer->nr_pages = nr_pages;
>  	}
>  
> -	if (RB_WARN_ON(buffer, !list_empty(&pages)))
> -		goto out_fail;
> -
>   out:
> -	buffer->pages = nr_pages;
>  	put_online_cpus();
>  	mutex_unlock(&buffer->mutex);
>  
> @@ -1422,25 +1431,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>  
>  	return size;
>  
> - free_pages:
> -	list_for_each_entry_safe(bpage, tmp, &pages, list) {
> -		list_del_init(&bpage->list);
> -		free_buffer_page(bpage);
> + no_mem:
> +	for_each_buffer_cpu(buffer, cpu) {
> +		struct buffer_page *bpage, *tmp;
> +		cpu_buffer = buffer->buffers[cpu];
> +		/* reset this number regardless */
> +		cpu_buffer->nr_pages_to_update = 0;
> +		if (list_empty(&cpu_buffer->new_pages))
> +			continue;
> +		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
> +					list) {
> +			list_del_init(&bpage->list);
> +			free_buffer_page(bpage);
> +		}
>  	}
>  	put_online_cpus();
>  	mutex_unlock(&buffer->mutex);
>  	atomic_dec(&buffer->record_disabled);
>  	return -ENOMEM;
> -
> -	/*
> -	 * Something went totally wrong, and we are too paranoid
> -	 * to even clean up the mess.
> -	 */
> - out_fail:
> -	put_online_cpus();
> -	mutex_unlock(&buffer->mutex);
> -	atomic_dec(&buffer->record_disabled);
> -	return -1;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_resize);
>  
> @@ -1542,7 +1550,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
>  	 * assign the commit to the tail.
>  	 */
>   again:
> -	max_count = cpu_buffer->buffer->pages * 100;
> +	max_count = cpu_buffer->nr_pages * 100;
>  
>  	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
>  		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
> @@ -3563,9 +3571,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
>   * ring_buffer_size - return the size of the ring buffer (in bytes)
>   * @buffer: The ring buffer.
>   */
> -unsigned long ring_buffer_size(struct ring_buffer *buffer)
> +unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
>  {
> -	return BUF_PAGE_SIZE * buffer->pages;
> +	/*
> +	 * Earlier, this method returned
> +	 *	BUF_PAGE_SIZE * buffer->nr_pages
> +	 * Since the nr_pages field is now removed, we have converted this to
> +	 * return the per cpu buffer value.
> +	 */
> +	if (!cpumask_test_cpu(cpu, buffer->cpumask))
> +		return 0;
> +
> +	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_size);
>  
> @@ -3740,8 +3757,11 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>  	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
>  		goto out;
>  
> +	cpu_buffer_a = buffer_a->buffers[cpu];
> +	cpu_buffer_b = buffer_b->buffers[cpu];
> +
>  	/* At least make sure the two buffers are somewhat the same */
> -	if (buffer_a->pages != buffer_b->pages)
> +	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
>  		goto out;
>  
>  	ret = -EAGAIN;
> @@ -3755,9 +3775,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>  	if (atomic_read(&buffer_b->record_disabled))
>  		goto out;
>  
> -	cpu_buffer_a = buffer_a->buffers[cpu];
> -	cpu_buffer_b = buffer_b->buffers[cpu];
> -
>  	if (atomic_read(&cpu_buffer_a->record_disabled))
>  		goto out;
>  
> @@ -4108,6 +4125,8 @@ static int rb_cpu_notify(struct notifier_block *self,
>  	struct ring_buffer *buffer =
>  		container_of(self, struct ring_buffer, cpu_notify);
>  	long cpu = (long)hcpu;
> +	int cpu_i, nr_pages_same;
> +	unsigned int nr_pages;
>  
>  	switch (action) {
>  	case CPU_UP_PREPARE:
> @@ -4115,8 +4134,23 @@ static int rb_cpu_notify(struct notifier_block *self,
>  		if (cpumask_test_cpu(cpu, buffer->cpumask))
>  			return NOTIFY_OK;
>  
> +		nr_pages = 0;
> +		nr_pages_same = 1;
> +		/* check if all cpu sizes are same */
> +		for_each_buffer_cpu(buffer, cpu_i) {
> +			/* fill in the size from first enabled cpu */
> +			if (nr_pages == 0)
> +				nr_pages = buffer->buffers[cpu_i]->nr_pages;
> +			if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
> +				nr_pages_same = 0;
> +				break;
> +			}
> +		}
> +		/* allocate minimum pages, user can later expand it */
> +		if (!nr_pages_same)
> +			nr_pages = 2;
>  		buffer->buffers[cpu] =
> -			rb_allocate_cpu_buffer(buffer, cpu);
> +			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
>  		if (!buffer->buffers[cpu]) {
>  			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
>  			     cpu);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index b419070..bb3c867 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -784,7 +784,8 @@ __acquires(kernel_lock)
>  
>  		/* If we expanded the buffers, make sure the max is expanded too */
>  		if (ring_buffer_expanded && type->use_max_tr)
> -			ring_buffer_resize(max_tr.buffer, trace_buf_size);
> +			ring_buffer_resize(max_tr.buffer, trace_buf_size,
> +						RING_BUFFER_ALL_CPUS);
>  
>  		/* the test is responsible for initializing and enabling */
>  		pr_info("Testing tracer %s: ", type->name);
> @@ -800,7 +801,8 @@ __acquires(kernel_lock)
>  
>  		/* Shrink the max buffer again */
>  		if (ring_buffer_expanded && type->use_max_tr)
> -			ring_buffer_resize(max_tr.buffer, 1);
> +			ring_buffer_resize(max_tr.buffer, 1,
> +						RING_BUFFER_ALL_CPUS);
>  
>  		printk(KERN_CONT "PASSED\n");
>  	}
> @@ -2853,7 +2855,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
>  	return t->init(tr);
>  }
>  
> -static int __tracing_resize_ring_buffer(unsigned long size)
> +static void set_buffer_entries(struct trace_array *tr, unsigned long val)
> +{
> +	int cpu;
> +	for_each_tracing_cpu(cpu)
> +		tr->data[cpu]->entries = val;
> +}
> +
> +static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
>  {
>  	int ret;
>  
> @@ -2864,19 +2873,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
>  	 */
>  	ring_buffer_expanded = 1;
>  
> -	ret = ring_buffer_resize(global_trace.buffer, size);
> +	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
>  	if (ret < 0)
>  		return ret;
>  
>  	if (!current_trace->use_max_tr)
>  		goto out;
>  
> -	ret = ring_buffer_resize(max_tr.buffer, size);
> +	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
>  	if (ret < 0) {
> -		int r;
> +		int r = 0;
> +
> +		if (cpu == RING_BUFFER_ALL_CPUS) {
> +			int i;
> +			for_each_tracing_cpu(i) {
> +				r = ring_buffer_resize(global_trace.buffer,
> +						global_trace.data[i]->entries,
> +						i);
> +				if (r < 0)
> +					break;
> +			}
> +		} else {
> +			r = ring_buffer_resize(global_trace.buffer,
> +						global_trace.data[cpu]->entries,
> +						cpu);
> +		}
>  
> -		r = ring_buffer_resize(global_trace.buffer,
> -				       global_trace.entries);
>  		if (r < 0) {
>  			/*
>  			 * AARGH! We are left with different
> @@ -2898,14 +2920,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
>  		return ret;
>  	}
>  
> -	max_tr.entries = size;
> +	if (cpu == RING_BUFFER_ALL_CPUS)
> +		set_buffer_entries(&max_tr, size);
> +	else
> +		max_tr.data[cpu]->entries = size;
> +
>   out:
> -	global_trace.entries = size;
> +	if (cpu == RING_BUFFER_ALL_CPUS)
> +		set_buffer_entries(&global_trace, size);
> +	else
> +		global_trace.data[cpu]->entries = size;
>  
>  	return ret;
>  }
>  
> -static ssize_t tracing_resize_ring_buffer(unsigned long size)
> +static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
>  {
>  	int cpu, ret = size;
>  
> @@ -2921,12 +2950,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
>  			atomic_inc(&max_tr.data[cpu]->disabled);
>  	}
>  
> -	if (size != global_trace.entries)
> -		ret = __tracing_resize_ring_buffer(size);
> +	if (cpu_id != RING_BUFFER_ALL_CPUS) {
> +		/* make sure, this cpu is enabled in the mask */
> +		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +	}
>  
> +	ret = __tracing_resize_ring_buffer(size, cpu_id);
>  	if (ret < 0)
>  		ret = -ENOMEM;
>  
> +out:
>  	for_each_tracing_cpu(cpu) {
>  		if (global_trace.data[cpu])
>  			atomic_dec(&global_trace.data[cpu]->disabled);
> @@ -2957,7 +2993,8 @@ int tracing_update_buffers(void)
>  
>  	mutex_lock(&trace_types_lock);
>  	if (!ring_buffer_expanded)
> -		ret = __tracing_resize_ring_buffer(trace_buf_size);
> +		ret = __tracing_resize_ring_buffer(trace_buf_size,
> +						RING_BUFFER_ALL_CPUS);
>  	mutex_unlock(&trace_types_lock);
>  
>  	return ret;
> @@ -2981,7 +3018,8 @@ static int tracing_set_tracer(const char *buf)
>  	mutex_lock(&trace_types_lock);
>  
>  	if (!ring_buffer_expanded) {
> -		ret = __tracing_resize_ring_buffer(trace_buf_size);
> +		ret = __tracing_resize_ring_buffer(trace_buf_size,
> +						RING_BUFFER_ALL_CPUS);
>  		if (ret < 0)
>  			goto out;
>  		ret = 0;
> @@ -3007,8 +3045,8 @@ static int tracing_set_tracer(const char *buf)
>  		 * The max_tr ring buffer has some state (e.g. ring->clock) and
>  		 * we want preserve it.
>  		 */
> -		ring_buffer_resize(max_tr.buffer, 1);
> -		max_tr.entries = 1;
> +		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
> +		set_buffer_entries(&max_tr, 1);
>  	}
>  	destroy_trace_option_files(topts);
>  
> @@ -3016,10 +3054,17 @@ static int tracing_set_tracer(const char *buf)
>  
>  	topts = create_trace_option_files(current_trace);
>  	if (current_trace->use_max_tr) {
> -		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
> -		if (ret < 0)
> -			goto out;
> -		max_tr.entries = global_trace.entries;
> +		int cpu;
> +		/* we need to make per cpu buffer sizes equivalent */
> +		for_each_tracing_cpu(cpu) {
> +			ret = ring_buffer_resize(max_tr.buffer,
> +						global_trace.data[cpu]->entries,
> +						cpu);
> +			if (ret < 0)
> +				goto out;
> +			max_tr.data[cpu]->entries =
> +					global_trace.data[cpu]->entries;
> +		}
>  	}
>  
>  	if (t->init) {
> @@ -3521,30 +3566,82 @@ out_err:
>  	goto out;
>  }
>  
> +struct ftrace_entries_info {
> +	struct trace_array	*tr;
> +	int			cpu;
> +};
> +
> +static int tracing_entries_open(struct inode *inode, struct file *filp)
> +{
> +	struct ftrace_entries_info *info;
> +
> +	if (tracing_disabled)
> +		return -ENODEV;
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		return -ENOMEM;
> +
> +	info->tr = &global_trace;
> +	info->cpu = (unsigned long)inode->i_private;
> +
> +	filp->private_data = info;
> +
> +	return 0;
> +}
> +
>  static ssize_t
>  tracing_entries_read(struct file *filp, char __user *ubuf,
>  		     size_t cnt, loff_t *ppos)
>  {
> -	struct trace_array *tr = filp->private_data;
> -	char buf[96];
> -	int r;
> +	struct ftrace_entries_info *info = filp->private_data;
> +	struct trace_array *tr = info->tr;
> +	char buf[64];
> +	int r = 0;
> +	ssize_t ret;
>  
>  	mutex_lock(&trace_types_lock);
> -	if (!ring_buffer_expanded)
> -		r = sprintf(buf, "%lu (expanded: %lu)\n",
> -			    tr->entries >> 10,
> -			    trace_buf_size >> 10);
> -	else
> -		r = sprintf(buf, "%lu\n", tr->entries >> 10);
> +
> +	if (info->cpu == RING_BUFFER_ALL_CPUS) {
> +		int cpu, buf_size_same;
> +		unsigned long size;
> +
> +		size = 0;
> +		buf_size_same = 1;
> +		/* check if all cpu sizes are same */
> +		for_each_tracing_cpu(cpu) {
> +			/* fill in the size from first enabled cpu */
> +			if (size == 0)
> +				size = tr->data[cpu]->entries;
> +			if (size != tr->data[cpu]->entries) {
> +				buf_size_same = 0;
> +				break;
> +			}
> +		}
> +
> +		if (buf_size_same) {
> +			if (!ring_buffer_expanded)
> +				r = sprintf(buf, "%lu (expanded: %lu)\n",
> +					    size >> 10,
> +					    trace_buf_size >> 10);
> +			else
> +				r = sprintf(buf, "%lu\n", size >> 10);
> +		} else
> +			r = sprintf(buf, "X\n");
> +	} else
> +		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
> +
>  	mutex_unlock(&trace_types_lock);
>  
> -	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +	return ret;
>  }
>  
>  static ssize_t
>  tracing_entries_write(struct file *filp, const char __user *ubuf,
>  		      size_t cnt, loff_t *ppos)
>  {
> +	struct ftrace_entries_info *info = filp->private_data;
>  	unsigned long val;
>  	int ret;
>  
> @@ -3559,7 +3656,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>  	/* value is in KB */
>  	val <<= 10;
>  
> -	ret = tracing_resize_ring_buffer(val);
> +	ret = tracing_resize_ring_buffer(val, info->cpu);
>  	if (ret < 0)
>  		return ret;
>  
> @@ -3568,6 +3665,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>  	return cnt;
>  }
>  
> +static int
> +tracing_entries_release(struct inode *inode, struct file *filp)
> +{
> +	struct ftrace_entries_info *info = filp->private_data;
> +
> +	kfree(info);
> +
> +	return 0;
> +}
> +
>  static ssize_t
>  tracing_total_entries_read(struct file *filp, char __user *ubuf,
>  				size_t cnt, loff_t *ppos)
> @@ -3579,7 +3686,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
>  
>  	mutex_lock(&trace_types_lock);
>  	for_each_tracing_cpu(cpu) {
> -		size += tr->entries >> 10;
> +		size += tr->data[cpu]->entries >> 10;
>  		if (!ring_buffer_expanded)
>  			expanded_size += trace_buf_size >> 10;
>  	}
> @@ -3613,7 +3720,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
>  	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
>  		tracing_off();
>  	/* resize the ring buffer to 0 */
> -	tracing_resize_ring_buffer(0);
> +	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
>  
>  	return 0;
>  }
> @@ -3757,9 +3864,10 @@ static const struct file_operations tracing_pipe_fops = {
>  };
>  
>  static const struct file_operations tracing_entries_fops = {
> -	.open		= tracing_open_generic,
> +	.open		= tracing_entries_open,
>  	.read		= tracing_entries_read,
>  	.write		= tracing_entries_write,
> +	.release	= tracing_entries_release,
>  	.llseek		= generic_file_llseek,
>  };
>  
> @@ -4211,6 +4319,9 @@ static void tracing_init_debugfs_percpu(long cpu)
>  
>  	trace_create_file("stats", 0444, d_cpu,
>  			(void *) cpu, &tracing_stats_fops);
> +
> +	trace_create_file("buffer_size_kb", 0444, d_cpu,
> +			(void *) cpu, &tracing_entries_fops);
>  }
>  
>  #ifdef CONFIG_FTRACE_SELFTEST
> @@ -4491,7 +4602,7 @@ static __init int tracer_init_debugfs(void)
>  			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
>  
>  	trace_create_file("buffer_size_kb", 0644, d_tracer,
> -			&global_trace, &tracing_entries_fops);
> +			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
>  
>  	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
>  			&global_trace, &tracing_total_entries_fops);
> @@ -4737,8 +4848,6 @@ __init static int tracer_alloc_buffers(void)
>  		WARN_ON(1);
>  		goto out_free_cpumask;
>  	}
> -	global_trace.entries = ring_buffer_size(global_trace.buffer);
> -
>  
>  #ifdef CONFIG_TRACER_MAX_TRACE
>  	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
> @@ -4748,7 +4857,6 @@ __init static int tracer_alloc_buffers(void)
>  		ring_buffer_free(global_trace.buffer);
>  		goto out_free_cpumask;
>  	}
> -	max_tr.entries = 1;
>  #endif
>  
>  	/* Allocate the first page for all buffers */
> @@ -4757,6 +4865,11 @@ __init static int tracer_alloc_buffers(void)
>  		max_tr.data[i] = &per_cpu(max_tr_data, i);
>  	}
>  
> +	set_buffer_entries(&global_trace, ring_buf_size);
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +	set_buffer_entries(&max_tr, 1);
> +#endif
> +
>  	trace_init_cmdlines();
>  
>  	register_tracer(&nop_trace);
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 616846b..126d333 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -125,6 +125,7 @@ struct trace_array_cpu {
>  	atomic_t		disabled;
>  	void			*buffer_page;	/* ring buffer spare */
>  
> +	unsigned long		entries;
>  	unsigned long		saved_latency;
>  	unsigned long		critical_start;
>  	unsigned long		critical_end;
> @@ -146,7 +147,6 @@ struct trace_array_cpu {
>   */
>  struct trace_array {
>  	struct ring_buffer	*buffer;
> -	unsigned long		entries;
>  	int			cpu;
>  	cycle_t			time_start;
>  	struct task_struct	*waiter;


I'm still very nervous about this patch. I'm going to hold off a release
cycle before even giving it up to Ingo.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3] trace: Add per_cpu ring buffer control files
  2011-09-03  2:45     ` Steven Rostedt
@ 2011-09-06 18:56       ` Vaibhav Nagarnaik
  2011-09-07 17:13         ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-09-06 18:56 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Michael Rubin, David Sharp, linux-kernel

On Fri, Sep 2, 2011 at 7:45 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2011-08-22 at 18:17 -0700, Vaibhav Nagarnaik wrote:
>> +     /* ring buffer pages to update, > 0 to add, < 0 to remove */
>> +     int                             nr_pages_to_update;
>> +     struct list_head                new_pages; /* new pages to add */
>
> There's no reason for the added new_pages. And I'm not sure I even like
> the 'nr_pages_to_update' either. These are only used for resizing and
> are just wasting space otherwise.
>
> You could allocate an array of numbers for the nr_pages_to_update and
> use that instead. As for the list, heck, you can still use a single list
> and pass that around like the original code did.
>

In this patch's context, I could still use the original logic of a
temporary list of new pages and pass nr_pages_to_update as a parameter
to the function. However, the per-cpu fields come in handy for the next
2 patches: the updates are meant to be per_cpu, so having a separate
list of new pages for each CPU is easier to handle. The same goes for
nr_pages_to_update.
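
For instance, with per-cpu fields each CPU's update can later be handed
off to a worker running on that CPU. A rough sketch, based on the
update_pages_work/update_completion fields added in the next patch (the
schedule_work_on()/wait_for_completion() calls are an assumption about
how those fields end up being used, and it assumes the handler leaves
nr_pages_to_update for the caller to clear):

	/* kick each CPU to update its own buffer */
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		if (!cpu_buffer->nr_pages_to_update)
			continue;
		schedule_work_on(cpu, &cpu_buffer->update_pages_work);
	}

	/* wait for all the updates to complete, then clear the deltas */
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		if (!cpu_buffer->nr_pages_to_update)
			continue;
		wait_for_completion(&cpu_buffer->update_completion);
		cpu_buffer->nr_pages_to_update = 0;
	}

Each worker consumes its own cpu_buffer->new_pages list, which is why a
per-cpu list (rather than one shared list) is convenient there.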

>> +                             update_pages_handler(cpu_buffer);
>> +                             cpu_buffer->nr_pages = nr_pages;
>
> The two places that call update_pages_handler() also update
> cpu_buffer->nr_pages. Move that to the update_pages_handler() as well.

Sure. I will update that.

> I'm still very nervous about this patch. I'm going to hold off a release
> cycle before even giving it up to Ingo.

Sure. I agree that the patches need a rigorous review and your
confidence in them is a necessary element.

I am working on other projects now, so I might have a slower response
time. But I will try to update the patch and send it to you in a timely
manner.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3] trace: Add per_cpu ring buffer control files
  2011-09-06 18:56       ` Vaibhav Nagarnaik
@ 2011-09-07 17:13         ` Steven Rostedt
  0 siblings, 0 replies; 80+ messages in thread
From: Steven Rostedt @ 2011-09-07 17:13 UTC (permalink / raw)
  To: Vaibhav Nagarnaik; +Cc: Michael Rubin, David Sharp, linux-kernel

On Tue, 2011-09-06 at 11:56 -0700, Vaibhav Nagarnaik wrote:

> I am working on other projects now, so I might have a slower response
> time. But I will try to update the patch and send it to you in a timely
> manner.

Take your time. I'm currently at plumbers, and I'm also not pushing
anything while kernel.org is down.

-- Steve


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v4 1/4] trace: Add per_cpu ring buffer control files
  2011-08-23  1:17   ` Vaibhav Nagarnaik
  2011-09-03  2:45     ` Steven Rostedt
@ 2011-10-12  1:20     ` Vaibhav Nagarnaik
  2012-01-31 23:53       ` Vaibhav Nagarnaik
  2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
  2011-10-12  1:20     ` [PATCH v4 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-10-12  1:20 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

Add a debugfs entry named buffer_size_kb under the per_cpu/ folder for
each CPU, to control the size of each per-CPU ring buffer
independently.

If the global buffer_size_kb file is used to set the size, all the
individual ring buffers are adjusted to the given size, and
buffer_size_kb keeps reporting the common size to maintain backward
compatibility.

If the buffer_size_kb file under the per_cpu/ directory is used to
change the buffer size for a specific CPU, only the respective ring
buffer is resized. When tracing/buffer_size_kb is read afterwards, it
reports 'X' to indicate that the per-cpu ring buffer sizes are no
longer equivalent.
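
As a usage sketch (assuming debugfs is mounted at /sys/kernel/debug and
the caller has permission to write the tracing files), a process can
resize a single CPU's buffer like this:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* grow only CPU 2's ring buffer to 4096 KB */
		int fd = open("/sys/kernel/debug/tracing/per_cpu/cpu2/"
			      "buffer_size_kb", O_WRONLY);

		if (fd < 0 || write(fd, "4096", 4) != 4) {
			perror("buffer_size_kb");
			return 1;
		}
		close(fd);
		/* tracing/buffer_size_kb now reads back "X" (sizes differ) */
		return 0;
	}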

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v4-v3:
* Restructure code according to feedback.

 include/linux/ring_buffer.h |    6 +-
 kernel/trace/ring_buffer.c  |  248 ++++++++++++++++++++++++-------------------
 kernel/trace/trace.c        |  191 ++++++++++++++++++++++++++-------
 kernel/trace/trace.h        |    2 +-
 4 files changed, 297 insertions(+), 150 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 67be037..ad36702 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+#define RING_BUFFER_ALL_CPUS -1
+
 void ring_buffer_free(struct ring_buffer *buffer);
 
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
 
 void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
 
@@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
 void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
 int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
 
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
 
 void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_reset(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index acf6b68..7b39b09 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
 	spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
+	unsigned int			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
 	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
+	/* ring buffer pages to update, > 0 to add, < 0 to remove */
+	int				nr_pages_to_update;
+	struct list_head		new_pages; /* new pages to add */
 };
 
 struct ring_buffer {
-	unsigned			pages;
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
@@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
+static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
 {
+	int i;
 	struct buffer_page *bpage, *tmp;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL | __GFP_NORETRY,
-				    cpu_to_node(cpu_buffer->cpu));
+				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
 
-		rb_check_bpage(cpu_buffer, bpage);
+		list_add(&bpage->list, pages);
 
-		list_add(&bpage->list, &pages);
-
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
 		if (!page)
 			goto free_pages;
@@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_init_page(bpage->page);
 	}
 
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+
+	return -ENOMEM;
+}
+
+static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	LIST_HEAD(pages);
+
+	WARN_ON(!nr_pages);
+
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+		return -ENOMEM;
+
 	/*
 	 * The ring buffer page list is a circular list that does not
 	 * start and end with a list head. All page list items point to
@@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);
 
+	cpu_buffer->nr_pages = nr_pages;
+
 	rb_check_pages(cpu_buffer);
 
 	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
 }
 
 static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
+rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
@@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	ret = rb_allocate_pages(cpu_buffer, nr_pages);
 	if (ret < 0)
 		goto fail_free_reader;
 
@@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 {
 	struct ring_buffer *buffer;
 	int bsize;
-	int cpu;
+	int cpu, nr_pages;
 
 	/* keep it in its own cache line */
 	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
 
 	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
+	if (nr_pages < 2)
+		nr_pages = 2;
 
 	/*
 	 * In case of non-hotplug cpu, if the ring-buffer is allocated
@@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 
 	for_each_buffer_cpu(buffer, cpu) {
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu])
 			goto fail_free_buffers;
 	}
@@ -1308,6 +1321,18 @@ out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
+static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (cpu_buffer->nr_pages_to_update > 0)
+		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
+				cpu_buffer->nr_pages_to_update);
+	else
+		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	/* reset this value */
+	cpu_buffer->nr_pages_to_update = 0;
+}
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -1317,14 +1342,12 @@ out:
  *
  * Returns -1 on failure.
  */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
+			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	LIST_HEAD(pages);
-	int i, cpu;
+	unsigned nr_pages;
+	int cpu;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1334,15 +1357,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
 
 	/* we need a minimum of two pages */
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	if (size == buffer_size)
-		return size;
-
 	atomic_inc(&buffer->record_disabled);
 
 	/* Make sure all writers are done with this buffer. */
@@ -1353,68 +1372,56 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	if (size < buffer_size) {
-
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
-
-		rm_pages = buffer->pages - nr_pages;
-
+	if (cpu_id == RING_BUFFER_ALL_CPUS) {
+		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
-		}
-		goto out;
-	}
 
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
+			cpu_buffer->nr_pages_to_update = nr_pages -
+							cpu_buffer->nr_pages;
 
-	new_pages = nr_pages - buffer->pages;
+			/*
+			 * nothing more to do for removing pages or no update
+			 */
+			if (cpu_buffer->nr_pages_to_update <= 0)
+				continue;
 
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			struct page *page;
 			/*
-			 * __GFP_NORETRY flag makes sure that the allocation
-			 * fails gracefully without invoking oom-killer and
-			 * the system is not destabilized.
+			 * to add pages, make sure all new pages can be
+			 * allocated without receiving ENOMEM
 			 */
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL | __GFP_NORETRY,
-					    cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			page = alloc_pages_node(cpu_to_node(cpu),
-						GFP_KERNEL | __GFP_NORETRY, 0);
-			if (!page)
-				goto free_pages;
-			bpage->page = page_address(page);
-			rb_init_page(bpage->page);
+			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu))
+				/* not enough memory for new pages */
+				goto no_mem;
 		}
-	}
 
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
-	}
+		/* wait for all the updates to complete */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (cpu_buffer->nr_pages_to_update) {
+				update_pages_handler(cpu_buffer);
+			}
+		}
+	} else {
+		cpu_buffer = buffer->buffers[cpu_id];
+		if (nr_pages == cpu_buffer->nr_pages)
+			goto out;
 
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
+		cpu_buffer->nr_pages_to_update = nr_pages -
+						cpu_buffer->nr_pages;
+
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (cpu_buffer->nr_pages_to_update > 0 &&
+			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu_id))
+			goto no_mem;
+
+		update_pages_handler(cpu_buffer);
+	}
 
  out:
-	buffer->pages = nr_pages;
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 
@@ -1422,25 +1429,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	return size;
 
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+ no_mem:
+	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_page *bpage, *tmp;
+		cpu_buffer = buffer->buffers[cpu];
+		/* reset this number regardless */
+		cpu_buffer->nr_pages_to_update = 0;
+		if (list_empty(&cpu_buffer->new_pages))
+			continue;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1542,7 +1548,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 	 * assign the commit to the tail.
 	 */
  again:
-	max_count = cpu_buffer->buffer->pages * 100;
+	max_count = cpu_buffer->nr_pages * 100;
 
 	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
 		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
@@ -3563,9 +3569,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
  * ring_buffer_size - return the size of the ring buffer (in bytes)
  * @buffer: The ring buffer.
  */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
 {
-	return BUF_PAGE_SIZE * buffer->pages;
+	/*
+	 * Earlier, this method returned
+	 *	BUF_PAGE_SIZE * buffer->pages
+	 * Since the pages field of struct ring_buffer is now removed, this
+	 * returns the per-cpu buffer size instead.
+	 */
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
@@ -3740,8 +3755,11 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
 		goto out;
 
+	cpu_buffer_a = buffer_a->buffers[cpu];
+	cpu_buffer_b = buffer_b->buffers[cpu];
+
 	/* At least make sure the two buffers are somewhat the same */
-	if (buffer_a->pages != buffer_b->pages)
+	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
 		goto out;
 
 	ret = -EAGAIN;
@@ -3755,9 +3773,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	if (atomic_read(&buffer_b->record_disabled))
 		goto out;
 
-	cpu_buffer_a = buffer_a->buffers[cpu];
-	cpu_buffer_b = buffer_b->buffers[cpu];
-
 	if (atomic_read(&cpu_buffer_a->record_disabled))
 		goto out;
 
@@ -4108,6 +4123,8 @@ static int rb_cpu_notify(struct notifier_block *self,
 	struct ring_buffer *buffer =
 		container_of(self, struct ring_buffer, cpu_notify);
 	long cpu = (long)hcpu;
+	int cpu_i, nr_pages_same;
+	unsigned int nr_pages;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -4115,8 +4132,23 @@ static int rb_cpu_notify(struct notifier_block *self,
 		if (cpumask_test_cpu(cpu, buffer->cpumask))
 			return NOTIFY_OK;
 
+		nr_pages = 0;
+		nr_pages_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_buffer_cpu(buffer, cpu_i) {
+			/* fill in the size from first enabled cpu */
+			if (nr_pages == 0)
+				nr_pages = buffer->buffers[cpu_i]->nr_pages;
+			if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
+				nr_pages_same = 0;
+				break;
+			}
+		}
+		/* allocate minimum pages, user can later expand it */
+		if (!nr_pages_same)
+			nr_pages = 2;
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu]) {
 			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
 			     cpu);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b419070..bb3c867 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -784,7 +784,8 @@ __acquires(kernel_lock)
 
 		/* If we expanded the buffers, make sure the max is expanded too */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, trace_buf_size);
+			ring_buffer_resize(max_tr.buffer, trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 
 		/* the test is responsible for initializing and enabling */
 		pr_info("Testing tracer %s: ", type->name);
@@ -800,7 +801,8 @@ __acquires(kernel_lock)
 
 		/* Shrink the max buffer again */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, 1);
+			ring_buffer_resize(max_tr.buffer, 1,
+						RING_BUFFER_ALL_CPUS);
 
 		printk(KERN_CONT "PASSED\n");
 	}
@@ -2853,7 +2855,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
 	return t->init(tr);
 }
 
-static int __tracing_resize_ring_buffer(unsigned long size)
+static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+{
+	int cpu;
+	for_each_tracing_cpu(cpu)
+		tr->data[cpu]->entries = val;
+}
+
+static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 {
 	int ret;
 
@@ -2864,19 +2873,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 	 */
 	ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
 	if (!current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
 	if (ret < 0) {
-		int r;
+		int r = 0;
+
+		if (cpu == RING_BUFFER_ALL_CPUS) {
+			int i;
+			for_each_tracing_cpu(i) {
+				r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[i]->entries,
+						i);
+				if (r < 0)
+					break;
+			}
+		} else {
+			r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+		}
 
-		r = ring_buffer_resize(global_trace.buffer,
-				       global_trace.entries);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
@@ -2898,14 +2920,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 		return ret;
 	}
 
-	max_tr.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&max_tr, size);
+	else
+		max_tr.data[cpu]->entries = size;
+
  out:
-	global_trace.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&global_trace, size);
+	else
+		global_trace.data[cpu]->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size)
+static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
 	int cpu, ret = size;
 
@@ -2921,12 +2950,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
 			atomic_inc(&max_tr.data[cpu]->disabled);
 	}
 
-	if (size != global_trace.entries)
-		ret = __tracing_resize_ring_buffer(size);
+	if (cpu_id != RING_BUFFER_ALL_CPUS) {
+		/* make sure this cpu is enabled in the mask */
+		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
 
+	ret = __tracing_resize_ring_buffer(size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
+out:
 	for_each_tracing_cpu(cpu) {
 		if (global_trace.data[cpu])
 			atomic_dec(&global_trace.data[cpu]->disabled);
@@ -2957,7 +2993,8 @@ int tracing_update_buffers(void)
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -2981,7 +3018,8 @@ static int tracing_set_tracer(const char *buf)
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3007,8 +3045,8 @@ static int tracing_set_tracer(const char *buf)
 		 * The max_tr ring buffer has some state (e.g. ring->clock) and
 		 * we want preserve it.
 		 */
-		ring_buffer_resize(max_tr.buffer, 1);
-		max_tr.entries = 1;
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
 	}
 	destroy_trace_option_files(topts);
 
@@ -3016,10 +3054,17 @@ static int tracing_set_tracer(const char *buf)
 
 	topts = create_trace_option_files(current_trace);
 	if (current_trace->use_max_tr) {
-		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
-		if (ret < 0)
-			goto out;
-		max_tr.entries = global_trace.entries;
+		int cpu;
+		/* we need to make per cpu buffer sizes equivalent */
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				goto out;
+			max_tr.data[cpu]->entries =
+					global_trace.data[cpu]->entries;
+		}
 	}
 
 	if (t->init) {
@@ -3521,30 +3566,82 @@ out_err:
 	goto out;
 }
 
+struct ftrace_entries_info {
+	struct trace_array	*tr;
+	int			cpu;
+};
+
+static int tracing_entries_open(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info;
+
+	if (tracing_disabled)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->tr = &global_trace;
+	info->cpu = (unsigned long)inode->i_private;
+
+	filp->private_data = info;
+
+	return 0;
+}
+
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct trace_array *tr = filp->private_data;
-	char buf[96];
-	int r;
+	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_array *tr = info->tr;
+	char buf[64];
+	int r = 0;
+	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		r = sprintf(buf, "%lu (expanded: %lu)\n",
-			    tr->entries >> 10,
-			    trace_buf_size >> 10);
-	else
-		r = sprintf(buf, "%lu\n", tr->entries >> 10);
+
+	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+		int cpu, buf_size_same;
+		unsigned long size;
+
+		size = 0;
+		buf_size_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_tracing_cpu(cpu) {
+			/* fill in the size from first enabled cpu */
+			if (size == 0)
+				size = tr->data[cpu]->entries;
+			if (size != tr->data[cpu]->entries) {
+				buf_size_same = 0;
+				break;
+			}
+		}
+
+		if (buf_size_same) {
+			if (!ring_buffer_expanded)
+				r = sprintf(buf, "%lu (expanded: %lu)\n",
+					    size >> 10,
+					    trace_buf_size >> 10);
+			else
+				r = sprintf(buf, "%lu\n", size >> 10);
+		} else
+			r = sprintf(buf, "X\n");
+	} else
+		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+
 	mutex_unlock(&trace_types_lock);
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	return ret;
 }
 
 static ssize_t
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
+	struct ftrace_entries_info *info = filp->private_data;
 	unsigned long val;
 	int ret;
 
@@ -3559,7 +3656,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val);
+	ret = tracing_resize_ring_buffer(val, info->cpu);
 	if (ret < 0)
 		return ret;
 
@@ -3568,6 +3665,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static int
+tracing_entries_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info = filp->private_data;
+
+	kfree(info);
+
+	return 0;
+}
+
 static ssize_t
 tracing_total_entries_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos)
@@ -3579,7 +3686,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
-		size += tr->entries >> 10;
+		size += tr->data[cpu]->entries >> 10;
 		if (!ring_buffer_expanded)
 			expanded_size += trace_buf_size >> 10;
 	}
@@ -3613,7 +3720,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
 		tracing_off();
 	/* resize the ring buffer to 0 */
-	tracing_resize_ring_buffer(0);
+	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
 
 	return 0;
 }
@@ -3757,9 +3864,10 @@ static const struct file_operations tracing_pipe_fops = {
 };
 
 static const struct file_operations tracing_entries_fops = {
-	.open		= tracing_open_generic,
+	.open		= tracing_entries_open,
 	.read		= tracing_entries_read,
 	.write		= tracing_entries_write,
+	.release	= tracing_entries_release,
 	.llseek		= generic_file_llseek,
 };
 
@@ -4211,6 +4319,9 @@ static void tracing_init_debugfs_percpu(long cpu)
 
 	trace_create_file("stats", 0444, d_cpu,
 			(void *) cpu, &tracing_stats_fops);
+
+	trace_create_file("buffer_size_kb", 0644, d_cpu,
+			(void *) cpu, &tracing_entries_fops);
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
@@ -4491,7 +4602,7 @@ static __init int tracer_init_debugfs(void)
 			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
 
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
-			&global_trace, &tracing_entries_fops);
+			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
 
 	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
 			&global_trace, &tracing_total_entries_fops);
@@ -4737,8 +4848,6 @@ __init static int tracer_alloc_buffers(void)
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
-
 
 #ifdef CONFIG_TRACER_MAX_TRACE
 	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
@@ -4748,7 +4857,6 @@ __init static int tracer_alloc_buffers(void)
 		ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = 1;
 #endif
 
 	/* Allocate the first page for all buffers */
@@ -4757,6 +4865,11 @@ __init static int tracer_alloc_buffers(void)
 		max_tr.data[i] = &per_cpu(max_tr_data, i);
 	}
 
+	set_buffer_entries(&global_trace, ring_buf_size);
+#ifdef CONFIG_TRACER_MAX_TRACE
+	set_buffer_entries(&max_tr, 1);
+#endif
+
 	trace_init_cmdlines();
 
 	register_tracer(&nop_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 616846b..126d333 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -125,6 +125,7 @@ struct trace_array_cpu {
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
 
+	unsigned long		entries;
 	unsigned long		saved_latency;
 	unsigned long		critical_start;
 	unsigned long		critical_end;
@@ -146,7 +147,6 @@ struct trace_array_cpu {
  */
 struct trace_array {
 	struct ring_buffer	*buffer;
-	unsigned long		entries;
 	int			cpu;
 	cycle_t			time_start;
 	struct task_struct	*waiter;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 2/4] trace: Make removal of ring buffer pages atomic
  2011-08-23  1:17   ` Vaibhav Nagarnaik
  2011-09-03  2:45     ` Steven Rostedt
  2011-10-12  1:20     ` [PATCH v4 1/4] " Vaibhav Nagarnaik
@ 2011-10-12  1:20     ` Vaibhav Nagarnaik
  2011-10-12  1:20     ` [PATCH v4 3/4] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
  2011-10-12  1:20     ` [PATCH v4 4/4] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
  4 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-10-12  1:20 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing the pages after the tail page, which ensures
that the empty pages in the ring buffer are removed first. If the head
page is among the pages to be removed, the page following the removed
ones becomes the new head page. This drops the oldest data from the
ring buffer while keeping the latest data around to be read.

To do this in a race-free manner, tracing is stopped for a very short
time while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed only after tracing is restarted, to
minimize the time for which tracing has to be stopped.

The removal of pages from a per-cpu ring buffer runs on the respective
CPU. This limits the events that can be lost during the removal to
those generated in NMI context.
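
In outline, the unlink step looks like this (simplified from
rb_remove_pages() below: the head-page flag and the reader-page special
case are left out, and the unlinked pages are freed later, once
recording is re-enabled):

	tail_page = &cpu_buffer->tail_page->list;
	to_remove = tail_page;
	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++)
		to_remove = rb_list_head(to_remove)->next;

	next_page = rb_list_head(to_remove)->next;
	/* unlink everything between tail_page and next_page */
	tail_page->next = next_page;
	rb_list_head(next_page)->prev = tail_page;
	cpu_buffer->pages = rb_list_head(next_page);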

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/ring_buffer.c |  224 ++++++++++++++++++++++++++++++++-----------
 kernel/trace/trace.c       |   20 +----
 2 files changed, 167 insertions(+), 77 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7b39b09..d079702 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -502,6 +504,8 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
@@ -1080,6 +1084,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1267,32 +1273,107 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	unsigned int nr_removed;
+	int page_entries;
+	struct list_head *tail_page, *to_remove, *next_page;
+	unsigned long head_bit;
+	struct buffer_page *last_page, *first_page;
+	struct buffer_page *to_remove_page, *tmp_iter_page;
 
+	head_bit = 0;
 	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
-
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+	atomic_inc(&cpu_buffer->record_disabled);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock. We also don't race with writers after disabling recording.
+	 * This makes it easy to figure out the first and the last page to be
+	 * removed from the list. We remove all the pages in between including
+	 * the first and last pages. This is done in a busy loop so that we
+	 * lose the least number of traces.
+	 * The pages are freed after we restart recording and unlock readers.
+	 */
+	tail_page = &cpu_buffer->tail_page->list;
+	/*
+	 * tail page might be on reader page, we remove the next page
+	 * from the ring buffer
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		tail_page = rb_list_head(tail_page->next);
+	to_remove = tail_page;
+
+	/* start of pages to remove */
+	first_page = list_entry(rb_list_head(to_remove->next),
+				struct buffer_page, list);
+	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
+		to_remove = rb_list_head(to_remove)->next;
+		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
 	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
 
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
-
-out:
+	next_page = rb_list_head(to_remove)->next;
+	/* now we remove all pages between tail_page and next_page */
+	tail_page->next = (struct list_head *)((unsigned long)next_page |
+						head_bit);
+	next_page = rb_list_head(next_page);
+	next_page->prev = tail_page;
+	/* make sure pages points to a valid page in the ring buffer */
+	cpu_buffer->pages = next_page;
+	/* update head page */
+	if (head_bit)
+		cpu_buffer->head_page = list_entry(next_page,
+						struct buffer_page, list);
+	/*
+	 * change read pointer to make sure any read iterators reset
+	 * themselves
+	 */
+	cpu_buffer->read = 0;
+	/* pages are removed, resume tracing and then free the pages */
+	atomic_dec(&cpu_buffer->record_disabled);
 	spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
+
+	/* last buffer page to remove */
+	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
+				list);
+	tmp_iter_page = first_page;
+	do {
+		to_remove_page = tmp_iter_page;
+		rb_inc_page(cpu_buffer, &tmp_iter_page);
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		nr_removed--;
+		free_buffer_page(to_remove_page);
+	} while (to_remove_page != last_page);
+
+	RB_WARN_ON(cpu_buffer, nr_removed);
 }
 
 static void
@@ -1303,6 +1384,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	struct list_head *p;
 	unsigned i;
 
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	/* Make sure all writers are done with this buffer. */
+	synchronize_sched();
+
 	spin_lock_irq(&cpu_buffer->reader_lock);
 	rb_head_page_deactivate(cpu_buffer);
 
@@ -1319,18 +1406,22 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 out:
 	spin_unlock_irq(&cpu_buffer->reader_lock);
+	atomic_dec(&cpu_buffer->record_disabled);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void update_pages_handler(struct work_struct *work)
 {
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+
 	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1340,14 +1431,14 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
  *
  * Minimum size is 2 * BUF_PAGE_SIZE.
  *
- * Returns -1 on failure.
+ * Returns 0 on success and < 0 on failure.
  */
 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1362,21 +1453,28 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
-
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
+	/*
+	 * Don't succeed if recording is disabled globally, as a reader might
+	 * be manipulating the ring buffer and is expecting a sane state while
+	 * this is true.
+	 */
+	if (atomic_read(&buffer->record_disabled))
+		return -EBUSY;
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
+			if (atomic_read(&cpu_buffer->record_disabled)) {
+				err = -EBUSY;
+				goto out_err;
+			}
+
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 
@@ -1392,20 +1490,37 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
+		}
+
+		/* fire off all the required work handlers */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
 		}
 
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-			}
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			wait_for_completion(&cpu_buffer->update_completion);
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+		if (atomic_read(&cpu_buffer->record_disabled)) {
+			err = -EBUSY;
+			goto out_err;
+		}
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1415,38 +1530,41 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					&cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
+
+		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
+		wait_for_completion(&cpu_buffer->update_completion);
 
-		update_pages_handler(cpu_buffer);
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
 		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1485,21 +1603,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bb3c867..736518f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2936,20 +2936,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -2963,14 +2953,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 3/4] trace: Make addition of pages in ring buffer atomic
  2011-08-23  1:17   ` Vaibhav Nagarnaik
                       ` (2 preceding siblings ...)
  2011-10-12  1:20     ` [PATCH v4 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
@ 2011-10-12  1:20     ` Vaibhav Nagarnaik
  2011-10-12  1:20     ` [PATCH v4 4/4] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
  4 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-10-12  1:20 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

This patch adds the capability to add new pages to a ring buffer
atomically while writes are in progress. This makes it possible to
expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.
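
As a rough illustration of this splice, here is a stand-alone user-space
sketch in plain C. The node type and function names are made up for the
example, GCC/Clang's __atomic_compare_exchange_n() stands in for the
kernel's cmpxchg(), and the RB_PAGE_HEAD flag bit that the real code ORs
into the head pointer is left out:

/*
 * Sketch only: publish a chain of new nodes between the head node and
 * its previous node with a single compare-and-swap on prev->next.
 */
#include <stdbool.h>
#include <stdio.h>

struct node {
	struct node *next, *prev;
	int id;
};

/*
 * Splice the open chain [first, last] in front of 'head'.  Returns
 * false if a concurrent writer moved prev->next, in which case the
 * caller should re-read the head and retry (the real code retries a
 * bounded number of times).
 */
static bool insert_before_head(struct node *head, struct node *first,
			       struct node *last)
{
	struct node *prev = head->prev;

	last->next = head;
	first->prev = prev;

	/* Atomically redirect prev->next from the head to the new chain. */
	if (!__atomic_compare_exchange_n(&prev->next, &head, first,
					 false, __ATOMIC_SEQ_CST,
					 __ATOMIC_SEQ_CST))
		return false;

	/* Only now fix up the back pointer of the old head. */
	head->prev = last;
	return true;
}

int main(void)
{
	struct node ring[2], extra[2];
	struct node *p;
	int i;

	/* Two-node ring: 0 <-> 1 <-> 0; node 0 is the head. */
	for (i = 0; i < 2; i++) {
		ring[i].id = i;
		ring[i].next = &ring[(i + 1) % 2];
		ring[i].prev = &ring[(i + 1) % 2];
	}
	/* Open chain of new nodes: 10 -> 11. */
	extra[0].id = 10; extra[1].id = 11;
	extra[0].next = &extra[1]; extra[1].prev = &extra[0];

	if (insert_before_head(&ring[0], &extra[0], &extra[1]))
		for (p = &ring[0], i = 0; i < 4; i++, p = p->next)
			printf("%d\n", p->id);	/* 0 1 10 11 */
	return 0;
}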

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v4-v3:
* Check return value from rb_(insert|remove)_pages for more robust error
  handling
* The check on record_disabled also disallowed resizing whenever
  recording was disabled. Add a separate resize_disabled counter to fix it.

 kernel/trace/ring_buffer.c |  128 +++++++++++++++++++++++++++++--------------
 1 files changed, 86 insertions(+), 42 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d079702..8c36a90 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -512,6 +512,7 @@ struct ring_buffer {
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
+	atomic_t			resize_disabled;
 	cpumask_var_t			cpumask;
 
 	struct lock_class_key		*reader_lock_key;
@@ -1283,7 +1284,7 @@ static inline unsigned long rb_page_write(struct buffer_page *bpage)
 	return local_read(&bpage->write) & RB_WRITE_MASK;
 }
 
-static void
+static int
 rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
 	unsigned int nr_removed;
@@ -1374,53 +1375,99 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 	} while (to_remove_page != last_page);
 
 	RB_WARN_ON(cpu_buffer, nr_removed);
+
+	return nr_removed == 0;
 }
 
-static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+static int
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
+	spin_lock_irq(&cpu_buffer->reader_lock);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 10 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 10;
+	success = 0;
+	while (retries--) {
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *head_page_with_bit;
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
 
-	spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+		first_page = pages->next;
+		last_page  = pages->prev;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+		head_page_with_bit = (struct list_head *)
+				((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next  = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	spin_unlock_irq(&cpu_buffer->reader_lock);
-	atomic_dec(&cpu_buffer->record_disabled);
+
+	/* free pages if they weren't inserted */
+	if (!success) {
+		struct buffer_page *bpage, *tmp;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+	}
+	return success;
 }
 
 static void update_pages_handler(struct work_struct *work)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
 			struct ring_buffer_per_cpu, update_pages_work);
+	int success;
 
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		success = rb_insert_pages(cpu_buffer);
 	else
-		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+		success = rb_remove_pages(cpu_buffer,
+					-cpu_buffer->nr_pages_to_update);
 
-	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	if (success)
+		cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
 	complete(&cpu_buffer->update_completion);
 }
 
@@ -1456,11 +1503,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	/*
-	 * Don't succeed if recording is disabled globally, as a reader might
-	 * be manipulating the ring buffer and is expecting a sane state while
+	 * Don't succeed if resizing is disabled, as a reader might be
+	 * manipulating the ring buffer and is expecting a sane state while
 	 * this is true.
 	 */
-	if (atomic_read(&buffer->record_disabled))
+	if (atomic_read(&buffer->resize_disabled))
 		return -EBUSY;
 	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
@@ -1470,11 +1517,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
-			if (atomic_read(&cpu_buffer->record_disabled)) {
-				err = -EBUSY;
-				goto out_err;
-			}
-
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 
@@ -1516,11 +1558,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
-		if (atomic_read(&cpu_buffer->record_disabled)) {
-			err = -EBUSY;
-			goto out_err;
-		}
-
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1537,7 +1574,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 
 		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
 		wait_for_completion(&cpu_buffer->update_completion);
-
 		/* reset this value */
 		cpu_buffer->nr_pages_to_update = 0;
 	}
@@ -3575,6 +3611,7 @@ ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu)
 
 	iter->cpu_buffer = cpu_buffer;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
 	return iter;
@@ -3638,6 +3675,7 @@ ring_buffer_read_finish(struct ring_buffer_iter *iter)
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&cpu_buffer->buffer->resize_disabled);
 	kfree(iter);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_finish);
@@ -3709,6 +3747,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
@@ -3745,8 +3784,12 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
+	/* Make sure all commits have finished */
+	synchronize_sched();
+
 	spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
 	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -3762,6 +3805,7 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&buffer->resize_disabled);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v4 4/4] trace: change CPU ring buffer state from tracing_cpumask
  2011-08-23  1:17   ` Vaibhav Nagarnaik
                       ` (3 preceding siblings ...)
  2011-10-12  1:20     ` [PATCH v4 3/4] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
@ 2011-10-12  1:20     ` Vaibhav Nagarnaik
  4 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2011-10-12  1:20 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables recording on each per-CPU ring buffer as the
corresponding bit in tracing_cpumask is set/unset.
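
For reference, the per-CPU decision the patch makes can be sketched in
stand-alone C as follows; plain unsigned long bitmasks stand in for the
kernel's cpumask_var_t and the function name is invented for this
example:

/* Sketch only: diff an old and a new CPU mask and report, per CPU,
 * whether recording should be switched off or back on. */
#include <stdio.h>

static void apply_cpumask(unsigned long old_mask, unsigned long new_mask,
			  int nr_cpus)
{
	int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		int was_on = !!(old_mask & (1UL << cpu));
		int is_on  = !!(new_mask & (1UL << cpu));

		if (was_on && !is_on)
			printf("cpu %d: disable recording\n", cpu);
		else if (!was_on && is_on)
			printf("cpu %d: enable recording\n", cpu);
	}
}

int main(void)
{
	/* CPUs 0-3 were traced; the new mask keeps 0 and 2 and adds 4. */
	apply_cpumask(0xfUL, 0x15UL, 8);
	return 0;
}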

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/trace.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 736518f..a8999bd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2529,10 +2529,12 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf,
 		if (cpumask_test_cpu(cpu, tracing_cpumask) &&
 				!cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_inc(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_disable_cpu(global_trace.buffer, cpu);
 		}
 		if (!cpumask_test_cpu(cpu, tracing_cpumask) &&
 				cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_dec(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_enable_cpu(global_trace.buffer, cpu);
 		}
 	}
 	arch_spin_unlock(&ftrace_max_lock);
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v4 1/4] trace: Add per_cpu ring buffer control files
  2011-10-12  1:20     ` [PATCH v4 1/4] " Vaibhav Nagarnaik
@ 2012-01-31 23:53       ` Vaibhav Nagarnaik
  2012-02-02  2:42         ` Steven Rostedt
  2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
  1 sibling, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-01-31 23:53 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, linux-kernel, Vaibhav Nagarnaik

On Tue, Oct 11, 2011 at 6:20 PM, Vaibhav Nagarnaik
<vnagarnaik@google.com> wrote:
> Add a debugfs entry under per_cpu/ folder for each cpu called
> buffer_size_kb to control the ring buffer size for each CPU
> independently.
>
> If the global file buffer_size_kb is used to set size, the individual
> ring buffers will be adjusted to the given size. The buffer_size_kb will
> report the common size to maintain backward compatibility.
>
> If the buffer_size_kb file under the per_cpu/ directory is used to
> change buffer size for a specific CPU, only the size of the respective
> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
> 'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
>
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---
> Changelog v4-v3:
> * restructure code according to feedback.


Hi Steven,

I remember you were waiting for an upstream release cycle to complete
before accepting this patch series. Are you testing it for the next
release?



Vaibhav Nagarnaik



>
>  include/linux/ring_buffer.h |    6 +-
>  kernel/trace/ring_buffer.c  |  248 ++++++++++++++++++++++++-------------------
>  kernel/trace/trace.c        |  191 ++++++++++++++++++++++++++-------
>  kernel/trace/trace.h        |    2 +-
>  4 files changed, 297 insertions(+), 150 deletions(-)
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 67be037..ad36702 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
>        __ring_buffer_alloc((size), (flags), &__key);   \
>  })
>
> +#define RING_BUFFER_ALL_CPUS -1
> +
>  void ring_buffer_free(struct ring_buffer *buffer);
>
> -int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
> +int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
>
>  void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
>
> @@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
>  void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
>  int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
>
> -unsigned long ring_buffer_size(struct ring_buffer *buffer);
> +unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
>
>  void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
>  void ring_buffer_reset(struct ring_buffer *buffer);
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index acf6b68..7b39b09 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
>        spinlock_t                      reader_lock;    /* serialize readers */
>        arch_spinlock_t                 lock;
>        struct lock_class_key           lock_key;
> +       unsigned int                    nr_pages;
>        struct list_head                *pages;
>        struct buffer_page              *head_page;     /* read from head */
>        struct buffer_page              *tail_page;     /* write to tail */
> @@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
>        unsigned long                   read_bytes;
>        u64                             write_stamp;
>        u64                             read_stamp;
> +       /* ring buffer pages to update, > 0 to add, < 0 to remove */
> +       int                             nr_pages_to_update;
> +       struct list_head                new_pages; /* new pages to add */
>  };
>
>  struct ring_buffer {
> -       unsigned                        pages;
>        unsigned                        flags;
>        int                             cpus;
>        atomic_t                        record_disabled;
> @@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
>        return 0;
>  }
>
> -static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
> -                            unsigned nr_pages)
> +static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
>  {
> +       int i;
>        struct buffer_page *bpage, *tmp;
> -       LIST_HEAD(pages);
> -       unsigned i;
> -
> -       WARN_ON(!nr_pages);
>
>        for (i = 0; i < nr_pages; i++) {
>                struct page *page;
> @@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>                 */
>                bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>                                    GFP_KERNEL | __GFP_NORETRY,
> -                                   cpu_to_node(cpu_buffer->cpu));
> +                                   cpu_to_node(cpu));
>                if (!bpage)
>                        goto free_pages;
>
> -               rb_check_bpage(cpu_buffer, bpage);
> +               list_add(&bpage->list, pages);
>
> -               list_add(&bpage->list, &pages);
> -
> -               page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
> +               page = alloc_pages_node(cpu_to_node(cpu),
>                                        GFP_KERNEL | __GFP_NORETRY, 0);
>                if (!page)
>                        goto free_pages;
> @@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>                rb_init_page(bpage->page);
>        }
>
> +       return 0;
> +
> +free_pages:
> +       list_for_each_entry_safe(bpage, tmp, pages, list) {
> +               list_del_init(&bpage->list);
> +               free_buffer_page(bpage);
> +       }
> +
> +       return -ENOMEM;
> +}
> +
> +static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
> +                            unsigned nr_pages)
> +{
> +       LIST_HEAD(pages);
> +
> +       WARN_ON(!nr_pages);
> +
> +       if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
> +               return -ENOMEM;
> +
>        /*
>         * The ring buffer page list is a circular list that does not
>         * start and end with a list head. All page list items point to
> @@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>        cpu_buffer->pages = pages.next;
>        list_del(&pages);
>
> +       cpu_buffer->nr_pages = nr_pages;
> +
>        rb_check_pages(cpu_buffer);
>
>        return 0;
> -
> - free_pages:
> -       list_for_each_entry_safe(bpage, tmp, &pages, list) {
> -               list_del_init(&bpage->list);
> -               free_buffer_page(bpage);
> -       }
> -       return -ENOMEM;
>  }
>
>  static struct ring_buffer_per_cpu *
> -rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
> +rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>  {
>        struct ring_buffer_per_cpu *cpu_buffer;
>        struct buffer_page *bpage;
> @@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
>
>        INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
>
> -       ret = rb_allocate_pages(cpu_buffer, buffer->pages);
> +       ret = rb_allocate_pages(cpu_buffer, nr_pages);
>        if (ret < 0)
>                goto fail_free_reader;
>
> @@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>  {
>        struct ring_buffer *buffer;
>        int bsize;
> -       int cpu;
> +       int cpu, nr_pages;
>
>        /* keep it in its own cache line */
>        buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
> @@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>        if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
>                goto fail_free_buffer;
>
> -       buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
> +       nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>        buffer->flags = flags;
>        buffer->clock = trace_clock_local;
>        buffer->reader_lock_key = key;
>
>        /* need at least two pages */
> -       if (buffer->pages < 2)
> -               buffer->pages = 2;
> +       if (nr_pages < 2)
> +               nr_pages = 2;
>
>        /*
>         * In case of non-hotplug cpu, if the ring-buffer is allocated
> @@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>
>        for_each_buffer_cpu(buffer, cpu) {
>                buffer->buffers[cpu] =
> -                       rb_allocate_cpu_buffer(buffer, cpu);
> +                       rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
>                if (!buffer->buffers[cpu])
>                        goto fail_free_buffers;
>        }
> @@ -1308,6 +1321,18 @@ out:
>        spin_unlock_irq(&cpu_buffer->reader_lock);
>  }
>
> +static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
> +{
> +       if (cpu_buffer->nr_pages_to_update > 0)
> +               rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
> +                               cpu_buffer->nr_pages_to_update);
> +       else
> +               rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
> +       cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
> +       /* reset this value */
> +       cpu_buffer->nr_pages_to_update = 0;
> +}
> +
>  /**
>  * ring_buffer_resize - resize the ring buffer
>  * @buffer: the buffer to resize.
> @@ -1317,14 +1342,12 @@ out:
>  *
>  * Returns -1 on failure.
>  */
> -int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
> +int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
> +                       int cpu_id)
>  {
>        struct ring_buffer_per_cpu *cpu_buffer;
> -       unsigned nr_pages, rm_pages, new_pages;
> -       struct buffer_page *bpage, *tmp;
> -       unsigned long buffer_size;
> -       LIST_HEAD(pages);
> -       int i, cpu;
> +       unsigned nr_pages;
> +       int cpu;
>
>        /*
>         * Always succeed at resizing a non-existent buffer:
> @@ -1334,15 +1357,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>
>        size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>        size *= BUF_PAGE_SIZE;
> -       buffer_size = buffer->pages * BUF_PAGE_SIZE;
>
>        /* we need a minimum of two pages */
>        if (size < BUF_PAGE_SIZE * 2)
>                size = BUF_PAGE_SIZE * 2;
>
> -       if (size == buffer_size)
> -               return size;
> -
>        atomic_inc(&buffer->record_disabled);
>
>        /* Make sure all writers are done with this buffer. */
> @@ -1353,68 +1372,56 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>
>        nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>
> -       if (size < buffer_size) {
> -
> -               /* easy case, just free pages */
> -               if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
> -                       goto out_fail;
> -
> -               rm_pages = buffer->pages - nr_pages;
> -
> +       if (cpu_id == RING_BUFFER_ALL_CPUS) {
> +               /* calculate the pages to update */
>                for_each_buffer_cpu(buffer, cpu) {
>                        cpu_buffer = buffer->buffers[cpu];
> -                       rb_remove_pages(cpu_buffer, rm_pages);
> -               }
> -               goto out;
> -       }
>
> -       /*
> -        * This is a bit more difficult. We only want to add pages
> -        * when we can allocate enough for all CPUs. We do this
> -        * by allocating all the pages and storing them on a local
> -        * link list. If we succeed in our allocation, then we
> -        * add these pages to the cpu_buffers. Otherwise we just free
> -        * them all and return -ENOMEM;
> -        */
> -       if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
> -               goto out_fail;
> +                       cpu_buffer->nr_pages_to_update = nr_pages -
> +                                                       cpu_buffer->nr_pages;
>
> -       new_pages = nr_pages - buffer->pages;
> +                       /*
> +                        * nothing more to do for removing pages or no update
> +                        */
> +                       if (cpu_buffer->nr_pages_to_update <= 0)
> +                               continue;
>
> -       for_each_buffer_cpu(buffer, cpu) {
> -               for (i = 0; i < new_pages; i++) {
> -                       struct page *page;
>                        /*
> -                        * __GFP_NORETRY flag makes sure that the allocation
> -                        * fails gracefully without invoking oom-killer and
> -                        * the system is not destabilized.
> +                        * to add pages, make sure all new pages can be
> +                        * allocated without receiving ENOMEM
>                         */
> -                       bpage = kzalloc_node(ALIGN(sizeof(*bpage),
> -                                                 cache_line_size()),
> -                                           GFP_KERNEL | __GFP_NORETRY,
> -                                           cpu_to_node(cpu));
> -                       if (!bpage)
> -                               goto free_pages;
> -                       list_add(&bpage->list, &pages);
> -                       page = alloc_pages_node(cpu_to_node(cpu),
> -                                               GFP_KERNEL | __GFP_NORETRY, 0);
> -                       if (!page)
> -                               goto free_pages;
> -                       bpage->page = page_address(page);
> -                       rb_init_page(bpage->page);
> +                       INIT_LIST_HEAD(&cpu_buffer->new_pages);
> +                       if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> +                                               &cpu_buffer->new_pages, cpu))
> +                               /* not enough memory for new pages */
> +                               goto no_mem;
>                }
> -       }
>
> -       for_each_buffer_cpu(buffer, cpu) {
> -               cpu_buffer = buffer->buffers[cpu];
> -               rb_insert_pages(cpu_buffer, &pages, new_pages);
> -       }
> +               /* wait for all the updates to complete */
> +               for_each_buffer_cpu(buffer, cpu) {
> +                       cpu_buffer = buffer->buffers[cpu];
> +                       if (cpu_buffer->nr_pages_to_update) {
> +                               update_pages_handler(cpu_buffer);
> +                       }
> +               }
> +       } else {
> +               cpu_buffer = buffer->buffers[cpu_id];
> +               if (nr_pages == cpu_buffer->nr_pages)
> +                       goto out;
>
> -       if (RB_WARN_ON(buffer, !list_empty(&pages)))
> -               goto out_fail;
> +               cpu_buffer->nr_pages_to_update = nr_pages -
> +                                               cpu_buffer->nr_pages;
> +
> +               INIT_LIST_HEAD(&cpu_buffer->new_pages);
> +               if (cpu_buffer->nr_pages_to_update > 0 &&
> +                       __rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> +                                               &cpu_buffer->new_pages, cpu_id))
> +                       goto no_mem;
> +
> +               update_pages_handler(cpu_buffer);
> +       }
>
>  out:
> -       buffer->pages = nr_pages;
>        put_online_cpus();
>        mutex_unlock(&buffer->mutex);
>
> @@ -1422,25 +1429,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>
>        return size;
>
> - free_pages:
> -       list_for_each_entry_safe(bpage, tmp, &pages, list) {
> -               list_del_init(&bpage->list);
> -               free_buffer_page(bpage);
> + no_mem:
> +       for_each_buffer_cpu(buffer, cpu) {
> +               struct buffer_page *bpage, *tmp;
> +               cpu_buffer = buffer->buffers[cpu];
> +               /* reset this number regardless */
> +               cpu_buffer->nr_pages_to_update = 0;
> +               if (list_empty(&cpu_buffer->new_pages))
> +                       continue;
> +               list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
> +                                       list) {
> +                       list_del_init(&bpage->list);
> +                       free_buffer_page(bpage);
> +               }
>        }
>        put_online_cpus();
>        mutex_unlock(&buffer->mutex);
>        atomic_dec(&buffer->record_disabled);
>        return -ENOMEM;
> -
> -       /*
> -        * Something went totally wrong, and we are too paranoid
> -        * to even clean up the mess.
> -        */
> - out_fail:
> -       put_online_cpus();
> -       mutex_unlock(&buffer->mutex);
> -       atomic_dec(&buffer->record_disabled);
> -       return -1;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_resize);
>
> @@ -1542,7 +1548,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
>         * assign the commit to the tail.
>         */
>  again:
> -       max_count = cpu_buffer->buffer->pages * 100;
> +       max_count = cpu_buffer->nr_pages * 100;
>
>        while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
>                if (RB_WARN_ON(cpu_buffer, !(--max_count)))
> @@ -3563,9 +3569,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
>  * ring_buffer_size - return the size of the ring buffer (in bytes)
>  * @buffer: The ring buffer.
>  */
> -unsigned long ring_buffer_size(struct ring_buffer *buffer)
> +unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
>  {
> -       return BUF_PAGE_SIZE * buffer->pages;
> +       /*
> +        * Earlier, this method returned
> +        *      BUF_PAGE_SIZE * buffer->nr_pages
> +        * Since the nr_pages field is now removed, we have converted this to
> +        * return the per cpu buffer value.
> +        */
> +       if (!cpumask_test_cpu(cpu, buffer->cpumask))
> +               return 0;
> +
> +       return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_size);
>
> @@ -3740,8 +3755,11 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>            !cpumask_test_cpu(cpu, buffer_b->cpumask))
>                goto out;
>
> +       cpu_buffer_a = buffer_a->buffers[cpu];
> +       cpu_buffer_b = buffer_b->buffers[cpu];
> +
>        /* At least make sure the two buffers are somewhat the same */
> -       if (buffer_a->pages != buffer_b->pages)
> +       if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
>                goto out;
>
>        ret = -EAGAIN;
> @@ -3755,9 +3773,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>        if (atomic_read(&buffer_b->record_disabled))
>                goto out;
>
> -       cpu_buffer_a = buffer_a->buffers[cpu];
> -       cpu_buffer_b = buffer_b->buffers[cpu];
> -
>        if (atomic_read(&cpu_buffer_a->record_disabled))
>                goto out;
>
> @@ -4108,6 +4123,8 @@ static int rb_cpu_notify(struct notifier_block *self,
>        struct ring_buffer *buffer =
>                container_of(self, struct ring_buffer, cpu_notify);
>        long cpu = (long)hcpu;
> +       int cpu_i, nr_pages_same;
> +       unsigned int nr_pages;
>
>        switch (action) {
>        case CPU_UP_PREPARE:
> @@ -4115,8 +4132,23 @@ static int rb_cpu_notify(struct notifier_block *self,
>                if (cpumask_test_cpu(cpu, buffer->cpumask))
>                        return NOTIFY_OK;
>
> +               nr_pages = 0;
> +               nr_pages_same = 1;
> +               /* check if all cpu sizes are same */
> +               for_each_buffer_cpu(buffer, cpu_i) {
> +                       /* fill in the size from first enabled cpu */
> +                       if (nr_pages == 0)
> +                               nr_pages = buffer->buffers[cpu_i]->nr_pages;
> +                       if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
> +                               nr_pages_same = 0;
> +                               break;
> +                       }
> +               }
> +               /* allocate minimum pages, user can later expand it */
> +               if (!nr_pages_same)
> +                       nr_pages = 2;
>                buffer->buffers[cpu] =
> -                       rb_allocate_cpu_buffer(buffer, cpu);
> +                       rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
>                if (!buffer->buffers[cpu]) {
>                        WARN(1, "failed to allocate ring buffer on CPU %ld\n",
>                             cpu);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index b419070..bb3c867 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -784,7 +784,8 @@ __acquires(kernel_lock)
>
>                /* If we expanded the buffers, make sure the max is expanded too */
>                if (ring_buffer_expanded && type->use_max_tr)
> -                       ring_buffer_resize(max_tr.buffer, trace_buf_size);
> +                       ring_buffer_resize(max_tr.buffer, trace_buf_size,
> +                                               RING_BUFFER_ALL_CPUS);
>
>                /* the test is responsible for initializing and enabling */
>                pr_info("Testing tracer %s: ", type->name);
> @@ -800,7 +801,8 @@ __acquires(kernel_lock)
>
>                /* Shrink the max buffer again */
>                if (ring_buffer_expanded && type->use_max_tr)
> -                       ring_buffer_resize(max_tr.buffer, 1);
> +                       ring_buffer_resize(max_tr.buffer, 1,
> +                                               RING_BUFFER_ALL_CPUS);
>
>                printk(KERN_CONT "PASSED\n");
>        }
> @@ -2853,7 +2855,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
>        return t->init(tr);
>  }
>
> -static int __tracing_resize_ring_buffer(unsigned long size)
> +static void set_buffer_entries(struct trace_array *tr, unsigned long val)
> +{
> +       int cpu;
> +       for_each_tracing_cpu(cpu)
> +               tr->data[cpu]->entries = val;
> +}
> +
> +static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
>  {
>        int ret;
>
> @@ -2864,19 +2873,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
>         */
>        ring_buffer_expanded = 1;
>
> -       ret = ring_buffer_resize(global_trace.buffer, size);
> +       ret = ring_buffer_resize(global_trace.buffer, size, cpu);
>        if (ret < 0)
>                return ret;
>
>        if (!current_trace->use_max_tr)
>                goto out;
>
> -       ret = ring_buffer_resize(max_tr.buffer, size);
> +       ret = ring_buffer_resize(max_tr.buffer, size, cpu);
>        if (ret < 0) {
> -               int r;
> +               int r = 0;
> +
> +               if (cpu == RING_BUFFER_ALL_CPUS) {
> +                       int i;
> +                       for_each_tracing_cpu(i) {
> +                               r = ring_buffer_resize(global_trace.buffer,
> +                                               global_trace.data[i]->entries,
> +                                               i);
> +                               if (r < 0)
> +                                       break;
> +                       }
> +               } else {
> +                       r = ring_buffer_resize(global_trace.buffer,
> +                                               global_trace.data[cpu]->entries,
> +                                               cpu);
> +               }
>
> -               r = ring_buffer_resize(global_trace.buffer,
> -                                      global_trace.entries);
>                if (r < 0) {
>                        /*
>                         * AARGH! We are left with different
> @@ -2898,14 +2920,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
>                return ret;
>        }
>
> -       max_tr.entries = size;
> +       if (cpu == RING_BUFFER_ALL_CPUS)
> +               set_buffer_entries(&max_tr, size);
> +       else
> +               max_tr.data[cpu]->entries = size;
> +
>  out:
> -       global_trace.entries = size;
> +       if (cpu == RING_BUFFER_ALL_CPUS)
> +               set_buffer_entries(&global_trace, size);
> +       else
> +               global_trace.data[cpu]->entries = size;
>
>        return ret;
>  }
>
> -static ssize_t tracing_resize_ring_buffer(unsigned long size)
> +static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
>  {
>        int cpu, ret = size;
>
> @@ -2921,12 +2950,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
>                        atomic_inc(&max_tr.data[cpu]->disabled);
>        }
>
> -       if (size != global_trace.entries)
> -               ret = __tracing_resize_ring_buffer(size);
> +       if (cpu_id != RING_BUFFER_ALL_CPUS) {
> +               /* make sure, this cpu is enabled in the mask */
> +               if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +       }
>
> +       ret = __tracing_resize_ring_buffer(size, cpu_id);
>        if (ret < 0)
>                ret = -ENOMEM;
>
> +out:
>        for_each_tracing_cpu(cpu) {
>                if (global_trace.data[cpu])
>                        atomic_dec(&global_trace.data[cpu]->disabled);
> @@ -2957,7 +2993,8 @@ int tracing_update_buffers(void)
>
>        mutex_lock(&trace_types_lock);
>        if (!ring_buffer_expanded)
> -               ret = __tracing_resize_ring_buffer(trace_buf_size);
> +               ret = __tracing_resize_ring_buffer(trace_buf_size,
> +                                               RING_BUFFER_ALL_CPUS);
>        mutex_unlock(&trace_types_lock);
>
>        return ret;
> @@ -2981,7 +3018,8 @@ static int tracing_set_tracer(const char *buf)
>        mutex_lock(&trace_types_lock);
>
>        if (!ring_buffer_expanded) {
> -               ret = __tracing_resize_ring_buffer(trace_buf_size);
> +               ret = __tracing_resize_ring_buffer(trace_buf_size,
> +                                               RING_BUFFER_ALL_CPUS);
>                if (ret < 0)
>                        goto out;
>                ret = 0;
> @@ -3007,8 +3045,8 @@ static int tracing_set_tracer(const char *buf)
>                 * The max_tr ring buffer has some state (e.g. ring->clock) and
>                 * we want preserve it.
>                 */
> -               ring_buffer_resize(max_tr.buffer, 1);
> -               max_tr.entries = 1;
> +               ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
> +               set_buffer_entries(&max_tr, 1);
>        }
>        destroy_trace_option_files(topts);
>
> @@ -3016,10 +3054,17 @@ static int tracing_set_tracer(const char *buf)
>
>        topts = create_trace_option_files(current_trace);
>        if (current_trace->use_max_tr) {
> -               ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
> -               if (ret < 0)
> -                       goto out;
> -               max_tr.entries = global_trace.entries;
> +               int cpu;
> +               /* we need to make per cpu buffer sizes equivalent */
> +               for_each_tracing_cpu(cpu) {
> +                       ret = ring_buffer_resize(max_tr.buffer,
> +                                               global_trace.data[cpu]->entries,
> +                                               cpu);
> +                       if (ret < 0)
> +                               goto out;
> +                       max_tr.data[cpu]->entries =
> +                                       global_trace.data[cpu]->entries;
> +               }
>        }
>
>        if (t->init) {
> @@ -3521,30 +3566,82 @@ out_err:
>        goto out;
>  }
>
> +struct ftrace_entries_info {
> +       struct trace_array      *tr;
> +       int                     cpu;
> +};
> +
> +static int tracing_entries_open(struct inode *inode, struct file *filp)
> +{
> +       struct ftrace_entries_info *info;
> +
> +       if (tracing_disabled)
> +               return -ENODEV;
> +
> +       info = kzalloc(sizeof(*info), GFP_KERNEL);
> +       if (!info)
> +               return -ENOMEM;
> +
> +       info->tr = &global_trace;
> +       info->cpu = (unsigned long)inode->i_private;
> +
> +       filp->private_data = info;
> +
> +       return 0;
> +}
> +
>  static ssize_t
>  tracing_entries_read(struct file *filp, char __user *ubuf,
>                     size_t cnt, loff_t *ppos)
>  {
> -       struct trace_array *tr = filp->private_data;
> -       char buf[96];
> -       int r;
> +       struct ftrace_entries_info *info = filp->private_data;
> +       struct trace_array *tr = info->tr;
> +       char buf[64];
> +       int r = 0;
> +       ssize_t ret;
>
>        mutex_lock(&trace_types_lock);
> -       if (!ring_buffer_expanded)
> -               r = sprintf(buf, "%lu (expanded: %lu)\n",
> -                           tr->entries >> 10,
> -                           trace_buf_size >> 10);
> -       else
> -               r = sprintf(buf, "%lu\n", tr->entries >> 10);
> +
> +       if (info->cpu == RING_BUFFER_ALL_CPUS) {
> +               int cpu, buf_size_same;
> +               unsigned long size;
> +
> +               size = 0;
> +               buf_size_same = 1;
> +               /* check if all cpu sizes are same */
> +               for_each_tracing_cpu(cpu) {
> +                       /* fill in the size from first enabled cpu */
> +                       if (size == 0)
> +                               size = tr->data[cpu]->entries;
> +                       if (size != tr->data[cpu]->entries) {
> +                               buf_size_same = 0;
> +                               break;
> +                       }
> +               }
> +
> +               if (buf_size_same) {
> +                       if (!ring_buffer_expanded)
> +                               r = sprintf(buf, "%lu (expanded: %lu)\n",
> +                                           size >> 10,
> +                                           trace_buf_size >> 10);
> +                       else
> +                               r = sprintf(buf, "%lu\n", size >> 10);
> +               } else
> +                       r = sprintf(buf, "X\n");
> +       } else
> +               r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
> +
>        mutex_unlock(&trace_types_lock);
>
> -       return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +       ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +       return ret;
>  }
>
>  static ssize_t
>  tracing_entries_write(struct file *filp, const char __user *ubuf,
>                      size_t cnt, loff_t *ppos)
>  {
> +       struct ftrace_entries_info *info = filp->private_data;
>        unsigned long val;
>        int ret;
>
> @@ -3559,7 +3656,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>        /* value is in KB */
>        val <<= 10;
>
> -       ret = tracing_resize_ring_buffer(val);
> +       ret = tracing_resize_ring_buffer(val, info->cpu);
>        if (ret < 0)
>                return ret;
>
> @@ -3568,6 +3665,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>        return cnt;
>  }
>
> +static int
> +tracing_entries_release(struct inode *inode, struct file *filp)
> +{
> +       struct ftrace_entries_info *info = filp->private_data;
> +
> +       kfree(info);
> +
> +       return 0;
> +}
> +
>  static ssize_t
>  tracing_total_entries_read(struct file *filp, char __user *ubuf,
>                                size_t cnt, loff_t *ppos)
> @@ -3579,7 +3686,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
>
>        mutex_lock(&trace_types_lock);
>        for_each_tracing_cpu(cpu) {
> -               size += tr->entries >> 10;
> +               size += tr->data[cpu]->entries >> 10;
>                if (!ring_buffer_expanded)
>                        expanded_size += trace_buf_size >> 10;
>        }
> @@ -3613,7 +3720,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
>        if (trace_flags & TRACE_ITER_STOP_ON_FREE)
>                tracing_off();
>        /* resize the ring buffer to 0 */
> -       tracing_resize_ring_buffer(0);
> +       tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
>
>        return 0;
>  }
> @@ -3757,9 +3864,10 @@ static const struct file_operations tracing_pipe_fops = {
>  };
>
>  static const struct file_operations tracing_entries_fops = {
> -       .open           = tracing_open_generic,
> +       .open           = tracing_entries_open,
>        .read           = tracing_entries_read,
>        .write          = tracing_entries_write,
> +       .release        = tracing_entries_release,
>        .llseek         = generic_file_llseek,
>  };
>
> @@ -4211,6 +4319,9 @@ static void tracing_init_debugfs_percpu(long cpu)
>
>        trace_create_file("stats", 0444, d_cpu,
>                        (void *) cpu, &tracing_stats_fops);
> +
> +       trace_create_file("buffer_size_kb", 0444, d_cpu,
> +                       (void *) cpu, &tracing_entries_fops);
>  }
>
>  #ifdef CONFIG_FTRACE_SELFTEST
> @@ -4491,7 +4602,7 @@ static __init int tracer_init_debugfs(void)
>                        (void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
>
>        trace_create_file("buffer_size_kb", 0644, d_tracer,
> -                       &global_trace, &tracing_entries_fops);
> +                       (void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
>
>        trace_create_file("buffer_total_size_kb", 0444, d_tracer,
>                        &global_trace, &tracing_total_entries_fops);
> @@ -4737,8 +4848,6 @@ __init static int tracer_alloc_buffers(void)
>                WARN_ON(1);
>                goto out_free_cpumask;
>        }
> -       global_trace.entries = ring_buffer_size(global_trace.buffer);
> -
>
>  #ifdef CONFIG_TRACER_MAX_TRACE
>        max_tr.buffer = ring_buffer_alloc(1, rb_flags);
> @@ -4748,7 +4857,6 @@ __init static int tracer_alloc_buffers(void)
>                ring_buffer_free(global_trace.buffer);
>                goto out_free_cpumask;
>        }
> -       max_tr.entries = 1;
>  #endif
>
>        /* Allocate the first page for all buffers */
> @@ -4757,6 +4865,11 @@ __init static int tracer_alloc_buffers(void)
>                max_tr.data[i] = &per_cpu(max_tr_data, i);
>        }
>
> +       set_buffer_entries(&global_trace, ring_buf_size);
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +       set_buffer_entries(&max_tr, 1);
> +#endif
> +
>        trace_init_cmdlines();
>
>        register_tracer(&nop_trace);
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 616846b..126d333 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -125,6 +125,7 @@ struct trace_array_cpu {
>        atomic_t                disabled;
>        void                    *buffer_page;   /* ring buffer spare */
>
> +       unsigned long           entries;
>        unsigned long           saved_latency;
>        unsigned long           critical_start;
>        unsigned long           critical_end;
> @@ -146,7 +147,6 @@ struct trace_array_cpu {
>  */
>  struct trace_array {
>        struct ring_buffer      *buffer;
> -       unsigned long           entries;
>        int                     cpu;
>        cycle_t                 time_start;
>        struct task_struct      *waiter;
> --
> 1.7.3.1
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v4 1/4] trace: Add per_cpu ring buffer control files
  2012-01-31 23:53       ` Vaibhav Nagarnaik
@ 2012-02-02  2:42         ` Steven Rostedt
  2012-02-02 19:20           ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-02-02  2:42 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Tue, 2012-01-31 at 15:53 -0800, Vaibhav Nagarnaik wrote:

> 
> Hi Steven,
> 
> I remember you were waiting for an upstream release cycle to complete
> before accepting this patch series. Are you testing it for the next
> release?
> 

I was thinking there were issues before and thought there was going to be
another round of patches or something.

I'll see if I can squeeze out some time to review them again. I'm
currently hunting down some other bugs.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v4 1/4] trace: Add per_cpu ring buffer control files
  2012-02-02  2:42         ` Steven Rostedt
@ 2012-02-02 19:20           ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-02-02 19:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	linux-kernel

On Wed, Feb 1, 2012 at 6:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> I was thinking there was issues before and thought there was going to be
> another round of patches or something.

I did send those updated patches with changes according to your
feedback. However, I just checked and the patches don't merge cleanly
onto the latest upstream HEAD. I am sending out the rebased patches.

>
> I'll see if I can squeeze out some time to review them again. I'm
> currently hunting down some other bugs.
>

Thanks.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v5 1/4] trace: Add per_cpu ring buffer control files
  2011-10-12  1:20     ` [PATCH v4 1/4] " Vaibhav Nagarnaik
  2012-01-31 23:53       ` Vaibhav Nagarnaik
@ 2012-02-02 20:00       ` Vaibhav Nagarnaik
  2012-02-02 20:00         ` [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
                           ` (4 more replies)
  1 sibling, 5 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-02-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, Justin Teravest, linux-kernel,
	Vaibhav Nagarnaik

Add a debugfs entry called buffer_size_kb under the per_cpu/ folder for
each CPU, to control the size of that CPU's ring buffer independently.

If the global buffer_size_kb file is used to set the size, all the
individual ring buffers are adjusted to the given size, and the global
file keeps reporting that common size to maintain backward
compatibility.

If the buffer_size_kb file under a per_cpu/ directory is used to change
the buffer size for a specific CPU, only the size of that CPU's ring
buffer is updated. When tracing/buffer_size_kb is then read, it reports
'X' to indicate that the sizes of the per-CPU ring buffers are no longer
equal.
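
To make the intended usage concrete, here is a rough user-space sketch
(not part of the patch; it assumes debugfs is mounted at
/sys/kernel/debug and that cpu0 is present):

#include <stdio.h>

static void show(const char *path)
{
        char buf[64];
        FILE *f = fopen(path, "r");

        if (!f)
                return;
        if (fgets(buf, sizeof(buf), f)) {
                if (buf[0] == 'X')
                        printf("%s: per-cpu sizes differ\n", path);
                else
                        printf("%s: %s", path, buf);
        }
        fclose(f);
}

int main(void)
{
        show("/sys/kernel/debug/tracing/buffer_size_kb");
        show("/sys/kernel/debug/tracing/per_cpu/cpu0/buffer_size_kb");
        return 0;
}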

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v5-v4:
* Rebased to latest upstream

 include/linux/ring_buffer.h |    6 +-
 kernel/trace/ring_buffer.c  |  248 ++++++++++++++++++++++++-------------------
 kernel/trace/trace.c        |  191 ++++++++++++++++++++++++++-------
 kernel/trace/trace.h        |    2 +-
 4 files changed, 297 insertions(+), 150 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 67be037..ad36702 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+#define RING_BUFFER_ALL_CPUS -1
+
 void ring_buffer_free(struct ring_buffer *buffer);
 
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
 
 void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
 
@@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
 void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
 int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
 
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
 
 void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_reset(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f5b7b5c..c778ab9 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
 	raw_spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
+	unsigned int			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
 	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
+	/* ring buffer pages to update, > 0 to add, < 0 to remove */
+	int				nr_pages_to_update;
+	struct list_head		new_pages; /* new pages to add */
 };
 
 struct ring_buffer {
-	unsigned			pages;
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
@@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
+static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
 {
+	int i;
 	struct buffer_page *bpage, *tmp;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL | __GFP_NORETRY,
-				    cpu_to_node(cpu_buffer->cpu));
+				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
 
-		rb_check_bpage(cpu_buffer, bpage);
+		list_add(&bpage->list, pages);
 
-		list_add(&bpage->list, &pages);
-
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
 		if (!page)
 			goto free_pages;
@@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_init_page(bpage->page);
 	}
 
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+
+	return -ENOMEM;
+}
+
+static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	LIST_HEAD(pages);
+
+	WARN_ON(!nr_pages);
+
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+		return -ENOMEM;
+
 	/*
 	 * The ring buffer page list is a circular list that does not
 	 * start and end with a list head. All page list items point to
@@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);
 
+	cpu_buffer->nr_pages = nr_pages;
+
 	rb_check_pages(cpu_buffer);
 
 	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
 }
 
 static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
+rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
@@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	ret = rb_allocate_pages(cpu_buffer, nr_pages);
 	if (ret < 0)
 		goto fail_free_reader;
 
@@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 {
 	struct ring_buffer *buffer;
 	int bsize;
-	int cpu;
+	int cpu, nr_pages;
 
 	/* keep it in its own cache line */
 	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
 
 	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
+	if (nr_pages < 2)
+		nr_pages = 2;
 
 	/*
 	 * In case of non-hotplug cpu, if the ring-buffer is allocated
@@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 
 	for_each_buffer_cpu(buffer, cpu) {
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu])
 			goto fail_free_buffers;
 	}
@@ -1308,6 +1321,18 @@ out:
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
+static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (cpu_buffer->nr_pages_to_update > 0)
+		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
+				cpu_buffer->nr_pages_to_update);
+	else
+		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	/* reset this value */
+	cpu_buffer->nr_pages_to_update = 0;
+}
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -1317,14 +1342,12 @@ out:
  *
  * Returns -1 on failure.
  */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
+			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	LIST_HEAD(pages);
-	int i, cpu;
+	unsigned nr_pages;
+	int cpu;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1334,15 +1357,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
 
 	/* we need a minimum of two pages */
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	if (size == buffer_size)
-		return size;
-
 	atomic_inc(&buffer->record_disabled);
 
 	/* Make sure all writers are done with this buffer. */
@@ -1353,68 +1372,56 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	if (size < buffer_size) {
-
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
-
-		rm_pages = buffer->pages - nr_pages;
-
+	if (cpu_id == RING_BUFFER_ALL_CPUS) {
+		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
-		}
-		goto out;
-	}
 
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
+			cpu_buffer->nr_pages_to_update = nr_pages -
+							cpu_buffer->nr_pages;
 
-	new_pages = nr_pages - buffer->pages;
+			/*
+			 * nothing more to do for removing pages or no update
+			 */
+			if (cpu_buffer->nr_pages_to_update <= 0)
+				continue;
 
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			struct page *page;
 			/*
-			 * __GFP_NORETRY flag makes sure that the allocation
-			 * fails gracefully without invoking oom-killer and
-			 * the system is not destabilized.
+			 * to add pages, make sure all new pages can be
+			 * allocated without receiving ENOMEM
 			 */
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL | __GFP_NORETRY,
-					    cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			page = alloc_pages_node(cpu_to_node(cpu),
-						GFP_KERNEL | __GFP_NORETRY, 0);
-			if (!page)
-				goto free_pages;
-			bpage->page = page_address(page);
-			rb_init_page(bpage->page);
+			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu))
+				/* not enough memory for new pages */
+				goto no_mem;
 		}
-	}
 
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
-	}
+		/* wait for all the updates to complete */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (cpu_buffer->nr_pages_to_update) {
+				update_pages_handler(cpu_buffer);
+			}
+		}
+	} else {
+		cpu_buffer = buffer->buffers[cpu_id];
+		if (nr_pages == cpu_buffer->nr_pages)
+			goto out;
 
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
+		cpu_buffer->nr_pages_to_update = nr_pages -
+						cpu_buffer->nr_pages;
+
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (cpu_buffer->nr_pages_to_update > 0 &&
+			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu_id))
+			goto no_mem;
+
+		update_pages_handler(cpu_buffer);
+	}
 
  out:
-	buffer->pages = nr_pages;
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 
@@ -1422,25 +1429,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	return size;
 
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+ no_mem:
+	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_page *bpage, *tmp;
+		cpu_buffer = buffer->buffers[cpu];
+		/* reset this number regardless */
+		cpu_buffer->nr_pages_to_update = 0;
+		if (list_empty(&cpu_buffer->new_pages))
+			continue;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1542,7 +1548,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 	 * assign the commit to the tail.
 	 */
  again:
-	max_count = cpu_buffer->buffer->pages * 100;
+	max_count = cpu_buffer->nr_pages * 100;
 
 	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
 		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
@@ -3563,9 +3569,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
  * ring_buffer_size - return the size of the ring buffer (in bytes)
  * @buffer: The ring buffer.
  */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
 {
-	return BUF_PAGE_SIZE * buffer->pages;
+	/*
+	 * Earlier, this method returned
+	 *	BUF_PAGE_SIZE * buffer->nr_pages
+	 * Since the nr_pages field is now removed, we have converted this to
+	 * return the per cpu buffer value.
+	 */
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
@@ -3740,8 +3755,11 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
 		goto out;
 
+	cpu_buffer_a = buffer_a->buffers[cpu];
+	cpu_buffer_b = buffer_b->buffers[cpu];
+
 	/* At least make sure the two buffers are somewhat the same */
-	if (buffer_a->pages != buffer_b->pages)
+	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
 		goto out;
 
 	ret = -EAGAIN;
@@ -3755,9 +3773,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	if (atomic_read(&buffer_b->record_disabled))
 		goto out;
 
-	cpu_buffer_a = buffer_a->buffers[cpu];
-	cpu_buffer_b = buffer_b->buffers[cpu];
-
 	if (atomic_read(&cpu_buffer_a->record_disabled))
 		goto out;
 
@@ -4108,6 +4123,8 @@ static int rb_cpu_notify(struct notifier_block *self,
 	struct ring_buffer *buffer =
 		container_of(self, struct ring_buffer, cpu_notify);
 	long cpu = (long)hcpu;
+	int cpu_i, nr_pages_same;
+	unsigned int nr_pages;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -4115,8 +4132,23 @@ static int rb_cpu_notify(struct notifier_block *self,
 		if (cpumask_test_cpu(cpu, buffer->cpumask))
 			return NOTIFY_OK;
 
+		nr_pages = 0;
+		nr_pages_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_buffer_cpu(buffer, cpu_i) {
+			/* fill in the size from first enabled cpu */
+			if (nr_pages == 0)
+				nr_pages = buffer->buffers[cpu_i]->nr_pages;
+			if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
+				nr_pages_same = 0;
+				break;
+			}
+		}
+		/* allocate minimum pages, user can later expand it */
+		if (!nr_pages_same)
+			nr_pages = 2;
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu]) {
 			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
 			     cpu);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a3f1bc5..367659d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -787,7 +787,8 @@ __acquires(kernel_lock)
 
 		/* If we expanded the buffers, make sure the max is expanded too */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, trace_buf_size);
+			ring_buffer_resize(max_tr.buffer, trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 
 		/* the test is responsible for initializing and enabling */
 		pr_info("Testing tracer %s: ", type->name);
@@ -803,7 +804,8 @@ __acquires(kernel_lock)
 
 		/* Shrink the max buffer again */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, 1);
+			ring_buffer_resize(max_tr.buffer, 1,
+						RING_BUFFER_ALL_CPUS);
 
 		printk(KERN_CONT "PASSED\n");
 	}
@@ -2916,7 +2918,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
 	return t->init(tr);
 }
 
-static int __tracing_resize_ring_buffer(unsigned long size)
+static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+{
+	int cpu;
+	for_each_tracing_cpu(cpu)
+		tr->data[cpu]->entries = val;
+}
+
+static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 {
 	int ret;
 
@@ -2927,19 +2936,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 	 */
 	ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
 	if (!current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
 	if (ret < 0) {
-		int r;
+		int r = 0;
+
+		if (cpu == RING_BUFFER_ALL_CPUS) {
+			int i;
+			for_each_tracing_cpu(i) {
+				r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[i]->entries,
+						i);
+				if (r < 0)
+					break;
+			}
+		} else {
+			r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+		}
 
-		r = ring_buffer_resize(global_trace.buffer,
-				       global_trace.entries);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
@@ -2961,14 +2983,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 		return ret;
 	}
 
-	max_tr.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&max_tr, size);
+	else
+		max_tr.data[cpu]->entries = size;
+
  out:
-	global_trace.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&global_trace, size);
+	else
+		global_trace.data[cpu]->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size)
+static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
 	int cpu, ret = size;
 
@@ -2984,12 +3013,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
 			atomic_inc(&max_tr.data[cpu]->disabled);
 	}
 
-	if (size != global_trace.entries)
-		ret = __tracing_resize_ring_buffer(size);
+	if (cpu_id != RING_BUFFER_ALL_CPUS) {
+		/* make sure, this cpu is enabled in the mask */
+		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
 
+	ret = __tracing_resize_ring_buffer(size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
+out:
 	for_each_tracing_cpu(cpu) {
 		if (global_trace.data[cpu])
 			atomic_dec(&global_trace.data[cpu]->disabled);
@@ -3020,7 +3056,8 @@ int tracing_update_buffers(void)
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -3044,7 +3081,8 @@ static int tracing_set_tracer(const char *buf)
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3070,8 +3108,8 @@ static int tracing_set_tracer(const char *buf)
 		 * The max_tr ring buffer has some state (e.g. ring->clock) and
 		 * we want preserve it.
 		 */
-		ring_buffer_resize(max_tr.buffer, 1);
-		max_tr.entries = 1;
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
 	}
 	destroy_trace_option_files(topts);
 
@@ -3079,10 +3117,17 @@ static int tracing_set_tracer(const char *buf)
 
 	topts = create_trace_option_files(current_trace);
 	if (current_trace->use_max_tr) {
-		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
-		if (ret < 0)
-			goto out;
-		max_tr.entries = global_trace.entries;
+		int cpu;
+		/* we need to make per cpu buffer sizes equivalent */
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				goto out;
+			max_tr.data[cpu]->entries =
+					global_trace.data[cpu]->entries;
+		}
 	}
 
 	if (t->init) {
@@ -3584,30 +3629,82 @@ out_err:
 	goto out;
 }
 
+struct ftrace_entries_info {
+	struct trace_array	*tr;
+	int			cpu;
+};
+
+static int tracing_entries_open(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info;
+
+	if (tracing_disabled)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->tr = &global_trace;
+	info->cpu = (unsigned long)inode->i_private;
+
+	filp->private_data = info;
+
+	return 0;
+}
+
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct trace_array *tr = filp->private_data;
-	char buf[96];
-	int r;
+	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_array *tr = info->tr;
+	char buf[64];
+	int r = 0;
+	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		r = sprintf(buf, "%lu (expanded: %lu)\n",
-			    tr->entries >> 10,
-			    trace_buf_size >> 10);
-	else
-		r = sprintf(buf, "%lu\n", tr->entries >> 10);
+
+	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+		int cpu, buf_size_same;
+		unsigned long size;
+
+		size = 0;
+		buf_size_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_tracing_cpu(cpu) {
+			/* fill in the size from first enabled cpu */
+			if (size == 0)
+				size = tr->data[cpu]->entries;
+			if (size != tr->data[cpu]->entries) {
+				buf_size_same = 0;
+				break;
+			}
+		}
+
+		if (buf_size_same) {
+			if (!ring_buffer_expanded)
+				r = sprintf(buf, "%lu (expanded: %lu)\n",
+					    size >> 10,
+					    trace_buf_size >> 10);
+			else
+				r = sprintf(buf, "%lu\n", size >> 10);
+		} else
+			r = sprintf(buf, "X\n");
+	} else
+		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+
 	mutex_unlock(&trace_types_lock);
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	return ret;
 }
 
 static ssize_t
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
+	struct ftrace_entries_info *info = filp->private_data;
 	unsigned long val;
 	int ret;
 
@@ -3622,7 +3719,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val);
+	ret = tracing_resize_ring_buffer(val, info->cpu);
 	if (ret < 0)
 		return ret;
 
@@ -3631,6 +3728,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static int
+tracing_entries_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info = filp->private_data;
+
+	kfree(info);
+
+	return 0;
+}
+
 static ssize_t
 tracing_total_entries_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos)
@@ -3642,7 +3749,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
-		size += tr->entries >> 10;
+		size += tr->data[cpu]->entries >> 10;
 		if (!ring_buffer_expanded)
 			expanded_size += trace_buf_size >> 10;
 	}
@@ -3676,7 +3783,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
 		tracing_off();
 	/* resize the ring buffer to 0 */
-	tracing_resize_ring_buffer(0);
+	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
 
 	return 0;
 }
@@ -3875,9 +3982,10 @@ static const struct file_operations tracing_pipe_fops = {
 };
 
 static const struct file_operations tracing_entries_fops = {
-	.open		= tracing_open_generic,
+	.open		= tracing_entries_open,
 	.read		= tracing_entries_read,
 	.write		= tracing_entries_write,
+	.release	= tracing_entries_release,
 	.llseek		= generic_file_llseek,
 };
 
@@ -4329,6 +4437,9 @@ static void tracing_init_debugfs_percpu(long cpu)
 
 	trace_create_file("stats", 0444, d_cpu,
 			(void *) cpu, &tracing_stats_fops);
+
+	trace_create_file("buffer_size_kb", 0444, d_cpu,
+			(void *) cpu, &tracing_entries_fops);
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
@@ -4609,7 +4720,7 @@ static __init int tracer_init_debugfs(void)
 			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
 
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
-			&global_trace, &tracing_entries_fops);
+			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
 
 	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
 			&global_trace, &tracing_total_entries_fops);
@@ -4862,8 +4973,6 @@ __init static int tracer_alloc_buffers(void)
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
-
 
 #ifdef CONFIG_TRACER_MAX_TRACE
 	max_tr.buffer = ring_buffer_alloc(1, rb_flags);
@@ -4873,7 +4982,6 @@ __init static int tracer_alloc_buffers(void)
 		ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = 1;
 #endif
 
 	/* Allocate the first page for all buffers */
@@ -4882,6 +4990,11 @@ __init static int tracer_alloc_buffers(void)
 		max_tr.data[i] = &per_cpu(max_tr_data, i);
 	}
 
+	set_buffer_entries(&global_trace, ring_buf_size);
+#ifdef CONFIG_TRACER_MAX_TRACE
+	set_buffer_entries(&max_tr, 1);
+#endif
+
 	trace_init_cmdlines();
 
 	register_tracer(&nop_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b93ecba..decbca3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -125,6 +125,7 @@ struct trace_array_cpu {
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
 
+	unsigned long		entries;
 	unsigned long		saved_latency;
 	unsigned long		critical_start;
 	unsigned long		critical_end;
@@ -146,7 +147,6 @@ struct trace_array_cpu {
  */
 struct trace_array {
 	struct ring_buffer	*buffer;
-	unsigned long		entries;
 	int			cpu;
 	cycle_t			time_start;
 	struct task_struct	*waiter;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic
  2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
@ 2012-02-02 20:00         ` Vaibhav Nagarnaik
  2012-04-21  4:27           ` Steven Rostedt
  2012-04-25 21:18           ` [PATCH v6 1/3] " Vaibhav Nagarnaik
  2012-02-02 20:00         ` [PATCH v5 3/4] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
                           ` (3 subsequent siblings)
  4 siblings, 2 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-02-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, Justin Teravest, linux-kernel,
	Vaibhav Nagarnaik

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing pages after the tail page, which ensures that
the empty pages in the ring buffer are removed first. If the head page
is among the pages being removed, the page following the removed ones
becomes the new head page. This drops the oldest data from the ring
buffer while keeping the latest data around to be read.

To do this in a non-racy manner, tracing is stopped for a very short
time while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed only after tracing is restarted, to
minimize the time for which tracing has to be stopped.

The pages of a per-CPU ring buffer are removed in a context that runs on
the respective CPU. This limits the events that are not traced during
the update to NMI contexts only.
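
As a simplified stand-alone sketch of the unlink step described above
(this is not the code in the patch, which additionally has to deal with
the reader page, the flag bits stored in the list pointers and
concurrent writers; the types and names below are made up for the
illustration):

#include <stdio.h>
#include <stdlib.h>

struct page_node {
        struct page_node *next, *prev;
        int id;
};

/* Cut nr nodes out of the ring after "tail"; return the (possibly new) head. */
static struct page_node *remove_after(struct page_node *tail,
                                      struct page_node *head, int nr)
{
        struct page_node *first = tail->next;   /* oldest page to drop */
        struct page_node *last = first, *p, *next;
        int i;

        for (i = 1; i < nr; i++)
                last = last->next;

        /* one splice detaches the whole span [first, last] from the ring */
        tail->next = last->next;
        last->next->prev = tail;

        /* the real code frees the pages only after tracing is re-enabled */
        for (p = first, i = 0; i < nr; i++, p = next) {
                next = p->next;
                if (p == head)
                        head = tail->next;      /* head moves to first survivor */
                free(p);
        }
        return head;
}

int main(void)
{
        struct page_node *pages[5], *head, *p;
        int i;

        for (i = 0; i < 5; i++) {
                pages[i] = malloc(sizeof(*pages[i]));
                if (!pages[i])
                        return 1;
                pages[i]->id = i;
        }
        for (i = 0; i < 5; i++) {
                pages[i]->next = pages[(i + 1) % 5];
                pages[i]->prev = pages[(i + 4) % 5];
        }
        /* pretend page 3 is the tail page and page 4 the head page; drop 2 */
        head = remove_after(pages[3], pages[4], 2);

        p = head;
        do {
                printf("page %d survives\n", p->id);
                p = p->next;
        } while (p != head);
        return 0;
}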

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v5-v4:
* Rebased to latest upstream

 kernel/trace/ring_buffer.c |  222 ++++++++++++++++++++++++++++++++-----------
 kernel/trace/trace.c       |   20 +----
 2 files changed, 166 insertions(+), 76 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c778ab9..a7c66e4 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -502,6 +504,8 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
@@ -1080,6 +1084,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	raw_spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1267,32 +1273,107 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	unsigned int nr_removed;
+	int page_entries;
+	struct list_head *tail_page, *to_remove, *next_page;
+	unsigned long head_bit;
+	struct buffer_page *last_page, *first_page;
+	struct buffer_page *to_remove_page, *tmp_iter_page;
 
+	head_bit = 0;
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
-
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+	atomic_inc(&cpu_buffer->record_disabled);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock. We also don't race with writers after disabling recording.
+	 * This makes it easy to figure out the first and the last page to be
+	 * removed from the list. We remove all the pages in between including
+	 * the first and last pages. This is done in a busy loop so that we
+	 * lose the least number of traces.
+	 * The pages are freed after we restart recording and unlock readers.
+	 */
+	tail_page = &cpu_buffer->tail_page->list;
+	/*
+	 * tail page might be on reader page, we remove the next page
+	 * from the ring buffer
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		tail_page = rb_list_head(tail_page->next);
+	to_remove = tail_page;
+
+	/* start of pages to remove */
+	first_page = list_entry(rb_list_head(to_remove->next),
+				struct buffer_page, list);
+	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
+		to_remove = rb_list_head(to_remove)->next;
+		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
 	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
 
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
-
-out:
+	next_page = rb_list_head(to_remove)->next;
+	/* now we remove all pages between tail_page and next_page */
+	tail_page->next = (struct list_head *)((unsigned long)next_page |
+						head_bit);
+	next_page = rb_list_head(next_page);
+	next_page->prev = tail_page;
+	/* make sure pages points to a valid page in the ring buffer */
+	cpu_buffer->pages = next_page;
+	/* update head page */
+	if (head_bit)
+		cpu_buffer->head_page = list_entry(next_page,
+						struct buffer_page, list);
+	/*
+	 * change read pointer to make sure any read iterators reset
+	 * themselves
+	 */
+	cpu_buffer->read = 0;
+	/* pages are removed, resume tracing and then free the pages */
+	atomic_dec(&cpu_buffer->record_disabled);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
+
+	/* last buffer page to remove */
+	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
+				list);
+	tmp_iter_page = first_page;
+	do {
+		to_remove_page = tmp_iter_page;
+		rb_inc_page(cpu_buffer, &tmp_iter_page);
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		nr_removed--;
+		free_buffer_page(to_remove_page);
+	} while (to_remove_page != last_page);
+
+	RB_WARN_ON(cpu_buffer, nr_removed);
 }
 
 static void
@@ -1303,6 +1384,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	struct list_head *p;
 	unsigned i;
 
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	/* Make sure all writers are done with this buffer. */
+	synchronize_sched();
+
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
 	rb_head_page_deactivate(cpu_buffer);
 
@@ -1319,18 +1406,22 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 out:
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+	atomic_dec(&cpu_buffer->record_disabled);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void update_pages_handler(struct work_struct *work)
 {
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+
 	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1340,14 +1431,14 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
  *
  * Minimum size is 2 * BUF_PAGE_SIZE.
  *
- * Returns -1 on failure.
+ * Returns 0 on success and < 0 on failure.
  */
 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1362,21 +1453,28 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
-
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
+	/*
+	 * Don't succeed if recording is disabled globally, as a reader might
+	 * be manipulating the ring buffer and is expecting a sane state while
+	 * this is true.
+	 */
+	if (atomic_read(&buffer->record_disabled))
+		return -EBUSY;
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
+			if (atomic_read(&cpu_buffer->record_disabled)) {
+				err = -EBUSY;
+				goto out_err;
+			}
+
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 
@@ -1392,20 +1490,37 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
+		}
+
+		/* fire off all the required work handlers */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
 		}
 
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-			}
+			if (!cpu_buffer->nr_pages_to_update)
+				continue;
+			wait_for_completion(&cpu_buffer->update_completion);
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+		if (atomic_read(&cpu_buffer->record_disabled)) {
+			err = -EBUSY;
+			goto out_err;
+		}
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1415,38 +1530,41 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					&cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
+
+		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
+		wait_for_completion(&cpu_buffer->update_completion);
 
-		update_pages_handler(cpu_buffer);
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
 		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1485,21 +1603,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 367659d..abf1108 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2999,20 +2999,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -3026,14 +3016,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 3/4] trace: Make addition of pages in ring buffer atomic
  2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
  2012-02-02 20:00         ` [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
@ 2012-02-02 20:00         ` Vaibhav Nagarnaik
  2012-02-02 20:00         ` [PATCH v5 4/4] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-02-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, Justin Teravest, linux-kernel,
	Vaibhav Nagarnaik

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer without reinitializing it.

The new pages are attached between the head page and its previous page.
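
As a rough stand-alone sketch of that splice (this is not the kernel
code in the patch, which also encodes the HEAD flag in the pointer,
retries a bounded number of times and runs under the reader lock; the
types and names below are made up for the illustration):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct page_node {
        struct page_node *_Atomic next;
        struct page_node *prev;
};

/* splice the chain [first_new, last_new] in between head->prev and head */
static bool insert_before_head(struct page_node *head,
                               struct page_node *first_new,
                               struct page_node *last_new)
{
        struct page_node *prev = head->prev;
        struct page_node *expected = head;

        /* pre-link the new chain so it already points into the ring */
        last_new->next = head;
        first_new->prev = prev;

        /* the cmpxchg on prev->next is the atomic commit point */
        if (!atomic_compare_exchange_strong(&prev->next, &expected, first_new))
                return false;   /* a writer moved the head page; caller retries */

        /* only now make the backward link match the forward one */
        head->prev = last_new;
        return true;
}

int main(void)
{
        struct page_node ring[3], extra[2];
        int i;

        /* a three-page ring; ring[0] plays the role of the head page */
        for (i = 0; i < 3; i++) {
                ring[i].next = &ring[(i + 1) % 3];
                ring[i].prev = &ring[(i + 2) % 3];
        }
        /* a detached chain of two new pages */
        extra[0].next = &extra[1];
        extra[1].prev = &extra[0];

        if (insert_before_head(&ring[0], &extra[0], &extra[1]))
                printf("spliced 2 new pages in front of the head page\n");
        return 0;
}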

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v5-v4:
* Rebased to latest upstream

 kernel/trace/ring_buffer.c |  128 +++++++++++++++++++++++++++++--------------
 1 files changed, 86 insertions(+), 42 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a7c66e4..5aef474 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -512,6 +512,7 @@ struct ring_buffer {
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
+	atomic_t			resize_disabled;
 	cpumask_var_t			cpumask;
 
 	struct lock_class_key		*reader_lock_key;
@@ -1283,7 +1284,7 @@ static inline unsigned long rb_page_write(struct buffer_page *bpage)
 	return local_read(&bpage->write) & RB_WRITE_MASK;
 }
 
-static void
+static int
 rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
 	unsigned int nr_removed;
@@ -1374,53 +1375,99 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 	} while (to_remove_page != last_page);
 
 	RB_WARN_ON(cpu_buffer, nr_removed);
+
+	return nr_removed == 0;
 }
 
-static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+static int
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
+	raw_spin_lock_irq(&cpu_buffer->reader_lock);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 10 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 10;
+	success = 0;
+	while (retries--) {
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *head_page_with_bit;
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
 
-	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+		first_page = pages->next;
+		last_page  = pages->prev;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+		head_page_with_bit = (struct list_head *)
+				((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next  = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
-	atomic_dec(&cpu_buffer->record_disabled);
+
+	/* free pages if they weren't inserted */
+	if (!success) {
+		struct buffer_page *bpage, *tmp;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+	}
+	return success;
 }
 
 static void update_pages_handler(struct work_struct *work)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
 			struct ring_buffer_per_cpu, update_pages_work);
+	int success;
 
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		success = rb_insert_pages(cpu_buffer);
 	else
-		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+		success = rb_remove_pages(cpu_buffer,
+					-cpu_buffer->nr_pages_to_update);
 
-	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	if (success)
+		cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
 	complete(&cpu_buffer->update_completion);
 }
 
@@ -1456,11 +1503,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	/*
-	 * Don't succeed if recording is disabled globally, as a reader might
-	 * be manipulating the ring buffer and is expecting a sane state while
+	 * Don't succeed if resizing is disabled, as a reader might be
+	 * manipulating the ring buffer and is expecting a sane state while
 	 * this is true.
 	 */
-	if (atomic_read(&buffer->record_disabled))
+	if (atomic_read(&buffer->resize_disabled))
 		return -EBUSY;
 	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
@@ -1470,11 +1517,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
-			if (atomic_read(&cpu_buffer->record_disabled)) {
-				err = -EBUSY;
-				goto out_err;
-			}
-
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 
@@ -1516,11 +1558,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		}
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
-		if (atomic_read(&cpu_buffer->record_disabled)) {
-			err = -EBUSY;
-			goto out_err;
-		}
-
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1537,7 +1574,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 
 		schedule_work_on(cpu_id, &cpu_buffer->update_pages_work);
 		wait_for_completion(&cpu_buffer->update_completion);
-
 		/* reset this value */
 		cpu_buffer->nr_pages_to_update = 0;
 	}
@@ -3575,6 +3611,7 @@ ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu)
 
 	iter->cpu_buffer = cpu_buffer;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
 	return iter;
@@ -3638,6 +3675,7 @@ ring_buffer_read_finish(struct ring_buffer_iter *iter)
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&cpu_buffer->buffer->resize_disabled);
 	kfree(iter);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_finish);
@@ -3709,6 +3747,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
@@ -3745,8 +3784,12 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
+	/* Make sure all commits have finished */
+	synchronize_sched();
+
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
 	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -3762,6 +3805,7 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&buffer->resize_disabled);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 4/4] trace: change CPU ring buffer state from tracing_cpumask
  2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
  2012-02-02 20:00         ` [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
  2012-02-02 20:00         ` [PATCH v5 3/4] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
@ 2012-02-02 20:00         ` Vaibhav Nagarnaik
  2012-03-08 23:51         ` [PATCH v5 1/4] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
  2012-05-02 21:03         ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
  4 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-02-02 20:00 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, Justin Teravest, linux-kernel,
	Vaibhav Nagarnaik

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables recording on a CPU when its corresponding
bit in tracing_cpumask is set/unset.
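
For illustration only, a tiny stand-alone program showing how the hex
mask maps to CPUs (not part of the patch; 0x5 is just an example value,
and with this patch a cleared bit now also disables recording into that
CPU's ring buffer):

#include <stdio.h>

int main(void)
{
        unsigned long mask = 0x5;       /* example: trace only CPUs 0 and 2 */
        int cpu;

        for (cpu = 0; cpu < 4; cpu++)
                printf("cpu%d: recording %s\n", cpu,
                       (mask >> cpu) & 1 ? "enabled" : "disabled");
        return 0;
}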

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/trace.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index abf1108..e25672e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2592,10 +2592,12 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf,
 		if (cpumask_test_cpu(cpu, tracing_cpumask) &&
 				!cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_inc(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_disable_cpu(global_trace.buffer, cpu);
 		}
 		if (!cpumask_test_cpu(cpu, tracing_cpumask) &&
 				cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_dec(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_enable_cpu(global_trace.buffer, cpu);
 		}
 	}
 	arch_spin_unlock(&ftrace_max_lock);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 1/4] trace: Add per_cpu ring buffer control files
  2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
                           ` (2 preceding siblings ...)
  2012-02-02 20:00         ` [PATCH v5 4/4] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
@ 2012-03-08 23:51         ` Vaibhav Nagarnaik
  2012-05-02 21:03         ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
  4 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-03-08 23:51 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Michael Rubin, David Sharp, Justin Teravest, linux-kernel,
	Vaibhav Nagarnaik

On Thu, Feb 2, 2012 at 12:00 PM, Vaibhav Nagarnaik
<vnagarnaik@google.com> wrote:
> Add a debugfs entry under per_cpu/ folder for each cpu called
> buffer_size_kb to control the ring buffer size for each CPU
> independently.
>
> If the global file buffer_size_kb is used to set size, the individual
> ring buffers will be adjusted to the given size. The buffer_size_kb will
> report the common size to maintain backward compatibility.
>
> If the buffer_size_kb file under the per_cpu/ directory is used to
> change buffer size for a specific CPU, only the size of the respective
> ring buffer is updated. When tracing/buffer_size_kb is read, it reports
> 'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
>
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
> ---
> Changelog v5-v4:
> * Rebased to latest upstream

Hi Steven

Have you had any time to review this latest set of patches?


Thanks

Vaibhav Nagarnaik
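
For reference, the interface described in the quoted patch can be
exercised from userspace roughly as follows (an illustrative sketch
only; the debugfs mount point, the cpu1 directory name and the 512 kB
value are assumptions):

#include <stdio.h>

int main(void)
{
	/* Resize only CPU 1's ring buffer to 512 kB ... */
	FILE *f = fopen("/sys/kernel/debug/tracing/per_cpu/cpu1/buffer_size_kb",
			"w");
	char buf[64];

	if (!f) {
		perror("per_cpu/cpu1/buffer_size_kb");
		return 1;
	}
	fputs("512\n", f);
	fclose(f);

	/* ... after which the global buffer_size_kb reports "X", since
	 * the per-cpu sizes are no longer equal. */
	f = fopen("/sys/kernel/debug/tracing/buffer_size_kb", "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("buffer_size_kb: %s", buf);
	if (f)
		fclose(f);
	return 0;
}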


>
>  include/linux/ring_buffer.h |    6 +-
>  kernel/trace/ring_buffer.c  |  248 ++++++++++++++++++++++++-------------------
>  kernel/trace/trace.c        |  191 ++++++++++++++++++++++++++-------
>  kernel/trace/trace.h        |    2 +-
>  4 files changed, 297 insertions(+), 150 deletions(-)
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 67be037..ad36702 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
>        __ring_buffer_alloc((size), (flags), &__key);   \
>  })
>
> +#define RING_BUFFER_ALL_CPUS -1
> +
>  void ring_buffer_free(struct ring_buffer *buffer);
>
> -int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
> +int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
>
>  void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
>
> @@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
>  void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
>  int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
>
> -unsigned long ring_buffer_size(struct ring_buffer *buffer);
> +unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
>
>  void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
>  void ring_buffer_reset(struct ring_buffer *buffer);
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index f5b7b5c..c778ab9 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -481,6 +481,7 @@ struct ring_buffer_per_cpu {
>        raw_spinlock_t                  reader_lock;    /* serialize readers */
>        arch_spinlock_t                 lock;
>        struct lock_class_key           lock_key;
> +       unsigned int                    nr_pages;
>        struct list_head                *pages;
>        struct buffer_page              *head_page;     /* read from head */
>        struct buffer_page              *tail_page;     /* write to tail */
> @@ -498,10 +499,12 @@ struct ring_buffer_per_cpu {
>        unsigned long                   read_bytes;
>        u64                             write_stamp;
>        u64                             read_stamp;
> +       /* ring buffer pages to update, > 0 to add, < 0 to remove */
> +       int                             nr_pages_to_update;
> +       struct list_head                new_pages; /* new pages to add */
>  };
>
>  struct ring_buffer {
> -       unsigned                        pages;
>        unsigned                        flags;
>        int                             cpus;
>        atomic_t                        record_disabled;
> @@ -995,14 +998,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
>        return 0;
>  }
>
> -static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
> -                            unsigned nr_pages)
> +static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
>  {
> +       int i;
>        struct buffer_page *bpage, *tmp;
> -       LIST_HEAD(pages);
> -       unsigned i;
> -
> -       WARN_ON(!nr_pages);
>
>        for (i = 0; i < nr_pages; i++) {
>                struct page *page;
> @@ -1013,15 +1012,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>                 */
>                bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
>                                    GFP_KERNEL | __GFP_NORETRY,
> -                                   cpu_to_node(cpu_buffer->cpu));
> +                                   cpu_to_node(cpu));
>                if (!bpage)
>                        goto free_pages;
>
> -               rb_check_bpage(cpu_buffer, bpage);
> +               list_add(&bpage->list, pages);
>
> -               list_add(&bpage->list, &pages);
> -
> -               page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
> +               page = alloc_pages_node(cpu_to_node(cpu),
>                                        GFP_KERNEL | __GFP_NORETRY, 0);
>                if (!page)
>                        goto free_pages;
> @@ -1029,6 +1026,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>                rb_init_page(bpage->page);
>        }
>
> +       return 0;
> +
> +free_pages:
> +       list_for_each_entry_safe(bpage, tmp, pages, list) {
> +               list_del_init(&bpage->list);
> +               free_buffer_page(bpage);
> +       }
> +
> +       return -ENOMEM;
> +}
> +
> +static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
> +                            unsigned nr_pages)
> +{
> +       LIST_HEAD(pages);
> +
> +       WARN_ON(!nr_pages);
> +
> +       if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
> +               return -ENOMEM;
> +
>        /*
>         * The ring buffer page list is a circular list that does not
>         * start and end with a list head. All page list items point to
> @@ -1037,20 +1055,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
>        cpu_buffer->pages = pages.next;
>        list_del(&pages);
>
> +       cpu_buffer->nr_pages = nr_pages;
> +
>        rb_check_pages(cpu_buffer);
>
>        return 0;
> -
> - free_pages:
> -       list_for_each_entry_safe(bpage, tmp, &pages, list) {
> -               list_del_init(&bpage->list);
> -               free_buffer_page(bpage);
> -       }
> -       return -ENOMEM;
>  }
>
>  static struct ring_buffer_per_cpu *
> -rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
> +rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
>  {
>        struct ring_buffer_per_cpu *cpu_buffer;
>        struct buffer_page *bpage;
> @@ -1084,7 +1097,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
>
>        INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
>
> -       ret = rb_allocate_pages(cpu_buffer, buffer->pages);
> +       ret = rb_allocate_pages(cpu_buffer, nr_pages);
>        if (ret < 0)
>                goto fail_free_reader;
>
> @@ -1145,7 +1158,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>  {
>        struct ring_buffer *buffer;
>        int bsize;
> -       int cpu;
> +       int cpu, nr_pages;
>
>        /* keep it in its own cache line */
>        buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
> @@ -1156,14 +1169,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>        if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
>                goto fail_free_buffer;
>
> -       buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
> +       nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>        buffer->flags = flags;
>        buffer->clock = trace_clock_local;
>        buffer->reader_lock_key = key;
>
>        /* need at least two pages */
> -       if (buffer->pages < 2)
> -               buffer->pages = 2;
> +       if (nr_pages < 2)
> +               nr_pages = 2;
>
>        /*
>         * In case of non-hotplug cpu, if the ring-buffer is allocated
> @@ -1186,7 +1199,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
>
>        for_each_buffer_cpu(buffer, cpu) {
>                buffer->buffers[cpu] =
> -                       rb_allocate_cpu_buffer(buffer, cpu);
> +                       rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
>                if (!buffer->buffers[cpu])
>                        goto fail_free_buffers;
>        }
> @@ -1308,6 +1321,18 @@ out:
>        raw_spin_unlock_irq(&cpu_buffer->reader_lock);
>  }
>
> +static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
> +{
> +       if (cpu_buffer->nr_pages_to_update > 0)
> +               rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
> +                               cpu_buffer->nr_pages_to_update);
> +       else
> +               rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
> +       cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
> +       /* reset this value */
> +       cpu_buffer->nr_pages_to_update = 0;
> +}
> +
>  /**
>  * ring_buffer_resize - resize the ring buffer
>  * @buffer: the buffer to resize.
> @@ -1317,14 +1342,12 @@ out:
>  *
>  * Returns -1 on failure.
>  */
> -int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
> +int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
> +                       int cpu_id)
>  {
>        struct ring_buffer_per_cpu *cpu_buffer;
> -       unsigned nr_pages, rm_pages, new_pages;
> -       struct buffer_page *bpage, *tmp;
> -       unsigned long buffer_size;
> -       LIST_HEAD(pages);
> -       int i, cpu;
> +       unsigned nr_pages;
> +       int cpu;
>
>        /*
>         * Always succeed at resizing a non-existent buffer:
> @@ -1334,15 +1357,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>
>        size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>        size *= BUF_PAGE_SIZE;
> -       buffer_size = buffer->pages * BUF_PAGE_SIZE;
>
>        /* we need a minimum of two pages */
>        if (size < BUF_PAGE_SIZE * 2)
>                size = BUF_PAGE_SIZE * 2;
>
> -       if (size == buffer_size)
> -               return size;
> -
>        atomic_inc(&buffer->record_disabled);
>
>        /* Make sure all writers are done with this buffer. */
> @@ -1353,68 +1372,56 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>
>        nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
>
> -       if (size < buffer_size) {
> -
> -               /* easy case, just free pages */
> -               if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
> -                       goto out_fail;
> -
> -               rm_pages = buffer->pages - nr_pages;
> -
> +       if (cpu_id == RING_BUFFER_ALL_CPUS) {
> +               /* calculate the pages to update */
>                for_each_buffer_cpu(buffer, cpu) {
>                        cpu_buffer = buffer->buffers[cpu];
> -                       rb_remove_pages(cpu_buffer, rm_pages);
> -               }
> -               goto out;
> -       }
>
> -       /*
> -        * This is a bit more difficult. We only want to add pages
> -        * when we can allocate enough for all CPUs. We do this
> -        * by allocating all the pages and storing them on a local
> -        * link list. If we succeed in our allocation, then we
> -        * add these pages to the cpu_buffers. Otherwise we just free
> -        * them all and return -ENOMEM;
> -        */
> -       if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
> -               goto out_fail;
> +                       cpu_buffer->nr_pages_to_update = nr_pages -
> +                                                       cpu_buffer->nr_pages;
>
> -       new_pages = nr_pages - buffer->pages;
> +                       /*
> +                        * nothing more to do for removing pages or no update
> +                        */
> +                       if (cpu_buffer->nr_pages_to_update <= 0)
> +                               continue;
>
> -       for_each_buffer_cpu(buffer, cpu) {
> -               for (i = 0; i < new_pages; i++) {
> -                       struct page *page;
>                        /*
> -                        * __GFP_NORETRY flag makes sure that the allocation
> -                        * fails gracefully without invoking oom-killer and
> -                        * the system is not destabilized.
> +                        * to add pages, make sure all new pages can be
> +                        * allocated without receiving ENOMEM
>                         */
> -                       bpage = kzalloc_node(ALIGN(sizeof(*bpage),
> -                                                 cache_line_size()),
> -                                           GFP_KERNEL | __GFP_NORETRY,
> -                                           cpu_to_node(cpu));
> -                       if (!bpage)
> -                               goto free_pages;
> -                       list_add(&bpage->list, &pages);
> -                       page = alloc_pages_node(cpu_to_node(cpu),
> -                                               GFP_KERNEL | __GFP_NORETRY, 0);
> -                       if (!page)
> -                               goto free_pages;
> -                       bpage->page = page_address(page);
> -                       rb_init_page(bpage->page);
> +                       INIT_LIST_HEAD(&cpu_buffer->new_pages);
> +                       if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> +                                               &cpu_buffer->new_pages, cpu))
> +                               /* not enough memory for new pages */
> +                               goto no_mem;
>                }
> -       }
>
> -       for_each_buffer_cpu(buffer, cpu) {
> -               cpu_buffer = buffer->buffers[cpu];
> -               rb_insert_pages(cpu_buffer, &pages, new_pages);
> -       }
> +               /* wait for all the updates to complete */
> +               for_each_buffer_cpu(buffer, cpu) {
> +                       cpu_buffer = buffer->buffers[cpu];
> +                       if (cpu_buffer->nr_pages_to_update) {
> +                               update_pages_handler(cpu_buffer);
> +                       }
> +               }
> +       } else {
> +               cpu_buffer = buffer->buffers[cpu_id];
> +               if (nr_pages == cpu_buffer->nr_pages)
> +                       goto out;
>
> -       if (RB_WARN_ON(buffer, !list_empty(&pages)))
> -               goto out_fail;
> +               cpu_buffer->nr_pages_to_update = nr_pages -
> +                                               cpu_buffer->nr_pages;
> +
> +               INIT_LIST_HEAD(&cpu_buffer->new_pages);
> +               if (cpu_buffer->nr_pages_to_update > 0 &&
> +                       __rb_allocate_pages(cpu_buffer->nr_pages_to_update,
> +                                               &cpu_buffer->new_pages, cpu_id))
> +                       goto no_mem;
> +
> +               update_pages_handler(cpu_buffer);
> +       }
>
>  out:
> -       buffer->pages = nr_pages;
>        put_online_cpus();
>        mutex_unlock(&buffer->mutex);
>
> @@ -1422,25 +1429,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
>
>        return size;
>
> - free_pages:
> -       list_for_each_entry_safe(bpage, tmp, &pages, list) {
> -               list_del_init(&bpage->list);
> -               free_buffer_page(bpage);
> + no_mem:
> +       for_each_buffer_cpu(buffer, cpu) {
> +               struct buffer_page *bpage, *tmp;
> +               cpu_buffer = buffer->buffers[cpu];
> +               /* reset this number regardless */
> +               cpu_buffer->nr_pages_to_update = 0;
> +               if (list_empty(&cpu_buffer->new_pages))
> +                       continue;
> +               list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
> +                                       list) {
> +                       list_del_init(&bpage->list);
> +                       free_buffer_page(bpage);
> +               }
>        }
>        put_online_cpus();
>        mutex_unlock(&buffer->mutex);
>        atomic_dec(&buffer->record_disabled);
>        return -ENOMEM;
> -
> -       /*
> -        * Something went totally wrong, and we are too paranoid
> -        * to even clean up the mess.
> -        */
> - out_fail:
> -       put_online_cpus();
> -       mutex_unlock(&buffer->mutex);
> -       atomic_dec(&buffer->record_disabled);
> -       return -1;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_resize);
>
> @@ -1542,7 +1548,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
>         * assign the commit to the tail.
>         */
>  again:
> -       max_count = cpu_buffer->buffer->pages * 100;
> +       max_count = cpu_buffer->nr_pages * 100;
>
>        while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
>                if (RB_WARN_ON(cpu_buffer, !(--max_count)))
> @@ -3563,9 +3569,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
>  * ring_buffer_size - return the size of the ring buffer (in bytes)
>  * @buffer: The ring buffer.
>  */
> -unsigned long ring_buffer_size(struct ring_buffer *buffer)
> +unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
>  {
> -       return BUF_PAGE_SIZE * buffer->pages;
> +       /*
> +        * Earlier, this method returned
> +        *      BUF_PAGE_SIZE * buffer->nr_pages
> +        * Since the nr_pages field is now removed, we have converted this to
> +        * return the per cpu buffer value.
> +        */
> +       if (!cpumask_test_cpu(cpu, buffer->cpumask))
> +               return 0;
> +
> +       return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
>  }
>  EXPORT_SYMBOL_GPL(ring_buffer_size);
>
> @@ -3740,8 +3755,11 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>            !cpumask_test_cpu(cpu, buffer_b->cpumask))
>                goto out;
>
> +       cpu_buffer_a = buffer_a->buffers[cpu];
> +       cpu_buffer_b = buffer_b->buffers[cpu];
> +
>        /* At least make sure the two buffers are somewhat the same */
> -       if (buffer_a->pages != buffer_b->pages)
> +       if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
>                goto out;
>
>        ret = -EAGAIN;
> @@ -3755,9 +3773,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>        if (atomic_read(&buffer_b->record_disabled))
>                goto out;
>
> -       cpu_buffer_a = buffer_a->buffers[cpu];
> -       cpu_buffer_b = buffer_b->buffers[cpu];
> -
>        if (atomic_read(&cpu_buffer_a->record_disabled))
>                goto out;
>
> @@ -4108,6 +4123,8 @@ static int rb_cpu_notify(struct notifier_block *self,
>        struct ring_buffer *buffer =
>                container_of(self, struct ring_buffer, cpu_notify);
>        long cpu = (long)hcpu;
> +       int cpu_i, nr_pages_same;
> +       unsigned int nr_pages;
>
>        switch (action) {
>        case CPU_UP_PREPARE:
> @@ -4115,8 +4132,23 @@ static int rb_cpu_notify(struct notifier_block *self,
>                if (cpumask_test_cpu(cpu, buffer->cpumask))
>                        return NOTIFY_OK;
>
> +               nr_pages = 0;
> +               nr_pages_same = 1;
> +               /* check if all cpu sizes are same */
> +               for_each_buffer_cpu(buffer, cpu_i) {
> +                       /* fill in the size from first enabled cpu */
> +                       if (nr_pages == 0)
> +                               nr_pages = buffer->buffers[cpu_i]->nr_pages;
> +                       if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
> +                               nr_pages_same = 0;
> +                               break;
> +                       }
> +               }
> +               /* allocate minimum pages, user can later expand it */
> +               if (!nr_pages_same)
> +                       nr_pages = 2;
>                buffer->buffers[cpu] =
> -                       rb_allocate_cpu_buffer(buffer, cpu);
> +                       rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
>                if (!buffer->buffers[cpu]) {
>                        WARN(1, "failed to allocate ring buffer on CPU %ld\n",
>                             cpu);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index a3f1bc5..367659d 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -787,7 +787,8 @@ __acquires(kernel_lock)
>
>                /* If we expanded the buffers, make sure the max is expanded too */
>                if (ring_buffer_expanded && type->use_max_tr)
> -                       ring_buffer_resize(max_tr.buffer, trace_buf_size);
> +                       ring_buffer_resize(max_tr.buffer, trace_buf_size,
> +                                               RING_BUFFER_ALL_CPUS);
>
>                /* the test is responsible for initializing and enabling */
>                pr_info("Testing tracer %s: ", type->name);
> @@ -803,7 +804,8 @@ __acquires(kernel_lock)
>
>                /* Shrink the max buffer again */
>                if (ring_buffer_expanded && type->use_max_tr)
> -                       ring_buffer_resize(max_tr.buffer, 1);
> +                       ring_buffer_resize(max_tr.buffer, 1,
> +                                               RING_BUFFER_ALL_CPUS);
>
>                printk(KERN_CONT "PASSED\n");
>        }
> @@ -2916,7 +2918,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
>        return t->init(tr);
>  }
>
> -static int __tracing_resize_ring_buffer(unsigned long size)
> +static void set_buffer_entries(struct trace_array *tr, unsigned long val)
> +{
> +       int cpu;
> +       for_each_tracing_cpu(cpu)
> +               tr->data[cpu]->entries = val;
> +}
> +
> +static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
>  {
>        int ret;
>
> @@ -2927,19 +2936,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
>         */
>        ring_buffer_expanded = 1;
>
> -       ret = ring_buffer_resize(global_trace.buffer, size);
> +       ret = ring_buffer_resize(global_trace.buffer, size, cpu);
>        if (ret < 0)
>                return ret;
>
>        if (!current_trace->use_max_tr)
>                goto out;
>
> -       ret = ring_buffer_resize(max_tr.buffer, size);
> +       ret = ring_buffer_resize(max_tr.buffer, size, cpu);
>        if (ret < 0) {
> -               int r;
> +               int r = 0;
> +
> +               if (cpu == RING_BUFFER_ALL_CPUS) {
> +                       int i;
> +                       for_each_tracing_cpu(i) {
> +                               r = ring_buffer_resize(global_trace.buffer,
> +                                               global_trace.data[i]->entries,
> +                                               i);
> +                               if (r < 0)
> +                                       break;
> +                       }
> +               } else {
> +                       r = ring_buffer_resize(global_trace.buffer,
> +                                               global_trace.data[cpu]->entries,
> +                                               cpu);
> +               }
>
> -               r = ring_buffer_resize(global_trace.buffer,
> -                                      global_trace.entries);
>                if (r < 0) {
>                        /*
>                         * AARGH! We are left with different
> @@ -2961,14 +2983,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
>                return ret;
>        }
>
> -       max_tr.entries = size;
> +       if (cpu == RING_BUFFER_ALL_CPUS)
> +               set_buffer_entries(&max_tr, size);
> +       else
> +               max_tr.data[cpu]->entries = size;
> +
>  out:
> -       global_trace.entries = size;
> +       if (cpu == RING_BUFFER_ALL_CPUS)
> +               set_buffer_entries(&global_trace, size);
> +       else
> +               global_trace.data[cpu]->entries = size;
>
>        return ret;
>  }
>
> -static ssize_t tracing_resize_ring_buffer(unsigned long size)
> +static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
>  {
>        int cpu, ret = size;
>
> @@ -2984,12 +3013,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
>                        atomic_inc(&max_tr.data[cpu]->disabled);
>        }
>
> -       if (size != global_trace.entries)
> -               ret = __tracing_resize_ring_buffer(size);
> +       if (cpu_id != RING_BUFFER_ALL_CPUS) {
> +               /* make sure, this cpu is enabled in the mask */
> +               if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +       }
>
> +       ret = __tracing_resize_ring_buffer(size, cpu_id);
>        if (ret < 0)
>                ret = -ENOMEM;
>
> +out:
>        for_each_tracing_cpu(cpu) {
>                if (global_trace.data[cpu])
>                        atomic_dec(&global_trace.data[cpu]->disabled);
> @@ -3020,7 +3056,8 @@ int tracing_update_buffers(void)
>
>        mutex_lock(&trace_types_lock);
>        if (!ring_buffer_expanded)
> -               ret = __tracing_resize_ring_buffer(trace_buf_size);
> +               ret = __tracing_resize_ring_buffer(trace_buf_size,
> +                                               RING_BUFFER_ALL_CPUS);
>        mutex_unlock(&trace_types_lock);
>
>        return ret;
> @@ -3044,7 +3081,8 @@ static int tracing_set_tracer(const char *buf)
>        mutex_lock(&trace_types_lock);
>
>        if (!ring_buffer_expanded) {
> -               ret = __tracing_resize_ring_buffer(trace_buf_size);
> +               ret = __tracing_resize_ring_buffer(trace_buf_size,
> +                                               RING_BUFFER_ALL_CPUS);
>                if (ret < 0)
>                        goto out;
>                ret = 0;
> @@ -3070,8 +3108,8 @@ static int tracing_set_tracer(const char *buf)
>                 * The max_tr ring buffer has some state (e.g. ring->clock) and
>                 * we want preserve it.
>                 */
> -               ring_buffer_resize(max_tr.buffer, 1);
> -               max_tr.entries = 1;
> +               ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
> +               set_buffer_entries(&max_tr, 1);
>        }
>        destroy_trace_option_files(topts);
>
> @@ -3079,10 +3117,17 @@ static int tracing_set_tracer(const char *buf)
>
>        topts = create_trace_option_files(current_trace);
>        if (current_trace->use_max_tr) {
> -               ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
> -               if (ret < 0)
> -                       goto out;
> -               max_tr.entries = global_trace.entries;
> +               int cpu;
> +               /* we need to make per cpu buffer sizes equivalent */
> +               for_each_tracing_cpu(cpu) {
> +                       ret = ring_buffer_resize(max_tr.buffer,
> +                                               global_trace.data[cpu]->entries,
> +                                               cpu);
> +                       if (ret < 0)
> +                               goto out;
> +                       max_tr.data[cpu]->entries =
> +                                       global_trace.data[cpu]->entries;
> +               }
>        }
>
>        if (t->init) {
> @@ -3584,30 +3629,82 @@ out_err:
>        goto out;
>  }
>
> +struct ftrace_entries_info {
> +       struct trace_array      *tr;
> +       int                     cpu;
> +};
> +
> +static int tracing_entries_open(struct inode *inode, struct file *filp)
> +{
> +       struct ftrace_entries_info *info;
> +
> +       if (tracing_disabled)
> +               return -ENODEV;
> +
> +       info = kzalloc(sizeof(*info), GFP_KERNEL);
> +       if (!info)
> +               return -ENOMEM;
> +
> +       info->tr = &global_trace;
> +       info->cpu = (unsigned long)inode->i_private;
> +
> +       filp->private_data = info;
> +
> +       return 0;
> +}
> +
>  static ssize_t
>  tracing_entries_read(struct file *filp, char __user *ubuf,
>                     size_t cnt, loff_t *ppos)
>  {
> -       struct trace_array *tr = filp->private_data;
> -       char buf[96];
> -       int r;
> +       struct ftrace_entries_info *info = filp->private_data;
> +       struct trace_array *tr = info->tr;
> +       char buf[64];
> +       int r = 0;
> +       ssize_t ret;
>
>        mutex_lock(&trace_types_lock);
> -       if (!ring_buffer_expanded)
> -               r = sprintf(buf, "%lu (expanded: %lu)\n",
> -                           tr->entries >> 10,
> -                           trace_buf_size >> 10);
> -       else
> -               r = sprintf(buf, "%lu\n", tr->entries >> 10);
> +
> +       if (info->cpu == RING_BUFFER_ALL_CPUS) {
> +               int cpu, buf_size_same;
> +               unsigned long size;
> +
> +               size = 0;
> +               buf_size_same = 1;
> +               /* check if all cpu sizes are same */
> +               for_each_tracing_cpu(cpu) {
> +                       /* fill in the size from first enabled cpu */
> +                       if (size == 0)
> +                               size = tr->data[cpu]->entries;
> +                       if (size != tr->data[cpu]->entries) {
> +                               buf_size_same = 0;
> +                               break;
> +                       }
> +               }
> +
> +               if (buf_size_same) {
> +                       if (!ring_buffer_expanded)
> +                               r = sprintf(buf, "%lu (expanded: %lu)\n",
> +                                           size >> 10,
> +                                           trace_buf_size >> 10);
> +                       else
> +                               r = sprintf(buf, "%lu\n", size >> 10);
> +               } else
> +                       r = sprintf(buf, "X\n");
> +       } else
> +               r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
> +
>        mutex_unlock(&trace_types_lock);
>
> -       return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +       ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +       return ret;
>  }
>
>  static ssize_t
>  tracing_entries_write(struct file *filp, const char __user *ubuf,
>                      size_t cnt, loff_t *ppos)
>  {
> +       struct ftrace_entries_info *info = filp->private_data;
>        unsigned long val;
>        int ret;
>
> @@ -3622,7 +3719,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>        /* value is in KB */
>        val <<= 10;
>
> -       ret = tracing_resize_ring_buffer(val);
> +       ret = tracing_resize_ring_buffer(val, info->cpu);
>        if (ret < 0)
>                return ret;
>
> @@ -3631,6 +3728,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
>        return cnt;
>  }
>
> +static int
> +tracing_entries_release(struct inode *inode, struct file *filp)
> +{
> +       struct ftrace_entries_info *info = filp->private_data;
> +
> +       kfree(info);
> +
> +       return 0;
> +}
> +
>  static ssize_t
>  tracing_total_entries_read(struct file *filp, char __user *ubuf,
>                                size_t cnt, loff_t *ppos)
> @@ -3642,7 +3749,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
>
>        mutex_lock(&trace_types_lock);
>        for_each_tracing_cpu(cpu) {
> -               size += tr->entries >> 10;
> +               size += tr->data[cpu]->entries >> 10;
>                if (!ring_buffer_expanded)
>                        expanded_size += trace_buf_size >> 10;
>        }
> @@ -3676,7 +3783,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
>        if (trace_flags & TRACE_ITER_STOP_ON_FREE)
>                tracing_off();
>        /* resize the ring buffer to 0 */
> -       tracing_resize_ring_buffer(0);
> +       tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
>
>        return 0;
>  }
> @@ -3875,9 +3982,10 @@ static const struct file_operations tracing_pipe_fops = {
>  };
>
>  static const struct file_operations tracing_entries_fops = {
> -       .open           = tracing_open_generic,
> +       .open           = tracing_entries_open,
>        .read           = tracing_entries_read,
>        .write          = tracing_entries_write,
> +       .release        = tracing_entries_release,
>        .llseek         = generic_file_llseek,
>  };
>
> @@ -4329,6 +4437,9 @@ static void tracing_init_debugfs_percpu(long cpu)
>
>        trace_create_file("stats", 0444, d_cpu,
>                        (void *) cpu, &tracing_stats_fops);
> +
> +       trace_create_file("buffer_size_kb", 0444, d_cpu,
> +                       (void *) cpu, &tracing_entries_fops);
>  }
>
>  #ifdef CONFIG_FTRACE_SELFTEST
> @@ -4609,7 +4720,7 @@ static __init int tracer_init_debugfs(void)
>                        (void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
>
>        trace_create_file("buffer_size_kb", 0644, d_tracer,
> -                       &global_trace, &tracing_entries_fops);
> +                       (void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
>
>        trace_create_file("buffer_total_size_kb", 0444, d_tracer,
>                        &global_trace, &tracing_total_entries_fops);
> @@ -4862,8 +4973,6 @@ __init static int tracer_alloc_buffers(void)
>                WARN_ON(1);
>                goto out_free_cpumask;
>        }
> -       global_trace.entries = ring_buffer_size(global_trace.buffer);
> -
>
>  #ifdef CONFIG_TRACER_MAX_TRACE
>        max_tr.buffer = ring_buffer_alloc(1, rb_flags);
> @@ -4873,7 +4982,6 @@ __init static int tracer_alloc_buffers(void)
>                ring_buffer_free(global_trace.buffer);
>                goto out_free_cpumask;
>        }
> -       max_tr.entries = 1;
>  #endif
>
>        /* Allocate the first page for all buffers */
> @@ -4882,6 +4990,11 @@ __init static int tracer_alloc_buffers(void)
>                max_tr.data[i] = &per_cpu(max_tr_data, i);
>        }
>
> +       set_buffer_entries(&global_trace, ring_buf_size);
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +       set_buffer_entries(&max_tr, 1);
> +#endif
> +
>        trace_init_cmdlines();
>
>        register_tracer(&nop_trace);
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index b93ecba..decbca3 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -125,6 +125,7 @@ struct trace_array_cpu {
>        atomic_t                disabled;
>        void                    *buffer_page;   /* ring buffer spare */
>
> +       unsigned long           entries;
>        unsigned long           saved_latency;
>        unsigned long           critical_start;
>        unsigned long           critical_end;
> @@ -146,7 +147,6 @@ struct trace_array_cpu {
>  */
>  struct trace_array {
>        struct ring_buffer      *buffer;
> -       unsigned long           entries;
>        int                     cpu;
>        cycle_t                 time_start;
>        struct task_struct      *waiter;
> --
> 1.7.7.3
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic
  2012-02-02 20:00         ` [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
@ 2012-04-21  4:27           ` Steven Rostedt
  2012-04-23 17:31             ` Vaibhav Nagarnaik
  2012-04-25 21:18           ` [PATCH v6 1/3] " Vaibhav Nagarnaik
  1 sibling, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-04-21  4:27 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	Justin Teravest, linux-kernel

On Thu, 2012-02-02 at 12:00 -0800, Vaibhav Nagarnaik wrote:
> This patch adds the capability to remove pages from a ring buffer
> without destroying any existing data in it.
> 
> This is done by removing the pages after the tail page. This makes sure
> that first all the empty pages in the ring buffer are removed. If the
> head page is one in the list of pages to be removed, then the page after
> the removed ones is made the head page. This removes the oldest data
> from the ring buffer and keeps the latest data around to be read.
> 
> To do this in a non-racey manner, tracing is stopped for a very short
> time while the pages to be removed are identified and unlinked from the
> ring buffer. The pages are freed after the tracing is restarted to
> minimize the time needed to stop tracing.
> 
> The context in which the pages from the per-cpu ring buffer are removed
> runs on the respective CPU. This minimizes the events not traced to only
> NMI trace contexts.

Can't do this. (see below)



> +rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
>  {
> -       struct buffer_page *bpage;
> -       struct list_head *p;
> -       unsigned i;
> +       unsigned int nr_removed;
> +       int page_entries;
> +       struct list_head *tail_page, *to_remove, *next_page;
> +       unsigned long head_bit;
> +       struct buffer_page *last_page, *first_page;
> +       struct buffer_page *to_remove_page, *tmp_iter_page;
> 
Also, please use the "upside down x-mas tree" for the declarations:

ie.

       struct list_head *tail_page, *to_remove, *next_page;
       struct buffer_page *to_remove_page, *tmp_iter_page;
       struct buffer_page *last_page, *first_page;
       unsigned int nr_removed;
       unsigned long head_bit;
       int page_entries;

See, it looks easier to read than what you had.

> +		/* fire off all the required work handlers */
> +		for_each_buffer_cpu(buffer, cpu) {
> +			cpu_buffer = buffer->buffers[cpu];
> +			if (!cpu_buffer->nr_pages_to_update)
> +				continue;
> +			schedule_work_on(cpu, &cpu_buffer->update_pages_work);

This locks up. I just tried the following, and it hung the task.

Here:

# cd /sys/kernel/debug/tracing
# echo 1 > events/enable
# sleep 10
# echo 0 > events/enable
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 100 > buffer_size_kb

<locked!>

I guess you could test if the cpu is online. And if so, then do the
schedule_work_on(). You will need to get_online_cpus first.

If the cpu is offline, just change it.
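
Roughly along these lines (a sketch only, using the helpers this series
introduces; waiting for the scheduled work to complete is omitted here,
the v6 repost later in the thread has the complete version):

	get_online_cpus();

	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		if (!cpu_buffer->nr_pages_to_update)
			continue;

		if (cpu_online(cpu))
			/* online: let the work handler resize it on-CPU */
			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
		else
			/* offline: no writers to race with, update directly */
			rb_update_pages(cpu_buffer);
	}

	put_online_cpus();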

-- Steve

PS. The first patch looks good, but I think you need to add some more
blank lines in your patches. You like to bunch a lot of text together,
and that causes some eye strain.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic
  2012-04-21  4:27           ` Steven Rostedt
@ 2012-04-23 17:31             ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-04-23 17:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Michael Rubin, David Sharp,
	Justin Teravest, linux-kernel

On Fri, Apr 20, 2012 at 9:27 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2012-02-02 at 12:00 -0800, Vaibhav Nagarnaik wrote:
>> +rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
>>  {
>> -       struct buffer_page *bpage;
>> -       struct list_head *p;
>> -       unsigned i;
>> +       unsigned int nr_removed;
>> +       int page_entries;
>> +       struct list_head *tail_page, *to_remove, *next_page;
>> +       unsigned long head_bit;
>> +       struct buffer_page *last_page, *first_page;
>> +       struct buffer_page *to_remove_page, *tmp_iter_page;
>>
> Also, please use the "upside down x-mas tree" for the declarations:
>
> ie.
>
>       struct list_head *tail_page, *to_remove, *next_page;
>       struct buffer_page *to_remove_page, *tmp_iter_page;
>       struct buffer_page *last_page, *first_page;
>       unsigned int nr_removed;
>       unsigned long head_bit;
>       int page_entries;
>
> See, it looks easier to read than what you had.
>
>> +             /* fire off all the required work handlers */
>> +             for_each_buffer_cpu(buffer, cpu) {
>> +                     cpu_buffer = buffer->buffers[cpu];
>> +                     if (!cpu_buffer->nr_pages_to_update)
>> +                             continue;
>> +                     schedule_work_on(cpu, &cpu_buffer->update_pages_work);
>
> This locks up. I just tried the following, and it hung the task.
>
> Here:
>
> # cd /sys/kernel/debug/tracing
> # echo 1 > events/enable
> # sleep 10
> # echo 0 > events/enable
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> # echo 100 > buffer_size_kb
>
> <locked!>
>
> I guess you could test if the cpu is online. And if so, then do the
> schedule_work_on(). You will need to get_online_cpus first.
>
> If the cpu is offline, just change it.
>
> -- Steve
>
> PS. The first patch looks good, but I think you need to add some more
> blank lines in your patches. You like to bunch a lot of text together,
> and that causes some eye strain.


Thanks for the feedback. I will update the patch with your suggestions.



Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-02-02 20:00         ` [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
  2012-04-21  4:27           ` Steven Rostedt
@ 2012-04-25 21:18           ` Vaibhav Nagarnaik
  2012-04-25 21:18             ` [PATCH v6 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
                               ` (3 more replies)
  1 sibling, 4 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-04-25 21:18 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Laurent Chavey, Justin Teravest, David Sharp, linux-kernel,
	Vaibhav Nagarnaik

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing pages after the tail page, which ensures that
the empty pages in the ring buffer are removed first. If the head page
is among the pages to be removed, the page following the removed range
becomes the new head page. This discards the oldest data in the ring
buffer and keeps the latest data around to be read.

To do this without races, tracing is stopped for a very short time
while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed only after tracing is restarted, to
minimize the time tracing has to stay stopped.

The pages of a per-cpu ring buffer are removed in a context that runs
on that same CPU. This limits the events lost during the removal to
those generated from NMI context.
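
In outline, the unlink step described above looks like this (a condensed
sketch of rb_remove_pages() from the patch below; the reader-page special
case, entry accounting and error checks are omitted):

	raw_spin_lock_irq(&cpu_buffer->reader_lock);
	atomic_inc(&cpu_buffer->record_disabled);

	/* walk nr_pages entries past the tail page, remembering whether
	 * the HEAD flag is seen along the way */
	tail_page = &cpu_buffer->tail_page->list;
	to_remove = tail_page;
	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
		to_remove = rb_list_head(to_remove)->next;
		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
	}

	/* splice the range out, carrying the HEAD flag over to the page
	 * that follows the removed ones */
	next_page = rb_list_head(to_remove)->next;
	tail_page->next = (struct list_head *)((unsigned long)next_page |
						head_bit);
	rb_list_head(next_page)->prev = tail_page;

	atomic_dec(&cpu_buffer->record_disabled);
	raw_spin_unlock_irq(&cpu_buffer->reader_lock);

	/* the unlinked pages are freed only after recording resumes */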

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog:
v6-v5:
* Add a check for cpu_online before scheduling resize task for it

 kernel/trace/ring_buffer.c |  269 ++++++++++++++++++++++++++++++++++---------
 kernel/trace/trace.c       |   20 +---
 2 files changed, 213 insertions(+), 76 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2d5eb33..a966f9b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -470,6 +472,8 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
@@ -1048,6 +1052,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	raw_spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1235,32 +1241,123 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *tail_page, *to_remove, *next_page;
+	struct buffer_page *to_remove_page, *tmp_iter_page;
+	struct buffer_page *last_page, *first_page;
+	unsigned int nr_removed;
+	unsigned long head_bit;
+	int page_entries;
+
+	head_bit = 0;
 
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+	atomic_inc(&cpu_buffer->record_disabled);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock. We also don't race with writers after disabling recording.
+	 * This makes it easy to figure out the first and the last page to be
+	 * removed from the list. We unlink all the pages in between including
+	 * the first and last pages. This is done in a busy loop so that we
+	 * lose the least number of traces.
+	 * The pages are freed after we restart recording and unlock readers.
+	 */
+	tail_page = &cpu_buffer->tail_page->list;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+	/*
+	 * tail page might be on reader page, we remove the next page
+	 * from the ring buffer
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		tail_page = rb_list_head(tail_page->next);
+	to_remove = tail_page;
+
+	/* start of pages to remove */
+	first_page = list_entry(rb_list_head(to_remove->next),
+				struct buffer_page, list);
+
+	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
+		to_remove = rb_list_head(to_remove)->next;
+		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
 	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
 
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
+	next_page = rb_list_head(to_remove)->next;
 
-out:
+	/*
+	 * Now we remove all pages between tail_page and next_page.
+	 * Make sure that we have head_bit value preserved for the
+	 * next page
+	 */
+	tail_page->next = (struct list_head *)((unsigned long)next_page |
+						head_bit);
+	next_page = rb_list_head(next_page);
+	next_page->prev = tail_page;
+
+	/* make sure pages points to a valid page in the ring buffer */
+	cpu_buffer->pages = next_page;
+
+	/* update head page */
+	if (head_bit)
+		cpu_buffer->head_page = list_entry(next_page,
+						struct buffer_page, list);
+
+	/*
+	 * change read pointer to make sure any read iterators reset
+	 * themselves
+	 */
+	cpu_buffer->read = 0;
+
+	/* pages are removed, resume tracing and then free the pages */
+	atomic_dec(&cpu_buffer->record_disabled);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
+
+	/* last buffer page to remove */
+	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
+				list);
+	tmp_iter_page = first_page;
+
+	do {
+		to_remove_page = tmp_iter_page;
+		rb_inc_page(cpu_buffer, &tmp_iter_page);
+
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		free_buffer_page(to_remove_page);
+		nr_removed--;
+
+	} while (to_remove_page != last_page);
+
+	RB_WARN_ON(cpu_buffer, nr_removed);
 }
 
 static void
@@ -1271,6 +1368,12 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	struct list_head *p;
 	unsigned i;
 
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
+
+	/* Make sure all writers are done with this buffer. */
+	synchronize_sched();
+
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
 	rb_head_page_deactivate(cpu_buffer);
 
@@ -1287,18 +1390,26 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 
 out:
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+	atomic_dec(&cpu_buffer->record_disabled);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void rb_update_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+
 	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+}
+
+static void update_pages_handler(struct work_struct *work)
+{
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+	rb_update_pages(cpu_buffer);
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1308,14 +1419,14 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
  *
  * Minimum size is 2 * BUF_PAGE_SIZE.
  *
- * Returns -1 on failure.
+ * Returns 0 on success and < 0 on failure.
  */
 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1330,50 +1441,95 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	/*
+	 * Don't succeed if recording is disabled globally, as a reader might
+	 * be manipulating the ring buffer and is expecting a sane state while
+	 * this is true.
+	 */
+	if (atomic_read(&buffer->record_disabled))
+		return -EBUSY;
 
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
+			if (atomic_read(&cpu_buffer->record_disabled)) {
+				err = -EBUSY;
+				goto out_err;
+			}
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
-
 			/*
 			 * nothing more to do for removing pages or no update
 			 */
 			if (cpu_buffer->nr_pages_to_update <= 0)
 				continue;
-
 			/*
 			 * to add pages, make sure all new pages can be
 			 * allocated without receiving ENOMEM
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
+		}
+
+		get_online_cpus();
+		/*
+		 * Fire off all the required work handlers
+		 * Look out for offline CPUs
+		 */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update ||
+			    !cpu_online(cpu))
+				continue;
+
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
+		}
+		/*
+		 * This loop is for the CPUs that are not online.
+		 * We can't schedule anything on them, but it's not necessary
+		 * since we can change their buffer sizes without any race.
+		 */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update ||
+			    cpu_online(cpu))
+				continue;
+
+			rb_update_pages(cpu_buffer);
 		}
 
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-			}
+			if (!cpu_buffer->nr_pages_to_update||
+			    !cpu_online(cpu))
+				continue;
+
+			wait_for_completion(&cpu_buffer->update_completion);
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
+
+		put_online_cpus();
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+		if (atomic_read(&cpu_buffer->record_disabled)) {
+			err = -EBUSY;
+			goto out_err;
+		}
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1383,38 +1539,47 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					    &cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
 
-		update_pages_handler(cpu_buffer);
+		get_online_cpus();
+
+		if (cpu_online(cpu_id)) {
+			schedule_work_on(cpu_id,
+					 &cpu_buffer->update_pages_work);
+			wait_for_completion(&cpu_buffer->update_completion);
+		} else
+			rb_update_pages(cpu_buffer);
+
+		put_online_cpus();
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
-		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1453,21 +1618,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7bb735b..401d77a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3057,20 +3057,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -3084,14 +3074,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 2/3] trace: Make addition of pages in ring buffer atomic
  2012-04-25 21:18           ` [PATCH v6 1/3] " Vaibhav Nagarnaik
@ 2012-04-25 21:18             ` Vaibhav Nagarnaik
  2012-04-25 21:18             ` [PATCH v6 3/3] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-04-25 21:18 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Laurent Chavey, Justin Teravest, David Sharp, linux-kernel,
	Vaibhav Nagarnaik

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.
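
As a usage-level sketch (not part of the patch; the /debug mount point and
the sizes are assumptions, matching the transcripts later in this thread),
the buffer can now be grown while writers are active:

  echo 1 > /debug/tracing/events/sched/enable   # keep writers active
  echo 20480 > /debug/tracing/buffer_size_kb    # grow every per-cpu buffer live
  cat /debug/tracing/buffer_size_kb             # -> 20480, existing events kept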

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog:
v6-v5:
* Rebased to latest

 kernel/trace/ring_buffer.c |  127 ++++++++++++++++++++++++++++++--------------
 1 files changed, 87 insertions(+), 40 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a966f9b..d4c458a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -480,6 +480,7 @@ struct ring_buffer {
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
+	atomic_t			resize_disabled;
 	cpumask_var_t			cpumask;
 
 	struct lock_class_key		*reader_lock_key;
@@ -1251,7 +1252,7 @@ static inline unsigned long rb_page_write(struct buffer_page *bpage)
 	return local_read(&bpage->write) & RB_WRITE_MASK;
 }
 
-static void
+static int
 rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
 	struct list_head *tail_page, *to_remove, *next_page;
@@ -1358,50 +1359,97 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 	} while (to_remove_page != last_page);
 
 	RB_WARN_ON(cpu_buffer, nr_removed);
+
+	return nr_removed == 0;
 }
 
-static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+static int
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
+	raw_spin_lock_irq(&cpu_buffer->reader_lock);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 10 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 10;
+	success = 0;
+	while (retries--) {
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page_with_bit;
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
 
-	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+		first_page = pages->next;
+		last_page  = pages->prev;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+		head_page_with_bit = (struct list_head *)
+				     ((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
-	atomic_dec(&cpu_buffer->record_disabled);
+
+	/* free pages if they weren't inserted */
+	if (!success) {
+		struct buffer_page *bpage, *tmp;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					 list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+	}
+	return success;
 }
 
 static void rb_update_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
+	int success;
+
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		success = rb_insert_pages(cpu_buffer);
 	else
-		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+		success = rb_remove_pages(cpu_buffer,
+					-cpu_buffer->nr_pages_to_update);
 
-	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	if (success)
+		cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
 }
 
 static void update_pages_handler(struct work_struct *work)
@@ -1444,11 +1492,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	/*
-	 * Don't succeed if recording is disabled globally, as a reader might
-	 * be manipulating the ring buffer and is expecting a sane state while
+	 * Don't succeed if resizing is disabled, as a reader might be
+	 * manipulating the ring buffer and is expecting a sane state while
 	 * this is true.
 	 */
-	if (atomic_read(&buffer->record_disabled))
+	if (atomic_read(&buffer->resize_disabled))
 		return -EBUSY;
 
 	/* prevent another thread from changing buffer sizes */
@@ -1459,10 +1507,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
 
-			if (atomic_read(&cpu_buffer->record_disabled)) {
-				err = -EBUSY;
-				goto out_err;
-			}
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
 			/*
@@ -1525,11 +1569,6 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		put_online_cpus();
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
-		if (atomic_read(&cpu_buffer->record_disabled)) {
-			err = -EBUSY;
-			goto out_err;
-		}
-
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -3647,6 +3686,7 @@ ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu)
 
 	iter->cpu_buffer = cpu_buffer;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
 	return iter;
@@ -3710,6 +3750,7 @@ ring_buffer_read_finish(struct ring_buffer_iter *iter)
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&cpu_buffer->buffer->resize_disabled);
 	kfree(iter);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_finish);
@@ -3781,6 +3822,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
@@ -3817,8 +3859,12 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
+	/* Make sure all commits have finished */
+	synchronize_sched();
+
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
 	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -3834,6 +3880,7 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&buffer->resize_disabled);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 3/3] trace: change CPU ring buffer state from tracing_cpumask
  2012-04-25 21:18           ` [PATCH v6 1/3] " Vaibhav Nagarnaik
  2012-04-25 21:18             ` [PATCH v6 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
@ 2012-04-25 21:18             ` Vaibhav Nagarnaik
  2012-05-03  1:55             ` [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
  2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
  3 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-04-25 21:18 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Laurent Chavey, Justin Teravest, David Sharp, linux-kernel,
	Vaibhav Nagarnaik

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables recording on each CPU as its corresponding bit
in tracing_cpumask is set/unset.
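
As an illustration of the intended effect (assuming a 4-CPU machine, run as
root, with debugfs mounted at /debug; the mask value is only an example):

  echo 3 > /debug/tracing/tracing_cpumask   # hex mask: keep CPU0 and CPU1 only
  # with this patch, recording into the per-CPU ring buffers of CPU2 and
  # CPU3 is disabled via ring_buffer_record_disable_cpu(), instead of the
  # ring buffers staying enabled while only the tracer skips those CPUs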

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/trace.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 401d77a..6d4c2dd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2650,10 +2650,12 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf,
 		if (cpumask_test_cpu(cpu, tracing_cpumask) &&
 				!cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_inc(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_disable_cpu(global_trace.buffer, cpu);
 		}
 		if (!cpumask_test_cpu(cpu, tracing_cpumask) &&
 				cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_dec(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_enable_cpu(global_trace.buffer, cpu);
 		}
 	}
 	arch_spin_unlock(&ftrace_max_lock);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [tip:perf/core] ring-buffer: Add per_cpu ring buffer control files
  2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
                           ` (3 preceding siblings ...)
  2012-03-08 23:51         ` [PATCH v5 1/4] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
@ 2012-05-02 21:03         ` tip-bot for Vaibhav Nagarnaik
  4 siblings, 0 replies; 80+ messages in thread
From: tip-bot for Vaibhav Nagarnaik @ 2012-05-02 21:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, vnagarnaik, hpa, mingo, dhsharp, fweisbec, mrubin,
	rostedt, tglx, teravest

Commit-ID:  438ced1720b584000a9e8a4349d1f6bb7ee3ad6d
Gitweb:     http://git.kernel.org/tip/438ced1720b584000a9e8a4349d1f6bb7ee3ad6d
Author:     Vaibhav Nagarnaik <vnagarnaik@google.com>
AuthorDate: Thu, 2 Feb 2012 12:00:41 -0800
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Mon, 23 Apr 2012 21:17:51 -0400

ring-buffer: Add per_cpu ring buffer control files

Add a debugfs entry under per_cpu/ folder for each cpu called
buffer_size_kb to control the ring buffer size for each CPU
independently.

If the global file buffer_size_kb is used to set the size, all the
individual ring buffers are adjusted to the given size. The buffer_size_kb
file then reports the common size to maintain backward compatibility.

If the buffer_size_kb file under the per_cpu/ directory is used to
change buffer size for a specific CPU, only the size of the respective
ring buffer is updated. When tracing/buffer_size_kb is read, it reports
'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
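
A short usage sketch (run as root; the /debug mount point and the sizes are
assumptions):

  cd /debug/tracing
  echo 4096 > per_cpu/cpu1/buffer_size_kb   # resize only CPU 1
  cat per_cpu/cpu1/buffer_size_kb           # -> 4096
  cat buffer_size_kb                        # -> X, if other CPUs differ in size
  echo 1408 > buffer_size_kb                # set a common size on all CPUs
  cat buffer_size_kb                        # -> 1408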

Link: http://lkml.kernel.org/r/1328212844-11889-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Justin Teravest <teravest@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ring_buffer.h |    6 +-
 kernel/trace/ring_buffer.c  |  248 ++++++++++++++++++++++++-------------------
 kernel/trace/trace.c        |  190 ++++++++++++++++++++++++++-------
 kernel/trace/trace.h        |    2 +-
 4 files changed, 297 insertions(+), 149 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 7be2e88..6c8835f 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -96,9 +96,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 	__ring_buffer_alloc((size), (flags), &__key);	\
 })
 
+#define RING_BUFFER_ALL_CPUS -1
+
 void ring_buffer_free(struct ring_buffer *buffer);
 
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size);
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
 
 void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
 
@@ -129,7 +131,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);
 void ring_buffer_iter_reset(struct ring_buffer_iter *iter);
 int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
 
-unsigned long ring_buffer_size(struct ring_buffer *buffer);
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu);
 
 void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_reset(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cf8d11e..2d5eb33 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -449,6 +449,7 @@ struct ring_buffer_per_cpu {
 	raw_spinlock_t			reader_lock;	/* serialize readers */
 	arch_spinlock_t			lock;
 	struct lock_class_key		lock_key;
+	unsigned int			nr_pages;
 	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
@@ -466,10 +467,12 @@ struct ring_buffer_per_cpu {
 	unsigned long			read_bytes;
 	u64				write_stamp;
 	u64				read_stamp;
+	/* ring buffer pages to update, > 0 to add, < 0 to remove */
+	int				nr_pages_to_update;
+	struct list_head		new_pages; /* new pages to add */
 };
 
 struct ring_buffer {
-	unsigned			pages;
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
@@ -963,14 +966,10 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	return 0;
 }
 
-static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
-			     unsigned nr_pages)
+static int __rb_allocate_pages(int nr_pages, struct list_head *pages, int cpu)
 {
+	int i;
 	struct buffer_page *bpage, *tmp;
-	LIST_HEAD(pages);
-	unsigned i;
-
-	WARN_ON(!nr_pages);
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -981,15 +980,13 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 				    GFP_KERNEL | __GFP_NORETRY,
-				    cpu_to_node(cpu_buffer->cpu));
+				    cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;
 
-		rb_check_bpage(cpu_buffer, bpage);
+		list_add(&bpage->list, pages);
 
-		list_add(&bpage->list, &pages);
-
-		page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu),
+		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
 		if (!page)
 			goto free_pages;
@@ -997,6 +994,27 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 		rb_init_page(bpage->page);
 	}
 
+	return 0;
+
+free_pages:
+	list_for_each_entry_safe(bpage, tmp, pages, list) {
+		list_del_init(&bpage->list);
+		free_buffer_page(bpage);
+	}
+
+	return -ENOMEM;
+}
+
+static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
+			     unsigned nr_pages)
+{
+	LIST_HEAD(pages);
+
+	WARN_ON(!nr_pages);
+
+	if (__rb_allocate_pages(nr_pages, &pages, cpu_buffer->cpu))
+		return -ENOMEM;
+
 	/*
 	 * The ring buffer page list is a circular list that does not
 	 * start and end with a list head. All page list items point to
@@ -1005,20 +1023,15 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	cpu_buffer->pages = pages.next;
 	list_del(&pages);
 
+	cpu_buffer->nr_pages = nr_pages;
+
 	rb_check_pages(cpu_buffer);
 
 	return 0;
-
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
-	}
-	return -ENOMEM;
 }
 
 static struct ring_buffer_per_cpu *
-rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
+rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
@@ -1052,7 +1065,7 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int cpu)
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
 
-	ret = rb_allocate_pages(cpu_buffer, buffer->pages);
+	ret = rb_allocate_pages(cpu_buffer, nr_pages);
 	if (ret < 0)
 		goto fail_free_reader;
 
@@ -1113,7 +1126,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 {
 	struct ring_buffer *buffer;
 	int bsize;
-	int cpu;
+	int cpu, nr_pages;
 
 	/* keep it in its own cache line */
 	buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
@@ -1124,14 +1137,14 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 	if (!alloc_cpumask_var(&buffer->cpumask, GFP_KERNEL))
 		goto fail_free_buffer;
 
-	buffer->pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
 
 	/* need at least two pages */
-	if (buffer->pages < 2)
-		buffer->pages = 2;
+	if (nr_pages < 2)
+		nr_pages = 2;
 
 	/*
 	 * In case of non-hotplug cpu, if the ring-buffer is allocated
@@ -1154,7 +1167,7 @@ struct ring_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags,
 
 	for_each_buffer_cpu(buffer, cpu) {
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu])
 			goto fail_free_buffers;
 	}
@@ -1276,6 +1289,18 @@ out:
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
+static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (cpu_buffer->nr_pages_to_update > 0)
+		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
+				cpu_buffer->nr_pages_to_update);
+	else
+		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	/* reset this value */
+	cpu_buffer->nr_pages_to_update = 0;
+}
+
 /**
  * ring_buffer_resize - resize the ring buffer
  * @buffer: the buffer to resize.
@@ -1285,14 +1310,12 @@ out:
  *
  * Returns -1 on failure.
  */
-int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
+int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
+			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned nr_pages, rm_pages, new_pages;
-	struct buffer_page *bpage, *tmp;
-	unsigned long buffer_size;
-	LIST_HEAD(pages);
-	int i, cpu;
+	unsigned nr_pages;
+	int cpu;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1302,15 +1325,11 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 	size *= BUF_PAGE_SIZE;
-	buffer_size = buffer->pages * BUF_PAGE_SIZE;
 
 	/* we need a minimum of two pages */
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	if (size == buffer_size)
-		return size;
-
 	atomic_inc(&buffer->record_disabled);
 
 	/* Make sure all writers are done with this buffer. */
@@ -1321,68 +1340,56 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	if (size < buffer_size) {
-
-		/* easy case, just free pages */
-		if (RB_WARN_ON(buffer, nr_pages >= buffer->pages))
-			goto out_fail;
-
-		rm_pages = buffer->pages - nr_pages;
-
+	if (cpu_id == RING_BUFFER_ALL_CPUS) {
+		/* calculate the pages to update */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			rb_remove_pages(cpu_buffer, rm_pages);
-		}
-		goto out;
-	}
 
-	/*
-	 * This is a bit more difficult. We only want to add pages
-	 * when we can allocate enough for all CPUs. We do this
-	 * by allocating all the pages and storing them on a local
-	 * link list. If we succeed in our allocation, then we
-	 * add these pages to the cpu_buffers. Otherwise we just free
-	 * them all and return -ENOMEM;
-	 */
-	if (RB_WARN_ON(buffer, nr_pages <= buffer->pages))
-		goto out_fail;
+			cpu_buffer->nr_pages_to_update = nr_pages -
+							cpu_buffer->nr_pages;
 
-	new_pages = nr_pages - buffer->pages;
+			/*
+			 * nothing more to do for removing pages or no update
+			 */
+			if (cpu_buffer->nr_pages_to_update <= 0)
+				continue;
 
-	for_each_buffer_cpu(buffer, cpu) {
-		for (i = 0; i < new_pages; i++) {
-			struct page *page;
 			/*
-			 * __GFP_NORETRY flag makes sure that the allocation
-			 * fails gracefully without invoking oom-killer and
-			 * the system is not destabilized.
+			 * to add pages, make sure all new pages can be
+			 * allocated without receiving ENOMEM
 			 */
-			bpage = kzalloc_node(ALIGN(sizeof(*bpage),
-						  cache_line_size()),
-					    GFP_KERNEL | __GFP_NORETRY,
-					    cpu_to_node(cpu));
-			if (!bpage)
-				goto free_pages;
-			list_add(&bpage->list, &pages);
-			page = alloc_pages_node(cpu_to_node(cpu),
-						GFP_KERNEL | __GFP_NORETRY, 0);
-			if (!page)
-				goto free_pages;
-			bpage->page = page_address(page);
-			rb_init_page(bpage->page);
+			INIT_LIST_HEAD(&cpu_buffer->new_pages);
+			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu))
+				/* not enough memory for new pages */
+				goto no_mem;
 		}
-	}
 
-	for_each_buffer_cpu(buffer, cpu) {
-		cpu_buffer = buffer->buffers[cpu];
-		rb_insert_pages(cpu_buffer, &pages, new_pages);
-	}
+		/* wait for all the updates to complete */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (cpu_buffer->nr_pages_to_update) {
+				update_pages_handler(cpu_buffer);
+			}
+		}
+	} else {
+		cpu_buffer = buffer->buffers[cpu_id];
+		if (nr_pages == cpu_buffer->nr_pages)
+			goto out;
 
-	if (RB_WARN_ON(buffer, !list_empty(&pages)))
-		goto out_fail;
+		cpu_buffer->nr_pages_to_update = nr_pages -
+						cpu_buffer->nr_pages;
+
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (cpu_buffer->nr_pages_to_update > 0 &&
+			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
+						&cpu_buffer->new_pages, cpu_id))
+			goto no_mem;
+
+		update_pages_handler(cpu_buffer);
+	}
 
  out:
-	buffer->pages = nr_pages;
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 
@@ -1390,25 +1397,24 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
 
 	return size;
 
- free_pages:
-	list_for_each_entry_safe(bpage, tmp, &pages, list) {
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+ no_mem:
+	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_page *bpage, *tmp;
+		cpu_buffer = buffer->buffers[cpu];
+		/* reset this number regardless */
+		cpu_buffer->nr_pages_to_update = 0;
+		if (list_empty(&cpu_buffer->new_pages))
+			continue;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
 	atomic_dec(&buffer->record_disabled);
 	return -ENOMEM;
-
-	/*
-	 * Something went totally wrong, and we are too paranoid
-	 * to even clean up the mess.
-	 */
- out_fail:
-	put_online_cpus();
-	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -1;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1510,7 +1516,7 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
 	 * assign the commit to the tail.
 	 */
  again:
-	max_count = cpu_buffer->buffer->pages * 100;
+	max_count = cpu_buffer->nr_pages * 100;
 
 	while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
 		if (RB_WARN_ON(cpu_buffer, !(--max_count)))
@@ -3588,9 +3594,18 @@ EXPORT_SYMBOL_GPL(ring_buffer_read);
  * ring_buffer_size - return the size of the ring buffer (in bytes)
  * @buffer: The ring buffer.
  */
-unsigned long ring_buffer_size(struct ring_buffer *buffer)
+unsigned long ring_buffer_size(struct ring_buffer *buffer, int cpu)
 {
-	return BUF_PAGE_SIZE * buffer->pages;
+	/*
+	 * Earlier, this method returned
+	 *	BUF_PAGE_SIZE * buffer->nr_pages
+	 * Since the nr_pages field is now removed, we have converted this to
+	 * return the per cpu buffer value.
+	 */
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	return BUF_PAGE_SIZE * buffer->buffers[cpu]->nr_pages;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_size);
 
@@ -3765,8 +3780,11 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	    !cpumask_test_cpu(cpu, buffer_b->cpumask))
 		goto out;
 
+	cpu_buffer_a = buffer_a->buffers[cpu];
+	cpu_buffer_b = buffer_b->buffers[cpu];
+
 	/* At least make sure the two buffers are somewhat the same */
-	if (buffer_a->pages != buffer_b->pages)
+	if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages)
 		goto out;
 
 	ret = -EAGAIN;
@@ -3780,9 +3798,6 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
 	if (atomic_read(&buffer_b->record_disabled))
 		goto out;
 
-	cpu_buffer_a = buffer_a->buffers[cpu];
-	cpu_buffer_b = buffer_b->buffers[cpu];
-
 	if (atomic_read(&cpu_buffer_a->record_disabled))
 		goto out;
 
@@ -4071,6 +4086,8 @@ static int rb_cpu_notify(struct notifier_block *self,
 	struct ring_buffer *buffer =
 		container_of(self, struct ring_buffer, cpu_notify);
 	long cpu = (long)hcpu;
+	int cpu_i, nr_pages_same;
+	unsigned int nr_pages;
 
 	switch (action) {
 	case CPU_UP_PREPARE:
@@ -4078,8 +4095,23 @@ static int rb_cpu_notify(struct notifier_block *self,
 		if (cpumask_test_cpu(cpu, buffer->cpumask))
 			return NOTIFY_OK;
 
+		nr_pages = 0;
+		nr_pages_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_buffer_cpu(buffer, cpu_i) {
+			/* fill in the size from first enabled cpu */
+			if (nr_pages == 0)
+				nr_pages = buffer->buffers[cpu_i]->nr_pages;
+			if (nr_pages != buffer->buffers[cpu_i]->nr_pages) {
+				nr_pages_same = 0;
+				break;
+			}
+		}
+		/* allocate minimum pages, user can later expand it */
+		if (!nr_pages_same)
+			nr_pages = 2;
 		buffer->buffers[cpu] =
-			rb_allocate_cpu_buffer(buffer, cpu);
+			rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
 		if (!buffer->buffers[cpu]) {
 			WARN(1, "failed to allocate ring buffer on CPU %ld\n",
 			     cpu);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bbcde54..f11a285 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -838,7 +838,8 @@ __acquires(kernel_lock)
 
 		/* If we expanded the buffers, make sure the max is expanded too */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, trace_buf_size);
+			ring_buffer_resize(max_tr.buffer, trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 
 		/* the test is responsible for initializing and enabling */
 		pr_info("Testing tracer %s: ", type->name);
@@ -854,7 +855,8 @@ __acquires(kernel_lock)
 
 		/* Shrink the max buffer again */
 		if (ring_buffer_expanded && type->use_max_tr)
-			ring_buffer_resize(max_tr.buffer, 1);
+			ring_buffer_resize(max_tr.buffer, 1,
+						RING_BUFFER_ALL_CPUS);
 
 		printk(KERN_CONT "PASSED\n");
 	}
@@ -3053,7 +3055,14 @@ int tracer_init(struct tracer *t, struct trace_array *tr)
 	return t->init(tr);
 }
 
-static int __tracing_resize_ring_buffer(unsigned long size)
+static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+{
+	int cpu;
+	for_each_tracing_cpu(cpu)
+		tr->data[cpu]->entries = val;
+}
+
+static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 {
 	int ret;
 
@@ -3064,19 +3073,32 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 	 */
 	ring_buffer_expanded = 1;
 
-	ret = ring_buffer_resize(global_trace.buffer, size);
+	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
 	if (!current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size);
+	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
 	if (ret < 0) {
-		int r;
+		int r = 0;
+
+		if (cpu == RING_BUFFER_ALL_CPUS) {
+			int i;
+			for_each_tracing_cpu(i) {
+				r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[i]->entries,
+						i);
+				if (r < 0)
+					break;
+			}
+		} else {
+			r = ring_buffer_resize(global_trace.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+		}
 
-		r = ring_buffer_resize(global_trace.buffer,
-				       global_trace.entries);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
@@ -3098,14 +3120,21 @@ static int __tracing_resize_ring_buffer(unsigned long size)
 		return ret;
 	}
 
-	max_tr.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&max_tr, size);
+	else
+		max_tr.data[cpu]->entries = size;
+
  out:
-	global_trace.entries = size;
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		set_buffer_entries(&global_trace, size);
+	else
+		global_trace.data[cpu]->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size)
+static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
 	int cpu, ret = size;
 
@@ -3121,12 +3150,19 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size)
 			atomic_inc(&max_tr.data[cpu]->disabled);
 	}
 
-	if (size != global_trace.entries)
-		ret = __tracing_resize_ring_buffer(size);
+	if (cpu_id != RING_BUFFER_ALL_CPUS) {
+		/* make sure, this cpu is enabled in the mask */
+		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
 
+	ret = __tracing_resize_ring_buffer(size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
+out:
 	for_each_tracing_cpu(cpu) {
 		if (global_trace.data[cpu])
 			atomic_dec(&global_trace.data[cpu]->disabled);
@@ -3157,7 +3193,8 @@ int tracing_update_buffers(void)
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
@@ -3181,7 +3218,8 @@ static int tracing_set_tracer(const char *buf)
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size);
+		ret = __tracing_resize_ring_buffer(trace_buf_size,
+						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
 		ret = 0;
@@ -3207,8 +3245,8 @@ static int tracing_set_tracer(const char *buf)
 		 * The max_tr ring buffer has some state (e.g. ring->clock) and
 		 * we want preserve it.
 		 */
-		ring_buffer_resize(max_tr.buffer, 1);
-		max_tr.entries = 1;
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
 	}
 	destroy_trace_option_files(topts);
 
@@ -3216,10 +3254,17 @@ static int tracing_set_tracer(const char *buf)
 
 	topts = create_trace_option_files(current_trace);
 	if (current_trace->use_max_tr) {
-		ret = ring_buffer_resize(max_tr.buffer, global_trace.entries);
-		if (ret < 0)
-			goto out;
-		max_tr.entries = global_trace.entries;
+		int cpu;
+		/* we need to make per cpu buffer sizes equivalent */
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				goto out;
+			max_tr.data[cpu]->entries =
+					global_trace.data[cpu]->entries;
+		}
 	}
 
 	if (t->init) {
@@ -3721,30 +3766,82 @@ out_err:
 	goto out;
 }
 
+struct ftrace_entries_info {
+	struct trace_array	*tr;
+	int			cpu;
+};
+
+static int tracing_entries_open(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info;
+
+	if (tracing_disabled)
+		return -ENODEV;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->tr = &global_trace;
+	info->cpu = (unsigned long)inode->i_private;
+
+	filp->private_data = info;
+
+	return 0;
+}
+
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct trace_array *tr = filp->private_data;
-	char buf[96];
-	int r;
+	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_array *tr = info->tr;
+	char buf[64];
+	int r = 0;
+	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
-	if (!ring_buffer_expanded)
-		r = sprintf(buf, "%lu (expanded: %lu)\n",
-			    tr->entries >> 10,
-			    trace_buf_size >> 10);
-	else
-		r = sprintf(buf, "%lu\n", tr->entries >> 10);
+
+	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+		int cpu, buf_size_same;
+		unsigned long size;
+
+		size = 0;
+		buf_size_same = 1;
+		/* check if all cpu sizes are same */
+		for_each_tracing_cpu(cpu) {
+			/* fill in the size from first enabled cpu */
+			if (size == 0)
+				size = tr->data[cpu]->entries;
+			if (size != tr->data[cpu]->entries) {
+				buf_size_same = 0;
+				break;
+			}
+		}
+
+		if (buf_size_same) {
+			if (!ring_buffer_expanded)
+				r = sprintf(buf, "%lu (expanded: %lu)\n",
+					    size >> 10,
+					    trace_buf_size >> 10);
+			else
+				r = sprintf(buf, "%lu\n", size >> 10);
+		} else
+			r = sprintf(buf, "X\n");
+	} else
+		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+
 	mutex_unlock(&trace_types_lock);
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+	return ret;
 }
 
 static ssize_t
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
+	struct ftrace_entries_info *info = filp->private_data;
 	unsigned long val;
 	int ret;
 
@@ -3759,7 +3856,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val);
+	ret = tracing_resize_ring_buffer(val, info->cpu);
 	if (ret < 0)
 		return ret;
 
@@ -3768,6 +3865,16 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static int
+tracing_entries_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_entries_info *info = filp->private_data;
+
+	kfree(info);
+
+	return 0;
+}
+
 static ssize_t
 tracing_total_entries_read(struct file *filp, char __user *ubuf,
 				size_t cnt, loff_t *ppos)
@@ -3779,7 +3886,7 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf,
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
-		size += tr->entries >> 10;
+		size += tr->data[cpu]->entries >> 10;
 		if (!ring_buffer_expanded)
 			expanded_size += trace_buf_size >> 10;
 	}
@@ -3813,7 +3920,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_STOP_ON_FREE)
 		tracing_off();
 	/* resize the ring buffer to 0 */
-	tracing_resize_ring_buffer(0);
+	tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS);
 
 	return 0;
 }
@@ -4012,9 +4119,10 @@ static const struct file_operations tracing_pipe_fops = {
 };
 
 static const struct file_operations tracing_entries_fops = {
-	.open		= tracing_open_generic,
+	.open		= tracing_entries_open,
 	.read		= tracing_entries_read,
 	.write		= tracing_entries_write,
+	.release	= tracing_entries_release,
 	.llseek		= generic_file_llseek,
 };
 
@@ -4466,6 +4574,9 @@ static void tracing_init_debugfs_percpu(long cpu)
 
 	trace_create_file("stats", 0444, d_cpu,
 			(void *) cpu, &tracing_stats_fops);
+
+	trace_create_file("buffer_size_kb", 0444, d_cpu,
+			(void *) cpu, &tracing_entries_fops);
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
@@ -4795,7 +4906,7 @@ static __init int tracer_init_debugfs(void)
 			(void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops);
 
 	trace_create_file("buffer_size_kb", 0644, d_tracer,
-			&global_trace, &tracing_entries_fops);
+			(void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops);
 
 	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
 			&global_trace, &tracing_total_entries_fops);
@@ -5056,7 +5167,6 @@ __init static int tracer_alloc_buffers(void)
 		WARN_ON(1);
 		goto out_free_cpumask;
 	}
-	global_trace.entries = ring_buffer_size(global_trace.buffer);
 	if (global_trace.buffer_disabled)
 		tracing_off();
 
@@ -5069,7 +5179,6 @@ __init static int tracer_alloc_buffers(void)
 		ring_buffer_free(global_trace.buffer);
 		goto out_free_cpumask;
 	}
-	max_tr.entries = 1;
 #endif
 
 	/* Allocate the first page for all buffers */
@@ -5078,6 +5187,11 @@ __init static int tracer_alloc_buffers(void)
 		max_tr.data[i] = &per_cpu(max_tr_data, i);
 	}
 
+	set_buffer_entries(&global_trace, ring_buf_size);
+#ifdef CONFIG_TRACER_MAX_TRACE
+	set_buffer_entries(&max_tr, 1);
+#endif
+
 	trace_init_cmdlines();
 
 	register_tracer(&nop_trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f9d8550..1c8b7c6 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -131,6 +131,7 @@ struct trace_array_cpu {
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
 
+	unsigned long		entries;
 	unsigned long		saved_latency;
 	unsigned long		critical_start;
 	unsigned long		critical_end;
@@ -152,7 +153,6 @@ struct trace_array_cpu {
  */
 struct trace_array {
 	struct ring_buffer	*buffer;
-	unsigned long		entries;
 	int			cpu;
 	int			buffer_disabled;
 	cycle_t			time_start;

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-04-25 21:18           ` [PATCH v6 1/3] " Vaibhav Nagarnaik
  2012-04-25 21:18             ` [PATCH v6 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
  2012-04-25 21:18             ` [PATCH v6 3/3] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
@ 2012-05-03  1:55             ` Steven Rostedt
  2012-05-03  6:40               ` Vaibhav Nagarnaik
  2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
  3 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-05-03  1:55 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Wed, 2012-04-25 at 14:18 -0700, Vaibhav Nagarnaik wrote:
> This patch adds the capability to remove pages from a ring buffer
> without destroying any existing data in it.
> 
> This is done by removing the pages after the tail page. This makes sure
> that first all the empty pages in the ring buffer are removed. If the
> head page is one in the list of pages to be removed, then the page after
> the removed ones is made the head page. This removes the oldest data
> from the ring buffer and keeps the latest data around to be read.
> 
> To do this in a non-racey manner, tracing is stopped for a very short
> time while the pages to be removed are identified and unlinked from the
> ring buffer. The pages are freed after the tracing is restarted to
> minimize the time needed to stop tracing.
> 
> The context in which the pages from the per-cpu ring buffer are removed
> runs on the respective CPU. This minimizes the events not traced to only
> NMI trace contexts.
> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>

Hmm, something in this patch breaks buffers_size_kb and friends.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-03  1:55             ` [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
@ 2012-05-03  6:40               ` Vaibhav Nagarnaik
  2012-05-03 12:57                 ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-03  6:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Wed, May 2, 2012 at 6:55 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> Hmm, something in this patch breaks buffers_size_kb and friends.

I checked and I guess you are referring to the initial state of the
buffer_size_kb, which returns:
0 (expanded: 1408)
instead of:
7 (expanded: 1408)

I found this got in with the earlier patch which added per-cpu
buffer_size_kb. I will send a small fix-up patch for it.

After expanding the ring buffer to various sizes, I couldn't find any
other breakage. Is there any other behavior that you saw as odd?



Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-03  6:40               ` Vaibhav Nagarnaik
@ 2012-05-03 12:57                 ` Steven Rostedt
  2012-05-03 14:12                   ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-05-03 12:57 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Wed, 2012-05-02 at 23:40 -0700, Vaibhav Nagarnaik wrote:
> On Wed, May 2, 2012 at 6:55 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > Hmm, something in this patch breaks buffers_size_kb and friends.
> 
> I checked and I guess you are referring to the initial state of the
> buffer_size_kb, which returns:
> 0 (expanded: 1408)
> instead of:
> 7 (expanded: 1408)
> 

No, I realized that. That changed with your other patch.

> I found this got in with the earlier patch which added per-cpu
> buffer_size_kb. I will send a small fix-up patch for it.

Yeah, probably should send a fix for that.

> 
> After expanding the ring buffer to various sizes, I couldn't find any
> other breakage. Is there any other behavior that you saw as odd?

The issue I see seems to trigger with trace-cmd:

[root@ixf ~]# cat /debug/tracing/buffer_size_kb 
0 (expanded: 1408)
[root@ixf ~]# trace-cmd start -e sched
/debug/tracing/events/sched/filter
/debug/tracing/events/*/sched/filter
[root@ixf ~]# cat /debug/tracing/buffer_size_kb 
0

But if I enable it via the command line it works:

[root@ixf ~]# cat /debug/tracing/buffer_size_kb 
0 (expanded: 1408)
[root@ixf ~]# echo 1 > /debug/tracing/events/sched/enable 
[root@ixf ~]# cat /debug/tracing/buffer_size_kb 
1408


Without your patch:

[root@ixf ~]# cat /debug/tracing/buffer_size_kb 
0 (expanded: 1408)
[root@ixf ~]# trace-cmd start -e sched
/debug/tracing/events/sched/filter
/debug/tracing/events/*/sched/filter
[root@ixf ~]# cat /debug/tracing/buffer_size_kb 
1408


So it seems to be trace-cmd doing something different that prevents the
expand from happening. Not sure what it is. If I get time, I'll
investigate it a little more.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-03 12:57                 ` Steven Rostedt
@ 2012-05-03 14:12                   ` Steven Rostedt
  2012-05-03 18:43                     ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-05-03 14:12 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Thu, 2012-05-03 at 08:57 -0400, Steven Rostedt wrote:

> So it seems to be trace-cmd doing something different that prevents the
> expand from happening. Not sure what it is. If I get time, I'll
> investigate it a little more.

Do the following:

echo 0 > /debug/tracing/tracing_on
echo nop  > /debug/tracing/current_tracer

then look at buffer_size_kb.

It's also giving me errors when I try to enable events or anything else
when tracing_on is disabled.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-03 14:12                   ` Steven Rostedt
@ 2012-05-03 18:43                     ` Vaibhav Nagarnaik
  2012-05-03 18:54                       ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-03 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Thu, May 3, 2012 at 7:12 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> It's also giving me errors when I try to enable events or anything else
> when tracing_on is disabled.

Yes, I had to add a check to see if resizing is safe. I re-used
record_disabled in this patch for that purpose.

In the next patch for "atomic addition of pages" I added a new field
resize_disabled to guard against unsafe resizing. So with that patch,
you won't see this behavior. It also fixes the issue with trace-cmd
where the ring buffer doesn't get expanded.



Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-03 18:43                     ` Vaibhav Nagarnaik
@ 2012-05-03 18:54                       ` Steven Rostedt
  2012-05-03 18:54                         ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-05-03 18:54 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Thu, 2012-05-03 at 11:43 -0700, Vaibhav Nagarnaik wrote:
> On Thu, May 3, 2012 at 7:12 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > It's also giving me errors when I try to enable events or anything else
> > when tracing_on is disabled.
> 
> Yes, I had to add a check to see if resizing is safe. I re-used
> record_disabled in this patch for that purpose.
> 
> In the next patch for "atomic addition of pages" I added a new field
> resize_disabled to guard against unsafe resizing. So with that patch,
> you won't see this behavior. It also fixes the issue with trace-cmd
> where the ring buffer doesn't get expanded.

Would it be possible to put that resize_disabled into the first patch so
we don't have this bug popping up in a git bisect?

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-03 18:54                       ` Steven Rostedt
@ 2012-05-03 18:54                         ` Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-03 18:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Thu, May 3, 2012 at 11:54 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> Would it be possible to put that resize_disabled into the first patch so
> we don't have this bug popping up in a git bisect?
>


Will do.


Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-04-25 21:18           ` [PATCH v6 1/3] " Vaibhav Nagarnaik
                               ` (2 preceding siblings ...)
  2012-05-03  1:55             ` [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
@ 2012-05-04  1:59             ` Vaibhav Nagarnaik
  2012-05-04  1:59               ` [PATCH v7 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
                                 ` (4 more replies)
  3 siblings, 5 replies; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-04  1:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Laurent Chavey, Justin Teravest, David Sharp, linux-kernel,
	Vaibhav Nagarnaik

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing pages after the tail page, which ensures that the
empty pages in the ring buffer are removed first. If the head page is among
the pages to be removed, the page following the removed ones becomes the
new head page. This drops the oldest data from the ring buffer while
keeping the latest data around to be read.

To do this in a non-racy manner, tracing is stopped for a very short time
while the pages to be removed are identified and unlinked from the ring
buffer. The pages are freed after tracing is restarted, to minimize the
time tracing has to be stopped.

The removal of pages from a per-cpu ring buffer runs in a context on the
respective CPU, which limits the events that go untraced to NMI contexts
only.
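
For illustration only (the /debug mount point and the size are assumptions,
not taken from the patch), the buffer can now be shrunk while a trace is
running, and the newest data survives the shrink:

  echo 1 > /debug/tracing/events/sched/enable
  echo 128 > /debug/tracing/buffer_size_kb   # shrink live, oldest pages freed
  cat /debug/tracing/trace | head            # most recent events still readable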

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
Changelog v7-v6:
* Fix error with resizing when tracing_on is set to '0'
* Fix error with resizing by trace-cmd

 kernel/trace/ring_buffer.c |  265 ++++++++++++++++++++++++++++++++++----------
 kernel/trace/trace.c       |   20 +---
 2 files changed, 209 insertions(+), 76 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2d5eb33..230ae9d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -470,12 +472,15 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
+	atomic_t			resize_disabled;
 	cpumask_var_t			cpumask;
 
 	struct lock_class_key		*reader_lock_key;
@@ -1048,6 +1053,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	raw_spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1235,32 +1242,123 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *tail_page, *to_remove, *next_page;
+	struct buffer_page *to_remove_page, *tmp_iter_page;
+	struct buffer_page *last_page, *first_page;
+	unsigned int nr_removed;
+	unsigned long head_bit;
+	int page_entries;
+
+	head_bit = 0;
 
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+	atomic_inc(&cpu_buffer->record_disabled);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock. We also don't race with writers after disabling recording.
+	 * This makes it easy to figure out the first and the last page to be
+	 * removed from the list. We unlink all the pages in between including
+	 * the first and last pages. This is done in a busy loop so that we
+	 * lose the least number of traces.
+	 * The pages are freed after we restart recording and unlock readers.
+	 */
+	tail_page = &cpu_buffer->tail_page->list;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+	/*
+	 * tail page might be on reader page, we remove the next page
+	 * from the ring buffer
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		tail_page = rb_list_head(tail_page->next);
+	to_remove = tail_page;
+
+	/* start of pages to remove */
+	first_page = list_entry(rb_list_head(to_remove->next),
+				struct buffer_page, list);
+
+	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
+		to_remove = rb_list_head(to_remove)->next;
+		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
 	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
 
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
+	next_page = rb_list_head(to_remove)->next;
 
-out:
+	/*
+	 * Now we remove all pages between tail_page and next_page.
+	 * Make sure that we have head_bit value preserved for the
+	 * next page
+	 */
+	tail_page->next = (struct list_head *)((unsigned long)next_page |
+						head_bit);
+	next_page = rb_list_head(next_page);
+	next_page->prev = tail_page;
+
+	/* make sure pages points to a valid page in the ring buffer */
+	cpu_buffer->pages = next_page;
+
+	/* update head page */
+	if (head_bit)
+		cpu_buffer->head_page = list_entry(next_page,
+						struct buffer_page, list);
+
+	/*
+	 * change read pointer to make sure any read iterators reset
+	 * themselves
+	 */
+	cpu_buffer->read = 0;
+
+	/* pages are removed, resume tracing and then free the pages */
+	atomic_dec(&cpu_buffer->record_disabled);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
+
+	/* last buffer page to remove */
+	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
+				list);
+	tmp_iter_page = first_page;
+
+	do {
+		to_remove_page = tmp_iter_page;
+		rb_inc_page(cpu_buffer, &tmp_iter_page);
+
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		free_buffer_page(to_remove_page);
+		nr_removed--;
+
+	} while (to_remove_page != last_page);
+
+	RB_WARN_ON(cpu_buffer, nr_removed);
 }
 
 static void
@@ -1272,6 +1370,8 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	unsigned i;
 
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
 	rb_head_page_deactivate(cpu_buffer);
 
 	for (i = 0; i < nr_pages; i++) {
@@ -1286,19 +1386,27 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	rb_check_pages(cpu_buffer);
 
 out:
+	atomic_dec(&cpu_buffer->record_disabled);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void rb_update_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+
 	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+}
+
+static void update_pages_handler(struct work_struct *work)
+{
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+	rb_update_pages(cpu_buffer);
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1308,14 +1416,14 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
  *
  * Minimum size is 2 * BUF_PAGE_SIZE.
  *
- * Returns -1 on failure.
+ * Returns 0 on success and < 0 on failure.
  */
 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1330,15 +1438,18 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	/*
+	 * Don't succeed if resizing is disabled, as a reader might be
+	 * manipulating the ring buffer and is expecting a sane state while
+	 * this is true.
+	 */
+	if (atomic_read(&buffer->resize_disabled))
+		return -EBUSY;
 
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
@@ -1347,33 +1458,67 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
-
 			/*
 			 * nothing more to do for removing pages or no update
 			 */
 			if (cpu_buffer->nr_pages_to_update <= 0)
 				continue;
-
 			/*
 			 * to add pages, make sure all new pages can be
 			 * allocated without receiving ENOMEM
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
+		}
+
+		get_online_cpus();
+		/*
+		 * Fire off all the required work handlers
+		 * Look out for offline CPUs
+		 */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update ||
+			    !cpu_online(cpu))
+				continue;
+
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
+		}
+		/*
+		 * This loop is for the CPUs that are not online.
+		 * We can't schedule anything on them, but it's not necessary
+		 * since we can change their buffer sizes without any race.
+		 */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update ||
+			    cpu_online(cpu))
+				continue;
+
+			rb_update_pages(cpu_buffer);
 		}
 
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-			}
+			if (!cpu_buffer->nr_pages_to_update||
+			    !cpu_online(cpu))
+				continue;
+
+			wait_for_completion(&cpu_buffer->update_completion);
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
+
+		put_online_cpus();
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1383,38 +1528,47 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					    &cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
 
-		update_pages_handler(cpu_buffer);
+		get_online_cpus();
+
+		if (cpu_online(cpu_id)) {
+			schedule_work_on(cpu_id,
+					 &cpu_buffer->update_pages_work);
+			wait_for_completion(&cpu_buffer->update_completion);
+		} else
+			rb_update_pages(cpu_buffer);
+
+		put_online_cpus();
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
-		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1453,21 +1607,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
@@ -3492,6 +3636,7 @@ ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu)
 
 	iter->cpu_buffer = cpu_buffer;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
 	return iter;
@@ -3555,6 +3700,7 @@ ring_buffer_read_finish(struct ring_buffer_iter *iter)
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&cpu_buffer->buffer->resize_disabled);
 	kfree(iter);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_finish);
@@ -3662,8 +3808,12 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
+	/* Make sure all commits have finished */
+	synchronize_sched();
+
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
 	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -3679,6 +3829,7 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&buffer->resize_disabled);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7bb735b..401d77a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3057,20 +3057,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -3084,14 +3074,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v7 2/3] trace: Make addition of pages in ring buffer atomic
  2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
@ 2012-05-04  1:59               ` Vaibhav Nagarnaik
  2012-05-19 10:18                 ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
  2012-05-04  1:59               ` [PATCH v7 3/3] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
                                 ` (3 subsequent siblings)
  4 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-04  1:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Laurent Chavey, Justin Teravest, David Sharp, linux-kernel,
	Vaibhav Nagarnaik

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/ring_buffer.c |  102 +++++++++++++++++++++++++++++++++-----------
 1 files changed, 77 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 230ae9d..a084b4c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1252,7 +1252,7 @@ static inline unsigned long rb_page_write(struct buffer_page *bpage)
 	return local_read(&bpage->write) & RB_WRITE_MASK;
 }
 
-static void
+static int
 rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
 	struct list_head *tail_page, *to_remove, *next_page;
@@ -1359,46 +1359,97 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 	} while (to_remove_page != last_page);
 
 	RB_WARN_ON(cpu_buffer, nr_removed);
+
+	return nr_removed == 0;
 }
 
-static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+static int
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
-	rb_head_page_deactivate(cpu_buffer);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 10 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 10;
+	success = 0;
+	while (retries--) {
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page_with_bit;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
+
+		first_page = pages->next;
+		last_page  = pages->prev;
+
+		head_page_with_bit = (struct list_head *)
+				     ((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
-	atomic_dec(&cpu_buffer->record_disabled);
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	/* free pages if they weren't inserted */
+	if (!success) {
+		struct buffer_page *bpage, *tmp;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					 list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+	}
+	return success;
 }
 
 static void rb_update_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
+	int success;
+
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		success = rb_insert_pages(cpu_buffer);
 	else
-		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+		success = rb_remove_pages(cpu_buffer,
+					-cpu_buffer->nr_pages_to_update);
 
-	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	if (success)
+		cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
 }
 
 static void update_pages_handler(struct work_struct *work)
@@ -3772,6 +3823,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v7 3/3] trace: change CPU ring buffer state from tracing_cpumask
  2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
  2012-05-04  1:59               ` [PATCH v7 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
@ 2012-05-04  1:59               ` Vaibhav Nagarnaik
  2012-05-19 10:21                 ` [tip:perf/core] tracing: " tip-bot for Vaibhav Nagarnaik
  2012-05-07 20:22               ` [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-04  1:59 UTC (permalink / raw)
  To: Steven Rostedt, Frederic Weisbecker, Ingo Molnar
  Cc: Laurent Chavey, Justin Teravest, David Sharp, linux-kernel,
	Vaibhav Nagarnaik

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables CPU recording as its corresponding bit in
tracing_cpumask is set/unset.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
---
 kernel/trace/trace.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 401d77a..6d4c2dd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2650,10 +2650,12 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf,
 		if (cpumask_test_cpu(cpu, tracing_cpumask) &&
 				!cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_inc(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_disable_cpu(global_trace.buffer, cpu);
 		}
 		if (!cpumask_test_cpu(cpu, tracing_cpumask) &&
 				cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_dec(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_enable_cpu(global_trace.buffer, cpu);
 		}
 	}
 	arch_spin_unlock(&ftrace_max_lock);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
  2012-05-04  1:59               ` [PATCH v7 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
  2012-05-04  1:59               ` [PATCH v7 3/3] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
@ 2012-05-07 20:22               ` Steven Rostedt
  2012-05-07 21:48                 ` Vaibhav Nagarnaik
  2012-05-09  3:38               ` Steven Rostedt
  2012-05-19 10:17               ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
  4 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-05-07 20:22 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Thu, 2012-05-03 at 18:59 -0700, Vaibhav Nagarnaik wrote:
> This patch adds the capability to remove pages from a ring buffer
> without destroying any existing data in it.
> 
> This is done by removing the pages after the tail page. This makes sure
> that first all the empty pages in the ring buffer are removed. If the
> head page is one in the list of pages to be removed, then the page after
> the removed ones is made the head page. This removes the oldest data
> from the ring buffer and keeps the latest data around to be read.
> 
> To do this in a non-racey manner, tracing is stopped for a very short
> time while the pages to be removed are identified and unlinked from the
> ring buffer. The pages are freed after the tracing is restarted to
> minimize the time needed to stop tracing.
> 
> The context in which the pages from the per-cpu ring buffer are removed
> runs on the respective CPU. This minimizes the events not traced to only
> NMI trace contexts.
> 
> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>

After applying this patch, I get this:

# trace-cmd start -e all
# echo 100 > /debug/tracing/buffer_size_kb

BUG: scheduling while atomic: trace-cmd/2018/0x00000002
no locks held by trace-cmd/2018.
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 kvm uinput snd_hda_codec_idt
 snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 shpchp microcode pata_acpi firewire_ohci firewire_core crc_itu
_t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 2018, comm: trace-cmd Not tainted 3.4.0-rc2-test+ #2
Call Trace:
 [<ffffffff81480d52>] __schedule_bug+0x66/0x6a
 [<ffffffff81487d16>] __schedule+0x93/0x605
 [<ffffffff8107a045>] ? __lock_acquire+0x4dc/0xcf1
 [<ffffffff8148833b>] schedule+0x64/0x66
 [<ffffffff81486b28>] schedule_timeout+0x37/0xf7
 [<ffffffff81489386>] ? _raw_spin_unlock_irq+0x2d/0x5e
 [<ffffffff8107b117>] ? trace_hardirqs_on_caller+0x121/0x158
 [<ffffffff81487b73>] wait_for_common+0x97/0xf1
 [<ffffffff8105c922>] ? try_to_wake_up+0x1ec/0x1ec
 [<ffffffff810a4d29>] ? call_rcu_bh+0x19/0x19
 [<ffffffff810b4845>] ? tracing_iter_reset+0x8b/0x8b
 [<ffffffff81487c81>] wait_for_completion+0x1d/0x1f
 [<ffffffff8104ba3d>] wait_rcu_gp+0x5c/0x77
 [<ffffffff8104ba58>] ? wait_rcu_gp+0x77/0x77
 [<ffffffff810a39eb>] synchronize_sched+0x25/0x27
 [<ffffffff810ae27f>] ring_buffer_reset_cpu+0x4e/0xd1
 [<ffffffff810b4845>] ? tracing_iter_reset+0x8b/0x8b
 [<ffffffff810b30e6>] tracing_reset_online_cpus+0x49/0x74
 [<ffffffff810b4885>] tracing_open+0x40/0x2c9
 [<ffffffff810b4845>] ? tracing_iter_reset+0x8b/0x8b
 [<ffffffff8111491b>] __dentry_open+0x166/0x299
 [<ffffffff81115850>] nameidata_to_filp+0x60/0x67
 [<ffffffff8112270a>] do_last+0x565/0x59f
 [<ffffffff81122922>] path_openat+0xd0/0x30e
 [<ffffffff8107ace8>] ? lock_acquire+0xe0/0x112
 [<ffffffff8112cc9e>] ? alloc_fd+0x3c/0xfe
 [<ffffffff81122c5d>] do_filp_open+0x38/0x86
 [<ffffffff814892d1>] ? _raw_spin_unlock+0x48/0x56
 [<ffffffff8112cd4e>] ? alloc_fd+0xec/0xfe
 [<ffffffff811158c6>] do_sys_open+0x6f/0x101
 [<ffffffff81115979>] sys_open+0x21/0x23
 [<ffffffff8148f3e9>] system_call_fastpath+0x16/0x1b

Let me know if you need my config.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-07 20:22               ` [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
@ 2012-05-07 21:48                 ` Vaibhav Nagarnaik
  2012-05-08  0:14                   ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-07 21:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Mon, May 7, 2012 at 1:22 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> After applying this patch, I get this:
>
> # trace-cmd start -e all
> # echo 100 > /debug/tracing/buffer_size_kb
>
> BUG: scheduling while atomic: trace-cmd/2018/0x00000002
> no locks held by trace-cmd/2018.
> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 kvm uinput snd_hda_codec_idt
>  snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 shpchp microcode pata_acpi firewire_ohci firewire_core crc_itu
> _t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
> Pid: 2018, comm: trace-cmd Not tainted 3.4.0-rc2-test+ #2
> Call Trace:
>  [<ffffffff81480d52>] __schedule_bug+0x66/0x6a
>  [<ffffffff81487d16>] __schedule+0x93/0x605
>  [<ffffffff8107a045>] ? __lock_acquire+0x4dc/0xcf1
>  [<ffffffff8148833b>] schedule+0x64/0x66
>  [<ffffffff81486b28>] schedule_timeout+0x37/0xf7
>  [<ffffffff81489386>] ? _raw_spin_unlock_irq+0x2d/0x5e
>  [<ffffffff8107b117>] ? trace_hardirqs_on_caller+0x121/0x158
>  [<ffffffff81487b73>] wait_for_common+0x97/0xf1
>  [<ffffffff8105c922>] ? try_to_wake_up+0x1ec/0x1ec
>  [<ffffffff810a4d29>] ? call_rcu_bh+0x19/0x19
>  [<ffffffff810b4845>] ? tracing_iter_reset+0x8b/0x8b
>  [<ffffffff81487c81>] wait_for_completion+0x1d/0x1f
>  [<ffffffff8104ba3d>] wait_rcu_gp+0x5c/0x77
>  [<ffffffff8104ba58>] ? wait_rcu_gp+0x77/0x77
>  [<ffffffff810a39eb>] synchronize_sched+0x25/0x27
>  [<ffffffff810ae27f>] ring_buffer_reset_cpu+0x4e/0xd1

The following seems to be the culprit. I am guessing you have a preempt
kernel?

@@ -3662,8 +3808,12 @@ void ring_buffer_reset_cpu(struct ring_buffer
*buffer, int cpu)
       if (!cpumask_test_cpu(cpu, buffer->cpumask))
               return;

+       atomic_inc(&buffer->resize_disabled);
       atomic_inc(&cpu_buffer->record_disabled);

+       /* Make sure all commits have finished */
+       synchronize_sched();
+
       raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);


I guess I can disable resizing in ring_buffer_record_disable(); that
seems to be a reasonable assumption.
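
Something along these lines, maybe -- just a sketch of the idea using the
resize_disabled counter from this series, not a tested patch:

void ring_buffer_record_disable(struct ring_buffer *buffer)
{
	/* keep resizers out while recording is disabled (assumed addition) */
	atomic_inc(&buffer->resize_disabled);
	atomic_inc(&buffer->record_disabled);
}

void ring_buffer_record_enable(struct ring_buffer *buffer)
{
	atomic_dec(&buffer->record_disabled);
	atomic_dec(&buffer->resize_disabled);
}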



Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-07 21:48                 ` Vaibhav Nagarnaik
@ 2012-05-08  0:14                   ` Steven Rostedt
  0 siblings, 0 replies; 80+ messages in thread
From: Steven Rostedt @ 2012-05-08  0:14 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Mon, 2012-05-07 at 14:48 -0700, Vaibhav Nagarnaik wrote:

> The following seems to be the culprit. I am guessing you have a preempt
> kernel?

I'm one of the real-time Linux maintainers, what do you think ;-)

> 
> @@ -3662,8 +3808,12 @@ void ring_buffer_reset_cpu(struct ring_buffer
> *buffer, int cpu)
>        if (!cpumask_test_cpu(cpu, buffer->cpumask))
>                return;
> 
> +       atomic_inc(&buffer->resize_disabled);
>        atomic_inc(&cpu_buffer->record_disabled);
> 
> +       /* Make sure all commits have finished */
> +       synchronize_sched();
> +
>        raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> 
> 
> I guess I can disable resizing in ring_buffer_record_disable(), that
> seems to be a reasonable assumption.
> 

Looking into this, the culprit is really the __tracing_reset():

static void __tracing_reset(struct ring_buffer *buffer, int cpu)
{
	ftrace_disable_cpu();
	ring_buffer_reset_cpu(buffer, cpu);
	ftrace_enable_cpu();
}


This function is useless. It's from the time the ring buffer was being
converted to lockless, but today it's no longer needed. The reset of the
ring buffer just needs to have the ring buffer disabled.

The bug is on my end ;-)

We can nuke the __tracing_reset() and just call ring_buffer_reset_cpu()
directly. We no longer need those "ftrace_disable/enable_cpu()s". The
comments for the ring_buffer_reset*() should state that the ring buffer
must be disabled before calling this.
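
Roughly this, on the ftrace side (only a sketch of the direction, assuming
the current tracing_reset() layout; not what I'll actually commit):

void tracing_reset(struct trace_array *tr, int cpu)
{
	struct ring_buffer *buffer = tr->buffer;

	ring_buffer_record_disable(buffer);

	/* The ring buffer must be disabled before it is reset */
	ring_buffer_reset_cpu(buffer, cpu);

	ring_buffer_record_enable(buffer);
}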

Actually, your patch fixes this, as it makes the reset itself take care
of the ring buffer being disabled.

OK, you don't need to send another patch set (yet ;-), I'll fix the
ftrace side, and then apply your patches, and then see what else breaks.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
                                 ` (2 preceding siblings ...)
  2012-05-07 20:22               ` [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
@ 2012-05-09  3:38               ` Steven Rostedt
  2012-05-09  5:00                 ` Vaibhav Nagarnaik
  2012-05-19 10:17               ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
  4 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-05-09  3:38 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Thu, 2012-05-03 at 18:59 -0700, Vaibhav Nagarnaik wrote:

> +		get_online_cpus();
> +		/*
> +		 * Fire off all the required work handlers
> +		 * Look out for offline CPUs
> +		 */
> +		for_each_buffer_cpu(buffer, cpu) {
> +			cpu_buffer = buffer->buffers[cpu];
> +			if (!cpu_buffer->nr_pages_to_update ||
> +			    !cpu_online(cpu))
> +				continue;
> +
> +			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
> +		}
> +		/*
> +		 * This loop is for the CPUs that are not online.
> +		 * We can't schedule anything on them, but it's not necessary
> +		 * since we can change their buffer sizes without any race.
> +		 */
> +		for_each_buffer_cpu(buffer, cpu) {
> +			cpu_buffer = buffer->buffers[cpu];
> +			if (!cpu_buffer->nr_pages_to_update ||
> +			    cpu_online(cpu))
> +				continue;
> +
> +			rb_update_pages(cpu_buffer);
>  		}

BTW, why the two loops and not just:

		for_each_buffer_cpu(buffer, cpu) {
			cpu_buffer = buffer->buffers[cpu];
			if (!cpu_buffer->nr_pages_to_update)
				continue;

			if (cpu_online(cpu))
				schedule_work_on(cpu, &cpu_buffer->update_pages_work);
			else
				rb_update_pages(cpu_buffer);
		}

??


>  
>  		/* wait for all the updates to complete */
>  		for_each_buffer_cpu(buffer, cpu) {
>  			cpu_buffer = buffer->buffers[cpu];
> -			if (cpu_buffer->nr_pages_to_update) {
> -				update_pages_handler(cpu_buffer);
> -			}
> +			if (!cpu_buffer->nr_pages_to_update||

			      !cpu_buffer->nr_pages_to_update ||

-- Steve

> +			    !cpu_online(cpu))
> +				continue;
> +
> +			wait_for_completion(&cpu_buffer->update_completion);
> +			/* reset this value */
> +			cpu_buffer->nr_pages_to_update = 0;
>  		}



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-09  3:38               ` Steven Rostedt
@ 2012-05-09  5:00                 ` Vaibhav Nagarnaik
  2012-05-09 14:29                   ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-09  5:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Tue, May 8, 2012 at 8:38 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> BTW, why the two loops and not just:
>
>                for_each_buffer_cpu(buffer, cpu) {
>                        cpu_buffer = buffer->buffers[cpu];
>                        if (!cpu_buffer->nr_pages_to_update)
>                                continue;
>
>                        if (cpu_online(cpu))
>                                schedule_work_on(cpu, &cpu_buffer->update_pages_work);
>                        else
>                                rb_update_pages(cpu_buffer);
>                }
>
> ??
>
>
>>
>>               /* wait for all the updates to complete */
>>               for_each_buffer_cpu(buffer, cpu) {
>>                       cpu_buffer = buffer->buffers[cpu];
>> -                     if (cpu_buffer->nr_pages_to_update) {
>> -                             update_pages_handler(cpu_buffer);
>> -                     }
>> +                     if (!cpu_buffer->nr_pages_to_update||
>
>                              !cpu_buffer->nr_pages_to_update ||

This schedules work for all online CPUs first, and the resizing of any
offline CPUs can then occur concurrently with it. It might not be too
much of a big deal to just make it one loop, though.



Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-09  5:00                 ` Vaibhav Nagarnaik
@ 2012-05-09 14:29                   ` Steven Rostedt
  2012-05-09 17:46                     ` Vaibhav Nagarnaik
  0 siblings, 1 reply; 80+ messages in thread
From: Steven Rostedt @ 2012-05-09 14:29 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Tue, 2012-05-08 at 22:00 -0700, Vaibhav Nagarnaik wrote:
> On Tue, May 8, 2012 at 8:38 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > BTW, why the two loops and not just:
> >
> >                for_each_buffer_cpu(buffer, cpu) {
> >                        cpu_buffer = buffer->buffers[cpu];
> >                        if (!cpu_buffer->nr_pages_to_update)
> >                                continue;
> >
> >                        if (cpu_online(cpu))
> >                                schedule_work_on(cpu, &cpu_buffer->update_pages_work);
> >                        else
> >                                rb_update_pages(cpu_buffer);
> >                }
> >
> > ??
> >
> >
> >>
> >>               /* wait for all the updates to complete */
> >>               for_each_buffer_cpu(buffer, cpu) {
> >>                       cpu_buffer = buffer->buffers[cpu];
> >> -                     if (cpu_buffer->nr_pages_to_update) {
> >> -                             update_pages_handler(cpu_buffer);
> >> -                     }
> >> +                     if (!cpu_buffer->nr_pages_to_update||
> >
> >                              !cpu_buffer->nr_pages_to_update ||
> 
> This schedules work for all online CPUs first, and the resizing of any
> offline CPUs can then occur concurrently with it. It might not be too
> much of a big deal to just make it one loop, though.

This is far from a hot path. In fact, it's quite slow. Let's not uglify
code just to optimize something that's not time critical.

Please combine these two into a single loop. The wait for completion is
fine as a separate loop.
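
I.e., something like this (a rough sketch pieced together from the hunks
above, not the actual respin):

		for_each_buffer_cpu(buffer, cpu) {
			cpu_buffer = buffer->buffers[cpu];
			if (!cpu_buffer->nr_pages_to_update)
				continue;

			/*
			 * fire off work for online CPUs, update offline
			 * CPUs in place
			 */
			if (cpu_online(cpu))
				schedule_work_on(cpu,
					&cpu_buffer->update_pages_work);
			else
				rb_update_pages(cpu_buffer);
		}

		/* wait for all the updates to complete */
		for_each_buffer_cpu(buffer, cpu) {
			cpu_buffer = buffer->buffers[cpu];
			if (!cpu_buffer->nr_pages_to_update)
				continue;

			if (cpu_online(cpu))
				wait_for_completion(&cpu_buffer->update_completion);
			cpu_buffer->nr_pages_to_update = 0;
		}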

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-09 14:29                   ` Steven Rostedt
@ 2012-05-09 17:46                     ` Vaibhav Nagarnaik
  2012-05-09 17:54                       ` Steven Rostedt
  0 siblings, 1 reply; 80+ messages in thread
From: Vaibhav Nagarnaik @ 2012-05-09 17:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Wed, May 9, 2012 at 7:29 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> This is far from a hot path. In fact, it's quite slow. Let's not uglify
> code just to optimize something that's not time critical.
>
> Please combine these two into a single loop. The wait for completion is
> fine as a separate loop.
>

Sure, I will update the patch with that change. Have you found any
other issues in testing the patch?

Vaibhav Nagarnaik

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic
  2012-05-09 17:46                     ` Vaibhav Nagarnaik
@ 2012-05-09 17:54                       ` Steven Rostedt
  0 siblings, 0 replies; 80+ messages in thread
From: Steven Rostedt @ 2012-05-09 17:54 UTC (permalink / raw)
  To: Vaibhav Nagarnaik
  Cc: Frederic Weisbecker, Ingo Molnar, Laurent Chavey,
	Justin Teravest, David Sharp, linux-kernel

On Wed, 2012-05-09 at 10:46 -0700, Vaibhav Nagarnaik wrote:
> On Wed, May 9, 2012 at 7:29 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > This is far from a hot path. In fact, it's quite slow. Lets not uglify
> > code just to optimize something that's not time critical.
> >
> > Please combine these two into a single loop. The wait for completion is
> > fine as a separate loop.
> >
> 
> Sure, I will update the patch with that change. Have you found any
> other issues in testing the patch?
> 

Nope, not yet. Sorry for the spotty checks, I'm doing several other
things in between.

-- Steve



^ permalink raw reply	[flat|nested] 80+ messages in thread

* [tip:perf/core] ring-buffer: Make removal of ring buffer pages atomic
  2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
                                 ` (3 preceding siblings ...)
  2012-05-09  3:38               ` Steven Rostedt
@ 2012-05-19 10:17               ` tip-bot for Vaibhav Nagarnaik
  4 siblings, 0 replies; 80+ messages in thread
From: tip-bot for Vaibhav Nagarnaik @ 2012-05-19 10:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, vnagarnaik, hpa, mingo, dhsharp, fweisbec,
	rostedt, chavey, tglx, teravest

Commit-ID:  83f40318dab00e3298a1f6d0b12ac025e84e478d
Gitweb:     http://git.kernel.org/tip/83f40318dab00e3298a1f6d0b12ac025e84e478d
Author:     Vaibhav Nagarnaik <vnagarnaik@google.com>
AuthorDate: Thu, 3 May 2012 18:59:50 -0700
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Wed, 16 May 2012 16:18:57 -0400

ring-buffer: Make removal of ring buffer pages atomic

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing the pages after the tail page. This makes sure
that first all the empty pages in the ring buffer are removed. If the
head page is one in the list of pages to be removed, then the page after
the removed ones is made the head page. This removes the oldest data
from the ring buffer and keeps the latest data around to be read.

To do this in a non-racey manner, tracing is stopped for a very short
time while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed after the tracing is restarted to
minimize the time needed to stop tracing.

The context in which the pages from the per-cpu ring buffer are removed
runs on the respective CPU. This minimizes the events not traced to only
NMI trace contexts.

Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |  265 ++++++++++++++++++++++++++++++++++----------
 kernel/trace/trace.c       |   20 +---
 2 files changed, 209 insertions(+), 76 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2d5eb33..27ac37e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -23,6 +23,8 @@
 #include <asm/local.h>
 #include "trace.h"
 
+static void update_pages_handler(struct work_struct *work);
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -470,12 +472,15 @@ struct ring_buffer_per_cpu {
 	/* ring buffer pages to update, > 0 to add, < 0 to remove */
 	int				nr_pages_to_update;
 	struct list_head		new_pages; /* new pages to add */
+	struct work_struct		update_pages_work;
+	struct completion		update_completion;
 };
 
 struct ring_buffer {
 	unsigned			flags;
 	int				cpus;
 	atomic_t			record_disabled;
+	atomic_t			resize_disabled;
 	cpumask_var_t			cpumask;
 
 	struct lock_class_key		*reader_lock_key;
@@ -1048,6 +1053,8 @@ rb_allocate_cpu_buffer(struct ring_buffer *buffer, int nr_pages, int cpu)
 	raw_spin_lock_init(&cpu_buffer->reader_lock);
 	lockdep_set_class(&cpu_buffer->reader_lock, buffer->reader_lock_key);
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
+	init_completion(&cpu_buffer->update_completion);
 
 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
@@ -1235,32 +1242,123 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
+static inline unsigned long rb_page_entries(struct buffer_page *bpage)
+{
+	return local_read(&bpage->entries) & RB_WRITE_MASK;
+}
+
+static inline unsigned long rb_page_write(struct buffer_page *bpage)
+{
+	return local_read(&bpage->write) & RB_WRITE_MASK;
+}
+
 static void
-rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned nr_pages)
+rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *tail_page, *to_remove, *next_page;
+	struct buffer_page *to_remove_page, *tmp_iter_page;
+	struct buffer_page *last_page, *first_page;
+	unsigned int nr_removed;
+	unsigned long head_bit;
+	int page_entries;
+
+	head_bit = 0;
 
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	rb_head_page_deactivate(cpu_buffer);
+	atomic_inc(&cpu_buffer->record_disabled);
+	/*
+	 * We don't race with the readers since we have acquired the reader
+	 * lock. We also don't race with writers after disabling recording.
+	 * This makes it easy to figure out the first and the last page to be
+	 * removed from the list. We unlink all the pages in between including
+	 * the first and last pages. This is done in a busy loop so that we
+	 * lose the least number of traces.
+	 * The pages are freed after we restart recording and unlock readers.
+	 */
+	tail_page = &cpu_buffer->tail_page->list;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-			goto out;
-		p = cpu_buffer->pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		free_buffer_page(bpage);
+	/*
+	 * tail page might be on reader page, we remove the next page
+	 * from the ring buffer
+	 */
+	if (cpu_buffer->tail_page == cpu_buffer->reader_page)
+		tail_page = rb_list_head(tail_page->next);
+	to_remove = tail_page;
+
+	/* start of pages to remove */
+	first_page = list_entry(rb_list_head(to_remove->next),
+				struct buffer_page, list);
+
+	for (nr_removed = 0; nr_removed < nr_pages; nr_removed++) {
+		to_remove = rb_list_head(to_remove)->next;
+		head_bit |= (unsigned long)to_remove & RB_PAGE_HEAD;
 	}
-	if (RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages)))
-		goto out;
 
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
+	next_page = rb_list_head(to_remove)->next;
 
-out:
+	/*
+	 * Now we remove all pages between tail_page and next_page.
+	 * Make sure that we have head_bit value preserved for the
+	 * next page
+	 */
+	tail_page->next = (struct list_head *)((unsigned long)next_page |
+						head_bit);
+	next_page = rb_list_head(next_page);
+	next_page->prev = tail_page;
+
+	/* make sure pages points to a valid page in the ring buffer */
+	cpu_buffer->pages = next_page;
+
+	/* update head page */
+	if (head_bit)
+		cpu_buffer->head_page = list_entry(next_page,
+						struct buffer_page, list);
+
+	/*
+	 * change read pointer to make sure any read iterators reset
+	 * themselves
+	 */
+	cpu_buffer->read = 0;
+
+	/* pages are removed, resume tracing and then free the pages */
+	atomic_dec(&cpu_buffer->record_disabled);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	RB_WARN_ON(cpu_buffer, list_empty(cpu_buffer->pages));
+
+	/* last buffer page to remove */
+	last_page = list_entry(rb_list_head(to_remove), struct buffer_page,
+				list);
+	tmp_iter_page = first_page;
+
+	do {
+		to_remove_page = tmp_iter_page;
+		rb_inc_page(cpu_buffer, &tmp_iter_page);
+
+		/* update the counters */
+		page_entries = rb_page_entries(to_remove_page);
+		if (page_entries) {
+			/*
+			 * If something was added to this page, it was full
+			 * since it is not the tail page. So we deduct the
+			 * bytes consumed in ring buffer from here.
+			 * No need to update overruns, since this page is
+			 * deleted from ring buffer and its entries are
+			 * already accounted for.
+			 */
+			local_sub(BUF_PAGE_SIZE, &cpu_buffer->entries_bytes);
+		}
+
+		/*
+		 * We have already removed references to this list item, just
+		 * free up the buffer_page and its page
+		 */
+		free_buffer_page(to_remove_page);
+		nr_removed--;
+
+	} while (to_remove_page != last_page);
+
+	RB_WARN_ON(cpu_buffer, nr_removed);
 }
 
 static void
@@ -1272,6 +1370,8 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	unsigned i;
 
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
+	/* stop the writers while inserting pages */
+	atomic_inc(&cpu_buffer->record_disabled);
 	rb_head_page_deactivate(cpu_buffer);
 
 	for (i = 0; i < nr_pages; i++) {
@@ -1286,19 +1386,27 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
 	rb_check_pages(cpu_buffer);
 
 out:
+	atomic_dec(&cpu_buffer->record_disabled);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
 }
 
-static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
+static void rb_update_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	if (cpu_buffer->nr_pages_to_update > 0)
 		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
 				cpu_buffer->nr_pages_to_update);
 	else
 		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+
 	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
-	/* reset this value */
-	cpu_buffer->nr_pages_to_update = 0;
+}
+
+static void update_pages_handler(struct work_struct *work)
+{
+	struct ring_buffer_per_cpu *cpu_buffer = container_of(work,
+			struct ring_buffer_per_cpu, update_pages_work);
+	rb_update_pages(cpu_buffer);
+	complete(&cpu_buffer->update_completion);
 }
 
 /**
@@ -1308,14 +1416,14 @@ static void update_pages_handler(struct ring_buffer_per_cpu *cpu_buffer)
  *
  * Minimum size is 2 * BUF_PAGE_SIZE.
  *
- * Returns -1 on failure.
+ * Returns 0 on success and < 0 on failure.
  */
 int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 			int cpu_id)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	unsigned nr_pages;
-	int cpu;
+	int cpu, err = 0;
 
 	/*
 	 * Always succeed at resizing a non-existent buffer:
@@ -1330,15 +1438,18 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 	if (size < BUF_PAGE_SIZE * 2)
 		size = BUF_PAGE_SIZE * 2;
 
-	atomic_inc(&buffer->record_disabled);
+	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
-	/* Make sure all writers are done with this buffer. */
-	synchronize_sched();
+	/*
+	 * Don't succeed if resizing is disabled, as a reader might be
+	 * manipulating the ring buffer and is expecting a sane state while
+	 * this is true.
+	 */
+	if (atomic_read(&buffer->resize_disabled))
+		return -EBUSY;
 
+	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
-	get_online_cpus();
-
-	nr_pages = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		/* calculate the pages to update */
@@ -1347,33 +1458,67 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 
 			cpu_buffer->nr_pages_to_update = nr_pages -
 							cpu_buffer->nr_pages;
-
 			/*
 			 * nothing more to do for removing pages or no update
 			 */
 			if (cpu_buffer->nr_pages_to_update <= 0)
 				continue;
-
 			/*
 			 * to add pages, make sure all new pages can be
 			 * allocated without receiving ENOMEM
 			 */
 			INIT_LIST_HEAD(&cpu_buffer->new_pages);
 			if (__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu))
+						&cpu_buffer->new_pages, cpu)) {
 				/* not enough memory for new pages */
-				goto no_mem;
+				err = -ENOMEM;
+				goto out_err;
+			}
+		}
+
+		get_online_cpus();
+		/*
+		 * Fire off all the required work handlers
+		 * Look out for offline CPUs
+		 */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update ||
+			    !cpu_online(cpu))
+				continue;
+
+			schedule_work_on(cpu, &cpu_buffer->update_pages_work);
+		}
+		/*
+		 * This loop is for the CPUs that are not online.
+		 * We can't schedule anything on them, but it's not necessary
+		 * since we can change their buffer sizes without any race.
+		 */
+		for_each_buffer_cpu(buffer, cpu) {
+			cpu_buffer = buffer->buffers[cpu];
+			if (!cpu_buffer->nr_pages_to_update ||
+			    cpu_online(cpu))
+				continue;
+
+			rb_update_pages(cpu_buffer);
 		}
 
 		/* wait for all the updates to complete */
 		for_each_buffer_cpu(buffer, cpu) {
 			cpu_buffer = buffer->buffers[cpu];
-			if (cpu_buffer->nr_pages_to_update) {
-				update_pages_handler(cpu_buffer);
-			}
+			if (!cpu_buffer->nr_pages_to_update ||
+			    !cpu_online(cpu))
+				continue;
+
+			wait_for_completion(&cpu_buffer->update_completion);
+			/* reset this value */
+			cpu_buffer->nr_pages_to_update = 0;
 		}
+
+		put_online_cpus();
 	} else {
 		cpu_buffer = buffer->buffers[cpu_id];
+
 		if (nr_pages == cpu_buffer->nr_pages)
 			goto out;
 
@@ -1383,38 +1528,47 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
 		if (cpu_buffer->nr_pages_to_update > 0 &&
 			__rb_allocate_pages(cpu_buffer->nr_pages_to_update,
-						&cpu_buffer->new_pages, cpu_id))
-			goto no_mem;
+					    &cpu_buffer->new_pages, cpu_id)) {
+			err = -ENOMEM;
+			goto out_err;
+		}
 
-		update_pages_handler(cpu_buffer);
+		get_online_cpus();
+
+		if (cpu_online(cpu_id)) {
+			schedule_work_on(cpu_id,
+					 &cpu_buffer->update_pages_work);
+			wait_for_completion(&cpu_buffer->update_completion);
+		} else
+			rb_update_pages(cpu_buffer);
+
+		put_online_cpus();
+		/* reset this value */
+		cpu_buffer->nr_pages_to_update = 0;
 	}
 
  out:
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-
-	atomic_dec(&buffer->record_disabled);
-
 	return size;
 
- no_mem:
+ out_err:
 	for_each_buffer_cpu(buffer, cpu) {
 		struct buffer_page *bpage, *tmp;
+
 		cpu_buffer = buffer->buffers[cpu];
-		/* reset this number regardless */
 		cpu_buffer->nr_pages_to_update = 0;
+
 		if (list_empty(&cpu_buffer->new_pages))
 			continue;
+
 		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
 					list) {
 			list_del_init(&bpage->list);
 			free_buffer_page(bpage);
 		}
 	}
-	put_online_cpus();
 	mutex_unlock(&buffer->mutex);
-	atomic_dec(&buffer->record_disabled);
-	return -ENOMEM;
+	return err;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_resize);
 
@@ -1453,21 +1607,11 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
 	return __rb_page_index(iter->head_page, iter->head);
 }
 
-static inline unsigned long rb_page_write(struct buffer_page *bpage)
-{
-	return local_read(&bpage->write) & RB_WRITE_MASK;
-}
-
 static inline unsigned rb_page_commit(struct buffer_page *bpage)
 {
 	return local_read(&bpage->page->commit);
 }
 
-static inline unsigned long rb_page_entries(struct buffer_page *bpage)
-{
-	return local_read(&bpage->entries) & RB_WRITE_MASK;
-}
-
 /* Size is determined by what has been committed */
 static inline unsigned rb_page_size(struct buffer_page *bpage)
 {
@@ -3492,6 +3636,7 @@ ring_buffer_read_prepare(struct ring_buffer *buffer, int cpu)
 
 	iter->cpu_buffer = cpu_buffer;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
 	return iter;
@@ -3555,6 +3700,7 @@ ring_buffer_read_finish(struct ring_buffer_iter *iter)
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&cpu_buffer->buffer->resize_disabled);
 	kfree(iter);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_read_finish);
@@ -3662,8 +3808,12 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return;
 
+	atomic_inc(&buffer->resize_disabled);
 	atomic_inc(&cpu_buffer->record_disabled);
 
+	/* Make sure all commits have finished */
+	synchronize_sched();
+
 	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
 
 	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -3679,6 +3829,7 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	atomic_dec(&cpu_buffer->record_disabled);
+	atomic_dec(&buffer->resize_disabled);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d1b3469..dfbd86c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3076,20 +3076,10 @@ static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
 
 static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 {
-	int cpu, ret = size;
+	int ret = size;
 
 	mutex_lock(&trace_types_lock);
 
-	tracing_stop();
-
-	/* disable all cpu buffers */
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_inc(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_inc(&max_tr.data[cpu]->disabled);
-	}
-
 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
 		/* make sure, this cpu is enabled in the mask */
 		if (!cpumask_test_cpu(cpu_id, tracing_buffer_mask)) {
@@ -3103,14 +3093,6 @@ static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
 		ret = -ENOMEM;
 
 out:
-	for_each_tracing_cpu(cpu) {
-		if (global_trace.data[cpu])
-			atomic_dec(&global_trace.data[cpu]->disabled);
-		if (max_tr.data[cpu])
-			atomic_dec(&max_tr.data[cpu]->disabled);
-	}
-
-	tracing_start();
 	mutex_unlock(&trace_types_lock);
 
 	return ret;

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [tip:perf/core] ring-buffer: Make addition of pages in ring buffer atomic
  2012-05-04  1:59               ` [PATCH v7 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
@ 2012-05-19 10:18                 ` tip-bot for Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: tip-bot for Vaibhav Nagarnaik @ 2012-05-19 10:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, vnagarnaik, hpa, mingo, dhsharp, fweisbec,
	rostedt, chavey, tglx, teravest

Commit-ID:  5040b4b7bcc26a311c799d46f67174bcb20d05dd
Gitweb:     http://git.kernel.org/tip/5040b4b7bcc26a311c799d46f67174bcb20d05dd
Author:     Vaibhav Nagarnaik <vnagarnaik@google.com>
AuthorDate: Thu, 3 May 2012 18:59:51 -0700
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Wed, 16 May 2012 16:25:51 -0400

ring-buffer: Make addition of pages in ring buffer atomic

This patch adds the capability to add new pages to a ring buffer
atomically while write operations are going on. This makes it possible
to expand the ring buffer size without reinitializing the ring buffer.

The new pages are attached between the head page and its previous page.

Link: http://lkml.kernel.org/r/1336096792-25373-2-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |  102 +++++++++++++++++++++++++++++++++-----------
 1 files changed, 77 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 27ac37e..d673ef0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1252,7 +1252,7 @@ static inline unsigned long rb_page_write(struct buffer_page *bpage)
 	return local_read(&bpage->write) & RB_WRITE_MASK;
 }
 
-static void
+static int
 rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 {
 	struct list_head *tail_page, *to_remove, *next_page;
@@ -1359,46 +1359,97 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned int nr_pages)
 	} while (to_remove_page != last_page);
 
 	RB_WARN_ON(cpu_buffer, nr_removed);
+
+	return nr_removed == 0;
 }
 
-static void
-rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer,
-		struct list_head *pages, unsigned nr_pages)
+static int
+rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
-	struct buffer_page *bpage;
-	struct list_head *p;
-	unsigned i;
+	struct list_head *pages = &cpu_buffer->new_pages;
+	int retries, success;
 
 	raw_spin_lock_irq(&cpu_buffer->reader_lock);
-	/* stop the writers while inserting pages */
-	atomic_inc(&cpu_buffer->record_disabled);
-	rb_head_page_deactivate(cpu_buffer);
+	/*
+	 * We are holding the reader lock, so the reader page won't be swapped
+	 * in the ring buffer. Now we are racing with the writer trying to
+	 * move head page and the tail page.
+	 * We are going to adapt the reader page update process where:
+	 * 1. We first splice the start and end of list of new pages between
+	 *    the head page and its previous page.
+	 * 2. We cmpxchg the prev_page->next to point from head page to the
+	 *    start of new pages list.
+	 * 3. Finally, we update the head->prev to the end of new list.
+	 *
+	 * We will try this process 10 times, to make sure that we don't keep
+	 * spinning.
+	 */
+	retries = 10;
+	success = 0;
+	while (retries--) {
+		struct list_head *head_page, *prev_page, *r;
+		struct list_head *last_page, *first_page;
+		struct list_head *head_page_with_bit;
 
-	for (i = 0; i < nr_pages; i++) {
-		if (RB_WARN_ON(cpu_buffer, list_empty(pages)))
-			goto out;
-		p = pages->next;
-		bpage = list_entry(p, struct buffer_page, list);
-		list_del_init(&bpage->list);
-		list_add_tail(&bpage->list, cpu_buffer->pages);
+		head_page = &rb_set_head_page(cpu_buffer)->list;
+		prev_page = head_page->prev;
+
+		first_page = pages->next;
+		last_page  = pages->prev;
+
+		head_page_with_bit = (struct list_head *)
+				     ((unsigned long)head_page | RB_PAGE_HEAD);
+
+		last_page->next = head_page_with_bit;
+		first_page->prev = prev_page;
+
+		r = cmpxchg(&prev_page->next, head_page_with_bit, first_page);
+
+		if (r == head_page_with_bit) {
+			/*
+			 * yay, we replaced the page pointer to our new list,
+			 * now, we just have to update to head page's prev
+			 * pointer to point to end of list
+			 */
+			head_page->prev = last_page;
+			success = 1;
+			break;
+		}
 	}
-	rb_reset_cpu(cpu_buffer);
-	rb_check_pages(cpu_buffer);
 
-out:
-	atomic_dec(&cpu_buffer->record_disabled);
+	if (success)
+		INIT_LIST_HEAD(pages);
+	/*
+	 * If we weren't successful in adding in new pages, warn and stop
+	 * tracing
+	 */
+	RB_WARN_ON(cpu_buffer, !success);
 	raw_spin_unlock_irq(&cpu_buffer->reader_lock);
+
+	/* free pages if they weren't inserted */
+	if (!success) {
+		struct buffer_page *bpage, *tmp;
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages,
+					 list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+	}
+	return success;
 }
 
 static void rb_update_pages(struct ring_buffer_per_cpu *cpu_buffer)
 {
+	int success;
+
 	if (cpu_buffer->nr_pages_to_update > 0)
-		rb_insert_pages(cpu_buffer, &cpu_buffer->new_pages,
-				cpu_buffer->nr_pages_to_update);
+		success = rb_insert_pages(cpu_buffer);
 	else
-		rb_remove_pages(cpu_buffer, -cpu_buffer->nr_pages_to_update);
+		success = rb_remove_pages(cpu_buffer,
+					-cpu_buffer->nr_pages_to_update);
 
-	cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
+	if (success)
+		cpu_buffer->nr_pages += cpu_buffer->nr_pages_to_update;
 }
 
 static void update_pages_handler(struct work_struct *work)
@@ -3772,6 +3823,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->commit_page = cpu_buffer->head_page;
 
 	INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
+	INIT_LIST_HEAD(&cpu_buffer->new_pages);
 	local_set(&cpu_buffer->reader_page->write, 0);
 	local_set(&cpu_buffer->reader_page->entries, 0);
 	local_set(&cpu_buffer->reader_page->page->commit, 0);

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [tip:perf/core] tracing: change CPU ring buffer state from tracing_cpumask
  2012-05-04  1:59               ` [PATCH v7 3/3] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
@ 2012-05-19 10:21                 ` tip-bot for Vaibhav Nagarnaik
  0 siblings, 0 replies; 80+ messages in thread
From: tip-bot for Vaibhav Nagarnaik @ 2012-05-19 10:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, vnagarnaik, hpa, mingo, dhsharp, fweisbec,
	rostedt, chavey, tglx, teravest

Commit-ID:  71babb2705e2203a64c27ede13ae3508a0d2c16c
Gitweb:     http://git.kernel.org/tip/71babb2705e2203a64c27ede13ae3508a0d2c16c
Author:     Vaibhav Nagarnaik <vnagarnaik@google.com>
AuthorDate: Thu, 3 May 2012 18:59:52 -0700
Committer:  Steven Rostedt <rostedt@goodmis.org>
CommitDate: Wed, 16 May 2012 19:50:38 -0400

tracing: change CPU ring buffer state from tracing_cpumask

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

        This is a mask that lets the user only trace
        on specified CPUS. The format is a hex string
        representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables recording on each CPU as its corresponding
bit in tracing_cpumask is set/unset.

Link: http://lkml.kernel.org/r/1336096792-25373-3-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0ed4df0..08a08ba 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2669,10 +2669,12 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf,
 		if (cpumask_test_cpu(cpu, tracing_cpumask) &&
 				!cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_inc(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_disable_cpu(global_trace.buffer, cpu);
 		}
 		if (!cpumask_test_cpu(cpu, tracing_cpumask) &&
 				cpumask_test_cpu(cpu, tracing_cpumask_new)) {
 			atomic_dec(&global_trace.data[cpu]->disabled);
+			ring_buffer_record_enable_cpu(global_trace.buffer, cpu);
 		}
 	}
 	arch_spin_unlock(&ftrace_max_lock);
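
A minimal sketch of the transition logic this hunk adds, assuming a
plain integer mask instead of a cpumask_t and hypothetical stand-ins
record_disable_cpu()/record_enable_cpu() for the real
ring_buffer_record_{disable,enable}_cpu() calls: only CPUs whose bit
actually changes between the old and the new mask are touched.

#include <stdio.h>

#define NR_CPUS 8

/* stand-ins for ring_buffer_record_{disable,enable}_cpu() */
static void record_disable_cpu(int cpu) { printf("cpu%d: recording off\n", cpu); }
static void record_enable_cpu(int cpu)  { printf("cpu%d: recording on\n", cpu); }

/* Apply a new tracing mask; CPUs whose bit is unchanged are left alone. */
static void apply_cpumask(unsigned int old_mask, unsigned int new_mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		unsigned int bit = 1u << cpu;

		if ((old_mask & bit) && !(new_mask & bit))
			record_disable_cpu(cpu);	/* bit went 1 -> 0 */
		if (!(old_mask & bit) && (new_mask & bit))
			record_enable_cpu(cpu);		/* bit went 0 -> 1 */
	}
}

int main(void)
{
	/* e.g. the effect of `echo 0f > tracing_cpumask` with all 8 CPUs on */
	apply_cpumask(0xff, 0x0f);
	return 0;
}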

^ permalink raw reply related	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2012-05-19 10:23 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-26 22:59 [PATCH 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
2011-07-26 22:59 ` [PATCH 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
2011-07-29 18:01   ` Steven Rostedt
2011-07-29 19:09     ` Vaibhav Nagarnaik
2011-07-26 22:59 ` [PATCH 2/5] trace: Add ring buffer stats to measure rate of events Vaibhav Nagarnaik
2011-07-29 18:10   ` Steven Rostedt
2011-07-29 19:10     ` Vaibhav Nagarnaik
2011-07-26 22:59 ` [PATCH 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
2011-07-29 18:14   ` Steven Rostedt
2011-07-29 19:13     ` Vaibhav Nagarnaik
2011-07-29 21:25       ` Steven Rostedt
2011-07-26 22:59 ` [PATCH 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
2011-07-29 21:23   ` Steven Rostedt
2011-07-29 23:30     ` Vaibhav Nagarnaik
2011-07-30  1:12       ` Steven Rostedt
2011-07-30  1:50         ` David Sharp
2011-07-30  2:43           ` Steven Rostedt
2011-07-30  3:44             ` David Sharp
2011-07-26 22:59 ` [PATCH 5/5] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
2011-08-16 21:46 ` [PATCH v2 0/5] Add dynamic updates to trace ring buffer Vaibhav Nagarnaik
2011-08-16 21:46 ` [PATCH v2 1/5] trace: Add a new readonly entry to report total buffer size Vaibhav Nagarnaik
2011-08-16 21:46 ` [PATCH v2 2/5] trace: Add ring buffer stats to measure rate of events Vaibhav Nagarnaik
2011-08-16 21:46 ` [PATCH v2 3/5] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
2011-08-22 20:29   ` Steven Rostedt
2011-08-22 20:36     ` Vaibhav Nagarnaik
2011-08-22 22:09   ` [PATCH v3] " Vaibhav Nagarnaik
2011-08-23  0:49     ` Steven Rostedt
2011-08-23  1:16       ` Vaibhav Nagarnaik
2011-08-23  1:17   ` Vaibhav Nagarnaik
2011-09-03  2:45     ` Steven Rostedt
2011-09-06 18:56       ` Vaibhav Nagarnaik
2011-09-07 17:13         ` Steven Rostedt
2011-10-12  1:20     ` [PATCH v4 1/4] " Vaibhav Nagarnaik
2012-01-31 23:53       ` Vaibhav Nagarnaik
2012-02-02  2:42         ` Steven Rostedt
2012-02-02 19:20           ` Vaibhav Nagarnaik
2012-02-02 20:00       ` [PATCH v5 " Vaibhav Nagarnaik
2012-02-02 20:00         ` [PATCH v5 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
2012-04-21  4:27           ` Steven Rostedt
2012-04-23 17:31             ` Vaibhav Nagarnaik
2012-04-25 21:18           ` [PATCH v6 1/3] " Vaibhav Nagarnaik
2012-04-25 21:18             ` [PATCH v6 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
2012-04-25 21:18             ` [PATCH v6 3/3] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
2012-05-03  1:55             ` [PATCH v6 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
2012-05-03  6:40               ` Vaibhav Nagarnaik
2012-05-03 12:57                 ` Steven Rostedt
2012-05-03 14:12                   ` Steven Rostedt
2012-05-03 18:43                     ` Vaibhav Nagarnaik
2012-05-03 18:54                       ` Steven Rostedt
2012-05-03 18:54                         ` Vaibhav Nagarnaik
2012-05-04  1:59             ` [PATCH v7 " Vaibhav Nagarnaik
2012-05-04  1:59               ` [PATCH v7 2/3] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
2012-05-19 10:18                 ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
2012-05-04  1:59               ` [PATCH v7 3/3] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
2012-05-19 10:21                 ` [tip:perf/core] tracing: " tip-bot for Vaibhav Nagarnaik
2012-05-07 20:22               ` [PATCH v7 1/3] trace: Make removal of ring buffer pages atomic Steven Rostedt
2012-05-07 21:48                 ` Vaibhav Nagarnaik
2012-05-08  0:14                   ` Steven Rostedt
2012-05-09  3:38               ` Steven Rostedt
2012-05-09  5:00                 ` Vaibhav Nagarnaik
2012-05-09 14:29                   ` Steven Rostedt
2012-05-09 17:46                     ` Vaibhav Nagarnaik
2012-05-09 17:54                       ` Steven Rostedt
2012-05-19 10:17               ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
2012-02-02 20:00         ` [PATCH v5 3/4] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
2012-02-02 20:00         ` [PATCH v5 4/4] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
2012-03-08 23:51         ` [PATCH v5 1/4] trace: Add per_cpu ring buffer control files Vaibhav Nagarnaik
2012-05-02 21:03         ` [tip:perf/core] ring-buffer: " tip-bot for Vaibhav Nagarnaik
2011-10-12  1:20     ` [PATCH v4 2/4] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
2011-10-12  1:20     ` [PATCH v4 3/4] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
2011-10-12  1:20     ` [PATCH v4 4/4] trace: change CPU ring buffer state from tracing_cpumask Vaibhav Nagarnaik
2011-08-16 21:46 ` [PATCH v2 4/5] trace: Make removal of ring buffer pages atomic Vaibhav Nagarnaik
2011-08-23  3:27   ` Steven Rostedt
2011-08-23 18:55     ` Vaibhav Nagarnaik
2011-08-23 18:55   ` [PATCH v3 " Vaibhav Nagarnaik
2011-08-23 19:16     ` David Sharp
2011-08-23 19:20       ` Vaibhav Nagarnaik
2011-08-23 19:24       ` Steven Rostedt
2011-08-23 18:55   ` [PATCH v3 5/5] trace: Make addition of pages in ring buffer atomic Vaibhav Nagarnaik
2011-08-16 21:46 ` [PATCH v2 " Vaibhav Nagarnaik
