linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3][RFC] trace_printk() using percpu buffers
@ 2011-10-08 17:02 Steven Rostedt
  2011-10-08 17:02 ` [PATCH 1/3][RFC] tracing: Do not allocate buffer for trace_marker Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Steven Rostedt @ 2011-10-08 17:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: Type: text/plain, Size: 1469 bytes --]

Peter,

You had issues with the previous version of my trace_printk() code.
I rewrote it to do the following.

By default, it still uses the single buffer protected by a spinlock
and an atomic (for NMIs). The NMI case can cause dropped prints if
the NMI happens while a trace_printk() is processing.

When trace_printk_percpu is enabled, either via the trace options or
the kernel command line, then two sets of percpu buffers are made,
one for normal and irqs (interrupts are still disabled), and the other
is for NMIs. These can be added or removed at anytime.

The last patch adds a CONFIG_TRACE_PRINTK_PERCPU that makes trace_printk()
permanently use two sets of per_cpu buffers, and these can not be
removed. This will give the least amount of overhead for trace_printk()
with the sacrifice of memory overhead. This is an option I could imagine
you would just set and forget about.

Let me know your thoughts.

Thanks,

-- Steve


Steven Rostedt (3):
      tracing: Do not allocate buffer for trace_marker
      tracing: Add optional percpu buffers for trace_printk()
      tracing: Add config to permanently have trace_printk() use percpu buffers

----
 Documentation/kernel-parameters.txt |   11 +
 kernel/trace/Kconfig                |   18 ++
 kernel/trace/trace.c                |  360 ++++++++++++++++++++++++++++-------
 kernel/trace/trace.h                |    1 +
 4 files changed, 320 insertions(+), 70 deletions(-)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3][RFC] tracing: Do not allocate buffer for trace_marker
  2011-10-08 17:02 [PATCH 0/3][RFC] trace_printk() using percpu buffers Steven Rostedt
@ 2011-10-08 17:02 ` Steven Rostedt
  2011-10-08 17:02 ` [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk() Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2011-10-08 17:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: Type: text/plain, Size: 4873 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

When doing intense tracing, the kmalloc inside trace_marker can
introduce side effects to what is being traced.

As trace_marker() is used by userspace to inject data into the
kernel ring buffer, it needs to do so with the least amount
of intrusion to the operations of the kernel or the user space
application.

As the ring buffer is designed to write directly into the buffer
without the need to make a temporary buffer, and userspace already
went through the hassle of knowing how big the write will be,
we can simply pin the userspace pages and write the data directly
into the buffer. This improves the impact of tracing via trace_marker
tremendously!

Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the
use of get_user_pages_fast() and kmap_atomic().

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c |  111 +++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 13f2b84..f86efe9 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3628,22 +3628,24 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int mark_printk(const char *fmt, ...)
-{
-	int ret;
-	va_list args;
-	va_start(args, fmt);
-	ret = trace_vprintk(0, fmt, args);
-	va_end(args);
-	return ret;
-}
-
 static ssize_t
 tracing_mark_write(struct file *filp, const char __user *ubuf,
 					size_t cnt, loff_t *fpos)
 {
-	char *buf;
-	size_t written;
+	unsigned long addr = (unsigned long)ubuf;
+	struct ring_buffer_event *event;
+	struct ring_buffer *buffer;
+	struct print_entry *entry;
+	unsigned long irq_flags;
+	struct page *pages[2];
+	int nr_pages = 1;
+	ssize_t written;
+	void *page1;
+	void *page2;
+	int offset;
+	int size;
+	int len;
+	int ret;
 
 	if (tracing_disabled)
 		return -EINVAL;
@@ -3651,28 +3653,81 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
 	if (cnt > TRACE_BUF_SIZE)
 		cnt = TRACE_BUF_SIZE;
 
-	buf = kmalloc(cnt + 2, GFP_KERNEL);
-	if (buf == NULL)
-		return -ENOMEM;
+	/*
+	 * Userspace is injecting traces into the kernel trace buffer.
+	 * We want to be as non intrusive as possible.
+	 * To do so, we do not want to allocate any special buffers
+	 * or take any locks, but instead write the userspace data
+	 * straight into the ring buffer.
+	 *
+	 * First we need to pin the userspace buffer into memory,
+	 * which, most likely it is, because it just referenced it.
+	 * But there's no guarantee that it is. By using get_user_pages_fast()
+	 * and kmap_atomic/kunmap_atomic() we can get access to the
+	 * pages directly. We then write the data directly into the
+	 * ring buffer.
+	 */
+	BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);
 
-	if (copy_from_user(buf, ubuf, cnt)) {
-		kfree(buf);
-		return -EFAULT;
+	/* check if we cross pages */
+	if ((addr & PAGE_MASK) != ((addr + cnt) & PAGE_MASK))
+		nr_pages = 2;
+
+	offset = addr & (PAGE_SIZE - 1);
+	addr &= PAGE_MASK;
+
+	ret = get_user_pages_fast(addr, nr_pages, 0, pages);
+	if (ret < nr_pages) {
+		while (--ret >= 0)
+			put_page(pages[ret]);
+		written = -EFAULT;
+		goto out;
 	}
-	if (buf[cnt-1] != '\n') {
-		buf[cnt] = '\n';
-		buf[cnt+1] = '\0';
+
+	page1 = kmap_atomic(pages[0]);
+	if (nr_pages == 2)
+		page2 = kmap_atomic(pages[1]);
+
+	local_save_flags(irq_flags);
+	size = sizeof(*entry) + cnt + 2; /* possible \n added */
+	buffer = global_trace.buffer;
+	event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
+					  irq_flags, preempt_count());
+	if (!event) {
+		/* Ring buffer disabled, return as if not open for write */
+		written = -EBADF;
+		goto out_unlock;
+	}
+
+	entry = ring_buffer_event_data(event);
+	entry->ip = _THIS_IP_;
+
+	if (nr_pages == 2) {
+		len = PAGE_SIZE - offset;
+		memcpy(&entry->buf, page1 + offset, len);
+		memcpy(&entry->buf[len], page2, cnt - len);
 	} else
-		buf[cnt] = '\0';
+		memcpy(&entry->buf, page1 + offset, cnt);
 
-	written = mark_printk("%s", buf);
-	kfree(buf);
-	*fpos += written;
+	if (entry->buf[cnt - 1] != '\n') {
+		entry->buf[cnt] = '\n';
+		entry->buf[cnt + 1] = '\0';
+	} else
+		entry->buf[cnt] = '\0';
+
+	ring_buffer_unlock_commit(buffer, event);
 
-	/* don't tell userspace we wrote more - it might confuse them */
-	if (written > cnt)
-		written = cnt;
+	written = cnt;
 
+	*fpos += written;
+
+ out_unlock:
+	if (nr_pages == 2)
+		kunmap_atomic(page2);
+	kunmap_atomic(page1);
+	while (nr_pages > 0)
+		put_page(pages[--nr_pages]);
+ out:
 	return written;
 }
 
-- 
1.7.6.3



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk()
  2011-10-08 17:02 [PATCH 0/3][RFC] trace_printk() using percpu buffers Steven Rostedt
  2011-10-08 17:02 ` [PATCH 1/3][RFC] tracing: Do not allocate buffer for trace_marker Steven Rostedt
@ 2011-10-08 17:02 ` Steven Rostedt
  2011-10-10 11:08   ` Peter Zijlstra
  2011-10-08 17:02 ` [PATCH 3/3][RFC] tracing: Add config to permanently have trace_printk() use percpu Steven Rostedt
  2011-10-10 11:04 ` [PATCH 0/3][RFC] trace_printk() using percpu buffers Peter Zijlstra
  3 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2011-10-08 17:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: Type: text/plain, Size: 12416 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Currently, trace_printk() uses a single buffer to write into
to calculate the size and format needed to save the trace. To
do this safely in an SMP environment, a spin_lock() is taken
to only allow one writer at a time to the buffer. But this could
also affect what is being traced, and add synchronization that
would not be there otherwise.

Ideally, using percpu buffers would be useful, but since trace_printk()
is only used in development, having per cpu buffers for something
never used is a waste of space. We could make this a compile option
but trace_printk() may also be used for developing modules, on a
distro kernels or for debugging at customer sites.

The approach taken here is to dynamically allocate percpu buffers
with a new tracing/options switch "trace_printk_percpu". It can
be allocated and freed at runtime, or "trace_printk_percpu" may also
be specified on the command line and it will be allocated at boot up.

This allows a developer to create percpu buffers for trace_printk()
on a running system, and also free them when not in used.

Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 Documentation/kernel-parameters.txt |   11 ++
 kernel/trace/trace.c                |  216 ++++++++++++++++++++++++++++-------
 kernel/trace/trace.h                |    1 +
 3 files changed, 187 insertions(+), 41 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e279b72..8fc3e58 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2521,6 +2521,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			to facilitate early boot debugging.
 			See also Documentation/trace/events.txt
 
+	trace_printk_percpu
+			[FTRACE] When using trace_printk(), by default the
+			temporary buffer is a single instance with a spinlock
+			to protect it. By adding this switch or enabling it at
+			runtime with:
+			echo 1 > /sys/kernel/debug/tracing/options/trace_printk_percpu
+			Per CPU buffers are allocated and trace_printk() will write
+			to them without taking any locks. Echo 0 into the above
+			file, and the percpu buffers will safely be freed.
+			Even if they were allocated with this kernel command line.
+
 	tsc=		Disable clocksource stability checks for TSC.
 			Format: <string>
 			[x86] reliable: mark tsc clocksource as reliable, this
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f86efe9..c59ab4c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -343,6 +343,13 @@ unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
 static int trace_stop_count;
 static DEFINE_SPINLOCK(tracing_start_lock);
 
+static int __init set_percpu_trace_buffer(char *str)
+{
+	trace_flags |= TRACE_ITER_PRINTK_PERCPU;
+	return 1;
+}
+__setup("trace_printk_percpu", set_percpu_trace_buffer);
+
 static void wakeup_work_handler(struct work_struct *work)
 {
 	wake_up(&trace_wait);
@@ -426,6 +433,7 @@ static const char *trace_options[] = {
 	"record-cmd",
 	"overwrite",
 	"disable_on_free",
+	"trace_printk_percpu",
 	NULL
 };
 
@@ -1442,25 +1450,98 @@ static void __trace_userstack(struct trace_array *tr, unsigned long flags)
 
 #endif /* CONFIG_STACKTRACE */
 
+static char trace_buf[TRACE_BUF_SIZE];
+static arch_spinlock_t trace_buf_lock =
+	(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
+/* created for use with alloc_percpu */
+struct trace_buffer_struct {
+	char buffer[TRACE_BUF_SIZE];
+};
+
+static struct trace_buffer_struct *trace_percpu_buffer;
+static struct trace_buffer_struct *trace_percpu_nmi_buffer;
+
+/*
+ * Since the buffers may be freed, we access the trace_percpu_buffer
+ * once, and then use it or not depending on if it exists.
+ * The code that frees it will NULL out trace_percpu_buffer first
+ * and then perform a synchronize_sched() to ensure all users
+ * are done with it.
+ *
+ * Interrupts are disabled around the get and put operations below.
+ */
+static char *get_trace_buf(struct trace_array *tr, int *put)
+{
+	struct trace_buffer_struct *percpu_buffer;
+	struct trace_buffer_struct *buffer;
+	struct trace_array_cpu *data;
+	int disable;
+	int cpu;
+
+	/*
+	 * If we have allocated per cpu buffers, then we do not
+	 * need to do any locking.
+	 */
+	if (in_nmi())
+		percpu_buffer = rcu_dereference_raw(trace_percpu_nmi_buffer);
+	else
+		percpu_buffer = rcu_dereference_raw(trace_percpu_buffer);
+
+	if (percpu_buffer) {
+		*put = 1;
+		buffer = per_cpu_ptr(percpu_buffer, smp_processor_id());
+		return buffer->buffer;
+	}
+
+	/*
+	 * We are using a single buffer, we can lose NMI data too.
+	 */
+	*put = 0;
+
+	cpu = raw_smp_processor_id();
+	data = tr->data[cpu];
+
+	disable = atomic_inc_return(&data->disabled);
+	if (unlikely(disable != 1)) {
+		atomic_dec_return(&data->disabled);
+		return NULL;
+	}
+
+	arch_spin_lock(&trace_buf_lock);
+	return trace_buf;
+}
+
+static void put_trace_buf(struct trace_array *tr, int put)
+{
+	struct trace_array_cpu *data;
+	int cpu;
+
+	if (put)
+		return;
+
+	cpu = raw_smp_processor_id();
+	data = tr->data[cpu];
+
+	arch_spin_unlock(&trace_buf_lock);
+	atomic_dec_return(&data->disabled);
+}
+
 /**
  * trace_vbprintk - write binary msg to tracing buffer
  *
  */
 int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
 {
-	static arch_spinlock_t trace_buf_lock =
-		(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
-	static u32 trace_buf[TRACE_BUF_SIZE];
-
 	struct ftrace_event_call *call = &event_bprint;
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
 	struct trace_array *tr = &global_trace;
-	struct trace_array_cpu *data;
 	struct bprint_entry *entry;
+	char *tbuffer;
 	unsigned long flags;
-	int disable;
-	int cpu, len = 0, size, pc;
+	int len = 0, size, pc;
+	int put;
 
 	if (unlikely(tracing_selftest_running || tracing_disabled))
 		return 0;
@@ -1470,19 +1551,18 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
 
 	pc = preempt_count();
 	preempt_disable_notrace();
-	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
 
-	disable = atomic_inc_return(&data->disabled);
-	if (unlikely(disable != 1))
+	local_irq_save(flags);
+
+	tbuffer = get_trace_buf(tr, &put);
+	if (!tbuffer) {
+		len = 0;
 		goto out;
+	}
 
-	/* Lockdep uses trace_printk for lock tracing */
-	local_irq_save(flags);
-	arch_spin_lock(&trace_buf_lock);
-	len = vbin_printf(trace_buf, TRACE_BUF_SIZE, fmt, args);
+	len = vbin_printf((u32 *)tbuffer, TRACE_BUF_SIZE/sizeof(int), fmt, args);
 
-	if (len > TRACE_BUF_SIZE || len < 0)
+	if (len > TRACE_BUF_SIZE/sizeof(int) || len < 0)
 		goto out_unlock;
 
 	size = sizeof(*entry) + sizeof(u32) * len;
@@ -1495,18 +1575,17 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
 	entry->ip			= ip;
 	entry->fmt			= fmt;
 
-	memcpy(entry->buf, trace_buf, sizeof(u32) * len);
+	memcpy(entry->buf, tbuffer, sizeof(u32) * len);
 	if (!filter_check_discard(call, entry, buffer, event)) {
 		ring_buffer_unlock_commit(buffer, event);
 		ftrace_trace_stack(buffer, flags, 6, pc);
 	}
 
 out_unlock:
-	arch_spin_unlock(&trace_buf_lock);
+	put_trace_buf(tr, put);
 	local_irq_restore(flags);
 
 out:
-	atomic_dec_return(&data->disabled);
 	preempt_enable_notrace();
 	unpause_graph_tracing();
 
@@ -1532,58 +1611,58 @@ int trace_array_printk(struct trace_array *tr,
 int trace_array_vprintk(struct trace_array *tr,
 			unsigned long ip, const char *fmt, va_list args)
 {
-	static arch_spinlock_t trace_buf_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-	static char trace_buf[TRACE_BUF_SIZE];
-
 	struct ftrace_event_call *call = &event_print;
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
-	struct trace_array_cpu *data;
-	int cpu, len = 0, size, pc;
+	int len = 0, size, pc;
 	struct print_entry *entry;
-	unsigned long irq_flags;
-	int disable;
+	unsigned long flags;
+	char *tbuffer;
+	int put;
 
 	if (tracing_disabled || tracing_selftest_running)
 		return 0;
 
+	/* Don't pollute graph traces with trace_vprintk internals */
+	pause_graph_tracing();
+
 	pc = preempt_count();
 	preempt_disable_notrace();
-	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
 
-	disable = atomic_inc_return(&data->disabled);
-	if (unlikely(disable != 1))
+	local_irq_save(flags);
+
+	tbuffer = get_trace_buf(tr, &put);
+	if (!tbuffer) {
+		len = 0;
 		goto out;
+	}
 
-	pause_graph_tracing();
-	raw_local_irq_save(irq_flags);
-	arch_spin_lock(&trace_buf_lock);
-	len = vsnprintf(trace_buf, TRACE_BUF_SIZE, fmt, args);
+	len = vsnprintf(tbuffer, TRACE_BUF_SIZE, fmt, args);
+	if (len > TRACE_BUF_SIZE)
+		goto out_unlock;
 
 	size = sizeof(*entry) + len + 1;
 	buffer = tr->buffer;
 	event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
-					  irq_flags, pc);
+					  flags, pc);
 	if (!event)
 		goto out_unlock;
 	entry = ring_buffer_event_data(event);
 	entry->ip = ip;
 
-	memcpy(&entry->buf, trace_buf, len);
+	memcpy(&entry->buf, tbuffer, len);
 	entry->buf[len] = '\0';
 	if (!filter_check_discard(call, entry, buffer, event)) {
 		ring_buffer_unlock_commit(buffer, event);
-		ftrace_trace_stack(buffer, irq_flags, 6, pc);
+		ftrace_trace_stack(buffer, flags, 6, pc);
 	}
 
  out_unlock:
-	arch_spin_unlock(&trace_buf_lock);
-	raw_local_irq_restore(irq_flags);
-	unpause_graph_tracing();
+	put_trace_buf(tr, put);
+	local_irq_restore(flags);
  out:
-	atomic_dec_return(&data->disabled);
 	preempt_enable_notrace();
+	unpause_graph_tracing();
 
 	return len;
 }
@@ -2629,6 +2708,51 @@ static int set_tracer_option(struct tracer *trace, char *cmp, int neg)
 	return -EINVAL;
 }
 
+static void alloc_percpu_trace_buffer(void)
+{
+	struct trace_buffer_struct *buffers;
+	struct trace_buffer_struct *nmi_buffers;
+
+	if (!(trace_flags & TRACE_ITER_PRINTK_PERCPU))
+		return; /* Warn? */
+
+	buffers = alloc_percpu(struct trace_buffer_struct);
+	if (WARN(!buffers, "Could not allocate percpu trace_printk buffer")) {
+		trace_flags &= ~TRACE_ITER_PRINTK_PERCPU;
+		return;
+	}
+
+	nmi_buffers = alloc_percpu(struct trace_buffer_struct);
+	if (WARN(!nmi_buffers, "Could not allocate NMI percpu trace_printk buffer")) {
+		free_percpu(buffers);
+		trace_flags &= ~TRACE_ITER_PRINTK_PERCPU;
+		return;
+	}
+
+	trace_percpu_buffer = buffers;
+	trace_percpu_buffer = nmi_buffers;
+}
+
+static void free_percpu_trace_buffer(void)
+{
+	struct trace_buffer_struct *buffers;
+	struct trace_buffer_struct *nmi_buffers;
+
+	buffers = trace_percpu_buffer;
+	nmi_buffers = trace_percpu_nmi_buffer;
+	trace_percpu_buffer = NULL;
+	trace_percpu_nmi_buffer = NULL;
+	/*
+	 * As the users of these buffers disable preemption
+	 * then check if the buffers exist to use them or not,
+	 * we just need to wait till all tasks have scheduled
+	 * to free them.
+	 */
+	synchronize_sched();
+	free_percpu(buffers);
+	free_percpu(nmi_buffers);
+}
+
 static void set_tracer_flags(unsigned int mask, int enabled)
 {
 	/* do nothing if flag is already set */
@@ -2645,6 +2769,13 @@ static void set_tracer_flags(unsigned int mask, int enabled)
 
 	if (mask == TRACE_ITER_OVERWRITE)
 		ring_buffer_change_overwrite(global_trace.buffer, enabled);
+
+	if (mask == TRACE_ITER_PRINTK_PERCPU) {
+		if (enabled)
+			alloc_percpu_trace_buffer();
+		else
+			free_percpu_trace_buffer();
+	}
 }
 
 static ssize_t
@@ -4835,6 +4966,9 @@ __init static int tracer_alloc_buffers(void)
 	/* All seems OK, enable tracing */
 	tracing_disabled = 0;
 
+	if (trace_flags & TRACE_ITER_PRINTK_PERCPU)
+		alloc_percpu_trace_buffer();
+
 	atomic_notifier_chain_register(&panic_notifier_list,
 				       &trace_panic_notifier);
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index aa5d94a..d2b7b4c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -654,6 +654,7 @@ enum trace_iterator_flags {
 	TRACE_ITER_RECORD_CMD		= 0x100000,
 	TRACE_ITER_OVERWRITE		= 0x200000,
 	TRACE_ITER_STOP_ON_FREE		= 0x400000,
+	TRACE_ITER_PRINTK_PERCPU	= 0x800000,
 };
 
 /*
-- 
1.7.6.3



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3][RFC] tracing: Add config to permanently have trace_printk() use percpu
  2011-10-08 17:02 [PATCH 0/3][RFC] trace_printk() using percpu buffers Steven Rostedt
  2011-10-08 17:02 ` [PATCH 1/3][RFC] tracing: Do not allocate buffer for trace_marker Steven Rostedt
  2011-10-08 17:02 ` [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk() Steven Rostedt
@ 2011-10-08 17:02 ` Steven Rostedt
  2011-10-10 11:04 ` [PATCH 0/3][RFC] trace_printk() using percpu buffers Peter Zijlstra
  3 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2011-10-08 17:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker

[-- Attachment #1: Type: text/plain, Size: 4650 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

For those that do a lot of developing, and use trace_printk() often.
There is no need to have a switch to change between percpu buffers and
a locked protected buffer. Just have the percpu buffers permanently
allocated via a config option and keep a fast and non-intrusive
tracing mechanism that can handle tracing in NMIs, but sacrifices
memory overhead. But developers don't mind that.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/Kconfig |   18 ++++++++++++++++++
 kernel/trace/trace.c |   41 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 2ad39e5..18bd377 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -223,6 +223,24 @@ config SCHED_TRACER
 	  This tracer tracks the latency of the highest priority task
 	  to be scheduled in, starting from the point it has woken up.
 
+config TRACE_PRINTK_PERCPU
+	bool "Allocate trace_printk() per_cpu buffers"
+	depends on TRACING
+	 help
+	  By default trace_printk() uses a singe 1KB buffer protected
+	  with a spinlock. It also may lose trace data from NMIs, if an
+	  NMI happens on a CPU that a trace is being recorded on.
+
+	  By default, per_cpu buffers can be allocated either by
+	  setting trace_printk_percpu on the kernel command line,
+	  or by enabling the trace option trace_printk_percpu.
+
+	  For developers that want it always set, by enabling this
+	  config, per_cpu buffers will be permanently set, removing
+	  any logic to check between single and per cpu buffers
+	  adding less impact to tracing with the added cost of more
+	  memory overhead.
+
 config ENABLE_DEFAULT_TRACERS
 	bool "Trace process context switches and events"
 	depends on !GENERIC_TRACER
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c59ab4c..b245a4e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -338,7 +338,11 @@ static DECLARE_WAIT_QUEUE_HEAD(trace_wait);
 /* trace_flags holds trace_options default values */
 unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
 	TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | TRACE_ITER_SLEEP_TIME |
-	TRACE_ITER_GRAPH_TIME | TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE;
+	TRACE_ITER_GRAPH_TIME | TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE
+#ifdef CONFIG_TRACE_PRINTK_PERCPU
+	| TRACE_ITER_PRINTK_PERCPU
+#endif
+	;
 
 static int trace_stop_count;
 static DEFINE_SPINLOCK(tracing_start_lock);
@@ -1450,15 +1454,32 @@ static void __trace_userstack(struct trace_array *tr, unsigned long flags)
 
 #endif /* CONFIG_STACKTRACE */
 
-static char trace_buf[TRACE_BUF_SIZE];
-static arch_spinlock_t trace_buf_lock =
-	(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
-
 /* created for use with alloc_percpu */
 struct trace_buffer_struct {
 	char buffer[TRACE_BUF_SIZE];
 };
 
+#ifdef CONFIG_TRACE_PRINTK_PERCPU
+static DEFINE_PER_CPU(struct trace_buffer_struct, trace_buffers);
+static DEFINE_PER_CPU(struct trace_buffer_struct, trace_nmi_buffers);
+
+static inline char *get_trace_buf(struct trace_array *tr, int *put)
+{
+	if (in_nmi())
+		return __get_cpu_var(trace_nmi_buffers).buffer;
+	else
+		return __get_cpu_var(trace_buffers).buffer;
+}
+
+static inline void put_trace_buf(struct trace_array *tr, int put)
+{
+}
+
+#else
+static char trace_buf[TRACE_BUF_SIZE];
+static arch_spinlock_t trace_buf_lock =
+	(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
 static struct trace_buffer_struct *trace_percpu_buffer;
 static struct trace_buffer_struct *trace_percpu_nmi_buffer;
 
@@ -1526,6 +1547,7 @@ static void put_trace_buf(struct trace_array *tr, int put)
 	arch_spin_unlock(&trace_buf_lock);
 	atomic_dec_return(&data->disabled);
 }
+#endif /* CONFIG_TRACE_PRINTK_PERCPU */
 
 /**
  * trace_vbprintk - write binary msg to tracing buffer
@@ -2708,6 +2730,7 @@ static int set_tracer_option(struct tracer *trace, char *cmp, int neg)
 	return -EINVAL;
 }
 
+#ifndef CONFIG_TRACE_PRINTK_PERCPU
 static void alloc_percpu_trace_buffer(void)
 {
 	struct trace_buffer_struct *buffers;
@@ -2752,6 +2775,14 @@ static void free_percpu_trace_buffer(void)
 	free_percpu(buffers);
 	free_percpu(nmi_buffers);
 }
+#else
+static void alloc_percpu_trace_buffer(void) { }
+static void free_percpu_trace_buffer(void)
+{
+	/* trace_printk() buffers are permanently on */
+	trace_flags |= TRACE_ITER_PRINTK_PERCPU;
+}
+#endif
 
 static void set_tracer_flags(unsigned int mask, int enabled)
 {
-- 
1.7.6.3



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3][RFC] trace_printk() using percpu buffers
  2011-10-08 17:02 [PATCH 0/3][RFC] trace_printk() using percpu buffers Steven Rostedt
                   ` (2 preceding siblings ...)
  2011-10-08 17:02 ` [PATCH 3/3][RFC] tracing: Add config to permanently have trace_printk() use percpu Steven Rostedt
@ 2011-10-10 11:04 ` Peter Zijlstra
  2011-10-10 12:31   ` Steven Rostedt
  3 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2011-10-10 11:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Sat, 2011-10-08 at 13:02 -0400, Steven Rostedt wrote:
> 
> Peter,
> 
> You had issues with the previous version of my trace_printk() code.
> I rewrote it to do the following.
> 
> By default, it still uses the single buffer protected by a spinlock
> and an atomic (for NMIs). The NMI case can cause dropped prints if
> the NMI happens while a trace_printk() is processing.

Why bother keeping that?

> When trace_printk_percpu is enabled, either via the trace options or
> the kernel command line, then two sets of percpu buffers are made,
> one for normal and irqs (interrupts are still disabled), and the other
> is for NMIs. These can be added or removed at anytime.

So why not allocate 4, one for {task, softirq, irq, NMI} resp, then all
you need to do is disable preemption.

depending on tracing/options/trace_printk ?

> The last patch adds a CONFIG_TRACE_PRINTK_PERCPU that makes trace_printk()
> permanently use two sets of per_cpu buffers, and these can not be
> removed. This will give the least amount of overhead for trace_printk()
> with the sacrifice of memory overhead. This is an option I could imagine
> you would just set and forget about.

Is that one dereference really that expensive?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk()
  2011-10-08 17:02 ` [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk() Steven Rostedt
@ 2011-10-10 11:08   ` Peter Zijlstra
  2011-10-10 12:37     ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2011-10-10 11:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Sat, 2011-10-08 at 13:02 -0400, Steven Rostedt wrote:
> 
> Ideally, using percpu buffers would be useful, but since trace_printk()
> is only used in development, having per cpu buffers for something
> never used is a waste of space. We could make this a compile option
> but trace_printk() may also be used for developing modules, on a
> distro kernels or for debugging at customer sites.
> 
> The approach taken here is to dynamically allocate percpu buffers
> with a new tracing/options switch "trace_printk_percpu". It can
> be allocated and freed at runtime, or "trace_printk_percpu" may also
> be specified on the command line and it will be allocated at boot up.
> 
> This allows a developer to create percpu buffers for trace_printk()
> on a running system, and also free them when not in used. 

Kill the old code, make tracing/events/trace_printk default to 0 and use
enabling of that to allocate your per-cpu buffers.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3][RFC] trace_printk() using percpu buffers
  2011-10-10 11:04 ` [PATCH 0/3][RFC] trace_printk() using percpu buffers Peter Zijlstra
@ 2011-10-10 12:31   ` Steven Rostedt
  2011-10-10 13:03     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2011-10-10 12:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Mon, 2011-10-10 at 13:04 +0200, Peter Zijlstra wrote:
> On Sat, 2011-10-08 at 13:02 -0400, Steven Rostedt wrote:
> > 
> > Peter,
> > 
> > You had issues with the previous version of my trace_printk() code.
> > I rewrote it to do the following.
> > 
> > By default, it still uses the single buffer protected by a spinlock
> > and an atomic (for NMIs). The NMI case can cause dropped prints if
> > the NMI happens while a trace_printk() is processing.
> 
> Why bother keeping that?

Because very few developers debug nmi's. printk is known not to work
there.

I still find it useful to have without having to switch on a config
option or kernel command line.

> 
> > When trace_printk_percpu is enabled, either via the trace options or
> > the kernel command line, then two sets of percpu buffers are made,
> > one for normal and irqs (interrupts are still disabled), and the other
> > is for NMIs. These can be added or removed at anytime.
> 
> So why not allocate 4, one for {task, softirq, irq, NMI} resp, then all
> you need to do is disable preemption.
> 
> depending on tracing/options/trace_printk ?

Preemption still needs to be disabled. But if you think that's better
than disabling interrupts, I could do that too.

> 
> > The last patch adds a CONFIG_TRACE_PRINTK_PERCPU that makes trace_printk()
> > permanently use two sets of per_cpu buffers, and these can not be
> > removed. This will give the least amount of overhead for trace_printk()
> > with the sacrifice of memory overhead. This is an option I could imagine
> > you would just set and forget about.
> 
> Is that one dereference really that expensive?

It's also a compare and jump, but I added this option for you :)

That way, you could set this option and forget about it.

-- Steve



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk()
  2011-10-10 11:08   ` Peter Zijlstra
@ 2011-10-10 12:37     ` Steven Rostedt
  2011-10-10 13:04       ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2011-10-10 12:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Mon, 2011-10-10 at 13:08 +0200, Peter Zijlstra wrote:
> On Sat, 2011-10-08 at 13:02 -0400, Steven Rostedt wrote:
> > 
> > Ideally, using percpu buffers would be useful, but since trace_printk()
> > is only used in development, having per cpu buffers for something
> > never used is a waste of space. We could make this a compile option
> > but trace_printk() may also be used for developing modules, on a
> > distro kernels or for debugging at customer sites.
> > 
> > The approach taken here is to dynamically allocate percpu buffers
> > with a new tracing/options switch "trace_printk_percpu". It can
> > be allocated and freed at runtime, or "trace_printk_percpu" may also
> > be specified on the command line and it will be allocated at boot up.
> > 
> > This allows a developer to create percpu buffers for trace_printk()
> > on a running system, and also free them when not in used. 
> 
> Kill the old code, make tracing/events/trace_printk default to 0 and use
> enabling of that to allocate your per-cpu buffers.

This will make debugging some of my embedded boards difficult (the ones
where I have no way to add kernel command lines). If there's a problem
at boot up, I will have to modify the kernel to get this to print.

For those that don't know the tracing code, it will just simply break.

Unless you want me to keep the kconfig option that enables it
permanently. Then I could do this.

I also expect tglx to throw frozen sharks at me for "why doesn't
trace_printk() work anymore?!!!!"

-- Steve




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3][RFC] trace_printk() using percpu buffers
  2011-10-10 12:31   ` Steven Rostedt
@ 2011-10-10 13:03     ` Peter Zijlstra
  2011-10-10 13:06       ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2011-10-10 13:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Mon, 2011-10-10 at 08:31 -0400, Steven Rostedt wrote:

> > > By default, it still uses the single buffer protected by a spinlock
> > > and an atomic (for NMIs). The NMI case can cause dropped prints if
> > > the NMI happens while a trace_printk() is processing.
> > 
> > Why bother keeping that?
> 
> Because very few developers debug nmi's. printk is known not to work
> there.
> 
> I still find it useful to have without having to switch on a config
> option or kernel command line.

But its also a massive scalability fail. There's simply no sane reason
to keep the shared buffer trace_printk() implementation.

> > > When trace_printk_percpu is enabled, either via the trace options or
> > > the kernel command line, then two sets of percpu buffers are made,
> > > one for normal and irqs (interrupts are still disabled), and the other
> > > is for NMIs. These can be added or removed at anytime.
> > 
> > So why not allocate 4, one for {task, softirq, irq, NMI} resp, then all
> > you need to do is disable preemption.
> > 
> > depending on tracing/options/trace_printk ?
> 
> Preemption still needs to be disabled. But if you think that's better
> than disabling interrupts, I could do that too.

I guess it doesn't really matter that much..

> > > The last patch adds a CONFIG_TRACE_PRINTK_PERCPU that makes trace_printk()
> > > permanently use two sets of per_cpu buffers, and these can not be
> > > removed. This will give the least amount of overhead for trace_printk()
> > > with the sacrifice of memory overhead. This is an option I could imagine
> > > you would just set and forget about.
> > 
> > Is that one dereference really that expensive?
> 
> It's also a compare and jump, but I added this option for you :)

Ah, right :-)

> That way, you could set this option and forget about it.

Well, if all we have is the per-cpu option I'm fine either way.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk()
  2011-10-10 12:37     ` Steven Rostedt
@ 2011-10-10 13:04       ` Peter Zijlstra
  2011-10-10 13:17         ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2011-10-10 13:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Mon, 2011-10-10 at 08:37 -0400, Steven Rostedt wrote:

> This will make debugging some of my embedded boards difficult (the ones
> where I have no way to add kernel command lines). If there's a problem
> at boot up, I will have to modify the kernel to get this to print.
> 
> For those that don't know the tracing code, it will just simply break.
> 
> Unless you want me to keep the kconfig option that enables it
> permanently. Then I could do this.

There's a fairly easy fix for that, initialize it at boot, and free once
userspace it up and running if no trace_printk() came by.

> I also expect tglx to throw frozen sharks at me for "why doesn't
> trace_printk() work anymore?!!!!"

I'm sure you'll get over that :-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3][RFC] trace_printk() using percpu buffers
  2011-10-10 13:03     ` Peter Zijlstra
@ 2011-10-10 13:06       ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2011-10-10 13:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Mon, 2011-10-10 at 15:03 +0200, Peter Zijlstra wrote:
> On Mon, 2011-10-10 at 08:31 -0400, Steven Rostedt wrote:
> 
> > > > By default, it still uses the single buffer protected by a spinlock
> > > > and an atomic (for NMIs). The NMI case can cause dropped prints if
> > > > the NMI happens while a trace_printk() is processing.
> > > 
> > > Why bother keeping that?
> > 
> > Because very few developers debug nmi's. printk is known not to work
> > there.
> > 
> > I still find it useful to have without having to switch on a config
> > option or kernel command line.
> 
> But its also a massive scalability fail. There's simply no sane reason
> to keep the shared buffer trace_printk() implementation.

It was created as a "fast printk". Printk has the same scalability
issues, and other issues as well. It matters what you are debugging and
in most cases the single buffer has worked fine.

Nobody has yet to complain about it. Except for you when I told you how
it worked. ;)



> > That way, you could set this option and forget about it.
> 
> Well, if all we have is the per-cpu option I'm fine either way.

Good! This patch set improves trace_printk() from what it currently is
to something that will work for you and not break the way it use to work
for others.

I'll keep this patchset as is.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk()
  2011-10-10 13:04       ` Peter Zijlstra
@ 2011-10-10 13:17         ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2011-10-10 13:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Frederic Weisbecker

On Mon, 2011-10-10 at 15:04 +0200, Peter Zijlstra wrote:
> On Mon, 2011-10-10 at 08:37 -0400, Steven Rostedt wrote:
> 
> > This will make debugging some of my embedded boards difficult (the ones
> > where I have no way to add kernel command lines). If there's a problem
> > at boot up, I will have to modify the kernel to get this to print.
> > 
> > For those that don't know the tracing code, it will just simply break.
> > 
> > Unless you want me to keep the kconfig option that enables it
> > permanently. Then I could do this.
> 
> There's a fairly easy fix for that, initialize it at boot, and free once
> userspace it up and running if no trace_printk() came by.

Hmm, I could do that. Of course this will add a check for a global
variable in all trace_printks. I could make it a read_mostly variable,
as it only gets updated once, or even encapsulate the entire check with
a static_switch() :)


> 
> > I also expect tglx to throw frozen sharks at me for "why doesn't
> > trace_printk() work anymore?!!!!"
> 
> I'm sure you'll get over that :-)

I've had too many, I'm not sure my head could take much more.

-- Steve



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-10-10 13:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-08 17:02 [PATCH 0/3][RFC] trace_printk() using percpu buffers Steven Rostedt
2011-10-08 17:02 ` [PATCH 1/3][RFC] tracing: Do not allocate buffer for trace_marker Steven Rostedt
2011-10-08 17:02 ` [PATCH 2/3][RFC] tracing: Add optional percpu buffers for trace_printk() Steven Rostedt
2011-10-10 11:08   ` Peter Zijlstra
2011-10-10 12:37     ` Steven Rostedt
2011-10-10 13:04       ` Peter Zijlstra
2011-10-10 13:17         ` Steven Rostedt
2011-10-08 17:02 ` [PATCH 3/3][RFC] tracing: Add config to permanently have trace_printk() use percpu Steven Rostedt
2011-10-10 11:04 ` [PATCH 0/3][RFC] trace_printk() using percpu buffers Peter Zijlstra
2011-10-10 12:31   ` Steven Rostedt
2011-10-10 13:03     ` Peter Zijlstra
2011-10-10 13:06       ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).