linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely
@ 2014-06-26 21:49 Steven Rostedt
  2014-06-26 21:49 ` [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function Steven Rostedt
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Steven Rostedt @ 2014-06-26 21:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Jiri Kosina,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov


This is version 2:

This is my proposal to print the NMI stack traces from an RCU stall safely.
Here's the gist of it.

Patch 1: add trace_seq_buffer_ptr() helper function to clean up
   open coded trace_seq code. This is already in my 3.17 queue.

Patch 2: Add a new layer to trace_seq called seq_buf, that the trace
   seq uses. The seq_buf is more generic, and does not supply its own
   buffer. The buffer must be supplied when initializing the seq_buf.
   Note, this patch too may be going into 3.17. There's some places in
   the tracing code that I wanted to use a different size buffer
   but still needed trace_seq to have a pre-allocated buffer. Having
   this new layer will help me out. I have not added this to 3.17 until
   I decide to make use of it.

Patch 3: move the seq_buf out of the tracing code. It's useful for other
 purposes too. Like writing from an NMI context.

Patch 4: Add a per_cpu "printk_func" that printk calls. By default it calls
 vprintk_def() which does what it has always done. This allows us to
 override what printk() calls normally on a per cpu basis.

Patch 5: Have the NMI handler that dumps the stack trace just change the
 printk_func to call a NMI safe printk function that writes to a per cpu
 seq_buf. When all NMI handlers chimed in, the original caller prints
 out the seq_buf buffers for each CPU from a printk safe context.

This is much less intrusive than the other versions out there.

Changes since V1:

- Note, I based this off of my 3.17 queue that already updated trace_seq
  with a lot of comments from Andrew Morton. His comments have already
  been incorporated into the trace_seq.c file. This patch set is on top
  of those.

- Added a change that is in my 3.17 queue that cleans up open coded
  calculations of the trace_seq buffer to get the current location of
  the buffer.

- Biggest change is the added seq_buf. I'm keeping trace_seq doing the
  stop everything once it fills up. But seq_buf will act more like other
  utilities, as it will return what I would have written, and fills
  up the buffer as much as it can. It sets an overflow flag to test to
  see if there wasn't enough buffer space.

- I updated the last patch to use the seq_buf instead of the trace_seq
  by using its own struct nmi_seq_buf that allocates the buffer used.
  I also made some updates from previous reviews like not stripping the
  log level from the NMI printks and just using it in the final output
  as well as adding more comments and fixing the "return true" on a void
  function.

Thoughts?


Steven Rostedt (Red Hat) (5):
      tracing: Add trace_seq_buffer_ptr() helper function
      tracing: Create seq_buf layer in trace_seq
      seq_buf: Move the seq_buf code to lib/
      printk: Add per_cpu printk func to allow printk to be diverted
      x86/nmi: Perform a safe NMI stack trace on all CPUs

----
 arch/x86/kernel/apic/hw_nmi.c        |  84 ++++++++-
 arch/x86/kvm/mmutrace.h              |   2 +-
 drivers/scsi/scsi_trace.c            |  16 +-
 include/linux/percpu.h               |   3 +
 include/linux/printk.h               |   2 +
 include/linux/seq_buf.h              |  58 ++++++
 include/linux/trace_seq.h            |  23 ++-
 kernel/printk/printk.c               |  38 +++-
 kernel/trace/Makefile                |   1 +
 kernel/trace/trace.c                 |  39 ++--
 kernel/trace/trace_events.c          |   6 +-
 kernel/trace/trace_functions_graph.c |   6 +-
 kernel/trace/trace_output.c          |  14 +-
 kernel/trace/trace_seq.c             | 184 +++++++++---------
 lib/Makefile                         |   2 +-
 lib/seq_buf.c                        | 348 +++++++++++++++++++++++++++++++++++
 lib/trace_seq.c                      | 303 ++++++++++++++++++++++++++++++
 17 files changed, 976 insertions(+), 153 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function
  2014-06-26 21:49 [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Steven Rostedt
@ 2014-06-26 21:49 ` Steven Rostedt
  2014-06-27  1:06   ` Steven Rostedt
  2014-06-26 21:49 ` [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq Steven Rostedt
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-26 21:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Jiri Kosina,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

[-- Attachment #1: 0001-tracing-Add-trace_seq_buffer_ptr-helper-function.patch --]
[-- Type: text/plain, Size: 6826 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

There's several locations in the kernel that open code the calculation
of the next location in the trace_seq buffer. This is usually done with

  p->buffer + p->len

Instead of having this open coded, supply a helper function in the
header to do it for them. This function is called trace_seq_buffer_ptr().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kvm/mmutrace.h     |  2 +-
 drivers/scsi/scsi_trace.c   | 16 ++++++++--------
 include/linux/trace_seq.h   | 15 +++++++++++++++
 kernel/trace/trace_output.c | 14 +++++++-------
 4 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index 9d2e0ffcb190..2e5652b62fd6 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -22,7 +22,7 @@
 	__entry->unsync = sp->unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({				        \
-	const char *ret = p->buffer + p->len;				\
+	const char *ret = trace_seq_buffer_ptr(p);			\
 	static const char *access_str[] = {			        \
 		"---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"  \
 	};							        \
diff --git a/drivers/scsi/scsi_trace.c b/drivers/scsi/scsi_trace.c
index 2bea4f0b684a..503594e5f76d 100644
--- a/drivers/scsi/scsi_trace.c
+++ b/drivers/scsi/scsi_trace.c
@@ -28,7 +28,7 @@ scsi_trace_misc(struct trace_seq *, unsigned char *, int);
 static const char *
 scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 	sector_t lba = 0, txlen = 0;
 
 	lba |= ((cdb[1] & 0x1F) << 16);
@@ -46,7 +46,7 @@ scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
 static const char *
 scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 	sector_t lba = 0, txlen = 0;
 
 	lba |= (cdb[2] << 24);
@@ -71,7 +71,7 @@ scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
 static const char *
 scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 	sector_t lba = 0, txlen = 0;
 
 	lba |= (cdb[2] << 24);
@@ -94,7 +94,7 @@ scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
 static const char *
 scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 	sector_t lba = 0, txlen = 0;
 
 	lba |= ((u64)cdb[2] << 56);
@@ -125,7 +125,7 @@ scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
 static const char *
 scsi_trace_rw32(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len, *cmd;
+	const char *ret = trace_seq_buffer_ptr(p), *cmd;
 	sector_t lba = 0, txlen = 0;
 	u32 ei_lbrt = 0;
 
@@ -180,7 +180,7 @@ out:
 static const char *
 scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 	unsigned int regions = cdb[7] << 8 | cdb[8];
 
 	trace_seq_printf(p, "regions=%u", (regions - 8) / 16);
@@ -192,7 +192,7 @@ scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
 static const char *
 scsi_trace_service_action_in(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len, *cmd;
+	const char *ret = trace_seq_buffer_ptr(p), *cmd;
 	sector_t lba = 0;
 	u32 alloc_len = 0;
 
@@ -247,7 +247,7 @@ scsi_trace_varlen(struct trace_seq *p, unsigned char *cdb, int len)
 static const char *
 scsi_trace_misc(struct trace_seq *p, unsigned char *cdb, int len)
 {
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 
 	trace_seq_printf(p, "-");
 	trace_seq_putc(p, 0);
diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
index dd85753e1bb0..ea6c9dea79e3 100644
--- a/include/linux/trace_seq.h
+++ b/include/linux/trace_seq.h
@@ -25,6 +25,21 @@ trace_seq_init(struct trace_seq *s)
 	s->full = 0;
 }
 
+/**
+ * trace_seq_buffer_ptr - return pointer to next location in buffer
+ * @s: trace sequence descriptor
+ *
+ * Returns the pointer to the buffer where the next write to
+ * the buffer will happen. This is useful to save the location
+ * that is about to be written to and then return the result
+ * of that write.
+ */
+static inline unsigned char *
+trace_seq_buffer_ptr(struct trace_seq *s)
+{
+	return s->buffer + s->len;
+}
+
 /*
  * Currently only defined when tracing is enabled.
  */
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index b8930f79a04b..c6977d5a9b12 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -75,7 +75,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 {
 	unsigned long mask;
 	const char *str;
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 	int i, first = 1;
 
 	for (i = 0;  flag_array[i].name && flags; i++) {
@@ -111,7 +111,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 			 const struct trace_print_flags *symbol_array)
 {
 	int i;
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 
 	for (i = 0;  symbol_array[i].name; i++) {
 
@@ -122,7 +122,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 		break;
 	}
 
-	if (ret == (const char *)(p->buffer + p->len))
+	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
 		trace_seq_printf(p, "0x%lx", val);
 		
 	trace_seq_putc(p, 0);
@@ -137,7 +137,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
 			 const struct trace_print_flags_u64 *symbol_array)
 {
 	int i;
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 
 	for (i = 0;  symbol_array[i].name; i++) {
 
@@ -148,7 +148,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
 		break;
 	}
 
-	if (ret == (const char *)(p->buffer + p->len))
+	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
 		trace_seq_printf(p, "0x%llx", val);
 
 	trace_seq_putc(p, 0);
@@ -162,7 +162,7 @@ const char *
 ftrace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
 			 unsigned int bitmask_size)
 {
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 
 	trace_seq_bitmask(p, bitmask_ptr, bitmask_size * 8);
 	trace_seq_putc(p, 0);
@@ -175,7 +175,7 @@ const char *
 ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
 {
 	int i;
-	const char *ret = p->buffer + p->len;
+	const char *ret = trace_seq_buffer_ptr(p);
 
 	for (i = 0; i < buf_len; i++)
 		trace_seq_printf(p, "%s%2.2x", i == 0 ? "" : " ", buf[i]);
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-26 21:49 [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Steven Rostedt
  2014-06-26 21:49 ` [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function Steven Rostedt
@ 2014-06-26 21:49 ` Steven Rostedt
  2014-06-27 13:45   ` Petr Mládek
  2014-06-26 21:49 ` [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/ Steven Rostedt
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-26 21:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Jiri Kosina,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

[-- Attachment #1: 0002-tracing-Create-seq_buf-layer-in-trace_seq.patch --]
[-- Type: text/plain, Size: 26028 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Create a seq_buf layer that trace_seq sits on. The seq_buf will not
be limited to page size. This will allow other usages of seq_buf
instead of a hard set PAGE_SIZE one that trace_seq has.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/seq_buf.h              |  58 ++++++
 include/linux/trace_seq.h            |  10 +-
 kernel/trace/Makefile                |   1 +
 kernel/trace/seq_buf.c               | 348 +++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c                 |  39 ++--
 kernel/trace/trace_events.c          |   6 +-
 kernel/trace/trace_functions_graph.c |   6 +-
 kernel/trace/trace_seq.c             | 184 +++++++++---------
 8 files changed, 527 insertions(+), 125 deletions(-)
 create mode 100644 include/linux/seq_buf.h
 create mode 100644 kernel/trace/seq_buf.c

diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
new file mode 100644
index 000000000000..23b3fa2bd0cc
--- /dev/null
+++ b/include/linux/seq_buf.h
@@ -0,0 +1,58 @@
+#ifndef _LINUX_SEQ_BUF_H
+#define _LINUX_SEQ_BUF_H
+
+#include <linux/fs.h>
+
+#include <asm/page.h>
+
+/*
+ * Trace sequences are used to allow a function to call several other functions
+ * to create a string of data to use.
+ */
+
+/**
+ * seq_buf - seq buffer structure
+ * @buffer:	pointer to the buffer
+ * @size:	size of the buffer
+ * @len:	the amount of data inside the buffer
+ * @readpos:	The next position to read in the buffer.
+ * @overflow:	Set if more bytes should have been written to buffer
+ */
+struct seq_buf {
+	unsigned char		*buffer;
+	unsigned int		size;
+	unsigned int		len;
+	unsigned int		readpos;
+	unsigned int		overflow;
+};
+
+static inline void
+seq_buf_init(struct seq_buf *s, unsigned char *buf, unsigned int size)
+{
+	s->buffer = buf;
+	s->size = size;
+	s->len = 0;
+	s->readpos = 0;
+	s->overflow = 0;
+}
+
+extern __printf(2, 3)
+int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
+extern __printf(2, 0)
+int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args);
+extern int
+seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary);
+extern int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s);
+extern int seq_buf_to_user(struct seq_buf *s, char __user *ubuf,
+			   int cnt);
+extern int seq_buf_puts(struct seq_buf *s, const char *str);
+extern int seq_buf_putc(struct seq_buf *s, unsigned char c);
+extern int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len);
+extern int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
+			      unsigned int len);
+extern int seq_buf_path(struct seq_buf *s, const struct path *path);
+
+extern int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
+			   int nmaskbits);
+
+#endif /* _LINUX_SEQ_BUF_H */
diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
index ea6c9dea79e3..27c98fd76503 100644
--- a/include/linux/trace_seq.h
+++ b/include/linux/trace_seq.h
@@ -1,7 +1,7 @@
 #ifndef _LINUX_TRACE_SEQ_H
 #define _LINUX_TRACE_SEQ_H
 
-#include <linux/fs.h>
+#include <linux/seq_buf.h>
 
 #include <asm/page.h>
 
@@ -12,16 +12,14 @@
 
 struct trace_seq {
 	unsigned char		buffer[PAGE_SIZE];
-	unsigned int		len;
-	unsigned int		readpos;
+	struct seq_buf		seq;
 	int			full;
 };
 
 static inline void
 trace_seq_init(struct trace_seq *s)
 {
-	s->len = 0;
-	s->readpos = 0;
+	seq_buf_init(&s->seq, s->buffer, PAGE_SIZE);
 	s->full = 0;
 }
 
@@ -37,7 +35,7 @@ trace_seq_init(struct trace_seq *s)
 static inline unsigned char *
 trace_seq_buffer_ptr(struct trace_seq *s)
 {
-	return s->buffer + s->len;
+	return s->buffer + s->seq.len;
 }
 
 /*
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 67d6369ddf83..edc98c72a634 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o
 obj-$(CONFIG_TRACING) += trace.o
 obj-$(CONFIG_TRACING) += trace_output.o
 obj-$(CONFIG_TRACING) += trace_seq.o
+obj-$(CONFIG_TRACING) += seq_buf.o
 obj-$(CONFIG_TRACING) += trace_stat.o
 obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
new file mode 100644
index 000000000000..b7e33c18540a
--- /dev/null
+++ b/kernel/trace/seq_buf.c
@@ -0,0 +1,348 @@
+/*
+ * seq_buf.c
+ *
+ * Copyright (C) 2014 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
+ *
+ * The seq_buf is a handy tool that allows you to pass a descriptor around
+ * to a buffer that other functions can write to. It is similar to the
+ * seq_file functionality but has some differences.
+ *
+ * To use it, the seq_buf must be initialized with seq_buf_init().
+ * This will set up the counters within the descriptor. You can call
+ * seq_buf_init() more than once to reset the seq_buf to start
+ * from scratch.
+ * 
+ */
+#include <linux/uaccess.h>
+#include <linux/seq_file.h>
+#include <linux/seq_buf.h>
+
+/* How much buffer is left on the seq_buf? */
+#define SEQ_BUF_LEFT(s) (((s)->size - 1) - (s)->len)
+
+/* How much buffer is written? */
+#define SEQ_BUF_USED(s) min((s)->len, (s)->size - 1)
+
+static inline void seq_buf_check_len(struct seq_buf *s)
+{
+	if (unlikely(s->len > (s->size - 1))) {
+		s->len = s->size - 1;
+		s->overflow = 1;
+	}
+}
+
+/**
+ * seq_buf_print_seq - move the contents of seq_buf into a seq_file
+ * @m: the seq_file descriptor that is the destination
+ * @s: the seq_buf descriptor that is the source.
+ *
+ * Returns zero on success, non zero otherwise
+ */
+int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
+{
+	unsigned int len = SEQ_BUF_USED(s);
+
+	return seq_write(m, s->buffer, len);
+}
+
+/**
+ * seq_buf_printf - sequence printing of trace information
+ * @s: seq_buf descriptor
+ * @fmt: printf format string
+ *
+ * Writes a printf() format into the sequence buffer.
+ *
+ * Returns number of bytes written.
+ */
+int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	va_list ap;
+	int ret;
+
+	WARN_ON((int)len < 0);
+	va_start(ap, fmt);
+	ret = vsnprintf(s->buffer + s->len, len, fmt, ap);
+	va_end(ap);
+
+	s->len += ret;
+
+	seq_buf_check_len(s);
+
+	return ret;
+}
+
+/**
+ * seq_buf_bitmask - write a bitmask array in its ASCII representation
+ * @s:		seq_buf descriptor
+ * @maskp:	points to an array of unsigned longs that represent a bitmask
+ * @nmaskbits:	The number of bits that are valid in @maskp
+ *
+ * Writes a ASCII representation of a bitmask string into @s.
+ *
+ * Returns the number of bytes written.
+ */
+int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
+		    int nmaskbits)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	int ret;
+
+	WARN_ON((int)len < 0);
+	ret = bitmap_scnprintf(s->buffer, len, maskp, nmaskbits);
+	s->len += ret;
+	seq_buf_check_len(s);
+
+	return ret;
+}
+
+/**
+ * seq_buf_vprintf - write vprintf style into the sequence buffer
+ * @s: seq_buf descriptor
+ * @fmt: printf format string
+ *
+ * Write a vprintf() style into the sequence buffer.
+ *
+ * Returns the number of bytes written.
+ */
+int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	int ret;
+
+	if (WARN_ON((int)len < 0))
+		printk("len=%d size=%d s->len=%d\n", len, s->size, s->len);
+	ret = vsnprintf(s->buffer + s->len, len, fmt, args);
+	s->len += ret;
+	seq_buf_check_len(s);
+
+	return len;
+}
+
+/**
+ * seq_buf_bprintf - Write the printf string from binary arguments
+ * @s: seq_buf descriptor
+ * @fmt: The format string for the @binary arguments
+ * @binary: The binary arguments for @fmt.
+ *
+ * When recording in a fast path, a printf may be recorded with just
+ * saving the format and the arguments as they were passed to the
+ * function, instead of wasting cycles converting the arguments into
+ * ASCII characters. Instead, the arguments are saved in a 32 bit
+ * word array that is defined by the format string constraints.
+ *
+ * This function will take the format and the binary array and finish
+ * the conversion into the ASCII string within the buffer.
+ *
+ * Returns number of bytes written.
+ */
+int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	int ret;
+
+	if (WARN_ON((int)len < 0))
+		printk("len=%d size=%d s->len=%d\n", len, s->size, s->len);
+	ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
+	s->len += ret;
+	seq_buf_check_len(s);
+
+	return len;
+}
+
+/**
+ * seq_buf_puts - sequence printing of simple string
+ * @s: seq_buf descriptor
+ * @str: simple string to record
+ *
+ * Copy a simple string into the sequence buffer.
+ *
+ * Returns number of bytes written.
+ */
+int seq_buf_puts(struct seq_buf *s, const char *str)
+{
+	unsigned int len = strlen(str);
+
+	if (len > SEQ_BUF_LEFT(s)) {
+		s->overflow = 1;
+		len = SEQ_BUF_LEFT(s);
+	}
+
+	memcpy(s->buffer + s->len, str, len);
+	s->len += len;
+	WARN_ON(s->len > (s->size - 1));
+
+	return len;
+}
+
+/**
+ * seq_buf_putc - sequence printing of simple character
+ * @s: seq_buf descriptor
+ * @c: simple character to record
+ *
+ * Copy a single character into the sequence buffer.
+ *
+ * Returns 1 if the character was written, 0 otherwise.
+ */
+int seq_buf_putc(struct seq_buf *s, unsigned char c)
+{
+	if (SEQ_BUF_LEFT(s) < 1) {
+		s->overflow = 1;
+		return 0;
+	}
+
+	s->buffer[s->len++] = c;
+	WARN_ON(s->len > (s->size - 1));
+
+	return 1;
+}
+
+/**
+ * seq_buf_putmem - write raw data into the sequenc buffer
+ * @s: seq_buf descriptor
+ * @mem: The raw memory to copy into the buffer
+ * @len: The length of the raw memory to copy (in bytes)
+ *
+ * There may be cases where raw memory needs to be written into the
+ * buffer and a strcpy() would not work. Using this function allows
+ * for such cases.
+ *
+ * Returns the number of bytes written in the buffer.
+ */
+int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
+{
+	if (len > SEQ_BUF_LEFT(s)) {
+		s->overflow = 1;
+		len = SEQ_BUF_LEFT(s);
+	}
+
+	memcpy(s->buffer + s->len, mem, len);
+	s->len += len;
+	WARN_ON(s->len > (s->size - 1));
+
+	return len;
+}
+
+#define MAX_MEMHEX_BYTES	8U
+#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
+
+/**
+ * seq_buf_putmem_hex - write raw memory into the buffer in ASCII hex
+ * @s: seq_buf descriptor
+ * @mem: The raw memory to write its hex ASCII representation of
+ * @len: The length of the raw memory to copy (in bytes)
+ *
+ * This is similar to seq_buf_putmem() except instead of just copying the
+ * raw memory into the buffer it writes its ASCII representation of it
+ * in hex characters.
+ *
+ * Returns how much it wrote to the buffer.
+ */
+int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
+		       unsigned int len)
+{
+	unsigned char hex[HEX_CHARS];
+	const unsigned char *data = mem;
+	unsigned int start_len;
+	int i, j;
+	int cnt = 0;
+
+	while (len) {
+		start_len = min(len, HEX_CHARS - 1);
+#ifdef __BIG_ENDIAN
+		for (i = 0, j = 0; i < start_len; i++) {
+#else
+		for (i = start_len-1, j = 0; i >= 0; i--) {
+#endif
+			hex[j++] = hex_asc_hi(data[i]);
+			hex[j++] = hex_asc_lo(data[i]);
+		}
+		if (WARN_ON_ONCE(j == 0 || j/2 > len))
+			break;
+
+		/* j increments twice per loop */
+		len -= j / 2;
+		hex[j++] = ' ';
+
+		cnt += seq_buf_putmem(s, hex, j);
+	}
+	return cnt;
+}
+
+/**
+ * seq_buf_path - copy a path into the sequence buffer
+ * @s: seq_buf descriptor
+ * @path: path to write into the sequence buffer.
+ *
+ * Write a path name into the sequence buffer.
+ *
+ * Returns the number of bytes written into the buffer.
+ */
+int seq_buf_path(struct seq_buf *s, const struct path *path)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	unsigned char *p;
+	unsigned int start = s->len;
+
+	WARN_ON((int)len < 0);
+	p = d_path(path, s->buffer + s->len, len);
+	if (!IS_ERR(p)) {
+		p = mangle_path(s->buffer + s->len, p, "\n");
+		if (p) {
+			s->len = p - s->buffer;
+	WARN_ON(s->len > (s->size - 1));
+			return s->len - start;
+		}
+	} else {
+		s->buffer[s->len++] = '?';
+	WARN_ON(s->len > (s->size - 1));
+		return s->len - start;
+	}
+
+	s->overflow = 1;
+	return 0;
+}
+
+/**
+ * seq_buf_to_user - copy the squence buffer to user space
+ * @s: seq_buf descriptor
+ * @ubuf: The userspace memory location to copy to
+ * @cnt: The amount to copy
+ *
+ * Copies the sequence buffer into the userspace memory pointed to
+ * by @ubuf. It starts from the last read position (@s->readpos)
+ * and writes up to @cnt characters or till it reaches the end of
+ * the content in the buffer (@s->len), which ever comes first.
+ *
+ * On success, it returns a positive number of the number of bytes
+ * it copied.
+ *
+ * On failure it returns -EBUSY if all of the content in the
+ * sequence has been already read, which includes nothing in the
+ * sequenc (@s->len == @s->readpos).
+ *
+ * Returns -EFAULT if the copy to userspace fails.
+ */
+int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, int cnt)
+{
+	int len;
+	int ret;
+
+	if (!cnt)
+		return 0;
+
+	if (s->len <= s->readpos)
+		return -EBUSY;
+
+	len = s->len - s->readpos;
+	if (cnt > len)
+		cnt = len;
+	ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
+	if (ret == cnt)
+		return -EFAULT;
+
+	cnt -= ret;
+
+	s->readpos += cnt;
+	return cnt;
+}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4caa814d41c3..0e4894823a2f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -923,19 +923,20 @@ out:
 	return ret;
 }
 
+/* TODO add a seq_buf_to_buffer() */
 static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
 {
 	int len;
 
-	if (s->len <= s->readpos)
+	if (s->seq.len <= s->seq.readpos)
 		return -EBUSY;
 
-	len = s->len - s->readpos;
+	len = s->seq.len - s->seq.readpos;
 	if (cnt > len)
 		cnt = len;
-	memcpy(buf, s->buffer + s->readpos, cnt);
+	memcpy(buf, s->buffer + s->seq.readpos, cnt);
 
-	s->readpos += cnt;
+	s->seq.readpos += cnt;
 	return cnt;
 }
 
@@ -4255,6 +4256,8 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 		goto out;
 	}
 
+	trace_seq_init(&iter->seq);
+
 	/*
 	 * We make a copy of the current tracer to avoid concurrent
 	 * changes on it while we are reading.
@@ -4451,18 +4454,18 @@ waitagain:
 	trace_access_lock(iter->cpu_file);
 	while (trace_find_next_entry_inc(iter) != NULL) {
 		enum print_line_t ret;
-		int len = iter->seq.len;
+		int len = iter->seq.seq.len;
 
 		ret = print_trace_line(iter);
 		if (ret == TRACE_TYPE_PARTIAL_LINE) {
 			/* don't print partial lines */
-			iter->seq.len = len;
+			iter->seq.seq.len = len;
 			break;
 		}
 		if (ret != TRACE_TYPE_NO_CONSUME)
 			trace_consume(iter);
 
-		if (iter->seq.len >= cnt)
+		if (iter->seq.seq.len >= cnt)
 			break;
 
 		/*
@@ -4478,7 +4481,7 @@ waitagain:
 
 	/* Now copy what we have to the user */
 	sret = trace_seq_to_user(&iter->seq, ubuf, cnt);
-	if (iter->seq.readpos >= iter->seq.len)
+	if (iter->seq.seq.readpos >= iter->seq.seq.len)
 		trace_seq_init(&iter->seq);
 
 	/*
@@ -4516,16 +4519,16 @@ tracing_fill_pipe_page(size_t rem, struct trace_iterator *iter)
 
 	/* Seq buffer is page-sized, exactly what we need. */
 	for (;;) {
-		count = iter->seq.len;
+		count = iter->seq.seq.len;
 		ret = print_trace_line(iter);
-		count = iter->seq.len - count;
+		count = iter->seq.seq.len - count;
 		if (rem < count) {
 			rem = 0;
-			iter->seq.len -= count;
+			iter->seq.seq.len -= count;
 			break;
 		}
 		if (ret == TRACE_TYPE_PARTIAL_LINE) {
-			iter->seq.len -= count;
+			iter->seq.seq.len -= count;
 			break;
 		}
 
@@ -4606,13 +4609,13 @@ static ssize_t tracing_splice_read_pipe(struct file *filp,
 		/* Copy the data into the page, so we can start over. */
 		ret = trace_seq_to_buffer(&iter->seq,
 					  page_address(spd.pages[i]),
-					  iter->seq.len);
+					  iter->seq.seq.len);
 		if (ret < 0) {
 			__free_page(spd.pages[i]);
 			break;
 		}
 		spd.partial[i].offset = 0;
-		spd.partial[i].len = iter->seq.len;
+		spd.partial[i].len = iter->seq.seq.len;
 
 		trace_seq_init(&iter->seq);
 	}
@@ -5606,7 +5609,7 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
 	cnt = ring_buffer_read_events_cpu(trace_buf->buffer, cpu);
 	trace_seq_printf(s, "read events: %ld\n", cnt);
 
-	count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->len);
+	count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->seq.len);
 
 	kfree(s);
 
@@ -6569,11 +6572,11 @@ void
 trace_printk_seq(struct trace_seq *s)
 {
 	/* Probably should print a warning here. */
-	if (s->len >= TRACE_MAX_PRINT)
-		s->len = TRACE_MAX_PRINT;
+	if (s->seq.len >= TRACE_MAX_PRINT)
+		s->seq.len = TRACE_MAX_PRINT;
 
 	/* should be zero ended, but we are paranoid. */
-	s->buffer[s->len] = 0;
+	s->buffer[s->seq.len] = 0;
 
 	printk(KERN_TRACE "%s", s->buffer);
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f99e0b3bca8c..d879a8d69792 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1041,7 +1041,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
 	mutex_unlock(&event_mutex);
 
 	if (file)
-		r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);
+		r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->seq.len);
 
 	kfree(s);
 
@@ -1207,7 +1207,7 @@ subsystem_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
 	trace_seq_init(s);
 
 	print_subsystem_event_filter(system, s);
-	r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);
+	r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->seq.len);
 
 	kfree(s);
 
@@ -1262,7 +1262,7 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
 	trace_seq_init(s);
 
 	func(s);
-	r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);
+	r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->seq.len);
 
 	kfree(s);
 
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 4de3e57f723c..bc0f02eb062c 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -1255,9 +1255,9 @@ print_graph_comment(struct trace_seq *s, struct trace_entry *ent,
 	}
 
 	/* Strip ending newline */
-	if (s->buffer[s->len - 1] == '\n') {
-		s->buffer[s->len - 1] = '\0';
-		s->len--;
+	if (s->buffer[s->seq.len - 1] == '\n') {
+		s->buffer[s->seq.len - 1] = '\0';
+		s->seq.len--;
 	}
 
 	ret = trace_seq_puts(s, " */\n");
diff --git a/kernel/trace/trace_seq.c b/kernel/trace/trace_seq.c
index 1f24ed99dca2..1de967021586 100644
--- a/kernel/trace/trace_seq.c
+++ b/kernel/trace/trace_seq.c
@@ -27,10 +27,19 @@
 #include <linux/trace_seq.h>
 
 /* How much buffer is left on the trace_seq? */
-#define TRACE_SEQ_BUF_LEFT(s) ((PAGE_SIZE - 1) - (s)->len)
+#define TRACE_SEQ_BUF_LEFT(s) ((PAGE_SIZE - 1) - (s)->seq.len)
 
 /* How much buffer is written? */
-#define TRACE_SEQ_BUF_USED(s) min((s)->len, (unsigned int)(PAGE_SIZE - 1))
+#define TRACE_SEQ_BUF_USED(s) min((s)->seq.len, (unsigned int)(PAGE_SIZE - 1))
+
+/*
+ * trace_seq should work with being initialized with 0s.
+ */
+static inline void __trace_seq_init(struct trace_seq *s)
+{
+	if (unlikely(!s->seq.size))
+		trace_seq_init(s);
+}
 
 /**
  * trace_print_seq - move the contents of trace_seq into a seq_file
@@ -43,10 +52,11 @@
  */
 int trace_print_seq(struct seq_file *m, struct trace_seq *s)
 {
-	unsigned int len = TRACE_SEQ_BUF_USED(s);
 	int ret;
 
-	ret = seq_write(m, s->buffer, len);
+	__trace_seq_init(s);
+
+	ret = seq_buf_print_seq(m, &s->seq);
 
 	/*
 	 * Only reset this buffer if we successfully wrote to the
@@ -77,25 +87,25 @@ int trace_print_seq(struct seq_file *m, struct trace_seq *s)
  */
 int trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
 {
-	unsigned int len = TRACE_SEQ_BUF_LEFT(s);
+	unsigned int save_len = s->seq.len;
 	va_list ap;
-	int ret;
 
-	if (s->full || !len)
+	if (s->full)
 		return 0;
 
+	__trace_seq_init(s);
+
 	va_start(ap, fmt);
-	ret = vsnprintf(s->buffer + s->len, len, fmt, ap);
+	seq_buf_vprintf(&s->seq, fmt, ap);
 	va_end(ap);
 
 	/* If we can't write it all, don't bother writing anything */
-	if (ret >= len) {
+	if (unlikely(s->seq.overflow)) {
+		s->seq.len = save_len;
 		s->full = 1;
 		return 0;
 	}
 
-	s->len += ret;
-
 	return 1;
 }
 EXPORT_SYMBOL_GPL(trace_seq_printf);
@@ -116,14 +126,20 @@ EXPORT_SYMBOL_GPL(trace_seq_printf);
 int trace_seq_bitmask(struct trace_seq *s, const unsigned long *maskp,
 		      int nmaskbits)
 {
-	unsigned int len = TRACE_SEQ_BUF_LEFT(s);
-	int ret;
+	unsigned int save_len = s->seq.len;
 
-	if (s->full || !len)
+	if (s->full)
 		return 0;
 
-	ret = bitmap_scnprintf(s->buffer, len, maskp, nmaskbits);
-	s->len += ret;
+	__trace_seq_init(s);
+
+	seq_buf_bitmask(&s->seq, maskp, nmaskbits);
+
+	if (unlikely(s->seq.overflow)) {
+		s->seq.len = save_len;
+		s->full = 1;
+		return 0;
+	}
 
 	return 1;
 }
@@ -144,23 +160,24 @@ EXPORT_SYMBOL_GPL(trace_seq_bitmask);
  */
 int trace_seq_vprintf(struct trace_seq *s, const char *fmt, va_list args)
 {
-	unsigned int len = TRACE_SEQ_BUF_LEFT(s);
+	unsigned int save_len = s->seq.len;
 	int ret;
 
-	if (s->full || !len)
+	if (s->full)
 		return 0;
 
-	ret = vsnprintf(s->buffer + s->len, len, fmt, args);
+	__trace_seq_init(s);
+
+	ret = seq_buf_vprintf(&s->seq, fmt, args);
 
 	/* If we can't write it all, don't bother writing anything */
-	if (ret >= len) {
+	if (s->seq.overflow) {
+		s->seq.len = save_len;
 		s->full = 1;
 		return 0;
 	}
 
-	s->len += ret;
-
-	return len;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(trace_seq_vprintf);
 
@@ -183,23 +200,24 @@ EXPORT_SYMBOL_GPL(trace_seq_vprintf);
  */
 int trace_seq_bprintf(struct trace_seq *s, const char *fmt, const u32 *binary)
 {
-	unsigned int len = TRACE_SEQ_BUF_LEFT(s);
+	unsigned int save_len = s->seq.len;
 	int ret;
 
-	if (s->full || !len)
+	if (s->full)
 		return 0;
 
-	ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
+	__trace_seq_init(s);
+
+	ret = seq_buf_bprintf(&s->seq, fmt, binary);
 
 	/* If we can't write it all, don't bother writing anything */
-	if (ret >= len) {
+	if (s->seq.overflow) {
+		s->seq.len = save_len;
 		s->full = 1;
 		return 0;
 	}
 
-	s->len += ret;
-
-	return len;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(trace_seq_bprintf);
 
@@ -222,13 +240,14 @@ int trace_seq_puts(struct trace_seq *s, const char *str)
 	if (s->full)
 		return 0;
 
+	__trace_seq_init(s);
+
 	if (len > TRACE_SEQ_BUF_LEFT(s)) {
 		s->full = 1;
 		return 0;
 	}
 
-	memcpy(s->buffer + s->len, str, len);
-	s->len += len;
+	seq_buf_putmem(&s->seq, str, len);
 
 	return len;
 }
@@ -251,12 +270,14 @@ int trace_seq_putc(struct trace_seq *s, unsigned char c)
 	if (s->full)
 		return 0;
 
+	__trace_seq_init(s);
+
 	if (TRACE_SEQ_BUF_LEFT(s) < 1) {
 		s->full = 1;
 		return 0;
 	}
 
-	s->buffer[s->len++] = c;
+	seq_buf_putc(&s->seq, c);
 
 	return 1;
 }
@@ -279,21 +300,19 @@ int trace_seq_putmem(struct trace_seq *s, const void *mem, unsigned int len)
 	if (s->full)
 		return 0;
 
+	__trace_seq_init(s);
+
 	if (len > TRACE_SEQ_BUF_LEFT(s)) {
 		s->full = 1;
 		return 0;
 	}
 
-	memcpy(s->buffer + s->len, mem, len);
-	s->len += len;
+	seq_buf_putmem(&s->seq, mem, len);
 
 	return len;
 }
 EXPORT_SYMBOL_GPL(trace_seq_putmem);
 
-#define MAX_MEMHEX_BYTES	8U
-#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
-
 /**
  * trace_seq_putmem_hex - write raw memory into the buffer in ASCII hex
  * @s: trace sequence descriptor
@@ -309,35 +328,30 @@ EXPORT_SYMBOL_GPL(trace_seq_putmem);
 int trace_seq_putmem_hex(struct trace_seq *s, const void *mem,
 			 unsigned int len)
 {
-	unsigned char hex[HEX_CHARS];
-	const unsigned char *data = mem;
-	unsigned int start_len;
-	int i, j;
-	int cnt = 0;
+	unsigned int save_len = s->seq.len;
+	int ret;
 
 	if (s->full)
 		return 0;
 
-	while (len) {
-		start_len = min(len, HEX_CHARS - 1);
-#ifdef __BIG_ENDIAN
-		for (i = 0, j = 0; i < start_len; i++) {
-#else
-		for (i = start_len-1, j = 0; i >= 0; i--) {
-#endif
-			hex[j++] = hex_asc_hi(data[i]);
-			hex[j++] = hex_asc_lo(data[i]);
-		}
-		if (WARN_ON_ONCE(j == 0 || j/2 > len))
-			break;
-
-		/* j increments twice per loop */
-		len -= j / 2;
-		hex[j++] = ' ';
-
-		cnt += trace_seq_putmem(s, hex, j);
+	__trace_seq_init(s);
+
+	/* Each byte is represented by two chars */
+	if (len * 2 > TRACE_SEQ_BUF_LEFT(s)) {
+		s->full = 1;
+		return 0;
+	}
+
+	/* The added spaces can still cause an overflow */
+	ret = seq_buf_putmem_hex(&s->seq, mem, len);
+
+	if (unlikely(s->seq.overflow)) {
+		s->seq.len = save_len;
+		s->full = 1;
+		return 0;
 	}
-	return cnt;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(trace_seq_putmem_hex);
 
@@ -355,30 +369,28 @@ EXPORT_SYMBOL_GPL(trace_seq_putmem_hex);
  */
 int trace_seq_path(struct trace_seq *s, const struct path *path)
 {
-	unsigned char *p;
+	unsigned int save_len = s->seq.len;
+	int ret;
 
 	if (s->full)
 		return 0;
 
+	__trace_seq_init(s);
+
 	if (TRACE_SEQ_BUF_LEFT(s) < 1) {
 		s->full = 1;
 		return 0;
 	}
 
-	p = d_path(path, s->buffer + s->len, PAGE_SIZE - s->len);
-	if (!IS_ERR(p)) {
-		p = mangle_path(s->buffer + s->len, p, "\n");
-		if (p) {
-			s->len = p - s->buffer;
-			return 1;
-		}
-	} else {
-		s->buffer[s->len++] = '?';
-		return 1;
+	ret = seq_buf_path(&s->seq, path);
+
+	if (unlikely(s->seq.overflow)) {
+		s->seq.len = save_len;
+		s->full = 1;
+		return 0;
 	}
 
-	s->full = 1;
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(trace_seq_path);
 
@@ -404,25 +416,7 @@ EXPORT_SYMBOL_GPL(trace_seq_path);
  */
 int trace_seq_to_user(struct trace_seq *s, char __user *ubuf, int cnt)
 {
-	int len;
-	int ret;
-
-	if (!cnt)
-		return 0;
-
-	if (s->len <= s->readpos)
-		return -EBUSY;
-
-	len = s->len - s->readpos;
-	if (cnt > len)
-		cnt = len;
-	ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
-	if (ret == cnt)
-		return -EFAULT;
-
-	cnt -= ret;
-
-	s->readpos += cnt;
-	return cnt;
+	__trace_seq_init(s);
+	return seq_buf_to_user(&s->seq, ubuf, cnt);
 }
 EXPORT_SYMBOL_GPL(trace_seq_to_user);
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/
  2014-06-26 21:49 [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Steven Rostedt
  2014-06-26 21:49 ` [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function Steven Rostedt
  2014-06-26 21:49 ` [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq Steven Rostedt
@ 2014-06-26 21:49 ` Steven Rostedt
  2014-06-27 13:48   ` Petr Mládek
  2014-06-26 21:49 ` [RFC][PATCH 4/5 v2] printk: Add per_cpu printk func to allow printk to be diverted Steven Rostedt
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-26 21:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Jiri Kosina,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

[-- Attachment #1: 0003-seq_buf-Move-the-seq_buf-code-to-lib.patch --]
[-- Type: text/plain, Size: 26749 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The seq_buf functions are rather useful outside of tracing. Instead
of having it be dependent on CONFIG_TRACING, move the code into lib/
and allow other users to have access to it even when tracing is not
configured.

The seq_buf utility is similar to the seq_file utility, but instead of
writing sending data back up to userland, it writes it into a buffer
defined at seq_buf_init(). This allows us to send a descriptor around
that writes printf() formatted strings into it that can be retrieved
later.

It is currently used by the tracing facility for such things like trace
events to convert its binary saved data in the ring buffer into an
ASCII human readable context to be displayed in /sys/kernel/debug/trace.

It can also be used for doing NMI prints safely from NMI context into
the seq_buf and retrieved later and dumped to printk() safely. Doing
printk() from an NMI context is dangerous because an NMI can preempt
a current printk() and deadlock on it.

Link: http://lkml.kernel.org/p/20140619213952.058255809@goodmis.org

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/seq_buf.c | 348 -------------------------------------------------
 lib/Makefile           |   2 +-
 lib/seq_buf.c          | 348 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/trace_seq.c        | 303 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 652 insertions(+), 349 deletions(-)
 delete mode 100644 kernel/trace/seq_buf.c
 create mode 100644 lib/seq_buf.c
 create mode 100644 lib/trace_seq.c

diff --git a/kernel/trace/seq_buf.c b/kernel/trace/seq_buf.c
deleted file mode 100644
index b7e33c18540a..000000000000
--- a/kernel/trace/seq_buf.c
+++ /dev/null
@@ -1,348 +0,0 @@
-/*
- * seq_buf.c
- *
- * Copyright (C) 2014 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
- *
- * The seq_buf is a handy tool that allows you to pass a descriptor around
- * to a buffer that other functions can write to. It is similar to the
- * seq_file functionality but has some differences.
- *
- * To use it, the seq_buf must be initialized with seq_buf_init().
- * This will set up the counters within the descriptor. You can call
- * seq_buf_init() more than once to reset the seq_buf to start
- * from scratch.
- * 
- */
-#include <linux/uaccess.h>
-#include <linux/seq_file.h>
-#include <linux/seq_buf.h>
-
-/* How much buffer is left on the seq_buf? */
-#define SEQ_BUF_LEFT(s) (((s)->size - 1) - (s)->len)
-
-/* How much buffer is written? */
-#define SEQ_BUF_USED(s) min((s)->len, (s)->size - 1)
-
-static inline void seq_buf_check_len(struct seq_buf *s)
-{
-	if (unlikely(s->len > (s->size - 1))) {
-		s->len = s->size - 1;
-		s->overflow = 1;
-	}
-}
-
-/**
- * seq_buf_print_seq - move the contents of seq_buf into a seq_file
- * @m: the seq_file descriptor that is the destination
- * @s: the seq_buf descriptor that is the source.
- *
- * Returns zero on success, non zero otherwise
- */
-int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
-{
-	unsigned int len = SEQ_BUF_USED(s);
-
-	return seq_write(m, s->buffer, len);
-}
-
-/**
- * seq_buf_printf - sequence printing of trace information
- * @s: seq_buf descriptor
- * @fmt: printf format string
- *
- * Writes a printf() format into the sequence buffer.
- *
- * Returns number of bytes written.
- */
-int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
-{
-	unsigned int len = SEQ_BUF_LEFT(s);
-	va_list ap;
-	int ret;
-
-	WARN_ON((int)len < 0);
-	va_start(ap, fmt);
-	ret = vsnprintf(s->buffer + s->len, len, fmt, ap);
-	va_end(ap);
-
-	s->len += ret;
-
-	seq_buf_check_len(s);
-
-	return ret;
-}
-
-/**
- * seq_buf_bitmask - write a bitmask array in its ASCII representation
- * @s:		seq_buf descriptor
- * @maskp:	points to an array of unsigned longs that represent a bitmask
- * @nmaskbits:	The number of bits that are valid in @maskp
- *
- * Writes a ASCII representation of a bitmask string into @s.
- *
- * Returns the number of bytes written.
- */
-int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
-		    int nmaskbits)
-{
-	unsigned int len = SEQ_BUF_LEFT(s);
-	int ret;
-
-	WARN_ON((int)len < 0);
-	ret = bitmap_scnprintf(s->buffer, len, maskp, nmaskbits);
-	s->len += ret;
-	seq_buf_check_len(s);
-
-	return ret;
-}
-
-/**
- * seq_buf_vprintf - write vprintf style into the sequence buffer
- * @s: seq_buf descriptor
- * @fmt: printf format string
- *
- * Write a vprintf() style into the sequence buffer.
- *
- * Returns the number of bytes written.
- */
-int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
-{
-	unsigned int len = SEQ_BUF_LEFT(s);
-	int ret;
-
-	if (WARN_ON((int)len < 0))
-		printk("len=%d size=%d s->len=%d\n", len, s->size, s->len);
-	ret = vsnprintf(s->buffer + s->len, len, fmt, args);
-	s->len += ret;
-	seq_buf_check_len(s);
-
-	return len;
-}
-
-/**
- * seq_buf_bprintf - Write the printf string from binary arguments
- * @s: seq_buf descriptor
- * @fmt: The format string for the @binary arguments
- * @binary: The binary arguments for @fmt.
- *
- * When recording in a fast path, a printf may be recorded with just
- * saving the format and the arguments as they were passed to the
- * function, instead of wasting cycles converting the arguments into
- * ASCII characters. Instead, the arguments are saved in a 32 bit
- * word array that is defined by the format string constraints.
- *
- * This function will take the format and the binary array and finish
- * the conversion into the ASCII string within the buffer.
- *
- * Returns number of bytes written.
- */
-int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
-{
-	unsigned int len = SEQ_BUF_LEFT(s);
-	int ret;
-
-	if (WARN_ON((int)len < 0))
-		printk("len=%d size=%d s->len=%d\n", len, s->size, s->len);
-	ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
-	s->len += ret;
-	seq_buf_check_len(s);
-
-	return len;
-}
-
-/**
- * seq_buf_puts - sequence printing of simple string
- * @s: seq_buf descriptor
- * @str: simple string to record
- *
- * Copy a simple string into the sequence buffer.
- *
- * Returns number of bytes written.
- */
-int seq_buf_puts(struct seq_buf *s, const char *str)
-{
-	unsigned int len = strlen(str);
-
-	if (len > SEQ_BUF_LEFT(s)) {
-		s->overflow = 1;
-		len = SEQ_BUF_LEFT(s);
-	}
-
-	memcpy(s->buffer + s->len, str, len);
-	s->len += len;
-	WARN_ON(s->len > (s->size - 1));
-
-	return len;
-}
-
-/**
- * seq_buf_putc - sequence printing of simple character
- * @s: seq_buf descriptor
- * @c: simple character to record
- *
- * Copy a single character into the sequence buffer.
- *
- * Returns 1 if the character was written, 0 otherwise.
- */
-int seq_buf_putc(struct seq_buf *s, unsigned char c)
-{
-	if (SEQ_BUF_LEFT(s) < 1) {
-		s->overflow = 1;
-		return 0;
-	}
-
-	s->buffer[s->len++] = c;
-	WARN_ON(s->len > (s->size - 1));
-
-	return 1;
-}
-
-/**
- * seq_buf_putmem - write raw data into the sequenc buffer
- * @s: seq_buf descriptor
- * @mem: The raw memory to copy into the buffer
- * @len: The length of the raw memory to copy (in bytes)
- *
- * There may be cases where raw memory needs to be written into the
- * buffer and a strcpy() would not work. Using this function allows
- * for such cases.
- *
- * Returns the number of bytes written in the buffer.
- */
-int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
-{
-	if (len > SEQ_BUF_LEFT(s)) {
-		s->overflow = 1;
-		len = SEQ_BUF_LEFT(s);
-	}
-
-	memcpy(s->buffer + s->len, mem, len);
-	s->len += len;
-	WARN_ON(s->len > (s->size - 1));
-
-	return len;
-}
-
-#define MAX_MEMHEX_BYTES	8U
-#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
-
-/**
- * seq_buf_putmem_hex - write raw memory into the buffer in ASCII hex
- * @s: seq_buf descriptor
- * @mem: The raw memory to write its hex ASCII representation of
- * @len: The length of the raw memory to copy (in bytes)
- *
- * This is similar to seq_buf_putmem() except instead of just copying the
- * raw memory into the buffer it writes its ASCII representation of it
- * in hex characters.
- *
- * Returns how much it wrote to the buffer.
- */
-int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
-		       unsigned int len)
-{
-	unsigned char hex[HEX_CHARS];
-	const unsigned char *data = mem;
-	unsigned int start_len;
-	int i, j;
-	int cnt = 0;
-
-	while (len) {
-		start_len = min(len, HEX_CHARS - 1);
-#ifdef __BIG_ENDIAN
-		for (i = 0, j = 0; i < start_len; i++) {
-#else
-		for (i = start_len-1, j = 0; i >= 0; i--) {
-#endif
-			hex[j++] = hex_asc_hi(data[i]);
-			hex[j++] = hex_asc_lo(data[i]);
-		}
-		if (WARN_ON_ONCE(j == 0 || j/2 > len))
-			break;
-
-		/* j increments twice per loop */
-		len -= j / 2;
-		hex[j++] = ' ';
-
-		cnt += seq_buf_putmem(s, hex, j);
-	}
-	return cnt;
-}
-
-/**
- * seq_buf_path - copy a path into the sequence buffer
- * @s: seq_buf descriptor
- * @path: path to write into the sequence buffer.
- *
- * Write a path name into the sequence buffer.
- *
- * Returns the number of bytes written into the buffer.
- */
-int seq_buf_path(struct seq_buf *s, const struct path *path)
-{
-	unsigned int len = SEQ_BUF_LEFT(s);
-	unsigned char *p;
-	unsigned int start = s->len;
-
-	WARN_ON((int)len < 0);
-	p = d_path(path, s->buffer + s->len, len);
-	if (!IS_ERR(p)) {
-		p = mangle_path(s->buffer + s->len, p, "\n");
-		if (p) {
-			s->len = p - s->buffer;
-	WARN_ON(s->len > (s->size - 1));
-			return s->len - start;
-		}
-	} else {
-		s->buffer[s->len++] = '?';
-	WARN_ON(s->len > (s->size - 1));
-		return s->len - start;
-	}
-
-	s->overflow = 1;
-	return 0;
-}
-
-/**
- * seq_buf_to_user - copy the squence buffer to user space
- * @s: seq_buf descriptor
- * @ubuf: The userspace memory location to copy to
- * @cnt: The amount to copy
- *
- * Copies the sequence buffer into the userspace memory pointed to
- * by @ubuf. It starts from the last read position (@s->readpos)
- * and writes up to @cnt characters or till it reaches the end of
- * the content in the buffer (@s->len), which ever comes first.
- *
- * On success, it returns a positive number of the number of bytes
- * it copied.
- *
- * On failure it returns -EBUSY if all of the content in the
- * sequence has been already read, which includes nothing in the
- * sequenc (@s->len == @s->readpos).
- *
- * Returns -EFAULT if the copy to userspace fails.
- */
-int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, int cnt)
-{
-	int len;
-	int ret;
-
-	if (!cnt)
-		return 0;
-
-	if (s->len <= s->readpos)
-		return -EBUSY;
-
-	len = s->len - s->readpos;
-	if (cnt > len)
-		cnt = len;
-	ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
-	if (ret == cnt)
-		return -EFAULT;
-
-	cnt -= ret;
-
-	s->readpos += cnt;
-	return cnt;
-}
diff --git a/lib/Makefile b/lib/Makefile
index ba967a19edba..6007194082c6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o
+	 earlycpio.o seq_buf.o
 
 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
new file mode 100644
index 000000000000..5c3c65becdb1
--- /dev/null
+++ b/lib/seq_buf.c
@@ -0,0 +1,348 @@
+/*
+ * seq_buf.c
+ *
+ * Copyright (C) 2014 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
+ *
+ * The seq_buf is a handy tool that allows you to pass a descriptor around
+ * to a buffer that other functions can write to. It is similar to the
+ * seq_file functionality but has some differences.
+ *
+ * To use it, the seq_buf must be initialized with seq_buf_init().
+ * This will set up the counters within the descriptor. You can call
+ * seq_buf_init() more than once to reset the seq_buf to start
+ * from scratch.
+ *
+ */
+#include <linux/uaccess.h>
+#include <linux/seq_file.h>
+#include <linux/seq_buf.h>
+
+/* How much buffer is left on the seq_buf? */
+#define SEQ_BUF_LEFT(s) (((s)->size - 1) - (s)->len)
+
+/* How much buffer is written? */
+#define SEQ_BUF_USED(s) min((s)->len, (s)->size - 1)
+
+static inline void seq_buf_check_len(struct seq_buf *s)
+{
+	if (unlikely(s->len > (s->size - 1))) {
+		s->len = s->size - 1;
+		s->overflow = 1;
+	}
+}
+
+/**
+ * seq_buf_print_seq - move the contents of seq_buf into a seq_file
+ * @m: the seq_file descriptor that is the destination
+ * @s: the seq_buf descriptor that is the source.
+ *
+ * Returns zero on success, non zero otherwise
+ */
+int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
+{
+	unsigned int len = SEQ_BUF_USED(s);
+
+	return seq_write(m, s->buffer, len);
+}
+
+/**
+ * seq_buf_printf - sequence printing of trace information
+ * @s: seq_buf descriptor
+ * @fmt: printf format string
+ *
+ * Writes a printf() format into the sequence buffer.
+ *
+ * Returns number of bytes written.
+ */
+int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	va_list ap;
+	int ret;
+
+	WARN_ON((int)len < 0);
+	va_start(ap, fmt);
+	ret = vsnprintf(s->buffer + s->len, len, fmt, ap);
+	va_end(ap);
+
+	s->len += ret;
+
+	seq_buf_check_len(s);
+
+	return ret;
+}
+
+/**
+ * seq_buf_bitmask - write a bitmask array in its ASCII representation
+ * @s:		seq_buf descriptor
+ * @maskp:	points to an array of unsigned longs that represent a bitmask
+ * @nmaskbits:	The number of bits that are valid in @maskp
+ *
+ * Writes a ASCII representation of a bitmask string into @s.
+ *
+ * Returns the number of bytes written.
+ */
+int seq_buf_bitmask(struct seq_buf *s, const unsigned long *maskp,
+		    int nmaskbits)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	int ret;
+
+	WARN_ON((int)len < 0);
+	ret = bitmap_scnprintf(s->buffer, len, maskp, nmaskbits);
+	s->len += ret;
+	seq_buf_check_len(s);
+
+	return ret;
+}
+
+/**
+ * seq_buf_vprintf - write vprintf style into the sequence buffer
+ * @s: seq_buf descriptor
+ * @fmt: printf format string
+ *
+ * Write a vprintf() style into the sequence buffer.
+ *
+ * Returns the number of bytes written.
+ */
+int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	int ret;
+
+	if (WARN_ON((int)len < 0))
+		printk("len=%d size=%d s->len=%d\n", len, s->size, s->len);
+	ret = vsnprintf(s->buffer + s->len, len, fmt, args);
+	s->len += ret;
+	seq_buf_check_len(s);
+
+	return len;
+}
+
+/**
+ * seq_buf_bprintf - Write the printf string from binary arguments
+ * @s: seq_buf descriptor
+ * @fmt: The format string for the @binary arguments
+ * @binary: The binary arguments for @fmt.
+ *
+ * When recording in a fast path, a printf may be recorded with just
+ * saving the format and the arguments as they were passed to the
+ * function, instead of wasting cycles converting the arguments into
+ * ASCII characters. Instead, the arguments are saved in a 32 bit
+ * word array that is defined by the format string constraints.
+ *
+ * This function will take the format and the binary array and finish
+ * the conversion into the ASCII string within the buffer.
+ *
+ * Returns number of bytes written.
+ */
+int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	int ret;
+
+	if (WARN_ON((int)len < 0))
+		printk("len=%d size=%d s->len=%d\n", len, s->size, s->len);
+	ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
+	s->len += ret;
+	seq_buf_check_len(s);
+
+	return len;
+}
+
+/**
+ * seq_buf_puts - sequence printing of simple string
+ * @s: seq_buf descriptor
+ * @str: simple string to record
+ *
+ * Copy a simple string into the sequence buffer.
+ *
+ * Returns number of bytes written.
+ */
+int seq_buf_puts(struct seq_buf *s, const char *str)
+{
+	unsigned int len = strlen(str);
+
+	if (len > SEQ_BUF_LEFT(s)) {
+		s->overflow = 1;
+		len = SEQ_BUF_LEFT(s);
+	}
+
+	memcpy(s->buffer + s->len, str, len);
+	s->len += len;
+	WARN_ON(s->len > (s->size - 1));
+
+	return len;
+}
+
+/**
+ * seq_buf_putc - sequence printing of simple character
+ * @s: seq_buf descriptor
+ * @c: simple character to record
+ *
+ * Copy a single character into the sequence buffer.
+ *
+ * Returns 1 if the character was written, 0 otherwise.
+ */
+int seq_buf_putc(struct seq_buf *s, unsigned char c)
+{
+	if (SEQ_BUF_LEFT(s) < 1) {
+		s->overflow = 1;
+		return 0;
+	}
+
+	s->buffer[s->len++] = c;
+	WARN_ON(s->len > (s->size - 1));
+
+	return 1;
+}
+
+/**
+ * seq_buf_putmem - write raw data into the sequenc buffer
+ * @s: seq_buf descriptor
+ * @mem: The raw memory to copy into the buffer
+ * @len: The length of the raw memory to copy (in bytes)
+ *
+ * There may be cases where raw memory needs to be written into the
+ * buffer and a strcpy() would not work. Using this function allows
+ * for such cases.
+ *
+ * Returns the number of bytes written in the buffer.
+ */
+int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
+{
+	if (len > SEQ_BUF_LEFT(s)) {
+		s->overflow = 1;
+		len = SEQ_BUF_LEFT(s);
+	}
+
+	memcpy(s->buffer + s->len, mem, len);
+	s->len += len;
+	WARN_ON(s->len > (s->size - 1));
+
+	return len;
+}
+
+#define MAX_MEMHEX_BYTES	8U
+#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
+
+/**
+ * seq_buf_putmem_hex - write raw memory into the buffer in ASCII hex
+ * @s: seq_buf descriptor
+ * @mem: The raw memory to write its hex ASCII representation of
+ * @len: The length of the raw memory to copy (in bytes)
+ *
+ * This is similar to seq_buf_putmem() except instead of just copying the
+ * raw memory into the buffer it writes its ASCII representation of it
+ * in hex characters.
+ *
+ * Returns how much it wrote to the buffer.
+ */
+int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
+		       unsigned int len)
+{
+	unsigned char hex[HEX_CHARS];
+	const unsigned char *data = mem;
+	unsigned int start_len;
+	int i, j;
+	int cnt = 0;
+
+	while (len) {
+		start_len = min(len, HEX_CHARS - 1);
+#ifdef __BIG_ENDIAN
+		for (i = 0, j = 0; i < start_len; i++) {
+#else
+		for (i = start_len-1, j = 0; i >= 0; i--) {
+#endif
+			hex[j++] = hex_asc_hi(data[i]);
+			hex[j++] = hex_asc_lo(data[i]);
+		}
+		if (WARN_ON_ONCE(j == 0 || j/2 > len))
+			break;
+
+		/* j increments twice per loop */
+		len -= j / 2;
+		hex[j++] = ' ';
+
+		cnt += seq_buf_putmem(s, hex, j);
+	}
+	return cnt;
+}
+
+/**
+ * seq_buf_path - copy a path into the sequence buffer
+ * @s: seq_buf descriptor
+ * @path: path to write into the sequence buffer.
+ *
+ * Write a path name into the sequence buffer.
+ *
+ * Returns the number of bytes written into the buffer.
+ */
+int seq_buf_path(struct seq_buf *s, const struct path *path)
+{
+	unsigned int len = SEQ_BUF_LEFT(s);
+	unsigned char *p;
+	unsigned int start = s->len;
+
+	WARN_ON((int)len < 0);
+	p = d_path(path, s->buffer + s->len, len);
+	if (!IS_ERR(p)) {
+		p = mangle_path(s->buffer + s->len, p, "\n");
+		if (p) {
+			s->len = p - s->buffer;
+	WARN_ON(s->len > (s->size - 1));
+			return s->len - start;
+		}
+	} else {
+		s->buffer[s->len++] = '?';
+	WARN_ON(s->len > (s->size - 1));
+		return s->len - start;
+	}
+
+	s->overflow = 1;
+	return 0;
+}
+
+/**
+ * seq_buf_to_user - copy the squence buffer to user space
+ * @s: seq_buf descriptor
+ * @ubuf: The userspace memory location to copy to
+ * @cnt: The amount to copy
+ *
+ * Copies the sequence buffer into the userspace memory pointed to
+ * by @ubuf. It starts from the last read position (@s->readpos)
+ * and writes up to @cnt characters or till it reaches the end of
+ * the content in the buffer (@s->len), which ever comes first.
+ *
+ * On success, it returns a positive number of the number of bytes
+ * it copied.
+ *
+ * On failure it returns -EBUSY if all of the content in the
+ * sequence has been already read, which includes nothing in the
+ * sequenc (@s->len == @s->readpos).
+ *
+ * Returns -EFAULT if the copy to userspace fails.
+ */
+int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, int cnt)
+{
+	int len;
+	int ret;
+
+	if (!cnt)
+		return 0;
+
+	if (s->len <= s->readpos)
+		return -EBUSY;
+
+	len = s->len - s->readpos;
+	if (cnt > len)
+		cnt = len;
+	ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
+	if (ret == cnt)
+		return -EFAULT;
+
+	cnt -= ret;
+
+	s->readpos += cnt;
+	return cnt;
+}
diff --git a/lib/trace_seq.c b/lib/trace_seq.c
new file mode 100644
index 000000000000..5ba99c6cf834
--- /dev/null
+++ b/lib/trace_seq.c
@@ -0,0 +1,303 @@
+/*
+ * trace_seq.c
+ *
+ * Copyright (C) 2008-2014 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
+ *
+ */
+#include <linux/uaccess.h>
+#include <linux/seq_file.h>
+#include <linux/trace_seq.h>
+
+int trace_print_seq(struct seq_file *m, struct trace_seq *s)
+{
+	int len = s->len >= PAGE_SIZE ? PAGE_SIZE - 1 : s->len;
+	int ret;
+
+	ret = seq_write(m, s->buffer, len);
+
+	/*
+	 * Only reset this buffer if we successfully wrote to the
+	 * seq_file buffer.
+	 */
+	if (!ret)
+		trace_seq_init(s);
+
+	return ret;
+}
+
+/**
+ * trace_seq_printf - sequence printing of trace information
+ * @s: trace sequence descriptor
+ * @fmt: printf format string
+ *
+ * It returns 0 if the trace oversizes the buffer's free
+ * space, 1 otherwise.
+ *
+ * The tracer may use either sequence operations or its own
+ * copy to user routines. To simplify formating of a trace
+ * trace_seq_printf is used to store strings into a special
+ * buffer (@s). Then the output may be either used by
+ * the sequencer or pulled into another buffer.
+ */
+int
+trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
+{
+	int len = (PAGE_SIZE - 1) - s->len;
+	va_list ap;
+	int ret;
+
+	if (s->full || !len)
+		return 0;
+
+	va_start(ap, fmt);
+	ret = vsnprintf(s->buffer + s->len, len, fmt, ap);
+	va_end(ap);
+
+	/* If we can't write it all, don't bother writing anything */
+	if (ret >= len) {
+		s->full = 1;
+		return 0;
+	}
+
+	s->len += ret;
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(trace_seq_printf);
+
+/**
+ * trace_seq_bitmask - put a list of longs as a bitmask print output
+ * @s:		trace sequence descriptor
+ * @maskp:	points to an array of unsigned longs that represent a bitmask
+ * @nmaskbits:	The number of bits that are valid in @maskp
+ *
+ * It returns 0 if the trace oversizes the buffer's free
+ * space, 1 otherwise.
+ *
+ * Writes a ASCII representation of a bitmask string into @s.
+ */
+int
+trace_seq_bitmask(struct trace_seq *s, const unsigned long *maskp,
+		  int nmaskbits)
+{
+	int len = (PAGE_SIZE - 1) - s->len;
+	int ret;
+
+	if (s->full || !len)
+		return 0;
+
+	ret = bitmap_scnprintf(s->buffer, len, maskp, nmaskbits);
+	s->len += ret;
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(trace_seq_bitmask);
+
+/**
+ * trace_seq_vprintf - sequence printing of trace information
+ * @s: trace sequence descriptor
+ * @fmt: printf format string
+ *
+ * The tracer may use either sequence operations or its own
+ * copy to user routines. To simplify formating of a trace
+ * trace_seq_printf is used to store strings into a special
+ * buffer (@s). Then the output may be either used by
+ * the sequencer or pulled into another buffer.
+ */
+int
+trace_seq_vprintf(struct trace_seq *s, const char *fmt, va_list args)
+{
+	int len = (PAGE_SIZE - 1) - s->len;
+	int ret;
+
+	if (s->full || !len)
+		return 0;
+
+	ret = vsnprintf(s->buffer + s->len, len, fmt, args);
+
+	/* If we can't write it all, don't bother writing anything */
+	if (ret >= len) {
+		s->full = 1;
+		return 0;
+	}
+
+	s->len += ret;
+
+	return len;
+}
+EXPORT_SYMBOL_GPL(trace_seq_vprintf);
+
+int trace_seq_bprintf(struct trace_seq *s, const char *fmt, const u32 *binary)
+{
+	int len = (PAGE_SIZE - 1) - s->len;
+	int ret;
+
+	if (s->full || !len)
+		return 0;
+
+	ret = bstr_printf(s->buffer + s->len, len, fmt, binary);
+
+	/* If we can't write it all, don't bother writing anything */
+	if (ret >= len) {
+		s->full = 1;
+		return 0;
+	}
+
+	s->len += ret;
+
+	return len;
+}
+
+/**
+ * trace_seq_puts - trace sequence printing of simple string
+ * @s: trace sequence descriptor
+ * @str: simple string to record
+ *
+ * The tracer may use either the sequence operations or its own
+ * copy to user routines. This function records a simple string
+ * into a special buffer (@s) for later retrieval by a sequencer
+ * or other mechanism.
+ */
+int trace_seq_puts(struct trace_seq *s, const char *str)
+{
+	int len = strlen(str);
+
+	if (s->full)
+		return 0;
+
+	if (len > ((PAGE_SIZE - 1) - s->len)) {
+		s->full = 1;
+		return 0;
+	}
+
+	memcpy(s->buffer + s->len, str, len);
+	s->len += len;
+
+	return len;
+}
+
+int trace_seq_putc(struct trace_seq *s, unsigned char c)
+{
+	if (s->full)
+		return 0;
+
+	if (s->len >= (PAGE_SIZE - 1)) {
+		s->full = 1;
+		return 0;
+	}
+
+	s->buffer[s->len++] = c;
+
+	return 1;
+}
+EXPORT_SYMBOL(trace_seq_putc);
+
+int trace_seq_putmem(struct trace_seq *s, const void *mem, size_t len)
+{
+	if (s->full)
+		return 0;
+
+	if (len > ((PAGE_SIZE - 1) - s->len)) {
+		s->full = 1;
+		return 0;
+	}
+
+	memcpy(s->buffer + s->len, mem, len);
+	s->len += len;
+
+	return len;
+}
+
+#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
+
+int trace_seq_putmem_hex(struct trace_seq *s, const void *mem, size_t len)
+{
+	unsigned char hex[HEX_CHARS];
+	const unsigned char *data = mem;
+	int i, j;
+
+	if (s->full)
+		return 0;
+
+#ifdef __BIG_ENDIAN
+	for (i = 0, j = 0; i < len; i++) {
+#else
+	for (i = len-1, j = 0; i >= 0; i--) {
+#endif
+		hex[j++] = hex_asc_hi(data[i]);
+		hex[j++] = hex_asc_lo(data[i]);
+	}
+	hex[j++] = ' ';
+
+	return trace_seq_putmem(s, hex, j);
+}
+
+void *trace_seq_reserve(struct trace_seq *s, size_t len)
+{
+	void *ret;
+
+	if (s->full)
+		return NULL;
+
+	if (len > ((PAGE_SIZE - 1) - s->len)) {
+		s->full = 1;
+		return NULL;
+	}
+
+	ret = s->buffer + s->len;
+	s->len += len;
+
+	return ret;
+}
+
+int trace_seq_path(struct trace_seq *s, const struct path *path)
+{
+	unsigned char *p;
+
+	if (s->full)
+		return 0;
+
+	if (s->len >= (PAGE_SIZE - 1)) {
+		s->full = 1;
+		return 0;
+	}
+
+	p = d_path(path, s->buffer + s->len, PAGE_SIZE - s->len);
+	if (!IS_ERR(p)) {
+		p = mangle_path(s->buffer + s->len, p, "\n");
+		if (p) {
+			s->len = p - s->buffer;
+			return 1;
+		}
+	} else {
+		s->buffer[s->len++] = '?';
+		return 1;
+	}
+
+	s->full = 1;
+	return 0;
+}
+
+ssize_t trace_seq_to_user(struct trace_seq *s, char __user *ubuf, size_t cnt)
+{
+	int len;
+	int ret;
+
+	if (!cnt)
+		return 0;
+
+	if (s->len <= s->readpos)
+		return -EBUSY;
+
+	len = s->len - s->readpos;
+	if (cnt > len)
+		cnt = len;
+	ret = copy_to_user(ubuf, s->buffer + s->readpos, cnt);
+	if (ret == cnt)
+		return -EFAULT;
+
+	cnt -= ret;
+
+	s->readpos += cnt;
+	return cnt;
+}
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC][PATCH 4/5 v2] printk: Add per_cpu printk func to allow printk to be diverted
  2014-06-26 21:49 [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Steven Rostedt
                   ` (2 preceding siblings ...)
  2014-06-26 21:49 ` [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/ Steven Rostedt
@ 2014-06-26 21:49 ` Steven Rostedt
  2014-06-27 14:20   ` Petr Mládek
       [not found] ` <20140626220130.764213722@goodmis.org>
  2014-08-07  8:41 ` [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Jiri Kosina
  5 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-26 21:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Jiri Kosina,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

[-- Attachment #1: 0004-printk-Add-per_cpu-printk-func-to-allow-printk-to-be.patch --]
[-- Type: text/plain, Size: 3493 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Being able to divert printk to call another function besides the normal
logging is useful for such things like NMI handling. If some functions
are to be called from NMI that does printk() it is possible to lock up
the box if the nmi handler triggers when another printk is happening.

One example of this use is to perform a stack trace on all CPUs via NMI.
But if the NMI is to do the printk() it can cause the system to lock up.
By allowing the printk to be diverted to another function that can safely
record the printk output and then print it when it in a safe context
then NMIs will be safe to call these functions like show_regs().

Link: http://lkml.kernel.org/p/20140619213952.209176403@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/percpu.h |  3 +++
 include/linux/printk.h |  2 ++
 kernel/printk/printk.c | 38 +++++++++++++++++++++++++++++---------
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 8419053d0f2e..9997c92ce3bd 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -802,4 +802,7 @@ do { __this_cpu_preempt_check("or");					\
 	(__this_cpu_preempt_check("cmpxchg_double"),__pcpu_double_call_return_bool(raw_cpu_cmpxchg_double_, (pcp1), (pcp2), (oval1), (oval2), (nval1), (nval2)))
 #endif
 
+/* To avoid include hell, as printk can not declare this, we declare it here */
+DECLARE_PER_CPU(printk_func_t, printk_func);
+
 #endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 319ff7e53efb..e26310b2d2fd 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -159,6 +159,8 @@ extern int kptr_restrict;
 
 extern void wake_up_klogd(void);
 
+typedef int(*printk_func_t)(const char *fmt, va_list args);
+
 void log_buf_kexec_setup(void);
 void __init setup_log_buf(int early);
 void dump_stack_set_arch_desc(const char *fmt, ...);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ea2d5f6962ed..e3581f95ba5c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1764,6 +1764,30 @@ asmlinkage int printk_emit(int facility, int level,
 }
 EXPORT_SYMBOL(printk_emit);
 
+int vprintk_default(const char *fmt, va_list args)
+{
+	int r;
+
+#ifdef CONFIG_KGDB_KDB
+	if (unlikely(kdb_trap_printk)) {
+		r = vkdb_printf(fmt, args);
+		return r;
+	}
+#endif
+	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(vprintk_default);
+
+/*
+ * This allows printk to be diverted to another function per cpu.
+ * This is useful for calling printk functions from within NMI
+ * without worrying about race conditions that can lock up the
+ * box.
+ */
+DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
+
 /**
  * printk - print a kernel message
  * @fmt: format string
@@ -1787,19 +1811,15 @@ EXPORT_SYMBOL(printk_emit);
  */
 asmlinkage __visible int printk(const char *fmt, ...)
 {
+	printk_func_t vprintk_func;
 	va_list args;
 	int r;
 
-#ifdef CONFIG_KGDB_KDB
-	if (unlikely(kdb_trap_printk)) {
-		va_start(args, fmt);
-		r = vkdb_printf(fmt, args);
-		va_end(args);
-		return r;
-	}
-#endif
 	va_start(args, fmt);
-	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
+	preempt_disable();
+	vprintk_func = this_cpu_read(printk_func);
+	r = vprintk_func(fmt, args);
+	preempt_enable();
 	va_end(args);
 
 	return r;
-- 
2.0.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 5/5 v2] x86/nmi: Perform a safe NMI stack trace on all CPUs
       [not found] ` <20140626220130.764213722@goodmis.org>
@ 2014-06-26 22:51   ` Steven Rostedt
  2014-06-27 14:32   ` Petr Mládek
  1 sibling, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2014-06-26 22:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Petr Mladek, Paul E. McKenney,
	Konstantin Khlebnikov

This patch got rejected by lkml:

>>> linux-kernel@vger.kernel.org (reading confirmation): 550 5.7.1 Content-Policy reject msg: Wrong MIME labeling on 8-bit character texts. BF:<H 0>; S1751405AbaFZWBb  

I guess quilt couldn't handle Petr's last name (I changed it to 'a'
here).

-- Steve


On Thu, 26 Jun 2014 17:49:06 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> When trigger_all_cpu_backtrace() is called on x86, it will trigger an
> NMI on each CPU and call show_regs(). But this can lead to a hard lock
> up if the NMI comes in on another printk().
> 
> In order to avoid this, when the NMI triggers, it switches the printk
> routine for that CPU to call a NMI safe printk function that records the
> printk in a per_cpu seq_buf descriptor. After all NMIs have finished
> recording its data, the trace_seqs are printed in a safe context.
> 
> Link: http://lkml.kernel.org/p/20140619213952.360076309@goodmis.org
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> V2 - updated to use seq_buf instead of trace_seq.
>    - Keep original printk loglevel, do not strip it, but use it.
>      (suggested by Petr Mladek)
>    - Updated comments
>    - Removed return of true (noticed by Petr Mladek)
> ---
>  arch/x86/kernel/apic/hw_nmi.c | 84 +++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 81 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index c3fcb5de5083..91ddba672839 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -18,6 +18,7 @@
>  #include <linux/nmi.h>
>  #include <linux/module.h>
>  #include <linux/delay.h>
> +#include <linux/seq_buf.h>
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  u64 hw_nmi_get_sample_period(int watchdog_thresh)
> @@ -30,11 +31,31 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
>  /* For reliability, we're prepared to waste bits here. */
>  static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
>  
> +#define NMI_BUF_SIZE		4096
> +
> +struct nmi_seq_buf {
> +	unsigned char		buffer[NMI_BUF_SIZE];
> +	struct seq_buf		seq;
> +};
> +
> +/* Safe printing in NMI context */
> +static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
> +
>  /* "in progress" flag of arch_trigger_all_cpu_backtrace */
>  static unsigned long backtrace_flag;
>  
> +static void print_seq_line(struct nmi_seq_buf *s, int last, int pos)
> +{
> +	const char *buf = s->buffer + last;
> +
> +	printk("%.*s", (pos - last) + 1, buf);
> +}
> +
>  void arch_trigger_all_cpu_backtrace(void)
>  {
> +	struct nmi_seq_buf *s;
> +	int len;
> +	int cpu;
>  	int i;
>  
>  	if (test_and_set_bit(0, &backtrace_flag))
> @@ -44,6 +65,15 @@ void arch_trigger_all_cpu_backtrace(void)
>  		 */
>  		return;
>  
> +	/*
> +	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
> +	 * CPUs will write to.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		s = &per_cpu(nmi_print_seq, cpu);
> +		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
> +	}
> +
>  	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
>  
>  	printk(KERN_INFO "sending NMI to all CPUs:\n");
> @@ -56,10 +86,56 @@ void arch_trigger_all_cpu_backtrace(void)
>  		mdelay(1);
>  	}
>  
> +	/*
> +	 * Now that all the NMIs have triggered, we can dump out their
> +	 * back traces safely to the console.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		int last_i = 0;
> +
> +		s = &per_cpu(nmi_print_seq, cpu);
> +		len = s->seq.len;
> +		if (!len)
> +			continue;
> +
> +		/* Print line by line. */
> +		for (i = 0; i < len; i++) {
> +			if (s->buffer[i] == '\n') {
> +				print_seq_line(s, last_i, i);
> +				last_i = i + 1;
> +			}
> +		}
> +		if (last_i < i - 1) {
> +			print_seq_line(s, last_i, i);
> +			pr_cont("\n");
> +		}
> +	}
> +
>  	clear_bit(0, &backtrace_flag);
>  	smp_mb__after_atomic();
>  }
>  
> +/*
> + * It is not safe to call printk() directly from NMI handlers.
> + * It may be fine if the NMI detected a lock up and we have no choice
> + * but to do so, but doing a NMI on all other CPUs to get a back trace
> + * can be done with a sysrq-l. We don't want that to lock up, which
> + * can happen if the NMI interrupts a printk in progress.
> + *
> + * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
> + * the content into a per cpu seq_buf buffer. Then when the NMIs are
> + * all done, we can safely dump the contents of the seq_buf to a printk()
> + * from a non NMI context.
> + */
> +static int nmi_vprintk(const char *fmt, va_list args)
> +{
> +	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> +	unsigned int len = s->seq.len;
> +
> +	seq_buf_vprintf(&s->seq, fmt, args);
> +	return s->seq.len - len;
> +}
> +
>  static int
>  arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
>  {
> @@ -68,12 +144,14 @@ arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
>  	cpu = smp_processor_id();
>  
>  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> -		static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
> +		printk_func_t printk_func_save = this_cpu_read(printk_func);
>  
> -		arch_spin_lock(&lock);
> +		/* Replace printk to write into the NMI seq */
> +		this_cpu_write(printk_func, nmi_vprintk);
>  		printk(KERN_WARNING "NMI backtrace for cpu %d\n", cpu);
>  		show_regs(regs);
> -		arch_spin_unlock(&lock);
> +		this_cpu_write(printk_func, printk_func_save);
> +
>  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>  		return NMI_HANDLED;
>  	}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function
  2014-06-26 21:49 ` [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function Steven Rostedt
@ 2014-06-27  1:06   ` Steven Rostedt
  2014-06-27  3:14     ` James Bottomley
  2014-06-27  7:37     ` Paolo Bonzini
  0 siblings, 2 replies; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27  1:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Petr Mladek, Paul E. McKenney,
	Konstantin Khlebnikov, James E.J. Bottomley, Gleb Natapov,
	Paolo Bonzini


As this patch is in my 3.17 queue and it touches the kvm and scsi
tracepoint code, I figured I should at least do the courtesy of
notifying the maintainers of those subsystems.

Do you have any issues with this going through my tree? If not, please
give me an Acked-by.

Thanks!

-- Steve

On Thu, 26 Jun 2014 17:49:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> There's several locations in the kernel that open code the calculation
> of the next location in the trace_seq buffer. This is usually done with
> 
>   p->buffer + p->len
> 
> Instead of having this open coded, supply a helper function in the
> header to do it for them. This function is called trace_seq_buffer_ptr().
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  arch/x86/kvm/mmutrace.h     |  2 +-
>  drivers/scsi/scsi_trace.c   | 16 ++++++++--------
>  include/linux/trace_seq.h   | 15 +++++++++++++++
>  kernel/trace/trace_output.c | 14 +++++++-------
>  4 files changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> index 9d2e0ffcb190..2e5652b62fd6 100644
> --- a/arch/x86/kvm/mmutrace.h
> +++ b/arch/x86/kvm/mmutrace.h
> @@ -22,7 +22,7 @@
>  	__entry->unsync = sp->unsync;
>  
>  #define KVM_MMU_PAGE_PRINTK() ({				        \
> -	const char *ret = p->buffer + p->len;				\
> +	const char *ret = trace_seq_buffer_ptr(p);			\
>  	static const char *access_str[] = {			        \
>  		"---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"  \
>  	};							        \
> diff --git a/drivers/scsi/scsi_trace.c b/drivers/scsi/scsi_trace.c
> index 2bea4f0b684a..503594e5f76d 100644
> --- a/drivers/scsi/scsi_trace.c
> +++ b/drivers/scsi/scsi_trace.c
> @@ -28,7 +28,7 @@ scsi_trace_misc(struct trace_seq *, unsigned char *, int);
>  static const char *
>  scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  	sector_t lba = 0, txlen = 0;
>  
>  	lba |= ((cdb[1] & 0x1F) << 16);
> @@ -46,7 +46,7 @@ scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
>  static const char *
>  scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  	sector_t lba = 0, txlen = 0;
>  
>  	lba |= (cdb[2] << 24);
> @@ -71,7 +71,7 @@ scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
>  static const char *
>  scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  	sector_t lba = 0, txlen = 0;
>  
>  	lba |= (cdb[2] << 24);
> @@ -94,7 +94,7 @@ scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
>  static const char *
>  scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  	sector_t lba = 0, txlen = 0;
>  
>  	lba |= ((u64)cdb[2] << 56);
> @@ -125,7 +125,7 @@ scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
>  static const char *
>  scsi_trace_rw32(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len, *cmd;
> +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
>  	sector_t lba = 0, txlen = 0;
>  	u32 ei_lbrt = 0;
>  
> @@ -180,7 +180,7 @@ out:
>  static const char *
>  scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  	unsigned int regions = cdb[7] << 8 | cdb[8];
>  
>  	trace_seq_printf(p, "regions=%u", (regions - 8) / 16);
> @@ -192,7 +192,7 @@ scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
>  static const char *
>  scsi_trace_service_action_in(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len, *cmd;
> +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
>  	sector_t lba = 0;
>  	u32 alloc_len = 0;
>  
> @@ -247,7 +247,7 @@ scsi_trace_varlen(struct trace_seq *p, unsigned char *cdb, int len)
>  static const char *
>  scsi_trace_misc(struct trace_seq *p, unsigned char *cdb, int len)
>  {
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  
>  	trace_seq_printf(p, "-");
>  	trace_seq_putc(p, 0);
> diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
> index dd85753e1bb0..ea6c9dea79e3 100644
> --- a/include/linux/trace_seq.h
> +++ b/include/linux/trace_seq.h
> @@ -25,6 +25,21 @@ trace_seq_init(struct trace_seq *s)
>  	s->full = 0;
>  }
>  
> +/**
> + * trace_seq_buffer_ptr - return pointer to next location in buffer
> + * @s: trace sequence descriptor
> + *
> + * Returns the pointer to the buffer where the next write to
> + * the buffer will happen. This is useful to save the location
> + * that is about to be written to and then return the result
> + * of that write.
> + */
> +static inline unsigned char *
> +trace_seq_buffer_ptr(struct trace_seq *s)
> +{
> +	return s->buffer + s->len;
> +}
> +
>  /*
>   * Currently only defined when tracing is enabled.
>   */
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index b8930f79a04b..c6977d5a9b12 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -75,7 +75,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
>  {
>  	unsigned long mask;
>  	const char *str;
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  	int i, first = 1;
>  
>  	for (i = 0;  flag_array[i].name && flags; i++) {
> @@ -111,7 +111,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
>  			 const struct trace_print_flags *symbol_array)
>  {
>  	int i;
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  
>  	for (i = 0;  symbol_array[i].name; i++) {
>  
> @@ -122,7 +122,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
>  		break;
>  	}
>  
> -	if (ret == (const char *)(p->buffer + p->len))
> +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
>  		trace_seq_printf(p, "0x%lx", val);
>  		
>  	trace_seq_putc(p, 0);
> @@ -137,7 +137,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
>  			 const struct trace_print_flags_u64 *symbol_array)
>  {
>  	int i;
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  
>  	for (i = 0;  symbol_array[i].name; i++) {
>  
> @@ -148,7 +148,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
>  		break;
>  	}
>  
> -	if (ret == (const char *)(p->buffer + p->len))
> +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
>  		trace_seq_printf(p, "0x%llx", val);
>  
>  	trace_seq_putc(p, 0);
> @@ -162,7 +162,7 @@ const char *
>  ftrace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
>  			 unsigned int bitmask_size)
>  {
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  
>  	trace_seq_bitmask(p, bitmask_ptr, bitmask_size * 8);
>  	trace_seq_putc(p, 0);
> @@ -175,7 +175,7 @@ const char *
>  ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
>  {
>  	int i;
> -	const char *ret = p->buffer + p->len;
> +	const char *ret = trace_seq_buffer_ptr(p);
>  
>  	for (i = 0; i < buf_len; i++)
>  		trace_seq_printf(p, "%s%2.2x", i == 0 ? "" : " ", buf[i]);


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function
  2014-06-27  1:06   ` Steven Rostedt
@ 2014-06-27  3:14     ` James Bottomley
  2014-07-03 16:03       ` Steven Rostedt
  2014-06-27  7:37     ` Paolo Bonzini
  1 sibling, 1 reply; 33+ messages in thread
From: James Bottomley @ 2014-06-27  3:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Petr Mladek, Paul E. McKenney,
	Konstantin Khlebnikov, Gleb Natapov, Paolo Bonzini, linux-scsi

On Thu, 2014-06-26 at 21:06 -0400, Steven Rostedt wrote:
> As this patch is in my 3.17 queue and it touches the kvm and scsi
> tracepoint code, I figured I should at least do the courtesy of
> notifying the maintainers of those subsystems.
> 
> Do you have any issues with this going through my tree? If not, please
> give me an Acked-by.

Missing cc to linux-scsi added.

Hannes has a set of patches to move logging to trace points.  I don't
think there's a clash but I'd like him to comment before you do this.

James


> Thanks!
> 
> -- Steve
> 
> On Thu, 26 Jun 2014 17:49:02 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> > 
> > There's several locations in the kernel that open code the calculation
> > of the next location in the trace_seq buffer. This is usually done with
> > 
> >   p->buffer + p->len
> > 
> > Instead of having this open coded, supply a helper function in the
> > header to do it for them. This function is called trace_seq_buffer_ptr().
> > 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  arch/x86/kvm/mmutrace.h     |  2 +-
> >  drivers/scsi/scsi_trace.c   | 16 ++++++++--------
> >  include/linux/trace_seq.h   | 15 +++++++++++++++
> >  kernel/trace/trace_output.c | 14 +++++++-------
> >  4 files changed, 31 insertions(+), 16 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> > index 9d2e0ffcb190..2e5652b62fd6 100644
> > --- a/arch/x86/kvm/mmutrace.h
> > +++ b/arch/x86/kvm/mmutrace.h
> > @@ -22,7 +22,7 @@
> >  	__entry->unsync = sp->unsync;
> >  
> >  #define KVM_MMU_PAGE_PRINTK() ({				        \
> > -	const char *ret = p->buffer + p->len;				\
> > +	const char *ret = trace_seq_buffer_ptr(p);			\
> >  	static const char *access_str[] = {			        \
> >  		"---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"  \
> >  	};							        \
> > diff --git a/drivers/scsi/scsi_trace.c b/drivers/scsi/scsi_trace.c
> > index 2bea4f0b684a..503594e5f76d 100644
> > --- a/drivers/scsi/scsi_trace.c
> > +++ b/drivers/scsi/scsi_trace.c
> > @@ -28,7 +28,7 @@ scsi_trace_misc(struct trace_seq *, unsigned char *, int);
> >  static const char *
> >  scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  	sector_t lba = 0, txlen = 0;
> >  
> >  	lba |= ((cdb[1] & 0x1F) << 16);
> > @@ -46,7 +46,7 @@ scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
> >  static const char *
> >  scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  	sector_t lba = 0, txlen = 0;
> >  
> >  	lba |= (cdb[2] << 24);
> > @@ -71,7 +71,7 @@ scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
> >  static const char *
> >  scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  	sector_t lba = 0, txlen = 0;
> >  
> >  	lba |= (cdb[2] << 24);
> > @@ -94,7 +94,7 @@ scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
> >  static const char *
> >  scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  	sector_t lba = 0, txlen = 0;
> >  
> >  	lba |= ((u64)cdb[2] << 56);
> > @@ -125,7 +125,7 @@ scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
> >  static const char *
> >  scsi_trace_rw32(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len, *cmd;
> > +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
> >  	sector_t lba = 0, txlen = 0;
> >  	u32 ei_lbrt = 0;
> >  
> > @@ -180,7 +180,7 @@ out:
> >  static const char *
> >  scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  	unsigned int regions = cdb[7] << 8 | cdb[8];
> >  
> >  	trace_seq_printf(p, "regions=%u", (regions - 8) / 16);
> > @@ -192,7 +192,7 @@ scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
> >  static const char *
> >  scsi_trace_service_action_in(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len, *cmd;
> > +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
> >  	sector_t lba = 0;
> >  	u32 alloc_len = 0;
> >  
> > @@ -247,7 +247,7 @@ scsi_trace_varlen(struct trace_seq *p, unsigned char *cdb, int len)
> >  static const char *
> >  scsi_trace_misc(struct trace_seq *p, unsigned char *cdb, int len)
> >  {
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  
> >  	trace_seq_printf(p, "-");
> >  	trace_seq_putc(p, 0);
> > diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
> > index dd85753e1bb0..ea6c9dea79e3 100644
> > --- a/include/linux/trace_seq.h
> > +++ b/include/linux/trace_seq.h
> > @@ -25,6 +25,21 @@ trace_seq_init(struct trace_seq *s)
> >  	s->full = 0;
> >  }
> >  
> > +/**
> > + * trace_seq_buffer_ptr - return pointer to next location in buffer
> > + * @s: trace sequence descriptor
> > + *
> > + * Returns the pointer to the buffer where the next write to
> > + * the buffer will happen. This is useful to save the location
> > + * that is about to be written to and then return the result
> > + * of that write.
> > + */
> > +static inline unsigned char *
> > +trace_seq_buffer_ptr(struct trace_seq *s)
> > +{
> > +	return s->buffer + s->len;
> > +}
> > +
> >  /*
> >   * Currently only defined when tracing is enabled.
> >   */
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index b8930f79a04b..c6977d5a9b12 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -75,7 +75,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
> >  {
> >  	unsigned long mask;
> >  	const char *str;
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  	int i, first = 1;
> >  
> >  	for (i = 0;  flag_array[i].name && flags; i++) {
> > @@ -111,7 +111,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
> >  			 const struct trace_print_flags *symbol_array)
> >  {
> >  	int i;
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  
> >  	for (i = 0;  symbol_array[i].name; i++) {
> >  
> > @@ -122,7 +122,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
> >  		break;
> >  	}
> >  
> > -	if (ret == (const char *)(p->buffer + p->len))
> > +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
> >  		trace_seq_printf(p, "0x%lx", val);
> >  		
> >  	trace_seq_putc(p, 0);
> > @@ -137,7 +137,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
> >  			 const struct trace_print_flags_u64 *symbol_array)
> >  {
> >  	int i;
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  
> >  	for (i = 0;  symbol_array[i].name; i++) {
> >  
> > @@ -148,7 +148,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
> >  		break;
> >  	}
> >  
> > -	if (ret == (const char *)(p->buffer + p->len))
> > +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
> >  		trace_seq_printf(p, "0x%llx", val);
> >  
> >  	trace_seq_putc(p, 0);
> > @@ -162,7 +162,7 @@ const char *
> >  ftrace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
> >  			 unsigned int bitmask_size)
> >  {
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  
> >  	trace_seq_bitmask(p, bitmask_ptr, bitmask_size * 8);
> >  	trace_seq_putc(p, 0);
> > @@ -175,7 +175,7 @@ const char *
> >  ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
> >  {
> >  	int i;
> > -	const char *ret = p->buffer + p->len;
> > +	const char *ret = trace_seq_buffer_ptr(p);
> >  
> >  	for (i = 0; i < buf_len; i++)
> >  		trace_seq_printf(p, "%s%2.2x", i == 0 ? "" : " ", buf[i]);




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function
  2014-06-27  1:06   ` Steven Rostedt
  2014-06-27  3:14     ` James Bottomley
@ 2014-06-27  7:37     ` Paolo Bonzini
  1 sibling, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2014-06-27  7:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Petr Mladek, Paul E. McKenney,
	Konstantin Khlebnikov, James E.J. Bottomley, Gleb Natapov

Il 27/06/2014 03:06, Steven Rostedt ha scritto:
>
> As this patch is in my 3.17 queue and it touches the kvm and scsi
> tracepoint code, I figured I should at least do the courtesy of
> notifying the maintainers of those subsystems.
>
> Do you have any issues with this going through my tree? If not, please
> give me an Acked-by.

No problem absolutely.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> Thanks!
>
> -- Steve
>
> On Thu, 26 Jun 2014 17:49:02 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>>
>> There's several locations in the kernel that open code the calculation
>> of the next location in the trace_seq buffer. This is usually done with
>>
>>   p->buffer + p->len
>>
>> Instead of having this open coded, supply a helper function in the
>> header to do it for them. This function is called trace_seq_buffer_ptr().
>>
>> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>> ---
>>  arch/x86/kvm/mmutrace.h     |  2 +-
>>  drivers/scsi/scsi_trace.c   | 16 ++++++++--------
>>  include/linux/trace_seq.h   | 15 +++++++++++++++
>>  kernel/trace/trace_output.c | 14 +++++++-------
>>  4 files changed, 31 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
>> index 9d2e0ffcb190..2e5652b62fd6 100644
>> --- a/arch/x86/kvm/mmutrace.h
>> +++ b/arch/x86/kvm/mmutrace.h
>> @@ -22,7 +22,7 @@
>>  	__entry->unsync = sp->unsync;
>>
>>  #define KVM_MMU_PAGE_PRINTK() ({				        \
>> -	const char *ret = p->buffer + p->len;				\
>> +	const char *ret = trace_seq_buffer_ptr(p);			\
>>  	static const char *access_str[] = {			        \
>>  		"---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"  \
>>  	};							        \
>> diff --git a/drivers/scsi/scsi_trace.c b/drivers/scsi/scsi_trace.c
>> index 2bea4f0b684a..503594e5f76d 100644
>> --- a/drivers/scsi/scsi_trace.c
>> +++ b/drivers/scsi/scsi_trace.c
>> @@ -28,7 +28,7 @@ scsi_trace_misc(struct trace_seq *, unsigned char *, int);
>>  static const char *
>>  scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>  	sector_t lba = 0, txlen = 0;
>>
>>  	lba |= ((cdb[1] & 0x1F) << 16);
>> @@ -46,7 +46,7 @@ scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
>>  static const char *
>>  scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>  	sector_t lba = 0, txlen = 0;
>>
>>  	lba |= (cdb[2] << 24);
>> @@ -71,7 +71,7 @@ scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
>>  static const char *
>>  scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>  	sector_t lba = 0, txlen = 0;
>>
>>  	lba |= (cdb[2] << 24);
>> @@ -94,7 +94,7 @@ scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
>>  static const char *
>>  scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>  	sector_t lba = 0, txlen = 0;
>>
>>  	lba |= ((u64)cdb[2] << 56);
>> @@ -125,7 +125,7 @@ scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
>>  static const char *
>>  scsi_trace_rw32(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len, *cmd;
>> +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
>>  	sector_t lba = 0, txlen = 0;
>>  	u32 ei_lbrt = 0;
>>
>> @@ -180,7 +180,7 @@ out:
>>  static const char *
>>  scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>  	unsigned int regions = cdb[7] << 8 | cdb[8];
>>
>>  	trace_seq_printf(p, "regions=%u", (regions - 8) / 16);
>> @@ -192,7 +192,7 @@ scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
>>  static const char *
>>  scsi_trace_service_action_in(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len, *cmd;
>> +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
>>  	sector_t lba = 0;
>>  	u32 alloc_len = 0;
>>
>> @@ -247,7 +247,7 @@ scsi_trace_varlen(struct trace_seq *p, unsigned char *cdb, int len)
>>  static const char *
>>  scsi_trace_misc(struct trace_seq *p, unsigned char *cdb, int len)
>>  {
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>
>>  	trace_seq_printf(p, "-");
>>  	trace_seq_putc(p, 0);
>> diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
>> index dd85753e1bb0..ea6c9dea79e3 100644
>> --- a/include/linux/trace_seq.h
>> +++ b/include/linux/trace_seq.h
>> @@ -25,6 +25,21 @@ trace_seq_init(struct trace_seq *s)
>>  	s->full = 0;
>>  }
>>
>> +/**
>> + * trace_seq_buffer_ptr - return pointer to next location in buffer
>> + * @s: trace sequence descriptor
>> + *
>> + * Returns the pointer to the buffer where the next write to
>> + * the buffer will happen. This is useful to save the location
>> + * that is about to be written to and then return the result
>> + * of that write.
>> + */
>> +static inline unsigned char *
>> +trace_seq_buffer_ptr(struct trace_seq *s)
>> +{
>> +	return s->buffer + s->len;
>> +}
>> +
>>  /*
>>   * Currently only defined when tracing is enabled.
>>   */
>> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
>> index b8930f79a04b..c6977d5a9b12 100644
>> --- a/kernel/trace/trace_output.c
>> +++ b/kernel/trace/trace_output.c
>> @@ -75,7 +75,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
>>  {
>>  	unsigned long mask;
>>  	const char *str;
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>  	int i, first = 1;
>>
>>  	for (i = 0;  flag_array[i].name && flags; i++) {
>> @@ -111,7 +111,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
>>  			 const struct trace_print_flags *symbol_array)
>>  {
>>  	int i;
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>
>>  	for (i = 0;  symbol_array[i].name; i++) {
>>
>> @@ -122,7 +122,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
>>  		break;
>>  	}
>>
>> -	if (ret == (const char *)(p->buffer + p->len))
>> +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
>>  		trace_seq_printf(p, "0x%lx", val);
>>  		
>>  	trace_seq_putc(p, 0);
>> @@ -137,7 +137,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
>>  			 const struct trace_print_flags_u64 *symbol_array)
>>  {
>>  	int i;
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>
>>  	for (i = 0;  symbol_array[i].name; i++) {
>>
>> @@ -148,7 +148,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
>>  		break;
>>  	}
>>
>> -	if (ret == (const char *)(p->buffer + p->len))
>> +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
>>  		trace_seq_printf(p, "0x%llx", val);
>>
>>  	trace_seq_putc(p, 0);
>> @@ -162,7 +162,7 @@ const char *
>>  ftrace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
>>  			 unsigned int bitmask_size)
>>  {
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>
>>  	trace_seq_bitmask(p, bitmask_ptr, bitmask_size * 8);
>>  	trace_seq_putc(p, 0);
>> @@ -175,7 +175,7 @@ const char *
>>  ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
>>  {
>>  	int i;
>> -	const char *ret = p->buffer + p->len;
>> +	const char *ret = trace_seq_buffer_ptr(p);
>>
>>  	for (i = 0; i < buf_len; i++)
>>  		trace_seq_printf(p, "%s%2.2x", i == 0 ? "" : " ", buf[i]);
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-26 21:49 ` [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq Steven Rostedt
@ 2014-06-27 13:45   ` Petr Mládek
  2014-06-27 14:19     ` Steven Rostedt
  2014-06-27 14:21     ` Steven Rostedt
  0 siblings, 2 replies; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 13:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Thu 2014-06-26 17:49:03, Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> Create a seq_buf layer that trace_seq sits on. The seq_buf will not
> be limited to page size. This will allow other usages of seq_buf
> instead of a hard set PAGE_SIZE one that trace_seq has.
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  include/linux/seq_buf.h              |  58 ++++++
>  include/linux/trace_seq.h            |  10 +-
>  kernel/trace/Makefile                |   1 +
>  kernel/trace/seq_buf.c               | 348 +++++++++++++++++++++++++++++++++++
>  kernel/trace/trace.c                 |  39 ++--
>  kernel/trace/trace_events.c          |   6 +-
>  kernel/trace/trace_functions_graph.c |   6 +-
>  kernel/trace/trace_seq.c             | 184 +++++++++---------
>  8 files changed, 527 insertions(+), 125 deletions(-)
>  create mode 100644 include/linux/seq_buf.h
>  create mode 100644 kernel/trace/seq_buf.c

There is a lot of similar code in the two layers. Do you have any
plans to transform the trace stuff to seq_buf and get rid of the
trace_seq layer in long term?

If I get it correctly, the two layers are needed because:

   1. seq_buf works with any buffer but
      trace_seq with static buffer

   2. seq_buf writes even part of the message until the buffer is full but
      trace_buf writes only full messages or nothing

   3. seq_buf returns the number of written characters but
      trace_seq returns 1 on success and 0 on failure

   4. seq_buf sets "overflow" flag when an operation failed but
      trace_seq sets "full" when this happens


I wounder if we could get a compromise and use only one layer.

ad 1st:

   I have checked many locations and it seems that trace_seq_init() is
   called before the other functions as used. There is the WARN_ON()
   in the generic seq_buf() functions, so they would not crash. If
   there is some init missing, we could fix it later.

   But I haven't really tested it yet.


ad 2nd and 3rd:

   These are connected.

   On solution is to disable writing part of the text even in the
   generic seq_buf functions. I think that it is perfectly fine.
   If we do not print the entire line, we are in troubles anyway.
   Note that we could not allocate another buffer in NMI, so
   we will lose part of the message anyway.

   Another solution would be to live with incomplete lines in tracing.
   I wonder if any of the functions tries to write the line again when the
   write failed.

   IMHO, the most important thing is that both functions return 0 on
   failure.


ad 4th:

   Both "full" and "overflow" flags seems to have the same meaning.
   For example, trace_seq_printf() sets "full" on failure even
   when s->seq.len != s->size.


Best Regards,
Petr

[...]

> --- /dev/null
> +++ b/kernel/trace/seq_buf.c
> @@ -0,0 +1,348 @@
> +/*
> + * seq_buf.c
> + *
> + * Copyright (C) 2014 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
> + *
> + * The seq_buf is a handy tool that allows you to pass a descriptor around
> + * to a buffer that other functions can write to. It is similar to the
> + * seq_file functionality but has some differences.
> + *
> + * To use it, the seq_buf must be initialized with seq_buf_init().
> + * This will set up the counters within the descriptor. You can call
> + * seq_buf_init() more than once to reset the seq_buf to start
> + * from scratch.
> + * 
> + */
> +#include <linux/uaccess.h>
> +#include <linux/seq_file.h>
> +#include <linux/seq_buf.h>
> +
> +/* How much buffer is left on the seq_buf? */
> +#define SEQ_BUF_LEFT(s) (((s)->size - 1) - (s)->len)
> +
> +/* How much buffer is written? */
> +#define SEQ_BUF_USED(s) min((s)->len, (s)->size - 1)
> +
> +static inline void seq_buf_check_len(struct seq_buf *s)
> +{
> +	if (unlikely(s->len > (s->size - 1))) {

I would create macro for this check. It is repeated many times. I mean

#define SEQ_BUF_OVERFLOW(s) (s->len > (s->size - 1))

if (unlikely(SEQ_BUF_OVERFLOW))

> +		s->len = s->size - 1;
> +		s->overflow = 1;
> +	}
> +}
> +

[...]

> +#define MAX_MEMHEX_BYTES	8U
> +#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
> +
> +/**
> + * seq_buf_putmem_hex - write raw memory into the buffer in ASCII hex
> + * @s: seq_buf descriptor
> + * @mem: The raw memory to write its hex ASCII representation of
> + * @len: The length of the raw memory to copy (in bytes)
> + *
> + * This is similar to seq_buf_putmem() except instead of just copying the
> + * raw memory into the buffer it writes its ASCII representation of it
> + * in hex characters.
> + *
> + * Returns how much it wrote to the buffer.
> + */
> +int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
> +		       unsigned int len)
> +{
> +	unsigned char hex[HEX_CHARS];
> +	const unsigned char *data = mem;
> +	unsigned int start_len;
> +	int i, j;
> +	int cnt = 0;
> +
> +	while (len) {
> +		start_len = min(len, HEX_CHARS - 1);

I would do s/start_len/end_len/

I know that it depends on the point of view. But iteration between "0"
and "end" is better understandable for me :-)

> +#ifdef __BIG_ENDIAN
> +		for (i = 0, j = 0; i < start_len; i++) {
> +#else
> +		for (i = start_len-1, j = 0; i >= 0; i--) {
> +#endif
> +			hex[j++] = hex_asc_hi(data[i]);
> +			hex[j++] = hex_asc_lo(data[i]);
> +		}
> +		if (WARN_ON_ONCE(j == 0 || j/2 > len))
> +			break;
> +
> +		/* j increments twice per loop */
> +		len -= j / 2;
> +		hex[j++] = ' ';
> +
> +		cnt += seq_buf_putmem(s, hex, j);
> +	}
> +	return cnt;
> +}
> +
> +/**
> + * seq_buf_path - copy a path into the sequence buffer
> + * @s: seq_buf descriptor
> + * @path: path to write into the sequence buffer.
> + *
> + * Write a path name into the sequence buffer.
> + *
> + * Returns the number of bytes written into the buffer.
> + */
> +int seq_buf_path(struct seq_buf *s, const struct path *path)
> +{
> +	unsigned int len = SEQ_BUF_LEFT(s);
> +	unsigned char *p;
> +	unsigned int start = s->len;
> +
> +	WARN_ON((int)len < 0);
> +	p = d_path(path, s->buffer + s->len, len);
> +	if (!IS_ERR(p)) {
> +		p = mangle_path(s->buffer + s->len, p, "\n");
> +		if (p) {
> +			s->len = p - s->buffer;
> +	WARN_ON(s->len > (s->size - 1));

strange indentation

> +			return s->len - start;
> +		}
> +	} else {
> +		s->buffer[s->len++] = '?';
> +	WARN_ON(s->len > (s->size - 1));

same here

> +		return s->len - start;
> +	}
> +
> +	s->overflow = 1;
> +	return 0;
> +}
> +


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/
  2014-06-26 21:49 ` [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/ Steven Rostedt
@ 2014-06-27 13:48   ` Petr Mládek
  2014-06-27 14:27     ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 13:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Thu 2014-06-26 17:49:04, Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> The seq_buf functions are rather useful outside of tracing. Instead
> of having it be dependent on CONFIG_TRACING, move the code into lib/
> and allow other users to have access to it even when tracing is not
> configured.
> 
> The seq_buf utility is similar to the seq_file utility, but instead of
> writing sending data back up to userland, it writes it into a buffer
> defined at seq_buf_init(). This allows us to send a descriptor around
> that writes printf() formatted strings into it that can be retrieved
> later.
> 
> It is currently used by the tracing facility for such things like trace
> events to convert its binary saved data in the ring buffer into an
> ASCII human readable context to be displayed in /sys/kernel/debug/trace.
> 
> It can also be used for doing NMI prints safely from NMI context into
> the seq_buf and retrieved later and dumped to printk() safely. Doing
> printk() from an NMI context is dangerous because an NMI can preempt
> a current printk() and deadlock on it.
> 
> Link: http://lkml.kernel.org/p/20140619213952.058255809@goodmis.org
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/seq_buf.c | 348 -------------------------------------------------
>  lib/Makefile           |   2 +-
>  lib/seq_buf.c          | 348 +++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/trace_seq.c        | 303 ++++++++++++++++++++++++++++++++++++++++++

I guess that lib/trace_seq.c was copied by mistake.

>  4 files changed, 652 insertions(+), 349 deletions(-)
>  delete mode 100644 kernel/trace/seq_buf.c
>  create mode 100644 lib/seq_buf.c
>  create mode 100644 lib/trace_seq.c
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index ba967a19edba..6007194082c6 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>  	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
>  	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
>  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> -	 earlycpio.o
> +	 earlycpio.o seq_buf.o
>  
>  obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
>  lib-$(CONFIG_MMU) += ioremap.o

There is missing:

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index edc98c72a634..67d6369ddf83 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -29,7 +29,6 @@ obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o
 obj-$(CONFIG_TRACING) += trace.o
 obj-$(CONFIG_TRACING) += trace_output.o
 obj-$(CONFIG_TRACING) += trace_seq.o
-obj-$(CONFIG_TRACING) += seq_buf.o
 obj-$(CONFIG_TRACING) += trace_stat.o
 obj-$(CONFIG_TRACING) += trace_printk.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o


Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-27 13:45   ` Petr Mládek
@ 2014-06-27 14:19     ` Steven Rostedt
  2014-06-27 15:18       ` Petr Mládek
  2014-06-27 14:21     ` Steven Rostedt
  1 sibling, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27 14:19 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 27 Jun 2014 15:45:38 +0200
Petr Mládek <pmladek@suse.cz> wrote:

> On Thu 2014-06-26 17:49:03, Steven Rostedt wrote:
> > From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> > 
> > Create a seq_buf layer that trace_seq sits on. The seq_buf will not
> > be limited to page size. This will allow other usages of seq_buf
> > instead of a hard set PAGE_SIZE one that trace_seq has.
> >
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  include/linux/seq_buf.h              |  58 ++++++
> >  include/linux/trace_seq.h            |  10 +-
> >  kernel/trace/Makefile                |   1 +
> >  kernel/trace/seq_buf.c               | 348 +++++++++++++++++++++++++++++++++++
> >  kernel/trace/trace.c                 |  39 ++--
> >  kernel/trace/trace_events.c          |   6 +-
> >  kernel/trace/trace_functions_graph.c |   6 +-
> >  kernel/trace/trace_seq.c             | 184 +++++++++---------
> >  8 files changed, 527 insertions(+), 125 deletions(-)
> >  create mode 100644 include/linux/seq_buf.h
> >  create mode 100644 kernel/trace/seq_buf.c
> 
> There is a lot of similar code in the two layers. Do you have any
> plans to transform the trace stuff to seq_buf and get rid of the
> trace_seq layer in long term?
> 
> If I get it correctly, the two layers are needed because:
> 
>    1. seq_buf works with any buffer but
>       trace_seq with static buffer
> 
>    2. seq_buf writes even part of the message until the buffer is full but
>       trace_buf writes only full messages or nothing
> 
>    3. seq_buf returns the number of written characters but
>       trace_seq returns 1 on success and 0 on failure
> 
>    4. seq_buf sets "overflow" flag when an operation failed but
>       trace_seq sets "full" when this happens
> 
> 
> I wounder if we could get a compromise and use only one layer.
> 
> ad 1st:
> 
>    I have checked many locations and it seems that trace_seq_init() is
>    called before the other functions as used. There is the WARN_ON()
>    in the generic seq_buf() functions, so they would not crash. If
>    there is some init missing, we could fix it later.
> 
>    But I haven't really tested it yet.

Actually, there's a few hidden places that initialize a struct with
kzalloc that contains a trace_seq. I started fixing them but there were
more and more to find that I decided to give up and just add the check
to see if it needed to be initialized at each call.

Not that pretty, but not that critical to be worth crashing something
we missed. Maybe in the future this can change, but not now.


> 
> 
> ad 2nd and 3rd:
> 
>    These are connected.
> 
>    On solution is to disable writing part of the text even in the
>    generic seq_buf functions. I think that it is perfectly fine.
>    If we do not print the entire line, we are in troubles anyway.
>    Note that we could not allocate another buffer in NMI, so
>    we will lose part of the message anyway.

This patch uses seq_buf for the NMI code so it will fill to the end of
the buffer and just truncate what can't fit.

trace_pipe depends on the trace_seq behavior.

> 
>    Another solution would be to live with incomplete lines in tracing.
>    I wonder if any of the functions tries to write the line again when the
>    write failed.

This may break trace_pipe. Although there looks to be redundant
behavior in that the pipe code also resets the seq.len on partial line,
so maybe it's not an issue.

> 
>    IMHO, the most important thing is that both functions return 0 on
>    failure.

Note, I'm not sure how tightly these two need to be. I'm actually
trying to get trace_seq to be specific to tracing and nothing more.
Have seq_buf be used for all other instances.



> 
> 
> ad 4th:
> 
>    Both "full" and "overflow" flags seems to have the same meaning.
>    For example, trace_seq_printf() sets "full" on failure even
>    when s->seq.len != s->size.

The difference is that the overflow flag is just used for info letting
the user know that it did not fit. The full flag in trace_seq lets you
know that you can not add anything else, even though the new stuff may
fit.

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 4/5 v2] printk: Add per_cpu printk func to allow printk to be diverted
  2014-06-26 21:49 ` [RFC][PATCH 4/5 v2] printk: Add per_cpu printk func to allow printk to be diverted Steven Rostedt
@ 2014-06-27 14:20   ` Petr Mládek
  2014-06-27 14:39     ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 14:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Thu 2014-06-26 17:49:05, Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> Being able to divert printk to call another function besides the normal
> logging is useful for such things like NMI handling. If some functions
> are to be called from NMI that does printk() it is possible to lock up
> the box if the nmi handler triggers when another printk is happening.
> 
> One example of this use is to perform a stack trace on all CPUs via NMI.
> But if the NMI is to do the printk() it can cause the system to lock up.
> By allowing the printk to be diverted to another function that can safely
> record the printk output and then print it when it in a safe context
> then NMIs will be safe to call these functions like show_regs().
> 
> Link: http://lkml.kernel.org/p/20140619213952.209176403@goodmis.org
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

It is nice trick and works very well from the backtrace stuff.

Tested-by: Petr Mladek <pmladek@suse.cz>

See one comment below.

> ---
>  include/linux/percpu.h |  3 +++
>  include/linux/printk.h |  2 ++
>  kernel/printk/printk.c | 38 +++++++++++++++++++++++++++++---------
>  3 files changed, 34 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index 8419053d0f2e..9997c92ce3bd 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -802,4 +802,7 @@ do { __this_cpu_preempt_check("or");					\
>  	(__this_cpu_preempt_check("cmpxchg_double"),__pcpu_double_call_return_bool(raw_cpu_cmpxchg_double_, (pcp1), (pcp2), (oval1), (oval2), (nval1), (nval2)))
>  #endif
>  
> +/* To avoid include hell, as printk can not declare this, we declare it here */
> +DECLARE_PER_CPU(printk_func_t, printk_func);
> +
>  #endif /* __LINUX_PERCPU_H */
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 319ff7e53efb..e26310b2d2fd 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -159,6 +159,8 @@ extern int kptr_restrict;
>  
>  extern void wake_up_klogd(void);
>  
> +typedef int(*printk_func_t)(const char *fmt, va_list args);
> +
>  void log_buf_kexec_setup(void);
>  void __init setup_log_buf(int early);
>  void dump_stack_set_arch_desc(const char *fmt, ...);
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea2d5f6962ed..e3581f95ba5c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1764,6 +1764,30 @@ asmlinkage int printk_emit(int facility, int level,
>  }
>  EXPORT_SYMBOL(printk_emit);
>  
> +int vprintk_default(const char *fmt, va_list args)
> +{
> +	int r;
> +
> +#ifdef CONFIG_KGDB_KDB
> +	if (unlikely(kdb_trap_printk)) {
> +		r = vkdb_printf(fmt, args);
> +		return r;
> +	}
> +#endif
> +	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
> +
> +	return r;
> +}
> +EXPORT_SYMBOL_GPL(vprintk_default);
> +
> +/*
> + * This allows printk to be diverted to another function per cpu.
> + * This is useful for calling printk functions from within NMI
> + * without worrying about race conditions that can lock up the
> + * box.
> + */
> +DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> +
>  /**
>   * printk - print a kernel message
>   * @fmt: format string
> @@ -1787,19 +1811,15 @@ EXPORT_SYMBOL(printk_emit);
>   */
>  asmlinkage __visible int printk(const char *fmt, ...)
>  {
> +	printk_func_t vprintk_func;
>  	va_list args;
>  	int r;
>  
> -#ifdef CONFIG_KGDB_KDB
> -	if (unlikely(kdb_trap_printk)) {
> -		va_start(args, fmt);
> -		r = vkdb_printf(fmt, args);
> -		va_end(args);
> -		return r;
> -	}
> -#endif
>  	va_start(args, fmt);
> -	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
> +	preempt_disable();

I think that it is too late to disable the preemption here.
It has to be done by the printk() caller if it wants to be sure
that the requested function is used.

> +	vprintk_func = this_cpu_read(printk_func);
> +	r = vprintk_func(fmt, args);
> +	preempt_enable();
>  	va_end(args);
>  
>  	return r;

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-27 13:45   ` Petr Mládek
  2014-06-27 14:19     ` Steven Rostedt
@ 2014-06-27 14:21     ` Steven Rostedt
  2014-06-27 14:56       ` Petr Mládek
  1 sibling, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27 14:21 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 27 Jun 2014 15:45:38 +0200
Petr Mládek <pmladek@suse.cz> wrote:

> ad 4th:
> 
>    Both "full" and "overflow" flags seems to have the same meaning.
>    For example, trace_seq_printf() sets "full" on failure even
>    when s->seq.len != s->size.
> 
> 
> Best Regards,
> Petr
> 
> [...]


BTW, you shouldn't sign off if you have more comments, I usually stop
reading when I see someone's signature. Or at the very least, state
that you have more comments below.
 
> > --- /dev/null
> > +++ b/kernel/trace/seq_buf.c
> > @@ -0,0 +1,348 @@
> > +/*
> > + * seq_buf.c
> > + *
> > + * Copyright (C) 2014 Red Hat Inc, Steven Rostedt <srostedt@redhat.com>
> > + *
> > + * The seq_buf is a handy tool that allows you to pass a descriptor around
> > + * to a buffer that other functions can write to. It is similar to the
> > + * seq_file functionality but has some differences.
> > + *
> > + * To use it, the seq_buf must be initialized with seq_buf_init().
> > + * This will set up the counters within the descriptor. You can call
> > + * seq_buf_init() more than once to reset the seq_buf to start
> > + * from scratch.
> > + * 
> > + */
> > +#include <linux/uaccess.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/seq_buf.h>
> > +
> > +/* How much buffer is left on the seq_buf? */
> > +#define SEQ_BUF_LEFT(s) (((s)->size - 1) - (s)->len)
> > +
> > +/* How much buffer is written? */
> > +#define SEQ_BUF_USED(s) min((s)->len, (s)->size - 1)
> > +
> > +static inline void seq_buf_check_len(struct seq_buf *s)
> > +{
> > +	if (unlikely(s->len > (s->size - 1))) {
> 
> I would create macro for this check. It is repeated many times. I mean
> 
> #define SEQ_BUF_OVERFLOW(s) (s->len > (s->size - 1))
> 
> if (unlikely(SEQ_BUF_OVERFLOW))

Yeah, I was hoping that the static inline would be enough, but then it
didn't fit into the functions at the end. I never got around to then
adding a macro.

Everything was open coded when I first created it.

> 
> > +		s->len = s->size - 1;
> > +		s->overflow = 1;
> > +	}
> > +}
> > +
> 
> [...]
> 
> > +#define MAX_MEMHEX_BYTES	8U
> > +#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
> > +
> > +/**
> > + * seq_buf_putmem_hex - write raw memory into the buffer in ASCII hex
> > + * @s: seq_buf descriptor
> > + * @mem: The raw memory to write its hex ASCII representation of
> > + * @len: The length of the raw memory to copy (in bytes)
> > + *
> > + * This is similar to seq_buf_putmem() except instead of just copying the
> > + * raw memory into the buffer it writes its ASCII representation of it
> > + * in hex characters.
> > + *
> > + * Returns how much it wrote to the buffer.
> > + */
> > +int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
> > +		       unsigned int len)
> > +{
> > +	unsigned char hex[HEX_CHARS];
> > +	const unsigned char *data = mem;
> > +	unsigned int start_len;
> > +	int i, j;
> > +	int cnt = 0;
> > +
> > +	while (len) {
> > +		start_len = min(len, HEX_CHARS - 1);
> 
> I would do s/start_len/end_len/
> 
> I know that it depends on the point of view. But iteration between "0"
> and "end" is better understandable for me :-)

Yeah, I didn't like the name of that variable. I guess end_len is
better, but still not exactly what I want to call it. Unfortunately, I
don't know what exactly I want to call it ;-)

> 
> > +#ifdef __BIG_ENDIAN
> > +		for (i = 0, j = 0; i < start_len; i++) {
> > +#else
> > +		for (i = start_len-1, j = 0; i >= 0; i--) {
> > +#endif
> > +			hex[j++] = hex_asc_hi(data[i]);
> > +			hex[j++] = hex_asc_lo(data[i]);
> > +		}
> > +		if (WARN_ON_ONCE(j == 0 || j/2 > len))
> > +			break;
> > +
> > +		/* j increments twice per loop */
> > +		len -= j / 2;
> > +		hex[j++] = ' ';
> > +
> > +		cnt += seq_buf_putmem(s, hex, j);
> > +	}
> > +	return cnt;
> > +}
> > +
> > +/**
> > + * seq_buf_path - copy a path into the sequence buffer
> > + * @s: seq_buf descriptor
> > + * @path: path to write into the sequence buffer.
> > + *
> > + * Write a path name into the sequence buffer.
> > + *
> > + * Returns the number of bytes written into the buffer.
> > + */
> > +int seq_buf_path(struct seq_buf *s, const struct path *path)
> > +{
> > +	unsigned int len = SEQ_BUF_LEFT(s);
> > +	unsigned char *p;
> > +	unsigned int start = s->len;
> > +
> > +	WARN_ON((int)len < 0);
> > +	p = d_path(path, s->buffer + s->len, len);
> > +	if (!IS_ERR(p)) {
> > +		p = mangle_path(s->buffer + s->len, p, "\n");
> > +		if (p) {
> > +			s->len = p - s->buffer;
> > +	WARN_ON(s->len > (s->size - 1));
> 
> strange indentation
> 
> > +			return s->len - start;
> > +		}
> > +	} else {
> > +		s->buffer[s->len++] = '?';
> > +	WARN_ON(s->len > (s->size - 1));
> 
> same here

Heh, I added those for debugging, and then decided to keep it. Never
went back to clean it up :-/

-- Steve

> 
> > +		return s->len - start;
> > +	}
> > +
> > +	s->overflow = 1;
> > +	return 0;
> > +}
> > +
> 
> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/
  2014-06-27 13:48   ` Petr Mládek
@ 2014-06-27 14:27     ` Steven Rostedt
  2014-06-27 14:39       ` Petr Mládek
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27 14:27 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 27 Jun 2014 15:48:10 +0200
Petr Mládek <pmladek@suse.cz> wrote:

> On Thu 2014-06-26 17:49:04, Steven Rostedt wrote:
> > From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> > 
> > The seq_buf functions are rather useful outside of tracing. Instead
> > of having it be dependent on CONFIG_TRACING, move the code into lib/
> > and allow other users to have access to it even when tracing is not
> > configured.
> > 
> > The seq_buf utility is similar to the seq_file utility, but instead of
> > writing sending data back up to userland, it writes it into a buffer
> > defined at seq_buf_init(). This allows us to send a descriptor around
> > that writes printf() formatted strings into it that can be retrieved
> > later.
> > 
> > It is currently used by the tracing facility for such things like trace
> > events to convert its binary saved data in the ring buffer into an
> > ASCII human readable context to be displayed in /sys/kernel/debug/trace.
> > 
> > It can also be used for doing NMI prints safely from NMI context into
> > the seq_buf and retrieved later and dumped to printk() safely. Doing
> > printk() from an NMI context is dangerous because an NMI can preempt
> > a current printk() and deadlock on it.
> > 
> > Link: http://lkml.kernel.org/p/20140619213952.058255809@goodmis.org
> > 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  kernel/trace/seq_buf.c | 348 -------------------------------------------------
> >  lib/Makefile           |   2 +-
> >  lib/seq_buf.c          | 348 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  lib/trace_seq.c        | 303 ++++++++++++++++++++++++++++++++++++++++++
> 
> I guess that lib/trace_seq.c was copied by mistake.

Heh, that's what I get for using both quilt and git :-)

To start with what I had before (and use my scripts for the Link tag),
I pulled the old patch from my email and pulled it into a quilt queue.
Did a "quilt push -f" and then started from there. I removed all the
trace_seq.c updates but forgot that quilt push -f would create the file
in lib/.

When done with all my updates, I just did a git am of the patch in my
quilt queue, which pulled in the old copy of trace_seq.c in lib/ :-p

I'll remove it.

> 
> >  4 files changed, 652 insertions(+), 349 deletions(-)
> >  delete mode 100644 kernel/trace/seq_buf.c
> >  create mode 100644 lib/seq_buf.c
> >  create mode 100644 lib/trace_seq.c
> > 
> > diff --git a/lib/Makefile b/lib/Makefile
> > index ba967a19edba..6007194082c6 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
> >  	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
> >  	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
> >  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> > -	 earlycpio.o
> > +	 earlycpio.o seq_buf.o
> >  
> >  obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
> >  lib-$(CONFIG_MMU) += ioremap.o
> 
> There is missing:

Nope, it just was copied by mistake. It only added dead code, it didn't
move it.

-- Steve

> 
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index edc98c72a634..67d6369ddf83 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -29,7 +29,6 @@ obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o
>  obj-$(CONFIG_TRACING) += trace.o
>  obj-$(CONFIG_TRACING) += trace_output.o
>  obj-$(CONFIG_TRACING) += trace_seq.o
> -obj-$(CONFIG_TRACING) += seq_buf.o
>  obj-$(CONFIG_TRACING) += trace_stat.o
>  obj-$(CONFIG_TRACING) += trace_printk.o
>  obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
> 
> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 5/5 v2] x86/nmi: Perform a safe NMI stack trace on all CPUs
       [not found] ` <20140626220130.764213722@goodmis.org>
  2014-06-26 22:51   ` [RFC][PATCH 5/5 v2] x86/nmi: Perform a safe NMI stack trace on all CPUs Steven Rostedt
@ 2014-06-27 14:32   ` Petr Mládek
  2014-06-27 14:40     ` Steven Rostedt
  1 sibling, 1 reply; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 14:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Thu 2014-06-26 17:49:06, Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> When trigger_all_cpu_backtrace() is called on x86, it will trigger an
> NMI on each CPU and call show_regs(). But this can lead to a hard lock
> up if the NMI comes in on another printk().
> 
> In order to avoid this, when the NMI triggers, it switches the printk
> routine for that CPU to call a NMI safe printk function that records the
> printk in a per_cpu seq_buf descriptor. After all NMIs have finished
> recording its data, the trace_seqs are printed in a safe context.
> 
> Link: http://lkml.kernel.org/p/20140619213952.360076309@goodmis.org
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

It seems to work very well.

Tested-by: Petr Mladek <pmladek@suse.cz>

See one comment below.

> ---
> V2 - updated to use seq_buf instead of trace_seq.
>    - Keep original printk loglevel, do not strip it, but use it.
>      (suggested by Petr Mládek)
>    - Updated comments
>    - Removed return of true (noticed by Petr Mládek)
> ---
>  arch/x86/kernel/apic/hw_nmi.c | 84 +++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 81 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index c3fcb5de5083..91ddba672839 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -18,6 +18,7 @@
>  #include <linux/nmi.h>
>  #include <linux/module.h>
>  #include <linux/delay.h>
> +#include <linux/seq_buf.h>
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  u64 hw_nmi_get_sample_period(int watchdog_thresh)
> @@ -30,11 +31,31 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
>  /* For reliability, we're prepared to waste bits here. */
>  static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
>  
> +#define NMI_BUF_SIZE		4096
> +
> +struct nmi_seq_buf {
> +	unsigned char		buffer[NMI_BUF_SIZE];
> +	struct seq_buf		seq;
> +};
> +
> +/* Safe printing in NMI context */
> +static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
> +
>  /* "in progress" flag of arch_trigger_all_cpu_backtrace */
>  static unsigned long backtrace_flag;
>  
> +static void print_seq_line(struct nmi_seq_buf *s, int last, int pos)
> +{
> +	const char *buf = s->buffer + last;
> +
> +	printk("%.*s", (pos - last) + 1, buf);
> +}
> +
>  void arch_trigger_all_cpu_backtrace(void)
>  {
> +	struct nmi_seq_buf *s;
> +	int len;
> +	int cpu;
>  	int i;
>  
>  	if (test_and_set_bit(0, &backtrace_flag))
> @@ -44,6 +65,15 @@ void arch_trigger_all_cpu_backtrace(void)
>  		 */
>  		return;
>  
> +	/*
> +	 * Set up per_cpu seq_buf buffers that the NMIs running on the other
> +	 * CPUs will write to.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		s = &per_cpu(nmi_print_seq, cpu);
> +		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
> +	}
> +
>  	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
>  
>  	printk(KERN_INFO "sending NMI to all CPUs:\n");
> @@ -56,10 +86,56 @@ void arch_trigger_all_cpu_backtrace(void)
>  		mdelay(1);
>  	}
>  
> +	/*
> +	 * Now that all the NMIs have triggered, we can dump out their
> +	 * back traces safely to the console.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		int last_i = 0;
> +
> +		s = &per_cpu(nmi_print_seq, cpu);
> +		len = s->seq.len;
> +		if (!len)
> +			continue;
> +
> +		/* Print line by line. */
> +		for (i = 0; i < len; i++) {
> +			if (s->buffer[i] == '\n') {
> +				print_seq_line(s, last_i, i);
> +				last_i = i + 1;
> +			}
> +		}
> +		if (last_i < i - 1) {
> +			print_seq_line(s, last_i, i);
> +			pr_cont("\n");
> +		}
> +	}
> +
>  	clear_bit(0, &backtrace_flag);
>  	smp_mb__after_atomic();
>  }
>  
> +/*
> + * It is not safe to call printk() directly from NMI handlers.
> + * It may be fine if the NMI detected a lock up and we have no choice
> + * but to do so, but doing a NMI on all other CPUs to get a back trace
> + * can be done with a sysrq-l. We don't want that to lock up, which
> + * can happen if the NMI interrupts a printk in progress.
> + *
> + * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
> + * the content into a per cpu seq_buf buffer. Then when the NMIs are
> + * all done, we can safely dump the contents of the seq_buf to a printk()
> + * from a non NMI context.
> + */

I would move this comment above #define NMI_BUF_SIZE
It should be on top because it helps to understand many other tricks that are
used above.

Regards,
Petr

> +static int nmi_vprintk(const char *fmt, va_list args)
> +{
> +	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> +	unsigned int len = s->seq.len;
> +
> +	seq_buf_vprintf(&s->seq, fmt, args);
> +	return s->seq.len - len;
> +}
> +
>  static int
>  arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
>  {
> @@ -68,12 +144,14 @@ arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
>  	cpu = smp_processor_id();
>  
>  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> -		static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED;
> +		printk_func_t printk_func_save = this_cpu_read(printk_func);
>  
> -		arch_spin_lock(&lock);
> +		/* Replace printk to write into the NMI seq */
> +		this_cpu_write(printk_func, nmi_vprintk);
>  		printk(KERN_WARNING "NMI backtrace for cpu %d\n", cpu);
>  		show_regs(regs);
> -		arch_spin_unlock(&lock);
> +		this_cpu_write(printk_func, printk_func_save);
> +
>  		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>  		return NMI_HANDLED;
>  	}
> -- 
> 2.0.0
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 4/5 v2] printk: Add per_cpu printk func to allow printk to be diverted
  2014-06-27 14:20   ` Petr Mládek
@ 2014-06-27 14:39     ` Steven Rostedt
  2014-06-27 14:43       ` Petr Mládek
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27 14:39 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 27 Jun 2014 16:20:25 +0200
Petr Mládek <pmladek@suse.cz> wrote:


> >  	va_start(args, fmt);
> > -	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
> > +	preempt_disable();
> 
> I think that it is too late to disable the preemption here.
> It has to be done by the printk() caller if it wants to be sure
> that the requested function is used.

That's only if the printk() caller cares. But it would be nice that we
run the printk_func for the CPU that vprintk_func() is on, thus the
preempt_disable() is required. (in -rt, this would turn into a
migrate_disable()).

-- Steve

> 
> > +	vprintk_func = this_cpu_read(printk_func);
> > +	r = vprintk_func(fmt, args);
> > +	preempt_enable();
> >  	va_end(args);
> >  
> >  	return r;
> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/
  2014-06-27 14:27     ` Steven Rostedt
@ 2014-06-27 14:39       ` Petr Mládek
  2014-06-27 14:44         ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 14:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri 2014-06-27 10:27:15, Steven Rostedt wrote:
> On Fri, 27 Jun 2014 15:48:10 +0200
> Petr Mládek <pmladek@suse.cz> wrote:
> 
> > On Thu 2014-06-26 17:49:04, Steven Rostedt wrote:
> > > From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> > > 
> > > The seq_buf functions are rather useful outside of tracing. Instead
> > > of having it be dependent on CONFIG_TRACING, move the code into lib/
> > > and allow other users to have access to it even when tracing is not
> > > configured.
> > > 
> > > The seq_buf utility is similar to the seq_file utility, but instead of
> > > writing sending data back up to userland, it writes it into a buffer
> > > defined at seq_buf_init(). This allows us to send a descriptor around
> > > that writes printf() formatted strings into it that can be retrieved
> > > later.
> > > 
> > > It is currently used by the tracing facility for such things like trace
> > > events to convert its binary saved data in the ring buffer into an
> > > ASCII human readable context to be displayed in /sys/kernel/debug/trace.
> > > 
> > > It can also be used for doing NMI prints safely from NMI context into
> > > the seq_buf and retrieved later and dumped to printk() safely. Doing
> > > printk() from an NMI context is dangerous because an NMI can preempt
> > > a current printk() and deadlock on it.
> > > 
> > > Link: http://lkml.kernel.org/p/20140619213952.058255809@goodmis.org
> > > 
> > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > > ---
> > >  kernel/trace/seq_buf.c | 348 -------------------------------------------------
> > >  lib/Makefile           |   2 +-
> > >  lib/seq_buf.c          | 348 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  lib/trace_seq.c        | 303 ++++++++++++++++++++++++++++++++++++++++++
> > 
> > I guess that lib/trace_seq.c was copied by mistake.
> 
> Heh, that's what I get for using both quilt and git :-)
> 
> To start with what I had before (and use my scripts for the Link tag),
> I pulled the old patch from my email and pulled it into a quilt queue.
> Did a "quilt push -f" and then started from there. I removed all the
> trace_seq.c updates but forgot that quilt push -f would create the file
> in lib/.
> 
> When done with all my updates, I just did a git am of the patch in my
> quilt queue, which pulled in the old copy of trace_seq.c in lib/ :-p

:-)
 
> I'll remove it.
> 
> > 
> > >  4 files changed, 652 insertions(+), 349 deletions(-)
> > >  delete mode 100644 kernel/trace/seq_buf.c
> > >  create mode 100644 lib/seq_buf.c
> > >  create mode 100644 lib/trace_seq.c
> > > 
> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index ba967a19edba..6007194082c6 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
> > >  	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
> > >  	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
> > >  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> > > -	 earlycpio.o
> > > +	 earlycpio.o seq_buf.o
> > >  
> > >  obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
> > >  lib-$(CONFIG_MMU) += ioremap.o
> > 
> > There is missing:
> 
> Nope, it just was copied by mistake. It only added dead code, it didn't
> move it.

It is really missing! You removed kernel/trace/seq_buf.c and you have to
remove it also from kernel/trace/Makefile. :-)

Best Regards,
Petr

> -- Steve
> 
> > 
> > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > index edc98c72a634..67d6369ddf83 100644
> > --- a/kernel/trace/Makefile
> > +++ b/kernel/trace/Makefile
> > @@ -29,7 +29,6 @@ obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o
> >  obj-$(CONFIG_TRACING) += trace.o
> >  obj-$(CONFIG_TRACING) += trace_output.o
> >  obj-$(CONFIG_TRACING) += trace_seq.o
> > -obj-$(CONFIG_TRACING) += seq_buf.o
> >  obj-$(CONFIG_TRACING) += trace_stat.o
> >  obj-$(CONFIG_TRACING) += trace_printk.o
> >  obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
> > 
> > 
> > Best Regards,
> > Petr
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 5/5 v2] x86/nmi: Perform a safe NMI stack trace on all CPUs
  2014-06-27 14:32   ` Petr Mládek
@ 2014-06-27 14:40     ` Steven Rostedt
  0 siblings, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27 14:40 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 27 Jun 2014 16:32:46 +0200
Petr Mládek <pmladek@suse.cz> wrote:

 
> > +/*
> > + * It is not safe to call printk() directly from NMI handlers.
> > + * It may be fine if the NMI detected a lock up and we have no choice
> > + * but to do so, but doing a NMI on all other CPUs to get a back trace
> > + * can be done with a sysrq-l. We don't want that to lock up, which
> > + * can happen if the NMI interrupts a printk in progress.
> > + *
> > + * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
> > + * the content into a per cpu seq_buf buffer. Then when the NMIs are
> > + * all done, we can safely dump the contents of the seq_buf to a printk()
> > + * from a non NMI context.
> > + */
> 
> I would move this comment above #define NMI_BUF_SIZE
> It should be on top because it helps to understand many other tricks that are
> used above.
> 

I have no problem with moving it. This started out as a simple comment
and then I just got carried away ;-)

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 4/5 v2] printk: Add per_cpu printk func to allow printk to be diverted
  2014-06-27 14:39     ` Steven Rostedt
@ 2014-06-27 14:43       ` Petr Mládek
  0 siblings, 0 replies; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 14:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri 2014-06-27 10:39:33, Steven Rostedt wrote:
> On Fri, 27 Jun 2014 16:20:25 +0200
> Petr Mládek <pmladek@suse.cz> wrote:
> 
> 
> > >  	va_start(args, fmt);
> > > -	r = vprintk_emit(0, -1, NULL, 0, fmt, args);
> > > +	preempt_disable();
> > 
> > I think that it is too late to disable the preemption here.
> > It has to be done by the printk() caller if it wants to be sure
> > that the requested function is used.
> 
> That's only if the printk() caller cares. But it would be nice that we
> run the printk_func for the CPU that vprintk_func() is on, thus the
> preempt_disable() is required. (in -rt, this would turn into a
> migrate_disable()).

It makes sense. Thanks for explanation.

Best Regards,
Petr
 
> -- Steve
> 
> > 
> > > +	vprintk_func = this_cpu_read(printk_func);
> > > +	r = vprintk_func(fmt, args);
> > > +	preempt_enable();
> > >  	va_end(args);
> > >  
> > >  	return r;
> > 
> > Best Regards,
> > Petr
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/
  2014-06-27 14:39       ` Petr Mládek
@ 2014-06-27 14:44         ` Steven Rostedt
  0 siblings, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27 14:44 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 27 Jun 2014 16:39:56 +0200
Petr Mládek <pmladek@suse.cz> wrote:


> > Nope, it just was copied by mistake. It only added dead code, it didn't
> > move it.
> 
> It is really missing! You removed kernel/trace/seq_buf.c and you have to
> remove it also from kernel/trace/Makefile. :-)

OK, I see. Strange that this didn't complain when I compiled it :-/

Oh, I bet the .o file still existed. It would have complained if I did
a make clean first. But then it should have complained about the dual
definitions ??

-- Steve

> 
> Best Regards,
> Petr
> 
> > -- Steve
> > 
> > > 
> > > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > > index edc98c72a634..67d6369ddf83 100644
> > > --- a/kernel/trace/Makefile
> > > +++ b/kernel/trace/Makefile
> > > @@ -29,7 +29,6 @@ obj-$(CONFIG_RING_BUFFER_BENCHMARK) += ring_buffer_benchmark.o
> > >  obj-$(CONFIG_TRACING) += trace.o
> > >  obj-$(CONFIG_TRACING) += trace_output.o
> > >  obj-$(CONFIG_TRACING) += trace_seq.o
> > > -obj-$(CONFIG_TRACING) += seq_buf.o
> > >  obj-$(CONFIG_TRACING) += trace_stat.o
> > >  obj-$(CONFIG_TRACING) += trace_printk.o
> > >  obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
> > > 
> > > 
> > > Best Regards,
> > > Petr
> > 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-27 14:21     ` Steven Rostedt
@ 2014-06-27 14:56       ` Petr Mládek
  0 siblings, 0 replies; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 14:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri 2014-06-27 10:21:34, Steven Rostedt wrote:
> On Fri, 27 Jun 2014 15:45:38 +0200
> Petr Mládek <pmladek@suse.cz> wrote:
> 
> > ad 4th:
> > 
> >    Both "full" and "overflow" flags seems to have the same meaning.
> >    For example, trace_seq_printf() sets "full" on failure even
> >    when s->seq.len != s->size.
> > 
> > 
> > Best Regards,
> > Petr
> > 
> > [...]
> 
> 
> BTW, you shouldn't sign off if you have more comments, I usually stop
> reading when I see someone's signature. Or at the very least, state
> that you have more comments below.

Shame on me. I was about to send the mail but then I decided to add
more comments into the code. :-)
 
> > 
> > > +#define MAX_MEMHEX_BYTES	8U
> > > +#define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
> > > +
> > > +/**
> > > + * seq_buf_putmem_hex - write raw memory into the buffer in ASCII hex
> > > + * @s: seq_buf descriptor
> > > + * @mem: The raw memory to write its hex ASCII representation of
> > > + * @len: The length of the raw memory to copy (in bytes)
> > > + *
> > > + * This is similar to seq_buf_putmem() except instead of just copying the
> > > + * raw memory into the buffer it writes its ASCII representation of it
> > > + * in hex characters.
> > > + *
> > > + * Returns how much it wrote to the buffer.
> > > + */
> > > +int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
> > > +		       unsigned int len)
> > > +{
> > > +	unsigned char hex[HEX_CHARS];
> > > +	const unsigned char *data = mem;
> > > +	unsigned int start_len;
> > > +	int i, j;
> > > +	int cnt = 0;
> > > +
> > > +	while (len) {
> > > +		start_len = min(len, HEX_CHARS - 1);
> > 
> > I would do s/start_len/end_len/
> > 
> > I know that it depends on the point of view. But iteration between "0"
> > and "end" is better understandable for me :-)
> 
> Yeah, I didn't like the name of that variable. I guess end_len is
> better, but still not exactly what I want to call it. Unfortunately, I
> don't know what exactly I want to call it ;-)

Yup, the code is tricky :-) The names like break_len, wrap_len,
split_len, block_len come to my mind.

I would prefer wrap_len after all. But the choice it yours.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-27 14:19     ` Steven Rostedt
@ 2014-06-27 15:18       ` Petr Mládek
  2014-06-27 15:39         ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 15:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri 2014-06-27 10:19:07, Steven Rostedt wrote:
> On Fri, 27 Jun 2014 15:45:38 +0200
> Petr Mládek <pmladek@suse.cz> wrote:
> 
> > On Thu 2014-06-26 17:49:03, Steven Rostedt wrote:
> > > From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> > > 
> > > Create a seq_buf layer that trace_seq sits on. The seq_buf will not
> > > be limited to page size. This will allow other usages of seq_buf
> > > instead of a hard set PAGE_SIZE one that trace_seq has.
> > >
> > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > > ---
> > >  include/linux/seq_buf.h              |  58 ++++++
> > >  include/linux/trace_seq.h            |  10 +-
> > >  kernel/trace/Makefile                |   1 +
> > >  kernel/trace/seq_buf.c               | 348 +++++++++++++++++++++++++++++++++++
> > >  kernel/trace/trace.c                 |  39 ++--
> > >  kernel/trace/trace_events.c          |   6 +-
> > >  kernel/trace/trace_functions_graph.c |   6 +-
> > >  kernel/trace/trace_seq.c             | 184 +++++++++---------
> > >  8 files changed, 527 insertions(+), 125 deletions(-)
> > >  create mode 100644 include/linux/seq_buf.h
> > >  create mode 100644 kernel/trace/seq_buf.c
> > 
> > There is a lot of similar code in the two layers. Do you have any
> > plans to transform the trace stuff to seq_buf and get rid of the
> > trace_seq layer in long term?
> > 
> > If I get it correctly, the two layers are needed because:
> > 
> >    1. seq_buf works with any buffer but
> >       trace_seq with static buffer
> > 
> >    2. seq_buf writes even part of the message until the buffer is full but
> >       trace_buf writes only full messages or nothing
> > 
> >    3. seq_buf returns the number of written characters but
> >       trace_seq returns 1 on success and 0 on failure
> > 
> >    4. seq_buf sets "overflow" flag when an operation failed but
> >       trace_seq sets "full" when this happens
> > 
> > 
> > I wounder if we could get a compromise and use only one layer.
> > 
> > ad 1st:
> > 
> >    I have checked many locations and it seems that trace_seq_init() is
> >    called before the other functions as used. There is the WARN_ON()
> >    in the generic seq_buf() functions, so they would not crash. If
> >    there is some init missing, we could fix it later.
> > 
> >    But I haven't really tested it yet.
> 
> Actually, there's a few hidden places that initialize a struct with
> kzalloc that contains a trace_seq. I started fixing them but there were
> more and more to find that I decided to give up and just add the check
> to see if it needed to be initialized at each call.
> 
> Not that pretty, but not that critical to be worth crashing something
> we missed. Maybe in the future this can change, but not now.

I see.
 
> > 
> > ad 2nd and 3rd:
> > 
> >    These are connected.
> > 
> >    On solution is to disable writing part of the text even in the
> >    generic seq_buf functions. I think that it is perfectly fine.
> >    If we do not print the entire line, we are in troubles anyway.
> >    Note that we could not allocate another buffer in NMI, so
> >    we will lose part of the message anyway.
> 
> This patch uses seq_buf for the NMI code so it will fill to the end of
> the buffer and just truncate what can't fit.

I think that NMI code could live with the trace_seq behavior. The
lines are short. If we miss few characters it is not that big difference.

> trace_pipe depends on the trace_seq behavior.
> 
> > 
> >    Another solution would be to live with incomplete lines in tracing.
> >    I wonder if any of the functions tries to write the line again when the
> >    write failed.
> 
> This may break trace_pipe. Although there looks to be redundant
> behavior in that the pipe code also resets the seq.len on partial line,
> so maybe it's not an issue.
> 
> > 
> >    IMHO, the most important thing is that both functions return 0 on
> >    failure.
> 
> Note, I'm not sure how tightly these two need to be. I'm actually
> trying to get trace_seq to be specific to tracing and nothing more.
> Have seq_buf be used for all other instances.

If the two layers make your life easier then they might make sense. I
just saw many similarities and wanted to help. IMHO, if anyone breaks
seq_buf, it will break trace_seq anyway. So, they are not really separated.

> 
> > ad 4th:
> > 
> >    Both "full" and "overflow" flags seems to have the same meaning.
> >    For example, trace_seq_printf() sets "full" on failure even
> >    when s->seq.len != s->size.
> 
> The difference is that the overflow flag is just used for info letting
> the user know that it did not fit. The full flag in trace_seq lets you
> know that you can not add anything else, even though the new stuff may
> fit.

I see. They have another meaning but they are set at the same time:

	if (s->seq.overflow) {
		...
		s->full = 1;
		return 0;
	}

In fact, both names are slightly misleading. seq.overflow is set
when the buffer is full even when all characters were written.
s->full is set even when there is still some space :-)

I suggest to rename "overflow" to "full" and have it only in the
struct "seq_buf". Then it has the right meaning in "seq_buf" and is still
backward compatible with "trace_seq".


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-27 15:18       ` Petr Mládek
@ 2014-06-27 15:39         ` Steven Rostedt
  2014-06-27 16:52           ` Petr Mládek
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-06-27 15:39 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 27 Jun 2014 17:18:04 +0200
Petr Mládek <pmladek@suse.cz> wrote:


> > This patch uses seq_buf for the NMI code so it will fill to the end of
> > the buffer and just truncate what can't fit.
> 
> I think that NMI code could live with the trace_seq behavior. The
> lines are short. If we miss few characters it is not that big difference.

True, but I'm trying to keep trace_seq more tracing specific.

> 
> > trace_pipe depends on the trace_seq behavior.
> > 
> > > 
> > >    Another solution would be to live with incomplete lines in tracing.
> > >    I wonder if any of the functions tries to write the line again when the
> > >    write failed.
> > 
> > This may break trace_pipe. Although there looks to be redundant
> > behavior in that the pipe code also resets the seq.len on partial line,
> > so maybe it's not an issue.
> > 
> > > 
> > >    IMHO, the most important thing is that both functions return 0 on
> > >    failure.
> > 
> > Note, I'm not sure how tightly these two need to be. I'm actually
> > trying to get trace_seq to be specific to tracing and nothing more.
> > Have seq_buf be used for all other instances.
> 
> If the two layers make your life easier then they might make sense. I
> just saw many similarities and wanted to help. IMHO, if anyone breaks
> seq_buf, it will break trace_seq anyway. So, they are not really separated.

There's actually things I want to do with the seq_buf behavior that I
can't do with the trace_seq behavior. That's more about having a
dynamic way to create seq_buf buffers and resize them if need be.
Although, perhaps an all or nothing approach will help in that.


> 
> > 
> > > ad 4th:
> > > 
> > >    Both "full" and "overflow" flags seems to have the same meaning.
> > >    For example, trace_seq_printf() sets "full" on failure even
> > >    when s->seq.len != s->size.
> > 
> > The difference is that the overflow flag is just used for info letting
> > the user know that it did not fit. The full flag in trace_seq lets you
> > know that you can not add anything else, even though the new stuff may
> > fit.
> 
> I see. They have another meaning but they are set at the same time:
> 
> 	if (s->seq.overflow) {
> 		...
> 		s->full = 1;
> 		return 0;
> 	}
> 
> In fact, both names are slightly misleading. seq.overflow is set
> when the buffer is full even when all characters were written.
> s->full is set even when there is still some space :-)

I actually disagree. overflow means that you wrote more than what was
there. In this case, it was cropped.

Full suggests that you can't add anymore because it wont allow you.

The difference is that you are looking at a implementation point of
view. I'm looking at a usage point of view. With trace_seq, there is no
space left, you can't add any more. seq_buf, you can add more but it
wont be saved because it was overflowed.

What I should do is change overflow from being a flag to the number of
characters that it overflowed by. That would be useful information, and
would make more sense. It lets the user know what wasn't written.

-- Steve



> 
> I suggest to rename "overflow" to "full" and have it only in the
> struct "seq_buf". Then it has the right meaning in "seq_buf" and is still
> backward compatible with "trace_seq".
> 
> 
> Best Regards,
> Petr


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-27 15:39         ` Steven Rostedt
@ 2014-06-27 16:52           ` Petr Mládek
  2014-09-26 15:00             ` Steven Rostedt
  0 siblings, 1 reply; 33+ messages in thread
From: Petr Mládek @ 2014-06-27 16:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri 2014-06-27 11:39:09, Steven Rostedt wrote:
> On Fri, 27 Jun 2014 17:18:04 +0200
> Petr Mládek <pmladek@suse.cz> wrote:
> 
> 
> > > This patch uses seq_buf for the NMI code so it will fill to the end of
> > > the buffer and just truncate what can't fit.
> > 
> > I think that NMI code could live with the trace_seq behavior. The
> > lines are short. If we miss few characters it is not that big difference.
> 
> True, but I'm trying to keep trace_seq more tracing specific.

It is true that most writing functions write as much as possible. On
the other hand, I do not think that refusing to write, if there is not
enough space, is specific to tracing. It might be useful also for others.

In the worst case, we could add a flag into the "struct seq_buf" that
might define the behavior. Well, I am not sure if we want to add this
complexity when there are no users. As I said, the backtraces might
live with trace_seq behavior.

>> 
> > > trace_pipe depends on the trace_seq behavior.
> > > 
> > > > 
> > > >    Another solution would be to live with incomplete lines in tracing.
> > > >    I wonder if any of the functions tries to write the line again when the
> > > >    write failed.
> > > 
> > > This may break trace_pipe. Although there looks to be redundant
> > > behavior in that the pipe code also resets the seq.len on partial line,
> > > so maybe it's not an issue.
> > > 
> > > > 
> > > >    IMHO, the most important thing is that both functions return 0 on
> > > >    failure.
> > > 
> > > Note, I'm not sure how tightly these two need to be. I'm actually
> > > trying to get trace_seq to be specific to tracing and nothing more.
> > > Have seq_buf be used for all other instances.
> > 
> > If the two layers make your life easier then they might make sense. I
> > just saw many similarities and wanted to help. IMHO, if anyone breaks
> > seq_buf, it will break trace_seq anyway. So, they are not really separated.
> 
> There's actually things I want to do with the seq_buf behavior that I
> can't do with the trace_seq behavior. That's more about having a
> dynamic way to create seq_buf buffers and resize them if need be.
> Although, perhaps an all or nothing approach will help in that.

Note that we should not allocate in NMI context, especially when there
is a softlock. So, we might need to define the behavior anyway.


> > 
> > > 
> > > > ad 4th:
> > > > 
> > > >    Both "full" and "overflow" flags seems to have the same meaning.
> > > >    For example, trace_seq_printf() sets "full" on failure even
> > > >    when s->seq.len != s->size.
> > > 
> > > The difference is that the overflow flag is just used for info letting
> > > the user know that it did not fit. The full flag in trace_seq lets you
> > > know that you can not add anything else, even though the new stuff may
> > > fit.
> > 
> > I see. They have another meaning but they are set at the same time:
> > 
> > 	if (s->seq.overflow) {
> > 		...
> > 		s->full = 1;
> > 		return 0;
> > 	}
> > 
> > In fact, both names are slightly misleading. seq.overflow is set
> > when the buffer is full even when all characters were written.
> > s->full is set even when there is still some space :-)
> 
> I actually disagree. overflow means that you wrote more than what was
> there. In this case, it was cropped.

The problem is that we do not know that it was cropped.

The "overflow" flag is set when (s->len > (s->size - 1)). In most
cases it will be set when (s->len == s->size).

For example, seq_buf_printf() calls vsnprintf(). It will never write
over the buffer. We do not know if the message was cropped or if we
were lucky and the message was exactly as long as the free space.


> Full suggests that you can't add anymore because it wont allow you.
> 
> The difference is that you are looking at a implementation point of
> view. I'm looking at a usage point of view. With trace_seq, there is no
> space left, you can't add any more. seq_buf, you can add more but it
> wont be saved because it was overflowed.

I know that there is some difference but I am not sure if users are
really interested into such details. In both cases, they are unable
to write more data. From they point of view, the buffer is simply full.
Honestly, I think that the two flags and the two point of view cause
more confusion than win.

Note that I do not want to change the meaning for trace_buf. I want to
change the meaning for "seq_buf" that has no users yet.


> What I should do is change overflow from being a flag to the number of
> characters that it overflowed by. That would be useful information, and
> would make more sense. It lets the user know what wasn't written.

I am afraid that we do not have this information. See above for the
example with vsnprintf().

In each case, I do not want to block this patch. The generic "seq_buf"
API looks reasonable. The tracing code is your area and it is your
decision. You know much more about it than me and the extra complexity
might be needed.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function
  2014-06-27  3:14     ` James Bottomley
@ 2014-07-03 16:03       ` Steven Rostedt
  0 siblings, 0 replies; 33+ messages in thread
From: Steven Rostedt @ 2014-07-03 16:03 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Petr Mladek, Paul E. McKenney,
	Konstantin Khlebnikov, Gleb Natapov, Paolo Bonzini, linux-scsi

On Thu, 26 Jun 2014 20:14:15 -0700
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Thu, 2014-06-26 at 21:06 -0400, Steven Rostedt wrote:
> > As this patch is in my 3.17 queue and it touches the kvm and scsi
> > tracepoint code, I figured I should at least do the courtesy of
> > notifying the maintainers of those subsystems.
> > 
> > Do you have any issues with this going through my tree? If not, please
> > give me an Acked-by.
> 
> Missing cc to linux-scsi added.
> 
> Hannes has a set of patches to move logging to trace points.  I don't
> think there's a clash but I'd like him to comment before you do this.

I'm about to push this out. I already tested it fully at my end. Is
there any objection to this change?

I'll post the patches but I'll wait till I get a reply before I push it
to my for-next branch. I try not to rebase that branch. If there's an
issue, I'll have to revert the scsi changes.

-- Steve

> 
> James
> 
> 
> > Thanks!
> > 
> > -- Steve
> > 
> > On Thu, 26 Jun 2014 17:49:02 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> > > 
> > > There's several locations in the kernel that open code the calculation
> > > of the next location in the trace_seq buffer. This is usually done with
> > > 
> > >   p->buffer + p->len
> > > 
> > > Instead of having this open coded, supply a helper function in the
> > > header to do it for them. This function is called trace_seq_buffer_ptr().
> > > 
> > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > > ---
> > >  arch/x86/kvm/mmutrace.h     |  2 +-
> > >  drivers/scsi/scsi_trace.c   | 16 ++++++++--------
> > >  include/linux/trace_seq.h   | 15 +++++++++++++++
> > >  kernel/trace/trace_output.c | 14 +++++++-------
> > >  4 files changed, 31 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> > > index 9d2e0ffcb190..2e5652b62fd6 100644
> > > --- a/arch/x86/kvm/mmutrace.h
> > > +++ b/arch/x86/kvm/mmutrace.h
> > > @@ -22,7 +22,7 @@
> > >  	__entry->unsync = sp->unsync;
> > >  
> > >  #define KVM_MMU_PAGE_PRINTK() ({				        \
> > > -	const char *ret = p->buffer + p->len;				\
> > > +	const char *ret = trace_seq_buffer_ptr(p);			\
> > >  	static const char *access_str[] = {			        \
> > >  		"---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"  \
> > >  	};							        \
> > > diff --git a/drivers/scsi/scsi_trace.c b/drivers/scsi/scsi_trace.c
> > > index 2bea4f0b684a..503594e5f76d 100644
> > > --- a/drivers/scsi/scsi_trace.c
> > > +++ b/drivers/scsi/scsi_trace.c
> > > @@ -28,7 +28,7 @@ scsi_trace_misc(struct trace_seq *, unsigned char *, int);
> > >  static const char *
> > >  scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  	sector_t lba = 0, txlen = 0;
> > >  
> > >  	lba |= ((cdb[1] & 0x1F) << 16);
> > > @@ -46,7 +46,7 @@ scsi_trace_rw6(struct trace_seq *p, unsigned char *cdb, int len)
> > >  static const char *
> > >  scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  	sector_t lba = 0, txlen = 0;
> > >  
> > >  	lba |= (cdb[2] << 24);
> > > @@ -71,7 +71,7 @@ scsi_trace_rw10(struct trace_seq *p, unsigned char *cdb, int len)
> > >  static const char *
> > >  scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  	sector_t lba = 0, txlen = 0;
> > >  
> > >  	lba |= (cdb[2] << 24);
> > > @@ -94,7 +94,7 @@ scsi_trace_rw12(struct trace_seq *p, unsigned char *cdb, int len)
> > >  static const char *
> > >  scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  	sector_t lba = 0, txlen = 0;
> > >  
> > >  	lba |= ((u64)cdb[2] << 56);
> > > @@ -125,7 +125,7 @@ scsi_trace_rw16(struct trace_seq *p, unsigned char *cdb, int len)
> > >  static const char *
> > >  scsi_trace_rw32(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len, *cmd;
> > > +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
> > >  	sector_t lba = 0, txlen = 0;
> > >  	u32 ei_lbrt = 0;
> > >  
> > > @@ -180,7 +180,7 @@ out:
> > >  static const char *
> > >  scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  	unsigned int regions = cdb[7] << 8 | cdb[8];
> > >  
> > >  	trace_seq_printf(p, "regions=%u", (regions - 8) / 16);
> > > @@ -192,7 +192,7 @@ scsi_trace_unmap(struct trace_seq *p, unsigned char *cdb, int len)
> > >  static const char *
> > >  scsi_trace_service_action_in(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len, *cmd;
> > > +	const char *ret = trace_seq_buffer_ptr(p), *cmd;
> > >  	sector_t lba = 0;
> > >  	u32 alloc_len = 0;
> > >  
> > > @@ -247,7 +247,7 @@ scsi_trace_varlen(struct trace_seq *p, unsigned char *cdb, int len)
> > >  static const char *
> > >  scsi_trace_misc(struct trace_seq *p, unsigned char *cdb, int len)
> > >  {
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  
> > >  	trace_seq_printf(p, "-");
> > >  	trace_seq_putc(p, 0);
> > > diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
> > > index dd85753e1bb0..ea6c9dea79e3 100644
> > > --- a/include/linux/trace_seq.h
> > > +++ b/include/linux/trace_seq.h
> > > @@ -25,6 +25,21 @@ trace_seq_init(struct trace_seq *s)
> > >  	s->full = 0;
> > >  }
> > >  
> > > +/**
> > > + * trace_seq_buffer_ptr - return pointer to next location in buffer
> > > + * @s: trace sequence descriptor
> > > + *
> > > + * Returns the pointer to the buffer where the next write to
> > > + * the buffer will happen. This is useful to save the location
> > > + * that is about to be written to and then return the result
> > > + * of that write.
> > > + */
> > > +static inline unsigned char *
> > > +trace_seq_buffer_ptr(struct trace_seq *s)
> > > +{
> > > +	return s->buffer + s->len;
> > > +}
> > > +
> > >  /*
> > >   * Currently only defined when tracing is enabled.
> > >   */
> > > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > > index b8930f79a04b..c6977d5a9b12 100644
> > > --- a/kernel/trace/trace_output.c
> > > +++ b/kernel/trace/trace_output.c
> > > @@ -75,7 +75,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
> > >  {
> > >  	unsigned long mask;
> > >  	const char *str;
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  	int i, first = 1;
> > >  
> > >  	for (i = 0;  flag_array[i].name && flags; i++) {
> > > @@ -111,7 +111,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
> > >  			 const struct trace_print_flags *symbol_array)
> > >  {
> > >  	int i;
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  
> > >  	for (i = 0;  symbol_array[i].name; i++) {
> > >  
> > > @@ -122,7 +122,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
> > >  		break;
> > >  	}
> > >  
> > > -	if (ret == (const char *)(p->buffer + p->len))
> > > +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
> > >  		trace_seq_printf(p, "0x%lx", val);
> > >  		
> > >  	trace_seq_putc(p, 0);
> > > @@ -137,7 +137,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
> > >  			 const struct trace_print_flags_u64 *symbol_array)
> > >  {
> > >  	int i;
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  
> > >  	for (i = 0;  symbol_array[i].name; i++) {
> > >  
> > > @@ -148,7 +148,7 @@ ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
> > >  		break;
> > >  	}
> > >  
> > > -	if (ret == (const char *)(p->buffer + p->len))
> > > +	if (ret == (const char *)(trace_seq_buffer_ptr(p)))
> > >  		trace_seq_printf(p, "0x%llx", val);
> > >  
> > >  	trace_seq_putc(p, 0);
> > > @@ -162,7 +162,7 @@ const char *
> > >  ftrace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
> > >  			 unsigned int bitmask_size)
> > >  {
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  
> > >  	trace_seq_bitmask(p, bitmask_ptr, bitmask_size * 8);
> > >  	trace_seq_putc(p, 0);
> > > @@ -175,7 +175,7 @@ const char *
> > >  ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
> > >  {
> > >  	int i;
> > > -	const char *ret = p->buffer + p->len;
> > > +	const char *ret = trace_seq_buffer_ptr(p);
> > >  
> > >  	for (i = 0; i < buf_len; i++)
> > >  		trace_seq_printf(p, "%s%2.2x", i == 0 ? "" : " ", buf[i]);
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely
  2014-06-26 21:49 [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Steven Rostedt
                   ` (4 preceding siblings ...)
       [not found] ` <20140626220130.764213722@goodmis.org>
@ 2014-08-07  8:41 ` Jiri Kosina
  2014-08-08 18:44   ` Steven Rostedt
  5 siblings, 1 reply; 33+ messages in thread
From: Jiri Kosina @ 2014-08-07  8:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

On Thu, 26 Jun 2014, Steven Rostedt wrote:

> This is version 2:
> 
> This is my proposal to print the NMI stack traces from an RCU stall safely.
> Here's the gist of it.

Steven, what's your plan with this patchset please? 

It has been tested by Petr here on the systems under the loads that are 
able to trigger the hangs.

It'd be nice to have this in 3.17, so that we can then start working on 
fixing all the remaining printk() calls in NMI context once this biggest 
offender (stack dumping) is done.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely
  2014-08-07  8:41 ` [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Jiri Kosina
@ 2014-08-08 18:44   ` Steven Rostedt
  2014-08-12 14:17     ` Jiri Kosina
  2014-09-10  8:08     ` Jiri Kosina
  0 siblings, 2 replies; 33+ messages in thread
From: Steven Rostedt @ 2014-08-08 18:44 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

On Thu, 7 Aug 2014 10:41:45 +0200 (CEST)
Jiri Kosina <jkosina@suse.cz> wrote:

> On Thu, 26 Jun 2014, Steven Rostedt wrote:
> 
> > This is version 2:
> > 
> > This is my proposal to print the NMI stack traces from an RCU stall safely.
> > Here's the gist of it.
> 
> Steven, what's your plan with this patchset please? 

Has there been consensus on this approach?

> 
> It has been tested by Petr here on the systems under the loads that are 
> able to trigger the hangs.
> 
> It'd be nice to have this in 3.17, so that we can then start working on 
> fixing all the remaining printk() calls in NMI context once this biggest 
> offender (stack dumping) is done.

It definitely wont be in 3.17 (too late). I'm fine with getting this
ready unless there's other objections.

3.17 does contain the updates to trace_seq that Andrew made. Linus had
issues with the "trace" name, which this patch set creates a new
"seq_buf" that is a subset of the trace_seq.

I'll keep this on my todo list for 3.18 and push out patches to get
people's comments about it after 3.17 merge window has closed (the
required people are a bit busy at the moment).

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely
  2014-08-08 18:44   ` Steven Rostedt
@ 2014-08-12 14:17     ` Jiri Kosina
  2014-09-10  8:08     ` Jiri Kosina
  1 sibling, 0 replies; 33+ messages in thread
From: Jiri Kosina @ 2014-08-12 14:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 8 Aug 2014, Steven Rostedt wrote:

> > > This is version 2:
> > > 
> > > This is my proposal to print the NMI stack traces from an RCU stall safely.
> > > Here's the gist of it.
> > 
> > Steven, what's your plan with this patchset please? 
> 
> Has there been consensus on this approach?

Well, I don't think any better proposals have been brought up so far, all 
the other aproaches (including the one we brought up) were more intrusive, 
and the hang-in-NMI it fixes is real and simply has to be fixed.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely
  2014-08-08 18:44   ` Steven Rostedt
  2014-08-12 14:17     ` Jiri Kosina
@ 2014-09-10  8:08     ` Jiri Kosina
  2014-09-10 10:12       ` Jan Kara
  1 sibling, 1 reply; 33+ messages in thread
From: Jiri Kosina @ 2014-09-10  8:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Michal Hocko, Jan Kara, Frederic Weisbecker, Dave Anderson,
	Petr Mladek, Paul E. McKenney, Konstantin Khlebnikov

On Fri, 8 Aug 2014, Steven Rostedt wrote:

> > > This is my proposal to print the NMI stack traces from an RCU stall safely.
> > > Here's the gist of it.
> > 
> > Steven, what's your plan with this patchset please? 
> 
> Has there been consensus on this approach?

>From what I heard, the kernel summit workshop seems to have reached that 
conclusion; I was unfortunately unable to attend it due to another 
discussion I was interested in happening at the same time -- Jan Kara was 
there, so I believe he can provide all the details if necessary.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely
  2014-09-10  8:08     ` Jiri Kosina
@ 2014-09-10 10:12       ` Jan Kara
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Kara @ 2014-09-10 10:12 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Petr Mladek, Paul E. McKenney,
	Konstantin Khlebnikov

On Wed 10-09-14 10:08:02, Jiri Kosina wrote:
> On Fri, 8 Aug 2014, Steven Rostedt wrote:
> 
> > > > This is my proposal to print the NMI stack traces from an RCU stall safely.
> > > > Here's the gist of it.
> > > 
> > > Steven, what's your plan with this patchset please? 
> > 
> > Has there been consensus on this approach?
> 
> From what I heard, the kernel summit workshop seems to have reached that 
> conclusion; I was unfortunately unable to attend it due to another 
> discussion I was interested in happening at the same time -- Jan Kara was 
> there, so I believe he can provide all the details if necessary.
  Yes, we briefly talked about printk from NMI and noone was objecting to
Steven's solution.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-06-27 16:52           ` Petr Mládek
@ 2014-09-26 15:00             ` Steven Rostedt
  2014-09-26 16:28               ` Petr Mladek
  0 siblings, 1 reply; 33+ messages in thread
From: Steven Rostedt @ 2014-09-26 15:00 UTC (permalink / raw)
  To: Petr Mládek
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

As people are asking for this patch series to be added, I'm going back
through your comments. I never replied to this email (at least my email
client says I did not).

On Fri, 27 Jun 2014 18:52:04 +0200
Petr Mládek <pmladek@suse.cz> wrote:

> On Fri 2014-06-27 11:39:09, Steven Rostedt wrote:
> > On Fri, 27 Jun 2014 17:18:04 +0200
> > Petr Mládek <pmladek@suse.cz> wrote:
> > 
> > 
> > > > This patch uses seq_buf for the NMI code so it will fill to the end of
> > > > the buffer and just truncate what can't fit.
> > > 
> > > I think that NMI code could live with the trace_seq behavior. The
> > > lines are short. If we miss few characters it is not that big difference.
> > 
> > True, but I'm trying to keep trace_seq more tracing specific.
> 
> It is true that most writing functions write as much as possible. On
> the other hand, I do not think that refusing to write, if there is not
> enough space, is specific to tracing. It might be useful also for others.

Looking at what seq_file does, for example seq_printf(), it fills the
buffer to the max size. If the printf is truncated, it sets the count
to the size of the buffer. Then in traverse(), it detects that an
overflow happened and stops reading, frees the buffer, increases the
size of the buffer (by power of two), and then tries again with the
bigger buffer.

Really, it doesn't matter if seq_printf() didn't write anything or if
it truncated, the result would be the same. Perhaps then I can keep
seq_buf doing a all or nothing approach like trace_seq does. I think it
was Andrew (akpm) that criticized this behavior.


> 
> In the worst case, we could add a flag into the "struct seq_buf" that
> might define the behavior. Well, I am not sure if we want to add this
> complexity when there are no users. As I said, the backtraces might
> live with trace_seq behavior.
> 

Yeah, I don't want to add more complexity to this. No flags :-)


> >> 
> > > > trace_pipe depends on the trace_seq behavior.
> > > > 
> > > > > 
> > > > >    Another solution would be to live with incomplete lines in tracing.
> > > > >    I wonder if any of the functions tries to write the line again when the
> > > > >    write failed.
> > > > 
> > > > This may break trace_pipe. Although there looks to be redundant
> > > > behavior in that the pipe code also resets the seq.len on partial line,
> > > > so maybe it's not an issue.
> > > > 
> > > > > 
> > > > >    IMHO, the most important thing is that both functions return 0 on
> > > > >    failure.
> > > > 
> > > > Note, I'm not sure how tightly these two need to be. I'm actually
> > > > trying to get trace_seq to be specific to tracing and nothing more.
> > > > Have seq_buf be used for all other instances.
> > > 
> > > If the two layers make your life easier then they might make sense. I
> > > just saw many similarities and wanted to help. IMHO, if anyone breaks
> > > seq_buf, it will break trace_seq anyway. So, they are not really separated.
> > 
> > There's actually things I want to do with the seq_buf behavior that I
> > can't do with the trace_seq behavior. That's more about having a
> > dynamic way to create seq_buf buffers and resize them if need be.
> > Although, perhaps an all or nothing approach will help in that.
> 
> Note that we should not allocate in NMI context, especially when there
> is a softlock. So, we might need to define the behavior anyway.

No, I wouldn't mean that seq_buf would automatically allocate, but a
wrapper to seq_buf could. Like what seq_file does. In fact, perhaps, we
can have seq_buf be part of seq_file and consolidate the behavior?

> 
> 
> > > 
> > > > 
> > > > > ad 4th:
> > > > > 
> > > > >    Both "full" and "overflow" flags seems to have the same meaning.
> > > > >    For example, trace_seq_printf() sets "full" on failure even
> > > > >    when s->seq.len != s->size.
> > > > 
> > > > The difference is that the overflow flag is just used for info letting
> > > > the user know that it did not fit. The full flag in trace_seq lets you
> > > > know that you can not add anything else, even though the new stuff may
> > > > fit.
> > > 
> > > I see. They have another meaning but they are set at the same time:
> > > 
> > > 	if (s->seq.overflow) {
> > > 		...
> > > 		s->full = 1;
> > > 		return 0;
> > > 	}

Well, if it is overflowed, it can't write anymore ;-)

But notice that the seq.len is still the same. Hmm, I need to zero that
as well. As tools are allowed to print s->buffer with it. We don't want
added data to it.


> > > 
> > > In fact, both names are slightly misleading. seq.overflow is set
> > > when the buffer is full even when all characters were written.
> > > s->full is set even when there is still some space :-)
> > 
> > I actually disagree. overflow means that you wrote more than what was
> > there. In this case, it was cropped.
> 
> The problem is that we do not know that it was cropped.
> 
> The "overflow" flag is set when (s->len > (s->size - 1)). In most
> cases it will be set when (s->len == s->size).
> 
> For example, seq_buf_printf() calls vsnprintf(). It will never write
> over the buffer. We do not know if the message was cropped or if we
> were lucky and the message was exactly as long as the free space.
> 

Again, I was doing this because of what was suggested before. I'll try
to find the email. I may work to have seq_buf() be used for seq_file
first, and then make trace_seq() use it. That might make even more
sense.

> 
> > Full suggests that you can't add anymore because it wont allow you.
> > 
> > The difference is that you are looking at a implementation point of
> > view. I'm looking at a usage point of view. With trace_seq, there is no
> > space left, you can't add any more. seq_buf, you can add more but it
> > wont be saved because it was overflowed.
> 
> I know that there is some difference but I am not sure if users are
> really interested into such details. In both cases, they are unable
> to write more data. From they point of view, the buffer is simply full.
> Honestly, I think that the two flags and the two point of view cause
> more confusion than win.

Lets see what happens if I make seq_file use seq_buf. This may be
interesting.

> 
> Note that I do not want to change the meaning for trace_buf. I want to
> change the meaning for "seq_buf" that has no users yet.

Understood. But others have suggested ideas that contradict yours. We
do need to straighten this out.

> 
> 
> > What I should do is change overflow from being a flag to the number of
> > characters that it overflowed by. That would be useful information, and
> > would make more sense. It lets the user know what wasn't written.
> 
> I am afraid that we do not have this information. See above for the
> example with vsnprintf().
> 
> In each case, I do not want to block this patch. The generic "seq_buf"
> API looks reasonable. The tracing code is your area and it is your
> decision. You know much more about it than me and the extra complexity
> might be needed.

Yeah, we can always change it later. It's not an interface into
userspace, thus it's not set in stone. But I still want something that
is reasonable before pushing further. I'll go find those comments from
Andrew and see where things can be figured out. Perhaps writing a patch
that makes seq_file() use seq_buf might show what is needed better.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq
  2014-09-26 15:00             ` Steven Rostedt
@ 2014-09-26 16:28               ` Petr Mladek
  0 siblings, 0 replies; 33+ messages in thread
From: Petr Mladek @ 2014-09-26 16:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Jiri Kosina, Michal Hocko, Jan Kara, Frederic Weisbecker,
	Dave Anderson, Paul E. McKenney, Konstantin Khlebnikov

On Fri 26-09-14 11:00:43, Steven Rostedt wrote:
> As people are asking for this patch series to be added, I'm going back
> through your comments. I never replied to this email (at least my email
> client says I did not).

It is great that you are on it again. I am looking forward to have the
proposed solution of the backtrace printing under NMI.


> On Fri, 27 Jun 2014 18:52:04 +0200
> Petr Mládek <pmladek@suse.cz> wrote:
> 
> > On Fri 2014-06-27 11:39:09, Steven Rostedt wrote:
> > > On Fri, 27 Jun 2014 17:18:04 +0200
> > > Petr Mládek <pmladek@suse.cz> wrote:
> > > 
> > > 
> > > > > This patch uses seq_buf for the NMI code so it will fill to the end of
> > > > > the buffer and just truncate what can't fit.
> > > > 
> > > > I think that NMI code could live with the trace_seq behavior. The
> > > > lines are short. If we miss few characters it is not that big difference.
> > > 
> > > True, but I'm trying to keep trace_seq more tracing specific.
> > 
> > It is true that most writing functions write as much as possible. On
> > the other hand, I do not think that refusing to write, if there is not
> > enough space, is specific to tracing. It might be useful also for others.
> 
> Looking at what seq_file does, for example seq_printf(), it fills the
> buffer to the max size. If the printf is truncated, it sets the count
> to the size of the buffer. Then in traverse(), it detects that an
> overflow happened and stops reading, frees the buffer, increases the
> size of the buffer (by power of two), and then tries again with the
> bigger buffer.
> 
> Really, it doesn't matter if seq_printf() didn't write anything or if
> it truncated, the result would be the same. Perhaps then I can keep
> seq_buf doing a all or nothing approach like trace_seq does. I think it
> was Andrew (akpm) that criticized this behavior.

I guess that you mean https://lkml.org/lkml/2014/6/20/26 and the part:

--- cut ---
> +
> +	/* If we can't write it all, don't bother writing anything */

This is somewhat unusual behavior for a write()-style thing.  Comment
should explain "why", not "what".
--- cut ---

> 
> > 
> > 
> > > > 
> > > > > 
> > > > > > ad 4th:
> > > > > > 
> > > > > >    Both "full" and "overflow" flags seems to have the same meaning.
> > > > > >    For example, trace_seq_printf() sets "full" on failure even
> > > > > >    when s->seq.len != s->size.
> > > > > 
> > > > > The difference is that the overflow flag is just used for info letting
> > > > > the user know that it did not fit. The full flag in trace_seq lets you
> > > > > know that you can not add anything else, even though the new stuff may
> > > > > fit.
> > > > 
> > > > I see. They have another meaning but they are set at the same time:
> > > > 
> > > > 	if (s->seq.overflow) {
> > > > 		...
> > > > 		s->full = 1;
> > > > 		return 0;
> > > > 	}
> 
> Well, if it is overflowed, it can't write anymore ;-)
> 
> But notice that the seq.len is still the same. Hmm, I need to zero that
> as well. As tools are allowed to print s->buffer with it. We don't want
> added data to it.

Your patch actually do

    back s->seq.len = save_len;

which is reasonable. I probably simplified it by the dots (...) too much. :-)

> > > > 
> > > > In fact, both names are slightly misleading. seq.overflow is set
> > > > when the buffer is full even when all characters were written.
> > > > s->full is set even when there is still some space :-)
> > > 
> > > I actually disagree. overflow means that you wrote more than what was
> > > there. In this case, it was cropped.
> > 
> > The problem is that we do not know that it was cropped.
> > 
> > The "overflow" flag is set when (s->len > (s->size - 1)). In most
> > cases it will be set when (s->len == s->size).
> > 
> > For example, seq_buf_printf() calls vsnprintf(). It will never write
> > over the buffer. We do not know if the message was cropped or if we
> > were lucky and the message was exactly as long as the free space.
> > 
> 
> Again, I was doing this because of what was suggested before. I'll try
> to find the email. I may work to have seq_buf() be used for seq_file
> first, and then make trace_seq() use it. That might make even more
> sense.

OK, let's see what is going out of the integration with seq_file.

Also I am going to think about another solution. In fact, I think that
both names "overflow" and "full" are slightly misleading. As explained
above, "overflow" might be set event when there was no real overflow and
"full" is set even when there seems to be a space. I understand
how it is designed but I wonder if we might find a more clear
solution, maybe just a better name(s).

> > In each case, I do not want to block this patch. The generic "seq_buf"
> > API looks reasonable. The tracing code is your area and it is your
> > decision. You know much more about it than me and the extra complexity
> > might be needed.
> 
> Yeah, we can always change it later. It's not an interface into
> userspace, thus it's not set in stone. But I still want something that
> is reasonable before pushing further. I'll go find those comments from
> Andrew and see where things can be figured out. Perhaps writing a patch
> that makes seq_file() use seq_buf might show what is needed better.

I see. It makes sense.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-09-26 16:29 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-26 21:49 [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Steven Rostedt
2014-06-26 21:49 ` [RFC][PATCH 1/5 v2] tracing: Add trace_seq_buffer_ptr() helper function Steven Rostedt
2014-06-27  1:06   ` Steven Rostedt
2014-06-27  3:14     ` James Bottomley
2014-07-03 16:03       ` Steven Rostedt
2014-06-27  7:37     ` Paolo Bonzini
2014-06-26 21:49 ` [RFC][PATCH 2/5 v2] tracing: Create seq_buf layer in trace_seq Steven Rostedt
2014-06-27 13:45   ` Petr Mládek
2014-06-27 14:19     ` Steven Rostedt
2014-06-27 15:18       ` Petr Mládek
2014-06-27 15:39         ` Steven Rostedt
2014-06-27 16:52           ` Petr Mládek
2014-09-26 15:00             ` Steven Rostedt
2014-09-26 16:28               ` Petr Mladek
2014-06-27 14:21     ` Steven Rostedt
2014-06-27 14:56       ` Petr Mládek
2014-06-26 21:49 ` [RFC][PATCH 3/5 v2] seq_buf: Move the seq_buf code to lib/ Steven Rostedt
2014-06-27 13:48   ` Petr Mládek
2014-06-27 14:27     ` Steven Rostedt
2014-06-27 14:39       ` Petr Mládek
2014-06-27 14:44         ` Steven Rostedt
2014-06-26 21:49 ` [RFC][PATCH 4/5 v2] printk: Add per_cpu printk func to allow printk to be diverted Steven Rostedt
2014-06-27 14:20   ` Petr Mládek
2014-06-27 14:39     ` Steven Rostedt
2014-06-27 14:43       ` Petr Mládek
     [not found] ` <20140626220130.764213722@goodmis.org>
2014-06-26 22:51   ` [RFC][PATCH 5/5 v2] x86/nmi: Perform a safe NMI stack trace on all CPUs Steven Rostedt
2014-06-27 14:32   ` Petr Mládek
2014-06-27 14:40     ` Steven Rostedt
2014-08-07  8:41 ` [RFC][PATCH 0/5 v2] x86/nmi: Print all cpu stacks from NMI safely Jiri Kosina
2014-08-08 18:44   ` Steven Rostedt
2014-08-12 14:17     ` Jiri Kosina
2014-09-10  8:08     ` Jiri Kosina
2014-09-10 10:12       ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).