All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] [GIT PULL] more updates for the tag format
@ 2009-06-10  5:42 Steven Rostedt
  2009-06-10  5:42 ` [PATCH 01/11] tracing/events: convert block trace points to TRACE_EVENT(), fix Steven Rostedt
                   ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong


Ingo,

Please pull the latest tip/tracing/event-print-format tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/event-print-format


Li Zefan (1):
      tracing/events: convert block trace points to TRACE_EVENT(), fix

Steven Rostedt (10):
      tracing: add nsec2sec print formats
      tracing: convert lockdep lock_acquired trace point to use nsec2usec tag
      tracing: add major and minor tags for print format
      tracing: use << to print < instead of \<
      tracing: convert the block trace points to use the new tag format
      tracing: add test for strings in event tag format
      tracing: add func and symfunc to tag format
      tracing: check full name for field
      tracing: update sample code with new tag format
      tracing: move '>' to out of macros and into print statement

----
 include/linux/blktrace_api.h               |    4 +-
 include/linux/ftrace_event.h               |    3 +-
 include/trace/events/block.h               |  101 +++------
 include/trace/events/irq.h                 |    8 +-
 include/trace/events/kmem.h                |   12 +-
 include/trace/events/lockdep.h             |    8 +-
 include/trace/ftrace.h                     |    2 +-
 kernel/trace/trace_output.c                |    2 +-
 kernel/trace/trace_output.h                |    4 +
 kernel/trace/trace_read_binary.c           |  304 +++++++++++++++++++++++-----
 samples/trace_events/trace-events-sample.c |   21 ++-
 samples/trace_events/trace-events-sample.h |   66 ++++++
 12 files changed, 399 insertions(+), 136 deletions(-)
-- 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/11] tracing/events: convert block trace points to TRACE_EVENT(), fix
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 02/11] tracing: add nsec2sec print formats Steven Rostedt
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong, Li Zefan

[-- Attachment #1: 0001-tracing-events-convert-block-trace-points-to-TRACE_E.patch --]
[-- Type: text/plain, Size: 1135 bytes --]

From: Li Zefan <lizf@cn.fujitsu.com>

Fix building failures when CONFIG_BLOCK == n.

[ Impact: build fix ]

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A2F1520.8020003@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/blktrace_api.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index c7ec31d..7e4350e 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -218,7 +218,7 @@ static inline int blk_trace_init_sysfs(struct device *dev)
 
 #endif /* CONFIG_BLK_DEV_IO_TRACE */
 
-#ifdef CONFIG_EVENT_TRACING
+#if defined(CONFIG_EVENT_TRACING) && defined(CONFIG_BLOCK)
 
 static inline int blk_cmd_buf_len(struct request *rq)
 {
@@ -229,7 +229,7 @@ extern void blk_dump_cmd(char *buf, struct request *rq);
 extern void blk_fill_rwbs(char *rwbs, u32 rw, int bytes);
 extern void blk_fill_rwbs_rq(char *rwbs, struct request *rq);
 
-#endif /* CONFIG_EVENT_TRACING */
+#endif /* CONFIG_EVENT_TRACING && CONFIG_BLOCK */
 
 #endif /* __KERNEL__ */
 #endif
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/11] tracing: add nsec2sec print formats
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
  2009-06-10  5:42 ` [PATCH 01/11] tracing/events: convert block trace points to TRACE_EVENT(), fix Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 03/11] tracing: convert lockdep lock_acquired trace point to use nsec2usec tag Steven Rostedt
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0002-tracing-add-nsec2sec-print-formats.patch --]
[-- Type: text/plain, Size: 8535 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

This patch adds the tags nsec2sec, nsec2usec and nsec2msec tags that
will do the conversion of a nsec unsigned long long value into a
printf("%lu.%0*lu", precision, nsec) format.

The tags are:

  <nsec2sec:precision:field>
  <nsec2msec:precision:field>
  <nsec2usec:precision:field>

For example, having the field ns that holds an unsigned long long
nanosecond value, we can do:

  <nsec2sec:6:ns>

That will produce something like:

 23.123456  as a value, if ns was 23123456789.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_read_binary.c |  189 +++++++++++++++++++++++++++++++-------
 1 files changed, 154 insertions(+), 35 deletions(-)

diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
index b47285d..e2cb6c4 100644
--- a/kernel/trace/trace_read_binary.c
+++ b/kernel/trace/trace_read_binary.c
@@ -21,7 +21,9 @@ static struct trace_seq buffer;
  *
  * FMT :=  constant string FMT | COMMAND FMT | empty
  * COMMAND := <TYPE:FIELD> | <mask:FIELD:DELIM:MASKS> | <sym:FIELD:SYMBOLS> |
- *               <if:FIELD:TRUE:FALSE> | <ifmask:FIELD:MASK:TRUE:FALSE>
+ *            <if:FIELD:TRUE:FALSE> | <ifmask:FIELD:MASK:TRUE:FALSE> |
+ *            <nsec2sec:PRECISION:FIELD> | <nsec2usec:PRECISION:FIELD> |
+ *            <nsec2msec:PRECISION:FIELD>
  * TYPE := int | uint | hex | ptr | string | strarray
  * FIELD := defined by the event structure
  * MASKS := MASK=NAME,MASKS | MASK=NAME
@@ -34,6 +36,30 @@ static struct trace_seq buffer;
  * NAME := the name to write when a match is found
  *
  * A '\<' would print '<'
+ *
+ * Commands:
+ *  int       : Print the field out as an signed integer. The size of the field
+ *              determines if it is a char, short, long, or long long.
+ *  uint      : Same as int, but for unsigned.
+ *  hex       : Print the field as a hex (ie. 0x4ab)
+ *  ptr       : Print the field as a hex but without the '0x'.
+ *  string    : Used with dynamic sized strings (__string)
+ *  strarray  : Used with static sized arrays (__array ( char , item, len)).
+ *  if        : If the field is non zero, print the TRUE text,
+ *              otherwise print FALSE text.
+ *  ifmask    : If the bit in MASK in the field is set,
+ *              then print the TRUE text, otherwise print FALSE text.
+ *  nsec2sec  : Convert a nsec field into secs in the format:
+ *              printf("%lu.%0Plu", field), where PRECISION defines what 'P' is.
+ *  nsec2usec : Same as nsec2sec but will convert to usec.
+ *  nsec2usec : Same as nsec2sec but will convert to msec.
+ *  mask      : Print out the values of a bit mask. Each matching mask will
+ *              print its name. The order does matter. Mask of '0' is special
+ *              for it will print only if the value matches zero. The given
+ *              DELIM will separate the different masks.
+ *  sym       : Print out the name of a matching value. This is similar to
+ *              mask, but only one value may print.
+ *
  */
 
 #define TOK_SIZE 32
@@ -49,6 +75,9 @@ enum field_types {
 	FIELD_IS_STRING,
 	FIELD_IS_STRARRAY,
 	FIELD_IS_HEX,
+	FIELD_IS_NSEC2SEC,
+	FIELD_IS_NSEC2USEC,
+	FIELD_IS_NSEC2MSEC,
 	FIELD_IS_MASK,
 	FIELD_IS_SYMBOL,
 };
@@ -81,6 +110,10 @@ struct print_info {
 		} cond;
 		struct {
 			struct ftrace_event_field	*field;
+			unsigned int			precision;
+		} time;
+		struct {
+			struct ftrace_event_field	*field;
 			struct list_head		masks;
 			unsigned short			delim;
 			unsigned short			len;
@@ -217,6 +250,22 @@ add_if(struct ftrace_event_call *call, enum field_types type,
 	return 0;
 }
 
+static int
+add_time(struct ftrace_event_call *call, enum field_types type,
+	 struct ftrace_event_field *field, unsigned int precision)
+{
+	struct print_info *info;
+
+	info = alloc_print_info(call, type);
+	if (!info)
+		return -ENOMEM;
+
+	info->time.field = field;
+	info->time.precision = precision;
+
+	return 0;
+}
+
 static int add_sym_mask(struct ftrace_event_call *call, struct list_head *list,
 			unsigned long long val,
 			const char *start, const char *end)
@@ -344,6 +393,7 @@ handle_field(struct ftrace_event_call *event,
 	const char *end, *tok, *delim;
 	unsigned long long mask;
 	unsigned int delim_len;
+	unsigned long precision;
 	int ret;
 
 	end = strchr(fmt, '>');
@@ -392,6 +442,26 @@ handle_field(struct ftrace_event_call *event,
 			return NULL;
 		break;
 
+	case FIELD_IS_NSEC2SEC:
+	case FIELD_IS_NSEC2USEC:
+	case FIELD_IS_NSEC2MSEC:
+		tok = strchr(fmt, ':');
+		if (!tok || tok > end)
+			goto out_err;
+
+		precision = simple_strtoul(fmt, NULL, 0);
+
+		fmt = tok + 1;
+
+		field = find_field(event, fmt, end - fmt);
+		if (!field)
+			goto out_err;
+
+		ret = add_time(event, field_type, field, precision);
+		if (ret)
+			return NULL;
+		break;
+
 	case FIELD_IS_MASK:
 	case FIELD_IS_SYMBOL:
 		tok = strchr(fmt, ':');
@@ -517,6 +587,15 @@ ftrace_initialize_print(struct ftrace_event_call *event, const char *fmt, ...)
 		else if (strncmp(fmt, "ifmask:", 7) == 0)
 			field_type = FIELD_IS_IFMASK;
 
+		else if (strncmp(fmt, "nsec2sec:", 9) == 0)
+			field_type = FIELD_IS_NSEC2SEC;
+
+		else if (strncmp(fmt, "nsec2usec:", 10) == 0)
+			field_type = FIELD_IS_NSEC2USEC;
+
+		else if (strncmp(fmt, "nsec2msec:", 10) == 0)
+			field_type = FIELD_IS_NSEC2MSEC;
+
 		else if (strncmp(fmt, "mask:", 5) == 0)
 			field_type = FIELD_IS_MASK;
 
@@ -613,6 +692,39 @@ trace_read_symbol(struct trace_seq *s, unsigned long long val,
 
 }
 
+#define BAD_SIZE	0xdeadbeaf
+
+static unsigned long long
+get_val(void *p, int size, unsigned long long *mask)
+{
+	unsigned long long val;
+
+	switch (size) {
+	case 1:
+		val = *(char *)p;
+		*mask =  0xffULL;
+		break;
+	case 2:
+		val = *(short *)p;
+		*mask =  0xffffULL;
+		break;
+	case 4:
+		val = *(int *)p;
+		*mask =  0xffffffffULL;
+		break;
+	case 8:
+		val = *(long long *)p;
+		*mask = 0;
+		break;
+	default:
+		/* pass back a bad mask on error */
+		*mask = BAD_SIZE;
+		return 0;
+	}
+
+	return val;
+}
+
 char *
 ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 		   struct trace_entry *entry)
@@ -621,6 +733,7 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 	struct print_info *info;
 	char *start = s->buffer + s->len;
 	struct ftrace_event_field *field;
+	unsigned long divisor, rem;
 	void *p;
 
 	if (!event->print_text) {
@@ -654,25 +767,8 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
  skip_if:
 			p += field->offset;
 
-			switch (field->size) {
-			case 1:
-				val = *(char *)p;
-				mask =  0xffULL;
-				break;
-			case 2:
-				val = *(short *)p;
-				mask =  0xffffULL;
-				break;
-			case 4:
-				val = *(int *)p;
-				mask =  0xffffffffULL;
-				break;
-			case 8:
-				val = *(long long *)p;
-				mask = 0;
-				break;
-
-			default:
+			val = get_val(p, field->size, &mask);
+			if (mask == BAD_SIZE) {
 				trace_seq_printf(s,
 					"<error: bad field size %d?>\n",
 					field->size);
@@ -723,32 +819,55 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 			trace_seq_puts(s, p);
 			break;
 
+		case FIELD_IS_NSEC2USEC:
+			divisor = NSEC_PER_USEC;
+			goto do_time;
+		case FIELD_IS_NSEC2MSEC:
+			divisor = NSEC_PER_MSEC;
+			goto do_time;
+		case FIELD_IS_NSEC2SEC:
+			divisor = NSEC_PER_SEC;
+
+ do_time:
+			field = info->time.field;
+
+			p += field->offset;
+
+			val = get_val(p, field->size, &mask);
+
+			if (mask == BAD_SIZE) {
+				trace_seq_printf(s,
+					"<error: bad field size %d?>\n",
+					field->size);
+				return start;
+			}
+
+			rem = do_div(val, divisor);
+			if (info->time.precision)
+				trace_seq_printf(s, "%llu.%0*lu",
+						 val, info->time.precision,
+						 rem);
+			else
+				trace_seq_printf(s, "%llu", val);
+
+			break;
+
 		case FIELD_IS_MASK:
 		case FIELD_IS_SYMBOL:
 
 			p += info->sym_mask.field->offset;
 
-			switch (info->sym_mask.field->size) {
-			case 1:
-				val = *(unsigned char *)p;
-				break;
-			case 2:
-				val = *(unsigned short *)p;
-				break;
-			case 4:
-				val = *(unsigned int *)p;
-				break;
-			case 8:
-				val = *(unsigned long long *)p;
-				break;
-
-			default:
+			val = get_val(p, info->sym_mask.field->size, &mask);
+
+			if (mask == BAD_SIZE) {
 				trace_seq_printf(s,
 					 "<error: bad field size %d?>\n",
 					 info->sym_mask.field->size);
 				return start;
 			}
 
+			val &= mask;
+
 			if (info->type == FIELD_IS_MASK)
 				trace_read_mask(s, val, info, event);
 			else
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/11] tracing: convert lockdep lock_acquired trace point to use nsec2usec tag
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
  2009-06-10  5:42 ` [PATCH 01/11] tracing/events: convert block trace points to TRACE_EVENT(), fix Steven Rostedt
  2009-06-10  5:42 ` [PATCH 02/11] tracing: add nsec2sec print formats Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 04/11] tracing: add major and minor tags for print format Steven Rostedt
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0003-tracing-convert-lockdep-lock_acquired-trace-point-to.patch --]
[-- Type: text/plain, Size: 1304 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The current conversion of the nsec value in the lockstat lock event
is performed in the fast path. Now that there is a tag to do the conversion
we can move this into the slow path and increase performance while tracing.

This patch also fixes a typo in the tag name.

[ Impact: increase performance in lockstat tracing and tag typo fix ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/lockdep.h |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/lockdep.h b/include/trace/events/lockdep.h
index f9d74c8..4251f17 100644
--- a/include/trace/events/lockdep.h
+++ b/include/trace/events/lockdep.h
@@ -74,15 +74,13 @@ TRACE_EVENT(lock_acquired,
 
 	TP_STRUCT__entry(
 		__string(name, lock->name)
-		__field(unsigned long, wait_usec)
-		__field(unsigned long, wait_nsec_rem)
+		__field(long long, wait_nsec)
 	),
 	TP_fast_assign(
 		__assign_str(name, lock->name);
-		__entry->wait_nsec_rem = do_div(waittime, NSEC_PER_USEC);
-		__entry->wait_usec = (unsigned long) waittime;
+		__entry->wait_nsec = waittime;
 	),
-	TP_FORMAT("<sting:name> (<uint:wait_usec>.<printf:wait_nsec_rem:03lu> us)")
+	TP_FORMAT("<string:name> (<nsec2usec:3:wait_nsec> us)")
 );
 
 #endif
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/11] tracing: add major and minor tags for print format
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (2 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 03/11] tracing: convert lockdep lock_acquired trace point to use nsec2usec tag Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 05/11] tracing: use << to print < instead of \< Steven Rostedt
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0004-tracing-add-major-and-minor-tags-for-print-format.patch --]
[-- Type: text/plain, Size: 3810 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Create the tags <major:field> and <minor:field> for trace format prints.

Given a field (should be of type dev_t) these tags will extract the
major or minor number from the field.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_read_binary.c |   44 +++++++++++++++++++++++++++++++------
 1 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
index e2cb6c4..a4b5eaa 100644
--- a/kernel/trace/trace_read_binary.c
+++ b/kernel/trace/trace_read_binary.c
@@ -23,7 +23,7 @@ static struct trace_seq buffer;
  * COMMAND := <TYPE:FIELD> | <mask:FIELD:DELIM:MASKS> | <sym:FIELD:SYMBOLS> |
  *            <if:FIELD:TRUE:FALSE> | <ifmask:FIELD:MASK:TRUE:FALSE> |
  *            <nsec2sec:PRECISION:FIELD> | <nsec2usec:PRECISION:FIELD> |
- *            <nsec2msec:PRECISION:FIELD>
+ *            <nsec2msec:PRECISION:FIELD> | <major:FIELD> | <minor:FIELD>
  * TYPE := int | uint | hex | ptr | string | strarray
  * FIELD := defined by the event structure
  * MASKS := MASK=NAME,MASKS | MASK=NAME
@@ -59,7 +59,10 @@ static struct trace_seq buffer;
  *              DELIM will separate the different masks.
  *  sym       : Print out the name of a matching value. This is similar to
  *              mask, but only one value may print.
- *
+ *  major     : Given a 'device number' value, this will convert it to its
+ *              major number.
+ *  minor     : Given a 'device number' value, this will convert it to its
+ *              minor number.
  */
 
 #define TOK_SIZE 32
@@ -75,6 +78,8 @@ enum field_types {
 	FIELD_IS_STRING,
 	FIELD_IS_STRARRAY,
 	FIELD_IS_HEX,
+	FIELD_IS_MINOR,
+	FIELD_IS_MAJOR,
 	FIELD_IS_NSEC2SEC,
 	FIELD_IS_NSEC2USEC,
 	FIELD_IS_NSEC2MSEC,
@@ -405,6 +410,8 @@ handle_field(struct ftrace_event_call *event,
 	case FIELD_IS_UINT:
 	case FIELD_IS_PTR:
 	case FIELD_IS_HEX:
+	case FIELD_IS_MINOR:
+	case FIELD_IS_MAJOR:
 	case FIELD_IS_STRING:
 	case FIELD_IS_STRARRAY:
 		field = find_field(event, fmt, end - fmt);
@@ -605,6 +612,12 @@ ftrace_initialize_print(struct ftrace_event_call *event, const char *fmt, ...)
 		else if (strncmp(fmt, "strarray:", 9) == 0)
 			field_type = FIELD_IS_STRARRAY;
 
+		else if (strncmp(fmt, "major:", 6) == 0)
+			field_type = FIELD_IS_MAJOR;
+
+		else if (strncmp(fmt, "minor:", 6) == 0)
+			field_type = FIELD_IS_MINOR;
+
 		else
 			goto err_format;
 
@@ -758,6 +771,8 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 		case FIELD_IS_UINT:
 		case FIELD_IS_HEX:
 		case FIELD_IS_PTR:
+		case FIELD_IS_MAJOR:
+		case FIELD_IS_MINOR:
 			field = info->data.field;
 			goto skip_if;
 
@@ -775,8 +790,9 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 				return start;
 			}
 
-			if (info->type == FIELD_IS_IF ||
-			    info->type == FIELD_IS_IFMASK) {
+			switch (info->type) {
+			case FIELD_IS_IF:
+			case FIELD_IS_IFMASK:
 
 				if (info->type == FIELD_IS_IFMASK)
 					val &= info->cond.mask;
@@ -789,11 +805,25 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 					trace_seq_putmem(s, event->print_text +
 							 info->cond.false_text,
 							 info->cond.false_len);
-			} else if (info->type == FIELD_IS_INT)
+				break;
+
+			case FIELD_IS_INT:
 				trace_seq_printf(s, "%lld", val);
-			else if (info->type == FIELD_IS_UINT)
+				break;
+
+			case FIELD_IS_UINT:
 				trace_seq_printf(s, "%llu", val);
-			else {
+				break;
+
+			case FIELD_IS_MAJOR:
+				trace_seq_printf(s, "%u", MAJOR(val));
+				break;
+
+			case FIELD_IS_MINOR:
+				trace_seq_printf(s, "%u", MINOR(val));
+				break;
+
+			default:
 				/* hex should only print the size specified */
 				if (mask)
 					val &= mask;
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/11] tracing: use << to print < instead of \<
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (3 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 04/11] tracing: add major and minor tags for print format Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 06/11] tracing: convert the block trace points to use the new tag format Steven Rostedt
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0005-tracing-use-to-print-instead-of.patch --]
[-- Type: text/plain, Size: 1295 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The print formats are passed through sprintf for processing. This means that
we can not use % or \ unless we process it for printk. Trying to use \< to
print a single < breaks the sprintf processing.

Instead, to print a single '<' a '<<' is used instead.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_read_binary.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
index a4b5eaa..d005138 100644
--- a/kernel/trace/trace_read_binary.c
+++ b/kernel/trace/trace_read_binary.c
@@ -35,7 +35,7 @@ static struct trace_seq buffer;
  * FALSE := print when field is zero or NULL
  * NAME := the name to write when a match is found
  *
- * A '\<' would print '<'
+ * A '<<' would print '<'
  *
  * Commands:
  *  int       : Print the field out as an signed integer. The size of the field
@@ -555,11 +555,11 @@ ftrace_initialize_print(struct ftrace_event_call *event, const char *fmt, ...)
 				goto err_free;
 			break;
 		}
-		if (*(tok - 1) == '\\') {
+		if (*(tok + 1) == '<') {
 			ret = add_less_than(event, fmt, tok);
 			if (ret)
 				goto err_free;
-			fmt = tok + 1;
+			fmt = tok + 2;
 			continue;
 		}
 
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/11] tracing: convert the block trace points to use the new tag format
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (4 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 05/11] tracing: use << to print < instead of \< Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 07/11] tracing: add test for strings in event " Steven Rostedt
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0006-tracing-convert-the-block-trace-points-to-use-the-ne.patch --]
[-- Type: text/plain, Size: 7872 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Convert the block trace points to use the new tag format for easier parsing
by userspace tools.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/block.h |  101 ++++++++++++++----------------------------
 1 files changed, 34 insertions(+), 67 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 53effd4..f12cee3 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -34,11 +34,8 @@ TRACE_EVENT(block_rq_abort,
 		blk_dump_cmd(__get_str(cmd), rq);
 	),
 
-	TP_printk("%d,%d %s (%s) %llu + %u [%d]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->rwbs, __get_str(cmd),
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->errors)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> (<string:cmd>) "
+		  "<uint:sector> + <uint:nr_sector> [<int:errors>]")
 );
 
 TRACE_EVENT(block_rq_insert,
@@ -69,11 +66,9 @@ TRACE_EVENT(block_rq_insert,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %u (%s) %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->rwbs, __entry->bytes, __get_str(cmd),
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> <uint:bytes>"
+		  " (<string:cmd>) <uint:sector> + <uint:nr_sector>"
+		  " [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_rq_issue,
@@ -104,11 +99,9 @@ TRACE_EVENT(block_rq_issue,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %u (%s) %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->rwbs, __entry->bytes, __get_str(cmd),
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> <uint:bytes>"
+		  " (<string:cmd>) <uint:sector> + <uint:nr_sector>"
+		  " [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_rq_requeue,
@@ -137,11 +130,8 @@ TRACE_EVENT(block_rq_requeue,
 		blk_dump_cmd(__get_str(cmd), rq);
 	),
 
-	TP_printk("%d,%d %s (%s) %llu + %u [%d]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->rwbs, __get_str(cmd),
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->errors)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> (<string:cmd>)"
+		  " <uint:sector> + <uint:nr_sector> [<int:errors>]")
 );
 
 TRACE_EVENT(block_rq_complete,
@@ -170,11 +160,8 @@ TRACE_EVENT(block_rq_complete,
 		blk_dump_cmd(__get_str(cmd), rq);
 	),
 
-	TP_printk("%d,%d %s (%s) %llu + %u [%d]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->rwbs, __get_str(cmd),
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->errors)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> (<string:cmd>) "
+		  "<uint:sector> + <uint:nr_sector> [<int:errors>]")
 );
 TRACE_EVENT(block_bio_bounce,
 
@@ -198,10 +185,8 @@ TRACE_EVENT(block_bio_bounce,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> "
+		  "<uint:sector> + <uint:nr_sector> [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_bio_complete,
@@ -225,10 +210,8 @@ TRACE_EVENT(block_bio_complete,
 		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
 	),
 
-	TP_printk("%d,%d %s %llu + %u [%d]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->error)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> <uint:sector>"
+		  " + <uint:nr_sector> [<int:error>]")
 );
 
 TRACE_EVENT(block_bio_backmerge,
@@ -253,10 +236,8 @@ TRACE_EVENT(block_bio_backmerge,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> "
+		  "<uint:sector> + <uint:nr_sector> [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_bio_frontmerge,
@@ -281,10 +262,8 @@ TRACE_EVENT(block_bio_frontmerge,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> "
+		  "<uint:sector> + <uint:nr_sector> [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_bio_queue,
@@ -309,10 +288,8 @@ TRACE_EVENT(block_bio_queue,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> "
+		  "<uint:sector> + <uint:nr_sector> [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_getrq,
@@ -338,10 +315,8 @@ TRACE_EVENT(block_getrq,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
         ),
 
-	TP_printk("%d,%d %s %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> "
+		  "<uint:sector> + <uint:nr_sector> [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_sleeprq,
@@ -367,10 +342,8 @@ TRACE_EVENT(block_sleeprq,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %llu + %u [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector, __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> "
+		  "<uint:sector> + <uint:nr_sector> [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_plug,
@@ -387,7 +360,7 @@ TRACE_EVENT(block_plug,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("[%s]", __entry->comm)
+	TP_FORMAT("[<strarray:comm>]")
 );
 
 TRACE_EVENT(block_unplug_timer,
@@ -406,7 +379,7 @@ TRACE_EVENT(block_unplug_timer,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("[%s] %d", __entry->comm, __entry->nr_rq)
+	TP_FORMAT("[<strarray:comm>] <int:nr_rq>")
 );
 
 TRACE_EVENT(block_unplug_io,
@@ -425,7 +398,7 @@ TRACE_EVENT(block_unplug_io,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("[%s] %d", __entry->comm, __entry->nr_rq)
+	TP_FORMAT("[<strarray:comm>] <int:nr_rq>")
 );
 
 TRACE_EVENT(block_split,
@@ -451,11 +424,8 @@ TRACE_EVENT(block_split,
 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
 	),
 
-	TP_printk("%d,%d %s %llu / %llu [%s]",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  (unsigned long long)__entry->new_sector,
-		  __entry->comm)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> <uint:sector> / "
+		  "<uint:new_sector> [<strarray:comm>]")
 );
 
 TRACE_EVENT(block_remap,
@@ -483,12 +453,9 @@ TRACE_EVENT(block_remap,
 		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
 	),
 
-	TP_printk("%d,%d %s %llu + %u <- (%d,%d) %llu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
-		  (unsigned long long)__entry->sector,
-		  __entry->nr_sector,
-		  MAJOR(__entry->old_dev), MINOR(__entry->old_dev),
-		  (unsigned long long)__entry->old_sector)
+	TP_FORMAT("<major:dev>,<minor:dev> <strarray:rwbs> <uint:sector> + "
+		  "<uint:nr_sector> <<- (<major:old_dev>,<minor:old_dev>) "
+		  "<uint:old_sector>")
 );
 
 #endif /* _TRACE_BLOCK_H */
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/11] tracing: add test for strings in event tag format
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (5 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 06/11] tracing: convert the block trace points to use the new tag format Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 08/11] tracing: add func and symfunc to " Steven Rostedt
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0007-tracing-add-test-for-strings-in-event-tag-format.patch --]
[-- Type: text/plain, Size: 3864 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

To protect against any disaster if a format uses a string when it
should have used something else, this code puts in test to make sure
the output will not cause any harm.

The entry size is passed to the format parser and checks are made
to ensure that the string read into the trace_seq at least has an
ending '\0' character within the range of the entry record.

[ Impact: prevent unpredictable results with bad formats ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h     |    3 +-
 include/trace/ftrace.h           |    2 +-
 kernel/trace/trace_read_binary.c |   43 +++++++++++++++++++++++++++++++++----
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 1802459..26fed99 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -142,7 +142,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 
 extern char *ftrace_read_binary(struct trace_seq *p,
 				struct ftrace_event_call *event,
-				struct trace_entry *entry);
+				struct trace_entry *entry,
+				int entry_size);
 extern int ftrace_initialize_print(struct ftrace_event_call *event,
 				   const char *fmt, ...)
 	__attribute__ ((format (printf, 2, 3)));
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index e3370c5..d250cb9 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -126,7 +126,7 @@
 
 #undef TP_FORMAT
 #define TP_FORMAT(fmt, args...) \
-	"%s\n", ftrace_read_binary(p, event_call, entry)
+	"%s\n", ftrace_read_binary(p, event_call, entry, iter->ent_size)
 
 #undef __get_dynamic_array
 #define __get_dynamic_array(field)	\
diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
index d005138..f3fdac8 100644
--- a/kernel/trace/trace_read_binary.c
+++ b/kernel/trace/trace_read_binary.c
@@ -738,9 +738,22 @@ get_val(void *p, int size, unsigned long long *mask)
 	return val;
 }
 
+/* make sure string pointer has an end */
+static int test_string(const char *str, int size)
+{
+	int i;
+
+	for (i = 0; i < size; i++) {
+		if (!str[i])
+			return 0;
+	}
+
+	return -1;
+}
+
 char *
 ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
-		   struct trace_entry *entry)
+		   struct trace_entry *entry, int entry_size)
 {
 	unsigned long long val, mask;
 	struct print_info *info;
@@ -748,6 +761,7 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 	struct ftrace_event_field *field;
 	unsigned long divisor, rem;
 	void *p;
+	int len;
 
 	if (!event->print_text) {
 		trace_seq_puts(s, "UNDEFINED EVENT\n");
@@ -834,18 +848,37 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 			break;
 
 		case FIELD_IS_STRING:
-			p += info->data.field->offset;
+			field = info->data.field;
+			p += field->offset;
 			/* indexes are expected to be unsigned short */
-			if (info->data.field->size != 2) {
-				trace_seq_puts(s, "BAD FIELD SIZE\n");
+			if (field->size != 2) {
+				trace_seq_printf(s, "BAD FIELD SIZE %d\n",
+						 field->size);
 				return start;
 			}
+			if (*(unsigned short *)p > entry_size) {
+				trace_seq_puts(s, "BAD STRING POINTER\n");
+				return start;
+			}
+			len = entry_size - *(unsigned short *)p;
 			p = (void *)entry + *(unsigned short *)p;
+
+			/* make sure that the string has an end */
+			if (test_string(p, len)) {
+				trace_seq_puts(s, "BAD STRING CONTENT\n");
+				return start;
+			}
+
 			trace_seq_puts(s, p);
 			break;
 
 		case FIELD_IS_STRARRAY:
-			p += info->data.field->offset;
+			field = info->data.field;
+			p += field->offset;
+			if (test_string(p, field->size)) {
+				trace_seq_puts(s, "BAD STRING CONTENT\n");
+				return start;
+			}
 			trace_seq_puts(s, p);
 			break;
 
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/11] tracing: add func and symfunc to tag format
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (6 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 07/11] tracing: add test for strings in event " Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  7:48   ` Frederic Weisbecker
  2009-06-10  5:42 ` [PATCH 09/11] tracing: check full name for field Steven Rostedt
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0008-tracing-add-func-and-symfunc-to-tag-format.patch --]
[-- Type: text/plain, Size: 4744 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

This adds a way to print out a function name from a pointer the same way
that the Linux printf "%pF" does. Well the symfunc tag does.

If we change kmalloc and kfree formats to:

  TP_FORMAT("call_site=<symfunc:call_site> ptr=<ptr:ptr> bytes_req=<int:bytes_req> "

and

  TP_FORMAT("call_site=<func:call_site> ptr=<ptr:ptr>")

respectively, we would get something like:

   kmalloc: call_site=__alloc_skb+0x70/0x160 ptr=ffff88003d830800 bytes_req=680
   kfree: call_site=skb_release_data ptr=ffff88003d830800

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_output.c      |    2 +-
 kernel/trace/trace_output.h      |    4 ++++
 kernel/trace/trace_read_binary.c |   27 +++++++++++++++++++++++++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 7938f3a..39d5eeb 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -326,7 +326,7 @@ static inline const char *kretprobed(const char *name)
 }
 #endif /* CONFIG_KRETPROBES */
 
-static int
+int
 seq_print_sym_short(struct trace_seq *s, const char *fmt, unsigned long address)
 {
 #ifdef CONFIG_KALLSYMS
diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
index d38bec4..412eee8 100644
--- a/kernel/trace/trace_output.h
+++ b/kernel/trace/trace_output.h
@@ -47,5 +47,9 @@ do {							\
 		return TRACE_TYPE_PARTIAL_LINE;		\
 } while (0)
 
+/* until we have a way with sprintf to do this */
+int seq_print_sym_short(struct trace_seq *s, const char *fmt,
+			unsigned long address);
+
 #endif
 
diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
index f3fdac8..9ff4ed8 100644
--- a/kernel/trace/trace_read_binary.c
+++ b/kernel/trace/trace_read_binary.c
@@ -10,7 +10,7 @@
 #include <linux/module.h>
 #include <linux/ctype.h>
 
-#include "trace.h"
+#include "trace_output.h"
 
 static DEFINE_MUTEX(buffer_lock);
 static struct trace_seq buffer;
@@ -23,7 +23,8 @@ static struct trace_seq buffer;
  * COMMAND := <TYPE:FIELD> | <mask:FIELD:DELIM:MASKS> | <sym:FIELD:SYMBOLS> |
  *            <if:FIELD:TRUE:FALSE> | <ifmask:FIELD:MASK:TRUE:FALSE> |
  *            <nsec2sec:PRECISION:FIELD> | <nsec2usec:PRECISION:FIELD> |
- *            <nsec2msec:PRECISION:FIELD> | <major:FIELD> | <minor:FIELD>
+ *            <nsec2msec:PRECISION:FIELD> | <major:FIELD> | <minor:FIELD> |
+ *            <func:FIELD> | <symfunc:FIELD>
  * TYPE := int | uint | hex | ptr | string | strarray
  * FIELD := defined by the event structure
  * MASKS := MASK=NAME,MASKS | MASK=NAME
@@ -43,6 +44,8 @@ static struct trace_seq buffer;
  *  uint      : Same as int, but for unsigned.
  *  hex       : Print the field as a hex (ie. 0x4ab)
  *  ptr       : Print the field as a hex but without the '0x'.
+ *  symfunc   : Print a ptr like the "%pF" does.
+ *  func      : Like symfunc but without the address.
  *  string    : Used with dynamic sized strings (__string)
  *  strarray  : Used with static sized arrays (__array ( char , item, len)).
  *  if        : If the field is non zero, print the TRUE text,
@@ -80,6 +83,8 @@ enum field_types {
 	FIELD_IS_HEX,
 	FIELD_IS_MINOR,
 	FIELD_IS_MAJOR,
+	FIELD_IS_FUNC,
+	FIELD_IS_SYMFUNC,
 	FIELD_IS_NSEC2SEC,
 	FIELD_IS_NSEC2USEC,
 	FIELD_IS_NSEC2MSEC,
@@ -412,6 +417,8 @@ handle_field(struct ftrace_event_call *event,
 	case FIELD_IS_HEX:
 	case FIELD_IS_MINOR:
 	case FIELD_IS_MAJOR:
+	case FIELD_IS_FUNC:
+	case FIELD_IS_SYMFUNC:
 	case FIELD_IS_STRING:
 	case FIELD_IS_STRARRAY:
 		field = find_field(event, fmt, end - fmt);
@@ -594,6 +601,12 @@ ftrace_initialize_print(struct ftrace_event_call *event, const char *fmt, ...)
 		else if (strncmp(fmt, "ifmask:", 7) == 0)
 			field_type = FIELD_IS_IFMASK;
 
+		else if (strncmp(fmt, "func:", 5) == 0)
+			field_type = FIELD_IS_FUNC;
+
+		else if (strncmp(fmt, "symfunc:", 8) == 0)
+			field_type = FIELD_IS_SYMFUNC;
+
 		else if (strncmp(fmt, "nsec2sec:", 9) == 0)
 			field_type = FIELD_IS_NSEC2SEC;
 
@@ -785,6 +798,8 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 		case FIELD_IS_UINT:
 		case FIELD_IS_HEX:
 		case FIELD_IS_PTR:
+		case FIELD_IS_FUNC:
+		case FIELD_IS_SYMFUNC:
 		case FIELD_IS_MAJOR:
 		case FIELD_IS_MINOR:
 			field = info->data.field;
@@ -837,6 +852,14 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
 				trace_seq_printf(s, "%u", MINOR(val));
 				break;
 
+			case FIELD_IS_SYMFUNC:
+				trace_seq_printf(s, "%pF", (void *)val);
+				break;
+
+			case FIELD_IS_FUNC:
+				seq_print_sym_short(s, "%s", val);
+				break;
+
 			default:
 				/* hex should only print the size specified */
 				if (mask)
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/11] tracing: check full name for field
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (7 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 08/11] tracing: add func and symfunc to " Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 10/11] tracing: update sample code with new tag format Steven Rostedt
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0009-tracing-check-full-name-for-field.patch --]
[-- Type: text/plain, Size: 1013 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The check for print format fields only compared the name of the field
up to the length of the field to look for. But it did not test if the
field name itself matched but was larger. The wrong field could be
found as a match.

 ie. str could match str_arr.

[ Impact: fix the matching of fields ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_read_binary.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
index 9ff4ed8..779de20 100644
--- a/kernel/trace/trace_read_binary.c
+++ b/kernel/trace/trace_read_binary.c
@@ -388,7 +388,8 @@ find_field(struct ftrace_event_call *call, const char *name, int len)
 	struct ftrace_event_field *field;
 
 	list_for_each_entry(field, &call->fields, link) {
-		if (!strncmp(field->name, name, len))
+		if (!strncmp(field->name, name, len) &&
+		    !field->name[len])
 			return field;
 	}
 
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/11] tracing: update sample code with new tag format
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (8 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 09/11] tracing: check full name for field Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  5:42 ` [PATCH 11/11] tracing: move > to out of macros and into print statement Steven Rostedt
  2009-06-10  9:26 ` [PATCH 00/11] [GIT PULL] more updates for the tag format Ingo Molnar
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0010-tracing-update-sample-code-with-new-tag-format.patch --]
[-- Type: text/plain, Size: 4225 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

This patch adds examples of all the new tags in the
samples/trace_events/ example code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 samples/trace_events/trace-events-sample.c |   21 ++++++++-
 samples/trace_events/trace-events-sample.h |   66 ++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index aabc4e9..dea8eb3 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -10,12 +10,31 @@
 #define CREATE_TRACE_POINTS
 #include "trace-events-sample.h"
 
+static struct task_struct *simple_tsk;
 
 static void simple_thread_func(int cnt)
 {
+	static int mask;
+	static int myif;
+
 	set_current_state(TASK_INTERRUPTIBLE);
 	schedule_timeout(HZ);
 	trace_foo_bar("hello", cnt);
+	trace_format_examples(jiffies,
+			      jiffies * 2,
+			      "a string here",
+			      &simple_tsk,
+			      mask, mask, simple_thread_func,
+			      myif, myif, cpu_clock(raw_smp_processor_id()),
+			      123456);
+	if (!myif)
+		myif = 3;
+
+	if (!mask)
+		mask = 5;
+
+	mask <<= mask;
+	myif <<= myif;
 }
 
 static int simple_thread(void *arg)
@@ -28,8 +47,6 @@ static int simple_thread(void *arg)
 	return 0;
 }
 
-static struct task_struct *simple_tsk;
-
 static int __init trace_event_init(void)
 {
 	simple_tsk = kthread_run(simple_thread, NULL, "event-sample");
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 128a897..08b4768 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -65,6 +65,11 @@
  * Note, that for both the assign and the printk, __entry is the handler
  * to the data structure in the ring buffer, and is defined by the
  * TP_STRUCT__entry.
+ *
+ * Note it is better to now use the tag format. Examples are in the event
+ * below, and the explanation of the tag format language is in:
+ *
+ *  kernel/trace/trace_read_binary.h
  */
 TRACE_EVENT(foo_bar,
 
@@ -84,6 +89,67 @@ TRACE_EVENT(foo_bar,
 
 	TP_printk("foo %s %d", __entry->foo, __entry->bar)
 );
+
+/*
+ * These are a few examples of the tag format to do the output.
+ */
+TRACE_EVENT(format_examples,
+
+	TP_PROTO(int myint, long mylong, char *mystr, void *myptr,
+		 unsigned long mask, unsigned sym, void *func,
+		 int myif, int myifmask, unsigned long long nsec,
+		 int dev),
+
+	TP_ARGS(myint, mylong, mystr, myptr, mask, sym, func,
+		myif, myifmask, nsec, dev),
+
+	TP_STRUCT__entry(
+		__field(	int,		myint		)
+		__field(	long,		mylong		)
+		__string(str, mystr)
+		__array(	char,		str_arr,	20	)
+		__field(	void *,		func		)
+		__field(	int,		myif		)
+		__field(	unsigned long,	mask		)
+		__field(	unsigned long,	sym		)
+		__field(	int,		myifmask	)
+		__field(	int,		dev		)
+		__field(	unsigned long long, nsec	)
+	),
+
+	TP_fast_assign(
+		__entry->myint = myint;
+		__entry->mylong = mylong;
+		__assign_str(str, mystr);
+		strncpy(__entry->str_arr, mystr, 19);
+		__entry->str_arr[19] = 0;
+		__entry->mask	= mask;
+		__entry->sym	= sym;
+		__entry->func	= func;
+		__entry->myif	= myif;
+		__entry->myifmask = myifmask;
+		__entry->dev	= dev;
+		__entry->nsec	= nsec;
+	),
+
+	TP_FORMAT("to see a normal int <int:myint> \n"
+		  "or perhaps make it unsigned <uint:myint>\n"
+		  "no difference for the size <int:mylong>\n"
+		  "lets make it into a hex <hex:myint>\n"
+		  "dynamic strings are simple <string:str>\n"
+		  "and so are static ones <strarray:str_arr>\n"
+		  "see a func and offset <symfunc:func>\n"
+		  "or just the func name <func:func>\n"
+		  "should we print <if:myif:this:that>\n"
+		  "or base it on a bit mask <ifmask:myif:0xc:foo:bar>\n"
+		  "we want to see that device <major:dev>,<minor:dev>\n"
+		  "we can see masks <mask:mask:,:0=nothing,0xa=a,1=one,2=two>\n"
+		  "or values of symbols <sym:sym:1=this,2=that,3=nevermind>\n"
+		  "and know what time it is in seconds <nsec2sec:6:nsec> \n"
+		  "millisecs <nsec2msec:6:nsec>\n"
+		  "or microsecs <nsec2usec:3:nsec>")
+);
+
 #endif
 
 /***** NOTICE! The #if protection ends here. *****/
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/11] tracing: move > to out of macros and into print statement
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (9 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 10/11] tracing: update sample code with new tag format Steven Rostedt
@ 2009-06-10  5:42 ` Steven Rostedt
  2009-06-10  9:26 ` [PATCH 00/11] [GIT PULL] more updates for the tag format Ingo Molnar
  11 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10  5:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: 0011-tracing-move-to-out-of-macros-and-into-print-stateme.patch --]
[-- Type: text/plain, Size: 3580 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Frederic Weisbecker stated that it is confusing to see a unmatched
'<', when the '>' is hidden inside a macro. This patch moves it out
back to the string.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/trace/events/irq.h  |    8 +++++---
 include/trace/events/kmem.h |   12 +++++++-----
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index c6c0ff1..4294a6d 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -9,7 +9,7 @@
 
 #define softirq_name(sirq) sirq##_SOFTIRQ, #sirq
 #define show_softirq_name_format(val)					\
-	"%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s>"
+	"%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s,%d=%s"
 #define show_softirq_name_args()					\
 		softirq_name(HI),					\
 		softirq_name(TIMER),					\
@@ -106,7 +106,8 @@ TRACE_EVENT(softirq_entry,
 		__entry->vec = (int)(h - vec);
 	),
 
-	TP_FORMAT("softirq=<int:vec> action=<sym:vec:" show_softirq_name_format(),
+	TP_FORMAT("softirq=<int:vec> action=<sym:vec:"
+		  show_softirq_name_format() ">",
 		  show_softirq_name_args())
 );
 
@@ -135,7 +136,8 @@ TRACE_EVENT(softirq_exit,
 		__entry->vec = (int)(h - vec);
 	),
 
-	TP_FORMAT("softirq=<int:vec> action=<sym:vec:" show_softirq_name_format(),
+	TP_FORMAT("softirq=<int:vec> action=<sym:vec:"
+		  show_softirq_name_format() ">",
 		  show_softirq_name_args())
 );
 
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 5c8f942..32c93da 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -22,7 +22,7 @@
 	"0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s," \
 	"0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s," \
 	"0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s,0x%lx=%s," \
-	"0x%lx=%s,0x%lx=%s>"
+	"0x%lx=%s,0x%lx=%s"
 #define show_gfp_flags_args()						\
 	(unsigned long)GFP_HIGHUSER_MOVABLE,	"GFP_HIGHUSER_MOVABLE", \
 	(unsigned long)GFP_HIGHUSER,		"GFP_HIGHUSER",		\
@@ -76,7 +76,7 @@ TRACE_EVENT(kmalloc,
 
 	TP_FORMAT("call_site=<hex:call_site> ptr=<ptr:ptr> bytes_req=<int:bytes_req> "
 		  "bytes_alloc=<int:bytes_alloc> gfp_flags=<mask:gfp_flags:|:"
-		  show_gfp_flags_format(), show_gfp_flags_args())
+		  show_gfp_flags_format() ">", show_gfp_flags_args())
 );
 
 TRACE_EVENT(kmem_cache_alloc,
@@ -107,7 +107,7 @@ TRACE_EVENT(kmem_cache_alloc,
 
 	TP_FORMAT("call_site=<hex:call_site> ptr=<ptr:ptr> bytes_req=<int:bytes_req> "
 		  "bytes_alloc=<int:bytes_alloc> gfp_flags=<mask:gfp_flags:|:"
-		  show_gfp_flags_format(), show_gfp_flags_args())
+		  show_gfp_flags_format() ">", show_gfp_flags_args())
 );
 
 TRACE_EVENT(kmalloc_node,
@@ -141,7 +141,8 @@ TRACE_EVENT(kmalloc_node,
 
 	TP_FORMAT("call_site=<hex:call_site> ptr=<ptr:ptr> bytes_req=<int:bytes_req> "
 		  "bytes_alloc=<int:bytes_alloc> gfp_flags=<mask:gfp_flags:|:"
-		  show_gfp_flags_format() " node=<int:node>", show_gfp_flags_args())
+		  show_gfp_flags_format() "> node=<int:node>",
+		  show_gfp_flags_args())
 );
 
 TRACE_EVENT(kmem_cache_alloc_node,
@@ -175,7 +176,8 @@ TRACE_EVENT(kmem_cache_alloc_node,
 
 	TP_FORMAT("call_site=<hex:call_site> ptr=<ptr:ptr> bytes_req=<int:bytes_req> "
 		  "bytes_alloc=<int:bytes_alloc> gfp_flags=<mask:gfp_flags:|:"
-		  show_gfp_flags_format() " node=<int:node>", show_gfp_flags_args())
+		  show_gfp_flags_format() "> node=<int:node>",
+		  show_gfp_flags_args())
 );
 
 TRACE_EVENT(kfree,
-- 
1.6.3.1

-- 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] tracing: add func and symfunc to tag format
  2009-06-10  5:42 ` [PATCH 08/11] tracing: add func and symfunc to " Steven Rostedt
@ 2009-06-10  7:48   ` Frederic Weisbecker
  2009-06-10 12:55     ` Steven Rostedt
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2009-06-10  7:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Minchan Kim,
	Mel Gorman, Christoph Hellwig, Rik van Riel, Pekka Enberg,
	Peter Zijlstra, Theodore Tso, Mathieu Desnoyers, Lai Jiangshan,
	Zhaolei, KOSAKI Motohiro, Jason Baron, Jiaying Zhang,
	Tom Zanussi, Xiao Guangrong

On Wed, Jun 10, 2009 at 01:42:14AM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> This adds a way to print out a function name from a pointer the same way
> that the Linux printf "%pF" does. Well the symfunc tag does.
> 
> If we change kmalloc and kfree formats to:
> 
>   TP_FORMAT("call_site=<symfunc:call_site> ptr=<ptr:ptr> bytes_req=<int:bytes_req> "
> 
> and
> 
>   TP_FORMAT("call_site=<func:call_site> ptr=<ptr:ptr>")
> 
> respectively, we would get something like:
> 
>    kmalloc: call_site=__alloc_skb+0x70/0x160 ptr=ffff88003d830800 bytes_req=680
>    kfree: call_site=skb_release_data ptr=ffff88003d830800


Nice!

I thought about func and func_offset, but symfunc seems enough.


> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_output.c      |    2 +-
>  kernel/trace/trace_output.h      |    4 ++++
>  kernel/trace/trace_read_binary.c |   27 +++++++++++++++++++++++++--
>  3 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 7938f3a..39d5eeb 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -326,7 +326,7 @@ static inline const char *kretprobed(const char *name)
>  }
>  #endif /* CONFIG_KRETPROBES */
>  
> -static int
> +int
>  seq_print_sym_short(struct trace_seq *s, const char *fmt, unsigned long address)
>  {
>  #ifdef CONFIG_KALLSYMS
> diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
> index d38bec4..412eee8 100644
> --- a/kernel/trace/trace_output.h
> +++ b/kernel/trace/trace_output.h
> @@ -47,5 +47,9 @@ do {							\
>  		return TRACE_TYPE_PARTIAL_LINE;		\
>  } while (0)
>  
> +/* until we have a way with sprintf to do this */
> +int seq_print_sym_short(struct trace_seq *s, const char *fmt,
> +			unsigned long address);
> +
>  #endif
>  
> diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
> index f3fdac8..9ff4ed8 100644
> --- a/kernel/trace/trace_read_binary.c
> +++ b/kernel/trace/trace_read_binary.c
> @@ -10,7 +10,7 @@
>  #include <linux/module.h>
>  #include <linux/ctype.h>
>  
> -#include "trace.h"
> +#include "trace_output.h"
>  
>  static DEFINE_MUTEX(buffer_lock);
>  static struct trace_seq buffer;
> @@ -23,7 +23,8 @@ static struct trace_seq buffer;
>   * COMMAND := <TYPE:FIELD> | <mask:FIELD:DELIM:MASKS> | <sym:FIELD:SYMBOLS> |
>   *            <if:FIELD:TRUE:FALSE> | <ifmask:FIELD:MASK:TRUE:FALSE> |
>   *            <nsec2sec:PRECISION:FIELD> | <nsec2usec:PRECISION:FIELD> |
> - *            <nsec2msec:PRECISION:FIELD> | <major:FIELD> | <minor:FIELD>
> + *            <nsec2msec:PRECISION:FIELD> | <major:FIELD> | <minor:FIELD> |
> + *            <func:FIELD> | <symfunc:FIELD>
>   * TYPE := int | uint | hex | ptr | string | strarray
>   * FIELD := defined by the event structure
>   * MASKS := MASK=NAME,MASKS | MASK=NAME
> @@ -43,6 +44,8 @@ static struct trace_seq buffer;
>   *  uint      : Same as int, but for unsigned.
>   *  hex       : Print the field as a hex (ie. 0x4ab)
>   *  ptr       : Print the field as a hex but without the '0x'.
> + *  symfunc   : Print a ptr like the "%pF" does.
> + *  func      : Like symfunc but without the address.
>   *  string    : Used with dynamic sized strings (__string)
>   *  strarray  : Used with static sized arrays (__array ( char , item, len)).
>   *  if        : If the field is non zero, print the TRUE text,
> @@ -80,6 +83,8 @@ enum field_types {
>  	FIELD_IS_HEX,
>  	FIELD_IS_MINOR,
>  	FIELD_IS_MAJOR,
> +	FIELD_IS_FUNC,
> +	FIELD_IS_SYMFUNC,
>  	FIELD_IS_NSEC2SEC,
>  	FIELD_IS_NSEC2USEC,
>  	FIELD_IS_NSEC2MSEC,
> @@ -412,6 +417,8 @@ handle_field(struct ftrace_event_call *event,
>  	case FIELD_IS_HEX:
>  	case FIELD_IS_MINOR:
>  	case FIELD_IS_MAJOR:
> +	case FIELD_IS_FUNC:
> +	case FIELD_IS_SYMFUNC:
>  	case FIELD_IS_STRING:
>  	case FIELD_IS_STRARRAY:
>  		field = find_field(event, fmt, end - fmt);
> @@ -594,6 +601,12 @@ ftrace_initialize_print(struct ftrace_event_call *event, const char *fmt, ...)
>  		else if (strncmp(fmt, "ifmask:", 7) == 0)
>  			field_type = FIELD_IS_IFMASK;
>  
> +		else if (strncmp(fmt, "func:", 5) == 0)
> +			field_type = FIELD_IS_FUNC;
> +
> +		else if (strncmp(fmt, "symfunc:", 8) == 0)
> +			field_type = FIELD_IS_SYMFUNC;
> +
>  		else if (strncmp(fmt, "nsec2sec:", 9) == 0)
>  			field_type = FIELD_IS_NSEC2SEC;
>  
> @@ -785,6 +798,8 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
>  		case FIELD_IS_UINT:
>  		case FIELD_IS_HEX:
>  		case FIELD_IS_PTR:
> +		case FIELD_IS_FUNC:
> +		case FIELD_IS_SYMFUNC:
>  		case FIELD_IS_MAJOR:
>  		case FIELD_IS_MINOR:
>  			field = info->data.field;
> @@ -837,6 +852,14 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
>  				trace_seq_printf(s, "%u", MINOR(val));
>  				break;
>  
> +			case FIELD_IS_SYMFUNC:
> +				trace_seq_printf(s, "%pF", (void *)val);
> +				break;
> +
> +			case FIELD_IS_FUNC:
> +				seq_print_sym_short(s, "%s", val);


May be after the merge window it would be nice to start use %pf (for
now it's on another branch).
I'll have more time to do such things in about two weeks.



> +				break;
> +
>  			default:
>  				/* hex should only print the size specified */
>  				if (mask)
> -- 
> 1.6.3.1
> 
> -- 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
                   ` (10 preceding siblings ...)
  2009-06-10  5:42 ` [PATCH 11/11] tracing: move > to out of macros and into print statement Steven Rostedt
@ 2009-06-10  9:26 ` Ingo Molnar
  2009-06-10 11:11   ` Frédéric Weisbecker
  11 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2009-06-10  9:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Frederic Weisbecker, Theodore Tso, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong


* Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> Ingo,
> 
> Please pull the latest tip/tracing/event-print-format tree, which can be found at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/event-print-format
> 
> 
> Li Zefan (1):
>       tracing/events: convert block trace points to TRACE_EVENT(), fix
> 
> Steven Rostedt (10):
>       tracing: add nsec2sec print formats
>       tracing: convert lockdep lock_acquired trace point to use nsec2usec tag
>       tracing: add major and minor tags for print format
>       tracing: use << to print < instead of \<
>       tracing: convert the block trace points to use the new tag format
>       tracing: add test for strings in event tag format
>       tracing: add func and symfunc to tag format
>       tracing: check full name for field
>       tracing: update sample code with new tag format
>       tracing: move '>' to out of macros and into print statement
> 
> ----
>  include/linux/blktrace_api.h               |    4 +-
>  include/linux/ftrace_event.h               |    3 +-
>  include/trace/events/block.h               |  101 +++------
>  include/trace/events/irq.h                 |    8 +-
>  include/trace/events/kmem.h                |   12 +-
>  include/trace/events/lockdep.h             |    8 +-
>  include/trace/ftrace.h                     |    2 +-
>  kernel/trace/trace_output.c                |    2 +-
>  kernel/trace/trace_output.h                |    4 +
>  kernel/trace/trace_read_binary.c           |  304 +++++++++++++++++++++++-----
>  samples/trace_events/trace-events-sample.c |   21 ++-
>  samples/trace_events/trace-events-sample.h |   66 ++++++
>  12 files changed, 399 insertions(+), 136 deletions(-)

Hm, that's way too much back and forth really - trivial typo fixes, 
build failure, etc. This is really a 'oh, the merge window is 
coming' last minute scrambling and we dont want to mess up the 
squeaky-clean tracing tree history be messed up with this.

Frederic also expressed worries about the tag format. Could we have 
a wider buy-in for this format?

I've separated these bits into tip:tracing/ftrace, and kept 
tip:tracing/core on a pre-print-formats state (going back 8 
commits), so that upstream merging of the other bits does not get 
held up.

Could we try a cleaner, bisectable, consiously built up version of 
these final tracing/core..tracing/ftrace please? I think we can - 
the rest of the tree is clean. Please do a exact-same-content rebase 
so that the merge back gets obvious and that the testing we've 
injected does not get invalidated?

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10  9:26 ` [PATCH 00/11] [GIT PULL] more updates for the tag format Ingo Molnar
@ 2009-06-10 11:11   ` Frédéric Weisbecker
  2009-06-10 13:01     ` Theodore Tso
  0 siblings, 1 reply; 30+ messages in thread
From: Frédéric Weisbecker @ 2009-06-10 11:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, linux-kernel, Andrew Morton, Minchan Kim,
	Mel Gorman, Christoph Hellwig, Rik van Riel, Pekka Enberg,
	Peter Zijlstra, Theodore Tso, Mathieu Desnoyers, Lai Jiangshan,
	Zhaolei, KOSAKI Motohiro, Jason Baron, Jiaying Zhang,
	Tom Zanussi, Xiao Guangrong

2009/6/10 Ingo Molnar <mingo@elte.hu>:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
>>
>> Ingo,
>>
>> Please pull the latest tip/tracing/event-print-format tree, which can be found at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
>> tip/tracing/event-print-format
>>
>>
>> Li Zefan (1):
>>       tracing/events: convert block trace points to TRACE_EVENT(), fix
>>
>> Steven Rostedt (10):
>>       tracing: add nsec2sec print formats
>>       tracing: convert lockdep lock_acquired trace point to use nsec2usec tag
>>       tracing: add major and minor tags for print format
>>       tracing: use << to print < instead of \<
>>       tracing: convert the block trace points to use the new tag format
>>       tracing: add test for strings in event tag format
>>       tracing: add func and symfunc to tag format
>>       tracing: check full name for field
>>       tracing: update sample code with new tag format
>>       tracing: move '>' to out of macros and into print statement
>>
>> ----
>>  include/linux/blktrace_api.h               |    4 +-
>>  include/linux/ftrace_event.h               |    3 +-
>>  include/trace/events/block.h               |  101 +++------
>>  include/trace/events/irq.h                 |    8 +-
>>  include/trace/events/kmem.h                |   12 +-
>>  include/trace/events/lockdep.h             |    8 +-
>>  include/trace/ftrace.h                     |    2 +-
>>  kernel/trace/trace_output.c                |    2 +-
>>  kernel/trace/trace_output.h                |    4 +
>>  kernel/trace/trace_read_binary.c           |  304 +++++++++++++++++++++++-----
>>  samples/trace_events/trace-events-sample.c |   21 ++-
>>  samples/trace_events/trace-events-sample.h |   66 ++++++
>>  12 files changed, 399 insertions(+), 136 deletions(-)
>
> Hm, that's way too much back and forth really - trivial typo fixes,
> build failure, etc. This is really a 'oh, the merge window is
> coming' last minute scrambling and we dont want to mess up the
> squeaky-clean tracing tree history be messed up with this.
>
> Frederic also expressed worries about the tag format. Could we have
> a wider buy-in for this format?



Well, indeed I had worries, but I discussed about it with Steven and
now I actually
think this new tag format is much more powerful than printf style.
It brings a cleaner, and much higher level way to control the data exports.

But it would be nice to read some opinions from end users (end
developers) of TRACE_EVENT().


>
> I've separated these bits into tip:tracing/ftrace, and kept
> tip:tracing/core on a pre-print-formats state (going back 8
> commits), so that upstream merging of the other bits does not get
> held up.
>
> Could we try a cleaner, bisectable, consiously built up version of
> these final tracing/core..tracing/ftrace please? I think we can -
> the rest of the tree is clean. Please do a exact-same-content rebase
> so that the merge back gets obvious and that the testing we've
> injected does not get invalidated?
>
>        Ingo
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] tracing: add func and symfunc to tag format
  2009-06-10  7:48   ` Frederic Weisbecker
@ 2009-06-10 12:55     ` Steven Rostedt
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10 12:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Minchan Kim,
	Mel Gorman, Christoph Hellwig, Rik van Riel, Pekka Enberg,
	Peter Zijlstra, Theodore Tso, Mathieu Desnoyers, Lai Jiangshan,
	Zhaolei, KOSAKI Motohiro, Jason Baron, Jiaying Zhang,
	Tom Zanussi, Xiao Guangrong


On Wed, 10 Jun 2009, Frederic Weisbecker wrote:

> On Wed, Jun 10, 2009 at 01:42:14AM -0400, Steven Rostedt wrote:
> > From: Steven Rostedt <srostedt@redhat.com>
> > 
> > This adds a way to print out a function name from a pointer the same way
> > that the Linux printf "%pF" does. Well the symfunc tag does.
> > 
> > If we change kmalloc and kfree formats to:
> > 
> >   TP_FORMAT("call_site=<symfunc:call_site> ptr=<ptr:ptr> bytes_req=<int:bytes_req> "
> > 
> > and
> > 
> >   TP_FORMAT("call_site=<func:call_site> ptr=<ptr:ptr>")
> > 
> > respectively, we would get something like:
> > 
> >    kmalloc: call_site=__alloc_skb+0x70/0x160 ptr=ffff88003d830800 bytes_req=680
> >    kfree: call_site=skb_release_data ptr=ffff88003d830800
> 
> 
> Nice!
> 
> I thought about func and func_offset, but symfunc seems enough.

func_offset is more intuitive, I'll switch it to that. Or maybe 
"funcoffset" or just "funcoff". OK, maybe not ;-)



> 
> 
> > 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  kernel/trace/trace_output.c      |    2 +-
> >  kernel/trace/trace_output.h      |    4 ++++
> >  kernel/trace/trace_read_binary.c |   27 +++++++++++++++++++++++++--
> >  3 files changed, 30 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index 7938f3a..39d5eeb 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -326,7 +326,7 @@ static inline const char *kretprobed(const char *name)
> >  }
> >  #endif /* CONFIG_KRETPROBES */
> >  
> > -static int
> > +int
> >  seq_print_sym_short(struct trace_seq *s, const char *fmt, unsigned long address)
> >  {
> >  #ifdef CONFIG_KALLSYMS
> > diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
> > index d38bec4..412eee8 100644
> > --- a/kernel/trace/trace_output.h
> > +++ b/kernel/trace/trace_output.h
> > @@ -47,5 +47,9 @@ do {							\
> >  		return TRACE_TYPE_PARTIAL_LINE;		\
> >  } while (0)
> >  
> > +/* until we have a way with sprintf to do this */
> > +int seq_print_sym_short(struct trace_seq *s, const char *fmt,
> > +			unsigned long address);
> > +
> >  #endif
> >  
> > diff --git a/kernel/trace/trace_read_binary.c b/kernel/trace/trace_read_binary.c
> > index f3fdac8..9ff4ed8 100644
> > --- a/kernel/trace/trace_read_binary.c
> > +++ b/kernel/trace/trace_read_binary.c
> > @@ -10,7 +10,7 @@
> >  #include <linux/module.h>
> >  #include <linux/ctype.h>
> >  
> > -#include "trace.h"
> > +#include "trace_output.h"
> >  
> >  static DEFINE_MUTEX(buffer_lock);
> >  static struct trace_seq buffer;
> > @@ -23,7 +23,8 @@ static struct trace_seq buffer;
> >   * COMMAND := <TYPE:FIELD> | <mask:FIELD:DELIM:MASKS> | <sym:FIELD:SYMBOLS> |
> >   *            <if:FIELD:TRUE:FALSE> | <ifmask:FIELD:MASK:TRUE:FALSE> |
> >   *            <nsec2sec:PRECISION:FIELD> | <nsec2usec:PRECISION:FIELD> |
> > - *            <nsec2msec:PRECISION:FIELD> | <major:FIELD> | <minor:FIELD>
> > + *            <nsec2msec:PRECISION:FIELD> | <major:FIELD> | <minor:FIELD> |
> > + *            <func:FIELD> | <symfunc:FIELD>
> >   * TYPE := int | uint | hex | ptr | string | strarray
> >   * FIELD := defined by the event structure
> >   * MASKS := MASK=NAME,MASKS | MASK=NAME
> > @@ -43,6 +44,8 @@ static struct trace_seq buffer;
> >   *  uint      : Same as int, but for unsigned.
> >   *  hex       : Print the field as a hex (ie. 0x4ab)
> >   *  ptr       : Print the field as a hex but without the '0x'.
> > + *  symfunc   : Print a ptr like the "%pF" does.
> > + *  func      : Like symfunc but without the address.
> >   *  string    : Used with dynamic sized strings (__string)
> >   *  strarray  : Used with static sized arrays (__array ( char , item, len)).
> >   *  if        : If the field is non zero, print the TRUE text,
> > @@ -80,6 +83,8 @@ enum field_types {
> >  	FIELD_IS_HEX,
> >  	FIELD_IS_MINOR,
> >  	FIELD_IS_MAJOR,
> > +	FIELD_IS_FUNC,
> > +	FIELD_IS_SYMFUNC,
> >  	FIELD_IS_NSEC2SEC,
> >  	FIELD_IS_NSEC2USEC,
> >  	FIELD_IS_NSEC2MSEC,
> > @@ -412,6 +417,8 @@ handle_field(struct ftrace_event_call *event,
> >  	case FIELD_IS_HEX:
> >  	case FIELD_IS_MINOR:
> >  	case FIELD_IS_MAJOR:
> > +	case FIELD_IS_FUNC:
> > +	case FIELD_IS_SYMFUNC:
> >  	case FIELD_IS_STRING:
> >  	case FIELD_IS_STRARRAY:
> >  		field = find_field(event, fmt, end - fmt);
> > @@ -594,6 +601,12 @@ ftrace_initialize_print(struct ftrace_event_call *event, const char *fmt, ...)
> >  		else if (strncmp(fmt, "ifmask:", 7) == 0)
> >  			field_type = FIELD_IS_IFMASK;
> >  
> > +		else if (strncmp(fmt, "func:", 5) == 0)
> > +			field_type = FIELD_IS_FUNC;
> > +
> > +		else if (strncmp(fmt, "symfunc:", 8) == 0)
> > +			field_type = FIELD_IS_SYMFUNC;
> > +
> >  		else if (strncmp(fmt, "nsec2sec:", 9) == 0)
> >  			field_type = FIELD_IS_NSEC2SEC;
> >  
> > @@ -785,6 +798,8 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
> >  		case FIELD_IS_UINT:
> >  		case FIELD_IS_HEX:
> >  		case FIELD_IS_PTR:
> > +		case FIELD_IS_FUNC:
> > +		case FIELD_IS_SYMFUNC:
> >  		case FIELD_IS_MAJOR:
> >  		case FIELD_IS_MINOR:
> >  			field = info->data.field;
> > @@ -837,6 +852,14 @@ ftrace_read_binary(struct trace_seq *s, struct ftrace_event_call *event,
> >  				trace_seq_printf(s, "%u", MINOR(val));
> >  				break;
> >  
> > +			case FIELD_IS_SYMFUNC:
> > +				trace_seq_printf(s, "%pF", (void *)val);
> > +				break;
> > +
> > +			case FIELD_IS_FUNC:
> > +				seq_print_sym_short(s, "%s", val);
> 
> 
> May be after the merge window it would be nice to start use %pf (for
> now it's on another branch).
> I'll have more time to do such things in about two weeks.

Yeah, I spent about twenty minutes trying to find the "%pf" until I 
realized it was not in yet :-(

-- Steve

> 
> 
> 
> > +				break;
> > +
> >  			default:
> >  				/* hex should only print the size specified */
> >  				if (mask)
> > -- 
> > 1.6.3.1
> > 
> > -- 
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10 11:11   ` Frédéric Weisbecker
@ 2009-06-10 13:01     ` Theodore Tso
  2009-06-10 13:49       ` Steven Rostedt
  0 siblings, 1 reply; 30+ messages in thread
From: Theodore Tso @ 2009-06-10 13:01 UTC (permalink / raw)
  To: Frédéric Weisbecker
  Cc: Ingo Molnar, Steven Rostedt, linux-kernel, Andrew Morton,
	Minchan Kim, Mel Gorman, Christoph Hellwig, Rik van Riel,
	Pekka Enberg, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan,
	Zhaolei, KOSAKI Motohiro, Jason Baron, Jiaying Zhang,
	Tom Zanussi, Xiao Guangrong

On Wed, Jun 10, 2009 at 01:11:40PM +0200, Frédéric Weisbecker wrote:
> Well, indeed I had worries, but I discussed about it with Steven and
> now I actually
> think this new tag format is much more powerful than printf style.
> It brings a cleaner, and much higher level way to control the data exports.
> 
> But it would be nice to read some opinions from end users (end
> developers) of TRACE_EVENT().

Maybe I'm missing something, but looks like the this new format, while
simpler and easier to read, doesn't have support for using a more
complicated C expression as a printk argument.  For example:

	TP_printk("dev %s ino %lu mode %d uid %u gid %u blocks %llu",
		  jbd2_dev_to_name(__entry->dev), __entry->ino, __entry->mode,
		  __entry->uid, __entry->gid, __entry->blocks)

How should I handle the "jbd2_dev_to_name(__entry->dev)" argument to
TP_printk?  The whole point of calling jbd2_dev_to_name() at TP_printk
time is to not bloat the ring buffer with a 32 byte devname.

     	       	     	      	     	  - Ted

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10 13:01     ` Theodore Tso
@ 2009-06-10 13:49       ` Steven Rostedt
  2009-06-10 14:39         ` Mathieu Desnoyers
  2009-06-10 16:03         ` Theodore Tso
  0 siblings, 2 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10 13:49 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frédéric Weisbecker, Ingo Molnar, linux-kernel,
	Andrew Morton, Minchan Kim, Mel Gorman, Christoph Hellwig,
	Rik van Riel, Pekka Enberg, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5299 bytes --]


On Wed, 10 Jun 2009, Theodore Tso wrote:

> On Wed, Jun 10, 2009 at 01:11:40PM +0200, Frédéric Weisbecker wrote:
> > Well, indeed I had worries, but I discussed about it with Steven and
> > now I actually
> > think this new tag format is much more powerful than printf style.
> > It brings a cleaner, and much higher level way to control the data exports.
> > 
> > But it would be nice to read some opinions from end users (end
> > developers) of TRACE_EVENT().
> 
> Maybe I'm missing something, but looks like the this new format, while
> simpler and easier to read, doesn't have support for using a more
> complicated C expression as a printk argument.  For example:
> 
> 	TP_printk("dev %s ino %lu mode %d uid %u gid %u blocks %llu",
> 		  jbd2_dev_to_name(__entry->dev), __entry->ino, __entry->mode,
> 		  __entry->uid, __entry->gid, __entry->blocks)
> 
> How should I handle the "jbd2_dev_to_name(__entry->dev)" argument to
> TP_printk?  The whole point of calling jbd2_dev_to_name() at TP_printk
> time is to not bloat the ring buffer with a 32 byte devname.

Understood, and the example you just gave also has the flaw that a 
userspace tool could not parse it, because it would not know what to do 
with "jbd2_dev_to_name()".

This is why I suggested keeping the TP_printk, for cases like this. Since 
it is also currently useless in userspace.

But we really should convert all cases, and I was toying with an idea to 
dynamically make your own data type, and be able to make a way to print 
it. That is you could register:

register_trace_event_data_type(const char *name, 
	(int)(*print_func)(struct trace_seq *s, void *data, int size),
	const char *fmt, ...);

Where the name would be the data type you are making, the print_func is 
how ftrace would print it in debugfs/tracing/trace, and the fmt, ... would 
be who to show the user how to print it.

For example, for the GFP flags we could do something like:

/* helper routine */
unsigned long long trace_get_word(void *p, int len)
{
	unsigned long long val;

        switch (size) {
        case 1:
                val = *(char *)p;
                break;
        case 2:
                val = *(short *)p;
                break;
        case 4:
                val = *(int *)p;
                break;
        case 8:
                val = *(long long *)p;
                break;
        default:
		WARN(1,"length %d not valid word size\n");
                return 0;
        }

        return val;
}

static int test_gfp(unsigned long *gfp, unsigned long mask)
{
	if ((*gfp & mask) == mask) {
		*gfp &= ~mask;
		return 1;
	}
	return 0;
}

#define test_gfp_name(under, name)			\
	if (test_gfp(&gfp, under##GFP_##name)) {	\
		if (first)				\
			first = 0;			\
		else					\
			trace_seq_putc(s, '|');		\
		trace_seq_puts(s, "GFP_" #name);	\
	}


static int print_gfp(struct trace_seq *s, void *data, int len)
{
	unsigned long gfp;

	gfp = trace_get_word(data, len);

	if (!gfp) {
		trace_seq_puts(s, GPF_NOWAIT);
		return 0;
	}

	while (gfp) {
		test_gfp_name(,HIGHUSER_MOVABLE);
		test_gfp_name(,HIGHUSER);
		test_gfp_name(,USER);
		test_gfp_name(,TEMPORARY);
		test_gfp_name(,KERNEL);
		test_gfp_name(,NOFS);
		test_gfp_name(,ATOMIC);
		test_gfp_name(,NOIO);
		test_gfp_name(__,HIGH);
		test_gfp_name(__,WAIT);
		test_gfp_name(__,IO);
		test_gfp_name(__,COLD);
		test_gfp_name(__,NOWARN);
		test_gfp_name(__,REPEAT);
		test_gfp_name(__,NOFAIL);
		test_gfp_name(__,NORETRY);
		test_gfp_name(__,COMP);
		test_gfp_name(__,ZERO);
		test_gfp_name(__,NOMEMALLOC);
		test_gfp_name(__,HARDWALL);
		test_gfp_name(__,THISNODE);
		test_gfp_name(__,RECLAIMABLE);
		test_gfp_name(__,MOVABLE);

}

#define gfp_insert(under, name)	\
		(unsigned long)under##GFP_##name, "GFP_" #name

	register_trace_event_data_type("gfp", print_gfp,
		"mask:\n"
	        " 0=GFP_NOWAIT,"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n"
		" 0x%lx=%s,\n",
		gfp_insert(,HIGHUSER_MOVABLE),
		gfp_insert(,HIGHUSER),
		gfp_insert(,USER),
		gfp_insert(,TEMPORARY),
		gfp_insert(,NOFS),
		gfp_insert(,ATOMIC),
		gfp_insert(,NOIO),
		gfp_insert(__,HIGH),
		gfp_insert(__,WAIT),
		gfp_insert(__,IO),
		gfp_insert(__,COLD),
		gfp_insert(__,NOWARN),
		gfp_insert(__,REPEAT),
		gfp_insert(__,NOFAIL),
		gfp_insert(__,NORETRY),
		gfp_insert(__,COMP),
		gfp_insert(__,ZERO),
		gfp_insert(__,NOMEMALLOC),
		gfp_insert(__,HARDWALL),
		gfp_insert(__,THISNODE),
		gfp_insert(__,RECLAIMABLE),
		gfp_insert(__,MOVEABLE));


And then in the trace format, we could do:

	<data:gfp:field>

And the 'data' will flag us to how to print the data.

For userland, there could be a file in:

	/debug/tracing/events/data_types/gfp/format

That will show that format. Yes we duplicate some of the code, but it 
it would solve these types of issues.

-- Steve


		
		
	

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10 13:49       ` Steven Rostedt
@ 2009-06-10 14:39         ` Mathieu Desnoyers
  2009-06-10 15:21           ` Steven Rostedt
  2009-06-10 16:03         ` Theodore Tso
  1 sibling, 1 reply; 30+ messages in thread
From: Mathieu Desnoyers @ 2009-06-10 14:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Theodore Tso, Frédéric Weisbecker, Ingo Molnar,
	linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> On Wed, 10 Jun 2009, Theodore Tso wrote:
> 
> > On Wed, Jun 10, 2009 at 01:11:40PM +0200, Frédéric Weisbecker wrote:
> > > Well, indeed I had worries, but I discussed about it with Steven and
> > > now I actually
> > > think this new tag format is much more powerful than printf style.
> > > It brings a cleaner, and much higher level way to control the data exports.
> > > 
> > > But it would be nice to read some opinions from end users (end
> > > developers) of TRACE_EVENT().
> > 
> > Maybe I'm missing something, but looks like the this new format, while
> > simpler and easier to read, doesn't have support for using a more
> > complicated C expression as a printk argument.  For example:
> > 
> > 	TP_printk("dev %s ino %lu mode %d uid %u gid %u blocks %llu",
> > 		  jbd2_dev_to_name(__entry->dev), __entry->ino, __entry->mode,
> > 		  __entry->uid, __entry->gid, __entry->blocks)
> > 
> > How should I handle the "jbd2_dev_to_name(__entry->dev)" argument to
> > TP_printk?  The whole point of calling jbd2_dev_to_name() at TP_printk
> > time is to not bloat the ring buffer with a 32 byte devname.
> 
> Understood, and the example you just gave also has the flaw that a 
> userspace tool could not parse it, because it would not know what to do 
> with "jbd2_dev_to_name()".
> 
> This is why I suggested keeping the TP_printk, for cases like this. Since 
> it is also currently useless in userspace.
> 
> But we really should convert all cases, and I was toying with an idea to 
> dynamically make your own data type, and be able to make a way to print 
> it. That is you could register:
> 
> register_trace_event_data_type(const char *name, 
> 	(int)(*print_func)(struct trace_seq *s, void *data, int size),
> 	const char *fmt, ...);
> 
> Where the name would be the data type you are making, the print_func is 
> how ftrace would print it in debugfs/tracing/trace, and the fmt, ... would 
> be who to show the user how to print it.
> 
> For example, for the GFP flags we could do something like:
> 
> /* helper routine */
> unsigned long long trace_get_word(void *p, int len)
> {
> 	unsigned long long val;
> 
>         switch (size) {
>         case 1:
>                 val = *(char *)p;
>                 break;
>         case 2:
>                 val = *(short *)p;
>                 break;
>         case 4:
>                 val = *(int *)p;
>                 break;
>         case 8:
>                 val = *(long long *)p;
>                 break;
>         default:
> 		WARN(1,"length %d not valid word size\n");
>                 return 0;
>         }
> 
>         return val;
> }
> 
> static int test_gfp(unsigned long *gfp, unsigned long mask)
> {
> 	if ((*gfp & mask) == mask) {
> 		*gfp &= ~mask;
> 		return 1;
> 	}
> 	return 0;
> }
> 
> #define test_gfp_name(under, name)			\
> 	if (test_gfp(&gfp, under##GFP_##name)) {	\
> 		if (first)				\
> 			first = 0;			\
> 		else					\
> 			trace_seq_putc(s, '|');		\
> 		trace_seq_puts(s, "GFP_" #name);	\
> 	}
> 
> 
> static int print_gfp(struct trace_seq *s, void *data, int len)
> {
> 	unsigned long gfp;
> 
> 	gfp = trace_get_word(data, len);
> 
> 	if (!gfp) {
> 		trace_seq_puts(s, GPF_NOWAIT);
> 		return 0;
> 	}
> 
> 	while (gfp) {
> 		test_gfp_name(,HIGHUSER_MOVABLE);
> 		test_gfp_name(,HIGHUSER);
> 		test_gfp_name(,USER);
> 		test_gfp_name(,TEMPORARY);
> 		test_gfp_name(,KERNEL);
> 		test_gfp_name(,NOFS);
> 		test_gfp_name(,ATOMIC);
> 		test_gfp_name(,NOIO);
> 		test_gfp_name(__,HIGH);
> 		test_gfp_name(__,WAIT);
> 		test_gfp_name(__,IO);
> 		test_gfp_name(__,COLD);
> 		test_gfp_name(__,NOWARN);
> 		test_gfp_name(__,REPEAT);
> 		test_gfp_name(__,NOFAIL);
> 		test_gfp_name(__,NORETRY);
> 		test_gfp_name(__,COMP);
> 		test_gfp_name(__,ZERO);
> 		test_gfp_name(__,NOMEMALLOC);
> 		test_gfp_name(__,HARDWALL);
> 		test_gfp_name(__,THISNODE);
> 		test_gfp_name(__,RECLAIMABLE);
> 		test_gfp_name(__,MOVABLE);
> 
> }
> 
> #define gfp_insert(under, name)	\
> 		(unsigned long)under##GFP_##name, "GFP_" #name
> 
> 	register_trace_event_data_type("gfp", print_gfp,
> 		"mask:\n"
> 	        " 0=GFP_NOWAIT,"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n"
> 		" 0x%lx=%s,\n",
> 		gfp_insert(,HIGHUSER_MOVABLE),
> 		gfp_insert(,HIGHUSER),
> 		gfp_insert(,USER),
> 		gfp_insert(,TEMPORARY),
> 		gfp_insert(,NOFS),
> 		gfp_insert(,ATOMIC),
> 		gfp_insert(,NOIO),
> 		gfp_insert(__,HIGH),
> 		gfp_insert(__,WAIT),
> 		gfp_insert(__,IO),
> 		gfp_insert(__,COLD),
> 		gfp_insert(__,NOWARN),
> 		gfp_insert(__,REPEAT),
> 		gfp_insert(__,NOFAIL),
> 		gfp_insert(__,NORETRY),
> 		gfp_insert(__,COMP),
> 		gfp_insert(__,ZERO),
> 		gfp_insert(__,NOMEMALLOC),
> 		gfp_insert(__,HARDWALL),
> 		gfp_insert(__,THISNODE),
> 		gfp_insert(__,RECLAIMABLE),
> 		gfp_insert(__,MOVEABLE));
> 
> 
> And then in the trace format, we could do:
> 
> 	<data:gfp:field>
> 
> And the 'data' will flag us to how to print the data.
> 
> For userland, there could be a file in:
> 
> 	/debug/tracing/events/data_types/gfp/format
> 
> That will show that format. Yes we duplicate some of the code, but it 
> it would solve these types of issues.
> 

It sounds a lot like the type tables LTTng is currently exporting
through specific channels. One for the list of IRQ handlers, one listing
softirqs, one for syscalls.... etc etc. The nice side of this approach
is that it permits to deal with dynamic events that modify the table
state while tracing is active, e.g. : loadling a module which adds an
IRQ handlers.

This is planned to be used for enum description eventually.

Mathieu

> -- Steve
> 
> 
> 		
> 		
> 	


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10 14:39         ` Mathieu Desnoyers
@ 2009-06-10 15:21           ` Steven Rostedt
  0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10 15:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Theodore Tso, Frédéric Weisbecker, Ingo Molnar,
	linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong


On Wed, 10 Jun 2009, Mathieu Desnoyers wrote:
> > 
> > 
> > And then in the trace format, we could do:
> > 
> > 	<data:gfp:field>
> > 
> > And the 'data' will flag us to how to print the data.
> > 
> > For userland, there could be a file in:
> > 
> > 	/debug/tracing/events/data_types/gfp/format
> > 
> > That will show that format. Yes we duplicate some of the code, but it 
> > it would solve these types of issues.
> > 
> 
> It sounds a lot like the type tables LTTng is currently exporting
> through specific channels. One for the list of IRQ handlers, one listing
> softirqs, one for syscalls.... etc etc. The nice side of this approach
> is that it permits to deal with dynamic events that modify the table
> state while tracing is active, e.g. : loadling a module which adds an
> IRQ handlers.
> 
> This is planned to be used for enum description eventually.

Well, I wouldn't make it dependent on any class, just a generic type to 
add that any user could do. The types just need to be added before the 
events are registered.

-- Steve


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10 13:49       ` Steven Rostedt
  2009-06-10 14:39         ` Mathieu Desnoyers
@ 2009-06-10 16:03         ` Theodore Tso
  2009-06-10 16:17           ` Steven Rostedt
  2009-06-11 13:03           ` Christoph Hellwig
  1 sibling, 2 replies; 30+ messages in thread
From: Theodore Tso @ 2009-06-10 16:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frédéric Weisbecker, Ingo Molnar, linux-kernel,
	Andrew Morton, Minchan Kim, Mel Gorman, Christoph Hellwig,
	Rik van Riel, Pekka Enberg, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

On Wed, Jun 10, 2009 at 09:49:29AM -0400, Steven Rostedt wrote:
> > 
> > Maybe I'm missing something, but looks like the this new format, while
> > simpler and easier to read, doesn't have support for using a more
> > complicated C expression as a printk argument.  For example:
> > 
> > 	TP_printk("dev %s ino %lu mode %d uid %u gid %u blocks %llu",
> > 		  jbd2_dev_to_name(__entry->dev), __entry->ino, __entry->mode,
> > 		  __entry->uid, __entry->gid, __entry->blocks)
> > 
> > How should I handle the "jbd2_dev_to_name(__entry->dev)" argument to
> > TP_printk?  The whole point of calling jbd2_dev_to_name() at TP_printk
> > time is to not bloat the ring buffer with a 32 byte devname.
> 
> Understood, and the example you just gave also has the flaw that a 
> userspace tool could not parse it, because it would not know what to do 
> with "jbd2_dev_to_name()".
> 
> This is why I suggested keeping the TP_printk, for cases like this. Since 
> it is also currently useless in userspace.
> 
> But we really should convert all cases, and I was toying with an idea to 
> dynamically make your own data type, and be able to make a way to print 
> it.

Yes, another approach for handling this case would be to take my
"jbd2_dev_to_name" function and support it as a first-class tagged
type; after all, I'm sure ext4 won't be the only place that would like
to take a dev_t and print the device name.  So this could certainly be
fixed by adding some kind of "<dev:xxx>" sort of tagged name.

But I think it would be good to keep TP_printk because otherwise I'll
have to scramble and change my marker->tracepoint patches during the
merge window, which would invalidate all of the testing to date.

I agree that the new tagged format is superior, but I'm wondering
whether it really makes sense to try to scramble and try to switch my
ext4/jbd2 users in the 36 hours or so before Linus opens the merge
window....

						- Ted

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10 16:03         ` Theodore Tso
@ 2009-06-10 16:17           ` Steven Rostedt
  2009-06-11 13:03           ` Christoph Hellwig
  1 sibling, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-06-10 16:17 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frédéric Weisbecker, Ingo Molnar, linux-kernel,
	Andrew Morton, Minchan Kim, Mel Gorman, Christoph Hellwig,
	Rik van Riel, Pekka Enberg, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong


On Wed, 10 Jun 2009, Theodore Tso wrote:

> On Wed, Jun 10, 2009 at 09:49:29AM -0400, Steven Rostedt wrote:
> > > 
> > > Maybe I'm missing something, but looks like the this new format, while
> > > simpler and easier to read, doesn't have support for using a more
> > > complicated C expression as a printk argument.  For example:
> > > 
> > > 	TP_printk("dev %s ino %lu mode %d uid %u gid %u blocks %llu",
> > > 		  jbd2_dev_to_name(__entry->dev), __entry->ino, __entry->mode,
> > > 		  __entry->uid, __entry->gid, __entry->blocks)
> > > 
> > > How should I handle the "jbd2_dev_to_name(__entry->dev)" argument to
> > > TP_printk?  The whole point of calling jbd2_dev_to_name() at TP_printk
> > > time is to not bloat the ring buffer with a 32 byte devname.
> > 
> > Understood, and the example you just gave also has the flaw that a 
> > userspace tool could not parse it, because it would not know what to do 
> > with "jbd2_dev_to_name()".
> > 
> > This is why I suggested keeping the TP_printk, for cases like this. Since 
> > it is also currently useless in userspace.
> > 
> > But we really should convert all cases, and I was toying with an idea to 
> > dynamically make your own data type, and be able to make a way to print 
> > it.
> 
> Yes, another approach for handling this case would be to take my
> "jbd2_dev_to_name" function and support it as a first-class tagged
> type; after all, I'm sure ext4 won't be the only place that would like
> to take a dev_t and print the device name.  So this could certainly be
> fixed by adding some kind of "<dev:xxx>" sort of tagged name.

Yep that could be done as long as we know the mapping will never change.
Userspace needs know what those numbers mean.

> 
> But I think it would be good to keep TP_printk because otherwise I'll
> have to scramble and change my marker->tracepoint patches during the
> merge window, which would invalidate all of the testing to date.

Understood, I made it that both TP_printk and TP_FORMAT can exist
together, but Christoph Hellwig doesn't like that idea. I'm thinking
for quick debug sessions, TP_printk() be used. In fact, if we go to
TP_FORMAT, I'll just make TP_printk no longer show up in the user format.

Then TP_printk() can be used for quick hacks, but if you want something 
merged, it would need to be added to the tag format.

> 
> I agree that the new tagged format is superior, but I'm wondering
> whether it really makes sense to try to scramble and try to switch my
> ext4/jbd2 users in the 36 hours or so before Linus opens the merge
> window....

Relax, we already decided this is .32 material ;-)

-- Steve


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-10 16:03         ` Theodore Tso
  2009-06-10 16:17           ` Steven Rostedt
@ 2009-06-11 13:03           ` Christoph Hellwig
  2009-06-11 15:47             ` Theodore Tso
  1 sibling, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2009-06-11 13:03 UTC (permalink / raw)
  To: Theodore Tso, Steven Rostedt, Fr?d?ric Weisbecker, Ingo Molnar,
	linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
	Christoph Hellwig, Rik van Riel, Pekka Enberg, Peter Zijlstra,
	Mathieu Desnoyers, Lai Jiangshan, Zhaolei, KOSAKI Motohiro,
	Jason Baron, Jiaying Zhang, Tom Zanussi, Xiao Guangrong

On Wed, Jun 10, 2009 at 12:03:03PM -0400, Theodore Tso wrote:
> Yes, another approach for handling this case would be to take my
> "jbd2_dev_to_name" function and support it as a first-class tagged
> type; after all, I'm sure ext4 won't be the only place that would like
> to take a dev_t and print the device name.  So this could certainly be
> fixed by adding some kind of "<dev:xxx>" sort of tagged name.

We should absolutely agree on one way to represent block device for
tracers.  At this point xfs, gfs2 and the block tracer all use
major/minor.  I agree that having a name would be nicer for text output,
but I really want to keep the dev_t for the binary output.  We should
also make sure we agree on a field name so that filtering will work
the same for all block-related tracers (especially important if we want
to trace something from the filesystem down to the block layer).


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-11 13:03           ` Christoph Hellwig
@ 2009-06-11 15:47             ` Theodore Tso
  2009-06-11 17:14               ` Frederic Weisbecker
  0 siblings, 1 reply; 30+ messages in thread
From: Theodore Tso @ 2009-06-11 15:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Steven Rostedt, Fr?d?ric Weisbecker, Ingo Molnar, linux-kernel,
	Andrew Morton, Minchan Kim, Mel Gorman, Rik van Riel,
	Pekka Enberg, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan,
	Zhaolei, KOSAKI Motohiro, Jason Baron, Jiaying Zhang,
	Tom Zanussi, Xiao Guangrong

On Thu, Jun 11, 2009 at 09:03:18AM -0400, Christoph Hellwig wrote:
> On Wed, Jun 10, 2009 at 12:03:03PM -0400, Theodore Tso wrote:
> > Yes, another approach for handling this case would be to take my
> > "jbd2_dev_to_name" function and support it as a first-class tagged
> > type; after all, I'm sure ext4 won't be the only place that would like
> > to take a dev_t and print the device name.  So this could certainly be
> > fixed by adding some kind of "<dev:xxx>" sort of tagged name.
> 
> We should absolutely agree on one way to represent block device for
> tracers.  At this point xfs, gfs2 and the block tracer all use
> major/minor.  I agree that having a name would be nicer for text output,
> but I really want to keep the dev_t for the binary output.  We should
> also make sure we agree on a field name so that filtering will work
> the same for all block-related tracers (especially important if we want
> to trace something from the filesystem down to the block layer).

And I want to keep the dev_t as what stored in the ring buffer for
space reasons; but the text name is far more convient in terms of
being human-readable, *especially* given there are some crazy people
out there wanting to make major/minor numbers be completely random.
The problem with that is that a log trace file might not be useful
after a reboot, unless someone had enough foresight to save the device
mappings which were in effect at the that the trace was taken.

So if we store the dev_t in the ring buffer, and have a way of
translating the dev_t into a human-friendly name at printk happen, I
think that meets both of our goals.  That would imply that filtering
would be using a device number, which would certainly be more
efficient than doing a string compare.

					- Ted


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-11 15:47             ` Theodore Tso
@ 2009-06-11 17:14               ` Frederic Weisbecker
  2009-06-11 19:20                 ` Theodore Tso
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2009-06-11 17:14 UTC (permalink / raw)
  To: Theodore Tso, Christoph Hellwig, Steven Rostedt, Ingo Molnar,
	linux-kernel, Andrew Morton, Minchan Kim, Mel Gorman,
	Rik van Riel, Pekka Enberg, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Zhaolei, KOSAKI Motohiro, Jason Baron,
	Jiaying Zhang, Tom Zanussi, Xiao Guangrong

On Thu, Jun 11, 2009 at 11:47:51AM -0400, Theodore Tso wrote:
> On Thu, Jun 11, 2009 at 09:03:18AM -0400, Christoph Hellwig wrote:
> > On Wed, Jun 10, 2009 at 12:03:03PM -0400, Theodore Tso wrote:
> > > Yes, another approach for handling this case would be to take my
> > > "jbd2_dev_to_name" function and support it as a first-class tagged
> > > type; after all, I'm sure ext4 won't be the only place that would like
> > > to take a dev_t and print the device name.  So this could certainly be
> > > fixed by adding some kind of "<dev:xxx>" sort of tagged name.
> > 
> > We should absolutely agree on one way to represent block device for
> > tracers.  At this point xfs, gfs2 and the block tracer all use
> > major/minor.  I agree that having a name would be nicer for text output,
> > but I really want to keep the dev_t for the binary output.  We should
> > also make sure we agree on a field name so that filtering will work
> > the same for all block-related tracers (especially important if we want
> > to trace something from the filesystem down to the block layer).
> 
> And I want to keep the dev_t as what stored in the ring buffer for
> space reasons; but the text name is far more convient in terms of
> being human-readable, *especially* given there are some crazy people
> out there wanting to make major/minor numbers be completely random.
> The problem with that is that a log trace file might not be useful
> after a reboot, unless someone had enough foresight to save the device
> mappings which were in effect at the that the trace was taken.
> 
> So if we store the dev_t in the ring buffer, and have a way of
> translating the dev_t into a human-friendly name at printk happen, I
> think that meets both of our goals.  That would imply that filtering
> would be using a device number, which would certainly be more
> efficient than doing a string compare.
> 
> 					- Ted


For the filters, we could enter the text name which would be
internally converted into a dev_t, there should be no problem.

Also the raw dev_t can be stored and then human-friendly printed on
print time.

Both seem about trivial to add.
 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] [GIT PULL] more updates for the tag format
  2009-06-11 17:14               ` Frederic Weisbecker
@ 2009-06-11 19:20                 ` Theodore Tso
  2009-06-19  8:14                   ` [BUG] bugs in jbd2_dev_to_name() (was Re: [PATCH 00/11] [GIT PULL] more updates for the tag format) Li Zefan
  0 siblings, 1 reply; 30+ messages in thread
From: Theodore Tso @ 2009-06-11 19:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Christoph Hellwig, Steven Rostedt, Ingo Molnar, linux-kernel,
	Andrew Morton, Minchan Kim, Mel Gorman, Rik van Riel,
	Pekka Enberg, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan,
	Zhaolei, KOSAKI Motohiro, Jason Baron, Jiaying Zhang,
	Tom Zanussi, Xiao Guangrong

On Thu, Jun 11, 2009 at 07:14:36PM +0200, Frederic Weisbecker wrote:
> 
> For the filters, we could enter the text name which would be
> internally converted into a dev_t, there should be no problem.
> 
> Also the raw dev_t can be stored and then human-friendly printed on
> print time.
> 
> Both seem about trivial to add.

If someone wants to take this code and drop it into the core tracing
code, please feel free.

						- Ted

/* 
 * jbd2_dev_to_name is a utility function used by the jbd2 and ext4 
 * tracing infrastructure to map a dev_t to a device name.
 *
 * The caller should use rcu_read_lock() in order to make sure the
 * device name stays valid until its done with it.  We use
 * rcu_read_lock() as well to make sure we're safe in case the caller
 * gets sloppy, and because rcu_read_lock() is cheap and can be safely
 * nested.
 */
struct devname_cache {
	struct rcu_head	rcu;
	dev_t		device;
	char		devname[BDEVNAME_SIZE];
};
#define CACHE_SIZE_BITS 6
static struct devname_cache *devcache[1 << CACHE_SIZE_BITS];
static DEFINE_SPINLOCK(devname_cache_lock);

static void free_devcache(struct rcu_head *rcu)
{
	kfree(rcu);
}

const char *jbd2_dev_to_name(dev_t device)
{
	int	i = hash_32(device, CACHE_SIZE_BITS);
	char	*ret;
	struct block_device *bd;

	rcu_read_lock();
	if (devcache[i] && devcache[i]->device == device) {
		ret = devcache[i]->devname;
		rcu_read_unlock();
		return ret;
	}
	rcu_read_unlock();

	spin_lock(&devname_cache_lock);
	if (devcache[i]) {
		if (devcache[i]->device == device) {
			ret = devcache[i]->devname;
			spin_unlock(&devname_cache_lock);
			return ret;
		}
		call_rcu(&devcache[i]->rcu, free_devcache);
	}
	devcache[i] = kmalloc(sizeof(struct devname_cache), GFP_KERNEL);
	if (!devcache[i]) {
		spin_unlock(&devname_cache_lock);
		return "NODEV-ALLOCFAILURE"; /* Something non-NULL */
	}
	devcache[i]->device = device;
	bd = bdget(device);
	if (bd) {
		bdevname(bd, devcache[i]->devname);
		bdput(bd);
	} else
		__bdevname(device, devcache[i]->devname);
	ret = devcache[i]->devname;
	spin_unlock(&devname_cache_lock);
	return ret;
}
EXPORT_SYMBOL(jbd2_dev_to_name);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [BUG] bugs in jbd2_dev_to_name() (was Re: [PATCH 00/11] [GIT PULL] more updates for the tag format)
  2009-06-11 19:20                 ` Theodore Tso
@ 2009-06-19  8:14                   ` Li Zefan
  2009-06-19 12:32                     ` Theodore Tso
  0 siblings, 1 reply; 30+ messages in thread
From: Li Zefan @ 2009-06-19  8:14 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frederic Weisbecker, Christoph Hellwig, Steven Rostedt,
	Ingo Molnar, linux-kernel, Andrew Morton, linux-ext4

Theodore Tso wrote:
> On Thu, Jun 11, 2009 at 07:14:36PM +0200, Frederic Weisbecker wrote:
>> For the filters, we could enter the text name which would be
>> internally converted into a dev_t, there should be no problem.
>>
>> Also the raw dev_t can be stored and then human-friendly printed on
>> print time.
>>
>> Both seem about trivial to add.
> 
> If someone wants to take this code and drop it into the core tracing
> code, please feel free.
> 

Just notice this code has been merge, but there are 2 bugs in it.

> 						- Ted
> 
> /* 
>  * jbd2_dev_to_name is a utility function used by the jbd2 and ext4 
>  * tracing infrastructure to map a dev_t to a device name.
>  *
>  * The caller should use rcu_read_lock() in order to make sure the
>  * device name stays valid until its done with it.  We use
>  * rcu_read_lock() as well to make sure we're safe in case the caller
>  * gets sloppy, and because rcu_read_lock() is cheap and can be safely
>  * nested.
>  */
> struct devname_cache {
> 	struct rcu_head	rcu;
> 	dev_t		device;
> 	char		devname[BDEVNAME_SIZE];
> };
> #define CACHE_SIZE_BITS 6
> static struct devname_cache *devcache[1 << CACHE_SIZE_BITS];
> static DEFINE_SPINLOCK(devname_cache_lock);
> 
> static void free_devcache(struct rcu_head *rcu)
> {
> 	kfree(rcu);
> }
> 
> const char *jbd2_dev_to_name(dev_t device)
> {
> 	int	i = hash_32(device, CACHE_SIZE_BITS);
> 	char	*ret;
> 	struct block_device *bd;
> 
> 	rcu_read_lock();
> 	if (devcache[i] && devcache[i]->device == device) {
> 		ret = devcache[i]->devname;
> 		rcu_read_unlock();
> 		return ret;

It doesn't seem safe to dereference @ret outside rcu read section.

> 	}
> 	rcu_read_unlock();
> 
> 	spin_lock(&devname_cache_lock);
> 	if (devcache[i]) {
> 		if (devcache[i]->device == device) {
> 			ret = devcache[i]->devname;
> 			spin_unlock(&devname_cache_lock);
> 			return ret;
> 		}
> 		call_rcu(&devcache[i]->rcu, free_devcache);
> 	}
> 	devcache[i] = kmalloc(sizeof(struct devname_cache), GFP_KERNEL);

kmalloc(GFP_KERNEL) called with spin_lock held..

> 	if (!devcache[i]) {
> 		spin_unlock(&devname_cache_lock);
> 		return "NODEV-ALLOCFAILURE"; /* Something non-NULL */
> 	}
> 	devcache[i]->device = device;
> 	bd = bdget(device);
> 	if (bd) {
> 		bdevname(bd, devcache[i]->devname);
> 		bdput(bd);
> 	} else
> 		__bdevname(device, devcache[i]->devname);
> 	ret = devcache[i]->devname;
> 	spin_unlock(&devname_cache_lock);
> 	return ret;
> }
> EXPORT_SYMBOL(jbd2_dev_to_name);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] bugs in jbd2_dev_to_name() (was Re: [PATCH 00/11] [GIT PULL] more updates for the tag format)
  2009-06-19  8:14                   ` [BUG] bugs in jbd2_dev_to_name() (was Re: [PATCH 00/11] [GIT PULL] more updates for the tag format) Li Zefan
@ 2009-06-19 12:32                     ` Theodore Tso
  2009-06-22  1:36                         ` Li Zefan
  0 siblings, 1 reply; 30+ messages in thread
From: Theodore Tso @ 2009-06-19 12:32 UTC (permalink / raw)
  To: Li Zefan
  Cc: Frederic Weisbecker, Christoph Hellwig, Steven Rostedt,
	Ingo Molnar, linux-kernel, Andrew Morton, linux-ext4

On Fri, Jun 19, 2009 at 04:14:23PM +0800, Li Zefan wrote:
> > 	rcu_read_lock();
> > 	if (devcache[i] && devcache[i]->device == device) {
> > 		ret = devcache[i]->devname;
> > 		rcu_read_unlock();
> > 		return ret;
> 
> It doesn't seem safe to dereference @ret outside rcu read section.

Note the comments at the beginning of the function:

    The caller should use rcu_read_lock() in order to make sure the
    device name stays valid until its done with it.  We use
    rcu_read_lock() as well to make sure we're safe in case the caller
    gets sloppy, and because rcu_read_lock() is cheap and can be safely
    nested.

I suppose I should change the wording to indicate that it adds a bit
more safety (as in, the crash won't happen inside this function, but
as far as the caller is concerned, all bets are off!)

> > 	spin_lock(&devname_cache_lock);
> > 	if (devcache[i]) {
> > 		if (devcache[i]->device == device) {
> > 			ret = devcache[i]->devname;
> > 			spin_unlock(&devname_cache_lock);
> > 			return ret;
> > 		}
> > 		call_rcu(&devcache[i]->rcu, free_devcache);
> > 	}
> > 	devcache[i] = kmalloc(sizeof(struct devname_cache), GFP_KERNEL);
> 
> kmalloc(GFP_KERNEL) called with spin_lock held..

Good catch, thanks.  I'll get a patch in to fix this.

							- Ted

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] bugs in jbd2_dev_to_name() (was Re: [PATCH 00/11] [GIT PULL] more updates for the tag format)
  2009-06-19 12:32                     ` Theodore Tso
@ 2009-06-22  1:36                         ` Li Zefan
  0 siblings, 0 replies; 30+ messages in thread
From: Li Zefan @ 2009-06-22  1:36 UTC (permalink / raw)
  To: Theodore Tso, Li Zefan, Frederic Weisbecker, Christoph Hellwig,
	Steven Rostedt, Ingo Molnar, linux-kernel, Andrew Morton,
	linux-ext4

Theodore Tso wrote:
> On Fri, Jun 19, 2009 at 04:14:23PM +0800, Li Zefan wrote:
>>> 	rcu_read_lock();
>>> 	if (devcache[i] && devcache[i]->device == device) {
>>> 		ret = devcache[i]->devname;
>>> 		rcu_read_unlock();
>>> 		return ret;
>> It doesn't seem safe to dereference @ret outside rcu read section.
> 
> Note the comments at the beginning of the function:
> 

Ah, I overlooked the comments.

But the patch that adds rcu locking around trace event prints
never gets merged:

http://lkml.org/lkml/2009/4/15/549

Steven?

>     The caller should use rcu_read_lock() in order to make sure the
>     device name stays valid until its done with it.  We use
>     rcu_read_lock() as well to make sure we're safe in case the caller
>     gets sloppy, and because rcu_read_lock() is cheap and can be safely
>     nested.
> 
> I suppose I should change the wording to indicate that it adds a bit
> more safety (as in, the crash won't happen inside this function, but
> as far as the caller is concerned, all bets are off!)
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [BUG] bugs in jbd2_dev_to_name() (was Re: [PATCH 00/11] [GIT PULL] more updates for the tag format)
@ 2009-06-22  1:36                         ` Li Zefan
  0 siblings, 0 replies; 30+ messages in thread
From: Li Zefan @ 2009-06-22  1:36 UTC (permalink / raw)
  To: Theodore Tso, Li Zefan, Frederic Weisbecker, Christoph Hellwig,
	Steven Rostedt, I

Theodore Tso wrote:
> On Fri, Jun 19, 2009 at 04:14:23PM +0800, Li Zefan wrote:
>>> 	rcu_read_lock();
>>> 	if (devcache[i] && devcache[i]->device == device) {
>>> 		ret = devcache[i]->devname;
>>> 		rcu_read_unlock();
>>> 		return ret;
>> It doesn't seem safe to dereference @ret outside rcu read section.
> 
> Note the comments at the beginning of the function:
> 

Ah, I overlooked the comments.

But the patch that adds rcu locking around trace event prints
never gets merged:

http://lkml.org/lkml/2009/4/15/549

Steven?

>     The caller should use rcu_read_lock() in order to make sure the
>     device name stays valid until its done with it.  We use
>     rcu_read_lock() as well to make sure we're safe in case the caller
>     gets sloppy, and because rcu_read_lock() is cheap and can be safely
>     nested.
> 
> I suppose I should change the wording to indicate that it adds a bit
> more safety (as in, the crash won't happen inside this function, but
> as far as the caller is concerned, all bets are off!)
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-06-22  1:35 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-10  5:42 [PATCH 00/11] [GIT PULL] more updates for the tag format Steven Rostedt
2009-06-10  5:42 ` [PATCH 01/11] tracing/events: convert block trace points to TRACE_EVENT(), fix Steven Rostedt
2009-06-10  5:42 ` [PATCH 02/11] tracing: add nsec2sec print formats Steven Rostedt
2009-06-10  5:42 ` [PATCH 03/11] tracing: convert lockdep lock_acquired trace point to use nsec2usec tag Steven Rostedt
2009-06-10  5:42 ` [PATCH 04/11] tracing: add major and minor tags for print format Steven Rostedt
2009-06-10  5:42 ` [PATCH 05/11] tracing: use << to print < instead of \< Steven Rostedt
2009-06-10  5:42 ` [PATCH 06/11] tracing: convert the block trace points to use the new tag format Steven Rostedt
2009-06-10  5:42 ` [PATCH 07/11] tracing: add test for strings in event " Steven Rostedt
2009-06-10  5:42 ` [PATCH 08/11] tracing: add func and symfunc to " Steven Rostedt
2009-06-10  7:48   ` Frederic Weisbecker
2009-06-10 12:55     ` Steven Rostedt
2009-06-10  5:42 ` [PATCH 09/11] tracing: check full name for field Steven Rostedt
2009-06-10  5:42 ` [PATCH 10/11] tracing: update sample code with new tag format Steven Rostedt
2009-06-10  5:42 ` [PATCH 11/11] tracing: move > to out of macros and into print statement Steven Rostedt
2009-06-10  9:26 ` [PATCH 00/11] [GIT PULL] more updates for the tag format Ingo Molnar
2009-06-10 11:11   ` Frédéric Weisbecker
2009-06-10 13:01     ` Theodore Tso
2009-06-10 13:49       ` Steven Rostedt
2009-06-10 14:39         ` Mathieu Desnoyers
2009-06-10 15:21           ` Steven Rostedt
2009-06-10 16:03         ` Theodore Tso
2009-06-10 16:17           ` Steven Rostedt
2009-06-11 13:03           ` Christoph Hellwig
2009-06-11 15:47             ` Theodore Tso
2009-06-11 17:14               ` Frederic Weisbecker
2009-06-11 19:20                 ` Theodore Tso
2009-06-19  8:14                   ` [BUG] bugs in jbd2_dev_to_name() (was Re: [PATCH 00/11] [GIT PULL] more updates for the tag format) Li Zefan
2009-06-19 12:32                     ` Theodore Tso
2009-06-22  1:36                       ` Li Zefan
2009-06-22  1:36                         ` Li Zefan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.