All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] vfs: Add a trace point in the mark_inode_dirty function
@ 2010-11-26 20:56 Arjan van de Ven
  2010-11-28 17:52 ` Christoph Hellwig
  0 siblings, 1 reply; 39+ messages in thread
From: Arjan van de Ven @ 2010-11-26 20:56 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Al Viro

long time overdue to clean this guy up;
this patch adds a very basic trace point for tracking who dirties an inode
(this is the guy who is the real cause for a disk to spin up, so PowerTOP would
like to know this...)

compared to earlier patches the trace point is done a lot simpler, and consistent with how others
(ext4 etc) do their vfs level tracepoints; with a device number and inode number


>From 880a17e27f0b90d32c0c0b9cf4cd9b4e5d779c73 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Fri, 26 Nov 2010 12:18:03 -0800
Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

PowerTOP would like to be able to show who is keeping the disk
busy by dirtying data. The most logical spot for this is in the vfs
in the mark_inode_dirty() function, doing this on the block level
is not possible because by the time the IO hits the block layer the
guilty party can no longer be found ("kjournald" and "pdflush" are not
useful answers to "who caused this file to be dirty).

The trace point follows the same logic/style as the block_dump code
and pretty much dumps the same data, just not to dmesg (and thus to
/var/log/messages) but via the trace events streams.

Eventually we should be able to phase out the block dump code, but that's
for later on after a transition time.

Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/fs-writeback.c          |    4 ++++
 fs/inode.c                 |    4 ++++
 include/trace/events/vfs.h |   35 +++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/vfs.h

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 3d06ccc..6c93517 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -27,6 +27,7 @@
 #include <linux/backing-dev.h>
 #include <linux/buffer_head.h>
 #include <linux/tracepoint.h>
+#include <trace/events/vfs.h>
 #include "internal.h"
 
 /*
@@ -942,6 +943,9 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 			sb->s_op->dirty_inode(inode);
 	}
 
+	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES))
+		trace_inode_dirty(inode);
+
 	/*
 	 * make sure that changes are seen by all cpus before we test i_state
 	 * -- mikulas
diff --git a/fs/inode.c b/fs/inode.c
index ae2727a..e8b1d42 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1708,3 +1708,7 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
 	inode->i_mode = mode;
 }
 EXPORT_SYMBOL(inode_init_owner);
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/vfs.h>
+
diff --git a/include/trace/events/vfs.h b/include/trace/events/vfs.h
new file mode 100644
index 0000000..7e5c135
--- /dev/null
+++ b/include/trace/events/vfs.h
@@ -0,0 +1,35 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vfs
+
+#if !defined(_TRACE_VFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_VFS_H
+
+/*
+ * Tracepoint for dirtying an inode:
+ */
+TRACE_EVENT(inode_dirty,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode),
+
+	TP_STRUCT__entry(
+		__field(	int,	dev_major		)
+		__field(	int,	dev_minor		)
+		__field(	ino_t,	ino			)
+	),
+
+	TP_fast_assign(
+		__entry->dev_major = MAJOR(inode->i_sb->s_dev);
+		__entry->dev_minor = MINOR(inode->i_sb->s_dev);
+		__entry->ino	   = inode->i_ino;
+	),
+
+	TP_printk("dev %d,%d ino %lu ", __entry->dev_major, __entry->dev_minor,
+		  (unsigned long) __entry->ino)
+);
+
+#endif /* _TRACE_VFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.7.2.3


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-26 20:56 [PATCH] vfs: Add a trace point in the mark_inode_dirty function Arjan van de Ven
@ 2010-11-28 17:52 ` Christoph Hellwig
  2010-11-28 18:43   ` Arjan van de Ven
  0 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2010-11-28 17:52 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-fsdevel, Christoph Hellwig, Al Viro

Looks generally good to me.

Two sugestions:

 - also trace the flags value and decode it using __print_flags in the
   output
 - use a dev_t instead of the split major/minor to follow the way the
   block layer, writeback and xfs tracepoints track the block device.
   Seems like this was copied from the recent conversion of ext4 away
   from it's braindead string printing, which didn't follow the existing
   way either.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-28 17:52 ` Christoph Hellwig
@ 2010-11-28 18:43   ` Arjan van de Ven
  2010-11-29  1:41     ` KOSAKI Motohiro
  2010-11-29  3:53     ` Nick Piggin
  0 siblings, 2 replies; 39+ messages in thread
From: Arjan van de Ven @ 2010-11-28 18:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, Al Viro

On Sun, 28 Nov 2010 12:52:56 -0500
Christoph Hellwig <hch@infradead.org> wrote:

> Looks generally good to me.
> 
> Two sugestions:
> 
>  - also trace the flags value and decode it using __print_flags in the
>    output
>  - use a dev_t instead of the split major/minor to follow the way the
>    block layer, writeback and xfs tracepoints track the block device.
>    Seems like this was copied from the recent conversion of ext4 away
>    from it's braindead string printing, which didn't follow the
> existing way either.
> 

ok how about this?

only question left is if we should trave every dirty, or only those that
change the flags.

Right now the patch only traces the non-trivial (eg changing) ones... but opinions welcome
(otherwise you get MANY duplicate, basically no-op trace events)


>From 3950d3c04a6bf8ccf9ff912a49bdd242a2fe9e47 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Fri, 26 Nov 2010 12:18:03 -0800
Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

PowerTOP would like to be able to show who is keeping the disk
busy by dirtying data. The most logical spot for this is in the vfs
in the mark_inode_dirty() function, doing this on the block level
is not possible because by the time the IO hits the block layer the
guilty party can no longer be found ("kjournald" and "pdflush" are not
useful answers to "who caused this file to be dirty).

The trace point follows the same logic/style as the block_dump code
and pretty much dumps the same data, just not to dmesg (and thus to
/var/log/messages) but via the trace events streams.

Eventually we should be able to phase out the block dump code, but that's
for later on after a transition time.

Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/fs-writeback.c                |    3 +++
 include/linux/fs.h               |   12 ++++++++++++
 include/trace/events/writeback.h |   28 ++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 3d06ccc..62e33cc 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -952,6 +952,9 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	if ((inode->i_state & flags) == flags)
 		return;
 
+	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES))
+		trace_writeback_inode_dirty(inode, flags);
+
 	if (unlikely(block_dump))
 		block_dump___mark_inode_dirty(inode);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c9e06cc..25935e1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1676,6 +1676,18 @@ struct super_operations {
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 
+#define INODE_DIRTY_FLAGS \
+	{ I_DIRTY_SYNC,		"DIRTY-SYNC" }, \
+	{ I_DIRTY_DATASYNC,	"DIRTY-DATASYNC" }, \
+	{ I_DIRTY_PAGES,	"DIRTY-PAGES" }, \
+	{ I_NEW,		"NEW" }, \
+	{ I_WILL_FREE,		"WILL-FREE" }, \
+	{ I_FREEING,		"FREEING" }, \
+	{ I_CLEAR,		"CLEAR" }, \
+	{ I_SYNC,		"SYNC" }, \
+	{ I_REFERENCED,		"REFERENCED" }
+
+
 extern void __mark_inode_dirty(struct inode *, int);
 static inline void mark_inode_dirty(struct inode *inode)
 {
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 89a2b2d..5c80875 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -186,6 +186,34 @@ DEFINE_EVENT(writeback_congest_waited_template, writeback_wait_iff_congested,
 	TP_ARGS(usec_timeout, usec_delayed)
 );
 
+/*
+ * Tracepoint for dirtying an inode; used by PowerTOP
+ */
+TRACE_EVENT(writeback_inode_dirty,
+
+	TP_PROTO(struct inode *inode, int flags),
+
+	TP_ARGS(inode, flags),
+
+	TP_STRUCT__entry(
+		__field(	__kernel_dev_t,	dev		)
+		__field(	ino_t,		ino		)
+		__field(	u32,		flags		)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->flags  = flags;
+	),
+
+	TP_printk("dev %d:%d ino %lu flags %d %s", MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino,
+		  __entry->flags,
+		  __print_flags(__entry->flags, "|", INODE_DIRTY_FLAGS)
+	)
+);
+
 #endif /* _TRACE_WRITEBACK_H */
 
 /* This part must be outside protection */
-- 
1.7.2.3



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-28 18:43   ` Arjan van de Ven
@ 2010-11-29  1:41     ` KOSAKI Motohiro
  2010-11-29  4:54       ` Arjan van de Ven
  2010-11-29  3:53     ` Nick Piggin
  1 sibling, 1 reply; 39+ messages in thread
From: KOSAKI Motohiro @ 2010-11-29  1:41 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: kosaki.motohiro, Christoph Hellwig, linux-fsdevel, Al Viro

> Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  fs/fs-writeback.c                |    3 +++
>  include/linux/fs.h               |   12 ++++++++++++
>  include/trace/events/writeback.h |   28 ++++++++++++++++++++++++++++
>  3 files changed, 43 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 3d06ccc..62e33cc 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -952,6 +952,9 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  	if ((inode->i_state & flags) == flags)
>  		return;
>  
> +	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES))
> +		trace_writeback_inode_dirty(inode, flags);
> +

Why can't we move this branch into TP_fast_assign()?




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-28 18:43   ` Arjan van de Ven
  2010-11-29  1:41     ` KOSAKI Motohiro
@ 2010-11-29  3:53     ` Nick Piggin
  1 sibling, 0 replies; 39+ messages in thread
From: Nick Piggin @ 2010-11-29  3:53 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Christoph Hellwig, linux-fsdevel, Al Viro

On Mon, Nov 29, 2010 at 5:43 AM, Arjan van de Ven <arjan@infradead.org> wrote:
> On Sun, 28 Nov 2010 12:52:56 -0500
> Christoph Hellwig <hch@infradead.org> wrote:
>
>> Looks generally good to me.
>>
>> Two sugestions:
>>
>>  - also trace the flags value and decode it using __print_flags in the
>>    output
>>  - use a dev_t instead of the split major/minor to follow the way the
>>    block layer, writeback and xfs tracepoints track the block device.
>>    Seems like this was copied from the recent conversion of ext4 away
>>    from it's braindead string printing, which didn't follow the
>> existing way either.
>>
>
> ok how about this?
>
> only question left is if we should trave every dirty, or only those that
> change the flags.
>
> Right now the patch only traces the non-trivial (eg changing) ones... but opinions welcome
> (otherwise you get MANY duplicate, basically no-op trace events)
>
>
> From 3950d3c04a6bf8ccf9ff912a49bdd242a2fe9e47 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Fri, 26 Nov 2010 12:18:03 -0800
> Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
>
> PowerTOP would like to be able to show who is keeping the disk
> busy by dirtying data. The most logical spot for this is in the vfs
> in the mark_inode_dirty() function, doing this on the block level
> is not possible because by the time the IO hits the block layer the
> guilty party can no longer be found ("kjournald" and "pdflush" are not
> useful answers to "who caused this file to be dirty).
>
> The trace point follows the same logic/style as the block_dump code
> and pretty much dumps the same data, just not to dmesg (and thus to
> /var/log/messages) but via the trace events streams.
>
> Eventually we should be able to phase out the block dump code, but that's
> for later on after a transition time.
>
> Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  fs/fs-writeback.c                |    3 +++
>  include/linux/fs.h               |   12 ++++++++++++
>  include/trace/events/writeback.h |   28 ++++++++++++++++++++++++++++
>  3 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 3d06ccc..62e33cc 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -952,6 +952,9 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>        if ((inode->i_state & flags) == flags)
>                return;
>
> +       if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES))
> +               trace_writeback_inode_dirty(inode, flags);
> +
>        if (unlikely(block_dump))
>                block_dump___mark_inode_dirty(inode);
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c9e06cc..25935e1 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1676,6 +1676,18 @@ struct super_operations {
>
>  #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
>
> +#define INODE_DIRTY_FLAGS \
> +       { I_DIRTY_SYNC,         "DIRTY-SYNC" }, \
> +       { I_DIRTY_DATASYNC,     "DIRTY-DATASYNC" }, \
> +       { I_DIRTY_PAGES,        "DIRTY-PAGES" }, \
> +       { I_NEW,                "NEW" }, \
> +       { I_WILL_FREE,          "WILL-FREE" }, \
> +       { I_FREEING,            "FREEING" }, \
> +       { I_CLEAR,              "CLEAR" }, \
> +       { I_SYNC,               "SYNC" }, \
> +       { I_REFERENCED,         "REFERENCED" }

Can you change these to exactly the same name as the flags ("I_DIRTY_SYNC",..).
Arbitrarily changing them a little bit is unnecessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-29  1:41     ` KOSAKI Motohiro
@ 2010-11-29  4:54       ` Arjan van de Ven
  2010-11-29  5:15         ` KOSAKI Motohiro
  0 siblings, 1 reply; 39+ messages in thread
From: Arjan van de Ven @ 2010-11-29  4:54 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Christoph Hellwig, linux-fsdevel, Al Viro

On Mon, 29 Nov 2010 10:41:51 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
> > ---
> >  fs/fs-writeback.c                |    3 +++
> >  include/linux/fs.h               |   12 ++++++++++++
> >  include/trace/events/writeback.h |   28
> > ++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+), 0
> > deletions(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 3d06ccc..62e33cc 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -952,6 +952,9 @@ void __mark_inode_dirty(struct inode *inode,
> > int flags) if ((inode->i_state & flags) == flags)
> >  		return;
> >  
> > +	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC |
> > I_DIRTY_PAGES))
> > +		trace_writeback_inode_dirty(inode, flags);
> > +
> 
> Why can't we move this branch into TP_fast_assign()?

not really because then the tracepoint is already in process of being
emitted... no way to retract it anymore.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-29  4:54       ` Arjan van de Ven
@ 2010-11-29  5:15         ` KOSAKI Motohiro
  2010-11-29 14:21           ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: KOSAKI Motohiro @ 2010-11-29  5:15 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: kosaki.motohiro, Christoph Hellwig, linux-fsdevel, Al Viro,
	Steven Rostedt

> On Mon, 29 Nov 2010 10:41:51 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > > Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
> > > ---
> > >  fs/fs-writeback.c                |    3 +++
> > >  include/linux/fs.h               |   12 ++++++++++++
> > >  include/trace/events/writeback.h |   28
> > > ++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+), 0
> > > deletions(-)
> > > 
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index 3d06ccc..62e33cc 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -952,6 +952,9 @@ void __mark_inode_dirty(struct inode *inode,
> > > int flags) if ((inode->i_state & flags) == flags)
> > >  		return;
> > >  
> > > +	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC |
> > > I_DIRTY_PAGES))
> > > +		trace_writeback_inode_dirty(inode, flags);
> > > +
> > 
> > Why can't we move this branch into TP_fast_assign()?
> 
> not really because then the tracepoint is already in process of being
> emitted... no way to retract it anymore.

I'm not tracing expert. but Steven said we can it in past. (cc him)

http://www.spinics.net/lists/linux-ext4/msg20045.html





^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-29  5:15         ` KOSAKI Motohiro
@ 2010-11-29 14:21           ` Steven Rostedt
  2010-11-29 14:41             ` Mathieu Desnoyers
  2010-11-30  0:37             ` KOSAKI Motohiro
  0 siblings, 2 replies; 39+ messages in thread
From: Steven Rostedt @ 2010-11-29 14:21 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Arjan van de Ven, Christoph Hellwig, linux-fsdevel, Al Viro,
	Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers

[ Adding Ingo, Peter and Mathieu ]

On Mon, 2010-11-29 at 14:15 +0900, KOSAKI Motohiro wrote:
> > On Mon, 29 Nov 2010 10:41:51 +0900 (JST)
> > KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > 
> > > > Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
> > > > ---
> > > >  fs/fs-writeback.c                |    3 +++
> > > >  include/linux/fs.h               |   12 ++++++++++++
> > > >  include/trace/events/writeback.h |   28
> > > > ++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+), 0
> > > > deletions(-)
> > > > 
> > > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > > index 3d06ccc..62e33cc 100644
> > > > --- a/fs/fs-writeback.c
> > > > +++ b/fs/fs-writeback.c
> > > > @@ -952,6 +952,9 @@ void __mark_inode_dirty(struct inode *inode,
> > > > int flags) if ((inode->i_state & flags) == flags)
> > > >  		return;
> > > >  
> > > > +	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC |
> > > > I_DIRTY_PAGES))
> > > > +		trace_writeback_inode_dirty(inode, flags);
> > > > +
> > > 
> > > Why can't we move this branch into TP_fast_assign()?
> > 
> > not really because then the tracepoint is already in process of being
> > emitted... no way to retract it anymore.
> 
> I'm not tracing expert. but Steven said we can it in past. (cc him)
> 
> http://www.spinics.net/lists/linux-ext4/msg20045.html
> 

Yes, this came up before. Currently, if you add the if statement in the
trcacepoint, you just wasted space. But if we can add a discard_entry,
or better yet, I can add a TRACE_EVENT_CONDITION() that adds an if
statement to check. Something like:

	TRACE_EVENT_CONDITION(name,
		[...]
	TP_condition(
			if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC |
			    I_DIRTY_PAGES))
				return 1;
			else
				return 0;
		    ),

Have this run on the parameters and not the entry fields (because the
entry fields are from the ring buffer, and to use them, we must first
write to the ring buffer (something we want to avoid).

We could also make this code work with "pre-filters". That is, filters
on the tracepoints that access the parameters and not the final entries.
This would let us circumvent allocating ring buffer space when the
filter states to skip the entry.

-- Steve

			



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-29 14:21           ` Steven Rostedt
@ 2010-11-29 14:41             ` Mathieu Desnoyers
  2010-11-29 16:31               ` Steven Rostedt
  2010-11-30  0:37             ` KOSAKI Motohiro
  1 sibling, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2010-11-29 14:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: KOSAKI Motohiro, Arjan van de Ven, Christoph Hellwig,
	linux-fsdevel, Al Viro, Ingo Molnar, Peter Zijlstra

* Steven Rostedt (rostedt@goodmis.org) wrote:
> [ Adding Ingo, Peter and Mathieu ]
> 
> On Mon, 2010-11-29 at 14:15 +0900, KOSAKI Motohiro wrote:
> > > On Mon, 29 Nov 2010 10:41:51 +0900 (JST)
> > > KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > 
> > > > > Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
> > > > > ---
> > > > >  fs/fs-writeback.c                |    3 +++
> > > > >  include/linux/fs.h               |   12 ++++++++++++
> > > > >  include/trace/events/writeback.h |   28
> > > > > ++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+), 0
> > > > > deletions(-)
> > > > > 
> > > > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > > > index 3d06ccc..62e33cc 100644
> > > > > --- a/fs/fs-writeback.c
> > > > > +++ b/fs/fs-writeback.c
> > > > > @@ -952,6 +952,9 @@ void __mark_inode_dirty(struct inode *inode,
> > > > > int flags) if ((inode->i_state & flags) == flags)
> > > > >  		return;
> > > > >  
> > > > > +	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC |
> > > > > I_DIRTY_PAGES))
> > > > > +		trace_writeback_inode_dirty(inode, flags);
> > > > > +
> > > > 
> > > > Why can't we move this branch into TP_fast_assign()?
> > > 
> > > not really because then the tracepoint is already in process of being
> > > emitted... no way to retract it anymore.
> > 
> > I'm not tracing expert. but Steven said we can it in past. (cc him)
> > 
> > http://www.spinics.net/lists/linux-ext4/msg20045.html
> > 
> 
> Yes, this came up before. Currently, if you add the if statement in the
> trcacepoint, you just wasted space. But if we can add a discard_entry,
> or better yet, I can add a TRACE_EVENT_CONDITION() that adds an if
> statement to check. Something like:
> 
> 	TRACE_EVENT_CONDITION(name,
> 		[...]
> 	TP_condition(
> 			if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC |
> 			    I_DIRTY_PAGES))
> 				return 1;
> 			else
> 				return 0;
> 		    ),

Yep, something like this would be really useful for us too in user-space. One
example use-case is to let a telecom application filter by "call ID" when
following one call across multiple telecom switches. The kind of condition we
would have is:

 	TRACE_EVENT_CONDITION(name,
 		[...]
 	TP_condition(
 			if (unlikely(call_id == monitored_call_id))
 				return 1;
                        if (unlikely(!call_id_monitoring))
                                return 1;
 			return 0;
 		    ),

So in this case, "call_id" is received as parameter, but "monitored_call_id" and
"call_id_monitoring" are global variables.

One question that strikes me is: would you declare this outside of the
TRACE_EVENT() or inside it ? How would you match the TRACE_EVENT() and the
TRACE_EVENT_CONDITION() if they are separate, by name ?

The dynamic "pre-filters" will also be useful, but I think this kind of
statically defined conditions will answer a large amount of our requirements,
letting these filtering expressions be designed by application developers and
used by operators/engineers.

Thanks,

Mathieu

> 
> Have this run on the parameters and not the entry fields (because the
> entry fields are from the ring buffer, and to use them, we must first
> write to the ring buffer (something we want to avoid).
> 
> We could also make this code work with "pre-filters". That is, filters
> on the tracepoints that access the parameters and not the final entries.
> This would let us circumvent allocating ring buffer space when the
> filter states to skip the entry.
> 
> -- Steve
> 
> 			
> 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-29 14:41             ` Mathieu Desnoyers
@ 2010-11-29 16:31               ` Steven Rostedt
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2010-11-29 16:31 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: KOSAKI Motohiro, Arjan van de Ven, Christoph Hellwig,
	linux-fsdevel, Al Viro, Ingo Molnar, Peter Zijlstra

On Mon, 2010-11-29 at 09:41 -0500, Mathieu Desnoyers wrote:

> One question that strikes me is: would you declare this outside of the
> TRACE_EVENT() or inside it ? How would you match the TRACE_EVENT() and the
> TRACE_EVENT_CONDITION() if they are separate, by name ?
> 

TRACE_EVENT_CONDITION() would just add the condition field and
TRACE_EVENT() will just have a null condition:

#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
	TRACE_EVENT_CONDITION(name, proto, args, , tsturct, \
		assign, print)

where that ", ," is the condition expression. (I also left out the
PARAMS() that would be required here too.)

-- Steve



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2010-11-29 14:21           ` Steven Rostedt
  2010-11-29 14:41             ` Mathieu Desnoyers
@ 2010-11-30  0:37             ` KOSAKI Motohiro
  1 sibling, 0 replies; 39+ messages in thread
From: KOSAKI Motohiro @ 2010-11-30  0:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: kosaki.motohiro, Arjan van de Ven, Christoph Hellwig,
	linux-fsdevel, Al Viro, Ingo Molnar, Peter Zijlstra,
	Mathieu Desnoyers

> Yes, this came up before. Currently, if you add the if statement in the
> trcacepoint, you just wasted space. But if we can add a discard_entry,
> or better yet, I can add a TRACE_EVENT_CONDITION() that adds an if
> statement to check. Something like:
> 
> 	TRACE_EVENT_CONDITION(name,
> 		[...]
> 	TP_condition(
> 			if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC |
> 			    I_DIRTY_PAGES))
> 				return 1;
> 			else
> 				return 0;
> 		    ),

I love this idea. :)

For MM, struct page has a lot of union hack. then, page tracer strongly need conditional
trace point.


> 
> Have this run on the parameters and not the entry fields (because the
> entry fields are from the ring buffer, and to use them, we must first
> write to the ring buffer (something we want to avoid).
> 
> We could also make this code work with "pre-filters". That is, filters
> on the tracepoints that access the parameters and not the final entries.
> This would let us circumvent allocating ring buffer space when the
> filter states to skip the entry.
> 
> -- Steve
> 
> 			
> 
> 




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-20 16:05                   ` Jamie Lokier
@ 2009-11-20 16:45                     ` Arjan van de Ven
  0 siblings, 0 replies; 39+ messages in thread
From: Arjan van de Ven @ 2009-11-20 16:45 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Christoph Hellwig, Ingo Molnar, Kok, Auke, Frank Ch. Eigler,
	Jeff Garzik, Wu, Fengguang, linux-kernel, linux-fsdevel, Al Viro,
	Frederic Weisbecker

On Fri, 20 Nov 2009 16:05:26 +0000
Jamie Lokier <jamie@shareable.org> wrote:

> Arjan van de Ven wrote:
> > in my case it's not about finding the file, but finding the place in
> > the application that is doing the writing. The last pathname
> > component is more than enough for this....
> 
> So what you really need is the source file and line number in your
> application where it does the writing :-)

while you present this as a joke.... perf can more or less do this
using the backtrace infrastructure ...



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-20 14:45                 ` Arjan van de Ven
@ 2009-11-20 16:05                   ` Jamie Lokier
  2009-11-20 16:45                     ` Arjan van de Ven
  0 siblings, 1 reply; 39+ messages in thread
From: Jamie Lokier @ 2009-11-20 16:05 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Christoph Hellwig, Ingo Molnar, Kok, Auke, Frank Ch. Eigler,
	Jeff Garzik, Wu, Fengguang, linux-kernel, linux-fsdevel, Al Viro,
	Frederic Weisbecker

Arjan van de Ven wrote:
> in my case it's not about finding the file, but finding the place in
> the application that is doing the writing. The last pathname component
> is more than enough for this....

So what you really need is the source file and line number in your
application where it does the writing :-)

-- Jamie

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-20 10:43               ` Christoph Hellwig
  2009-11-20 10:51                 ` Ingo Molnar
@ 2009-11-20 14:45                 ` Arjan van de Ven
  2009-11-20 16:05                   ` Jamie Lokier
  1 sibling, 1 reply; 39+ messages in thread
From: Arjan van de Ven @ 2009-11-20 14:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Kok, Auke, Frank Ch. Eigler, Jeff Garzik, Wu,
	Fengguang, linux-kernel, linux-fsdevel, Christoph Hellwig,
	Al Viro, Frederic Weisbecker

On Fri, 20 Nov 2009 05:43:35 -0500
Christoph Hellwig <hch@infradead.org> wrote:

> 
> Which btw brings up another good argument - to make the tracing really
> useful we need to have conventions.  While the inode number seems to
> be a realtively easy one printing the device is more difficult.  XFS
> just prints the raw major/minor to stay simple, ext4 has a
> complicated ad-hoc cache of device names, and this one just prints
> the superblock id string.

I was just trying to stay compatible with blockdump, and it even makes
sense ;)

> 
> Of course for a user the name is a lot more meaninful, but also
> relatively expensive to generate.  Then again I'm not even sure how
> the last pathname component only here is all that useful - it can't
> be used to easily find the file.

in my case it's not about finding the file, but finding the place in
the application that is doing the writing. The last pathname component
is more than enough for this....

> 


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-20 10:43               ` Christoph Hellwig
@ 2009-11-20 10:51                 ` Ingo Molnar
  2009-11-20 14:45                 ` Arjan van de Ven
  1 sibling, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2009-11-20 10:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kok, Auke, Frank Ch. Eigler, Arjan van de Ven, Jeff Garzik, Wu,
	Fengguang, linux-kernel, linux-fsdevel, Al Viro,
	Frederic Weisbecker


* Christoph Hellwig <hch@infradead.org> wrote:

> Guys, I think both the inode number and name do have a use case.  For 
> file system developers observing the filesystem the inode number is 
> very useful, and if you look at the ext4 tracing already in tree or 
> the xfs tracing going in in the next window they use the inode number 
> all over.
> 
> Which btw brings up another good argument - to make the tracing really 
> useful we need to have conventions.  While the inode number seems to 
> be a realtively easy one printing the device is more difficult.  XFS 
> just prints the raw major/minor to stay simple, ext4 has a complicated 
> ad-hoc cache of device names, and this one just prints the superblock 
> id string.

Agreed.

> Of course for a user the name is a lot more meaninful, but also 
> relatively expensive to generate.  Then again I'm not even sure how 
> the last pathname component only here is all that useful - it can't be 
> used to easily find the file.

That's not the main point though - the point is for app developers (and 
users) being able to see 'oh, _that_ file it is, we need to fix that'. 
In the context of a specific app, the last component filename carries 
95% of the useful information.

Look at how PowerTOP does it, for a real-life usecase.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-12  7:22             ` Ingo Molnar
@ 2009-11-20 10:43               ` Christoph Hellwig
  2009-11-20 10:51                 ` Ingo Molnar
  2009-11-20 14:45                 ` Arjan van de Ven
  0 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2009-11-20 10:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kok, Auke, Frank Ch. Eigler, Arjan van de Ven, Jeff Garzik, Wu,
	Fengguang, linux-kernel, linux-fsdevel, Christoph Hellwig,
	Al Viro, Frederic Weisbecker

Guys, I think both the inode number and name do have a use case.  For
file system developers observing the filesystem the inode number is very
useful, and if you look at the ext4 tracing already in tree or the xfs
tracing going in in the next window they use the inode number all over.

Which btw brings up another good argument - to make the tracing really
useful we need to have conventions.  While the inode number seems to be
a realtively easy one printing the device is more difficult.  XFS just
prints the raw major/minor to stay simple, ext4 has a complicated ad-hoc
cache of device names, and this one just prints the superblock id
string.

Of course for a user the name is a lot more meaninful, but also
relatively expensive to generate.  Then again I'm not even sure how the
last pathname component only here is all that useful - it can't be used
to easily find the file.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-15 19:00   ` Arjan van de Ven
@ 2009-11-16  0:56     ` Li Zefan
  0 siblings, 0 replies; 39+ messages in thread
From: Li Zefan @ 2009-11-16  0:56 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

Arjan van de Ven wrote:
> On Wed, 11 Nov 2009 10:33:55 +0800
> Li Zefan <lizf@cn.fujitsu.com> wrote:
>> This will leave __entry->comm, __entry->file and __entry->dev
>> uninitialized in the "else" case..
> 
> I would expect the struct to be zeroed before.... is it?
> (the TP_ stuff is so obscure I find it hard to figure out where it ends
> up)
> 

No, it won't. Ring buffer won't zero the buffer before returning it,
and TP_ stuff won't zero the buffer after getting it.

See static void ftrace_raw_event_##call(proto) in include/event/ftrace.h


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  2:33 ` Li Zefan
@ 2009-11-15 19:00   ` Arjan van de Ven
  2009-11-16  0:56     ` Li Zefan
  0 siblings, 1 reply; 39+ messages in thread
From: Arjan van de Ven @ 2009-11-15 19:00 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

On Wed, 11 Nov 2009 10:33:55 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:
> 
> This will leave __entry->comm, __entry->file and __entry->dev
> uninitialized in the "else" case..

I would expect the struct to be zeroed before.... is it?
(the TP_ stuff is so obscure I find it hard to figure out where it ends
up)


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11 23:37           ` Kok, Auke
@ 2009-11-12  7:22             ` Ingo Molnar
  2009-11-20 10:43               ` Christoph Hellwig
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2009-11-12  7:22 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Frank Ch. Eigler, Arjan van de Ven, Jeff Garzik, Wu, Fengguang,
	linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro,
	Frederic Weisbecker


* Kok, Auke <auke-jan.h.kok@intel.com> wrote:

> If you already know what the file object is, sure. We're interested in 
> the case where we have no clue what the file object actually is to 
> begin with. Having a trace with a random inode number pop up and then 
> disappear into thin air won't help much at all, especially if we can't 
> map it back to something "real" on disk. in time.

Yep.

It's similar to PID/comm tracing, which we already do consistently for 
all major task events such as fork/exit, sleep/wakeup/context-switch, 
etc.

By the 'use inode numbers' argument it should be perfectly fine to only 
trace the physical PID itself, and look up the comm later in /proc, or 
to add a syscall to do it.

In reality it's not fine. Not just the unnecessary overhead (you have to 
look up something you already had) - but also that tasks will exit in 
high-freq workloads (so the comm is lost), the PID might not match up 
anymore, tasks can change their comm, etc.

The most important principle with event logging is that we want the most 
high quality information and we want to a trustable and simple data 
source: so for tasks we want the PID and the comm, and for files we want 
the top name component and perhaps also the inode number (plus a 
filesystem id), captured when the event happened.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11 18:29         ` Theodore Tso
  2009-11-11 18:56           ` Ingo Molnar
  2009-11-11 18:56           ` Ingo Molnar
@ 2009-11-12  2:15           ` Arjan van de Ven
  2 siblings, 0 replies; 39+ messages in thread
From: Arjan van de Ven @ 2009-11-12  2:15 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Ingo Molnar, Jeff Garzik, Wu Fengguang, linux-kernel,
	linux-fsdevel, Christoph Hellwig, Al Viro, Frederic Weisbecker,
	auke-jan.h.kok

On Wed, 11 Nov 2009 13:29:25 -0500
Theodore Tso <tytso@mit.edu> wrote:
 
> If you really want to avoid that, one relatively lightweight thing we
> could do, which would avoid needing to dump the entire pathname out,
> would be to print out the triple (devno, dir_ino, file_ino), and then
> provide a privileged syscall which translates this to a user-visible
> pathname.  It won't be necessarily the pathname which the user used to
> open the file (since there might be links, and bind mounts, et. al),
> but if the goal is to give one of the user-friendly names of the inode
> (as opposed to _the_ pathname used to open the file), it's quite
> sufficient.


here's the problem. PowerTOP wants to tell you "application FOO caused
the disk to spin up. BAD APPLICATION." And then give a suggestion as to
which file in question was involved, to give the developer a hint as to
what to fix. Oh and this has to be done 30 seconds after the actual
dirty happened of course (since clearly we don't want to wake a cpu up
or interfere with the running system).

What the patch does is perfectly fine for that. Your "solution" to get a
filename 30 seconds later with debugfs involves spinning up the disk...
... exactly the kind of thing this trace point wants to avoid in the
first place! 
(and for those who say "yeah but the file may have been renamed". I
don't care! I want to show the filename that the app dirtied! Not what
name that file is right now, 30 seconds later. At least, I want to show
enough so that the developer of the app knows how to fix his pile of
power-eating-dung.

Fixing an OS this way (and yes I have used this patch to fix Moblin and
other partner linux OSes) saves around 0.5 Watts or so. Quite worth it 
in terms of battery life.

I would like to turn this around: The current patch is sufficient for my
real-world usecase. I am still looking for the "other" mythical usecase
that would need an inode number instead.... lets put it this way: if
one of those comes around it's trivial to extend the trace data in an
ABI compatible way to include that. Until then.. it's just bloat.





-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11 23:10         ` Frank Ch. Eigler
@ 2009-11-11 23:37           ` Kok, Auke
  2009-11-12  7:22             ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Kok, Auke @ 2009-11-11 23:37 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Arjan van de Ven, Jeff Garzik, Wu, Fengguang, linux-kernel,
	linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker

Frank Ch. Eigler wrote:
> Arjan van de Ven <arjan@infradead.org> writes:
> 
>> [...]  ok let me rephrase that. What would a user DO with this inode
>> number information, given that the filename is already passed on;
>> what additional useful, not-infinitely-hard-to-get piece of
>> information [...]
> 
> Perhaps "infinitely hard" is the wrong criterion, but passing inode
> numbers lets a tracepoint client track changes to the same file, even
> though the file may be renamed/unlinked over time.

If you already know what the file object is, sure. We're interested in the case
where we have no clue what the file object actually is to begin with. Having a
trace with a random inode number pop up and then disappear into thin air won't
help much at all, especially if we can't map it back to something "real" on disk.
in time.

Auke


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11 16:19       ` Arjan van de Ven
@ 2009-11-11 23:10         ` Frank Ch. Eigler
  2009-11-11 23:37           ` Kok, Auke
  0 siblings, 1 reply; 39+ messages in thread
From: Frank Ch. Eigler @ 2009-11-11 23:10 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jeff Garzik, Wu Fengguang, linux-kernel, linux-fsdevel,
	Christoph Hellwig, Al Viro, mingo, Frederic Weisbecker,
	auke-jan.h.kok

Arjan van de Ven <arjan@infradead.org> writes:

> [...]  ok let me rephrase that. What would a user DO with this inode
> number information, given that the filename is already passed on;
> what additional useful, not-infinitely-hard-to-get piece of
> information [...]

Perhaps "infinitely hard" is the wrong criterion, but passing inode
numbers lets a tracepoint client track changes to the same file, even
though the file may be renamed/unlinked over time.

- FChE

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11 18:29         ` Theodore Tso
@ 2009-11-11 18:56           ` Ingo Molnar
  2009-11-11 18:56           ` Ingo Molnar
  2009-11-12  2:15           ` Arjan van de Ven
  2 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2009-11-11 18:56 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Arjan van de Ven, Wu Fengguang,
	linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro,
	Frederic Weisbecker, auke-jan.h.kok


* Theodore Tso <tytso@mit.edu> wrote:

> On Wed, Nov 11, 2009 at 08:45:42AM +0100, Ingo Molnar wrote:
> > Without an inode->vfs-name lookup/matching service it's of limited 
> > utility though to developers and users. So inode numbers are fine (as 
> > nicely unique physical identifiers)- as long as corresponding vfs name 
> > string is available too.
> 
> Inode numbers are quite usable for me; but I'm not afraid to do
> 
>       debugfs /dev/sdb -R "ncheck 12345"
> 
> :-)
> 
> If you really want to avoid that, one relatively lightweight thing we 
> could do, which would avoid needing to dump the entire pathname out, 
> would be to print out the triple (devno, dir_ino, file_ino), and then 
> provide a privileged syscall which translates this to a user-visible 
> pathname.  It won't be necessarily the pathname which the user used to 
> open the file (since there might be links, and bind mounts, et. al), 
> but if the goal is to give one of the user-friendly names of the inode 
> (as opposed to _the_ pathname used to open the file), it's quite 
> sufficient.

Hm, why add a new syscall to retrieve the name we already had when the 
event happened?

Also, why add a new syscall to retrieve something that might not exist 
anymore? (the VFS namespace is quite dynamic - post-processing to 
retrieve names is fundamentally racy)

What matters most for analysis is the 'name of the moment' - the thing 
that the app used at that point.

Arjan isnt doing this just randomly, he's one of the few people trying 
to speed up Linux booting - and this is the info he finds useful. We 
should give that information in a reasonable way, and the tracepoint he 
proposed looks pretty reasonable.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11 18:29         ` Theodore Tso
  2009-11-11 18:56           ` Ingo Molnar
@ 2009-11-11 18:56           ` Ingo Molnar
  2009-11-12  2:15           ` Arjan van de Ven
  2 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2009-11-11 18:56 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Arjan van de Ven, Wu Fengguang,
	linux-kernel, linux-fsdevel


* Theodore Tso <tytso@mit.edu> wrote:

> On Wed, Nov 11, 2009 at 08:45:42AM +0100, Ingo Molnar wrote:
> > Without an inode->vfs-name lookup/matching service it's of limited 
> > utility though to developers and users. So inode numbers are fine (as 
> > nicely unique physical identifiers)- as long as corresponding vfs name 
> > string is available too.
> 
> Inode numbers are quite usable for me; but I'm not afraid to do
> 
>       debugfs /dev/sdb -R "ncheck 12345"
> 
> :-)
> 
> If you really want to avoid that, one relatively lightweight thing we 
> could do, which would avoid needing to dump the entire pathname out, 
> would be to print out the triple (devno, dir_ino, file_ino), and then 
> provide a privileged syscall which translates this to a user-visible 
> pathname.  It won't be necessarily the pathname which the user used to 
> open the file (since there might be links, and bind mounts, et. al), 
> but if the goal is to give one of the user-friendly names of the inode 
> (as opposed to _the_ pathname used to open the file), it's quite 
> sufficient.

Hm, why add a new syscall to retrieve the name we already had when the 
event happened?

Also, why add a new syscall to retrieve something that might not exist 
anymore? (the VFS namespace is quite dynamic - post-processing to 
retrieve names is fundamentally racy)

What matters most for analysis is the 'name of the moment' - the thing 
that the app used at that point.

Arjan isnt doing this just randomly, he's one of the few people trying 
to speed up Linux booting - and this is the info he finds useful. We 
should give that information in a reasonable way, and the tracepoint he 
proposed looks pretty reasonable.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  7:45       ` Ingo Molnar
  2009-11-11  7:56         ` Jeff Garzik
  2009-11-11 17:27         ` Kok, Auke
@ 2009-11-11 18:29         ` Theodore Tso
  2009-11-11 18:56           ` Ingo Molnar
                             ` (2 more replies)
  2 siblings, 3 replies; 39+ messages in thread
From: Theodore Tso @ 2009-11-11 18:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeff Garzik, Arjan van de Ven, Wu Fengguang, linux-kernel,
	linux-fsdevel, Christoph Hellwig, Al Viro, Frederic Weisbecker,
	auke-jan.h.kok

On Wed, Nov 11, 2009 at 08:45:42AM +0100, Ingo Molnar wrote:
> Without an inode->vfs-name lookup/matching service it's of limited 
> utility though to developers and users. So inode numbers are fine (as 
> nicely unique physical identifiers)- as long as corresponding vfs name 
> string is available too.

Inode numbers are quite usable for me; but I'm not afraid to do

      debugfs /dev/sdb -R "ncheck 12345"

:-)

If you really want to avoid that, one relatively lightweight thing we
could do, which would avoid needing to dump the entire pathname out,
would be to print out the triple (devno, dir_ino, file_ino), and then
provide a privileged syscall which translates this to a user-visible
pathname.  It won't be necessarily the pathname which the user used to
open the file (since there might be links, and bind mounts, et. al),
but if the goal is to give one of the user-friendly names of the inode
(as opposed to _the_ pathname used to open the file), it's quite sufficient.

    	       	     	      	      - Ted

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  7:45       ` Ingo Molnar
  2009-11-11  7:56         ` Jeff Garzik
@ 2009-11-11 17:27         ` Kok, Auke
  2009-11-11 18:29         ` Theodore Tso
  2 siblings, 0 replies; 39+ messages in thread
From: Kok, Auke @ 2009-11-11 17:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeff Garzik, Arjan van de Ven, Wu, Fengguang, linux-kernel,
	linux-fsdevel, Christoph Hellwig, Al Viro, Frederic Weisbecker

Ingo Molnar wrote:
> * Jeff Garzik <jeff@garzik.org> wrote:
> 
>> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
>>> Wu Fengguang<fengguang.wu@intel.com>  wrote:
>>>> Maybe this is enough for POWERTOP, however for general use, the dirty
>>>> type(data/metadata) and inode number may be valuable to some users?
>>> what can a user do with an inode number????
>> Inode numbers have always been visible to userspace...  IIRC, tar(1) 
>> uses the st_ino member of struct stat to detect hard links in certain 
>> cases.  ls(1) displays inode numbers with -i, find(1) looks for them 
>> with -inum, ...
> 
> Without an inode->vfs-name lookup/matching service it's of limited 
> utility though to developers and users. So inode numbers are fine (as 
> nicely unique physical identifiers)- as long as corresponding vfs name 
> string is available too.

agreed, this is the second problem I've stumbled upon (readahead was the first)
that could use something like this.

you can't reliably use inode numbers unless you actually know all of them at the
time that the tracepoint is generated: they could be deleted, modified etc. An
inode number is nice, but the filename and blockdev info would be superior for
both problems we're trying to solve.

Auke



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  7:42     ` Jeff Garzik
  2009-11-11  7:45       ` Ingo Molnar
@ 2009-11-11 16:19       ` Arjan van de Ven
  2009-11-11 23:10         ` Frank Ch. Eigler
  1 sibling, 1 reply; 39+ messages in thread
From: Arjan van de Ven @ 2009-11-11 16:19 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Wu Fengguang, linux-kernel, linux-fsdevel, Christoph Hellwig,
	Al Viro, mingo, Frederic Weisbecker, auke-jan.h.kok

On Wed, 11 Nov 2009 02:42:39 -0500
Jeff Garzik <jeff@garzik.org> wrote:

> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> > Wu Fengguang<fengguang.wu@intel.com>  wrote:
> >> Maybe this is enough for POWERTOP, however for general use, the
> >> dirty type(data/metadata) and inode number may be valuable to some
> >> users?
> >
> > what can a user do with an inode number????
> 
> Inode numbers have always been visible to userspace...  IIRC, tar(1) 
> uses the st_ino member of struct stat to detect hard links in certain 
> cases.  ls(1) displays inode numbers with -i, find(1) looks for them 
> with -inum, ...

ok let me rephrase that. What would a user DO with this inode number
information, given that the filename is already passed on;
what additional useful, not-infinitely-hard-to-get piece of information
does this give the user that he can use to do <something> that he cannot
do with the current information ? E.g. what is that <something> ?



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  7:56         ` Jeff Garzik
@ 2009-11-11 11:15           ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2009-11-11 11:15 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Arjan van de Ven, Wu Fengguang, linux-kernel, linux-fsdevel,
	Christoph Hellwig, Al Viro, Frederic Weisbecker, auke-jan.h.kok


* Jeff Garzik <jeff@garzik.org> wrote:

> On 11/11/2009 02:45 AM, Ingo Molnar wrote:
> >
> >* Jeff Garzik<jeff@garzik.org>  wrote:
> >
> >>On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> >>>Wu Fengguang<fengguang.wu@intel.com>   wrote:
> >>>>Maybe this is enough for POWERTOP, however for general use, the dirty
> >>>>type(data/metadata) and inode number may be valuable to some users?
> >>>
> >>>what can a user do with an inode number????
> >>
> >>Inode numbers have always been visible to userspace...  IIRC, tar(1)
> >>uses the st_ino member of struct stat to detect hard links in certain
> >>cases.  ls(1) displays inode numbers with -i, find(1) looks for them
> >>with -inum, ...
> >
> >Without an inode->vfs-name lookup/matching service it's of limited
> >utility
> 
> Look in the quoted text for one such service...  :)

I'm not sure i understand your point - do you really suggest using 
find(1) to make traces readable?

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  7:45       ` Ingo Molnar
@ 2009-11-11  7:56         ` Jeff Garzik
  2009-11-11 11:15           ` Ingo Molnar
  2009-11-11 17:27         ` Kok, Auke
  2009-11-11 18:29         ` Theodore Tso
  2 siblings, 1 reply; 39+ messages in thread
From: Jeff Garzik @ 2009-11-11  7:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, Wu Fengguang, linux-kernel, linux-fsdevel,
	Christoph Hellwig, Al Viro, Frederic Weisbecker, auke-jan.h.kok

On 11/11/2009 02:45 AM, Ingo Molnar wrote:
>
> * Jeff Garzik<jeff@garzik.org>  wrote:
>
>> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
>>> Wu Fengguang<fengguang.wu@intel.com>   wrote:
>>>> Maybe this is enough for POWERTOP, however for general use, the dirty
>>>> type(data/metadata) and inode number may be valuable to some users?
>>>
>>> what can a user do with an inode number????
>>
>> Inode numbers have always been visible to userspace...  IIRC, tar(1)
>> uses the st_ino member of struct stat to detect hard links in certain
>> cases.  ls(1) displays inode numbers with -i, find(1) looks for them
>> with -inum, ...
>
> Without an inode->vfs-name lookup/matching service it's of limited
> utility

Look in the quoted text for one such service...  :)

	Jeff



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  7:42     ` Jeff Garzik
@ 2009-11-11  7:45       ` Ingo Molnar
  2009-11-11  7:56         ` Jeff Garzik
                           ` (2 more replies)
  2009-11-11 16:19       ` Arjan van de Ven
  1 sibling, 3 replies; 39+ messages in thread
From: Ingo Molnar @ 2009-11-11  7:45 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Arjan van de Ven, Wu Fengguang, linux-kernel, linux-fsdevel,
	Christoph Hellwig, Al Viro, Frederic Weisbecker, auke-jan.h.kok


* Jeff Garzik <jeff@garzik.org> wrote:

> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> >Wu Fengguang<fengguang.wu@intel.com>  wrote:
> >>Maybe this is enough for POWERTOP, however for general use, the dirty
> >>type(data/metadata) and inode number may be valuable to some users?
> >
> >what can a user do with an inode number????
> 
> Inode numbers have always been visible to userspace...  IIRC, tar(1) 
> uses the st_ino member of struct stat to detect hard links in certain 
> cases.  ls(1) displays inode numbers with -i, find(1) looks for them 
> with -inum, ...

Without an inode->vfs-name lookup/matching service it's of limited 
utility though to developers and users. So inode numbers are fine (as 
nicely unique physical identifiers)- as long as corresponding vfs name 
string is available too.

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  6:34   ` Arjan van de Ven
  2009-11-11  6:40     ` Wu Fengguang
@ 2009-11-11  7:42     ` Jeff Garzik
  2009-11-11  7:45       ` Ingo Molnar
  2009-11-11 16:19       ` Arjan van de Ven
  1 sibling, 2 replies; 39+ messages in thread
From: Jeff Garzik @ 2009-11-11  7:42 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Wu Fengguang, linux-kernel, linux-fsdevel, Christoph Hellwig,
	Al Viro, mingo, Frederic Weisbecker, auke-jan.h.kok

On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> Wu Fengguang<fengguang.wu@intel.com>  wrote:
>> Maybe this is enough for POWERTOP, however for general use, the dirty
>> type(data/metadata) and inode number may be valuable to some users?
>
> what can a user do with an inode number????

Inode numbers have always been visible to userspace...  IIRC, tar(1) 
uses the st_ino member of struct stat to detect hard links in certain 
cases.  ls(1) displays inode numbers with -i, find(1) looks for them 
with -inum, ...

	Jeff



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  6:34   ` Arjan van de Ven
@ 2009-11-11  6:40     ` Wu Fengguang
  2009-11-11  7:42     ` Jeff Garzik
  1 sibling, 0 replies; 39+ messages in thread
From: Wu Fengguang @ 2009-11-11  6:40 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, Kok, Auke-jan H

On Wed, Nov 11, 2009 at 02:34:56PM +0800, Arjan van de Ven wrote:
> On Wed, 11 Nov 2009 10:01:08 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Hi Arjan,
> > 
> > > +	TP_printk("task=%i (%s) file=%s dev=%s",
> > > +		__entry->pid, __entry->comm, __entry->file,
> > > __entry->dev)
> > 
> > Maybe this is enough for POWERTOP, however for general use, the dirty
> > type(data/metadata) and inode number may be valuable to some users?
> 
> what can a user do with an inode number????

Hmm, maybe to tell one file from another in the trace stream :)
And maybe to compare it with other data sources, like "ls -i".

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-11-11  2:01 ` Wu Fengguang
@ 2009-11-11  6:34   ` Arjan van de Ven
  2009-11-11  6:40     ` Wu Fengguang
  2009-11-11  7:42     ` Jeff Garzik
  0 siblings, 2 replies; 39+ messages in thread
From: Arjan van de Ven @ 2009-11-11  6:34 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

On Wed, 11 Nov 2009 10:01:08 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> Hi Arjan,
> 
> > +	TP_printk("task=%i (%s) file=%s dev=%s",
> > +		__entry->pid, __entry->comm, __entry->file,
> > __entry->dev)
> 
> Maybe this is enough for POWERTOP, however for general use, the dirty
> type(data/metadata) and inode number may be valuable to some users?

what can a user do with an inode number????

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-10-26  5:53 Arjan van de Ven
                   ` (2 preceding siblings ...)
  2009-11-11  2:01 ` Wu Fengguang
@ 2009-11-11  2:33 ` Li Zefan
  2009-11-15 19:00   ` Arjan van de Ven
  3 siblings, 1 reply; 39+ messages in thread
From: Li Zefan @ 2009-11-11  2:33 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

> +/*
> + * Tracepoint for dirtying an inode:
> + */
> +TRACE_EVENT(dirty_inode,
> +
> +	TP_PROTO(struct inode *inode, struct task_struct *task),
> +
> +	TP_ARGS(inode, task),
> +
> +	TP_STRUCT__entry(
> +		__array( char,	comm,	TASK_COMM_LEN	)
> +		__field( pid_t,	pid			)
> +		__array( char,  dev,    16		)
> +		__array( char,  file,   32		)
> +	),
> +
> +	TP_fast_assign(
> +		if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
> +                	struct dentry *dentry;
> +                	const char *name = "?";
> +
> +			dentry = d_find_alias(inode);
> +			if (dentry) {
> +				spin_lock(&dentry->d_lock);
> +				name = (const char *) dentry->d_name.name;
> +			}
> +
> +			memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
> +			__entry->pid = task->pid;
> +			strlcpy(__entry->file, name, 32);
> +			strlcpy(__entry->dev, inode->i_sb->s_id, 16);
> +
> +			if (dentry) {
> +				spin_unlock(&dentry->d_lock);
> +				dput(dentry);
> +			}
> +		}

This will leave __entry->comm, __entry->file and __entry->dev
uninitialized in the "else" case..

And is there any reason that we have to use __array() but can't
use __string()?

> +	),
> +
> +	TP_printk("task=%i (%s) file=%s dev=%s",
> +		__entry->pid, __entry->comm, __entry->file, __entry->dev)
> +);
> +
> +#endif /* _TRACE_VFS_H */

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-10-26  5:53 Arjan van de Ven
  2009-10-26  6:03 ` Andrew Morton
  2009-10-27 16:01 ` Jason Baron
@ 2009-11-11  2:01 ` Wu Fengguang
  2009-11-11  6:34   ` Arjan van de Ven
  2009-11-11  2:33 ` Li Zefan
  3 siblings, 1 reply; 39+ messages in thread
From: Wu Fengguang @ 2009-11-11  2:01 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

Hi Arjan,

> +	TP_printk("task=%i (%s) file=%s dev=%s",
> +		__entry->pid, __entry->comm, __entry->file, __entry->dev)

Maybe this is enough for POWERTOP, however for general use, the dirty
type(data/metadata) and inode number may be valuable to some users?

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-10-26  5:53 Arjan van de Ven
  2009-10-26  6:03 ` Andrew Morton
@ 2009-10-27 16:01 ` Jason Baron
  2009-11-11  2:01 ` Wu Fengguang
  2009-11-11  2:33 ` Li Zefan
  3 siblings, 0 replies; 39+ messages in thread
From: Jason Baron @ 2009-10-27 16:01 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

On Sun, Oct 25, 2009 at 10:53:42PM -0700, Arjan van de Ven wrote:
> diff --git a/include/trace/events/vfs.h b/include/trace/events/vfs.h
> new file mode 100644
> index 0000000..21cf9fb
> --- /dev/null
> +++ b/include/trace/events/vfs.h
> @@ -0,0 +1,53 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM vfs
> +
> +#if !defined(_TRACE_VFS_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_VFS_H
> +
> +/*
> + * Tracepoint for dirtying an inode:
> + */

hi,

I would like to see us add docbook style comments for each added trace
event. I've already started a docbook for this, and written comments for
the irq tracepoints, in include/trace/events/irq.h. The docbook can be
viewed at: http://www.kernel.org/doc/htmldocs/tracepoint/

I think there has been some talk that the documentation should be
included as part of the kernel, so that 'perf' can easily access this. I
don't see how these docbook style comments wouldn't help with that
effort as well. We already have scripts to parse these comments, and it
probably wouldn't be hard to extend them to create a binary version, if
needed. Further, as your above comment indicates, these types of
comments want to be included in the source code above the TRACE_EVENTS
macros.

The initial lkml postings were:

1) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02647.html
2) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02650.html
3) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02648.html
4) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02651.html

thanks,

-Jason


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-10-26  6:03 ` Andrew Morton
@ 2009-10-26  6:55   ` Arjan van de Ven
  0 siblings, 0 replies; 39+ messages in thread
From: Arjan van de Ven @ 2009-10-26  6:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

On Sun, 25 Oct 2009 23:03:21 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> > ...
> >
> > @@ -1071,6 +1072,8 @@ void __mark_inode_dirty(struct inode *inode,
> > int flags) if ((inode->i_state & flags) == flags)
> >  		return;
> >  
> > +	trace_dirty_inode(inode, current);
> > +
> >  	if (unlikely(block_dump))
> >  		block_dump___mark_inode_dirty(inode);
> >  
> 
> Doesn't powertop also want to know who is spinning up the disk via
> buffered reads, direct-io reads and direct-io writes?
> 
> That's why the block_dump hook in submit_bio() is there.

there is already an existing trace event for this in the block layer...
I was planning to use block_bio_queue for this.

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
  2009-10-26  5:53 Arjan van de Ven
@ 2009-10-26  6:03 ` Andrew Morton
  2009-10-26  6:55   ` Arjan van de Ven
  2009-10-27 16:01 ` Jason Baron
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Andrew Morton @ 2009-10-26  6:03 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

On Sun, 25 Oct 2009 22:53:42 -0700 Arjan van de Ven <arjan@infradead.org> wrote:

> >From b894af8a33bec621dd1a4126603a3ca372bf0643 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Sun, 25 Oct 2009 15:37:04 -0700
> Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
> 
> PowerTOP would like to be able to show who is keeping the disk
> busy by dirtying data. The most logical spot for this is in the vfs
> in the mark_inode_dirty() function. Doing this on the block level
> is not possible because by the time the IO hits the block layer the
> guilty party can no longer be found ("kjournald" and "pdflush" are not
> useful answers to "who caused this file to be dirty).
> 
> The trace point follows the same logic/style as the block_dump code
> and pretty much dumps the same data, just not to dmesg (and thus to
> /var/log/messages) but via the trace events streams.
> 
> ...
>
> @@ -1071,6 +1072,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  	if ((inode->i_state & flags) == flags)
>  		return;
>  
> +	trace_dirty_inode(inode, current);
> +
>  	if (unlikely(block_dump))
>  		block_dump___mark_inode_dirty(inode);
>  

Doesn't powertop also want to know who is spinning up the disk via
buffered reads, direct-io reads and direct-io writes?

That's why the block_dump hook in submit_bio() is there.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] vfs: Add a trace point in the mark_inode_dirty function
@ 2009-10-26  5:53 Arjan van de Ven
  2009-10-26  6:03 ` Andrew Morton
                   ` (3 more replies)
  0 siblings, 4 replies; 39+ messages in thread
From: Arjan van de Ven @ 2009-10-26  5:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, Christoph Hellwig, Al Viro, mingo,
	Frederic Weisbecker, auke-jan.h.kok

>From b894af8a33bec621dd1a4126603a3ca372bf0643 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 25 Oct 2009 15:37:04 -0700
Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

PowerTOP would like to be able to show who is keeping the disk
busy by dirtying data. The most logical spot for this is in the vfs
in the mark_inode_dirty() function. Doing this on the block level
is not possible because by the time the IO hits the block layer the
guilty party can no longer be found ("kjournald" and "pdflush" are not
useful answers to "who caused this file to be dirty).

The trace point follows the same logic/style as the block_dump code
and pretty much dumps the same data, just not to dmesg (and thus to
/var/log/messages) but via the trace events streams.

Signed-of-by: Arjan van de Ven <arjan@linux.intel.com>
---
 fs/fs-writeback.c          |    3 ++
 fs/inode.c                 |    4 +++
 include/trace/events/vfs.h |   53 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/vfs.h

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 9d5360c..4102f20 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -25,6 +25,7 @@
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/buffer_head.h>
+#include <trace/events/vfs.h>
 #include "internal.h"
 
 #define inode_to_bdi(inode)	((inode)->i_mapping->backing_dev_info)
@@ -1071,6 +1072,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	if ((inode->i_state & flags) == flags)
 		return;
 
+	trace_dirty_inode(inode, current);
+
 	if (unlikely(block_dump))
 		block_dump___mark_inode_dirty(inode);
 
diff --git a/fs/inode.c b/fs/inode.c
index 4d8e3be..a61e8ba 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1624,3 +1624,7 @@ void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
 				  inode->i_ino);
 }
 EXPORT_SYMBOL(init_special_inode);
+
+#define CREATE_TRACE_POINTS  
+#include <trace/events/vfs.h>
+
diff --git a/include/trace/events/vfs.h b/include/trace/events/vfs.h
new file mode 100644
index 0000000..21cf9fb
--- /dev/null
+++ b/include/trace/events/vfs.h
@@ -0,0 +1,53 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vfs
+
+#if !defined(_TRACE_VFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_VFS_H
+
+/*
+ * Tracepoint for dirtying an inode:
+ */
+TRACE_EVENT(dirty_inode,
+
+	TP_PROTO(struct inode *inode, struct task_struct *task),
+
+	TP_ARGS(inode, task),
+
+	TP_STRUCT__entry(
+		__array( char,	comm,	TASK_COMM_LEN	)
+		__field( pid_t,	pid			)
+		__array( char,  dev,    16		)
+		__array( char,  file,   32		)
+	),
+
+	TP_fast_assign(
+		if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
+                	struct dentry *dentry;
+                	const char *name = "?";
+
+			dentry = d_find_alias(inode);
+			if (dentry) {
+				spin_lock(&dentry->d_lock);
+				name = (const char *) dentry->d_name.name;
+			}
+
+			memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+			__entry->pid = task->pid;
+			strlcpy(__entry->file, name, 32);
+			strlcpy(__entry->dev, inode->i_sb->s_id, 16);
+
+			if (dentry) {
+				spin_unlock(&dentry->d_lock);
+				dput(dentry);
+			}
+		}
+	),
+
+	TP_printk("task=%i (%s) file=%s dev=%s",
+		__entry->pid, __entry->comm, __entry->file, __entry->dev)
+);
+
+#endif /* _TRACE_VFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
1.6.2.5



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2010-11-30  0:37 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-26 20:56 [PATCH] vfs: Add a trace point in the mark_inode_dirty function Arjan van de Ven
2010-11-28 17:52 ` Christoph Hellwig
2010-11-28 18:43   ` Arjan van de Ven
2010-11-29  1:41     ` KOSAKI Motohiro
2010-11-29  4:54       ` Arjan van de Ven
2010-11-29  5:15         ` KOSAKI Motohiro
2010-11-29 14:21           ` Steven Rostedt
2010-11-29 14:41             ` Mathieu Desnoyers
2010-11-29 16:31               ` Steven Rostedt
2010-11-30  0:37             ` KOSAKI Motohiro
2010-11-29  3:53     ` Nick Piggin
  -- strict thread matches above, loose matches on Subject: below --
2009-10-26  5:53 Arjan van de Ven
2009-10-26  6:03 ` Andrew Morton
2009-10-26  6:55   ` Arjan van de Ven
2009-10-27 16:01 ` Jason Baron
2009-11-11  2:01 ` Wu Fengguang
2009-11-11  6:34   ` Arjan van de Ven
2009-11-11  6:40     ` Wu Fengguang
2009-11-11  7:42     ` Jeff Garzik
2009-11-11  7:45       ` Ingo Molnar
2009-11-11  7:56         ` Jeff Garzik
2009-11-11 11:15           ` Ingo Molnar
2009-11-11 17:27         ` Kok, Auke
2009-11-11 18:29         ` Theodore Tso
2009-11-11 18:56           ` Ingo Molnar
2009-11-11 18:56           ` Ingo Molnar
2009-11-12  2:15           ` Arjan van de Ven
2009-11-11 16:19       ` Arjan van de Ven
2009-11-11 23:10         ` Frank Ch. Eigler
2009-11-11 23:37           ` Kok, Auke
2009-11-12  7:22             ` Ingo Molnar
2009-11-20 10:43               ` Christoph Hellwig
2009-11-20 10:51                 ` Ingo Molnar
2009-11-20 14:45                 ` Arjan van de Ven
2009-11-20 16:05                   ` Jamie Lokier
2009-11-20 16:45                     ` Arjan van de Ven
2009-11-11  2:33 ` Li Zefan
2009-11-15 19:00   ` Arjan van de Ven
2009-11-16  0:56     ` Li Zefan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.