All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] writeback: Tracing support V2
@ 2010-07-06 23:41 Dave Chinner
  2010-07-06 23:41 ` [PATCH 1/3] writeback: Initial tracing support Dave Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Dave Chinner @ 2010-07-06 23:41 UTC (permalink / raw)
  To: axboe; +Cc: linux-fsdevel

Hi Jens, 

This series is an update of the writeback tracing that I'd previously posted
against mainline. This version is based on the block/for-2.6.36 branch.

The tracing provides visibility into the behaviour of the writeback path and is
sufficient to understand questions such as what process is writing data, what
process is blocking on congestion, etc. It also allows view the differences in
behaviour for different sync modes and calling contexts.

This series is based on the block/for-2.6.36 branch.

Output (slighty trimmed) for a background flush of a clean bdi is as follows:

flush-8:0-[...]: wbc_writeback_start: bdi 8:0: towrt=1024 skip=0 mode=0 kupd=1 bgrd=0 reclm=0 cyclic=1 more=0 older=0xffff5d8e start=0x0 end=0x0
flush-8:0-[...]: wbc_writeback_written: bdi 8:0: towrt=1024 skip=0 mode=0 kupd=1 bgrd=0 reclm=0 cyclic=1 more=0 older=0xffff5d8e start=0x0 end=0x0
flush-8:0-[...]: writeback_pages_written: 0

Here we see that we are doing writeback for kupdate in cyclic mode with inodes
that are older than 0xffff5d8e. The filesystem is clean, so there is nothing
to write...

Output (slighty trimmed) for a sync(1) based flush of a clean bdi is as follows:

     sync-[...]: wb_sched_queue: bdi 8:0: sb_dev 0:0 nr_pages=0 sync_mode=0 kupdate=0 range_cyclic=0 background=0
     sync-[...]: wb_sched_task: bdi 8:0: sb_dev 0:0 nr_pages=0 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: wb_sched_exec: bdi 8:0: sb_dev 0:0 nr_pages=0 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: writeback_pages_written: 0
     sync-[...]: wb_sched_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
     sync-[...]: wb_sched_task: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: wb_sched_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: wbc_writeback_start: bdi 8:0: towrt=1024 skip=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
flush-8:0-[...]: wbc_writeback_written: bdi 8:0: towrt=1024 skip=0 mode=0 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
flush-8:0-[...]: writeback_pages_written: 0
     sync-[...]: wb_sched_queue: bdi 8:0: sb_dev 8:1 nr_pages=9223372036854775807 sync_mode=1 kupdate=0 range_cyclic=0 background=0
     sync-[...]: wb_sched_task: bdi 8:0: sb_dev 8:1 nr_pages=9223372036854775807 sync_mode=1 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: wb_sched_exec: bdi 8:0: sb_dev 8:1 nr_pages=9223372036854775807 sync_mode=1 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: wbc_writeback_start: bdi 8:0: wbc=ffff88011cb4bd20 towrt=1024 skip=0 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
flush-8:0-[...]: wbc_writeback_written: bdi 8:0: wbc=ffff88011cb4bd20 towrt=1024 skip=0 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x0 end=0x7fffffffffffffff
flush-8:0-[...]: writeback_pages_written: 0

What we can see here is that:

	- sync queued an async flush to the entire bdi
	- the bdi flusher thread wrote no pages (i.e. clean)
	- sync has issued an async flush of superblock 8:1 for 7712 pages
	- the bdi flush ran writeback on the superblock, wrote no pages
	- sync queued a synchronous flush of the superblock
	- the bdi flush ran writeback on the superblock, wrote no pages

---
Version 2:
- rewrite of initial patch from previous 2.6.34 based code to match new bdi
  scheduling code as part of rebasing onto the block/for-2.6.36 tree.
- removed work structure address from output as it was mostly redundant and
  took up lots of space in the output.
- removed writeback control structure address from output as it was mostly
  redundant and took up lots of space in the output.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] writeback: Initial tracing support
  2010-07-06 23:41 [PATCH 0/3] writeback: Tracing support V2 Dave Chinner
@ 2010-07-06 23:41 ` Dave Chinner
  2010-07-07  2:19   ` Christoph Hellwig
  2010-07-06 23:41 ` [PATCH 2/3] writeback: Add tracing to balance_dirty_pages Dave Chinner
  2010-07-06 23:41 ` [PATCH 3/3] writeback: Add tracing to write_cache_pages Dave Chinner
  2 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-07-06 23:41 UTC (permalink / raw)
  To: axboe; +Cc: linux-fsdevel

Trace queue/sched/exec parts of the writeback loop. This provides
insight into when and why flusher threads are scheduled to run. e.g
a sync invocation leaves traces like:

     sync-[...]: wb_sched_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
     sync-[...]: wb_sched_task: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: wb_sched_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0

This also lays the foundation for adding more writeback tracing to
provide deeper insight into the whole writeback path.

The original tracing code is from Jens Axboe, though this version is
a rewrite as a result of the code being traced changing
significantly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c                |   43 ++++++++++++++----
 include/trace/events/writeback.h |   93 ++++++++++++++++++++++++++++++++++++++
 mm/backing-dev.c                 |    3 +
 3 files changed, 130 insertions(+), 9 deletions(-)
 create mode 100644 include/trace/events/writeback.h

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c8471b3..767898b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -26,15 +26,9 @@
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/buffer_head.h>
+#include <linux/tracepoint.h>
 #include "internal.h"
 
-#define inode_to_bdi(inode)	((inode)->i_mapping->backing_dev_info)
-
-/*
- * We don't actually have pdflush, but this one is exported though /proc...
- */
-int nr_pdflush_threads;
-
 /*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
@@ -50,6 +44,21 @@ struct wb_writeback_work {
 	struct completion *done;	/* set if the caller waits */
 };
 
+/*
+ * Include the creation of the trace points after defining the
+ * wb_writeback_work structure so that the definition remains local to this
+ * file.
+ */
+#define CREATE_TRACE_POINTS
+#include <trace/events/writeback.h>
+
+#define inode_to_bdi(inode)	((inode)->i_mapping->backing_dev_info)
+
+/*
+ * We don't actually have pdflush, but this one is exported though /proc...
+ */
+int nr_pdflush_threads;
+
 /**
  * writeback_in_progress - determine whether there is writeback in progress
  * @bdi: the device's backing_dev_info structure.
@@ -65,6 +74,8 @@ int writeback_in_progress(struct backing_dev_info *bdi)
 static void bdi_queue_work(struct backing_dev_info *bdi,
 		struct wb_writeback_work *work)
 {
+	trace_wb_sched_queue(bdi, work);
+
 	spin_lock(&bdi->wb_lock);
 	list_add_tail(&work->list, &bdi->work_list);
 	spin_unlock(&bdi->wb_lock);
@@ -74,12 +85,16 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
 	 * it gets created and wakes up, we'll run this work.
 	 */
 	if (unlikely(!bdi->wb.task)) {
+		trace_wb_sched_default(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
 	} else {
 		struct bdi_writeback *wb = &bdi->wb;
 
-		if (wb->task)
+		if (wb->task) {
+			trace_wb_sched_task(bdi, work);
 			wake_up_process(wb->task);
+		} else
+			trace_wb_sched_notask(bdi, work);
 	}
 }
 
@@ -95,8 +110,10 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 	 */
 	work = kzalloc(sizeof(*work), GFP_ATOMIC);
 	if (!work) {
-		if (bdi->wb.task)
+		if (bdi->wb.task) {
+			trace_wb_sched_nowork(bdi, work);
 			wake_up_process(bdi->wb.task);
+		}
 		return;
 	}
 
@@ -751,6 +768,8 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 		if (force_wait)
 			work->sync_mode = WB_SYNC_ALL;
 
+		trace_wb_sched_exec(bdi, work);
+
 		wrote += wb_writeback(wb, work);
 
 		/*
@@ -805,9 +824,13 @@ int bdi_writeback_thread(void *data)
 	smp_mb__after_clear_bit();
 	wake_up_bit(&bdi->state, BDI_pending);
 
+	trace_wb_thread_start(bdi);
+
 	while (!kthread_should_stop()) {
 		pages_written = wb_do_writeback(wb, 0);
 
+		trace_writeback_pages_written(pages_written);
+
 		if (pages_written)
 			last_active = jiffies;
 		else if (wait_jiffies != -1UL) {
@@ -845,6 +868,8 @@ int bdi_writeback_thread(void *data)
 	 */
 	if (!list_empty(&bdi->work_list))
 		wb_do_writeback(wb, 1);
+
+	trace_wb_thread_stop(bdi);
 	return 0;
 }
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
new file mode 100644
index 0000000..d7b6f41
--- /dev/null
+++ b/include/trace/events/writeback.h
@@ -0,0 +1,93 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM writeback
+
+#if !defined(_TRACE_WRITEBACK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_WRITEBACK_H
+
+#include <linux/backing-dev.h>
+#include <linux/writeback.h>
+
+struct wb_writeback_work;
+
+DECLARE_EVENT_CLASS(wb_sched_class,
+	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work),
+	TP_ARGS(bdi, work),
+	TP_STRUCT__entry(
+		__array(char, name, 32)
+		__field(long, nr_pages)
+		__field(dev_t, sb_dev)
+		__field(int, sync_mode)
+		__field(int, for_kupdate)
+		__field(int, range_cyclic)
+		__field(int, for_background)
+	),
+	TP_fast_assign(
+		strncpy(__entry->name, dev_name(bdi->dev), 32);
+		__entry->nr_pages = work ? work->nr_pages : 0;
+		__entry->sb_dev = work ? work->sb ? work->sb->s_dev : 0 : 0;
+		__entry->sync_mode = work ? work->sync_mode : -1;
+		__entry->for_kupdate = work ? work->for_kupdate : -1;
+		__entry->range_cyclic = work ? work->range_cyclic : -1;
+		__entry->for_background	= work ? work->for_background : -1;
+	),
+	TP_printk("bdi %s: sb_dev %d:%d nr_pages=%ld sync_mode=%d "
+		  "kupdate=%d range_cyclic=%d background=%d",
+		  __entry->name,
+		  MAJOR(__entry->sb_dev), MINOR(__entry->sb_dev),
+		  __entry->nr_pages,
+		  __entry->sync_mode,
+		  __entry->for_kupdate,
+		  __entry->range_cyclic,
+		  __entry->for_background
+	)
+);
+#define DEFINE_WB_SCHED_EVENT(name) \
+DEFINE_EVENT(wb_sched_class, name, \
+	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work), \
+	TP_ARGS(bdi, work))
+DEFINE_WB_SCHED_EVENT(wb_sched_default);
+DEFINE_WB_SCHED_EVENT(wb_sched_queue);
+DEFINE_WB_SCHED_EVENT(wb_sched_task);
+DEFINE_WB_SCHED_EVENT(wb_sched_notask);
+DEFINE_WB_SCHED_EVENT(wb_sched_nowork);
+DEFINE_WB_SCHED_EVENT(wb_sched_exec);
+
+TRACE_EVENT(writeback_pages_written,
+	TP_PROTO(long pages_written),
+	TP_ARGS(pages_written),
+	TP_STRUCT__entry(
+		__field(long,		pages)
+	),
+	TP_fast_assign(
+		__entry->pages		= pages_written;
+	),
+	TP_printk("%ld", __entry->pages)
+);
+
+DECLARE_EVENT_CLASS(wb_thread_class,
+	TP_PROTO(struct backing_dev_info *bdi),
+	TP_ARGS(bdi),
+	TP_STRUCT__entry(
+		__array(char, name, 32)
+	),
+	TP_fast_assign(
+		strncpy(__entry->name, dev_name(bdi->dev), 32);
+	),
+	TP_printk("bdi %s",
+		  __entry->name
+	)
+);
+#define DEFINE_WB_THREAD_EVENT(name) \
+DEFINE_EVENT(wb_thread_class, name, \
+	TP_PROTO(struct backing_dev_info *bdi), \
+	TP_ARGS(bdi))
+
+DEFINE_WB_THREAD_EVENT(wb_bdi_register);
+DEFINE_WB_THREAD_EVENT(wb_bdi_unregister);
+DEFINE_WB_THREAD_EVENT(wb_thread_start);
+DEFINE_WB_THREAD_EVENT(wb_thread_stop);
+
+#endif /* _TRACE_WRITEBACK_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index bceac64..e0411a6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/writeback.h>
 #include <linux/device.h>
+#include <trace/events/writeback.h>
 
 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
 
@@ -518,6 +519,7 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 
 	bdi_debug_register(bdi, dev_name(dev));
 	set_bit(BDI_registered, &bdi->state);
+	trace_wb_bdi_register(bdi);
 exit:
 	return ret;
 }
@@ -578,6 +580,7 @@ static void bdi_prune_sb(struct backing_dev_info *bdi)
 void bdi_unregister(struct backing_dev_info *bdi)
 {
 	if (bdi->dev) {
+		trace_wb_bdi_unregister(bdi);
 		bdi_prune_sb(bdi);
 
 		if (!bdi_cap_flush_forker(bdi))
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] writeback: Add tracing to balance_dirty_pages
  2010-07-06 23:41 [PATCH 0/3] writeback: Tracing support V2 Dave Chinner
  2010-07-06 23:41 ` [PATCH 1/3] writeback: Initial tracing support Dave Chinner
@ 2010-07-06 23:41 ` Dave Chinner
  2010-07-07  2:20   ` Christoph Hellwig
  2010-07-06 23:41 ` [PATCH 3/3] writeback: Add tracing to write_cache_pages Dave Chinner
  2 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-07-06 23:41 UTC (permalink / raw)
  To: axboe; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

Tracing high level background writeback events is good, but it doesn't
give the entire picture. Add visibility into write throttling to catch IO
dispatched by foreground throttling of processing dirtying lots of pages.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c                |    5 +++
 include/trace/events/writeback.h |   69 ++++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c              |    4 ++
 3 files changed, 78 insertions(+), 0 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 767898b..3e9d221 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -659,10 +659,14 @@ static long wb_writeback(struct bdi_writeback *wb,
 		wbc.more_io = 0;
 		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
 		wbc.pages_skipped = 0;
+
+		trace_wbc_writeback_start(&wbc, wb->bdi);
 		if (work->sb)
 			__writeback_inodes_sb(work->sb, wb, &wbc);
 		else
 			writeback_inodes_wb(wb, &wbc);
+		trace_wbc_writeback_written(&wbc, wb->bdi);
+
 		work->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 		wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
 
@@ -690,6 +694,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		if (!list_empty(&wb->b_more_io))  {
 			inode = list_entry(wb->b_more_io.prev,
 						struct inode, i_list);
+			trace_wbc_writeback_wait(&wbc, wb->bdi);
 			inode_wait_for_writeback(inode);
 		}
 		spin_unlock(&inode_lock);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index d7b6f41..4e219e2 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -87,6 +87,75 @@ DEFINE_WB_THREAD_EVENT(wb_bdi_unregister);
 DEFINE_WB_THREAD_EVENT(wb_thread_start);
 DEFINE_WB_THREAD_EVENT(wb_thread_stop);
 
+DECLARE_EVENT_CLASS(wbc_class,
+	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
+	TP_ARGS(wbc, bdi),
+	TP_STRUCT__entry(
+		__array(char, name, 32)
+		__field(long, nr_to_write)
+		__field(long, pages_skipped)
+		__field(int, sync_mode)
+		__field(int, nonblocking)
+		__field(int, encountered_congestion)
+		__field(int, for_kupdate)
+		__field(int, for_background)
+		__field(int, for_reclaim)
+		__field(int, range_cyclic)
+		__field(int, more_io)
+		__field(unsigned long, older_than_this)
+		__field(long, range_start)
+		__field(long, range_end)
+	),
+
+	TP_fast_assign(
+		char *__no_bdi = "(no bdi)";
+
+		if (bdi)
+			strncpy(__entry->name, dev_name(bdi->dev), 32);
+		else
+			strncpy(__entry->name, __no_bdi, 32);
+		__entry->nr_to_write	= wbc->nr_to_write;
+		__entry->pages_skipped	= wbc->pages_skipped;
+		__entry->sync_mode	= wbc->sync_mode;
+		__entry->for_kupdate	= wbc->for_kupdate;
+		__entry->for_background	= wbc->for_background;
+		__entry->for_reclaim	= wbc->for_reclaim;
+		__entry->range_cyclic	= wbc->range_cyclic;
+		__entry->more_io	= wbc->more_io;
+		__entry->older_than_this = wbc->older_than_this ?
+						*wbc->older_than_this : 0;
+		__entry->range_start	= (long)wbc->range_start;
+		__entry->range_end	= (long)wbc->range_end;
+	),
+
+	TP_printk("bdi %s: towrt=%ld skip=%ld mode=%d kupd=%d "
+		"bgrd=%d reclm=%d cyclic=%d more=%d older=0x%lx "
+		"start=0x%lx end=0x%lx",
+		__entry->name,
+		__entry->nr_to_write,
+		__entry->pages_skipped,
+		__entry->sync_mode,
+		__entry->for_kupdate,
+		__entry->for_background,
+		__entry->for_reclaim,
+		__entry->range_cyclic,
+		__entry->more_io,
+		__entry->older_than_this,
+		__entry->range_start,
+		__entry->range_end)
+)
+
+#define DEFINE_WBC_EVENT(name) \
+DEFINE_EVENT(wbc_class, name, \
+	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi), \
+	TP_ARGS(wbc, bdi))
+DEFINE_WBC_EVENT(wbc_writeback_start);
+DEFINE_WBC_EVENT(wbc_writeback_written);
+DEFINE_WBC_EVENT(wbc_writeback_wait);
+DEFINE_WBC_EVENT(wbc_balance_dirty_start);
+DEFINE_WBC_EVENT(wbc_balance_dirty_written);
+DEFINE_WBC_EVENT(wbc_balance_dirty_wait);
+
 #endif /* _TRACE_WRITEBACK_H */
 
 /* This part must be outside protection */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 37498ef..d556cd8 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -34,6 +34,7 @@
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h>
 #include <linux/pagevec.h>
+#include <trace/events/writeback.h>
 
 /*
  * After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited
@@ -535,11 +536,13 @@ static void balance_dirty_pages(struct address_space *mapping,
 		 * threshold otherwise wait until the disk writes catch
 		 * up.
 		 */
+		trace_wbc_balance_dirty_start(&wbc, bdi);
 		if (bdi_nr_reclaimable > bdi_thresh) {
 			writeback_inodes_wb(&bdi->wb, &wbc);
 			pages_written += write_chunk - wbc.nr_to_write;
 			get_dirty_limits(&background_thresh, &dirty_thresh,
 				       &bdi_thresh, bdi);
+			trace_wbc_balance_dirty_written(&wbc, bdi);
 		}
 
 		/*
@@ -565,6 +568,7 @@ static void balance_dirty_pages(struct address_space *mapping,
 		if (pages_written >= write_chunk)
 			break;		/* We've done our duty */
 
+		trace_wbc_balance_dirty_wait(&wbc, bdi);
 		__set_current_state(TASK_INTERRUPTIBLE);
 		io_schedule_timeout(pause);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] writeback: Add tracing to write_cache_pages
  2010-07-06 23:41 [PATCH 0/3] writeback: Tracing support V2 Dave Chinner
  2010-07-06 23:41 ` [PATCH 1/3] writeback: Initial tracing support Dave Chinner
  2010-07-06 23:41 ` [PATCH 2/3] writeback: Add tracing to balance_dirty_pages Dave Chinner
@ 2010-07-06 23:41 ` Dave Chinner
  2 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2010-07-06 23:41 UTC (permalink / raw)
  To: axboe; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

Add a trace event to the ->writepage loop in write_cache_pages to give
visibility into how the ->writepage call is changing variables within the
writeback control structure. Of most interest is how wbc->nr_to_write changes
from call to call, especially with filesystems that write multiple pages
in ->writepage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/trace/events/writeback.h |    1 +
 mm/page-writeback.c              |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 4e219e2..9d8460e 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -155,6 +155,7 @@ DEFINE_WBC_EVENT(wbc_writeback_wait);
 DEFINE_WBC_EVENT(wbc_balance_dirty_start);
 DEFINE_WBC_EVENT(wbc_balance_dirty_written);
 DEFINE_WBC_EVENT(wbc_balance_dirty_wait);
+DEFINE_WBC_EVENT(wbc_writepage);
 
 #endif /* _TRACE_WRITEBACK_H */
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d556cd8..3d2111a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -933,6 +933,7 @@ continue_unlock:
 			if (!clear_page_dirty_for_io(page))
 				goto continue_unlock;
 
+			trace_wbc_writepage(wbc, mapping->backing_dev_info);
 			ret = (*writepage)(page, wbc, data);
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] writeback: Initial tracing support
  2010-07-06 23:41 ` [PATCH 1/3] writeback: Initial tracing support Dave Chinner
@ 2010-07-07  2:19   ` Christoph Hellwig
  2010-07-07  3:06     ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2010-07-07  2:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: axboe, linux-fsdevel

>  static void bdi_queue_work(struct backing_dev_info *bdi,
>  		struct wb_writeback_work *work)
>  {
> +	trace_wb_sched_queue(bdi, work);

Okay, I'm getting more nitpicky with every repost, but the naming here
seems odd.

First why the unreadle wb prefix instead of a proper writeback?
And the sched in here seems rather useless, I'd just call it

	trace_writeback_queue

> @@ -74,12 +85,16 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
>  	 * it gets created and wakes up, we'll run this work.
>  	 */
>  	if (unlikely(!bdi->wb.task)) {
> +		trace_wb_sched_default(bdi, work);

	trace_writeback_no_thread

>  		wake_up_process(default_backing_dev_info.wb.task);
>  	} else {
>  		struct bdi_writeback *wb = &bdi->wb;
>  
> -		if (wb->task)
> +		if (wb->task) {
> +			trace_wb_sched_task(bdi, work);

	trace_writeback_wake

then again I don't think we actually need this, if we got the
trace_writeback_queue event, and no trace_writeback_no_thread it's
implicit that we got here.

>  			wake_up_process(wb->task);
> +		} else
> +			trace_wb_sched_notask(bdi, work);

This case can't actually happen as we checked for the task a few lines
above.  Might be worth to throw in preparation patch in your series
to  something like:


	if (unlikely(!bdi->wb.task)) {
		wake_up_process(default_backing_dev_info.wb.task);
-	} else {
-		struct bdi_writeback *wb = &bdi->wb;
-
-		if (wb->task)
-			wake_up_process(wb->task);
-	}
+		return;
+	}
+
+	wake_up_process(bdi->wb.task);

>  	}
>  }
>  
> @@ -95,8 +110,10 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
>  	 */
>  	work = kzalloc(sizeof(*work), GFP_ATOMIC);
>  	if (!work) {
> -		if (bdi->wb.task)
> +		if (bdi->wb.task) {
> +			trace_wb_sched_nowork(bdi, work);

	trace_writeback_nowork

> +	TP_fast_assign(
> +		strncpy(__entry->name, dev_name(bdi->dev), 32);
> +		__entry->nr_pages = work ? work->nr_pages : 0;
> +		__entry->sb_dev = work ? work->sb ? work->sb->s_dev : 0 : 0;
> +		__entry->sync_mode = work ? work->sync_mode : -1;
> +		__entry->for_kupdate = work ? work->for_kupdate : -1;
> +		__entry->range_cyclic = work ? work->range_cyclic : -1;
> +		__entry->for_background	= work ? work->for_background : -1;
> +	),

If you had a separate class for those tracepoints that do not have
a work item the code would be a lot simpler.

> +DECLARE_EVENT_CLASS(wb_thread_class,
> +	TP_PROTO(struct backing_dev_info *bdi),
> +	TP_ARGS(bdi),
> +	TP_STRUCT__entry(
> +		__array(char, name, 32)
> +	),
> +	TP_fast_assign(
> +		strncpy(__entry->name, dev_name(bdi->dev), 32);
> +	),
> +	TP_printk("bdi %s",
> +		  __entry->name
> +	)

In fact you already have one.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] writeback: Add tracing to balance_dirty_pages
  2010-07-06 23:41 ` [PATCH 2/3] writeback: Add tracing to balance_dirty_pages Dave Chinner
@ 2010-07-07  2:20   ` Christoph Hellwig
  2010-07-07  3:11     ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2010-07-07  2:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: axboe, linux-fsdevel

> +
> +	TP_fast_assign(
> +		char *__no_bdi = "(no bdi)";
> +
> +		if (bdi)
> +			strncpy(__entry->name, dev_name(bdi->dev), 32);
> +		else
> +			strncpy(__entry->name, __no_bdi, 32);

I can't see a caller that doesn't have a bdi.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] writeback: Initial tracing support
  2010-07-07  2:19   ` Christoph Hellwig
@ 2010-07-07  3:06     ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2010-07-07  3:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-fsdevel

On Tue, Jul 06, 2010 at 10:19:12PM -0400, Christoph Hellwig wrote:
> >  static void bdi_queue_work(struct backing_dev_info *bdi,
> >  		struct wb_writeback_work *work)
> >  {
> > +	trace_wb_sched_queue(bdi, work);
> 
> Okay, I'm getting more nitpicky with every repost, but the naming here
> seems odd.
> 
> First why the unreadle wb prefix instead of a proper writeback?
> And the sched in here seems rather useless, I'd just call it
> 
> 	trace_writeback_queue

Done.

> 
> > @@ -74,12 +85,16 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
> >  	 * it gets created and wakes up, we'll run this work.
> >  	 */
> >  	if (unlikely(!bdi->wb.task)) {
> > +		trace_wb_sched_default(bdi, work);
> 
> 	trace_writeback_no_thread
> 
> >  		wake_up_process(default_backing_dev_info.wb.task);
> >  	} else {
> >  		struct bdi_writeback *wb = &bdi->wb;
> >  
> > -		if (wb->task)
> > +		if (wb->task) {
> > +			trace_wb_sched_task(bdi, work);
> 
> 	trace_writeback_wake
> 
> then again I don't think we actually need this, if we got the
> trace_writeback_queue event, and no trace_writeback_no_thread it's
> implicit that we got here.

OK, fixed.

> >  			wake_up_process(wb->task);
> > +		} else
> > +			trace_wb_sched_notask(bdi, work);
> 
> This case can't actually happen as we checked for the task a few lines
> above.

Right, my bad.

> Might be worth to throw in preparation patch in your series
> to  something like:
> 
> 
> 	if (unlikely(!bdi->wb.task)) {
> 		wake_up_process(default_backing_dev_info.wb.task);
> -	} else {
> -		struct bdi_writeback *wb = &bdi->wb;
> -
> -		if (wb->task)
> -			wake_up_process(wb->task);
> -	}
> +		return;
> +	}
> +
> +	wake_up_process(bdi->wb.task);
> 
> >  	}
> >  }

Perhaps. making the above changes means it's not necessary...

> >  
> > @@ -95,8 +110,10 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
> >  	 */
> >  	work = kzalloc(sizeof(*work), GFP_ATOMIC);
> >  	if (!work) {
> > -		if (bdi->wb.task)
> > +		if (bdi->wb.task) {
> > +			trace_wb_sched_nowork(bdi, work);
> 
> 	trace_writeback_nowork
> 
> > +	TP_fast_assign(
> > +		strncpy(__entry->name, dev_name(bdi->dev), 32);
> > +		__entry->nr_pages = work ? work->nr_pages : 0;
> > +		__entry->sb_dev = work ? work->sb ? work->sb->s_dev : 0 : 0;
> > +		__entry->sync_mode = work ? work->sync_mode : -1;
> > +		__entry->for_kupdate = work ? work->for_kupdate : -1;
> > +		__entry->range_cyclic = work ? work->range_cyclic : -1;
> > +		__entry->for_background	= work ? work->for_background : -1;
> > +	),
> 
> If you had a separate class for those tracepoints that do not have
> a work item the code would be a lot simpler.

Yup, fixed.

I'll post a new version in a short while.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] writeback: Add tracing to balance_dirty_pages
  2010-07-07  2:20   ` Christoph Hellwig
@ 2010-07-07  3:11     ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2010-07-07  3:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-fsdevel

On Tue, Jul 06, 2010 at 10:20:55PM -0400, Christoph Hellwig wrote:
> > +
> > +	TP_fast_assign(
> > +		char *__no_bdi = "(no bdi)";
> > +
> > +		if (bdi)
> > +			strncpy(__entry->name, dev_name(bdi->dev), 32);
> > +		else
> > +			strncpy(__entry->name, __no_bdi, 32);
> 
> I can't see a caller that doesn't have a bdi.

True. I think that was left over from the previous version where
the bdi could be null. I'll fix it up.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] writeback: Initial tracing support
  2010-07-07  3:24 [PATCH 0/3] writeback: Tracing support V3 Dave Chinner
@ 2010-07-07  3:24 ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2010-07-07  3:24 UTC (permalink / raw)
  To: axboe; +Cc: linux-fsdevel

Trace queue/sched/exec parts of the writeback loop. This provides
insight into when and why flusher threads are scheduled to run. e.g
a sync invocation leaves traces like:

     sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0

This also lays the foundation for adding more writeback tracing to
provide deeper insight into the whole writeback path.

The original tracing code is from Jens Axboe, though this version is
a rewrite as a result of the code being traced changing
significantly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/fs-writeback.c                |   38 ++++++++++++---
 include/trace/events/writeback.h |   91 ++++++++++++++++++++++++++++++++++++++
 mm/backing-dev.c                 |    3 +
 3 files changed, 124 insertions(+), 8 deletions(-)
 create mode 100644 include/trace/events/writeback.h

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c8471b3..73acab4 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -26,15 +26,9 @@
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/buffer_head.h>
+#include <linux/tracepoint.h>
 #include "internal.h"
 
-#define inode_to_bdi(inode)	((inode)->i_mapping->backing_dev_info)
-
-/*
- * We don't actually have pdflush, but this one is exported though /proc...
- */
-int nr_pdflush_threads;
-
 /*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
@@ -50,6 +44,21 @@ struct wb_writeback_work {
 	struct completion *done;	/* set if the caller waits */
 };
 
+/*
+ * Include the creation of the trace points after defining the
+ * wb_writeback_work structure so that the definition remains local to this
+ * file.
+ */
+#define CREATE_TRACE_POINTS
+#include <trace/events/writeback.h>
+
+#define inode_to_bdi(inode)	((inode)->i_mapping->backing_dev_info)
+
+/*
+ * We don't actually have pdflush, but this one is exported though /proc...
+ */
+int nr_pdflush_threads;
+
 /**
  * writeback_in_progress - determine whether there is writeback in progress
  * @bdi: the device's backing_dev_info structure.
@@ -65,6 +74,8 @@ int writeback_in_progress(struct backing_dev_info *bdi)
 static void bdi_queue_work(struct backing_dev_info *bdi,
 		struct wb_writeback_work *work)
 {
+	trace_writeback_queue(bdi, work);
+
 	spin_lock(&bdi->wb_lock);
 	list_add_tail(&work->list, &bdi->work_list);
 	spin_unlock(&bdi->wb_lock);
@@ -74,6 +85,7 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
 	 * it gets created and wakes up, we'll run this work.
 	 */
 	if (unlikely(!bdi->wb.task)) {
+		trace_writeback_nothread(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
 	} else {
 		struct bdi_writeback *wb = &bdi->wb;
@@ -95,8 +107,10 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 	 */
 	work = kzalloc(sizeof(*work), GFP_ATOMIC);
 	if (!work) {
-		if (bdi->wb.task)
+		if (bdi->wb.task) {
+			trace_writeback_nowork(bdi);
 			wake_up_process(bdi->wb.task);
+		}
 		return;
 	}
 
@@ -751,6 +765,8 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 		if (force_wait)
 			work->sync_mode = WB_SYNC_ALL;
 
+		trace_writeback_exec(bdi, work);
+
 		wrote += wb_writeback(wb, work);
 
 		/*
@@ -805,9 +821,13 @@ int bdi_writeback_thread(void *data)
 	smp_mb__after_clear_bit();
 	wake_up_bit(&bdi->state, BDI_pending);
 
+	trace_writeback_thread_start(bdi);
+
 	while (!kthread_should_stop()) {
 		pages_written = wb_do_writeback(wb, 0);
 
+		trace_writeback_pages_written(pages_written);
+
 		if (pages_written)
 			last_active = jiffies;
 		else if (wait_jiffies != -1UL) {
@@ -845,6 +865,8 @@ int bdi_writeback_thread(void *data)
 	 */
 	if (!list_empty(&bdi->work_list))
 		wb_do_writeback(wb, 1);
+
+	trace_writeback_thread_stop(bdi);
 	return 0;
 }
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
new file mode 100644
index 0000000..562fcae
--- /dev/null
+++ b/include/trace/events/writeback.h
@@ -0,0 +1,91 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM writeback
+
+#if !defined(_TRACE_WRITEBACK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_WRITEBACK_H
+
+#include <linux/backing-dev.h>
+#include <linux/writeback.h>
+
+struct wb_writeback_work;
+
+DECLARE_EVENT_CLASS(writeback_work_class,
+	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work),
+	TP_ARGS(bdi, work),
+	TP_STRUCT__entry(
+		__array(char, name, 32)
+		__field(long, nr_pages)
+		__field(dev_t, sb_dev)
+		__field(int, sync_mode)
+		__field(int, for_kupdate)
+		__field(int, range_cyclic)
+		__field(int, for_background)
+	),
+	TP_fast_assign(
+		strncpy(__entry->name, dev_name(bdi->dev), 32);
+		__entry->nr_pages = work->nr_pages;
+		__entry->sb_dev = work->sb ? work->sb->s_dev : 0;
+		__entry->sync_mode = work->sync_mode;
+		__entry->for_kupdate = work->for_kupdate;
+		__entry->range_cyclic = work->range_cyclic;
+		__entry->for_background	= work->for_background;
+	),
+	TP_printk("bdi %s: sb_dev %d:%d nr_pages=%ld sync_mode=%d "
+		  "kupdate=%d range_cyclic=%d background=%d",
+		  __entry->name,
+		  MAJOR(__entry->sb_dev), MINOR(__entry->sb_dev),
+		  __entry->nr_pages,
+		  __entry->sync_mode,
+		  __entry->for_kupdate,
+		  __entry->range_cyclic,
+		  __entry->for_background
+	)
+);
+#define DEFINE_WRITEBACK_WORK_EVENT(name) \
+DEFINE_EVENT(writeback_work_class, name, \
+	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work), \
+	TP_ARGS(bdi, work))
+DEFINE_WRITEBACK_WORK_EVENT(writeback_nothread);
+DEFINE_WRITEBACK_WORK_EVENT(writeback_queue);
+DEFINE_WRITEBACK_WORK_EVENT(writeback_exec);
+
+TRACE_EVENT(writeback_pages_written,
+	TP_PROTO(long pages_written),
+	TP_ARGS(pages_written),
+	TP_STRUCT__entry(
+		__field(long,		pages)
+	),
+	TP_fast_assign(
+		__entry->pages		= pages_written;
+	),
+	TP_printk("%ld", __entry->pages)
+);
+
+DECLARE_EVENT_CLASS(writeback_class,
+	TP_PROTO(struct backing_dev_info *bdi),
+	TP_ARGS(bdi),
+	TP_STRUCT__entry(
+		__array(char, name, 32)
+	),
+	TP_fast_assign(
+		strncpy(__entry->name, dev_name(bdi->dev), 32);
+	),
+	TP_printk("bdi %s",
+		  __entry->name
+	)
+);
+#define DEFINE_WRITEBACK_EVENT(name) \
+DEFINE_EVENT(writeback_class, name, \
+	TP_PROTO(struct backing_dev_info *bdi), \
+	TP_ARGS(bdi))
+
+DEFINE_WRITEBACK_EVENT(writeback_nowork);
+DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
+DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
+DEFINE_WRITEBACK_EVENT(writeback_thread_start);
+DEFINE_WRITEBACK_EVENT(writeback_thread_stop);
+
+#endif /* _TRACE_WRITEBACK_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index bceac64..ac78a33 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/writeback.h>
 #include <linux/device.h>
+#include <trace/events/writeback.h>
 
 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
 
@@ -518,6 +519,7 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 
 	bdi_debug_register(bdi, dev_name(dev));
 	set_bit(BDI_registered, &bdi->state);
+	trace_writeback_bdi_register(bdi);
 exit:
 	return ret;
 }
@@ -578,6 +580,7 @@ static void bdi_prune_sb(struct backing_dev_info *bdi)
 void bdi_unregister(struct backing_dev_info *bdi)
 {
 	if (bdi->dev) {
+		trace_writeback_bdi_unregister(bdi);
 		bdi_prune_sb(bdi);
 
 		if (!bdi_cap_flush_forker(bdi))
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] writeback: initial tracing support
  2010-06-21  1:25 ` [PATCH 1/3] writeback: initial " Dave Chinner
@ 2010-06-21  8:09   ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2010-06-21  8:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: axboe, linux-fsdevel

On Mon, Jun 21, 2010 at 11:25:24AM +1000, Dave Chinner wrote:
> +#include <linux/ftrace.h>

I think this should be <linux/tracepoint.h>.


I fear this might need some more rediffing after my latests writeback
patches.

> +	TP_fast_assign(
> +		strlcpy(__entry->name, dev_name(bdi->dev), 16);

For most other fs/block dev tracepoints we print the dev_t as
major:minor.  It would be good to have some consitency to match on
for the trace points.

> +		__entry->nr_pages	= args->nr_pages;
> +		__entry->sb		= !!args->sb;

It might be worth to store the dev_t of the superblock here?

> +	TP_printk("%s: pages=%ld, sb=%d, kupdate=%d, range_cyclic=%d "
> +		  "for_background=%d", __entry->name, __entry->nr_pages,
> +			__entry->sb, __entry->for_kupdate,
> +			__entry->range_cyclic, __entry->for_background)

> +TRACE_EVENT(writeback_sched,
> +
> +	TP_PROTO(struct backing_dev_info *bdi, struct bdi_work *work,
> +		 const char *msg),
> +
> +	TP_ARGS(bdi, work, msg),
> +
> +	TP_STRUCT__entry(
> +		__array(char,			name,		16)
> +		__field(unsigned int,		work)
> +		__array(char,			task,		8)
> +	),

Instead of the task field I'd make this a calss with three instances
for the different types of wakeups.

> +
> +	TP_fast_assign(
> +		strlcpy(__entry->name, dev_name(bdi->dev), 16);
> +		__entry->work = (unsigned long) work & 0xffff;

The unsigned long allows to store the whole pointer.  Or we could just
store it as a void pointer and print it using %p to make the types
simpler.

> +	TP_PROTO(const char *name, int start),
> +
> +	TP_ARGS(name, start),
> +
> +	TP_STRUCT__entry(
> +		__array(char,	name,		16)
> +		__field(int,	start)
> +	),
> +
> +	TP_fast_assign(
> +		strlcpy(__entry->name, name, 16);
> +		__entry->start = start;
> +	),
> +
> +	TP_printk("%s: %s", __entry->name,
> +			__entry->start ? "registered" : "unregistered")

Again, the clean way would be to have class with two instances instead
of using one tracepoint for two different events and differenciate them
by a string.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] writeback: initial tracing support
  2010-06-21  1:25 [PATCH 0/3] writeback: " Dave Chinner
@ 2010-06-21  1:25 ` Dave Chinner
  2010-06-21  8:09   ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-06-21  1:25 UTC (permalink / raw)
  To: axboe; +Cc: linux-fsdevel

From: From: Jens Axboe <jens.axboe@oracle.com>

Trace queue/sched/exec parts of the writeback loop. This provides
insight into when and why flusher threads are scheduled to run. e.g
a sync invocation leaves a trace like:

sync-2798  [006] 611323.335713: writeback_queue: 253:16: pages=87879, sb=0, kupdate=0, range_cyclic=0 for_background=0
sync-2798  [006] 611323.335718: writeback_sched: work=37c0, task=task
sync-2798  [006] 611323.335817: writeback_queue: 8:0: pages=92680, sb=1, kupdate=0, range_cyclic=-1 for_background=0
sync-2798  [006] 611323.335819: writeback_sched: work=35c0, task=task
sync-2798  [006] 611323.335855: writeback_queue: 253:16: pages=92680, sb=1, kupdate=0, range_cyclic=-1 for_background=0
sync-2798  [006] 611323.335857: writeback_sched: work=36c0, task=task
sync-2798  [006] 611323.335890: writeback_queue: 8:0: pages=9223372036854775807, sb=1, kupdate=0, range_cyclic=0 for_background=0
sync-2798  [006] 611323.335891: writeback_sched: work=fe58, task=task
sync-2798  [006] 611323.377341: writeback_queue: 253:16: pages=9223372036854775807, sb=1, kupdate=0, range_cyclic=0 for_background=0
sync-2798  [006] 611323.377346: writeback_sched: work=fe58, task=task

This also lays the foundation for adding more writeback tracing to
provide deeper insight into the whole writeback path.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/fs-writeback.c                |   45 ++++++++--
 include/trace/events/writeback.h |  171 ++++++++++++++++++++++++++++++++++++++
 mm/backing-dev.c                 |    3 +
 3 files changed, 209 insertions(+), 10 deletions(-)
 create mode 100644 include/trace/events/writeback.h

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1d1088f..0949dfd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -26,15 +26,9 @@
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/buffer_head.h>
+#include <linux/ftrace.h>
 #include "internal.h"
 
-#define inode_to_bdi(inode)	((inode)->i_mapping->backing_dev_info)
-
-/*
- * We don't actually have pdflush, but this one is exported though /proc...
- */
-int nr_pdflush_threads;
-
 /*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
@@ -62,6 +56,21 @@ struct bdi_work {
 	unsigned long state;		/* flag bits, see WS_* */
 };
 
+/*
+ * Include the creation of the trace points after defining the bdi_work and
+ * wb_writeback_args structures so that the definitions remain local to this
+ * file.
+ */
+#define CREATE_TRACE_POINTS
+#include <trace/events/writeback.h>
+
+#define inode_to_bdi(inode)	((inode)->i_mapping->backing_dev_info)
+
+/*
+ * We don't actually have pdflush, but this one is exported though /proc...
+ */
+int nr_pdflush_threads;
+
 enum {
 	WS_USED_B = 0,
 	WS_ONSTACK_B,
@@ -136,6 +145,8 @@ static void wb_work_complete(struct bdi_work *work)
 
 static void wb_clear_pending(struct bdi_writeback *wb, struct bdi_work *work)
 {
+	trace_writeback_clear(work);
+
 	/*
 	 * The caller has retrieved the work arguments from this work,
 	 * drop our reference. If this is the last ref, delete and free it
@@ -171,12 +182,16 @@ static void bdi_queue_work(struct backing_dev_info *bdi, struct bdi_work *work)
 	 * If the default thread isn't there, make sure we add it. When
 	 * it gets created and wakes up, we'll run this work.
 	 */
-	if (unlikely(list_empty_careful(&bdi->wb_list)))
+	if (unlikely(list_empty_careful(&bdi->wb_list))) {
+		trace_writeback_sched(bdi, work, "default");
 		wake_up_process(default_backing_dev_info.wb.task);
-	else {
+	} else {
 		struct bdi_writeback *wb = &bdi->wb;
+		struct task_struct *task = wb->task;
 
-		if (wb->task)
+		trace_writeback_sched(bdi, work, task ? "task" : "notask");
+
+		if (task)
 			wake_up_process(wb->task);
 	}
 }
@@ -203,6 +218,7 @@ static void bdi_alloc_queue_work(struct backing_dev_info *bdi,
 	work = kmalloc(sizeof(*work), GFP_ATOMIC);
 	if (work) {
 		bdi_work_init(work, args);
+		trace_writeback_queue(bdi, args);
 		bdi_queue_work(bdi, work);
 	} else {
 		struct bdi_writeback *wb = &bdi->wb;
@@ -236,6 +252,7 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
 	bdi_work_init(&work, &args);
 	work.state |= WS_ONSTACK;
 
+	trace_writeback_queue(bdi, &args);
 	bdi_queue_work(bdi, &work);
 	bdi_wait_on_work_clear(&work);
 }
@@ -907,6 +924,8 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 		if (force_wait)
 			work->args.sync_mode = args.sync_mode = WB_SYNC_ALL;
 
+		trace_writeback_exec(work);
+
 		/*
 		 * If this isn't a data integrity operation, just notify
 		 * that we have seen this work and we are now starting it.
@@ -942,9 +961,13 @@ int bdi_writeback_task(struct bdi_writeback *wb)
 	unsigned long wait_jiffies = -1UL;
 	long pages_written;
 
+	trace_writeback_thread_start(1);
+
 	while (!kthread_should_stop()) {
 		pages_written = wb_do_writeback(wb, 0);
 
+		trace_writeback_pages_written(pages_written);
+
 		if (pages_written)
 			last_active = jiffies;
 		else if (wait_jiffies != -1UL) {
@@ -974,6 +997,8 @@ int bdi_writeback_task(struct bdi_writeback *wb)
 		try_to_freeze();
 	}
 
+	trace_writeback_thread_start(0);
+
 	return 0;
 }
 
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
new file mode 100644
index 0000000..6f510fa
--- /dev/null
+++ b/include/trace/events/writeback.h
@@ -0,0 +1,171 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM writeback
+
+#if !defined(_TRACE_WRITEBACK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_WRITEBACK_H
+
+#include <linux/backing-dev.h>
+#include <linux/writeback.h>
+
+TRACE_EVENT(writeback_queue,
+
+	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_args *args),
+
+	TP_ARGS(bdi, args),
+
+	TP_STRUCT__entry(
+		__array(char,		name,		16)
+		__field(long,		nr_pages)
+		__field(int,		sb)
+		__field(int,		sync_mode)
+		__field(int,		for_kupdate)
+		__field(int,		range_cyclic)
+		__field(int,		for_background)
+	),
+
+	TP_fast_assign(
+		strlcpy(__entry->name, dev_name(bdi->dev), 16);
+		__entry->nr_pages	= args->nr_pages;
+		__entry->sb		= !!args->sb;
+		__entry->for_kupdate	= args->for_kupdate;
+		__entry->range_cyclic	= args->range_cyclic;
+		__entry->for_background	= args->for_background;
+	),
+
+	TP_printk("%s: pages=%ld, sb=%d, kupdate=%d, range_cyclic=%d "
+		  "for_background=%d", __entry->name, __entry->nr_pages,
+			__entry->sb, __entry->for_kupdate,
+			__entry->range_cyclic, __entry->for_background)
+);
+
+TRACE_EVENT(writeback_sched,
+
+	TP_PROTO(struct backing_dev_info *bdi, struct bdi_work *work,
+		 const char *msg),
+
+	TP_ARGS(bdi, work, msg),
+
+	TP_STRUCT__entry(
+		__array(char,			name,		16)
+		__field(unsigned int,		work)
+		__array(char,			task,		8)
+	),
+
+	TP_fast_assign(
+		strlcpy(__entry->name, dev_name(bdi->dev), 16);
+		__entry->work = (unsigned long) work & 0xffff;
+		snprintf(__entry->task, 8, "%s", msg);
+	),
+
+	TP_printk("work=%x, task=%s", __entry->work, __entry->task)
+);
+
+TRACE_EVENT(writeback_exec,
+
+	TP_PROTO(struct bdi_work *work),
+
+	TP_ARGS(work),
+
+	TP_STRUCT__entry(
+		__field(unsigned int,	work)
+		__field(long,		nr_pages)
+		__field(int,		sb)
+		__field(int,		sync_mode)
+		__field(int,		for_kupdate)
+		__field(int,		range_cyclic)
+		__field(int,		for_background)
+	),
+
+	TP_fast_assign(
+		__entry->work		= (unsigned long) work & 0xffff;
+		__entry->nr_pages	= work->args.nr_pages;
+		__entry->sb		= !!work->args.sb;
+		__entry->for_kupdate	= work->args.for_kupdate;
+		__entry->range_cyclic	= work->args.range_cyclic;
+		__entry->for_background	= work->args.for_background;
+
+	),
+
+	TP_printk("work=%x pages=%ld, sb=%d, kupdate=%d, range_cyclic=%d"
+		  " for_background=%d", __entry->work,
+			__entry->nr_pages, __entry->sb, __entry->for_kupdate,
+			__entry->range_cyclic, __entry->for_background)
+);
+
+TRACE_EVENT(writeback_clear,
+
+	TP_PROTO(struct bdi_work *work),
+
+	TP_ARGS(work),
+
+	TP_STRUCT__entry(
+		__field(struct bdi_work *,	work)
+		__field(int,			refs)
+	),
+
+	TP_fast_assign(
+		__entry->work		= work;
+		__entry->refs		= atomic_read(&work->pending);
+	),
+
+	TP_printk("work=%p, refs=%d", __entry->work, __entry->refs)
+);
+
+TRACE_EVENT(writeback_pages_written,
+
+	TP_PROTO(long pages_written),
+
+	TP_ARGS(pages_written),
+
+	TP_STRUCT__entry(
+		__field(long,		pages)
+	),
+
+	TP_fast_assign(
+		__entry->pages		= pages_written;
+	),
+
+	TP_printk("%ld", __entry->pages)
+);
+
+
+TRACE_EVENT(writeback_thread_start,
+
+	TP_PROTO(int start),
+
+	TP_ARGS(start),
+
+	TP_STRUCT__entry(
+		__field(int,	start)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+	),
+
+	TP_printk("%s", __entry->start ? "started" : "exited")
+);
+
+TRACE_EVENT(writeback_bdi_register,
+
+	TP_PROTO(const char *name, int start),
+
+	TP_ARGS(name, start),
+
+	TP_STRUCT__entry(
+		__array(char,	name,		16)
+		__field(int,	start)
+	),
+
+	TP_fast_assign(
+		strlcpy(__entry->name, name, 16);
+		__entry->start = start;
+	),
+
+	TP_printk("%s: %s", __entry->name,
+			__entry->start ? "registered" : "unregistered")
+);
+#endif /* _TRACE_WRITEBACK_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 660a87a..1f7723b 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/writeback.h>
 #include <linux/device.h>
+#include <trace/events/writeback.h>
 
 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
 
@@ -585,6 +586,7 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 
 	bdi_debug_register(bdi, dev_name(dev));
 	set_bit(BDI_registered, &bdi->state);
+	trace_writeback_bdi_register(dev_name(dev), 1);
 exit:
 	return ret;
 }
@@ -647,6 +649,7 @@ static void bdi_prune_sb(struct backing_dev_info *bdi)
 void bdi_unregister(struct backing_dev_info *bdi)
 {
 	if (bdi->dev) {
+		trace_writeback_bdi_register(dev_name(bdi->dev), 0);
 		bdi_prune_sb(bdi);
 
 		if (!bdi_cap_flush_forker(bdi))
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-07-07  3:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-06 23:41 [PATCH 0/3] writeback: Tracing support V2 Dave Chinner
2010-07-06 23:41 ` [PATCH 1/3] writeback: Initial tracing support Dave Chinner
2010-07-07  2:19   ` Christoph Hellwig
2010-07-07  3:06     ` Dave Chinner
2010-07-06 23:41 ` [PATCH 2/3] writeback: Add tracing to balance_dirty_pages Dave Chinner
2010-07-07  2:20   ` Christoph Hellwig
2010-07-07  3:11     ` Dave Chinner
2010-07-06 23:41 ` [PATCH 3/3] writeback: Add tracing to write_cache_pages Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2010-07-07  3:24 [PATCH 0/3] writeback: Tracing support V3 Dave Chinner
2010-07-07  3:24 ` [PATCH 1/3] writeback: Initial tracing support Dave Chinner
2010-06-21  1:25 [PATCH 0/3] writeback: " Dave Chinner
2010-06-21  1:25 ` [PATCH 1/3] writeback: initial " Dave Chinner
2010-06-21  8:09   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.