linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET] block: improve tracepoints, take#2
@ 2013-01-11 21:06 Tejun Heo
  2013-01-11 21:06 ` [PATCH 1/5] block: add missing block_bio_complete() tracepoint Tejun Heo
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Tejun Heo @ 2013-01-11 21:06 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, chavey, fengguang.wu

Hello, Jens.

This is the second take.  Changes from the first take[L] are

* writeback_dirty_buffer TP was botched.  It made build fail when
  CONFIG_BLOCK is not set (reported by Fengguang) and I somehow lost
  its actual triggering in mark_buffer_dirty() while splitting
  patches.  Made it a block TP instead so that it can share TP
  definition with block_touch_buffer and restored the triggering from
  mark_buffer_dirty().

This patchset fixes/improves bio_complete TP so that block layer
proper triggers for all completing bios instead of stackign drivers
triggering them manually and adds more buffer / block / writeback TPs.
These improve visibility in general and are already in use in google.

This patchset contains the following five patches.

0001-block-add-missing-block_bio_complete-tracepoint.patch
0002-block-add-req-to-bio_-front-back-_merge-tracepoints.patch
0003-buffer-make-touch_buffer-an-exported-function.patch
0004-block-add-block_-touch-dirty-_buffer-tracepoint.patch
0005-writeback-add-more-tracepoints.patch

It's based on top of v3.8-rc2 and also available in the following git
branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-tps

diffstat follows.  Thanks.

 block/blk-core.c                 |    5 -
 drivers/md/dm.c                  |    1
 drivers/md/raid5.c               |   11 ---
 fs/bio.c                         |    2
 fs/buffer.c                      |   10 +++
 fs/fs-writeback.c                |   16 ++++-
 include/linux/blktrace_api.h     |    1
 include/linux/buffer_head.h      |    2
 include/trace/events/block.h     |  104 +++++++++++++++++++++++++++++-----
 include/trace/events/writeback.h |  116 +++++++++++++++++++++++++++++++++++++++
 kernel/trace/blktrace.c          |   28 ++++++++-
 mm/page-writeback.c              |    2
 12 files changed, 263 insertions(+), 35 deletions(-)

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/1419141

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] block: add missing block_bio_complete() tracepoint
  2013-01-11 21:06 [PATCHSET] block: improve tracepoints, take#2 Tejun Heo
@ 2013-01-11 21:06 ` Tejun Heo
  2013-01-11 21:06 ` [PATCH 2/5] block: add @req to bio_{front|back}_merge tracepoints Tejun Heo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2013-01-11 21:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-kernel, chavey, fengguang.wu, Tejun Heo, Steven Rostedt,
	Alasdair Kergon, dm-devel, Neil Brown

bio completion didn't kick block_bio_complete TP.  Only dm was
explicitly triggering the TP on IO completion.  This makes
block_bio_complete TP useless for tracers which want to know about
bios, and all other bio based drivers skip generating blktrace
completion events.

This patch makes all bio completions via bio_endio() generate
block_bio_complete TP.

* Explicit trace_block_bio_complete() invocation removed from dm and
  the trace point is unexported.

* @rq dropped from trace_block_bio_complete().  bios may fly around
  w/o queue associated.  Verifying and accessing the assocaited queue
  belongs to TP probes.

* blktrace now gets both request and bio completions.  Make it ignore
  bio completions if request completion path is happening.

This makes all bio based drivers generate blktrace completion events
properly and makes the block_bio_complete TP actually useful.

v2: With this change, block_bio_complete TP could be invoked on sg
    commands which have bio's with %NULL bi_bdev.  Update TP
    assignment code to check whether bio->bi_bdev is %NULL before
    dereferencing.

Signed-off-by: Tejun Heo <tj@kernel.org>
Original-patch-by: Namhyung Kim <namhyung@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
---
 block/blk-core.c             |  1 -
 drivers/md/dm.c              |  1 -
 drivers/md/raid5.c           | 11 +----------
 fs/bio.c                     |  2 ++
 include/linux/blktrace_api.h |  1 +
 include/trace/events/block.h |  8 ++++----
 kernel/trace/blktrace.c      | 26 +++++++++++++++++++++++---
 7 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c973249..d915d07 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -39,7 +39,6 @@
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
-EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete);
 EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug);
 
 DEFINE_IDA(blk_queue_ida);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c72e4d5..650ec28 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -627,7 +627,6 @@ static void dec_pending(struct dm_io *io, int error)
 			queue_io(md, bio);
 		} else {
 			/* done with normal IO or empty flush */
-			trace_block_bio_complete(md->queue, bio, io_error);
 			bio_endio(bio, io_error);
 		}
 	}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 19d77a0..9ab506d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -184,8 +184,6 @@ static void return_io(struct bio *return_bi)
 		return_bi = bi->bi_next;
 		bi->bi_next = NULL;
 		bi->bi_size = 0;
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
 		bio_endio(bi, 0);
 		bi = return_bi;
 	}
@@ -3917,8 +3915,6 @@ static void raid5_align_endio(struct bio *bi, int error)
 	rdev_dec_pending(rdev, conf->mddev);
 
 	if (!error && uptodate) {
-		trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev),
-					 raid_bi, 0);
 		bio_endio(raid_bi, 0);
 		if (atomic_dec_and_test(&conf->active_aligned_reads))
 			wake_up(&conf->wait_for_stripe);
@@ -4377,8 +4373,6 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 		if ( rw == WRITE )
 			md_write_end(mddev);
 
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
 		bio_endio(bi, 0);
 	}
 }
@@ -4755,11 +4749,8 @@ static int  retry_aligned_read(struct r5conf *conf, struct bio *raid_bio)
 		handled++;
 	}
 	remaining = raid5_dec_bi_active_stripes(raid_bio);
-	if (remaining == 0) {
-		trace_block_bio_complete(bdev_get_queue(raid_bio->bi_bdev),
-					 raid_bio, 0);
+	if (remaining == 0)
 		bio_endio(raid_bio, 0);
-	}
 	if (atomic_dec_and_test(&conf->active_aligned_reads))
 		wake_up(&conf->wait_for_stripe);
 	return handled;
diff --git a/fs/bio.c b/fs/bio.c
index b96fc6c..bb5768f 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1428,6 +1428,8 @@ void bio_endio(struct bio *bio, int error)
 	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
 		error = -EIO;
 
+	trace_block_bio_complete(bio, error);
+
 	if (bio->bi_end_io)
 		bio->bi_end_io(bio, error);
 }
diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index 7c2e030..0ea61e0 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -12,6 +12,7 @@
 
 struct blk_trace {
 	int trace_state;
+	bool rq_based;
 	struct rchan *rchan;
 	unsigned long __percpu *sequence;
 	unsigned char __percpu *msg_data;
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 05c5e61..8a168db 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -206,7 +206,6 @@ TRACE_EVENT(block_bio_bounce,
 
 /**
  * block_bio_complete - completed all work on the block operation
- * @q: queue holding the block operation
  * @bio: block operation completed
  * @error: io error value
  *
@@ -215,9 +214,9 @@ TRACE_EVENT(block_bio_bounce,
  */
 TRACE_EVENT(block_bio_complete,
 
-	TP_PROTO(struct request_queue *q, struct bio *bio, int error),
+	TP_PROTO(struct bio *bio, int error),
 
-	TP_ARGS(q, bio, error),
+	TP_ARGS(bio, error),
 
 	TP_STRUCT__entry(
 		__field( dev_t,		dev		)
@@ -228,7 +227,8 @@ TRACE_EVENT(block_bio_complete,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= bio->bi_bdev->bd_dev;
+		__entry->dev		= bio->bi_bdev ?
+					  bio->bi_bdev->bd_dev : 0;
 		__entry->sector		= bio->bi_sector;
 		__entry->nr_sector	= bio->bi_size >> 9;
 		__entry->error		= error;
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index c0bd030..190d98f 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -739,6 +739,12 @@ static void blk_add_trace_rq_complete(void *ignore,
 				      struct request_queue *q,
 				      struct request *rq)
 {
+	struct blk_trace *bt = q->blk_trace;
+
+	/* if control ever passes through here, it's a request based driver */
+	if (unlikely(bt && !bt->rq_based))
+		bt->rq_based = true;
+
 	blk_add_trace_rq(q, rq, BLK_TA_COMPLETE);
 }
 
@@ -774,10 +780,24 @@ static void blk_add_trace_bio_bounce(void *ignore,
 	blk_add_trace_bio(q, bio, BLK_TA_BOUNCE, 0);
 }
 
-static void blk_add_trace_bio_complete(void *ignore,
-				       struct request_queue *q, struct bio *bio,
-				       int error)
+static void blk_add_trace_bio_complete(void *ignore, struct bio *bio, int error)
 {
+	struct request_queue *q;
+	struct blk_trace *bt;
+
+	if (!bio->bi_bdev)
+		return;
+
+	q = bdev_get_queue(bio->bi_bdev);
+	bt = q->blk_trace;
+
+	/*
+	 * Request based drivers will generate both rq and bio completions.
+	 * Ignore bio ones.
+	 */
+	if (likely(!bt) || bt->rq_based)
+		return;
+
 	blk_add_trace_bio(q, bio, BLK_TA_COMPLETE, error);
 }
 
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5] block: add @req to bio_{front|back}_merge tracepoints
  2013-01-11 21:06 [PATCHSET] block: improve tracepoints, take#2 Tejun Heo
  2013-01-11 21:06 ` [PATCH 1/5] block: add missing block_bio_complete() tracepoint Tejun Heo
@ 2013-01-11 21:06 ` Tejun Heo
  2013-01-11 21:06 ` [PATCH 3/5] buffer: make touch_buffer() an exported function Tejun Heo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2013-01-11 21:06 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, chavey, fengguang.wu, Tejun Heo

bio_{front|back}_merge tracepoints report a bio merging into an
existing request but didn't specify which request the bio is being
merged into.  Add @req to it.  This makes it impossible to share the
event template with block_bio_queue - split it out.

@req isn't used or exported to userland at this point and there is no
userland visible behavior change.  Later changes will make use of the
extra parameter.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-core.c             |  4 ++--
 include/trace/events/block.h | 45 +++++++++++++++++++++++++++++++++-----------
 kernel/trace/blktrace.c      |  2 ++
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d915d07..7d6910a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1347,7 +1347,7 @@ static bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
 	if (!ll_back_merge_fn(q, req, bio))
 		return false;
 
-	trace_block_bio_backmerge(q, bio);
+	trace_block_bio_backmerge(q, req, bio);
 
 	if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
 		blk_rq_set_mixed_merge(req);
@@ -1369,7 +1369,7 @@ static bool bio_attempt_front_merge(struct request_queue *q,
 	if (!ll_front_merge_fn(q, req, bio))
 		return false;
 
-	trace_block_bio_frontmerge(q, bio);
+	trace_block_bio_frontmerge(q, req, bio);
 
 	if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
 		blk_rq_set_mixed_merge(req);
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 8a168db..b408f51 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -241,11 +241,11 @@ TRACE_EVENT(block_bio_complete,
 		  __entry->nr_sector, __entry->error)
 );
 
-DECLARE_EVENT_CLASS(block_bio,
+DECLARE_EVENT_CLASS(block_bio_merge,
 
-	TP_PROTO(struct request_queue *q, struct bio *bio),
+	TP_PROTO(struct request_queue *q, struct request *rq, struct bio *bio),
 
-	TP_ARGS(q, bio),
+	TP_ARGS(q, rq, bio),
 
 	TP_STRUCT__entry(
 		__field( dev_t,		dev			)
@@ -272,31 +272,33 @@ DECLARE_EVENT_CLASS(block_bio,
 /**
  * block_bio_backmerge - merging block operation to the end of an existing operation
  * @q: queue holding operation
+ * @rq: request bio is being merged into
  * @bio: new block operation to merge
  *
  * Merging block request @bio to the end of an existing block request
  * in queue @q.
  */
-DEFINE_EVENT(block_bio, block_bio_backmerge,
+DEFINE_EVENT(block_bio_merge, block_bio_backmerge,
 
-	TP_PROTO(struct request_queue *q, struct bio *bio),
+	TP_PROTO(struct request_queue *q, struct request *rq, struct bio *bio),
 
-	TP_ARGS(q, bio)
+	TP_ARGS(q, rq, bio)
 );
 
 /**
  * block_bio_frontmerge - merging block operation to the beginning of an existing operation
  * @q: queue holding operation
+ * @rq: request bio is being merged into
  * @bio: new block operation to merge
  *
  * Merging block IO operation @bio to the beginning of an existing block
  * operation in queue @q.
  */
-DEFINE_EVENT(block_bio, block_bio_frontmerge,
+DEFINE_EVENT(block_bio_merge, block_bio_frontmerge,
 
-	TP_PROTO(struct request_queue *q, struct bio *bio),
+	TP_PROTO(struct request_queue *q, struct request *rq, struct bio *bio),
 
-	TP_ARGS(q, bio)
+	TP_ARGS(q, rq, bio)
 );
 
 /**
@@ -306,11 +308,32 @@ DEFINE_EVENT(block_bio, block_bio_frontmerge,
  *
  * About to place the block IO operation @bio into queue @q.
  */
-DEFINE_EVENT(block_bio, block_bio_queue,
+TRACE_EVENT(block_bio_queue,
 
 	TP_PROTO(struct request_queue *q, struct bio *bio),
 
-	TP_ARGS(q, bio)
+	TP_ARGS(q, bio),
+
+	TP_STRUCT__entry(
+		__field( dev_t,		dev			)
+		__field( sector_t,	sector			)
+		__field( unsigned int,	nr_sector		)
+		__array( char,		rwbs,	RWBS_LEN	)
+		__array( char,		comm,	TASK_COMM_LEN	)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= bio->bi_bdev->bd_dev;
+		__entry->sector		= bio->bi_sector;
+		__entry->nr_sector	= bio->bi_size >> 9;
+		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+		memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+	),
+
+	TP_printk("%d,%d %s %llu + %u [%s]",
+		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
+		  (unsigned long long)__entry->sector,
+		  __entry->nr_sector, __entry->comm)
 );
 
 DECLARE_EVENT_CLASS(block_get_rq,
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 190d98f..fb593f6 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -803,6 +803,7 @@ static void blk_add_trace_bio_complete(void *ignore, struct bio *bio, int error)
 
 static void blk_add_trace_bio_backmerge(void *ignore,
 					struct request_queue *q,
+					struct request *rq,
 					struct bio *bio)
 {
 	blk_add_trace_bio(q, bio, BLK_TA_BACKMERGE, 0);
@@ -810,6 +811,7 @@ static void blk_add_trace_bio_backmerge(void *ignore,
 
 static void blk_add_trace_bio_frontmerge(void *ignore,
 					 struct request_queue *q,
+					 struct request *rq,
 					 struct bio *bio)
 {
 	blk_add_trace_bio(q, bio, BLK_TA_FRONTMERGE, 0);
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5] buffer: make touch_buffer() an exported function
  2013-01-11 21:06 [PATCHSET] block: improve tracepoints, take#2 Tejun Heo
  2013-01-11 21:06 ` [PATCH 1/5] block: add missing block_bio_complete() tracepoint Tejun Heo
  2013-01-11 21:06 ` [PATCH 2/5] block: add @req to bio_{front|back}_merge tracepoints Tejun Heo
@ 2013-01-11 21:06 ` Tejun Heo
  2013-01-11 21:06 ` [PATCH 4/5] block: add block_{touch|dirty}_buffer tracepoint Tejun Heo
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2013-01-11 21:06 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, chavey, fengguang.wu, Tejun Heo, Steven Rostedt

We want to add a trace point to touch_buffer() but macros and inline
functions defined in header files can't have tracing points.  Move
touch_buffer() to fs/buffer.c and make it a proper function.

The new exported function is also declared inline.  As most uses of
touch_buffer() are inside buffer.c with nilfs2 as the only other user,
the effect of this change should be negligible.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 fs/buffer.c                 | 6 ++++++
 include/linux/buffer_head.h | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index c017a2d..a8c2dfb 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -53,6 +53,12 @@ void init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private)
 }
 EXPORT_SYMBOL(init_buffer);
 
+inline void touch_buffer(struct buffer_head *bh)
+{
+	mark_page_accessed(bh->b_page);
+}
+EXPORT_SYMBOL(touch_buffer);
+
 static int sleep_on_buffer(void *word)
 {
 	io_schedule();
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..5afc4f9 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -126,7 +126,6 @@ BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 
 #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)
-#define touch_buffer(bh)	mark_page_accessed(bh->b_page)
 
 /* If we *know* page->private refers to buffer_heads */
 #define page_buffers(page)					\
@@ -142,6 +141,7 @@ BUFFER_FNS(Unwritten, unwritten)
 
 void mark_buffer_dirty(struct buffer_head *bh);
 void init_buffer(struct buffer_head *, bh_end_io_t *, void *);
+void touch_buffer(struct buffer_head *bh);
 void set_bh_page(struct buffer_head *bh,
 		struct page *page, unsigned long offset);
 int try_to_free_buffers(struct page *);
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5] block: add block_{touch|dirty}_buffer tracepoint
  2013-01-11 21:06 [PATCHSET] block: improve tracepoints, take#2 Tejun Heo
                   ` (2 preceding siblings ...)
  2013-01-11 21:06 ` [PATCH 3/5] buffer: make touch_buffer() an exported function Tejun Heo
@ 2013-01-11 21:06 ` Tejun Heo
  2013-01-11 21:06 ` [PATCH 5/5] writeback: add more tracepoints Tejun Heo
  2013-01-14 14:02 ` [PATCHSET] block: improve tracepoints, take#2 Jens Axboe
  5 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2013-01-11 21:06 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, chavey, fengguang.wu, Tejun Heo

The former is triggered from touch_buffer() and the latter
mark_buffer_dirty().

This is part of tracepoint additions to improve visiblity into
dirtying / writeback operations for io tracer and userland.

v2: Transformed writeback_dirty_buffer to block_dirty_buffer and made
    it share TP definition with block_touch_buffer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
---
 fs/buffer.c                  |  4 ++++
 include/trace/events/block.h | 51 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index a8c2dfb..87ff335 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -41,6 +41,7 @@
 #include <linux/bitops.h>
 #include <linux/mpage.h>
 #include <linux/bit_spinlock.h>
+#include <trace/events/block.h>
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 
@@ -55,6 +56,7 @@ EXPORT_SYMBOL(init_buffer);
 
 inline void touch_buffer(struct buffer_head *bh)
 {
+	trace_block_touch_buffer(bh);
 	mark_page_accessed(bh->b_page);
 }
 EXPORT_SYMBOL(touch_buffer);
@@ -1119,6 +1121,8 @@ void mark_buffer_dirty(struct buffer_head *bh)
 {
 	WARN_ON_ONCE(!buffer_uptodate(bh));
 
+	trace_block_dirty_buffer(bh);
+
 	/*
 	 * Very *carefully* optimize the it-is-already-dirty case.
 	 *
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index b408f51..9961726 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -6,10 +6,61 @@
 
 #include <linux/blktrace_api.h>
 #include <linux/blkdev.h>
+#include <linux/buffer_head.h>
 #include <linux/tracepoint.h>
 
 #define RWBS_LEN	8
 
+DECLARE_EVENT_CLASS(block_buffer,
+
+	TP_PROTO(struct buffer_head *bh),
+
+	TP_ARGS(bh),
+
+	TP_STRUCT__entry (
+		__field(  dev_t,	dev			)
+		__field(  sector_t,	sector			)
+		__field(  size_t,	size			)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= bh->b_bdev->bd_dev;
+		__entry->sector		= bh->b_blocknr;
+		__entry->size		= bh->b_size;
+	),
+
+	TP_printk("%d,%d sector=%llu size=%zu",
+		MAJOR(__entry->dev), MINOR(__entry->dev),
+		(unsigned long long)__entry->sector, __entry->size
+	)
+);
+
+/**
+ * block_touch_buffer - mark a buffer accessed
+ * @bh: buffer_head being touched
+ *
+ * Called from touch_buffer().
+ */
+DEFINE_EVENT(block_buffer, block_touch_buffer,
+
+	TP_PROTO(struct buffer_head *bh),
+
+	TP_ARGS(bh)
+);
+
+/**
+ * block_dirty_buffer - mark a buffer dirty
+ * @bh: buffer_head being dirtied
+ *
+ * Called from mark_buffer_dirty().
+ */
+DEFINE_EVENT(block_buffer, block_dirty_buffer,
+
+	TP_PROTO(struct buffer_head *bh),
+
+	TP_ARGS(bh)
+);
+
 DECLARE_EVENT_CLASS(block_rq_with_error,
 
 	TP_PROTO(struct request_queue *q, struct request *rq),
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] writeback: add more tracepoints
  2013-01-11 21:06 [PATCHSET] block: improve tracepoints, take#2 Tejun Heo
                   ` (3 preceding siblings ...)
  2013-01-11 21:06 ` [PATCH 4/5] block: add block_{touch|dirty}_buffer tracepoint Tejun Heo
@ 2013-01-11 21:06 ` Tejun Heo
  2013-01-12  3:17   ` Fengguang Wu
  2013-01-14 14:02 ` [PATCHSET] block: improve tracepoints, take#2 Jens Axboe
  5 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2013-01-11 21:06 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, chavey, fengguang.wu, Tejun Heo

Add tracepoints for page dirtying, writeback_single_inode start, inode
dirtying and writeback.  For the latter two inode events, a pair of
events are defined to denote start and end of the operations (the
starting one has _start suffix and the one w/o suffix happens after
the operation is complete).  These inode ops are FS specific and can
be non-trivial and having enclosing tracepoints is useful for external
tracers.

This is part of tracepoint additions to improve visiblity into
dirtying / writeback operations for io tracer and userland.

v2: writeback_dirty_inode[_start] TPs may be called for files on
    pseudo FSes w/ unregistered bdi.  Check whether bdi->dev is %NULL
    before dereferencing.

v3: buffer dirtying moved to a block TP.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fs-writeback.c                |  16 +++++-
 include/trace/events/writeback.h | 116 +++++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c              |   2 +
 3 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 310972b..359494e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -318,8 +318,14 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work)
 
 static int write_inode(struct inode *inode, struct writeback_control *wbc)
 {
-	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode))
-		return inode->i_sb->s_op->write_inode(inode, wbc);
+	int ret;
+
+	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode)) {
+		trace_writeback_write_inode_start(inode, wbc);
+		ret = inode->i_sb->s_op->write_inode(inode, wbc);
+		trace_writeback_write_inode(inode, wbc);
+		return ret;
+	}
 	return 0;
 }
 
@@ -450,6 +456,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	WARN_ON(!(inode->i_state & I_SYNC));
 
+	trace_writeback_single_inode_start(inode, wbc, nr_to_write);
+
 	ret = do_writepages(mapping, wbc);
 
 	/*
@@ -1150,8 +1158,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	 * dirty the inode itself
 	 */
 	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
+		trace_writeback_dirty_inode_start(inode, flags);
+
 		if (sb->s_op->dirty_inode)
 			sb->s_op->dirty_inode(inode, flags);
+
+		trace_writeback_dirty_inode(inode, flags);
 	}
 
 	/*
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index b453d92..6a16fd2 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -32,6 +32,115 @@
 
 struct wb_writeback_work;
 
+TRACE_EVENT(writeback_dirty_page,
+
+	TP_PROTO(struct page *page, struct address_space *mapping),
+
+	TP_ARGS(page, mapping),
+
+	TP_STRUCT__entry (
+		__array(char, name, 32)
+		__field(unsigned long, ino)
+		__field(pgoff_t, index)
+	),
+
+	TP_fast_assign(
+		strncpy(__entry->name,
+			mapping ? dev_name(mapping->backing_dev_info->dev) : "(unknown)", 32);
+		__entry->ino = mapping ? mapping->host->i_ino : 0;
+		__entry->index = page->index;
+	),
+
+	TP_printk("bdi %s: ino=%lu index=%lu",
+		__entry->name,
+		__entry->ino,
+		__entry->index
+	)
+);
+
+DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
+
+	TP_PROTO(struct inode *inode, int flags),
+
+	TP_ARGS(inode, flags),
+
+	TP_STRUCT__entry (
+		__array(char, name, 32)
+		__field(unsigned long, ino)
+		__field(unsigned long, flags)
+	),
+
+	TP_fast_assign(
+		struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
+
+		/* may be called for files on pseudo FSes w/ unregistered bdi */
+		strncpy(__entry->name,
+			bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32);
+		__entry->ino		= inode->i_ino;
+		__entry->flags		= flags;
+	),
+
+	TP_printk("bdi %s: ino=%lu flags=%s",
+		__entry->name,
+		__entry->ino,
+		show_inode_state(__entry->flags)
+	)
+);
+
+DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode_start,
+
+	TP_PROTO(struct inode *inode, int flags),
+
+	TP_ARGS(inode, flags)
+);
+
+DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
+
+	TP_PROTO(struct inode *inode, int flags),
+
+	TP_ARGS(inode, flags)
+);
+
+DECLARE_EVENT_CLASS(writeback_write_inode_template,
+
+	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
+
+	TP_ARGS(inode, wbc),
+
+	TP_STRUCT__entry (
+		__array(char, name, 32)
+		__field(unsigned long, ino)
+		__field(int, sync_mode)
+	),
+
+	TP_fast_assign(
+		strncpy(__entry->name,
+			dev_name(inode->i_mapping->backing_dev_info->dev), 32);
+		__entry->ino		= inode->i_ino;
+		__entry->sync_mode	= wbc->sync_mode;
+	),
+
+	TP_printk("bdi %s: ino=%lu sync_mode=%d",
+		__entry->name,
+		__entry->ino,
+		__entry->sync_mode
+	)
+);
+
+DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode_start,
+
+	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
+
+	TP_ARGS(inode, wbc)
+);
+
+DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode,
+
+	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
+
+	TP_ARGS(inode, wbc)
+);
+
 DECLARE_EVENT_CLASS(writeback_work_class,
 	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work),
 	TP_ARGS(bdi, work),
@@ -479,6 +588,13 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 	)
 );
 
+DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode_start,
+	TP_PROTO(struct inode *inode,
+		 struct writeback_control *wbc,
+		 unsigned long nr_to_write),
+	TP_ARGS(inode, wbc, nr_to_write)
+);
+
 DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode,
 	TP_PROTO(struct inode *inode,
 		 struct writeback_control *wbc,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0713bfb..3734cef 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1982,6 +1982,8 @@ int __set_page_dirty_no_writeback(struct page *page)
  */
 void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
+	trace_writeback_dirty_page(page, mapping);
+
 	if (mapping_cap_account_dirty(mapping)) {
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_DIRTIED);
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] writeback: add more tracepoints
  2013-01-11 21:06 ` [PATCH 5/5] writeback: add more tracepoints Tejun Heo
@ 2013-01-12  3:17   ` Fengguang Wu
  2013-01-14 13:57     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Fengguang Wu @ 2013-01-12  3:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: axboe, linux-kernel, chavey, Jan Kara, linux-fsdevel

Patch looks good to me. But CC more people for review.

On Fri, Jan 11, 2013 at 01:06:37PM -0800, Tejun Heo wrote:
> Add tracepoints for page dirtying, writeback_single_inode start, inode
> dirtying and writeback.  For the latter two inode events, a pair of
> events are defined to denote start and end of the operations (the
> starting one has _start suffix and the one w/o suffix happens after
> the operation is complete).  These inode ops are FS specific and can
> be non-trivial and having enclosing tracepoints is useful for external
> tracers.
> 
> This is part of tracepoint additions to improve visiblity into
> dirtying / writeback operations for io tracer and userland.
> 
> v2: writeback_dirty_inode[_start] TPs may be called for files on
>     pseudo FSes w/ unregistered bdi.  Check whether bdi->dev is %NULL
>     before dereferencing.
> 
> v3: buffer dirtying moved to a block TP.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  fs/fs-writeback.c                |  16 +++++-
>  include/trace/events/writeback.h | 116 +++++++++++++++++++++++++++++++++++++++
>  mm/page-writeback.c              |   2 +
>  3 files changed, 132 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 310972b..359494e 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -318,8 +318,14 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work)
>  
>  static int write_inode(struct inode *inode, struct writeback_control *wbc)
>  {
> -	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode))
> -		return inode->i_sb->s_op->write_inode(inode, wbc);
> +	int ret;
> +
> +	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode)) {
> +		trace_writeback_write_inode_start(inode, wbc);
> +		ret = inode->i_sb->s_op->write_inode(inode, wbc);
> +		trace_writeback_write_inode(inode, wbc);
> +		return ret;
> +	}
>  	return 0;
>  }
>  
> @@ -450,6 +456,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  
>  	WARN_ON(!(inode->i_state & I_SYNC));
>  
> +	trace_writeback_single_inode_start(inode, wbc, nr_to_write);
> +
>  	ret = do_writepages(mapping, wbc);
>  
>  	/*
> @@ -1150,8 +1158,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  	 * dirty the inode itself
>  	 */
>  	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
> +		trace_writeback_dirty_inode_start(inode, flags);
> +
>  		if (sb->s_op->dirty_inode)
>  			sb->s_op->dirty_inode(inode, flags);
> +
> +		trace_writeback_dirty_inode(inode, flags);
>  	}
>  
>  	/*
> diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> index b453d92..6a16fd2 100644
> --- a/include/trace/events/writeback.h
> +++ b/include/trace/events/writeback.h
> @@ -32,6 +32,115 @@
>  
>  struct wb_writeback_work;
>  
> +TRACE_EVENT(writeback_dirty_page,
> +
> +	TP_PROTO(struct page *page, struct address_space *mapping),
> +
> +	TP_ARGS(page, mapping),
> +
> +	TP_STRUCT__entry (
> +		__array(char, name, 32)
> +		__field(unsigned long, ino)
> +		__field(pgoff_t, index)
> +	),
> +
> +	TP_fast_assign(
> +		strncpy(__entry->name,
> +			mapping ? dev_name(mapping->backing_dev_info->dev) : "(unknown)", 32);
> +		__entry->ino = mapping ? mapping->host->i_ino : 0;
> +		__entry->index = page->index;
> +	),
> +
> +	TP_printk("bdi %s: ino=%lu index=%lu",
> +		__entry->name,
> +		__entry->ino,
> +		__entry->index
> +	)
> +);
> +
> +DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
> +
> +	TP_PROTO(struct inode *inode, int flags),
> +
> +	TP_ARGS(inode, flags),
> +
> +	TP_STRUCT__entry (
> +		__array(char, name, 32)
> +		__field(unsigned long, ino)
> +		__field(unsigned long, flags)
> +	),
> +
> +	TP_fast_assign(
> +		struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
> +
> +		/* may be called for files on pseudo FSes w/ unregistered bdi */
> +		strncpy(__entry->name,
> +			bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32);
> +		__entry->ino		= inode->i_ino;
> +		__entry->flags		= flags;
> +	),
> +
> +	TP_printk("bdi %s: ino=%lu flags=%s",
> +		__entry->name,
> +		__entry->ino,
> +		show_inode_state(__entry->flags)
> +	)
> +);
> +
> +DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode_start,
> +
> +	TP_PROTO(struct inode *inode, int flags),
> +
> +	TP_ARGS(inode, flags)
> +);
> +
> +DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
> +
> +	TP_PROTO(struct inode *inode, int flags),
> +
> +	TP_ARGS(inode, flags)
> +);
> +
> +DECLARE_EVENT_CLASS(writeback_write_inode_template,
> +
> +	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
> +
> +	TP_ARGS(inode, wbc),
> +
> +	TP_STRUCT__entry (
> +		__array(char, name, 32)
> +		__field(unsigned long, ino)
> +		__field(int, sync_mode)
> +	),
> +
> +	TP_fast_assign(
> +		strncpy(__entry->name,
> +			dev_name(inode->i_mapping->backing_dev_info->dev), 32);
> +		__entry->ino		= inode->i_ino;
> +		__entry->sync_mode	= wbc->sync_mode;
> +	),
> +
> +	TP_printk("bdi %s: ino=%lu sync_mode=%d",
> +		__entry->name,
> +		__entry->ino,
> +		__entry->sync_mode
> +	)
> +);
> +
> +DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode_start,
> +
> +	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
> +
> +	TP_ARGS(inode, wbc)
> +);
> +
> +DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode,
> +
> +	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
> +
> +	TP_ARGS(inode, wbc)
> +);
> +
>  DECLARE_EVENT_CLASS(writeback_work_class,
>  	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work),
>  	TP_ARGS(bdi, work),
> @@ -479,6 +588,13 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
>  	)
>  );
>  
> +DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode_start,
> +	TP_PROTO(struct inode *inode,
> +		 struct writeback_control *wbc,
> +		 unsigned long nr_to_write),
> +	TP_ARGS(inode, wbc, nr_to_write)
> +);
> +
>  DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode,
>  	TP_PROTO(struct inode *inode,
>  		 struct writeback_control *wbc,
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0713bfb..3734cef 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1982,6 +1982,8 @@ int __set_page_dirty_no_writeback(struct page *page)
>   */
>  void account_page_dirtied(struct page *page, struct address_space *mapping)
>  {
> +	trace_writeback_dirty_page(page, mapping);
> +
>  	if (mapping_cap_account_dirty(mapping)) {
>  		__inc_zone_page_state(page, NR_FILE_DIRTY);
>  		__inc_zone_page_state(page, NR_DIRTIED);
> -- 
> 1.8.0.2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] writeback: add more tracepoints
  2013-01-12  3:17   ` Fengguang Wu
@ 2013-01-14 13:57     ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2013-01-14 13:57 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Tejun Heo, axboe, linux-kernel, chavey, Jan Kara, linux-fsdevel

On Sat 12-01-13 11:17:02, Wu Fengguang wrote:
> Patch looks good to me. But CC more people for review.
  The patch looks fine to me as well. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>

> On Fri, Jan 11, 2013 at 01:06:37PM -0800, Tejun Heo wrote:
> > Add tracepoints for page dirtying, writeback_single_inode start, inode
> > dirtying and writeback.  For the latter two inode events, a pair of
> > events are defined to denote start and end of the operations (the
> > starting one has _start suffix and the one w/o suffix happens after
> > the operation is complete).  These inode ops are FS specific and can
> > be non-trivial and having enclosing tracepoints is useful for external
> > tracers.
> > 
> > This is part of tracepoint additions to improve visiblity into
> > dirtying / writeback operations for io tracer and userland.
> > 
> > v2: writeback_dirty_inode[_start] TPs may be called for files on
> >     pseudo FSes w/ unregistered bdi.  Check whether bdi->dev is %NULL
> >     before dereferencing.
> > 
> > v3: buffer dirtying moved to a block TP.
> > 
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > ---
> >  fs/fs-writeback.c                |  16 +++++-
> >  include/trace/events/writeback.h | 116 +++++++++++++++++++++++++++++++++++++++
> >  mm/page-writeback.c              |   2 +
> >  3 files changed, 132 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 310972b..359494e 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -318,8 +318,14 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work)
> >  
> >  static int write_inode(struct inode *inode, struct writeback_control *wbc)
> >  {
> > -	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode))
> > -		return inode->i_sb->s_op->write_inode(inode, wbc);
> > +	int ret;
> > +
> > +	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode)) {
> > +		trace_writeback_write_inode_start(inode, wbc);
> > +		ret = inode->i_sb->s_op->write_inode(inode, wbc);
> > +		trace_writeback_write_inode(inode, wbc);
> > +		return ret;
> > +	}
> >  	return 0;
> >  }
> >  
> > @@ -450,6 +456,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> >  
> >  	WARN_ON(!(inode->i_state & I_SYNC));
> >  
> > +	trace_writeback_single_inode_start(inode, wbc, nr_to_write);
> > +
> >  	ret = do_writepages(mapping, wbc);
> >  
> >  	/*
> > @@ -1150,8 +1158,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> >  	 * dirty the inode itself
> >  	 */
> >  	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
> > +		trace_writeback_dirty_inode_start(inode, flags);
> > +
> >  		if (sb->s_op->dirty_inode)
> >  			sb->s_op->dirty_inode(inode, flags);
> > +
> > +		trace_writeback_dirty_inode(inode, flags);
> >  	}
> >  
> >  	/*
> > diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> > index b453d92..6a16fd2 100644
> > --- a/include/trace/events/writeback.h
> > +++ b/include/trace/events/writeback.h
> > @@ -32,6 +32,115 @@
> >  
> >  struct wb_writeback_work;
> >  
> > +TRACE_EVENT(writeback_dirty_page,
> > +
> > +	TP_PROTO(struct page *page, struct address_space *mapping),
> > +
> > +	TP_ARGS(page, mapping),
> > +
> > +	TP_STRUCT__entry (
> > +		__array(char, name, 32)
> > +		__field(unsigned long, ino)
> > +		__field(pgoff_t, index)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		strncpy(__entry->name,
> > +			mapping ? dev_name(mapping->backing_dev_info->dev) : "(unknown)", 32);
> > +		__entry->ino = mapping ? mapping->host->i_ino : 0;
> > +		__entry->index = page->index;
> > +	),
> > +
> > +	TP_printk("bdi %s: ino=%lu index=%lu",
> > +		__entry->name,
> > +		__entry->ino,
> > +		__entry->index
> > +	)
> > +);
> > +
> > +DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
> > +
> > +	TP_PROTO(struct inode *inode, int flags),
> > +
> > +	TP_ARGS(inode, flags),
> > +
> > +	TP_STRUCT__entry (
> > +		__array(char, name, 32)
> > +		__field(unsigned long, ino)
> > +		__field(unsigned long, flags)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
> > +
> > +		/* may be called for files on pseudo FSes w/ unregistered bdi */
> > +		strncpy(__entry->name,
> > +			bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32);
> > +		__entry->ino		= inode->i_ino;
> > +		__entry->flags		= flags;
> > +	),
> > +
> > +	TP_printk("bdi %s: ino=%lu flags=%s",
> > +		__entry->name,
> > +		__entry->ino,
> > +		show_inode_state(__entry->flags)
> > +	)
> > +);
> > +
> > +DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode_start,
> > +
> > +	TP_PROTO(struct inode *inode, int flags),
> > +
> > +	TP_ARGS(inode, flags)
> > +);
> > +
> > +DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
> > +
> > +	TP_PROTO(struct inode *inode, int flags),
> > +
> > +	TP_ARGS(inode, flags)
> > +);
> > +
> > +DECLARE_EVENT_CLASS(writeback_write_inode_template,
> > +
> > +	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
> > +
> > +	TP_ARGS(inode, wbc),
> > +
> > +	TP_STRUCT__entry (
> > +		__array(char, name, 32)
> > +		__field(unsigned long, ino)
> > +		__field(int, sync_mode)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		strncpy(__entry->name,
> > +			dev_name(inode->i_mapping->backing_dev_info->dev), 32);
> > +		__entry->ino		= inode->i_ino;
> > +		__entry->sync_mode	= wbc->sync_mode;
> > +	),
> > +
> > +	TP_printk("bdi %s: ino=%lu sync_mode=%d",
> > +		__entry->name,
> > +		__entry->ino,
> > +		__entry->sync_mode
> > +	)
> > +);
> > +
> > +DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode_start,
> > +
> > +	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
> > +
> > +	TP_ARGS(inode, wbc)
> > +);
> > +
> > +DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode,
> > +
> > +	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
> > +
> > +	TP_ARGS(inode, wbc)
> > +);
> > +
> >  DECLARE_EVENT_CLASS(writeback_work_class,
> >  	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work),
> >  	TP_ARGS(bdi, work),
> > @@ -479,6 +588,13 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
> >  	)
> >  );
> >  
> > +DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode_start,
> > +	TP_PROTO(struct inode *inode,
> > +		 struct writeback_control *wbc,
> > +		 unsigned long nr_to_write),
> > +	TP_ARGS(inode, wbc, nr_to_write)
> > +);
> > +
> >  DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode,
> >  	TP_PROTO(struct inode *inode,
> >  		 struct writeback_control *wbc,
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 0713bfb..3734cef 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1982,6 +1982,8 @@ int __set_page_dirty_no_writeback(struct page *page)
> >   */
> >  void account_page_dirtied(struct page *page, struct address_space *mapping)
> >  {
> > +	trace_writeback_dirty_page(page, mapping);
> > +
> >  	if (mapping_cap_account_dirty(mapping)) {
> >  		__inc_zone_page_state(page, NR_FILE_DIRTY);
> >  		__inc_zone_page_state(page, NR_DIRTIED);
> > -- 
> > 1.8.0.2
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET] block: improve tracepoints, take#2
  2013-01-11 21:06 [PATCHSET] block: improve tracepoints, take#2 Tejun Heo
                   ` (4 preceding siblings ...)
  2013-01-11 21:06 ` [PATCH 5/5] writeback: add more tracepoints Tejun Heo
@ 2013-01-14 14:02 ` Jens Axboe
  2013-01-14 17:43   ` Tejun Heo
  5 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2013-01-14 14:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, chavey, fengguang.wu

On Fri, Jan 11 2013, Tejun Heo wrote:
> drivers
> triggering them manually and adds more buffer / block / writeback TPs.
> These improve visibility in general and are already in use in google.
> 
> This patchset contains the following five patches.
> 
> 0001-block-add-missing-block_bio_complete-tracepoint.patch
> 0002-block-add-req-to-bio_-front-back-_merge-tracepoints.patch
> 0003-buffer-make-touch_buffer-an-exported-function.patch
> 0004-block-add-block_-touch-dirty-_buffer-tracepoint.patch
> 0005-writeback-add-more-tracepoints.patch
> 
> It's based on top of v3.8-rc2 and also available in the following git
> branch.

Applied (with Jan's review on the writeback side). Since we now have
buffer dirtying in the proper traces, it'd be really nice to deprecate
the ancient block_dump and phase it out of the tree...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHSET] block: improve tracepoints, take#2
  2013-01-14 14:02 ` [PATCHSET] block: improve tracepoints, take#2 Jens Axboe
@ 2013-01-14 17:43   ` Tejun Heo
  2013-01-14 18:22     ` [PATCH] writeback: mark sysctl vm.block_dump for removal Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2013-01-14 17:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, chavey, fengguang.wu

Hello, Jens.

On Mon, Jan 14, 2013 at 03:02:28PM +0100, Jens Axboe wrote:
> > It's based on top of v3.8-rc2 and also available in the following git
> > branch.
> 
> Applied (with Jan's review on the writeback side). Since we now have
> buffer dirtying in the proper traces, it'd be really nice to deprecate
> the ancient block_dump and phase it out of the tree...

Yeah, that really is kinda nasty.  I'll cook up a patch to trigger a
warning message on it.  Maybe we can remove it in some years.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] writeback: mark sysctl vm.block_dump for removal
  2013-01-14 17:43   ` Tejun Heo
@ 2013-01-14 18:22     ` Tejun Heo
  2013-01-14 19:00       ` Jan Kara
  2013-01-15 16:28       ` [PATCH v2] " Tejun Heo
  0 siblings, 2 replies; 16+ messages in thread
From: Tejun Heo @ 2013-01-14 18:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, chavey, fengguang.wu, Jan Kara

vm.block_dump is nasty in that it dumps IO traces directly to printk.
It isn't scalable and can easily lead to looping behavior - IO
generating kernel message which in turn genreates log IO, ad
infinitum.

Now that proper tracepoints are in place, vm.block_dump isn't
necessary.  Generate a warning if used so that it can be removed down
the road.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/writeback.h |    3 +++
 kernel/sysctl.c           |    2 +-
 mm/page-writeback.c       |   14 ++++++++++++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index b82a83a..678ee6d 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -143,6 +143,9 @@ extern int dirty_ratio_handler(struct ctl_table *table, int write,
 extern int dirty_bytes_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos);
+extern int block_dump_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos);
 
 struct ctl_table;
 int dirty_writeback_centisecs_handler(struct ctl_table *, int,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..9e7f7b5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1288,7 +1288,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &block_dump,
 		.maxlen		= sizeof(block_dump),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= block_dump_handler,
 		.extra1		= &zero,
 	},
 	{
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 6f42712..5548bef 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -390,6 +390,20 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
 	return ret;
 }
 
+int block_dump_handler(struct ctl_table *table, int write,
+		       void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	static bool warned = false;
+	int ret;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (!warned && block_dump) {
+		pr_warning("sysctl: vm.block_dump is scheduled for removal. Please use tracepoints instead.\n");
+		warned = true;
+	}
+	return ret;
+}
+
 static unsigned long wp_next_time(unsigned long cur_time)
 {
 	cur_time += VM_COMPLETIONS_PERIOD_LEN;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] writeback: mark sysctl vm.block_dump for removal
  2013-01-14 18:22     ` [PATCH] writeback: mark sysctl vm.block_dump for removal Tejun Heo
@ 2013-01-14 19:00       ` Jan Kara
  2013-01-15 16:28       ` [PATCH v2] " Tejun Heo
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Kara @ 2013-01-14 19:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, chavey, fengguang.wu, Jan Kara

On Mon 14-01-13 10:22:03, Tejun Heo wrote:
> vm.block_dump is nasty in that it dumps IO traces directly to printk.
> It isn't scalable and can easily lead to looping behavior - IO
> generating kernel message which in turn genreates log IO, ad
> infinitum.
> 
> Now that proper tracepoints are in place, vm.block_dump isn't
> necessary.  Generate a warning if used so that it can be removed down
> the road.
  Looks good. Maybe we could extend the message to ask people to let us know
if they see the message? If someone is using it we shouldn't really remove
it.

								Honza
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  include/linux/writeback.h |    3 +++
>  kernel/sysctl.c           |    2 +-
>  mm/page-writeback.c       |   14 ++++++++++++++
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index b82a83a..678ee6d 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -143,6 +143,9 @@ extern int dirty_ratio_handler(struct ctl_table *table, int write,
>  extern int dirty_bytes_handler(struct ctl_table *table, int write,
>  		void __user *buffer, size_t *lenp,
>  		loff_t *ppos);
> +extern int block_dump_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *lenp,
> +		loff_t *ppos);
>  
>  struct ctl_table;
>  int dirty_writeback_centisecs_handler(struct ctl_table *, int,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c88878d..9e7f7b5 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1288,7 +1288,7 @@ static struct ctl_table vm_table[] = {
>  		.data		= &block_dump,
>  		.maxlen		= sizeof(block_dump),
>  		.mode		= 0644,
> -		.proc_handler	= proc_dointvec,
> +		.proc_handler	= block_dump_handler,
>  		.extra1		= &zero,
>  	},
>  	{
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 6f42712..5548bef 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -390,6 +390,20 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
>  	return ret;
>  }
>  
> +int block_dump_handler(struct ctl_table *table, int write,
> +		       void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	static bool warned = false;
> +	int ret;
> +
> +	ret = proc_dointvec(table, write, buffer, lenp, ppos);
> +	if (!warned && block_dump) {
> +		pr_warning("sysctl: vm.block_dump is scheduled for removal. Please use tracepoints instead.\n");
> +		warned = true;
> +	}
> +	return ret;
> +}
> +
>  static unsigned long wp_next_time(unsigned long cur_time)
>  {
>  	cur_time += VM_COMPLETIONS_PERIOD_LEN;
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2] writeback: mark sysctl vm.block_dump for removal
  2013-01-14 18:22     ` [PATCH] writeback: mark sysctl vm.block_dump for removal Tejun Heo
  2013-01-14 19:00       ` Jan Kara
@ 2013-01-15 16:28       ` Tejun Heo
  2013-01-16  2:07         ` Jan Kara
  1 sibling, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2013-01-15 16:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, chavey, fengguang.wu, Jan Kara

vm.block_dump is nasty in that it dumps IO traces directly to printk.
It isn't scalable and can easily lead to looping behavior - IO
generating kernel message which in turn genreates log IO, ad
infinitum.

Now that proper tracepoints are in place, vm.block_dump isn't
necessary.  Generate a warning if used so that it can be removed down
the road.

Cc: Added contact info as suggested by Jan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
---
 include/linux/writeback.h |    3 +++
 kernel/sysctl.c           |    2 +-
 mm/page-writeback.c       |   15 +++++++++++++++
 patches/series            |    1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index b82a83a..678ee6d 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -143,6 +143,9 @@ extern int dirty_ratio_handler(struct ctl_table *table, int write,
 extern int dirty_bytes_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos);
+extern int block_dump_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos);
 
 struct ctl_table;
 int dirty_writeback_centisecs_handler(struct ctl_table *, int,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..9e7f7b5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1288,7 +1288,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &block_dump,
 		.maxlen		= sizeof(block_dump),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= block_dump_handler,
 		.extra1		= &zero,
 	},
 	{
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 6f42712..605632d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -390,6 +390,21 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
 	return ret;
 }
 
+int block_dump_handler(struct ctl_table *table, int write,
+		       void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	static bool warned = false;
+	int ret;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (!warned && block_dump) {
+		pr_warning("sysctl: vm.block_dump is scheduled for removal. Use tracepoints instead.\n");
+		pr_warning("sysctl: Please contact Jens Axboe <axboe@kernel.dk> if you require this.\n");
+		warned = true;
+	}
+	return ret;
+}
+
 static unsigned long wp_next_time(unsigned long cur_time)
 {
 	cur_time += VM_COMPLETIONS_PERIOD_LEN;
diff --git a/patches/series b/patches/series
index 9fb9bfe..b2582cb 100644
--- a/patches/series
+++ b/patches/series
@@ -18,3 +18,4 @@ kill-pool-gcwq
 remove-gcwq
 rename-nr_running
 cleanup-leftovers
+dbg

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] writeback: mark sysctl vm.block_dump for removal
  2013-01-15 16:28       ` [PATCH v2] " Tejun Heo
@ 2013-01-16  2:07         ` Jan Kara
  2013-01-16  2:53           ` Fengguang Wu
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2013-01-16  2:07 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel, chavey, fengguang.wu, Jan Kara

On Tue 15-01-13 08:28:13, Tejun Heo wrote:
> vm.block_dump is nasty in that it dumps IO traces directly to printk.
> It isn't scalable and can easily lead to looping behavior - IO
> generating kernel message which in turn genreates log IO, ad
> infinitum.
> 
> Now that proper tracepoints are in place, vm.block_dump isn't
> necessary.  Generate a warning if used so that it can be removed down
> the road.
> 
> Cc: Added contact info as suggested by Jan.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
  Thanks. The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  include/linux/writeback.h |    3 +++
>  kernel/sysctl.c           |    2 +-
>  mm/page-writeback.c       |   15 +++++++++++++++
>  patches/series            |    1 +
>  4 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index b82a83a..678ee6d 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -143,6 +143,9 @@ extern int dirty_ratio_handler(struct ctl_table *table, int write,
>  extern int dirty_bytes_handler(struct ctl_table *table, int write,
>  		void __user *buffer, size_t *lenp,
>  		loff_t *ppos);
> +extern int block_dump_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *lenp,
> +		loff_t *ppos);
>  
>  struct ctl_table;
>  int dirty_writeback_centisecs_handler(struct ctl_table *, int,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c88878d..9e7f7b5 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1288,7 +1288,7 @@ static struct ctl_table vm_table[] = {
>  		.data		= &block_dump,
>  		.maxlen		= sizeof(block_dump),
>  		.mode		= 0644,
> -		.proc_handler	= proc_dointvec,
> +		.proc_handler	= block_dump_handler,
>  		.extra1		= &zero,
>  	},
>  	{
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 6f42712..605632d 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -390,6 +390,21 @@ int dirty_bytes_handler(struct ctl_table *table, int write,
>  	return ret;
>  }
>  
> +int block_dump_handler(struct ctl_table *table, int write,
> +		       void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	static bool warned = false;
> +	int ret;
> +
> +	ret = proc_dointvec(table, write, buffer, lenp, ppos);
> +	if (!warned && block_dump) {
> +		pr_warning("sysctl: vm.block_dump is scheduled for removal. Use tracepoints instead.\n");
> +		pr_warning("sysctl: Please contact Jens Axboe <axboe@kernel.dk> if you require this.\n");
> +		warned = true;
> +	}
> +	return ret;
> +}
> +
>  static unsigned long wp_next_time(unsigned long cur_time)
>  {
>  	cur_time += VM_COMPLETIONS_PERIOD_LEN;
> diff --git a/patches/series b/patches/series
> index 9fb9bfe..b2582cb 100644
> --- a/patches/series
> +++ b/patches/series
> @@ -18,3 +18,4 @@ kill-pool-gcwq
>  remove-gcwq
>  rename-nr_running
>  cleanup-leftovers
> +dbg
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] writeback: mark sysctl vm.block_dump for removal
  2013-01-16  2:07         ` Jan Kara
@ 2013-01-16  2:53           ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2013-01-16  2:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: Tejun Heo, Jens Axboe, linux-kernel, chavey

On Wed, Jan 16, 2013 at 03:07:53AM +0100, Jan Kara wrote:
> On Tue 15-01-13 08:28:13, Tejun Heo wrote:
> > vm.block_dump is nasty in that it dumps IO traces directly to printk.
> > It isn't scalable and can easily lead to looping behavior - IO
> > generating kernel message which in turn genreates log IO, ad
> > infinitum.
> > 
> > Now that proper tracepoints are in place, vm.block_dump isn't
> > necessary.  Generate a warning if used so that it can be removed down
> > the road.

> > +	if (!warned && block_dump) {
> > +		pr_warning("sysctl: vm.block_dump is scheduled for removal. Use tracepoints instead.\n");
> > +		pr_warning("sysctl: Please contact Jens Axboe <axboe@kernel.dk> if you require this.\n");
> > +		warned = true;

Looks good to me. A warning in dmesg is pertinent because the
block_dump users will be looking at dmesg for block dump results.

Hmm, perhaps a more practical tip on the usage of replacement
tracepoints or tool will help eliminate some user emails to Jens.

> > diff --git a/patches/series b/patches/series
> > index 9fb9bfe..b2582cb 100644
> > --- a/patches/series
> > +++ b/patches/series
> > @@ -18,3 +18,4 @@ kill-pool-gcwq
> >  remove-gcwq
> >  rename-nr_running
> >  cleanup-leftovers
> > +dbg

An extra chunk.

Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/5] writeback: add more tracepoints
  2013-01-09 16:45 [PATCHSET] block: improve tracepoints Tejun Heo
@ 2013-01-09 16:45 ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2013-01-09 16:45 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, chavey, Tejun Heo

Add tracepoints for page dirtying, writeback_single_inode start, inode
dirtying and writeback.  For the latter two inode events, a pair of
events are defined to denote start and end of the operations (the
starting one has _start suffix and the one w/o suffix happens after
the operation is complete).  These inode ops are FS specific and can
be non-trivial and having enclosing tracepoints is useful for external
tracers.

This is part of tracepoint additions to improve visiblity into
dirtying / writeback operations for io tracer and userland.

v2: writeback_dirty_inode[_start] TPs may be called for files on
    pseudo FSes w/ unregistered bdi.  Check whether bdi->dev is %NULL
    before dereferencing.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fs-writeback.c                |  17 ++++-
 include/trace/events/writeback.h | 135 +++++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c              |   2 +
 3 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 310972b..6cc8872 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -27,6 +27,7 @@
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/tracepoint.h>
+#include <linux/buffer_head.h>
 #include "internal.h"
 
 /*
@@ -318,8 +319,14 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work)
 
 static int write_inode(struct inode *inode, struct writeback_control *wbc)
 {
-	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode))
-		return inode->i_sb->s_op->write_inode(inode, wbc);
+	int ret;
+
+	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode)) {
+		trace_writeback_write_inode_start(inode, wbc);
+		ret = inode->i_sb->s_op->write_inode(inode, wbc);
+		trace_writeback_write_inode(inode, wbc);
+		return ret;
+	}
 	return 0;
 }
 
@@ -450,6 +457,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 
 	WARN_ON(!(inode->i_state & I_SYNC));
 
+	trace_writeback_single_inode_start(inode, wbc, nr_to_write);
+
 	ret = do_writepages(mapping, wbc);
 
 	/*
@@ -1150,8 +1159,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	 * dirty the inode itself
 	 */
 	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
+		trace_writeback_dirty_inode_start(inode, flags);
+
 		if (sb->s_op->dirty_inode)
 			sb->s_op->dirty_inode(inode, flags);
+
+		trace_writeback_dirty_inode(inode, flags);
 	}
 
 	/*
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index b453d92..2aff757 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -32,6 +32,134 @@
 
 struct wb_writeback_work;
 
+TRACE_EVENT(writeback_dirty_buffer,
+
+	TP_PROTO(struct buffer_head *bh),
+
+	TP_ARGS(bh),
+
+	TP_STRUCT__entry(
+		__array(char, name, 32)
+		__field(unsigned long long, block)
+	),
+
+	TP_fast_assign(
+		strncpy(__entry->name, bh->b_bdev->bd_disk->disk_name, 32);
+		__entry->block = bh->b_blocknr;
+	),
+
+	TP_printk("%s: dirty buffer %llu", __entry->name, __entry->block)
+);
+
+TRACE_EVENT(writeback_dirty_page,
+
+	TP_PROTO(struct page *page, struct address_space *mapping),
+
+	TP_ARGS(page, mapping),
+
+	TP_STRUCT__entry (
+		__array(char, name, 32)
+		__field(unsigned long, ino)
+		__field(pgoff_t, index)
+	),
+
+	TP_fast_assign(
+		strncpy(__entry->name,
+			mapping ? dev_name(mapping->backing_dev_info->dev) : "(unknown)", 32);
+		__entry->ino = mapping ? mapping->host->i_ino : 0;
+		__entry->index = page->index;
+	),
+
+	TP_printk("bdi %s: ino=%lu index=%lu",
+		__entry->name,
+		__entry->ino,
+		__entry->index
+	)
+);
+
+DECLARE_EVENT_CLASS(writeback_dirty_inode_template,
+
+	TP_PROTO(struct inode *inode, int flags),
+
+	TP_ARGS(inode, flags),
+
+	TP_STRUCT__entry (
+		__array(char, name, 32)
+		__field(unsigned long, ino)
+		__field(unsigned long, flags)
+	),
+
+	TP_fast_assign(
+		struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
+
+		/* may be called for files on pseudo FSes w/ unregistered bdi */
+		strncpy(__entry->name,
+			bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32);
+		__entry->ino		= inode->i_ino;
+		__entry->flags		= flags;
+	),
+
+	TP_printk("bdi %s: ino=%lu flags=%s",
+		__entry->name,
+		__entry->ino,
+		show_inode_state(__entry->flags)
+	)
+);
+
+DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode_start,
+
+	TP_PROTO(struct inode *inode, int flags),
+
+	TP_ARGS(inode, flags)
+);
+
+DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode,
+
+	TP_PROTO(struct inode *inode, int flags),
+
+	TP_ARGS(inode, flags)
+);
+
+DECLARE_EVENT_CLASS(writeback_write_inode_template,
+
+	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
+
+	TP_ARGS(inode, wbc),
+
+	TP_STRUCT__entry (
+		__array(char, name, 32)
+		__field(unsigned long, ino)
+		__field(int, sync_mode)
+	),
+
+	TP_fast_assign(
+		strncpy(__entry->name,
+			dev_name(inode->i_mapping->backing_dev_info->dev), 32);
+		__entry->ino		= inode->i_ino;
+		__entry->sync_mode	= wbc->sync_mode;
+	),
+
+	TP_printk("bdi %s: ino=%lu sync_mode=%d",
+		__entry->name,
+		__entry->ino,
+		__entry->sync_mode
+	)
+);
+
+DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode_start,
+
+	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
+
+	TP_ARGS(inode, wbc)
+);
+
+DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode,
+
+	TP_PROTO(struct inode *inode, struct writeback_control *wbc),
+
+	TP_ARGS(inode, wbc)
+);
+
 DECLARE_EVENT_CLASS(writeback_work_class,
 	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work),
 	TP_ARGS(bdi, work),
@@ -479,6 +607,13 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template,
 	)
 );
 
+DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode_start,
+	TP_PROTO(struct inode *inode,
+		 struct writeback_control *wbc,
+		 unsigned long nr_to_write),
+	TP_ARGS(inode, wbc, nr_to_write)
+);
+
 DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode,
 	TP_PROTO(struct inode *inode,
 		 struct writeback_control *wbc,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0713bfb..3734cef 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1982,6 +1982,8 @@ int __set_page_dirty_no_writeback(struct page *page)
  */
 void account_page_dirtied(struct page *page, struct address_space *mapping)
 {
+	trace_writeback_dirty_page(page, mapping);
+
 	if (mapping_cap_account_dirty(mapping)) {
 		__inc_zone_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_DIRTIED);
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-01-16  2:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-11 21:06 [PATCHSET] block: improve tracepoints, take#2 Tejun Heo
2013-01-11 21:06 ` [PATCH 1/5] block: add missing block_bio_complete() tracepoint Tejun Heo
2013-01-11 21:06 ` [PATCH 2/5] block: add @req to bio_{front|back}_merge tracepoints Tejun Heo
2013-01-11 21:06 ` [PATCH 3/5] buffer: make touch_buffer() an exported function Tejun Heo
2013-01-11 21:06 ` [PATCH 4/5] block: add block_{touch|dirty}_buffer tracepoint Tejun Heo
2013-01-11 21:06 ` [PATCH 5/5] writeback: add more tracepoints Tejun Heo
2013-01-12  3:17   ` Fengguang Wu
2013-01-14 13:57     ` Jan Kara
2013-01-14 14:02 ` [PATCHSET] block: improve tracepoints, take#2 Jens Axboe
2013-01-14 17:43   ` Tejun Heo
2013-01-14 18:22     ` [PATCH] writeback: mark sysctl vm.block_dump for removal Tejun Heo
2013-01-14 19:00       ` Jan Kara
2013-01-15 16:28       ` [PATCH v2] " Tejun Heo
2013-01-16  2:07         ` Jan Kara
2013-01-16  2:53           ` Fengguang Wu
  -- strict thread matches above, loose matches on Subject: below --
2013-01-09 16:45 [PATCHSET] block: improve tracepoints Tejun Heo
2013-01-09 16:45 ` [PATCH 5/5] writeback: add more tracepoints Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).