All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] Filesystem io types statistic
@ 2011-11-10 10:34 Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types Zheng Liu
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel

Hi all,

v1->v2: totally redesign this mechanism

This patchset implements an io types statistic mechanism for filesystem
and it has been added into ext4 to let us know how the ext4 is used by
applications. It is useful for us to analyze how to improve the filesystem
and applications. Nowadays, I have added it into ext4, but other filesytems
also can use it to count the io types by themselves.

A 'Issue' flag is added into buffer_head and will be set in submit_bh().
Thus, we can check this flag in filesystem to know that a request is issued
to the disk when this flag is set. Filesystems just need to check it in
read operation because filesystem should know whehter a write request hits
cache or not, at least in ext4. In filesystem, buffer needs to be locked in
checking and clearing this flag, but it doesn't cost much overhead.

In ext4, a per-cpu counter is defined and some functions are added to count
the io types of buffered/direct io. An exception is __breadahead() due to
this function doesn't need a buffer_head as argument or return value. So now
we cannot handle these requests calling __breadahead().

The IO types in ext4 have shown as following:
Metadata:
 - super block
 - group descriptor
 - inode bitmap
 - block bitmap
 - inode table
 - extent block
 - indirect block
 - dir index and entry
 - extended attribute
Data:
 - regular data block

The result is shown in sysfs. We can read from /sys/fs/ext4/$DEVICE/io_stats
to see the result. We can understand how much metadata or data requests are
issued to the disk according to the result.

I have finished some benchmarks to test its overhead that calling lock_buffer()
brings. The following fio script is used to run on a SSD. The result shows that
the ovheread can be ignored.

FIO config file:
[global]
ioengineshortync
bs=4k
filename=/mnt/sda1/testfile
size=64G
runtime=300
group_reporting
loops=500

[read]
rw=randread
numjobs=4

[write]
rw=randwrite
numjobs=1

The result (iops):
        w/o         w/
READ:  16304      15906 (-2.44%)
WRITE:  1332       1353 (+1.58%)

Any comments or suggestions are welcome.

Regards,
Zheng

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-11 10:48   ` Steven Whitehouse
  2011-11-10 10:34 ` [PATCH v2 2/8] ext4: Add new data structures and related functions " Zheng Liu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

'Issue' flag is added into buffer_head to indicate that a request is issued
to the disk. When an io doesn't hit page cache, 'Issue' flag will be set in
submit_bh(). Filesystem can understand request is sent to the disk according
to whether or not this flag is set. This flag will be cleared after filesystem
tests it. If filesystem doesn't want to count the io type, it can ignore this
flag.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/buffer.c                 |    3 +++
 include/linux/buffer_head.h |    5 +++++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 70a1974..f681caa 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2920,6 +2920,9 @@ int submit_bh(int rw, struct buffer_head * bh)
 	BUG_ON(buffer_delay(bh));
 	BUG_ON(buffer_unwritten(bh));
 
+	/* set issue flag for indicating a request is issued to the disk */
+	set_buffer_issue(bh);
+
 	/*
 	 * Only clear out a write error when rewriting
 	 */
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..2bccdb4 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -38,6 +38,10 @@ enum bh_state_bits {
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
 			 */
+
+	BH_Issue,	/* Issue a request to the disk when this request
+			 * doesn't hit cache
+			 */
 };
 
 #define MAX_BUF_PER_PAGE (PAGE_CACHE_SIZE / 512)
@@ -124,6 +128,7 @@ BUFFER_FNS(Delay, delay)
 BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
+BUFFER_FNS(Issue, issue)
 
 #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)
 #define touch_buffer(bh)	mark_page_accessed(bh->b_page)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/8] ext4: Add new data structures and related functions to count io types
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-11 10:58   ` Steven Whitehouse
  2011-11-10 10:34 ` [PATCH v2 3/8] ext4: Count metadata request of read operations in buffered io Zheng Liu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

A per-cpu counter is defined to store io types in ext4. We define 10 io types
in ext4, which includes 9 metadata types and 1 data type. Read and write
operations are counted separately. When checks 'Issue' flag, filesystem needs
to lock buffer.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/ext4/ext4.h  |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/super.c |   19 +++++++++++++++++++
 2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5b0e26a..39a1495 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1108,6 +1108,23 @@ struct ext4_super_block {
 #define EXT4_MF_FS_ABORTED	0x0002	/* Fatal error detected */
 
 /*
+ * ext4 io types
+ */
+enum {
+	EXT4_IOS_SUPER_BLOCK = 0,
+	EXT4_IOS_GROUP_DESC,
+	EXT4_IOS_INODE_BITMAP,
+	EXT4_IOS_BLOCK_BITMAP,
+	EXT4_IOS_INODE_TABLE,
+	EXT4_IOS_EXTENT_BLOCK,
+	EXT4_IOS_INDIRECT_BLOCK,
+	EXT4_IOS_DIR_ENTRY,
+	EXT4_IOS_EXTENDED_ATTR,
+	EXT4_IOS_REGULAR_DATA,
+	EXT4_IOS_TYPE_END,
+};
+
+/*
  * fourth extended-fs super-block data in memory
  */
 struct ext4_sb_info {
@@ -1284,6 +1301,11 @@ static inline void ext4_set_io_unwritten_flag(struct inode *inode,
 	}
 }
 
+static inline unsigned ext4_blocks_per_page(struct inode *inode)
+{
+	return PAGE_CACHE_SIZE >> inode->i_blkbits;
+}
+
 /*
  * Inode dynamic state flags
  */
@@ -1926,6 +1948,37 @@ extern int ext4_group_extend(struct super_block *sb,
 				ext4_fsblk_t n_blocks_count);
 
 /* super.c */
+extern void __ext4_io_stat(int, int, unsigned long);
+#define ext4_ios_read(bh, type, count)					\
+	do {								\
+		if (!bh)						\
+			break;						\
+		lock_buffer(bh);					\
+		if (buffer_issue(bh)) {					\
+			clear_buffer_issue(bh);				\
+			__ext4_io_stat(READ, type, count);		\
+		}							\
+		unlock_buffer(bh);					\
+	} while (0)
+#define ext4_ios_write(handle, bh, type, count)				\
+	do {								\
+		if (!bh) {						\
+			__ext4_io_stat(WRITE, type, count);		\
+			break;						\
+		}							\
+		if (!handle || !ext4_handle_valid(handle)) {		\
+			if (buffer_dirty(bh))				\
+				break;					\
+		} else {						\
+			if (buffer_jbddirty(bh))			\
+				break;					\
+		}							\
+		__ext4_io_stat(WRITE, type, count);			\
+	} while(0)
+#define ext4_ios_read_one(bh, type)	ext4_ios_read(bh, type, 1)
+#define ext4_ios_write_one(handle, bh, type) \
+	ext4_ios_write(handle, bh, type, 1)
+
 extern void *ext4_kvmalloc(size_t size, gfp_t flags);
 extern void *ext4_kvzalloc(size_t size, gfp_t flags);
 extern void ext4_kvfree(void *ptr);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9953d80..3bec50c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4940,6 +4940,25 @@ out:
 
 #endif
 
+static DEFINE_PER_CPU(unsigned long [EXT4_IOS_TYPE_END][2], ext4_ios_counters);
+
+static inline unsigned long ext4_get_ios_counter(int rw, int type)
+{
+	unsigned long sum = 0;
+	int i;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(ext4_ios_counters[type][rw], i);
+
+	return sum;
+}
+
+void __ext4_io_stat(int rw, int type, unsigned long count)
+{
+	BUG_ON(type < 0 || type >= EXT4_IOS_TYPE_END);
+	this_cpu_add(ext4_ios_counters[type][rw], count);
+}
+
 static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
 		       const char *dev_name, void *data)
 {
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/8] ext4: Count metadata request of read operations in buffered io
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 2/8] ext4: Add new data structures and related functions " Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 4/8] ext4: Count data " Zheng Liu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

Add ext4_ios_read*() functions to count the metadata type of read in buffered
io. We cannot count it in breadahead() due to no buffer_head is passed or
returned in this function.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/ext4/balloc.c      |    1 +
 fs/ext4/extents.c     |    3 +++
 fs/ext4/ialloc.c      |    1 +
 fs/ext4/indirect.c    |    3 +++
 fs/ext4/inode.c       |    3 +++
 fs/ext4/mballoc.c     |    1 +
 fs/ext4/move_extent.c |    4 ++++
 fs/ext4/namei.c       |    5 ++++-
 fs/ext4/super.c       |    3 +++
 fs/ext4/xattr.c       |    6 ++++++
 10 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index f6dba45..3503c33 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -394,6 +394,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
+	ext4_ios_read_one(bh, EXT4_IOS_BLOCK_BITMAP);
 	ext4_valid_block_bitmap(sb, desc, block_group, bh);
 	/*
 	 * file system mounted not to panic on error,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..289994f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -672,6 +672,7 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
 				put_bh(bh);
 				goto err;
 			}
+			ext4_ios_read_one(bh, EXT4_IOS_EXTENT_BLOCK);
 			/* validate the extent entries */
 			need_to_validate = 1;
 		}
@@ -1326,6 +1327,7 @@ got_index:
 		bh = sb_bread(inode->i_sb, block);
 		if (bh == NULL)
 			return -EIO;
+		ext4_ios_read_one(bh, EXT4_IOS_EXTENT_BLOCK);
 		eh = ext_block_hdr(bh);
 		/* subtract from p_depth to get proper eh_depth */
 		if (ext4_ext_check(inode, eh, path->p_depth - depth)) {
@@ -2568,6 +2570,7 @@ again:
 				err = -EIO;
 				break;
 			}
+			ext4_ios_read_one(bh, EXT4_IOS_EXTENT_BLOCK);
 			if (WARN_ON(i + 1 > depth)) {
 				err = -EIO;
 				break;
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 00beb4f..a86daff 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -161,6 +161,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 			    block_group, bitmap_blk);
 		return NULL;
 	}
+	ext4_ios_read_one(bh, EXT4_IOS_INODE_BITMAP);
 	return bh;
 }
 
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 3cfc73f..7ba97e3 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -162,6 +162,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
 				put_bh(bh);
 				goto failure;
 			}
+			ext4_ios_read_one(bh, EXT4_IOS_INDIRECT_BLOCK);
 			/* validate block references */
 			if (ext4_check_indirect_blockref(inode, bh)) {
 				put_bh(bh);
@@ -1266,6 +1267,8 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
 				continue;
 			}
 
+			ext4_ios_read_one(bh, EXT4_IOS_INDIRECT_BLOCK);
+
 			/* This zaps the entire block.  Bottom up. */
 			BUFFER_TRACE(bh, "free child branches");
 			ext4_free_branches(handle, inode, bh,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cc5a6da..d4f9da3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -701,6 +701,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
 		return bh;
 	ll_rw_block(READ | REQ_META | REQ_PRIO, 1, &bh);
 	wait_on_buffer(bh);
+	ext4_ios_read_one(bh, EXT4_IOS_DIR_ENTRY);
 	if (buffer_uptodate(bh))
 		return bh;
 	put_bh(bh);
@@ -3426,6 +3427,7 @@ int ext4_block_zero_page_range(handle_t *handle,
 		err = -EIO;
 		ll_rw_block(READ, 1, &bh);
 		wait_on_buffer(bh);
+		ext4_ios_read_one(bh, EXT4_IOS_INODE_TABLE);
 		/* Uhhuh. Read error. Complain and punt. */
 		if (!buffer_uptodate(bh))
 			goto unlock;
@@ -3683,6 +3685,7 @@ make_io:
 			brelse(bh);
 			return -EIO;
 		}
+		ext4_ios_read_one(bh, EXT4_IOS_INODE_TABLE);
 	}
 has_buffer:
 	iloc->bh = bh;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e2d8be8..e5ca591 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -883,6 +883,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 		set_bitmap_uptodate(bh[i]);
 		bh[i]->b_end_io = end_buffer_read_sync;
 		submit_bh(READ, bh[i]);
+		ext4_ios_read_one(bh[i], EXT4_IOS_BLOCK_BITMAP);
 		mb_debug(1, "read bitmap for group %u\n", first_group + i);
 	}
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index c5826c6..200ae0f 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -102,6 +102,8 @@ mext_next_extent(struct inode *inode, struct ext4_ext_path *path,
 				sb_bread(inode->i_sb, path[ppos].p_block);
 			if (!path[ppos+1].p_bh)
 				return -EIO;
+			ext4_ios_read_one(path[ppos+1].p_bh,
+					  EXT4_IOS_EXTENT_BLOCK);
 			path[ppos+1].p_hdr =
 				ext_block_hdr(path[ppos+1].p_bh);
 
@@ -117,6 +119,8 @@ mext_next_extent(struct inode *inode, struct ext4_ext_path *path,
 					path[cur_ppos].p_block);
 				if (!path[cur_ppos+1].p_bh)
 					return -EIO;
+				ext4_ios_read_one(path[cur_ppos+1].p_bh,
+						  EXT4_IOS_EXTENT_BLOCK);
 				path[cur_ppos+1].p_hdr =
 					ext_block_hdr(path[cur_ppos+1].p_bh);
 			}
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index aa4c782..2bd167f 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -921,9 +921,12 @@ restart:
 				num++;
 				bh = ext4_getblk(NULL, dir, b++, 0, &err);
 				bh_use[ra_max] = bh;
-				if (bh)
+				if (bh) {
 					ll_rw_block(READ | REQ_META | REQ_PRIO,
 						    1, &bh);
+					ext4_ios_read_one(bh,
+							  EXT4_IOS_DIR_ENTRY);
+				}
 			}
 		}
 		if ((bh = bh_use[ra_ptr++]) == NULL)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3bec50c..08aa193 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3171,6 +3171,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		ext4_msg(sb, KERN_ERR, "unable to read superblock");
 		goto out_fail;
 	}
+	ext4_ios_read_one(bh, EXT4_IOS_SUPER_BLOCK);
 	/*
 	 * Note: s_es must be initialized as soon as possible because
 	 *       some ext4 macro-instructions depend on its value
@@ -3345,6 +3346,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			       "Can't read superblock on 2nd try");
 			goto failed_mount;
 		}
+		ext4_ios_read_one(bh, EXT4_IOS_SUPER_BLOCK);
 		es = (struct ext4_super_block *)(((char *)bh->b_data) + offset);
 		sbi->s_es = es;
 		if (es->s_magic != cpu_to_le16(EXT4_SUPER_MAGIC)) {
@@ -3555,6 +3557,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			db_count = i;
 			goto failed_mount2;
 		}
+		ext4_ios_read_one(sbi->s_group_desc[i], EXT4_IOS_GROUP_DESC);
 	}
 	if (!ext4_check_descriptors(sb, &first_not_zeroed)) {
 		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 93a00d8..9869805 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -224,6 +224,7 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name,
 	bh = sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl);
 	if (!bh)
 		goto cleanup;
+	ext4_ios_read_one(bh, EXT4_IOS_EXTENDED_ATTR);
 	ea_bdebug(bh, "b_count=%d, refcount=%d",
 		atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
 	if (ext4_xattr_check_block(bh)) {
@@ -368,6 +369,7 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
 	error = -EIO;
 	if (!bh)
 		goto cleanup;
+	ext4_ios_read_one(bh, EXT4_IOS_EXTENDED_ATTR);
 	ea_bdebug(bh, "b_count=%d, refcount=%d",
 		atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
 	if (ext4_xattr_check_block(bh)) {
@@ -659,6 +661,7 @@ ext4_xattr_block_find(struct inode *inode, struct ext4_xattr_info *i,
 		error = -EIO;
 		if (!bs->bh)
 			goto cleanup;
+		ext4_ios_read_one(bs->bh, EXT4_IOS_EXTENDED_ATTR);
 		ea_bdebug(bs->bh, "b_count=%d, refcount=%d",
 			atomic_read(&(bs->bh->b_count)),
 			le32_to_cpu(BHDR(bs->bh)->h_refcount));
@@ -1192,6 +1195,7 @@ retry:
 		error = -EIO;
 		if (!bh)
 			goto cleanup;
+		ext4_ios_read_one(bh, EXT4_IOS_EXTENDED_ATTR);
 		if (ext4_xattr_check_block(bh)) {
 			EXT4_ERROR_INODE(inode, "bad block %llu",
 					 EXT4_I(inode)->i_file_acl);
@@ -1375,6 +1379,7 @@ ext4_xattr_delete_inode(handle_t *handle, struct inode *inode)
 				 EXT4_I(inode)->i_file_acl);
 		goto cleanup;
 	}
+	ext4_ios_read_one(bh, EXT4_IOS_EXTENDED_ATTR);
 	if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
 	    BHDR(bh)->h_blocks != cpu_to_le32(1)) {
 		EXT4_ERROR_INODE(inode, "bad block %llu",
@@ -1515,6 +1520,7 @@ again:
 			*pce = ce;
 			return bh;
 		}
+		ext4_ios_read_one(bh, EXT4_IOS_EXTENDED_ATTR);
 		brelse(bh);
 		ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash);
 	}
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/8] ext4: Count data request of read operations in buffered io
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
                   ` (2 preceding siblings ...)
  2011-11-10 10:34 ` [PATCH v2 3/8] ext4: Count metadata request of read operations in buffered io Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 5/8] ext4: Count metadata request of write " Zheng Liu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

Call ext4_ios_read*() and __ext4_io_stat() functions to count the data of
buffered io in ext4. In ext4_readpage() and ext4_readpages(), we call
__ext4_io_stat() directly because these functions don't use buffer_head.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/ext4/inode.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d4f9da3..5a1fca0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2680,6 +2680,8 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 static int ext4_readpage(struct file *file, struct page *page)
 {
 	trace_ext4_readpage(page);
+	__ext4_io_stat(READ, EXT4_IOS_REGULAR_DATA,
+		       ext4_blocks_per_page(page->mapping->host));
 	return mpage_readpage(page, ext4_get_block);
 }
 
@@ -2687,6 +2689,8 @@ static int
 ext4_readpages(struct file *file, struct address_space *mapping,
 		struct list_head *pages, unsigned nr_pages)
 {
+	__ext4_io_stat(READ, EXT4_IOS_REGULAR_DATA,
+		       nr_pages * ext4_blocks_per_page(mapping->host));
 	return mpage_readpages(mapping, pages, nr_pages, ext4_get_block);
 }
 
@@ -3305,6 +3309,8 @@ int ext4_discard_partial_page_buffers_no_lock(handle_t *handle,
 			err = -EIO;
 			ll_rw_block(READ, 1, &bh);
 			wait_on_buffer(bh);
+			ext4_ios_read(bh, EXT4_IOS_REGULAR_DATA,
+				      ext4_blocks_per_page(inode));
 			/* Uhhuh. Read error. Complain and punt.*/
 			if (!buffer_uptodate(bh))
 				goto next;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/8] ext4: Count metadata request of write operations in buffered io
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
                   ` (3 preceding siblings ...)
  2011-11-10 10:34 ` [PATCH v2 4/8] ext4: Count data " Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 6/8] ext4: Count data " Zheng Liu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

'Issue' flag in buffer_head doesn't be used because filesystem should know
whether or not write request can hit cache.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/ext4/extents.c     |    4 ++++
 fs/ext4/ialloc.c      |    8 ++++++++
 fs/ext4/indirect.c    |   12 ++++++++++--
 fs/ext4/inode.c       |    4 ++++
 fs/ext4/mballoc.c     |    7 +++++++
 fs/ext4/move_extent.c |    2 ++
 fs/ext4/namei.c       |   13 +++++++++++++
 fs/ext4/super.c       |    2 ++
 fs/ext4/xattr.c       |   10 +++++++++-
 9 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 289994f..572d44a 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -106,6 +106,7 @@ static int __ext4_ext_dirty(const char *where, unsigned int line,
 		/* path points to block */
 		err = __ext4_handle_dirty_metadata(where, line, handle,
 						   inode, path->p_bh);
+		ext4_ios_write_one(handle, path->p_bh, EXT4_IOS_EXTENT_BLOCK);
 	} else {
 		/* path points to leaf/index in inode body */
 		err = ext4_mark_inode_dirty(handle, inode);
@@ -902,6 +903,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
 
+	ext4_ios_write_one(handle, bh, EXT4_IOS_EXTENT_BLOCK);
 	err = ext4_handle_dirty_metadata(handle, inode, bh);
 	if (err)
 		goto cleanup;
@@ -980,6 +982,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
 		set_buffer_uptodate(bh);
 		unlock_buffer(bh);
 
+		ext4_ios_write_one(handle, bh, EXT4_IOS_EXTENT_BLOCK);
 		err = ext4_handle_dirty_metadata(handle, inode, bh);
 		if (err)
 			goto cleanup;
@@ -1077,6 +1080,7 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
 
+	ext4_ios_write_one(handle, bh, EXT4_IOS_EXTENT_BLOCK);
 	err = ext4_handle_dirty_metadata(handle, inode, bh);
 	if (err)
 		goto out;
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index a86daff..e3d51f4 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -278,10 +278,12 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
 			atomic_dec(&sbi->s_flex_groups[f].used_dirs);
 	}
 	BUFFER_TRACE(bh2, "call ext4_handle_dirty_metadata");
+	ext4_ios_write_one(handle, bh2, EXT4_IOS_GROUP_DESC);
 	fatal = ext4_handle_dirty_metadata(handle, NULL, bh2);
 out:
 	if (cleared) {
 		BUFFER_TRACE(bitmap_bh, "call ext4_handle_dirty_metadata");
+		ext4_ios_write_one(handle, bitmap_bh, EXT4_IOS_INODE_BITMAP);
 		err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 		if (!fatal)
 			fatal = err;
@@ -777,6 +779,8 @@ repeat_in_this_group:
 				/* we won it */
 				BUFFER_TRACE(inode_bitmap_bh,
 					"call ext4_handle_dirty_metadata");
+				ext4_ios_write_one(handle, inode_bitmap_bh,
+						   EXT4_IOS_INODE_BITMAP);
 				err = ext4_handle_dirty_metadata(handle,
 								 NULL,
 							inode_bitmap_bh);
@@ -822,6 +826,8 @@ got:
 		}
 
 		BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
+		ext4_ios_write_one(handle, block_bitmap_bh,
+				   EXT4_IOS_BLOCK_BITMAP);
 		err = ext4_handle_dirty_metadata(handle, NULL, block_bitmap_bh);
 		brelse(block_bitmap_bh);
 
@@ -840,6 +846,7 @@ got:
 			goto fail;
 	}
 	BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
+	ext4_ios_write_one(handle, group_desc_bh, EXT4_IOS_GROUP_DESC);
 	err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
 	if (err)
 		goto fail;
@@ -1188,6 +1195,7 @@ skip_zeroout:
 
 	BUFFER_TRACE(group_desc_bh,
 		     "call ext4_handle_dirty_metadata");
+	ext4_ios_write_one(handle, group_desc_bh, EXT4_IOS_GROUP_DESC);
 	ret = ext4_handle_dirty_metadata(handle, NULL,
 					 group_desc_bh);
 
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 7ba97e3..0d7174e 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -506,6 +506,7 @@ static int ext4_alloc_branch(handle_t *handle, struct inode *inode,
 		unlock_buffer(bh);
 
 		BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
+		ext4_ios_write_one(handle, bh, EXT4_IOS_INDIRECT_BLOCK);
 		err = ext4_handle_dirty_metadata(handle, inode, bh);
 		if (err)
 			goto failed;
@@ -593,6 +594,7 @@ static int ext4_splice_branch(handle_t *handle, struct inode *inode,
 		 */
 		jbd_debug(5, "splicing indirect only\n");
 		BUFFER_TRACE(where->bh, "call ext4_handle_dirty_metadata");
+		ext4_ios_write_one(handle, where->bh, EXT4_IOS_INDIRECT_BLOCK);
 		err = ext4_handle_dirty_metadata(handle, inode, where->bh);
 		if (err)
 			goto err_out;
@@ -1093,6 +1095,8 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode,
 	if (try_to_extend_transaction(handle, inode)) {
 		if (bh) {
 			BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
+			ext4_ios_write_one(handle, bh,
+					   EXT4_IOS_INDIRECT_BLOCK);
 			err = ext4_handle_dirty_metadata(handle, inode, bh);
 			if (unlikely(err))
 				goto out_err;
@@ -1203,9 +1207,11 @@ static void ext4_free_data(handle_t *handle, struct inode *inode,
 		 * block pointed to itself, it would have been detached when
 		 * the block was cleared. Check for this instead of OOPSing.
 		 */
-		if ((EXT4_JOURNAL(inode) == NULL) || bh2jh(this_bh))
+		if ((EXT4_JOURNAL(inode) == NULL) || bh2jh(this_bh)) {
+			ext4_ios_write_one(handle, this_bh,
+					   EXT4_IOS_INDIRECT_BLOCK);
 			ext4_handle_dirty_metadata(handle, inode, this_bh);
-		else
+		} else
 			EXT4_ERROR_INODE(inode,
 					 "circular indirect block detected at "
 					 "block %llu",
@@ -1327,6 +1333,8 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
 					*p = 0;
 					BUFFER_TRACE(parent_bh,
 					"call ext4_handle_dirty_metadata");
+					ext4_ios_write_one(handle, parent_bh,
+						EXT4_IOS_INDIRECT_BLOCK);
 					ext4_handle_dirty_metadata(handle,
 								   inode,
 								   parent_bh);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5a1fca0..c584f89 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -675,6 +675,7 @@ struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode,
 		}
 		unlock_buffer(bh);
 		BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
+		ext4_ios_write_one(handle, bh, EXT4_IOS_DIR_ENTRY);
 		err = ext4_handle_dirty_metadata(handle, inode, bh);
 		if (!fatal)
 			fatal = err;
@@ -4086,6 +4087,8 @@ static int ext4_do_update_inode(handle_t *handle,
 					EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
 			sb->s_dirt = 1;
 			ext4_handle_sync(handle);
+			ext4_ios_write_one(handle, EXT4_SB(sb)->s_sbh,
+					   EXT4_IOS_SUPER_BLOCK);
 			err = ext4_handle_dirty_metadata(handle, NULL,
 					EXT4_SB(sb)->s_sbh);
 		}
@@ -4115,6 +4118,7 @@ static int ext4_do_update_inode(handle_t *handle,
 	}
 
 	BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
+	ext4_ios_write_one(handle, bh, EXT4_IOS_INODE_TABLE);
 	rc = ext4_handle_dirty_metadata(handle, NULL, bh);
 	if (!err)
 		err = rc;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e5ca591..3ea1bed 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2825,6 +2825,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 		ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
 			      ac->ac_b_ex.fe_len);
 		ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
+		ext4_ios_write_one(handle, bitmap_bh, EXT4_IOS_BLOCK_BITMAP);
 		err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 		if (!err)
 			err = -EAGAIN;
@@ -2870,9 +2871,11 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 			   &sbi->s_flex_groups[flex_group].free_clusters);
 	}
 
+	ext4_ios_write_one(handle, bitmap_bh, EXT4_IOS_BLOCK_BITMAP);
 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 	if (err)
 		goto out_err;
+	ext4_ios_write_one(handle, gdp_bh, EXT4_IOS_GROUP_DESC);
 	err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
 
 out_err:
@@ -4736,10 +4739,12 @@ do_more:
 
 	/* We dirtied the bitmap block */
 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+	ext4_ios_write_one(handle, bitmap_bh, EXT4_IOS_BLOCK_BITMAP);
 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 
 	/* And the group descriptor block */
 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
+	ext4_ios_write_one(handle, gd_bh, EXT4_IOS_GROUP_DESC);
 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
 	if (!err)
 		err = ret;
@@ -4876,10 +4881,12 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 
 	/* We dirtied the bitmap block */
 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+	ext4_ios_write_one(handle, bitmap_bh, EXT4_IOS_BLOCK_BITMAP);
 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 
 	/* And the group descriptor block */
 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
+	ext4_ios_write_one(handle, gd_bh, EXT4_IOS_GROUP_DESC);
 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
 	if (!err)
 		err = ret;
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 200ae0f..c5fa87a 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -443,6 +443,8 @@ mext_insert_extents(handle_t *handle, struct inode *orig_inode,
 						end_ext, eh, range_to_move);
 
 	if (depth) {
+		ext4_ios_write_one(handle, orig_path->p_bh,
+				   EXT4_IOS_EXTENT_BLOCK);
 		ret = ext4_handle_dirty_metadata(handle, orig_inode,
 						 orig_path->p_bh);
 		if (ret)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 2bd167f..4c768f4 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1224,9 +1224,11 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 		de = de2;
 	}
 	dx_insert_block(frame, hash2 + continued, newblock);
+	ext4_ios_write_one(handle, bh2, EXT4_IOS_DIR_ENTRY);
 	err = ext4_handle_dirty_metadata(handle, dir, bh2);
 	if (err)
 		goto journal_error;
+	ext4_ios_write_one(handle, frame->bh, EXT4_IOS_DIR_ENTRY);
 	err = ext4_handle_dirty_metadata(handle, dir, frame->bh);
 	if (err)
 		goto journal_error;
@@ -1324,6 +1326,7 @@ static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
 	dir->i_version++;
 	ext4_mark_inode_dirty(handle, dir);
 	BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
+	ext4_ios_write_one(handle, bh, EXT4_IOS_DIR_ENTRY);
 	err = ext4_handle_dirty_metadata(handle, dir, bh);
 	if (err)
 		ext4_std_error(dir->i_sb, err);
@@ -1414,7 +1417,9 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
 	frame->bh = bh;
 	bh = bh2;
 
+	ext4_ios_write_one(handle, frame->bh, EXT4_IOS_DIR_ENTRY);
 	ext4_handle_dirty_metadata(handle, dir, frame->bh);
+	ext4_ios_write_one(handle, bh, EXT4_IOS_DIR_ENTRY);
 	ext4_handle_dirty_metadata(handle, dir, bh);
 
 	de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
@@ -1589,6 +1594,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
 			dxtrace(dx_show_index("node", frames[1].entries));
 			dxtrace(dx_show_index("node",
 			       ((struct dx_node *) bh2->b_data)->entries));
+			ext4_ios_write_one(handle, bh2, EXT4_IOS_DIR_ENTRY);
 			err = ext4_handle_dirty_metadata(handle, dir, bh2);
 			if (err)
 				goto journal_error;
@@ -1615,6 +1621,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,
 			if (err)
 				goto journal_error;
 		}
+		ext4_ios_write_one(handle, frames[0].bh, EXT4_IOS_DIR_ENTRY);
 		err = ext4_handle_dirty_metadata(handle, dir, frames[0].bh);
 		if (err) {
 			ext4_std_error(inode->i_sb, err);
@@ -1673,6 +1680,7 @@ static int ext4_delete_entry(handle_t *handle,
 				de->inode = 0;
 			dir->i_version++;
 			BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
+			ext4_ios_write_one(handle, bh, EXT4_IOS_DIR_ENTRY);
 			err = ext4_handle_dirty_metadata(handle, dir, bh);
 			if (unlikely(err)) {
 				ext4_std_error(dir->i_sb, err);
@@ -1865,6 +1873,7 @@ retry:
 	ext4_set_de_type(dir->i_sb, de, S_IFDIR);
 	set_nlink(inode, 2);
 	BUFFER_TRACE(dir_block, "call ext4_handle_dirty_metadata");
+	ext4_ios_write_one(handle, dir_block, EXT4_IOS_DIR_ENTRY);
 	err = ext4_handle_dirty_metadata(handle, inode, dir_block);
 	if (err)
 		goto out_clear_inode;
@@ -2016,6 +2025,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
 	/* Insert this inode at the head of the on-disk orphan list... */
 	NEXT_ORPHAN(inode) = le32_to_cpu(EXT4_SB(sb)->s_es->s_last_orphan);
 	EXT4_SB(sb)->s_es->s_last_orphan = cpu_to_le32(inode->i_ino);
+	ext4_ios_write_one(handle, EXT4_SB(sb)->s_sbh, EXT4_IOS_SUPER_BLOCK);
 	err = ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh);
 	rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
 	if (!err)
@@ -2089,6 +2099,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
 		if (err)
 			goto out_brelse;
 		sbi->s_es->s_last_orphan = cpu_to_le32(ino_next);
+		ext4_ios_write_one(handle, sbi->s_sbh, EXT4_IOS_SUPER_BLOCK);
 		err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh);
 	} else {
 		struct ext4_iloc iloc2;
@@ -2478,6 +2489,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 					ext4_current_time(new_dir);
 		ext4_mark_inode_dirty(handle, new_dir);
 		BUFFER_TRACE(new_bh, "call ext4_handle_dirty_metadata");
+		ext4_ios_write_one(handle, new_bh, EXT4_IOS_DIR_ENTRY);
 		retval = ext4_handle_dirty_metadata(handle, new_dir, new_bh);
 		if (unlikely(retval)) {
 			ext4_std_error(new_dir->i_sb, retval);
@@ -2532,6 +2544,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		PARENT_INO(dir_bh->b_data, new_dir->i_sb->s_blocksize) =
 						cpu_to_le32(new_dir->i_ino);
 		BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
+		ext4_ios_write_one(handle, dir_bh, EXT4_IOS_DIR_ENTRY);
 		retval = ext4_handle_dirty_metadata(handle, old_inode, dir_bh);
 		if (retval) {
 			ext4_std_error(old_dir->i_sb, retval);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 08aa193..8e27b9c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4188,6 +4188,7 @@ static int ext4_commit_super(struct super_block *sb, int sync)
 				&EXT4_SB(sb)->s_freeinodes_counter));
 	sb->s_dirt = 0;
 	BUFFER_TRACE(sbh, "marking dirty");
+	ext4_ios_write_one(NULL, sbh, EXT4_IOS_SUPER_BLOCK);
 	mark_buffer_dirty(sbh);
 	if (sync) {
 		error = sync_dirty_buffer(sbh);
@@ -4925,6 +4926,7 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
 	memcpy(bh->b_data+offset, data, len);
 	flush_dcache_page(bh->b_page);
 	unlock_buffer(bh);
+	ext4_ios_write_one(handle, bh, EXT4_IOS_SUPER_BLOCK);
 	err = ext4_handle_dirty_metadata(handle, NULL, bh);
 	brelse(bh);
 out:
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 9869805..8df9076 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -491,6 +491,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 				 EXT4_FREE_BLOCKS_FORGET);
 	} else {
 		le32_add_cpu(&BHDR(bh)->h_refcount, -1);
+		ext4_ios_write_one(handle, bh, EXT4_IOS_EXTENDED_ATTR);
 		error = ext4_handle_dirty_metadata(handle, inode, bh);
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
@@ -727,10 +728,13 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			unlock_buffer(bs->bh);
 			if (error == -EIO)
 				goto bad_block;
-			if (!error)
+			if (!error) {
+				ext4_ios_write_one(handle, bs->bh,
+						   EXT4_IOS_EXTENDED_ATTR);
 				error = ext4_handle_dirty_metadata(handle,
 								   inode,
 								   bs->bh);
+			}
 			if (error)
 				goto cleanup;
 			goto inserted;
@@ -799,6 +803,8 @@ inserted:
 				ea_bdebug(new_bh, "reusing; refcount now=%d",
 					le32_to_cpu(BHDR(new_bh)->h_refcount));
 				unlock_buffer(new_bh);
+				ext4_ios_write_one(handle, new_bh,
+						   EXT4_IOS_EXTENDED_ATTR);
 				error = ext4_handle_dirty_metadata(handle,
 								   inode,
 								   new_bh);
@@ -857,6 +863,8 @@ getblk_failed:
 			set_buffer_uptodate(new_bh);
 			unlock_buffer(new_bh);
 			ext4_xattr_cache_insert(new_bh);
+			ext4_ios_write_one(handle, new_bh,
+					   EXT4_IOS_EXTENDED_ATTR);
 			error = ext4_handle_dirty_metadata(handle,
 							   inode, new_bh);
 			if (error)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 6/8] ext4: Count data request of write operations in buffered io
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
                   ` (4 preceding siblings ...)
  2011-11-10 10:34 ` [PATCH v2 5/8] ext4: Count metadata request of write " Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 7/8] ext4: Count all requests in direct io Zheng Liu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

'Issue' flag in buffer_head doesn't be used.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/ext4/inode.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c584f89..e552fb2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1907,6 +1907,8 @@ static int ext4_writepage(struct page *page,
 	struct inode *inode = page->mapping->host;
 
 	trace_ext4_writepage(page);
+	__ext4_io_stat(WRITE, EXT4_IOS_REGULAR_DATA,
+		       ext4_blocks_per_page(inode));
 	size = i_size_read(inode);
 	if (page->index == size >> PAGE_CACHE_SHIFT)
 		len = size & ~PAGE_CACHE_MASK;
@@ -2081,6 +2083,8 @@ static int write_cache_pages_da(struct address_space *mapping,
 			logical = (sector_t) page->index <<
 				(PAGE_CACHE_SHIFT - inode->i_blkbits);
 
+			__ext4_io_stat(WRITE, EXT4_IOS_REGULAR_DATA,
+				       ext4_blocks_per_page(inode));
 			if (!page_has_buffers(page)) {
 				mpage_add_bh_to_extent(mpd, logical,
 						       PAGE_CACHE_SIZE,
@@ -3454,8 +3458,10 @@ int ext4_block_zero_page_range(handle_t *handle,
 	err = 0;
 	if (ext4_should_journal_data(inode)) {
 		err = ext4_handle_dirty_metadata(handle, inode, bh);
-	} else
+	} else {
+		ext4_ios_write_one(handle, bh, EXT4_IOS_REGULAR_DATA);
 		mark_buffer_dirty(bh);
+	}
 
 unlock:
 	unlock_page(page);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 7/8] ext4: Count all requests in direct io
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
                   ` (5 preceding siblings ...)
  2011-11-10 10:34 ` [PATCH v2 6/8] ext4: Count data " Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-10 10:34 ` [PATCH v2 8/8] ext4: Show the result of io types statistic in sysfs Zheng Liu
  2011-11-11 10:55 ` [PATCH v2 0/8] Filesystem io types statistic Steven Whitehouse
  8 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

ext4_ios_submit_io() function is defined to be called by __blockdev_direct_IO()
to count direct io.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/ext4/ext4.h     |    2 ++
 fs/ext4/indirect.c |    2 +-
 fs/ext4/inode.c    |   12 +++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 39a1495..68032e9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1916,6 +1916,8 @@ extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern void ext4_da_update_reserve_space(struct inode *inode,
 					int used, int quota_claim);
+extern void ext4_ios_submit_io(int rw, struct bio *bio,
+			       struct inode *inode, loff_t file_offset);
 
 /* indirect.c */
 extern int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 0d7174e..a27bb19 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -819,7 +819,7 @@ retry:
 		ret = __blockdev_direct_IO(rw, iocb, inode,
 				 inode->i_sb->s_bdev, iov,
 				 offset, nr_segs,
-				 ext4_get_block, NULL, NULL, 0);
+				 ext4_get_block, NULL, ext4_ios_submit_io, 0);
 	} else {
 		ret = blockdev_direct_IO(rw, iocb, inode, iov,
 				 offset, nr_segs, ext4_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e552fb2..600a742 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2962,7 +2962,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 					 offset, nr_segs,
 					 ext4_get_block_write,
 					 ext4_end_io_dio,
-					 NULL,
+					 ext4_ios_submit_io,
 					 DIO_LOCKING | DIO_SKIP_HOLES);
 		if (iocb->private)
 			EXT4_I(inode)->cur_aio_dio = NULL;
@@ -3003,6 +3003,16 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
 	return ext4_ind_direct_IO(rw, iocb, iov, offset, nr_segs);
 }
 
+void ext4_ios_submit_io(int rw, struct bio *bio,
+			struct inode *inode, loff_t file_offset)
+{
+	int tmprw = !!rw;
+
+	/* ext4 io type statistic */
+	__ext4_io_stat(tmprw, EXT4_IOS_REGULAR_DATA, ext4_blocks_per_page(inode));
+	submit_bio(rw, bio);
+}
+
 static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb,
 			      const struct iovec *iov, loff_t offset,
 			      unsigned long nr_segs)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 8/8] ext4: Show the result of io types statistic in sysfs
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
                   ` (6 preceding siblings ...)
  2011-11-10 10:34 ` [PATCH v2 7/8] ext4: Count all requests in direct io Zheng Liu
@ 2011-11-10 10:34 ` Zheng Liu
  2011-11-11 10:55 ` [PATCH v2 0/8] Filesystem io types statistic Steven Whitehouse
  8 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-10 10:34 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel; +Cc: Zheng Liu, Wang Shaoyan

From: Zheng Liu <wenqing.lz@taobao.com>

io_stats file is created to show the result in sysfs.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
---
 fs/ext4/super.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8e27b9c..bc1476d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2532,6 +2532,52 @@ static ssize_t inode_readahead_blks_store(struct ext4_attr *a,
 	return count;
 }
 
+static inline unsigned long ext4_get_ios_counter(int rw, int flag);
+static ssize_t io_stats_show(struct ext4_attr *a,
+			     struct ext4_sb_info *sbi, char *buf)
+{
+	int i;
+	unsigned long ios_counters[EXT4_IOS_TYPE_END][2] = {{0,},};
+
+	for (i = 0; i < EXT4_IOS_TYPE_END; i++) {
+		ios_counters[i][READ] = ext4_get_ios_counter(READ, i);
+		ios_counters[i][WRITE] = ext4_get_ios_counter(WRITE, i);
+	}
+
+	return snprintf(buf, PAGE_SIZE,
+			"TYPE			READ	WRITE\n"
+			"super block		%8lu %8lu\n"
+			"group descriptor	%8lu %8lu\n"
+			"inode bitmap		%8lu %8lu\n"
+			"block bitmap		%8lu %8lu\n"
+			"inode table		%8lu %8lu\n"
+			"extent block		%8lu %8lu\n"
+			"indirect block		%8lu %8lu\n"
+			"dir entry		%8lu %8lu\n"
+			"extended attribute	%8lu %8lu\n"
+			"regular data		%8lu %8lu\n",
+			ios_counters[EXT4_IOS_SUPER_BLOCK][READ],
+			ios_counters[EXT4_IOS_SUPER_BLOCK][WRITE],
+			ios_counters[EXT4_IOS_GROUP_DESC][READ],
+			ios_counters[EXT4_IOS_GROUP_DESC][WRITE],
+			ios_counters[EXT4_IOS_INODE_BITMAP][READ],
+			ios_counters[EXT4_IOS_INODE_BITMAP][WRITE],
+			ios_counters[EXT4_IOS_BLOCK_BITMAP][READ],
+			ios_counters[EXT4_IOS_BLOCK_BITMAP][WRITE],
+			ios_counters[EXT4_IOS_INODE_TABLE][READ],
+			ios_counters[EXT4_IOS_INODE_TABLE][WRITE],
+			ios_counters[EXT4_IOS_EXTENT_BLOCK][READ],
+			ios_counters[EXT4_IOS_EXTENT_BLOCK][WRITE],
+			ios_counters[EXT4_IOS_INDIRECT_BLOCK][READ],
+			ios_counters[EXT4_IOS_INDIRECT_BLOCK][WRITE],
+			ios_counters[EXT4_IOS_DIR_ENTRY][READ],
+			ios_counters[EXT4_IOS_DIR_ENTRY][WRITE],
+			ios_counters[EXT4_IOS_EXTENDED_ATTR][READ],
+			ios_counters[EXT4_IOS_EXTENDED_ATTR][WRITE],
+			ios_counters[EXT4_IOS_REGULAR_DATA][READ],
+			ios_counters[EXT4_IOS_REGULAR_DATA][WRITE]);
+}
+
 static ssize_t sbi_ui_show(struct ext4_attr *a,
 			   struct ext4_sb_info *sbi, char *buf)
 {
@@ -2575,6 +2621,7 @@ EXT4_RO_ATTR(session_write_kbytes);
 EXT4_RO_ATTR(lifetime_write_kbytes);
 EXT4_RO_ATTR(extent_cache_hits);
 EXT4_RO_ATTR(extent_cache_misses);
+EXT4_RO_ATTR(io_stats);
 EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, sbi_ui_show,
 		 inode_readahead_blks_store, s_inode_readahead_blks);
 EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
@@ -2601,6 +2648,7 @@ static struct attribute *ext4_attrs[] = {
 	ATTR_LIST(mb_stream_req),
 	ATTR_LIST(mb_group_prealloc),
 	ATTR_LIST(max_writeback_mb_bump),
+	ATTR_LIST(io_stats),
 	NULL,
 };
 
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types
  2011-11-10 10:34 ` [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types Zheng Liu
@ 2011-11-11 10:48   ` Steven Whitehouse
  2011-11-11 15:36     ` Zheng Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Whitehouse @ 2011-11-11 10:48 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel, Zheng Liu, Wang Shaoyan

Hi,

On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> 'Issue' flag is added into buffer_head to indicate that a request is issued
> to the disk. When an io doesn't hit page cache, 'Issue' flag will be set in
> submit_bh(). Filesystem can understand request is sent to the disk according
> to whether or not this flag is set. This flag will be cleared after filesystem
> tests it. If filesystem doesn't want to count the io type, it can ignore this
> flag.
> 
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
> ---
>  fs/buffer.c                 |    3 +++
>  include/linux/buffer_head.h |    5 +++++
>  2 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 70a1974..f681caa 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2920,6 +2920,9 @@ int submit_bh(int rw, struct buffer_head * bh)
>  	BUG_ON(buffer_delay(bh));
>  	BUG_ON(buffer_unwritten(bh));
>  
> +	/* set issue flag for indicating a request is issued to the disk */
> +	set_buffer_issue(bh);
> +
>  	/*
>  	 * Only clear out a write error when rewriting
>  	 */
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index 458f497..2bccdb4 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -38,6 +38,10 @@ enum bh_state_bits {
>  	BH_PrivateStart,/* not a state bit, but the first bit available
>  			 * for private allocation by other entities
>  			 */
> +
> +	BH_Issue,	/* Issue a request to the disk when this request
> +			 * doesn't hit cache
> +			 */
>  };
This looks like it will collide with any fs using more than one private
flag, for example, ext4.


Steve.



>  
>  #define MAX_BUF_PER_PAGE (PAGE_CACHE_SIZE / 512)
> @@ -124,6 +128,7 @@ BUFFER_FNS(Delay, delay)
>  BUFFER_FNS(Boundary, boundary)
>  BUFFER_FNS(Write_EIO, write_io_error)
>  BUFFER_FNS(Unwritten, unwritten)
> +BUFFER_FNS(Issue, issue)
>  
>  #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)
>  #define touch_buffer(bh)	mark_page_accessed(bh->b_page)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
                   ` (7 preceding siblings ...)
  2011-11-10 10:34 ` [PATCH v2 8/8] ext4: Show the result of io types statistic in sysfs Zheng Liu
@ 2011-11-11 10:55 ` Steven Whitehouse
  2011-11-11 15:32   ` Zheng Liu
  8 siblings, 1 reply; 21+ messages in thread
From: Steven Whitehouse @ 2011-11-11 10:55 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel

Hi,

On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> Hi all,
> 
> v1->v2: totally redesign this mechanism
> 
> This patchset implements an io types statistic mechanism for filesystem
> and it has been added into ext4 to let us know how the ext4 is used by
> applications. It is useful for us to analyze how to improve the filesystem
> and applications. Nowadays, I have added it into ext4, but other filesytems
> also can use it to count the io types by themselves.
> 
> A 'Issue' flag is added into buffer_head and will be set in submit_bh().
> Thus, we can check this flag in filesystem to know that a request is issued
> to the disk when this flag is set. Filesystems just need to check it in
> read operation because filesystem should know whehter a write request hits
> cache or not, at least in ext4. In filesystem, buffer needs to be locked in
> checking and clearing this flag, but it doesn't cost much overhead.
> 
There is already a REQ_META flag available which allows distinction
between data and metadata I/O (at least when they are not contained
within the same block). If that was to be extended to allow some
filesystem specific bits that would solve the problem that you appear to
be addressing with these patches in a fs independent way.

That would probably have already been done, except that the REQ_ flags
field is already almost full - so it might need the addition of an extra
field or some other solution.

Either way, an fs independent solution to this problem would be worth
considering,

Steve.


> In ext4, a per-cpu counter is defined and some functions are added to count
> the io types of buffered/direct io. An exception is __breadahead() due to
> this function doesn't need a buffer_head as argument or return value. So now
> we cannot handle these requests calling __breadahead().
> 
> The IO types in ext4 have shown as following:
> Metadata:
>  - super block
>  - group descriptor
>  - inode bitmap
>  - block bitmap
>  - inode table
>  - extent block
>  - indirect block
>  - dir index and entry
>  - extended attribute
> Data:
>  - regular data block
> 
> The result is shown in sysfs. We can read from /sys/fs/ext4/$DEVICE/io_stats
> to see the result. We can understand how much metadata or data requests are
> issued to the disk according to the result.
> 
> I have finished some benchmarks to test its overhead that calling lock_buffer()
> brings. The following fio script is used to run on a SSD. The result shows that
> the ovheread can be ignored.
> 
> FIO config file:
> [global]
> ioengineshortync
> bs=4k
> filename=/mnt/sda1/testfile
> size=64G
> runtime=300
> group_reporting
> loops=500
> 
> [read]
> rw=randread
> numjobs=4
> 
> [write]
> rw=randwrite
> numjobs=1
> 
> The result (iops):
>         w/o         w/
> READ:  16304      15906 (-2.44%)
> WRITE:  1332       1353 (+1.58%)
> 
> Any comments or suggestions are welcome.
> 
> Regards,
> Zheng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/8] ext4: Add new data structures and related functions to count io types
  2011-11-10 10:34 ` [PATCH v2 2/8] ext4: Add new data structures and related functions " Zheng Liu
@ 2011-11-11 10:58   ` Steven Whitehouse
  2011-11-11 15:45     ` Zheng Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Whitehouse @ 2011-11-11 10:58 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel, Zheng Liu, Wang Shaoyan

Hi,

On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> A per-cpu counter is defined to store io types in ext4. We define 10 io types
> in ext4, which includes 9 metadata types and 1 data type. Read and write
> operations are counted separately. When checks 'Issue' flag, filesystem needs
> to lock buffer.
> 
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
> ---
>  fs/ext4/ext4.h  |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/ext4/super.c |   19 +++++++++++++++++++
>  2 files changed, 72 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 5b0e26a..39a1495 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1108,6 +1108,23 @@ struct ext4_super_block {
>  #define EXT4_MF_FS_ABORTED	0x0002	/* Fatal error detected */
>  
>  /*
> + * ext4 io types
> + */
> +enum {
> +	EXT4_IOS_SUPER_BLOCK = 0,
> +	EXT4_IOS_GROUP_DESC,
> +	EXT4_IOS_INODE_BITMAP,
> +	EXT4_IOS_BLOCK_BITMAP,
> +	EXT4_IOS_INODE_TABLE,
> +	EXT4_IOS_EXTENT_BLOCK,
> +	EXT4_IOS_INDIRECT_BLOCK,
> +	EXT4_IOS_DIR_ENTRY,
> +	EXT4_IOS_EXTENDED_ATTR,
> +	EXT4_IOS_REGULAR_DATA,
> +	EXT4_IOS_TYPE_END,
> +};
> +
> +/*
>   * fourth extended-fs super-block data in memory
>   */
>  struct ext4_sb_info {
> @@ -1284,6 +1301,11 @@ static inline void ext4_set_io_unwritten_flag(struct inode *inode,
>  	}
>  }
>  
> +static inline unsigned ext4_blocks_per_page(struct inode *inode)
> +{
> +	return PAGE_CACHE_SIZE >> inode->i_blkbits;
> +}
> +
>  /*
>   * Inode dynamic state flags
>   */
> @@ -1926,6 +1948,37 @@ extern int ext4_group_extend(struct super_block *sb,
>  				ext4_fsblk_t n_blocks_count);
>  
>  /* super.c */
> +extern void __ext4_io_stat(int, int, unsigned long);
> +#define ext4_ios_read(bh, type, count)					\
> +	do {								\
> +		if (!bh)						\
> +			break;						\
> +		lock_buffer(bh);					\
> +		if (buffer_issue(bh)) {					\
> +			clear_buffer_issue(bh);				\
> +			__ext4_io_stat(READ, type, count);		\
> +		}							\
> +		unlock_buffer(bh);					\
> +	} while (0)
Why not just test_and_clear_bit(BH_Issue) ? I don't follow why the
buffer needs to be locked and unlocked,

Steve.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-11 10:55 ` [PATCH v2 0/8] Filesystem io types statistic Steven Whitehouse
@ 2011-11-11 15:32   ` Zheng Liu
  2011-11-14 10:23     ` Steven Whitehouse
  0 siblings, 1 reply; 21+ messages in thread
From: Zheng Liu @ 2011-11-11 15:32 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: linux-ext4, linux-fsdevel

On Fri, Nov 11, 2011 at 10:55:26AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> > Hi all,
> > 
> > v1->v2: totally redesign this mechanism
> > 
> > This patchset implements an io types statistic mechanism for filesystem
> > and it has been added into ext4 to let us know how the ext4 is used by
> > applications. It is useful for us to analyze how to improve the filesystem
> > and applications. Nowadays, I have added it into ext4, but other filesytems
> > also can use it to count the io types by themselves.
> > 
> > A 'Issue' flag is added into buffer_head and will be set in submit_bh().
> > Thus, we can check this flag in filesystem to know that a request is issued
> > to the disk when this flag is set. Filesystems just need to check it in
> > read operation because filesystem should know whehter a write request hits
> > cache or not, at least in ext4. In filesystem, buffer needs to be locked in
> > checking and clearing this flag, but it doesn't cost much overhead.
> > 

Hi Steve,

Thank you for your attention.

> There is already a REQ_META flag available which allows distinction
> between data and metadata I/O (at least when they are not contained
> within the same block). If that was to be extended to allow some
> filesystem specific bits that would solve the problem that you appear to
> be addressing with these patches in a fs independent way.

You are right. REQ_META flag quite can distinguish between metadata and
data. But it is difficulty to check this flag in filesystem because
buffer_head doesn't use it and most of filesystems still use buffer_head
to submit a IO request. This is the reason why I added a new flag into
buffer_head.

> 
> That would probably have already been done, except that the REQ_ flags
> field is already almost full - so it might need the addition of an extra
> field or some other solution.

In v1[1], a structure called ios is defined. This structure saves some
information (e.g. IO type) and a callback function. Some interfaces in
buffer layer are modifed to add a new argument that points to this
structure. When this request doesn't hit cache and is issued to the
disk, the callback function in this structure will be called. Filesystem
can define a function to do some operations. A defect in this solution
is that it needs to change some interfaces, such bread, breadahead and
so on. So in v2, I re-implement a new mechanism.

> 
> Either way, an fs independent solution to this problem would be worth
> considering,

Yes, I am willing to implement an fs independent solution. This is my
original intention too. So any suggestions are welcome. Thank you.

[1] http://www.spinics.net/lists/linux-ext4/msg28608.html

Regards,
Zheng

> 
> Steve.
> 
> 
> > In ext4, a per-cpu counter is defined and some functions are added to count
> > the io types of buffered/direct io. An exception is __breadahead() due to
> > this function doesn't need a buffer_head as argument or return value. So now
> > we cannot handle these requests calling __breadahead().
> > 
> > The IO types in ext4 have shown as following:
> > Metadata:
> >  - super block
> >  - group descriptor
> >  - inode bitmap
> >  - block bitmap
> >  - inode table
> >  - extent block
> >  - indirect block
> >  - dir index and entry
> >  - extended attribute
> > Data:
> >  - regular data block
> > 
> > The result is shown in sysfs. We can read from /sys/fs/ext4/$DEVICE/io_stats
> > to see the result. We can understand how much metadata or data requests are
> > issued to the disk according to the result.
> > 
> > I have finished some benchmarks to test its overhead that calling lock_buffer()
> > brings. The following fio script is used to run on a SSD. The result shows that
> > the ovheread can be ignored.
> > 
> > FIO config file:
> > [global]
> > ioengineshortync
> > bs=4k
> > filename=/mnt/sda1/testfile
> > size=64G
> > runtime=300
> > group_reporting
> > loops=500
> > 
> > [read]
> > rw=randread
> > numjobs=4
> > 
> > [write]
> > rw=randwrite
> > numjobs=1
> > 
> > The result (iops):
> >         w/o         w/
> > READ:  16304      15906 (-2.44%)
> > WRITE:  1332       1353 (+1.58%)
> > 
> > Any comments or suggestions are welcome.
> > 
> > Regards,
> > Zheng
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types
  2011-11-11 10:48   ` Steven Whitehouse
@ 2011-11-11 15:36     ` Zheng Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-11 15:36 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: linux-ext4, linux-fsdevel, Zheng Liu, Wang Shaoyan

On Fri, Nov 11, 2011 at 10:48:33AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> > From: Zheng Liu <wenqing.lz@taobao.com>
> > 
> > 'Issue' flag is added into buffer_head to indicate that a request is issued
> > to the disk. When an io doesn't hit page cache, 'Issue' flag will be set in
> > submit_bh(). Filesystem can understand request is sent to the disk according
> > to whether or not this flag is set. This flag will be cleared after filesystem
> > tests it. If filesystem doesn't want to count the io type, it can ignore this
> > flag.
> > 
> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> > Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
> > ---
> >  fs/buffer.c                 |    3 +++
> >  include/linux/buffer_head.h |    5 +++++
> >  2 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 70a1974..f681caa 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -2920,6 +2920,9 @@ int submit_bh(int rw, struct buffer_head * bh)
> >  	BUG_ON(buffer_delay(bh));
> >  	BUG_ON(buffer_unwritten(bh));
> >  
> > +	/* set issue flag for indicating a request is issued to the disk */
> > +	set_buffer_issue(bh);
> > +
> >  	/*
> >  	 * Only clear out a write error when rewriting
> >  	 */
> > diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> > index 458f497..2bccdb4 100644
> > --- a/include/linux/buffer_head.h
> > +++ b/include/linux/buffer_head.h
> > @@ -38,6 +38,10 @@ enum bh_state_bits {
> >  	BH_PrivateStart,/* not a state bit, but the first bit available
> >  			 * for private allocation by other entities
> >  			 */
> > +
> > +	BH_Issue,	/* Issue a request to the disk when this request
> > +			 * doesn't hit cache
> > +			 */
> >  };
> This looks like it will collide with any fs using more than one private
> flag, for example, ext4.

I will put this flag on the top of BH_PrivateStart.

Regards,
Zheng

> 
> 
> Steve.
> 
> 
> 
> >  
> >  #define MAX_BUF_PER_PAGE (PAGE_CACHE_SIZE / 512)
> > @@ -124,6 +128,7 @@ BUFFER_FNS(Delay, delay)
> >  BUFFER_FNS(Boundary, boundary)
> >  BUFFER_FNS(Write_EIO, write_io_error)
> >  BUFFER_FNS(Unwritten, unwritten)
> > +BUFFER_FNS(Issue, issue)
> >  
> >  #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)
> >  #define touch_buffer(bh)	mark_page_accessed(bh->b_page)
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/8] ext4: Add new data structures and related functions to count io types
  2011-11-11 10:58   ` Steven Whitehouse
@ 2011-11-11 15:45     ` Zheng Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-11 15:45 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: linux-ext4, linux-fsdevel, Zheng Liu, Wang Shaoyan

On Fri, Nov 11, 2011 at 10:58:11AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> > From: Zheng Liu <wenqing.lz@taobao.com>
> > 
> > A per-cpu counter is defined to store io types in ext4. We define 10 io types
> > in ext4, which includes 9 metadata types and 1 data type. Read and write
> > operations are counted separately. When checks 'Issue' flag, filesystem needs
> > to lock buffer.
> > 
> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> > Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com>
> > ---
> >  fs/ext4/ext4.h  |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/ext4/super.c |   19 +++++++++++++++++++
> >  2 files changed, 72 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 5b0e26a..39a1495 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -1108,6 +1108,23 @@ struct ext4_super_block {
> >  #define EXT4_MF_FS_ABORTED	0x0002	/* Fatal error detected */
> >  
> >  /*
> > + * ext4 io types
> > + */
> > +enum {
> > +	EXT4_IOS_SUPER_BLOCK = 0,
> > +	EXT4_IOS_GROUP_DESC,
> > +	EXT4_IOS_INODE_BITMAP,
> > +	EXT4_IOS_BLOCK_BITMAP,
> > +	EXT4_IOS_INODE_TABLE,
> > +	EXT4_IOS_EXTENT_BLOCK,
> > +	EXT4_IOS_INDIRECT_BLOCK,
> > +	EXT4_IOS_DIR_ENTRY,
> > +	EXT4_IOS_EXTENDED_ATTR,
> > +	EXT4_IOS_REGULAR_DATA,
> > +	EXT4_IOS_TYPE_END,
> > +};
> > +
> > +/*
> >   * fourth extended-fs super-block data in memory
> >   */
> >  struct ext4_sb_info {
> > @@ -1284,6 +1301,11 @@ static inline void ext4_set_io_unwritten_flag(struct inode *inode,
> >  	}
> >  }
> >  
> > +static inline unsigned ext4_blocks_per_page(struct inode *inode)
> > +{
> > +	return PAGE_CACHE_SIZE >> inode->i_blkbits;
> > +}
> > +
> >  /*
> >   * Inode dynamic state flags
> >   */
> > @@ -1926,6 +1948,37 @@ extern int ext4_group_extend(struct super_block *sb,
> >  				ext4_fsblk_t n_blocks_count);
> >  
> >  /* super.c */
> > +extern void __ext4_io_stat(int, int, unsigned long);
> > +#define ext4_ios_read(bh, type, count)					\
> > +	do {								\
> > +		if (!bh)						\
> > +			break;						\
> > +		lock_buffer(bh);					\
> > +		if (buffer_issue(bh)) {					\
> > +			clear_buffer_issue(bh);				\
> > +			__ext4_io_stat(READ, type, count);		\
> > +		}							\
> > +		unlock_buffer(bh);					\
> > +	} while (0)
> Why not just test_and_clear_bit(BH_Issue) ? I don't follow why the
> buffer needs to be locked and unlocked,

test_and_clear_bit(BH_Issue) is better. Calling lock/unlock_buffer() is
to ensure 'Issue' flag can be set or cleared atomically. It is
unnecessary after using test_and_clear_bit(BH_Issue).

Regards,
Zheng

> 
> Steve.
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-11 15:32   ` Zheng Liu
@ 2011-11-14 10:23     ` Steven Whitehouse
  2011-11-14 13:35       ` Zheng Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Whitehouse @ 2011-11-14 10:23 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel

Hi,

On Fri, 2011-11-11 at 23:32 +0800, Zheng Liu wrote:
> On Fri, Nov 11, 2011 at 10:55:26AM +0000, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> > > Hi all,
> > > 
> > > v1->v2: totally redesign this mechanism
> > > 
> > > This patchset implements an io types statistic mechanism for filesystem
> > > and it has been added into ext4 to let us know how the ext4 is used by
> > > applications. It is useful for us to analyze how to improve the filesystem
> > > and applications. Nowadays, I have added it into ext4, but other filesytems
> > > also can use it to count the io types by themselves.
> > > 
> > > A 'Issue' flag is added into buffer_head and will be set in submit_bh().
> > > Thus, we can check this flag in filesystem to know that a request is issued
> > > to the disk when this flag is set. Filesystems just need to check it in
> > > read operation because filesystem should know whehter a write request hits
> > > cache or not, at least in ext4. In filesystem, buffer needs to be locked in
> > > checking and clearing this flag, but it doesn't cost much overhead.
> > > 
> 
> Hi Steve,
> 
> Thank you for your attention.
> 
> > There is already a REQ_META flag available which allows distinction
> > between data and metadata I/O (at least when they are not contained
> > within the same block). If that was to be extended to allow some
> > filesystem specific bits that would solve the problem that you appear to
> > be addressing with these patches in a fs independent way.
> 
> You are right. REQ_META flag quite can distinguish between metadata and
> data. But it is difficulty to check this flag in filesystem because
> buffer_head doesn't use it and most of filesystems still use buffer_head
> to submit a IO request. This is the reason why I added a new flag into
> buffer_head.
> 
I don't understand what you mean here.... submit_bh() takes a bh and a
set of REQ flags, so there is no reason to not use REQ_META. Using a bh
doesn't prevent setting those flags. The issue is only that few bits
remain unused in those flags and solving the problem in a "nice" way, by
adding a few more flags, may be tricky.

> > 
> > That would probably have already been done, except that the REQ_ flags
> > field is already almost full - so it might need the addition of an extra
> > field or some other solution.
> 
> In v1[1], a structure called ios is defined. This structure saves some
> information (e.g. IO type) and a callback function. Some interfaces in
> buffer layer are modifed to add a new argument that points to this
> structure. When this request doesn't hit cache and is issued to the
> disk, the callback function in this structure will be called. Filesystem
> can define a function to do some operations. A defect in this solution
> is that it needs to change some interfaces, such bread, breadahead and
> so on. So in v2, I re-implement a new mechanism.
> 
> > 
> > Either way, an fs independent solution to this problem would be worth
> > considering,
> 
> Yes, I am willing to implement an fs independent solution. This is my
> original intention too. So any suggestions are welcome. Thank you.
> 
> [1] http://www.spinics.net/lists/linux-ext4/msg28608.html
> 
> Regards,
> Zheng
> 
Ok. Sounds good. GFS2 already sets REQ_META in all places where metadata
is being written. Now that REQ_META as been demerged from the REQ_PRIO
flag, there is no reason that filesystems cannot set it without fear of
side effects. Its only purpose is as a notification to blktrace now,

Steve.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-14 10:23     ` Steven Whitehouse
@ 2011-11-14 13:35       ` Zheng Liu
  2011-11-15 18:34         ` Aditya Kali
  0 siblings, 1 reply; 21+ messages in thread
From: Zheng Liu @ 2011-11-14 13:35 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: linux-ext4, linux-fsdevel

On Mon, Nov 14, 2011 at 10:23:01AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On Fri, 2011-11-11 at 23:32 +0800, Zheng Liu wrote:
> > On Fri, Nov 11, 2011 at 10:55:26AM +0000, Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> > > > Hi all,
> > > > 
> > > > v1->v2: totally redesign this mechanism
> > > > 
> > > > This patchset implements an io types statistic mechanism for filesystem
> > > > and it has been added into ext4 to let us know how the ext4 is used by
> > > > applications. It is useful for us to analyze how to improve the filesystem
> > > > and applications. Nowadays, I have added it into ext4, but other filesytems
> > > > also can use it to count the io types by themselves.
> > > > 
> > > > A 'Issue' flag is added into buffer_head and will be set in submit_bh().
> > > > Thus, we can check this flag in filesystem to know that a request is issued
> > > > to the disk when this flag is set. Filesystems just need to check it in
> > > > read operation because filesystem should know whehter a write request hits
> > > > cache or not, at least in ext4. In filesystem, buffer needs to be locked in
> > > > checking and clearing this flag, but it doesn't cost much overhead.
> > > > 
> > 
> > Hi Steve,
> > 
> > Thank you for your attention.
> > 
> > > There is already a REQ_META flag available which allows distinction
> > > between data and metadata I/O (at least when they are not contained
> > > within the same block). If that was to be extended to allow some
> > > filesystem specific bits that would solve the problem that you appear to
> > > be addressing with these patches in a fs independent way.
> > 
> > You are right. REQ_META flag quite can distinguish between metadata and
> > data. But it is difficulty to check this flag in filesystem because
> > buffer_head doesn't use it and most of filesystems still use buffer_head
> > to submit a IO request. This is the reason why I added a new flag into
> > buffer_head.
> > 
> I don't understand what you mean here.... submit_bh() takes a bh and a
> set of REQ flags, so there is no reason to not use REQ_META. Using a bh
> doesn't prevent setting those flags. The issue is only that few bits
> remain unused in those flags and solving the problem in a "nice" way, by
> adding a few more flags, may be tricky.

Hi,

Please let me explain why a new flag is needed in buffer_head.

The goal of this patchset wants to provide a mechanism to let
filesystems can inspect how much different types of IOs are issued to
the disk. The types not only are divided into metadata and data. The
detailed types are needed, such as super_block, inode, EA and so on.
So filesystem needs to define some counters to save the result and
increase these counters when it makes a request. But filesystems couldn't
know whether or not this request is issued to the disk because the data
might be in page cache, at least read operation is like that. So we need
a solution to let filesystems know that. Meanwhile filesystems can free
choose whether or not providing the statistic result.

A new flag can be added into buffer_head and is set when the request is
really issued to the disk to let filesystem know that. But it seems that
REQ_META flag could not fit for us because REQ flags are used in bio.
Buffer_head couldn't use these flags. So filesystem cannot check this
flag that has been set or not. Further, AFAIK, some filesystems (e.g.
ext4) call sb_bread() and sb_breadahead() to do a read operation besides
submit_bh() and ll_rw_block(). It seems that there is no way to check
REQ_META flag from buffer_head too.

Hopefully the explaination is clear enough, and any comments or
suggestions are welcome. Thanks again. :-)

Regards,
Zheng

> 
> > > 
> > > That would probably have already been done, except that the REQ_ flags
> > > field is already almost full - so it might need the addition of an extra
> > > field or some other solution.
> > 
> > In v1[1], a structure called ios is defined. This structure saves some
> > information (e.g. IO type) and a callback function. Some interfaces in
> > buffer layer are modifed to add a new argument that points to this
> > structure. When this request doesn't hit cache and is issued to the
> > disk, the callback function in this structure will be called. Filesystem
> > can define a function to do some operations. A defect in this solution
> > is that it needs to change some interfaces, such bread, breadahead and
> > so on. So in v2, I re-implement a new mechanism.
> > 
> > > 
> > > Either way, an fs independent solution to this problem would be worth
> > > considering,
> > 
> > Yes, I am willing to implement an fs independent solution. This is my
> > original intention too. So any suggestions are welcome. Thank you.
> > 
> > [1] http://www.spinics.net/lists/linux-ext4/msg28608.html
> > 
> > Regards,
> > Zheng
> > 
> Ok. Sounds good. GFS2 already sets REQ_META in all places where metadata
> is being written. Now that REQ_META as been demerged from the REQ_PRIO
> flag, there is no reason that filesystems cannot set it without fear of
> side effects. Its only purpose is as a notification to blktrace now,
> 
> Steve.
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-14 13:35       ` Zheng Liu
@ 2011-11-15 18:34         ` Aditya Kali
  2011-11-16  8:43           ` Zheng Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Aditya Kali @ 2011-11-15 18:34 UTC (permalink / raw)
  To: Steven Whitehouse, linux-ext4, linux-fsdevel

On Mon, Nov 14, 2011 at 5:35 AM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> On Mon, Nov 14, 2011 at 10:23:01AM +0000, Steven Whitehouse wrote:
>> Hi,
>>
>> On Fri, 2011-11-11 at 23:32 +0800, Zheng Liu wrote:
>> > On Fri, Nov 11, 2011 at 10:55:26AM +0000, Steven Whitehouse wrote:
>> > > Hi,
>> > >
>> > > On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
>> > > > Hi all,
>> > > >
>> > > > v1->v2: totally redesign this mechanism
>> > > >
>> > > > This patchset implements an io types statistic mechanism for filesystem
>> > > > and it has been added into ext4 to let us know how the ext4 is used by
>> > > > applications. It is useful for us to analyze how to improve the filesystem
>> > > > and applications. Nowadays, I have added it into ext4, but other filesytems
>> > > > also can use it to count the io types by themselves.
>> > > >
>> > > > A 'Issue' flag is added into buffer_head and will be set in submit_bh().
>> > > > Thus, we can check this flag in filesystem to know that a request is issued
>> > > > to the disk when this flag is set. Filesystems just need to check it in
>> > > > read operation because filesystem should know whehter a write request hits
>> > > > cache or not, at least in ext4. In filesystem, buffer needs to be locked in
>> > > > checking and clearing this flag, but it doesn't cost much overhead.
>> > > >
>> >
>> > Hi Steve,
>> >
>> > Thank you for your attention.
>> >
>> > > There is already a REQ_META flag available which allows distinction
>> > > between data and metadata I/O (at least when they are not contained
>> > > within the same block). If that was to be extended to allow some
>> > > filesystem specific bits that would solve the problem that you appear to
>> > > be addressing with these patches in a fs independent way.
>> >
>> > You are right. REQ_META flag quite can distinguish between metadata and
>> > data. But it is difficulty to check this flag in filesystem because
>> > buffer_head doesn't use it and most of filesystems still use buffer_head
>> > to submit a IO request. This is the reason why I added a new flag into
>> > buffer_head.
>> >
>> I don't understand what you mean here.... submit_bh() takes a bh and a
>> set of REQ flags, so there is no reason to not use REQ_META. Using a bh
>> doesn't prevent setting those flags. The issue is only that few bits
>> remain unused in those flags and solving the problem in a "nice" way, by
>> adding a few more flags, may be tricky.
>
> Hi,
>
> Please let me explain why a new flag is needed in buffer_head.
>
> The goal of this patchset wants to provide a mechanism to let
> filesystems can inspect how much different types of IOs are issued to
> the disk. The types not only are divided into metadata and data. The
> detailed types are needed, such as super_block, inode, EA and so on.
> So filesystem needs to define some counters to save the result and
> increase these counters when it makes a request. But filesystems couldn't
> know whether or not this request is issued to the disk because the data
> might be in page cache, at least read operation is like that. So we need
> a solution to let filesystems know that. Meanwhile filesystems can free
> choose whether or not providing the statistic result.
>
> A new flag can be added into buffer_head and is set when the request is
> really issued to the disk to let filesystem know that. But it seems that
> REQ_META flag could not fit for us because REQ flags are used in bio.
> Buffer_head couldn't use these flags. So filesystem cannot check this
> flag that has been set or not. Further, AFAIK, some filesystems (e.g.
> ext4) call sb_bread() and sb_breadahead() to do a read operation besides
> submit_bh() and ll_rw_block(). It seems that there is no way to check
> REQ_META flag from buffer_head too.
>

As part of some other work, I had added ext4's own submit_bh functions
and replaced all the calls to submit_bh() and ll_rw_block() with
these:

------ x ------

--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
+void ext4_submit_bh_read_nowait(int rw, struct buffer_head *bh)
+{
+       BUG_ON(rw & WRITE);
+       BUG_ON(!buffer_locked(bh));
+       get_bh(bh);
+       bh->b_end_io = end_buffer_read_sync;
+       submit_bh(rw, bh);
+}
+
+int ext4_submit_bh_read(int rw, struct buffer_head *bh)
+{
+       BUG_ON(rw & WRITE);
+       BUG_ON(!buffer_locked(bh));
+
+       if (buffer_uptodate(bh)) {
+               unlock_buffer(bh);
+               return 0;
+       }
+
+       ext4_submit_bh_read_nowait(rw, bh);
+       wait_on_buffer(bh);
+       if (buffer_uptodate(bh))
+               return 0;
+       return -EIO;
+}
+
 struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
                               ext4_lblk_t block, int create, int *err)
 {
@@ -1572,11 +1598,9 @@ struct buffer_head *ext4_bread(handle_t
*handle, struct inode *inode,
        bh = ext4_getblk(handle, inode, block, create, err);
        if (!bh)
                return bh;
-       if (buffer_uptodate(bh))
+       if (bh_uptodate_or_lock(bh))
                return bh;
-       ll_rw_block(READ_META, 1, &bh);
-       wait_on_buffer(bh);
-       if (buffer_uptodate(bh))
+       if (!ext4_submit_bh_read(READ_META, bh))
                return bh;
        put_bh(bh);
        *err = -EIO;

------ x ------

I had made the change only for reads, but it should be easy to make it
do writes to. Also, this function can take ext4 specific flags and you
can do your accounting at a single place in ext4. So, you wont need
any more flags for buffer head.
Will this approach help in what you are trying to do?

Thanks,

> Hopefully the explaination is clear enough, and any comments or
> suggestions are welcome. Thanks again. :-)
>
> Regards,
> Zheng
>
>>
>> > >
>> > > That would probably have already been done, except that the REQ_ flags
>> > > field is already almost full - so it might need the addition of an extra
>> > > field or some other solution.
>> >
>> > In v1[1], a structure called ios is defined. This structure saves some
>> > information (e.g. IO type) and a callback function. Some interfaces in
>> > buffer layer are modifed to add a new argument that points to this
>> > structure. When this request doesn't hit cache and is issued to the
>> > disk, the callback function in this structure will be called. Filesystem
>> > can define a function to do some operations. A defect in this solution
>> > is that it needs to change some interfaces, such bread, breadahead and
>> > so on. So in v2, I re-implement a new mechanism.
>> >
>> > >
>> > > Either way, an fs independent solution to this problem would be worth
>> > > considering,
>> >
>> > Yes, I am willing to implement an fs independent solution. This is my
>> > original intention too. So any suggestions are welcome. Thank you.
>> >
>> > [1] http://www.spinics.net/lists/linux-ext4/msg28608.html
>> >
>> > Regards,
>> > Zheng
>> >
>> Ok. Sounds good. GFS2 already sets REQ_META in all places where metadata
>> is being written. Now that REQ_META as been demerged from the REQ_PRIO
>> flag, there is no reason that filesystems cannot set it without fear of
>> side effects. Its only purpose is as a notification to blktrace now,
>>
>> Steve.
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Aditya
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-15 18:34         ` Aditya Kali
@ 2011-11-16  8:43           ` Zheng Liu
  2011-11-16 10:14             ` Steven Whitehouse
  0 siblings, 1 reply; 21+ messages in thread
From: Zheng Liu @ 2011-11-16  8:43 UTC (permalink / raw)
  To: Aditya Kali; +Cc: Steven Whitehouse, linux-ext4, linux-fsdevel

On Tue, Nov 15, 2011 at 10:34:20AM -0800, Aditya Kali wrote:
> On Mon, Nov 14, 2011 at 5:35 AM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > On Mon, Nov 14, 2011 at 10:23:01AM +0000, Steven Whitehouse wrote:
> >> Hi,
> >>
> >> On Fri, 2011-11-11 at 23:32 +0800, Zheng Liu wrote:
> >> > On Fri, Nov 11, 2011 at 10:55:26AM +0000, Steven Whitehouse wrote:
> >> > > Hi,
> >> > >
> >> > > On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> >> > > > Hi all,
> >> > > >
> >> > > > v1->v2: totally redesign this mechanism
> >> > > >
> >> > > > This patchset implements an io types statistic mechanism for filesystem
> >> > > > and it has been added into ext4 to let us know how the ext4 is used by
> >> > > > applications. It is useful for us to analyze how to improve the filesystem
> >> > > > and applications. Nowadays, I have added it into ext4, but other filesytems
> >> > > > also can use it to count the io types by themselves.
> >> > > >
> >> > > > A 'Issue' flag is added into buffer_head and will be set in submit_bh().
> >> > > > Thus, we can check this flag in filesystem to know that a request is issued
> >> > > > to the disk when this flag is set. Filesystems just need to check it in
> >> > > > read operation because filesystem should know whehter a write request hits
> >> > > > cache or not, at least in ext4. In filesystem, buffer needs to be locked in
> >> > > > checking and clearing this flag, but it doesn't cost much overhead.
> >> > > >
> >> >
> >> > Hi Steve,
> >> >
> >> > Thank you for your attention.
> >> >
> >> > > There is already a REQ_META flag available which allows distinction
> >> > > between data and metadata I/O (at least when they are not contained
> >> > > within the same block). If that was to be extended to allow some
> >> > > filesystem specific bits that would solve the problem that you appear to
> >> > > be addressing with these patches in a fs independent way.
> >> >
> >> > You are right. REQ_META flag quite can distinguish between metadata and
> >> > data. But it is difficulty to check this flag in filesystem because
> >> > buffer_head doesn't use it and most of filesystems still use buffer_head
> >> > to submit a IO request. This is the reason why I added a new flag into
> >> > buffer_head.
> >> >
> >> I don't understand what you mean here.... submit_bh() takes a bh and a
> >> set of REQ flags, so there is no reason to not use REQ_META. Using a bh
> >> doesn't prevent setting those flags. The issue is only that few bits
> >> remain unused in those flags and solving the problem in a "nice" way, by
> >> adding a few more flags, may be tricky.
> >
> > Hi,
> >
> > Please let me explain why a new flag is needed in buffer_head.
> >
> > The goal of this patchset wants to provide a mechanism to let
> > filesystems can inspect how much different types of IOs are issued to
> > the disk. The types not only are divided into metadata and data. The
> > detailed types are needed, such as super_block, inode, EA and so on.
> > So filesystem needs to define some counters to save the result and
> > increase these counters when it makes a request. But filesystems couldn't
> > know whether or not this request is issued to the disk because the data
> > might be in page cache, at least read operation is like that. So we need
> > a solution to let filesystems know that. Meanwhile filesystems can free
> > choose whether or not providing the statistic result.
> >
> > A new flag can be added into buffer_head and is set when the request is
> > really issued to the disk to let filesystem know that. But it seems that
> > REQ_META flag could not fit for us because REQ flags are used in bio.
> > Buffer_head couldn't use these flags. So filesystem cannot check this
> > flag that has been set or not. Further, AFAIK, some filesystems (e.g.
> > ext4) call sb_bread() and sb_breadahead() to do a read operation besides
> > submit_bh() and ll_rw_block(). It seems that there is no way to check
> > REQ_META flag from buffer_head too.
> >
> 
> As part of some other work, I had added ext4's own submit_bh functions
> and replaced all the calls to submit_bh() and ll_rw_block() with
> these:
> 
> ------ x ------
> 
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> +void ext4_submit_bh_read_nowait(int rw, struct buffer_head *bh)
> +{
> +       BUG_ON(rw & WRITE);
> +       BUG_ON(!buffer_locked(bh));
> +       get_bh(bh);
> +       bh->b_end_io = end_buffer_read_sync;
> +       submit_bh(rw, bh);
> +}
> +
> +int ext4_submit_bh_read(int rw, struct buffer_head *bh)
> +{
> +       BUG_ON(rw & WRITE);
> +       BUG_ON(!buffer_locked(bh));
> +
> +       if (buffer_uptodate(bh)) {
> +               unlock_buffer(bh);
> +               return 0;
> +       }
> +
> +       ext4_submit_bh_read_nowait(rw, bh);
> +       wait_on_buffer(bh);
> +       if (buffer_uptodate(bh))
> +               return 0;
> +       return -EIO;
> +}
> +
>  struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
>                                ext4_lblk_t block, int create, int *err)
>  {
> @@ -1572,11 +1598,9 @@ struct buffer_head *ext4_bread(handle_t
> *handle, struct inode *inode,
>         bh = ext4_getblk(handle, inode, block, create, err);
>         if (!bh)
>                 return bh;
> -       if (buffer_uptodate(bh))
> +       if (bh_uptodate_or_lock(bh))
>                 return bh;
> -       ll_rw_block(READ_META, 1, &bh);
> -       wait_on_buffer(bh);
> -       if (buffer_uptodate(bh))
> +       if (!ext4_submit_bh_read(READ_META, bh))
>                 return bh;
>         put_bh(bh);
>         *err = -EIO;
> 
> ------ x ------
> 
> I had made the change only for reads, but it should be easy to make it
> do writes to. Also, this function can take ext4 specific flags and you
> can do your accounting at a single place in ext4. So, you wont need
> any more flags for buffer head.
> Will this approach help in what you are trying to do?
> 
> Thanks,

Hi Aditya,

Thank you for your patch. It quite can help me to solve my problem. We
can define some wrapper functions to do our accounting in ext4. But it
seems that this approach is just suitable for ext4. I prefer to
provide a fs independent solution. Steven and I are talking about how to
implement it to let other filesystems can use this mechanism directly to
do their accouting. If you have some suggestions, feel free to tell me.

Regards,
Zheng

> 
> > Hopefully the explaination is clear enough, and any comments or
> > suggestions are welcome. Thanks again. :-)
> >
> > Regards,
> > Zheng
> >
> >>
> >> > >
> >> > > That would probably have already been done, except that the REQ_ flags
> >> > > field is already almost full - so it might need the addition of an extra
> >> > > field or some other solution.
> >> >
> >> > In v1[1], a structure called ios is defined. This structure saves some
> >> > information (e.g. IO type) and a callback function. Some interfaces in
> >> > buffer layer are modifed to add a new argument that points to this
> >> > structure. When this request doesn't hit cache and is issued to the
> >> > disk, the callback function in this structure will be called. Filesystem
> >> > can define a function to do some operations. A defect in this solution
> >> > is that it needs to change some interfaces, such bread, breadahead and
> >> > so on. So in v2, I re-implement a new mechanism.
> >> >
> >> > >
> >> > > Either way, an fs independent solution to this problem would be worth
> >> > > considering,
> >> >
> >> > Yes, I am willing to implement an fs independent solution. This is my
> >> > original intention too. So any suggestions are welcome. Thank you.
> >> >
> >> > [1] http://www.spinics.net/lists/linux-ext4/msg28608.html
> >> >
> >> > Regards,
> >> > Zheng
> >> >
> >> Ok. Sounds good. GFS2 already sets REQ_META in all places where metadata
> >> is being written. Now that REQ_META as been demerged from the REQ_PRIO
> >> flag, there is no reason that filesystems cannot set it without fear of
> >> side effects. Its only purpose is as a notification to blktrace now,
> >>
> >> Steve.
> >>
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> 
> -- 
> Aditya
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-16  8:43           ` Zheng Liu
@ 2011-11-16 10:14             ` Steven Whitehouse
  2011-11-18  2:48               ` Zheng Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Whitehouse @ 2011-11-16 10:14 UTC (permalink / raw)
  To: Zheng Liu; +Cc: Aditya Kali, linux-ext4, linux-fsdevel, Jens Axboe

Hi,

On Wed, 2011-11-16 at 16:43 +0800, Zheng Liu wrote:
> On Tue, Nov 15, 2011 at 10:34:20AM -0800, Aditya Kali wrote:
> > On Mon, Nov 14, 2011 at 5:35 AM, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > > On Mon, Nov 14, 2011 at 10:23:01AM +0000, Steven Whitehouse wrote:
> > >> Hi,
> > >>
> > >> On Fri, 2011-11-11 at 23:32 +0800, Zheng Liu wrote:
> > >> > On Fri, Nov 11, 2011 at 10:55:26AM +0000, Steven Whitehouse wrote:
> > >> > > Hi,
> > >> > >
> > >> > > On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
> > >> > > > Hi all,
> > >> > > >
> > >> > > > v1->v2: totally redesign this mechanism
> > >> > > >
> > >> > > > This patchset implements an io types statistic mechanism for filesystem
> > >> > > > and it has been added into ext4 to let us know how the ext4 is used by
> > >> > > > applications. It is useful for us to analyze how to improve the filesystem
> > >> > > > and applications. Nowadays, I have added it into ext4, but other filesytems
> > >> > > > also can use it to count the io types by themselves.
> > >> > > >
> > >> > > > A 'Issue' flag is added into buffer_head and will be set in submit_bh().
> > >> > > > Thus, we can check this flag in filesystem to know that a request is issued
> > >> > > > to the disk when this flag is set. Filesystems just need to check it in
> > >> > > > read operation because filesystem should know whehter a write request hits
> > >> > > > cache or not, at least in ext4. In filesystem, buffer needs to be locked in
> > >> > > > checking and clearing this flag, but it doesn't cost much overhead.
> > >> > > >
> > >> >
> > >> > Hi Steve,
> > >> >
> > >> > Thank you for your attention.
> > >> >
> > >> > > There is already a REQ_META flag available which allows distinction
> > >> > > between data and metadata I/O (at least when they are not contained
> > >> > > within the same block). If that was to be extended to allow some
> > >> > > filesystem specific bits that would solve the problem that you appear to
> > >> > > be addressing with these patches in a fs independent way.
> > >> >
> > >> > You are right. REQ_META flag quite can distinguish between metadata and
> > >> > data. But it is difficulty to check this flag in filesystem because
> > >> > buffer_head doesn't use it and most of filesystems still use buffer_head
> > >> > to submit a IO request. This is the reason why I added a new flag into
> > >> > buffer_head.
> > >> >
> > >> I don't understand what you mean here.... submit_bh() takes a bh and a
> > >> set of REQ flags, so there is no reason to not use REQ_META. Using a bh
> > >> doesn't prevent setting those flags. The issue is only that few bits
> > >> remain unused in those flags and solving the problem in a "nice" way, by
> > >> adding a few more flags, may be tricky.
> > >
> > > Hi,
> > >
> > > Please let me explain why a new flag is needed in buffer_head.
> > >
> > > The goal of this patchset wants to provide a mechanism to let
> > > filesystems can inspect how much different types of IOs are issued to
> > > the disk. The types not only are divided into metadata and data. The
> > > detailed types are needed, such as super_block, inode, EA and so on.
> > > So filesystem needs to define some counters to save the result and
> > > increase these counters when it makes a request. But filesystems couldn't
> > > know whether or not this request is issued to the disk because the data
> > > might be in page cache, at least read operation is like that. So we need
> > > a solution to let filesystems know that. Meanwhile filesystems can free
> > > choose whether or not providing the statistic result.
> > >
> > > A new flag can be added into buffer_head and is set when the request is
> > > really issued to the disk to let filesystem know that. But it seems that
> > > REQ_META flag could not fit for us because REQ flags are used in bio.
> > > Buffer_head couldn't use these flags. So filesystem cannot check this
> > > flag that has been set or not. Further, AFAIK, some filesystems (e.g.
> > > ext4) call sb_bread() and sb_breadahead() to do a read operation besides
> > > submit_bh() and ll_rw_block(). It seems that there is no way to check
> > > REQ_META flag from buffer_head too.
> > >
> > 
> > As part of some other work, I had added ext4's own submit_bh functions
> > and replaced all the calls to submit_bh() and ll_rw_block() with
> > these:
> > 
> > ------ x ------
> > 
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > +void ext4_submit_bh_read_nowait(int rw, struct buffer_head *bh)
> > +{
> > +       BUG_ON(rw & WRITE);
> > +       BUG_ON(!buffer_locked(bh));
> > +       get_bh(bh);
> > +       bh->b_end_io = end_buffer_read_sync;
> > +       submit_bh(rw, bh);
> > +}
> > +
> > +int ext4_submit_bh_read(int rw, struct buffer_head *bh)
> > +{
> > +       BUG_ON(rw & WRITE);
> > +       BUG_ON(!buffer_locked(bh));
> > +
> > +       if (buffer_uptodate(bh)) {
> > +               unlock_buffer(bh);
> > +               return 0;
> > +       }
> > +
> > +       ext4_submit_bh_read_nowait(rw, bh);
> > +       wait_on_buffer(bh);
> > +       if (buffer_uptodate(bh))
> > +               return 0;
> > +       return -EIO;
> > +}
> > +
> >  struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
> >                                ext4_lblk_t block, int create, int *err)
> >  {
> > @@ -1572,11 +1598,9 @@ struct buffer_head *ext4_bread(handle_t
> > *handle, struct inode *inode,
> >         bh = ext4_getblk(handle, inode, block, create, err);
> >         if (!bh)
> >                 return bh;
> > -       if (buffer_uptodate(bh))
> > +       if (bh_uptodate_or_lock(bh))
> >                 return bh;
> > -       ll_rw_block(READ_META, 1, &bh);
> > -       wait_on_buffer(bh);
> > -       if (buffer_uptodate(bh))
> > +       if (!ext4_submit_bh_read(READ_META, bh))
> >                 return bh;
> >         put_bh(bh);
> >         *err = -EIO;
> > 
> > ------ x ------
> > 
> > I had made the change only for reads, but it should be easy to make it
> > do writes to. Also, this function can take ext4 specific flags and you
> > can do your accounting at a single place in ext4. So, you wont need
> > any more flags for buffer head.
> > Will this approach help in what you are trying to do?
> > 
> > Thanks,
> 
> Hi Aditya,
> 
> Thank you for your patch. It quite can help me to solve my problem. We
> can define some wrapper functions to do our accounting in ext4. But it
> seems that this approach is just suitable for ext4. I prefer to
> provide a fs independent solution. Steven and I are talking about how to
> implement it to let other filesystems can use this mechanism directly to
> do their accouting. If you have some suggestions, feel free to tell me.
> 
> Regards,
> Zheng
> 

Hi,

I think Aditya's patch is a reasonable solution to part of the problem.
Having said that, I'm much more worried about how the info gets passed
via the block layer to blktrace. One possible solution is that a number
of the REQ flags are used only in certain places, so it might be
possible to split those into a separate field or something like that, in
order to avoid changing the submit_bh() interface. Resolving this issue
is really the tricky problem here, once thats done, the rest should be
relatively simple.

Looking at blk_types.h, a large number of those flags are listed as:

        /* request only flags */
        __REQ_SORTED,           /* elevator knows about this request */
        __REQ_SOFTBARRIER,      /* may not be passed by ioscheduler */
etc.

which I suspect are used only at lower layers of the block layer, so
they would potentially be available for use at submit_bh time, even if
they were then split out to a separate variable in the bio/request.

There is also the issue of what happens when requests containing
different fs objects get merged. I'd be tempted to just OR the info
together. A more tricky problem is if such requests/bios then got split
again. I think we'd just have to take the hit on accuracy and copy the
same info to both parts of the split request/bio, otherwise it would
become too complicated.

It should then be possible, given a few spare flags to use, to add some
generic flags, perhaps:

REQ_SB
REQ_INODE
REQ_INDIRECT
REQ_XATTR
REQ_JOURNAL

would be a minimum set, plus say:

REQ_FSPRIV1
REQ_FSPRIV2

for fs specific use. I've copied in Jens to see if he has some
suggestions on a good way forward here,

Steve.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/8] Filesystem io types statistic
  2011-11-16 10:14             ` Steven Whitehouse
@ 2011-11-18  2:48               ` Zheng Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Zheng Liu @ 2011-11-18  2:48 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: Aditya Kali, linux-ext4, linux-fsdevel, Jens Axboe

Hi,

Sorry for the delay reply.

On Wed, Nov 16, 2011 at 10:14:19AM +0000, Steven Whitehouse wrote:
[snip]
> > > 
> > > As part of some other work, I had added ext4's own submit_bh functions
> > > and replaced all the calls to submit_bh() and ll_rw_block() with
> > > these:
> > > 
> > > ------ x ------
> > > 
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > +void ext4_submit_bh_read_nowait(int rw, struct buffer_head *bh)
> > > +{
> > > +       BUG_ON(rw & WRITE);
> > > +       BUG_ON(!buffer_locked(bh));
> > > +       get_bh(bh);
> > > +       bh->b_end_io = end_buffer_read_sync;
> > > +       submit_bh(rw, bh);
> > > +}
> > > +
> > > +int ext4_submit_bh_read(int rw, struct buffer_head *bh)
> > > +{
> > > +       BUG_ON(rw & WRITE);
> > > +       BUG_ON(!buffer_locked(bh));
> > > +
> > > +       if (buffer_uptodate(bh)) {
> > > +               unlock_buffer(bh);
> > > +               return 0;
> > > +       }
> > > +
> > > +       ext4_submit_bh_read_nowait(rw, bh);
> > > +       wait_on_buffer(bh);
> > > +       if (buffer_uptodate(bh))
> > > +               return 0;
> > > +       return -EIO;
> > > +}
> > > +
> > >  struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
> > >                                ext4_lblk_t block, int create, int *err)
> > >  {
> > > @@ -1572,11 +1598,9 @@ struct buffer_head *ext4_bread(handle_t
> > > *handle, struct inode *inode,
> > >         bh = ext4_getblk(handle, inode, block, create, err);
> > >         if (!bh)
> > >                 return bh;
> > > -       if (buffer_uptodate(bh))
> > > +       if (bh_uptodate_or_lock(bh))
> > >                 return bh;
> > > -       ll_rw_block(READ_META, 1, &bh);
> > > -       wait_on_buffer(bh);
> > > -       if (buffer_uptodate(bh))
> > > +       if (!ext4_submit_bh_read(READ_META, bh))
> > >                 return bh;
> > >         put_bh(bh);
> > >         *err = -EIO;
> > > 
> > > ------ x ------
> > > 
> > > I had made the change only for reads, but it should be easy to make it
> > > do writes to. Also, this function can take ext4 specific flags and you
> > > can do your accounting at a single place in ext4. So, you wont need
> > > any more flags for buffer head.
> > > Will this approach help in what you are trying to do?
> > > 
> > > Thanks,
> > 
> > Hi Aditya,
> > 
> > Thank you for your patch. It quite can help me to solve my problem. We
> > can define some wrapper functions to do our accounting in ext4. But it
> > seems that this approach is just suitable for ext4. I prefer to
> > provide a fs independent solution. Steven and I are talking about how to
> > implement it to let other filesystems can use this mechanism directly to
> > do their accouting. If you have some suggestions, feel free to tell me.
> > 
> > Regards,
> > Zheng
> > 
> 
> Hi,
> 
> I think Aditya's patch is a reasonable solution to part of the problem.
> Having said that, I'm much more worried about how the info gets passed
> via the block layer to blktrace. One possible solution is that a number
> of the REQ flags are used only in certain places, so it might be
> possible to split those into a separate field or something like that, in
> order to avoid changing the submit_bh() interface. Resolving this issue
> is really the tricky problem here, once thats done, the rest should be
> relatively simple.

I think this information should be provided by filesystem rather than
block layer because it depends on specific filesystem. Block layer only
needs to notify filesystem that the request is issued to the disk. Then
filesystem can do something according to this notification, such as
increasing different counters by the IO types. So I don't think blktrace
shoud handle these flags.

> 
> Looking at blk_types.h, a large number of those flags are listed as:
> 
>         /* request only flags */
>         __REQ_SORTED,           /* elevator knows about this request */
>         __REQ_SOFTBARRIER,      /* may not be passed by ioscheduler */
> etc.
> 
> which I suspect are used only at lower layers of the block layer, so
> they would potentially be available for use at submit_bh time, even if
> they were then split out to a separate variable in the bio/request.
> 
> There is also the issue of what happens when requests containing
> different fs objects get merged. I'd be tempted to just OR the info
> together. A more tricky problem is if such requests/bios then got split
> again. I think we'd just have to take the hit on accuracy and copy the
> same info to both parts of the split request/bio, otherwise it would
> become too complicated.
> 
> It should then be possible, given a few spare flags to use, to add some
> generic flags, perhaps:
> 
> REQ_SB
> REQ_INODE
> REQ_INDIRECT
> REQ_XATTR
> REQ_JOURNAL
> 
> would be a minimum set, plus say:
> 
> REQ_FSPRIV1
> REQ_FSPRIV2
> 
> for fs specific use. I've copied in Jens to see if he has some
> suggestions on a good way forward here,

This is an optional approach. We can define some REQ_* flags into block
layer and build some wrapper functions in filesystem like Aditya's
suggestion to handle these flags. But it has a defect that this
implementation is not a fs independent solution, even though we don't
need to modify block layer.

I will try to implement a new version by this approach.

Regards,
Zheng

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-11-18  2:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-10 10:34 [PATCH v2 0/8] Filesystem io types statistic Zheng Liu
2011-11-10 10:34 ` [PATCH v2 1/8] vfs: Add a new flag and related functions in buffer to count io types Zheng Liu
2011-11-11 10:48   ` Steven Whitehouse
2011-11-11 15:36     ` Zheng Liu
2011-11-10 10:34 ` [PATCH v2 2/8] ext4: Add new data structures and related functions " Zheng Liu
2011-11-11 10:58   ` Steven Whitehouse
2011-11-11 15:45     ` Zheng Liu
2011-11-10 10:34 ` [PATCH v2 3/8] ext4: Count metadata request of read operations in buffered io Zheng Liu
2011-11-10 10:34 ` [PATCH v2 4/8] ext4: Count data " Zheng Liu
2011-11-10 10:34 ` [PATCH v2 5/8] ext4: Count metadata request of write " Zheng Liu
2011-11-10 10:34 ` [PATCH v2 6/8] ext4: Count data " Zheng Liu
2011-11-10 10:34 ` [PATCH v2 7/8] ext4: Count all requests in direct io Zheng Liu
2011-11-10 10:34 ` [PATCH v2 8/8] ext4: Show the result of io types statistic in sysfs Zheng Liu
2011-11-11 10:55 ` [PATCH v2 0/8] Filesystem io types statistic Steven Whitehouse
2011-11-11 15:32   ` Zheng Liu
2011-11-14 10:23     ` Steven Whitehouse
2011-11-14 13:35       ` Zheng Liu
2011-11-15 18:34         ` Aditya Kali
2011-11-16  8:43           ` Zheng Liu
2011-11-16 10:14             ` Steven Whitehouse
2011-11-18  2:48               ` Zheng Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.