All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] ext4: try to fix up es regressions
@ 2013-02-28 18:26 Zheng Liu
  2013-02-28 18:26 ` [PATCH 1/6] ext4: invalidate exntent-status-tree during extent_migration Zheng Liu
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Zheng Liu @ 2013-02-28 18:26 UTC (permalink / raw)
  To: linux-ext4; +Cc: Zheng Liu, Theodore Ts'o, Dmitry Monakhov

Hi all,

Sorry for the late.  This patch series tries to fix up some regressions
after we try to use extent status tree as a cache, and give it some
improvements.

The first commit tries to fix a e4defrag regression which is reported by
Dmitry.  As Dmitry reported after applied this patch there still has a
failure in Dmitry's xfstests #300.

The second commit improves ext4_es_can_be_merged function to avoid a
potential overflow.

The third commit adds a self-testing infrastructure for extent status
tree to do a sanity check.  After as a extent cache we need to make sure
that extent status tree caches right things.  After applied this patch,
we will get a lot of messages when we run xfstests, e.g. #83, #91, ...
Please define ES_AGGRESSIVE_TEST to enable it.

The fourth commit lets ext4_ext_handle_uninitialized_extents return right
length of converted extent.  After applied this patch, some messages
that reports the return value is not equal to map->m_len disappears.

The fifth commit fixes a bug that will causes an initialized extent is
marked as uninitialized when we try to split an extent.  This patch is
very important for the later one because we will not eliminate all
error messages if this patch is missing.

The last commit updates extent status tree after we zeroout an extent.
Meanwhile we will not update it in ext4_map_blocks.  Otherwise we will
track an written extent as unwritten.  It might cause a potential bug.
After applied this patch, we will never get any messages from self-
testing infrastructure except that xfstest #223 when dioread_nolock
enables.  We don't need to worry about it because when we lookup a
block mapping from status tree i_data_sem hasn't been taken.  So there
is a race window that an unwritten extent is converted by end_io when
we compare the result between extent tree and status tree.  I have tried
to take i_data_sem locking and run #223 again, and the message doesn't
be printed.

After applied the patch series, I run xfstests with the following
configurations because it is easy to trigger a unwritten->written
conversion.

  * EXT4 4k Block
  * EXT4 4k Block with dioread_nolock
  * EXT4 4k Block with bigalloc

And I don't get any message from self-testing infrastructure except #223
as said before.  I will go on testing other configurations.  Hope I don't
screw up something.

The patch series also needs to do more tests, though.  I decide to send it
out now to let others review because we still have a pending regression.
I want to know whether this patch series can it or not.

Dmitry, I still couldn't reproduce the failure of #300 in my desktop.
I change another machine with a HDD to try to reproduce it, but I failed.
Could you please test this patch series to confirm this regression can
be fixed or not.  I really appreicate for your help.  If you have any
update please let me know.

This patch series is against the latest dev branch of ext4.  As always,
any feedback is appreicated.

Thanks!!!
						- Zheng

Dmitry Monakhov (3):
  ext4: invalidate exntent-status-tree during extent_migration
  ext4: add self-testing infrastructure to do a sanity check
  ext4: ext4_split_extent shoult take care about extent zeroout v3

Zheng Liu (3):
  ext4: improve ext4_es_can_be_merged() to avoid a potential overflow
  ext4: fix wrong m_len value after unwritten extent conversion
  ext4: update extent status tree after an extent is zeroed out

 fs/ext4/extents.c        |  53 ++++++++++++++++++++----
 fs/ext4/extents_status.c | 104 +++++++++++++++++++++++++++++++++++++++++++---
 fs/ext4/extents_status.h |   9 ++++
 fs/ext4/inode.c          | 105 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/move_extent.c    |   8 ++++
 5 files changed, 266 insertions(+), 13 deletions(-)

-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] ext4: invalidate exntent-status-tree during extent_migration
  2013-02-28 18:26 [PATCH 0/6] ext4: try to fix up es regressions Zheng Liu
@ 2013-02-28 18:26 ` Zheng Liu
  2013-02-28 18:26 ` [PATCH 2/6] ext4: improve ext4_es_can_be_merged() to avoid a potential overflow Zheng Liu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Zheng Liu @ 2013-02-28 18:26 UTC (permalink / raw)
  To: linux-ext4; +Cc: Dmitry Monakhov, Theodore Ts'o

From: Dmitry Monakhov <dmonakhov@openvz.org>

mext_replace_branches() will change inode's extents layout so
we have to drop corresponding cache.

TESTCASE:  301'th xfstest was not yet accepted to official xfstest's branch
and can be found here: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/move_extent.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index d78c33e..c1f15b2 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -666,6 +666,14 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
 	int replaced_count = 0;
 	int dext_alen;
 
+	*err = ext4_es_remove_extent(orig_inode, from, count);
+	if (*err)
+		goto out;
+
+	*err = ext4_es_remove_extent(donor_inode, from, count);
+	if (*err)
+		goto out;
+
 	/* Get the original extent for the block "orig_off" */
 	*err = get_ext_path(orig_inode, orig_off, &orig_path);
 	if (*err)
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] ext4: improve ext4_es_can_be_merged() to avoid a potential overflow
  2013-02-28 18:26 [PATCH 0/6] ext4: try to fix up es regressions Zheng Liu
  2013-02-28 18:26 ` [PATCH 1/6] ext4: invalidate exntent-status-tree during extent_migration Zheng Liu
@ 2013-02-28 18:26 ` Zheng Liu
  2013-02-28 18:26 ` [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check Zheng Liu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Zheng Liu @ 2013-02-28 18:26 UTC (permalink / raw)
  To: linux-ext4; +Cc: Zheng Liu, Theodore Ts'o, Dmitry Monakhov

From: Zheng Liu <wenqing.lz@taobao.com>

This commit tries to improve ext4_es_can_be_merged.  We check the length
of extent to avoid a potential overflow.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents_status.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index f768f4a..a769485 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -329,17 +329,33 @@ static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
 static int ext4_es_can_be_merged(struct extent_status *es1,
 				 struct extent_status *es2)
 {
-	if (es1->es_lblk + es1->es_len != es2->es_lblk)
+	__u64 es1_len, es2_len, max_len;
+
+	if (ext4_es_status(es1) ^ ext4_es_status(es2))
 		return 0;
 
-	if (ext4_es_status(es1) != ext4_es_status(es2))
+	max_len = 0xFFFFFFFFULL;
+	es1_len = es1->es_len;
+	es2_len = es2->es_len;
+
+	if (es1_len + es2_len > max_len)
 		return 0;
 
-	if ((ext4_es_is_written(es1) || ext4_es_is_unwritten(es1)) &&
-	    (ext4_es_pblock(es1) + es1->es_len != ext4_es_pblock(es2)))
+	if (es1->es_lblk + es1_len != es2->es_lblk)
 		return 0;
 
-	return 1;
+	if ((ext4_es_is_written(es1) || ext4_es_is_unwritten(es1)) &&
+	    (ext4_es_pblock(es1) + es1_len == ext4_es_pblock(es2)))
+		return 1;
+
+	if (ext4_es_is_hole(es1))
+		return 1;
+
+	/* we need to check delayed extent is without unwritten status */
+	if (ext4_es_is_delayed(es1) && !ext4_es_is_unwritten(es1))
+		return 1;
+
+	return 0;
 }
 
 static struct extent_status *
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check
  2013-02-28 18:26 [PATCH 0/6] ext4: try to fix up es regressions Zheng Liu
  2013-02-28 18:26 ` [PATCH 1/6] ext4: invalidate exntent-status-tree during extent_migration Zheng Liu
  2013-02-28 18:26 ` [PATCH 2/6] ext4: improve ext4_es_can_be_merged() to avoid a potential overflow Zheng Liu
@ 2013-02-28 18:26 ` Zheng Liu
  2013-02-28 19:04   ` Dmitry Monakhov
  2013-02-28 18:26 ` [PATCH 4/6] ext4: fix wrong m_len value after unwritten extent conversion Zheng Liu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Zheng Liu @ 2013-02-28 18:26 UTC (permalink / raw)
  To: linux-ext4; +Cc: Dmitry Monakhov, Zheng Liu, Theodore Ts'o

From: Dmitry Monakhov <dmonakhov@openvz.org>

This commit adds a self-testing infrastructure like extent tree does to
do a sanity check for extent status tree.  After status tree is as a
extent cache, we'd better to make sure that it caches right result.

After applied this commit, we will get a lot of messages when we run
xfstests as below.

...
kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len
3 in ext4_map_blocks (allocation)
...
kernel: ES cache assertation failed for inode: 230 es_cached ex
[974/2/4781/20] != found ex [974/1/4781/1000]
...
kernel: ES insert assertation failed for inode: 635 ex_status
[0/45/21388/w] != es_status [44/1/21432/u]
...

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/extents_status.c | 64 ++++++++++++++++++++++++++++++++
 fs/ext4/extents_status.h |  6 +++
 fs/ext4/inode.c          | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 165 insertions(+)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index a769485..02df536 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -401,6 +401,66 @@ ext4_es_try_to_merge_right(struct inode *inode, struct extent_status *es)
 	return es;
 }
 
+#ifdef ES_AGGRESSIVE_TEST
+static void ext4_es_insert_extent_check(struct inode *inode,
+					struct extent_status *es)
+{
+	struct ext4_ext_path *path = NULL;
+	struct ext4_extent *ex;
+	ext4_lblk_t ee_block;
+	ext4_fsblk_t ee_start;
+	unsigned short ee_len;
+	int depth, ex_status, es_status;
+
+	/*
+	 * Delayed and hole haven't been allocated in disk.  We
+	 * don't need to check it.
+	 */
+	if (!ext4_es_is_written(es) && !ext4_es_is_unwritten(es))
+		return;
+
+	/*
+	 * We don't need to worry about the race condition because
+	 * caller takes i_data_sem locking.
+	 */
+	path = ext4_ext_find_extent(inode, es->es_lblk, NULL);
+	if (IS_ERR(path))
+		return;
+
+	depth = ext_depth(inode);
+
+	/* We should always find an extent at given logical block */
+	ex = path[depth].p_ext;
+	if (!ex) {
+		printk("ES insert assertation failed to inode: %lu "
+		       "can not find the extent at block %d\n",
+		       inode->i_ino, es->es_lblk);
+		goto out;
+	}
+
+	ee_block = le32_to_cpu(ex->ee_block);
+	ee_start = ext4_ext_pblock(ex);
+	ee_len = ext4_ext_get_actual_len(ex);
+
+	ex_status = ext4_ext_is_uninitialized(ex) ? 1 : 0;
+	es_status = ext4_es_is_unwritten(es) ? 1 : 0;
+	if (ex_status ^ es_status) {
+		printk("ES insert assertation failed for inode: %lu "
+		       "ex_status [%d/%d/%llu/%c] != "
+		       "es_status [%d/%d/%llu/%c]\n", inode->i_ino,
+		       ee_block, ee_len, ee_start, ex_status ? 'u' : 'w',
+		       es->es_lblk, es->es_len, ext4_es_pblock(es),
+		       es_status ? 'u' : 'w');
+	}
+out:
+	if (path) {
+		ext4_ext_drop_refs(path);
+		kfree(path);
+	}
+
+}
+#endif
+
 static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
 {
 	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
@@ -483,6 +543,10 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 	ext4_es_store_status(&newes, status);
 	trace_ext4_es_insert_extent(inode, &newes);
 
+#ifdef ES_AGGRESSIVE_TEST
+	ext4_es_insert_extent_check(inode, &newes);
+#endif
+
 	write_lock(&EXT4_I(inode)->i_es_lock);
 	err = __es_remove_extent(inode, lblk, end);
 	if (err != 0)
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index f190dfe..56140ad 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -21,6 +21,12 @@
 #endif
 
 /*
+ * With ES_AGGRESSIVE_TEST defined, the result of es caching will be
+ * checked with old map_block's result.
+ */
+#define ES_AGGRESSIVE_TEST__
+
+/*
  * These flags live in the high bits of extent_status.es_pblk
  */
 #define EXTENT_STATUS_WRITTEN	(1ULL << 63)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 95a0c62..8d22170 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -482,6 +482,57 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
 	return num;
 }
 
+#ifdef ES_AGGRESSIVE_TEST
+static void ext4_map_blocks_es_recheck(handle_t *handle,
+				       struct inode *inode,
+				       struct ext4_map_blocks *es_map,
+				       struct ext4_map_blocks *map,
+				       int flags)
+{
+	int retval;
+
+	map->m_flags = 0;
+	/*
+	 * There is a race window that the result is not the same.
+	 * e.g. xfstests #223 when dioread_nolock enables.  The reason
+	 * is that we lookup a block mapping in extent status tree with
+	 * out taking i_data_sem.  So at the time the unwritten extent
+	 * could be converted.
+	 */
+	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
+		down_read((&EXT4_I(inode)->i_data_sem));
+	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+		retval = ext4_ext_map_blocks(handle, inode, map, flags &
+					     EXT4_GET_BLOCKS_KEEP_SIZE);
+	} else {
+		retval = ext4_ind_map_blocks(handle, inode, map, flags &
+					     EXT4_GET_BLOCKS_KEEP_SIZE);
+	}
+	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
+		up_read((&EXT4_I(inode)->i_data_sem));
+	/*
+	 * Clear EXT4_MAP_FROM_CLUSTER flag because it shouldn't be marked
+	 * in es_map->m_flags.
+	 */
+	map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
+
+	/*
+	 * We don't check m_len because extent will be collpased in status
+	 * tree.  So the m_len might not equal.
+	 */
+	if (es_map->m_lblk != map->m_lblk ||
+	    es_map->m_flags != map->m_flags ||
+	    es_map->m_pblk != map->m_pblk) {
+		printk("ES cache assertation failed for inode: %lu "
+		       "es_cached ex [%d/%d/%llu/%x] != "
+		       "found ex [%d/%d/%llu/%x]\n", inode->i_ino,
+		       es_map->m_lblk, es_map->m_len,
+		       es_map->m_pblk, es_map->m_flags,
+		       map->m_lblk, map->m_len, map->m_pblk, map->m_flags);
+	}
+}
+#endif /* ES_AGGRESSIVE_TEST */
+
 /*
  * The ext4_map_blocks() function tries to look up the requested blocks,
  * and returns if the blocks are already mapped.
@@ -509,6 +560,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 {
 	struct extent_status es;
 	int retval;
+#ifdef ES_AGGRESSIVE_TEST
+	struct ext4_map_blocks orig_map;
+
+	memcpy(&orig_map, map, sizeof(*map));
+#endif
 
 	map->m_flags = 0;
 	ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u,"
@@ -531,6 +587,10 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		} else {
 			BUG_ON(1);
 		}
+#ifdef ES_AGGRESSIVE_TEST
+		ext4_map_blocks_es_recheck(handle, inode, map,
+					   &orig_map, flags);
+#endif
 		goto found;
 	}
 
@@ -551,6 +611,15 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		int ret;
 		unsigned long long status;
 
+#ifdef ES_AGGRESSIVE_TEST
+		if (retval != map->m_len) {
+			printk("ES len assertation failed for inode: %lu "
+			       "retval %d != map->m_len %d "
+			       "in %s (lookup)\n", inode->i_ino, retval,
+			       map->m_len, __func__);
+		}
+#endif
+
 		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
 				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
 		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
@@ -643,6 +712,15 @@ found:
 		int ret;
 		unsigned long long status;
 
+#ifdef ES_AGGRESSIVE_TEST
+		if (retval != map->m_len) {
+			printk("ES len assertation failed for inode: %lu "
+			       "retval %d != map->m_len %d "
+			       "in %s (allocation)\n", inode->i_ino, retval,
+			       map->m_len, __func__);
+		}
+#endif
+
 		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
 				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
 		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
@@ -1768,6 +1846,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 	struct extent_status es;
 	int retval;
 	sector_t invalid_block = ~((sector_t) 0xffff);
+#ifdef ES_AGGRESSIVE_TEST
+	struct ext4_map_blocks orig_map;
+
+	memcpy(&orig_map, map, sizeof(&map));
+#endif
 
 	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
 		invalid_block = ~0;
@@ -1809,6 +1892,9 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		else
 			BUG_ON(1);
 
+#ifdef ES_AGRESSIVE_TEST
+		ext4_map_blocks_es_recheck(NULL, inode, map, &orig_map, flags);
+#endif
 		return retval;
 	}
 
@@ -1873,6 +1959,15 @@ add_delayed:
 		int ret;
 		unsigned long long status;
 
+#ifdef ES_AGGRESSIVE_TEST
+		if (retval != map->m_len) {
+			printk("ES len assertation failed for inode: %lu "
+			       "retval %d != map->m_len %d "
+			       "in %s (lookup)\n", inode->i_ino, retval,
+			       map->m_len, __func__);
+		}
+#endif
+
 		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
 				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
 		ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] ext4: fix wrong m_len value after unwritten extent conversion
  2013-02-28 18:26 [PATCH 0/6] ext4: try to fix up es regressions Zheng Liu
                   ` (2 preceding siblings ...)
  2013-02-28 18:26 ` [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check Zheng Liu
@ 2013-02-28 18:26 ` Zheng Liu
  2013-02-28 18:26 ` [PATCH 5/6] ext4: ext4_split_extent shoult take care about extent zeroout v3 Zheng Liu
  2013-02-28 18:26 ` [PATCH 6/6] ext4: update extent status tree after an extent is zeroed out Zheng Liu
  5 siblings, 0 replies; 13+ messages in thread
From: Zheng Liu @ 2013-02-28 18:26 UTC (permalink / raw)
  To: linux-ext4; +Cc: Zheng Liu, Theodore Ts'o, Dmitry Monakhov

From: Zheng Liu <wenqing.lz@taobao.com>

We always assume that the return value of ext4_ext_map_blocks is equal
to map->m_len but when we try to convert an unwritten extent 'm_len'
value will break this assumption.  It is harmless until we use status
tree as a extent cache because we need to update status tree according
to 'm_len' value.

Meanwhile this commit marks EXT4_MAP_MAPPED flag after unwritten extent
conversion.  It shouldn't cause a bug because we update status tree
according to checking EXT4_MAP_UNWRITTEN flag.  But it should be fixed.

After applied this commit, the following error message from self-testing
infrastructure disappears.

    ...
    kernel: ES len assertation failed for inode: 230 retval 1 !=
    map->m_len 3 in ext4_map_blocks (allocation)
    ...

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 372b2cb..0793a13 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3626,6 +3626,10 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
 						 path, map->m_len);
 		} else
 			err = ret;
+		map->m_flags |= EXT4_MAP_MAPPED;
+		if (allocated > map->m_len)
+			allocated = map->m_len;
+		map->m_len = allocated;
 		goto out2;
 	}
 	/* buffered IO case */
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] ext4: ext4_split_extent shoult take care about extent zeroout v3
  2013-02-28 18:26 [PATCH 0/6] ext4: try to fix up es regressions Zheng Liu
                   ` (3 preceding siblings ...)
  2013-02-28 18:26 ` [PATCH 4/6] ext4: fix wrong m_len value after unwritten extent conversion Zheng Liu
@ 2013-02-28 18:26 ` Zheng Liu
  2013-02-28 18:26 ` [PATCH 6/6] ext4: update extent status tree after an extent is zeroed out Zheng Liu
  5 siblings, 0 replies; 13+ messages in thread
From: Zheng Liu @ 2013-02-28 18:26 UTC (permalink / raw)
  To: linux-ext4; +Cc: Dmitry Monakhov, Theodore Ts'o, Zheng Liu

From: Dmitry Monakhov <dmonakhov@openvz.org>

When ext4_split_extent_at() ends up doing zeroout & conversion to
initialized instead of split & conversion, ext4_split_extent() gets
confused and can wrongly mark the extent back as uninitialized resulting in
end IO code getting confused from large unwritten extents and may result in
data loss.

The example of problematic behavior is:
			    lblk len              lblk len
  ext4_split_extent() (ex=[1000,30,uninit], map=[1010,10])
    ext4_split_extent_at() (split [1000,30,uninit] at 1020)
      ext4_ext_insert_extent() -> ENOSPC
      ext4_ext_zeroout()
	 -> extent [1000,30] is now initialized
    ext4_split_extent_at() (split [1000,30,init] at 1010,
			     MARK_UNINIT1 | MARK_UNINIT2)
      -> extent is split and parts marked as uninitialized

Fix the problem by rechecking extent type after the first
ext4_split_extent_at() returns. None of split_flags can not be applied to
initialized extent so this patch also add BUG_ON to prevent similar issues
in future.

TESTCASE: https://github.com/dmonakhov/xfstests/commit/b8a55eb5ce28c6ff29e620ab090902fcd5833597

Changes since V2: Patch no longer depends on Jan's "disable-uninit-ext-mergring" patch.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/extents.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0793a13..835d82b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2943,6 +2943,10 @@ static int ext4_split_extent_at(handle_t *handle,
 	newblock = split - ee_block + ext4_ext_pblock(ex);
 
 	BUG_ON(split < ee_block || split >= (ee_block + ee_len));
+	BUG_ON(!ext4_ext_is_uninitialized(ex) &&
+	       split_flag & (EXT4_EXT_MAY_ZEROOUT |
+			     EXT4_EXT_MARK_UNINIT1 |
+			     EXT4_EXT_MARK_UNINIT2));
 
 	err = ext4_ext_get_access(handle, inode, path + depth);
 	if (err)
@@ -3061,19 +3065,25 @@ static int ext4_split_extent(handle_t *handle,
 		if (err)
 			goto out;
 	}
-
+	/*
+	 * Update path is required because previous ext4_split_extent_at() may
+	 * result in split of original leaf or extent zeroout.
+	 */
 	ext4_ext_drop_refs(path);
 	path = ext4_ext_find_extent(inode, map->m_lblk, path);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
+	depth = ext_depth(inode);
+	ex = path[depth].p_ext;
+	uninitialized = ext4_ext_is_uninitialized(ex);
+	split_flag1 = 0;
 
 	if (map->m_lblk >= ee_block) {
-		split_flag1 = split_flag & (EXT4_EXT_MAY_ZEROOUT |
-					    EXT4_EXT_DATA_VALID2);
-		if (uninitialized)
+		split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;
+		if (uninitialized) {
 			split_flag1 |= EXT4_EXT_MARK_UNINIT1;
-		if (split_flag & EXT4_EXT_MARK_UNINIT2)
-			split_flag1 |= EXT4_EXT_MARK_UNINIT2;
+			split_flag1 |= split_flag & (EXT4_EXT_MAY_ZEROOUT | EXT4_EXT_MARK_UNINIT2);
+		}
 		err = ext4_split_extent_at(handle, inode, path,
 				map->m_lblk, split_flag1, flags);
 		if (err)
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] ext4: update extent status tree after an extent is zeroed out
  2013-02-28 18:26 [PATCH 0/6] ext4: try to fix up es regressions Zheng Liu
                   ` (4 preceding siblings ...)
  2013-02-28 18:26 ` [PATCH 5/6] ext4: ext4_split_extent shoult take care about extent zeroout v3 Zheng Liu
@ 2013-02-28 18:26 ` Zheng Liu
  2013-02-28 18:45   ` Dmitry Monakhov
  5 siblings, 1 reply; 13+ messages in thread
From: Zheng Liu @ 2013-02-28 18:26 UTC (permalink / raw)
  To: linux-ext4; +Cc: Zheng Liu, Theodore Ts'o, Dmitry Monakhov

From: Zheng Liu <wenqing.lz@taobao.com>

When we try to split an extent, this extent could be zeroed out and mark
as initialized.  But we don't know this in ext4_map_blocks because it
only returns a allocated length of extent.  Meanwhile we will mark this
extent as uninitialized because we only check m_flags.

This commit update extent status tree when we try to split an unwritten
extent.  We don't need to worry about the status of this extent because
we always mark it as initialized.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c        | 27 +++++++++++++++++++++++++--
 fs/ext4/extents_status.c | 14 ++++++++++++++
 fs/ext4/extents_status.h |  3 +++
 fs/ext4/inode.c          | 10 ++++++++++
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 835d82b..a2cf028 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3007,6 +3007,19 @@ static int ext4_split_extent_at(handle_t *handle,
 		ex->ee_len = cpu_to_le16(ee_len);
 		ext4_ext_try_to_merge(handle, inode, path, ex);
 		err = ext4_ext_dirty(handle, inode, path + path->p_depth);
+		if (err)
+			goto fix_extent_len;
+
+		/* update extent status tree */
+		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
+			if (split_flag & EXT4_EXT_DATA_VALID1)
+				err = ext4_es_zeroout(inode, ex2);
+			else
+				err = ext4_es_zeroout(inode, ex);
+		} else {
+			err = ext4_es_zeroout(inode, &orig_ex);
+		}
+
 		goto out;
 	} else if (err)
 		goto fix_extent_len;
@@ -3128,7 +3141,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 	ext4_lblk_t ee_block, eof_block;
 	unsigned int ee_len, depth;
 	int allocated, max_zeroout = 0;
-	int err = 0;
+	int err = 0, has_zeroout = 0;
 	int split_flag = 0;
 
 	ext_debug("ext4_ext_convert_to_initialized: inode %lu, logical"
@@ -3251,6 +3264,9 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 		ext4_ext_mark_initialized(ex);
 		ext4_ext_try_to_merge(handle, inode, path, ex);
 		err = ext4_ext_dirty(handle, inode, path + path->p_depth);
+		if (err)
+			goto out;
+		err = ext4_es_zeroout(inode, ex);
 		goto out;
 	}
 
@@ -3277,6 +3293,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 				goto out;
 			split_map.m_lblk = map->m_lblk;
 			split_map.m_len = allocated;
+			has_zeroout = 1;
 		} else if (map->m_lblk - ee_block + map->m_len < max_zeroout) {
 			/* case 2 */
 			if (map->m_lblk != ee_block) {
@@ -3293,13 +3310,19 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 			split_map.m_lblk = ee_block;
 			split_map.m_len = map->m_lblk - ee_block + map->m_len;
 			allocated = map->m_len;
+			has_zeroout = 1;
 		}
 	}
 
 	allocated = ext4_split_extent(handle, inode, path,
 				      &split_map, split_flag, 0);
-	if (allocated < 0)
+	if (allocated < 0) {
 		err = allocated;
+		goto out;
+	}
+
+	if (has_zeroout)
+		err = ext4_es_zeroout(inode, &zero_ex);
 
 out:
 	return err ? err : allocated;
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 02df536..7289e5f 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -745,6 +745,20 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
 	return err;
 }
 
+int ext4_es_zeroout(struct inode *inode, struct ext4_extent *ex)
+{
+	ext4_lblk_t  ee_block;
+	ext4_fsblk_t ee_pblock;
+	unsigned int ee_len;
+
+	ee_block  = le32_to_cpu(ex->ee_block);
+	ee_len    = ext4_ext_get_actual_len(ex);
+	ee_pblock = ext4_ext_pblock(ex);
+
+	return ext4_es_insert_extent(inode, ee_block, ee_len, ee_pblock,
+				     EXTENT_STATUS_WRITTEN);
+}
+
 static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct ext4_sb_info *sbi = container_of(shrink,
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 56140ad..d8e2d4d 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -39,6 +39,8 @@
 				 EXTENT_STATUS_DELAYED | \
 				 EXTENT_STATUS_HOLE)
 
+struct ext4_extent;
+
 struct extent_status {
 	struct rb_node rb_node;
 	ext4_lblk_t es_lblk;	/* first logical block extent covers */
@@ -64,6 +66,7 @@ extern void ext4_es_find_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
 					struct extent_status *es);
 extern int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
 				 struct extent_status *es);
+extern int ext4_es_zeroout(struct inode *inode, struct ext4_extent *ex);
 
 static inline int ext4_es_is_written(struct extent_status *es)
 {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8d22170..e2ceab6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -721,6 +721,15 @@ found:
 		}
 #endif
 
+		/*
+		 * If the extent has been zeroed out, we don't need to update
+		 * extent status tree.
+		 */
+		if ((flags & EXT4_GET_BLOCKS_PRE_IO) &&
+		    ext4_es_lookup_extent(inode, map->m_lblk, &es)) {
+			if (ext4_es_is_written(&es))
+				goto has_zeroout;
+		}
 		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
 				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
 		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
@@ -733,6 +742,7 @@ found:
 			retval = ret;
 	}
 
+has_zeroout:
 	up_write((&EXT4_I(inode)->i_data_sem));
 	if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
 		int ret = check_block_validity(inode, map);
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/6] ext4: update extent status tree after an extent is zeroed out
  2013-02-28 18:26 ` [PATCH 6/6] ext4: update extent status tree after an extent is zeroed out Zheng Liu
@ 2013-02-28 18:45   ` Dmitry Monakhov
  2013-03-01  2:37     ` Zheng Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2013-02-28 18:45 UTC (permalink / raw)
  To: Zheng Liu, linux-ext4; +Cc: Zheng Liu, Theodore Ts'o

On Fri,  1 Mar 2013 02:26:44 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> When we try to split an extent, this extent could be zeroed out and mark
> as initialized.  But we don't know this in ext4_map_blocks because it
> only returns a allocated length of extent.  Meanwhile we will mark this
> extent as uninitialized because we only check m_flags.
> 
> This commit update extent status tree when we try to split an unwritten
> extent.  We don't need to worry about the status of this extent because
> we always mark it as initialized.
Your solution seems too complex, why not just update es_cache inside 
ext4_ext_zeroout() like this:

>From e0565336c9ea540b0297781196453b358c3c3313 Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Thu, 28 Feb 2013 22:40:14 +0400
Subject: [PATCH] ext4: update status tree on extent zeroout


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c           |    4 ++++
 include/trace/events/ext4.h |   27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c89e850..15cc324 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2885,12 +2885,16 @@ static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
 	unsigned int ee_len;
 	int ret;
 
+	trace_ext4_ext_zeroout(inode, ex);
 	ee_len    = ext4_ext_get_actual_len(ex);
 	ee_pblock = ext4_ext_pblock(ex);
 
 	ret = sb_issue_zeroout(inode->i_sb, ee_pblock, ee_len, GFP_NOFS);
 	if (ret > 0)
 		ret = 0;
+	if (!ret)
+		ret = ext4_es_insert_extent(inode, le32_to_cpu(ex->ee_block),
+					    ee_len, ee_pblock, EXTENT_STATUS_WRITTEN);
 
 	return ret;
 }
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index c0457c0..cd750a6 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2092,6 +2092,33 @@ TRACE_EVENT(ext4_ext_remove_space_done,
 		  (unsigned short) __entry->eh_entries)
 );
 
+TRACE_EVENT(ext4_ext_zeroout,
+	TP_PROTO(struct inode *inode, struct ext4_extent *ex),
+
+	TP_ARGS(inode, ex),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,		dev	)
+		__field(	ino_t,		ino	)
+		__field(	ext4_lblk_t,	u_lblk	)
+		__field(	unsigned,	u_len	)
+		__field(	ext4_fsblk_t,	u_pblk	)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= inode->i_sb->s_dev;
+		__entry->ino		= inode->i_ino;
+		__entry->u_lblk		= le32_to_cpu(ex->ee_block);
+		__entry->u_len		= ext4_ext_get_actual_len(ex);
+		__entry->u_pblk		= ext4_ext_pblock(ex);
+	),
+
+	TP_printk("dev %d,%d ino %lu ee_lblk %u ee_len %u ee_pblk %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->ino,
+		  __entry->u_lblk, __entry->u_len, __entry->u_pblk)
+);
+
 TRACE_EVENT(ext4_es_insert_extent,
 	TP_PROTO(struct inode *inode, struct extent_status *es),
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check
  2013-02-28 18:26 ` [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check Zheng Liu
@ 2013-02-28 19:04   ` Dmitry Monakhov
  2013-03-01  2:28     ` Zheng Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2013-02-28 19:04 UTC (permalink / raw)
  To: Zheng Liu, linux-ext4; +Cc: Zheng Liu, Theodore Ts'o

On Fri,  1 Mar 2013 02:26:41 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> From: Dmitry Monakhov <dmonakhov@openvz.org>
> 
> This commit adds a self-testing infrastructure like extent tree does to
> do a sanity check for extent status tree.  After status tree is as a
> extent cache, we'd better to make sure that it caches right result.
> 
> After applied this commit, we will get a lot of messages when we run
> xfstests as below.
> 
> ...
> kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len
> 3 in ext4_map_blocks (allocation)
> ...
> kernel: ES cache assertation failed for inode: 230 es_cached ex
> [974/2/4781/20] != found ex [974/1/4781/1000]
> ...
> kernel: ES insert assertation failed for inode: 635 ex_status
> [0/45/21388/w] != es_status [44/1/21432/u]
> ...
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> ---
>  fs/ext4/extents_status.c | 64 ++++++++++++++++++++++++++++++++
>  fs/ext4/extents_status.h |  6 +++
>  fs/ext4/inode.c          | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 165 insertions(+)
> 
> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> index a769485..02df536 100644
> --- a/fs/ext4/extents_status.c
> +++ b/fs/ext4/extents_status.c
> @@ -401,6 +401,66 @@ ext4_es_try_to_merge_right(struct inode *inode, struct extent_status *es)
>  	return es;
>  }
>  
> +#ifdef ES_AGGRESSIVE_TEST
> +static void ext4_es_insert_extent_check(struct inode *inode,
> +					struct extent_status *es)
Agree, this is very good place for sanity check.
> +{
> +	struct ext4_ext_path *path = NULL;
> +	struct ext4_extent *ex;
> +	ext4_lblk_t ee_block;
> +	ext4_fsblk_t ee_start;
> +	unsigned short ee_len;
> +	int depth, ex_status, es_status;
> +
> +	/*
> +	 * Delayed and hole haven't been allocated in disk.  We
> +	 * don't need to check it.
> +	 */
Why?  We still can perform ext4_ext_find_extent and check that:
      /* If extent is delayed  only if ex == NULL */
      if (!ex  == ext4_es_is_written(es))
           WARN_ON();
          
      if (ext4_es_is_unwritten(es) != ext4_ext_is_uninitialized(ex))
           WARN_ON();

> +	if (!ext4_es_is_written(es) && !ext4_es_is_unwritten(es))
> +		return;
> +
> +	/*
> +	 * We don't need to worry about the race condition because
> +	 * caller takes i_data_sem locking.
         Yes, and it is good place to assertion that we hold it.
> +	 */
> +	path = ext4_ext_find_extent(inode, es->es_lblk, NULL);
> +	if (IS_ERR(path))
> +		return;
> +
> +	depth = ext_depth(inode);
> +
> +	/* We should always find an extent at given logical block */
> +	ex = path[depth].p_ext;
> +	if ((ex != NULL )  ^ ext4_es_is_written(es) {
> +		printk("ES insert assertation failed to inode: %lu "
> +		       "can not find the extent at block %d\n",
> +		       inode->i_ino, es->es_lblk);
> +		goto out;
> +	}
> +
> +> +
> +	ee_block = le32_to_cpu(ex->ee_block);
> +	ee_start = ext4_ext_pblock(ex);
> +	ee_len = ext4_ext_get_actual_len(ex);
> +
> +	ex_status = ext4_ext_is_uninitialized(ex) ? 1 : 0;
> +	es_status = ext4_es_is_unwritten(es) ? 1 : 0;
> +	if (ex_status ^ es_status) {
Why status only?  we can also check pblock, and lblk.
> +		printk("ES insert assertation failed for inode: %lu "
> +		       "ex_status [%d/%d/%llu/%c] != "
> +		       "es_status [%d/%d/%llu/%c]\n", inode->i_ino,
> +		       ee_block, ee_len, ee_start, ex_status ? 'u' : 'w',
> +		       es->es_lblk, es->es_len, ext4_es_pblock(es),
> +		       es_status ? 'u' : 'w');
> +	}
> +out:
> +	if (path) {
> +		ext4_ext_drop_refs(path);
> +		kfree(path);
> +	}
> +
> +}
> +#endif
> +
>  static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
>  {
>  	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> @@ -483,6 +543,10 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>  	ext4_es_store_status(&newes, status);
>  	trace_ext4_es_insert_extent(inode, &newes);
>  
> +#ifdef ES_AGGRESSIVE_TEST
> +	ext4_es_insert_extent_check(inode, &newes);
> +#endif
> +
>  	write_lock(&EXT4_I(inode)->i_es_lock);
>  	err = __es_remove_extent(inode, lblk, end);
>  	if (err != 0)
> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> index f190dfe..56140ad 100644
> --- a/fs/ext4/extents_status.h
> +++ b/fs/ext4/extents_status.h
> @@ -21,6 +21,12 @@
>  #endif
>  
>  /*
> + * With ES_AGGRESSIVE_TEST defined, the result of es caching will be
> + * checked with old map_block's result.
> + */
> +#define ES_AGGRESSIVE_TEST__
> +
> +/*
>   * These flags live in the high bits of extent_status.es_pblk
>   */
>  #define EXTENT_STATUS_WRITTEN	(1ULL << 63)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 95a0c62..8d22170 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -482,6 +482,57 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
>  	return num;
>  }
>  
> +#ifdef ES_AGGRESSIVE_TEST
> +static void ext4_map_blocks_es_recheck(handle_t *handle,
> +				       struct inode *inode,
> +				       struct ext4_map_blocks *es_map,
> +				       struct ext4_map_blocks *map,
> +				       int flags)
> +{
> +	int retval;
> +
> +	map->m_flags = 0;
> +	/*
> +	 * There is a race window that the result is not the same.
> +	 * e.g. xfstests #223 when dioread_nolock enables.  The reason
> +	 * is that we lookup a block mapping in extent status tree with
> +	 * out taking i_data_sem.  So at the time the unwritten extent
> +	 * could be converted.
> +	 */
> +	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> +		down_read((&EXT4_I(inode)->i_data_sem));
> +	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> +		retval = ext4_ext_map_blocks(handle, inode, map, flags &
> +					     EXT4_GET_BLOCKS_KEEP_SIZE);
> +	} else {
> +		retval = ext4_ind_map_blocks(handle, inode, map, flags &
> +					     EXT4_GET_BLOCKS_KEEP_SIZE);
> +	}
> +	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> +		up_read((&EXT4_I(inode)->i_data_sem));
> +	/*
> +	 * Clear EXT4_MAP_FROM_CLUSTER flag because it shouldn't be marked
> +	 * in es_map->m_flags.
> +	 */
> +	map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
> +
> +	/*
> +	 * We don't check m_len because extent will be collpased in status
> +	 * tree.  So the m_len might not equal.
> +	 */
> +	if (es_map->m_lblk != map->m_lblk ||
> +	    es_map->m_flags != map->m_flags ||
> +	    es_map->m_pblk != map->m_pblk) {
> +		printk("ES cache assertation failed for inode: %lu "
> +		       "es_cached ex [%d/%d/%llu/%x] != "
> +		       "found ex [%d/%d/%llu/%x]\n", inode->i_ino,
> +		       es_map->m_lblk, es_map->m_len,
> +		       es_map->m_pblk, es_map->m_flags,
> +		       map->m_lblk, map->m_len, map->m_pblk, map->m_flags);
> +	}
> +}
> +#endif /* ES_AGGRESSIVE_TEST */
> +
>  /*
>   * The ext4_map_blocks() function tries to look up the requested blocks,
>   * and returns if the blocks are already mapped.
> @@ -509,6 +560,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  {
>  	struct extent_status es;
>  	int retval;
> +#ifdef ES_AGGRESSIVE_TEST
> +	struct ext4_map_blocks orig_map;
> +
> +	memcpy(&orig_map, map, sizeof(*map));
> +#endif
>  
>  	map->m_flags = 0;
>  	ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u,"
> @@ -531,6 +587,10 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  		} else {
>  			BUG_ON(1);
>  		}
> +#ifdef ES_AGGRESSIVE_TEST
> +		ext4_map_blocks_es_recheck(handle, inode, map,
> +					   &orig_map, flags);
> +#endif
>  		goto found;
>  	}
>  
> @@ -551,6 +611,15 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
>  		int ret;
>  		unsigned long long status;
>  
> +#ifdef ES_AGGRESSIVE_TEST
> +		if (retval != map->m_len) {
> +			printk("ES len assertation failed for inode: %lu "
> +			       "retval %d != map->m_len %d "
> +			       "in %s (lookup)\n", inode->i_ino, retval,
> +			       map->m_len, __func__);
> +		}
> +#endif
> +
>  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>  		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> @@ -643,6 +712,15 @@ found:
>  		int ret;
>  		unsigned long long status;
>  
> +#ifdef ES_AGGRESSIVE_TEST
> +		if (retval != map->m_len) {
> +			printk("ES len assertation failed for inode: %lu "
> +			       "retval %d != map->m_len %d "
> +			       "in %s (allocation)\n", inode->i_ino, retval,
> +			       map->m_len, __func__);
> +		}
> +#endif
> +
>  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>  		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> @@ -1768,6 +1846,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>  	struct extent_status es;
>  	int retval;
>  	sector_t invalid_block = ~((sector_t) 0xffff);
> +#ifdef ES_AGGRESSIVE_TEST
> +	struct ext4_map_blocks orig_map;
> +
> +	memcpy(&orig_map, map, sizeof(&map));
> +#endif
>  
>  	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
>  		invalid_block = ~0;
> @@ -1809,6 +1892,9 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>  		else
>  			BUG_ON(1);
>  
> +#ifdef ES_AGRESSIVE_TEST
> +		ext4_map_blocks_es_recheck(NULL, inode, map, &orig_map, flags);
> +#endif
>  		return retval;
>  	}
>  
> @@ -1873,6 +1959,15 @@ add_delayed:
>  		int ret;
>  		unsigned long long status;
>  
> +#ifdef ES_AGGRESSIVE_TEST
> +		if (retval != map->m_len) {
> +			printk("ES len assertation failed for inode: %lu "
> +			       "retval %d != map->m_len %d "
> +			       "in %s (lookup)\n", inode->i_ino, retval,
> +			       map->m_len, __func__);
> +		}
> +#endif
> +
>  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
>  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
>  		ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> -- 
> 1.7.12.rc2.18.g61b472e
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check
  2013-02-28 19:04   ` Dmitry Monakhov
@ 2013-03-01  2:28     ` Zheng Liu
  2013-03-01  2:29       ` Zheng Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Zheng Liu @ 2013-03-01  2:28 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, Zheng Liu, Theodore Ts'o

On Thu, Feb 28, 2013 at 11:04:39PM +0400, Dmitry Monakhov wrote:
> On Fri,  1 Mar 2013 02:26:41 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > From: Dmitry Monakhov <dmonakhov@openvz.org>
> > 
> > This commit adds a self-testing infrastructure like extent tree does to
> > do a sanity check for extent status tree.  After status tree is as a
> > extent cache, we'd better to make sure that it caches right result.
> > 
> > After applied this commit, we will get a lot of messages when we run
> > xfstests as below.
> > 
> > ...
> > kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len
> > 3 in ext4_map_blocks (allocation)
> > ...
> > kernel: ES cache assertation failed for inode: 230 es_cached ex
> > [974/2/4781/20] != found ex [974/1/4781/1000]
> > ...
> > kernel: ES insert assertation failed for inode: 635 ex_status
> > [0/45/21388/w] != es_status [44/1/21432/u]
> > ...
> > 
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> > Cc: "Theodore Ts'o" <tytso@mit.edu>
> > ---
> >  fs/ext4/extents_status.c | 64 ++++++++++++++++++++++++++++++++
> >  fs/ext4/extents_status.h |  6 +++
> >  fs/ext4/inode.c          | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 165 insertions(+)
> > 
> > diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> > index a769485..02df536 100644
> > --- a/fs/ext4/extents_status.c
> > +++ b/fs/ext4/extents_status.c
> > @@ -401,6 +401,66 @@ ext4_es_try_to_merge_right(struct inode *inode, struct extent_status *es)
> >  	return es;
> >  }
> >  
> > +#ifdef ES_AGGRESSIVE_TEST
> > +static void ext4_es_insert_extent_check(struct inode *inode,
> > +					struct extent_status *es)
> Agree, this is very good place for sanity check.
> > +{
> > +	struct ext4_ext_path *path = NULL;
> > +	struct ext4_extent *ex;
> > +	ext4_lblk_t ee_block;
> > +	ext4_fsblk_t ee_start;
> > +	unsigned short ee_len;
> > +	int depth, ex_status, es_status;
> > +
> > +	/*
> > +	 * Delayed and hole haven't been allocated in disk.  We
> > +	 * don't need to check it.
> > +	 */
> Why?  We still can perform ext4_ext_find_extent and check that:
>       /* If extent is delayed  only if ex == NULL */
>       if (!ex  == ext4_es_is_written(es))
>            WARN_ON();
>           
>       if (ext4_es_is_unwritten(es) != ext4_ext_is_uninitialized(ex))
>            WARN_ON();

Yes, it will be fixed in next version.

> 
> > +	if (!ext4_es_is_written(es) && !ext4_es_is_unwritten(es))
> > +		return;
> > +
> > +	/*
> > +	 * We don't need to worry about the race condition because
> > +	 * caller takes i_data_sem locking.
>          Yes, and it is good place to assertion that we hold it.

A BUG_ON(mutex_is_locked(&inode->i_mutex) will be added.

> > +	 */
> > +	path = ext4_ext_find_extent(inode, es->es_lblk, NULL);
> > +	if (IS_ERR(path))
> > +		return;
> > +
> > +	depth = ext_depth(inode);
> > +
> > +	/* We should always find an extent at given logical block */
> > +	ex = path[depth].p_ext;
> > +	if ((ex != NULL )  ^ ext4_es_is_written(es) {
> > +		printk("ES insert assertation failed to inode: %lu "
> > +		       "can not find the extent at block %d\n",
> > +		       inode->i_ino, es->es_lblk);
> > +		goto out;
> > +	}
> > +
> > +> +
> > +	ee_block = le32_to_cpu(ex->ee_block);
> > +	ee_start = ext4_ext_pblock(ex);
> > +	ee_len = ext4_ext_get_actual_len(ex);
> > +
> > +	ex_status = ext4_ext_is_uninitialized(ex) ? 1 : 0;
> > +	es_status = ext4_es_is_unwritten(es) ? 1 : 0;
> > +	if (ex_status ^ es_status) {
> Why status only?  we can also check pblock, and lblk.

Yes, it will be fixed.

Thanks,
                                                - Zheng

> > +		printk("ES insert assertation failed for inode: %lu "
> > +		       "ex_status [%d/%d/%llu/%c] != "
> > +		       "es_status [%d/%d/%llu/%c]\n", inode->i_ino,
> > +		       ee_block, ee_len, ee_start, ex_status ? 'u' : 'w',
> > +		       es->es_lblk, es->es_len, ext4_es_pblock(es),
> > +		       es_status ? 'u' : 'w');
> > +	}
> > +out:
> > +	if (path) {
> > +		ext4_ext_drop_refs(path);
> > +		kfree(path);
> > +	}
> > +
> > +}
> > +#endif
> > +
> >  static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
> >  {
> >  	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> > @@ -483,6 +543,10 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> >  	ext4_es_store_status(&newes, status);
> >  	trace_ext4_es_insert_extent(inode, &newes);
> >  
> > +#ifdef ES_AGGRESSIVE_TEST
> > +	ext4_es_insert_extent_check(inode, &newes);
> > +#endif
> > +
> >  	write_lock(&EXT4_I(inode)->i_es_lock);
> >  	err = __es_remove_extent(inode, lblk, end);
> >  	if (err != 0)
> > diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> > index f190dfe..56140ad 100644
> > --- a/fs/ext4/extents_status.h
> > +++ b/fs/ext4/extents_status.h
> > @@ -21,6 +21,12 @@
> >  #endif
> >  
> >  /*
> > + * With ES_AGGRESSIVE_TEST defined, the result of es caching will be
> > + * checked with old map_block's result.
> > + */
> > +#define ES_AGGRESSIVE_TEST__
> > +
> > +/*
> >   * These flags live in the high bits of extent_status.es_pblk
> >   */
> >  #define EXTENT_STATUS_WRITTEN	(1ULL << 63)
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 95a0c62..8d22170 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -482,6 +482,57 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
> >  	return num;
> >  }
> >  
> > +#ifdef ES_AGGRESSIVE_TEST
> > +static void ext4_map_blocks_es_recheck(handle_t *handle,
> > +				       struct inode *inode,
> > +				       struct ext4_map_blocks *es_map,
> > +				       struct ext4_map_blocks *map,
> > +				       int flags)
> > +{
> > +	int retval;
> > +
> > +	map->m_flags = 0;
> > +	/*
> > +	 * There is a race window that the result is not the same.
> > +	 * e.g. xfstests #223 when dioread_nolock enables.  The reason
> > +	 * is that we lookup a block mapping in extent status tree with
> > +	 * out taking i_data_sem.  So at the time the unwritten extent
> > +	 * could be converted.
> > +	 */
> > +	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> > +		down_read((&EXT4_I(inode)->i_data_sem));
> > +	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> > +		retval = ext4_ext_map_blocks(handle, inode, map, flags &
> > +					     EXT4_GET_BLOCKS_KEEP_SIZE);
> > +	} else {
> > +		retval = ext4_ind_map_blocks(handle, inode, map, flags &
> > +					     EXT4_GET_BLOCKS_KEEP_SIZE);
> > +	}
> > +	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> > +		up_read((&EXT4_I(inode)->i_data_sem));
> > +	/*
> > +	 * Clear EXT4_MAP_FROM_CLUSTER flag because it shouldn't be marked
> > +	 * in es_map->m_flags.
> > +	 */
> > +	map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
> > +
> > +	/*
> > +	 * We don't check m_len because extent will be collpased in status
> > +	 * tree.  So the m_len might not equal.
> > +	 */
> > +	if (es_map->m_lblk != map->m_lblk ||
> > +	    es_map->m_flags != map->m_flags ||
> > +	    es_map->m_pblk != map->m_pblk) {
> > +		printk("ES cache assertation failed for inode: %lu "
> > +		       "es_cached ex [%d/%d/%llu/%x] != "
> > +		       "found ex [%d/%d/%llu/%x]\n", inode->i_ino,
> > +		       es_map->m_lblk, es_map->m_len,
> > +		       es_map->m_pblk, es_map->m_flags,
> > +		       map->m_lblk, map->m_len, map->m_pblk, map->m_flags);
> > +	}
> > +}
> > +#endif /* ES_AGGRESSIVE_TEST */
> > +
> >  /*
> >   * The ext4_map_blocks() function tries to look up the requested blocks,
> >   * and returns if the blocks are already mapped.
> > @@ -509,6 +560,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
> >  {
> >  	struct extent_status es;
> >  	int retval;
> > +#ifdef ES_AGGRESSIVE_TEST
> > +	struct ext4_map_blocks orig_map;
> > +
> > +	memcpy(&orig_map, map, sizeof(*map));
> > +#endif
> >  
> >  	map->m_flags = 0;
> >  	ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u,"
> > @@ -531,6 +587,10 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
> >  		} else {
> >  			BUG_ON(1);
> >  		}
> > +#ifdef ES_AGGRESSIVE_TEST
> > +		ext4_map_blocks_es_recheck(handle, inode, map,
> > +					   &orig_map, flags);
> > +#endif
> >  		goto found;
> >  	}
> >  
> > @@ -551,6 +611,15 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
> >  		int ret;
> >  		unsigned long long status;
> >  
> > +#ifdef ES_AGGRESSIVE_TEST
> > +		if (retval != map->m_len) {
> > +			printk("ES len assertation failed for inode: %lu "
> > +			       "retval %d != map->m_len %d "
> > +			       "in %s (lookup)\n", inode->i_ino, retval,
> > +			       map->m_len, __func__);
> > +		}
> > +#endif
> > +
> >  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> >  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> >  		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> > @@ -643,6 +712,15 @@ found:
> >  		int ret;
> >  		unsigned long long status;
> >  
> > +#ifdef ES_AGGRESSIVE_TEST
> > +		if (retval != map->m_len) {
> > +			printk("ES len assertation failed for inode: %lu "
> > +			       "retval %d != map->m_len %d "
> > +			       "in %s (allocation)\n", inode->i_ino, retval,
> > +			       map->m_len, __func__);
> > +		}
> > +#endif
> > +
> >  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> >  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> >  		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> > @@ -1768,6 +1846,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> >  	struct extent_status es;
> >  	int retval;
> >  	sector_t invalid_block = ~((sector_t) 0xffff);
> > +#ifdef ES_AGGRESSIVE_TEST
> > +	struct ext4_map_blocks orig_map;
> > +
> > +	memcpy(&orig_map, map, sizeof(&map));
> > +#endif
> >  
> >  	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
> >  		invalid_block = ~0;
> > @@ -1809,6 +1892,9 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> >  		else
> >  			BUG_ON(1);
> >  
> > +#ifdef ES_AGRESSIVE_TEST
> > +		ext4_map_blocks_es_recheck(NULL, inode, map, &orig_map, flags);
> > +#endif
> >  		return retval;
> >  	}
> >  
> > @@ -1873,6 +1959,15 @@ add_delayed:
> >  		int ret;
> >  		unsigned long long status;
> >  
> > +#ifdef ES_AGGRESSIVE_TEST
> > +		if (retval != map->m_len) {
> > +			printk("ES len assertation failed for inode: %lu "
> > +			       "retval %d != map->m_len %d "
> > +			       "in %s (lookup)\n", inode->i_ino, retval,
> > +			       map->m_len, __func__);
> > +		}
> > +#endif
> > +
> >  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> >  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> >  		ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> > -- 
> > 1.7.12.rc2.18.g61b472e
> > 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check
  2013-03-01  2:28     ` Zheng Liu
@ 2013-03-01  2:29       ` Zheng Liu
  2013-03-01  3:23         ` Zheng Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Zheng Liu @ 2013-03-01  2:29 UTC (permalink / raw)
  To: Dmitry Monakhov, linux-ext4, Zheng Liu, Theodore Ts'o

On Fri, Mar 01, 2013 at 10:28:16AM +0800, Zheng Liu wrote:
> On Thu, Feb 28, 2013 at 11:04:39PM +0400, Dmitry Monakhov wrote:
> > On Fri,  1 Mar 2013 02:26:41 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > > From: Dmitry Monakhov <dmonakhov@openvz.org>
> > > 
> > > This commit adds a self-testing infrastructure like extent tree does to
> > > do a sanity check for extent status tree.  After status tree is as a
> > > extent cache, we'd better to make sure that it caches right result.
> > > 
> > > After applied this commit, we will get a lot of messages when we run
> > > xfstests as below.
> > > 
> > > ...
> > > kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len
> > > 3 in ext4_map_blocks (allocation)
> > > ...
> > > kernel: ES cache assertation failed for inode: 230 es_cached ex
> > > [974/2/4781/20] != found ex [974/1/4781/1000]
> > > ...
> > > kernel: ES insert assertation failed for inode: 635 ex_status
> > > [0/45/21388/w] != es_status [44/1/21432/u]
> > > ...
> > > 
> > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> > > Cc: "Theodore Ts'o" <tytso@mit.edu>
> > > ---
> > >  fs/ext4/extents_status.c | 64 ++++++++++++++++++++++++++++++++
> > >  fs/ext4/extents_status.h |  6 +++
> > >  fs/ext4/inode.c          | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 165 insertions(+)
> > > 
> > > diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> > > index a769485..02df536 100644
> > > --- a/fs/ext4/extents_status.c
> > > +++ b/fs/ext4/extents_status.c
> > > @@ -401,6 +401,66 @@ ext4_es_try_to_merge_right(struct inode *inode, struct extent_status *es)
> > >  	return es;
> > >  }
> > >  
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +static void ext4_es_insert_extent_check(struct inode *inode,
> > > +					struct extent_status *es)
> > Agree, this is very good place for sanity check.
> > > +{
> > > +	struct ext4_ext_path *path = NULL;
> > > +	struct ext4_extent *ex;
> > > +	ext4_lblk_t ee_block;
> > > +	ext4_fsblk_t ee_start;
> > > +	unsigned short ee_len;
> > > +	int depth, ex_status, es_status;
> > > +
> > > +	/*
> > > +	 * Delayed and hole haven't been allocated in disk.  We
> > > +	 * don't need to check it.
> > > +	 */
> > Why?  We still can perform ext4_ext_find_extent and check that:
> >       /* If extent is delayed  only if ex == NULL */
> >       if (!ex  == ext4_es_is_written(es))
> >            WARN_ON();
> >           
> >       if (ext4_es_is_unwritten(es) != ext4_ext_is_uninitialized(ex))
> >            WARN_ON();
> 
> Yes, it will be fixed in next version.
> 
> > 
> > > +	if (!ext4_es_is_written(es) && !ext4_es_is_unwritten(es))
> > > +		return;
> > > +
> > > +	/*
> > > +	 * We don't need to worry about the race condition because
> > > +	 * caller takes i_data_sem locking.
> >          Yes, and it is good place to assertion that we hold it.
> 
> A BUG_ON(mutex_is_locked(&inode->i_mutex) will be added.
           ^^^^^^^
           !mutex_is_locked(&inode->i_mutex)

Regards,
                                                - Zheng

> 
> > > +	 */
> > > +	path = ext4_ext_find_extent(inode, es->es_lblk, NULL);
> > > +	if (IS_ERR(path))
> > > +		return;
> > > +
> > > +	depth = ext_depth(inode);
> > > +
> > > +	/* We should always find an extent at given logical block */
> > > +	ex = path[depth].p_ext;
> > > +	if ((ex != NULL )  ^ ext4_es_is_written(es) {
> > > +		printk("ES insert assertation failed to inode: %lu "
> > > +		       "can not find the extent at block %d\n",
> > > +		       inode->i_ino, es->es_lblk);
> > > +		goto out;
> > > +	}
> > > +
> > > +> +
> > > +	ee_block = le32_to_cpu(ex->ee_block);
> > > +	ee_start = ext4_ext_pblock(ex);
> > > +	ee_len = ext4_ext_get_actual_len(ex);
> > > +
> > > +	ex_status = ext4_ext_is_uninitialized(ex) ? 1 : 0;
> > > +	es_status = ext4_es_is_unwritten(es) ? 1 : 0;
> > > +	if (ex_status ^ es_status) {
> > Why status only?  we can also check pblock, and lblk.
> 
> Yes, it will be fixed.
> 
> Thanks,
>                                                 - Zheng
> 
> > > +		printk("ES insert assertation failed for inode: %lu "
> > > +		       "ex_status [%d/%d/%llu/%c] != "
> > > +		       "es_status [%d/%d/%llu/%c]\n", inode->i_ino,
> > > +		       ee_block, ee_len, ee_start, ex_status ? 'u' : 'w',
> > > +		       es->es_lblk, es->es_len, ext4_es_pblock(es),
> > > +		       es_status ? 'u' : 'w');
> > > +	}
> > > +out:
> > > +	if (path) {
> > > +		ext4_ext_drop_refs(path);
> > > +		kfree(path);
> > > +	}
> > > +
> > > +}
> > > +#endif
> > > +
> > >  static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
> > >  {
> > >  	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> > > @@ -483,6 +543,10 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> > >  	ext4_es_store_status(&newes, status);
> > >  	trace_ext4_es_insert_extent(inode, &newes);
> > >  
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +	ext4_es_insert_extent_check(inode, &newes);
> > > +#endif
> > > +
> > >  	write_lock(&EXT4_I(inode)->i_es_lock);
> > >  	err = __es_remove_extent(inode, lblk, end);
> > >  	if (err != 0)
> > > diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> > > index f190dfe..56140ad 100644
> > > --- a/fs/ext4/extents_status.h
> > > +++ b/fs/ext4/extents_status.h
> > > @@ -21,6 +21,12 @@
> > >  #endif
> > >  
> > >  /*
> > > + * With ES_AGGRESSIVE_TEST defined, the result of es caching will be
> > > + * checked with old map_block's result.
> > > + */
> > > +#define ES_AGGRESSIVE_TEST__
> > > +
> > > +/*
> > >   * These flags live in the high bits of extent_status.es_pblk
> > >   */
> > >  #define EXTENT_STATUS_WRITTEN	(1ULL << 63)
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 95a0c62..8d22170 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -482,6 +482,57 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
> > >  	return num;
> > >  }
> > >  
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +static void ext4_map_blocks_es_recheck(handle_t *handle,
> > > +				       struct inode *inode,
> > > +				       struct ext4_map_blocks *es_map,
> > > +				       struct ext4_map_blocks *map,
> > > +				       int flags)
> > > +{
> > > +	int retval;
> > > +
> > > +	map->m_flags = 0;
> > > +	/*
> > > +	 * There is a race window that the result is not the same.
> > > +	 * e.g. xfstests #223 when dioread_nolock enables.  The reason
> > > +	 * is that we lookup a block mapping in extent status tree with
> > > +	 * out taking i_data_sem.  So at the time the unwritten extent
> > > +	 * could be converted.
> > > +	 */
> > > +	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> > > +		down_read((&EXT4_I(inode)->i_data_sem));
> > > +	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> > > +		retval = ext4_ext_map_blocks(handle, inode, map, flags &
> > > +					     EXT4_GET_BLOCKS_KEEP_SIZE);
> > > +	} else {
> > > +		retval = ext4_ind_map_blocks(handle, inode, map, flags &
> > > +					     EXT4_GET_BLOCKS_KEEP_SIZE);
> > > +	}
> > > +	if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
> > > +		up_read((&EXT4_I(inode)->i_data_sem));
> > > +	/*
> > > +	 * Clear EXT4_MAP_FROM_CLUSTER flag because it shouldn't be marked
> > > +	 * in es_map->m_flags.
> > > +	 */
> > > +	map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
> > > +
> > > +	/*
> > > +	 * We don't check m_len because extent will be collpased in status
> > > +	 * tree.  So the m_len might not equal.
> > > +	 */
> > > +	if (es_map->m_lblk != map->m_lblk ||
> > > +	    es_map->m_flags != map->m_flags ||
> > > +	    es_map->m_pblk != map->m_pblk) {
> > > +		printk("ES cache assertation failed for inode: %lu "
> > > +		       "es_cached ex [%d/%d/%llu/%x] != "
> > > +		       "found ex [%d/%d/%llu/%x]\n", inode->i_ino,
> > > +		       es_map->m_lblk, es_map->m_len,
> > > +		       es_map->m_pblk, es_map->m_flags,
> > > +		       map->m_lblk, map->m_len, map->m_pblk, map->m_flags);
> > > +	}
> > > +}
> > > +#endif /* ES_AGGRESSIVE_TEST */
> > > +
> > >  /*
> > >   * The ext4_map_blocks() function tries to look up the requested blocks,
> > >   * and returns if the blocks are already mapped.
> > > @@ -509,6 +560,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
> > >  {
> > >  	struct extent_status es;
> > >  	int retval;
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +	struct ext4_map_blocks orig_map;
> > > +
> > > +	memcpy(&orig_map, map, sizeof(*map));
> > > +#endif
> > >  
> > >  	map->m_flags = 0;
> > >  	ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u,"
> > > @@ -531,6 +587,10 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
> > >  		} else {
> > >  			BUG_ON(1);
> > >  		}
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +		ext4_map_blocks_es_recheck(handle, inode, map,
> > > +					   &orig_map, flags);
> > > +#endif
> > >  		goto found;
> > >  	}
> > >  
> > > @@ -551,6 +611,15 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
> > >  		int ret;
> > >  		unsigned long long status;
> > >  
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +		if (retval != map->m_len) {
> > > +			printk("ES len assertation failed for inode: %lu "
> > > +			       "retval %d != map->m_len %d "
> > > +			       "in %s (lookup)\n", inode->i_ino, retval,
> > > +			       map->m_len, __func__);
> > > +		}
> > > +#endif
> > > +
> > >  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> > >  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> > >  		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> > > @@ -643,6 +712,15 @@ found:
> > >  		int ret;
> > >  		unsigned long long status;
> > >  
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +		if (retval != map->m_len) {
> > > +			printk("ES len assertation failed for inode: %lu "
> > > +			       "retval %d != map->m_len %d "
> > > +			       "in %s (allocation)\n", inode->i_ino, retval,
> > > +			       map->m_len, __func__);
> > > +		}
> > > +#endif
> > > +
> > >  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> > >  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> > >  		if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> > > @@ -1768,6 +1846,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> > >  	struct extent_status es;
> > >  	int retval;
> > >  	sector_t invalid_block = ~((sector_t) 0xffff);
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +	struct ext4_map_blocks orig_map;
> > > +
> > > +	memcpy(&orig_map, map, sizeof(&map));
> > > +#endif
> > >  
> > >  	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
> > >  		invalid_block = ~0;
> > > @@ -1809,6 +1892,9 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> > >  		else
> > >  			BUG_ON(1);
> > >  
> > > +#ifdef ES_AGRESSIVE_TEST
> > > +		ext4_map_blocks_es_recheck(NULL, inode, map, &orig_map, flags);
> > > +#endif
> > >  		return retval;
> > >  	}
> > >  
> > > @@ -1873,6 +1959,15 @@ add_delayed:
> > >  		int ret;
> > >  		unsigned long long status;
> > >  
> > > +#ifdef ES_AGGRESSIVE_TEST
> > > +		if (retval != map->m_len) {
> > > +			printk("ES len assertation failed for inode: %lu "
> > > +			       "retval %d != map->m_len %d "
> > > +			       "in %s (lookup)\n", inode->i_ino, retval,
> > > +			       map->m_len, __func__);
> > > +		}
> > > +#endif
> > > +
> > >  		status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> > >  				EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> > >  		ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> > > -- 
> > > 1.7.12.rc2.18.g61b472e
> > > 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/6] ext4: update extent status tree after an extent is zeroed out
  2013-02-28 18:45   ` Dmitry Monakhov
@ 2013-03-01  2:37     ` Zheng Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Zheng Liu @ 2013-03-01  2:37 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, Zheng Liu, Theodore Ts'o

On Thu, Feb 28, 2013 at 10:45:29PM +0400, Dmitry Monakhov wrote:
> On Fri,  1 Mar 2013 02:26:44 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > From: Zheng Liu <wenqing.lz@taobao.com>
> > 
> > When we try to split an extent, this extent could be zeroed out and mark
> > as initialized.  But we don't know this in ext4_map_blocks because it
> > only returns a allocated length of extent.  Meanwhile we will mark this
> > extent as uninitialized because we only check m_flags.
> > 
> > This commit update extent status tree when we try to split an unwritten
> > extent.  We don't need to worry about the status of this extent because
> > we always mark it as initialized.
> Your solution seems too complex, why not just update es_cache inside 
> ext4_ext_zeroout() like this:

Honestly this is my first solution.  But I don't think we can update
es_cache in ext4_ext_zeroout().  The reason is that when
ext4_ext_zerouout is called ext4_ext_dirty hasn't been called.  That
means that extent tree on disk is not changed.  We can't update es_cache
before metadata is changed.  Otherwise the assertation in
ext4_es_insert_extent_check() will fail.  Am I miss something?

Thanks
                                                - Zheng

> 
> From e0565336c9ea540b0297781196453b358c3c3313 Mon Sep 17 00:00:00 2001
> From: Dmitry Monakhov <dmonakhov@openvz.org>
> Date: Thu, 28 Feb 2013 22:40:14 +0400
> Subject: [PATCH] ext4: update status tree on extent zeroout
> 
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c           |    4 ++++
>  include/trace/events/ext4.h |   27 +++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index c89e850..15cc324 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2885,12 +2885,16 @@ static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
>  	unsigned int ee_len;
>  	int ret;
>  
> +	trace_ext4_ext_zeroout(inode, ex);
>  	ee_len    = ext4_ext_get_actual_len(ex);
>  	ee_pblock = ext4_ext_pblock(ex);
>  
>  	ret = sb_issue_zeroout(inode->i_sb, ee_pblock, ee_len, GFP_NOFS);
>  	if (ret > 0)
>  		ret = 0;
> +	if (!ret)
> +		ret = ext4_es_insert_extent(inode, le32_to_cpu(ex->ee_block),
> +					    ee_len, ee_pblock, EXTENT_STATUS_WRITTEN);
>  
>  	return ret;
>  }
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index c0457c0..cd750a6 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -2092,6 +2092,33 @@ TRACE_EVENT(ext4_ext_remove_space_done,
>  		  (unsigned short) __entry->eh_entries)
>  );
>  
> +TRACE_EVENT(ext4_ext_zeroout,
> +	TP_PROTO(struct inode *inode, struct ext4_extent *ex),
> +
> +	TP_ARGS(inode, ex),
> +
> +	TP_STRUCT__entry(
> +		__field(	dev_t,		dev	)
> +		__field(	ino_t,		ino	)
> +		__field(	ext4_lblk_t,	u_lblk	)
> +		__field(	unsigned,	u_len	)
> +		__field(	ext4_fsblk_t,	u_pblk	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->dev		= inode->i_sb->s_dev;
> +		__entry->ino		= inode->i_ino;
> +		__entry->u_lblk		= le32_to_cpu(ex->ee_block);
> +		__entry->u_len		= ext4_ext_get_actual_len(ex);
> +		__entry->u_pblk		= ext4_ext_pblock(ex);
> +	),
> +
> +	TP_printk("dev %d,%d ino %lu ee_lblk %u ee_len %u ee_pblk %llu",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  (unsigned long) __entry->ino,
> +		  __entry->u_lblk, __entry->u_len, __entry->u_pblk)
> +);
> +
>  TRACE_EVENT(ext4_es_insert_extent,
>  	TP_PROTO(struct inode *inode, struct extent_status *es),
>  
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check
  2013-03-01  2:29       ` Zheng Liu
@ 2013-03-01  3:23         ` Zheng Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Zheng Liu @ 2013-03-01  3:23 UTC (permalink / raw)
  To: Dmitry Monakhov, linux-ext4, Zheng Liu, Theodore Ts'o

On Fri, Mar 01, 2013 at 10:29:46AM +0800, Zheng Liu wrote:
[snip]
> > > > +	if (!ext4_es_is_written(es) && !ext4_es_is_unwritten(es))
> > > > +		return;
> > > > +
> > > > +	/*
> > > > +	 * We don't need to worry about the race condition because
> > > > +	 * caller takes i_data_sem locking.
> > >          Yes, and it is good place to assertion that we hold it.
> > 
> > A BUG_ON(mutex_is_locked(&inode->i_mutex) will be added.
>            ^^^^^^^
>            !mutex_is_locked(&inode->i_mutex)

Sigh, it should be !rwsem_is_locked(&EXT4_I(inode)->i_data_sem).  Sorry
for the noisy.

Regards,
                                                - Zheng

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-03-01  3:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-28 18:26 [PATCH 0/6] ext4: try to fix up es regressions Zheng Liu
2013-02-28 18:26 ` [PATCH 1/6] ext4: invalidate exntent-status-tree during extent_migration Zheng Liu
2013-02-28 18:26 ` [PATCH 2/6] ext4: improve ext4_es_can_be_merged() to avoid a potential overflow Zheng Liu
2013-02-28 18:26 ` [PATCH 3/6] ext4: add self-testing infrastructure to do a sanity check Zheng Liu
2013-02-28 19:04   ` Dmitry Monakhov
2013-03-01  2:28     ` Zheng Liu
2013-03-01  2:29       ` Zheng Liu
2013-03-01  3:23         ` Zheng Liu
2013-02-28 18:26 ` [PATCH 4/6] ext4: fix wrong m_len value after unwritten extent conversion Zheng Liu
2013-02-28 18:26 ` [PATCH 5/6] ext4: ext4_split_extent shoult take care about extent zeroout v3 Zheng Liu
2013-02-28 18:26 ` [PATCH 6/6] ext4: update extent status tree after an extent is zeroed out Zheng Liu
2013-02-28 18:45   ` Dmitry Monakhov
2013-03-01  2:37     ` Zheng Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.