All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3
@ 2013-02-25 16:07 Dmitry Monakhov
  2013-02-25 16:07 ` [PATCH 2/5] ext4: disable merging of uninitialized extents Dmitry Monakhov
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-02-25 16:07 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, jack, wenqing.lz, Dmitry Monakhov

When ext4_split_extent_at() ends up doing zeroout & conversion to
initialized instead of split & conversion, ext4_split_extent() gets
confused and can wrongly mark the extent back as uninitialized resulting in
end IO code getting confused from large unwritten extents and may result in
data loss.

The example of problematic behavior is:
			    lblk len              lblk len
  ext4_split_extent() (ex=[1000,30,uninit], map=[1010,10])
    ext4_split_extent_at() (split [1000,30,uninit] at 1020)
      ext4_ext_insert_extent() -> ENOSPC
      ext4_ext_zeroout()
	 -> extent [1000,30] is now initialized
    ext4_split_extent_at() (split [1000,30,init] at 1010,
			     MARK_UNINIT1 | MARK_UNINIT2)
      -> extent is split and parts marked as uninitialized

Fix the problem by rechecking extent type after the first
ext4_split_extent_at() returns. None of split_flags can not be applied to
initialized extent so this patch also add BUG_ON to prevent similar issues
in future.

TESTCASE: https://github.com/dmonakhov/xfstests/commit/b8a55eb5ce28c6ff29e620ab090902fcd5833597

Changes since V2: Patch no longer depends on Jan's "disable-uninit-ext-mergring" patch.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 372b2cb..3bd3ca5 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2943,6 +2943,10 @@ static int ext4_split_extent_at(handle_t *handle,
 	newblock = split - ee_block + ext4_ext_pblock(ex);
 
 	BUG_ON(split < ee_block || split >= (ee_block + ee_len));
+	BUG_ON(!ext4_ext_is_uninitialized(ex) &&
+	       split_flag & (EXT4_EXT_MAY_ZEROOUT |
+			     EXT4_EXT_MARK_UNINIT1 |
+			     EXT4_EXT_MARK_UNINIT2));
 
 	err = ext4_ext_get_access(handle, inode, path + depth);
 	if (err)
@@ -3061,19 +3065,25 @@ static int ext4_split_extent(handle_t *handle,
 		if (err)
 			goto out;
 	}
-
+	/*
+	 * Update path is required because previous ext4_split_extent_at() may
+	 * result in split of original leaf or extent zeroout.
+	 */
 	ext4_ext_drop_refs(path);
 	path = ext4_ext_find_extent(inode, map->m_lblk, path);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
+	depth = ext_depth(inode);
+	ex = path[depth].p_ext;
+	uninitialized = ext4_ext_is_uninitialized(ex);
+	split_flag1 = 0;
 
 	if (map->m_lblk >= ee_block) {
-		split_flag1 = split_flag & (EXT4_EXT_MAY_ZEROOUT |
-					    EXT4_EXT_DATA_VALID2);
-		if (uninitialized)
+		split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;
+		if (uninitialized) {
 			split_flag1 |= EXT4_EXT_MARK_UNINIT1;
-		if (split_flag & EXT4_EXT_MARK_UNINIT2)
-			split_flag1 |= EXT4_EXT_MARK_UNINIT2;
+			split_flag1 |= split_flag & (EXT4_EXT_MAY_ZEROOUT | EXT4_EXT_MARK_UNINIT2);
+		}
 		err = ext4_split_extent_at(handle, inode, path,
 				map->m_lblk, split_flag1, flags);
 		if (err)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] ext4: disable merging of uninitialized extents
  2013-02-25 16:07 [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Dmitry Monakhov
@ 2013-02-25 16:07 ` Dmitry Monakhov
  2013-02-25 18:09   ` Jan Kara
  2013-03-04 14:26   ` Zheng Liu
  2013-02-25 16:07 ` [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio Dmitry Monakhov
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-02-25 16:07 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, jack, wenqing.lz, Dmitry Monakhov

Derived from Jan's patch:http://permalink.gmane.org/gmane.comp.file-systems.ext4/36470

Merging of uninitialized extents creates all sorts of interesting race
possibilities when writeback / DIO races with fallocate. Thus
ext4_convert_unwritten_extents_endio() has to deal with a case where
extent to be converted needs to be split out first. That isn't nice
for two reasons:

1) It may need allocation of extent tree block so ENOSPC is possible.
2) It complicates end_io handling code

So we disable merging of uninitialized extents which allows us to simplify
the code. Extents will get merged after they are converted to initialized
ones.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3bd3ca5..1d37f2d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1584,10 +1584,12 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
 	unsigned short ext1_ee_len, ext2_ee_len, max_len;
 
 	/*
-	 * Make sure that either both extents are uninitialized, or
-	 * both are _not_.
+	 * Make sure that both extents are initialized. We don't merge
+	 * uninitialized extents so that we can be sure that end_io code has
+	 * the extent that was written properly split out and conversion to
+	 * initialized is trivial.
 	 */
-	if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
+	if (ext4_ext_is_uninitialized(ex1) || ext4_ext_is_uninitialized(ex2))
 		return 0;
 
 	if (ext4_ext_is_uninitialized(ex1))
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio
  2013-02-25 16:07 [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Dmitry Monakhov
  2013-02-25 16:07 ` [PATCH 2/5] ext4: disable merging of uninitialized extents Dmitry Monakhov
@ 2013-02-25 16:07 ` Dmitry Monakhov
  2013-02-25 18:08   ` Jan Kara
  2013-03-04 14:00   ` Zheng Liu
  2013-02-25 16:07 ` [PATCH 4/5] ext4: remove unnecessary wait for extent conversion in ext4_fallocate() Dmitry Monakhov
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-02-25 16:07 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, jack, wenqing.lz, Dmitry Monakhov

Splitting extents inside endio is bad thing, but unfortunetly it is still
possible. In fact we are pretty close to the moment when all related
issues will be fixed. Let's warn developer if it still the case.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 1d37f2d..78c2a91 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3386,8 +3386,19 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
 		"block %llu, max_blocks %u\n", inode->i_ino,
 		  (unsigned long long)ee_block, ee_len);
 
-	/* If extent is larger than requested then split is required */
+	/* If extent is larger than requested it is a clear sign that we still
+	 * have some extent state machine issues left. So extent_split is still
+	 * required.
+	 * TODO: Once all related issues will be fixed this situation should be
+	 * illegal.
+	 */
 	if (ee_block != map->m_lblk || ee_len > map->m_len) {
+#ifdef EXT4_DEBUG
+		ext4_warning("Inode (%ld) finished: extent logical block %llu,"
+			     " len %u; IO logical block %llu, len %u\n",
+			     inode->i_ino, (unsigned long long)ee_block, ee_len,
+			     (unsigned long long)map->m_lblk, map->m_len);
+#endif
 		err = ext4_split_unwritten_extents(handle, inode, map, path,
 						   EXT4_GET_BLOCKS_CONVERT);
 		if (err < 0)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] ext4: remove unnecessary wait for extent conversion in ext4_fallocate()
  2013-02-25 16:07 [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Dmitry Monakhov
  2013-02-25 16:07 ` [PATCH 2/5] ext4: disable merging of uninitialized extents Dmitry Monakhov
  2013-02-25 16:07 ` [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio Dmitry Monakhov
@ 2013-02-25 16:07 ` Dmitry Monakhov
  2013-02-25 16:07 ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Dmitry Monakhov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-02-25 16:07 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, jack, wenqing.lz, Dmitry Monakhov

From: Jan Kara <jack@suse.cz>

Now that we don't merge uninitialized extents anymore,
ext4_fallocate() is free to operate on the inode while there are still
some extent conversions pending - it won't disturb them in any way.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 78c2a91..c89e850 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4391,8 +4391,6 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (len <= EXT_UNINIT_MAX_LEN << blkbits)
 		flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
 
-	/* Prevent race condition between unwritten */
-	ext4_flush_unwritten_io(inode);
 retry:
 	while (ret >= 0 && ret < max_blocks) {
 		map.m_lblk = map.m_lblk + ret;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration
  2013-02-25 16:07 [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Dmitry Monakhov
                   ` (2 preceding siblings ...)
  2013-02-25 16:07 ` [PATCH 4/5] ext4: remove unnecessary wait for extent conversion in ext4_fallocate() Dmitry Monakhov
@ 2013-02-25 16:07 ` Dmitry Monakhov
  2013-02-25 16:29   ` Dmitry Monakhov
  2013-02-25 18:06   ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Jan Kara
  2013-02-25 18:33 ` [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Jan Kara
  2013-03-04  5:58 ` Theodore Ts'o
  5 siblings, 2 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-02-25 16:07 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, jack, wenqing.lz, Dmitry Monakhov

mext_replace_branches() will change inode's extents layout so
we have to drop corresponding cache.

TESTCASE:  301'th xfstest was not yet accepted to official xfstest's branch
and can be found here: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/move_extent.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index d78c33e..c1f15b2 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -666,6 +666,14 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
 	int replaced_count = 0;
 	int dext_alen;
 
+	*err = ext4_es_remove_extent(orig_inode, from, count);
+	if (*err)
+		goto out;
+
+	*err = ext4_es_remove_extent(donor_inode, from, count);
+	if (*err)
+		goto out;
+
 	/* Get the original extent for the block "orig_off" */
 	*err = get_ext_path(orig_inode, orig_off, &orig_path);
 	if (*err)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration
  2013-02-25 16:07 ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Dmitry Monakhov
@ 2013-02-25 16:29   ` Dmitry Monakhov
  2013-02-25 17:04     ` Zheng Liu
  2013-02-26 14:23     ` [PATCH] ext4: fix wrong m_len value after unwritten extent conversion (Re: [PATCH 5/5] ext4: invalidate...) Zheng Liu
  2013-02-25 18:06   ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Jan Kara
  1 sibling, 2 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-02-25 16:29 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, jack, wenqing.lz

On Mon, 25 Feb 2013 20:07:43 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> mext_replace_branches() will change inode's extents layout so
> we have to drop corresponding cache.
> 
> TESTCASE:  301'th xfstest was not yet accepted to official xfstest's branch
> and can be found here: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0
One BIG note about this patch.
It fix BUG_ON(bh->b_blocknr != pblock) in fs/ext4/inode.c:1452
but 300'th xfstest(from my repo) still failed due to data corruption in
verifier thread. 
LOG:
MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/vzvg-scratch_dev
/mnt_scratch

300 33s ... [failed, exit status 1] - output mismatch (see 300.out.bad)
    --- 300.out      2013-02-20 12:46:24.000000000 +0400
    +++ 300.out.bad  2013-02-25 20:18:24.000000000 +0400
    @@ -2,3 +2,5 @@
     
      Start defragment activity 
     
    +failed: '/usr/local/bin/fio /tmp/1665-300.fio'
    +(see 300.full for details)
     ...
     (Run 'diff -u 300.out 300.out.bad' to see the entire diff)

#300.full
crc32c: verify failed at file /mnt_scratch/test2 offset 16973824, length
    65536
       Expected CRC: d062565e
       Received CRC: f2cd2028
       received data dumped as test2.16973824.received
       expected data dumped as test2.16973824.expected
defrag-4k: Laying out IO file(s) (1 file(s) / 3400MB)
donor-file-fuzzer: Laying out IO file(s) (1 file(s) / 3400MB)

#DIFF:
--- /tmp/test2.16973824.expected        2013-02-25 20:23:04.000000000
    +0400
+++ /tmp/test2.16973824.received        2013-02-25 20:22:55.000000000
    +0400
@@ -254,3844 +254,6 @@
 0000fd0 eb95 6201 89ec 151a bd72 9675 203c 0d80
 0000fe0 b7ae b415 b8cc 0b2f b6f5 baab 9d0e 1741
 0000ff0 f6de fbda d64b 1bd3 5edb 040c 7cdc 18e6
-0001000 0bdb 5114 cd95 01eb 017b a435 d128 11af
-0001010 202f 93c9 a80a 013e a405 c261 8b1c 0dab
-0001020 b480 c649 1032 1b0a 3690 7e89 dee1 0ac7
-0001030 26d2 9489 3c97 0bd8 24da 5f28 3d4e 066d
-0001040 049b b978 7815 159c 8093 b4e1 b246 1c25
.....
-000ffc0 bbb1 fa68 5622 022d 9776 0174 2d90 1ecb
-000ffd0 92ee b473 64f7 0bcb 725d 2d17 9265 0fff
-000ffe0 6e4b ec74 5bc0 0618 0dc9 e669 953d 002f
-000fff0 a1b9 35e8 ff73 17a5 9437 15e0 24fa 150a
+0001000 0000 0000 0000 0000 0000 0000 0000 0000
+*
 0010000

It seems that we data was written to uninitialized extent, but
unwritten->initialized extent conversion was missed somewhere.
I have not fix for that issue yet.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/move_extent.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index d78c33e..c1f15b2 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -666,6 +666,14 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>  	int replaced_count = 0;
>  	int dext_alen;
>  
> +	*err = ext4_es_remove_extent(orig_inode, from, count);
> +	if (*err)
> +		goto out;
> +
> +	*err = ext4_es_remove_extent(donor_inode, from, count);
> +	if (*err)
> +		goto out;
> +
>  	/* Get the original extent for the block "orig_off" */
>  	*err = get_ext_path(orig_inode, orig_off, &orig_path);
>  	if (*err)
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration
  2013-02-25 16:29   ` Dmitry Monakhov
@ 2013-02-25 17:04     ` Zheng Liu
  2013-02-26 14:23     ` [PATCH] ext4: fix wrong m_len value after unwritten extent conversion (Re: [PATCH 5/5] ext4: invalidate...) Zheng Liu
  1 sibling, 0 replies; 17+ messages in thread
From: Zheng Liu @ 2013-02-25 17:04 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

Hi Dmitry,

Thank you very much for fixing this regression.  A very quick test is
run in my desktop and xfstests #300 and #301 can pass.  Feel free to
add:
Reviewed-and-tested-by: Zheng Liu <wenqing.lz@taobao.com>

I will add self-testing infrastructure in status tree ASAP and re-run
xfstests.

Thanks!!
                                                - Zheng

On Mon, Feb 25, 2013 at 08:29:46PM +0400, Dmitry Monakhov wrote:
> On Mon, 25 Feb 2013 20:07:43 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> > mext_replace_branches() will change inode's extents layout so
> > we have to drop corresponding cache.
> > 
> > TESTCASE:  301'th xfstest was not yet accepted to official xfstest's branch
> > and can be found here: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0
> One BIG note about this patch.
> It fix BUG_ON(bh->b_blocknr != pblock) in fs/ext4/inode.c:1452
> but 300'th xfstest(from my repo) still failed due to data corruption in
> verifier thread. 
> LOG:
> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/vzvg-scratch_dev
> /mnt_scratch
> 
> 300 33s ... [failed, exit status 1] - output mismatch (see 300.out.bad)
>     --- 300.out      2013-02-20 12:46:24.000000000 +0400
>     +++ 300.out.bad  2013-02-25 20:18:24.000000000 +0400
>     @@ -2,3 +2,5 @@
>      
>       Start defragment activity 
>      
>     +failed: '/usr/local/bin/fio /tmp/1665-300.fio'
>     +(see 300.full for details)
>      ...
>      (Run 'diff -u 300.out 300.out.bad' to see the entire diff)
> 
> #300.full
> crc32c: verify failed at file /mnt_scratch/test2 offset 16973824, length
>     65536
>        Expected CRC: d062565e
>        Received CRC: f2cd2028
>        received data dumped as test2.16973824.received
>        expected data dumped as test2.16973824.expected
> defrag-4k: Laying out IO file(s) (1 file(s) / 3400MB)
> donor-file-fuzzer: Laying out IO file(s) (1 file(s) / 3400MB)
> 
> #DIFF:
> --- /tmp/test2.16973824.expected        2013-02-25 20:23:04.000000000
>     +0400
> +++ /tmp/test2.16973824.received        2013-02-25 20:22:55.000000000
>     +0400
> @@ -254,3844 +254,6 @@
>  0000fd0 eb95 6201 89ec 151a bd72 9675 203c 0d80
>  0000fe0 b7ae b415 b8cc 0b2f b6f5 baab 9d0e 1741
>  0000ff0 f6de fbda d64b 1bd3 5edb 040c 7cdc 18e6
> -0001000 0bdb 5114 cd95 01eb 017b a435 d128 11af
> -0001010 202f 93c9 a80a 013e a405 c261 8b1c 0dab
> -0001020 b480 c649 1032 1b0a 3690 7e89 dee1 0ac7
> -0001030 26d2 9489 3c97 0bd8 24da 5f28 3d4e 066d
> -0001040 049b b978 7815 159c 8093 b4e1 b246 1c25
> .....
> -000ffc0 bbb1 fa68 5622 022d 9776 0174 2d90 1ecb
> -000ffd0 92ee b473 64f7 0bcb 725d 2d17 9265 0fff
> -000ffe0 6e4b ec74 5bc0 0618 0dc9 e669 953d 002f
> -000fff0 a1b9 35e8 ff73 17a5 9437 15e0 24fa 150a
> +0001000 0000 0000 0000 0000 0000 0000 0000 0000
> +*
>  0010000
> 
> It seems that we data was written to uninitialized extent, but
> unwritten->initialized extent conversion was missed somewhere.
> I have not fix for that issue yet.
> > 
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > ---
> >  fs/ext4/move_extent.c |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> > index d78c33e..c1f15b2 100644
> > --- a/fs/ext4/move_extent.c
> > +++ b/fs/ext4/move_extent.c
> > @@ -666,6 +666,14 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
> >  	int replaced_count = 0;
> >  	int dext_alen;
> >  
> > +	*err = ext4_es_remove_extent(orig_inode, from, count);
> > +	if (*err)
> > +		goto out;
> > +
> > +	*err = ext4_es_remove_extent(donor_inode, from, count);
> > +	if (*err)
> > +		goto out;
> > +
> >  	/* Get the original extent for the block "orig_off" */
> >  	*err = get_ext_path(orig_inode, orig_off, &orig_path);
> >  	if (*err)
> > -- 
> > 1.7.1
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration
  2013-02-25 16:07 ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Dmitry Monakhov
  2013-02-25 16:29   ` Dmitry Monakhov
@ 2013-02-25 18:06   ` Jan Kara
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Kara @ 2013-02-25 18:06 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

On Mon 25-02-13 20:07:43, Dmitry Monakhov wrote:
> mext_replace_branches() will change inode's extents layout so
> we have to drop corresponding cache.
> 
> TESTCASE:  301'th xfstest was not yet accepted to official xfstest's branch
> and can be found here: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
  Makes sense. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> ---
>  fs/ext4/move_extent.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index d78c33e..c1f15b2 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -666,6 +666,14 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
>  	int replaced_count = 0;
>  	int dext_alen;
>  
> +	*err = ext4_es_remove_extent(orig_inode, from, count);
> +	if (*err)
> +		goto out;
> +
> +	*err = ext4_es_remove_extent(donor_inode, from, count);
> +	if (*err)
> +		goto out;
> +
>  	/* Get the original extent for the block "orig_off" */
>  	*err = get_ext_path(orig_inode, orig_off, &orig_path);
>  	if (*err)
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio
  2013-02-25 16:07 ` [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio Dmitry Monakhov
@ 2013-02-25 18:08   ` Jan Kara
  2013-03-04 14:00   ` Zheng Liu
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Kara @ 2013-02-25 18:08 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

On Mon 25-02-13 20:07:41, Dmitry Monakhov wrote:
> Splitting extents inside endio is bad thing, but unfortunetly it is still
> possible. In fact we are pretty close to the moment when all related
> issues will be fixed. Let's warn developer if it still the case.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/extents.c |   13 ++++++++++++-
>  1 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 1d37f2d..78c2a91 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3386,8 +3386,19 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
>  		"block %llu, max_blocks %u\n", inode->i_ino,
>  		  (unsigned long long)ee_block, ee_len);
>  
> -	/* If extent is larger than requested then split is required */
> +	/* If extent is larger than requested it is a clear sign that we still
> +	 * have some extent state machine issues left. So extent_split is still
> +	 * required.
> +	 * TODO: Once all related issues will be fixed this situation should be
> +	 * illegal.
> +	 */
>  	if (ee_block != map->m_lblk || ee_len > map->m_len) {
> +#ifdef EXT4_DEBUG
> +		ext4_warning("Inode (%ld) finished: extent logical block %llu,"
> +			     " len %u; IO logical block %llu, len %u\n",
> +			     inode->i_ino, (unsigned long long)ee_block, ee_len,
> +			     (unsigned long long)map->m_lblk, map->m_len);
> +#endif
>  		err = ext4_split_unwritten_extents(handle, inode, map, path,
>  						   EXT4_GET_BLOCKS_CONVERT);
>  		if (err < 0)
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] ext4: disable merging of uninitialized extents
  2013-02-25 16:07 ` [PATCH 2/5] ext4: disable merging of uninitialized extents Dmitry Monakhov
@ 2013-02-25 18:09   ` Jan Kara
  2013-03-04 14:26   ` Zheng Liu
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Kara @ 2013-02-25 18:09 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

On Mon 25-02-13 20:07:40, Dmitry Monakhov wrote:
> Derived from Jan's patch:http://permalink.gmane.org/gmane.comp.file-systems.ext4/36470
> 
> Merging of uninitialized extents creates all sorts of interesting race
> possibilities when writeback / DIO races with fallocate. Thus
> ext4_convert_unwritten_extents_endio() has to deal with a case where
> extent to be converted needs to be split out first. That isn't nice
> for two reasons:
> 
> 1) It may need allocation of extent tree block so ENOSPC is possible.
> 2) It complicates end_io handling code
> 
> So we disable merging of uninitialized extents which allows us to simplify
> the code. Extents will get merged after they are converted to initialized
> ones.
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3bd3ca5..1d37f2d 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1584,10 +1584,12 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
>  	unsigned short ext1_ee_len, ext2_ee_len, max_len;
>  
>  	/*
> -	 * Make sure that either both extents are uninitialized, or
> -	 * both are _not_.
> +	 * Make sure that both extents are initialized. We don't merge
> +	 * uninitialized extents so that we can be sure that end_io code has
> +	 * the extent that was written properly split out and conversion to
> +	 * initialized is trivial.
>  	 */
> -	if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
> +	if (ext4_ext_is_uninitialized(ex1) || ext4_ext_is_uninitialized(ex2))
>  		return 0;
>  
>  	if (ext4_ext_is_uninitialized(ex1))
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3
  2013-02-25 16:07 [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Dmitry Monakhov
                   ` (3 preceding siblings ...)
  2013-02-25 16:07 ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Dmitry Monakhov
@ 2013-02-25 18:33 ` Jan Kara
  2013-03-04  5:58 ` Theodore Ts'o
  5 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2013-02-25 18:33 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

On Mon 25-02-13 20:07:39, Dmitry Monakhov wrote:
> When ext4_split_extent_at() ends up doing zeroout & conversion to
> initialized instead of split & conversion, ext4_split_extent() gets
> confused and can wrongly mark the extent back as uninitialized resulting in
> end IO code getting confused from large unwritten extents and may result in
> data loss.
> 
> The example of problematic behavior is:
> 			    lblk len              lblk len
>   ext4_split_extent() (ex=[1000,30,uninit], map=[1010,10])
>     ext4_split_extent_at() (split [1000,30,uninit] at 1020)
>       ext4_ext_insert_extent() -> ENOSPC
>       ext4_ext_zeroout()
> 	 -> extent [1000,30] is now initialized
>     ext4_split_extent_at() (split [1000,30,init] at 1010,
> 			     MARK_UNINIT1 | MARK_UNINIT2)
>       -> extent is split and parts marked as uninitialized
> 
> Fix the problem by rechecking extent type after the first
> ext4_split_extent_at() returns. None of split_flags can not be applied to
> initialized extent so this patch also add BUG_ON to prevent similar issues
> in future.
> 
> TESTCASE: https://github.com/dmonakhov/xfstests/commit/b8a55eb5ce28c6ff29e620ab090902fcd5833597
> 
> Changes since V2: Patch no longer depends on Jan's "disable-uninit-ext-mergring" patch.
  Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c |   22 ++++++++++++++++------
>  1 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 372b2cb..3bd3ca5 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2943,6 +2943,10 @@ static int ext4_split_extent_at(handle_t *handle,
>  	newblock = split - ee_block + ext4_ext_pblock(ex);
>  
>  	BUG_ON(split < ee_block || split >= (ee_block + ee_len));
> +	BUG_ON(!ext4_ext_is_uninitialized(ex) &&
> +	       split_flag & (EXT4_EXT_MAY_ZEROOUT |
> +			     EXT4_EXT_MARK_UNINIT1 |
> +			     EXT4_EXT_MARK_UNINIT2));
>  
>  	err = ext4_ext_get_access(handle, inode, path + depth);
>  	if (err)
> @@ -3061,19 +3065,25 @@ static int ext4_split_extent(handle_t *handle,
>  		if (err)
>  			goto out;
>  	}
> -
> +	/*
> +	 * Update path is required because previous ext4_split_extent_at() may
> +	 * result in split of original leaf or extent zeroout.
> +	 */
>  	ext4_ext_drop_refs(path);
>  	path = ext4_ext_find_extent(inode, map->m_lblk, path);
>  	if (IS_ERR(path))
>  		return PTR_ERR(path);
> +	depth = ext_depth(inode);
> +	ex = path[depth].p_ext;
> +	uninitialized = ext4_ext_is_uninitialized(ex);
> +	split_flag1 = 0;
>  
>  	if (map->m_lblk >= ee_block) {
> -		split_flag1 = split_flag & (EXT4_EXT_MAY_ZEROOUT |
> -					    EXT4_EXT_DATA_VALID2);
> -		if (uninitialized)
> +		split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;
> +		if (uninitialized) {
>  			split_flag1 |= EXT4_EXT_MARK_UNINIT1;
> -		if (split_flag & EXT4_EXT_MARK_UNINIT2)
> -			split_flag1 |= EXT4_EXT_MARK_UNINIT2;
> +			split_flag1 |= split_flag & (EXT4_EXT_MAY_ZEROOUT | EXT4_EXT_MARK_UNINIT2);
> +		}
>  		err = ext4_split_extent_at(handle, inode, path,
>  				map->m_lblk, split_flag1, flags);
>  		if (err)
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] ext4: fix wrong m_len value after unwritten extent conversion (Re: [PATCH 5/5] ext4: invalidate...)
  2013-02-25 16:29   ` Dmitry Monakhov
  2013-02-25 17:04     ` Zheng Liu
@ 2013-02-26 14:23     ` Zheng Liu
  2013-02-26 15:43       ` Dmitry Monakhov
  1 sibling, 1 reply; 17+ messages in thread
From: Zheng Liu @ 2013-02-26 14:23 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

Hi Dmitry and Ted,

On Mon, Feb 25, 2013 at 08:29:46PM +0400, Dmitry Monakhov wrote:
[snip]
> One BIG note about this patch.
> It fix BUG_ON(bh->b_blocknr != pblock) in fs/ext4/inode.c:1452
> but 300'th xfstest(from my repo) still failed due to data corruption in
> verifier thread. 
> LOG:
> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/vzvg-scratch_dev
> /mnt_scratch
> 
> 300 33s ... [failed, exit status 1] - output mismatch (see 300.out.bad)
>     --- 300.out      2013-02-20 12:46:24.000000000 +0400
>     +++ 300.out.bad  2013-02-25 20:18:24.000000000 +0400
>     @@ -2,3 +2,5 @@
>      
>       Start defragment activity 
>      
>     +failed: '/usr/local/bin/fio /tmp/1665-300.fio'
>     +(see 300.full for details)
>      ...
>      (Run 'diff -u 300.out 300.out.bad' to see the entire diff)
> 
> #300.full
> crc32c: verify failed at file /mnt_scratch/test2 offset 16973824, length
>     65536
>        Expected CRC: d062565e
>        Received CRC: f2cd2028
>        received data dumped as test2.16973824.received
>        expected data dumped as test2.16973824.expected
> defrag-4k: Laying out IO file(s) (1 file(s) / 3400MB)
> donor-file-fuzzer: Laying out IO file(s) (1 file(s) / 3400MB)
> 
> #DIFF:
> --- /tmp/test2.16973824.expected        2013-02-25 20:23:04.000000000
>     +0400
> +++ /tmp/test2.16973824.received        2013-02-25 20:22:55.000000000
>     +0400
> @@ -254,3844 +254,6 @@
>  0000fd0 eb95 6201 89ec 151a bd72 9675 203c 0d80
>  0000fe0 b7ae b415 b8cc 0b2f b6f5 baab 9d0e 1741
>  0000ff0 f6de fbda d64b 1bd3 5edb 040c 7cdc 18e6
> -0001000 0bdb 5114 cd95 01eb 017b a435 d128 11af
> -0001010 202f 93c9 a80a 013e a405 c261 8b1c 0dab
> -0001020 b480 c649 1032 1b0a 3690 7e89 dee1 0ac7
> -0001030 26d2 9489 3c97 0bd8 24da 5f28 3d4e 066d
> -0001040 049b b978 7815 159c 8093 b4e1 b246 1c25
> .....
> -000ffc0 bbb1 fa68 5622 022d 9776 0174 2d90 1ecb
> -000ffd0 92ee b473 64f7 0bcb 725d 2d17 9265 0fff
> -000ffe0 6e4b ec74 5bc0 0618 0dc9 e669 953d 002f
> -000fff0 a1b9 35e8 ff73 17a5 9437 15e0 24fa 150a
> +0001000 0000 0000 0000 0000 0000 0000 0000 0000
> +*
>  0010000
> 
> It seems that we data was written to uninitialized extent, but
> unwritten->initialized extent conversion was missed somewhere.
> I have not fix for that issue yet.

I take a close look at this bug, and I found that a wrong 'map->m_len'
is returned from ext4_ext_map_blocks when we try to convert an unwritten
extent with EXT4_GET_BLOCKS_IO_CONVERT_EXT flag.  Here we always assume
that the return value of ext/ind_map_blocks() is equal to m_len.  But
here it breaks this assumption.  Let us consider what happens.

When we do a dio write for a extent-based file, we will preallocate an
unwritten extent for it.  After dio write has been done, we will convert
it in ext4_end_io callback.  We use ext4_convert_unwritten_extents to do
this conversion.  Then the codepath is like beblow:

ext4_convert_unwritten_extents()
  ->ext4_map_blocks()
    ->ext4_es_lookup_extent()
      [We can lookup an unwritten extent]
    ->ext4_ext_map_blocks()
      ->ext4_ext_handle_uninitialized_extents()
    ->ext4_es_insert_extent()
      [We update status tree, but the length of written extent is wrong.
       The length of written extent is greater than the length of we
       have converted]

The following case demonstrates what happens.

1. We do a dio write.  An unwritten extent will be preallocated.

status tree: [0, 23]:unwritten
extent tree: [0, 23]:unwritten

2. We try to convert an unwritten extent, e.g. [0, 15].  But due to this
bug we will update all unwritten extent in status tree.

status tree: [0, 23]:written
extent tree: [0, 15]:written, [16, 23]:unwritten

3. Then we try to convert the following unwritten extent, but we will
return directly in ext4_map_blocks because we lookup an written extent
from status tree, and that unwritten extent will never be converted.  At
the time if e4defrag is running, the status tree could be removed, and
we could read an unwritten extent.

Certainly this is only my *guess* because in my sandbox xfstests #300 and
#301 never fail.  I am not sure this patch could fix this regression.
*But* I am pretty sure it will cause some potential bugs if it hasn't been
fixed.  I paste the patch at the below, which has been tested by xfstests
plus Dmitry's #300 and #301 test case at the following configurations.

 * Ext4 4k block
 * Ext4 4k block w/dioread_nolock
 * EXt4 4k block w/bigalloc


Dmitry, I really appreciate if you could give this patch a try.  Hope it
can fix this regression.  Please let me know if there is any update.
Thank you very much.

Regards,
						- Zheng


Subject: [PATCH] ext4: fix wrong m_len value after unwritten extent conversion

From: Zheng Liu <wenqing.lz@taobao.com>

We always assume that the return value of ext4_ext_map_blocks is equal
to map->m_len but when we try to convert an unwritten extent 'm_len'
value will break this assumption.  It is harmless until we use status
tree as a extent cache because we need to update status tree according
to 'm_len' value.

Meanwhile this commit marks EXT4_MAP_MAPPED flag after unwritten extent
conversion.  It shouldn't cause a bug because we update status tree
according to checking EXT4_MAP_UNWRITTEN flag.  But it should be fixed.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 372b2cb..0793a13 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3626,6 +3626,10 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
 						 path, map->m_len);
 		} else
 			err = ret;
+		map->m_flags |= EXT4_MAP_MAPPED;
+		if (allocated > map->m_len)
+			allocated = map->m_len;
+		map->m_len = allocated;
 		goto out2;
 	}
 	/* buffered IO case */
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] ext4: fix wrong m_len value after unwritten extent conversion (Re: [PATCH 5/5] ext4: invalidate...)
  2013-02-26 14:23     ` [PATCH] ext4: fix wrong m_len value after unwritten extent conversion (Re: [PATCH 5/5] ext4: invalidate...) Zheng Liu
@ 2013-02-26 15:43       ` Dmitry Monakhov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Monakhov @ 2013-02-26 15:43 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, tytso, jack, wenqing.lz

On Tue, 26 Feb 2013 22:23:42 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> Hi Dmitry and Ted,
> 
> On Mon, Feb 25, 2013 at 08:29:46PM +0400, Dmitry Monakhov wrote:
> [snip]
> > One BIG note about this patch.
> > It fix BUG_ON(bh->b_blocknr != pblock) in fs/ext4/inode.c:1452
> > but 300'th xfstest(from my repo) still failed due to data corruption in
> > verifier thread. 
> > LOG:
> > MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/vzvg-scratch_dev
> > /mnt_scratch
> > 
> > 300 33s ... [failed, exit status 1] - output mismatch (see 300.out.bad)
> >     --- 300.out      2013-02-20 12:46:24.000000000 +0400
> >     +++ 300.out.bad  2013-02-25 20:18:24.000000000 +0400
> >     @@ -2,3 +2,5 @@
> >      
> >       Start defragment activity 
> >      
> >     +failed: '/usr/local/bin/fio /tmp/1665-300.fio'
> >     +(see 300.full for details)
> >      ...
> >      (Run 'diff -u 300.out 300.out.bad' to see the entire diff)
> > 
> > #300.full
> > crc32c: verify failed at file /mnt_scratch/test2 offset 16973824, length
> >     65536
> >        Expected CRC: d062565e
> >        Received CRC: f2cd2028
> >        received data dumped as test2.16973824.received
> >        expected data dumped as test2.16973824.expected
> > defrag-4k: Laying out IO file(s) (1 file(s) / 3400MB)
> > donor-file-fuzzer: Laying out IO file(s) (1 file(s) / 3400MB)
> > 
> > #DIFF:
> > --- /tmp/test2.16973824.expected        2013-02-25 20:23:04.000000000
> >     +0400
> > +++ /tmp/test2.16973824.received        2013-02-25 20:22:55.000000000
> >     +0400
> > @@ -254,3844 +254,6 @@
> >  0000fd0 eb95 6201 89ec 151a bd72 9675 203c 0d80
> >  0000fe0 b7ae b415 b8cc 0b2f b6f5 baab 9d0e 1741
> >  0000ff0 f6de fbda d64b 1bd3 5edb 040c 7cdc 18e6
> > -0001000 0bdb 5114 cd95 01eb 017b a435 d128 11af
> > -0001010 202f 93c9 a80a 013e a405 c261 8b1c 0dab
> > -0001020 b480 c649 1032 1b0a 3690 7e89 dee1 0ac7
> > -0001030 26d2 9489 3c97 0bd8 24da 5f28 3d4e 066d
> > -0001040 049b b978 7815 159c 8093 b4e1 b246 1c25
> > .....
> > -000ffc0 bbb1 fa68 5622 022d 9776 0174 2d90 1ecb
> > -000ffd0 92ee b473 64f7 0bcb 725d 2d17 9265 0fff
> > -000ffe0 6e4b ec74 5bc0 0618 0dc9 e669 953d 002f
> > -000fff0 a1b9 35e8 ff73 17a5 9437 15e0 24fa 150a
> > +0001000 0000 0000 0000 0000 0000 0000 0000 0000
> > +*
> >  0010000
> > 
> > It seems that we data was written to uninitialized extent, but
> > unwritten->initialized extent conversion was missed somewhere.
> > I have not fix for that issue yet.
> 
> I take a close look at this bug, and I found that a wrong 'map->m_len'
> is returned from ext4_ext_map_blocks when we try to convert an unwritten
> extent with EXT4_GET_BLOCKS_IO_CONVERT_EXT flag.  Here we always assume
> that the return value of ext/ind_map_blocks() is equal to m_len.  But
> here it breaks this assumption.  Let us consider what happens.
> 
> When we do a dio write for a extent-based file, we will preallocate an
> unwritten extent for it.  After dio write has been done, we will convert
> it in ext4_end_io callback.  We use ext4_convert_unwritten_extents to do
> this conversion.  Then the codepath is like beblow:
> 
> ext4_convert_unwritten_extents()
>   ->ext4_map_blocks()
>     ->ext4_es_lookup_extent()
>       [We can lookup an unwritten extent]
>     ->ext4_ext_map_blocks()
>       ->ext4_ext_handle_uninitialized_extents()
>     ->ext4_es_insert_extent()
>       [We update status tree, but the length of written extent is wrong.
>        The length of written extent is greater than the length of we
>        have converted]
> 
> The following case demonstrates what happens.
> 
> 1. We do a dio write.  An unwritten extent will be preallocated.
> 
> status tree: [0, 23]:unwritten
> extent tree: [0, 23]:unwritten
> 
> 2. We try to convert an unwritten extent, e.g. [0, 15].  But due to this
> bug we will update all unwritten extent in status tree.
> 
> status tree: [0, 23]:written
> extent tree: [0, 15]:written, [16, 23]:unwritten
> 
> 3. Then we try to convert the following unwritten extent, but we will
> return directly in ext4_map_blocks because we lookup an written extent
> from status tree, and that unwritten extent will never be converted.  At
> the time if e4defrag is running, the status tree could be removed, and
> we could read an unwritten extent.
> 
> Certainly this is only my *guess* because in my sandbox xfstests #300 and
> #301 never fail.  I am not sure this patch could fix this regression.
> *But* I am pretty sure it will cause some potential bugs if it hasn't been
> fixed.  I paste the patch at the below, which has been tested by xfstests
> plus Dmitry's #300 and #301 test case at the following configurations.
> 
>  * Ext4 4k block
>  * Ext4 4k block w/dioread_nolock
>  * EXt4 4k block w/bigalloc
> 
> 
> Dmitry, I really appreciate if you could give this patch a try.  Hope it
> can fix this regression.  Please let me know if there is any update.
> Thank you very much.
No 300'th still fails. Even more it trigger new error:
 EXT4-fs error (device dm-3): ext4_move_extents:1486: inode #12: comm
 fio: We replaced blocks too much! sum of replaced: 33 requested: 32

BTW I dont understand your patch.
You state that (map->m_len > ex->ee_len ) condition is possible inside
ext4_convert_unwritten_extents_endio() but HOW?

0) We call dio [lblk:10, len:10] to fallocated area [lblk:0, len:30, unwritten]

1)DIO preparation call map_blocks with EXT4_GET_BLOCKS_PRE_IO
  ->ext4_ext_handle_uninitialized_extents
    ->ext4_split_unwritten_extents
      which split extent according to map->m_len -> [0,10,u], [10,10,u], [20,10,u]

2) ext_end_io will call map_blocks with EXT4_GET_BLOCKS_CONVERT
  ->ext4_ext_handle_uninitialized_extents
    ->ext4_convert_unwritten_extents_endio
      will found [10,10,u] and convert it to [10,10,w]


e4defrag can not change layout between (1)-(2) because it properly wait for
any outstanding aio like follows (move_extent.c:1328):

        /* Protect orig and donor inodes against a truncate */
        mext_inode_double_lock(orig_inode, donor_inode);
        /* Wait for all existing dio workers */
        ext4_inode_block_unlocked_dio(orig_inode);
        ext4_inode_block_unlocked_dio(donor_inode);
        inode_dio_wait(orig_inode);
        inode_dio_wait(donor_inode)
        /* Protect extent tree against block allocations via delalloc */
        double_down_write_data_sem(orig_inode, donor_inode);
        /*Here any io activity is blocked for given inodes. */

> 
> Regards,
> 						- Zheng
> 
> 
> Subject: [PATCH] ext4: fix wrong m_len value after unwritten extent conversion
> 
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> We always assume that the return value of ext4_ext_map_blocks is equal
> to map->m_len but when we try to convert an unwritten extent 'm_len'
> value will break this assumption.  It is harmless until we use status
> tree as a extent cache because we need to update status tree according
> to 'm_len' value.
> 
> Meanwhile this commit marks EXT4_MAP_MAPPED flag after unwritten extent
> conversion.  It shouldn't cause a bug because we update status tree
> according to checking EXT4_MAP_UNWRITTEN flag.  But it should be fixed.
> 
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 372b2cb..0793a13 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3626,6 +3626,10 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
>  						 path, map->m_len);
>  		} else
>  			err = ret;
> +		map->m_flags |= EXT4_MAP_MAPPED;
> +		if (allocated > map->m_len)
> +			allocated = map->m_len;
> +		map->m_len = allocated;
>  		goto out2;
>  	}
>  	/* buffered IO case */
> -- 
> 1.7.12.rc2.18.g61b472e
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3
  2013-02-25 16:07 [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Dmitry Monakhov
                   ` (4 preceding siblings ...)
  2013-02-25 18:33 ` [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Jan Kara
@ 2013-03-04  5:58 ` Theodore Ts'o
  2013-03-04  6:37   ` Zheng Liu
  5 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2013-03-04  5:58 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack, wenqing.lz, Zheng Liu

Dmitry,

Thanks for working on these patches.  They look good!  I've dropped
them into the ext4 tree in the dev branch, and am starting to run
tests on them.

Zheng, since some of your fix up patches look like they touch some of
the code modified by Dmitry's patches, could you rebase your patch set
on top of the dev branch (which is what we've pushed to Linus plus
Dmitry's patches)?   Thanks!!

						- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3
  2013-03-04  5:58 ` Theodore Ts'o
@ 2013-03-04  6:37   ` Zheng Liu
  0 siblings, 0 replies; 17+ messages in thread
From: Zheng Liu @ 2013-03-04  6:37 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Dmitry Monakhov, linux-ext4, jack, wenqing.lz

On Mon, Mar 04, 2013 at 12:58:13AM -0500, Theodore Ts'o wrote:
> Dmitry,
> 
> Thanks for working on these patches.  They look good!  I've dropped
> them into the ext4 tree in the dev branch, and am starting to run
> tests on them.
> 
> Zheng, since some of your fix up patches look like they touch some of
> the code modified by Dmitry's patches, could you rebase your patch set
> on top of the dev branch (which is what we've pushed to Linus plus
> Dmitry's patches)?   Thanks!!

No problem, I will rebase my patches against the latest dev branch.

Thanks,
                                                - Zheng

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio
  2013-02-25 16:07 ` [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio Dmitry Monakhov
  2013-02-25 18:08   ` Jan Kara
@ 2013-03-04 14:00   ` Zheng Liu
  1 sibling, 0 replies; 17+ messages in thread
From: Zheng Liu @ 2013-03-04 14:00 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

On Mon, Feb 25, 2013 at 08:07:41PM +0400, Dmitry Monakhov wrote:
> Splitting extents inside endio is bad thing, but unfortunetly it is still
> possible. In fact we are pretty close to the moment when all related
> issues will be fixed. Let's warn developer if it still the case.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/extents.c |   13 ++++++++++++-
>  1 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 1d37f2d..78c2a91 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3386,8 +3386,19 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
>  		"block %llu, max_blocks %u\n", inode->i_ino,
>  		  (unsigned long long)ee_block, ee_len);
>  
> -	/* If extent is larger than requested then split is required */
> +	/* If extent is larger than requested it is a clear sign that we still
> +	 * have some extent state machine issues left. So extent_split is still
> +	 * required.
> +	 * TODO: Once all related issues will be fixed this situation should be
> +	 * illegal.
> +	 */
>  	if (ee_block != map->m_lblk || ee_len > map->m_len) {
> +#ifdef EXT4_DEBUG
> +		ext4_warning("Inode (%ld) finished: extent logical block %llu,"
A parameter is missing.

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 25c86aa..daf8bb9 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3395,7 +3395,8 @@ static int
ext4_convert_unwritten_extents_endio(handle_t *handle,
         */
        if (ee_block != map->m_lblk || ee_len > map->m_len) {
 #ifdef EXT4_DEBUG
-               ext4_warning("Inode (%ld) finished: extent logical block %llu,"
+               ext4_warning(inode->i_sb,
+                            "Inode (%ld) finished: extent logical block %llu,"
                             " len %u; IO logical block %llu, len %u\n",
                             inode->i_ino, (unsigned long long)ee_block, ee_len,
                             (unsigned long long)map->m_lblk, map->m_len);

Regards,
                                                - Zheng

> +			     " len %u; IO logical block %llu, len %u\n",
> +			     inode->i_ino, (unsigned long long)ee_block, ee_len,
> +			     (unsigned long long)map->m_lblk, map->m_len);
> +#endif
>  		err = ext4_split_unwritten_extents(handle, inode, map, path,
>  						   EXT4_GET_BLOCKS_CONVERT);
>  		if (err < 0)
> -- 
> 1.7.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] ext4: disable merging of uninitialized extents
  2013-02-25 16:07 ` [PATCH 2/5] ext4: disable merging of uninitialized extents Dmitry Monakhov
  2013-02-25 18:09   ` Jan Kara
@ 2013-03-04 14:26   ` Zheng Liu
  1 sibling, 0 replies; 17+ messages in thread
From: Zheng Liu @ 2013-03-04 14:26 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, tytso, jack, wenqing.lz

On Mon, Feb 25, 2013 at 08:07:40PM +0400, Dmitry Monakhov wrote:
> Derived from Jan's patch:http://permalink.gmane.org/gmane.comp.file-systems.ext4/36470
> 
> Merging of uninitialized extents creates all sorts of interesting race
> possibilities when writeback / DIO races with fallocate. Thus
> ext4_convert_unwritten_extents_endio() has to deal with a case where
> extent to be converted needs to be split out first. That isn't nice
> for two reasons:
> 
> 1) It may need allocation of extent tree block so ENOSPC is possible.
> 2) It complicates end_io handling code
> 
> So we disable merging of uninitialized extents which allows us to simplify
> the code. Extents will get merged after they are converted to initialized
> ones.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>

After applied this patch, xfstests #275 will print a warning message.

kernel:EXT4-fs (sda2): failed to convert unwritten extents to written
extents -- potential data loss!  (inode 13, offset 1537212416, size 524288,
error -28)

But IMHO we don't need to worry about it because it is hard to be
trigger.  I hit it because I run xfstests #275 several times.  So just a
note here.

Regards,
                                                - Zheng

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-03-04 14:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-25 16:07 [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Dmitry Monakhov
2013-02-25 16:07 ` [PATCH 2/5] ext4: disable merging of uninitialized extents Dmitry Monakhov
2013-02-25 18:09   ` Jan Kara
2013-03-04 14:26   ` Zheng Liu
2013-02-25 16:07 ` [PATCH 3/5] ext4: add warning to ext4_convert_unwritten_extents_endio Dmitry Monakhov
2013-02-25 18:08   ` Jan Kara
2013-03-04 14:00   ` Zheng Liu
2013-02-25 16:07 ` [PATCH 4/5] ext4: remove unnecessary wait for extent conversion in ext4_fallocate() Dmitry Monakhov
2013-02-25 16:07 ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Dmitry Monakhov
2013-02-25 16:29   ` Dmitry Monakhov
2013-02-25 17:04     ` Zheng Liu
2013-02-26 14:23     ` [PATCH] ext4: fix wrong m_len value after unwritten extent conversion (Re: [PATCH 5/5] ext4: invalidate...) Zheng Liu
2013-02-26 15:43       ` Dmitry Monakhov
2013-02-25 18:06   ` [PATCH 5/5] ext4: invalidate exntent-status-tree during extent_migration Jan Kara
2013-02-25 18:33 ` [PATCH 1/5] ext4: ext4_split_extent shoult take care about extent zeroout v3 Jan Kara
2013-03-04  5:58 ` Theodore Ts'o
2013-03-04  6:37   ` Zheng Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.