linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] ext4: fallocate insert/collapse range fixes
@ 2017-01-06 20:25 Roman Pen
  2017-01-06 20:26 ` [PATCH v2 1/2] ext4: Include forgotten start block on fallocate insert range Roman Pen
  2017-01-06 20:26 ` [PATCH v2 2/2] ext4: Do not populate extents tree with outdated offsets while shifting extents Roman Pen
  0 siblings, 2 replies; 7+ messages in thread
From: Roman Pen @ 2017-01-06 20:25 UTC (permalink / raw)
  Cc: Roman Pen, Namjae Jeon, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-kernel

Hi all,

This is the second version of a patchset that targets two nasty bugs in
the logic responsible for shifting extents.  The difference from the first
version is that the third patch, in which I incorrectly tried to optimize
the linear search, has been dropped.

I ran the 'kvm-xfstests.sh -c 4k -g auto' against b25ead75e7b6 4.10-rc2
with and without my changes.  No regressions were found.

The 1k configuration was run against Linus's latest mainline.  No
regressions were found there either.

The two patches in this set target two problems, both of which can be
easily reproduced by the following new xfstest:

  http://www.spinics.net/lists/linux-ext4/msg54965.html

1.  On right shift (insert range) the start block is not included in the
range and the hole appears at the wrong offset.  The bug can be easily
reproduced by the following test:

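    #define _GNU_SOURCE             /* exposes fallocate() in <fcntl.h> */
    #include <assert.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <linux/falloc.h>       /* FALLOC_FL_INSERT_RANGE */

    int main(void)
    {
            unsigned short *ptr;
            int fd, rc, i, block;

            ptr = malloc(4096);
            assert(ptr);

            fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
            assert(fd >= 0);

            /* Preallocate two 4k blocks and fill both with 0xbeef */
            rc = fallocate(fd, 0, 0, 8192);
            assert(rc == 0);
            for (i = 0; i < 2048; i++)
                    ptr[i] = 0xbeef;
            rc = pwrite(fd, ptr, 4096, 0);
            assert(rc == 4096);
            rc = pwrite(fd, ptr, 4096, 4096);
            assert(rc == 4096);

            for (block = 2; block < 1000; block++) {
                    /* Insert a one-block hole at offset 4096 ... */
                    rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
                    assert(rc == 0);

                    for (i = 0; i < 2048; i++)
                            ptr[i] = block;

                    /* ... and immediately fill it with the block number */
                    rc = pwrite(fd, ptr, 4096, 4096);
                    assert(rc == 4096);
            }

            close(fd);
            free(ptr);
            return 0;
    }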

After the test no zero blocks should appear (the test always does a pwrite()
after fallocate()), but zero blocks do exist:

  $ hexdump ./ext4.file | grep '0000 0000'

This bug is targeted by the first patch in the set.

2.  Inside ext4_ext_shift_extents(), ext4_find_extent() is called without
the EXT4_EX_NOCACHE flag, which is needed to prevent cache population.  This
leads to outdated offsets in the extents tree and wrong data blocks, which
can be observed with read().  It is also reproduced quite reliably by the
test above.

This is fixed by the second patch.
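
As a more precise alternative to the hexdump check in item 1, all-zero
blocks can be counted directly.  The following is only a sketch:
count_zero_blocks() is a hypothetical helper name, not part of xfstests or
the kernel, and the 4096-byte block size is an assumption taken from the
test above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: count blocks of `block_size` bytes in `path` that
 * contain only zero bytes.  A short trailing block is checked as-is.
 * Returns -1 if the file cannot be read.
 */
long count_zero_blocks(const char *path, size_t block_size)
{
	unsigned char *buf;
	long zero_blocks = 0;
	size_t n, i;
	FILE *f;

	assert(path != NULL);
	if (block_size == 0)
		return -1;

	f = fopen(path, "rb");
	if (!f)
		return -1;

	buf = malloc(block_size);
	if (!buf) {
		fclose(f);
		return -1;
	}

	while ((n = fread(buf, 1, block_size, f)) > 0) {
		/* Find the first non-zero byte in this block, if any */
		for (i = 0; i < n && buf[i] == 0; i++)
			;
		if (i == n)
			zero_blocks++;
	}

	free(buf);
	fclose(f);
	return zero_blocks;
}
```

Calling count_zero_blocks("./ext4.file", 4096) after the test should return
0 on a kernel without the bug, since every inserted block is overwritten
before the next iteration.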

Roman Pen (2):
  ext4: Include forgotten start block on fallocate insert range
  ext4: Do not populate extents tree with outdated offsets while
    shifting extents

 fs/ext4/extents.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
-- 
2.10.2


* [PATCH v2 1/2] ext4: Include forgotten start block on fallocate insert range
  2017-01-06 20:25 [PATCH v2 0/2] ext4: fallocate insert/collapse range fixes Roman Pen
@ 2017-01-06 20:26 ` Roman Pen
  2017-01-07 21:22   ` Theodore Ts'o
  2017-01-06 20:26 ` [PATCH v2 2/2] ext4: Do not populate extents tree with outdated offsets while shifting extents Roman Pen
  1 sibling, 1 reply; 7+ messages in thread
From: Roman Pen @ 2017-01-06 20:26 UTC (permalink / raw)
  Cc: Roman Pen, Namjae Jeon, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-kernel

While doing 'insert range' the start block should also be shifted right.
The bug can be easily reproduced by the following test:

    ptr = malloc(4096);
    assert(ptr);

    fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
    assert(fd >= 0);

    rc = fallocate(fd, 0, 0, 8192);
    assert(rc == 0);
    for (i = 0; i < 2048; i++)
            *((unsigned short *)ptr + i) = 0xbeef;
    rc = pwrite(fd, ptr, 4096, 0);
    assert(rc == 4096);
    rc = pwrite(fd, ptr, 4096, 4096);
    assert(rc == 4096);

    for (block = 2; block < 1000; block++) {
            rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
            assert(rc == 0);

            for (i = 0; i < 2048; i++)
                    *((unsigned short *)ptr + i) = block;

            rc = pwrite(fd, ptr, 4096, 4096);
            assert(rc == 4096);
    }

Because the start block is not included in the range, the hole appears at
the wrong offset (just after the desired offset) and the following
pwrite() overwrites an already existing block, leaving the hole untouched.

A simple way to verify the wrong behaviour is to check for zeroed blocks
after the test:

   $ hexdump ./ext4.file | grep '0000 0000'

The root cause of the bug is the wrong range (start, stop], where start
should be inclusive, i.e. [start, stop].
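
The effect of the two ranges can be illustrated with a toy userspace model.
This is a sketch only: it treats the file as a flat array with one entry per
block, which is not how ext4 stores extents, and toy_insert_block() and
TOY_HOLE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HOLE 0u	/* marker for an unwritten block in the toy model */

/*
 * Model a one-block insert at `start` in a file of `nblocks` blocks.
 * file[] must have room for nblocks + 1 entries.
 * include_start != 0 shifts the range [start, stop],
 * include_start == 0 shifts the range (start, stop].
 * Returns the index at which the hole ends up.
 */
size_t toy_insert_block(unsigned *file, size_t nblocks, size_t start,
			int include_start)
{
	size_t first = include_start ? start : start + 1;
	size_t i;

	assert(first <= nblocks);

	/* Walk from the last block down to `first`, moving each right */
	for (i = nblocks; i > first; i--)
		file[i] = file[i - 1];
	file[first] = TOY_HOLE;

	return first;
}
```

With blocks {10, 11, 12} and start == 1, the inclusive range leaves the hole
at index 1, while the exclusive range leaves it at index 2.  A write to
block 1 then overwrites existing data and the hole at index 2 stays empty,
which matches the zeroed blocks seen in the test.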

This patch fixes the problem by including start in the range.  But, so as
not to break left shift (range collapse), stop now points to the beginning
of a block, not to the end.

The other non-obvious change is the iterator validity check in the main
loop.  Because the iterator is unsigned, the following corner case needs
care: when a block is inserted at offset 0, the stop variable would wrap
around and never become less than start, which is 0.  To handle this
special case the iterator is set to NULL to indicate that the end of the
loop has been reached.
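
The corner case can be shown in isolation with a small userspace sketch.  It
assumes a 32-bit unsigned block number as a stand-in for ext4_lblk_t, and
toy_walk_down() is a hypothetical function that only mirrors the loop
condition, not the real extent handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_lblk_t;	/* stand-in for ext4_lblk_t (assumption) */

/*
 * Walk the inclusive range [start, stop] downwards, one block at a time,
 * and return how many blocks were visited.  Decrementing an unsigned 0
 * would wrap to UINT32_MAX and keep "start <= stop" true forever, so the
 * iterator pointer is set to NULL once block 0 has been handled.
 */
unsigned toy_walk_down(toy_lblk_t start, toy_lblk_t stop)
{
	toy_lblk_t *iterator = &stop;
	unsigned visited = 0;

	while (iterator && start <= stop) {
		visited++;
		if (*iterator > 0)
			(*iterator)--;
		else
			iterator = NULL;	/* beginning reached, end of loop */
	}
	return visited;
}
```

For start == 0 the loop cannot be ended by the comparison alone, because no
unsigned value is less than 0; the NULL iterator is what terminates it.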

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/ext4/extents.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3e295d3350a9..4d3014b5a3f9 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5343,8 +5343,7 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 	if (!extent)
 		goto out;
 
-	stop = le32_to_cpu(extent->ee_block) +
-			ext4_ext_get_actual_len(extent);
+	stop = le32_to_cpu(extent->ee_block);
 
        /*
 	 * In case of left shift, Don't start shifting extents until we make
@@ -5383,8 +5382,12 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 	else
 		iterator = &stop;
 
-	/* Its safe to start updating extents */
-	while (start < stop) {
+	/*
+	 * Its safe to start updating extents.  Start and stop are unsigned, so
+	 * in case of right shift if extent with 0 block is reached, iterator
+	 * becomes NULL to indicate the end of the loop.
+	 */
+	while (iterator && start <= stop) {
 		path = ext4_find_extent(inode, *iterator, &path, 0);
 		if (IS_ERR(path))
 			return PTR_ERR(path);
@@ -5412,8 +5415,11 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 					ext4_ext_get_actual_len(extent);
 		} else {
 			extent = EXT_FIRST_EXTENT(path[depth].p_hdr);
-			*iterator =  le32_to_cpu(extent->ee_block) > 0 ?
-				le32_to_cpu(extent->ee_block) - 1 : 0;
+			if (le32_to_cpu(extent->ee_block) > 0)
+				*iterator = le32_to_cpu(extent->ee_block) - 1;
+			else
+				/* Beginning is reached, end of the loop */
+				iterator = NULL;
 			/* Update path extent in case we need to stop */
 			while (le32_to_cpu(extent->ee_block) < start)
 				extent++;
-- 
2.10.2


* [PATCH v2 2/2] ext4: Do not populate extents tree with outdated offsets while shifting extents
  2017-01-06 20:25 [PATCH v2 0/2] ext4: fallocate insert/collapse range fixes Roman Pen
  2017-01-06 20:26 ` [PATCH v2 1/2] ext4: Include forgotten start block on fallocate insert range Roman Pen
@ 2017-01-06 20:26 ` Roman Pen
  2017-01-07 21:27   ` Theodore Ts'o
  1 sibling, 1 reply; 7+ messages in thread
From: Roman Pen @ 2017-01-06 20:26 UTC (permalink / raw)
  Cc: Roman Pen, Namjae Jeon, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-kernel

Inside ext4_ext_shift_extents(), ext4_find_extent() is called without the
EXT4_EX_NOCACHE flag, which is needed to prevent cache population.

This leads to outdated offsets in the extents tree and wrong blocks
afterwards.

The patch fixes the problem by passing the EXT4_EX_NOCACHE flag to each
ext4_find_extent() call inside ext4_ext_shift_extents().
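
Why a caching lookup is harmful while offsets are being rewritten can be
sketched with a toy model.  Everything below is hypothetical (the toy_*
names, TOY_NOCACHE, the one-entry cache) and is far simpler than the real
ext4 caching code; it only shows a lookup result being remembered and then
going stale once the extent is shifted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NOCACHE	0x1	/* hypothetical, modelled on EXT4_EX_NOCACHE */
#define TOY_NEXTENTS	2

struct toy_extent {
	uint32_t lblk;	/* logical block */
	uint32_t pblk;	/* physical block */
};

struct toy_inode {
	struct toy_extent tree[TOY_NEXTENTS];	/* authoritative mapping */
	struct toy_extent cached;		/* one-entry lookup cache */
	int cache_valid;
};

/* Look up lblk in the tree; remember the result unless TOY_NOCACHE is set */
struct toy_extent *toy_find(struct toy_inode *ino, uint32_t lblk, int flags)
{
	size_t i;

	for (i = 0; i < TOY_NEXTENTS; i++) {
		if (ino->tree[i].lblk != lblk)
			continue;
		if (!(flags & TOY_NOCACHE)) {
			ino->cached = ino->tree[i];
			ino->cache_valid = 1;
		}
		return &ino->tree[i];
	}
	return NULL;
}

/* Map lblk to a physical block, trusting the cache first; 0 means hole */
uint32_t toy_map(struct toy_inode *ino, uint32_t lblk)
{
	struct toy_extent *ex;

	if (ino->cache_valid && ino->cached.lblk == lblk)
		return ino->cached.pblk;
	ex = toy_find(ino, lblk, TOY_NOCACHE);
	return ex ? ex->pblk : 0;
}

/* Shift the extent at lblk 1 right by one, then report what lblk 1 maps to */
uint32_t toy_map_after_shift(int flags)
{
	struct toy_inode ino = {
		.tree = { { 0, 100 }, { 1, 101 } },
		.cache_valid = 0,
	};
	struct toy_extent *ex = toy_find(&ino, 1, flags);

	assert(ex != NULL);
	ex->lblk += 1;	/* the cached copy, if any, still says lblk 1 */
	return toy_map(&ino, 1);
}
```

In this model toy_map_after_shift(0) returns the stale physical block 101
for a logical block that is now a hole, whereas
toy_map_after_shift(TOY_NOCACHE) returns 0.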

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/ext4/extents.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4d3014b5a3f9..2a97dff87b96 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5334,7 +5334,8 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 	ext4_lblk_t stop, *iterator, ex_start, ex_end;
 
 	/* Let path point to the last extent */
-	path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL, 0);
+	path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
+				EXT4_EX_NOCACHE);
 	if (IS_ERR(path))
 		return PTR_ERR(path);
 
@@ -5350,7 +5351,8 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 	 * sure the hole is big enough to accommodate the shift.
 	*/
 	if (SHIFT == SHIFT_LEFT) {
-		path = ext4_find_extent(inode, start - 1, &path, 0);
+		path = ext4_find_extent(inode, start - 1, &path,
+					EXT4_EX_NOCACHE);
 		if (IS_ERR(path))
 			return PTR_ERR(path);
 		depth = path->p_depth;
@@ -5388,7 +5390,8 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
 	 * becomes NULL to indicate the end of the loop.
 	 */
 	while (iterator && start <= stop) {
-		path = ext4_find_extent(inode, *iterator, &path, 0);
+		path = ext4_find_extent(inode, *iterator, &path,
+					EXT4_EX_NOCACHE);
 		if (IS_ERR(path))
 			return PTR_ERR(path);
 		depth = path->p_depth;
-- 
2.10.2


* Re: [PATCH v2 1/2] ext4: Include forgotten start block on fallocate insert range
  2017-01-06 20:26 ` [PATCH v2 1/2] ext4: Include forgotten start block on fallocate insert range Roman Pen
@ 2017-01-07 21:22   ` Theodore Ts'o
  2017-01-09 12:36     ` Roman Penyaev
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2017-01-07 21:22 UTC (permalink / raw)
  To: Roman Pen; +Cc: Namjae Jeon, Andreas Dilger, linux-ext4, linux-kernel

On Fri, Jan 06, 2017 at 09:26:00PM +0100, Roman Pen wrote:
> While doing 'insert range' the start block should also be shifted right.
> The bug can be easily reproduced by the following test:
> 
>     ptr = malloc(4096);
>     assert(ptr);
> 
>     fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
>     assert(fd >= 0);
> 
>     rc = fallocate(fd, 0, 0, 8192);
>     assert(rc == 0);
>     for (i = 0; i < 2048; i++)
>             *((unsigned short *)ptr + i) = 0xbeef;
>     rc = pwrite(fd, ptr, 4096, 0);
>     assert(rc == 4096);
>     rc = pwrite(fd, ptr, 4096, 4096);
>     assert(rc == 4096);
> 
>     for (block = 2; block < 1000; block++) {
>             rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
>             assert(rc == 0);
> 
>             for (i = 0; i < 2048; i++)
>                     *((unsigned short *)ptr + i) = block;
> 
>             rc = pwrite(fd, ptr, 4096, 4096);
>             assert(rc == 4096);
>     }
> 
> Because the start block is not included in the range, the hole appears at
> the wrong offset (just after the desired offset) and the following
> pwrite() overwrites an already existing block, leaving the hole untouched.
> 
> A simple way to verify the wrong behaviour is to check for zeroed blocks
> after the test:
> 
>    $ hexdump ./ext4.file | grep '0000 0000'
> 
> The root cause of the bug is the wrong range (start, stop], where start
> should be inclusive, i.e. [start, stop].
> 
> This patch fixes the problem by including start in the range.  But, so as
> not to break left shift (range collapse), stop now points to the beginning
> of a block, not to the end.
> 
> The other non-obvious change is the iterator validity check in the main
> loop.  Because the iterator is unsigned, the following corner case needs
> care: when a block is inserted at offset 0, the stop variable would wrap
> around and never become less than start, which is 0.  To handle this
> special case the iterator is set to NULL to indicate that the end of the
> loop has been reached.
> 
> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>

Thanks, applied.

					- Ted


* Re: [PATCH v2 2/2] ext4: Do not populate extents tree with outdated offsets while shifting extents
  2017-01-06 20:26 ` [PATCH v2 2/2] ext4: Do not populate extents tree with outdated offsets while shifting extents Roman Pen
@ 2017-01-07 21:27   ` Theodore Ts'o
  2017-01-09 12:36     ` Roman Penyaev
  0 siblings, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2017-01-07 21:27 UTC (permalink / raw)
  To: Roman Pen; +Cc: Namjae Jeon, Andreas Dilger, linux-ext4, linux-kernel

On Fri, Jan 06, 2017 at 09:26:01PM +0100, Roman Pen wrote:
> Inside ext4_ext_shift_extents(), ext4_find_extent() is called without the
> EXT4_EX_NOCACHE flag, which is needed to prevent cache population.
> 
> This leads to outdated offsets in the extents tree and wrong blocks
> afterwards.
> 
> The patch fixes the problem by passing the EXT4_EX_NOCACHE flag to each
> ext4_find_extent() call inside ext4_ext_shift_extents().
> 
> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>

Thanks, applied.

					- Ted


* Re: [PATCH v2 1/2] ext4: Include forgotten start block on fallocate insert range
  2017-01-07 21:22   ` Theodore Ts'o
@ 2017-01-09 12:36     ` Roman Penyaev
  0 siblings, 0 replies; 7+ messages in thread
From: Roman Penyaev @ 2017-01-09 12:36 UTC (permalink / raw)
  To: Theodore Ts'o, Roman Pen, Namjae Jeon, Andreas Dilger,
	linux-ext4, linux-kernel

On Sat, Jan 7, 2017 at 10:22 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Jan 06, 2017 at 09:26:00PM +0100, Roman Pen wrote:
>> While doing 'insert range' the start block should also be shifted right.
>> The bug can be easily reproduced by the following test:
>>
>>     ptr = malloc(4096);
>>     assert(ptr);
>>
>>     fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
>>     assert(fd >= 0);
>>
>>     rc = fallocate(fd, 0, 0, 8192);
>>     assert(rc == 0);
>>     for (i = 0; i < 2048; i++)
>>             *((unsigned short *)ptr + i) = 0xbeef;
>>     rc = pwrite(fd, ptr, 4096, 0);
>>     assert(rc == 4096);
>>     rc = pwrite(fd, ptr, 4096, 4096);
>>     assert(rc == 4096);
>>
>>     for (block = 2; block < 1000; block++) {
>>             rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
>>             assert(rc == 0);
>>
>>             for (i = 0; i < 2048; i++)
>>                     *((unsigned short *)ptr + i) = block;
>>
>>             rc = pwrite(fd, ptr, 4096, 4096);
>>             assert(rc == 4096);
>>     }
>>
>> Because the start block is not included in the range, the hole appears at
>> the wrong offset (just after the desired offset) and the following
>> pwrite() overwrites an already existing block, leaving the hole untouched.
>>
>> A simple way to verify the wrong behaviour is to check for zeroed blocks
>> after the test:
>>
>>    $ hexdump ./ext4.file | grep '0000 0000'
>>
>> The root cause of the bug is the wrong range (start, stop], where start
>> should be inclusive, i.e. [start, stop].
>>
>> This patch fixes the problem by including start in the range.  But, so as
>> not to break left shift (range collapse), stop now points to the beginning
>> of a block, not to the end.
>>
>> The other non-obvious change is the iterator validity check in the main
>> loop.  Because the iterator is unsigned, the following corner case needs
>> care: when a block is inserted at offset 0, the stop variable would wrap
>> around and never become less than start, which is 0.  To handle this
>> special case the iterator is set to NULL to indicate that the end of the
>> loop has been reached.
>>
>> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
>
> Thanks, applied.
>

Could you please provide the SHA1 of the patch in your branch?
I want to reference it exactly in the new xfstests test which
covers that bug.

--
Roman


* Re: [PATCH v2 2/2] ext4: Do not populate extents tree with outdated offsets while shifting extents
  2017-01-07 21:27   ` Theodore Ts'o
@ 2017-01-09 12:36     ` Roman Penyaev
  0 siblings, 0 replies; 7+ messages in thread
From: Roman Penyaev @ 2017-01-09 12:36 UTC (permalink / raw)
  To: Theodore Ts'o, Roman Pen, Namjae Jeon, Andreas Dilger,
	linux-ext4, linux-kernel

On Sat, Jan 7, 2017 at 10:27 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Jan 06, 2017 at 09:26:01PM +0100, Roman Pen wrote:
>> Inside ext4_ext_shift_extents(), ext4_find_extent() is called without the
>> EXT4_EX_NOCACHE flag, which is needed to prevent cache population.
>>
>> This leads to outdated offsets in the extents tree and wrong blocks
>> afterwards.
>>
>> The patch fixes the problem by passing the EXT4_EX_NOCACHE flag to each
>> ext4_find_extent() call inside ext4_ext_shift_extents().
>>
>> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
>
> Thanks, applied.

Could you please provide the SHA1 of the patch in your branch?
I want to reference it exactly in the new xfstests test which
covers that bug.

--
Roman


