linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] ext4: fix two bugs in ext4_mb_normalize_request
@ 2022-05-28 11:00 Baokun Li
  2022-05-28 11:00 ` [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa Baokun Li
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Baokun Li @ 2022-05-28 11:00 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, libaokun1

The logical block map reached before the problem stack was 1011.
The estimated location of the size logical block of the inde plus
the required allocation length 7, the size is 1018.

But the i_size of inode is 1299, so the size is 1299,
the aligned size is 2048, and the end is 2048.
Because of the restriction of ar -> pleft, start==648.

EXT4_BLOCKS_PER_GROUP (ac- > ac_sb) is 256,
so the size is 256 and the end is 904.

It is not normal to truncate here, the end is less than 1299 of the
target logical block, that is, the allocated range does not contain
the target logical block.

Then this new scope conflicts with the previous PA, as follows:

        pa_start-506        pa_end-759
 |____________P________V_________P__________V_____________l________|
 0                 start-648             end-904    logical-1299   2048

In this case, start is changed to pa_end, that is, 759.
In this case, a bug_ON is reported in ext4_mb_mark_diskspace_used.

The problem is caused by the truncation introduced in the
cd648b8a8fd5 ("ext4: trim allocation requests to group size").
The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
However, the truncation method is incorrect. The group where the
logical is located should be used for allocation. If the value of
EXT4_BLOCKS_PER_GROUP is 256, size 2048 can be divided into eight
groups. If the value of logical is 1299, the value of logical must be
in the sixth group, that is,
	start=1299/256*256=5*256=1280,
	end=size+1280=1536.

Then, the value range can be further narrowed down based on
other restrictions.
                              1024    1280     1536
|_______|_______|_______|_______|_______|__l_____|_______|_______|
0 group1  group2  group3  group4  group5  group6  group7  group8 2048 

At the same time, we fixed earlier assertions that we could find out when
the target logical block was not in the allocated range.

Ritesh Harjani found that flex_bg groups was not considered in
ext4_mb_normalize_request. So we add support for flex_bg to make
the physical blocks of large files contiguous.

I ran xfstests on ext3 and ext4 and found no problems in ext3/4.

V1: https://patchwork.ozlabs.org/project/linux-ext4/cover/20220521134217.312071-1-libaokun1@huawei.com/
V2: https://patchwork.ozlabs.org/project/linux-ext4/cover/20220523141658.2919003-1-libaokun1@huawei.com/

Baokun Li (3):
  ext4: fix bug_on ext4_mb_use_inode_pa
  ext4: correct the judgment of BUG in ext4_mb_normalize_request
  ext4: support flex_bg in ext4_mb_normalize_request

 fs/ext4/mballoc.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa
  2022-05-28 11:00 [PATCH v3 0/3] ext4: fix two bugs in ext4_mb_normalize_request Baokun Li
@ 2022-05-28 11:00 ` Baokun Li
  2022-05-28 15:10   ` Ritesh Harjani
  2022-05-28 11:00 ` [PATCH v3 2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request Baokun Li
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Baokun Li @ 2022-05-28 11:00 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, libaokun1, Hulk Robot

Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
 ext4_mb_new_blocks+0x9df/0x5d30
 ext4_ext_map_blocks+0x1803/0x4d80
 ext4_map_blocks+0x3a4/0x1a10
 ext4_writepages+0x126d/0x2c30
 do_writepages+0x7f/0x1b0
 __filemap_fdatawrite_range+0x285/0x3b0
 file_write_and_wait_range+0xb1/0x140
 ext4_sync_file+0x1aa/0xca0
 vfs_fsync_range+0xfb/0x260
 do_fsync+0x48/0xa0
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
do_fsync
 vfs_fsync_range
  ext4_sync_file
   file_write_and_wait_range
    __filemap_fdatawrite_range
     do_writepages
      ext4_writepages
       mpage_map_and_submit_extent
        mpage_map_one_extent
         ext4_map_blocks
          ext4_mb_new_blocks
           ext4_mb_normalize_request
            >>> start + size <= ac->ac_o_ex.fe_logical
           ext4_mb_regular_allocator
            ext4_mb_simple_scan_group
             ext4_mb_use_best_found
              ext4_mb_new_preallocation
               ext4_mb_new_inode_pa
                ext4_mb_use_inode_pa
                 >>> set ac->ac_b_ex.fe_len <= 0
           ext4_mb_mark_diskspace_used
            >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);

we can easily reproduce this problem with the following commands:
	`fallocate -l100M disk`
	`mkfs.ext4 -b 1024 -g 256 disk`
	`mount disk /mnt`
	`fsstress -d /mnt -l 0 -n 1000 -p 1`

The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.

Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
V1->V2:
	Replace round_down() with rounddown().
	Modified comments.
V2->V3:
	Convert EXT4_BLOCKS_PER_GROUP type to ext4_lblk_t
	to avoid compilation warnings.

 fs/ext4/mballoc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9f12f29bc346..4d3740fdff90 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4104,6 +4104,15 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	size = size >> bsbits;
 	start = start_off >> bsbits;
 
+	/*
+	 * For tiny groups (smaller than 8MB) the chosen allocation
+	 * alignment may be larger than group size. Make sure the
+	 * alignment does not move allocation to a different group which
+	 * makes mballoc fail assertions later.
+	 */
+	start = max(start, rounddown(ac->ac_o_ex.fe_logical,
+			(ext4_lblk_t)EXT4_BLOCKS_PER_GROUP(ac->ac_sb)));
+
 	/* don't cover already allocated blocks in selected range */
 	if (ar->pleft && start <= ar->lleft) {
 		size -= ar->lleft + 1 - start;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request
  2022-05-28 11:00 [PATCH v3 0/3] ext4: fix two bugs in ext4_mb_normalize_request Baokun Li
  2022-05-28 11:00 ` [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa Baokun Li
@ 2022-05-28 11:00 ` Baokun Li
  2022-05-28 15:12   ` Ritesh Harjani
  2022-05-28 11:00 ` [PATCH v3 3/3] ext4: support flex_bg " Baokun Li
  2022-06-18  2:12 ` [PATCH v3 0/3] ext4: fix two bugs " Theodore Ts'o
  3 siblings, 1 reply; 11+ messages in thread
From: Baokun Li @ 2022-05-28 11:00 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, libaokun1

ext4_mb_normalize_request() can move logical start of allocated blocks
to reduce fragmentation and better utilize preallocation. However logical
block requested as a start of allocation (ac->ac_o_ex.fe_logical) should
always be covered by allocated blocks so we should check that by
modifying and to or in the assertion.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
V1->V2:
	Change Fixes from dfe076c106f6 to c9de560ded61.
V2->V3:
	Delete Fixes tag.
	Add more comments and commit logs to make the code easier to understand.

 fs/ext4/mballoc.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4d3740fdff90..9e06334771a3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4185,7 +4185,22 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	}
 	rcu_read_unlock();
 
-	if (start + size <= ac->ac_o_ex.fe_logical &&
+	/*
+	 * In this function "start" and "size" are normalized for better
+	 * alignment and length such that we could preallocate more blocks.
+	 * This normalization is done such that original request of
+	 * ac->ac_o_ex.fe_logical & fe_len should always lie within "start" and
+	 * "size" boundaries.
+	 * (Note fe_len can be relaxed since FS block allocation API does not
+	 * provide gurantee on number of contiguous blocks allocation since that
+	 * depends upon free space left, etc).
+	 * In case of inode pa, later we use the allocated blocks
+	 * [pa_start + fe_logical - pa_lstart, fe_len/size] from the preallocated
+	 * range of goal/best blocks [start, size] to put it at the
+	 * ac_o_ex.fe_logical extent of this inode.
+	 * (See ext4_mb_use_inode_pa() for more details)
+	 */
+	if (start + size <= ac->ac_o_ex.fe_logical ||
 			start > ac->ac_o_ex.fe_logical) {
 		ext4_msg(ac->ac_sb, KERN_ERR,
 			 "start %lu, size %lu, fe_logical %lu",
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/3] ext4: support flex_bg in ext4_mb_normalize_request
  2022-05-28 11:00 [PATCH v3 0/3] ext4: fix two bugs in ext4_mb_normalize_request Baokun Li
  2022-05-28 11:00 ` [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa Baokun Li
  2022-05-28 11:00 ` [PATCH v3 2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request Baokun Li
@ 2022-05-28 11:00 ` Baokun Li
  2022-05-28 15:09   ` Ritesh Harjani
  2022-06-18  2:12 ` [PATCH v3 0/3] ext4: fix two bugs " Theodore Ts'o
  3 siblings, 1 reply; 11+ messages in thread
From: Baokun Li @ 2022-05-28 11:00 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, libaokun1

In ext4_mb_normalize_request, the size of the allocation request is
limited to no more than EXT4_BLOCKS_PER_GROUP. Ritesh mentions that this
does not take into account the case of flex_bg groups. So we should add
support for flex_bg to make the physical blocks of large files contiguous.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/mballoc.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9e06334771a3..253fc250e9a0 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4028,6 +4028,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	loff_t size, start_off;
 	loff_t orig_size __maybe_unused;
 	ext4_lblk_t start;
+	ext4_lblk_t bpg;
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 	struct ext4_prealloc_space *pa;
 
@@ -4051,6 +4052,11 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	}
 
 	bsbits = ac->ac_sb->s_blocksize_bits;
+	bpg = EXT4_BLOCKS_PER_GROUP(ac->ac_sb);
+	if (ext4_has_feature_flex_bg(ac->ac_sb) && sbi->s_log_groups_per_flex) {
+		if (check_shl_overflow(bpg, sbi->s_log_groups_per_flex, &bpg))
+			bpg = EXT_MAX_BLOCKS;
+	}
 
 	/* first, let's learn actual file size
 	 * given current request is allocated */
@@ -4110,8 +4116,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	 * alignment does not move allocation to a different group which
 	 * makes mballoc fail assertions later.
 	 */
-	start = max(start, rounddown(ac->ac_o_ex.fe_logical,
-			(ext4_lblk_t)EXT4_BLOCKS_PER_GROUP(ac->ac_sb)));
+	start = max(start, rounddown(ac->ac_o_ex.fe_logical, bpg));
 
 	/* don't cover already allocated blocks in selected range */
 	if (ar->pleft && start <= ar->lleft) {
@@ -4125,8 +4130,8 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	 * Trim allocation request for filesystems with artificially small
 	 * groups.
 	 */
-	if (size > EXT4_BLOCKS_PER_GROUP(ac->ac_sb))
-		size = EXT4_BLOCKS_PER_GROUP(ac->ac_sb);
+	if (size > bpg)
+		size = bpg;
 
 	end = start + size;
 
@@ -4208,7 +4213,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 			 (unsigned long) ac->ac_o_ex.fe_logical);
 		BUG();
 	}
-	BUG_ON(size <= 0 || size > EXT4_BLOCKS_PER_GROUP(ac->ac_sb));
+	BUG_ON(size <= 0 || size > bpg);
 
 	/* now prepare goal request */
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/3] ext4: support flex_bg in ext4_mb_normalize_request
  2022-05-28 11:00 ` [PATCH v3 3/3] ext4: support flex_bg " Baokun Li
@ 2022-05-28 15:09   ` Ritesh Harjani
  2022-05-30  2:35     ` Baokun Li
  0 siblings, 1 reply; 11+ messages in thread
From: Ritesh Harjani @ 2022-05-28 15:09 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3

On 22/05/28 07:00PM, Baokun Li wrote:
> In ext4_mb_normalize_request, the size of the allocation request is
> limited to no more than EXT4_BLOCKS_PER_GROUP. Ritesh mentions that this
> does not take into account the case of flex_bg groups. So we should add
> support for flex_bg to make the physical blocks of large files contiguous.

My only concern here was that what if we are at the flex group end boundary and
decide to take the size as of flex group size. How are we detecting that case.

But, I haven't yet looked at this patch of yours (as I am on travel for next few days),
but if this requires further discussion, we can work on this seperately and let
the other two patches go in as those are part of the bug fixes which you
identified (just my thoughts).

-ritesh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa
  2022-05-28 11:00 ` [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa Baokun Li
@ 2022-05-28 15:10   ` Ritesh Harjani
  2022-05-30  2:12     ` Baokun Li
  0 siblings, 1 reply; 11+ messages in thread
From: Ritesh Harjani @ 2022-05-28 15:10 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, Hulk Robot

On 22/05/28 07:00PM, Baokun Li wrote:
> Hulk Robot reported a BUG_ON:
> ==================================================================
> kernel BUG at fs/ext4/mballoc.c:3211!
> [...]
> RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
> [...]
> Call Trace:
>  ext4_mb_new_blocks+0x9df/0x5d30
>  ext4_ext_map_blocks+0x1803/0x4d80
>  ext4_map_blocks+0x3a4/0x1a10
>  ext4_writepages+0x126d/0x2c30
>  do_writepages+0x7f/0x1b0
>  __filemap_fdatawrite_range+0x285/0x3b0
>  file_write_and_wait_range+0xb1/0x140
>  ext4_sync_file+0x1aa/0xca0
>  vfs_fsync_range+0xfb/0x260
>  do_fsync+0x48/0xa0
> [...]
> ==================================================================
>
> Above issue may happen as follows:
> -------------------------------------
> do_fsync
>  vfs_fsync_range
>   ext4_sync_file
>    file_write_and_wait_range
>     __filemap_fdatawrite_range
>      do_writepages
>       ext4_writepages
>        mpage_map_and_submit_extent
>         mpage_map_one_extent
>          ext4_map_blocks
>           ext4_mb_new_blocks
>            ext4_mb_normalize_request
>             >>> start + size <= ac->ac_o_ex.fe_logical
>            ext4_mb_regular_allocator
>             ext4_mb_simple_scan_group
>              ext4_mb_use_best_found
>               ext4_mb_new_preallocation
>                ext4_mb_new_inode_pa
>                 ext4_mb_use_inode_pa
>                  >>> set ac->ac_b_ex.fe_len <= 0
>            ext4_mb_mark_diskspace_used
>             >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
>
> we can easily reproduce this problem with the following commands:
> 	`fallocate -l100M disk`
> 	`mkfs.ext4 -b 1024 -g 256 disk`
> 	`mount disk /mnt`
> 	`fsstress -d /mnt -l 0 -n 1000 -p 1`
>
> The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
> Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
> when the size is truncated. So start should be the start position of
> the group where ac_o_ex.fe_logical is located after alignment.
> In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
> is very large, the value calculated by start_off is more accurate.
>
> Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
> V1->V2:
> 	Replace round_down() with rounddown().
> 	Modified comments.
> V2->V3:
> 	Convert EXT4_BLOCKS_PER_GROUP type to ext4_lblk_t
> 	to avoid compilation warnings.

Looks good to me. Feel free to add -

Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>

>
>  fs/ext4/mballoc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request
  2022-05-28 11:00 ` [PATCH v3 2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request Baokun Li
@ 2022-05-28 15:12   ` Ritesh Harjani
  2022-05-30  2:13     ` Baokun Li
  0 siblings, 1 reply; 11+ messages in thread
From: Ritesh Harjani @ 2022-05-28 15:12 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3

On 22/05/28 07:00PM, Baokun Li wrote:
> ext4_mb_normalize_request() can move logical start of allocated blocks
> to reduce fragmentation and better utilize preallocation. However logical
> block requested as a start of allocation (ac->ac_o_ex.fe_logical) should
> always be covered by allocated blocks so we should check that by
> modifying and to or in the assertion.
>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

Changes looks good to me as we discussed. Feel free to add -

Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>

> ---
> V1->V2:
> 	Change Fixes from dfe076c106f6 to c9de560ded61.
> V2->V3:
> 	Delete Fixes tag.
> 	Add more comments and commit logs to make the code easier to understand.
>
>  fs/ext4/mballoc.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 4d3740fdff90..9e06334771a3 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4185,7 +4185,22 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
>  	}
>  	rcu_read_unlock();
>
> -	if (start + size <= ac->ac_o_ex.fe_logical &&
> +	/*
> +	 * In this function "start" and "size" are normalized for better
> +	 * alignment and length such that we could preallocate more blocks.
> +	 * This normalization is done such that original request of
> +	 * ac->ac_o_ex.fe_logical & fe_len should always lie within "start" and
> +	 * "size" boundaries.
> +	 * (Note fe_len can be relaxed since FS block allocation API does not
> +	 * provide gurantee on number of contiguous blocks allocation since that
> +	 * depends upon free space left, etc).
> +	 * In case of inode pa, later we use the allocated blocks
> +	 * [pa_start + fe_logical - pa_lstart, fe_len/size] from the preallocated
> +	 * range of goal/best blocks [start, size] to put it at the
> +	 * ac_o_ex.fe_logical extent of this inode.
> +	 * (See ext4_mb_use_inode_pa() for more details)
> +	 */
> +	if (start + size <= ac->ac_o_ex.fe_logical ||
>  			start > ac->ac_o_ex.fe_logical) {
>  		ext4_msg(ac->ac_sb, KERN_ERR,
>  			 "start %lu, size %lu, fe_logical %lu",
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa
  2022-05-28 15:10   ` Ritesh Harjani
@ 2022-05-30  2:12     ` Baokun Li
  0 siblings, 0 replies; 11+ messages in thread
From: Baokun Li @ 2022-05-30  2:12 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, tytso, adilger.kernel, jack, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, Hulk Robot, Baokun Li

在 2022/5/28 23:10, Ritesh Harjani 写道:
> On 22/05/28 07:00PM, Baokun Li wrote:
>> Hulk Robot reported a BUG_ON:
>> ==================================================================
>> kernel BUG at fs/ext4/mballoc.c:3211!
>> [...]
>> RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
>> [...]
>> Call Trace:
>>   ext4_mb_new_blocks+0x9df/0x5d30
>>   ext4_ext_map_blocks+0x1803/0x4d80
>>   ext4_map_blocks+0x3a4/0x1a10
>>   ext4_writepages+0x126d/0x2c30
>>   do_writepages+0x7f/0x1b0
>>   __filemap_fdatawrite_range+0x285/0x3b0
>>   file_write_and_wait_range+0xb1/0x140
>>   ext4_sync_file+0x1aa/0xca0
>>   vfs_fsync_range+0xfb/0x260
>>   do_fsync+0x48/0xa0
>> [...]
>> ==================================================================
>>
>> Above issue may happen as follows:
>> -------------------------------------
>> do_fsync
>>   vfs_fsync_range
>>    ext4_sync_file
>>     file_write_and_wait_range
>>      __filemap_fdatawrite_range
>>       do_writepages
>>        ext4_writepages
>>         mpage_map_and_submit_extent
>>          mpage_map_one_extent
>>           ext4_map_blocks
>>            ext4_mb_new_blocks
>>             ext4_mb_normalize_request
>>              >>> start + size <= ac->ac_o_ex.fe_logical
>>             ext4_mb_regular_allocator
>>              ext4_mb_simple_scan_group
>>               ext4_mb_use_best_found
>>                ext4_mb_new_preallocation
>>                 ext4_mb_new_inode_pa
>>                  ext4_mb_use_inode_pa
>>                   >>> set ac->ac_b_ex.fe_len <= 0
>>             ext4_mb_mark_diskspace_used
>>              >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
>>
>> we can easily reproduce this problem with the following commands:
>> 	`fallocate -l100M disk`
>> 	`mkfs.ext4 -b 1024 -g 256 disk`
>> 	`mount disk /mnt`
>> 	`fsstress -d /mnt -l 0 -n 1000 -p 1`
>>
>> The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
>> Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
>> when the size is truncated. So start should be the start position of
>> the group where ac_o_ex.fe_logical is located after alignment.
>> In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
>> is very large, the value calculated by start_off is more accurate.
>>
>> Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>> V1->V2:
>> 	Replace round_down() with rounddown().
>> 	Modified comments.
>> V2->V3:
>> 	Convert EXT4_BLOCKS_PER_GROUP type to ext4_lblk_t
>> 	to avoid compilation warnings.
> Looks good to me. Feel free to add -
>
> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
>
>>   fs/ext4/mballoc.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
> .
>
Thank you for your review!
-- 
With Best Regards,
Baokun Li


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request
  2022-05-28 15:12   ` Ritesh Harjani
@ 2022-05-30  2:13     ` Baokun Li
  0 siblings, 0 replies; 11+ messages in thread
From: Baokun Li @ 2022-05-30  2:13 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, tytso, adilger.kernel, jack, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, Baokun Li

在 2022/5/28 23:12, Ritesh Harjani 写道:
> On 22/05/28 07:00PM, Baokun Li wrote:
>> ext4_mb_normalize_request() can move logical start of allocated blocks
>> to reduce fragmentation and better utilize preallocation. However logical
>> block requested as a start of allocation (ac->ac_o_ex.fe_logical) should
>> always be covered by allocated blocks so we should check that by
>> modifying and to or in the assertion.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Changes looks good to me as we discussed. Feel free to add -
>
> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
>
>> ---
>> V1->V2:
>> 	Change Fixes from dfe076c106f6 to c9de560ded61.
>> V2->V3:
>> 	Delete Fixes tag.
>> 	Add more comments and commit logs to make the code easier to understand.
>>
>>   fs/ext4/mballoc.c | 17 ++++++++++++++++-
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4d3740fdff90..9e06334771a3 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -4185,7 +4185,22 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
>>   	}
>>   	rcu_read_unlock();
>>
>> -	if (start + size <= ac->ac_o_ex.fe_logical &&
>> +	/*
>> +	 * In this function "start" and "size" are normalized for better
>> +	 * alignment and length such that we could preallocate more blocks.
>> +	 * This normalization is done such that original request of
>> +	 * ac->ac_o_ex.fe_logical & fe_len should always lie within "start" and
>> +	 * "size" boundaries.
>> +	 * (Note fe_len can be relaxed since FS block allocation API does not
>> +	 * provide gurantee on number of contiguous blocks allocation since that
>> +	 * depends upon free space left, etc).
>> +	 * In case of inode pa, later we use the allocated blocks
>> +	 * [pa_start + fe_logical - pa_lstart, fe_len/size] from the preallocated
>> +	 * range of goal/best blocks [start, size] to put it at the
>> +	 * ac_o_ex.fe_logical extent of this inode.
>> +	 * (See ext4_mb_use_inode_pa() for more details)
>> +	 */
>> +	if (start + size <= ac->ac_o_ex.fe_logical ||
>>   			start > ac->ac_o_ex.fe_logical) {
>>   		ext4_msg(ac->ac_sb, KERN_ERR,
>>   			 "start %lu, size %lu, fe_logical %lu",
>> --
>> 2.31.1
>>
> .

Thank you for your review!
-- 
With Best Regards,
Baokun Li


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 3/3] ext4: support flex_bg in ext4_mb_normalize_request
  2022-05-28 15:09   ` Ritesh Harjani
@ 2022-05-30  2:35     ` Baokun Li
  0 siblings, 0 replies; 11+ messages in thread
From: Baokun Li @ 2022-05-30  2:35 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, tytso, adilger.kernel, jack, lczerner, linux-kernel,
	yi.zhang, yebin10, yukuai3, Baokun Li

在 2022/5/28 23:09, Ritesh Harjani 写道:
> On 22/05/28 07:00PM, Baokun Li wrote:
>> In ext4_mb_normalize_request, the size of the allocation request is
>> limited to no more than EXT4_BLOCKS_PER_GROUP. Ritesh mentions that this
>> does not take into account the case of flex_bg groups. So we should add
>> support for flex_bg to make the physical blocks of large files contiguous.
> My only concern here was that what if we are at the flex group end boundary and
> decide to take the size as of flex group size. How are we detecting that case.
>
> But, I haven't yet looked at this patch of yours (as I am on travel for next few days),
> but if this requires further discussion, we can work on this seperately and let
> the other two patches go in as those are part of the bug fixes which you
> identified (just my thoughts).
>
> -ritesh
> .

All right, work on this separately.

Have a nice trip!

-- 
With Best Regards,
Baokun Li


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/3] ext4: fix two bugs in ext4_mb_normalize_request
  2022-05-28 11:00 [PATCH v3 0/3] ext4: fix two bugs in ext4_mb_normalize_request Baokun Li
                   ` (2 preceding siblings ...)
  2022-05-28 11:00 ` [PATCH v3 3/3] ext4: support flex_bg " Baokun Li
@ 2022-06-18  2:12 ` Theodore Ts'o
  3 siblings, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2022-06-18  2:12 UTC (permalink / raw)
  To: linux-ext4, Baokun Li
  Cc: Theodore Ts'o, lczerner, yukuai3, yebin10, linux-kernel,
	adilger.kernel, yi.zhang, jack, ritesh.list

On Sat, 28 May 2022 19:00:14 +0800, Baokun Li wrote:
> The logical block map reached before the problem stack was 1011.
> The estimated location of the size logical block of the inde plus
> the required allocation length 7, the size is 1018.
> 
> But the i_size of inode is 1299, so the size is 1299,
> the aligned size is 2048, and the end is 2048.
> Because of the restriction of ar -> pleft, start==648.
> 
> [...]

I've applied the first two patches, thanks!

[1/3] ext4: fix bug_on ext4_mb_use_inode_pa
      commit: 0fb337007c8cbdaef5bed798c30a82723f97f4cb
[2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request
      commit: d1389cc90702fea565a6efd4d1d0c5c8fe1cc317

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-06-18  2:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-28 11:00 [PATCH v3 0/3] ext4: fix two bugs in ext4_mb_normalize_request Baokun Li
2022-05-28 11:00 ` [PATCH v3 1/3] ext4: fix bug_on ext4_mb_use_inode_pa Baokun Li
2022-05-28 15:10   ` Ritesh Harjani
2022-05-30  2:12     ` Baokun Li
2022-05-28 11:00 ` [PATCH v3 2/3] ext4: correct the judgment of BUG in ext4_mb_normalize_request Baokun Li
2022-05-28 15:12   ` Ritesh Harjani
2022-05-30  2:13     ` Baokun Li
2022-05-28 11:00 ` [PATCH v3 3/3] ext4: support flex_bg " Baokun Li
2022-05-28 15:09   ` Ritesh Harjani
2022-05-30  2:35     ` Baokun Li
2022-06-18  2:12 ` [PATCH v3 0/3] ext4: fix two bugs " Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).