* [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing @ 2022-08-17 13:26 Baokun Li 2022-08-17 13:27 ` [PATCH 1/2] ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 Baokun Li ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Baokun Li @ 2022-08-17 13:26 UTC (permalink / raw) To: linux-ext4 Cc: tytso, adilger.kernel, jack, ritesh.list, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3, libaokun1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="y", Size: 1466 bytes --] We got a issue: the ext4 writeback process was stuck in do_writepages and do_writepages kept retrying. However, '-ENOMEM' is returned each time, even if there is still free memory on the current machine. We find that the direct cause of this issue is that the bg_inode_table_hi in the group descriptor is written to an incorrect value, which causes the inode block found through the inode table to exceed the end_ block。Then, sb_getblk always returns null, __ext4_get_inode_loc returns `-ENOMEM`, and do_writepages keeps retrying. The root cause is that the GDT is overwritten when the backup superblock is updated in the online resizing process of the disk. The prerequisite is that the block size of the disk is 1024, bigalloc and meta_bg are enabled, and sparse_super is disabled. Therefore, the check on inode_table is added to __ext4_get_inode_loc by referring to the check on inode_bitmap in ext4_read_inode_bitmap to avoid infinite loops in similar cases. In addition, the offset of the backup super block in the group in the above case is also corrected to avoid some strange problems caused by the GDT being overwritten. Baokun Li (2): ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop fs/ext4/inode.c | 10 +++++++++- fs/ext4/resize.c | 6 +++++- 2 files changed, 14 insertions(+), 2 deletions(-) -- 2.31.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 2022-08-17 13:26 [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Baokun Li @ 2022-08-17 13:27 ` Baokun Li 2022-08-17 13:27 ` [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop Baokun Li 2022-11-29 21:12 ` [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Theodore Ts'o 2 siblings, 0 replies; 15+ messages in thread From: Baokun Li @ 2022-08-17 13:27 UTC (permalink / raw) To: linux-ext4 Cc: tytso, adilger.kernel, jack, ritesh.list, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3, libaokun1 When the backup superblock is updated in update_backups, the offset of the current superblock in the group (that is, sbi->s_sbh->b_blocknr) is used as the offset of the backup superblock in the group where the backup superblock resides. When blocksize==1024, sbi->s_sbh->b_blocknr is 1. Their block distribution of groups is {0: 1-8192, 1:8193-16384...}, so the location of the backup superblock is still the first block in each group. If bigalloc is enabled at the same time, the block distribution of each group changes to {0: 0-131071, 1:131072-262143...}. In this case, update_backups overwrites the second block instead of the first block in each group that contains backup superblocks with the current superblock. As a result, both the backup superblock and the backup GDT are incorrect. This is nothing, after all, the backup GDT is only used when the disk is repaired. However, in some cases, this may cause file system corruption, data loss, and even some programs stuck in the D state. We can easily reproduce this problem with the following commands: mkfs.ext4 -F -O ^resize_inode,^sparse_super,bigalloc -b 1024 /dev/sdb 4M mount /dev/sdb /tmp/test resize2fs /dev/sdb 4G This is because the GDT for each meta_bg is placed in its first group. When sparse_super is disabled, backup superblocks exist in each group. In this case, the GDT of the new meta_bg obtained by online resizing is corrupt. To solve this issue, we only need to specify the offset of the backup superblock in the group to 0 when bigalloc is enabled. Fixes: d77147ff443b ("ext4: add support for online resizing with bigalloc") Signed-off-by: Baokun Li <libaokun1@huawei.com> --- fs/ext4/resize.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index fea2a68d067b..0146a11efd06 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1590,8 +1590,12 @@ static int ext4_flex_group_add(struct super_block *sb, EXT4_DESC_PER_BLOCK(sb)); int meta_bg = ext4_has_feature_meta_bg(sb); sector_t old_gdb = 0; + sector_t blk_off = sbi->s_sbh->b_blocknr; - update_backups(sb, sbi->s_sbh->b_blocknr, (char *)es, + if (ext4_has_feature_bigalloc(sb)) + blk_off = 0; + + update_backups(sb, blk_off, (char *)es, sizeof(struct ext4_super_block), 0); for (; gdb_num <= gdb_num_end; gdb_num++) { struct buffer_head *gdb_bh; -- 2.31.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-17 13:26 [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Baokun Li 2022-08-17 13:27 ` [PATCH 1/2] ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 Baokun Li @ 2022-08-17 13:27 ` Baokun Li 2022-08-17 14:31 ` Jan Kara 2022-11-29 21:12 ` [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Theodore Ts'o 2 siblings, 1 reply; 15+ messages in thread From: Baokun Li @ 2022-08-17 13:27 UTC (permalink / raw) To: linux-ext4 Cc: tytso, adilger.kernel, jack, ritesh.list, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3, libaokun1 In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met. In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, the function returns -ENOMEM. In __getblk_slow, if the return value of grow_buffers is less than 0, the function returns NULL. When the three processes are connected in series like the following stack, an infinite loop may occur: do_writepages <--- keep retrying ext4_writepages mpage_map_and_submit_extent mpage_map_one_extent ext4_map_blocks ext4_ext_map_blocks ext4_ext_handle_unwritten_extents ext4_ext_convert_to_initialized ext4_split_extent ext4_split_extent_at __ext4_ext_dirty __ext4_mark_inode_dirty ext4_reserve_inode_write ext4_get_inode_loc __ext4_get_inode_loc <--- return -ENOMEM sb_getblk __getblk_gfp __getblk_slow <--- return NULL grow_buffers grow_dev_page <--- return -ENXIO ret = (block < end_block) ? 1 : -ENXIO; In this issue, bg_inode_table_hi is overwritten as an incorrect value. As a result, `block < end_block` cannot be met in grow_dev_page. Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages keeps retrying. As a result, the writeback process is in the D state due to an infinite loop. Add a check on inode table block in the __ext4_get_inode_loc function by referring to ext4_read_inode_bitmap to avoid this infinite loop. Signed-off-by: Baokun Li <libaokun1@huawei.com> --- fs/ext4/inode.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 601214453c3a..5e171879fa23 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4466,9 +4466,17 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, inodes_per_block = EXT4_SB(sb)->s_inodes_per_block; inode_offset = ((ino - 1) % EXT4_INODES_PER_GROUP(sb)); - block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb); + block = ext4_inode_table(sb, gdp); + if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) || + (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) { + ext4_error(sb, "Invalid inode table block %llu in " + "block_group %u", block, iloc->block_group); + return -EFSCORRUPTED; + } + block += (inode_offset / inodes_per_block); + bh = sb_getblk(sb, block); if (unlikely(!bh)) return -ENOMEM; -- 2.31.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-17 13:27 ` [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop Baokun Li @ 2022-08-17 14:31 ` Jan Kara 2022-08-18 1:54 ` Baokun Li 2022-08-18 14:43 ` Ritesh Harjani (IBM) 0 siblings, 2 replies; 15+ messages in thread From: Jan Kara @ 2022-08-17 14:31 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3 On Wed 17-08-22 21:27:01, Baokun Li wrote: > In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" > and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met. > > In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, > the function returns -ENOMEM. > > In __getblk_slow, if the return value of grow_buffers is less than 0, > the function returns NULL. > > When the three processes are connected in series like the following stack, > an infinite loop may occur: > > do_writepages <--- keep retrying > ext4_writepages > mpage_map_and_submit_extent > mpage_map_one_extent > ext4_map_blocks > ext4_ext_map_blocks > ext4_ext_handle_unwritten_extents > ext4_ext_convert_to_initialized > ext4_split_extent > ext4_split_extent_at > __ext4_ext_dirty > __ext4_mark_inode_dirty > ext4_reserve_inode_write > ext4_get_inode_loc > __ext4_get_inode_loc <--- return -ENOMEM > sb_getblk > __getblk_gfp > __getblk_slow <--- return NULL > grow_buffers > grow_dev_page <--- return -ENXIO > ret = (block < end_block) ? 1 : -ENXIO; > > In this issue, bg_inode_table_hi is overwritten as an incorrect value. > As a result, `block < end_block` cannot be met in grow_dev_page. > Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages > keeps retrying. As a result, the writeback process is in the D state due > to an infinite loop. > > Add a check on inode table block in the __ext4_get_inode_loc function by > referring to ext4_read_inode_bitmap to avoid this infinite loop. > > Signed-off-by: Baokun Li <libaokun1@huawei.com> Thanks for the fixes. Normally, we check that inode table is fine in ext4_check_descriptors() (and those checks are much stricter) so it seems unnecessary to check it again here. I understand that in your case it was resize that corrupted the group descriptor after the filesystem was mounted which is nasty but there's much more metadata that can be corrupted like this and it's infeasible to check each metadata block before we use it. IMHO a proper fix to this class of issues would be for sb_getblk() to return proper error so that we can distinguish ENOMEM from other errors. But that will be a larger undertaking... Honza > --- > fs/ext4/inode.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 601214453c3a..5e171879fa23 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -4466,9 +4466,17 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, > inodes_per_block = EXT4_SB(sb)->s_inodes_per_block; > inode_offset = ((ino - 1) % > EXT4_INODES_PER_GROUP(sb)); > - block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); > iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb); > > + block = ext4_inode_table(sb, gdp); > + if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) || > + (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) { > + ext4_error(sb, "Invalid inode table block %llu in " > + "block_group %u", block, iloc->block_group); > + return -EFSCORRUPTED; > + } > + block += (inode_offset / inodes_per_block); > + > bh = sb_getblk(sb, block); > if (unlikely(!bh)) > return -ENOMEM; > -- > 2.31.1 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-17 14:31 ` Jan Kara @ 2022-08-18 1:54 ` Baokun Li 2022-08-18 14:43 ` Ritesh Harjani (IBM) 1 sibling, 0 replies; 15+ messages in thread From: Baokun Li @ 2022-08-18 1:54 UTC (permalink / raw) To: Jan Kara Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3, Baokun Li On 8/17/2022 10:31 PM, Jan Kara wrote: > On Wed 17-08-22 21:27:01, Baokun Li wrote: >> In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" >> and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met. >> >> In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, >> the function returns -ENOMEM. >> >> In __getblk_slow, if the return value of grow_buffers is less than 0, >> the function returns NULL. >> >> When the three processes are connected in series like the following stack, >> an infinite loop may occur: >> >> do_writepages <--- keep retrying >> ext4_writepages >> mpage_map_and_submit_extent >> mpage_map_one_extent >> ext4_map_blocks >> ext4_ext_map_blocks >> ext4_ext_handle_unwritten_extents >> ext4_ext_convert_to_initialized >> ext4_split_extent >> ext4_split_extent_at >> __ext4_ext_dirty >> __ext4_mark_inode_dirty >> ext4_reserve_inode_write >> ext4_get_inode_loc >> __ext4_get_inode_loc <--- return -ENOMEM >> sb_getblk >> __getblk_gfp >> __getblk_slow <--- return NULL >> grow_buffers >> grow_dev_page <--- return -ENXIO >> ret = (block < end_block) ? 1 : -ENXIO; >> >> In this issue, bg_inode_table_hi is overwritten as an incorrect value. >> As a result, `block < end_block` cannot be met in grow_dev_page. >> Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages >> keeps retrying. As a result, the writeback process is in the D state due >> to an infinite loop. >> >> Add a check on inode table block in the __ext4_get_inode_loc function by >> referring to ext4_read_inode_bitmap to avoid this infinite loop. >> >> Signed-off-by: Baokun Li <libaokun1@huawei.com> > Thanks for the fixes. Normally, we check that inode table is fine in > ext4_check_descriptors() (and those checks are much stricter) so it seems Yes, when I first found this problem, I thought that the inode table was an abnormal value when the disk was mounted. However, in ext4_ check_ Descriptors see the inspection of inode table. ext4_ check_ Descriptors is more strict, and it also checks blocks_ Bitmap and inode_ bitmap。 However, in addition to mounting, I can't find any changes to bg_inode_table_hi in other parts of the kernel, which makes me very confused. It wasn't until I tracked the address of the variable that I found it was modified by memcpy. This check is added here to facilitate us to find problems and avoid dead loops. Otherwise, we may not find problems until the write back process is stuck in the D state for several hours. At this time, the file system may be in a mess. > unnecessary to check it again here. I understand that in your case it was > resize that corrupted the group descriptor after the filesystem was mounted > which is nasty but there's much more metadata that can be corrupted like > this and it's infeasible to check each metadata block before we use it. Indeed, But is it true that checking for inode_bitmap in ext4_read_inode_bitmap and checking for block_bitmap in ext4_read_block_bitmap_nowait also unnecessary? After all, inode_bitmap and block_bitmap may be faulty only when the metadata is corrupted. I think it seems unreasonable that both block_bitmap and inode_bitmap have checks and inode_table does not. > > IMHO a proper fix to this class of issues would be for sb_getblk() to > return proper error so that we can distinguish ENOMEM from other errors. Totally agree! There is a FIXME in the comments of __getblk_gfp: ` ` ` __getblk_gfp() will lock up the machine if grow_dev_page's try_to_free_buffers() attempt is failing. FIXME, perhaps? ` ` ` This is the same as this issue because the return value of the grow_buffers function is not propagated correctly. > But that will be a larger undertaking... > > Honza Yes, this function can be called in many places, and external interface changes are involved. >> --- >> fs/ext4/inode.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) Thanks! -- With Best Regards, Baokun Li ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-17 14:31 ` Jan Kara 2022-08-18 1:54 ` Baokun Li @ 2022-08-18 14:43 ` Ritesh Harjani (IBM) 2022-08-18 17:23 ` Jan Kara 2022-11-28 20:44 ` Theodore Ts'o 1 sibling, 2 replies; 15+ messages in thread From: Ritesh Harjani (IBM) @ 2022-08-18 14:43 UTC (permalink / raw) To: Jan Kara Cc: Baokun Li, linux-ext4, tytso, adilger.kernel, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3 On 22/08/17 04:31PM, Jan Kara wrote: > On Wed 17-08-22 21:27:01, Baokun Li wrote: > > In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" > > and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met. > > > > In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, > > the function returns -ENOMEM. > > > > In __getblk_slow, if the return value of grow_buffers is less than 0, > > the function returns NULL. > > > > When the three processes are connected in series like the following stack, > > an infinite loop may occur: > > > > do_writepages <--- keep retrying > > ext4_writepages > > mpage_map_and_submit_extent > > mpage_map_one_extent > > ext4_map_blocks > > ext4_ext_map_blocks > > ext4_ext_handle_unwritten_extents > > ext4_ext_convert_to_initialized > > ext4_split_extent > > ext4_split_extent_at > > __ext4_ext_dirty > > __ext4_mark_inode_dirty > > ext4_reserve_inode_write > > ext4_get_inode_loc > > __ext4_get_inode_loc <--- return -ENOMEM > > sb_getblk > > __getblk_gfp > > __getblk_slow <--- return NULL > > grow_buffers > > grow_dev_page <--- return -ENXIO > > ret = (block < end_block) ? 1 : -ENXIO; > > > > In this issue, bg_inode_table_hi is overwritten as an incorrect value. > > As a result, `block < end_block` cannot be met in grow_dev_page. > > Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages > > keeps retrying. As a result, the writeback process is in the D state due > > to an infinite loop. > > > > Add a check on inode table block in the __ext4_get_inode_loc function by > > referring to ext4_read_inode_bitmap to avoid this infinite loop. > > > > Signed-off-by: Baokun Li <libaokun1@huawei.com> > > Thanks for the fixes. Normally, we check that inode table is fine in > ext4_check_descriptors() (and those checks are much stricter) so it seems > unnecessary to check it again here. I understand that in your case it was > resize that corrupted the group descriptor after the filesystem was mounted > which is nasty but there's much more metadata that can be corrupted like > this and it's infeasible to check each metadata block before we use it. > > IMHO a proper fix to this class of issues would be for sb_getblk() to > return proper error so that we can distinguish ENOMEM from other errors. > But that will be a larger undertaking... > Hi Jan, How about adding a wrapper around sb_getblk() which will do the basic block bound checks for ext4. Then we can carefully convert all the callers of sb_getblk() in ext4 to call ext4_sb_getblk(). ext4_sb_getblk() will then return either of below - 1. ERR_PTR(-EFSCORRUPTED) 2. NULL 3. struct buffer_head* It's caller can then implement the proper error handling. Folding a small patch to implement the simple bound check. Is this the right approach? From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Date: Thu, 18 Aug 2022 07:53:58 -0500 Subject: [RFC] ext4: Add ext4_sb_getblk() wrapper for block bounds checking We might need more bounds checking on the block before calling sb_getblk(). This helper does that and if it is not valid then returns ERR_PTR(-EFSCORRUPTED) Later we will need to carefully convert the callers to use ext4_sb_getblk() instead of sb_getblk(). For e.g. __ext4_get_inode_loc() Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/ext4/block_validity.c | 4 +--- fs/ext4/ext4.h | 12 ++++++++++++ fs/ext4/super.c | 9 +++++++++ 3 files changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c index 5504f72bbbbe..69347fab7c38 100644 --- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -301,9 +301,7 @@ int ext4_sb_block_valid(struct super_block *sb, struct inode *inode, struct rb_node *n; int ret = 1; - if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) || - (start_blk + count < start_blk) || - (start_blk + count > ext4_blocks_count(sbi->s_es))) + if (!ext4_sb_block_valid_fastcheck(sbi->s_es, start_blk, count)) return 0; /* diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9bca5565547b..79d0e45185d3 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3072,6 +3072,8 @@ extern struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block, blk_opf_t op_flags); extern struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb, sector_t block); +extern struct buffer_head *ext4_sb_getblk(struct super_block *sb, + sector_t block); extern void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags, bh_end_io_t *end_io); extern int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags, @@ -3358,6 +3360,16 @@ static inline unsigned int ext4_flex_bg_size(struct ext4_sb_info *sbi) return 1 << sbi->s_log_groups_per_flex; } +static inline bool ext4_sb_block_valid_fastcheck(struct ext4_super_block *es, + sector_t start_blk, unsigned int count) +{ + if ((start_blk <= le32_to_cpu(es->s_first_data_block)) || + (start_blk + count < start_blk) || + (start_blk + count > ext4_blocks_count(es))) + return false; + return true; +} + #define ext4_std_error(sb, errno) \ do { \ if ((errno)) \ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 9a66abcca1a8..5b29272ad9a9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -269,6 +269,15 @@ void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block) } } +struct buffer_head *ext4_sb_getblk(struct super_block *sb, sector_t block) +{ + struct ext4_super_block *es = EXT4_SB(sb)->s_es; + + if (!ext4_sb_block_valid_fastcheck(es, block, 1)) + return ERR_PTR(-EFSCORRUPTED); + return sb_getblk(sb, block); +} + static int ext4_verify_csum_type(struct super_block *sb, struct ext4_super_block *es) { -- 2.25.1 -ritesh > Honza > > > --- > > fs/ext4/inode.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 601214453c3a..5e171879fa23 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -4466,9 +4466,17 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, > > inodes_per_block = EXT4_SB(sb)->s_inodes_per_block; > > inode_offset = ((ino - 1) % > > EXT4_INODES_PER_GROUP(sb)); > > - block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); > > iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb); > > > > + block = ext4_inode_table(sb, gdp); > > + if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) || > > + (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) { > > + ext4_error(sb, "Invalid inode table block %llu in " > > + "block_group %u", block, iloc->block_group); > > + return -EFSCORRUPTED; > > + } > > + block += (inode_offset / inodes_per_block); > > + > > bh = sb_getblk(sb, block); > > if (unlikely(!bh)) > > return -ENOMEM; > > -- > > 2.31.1 > > > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-18 14:43 ` Ritesh Harjani (IBM) @ 2022-08-18 17:23 ` Jan Kara 2022-08-18 23:15 ` Ritesh Harjani (IBM) 2022-11-28 20:44 ` Theodore Ts'o 1 sibling, 1 reply; 15+ messages in thread From: Jan Kara @ 2022-08-18 17:23 UTC (permalink / raw) To: Ritesh Harjani (IBM) Cc: Jan Kara, Baokun Li, linux-ext4, tytso, adilger.kernel, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3 On Thu 18-08-22 20:13:53, Ritesh Harjani (IBM) wrote: > On 22/08/17 04:31PM, Jan Kara wrote: > > On Wed 17-08-22 21:27:01, Baokun Li wrote: > > > In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" > > > and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met. > > > > > > In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, > > > the function returns -ENOMEM. > > > > > > In __getblk_slow, if the return value of grow_buffers is less than 0, > > > the function returns NULL. > > > > > > When the three processes are connected in series like the following stack, > > > an infinite loop may occur: > > > > > > do_writepages <--- keep retrying > > > ext4_writepages > > > mpage_map_and_submit_extent > > > mpage_map_one_extent > > > ext4_map_blocks > > > ext4_ext_map_blocks > > > ext4_ext_handle_unwritten_extents > > > ext4_ext_convert_to_initialized > > > ext4_split_extent > > > ext4_split_extent_at > > > __ext4_ext_dirty > > > __ext4_mark_inode_dirty > > > ext4_reserve_inode_write > > > ext4_get_inode_loc > > > __ext4_get_inode_loc <--- return -ENOMEM > > > sb_getblk > > > __getblk_gfp > > > __getblk_slow <--- return NULL > > > grow_buffers > > > grow_dev_page <--- return -ENXIO > > > ret = (block < end_block) ? 1 : -ENXIO; > > > > > > In this issue, bg_inode_table_hi is overwritten as an incorrect value. > > > As a result, `block < end_block` cannot be met in grow_dev_page. > > > Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages > > > keeps retrying. As a result, the writeback process is in the D state due > > > to an infinite loop. > > > > > > Add a check on inode table block in the __ext4_get_inode_loc function by > > > referring to ext4_read_inode_bitmap to avoid this infinite loop. > > > > > > Signed-off-by: Baokun Li <libaokun1@huawei.com> > > > > Thanks for the fixes. Normally, we check that inode table is fine in > > ext4_check_descriptors() (and those checks are much stricter) so it seems > > unnecessary to check it again here. I understand that in your case it was > > resize that corrupted the group descriptor after the filesystem was mounted > > which is nasty but there's much more metadata that can be corrupted like > > this and it's infeasible to check each metadata block before we use it. > > > > IMHO a proper fix to this class of issues would be for sb_getblk() to > > return proper error so that we can distinguish ENOMEM from other errors. > > But that will be a larger undertaking... > > > > Hi Jan, > > How about adding a wrapper around sb_getblk() which will do the basic block > bound checks for ext4. Then we can carefully convert all the callers of > sb_getblk() in ext4 to call ext4_sb_getblk(). > > ext4_sb_getblk() will then return either of below - > 1. ERR_PTR(-EFSCORRUPTED) > 2. NULL > 3. struct buffer_head* > > It's caller can then implement the proper error handling. > > Folding a small patch to implement the simple bound check. Is this the right > approach? Yep, looks sensible to me. Maybe I'd just make ext4_sb_getblk() return bh or ERR_PTR so something like ERR_PTR(-EFSCORRUPTED), ERR_PTR(-ENXIO), or bh pointer. Honza > > From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> > Date: Thu, 18 Aug 2022 07:53:58 -0500 > Subject: [RFC] ext4: Add ext4_sb_getblk() wrapper for block bounds checking > > We might need more bounds checking on the block before calling sb_getblk(). > This helper does that and if it is not valid then returns ERR_PTR(-EFSCORRUPTED) > Later we will need to carefully convert the callers to use ext4_sb_getblk() > instead of sb_getblk(). > For e.g. __ext4_get_inode_loc() > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > --- > fs/ext4/block_validity.c | 4 +--- > fs/ext4/ext4.h | 12 ++++++++++++ > fs/ext4/super.c | 9 +++++++++ > 3 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c > index 5504f72bbbbe..69347fab7c38 100644 > --- a/fs/ext4/block_validity.c > +++ b/fs/ext4/block_validity.c > @@ -301,9 +301,7 @@ int ext4_sb_block_valid(struct super_block *sb, struct inode *inode, > struct rb_node *n; > int ret = 1; > > - if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) || > - (start_blk + count < start_blk) || > - (start_blk + count > ext4_blocks_count(sbi->s_es))) > + if (!ext4_sb_block_valid_fastcheck(sbi->s_es, start_blk, count)) > return 0; > > /* > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 9bca5565547b..79d0e45185d3 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -3072,6 +3072,8 @@ extern struct buffer_head *ext4_sb_bread(struct super_block *sb, > sector_t block, blk_opf_t op_flags); > extern struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb, > sector_t block); > +extern struct buffer_head *ext4_sb_getblk(struct super_block *sb, > + sector_t block); > extern void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags, > bh_end_io_t *end_io); > extern int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags, > @@ -3358,6 +3360,16 @@ static inline unsigned int ext4_flex_bg_size(struct ext4_sb_info *sbi) > return 1 << sbi->s_log_groups_per_flex; > } > > +static inline bool ext4_sb_block_valid_fastcheck(struct ext4_super_block *es, > + sector_t start_blk, unsigned int count) > +{ > + if ((start_blk <= le32_to_cpu(es->s_first_data_block)) || > + (start_blk + count < start_blk) || > + (start_blk + count > ext4_blocks_count(es))) > + return false; > + return true; > +} > + > #define ext4_std_error(sb, errno) \ > do { \ > if ((errno)) \ > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 9a66abcca1a8..5b29272ad9a9 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -269,6 +269,15 @@ void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block) > } > } > > +struct buffer_head *ext4_sb_getblk(struct super_block *sb, sector_t block) > +{ > + struct ext4_super_block *es = EXT4_SB(sb)->s_es; > + > + if (!ext4_sb_block_valid_fastcheck(es, block, 1)) > + return ERR_PTR(-EFSCORRUPTED); > + return sb_getblk(sb, block); > +} > + > static int ext4_verify_csum_type(struct super_block *sb, > struct ext4_super_block *es) > { > -- > 2.25.1 > > -ritesh > > > > > Honza > > > > > --- > > > fs/ext4/inode.c | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > > index 601214453c3a..5e171879fa23 100644 > > > --- a/fs/ext4/inode.c > > > +++ b/fs/ext4/inode.c > > > @@ -4466,9 +4466,17 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, > > > inodes_per_block = EXT4_SB(sb)->s_inodes_per_block; > > > inode_offset = ((ino - 1) % > > > EXT4_INODES_PER_GROUP(sb)); > > > - block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); > > > iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb); > > > > > > + block = ext4_inode_table(sb, gdp); > > > + if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) || > > > + (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) { > > > + ext4_error(sb, "Invalid inode table block %llu in " > > > + "block_group %u", block, iloc->block_group); > > > + return -EFSCORRUPTED; > > > + } > > > + block += (inode_offset / inodes_per_block); > > > + > > > bh = sb_getblk(sb, block); > > > if (unlikely(!bh)) > > > return -ENOMEM; > > > -- > > > 2.31.1 > > > > > -- > > Jan Kara <jack@suse.com> > > SUSE Labs, CR -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-18 17:23 ` Jan Kara @ 2022-08-18 23:15 ` Ritesh Harjani (IBM) 2022-08-19 8:44 ` Jan Kara 0 siblings, 1 reply; 15+ messages in thread From: Ritesh Harjani (IBM) @ 2022-08-18 23:15 UTC (permalink / raw) To: Jan Kara Cc: Baokun Li, linux-ext4, tytso, adilger.kernel, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3 On 22/08/18 07:23PM, Jan Kara wrote: > On Thu 18-08-22 20:13:53, Ritesh Harjani (IBM) wrote: > > On 22/08/17 04:31PM, Jan Kara wrote: > > > On Wed 17-08-22 21:27:01, Baokun Li wrote: > > > > In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" > > > > and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met. > > > > > > > > In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, > > > > the function returns -ENOMEM. > > > > > > > > In __getblk_slow, if the return value of grow_buffers is less than 0, > > > > the function returns NULL. > > > > > > > > When the three processes are connected in series like the following stack, > > > > an infinite loop may occur: > > > > > > > > do_writepages <--- keep retrying > > > > ext4_writepages > > > > mpage_map_and_submit_extent > > > > mpage_map_one_extent > > > > ext4_map_blocks > > > > ext4_ext_map_blocks > > > > ext4_ext_handle_unwritten_extents > > > > ext4_ext_convert_to_initialized > > > > ext4_split_extent > > > > ext4_split_extent_at > > > > __ext4_ext_dirty > > > > __ext4_mark_inode_dirty > > > > ext4_reserve_inode_write > > > > ext4_get_inode_loc > > > > __ext4_get_inode_loc <--- return -ENOMEM > > > > sb_getblk > > > > __getblk_gfp > > > > __getblk_slow <--- return NULL > > > > grow_buffers > > > > grow_dev_page <--- return -ENXIO > > > > ret = (block < end_block) ? 1 : -ENXIO; > > > > > > > > In this issue, bg_inode_table_hi is overwritten as an incorrect value. > > > > As a result, `block < end_block` cannot be met in grow_dev_page. > > > > Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages > > > > keeps retrying. As a result, the writeback process is in the D state due > > > > to an infinite loop. > > > > > > > > Add a check on inode table block in the __ext4_get_inode_loc function by > > > > referring to ext4_read_inode_bitmap to avoid this infinite loop. > > > > > > > > Signed-off-by: Baokun Li <libaokun1@huawei.com> > > > > > > Thanks for the fixes. Normally, we check that inode table is fine in > > > ext4_check_descriptors() (and those checks are much stricter) so it seems > > > unnecessary to check it again here. I understand that in your case it was > > > resize that corrupted the group descriptor after the filesystem was mounted > > > which is nasty but there's much more metadata that can be corrupted like > > > this and it's infeasible to check each metadata block before we use it. > > > > > > IMHO a proper fix to this class of issues would be for sb_getblk() to > > > return proper error so that we can distinguish ENOMEM from other errors. > > > But that will be a larger undertaking... > > > > > > > Hi Jan, > > > > How about adding a wrapper around sb_getblk() which will do the basic block > > bound checks for ext4. Then we can carefully convert all the callers of > > sb_getblk() in ext4 to call ext4_sb_getblk(). > > > > ext4_sb_getblk() will then return either of below - > > 1. ERR_PTR(-EFSCORRUPTED) > > 2. NULL > > 3. struct buffer_head* > > > > It's caller can then implement the proper error handling. > > > > Folding a small patch to implement the simple bound check. Is this the right > > approach? > > Yep, looks sensible to me. Maybe I'd just make ext4_sb_getblk() return bh > or ERR_PTR so something like ERR_PTR(-EFSCORRUPTED), ERR_PTR(-ENXIO), or bh > pointer. Sure, Thanks Jan. Will do that once I clear some confusion w.r.t "start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)" At some places this is checked with "<= s_first_data_block" e.g. fs/ext4/ialloc.c, ext4_sb_block_valid() while at some places I see it to be "< s_first_data_block" e.g. fs/ext4/mballoc.c, fs/ext4/mmp.c Will spend sometime to understand why the difference and if there is anything I might miss here for off-by-one check. Adding more to the confusion would be difference w.r.t blocksize = 1024 v/s other blocksizes. Based on the blocksize value I guess, s_first_data_block can be different (0/1??). Or can bigalloc can change this... ...Will look more into this. -ritesh > > Honza > > > > > From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> > > Date: Thu, 18 Aug 2022 07:53:58 -0500 > > Subject: [RFC] ext4: Add ext4_sb_getblk() wrapper for block bounds checking > > > > We might need more bounds checking on the block before calling sb_getblk(). > > This helper does that and if it is not valid then returns ERR_PTR(-EFSCORRUPTED) > > Later we will need to carefully convert the callers to use ext4_sb_getblk() > > instead of sb_getblk(). > > For e.g. __ext4_get_inode_loc() > > > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > > --- > > fs/ext4/block_validity.c | 4 +--- > > fs/ext4/ext4.h | 12 ++++++++++++ > > fs/ext4/super.c | 9 +++++++++ > > 3 files changed, 22 insertions(+), 3 deletions(-) > > > > diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c > > index 5504f72bbbbe..69347fab7c38 100644 > > --- a/fs/ext4/block_validity.c > > +++ b/fs/ext4/block_validity.c > > @@ -301,9 +301,7 @@ int ext4_sb_block_valid(struct super_block *sb, struct inode *inode, > > struct rb_node *n; > > int ret = 1; > > > > - if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) || > > - (start_blk + count < start_blk) || > > - (start_blk + count > ext4_blocks_count(sbi->s_es))) > > + if (!ext4_sb_block_valid_fastcheck(sbi->s_es, start_blk, count)) > > return 0; > > > > /* > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > > index 9bca5565547b..79d0e45185d3 100644 > > --- a/fs/ext4/ext4.h > > +++ b/fs/ext4/ext4.h > > @@ -3072,6 +3072,8 @@ extern struct buffer_head *ext4_sb_bread(struct super_block *sb, > > sector_t block, blk_opf_t op_flags); > > extern struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb, > > sector_t block); > > +extern struct buffer_head *ext4_sb_getblk(struct super_block *sb, > > + sector_t block); > > extern void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags, > > bh_end_io_t *end_io); > > extern int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags, > > @@ -3358,6 +3360,16 @@ static inline unsigned int ext4_flex_bg_size(struct ext4_sb_info *sbi) > > return 1 << sbi->s_log_groups_per_flex; > > } > > > > +static inline bool ext4_sb_block_valid_fastcheck(struct ext4_super_block *es, > > + sector_t start_blk, unsigned int count) > > +{ > > + if ((start_blk <= le32_to_cpu(es->s_first_data_block)) || > > + (start_blk + count < start_blk) || > > + (start_blk + count > ext4_blocks_count(es))) > > + return false; > > + return true; > > +} > > + > > #define ext4_std_error(sb, errno) \ > > do { \ > > if ((errno)) \ > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > > index 9a66abcca1a8..5b29272ad9a9 100644 > > --- a/fs/ext4/super.c > > +++ b/fs/ext4/super.c > > @@ -269,6 +269,15 @@ void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block) > > } > > } > > > > +struct buffer_head *ext4_sb_getblk(struct super_block *sb, sector_t block) > > +{ > > + struct ext4_super_block *es = EXT4_SB(sb)->s_es; > > + > > + if (!ext4_sb_block_valid_fastcheck(es, block, 1)) > > + return ERR_PTR(-EFSCORRUPTED); > > + return sb_getblk(sb, block); > > +} > > + > > static int ext4_verify_csum_type(struct super_block *sb, > > struct ext4_super_block *es) > > { > > -- > > 2.25.1 > > > > -ritesh > > > > > > > > > Honza > > > > > > > --- > > > > fs/ext4/inode.c | 10 +++++++++- > > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > > > index 601214453c3a..5e171879fa23 100644 > > > > --- a/fs/ext4/inode.c > > > > +++ b/fs/ext4/inode.c > > > > @@ -4466,9 +4466,17 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, > > > > inodes_per_block = EXT4_SB(sb)->s_inodes_per_block; > > > > inode_offset = ((ino - 1) % > > > > EXT4_INODES_PER_GROUP(sb)); > > > > - block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); > > > > iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb); > > > > > > > > + block = ext4_inode_table(sb, gdp); > > > > + if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) || > > > > + (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) { > > > > + ext4_error(sb, "Invalid inode table block %llu in " > > > > + "block_group %u", block, iloc->block_group); > > > > + return -EFSCORRUPTED; > > > > + } > > > > + block += (inode_offset / inodes_per_block); > > > > + > > > > bh = sb_getblk(sb, block); > > > > if (unlikely(!bh)) > > > > return -ENOMEM; > > > > -- > > > > 2.31.1 > > > > > > > -- > > > Jan Kara <jack@suse.com> > > > SUSE Labs, CR > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-18 23:15 ` Ritesh Harjani (IBM) @ 2022-08-19 8:44 ` Jan Kara 0 siblings, 0 replies; 15+ messages in thread From: Jan Kara @ 2022-08-19 8:44 UTC (permalink / raw) To: Ritesh Harjani (IBM) Cc: Jan Kara, Baokun Li, linux-ext4, tytso, adilger.kernel, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3 On Fri 19-08-22 04:45:41, Ritesh Harjani (IBM) wrote: > On 22/08/18 07:23PM, Jan Kara wrote: > > On Thu 18-08-22 20:13:53, Ritesh Harjani (IBM) wrote: > > > On 22/08/17 04:31PM, Jan Kara wrote: > > > > On Wed 17-08-22 21:27:01, Baokun Li wrote: > > > > > In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" > > > > > and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met. > > > > > > > > > > In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, > > > > > the function returns -ENOMEM. > > > > > > > > > > In __getblk_slow, if the return value of grow_buffers is less than 0, > > > > > the function returns NULL. > > > > > > > > > > When the three processes are connected in series like the following stack, > > > > > an infinite loop may occur: > > > > > > > > > > do_writepages <--- keep retrying > > > > > ext4_writepages > > > > > mpage_map_and_submit_extent > > > > > mpage_map_one_extent > > > > > ext4_map_blocks > > > > > ext4_ext_map_blocks > > > > > ext4_ext_handle_unwritten_extents > > > > > ext4_ext_convert_to_initialized > > > > > ext4_split_extent > > > > > ext4_split_extent_at > > > > > __ext4_ext_dirty > > > > > __ext4_mark_inode_dirty > > > > > ext4_reserve_inode_write > > > > > ext4_get_inode_loc > > > > > __ext4_get_inode_loc <--- return -ENOMEM > > > > > sb_getblk > > > > > __getblk_gfp > > > > > __getblk_slow <--- return NULL > > > > > grow_buffers > > > > > grow_dev_page <--- return -ENXIO > > > > > ret = (block < end_block) ? 1 : -ENXIO; > > > > > > > > > > In this issue, bg_inode_table_hi is overwritten as an incorrect value. > > > > > As a result, `block < end_block` cannot be met in grow_dev_page. > > > > > Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages > > > > > keeps retrying. As a result, the writeback process is in the D state due > > > > > to an infinite loop. > > > > > > > > > > Add a check on inode table block in the __ext4_get_inode_loc function by > > > > > referring to ext4_read_inode_bitmap to avoid this infinite loop. > > > > > > > > > > Signed-off-by: Baokun Li <libaokun1@huawei.com> > > > > > > > > Thanks for the fixes. Normally, we check that inode table is fine in > > > > ext4_check_descriptors() (and those checks are much stricter) so it seems > > > > unnecessary to check it again here. I understand that in your case it was > > > > resize that corrupted the group descriptor after the filesystem was mounted > > > > which is nasty but there's much more metadata that can be corrupted like > > > > this and it's infeasible to check each metadata block before we use it. > > > > > > > > IMHO a proper fix to this class of issues would be for sb_getblk() to > > > > return proper error so that we can distinguish ENOMEM from other errors. > > > > But that will be a larger undertaking... > > > > > > > > > > Hi Jan, > > > > > > How about adding a wrapper around sb_getblk() which will do the basic block > > > bound checks for ext4. Then we can carefully convert all the callers of > > > sb_getblk() in ext4 to call ext4_sb_getblk(). > > > > > > ext4_sb_getblk() will then return either of below - > > > 1. ERR_PTR(-EFSCORRUPTED) > > > 2. NULL > > > 3. struct buffer_head* > > > > > > It's caller can then implement the proper error handling. > > > > > > Folding a small patch to implement the simple bound check. Is this the right > > > approach? > > > > Yep, looks sensible to me. Maybe I'd just make ext4_sb_getblk() return bh > > or ERR_PTR so something like ERR_PTR(-EFSCORRUPTED), ERR_PTR(-ENXIO), or bh > > pointer. > > Sure, Thanks Jan. Will do that once I clear some confusion w.r.t > "start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)" > > At some places this is checked with "<= s_first_data_block" > e.g. fs/ext4/ialloc.c, ext4_sb_block_valid() > > while at some places I see it to be "< s_first_data_block" > e.g. fs/ext4/mballoc.c, fs/ext4/mmp.c Well, superblock is stored at s_first_data_block offset. So strictly speaking the check should be < s_first_data_block because that block is a valid filesystem block. OTOH in most places you are not supposed to look at block with the superblock so stricter <= s_first_data_block is fine. > Will spend sometime to understand why the difference and if there is anything > I might miss here for off-by-one check. > > Adding more to the confusion would be difference w.r.t blocksize = 1024 v/s > other blocksizes. Based on the blocksize value I guess, s_first_data_block can > be different (0/1??). Or can bigalloc can change this... > ...Will look more into this. That's what we discussed on our ext4 call yesterday. Normally, s_first_data_block is 1 for blocksize 1k and 0 for blocksize > 1k. So for 1k blocksize the first group begins with block 1 and not block 0. Effectively the whole filesystem is shifted by 1 block with 1k blocksize. When bigalloc comes to play and blocksize is 1k, things are even more interesting because there, the first group starts at block 0 while s_first_data_block is still 1, because superblock is stored in block 1 of cluster 0. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-08-18 14:43 ` Ritesh Harjani (IBM) 2022-08-18 17:23 ` Jan Kara @ 2022-11-28 20:44 ` Theodore Ts'o 2022-11-29 8:54 ` Ritesh Harjani (IBM) 1 sibling, 1 reply; 15+ messages in thread From: Theodore Ts'o @ 2022-11-28 20:44 UTC (permalink / raw) To: Ritesh Harjani (IBM) Cc: Jan Kara, Baokun Li, linux-ext4, adilger.kernel, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3 On Thu, Aug 18, 2022 at 08:13:53PM +0530, Ritesh Harjani (IBM) wrote: > Folding a small patch to implement the simple bound check. Is this the right > approach? > > From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> > Date: Thu, 18 Aug 2022 07:53:58 -0500 > Subject: [RFC] ext4: Add ext4_sb_getblk() wrapper for block bounds checking > > We might need more bounds checking on the block before calling sb_getblk(). > This helper does that and if it is not valid then returns ERR_PTR(-EFSCORRUPTED) > Later we will need to carefully convert the callers to use ext4_sb_getblk() > instead of sb_getblk(). Hey Ritesh, I was going through some old patches and came across this RFC patch. Have you had a chance to polish this up? I don't think I've seen a newer version of this patch, but maybe I missed it. Thanks, - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop 2022-11-28 20:44 ` Theodore Ts'o @ 2022-11-29 8:54 ` Ritesh Harjani (IBM) 0 siblings, 0 replies; 15+ messages in thread From: Ritesh Harjani (IBM) @ 2022-11-29 8:54 UTC (permalink / raw) To: Theodore Ts'o Cc: Jan Kara, Baokun Li, linux-ext4, adilger.kernel, lczerner, enwlinux, linux-kernel, yi.zhang, yebin10, yukuai3 On 22/11/28 03:44PM, Theodore Ts'o wrote: > On Thu, Aug 18, 2022 at 08:13:53PM +0530, Ritesh Harjani (IBM) wrote: > > Folding a small patch to implement the simple bound check. Is this the right > > approach? > > > > From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> > > Date: Thu, 18 Aug 2022 07:53:58 -0500 > > Subject: [RFC] ext4: Add ext4_sb_getblk() wrapper for block bounds checking > > > > We might need more bounds checking on the block before calling sb_getblk(). > > This helper does that and if it is not valid then returns ERR_PTR(-EFSCORRUPTED) > > Later we will need to carefully convert the callers to use ext4_sb_getblk() > > instead of sb_getblk(). > > Hey Ritesh, > > I was going through some old patches and came across this RFC patch. > Have you had a chance to polish this up? I don't think I've seen a > newer version of this patch, but maybe I missed it. Hello Ted, Sorry, I guess I completely forgot about this, since around those dates I was travelling. I will definitely try and get back to this once I finish few ongoing things on my plate. Thanks for bringing it up! -ritesh ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing 2022-08-17 13:26 [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Baokun Li 2022-08-17 13:27 ` [PATCH 1/2] ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 Baokun Li 2022-08-17 13:27 ` [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop Baokun Li @ 2022-11-29 21:12 ` Theodore Ts'o 2022-11-30 2:08 ` Baokun Li 2 siblings, 1 reply; 15+ messages in thread From: Theodore Ts'o @ 2022-11-29 21:12 UTC (permalink / raw) To: linux-ext4, Baokun Li Cc: Theodore Ts'o, linux-kernel, enwlinux, jack, lczerner, yukuai3, yi.zhang, yebin10, ritesh.list, adilger.kernel On Wed, 17 Aug 2022 21:26:59 +0800, Baokun Li wrote: > We got a issue: the ext4 writeback process was stuck in do_writepages and > do_writepages kept retrying. However, '-ENOMEM' is returned each time, even > if there is still free memory on the current machine. > > We find that the direct cause of this issue is that the bg_inode_table_hi > in the group descriptor is written to an incorrect value, which causes the > inode block found through the inode table to exceed the end_ block。Then, > sb_getblk always returns null, __ext4_get_inode_loc returns `-ENOMEM`, > and do_writepages keeps retrying. > > [...] Applied, thanks! [1/2] ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 commit: 496fb12f8e236f303de6bc73a0334dd50c4eb64a [2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop commit: bfb0625e8e86f8797264b1c7d10e146afe243d23 Best regards, -- Theodore Ts'o <tytso@mit.edu> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing 2022-11-29 21:12 ` [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Theodore Ts'o @ 2022-11-30 2:08 ` Baokun Li 2022-12-01 3:42 ` Theodore Ts'o 0 siblings, 1 reply; 15+ messages in thread From: Baokun Li @ 2022-11-30 2:08 UTC (permalink / raw) To: Theodore Ts'o, linux-ext4 Cc: linux-kernel, enwlinux, jack, lczerner, yukuai3, yi.zhang, yebin10, ritesh.list, adilger.kernel On 2022/11/30 5:12, Theodore Ts'o wrote: > On Wed, 17 Aug 2022 21:26:59 +0800, Baokun Li wrote: >> We got a issue: the ext4 writeback process was stuck in do_writepages and >> do_writepages kept retrying. However, '-ENOMEM' is returned each time, even >> if there is still free memory on the current machine. >> >> We find that the direct cause of this issue is that the bg_inode_table_hi >> in the group descriptor is written to an incorrect value, which causes the >> inode block found through the inode table to exceed the end_ block。Then, >> sb_getblk always returns null, __ext4_get_inode_loc returns `-ENOMEM`, >> and do_writepages keeps retrying. >> >> [...] > Applied, thanks! > > [1/2] ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 > commit: 496fb12f8e236f303de6bc73a0334dd50c4eb64a > [2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop > commit: bfb0625e8e86f8797264b1c7d10e146afe243d23 > > Best regards, Hi Theodore, Thank you very much for applying this patch set! But I thought this patch set was discarded because there was no "Reviewed-by". And a few days ago, I came up with a better solution to the problem fixed by PATCH 1/2. The new solution is called "ext4: fix corruption when online resizing a 1K bigalloc fs", which is in another patch set ("[PATCH v3 0/3] ext4: fix some bugs in online resize") that fixes some online resize problems. This patch set has been reviewed, and I would appreciate it if you could revert PATCH 1/2 and apply the patch set containing the new solution. Sorry for wasting your time without stating that a new solution is available after the old patch. Thanks again! -- With Best Regards, Baokun Li ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing 2022-11-30 2:08 ` Baokun Li @ 2022-12-01 3:42 ` Theodore Ts'o 2022-12-01 6:26 ` Baokun Li 0 siblings, 1 reply; 15+ messages in thread From: Theodore Ts'o @ 2022-12-01 3:42 UTC (permalink / raw) To: Baokun Li Cc: linux-ext4, linux-kernel, enwlinux, jack, lczerner, yukuai3, yi.zhang, yebin10, ritesh.list, adilger.kernel On Wed, Nov 30, 2022 at 10:08:13AM +0800, Baokun Li wrote: > > But I thought this patch set was discarded because there was no > "Reviewed-by". No, it had just slipped through the cracks; and when I was going back through patchwork to see what patches were outstanding, I came across the patch set which had gotten missed during the last cycle; and I didn't notice that it had been superceded with a newer patch which achieved the same goal. I'll drop it from the ext4 tree and then pick up your newer series. - Ted ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing 2022-12-01 3:42 ` Theodore Ts'o @ 2022-12-01 6:26 ` Baokun Li 0 siblings, 0 replies; 15+ messages in thread From: Baokun Li @ 2022-12-01 6:26 UTC (permalink / raw) To: Theodore Ts'o Cc: linux-ext4, linux-kernel, enwlinux, jack, lczerner, yukuai3, yi.zhang, yebin10, ritesh.list, adilger.kernel On 2022/12/1 11:42, Theodore Ts'o wrote: > On Wed, Nov 30, 2022 at 10:08:13AM +0800, Baokun Li wrote: >> But I thought this patch set was discarded because there was no >> "Reviewed-by". > No, it had just slipped through the cracks; and when I was going back > through patchwork to see what patches were outstanding, I came across > the patch set which had gotten missed during the last cycle; and I > didn't notice that it had been superceded with a newer patch which > achieved the same goal. > > I'll drop it from the ext4 tree and then pick up your newer series. > > - Ted > Thank you very much! 😉 -- With Best Regards, Baokun Li . ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2022-12-01 6:26 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-17 13:26 [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Baokun Li 2022-08-17 13:27 ` [PATCH 1/2] ext4: fix GDT corruption after online resizing with bigalloc enable and blocksize is 1024 Baokun Li 2022-08-17 13:27 ` [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop Baokun Li 2022-08-17 14:31 ` Jan Kara 2022-08-18 1:54 ` Baokun Li 2022-08-18 14:43 ` Ritesh Harjani (IBM) 2022-08-18 17:23 ` Jan Kara 2022-08-18 23:15 ` Ritesh Harjani (IBM) 2022-08-19 8:44 ` Jan Kara 2022-11-28 20:44 ` Theodore Ts'o 2022-11-29 8:54 ` Ritesh Harjani (IBM) 2022-11-29 21:12 ` [PATCH 0/2] ext4: fix a infinite loop in do_writepages after online resizing Theodore Ts'o 2022-11-30 2:08 ` Baokun Li 2022-12-01 3:42 ` Theodore Ts'o 2022-12-01 6:26 ` Baokun Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).