linux-ext4.vger.kernel.org archive mirror
* [PATCH AUTOSEL 5.11 02/44] ext4: shrink race window in ext4_should_retry_alloc()
       [not found] <20210325112459.1926846-1-sashal@kernel.org>
@ 2021-03-25 11:24 ` Sasha Levin
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 03/44] ext4: add reclaim checks to xattr code Sasha Levin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2021-03-25 11:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Whitney, Theodore Ts'o, Sasha Levin, linux-ext4

From: Eric Whitney <enwlinux@gmail.com>

[ Upstream commit efc61345274d6c7a46a0570efbc916fcbe3e927b ]

When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it
fails at significant rates on the two test scenarios that disable
delayed allocation (ext3conv and data_journal) and force actual block
allocation for the fallocate and pwrite functions in the test.  The
failure rate on 5.10 for both ext3conv and data_journal on one test
system typically runs about 85%.  On 5.11, the failure rate on ext3conv
sometimes drops to as low as 1% while the rate on data_journal
increases to nearly 100%.

The observed failures are largely due to ext4_should_retry_alloc()
cutting off block allocation retries when s_mb_free_pending (used to
indicate that a transaction in progress will free blocks) is 0.
However, free space is usually available when this occurs during runs
of generic/371.  It appears that a thread attempting to allocate
blocks is just missing transaction commits in other threads that
increase the free cluster count and reset s_mb_free_pending while
the allocating thread isn't running.  Explicitly testing for free space
availability avoids this race.

The current code uses a post-increment operator in the conditional
expression that determines whether the retry limit has been exceeded.
This means that the conditional expression uses the value of the
retry counter before it's increased, resulting in an extra retry cycle.
The current code actually retries twice before hitting its retry limit
rather than once.

Increasing the retry limit to 3 from the current actual maximum retry
count of 2 in combination with the change described above reduces the
observed failure rate to less than 0.1% on both ext3conv and
data_journal with what should be limited impact on users sensitive to
the overhead caused by retries.
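
The off-by-one described above can be reproduced in a small user-space
sketch.  The helper names (old_limit_reached, new_limit_reached,
retries_granted) are made up for illustration, and the sketch models only
the retry-limit test, not the free-space or s_mb_free_pending checks:

```c
#include <stdbool.h>

/* Old check: post-increment compares the value *before* the bump. */
bool old_limit_reached(int *retries)
{
	return (*retries)++ > 1;
}

/* New check: pre-increment compares the value *after* the bump. */
bool new_limit_reached(int *retries)
{
	return ++(*retries) > 3;
}

/*
 * Hypothetical driver: count how many retries a caller is granted when
 * every allocation attempt fails and only the limit check ends the loop.
 */
int retries_granted(bool (*limit_reached)(int *))
{
	int retries = 0;
	int granted = 0;

	while (!limit_reached(&retries))
		granted++;
	return granted;
}
```

With a counter starting at 0, retries_granted(old_limit_reached) comes out
as 2 and retries_granted(new_limit_reached) as 3, matching the "actual
maximum retry count of 2" and the new limit of 3.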

A per filesystem percpu counter exported via sysfs is added to allow
users or developers to track the number of times the retry limit is
exceeded without resorting to debugging methods.  This should provide
some insight into worst case retry behavior.
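
A minimal user-space sketch of reading such a counter follows.  The
helper name read_counter is hypothetical, and the attribute location is
an assumption: ext4's other per-filesystem attributes appear under
/sys/fs/ext4/<device>/, so the new one would be expected at
/sys/fs/ext4/<device>/sra_exceeded_retry_limit.

```c
#include <stdio.h>

/*
 * Read a single unsigned decimal value from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_counter(const char *path, unsigned long long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Sampling the value before and after a test run and taking the difference
gives the number of times the retry limit was exceeded during that run.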

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/balloc.c | 38 ++++++++++++++++++++++++++------------
 fs/ext4/ext4.h   |  1 +
 fs/ext4/super.c  |  5 +++++
 fs/ext4/sysfs.c  |  7 +++++++
 4 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index f45f9feebe59..74a5172c2d83 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -626,27 +626,41 @@ int ext4_claim_free_clusters(struct ext4_sb_info *sbi,
 
 /**
  * ext4_should_retry_alloc() - check if a block allocation should be retried
- * @sb:			super block
- * @retries:		number of attemps has been made
+ * @sb:			superblock
+ * @retries:		number of retry attempts made so far
  *
- * ext4_should_retry_alloc() is called when ENOSPC is returned, and if
- * it is profitable to retry the operation, this function will wait
- * for the current or committing transaction to complete, and then
- * return TRUE.  We will only retry once.
+ * ext4_should_retry_alloc() is called when ENOSPC is returned while
+ * attempting to allocate blocks.  If there's an indication that a pending
+ * journal transaction might free some space and allow another attempt to
+ * succeed, this function will wait for the current or committing transaction
+ * to complete and then return TRUE.
  */
 int ext4_should_retry_alloc(struct super_block *sb, int *retries)
 {
-	if (!ext4_has_free_clusters(EXT4_SB(sb), 1, 0) ||
-	    (*retries)++ > 1 ||
-	    !EXT4_SB(sb)->s_journal)
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (!sbi->s_journal)
 		return 0;
 
-	smp_mb();
-	if (EXT4_SB(sb)->s_mb_free_pending == 0)
+	if (++(*retries) > 3) {
+		percpu_counter_inc(&sbi->s_sra_exceeded_retry_limit);
 		return 0;
+	}
 
+	/*
+	 * if there's no indication that blocks are about to be freed it's
+	 * possible we just missed a transaction commit that did so
+	 */
+	smp_mb();
+	if (sbi->s_mb_free_pending == 0)
+		return ext4_has_free_clusters(sbi, 1, 0);
+
+	/*
+	 * it's possible we've just missed a transaction commit here,
+	 * so ignore the returned status
+	 */
 	jbd_debug(1, "%s: retrying operation after ENOSPC\n", sb->s_id);
-	jbd2_journal_force_commit_nested(EXT4_SB(sb)->s_journal);
+	(void) jbd2_journal_force_commit_nested(sbi->s_journal);
 	return 1;
 }
 
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2866d249f3d2..8055ade70532 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1484,6 +1484,7 @@ struct ext4_sb_info {
 	struct percpu_counter s_freeinodes_counter;
 	struct percpu_counter s_dirs_counter;
 	struct percpu_counter s_dirtyclusters_counter;
+	struct percpu_counter s_sra_exceeded_retry_limit;
 	struct blockgroup_lock *s_blockgroup_lock;
 	struct proc_dir_entry *s_proc;
 	struct kobject s_kobj;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2ae0af1c88c7..2f5c62a7750e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1210,6 +1210,7 @@ static void ext4_put_super(struct super_block *sb)
 	percpu_counter_destroy(&sbi->s_freeinodes_counter);
 	percpu_counter_destroy(&sbi->s_dirs_counter);
 	percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
+	percpu_counter_destroy(&sbi->s_sra_exceeded_retry_limit);
 	percpu_free_rwsem(&sbi->s_writepages_rwsem);
 #ifdef CONFIG_QUOTA
 	for (i = 0; i < EXT4_MAXQUOTAS; i++)
@@ -5011,6 +5012,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	if (!err)
 		err = percpu_counter_init(&sbi->s_dirtyclusters_counter, 0,
 					  GFP_KERNEL);
+	if (!err)
+		err = percpu_counter_init(&sbi->s_sra_exceeded_retry_limit, 0,
+					  GFP_KERNEL);
 	if (!err)
 		err = percpu_init_rwsem(&sbi->s_writepages_rwsem);
 
@@ -5124,6 +5128,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	percpu_counter_destroy(&sbi->s_freeinodes_counter);
 	percpu_counter_destroy(&sbi->s_dirs_counter);
 	percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
+	percpu_counter_destroy(&sbi->s_sra_exceeded_retry_limit);
 	percpu_free_rwsem(&sbi->s_writepages_rwsem);
 failed_mount5:
 	ext4_ext_release(sb);
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 075aa3a19ff5..a3d08276d441 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -24,6 +24,7 @@ typedef enum {
 	attr_session_write_kbytes,
 	attr_lifetime_write_kbytes,
 	attr_reserved_clusters,
+	attr_sra_exceeded_retry_limit,
 	attr_inode_readahead,
 	attr_trigger_test_error,
 	attr_first_error_time,
@@ -202,6 +203,7 @@ EXT4_ATTR_FUNC(delayed_allocation_blocks, 0444);
 EXT4_ATTR_FUNC(session_write_kbytes, 0444);
 EXT4_ATTR_FUNC(lifetime_write_kbytes, 0444);
 EXT4_ATTR_FUNC(reserved_clusters, 0644);
+EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
 
 EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
 		 ext4_sb_info, s_inode_readahead_blks);
@@ -251,6 +253,7 @@ static struct attribute *ext4_attrs[] = {
 	ATTR_LIST(session_write_kbytes),
 	ATTR_LIST(lifetime_write_kbytes),
 	ATTR_LIST(reserved_clusters),
+	ATTR_LIST(sra_exceeded_retry_limit),
 	ATTR_LIST(inode_readahead_blks),
 	ATTR_LIST(inode_goal),
 	ATTR_LIST(mb_stats),
@@ -374,6 +377,10 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
 		return snprintf(buf, PAGE_SIZE, "%llu\n",
 				(unsigned long long)
 				atomic64_read(&sbi->s_resv_clusters));
+	case attr_sra_exceeded_retry_limit:
+		return snprintf(buf, PAGE_SIZE, "%llu\n",
+				(unsigned long long)
+			percpu_counter_sum(&sbi->s_sra_exceeded_retry_limit));
 	case attr_inode_readahead:
 	case attr_pointer_ui:
 		if (!ptr)
-- 
2.30.1



* [PATCH AUTOSEL 5.11 03/44] ext4: add reclaim checks to xattr code
       [not found] <20210325112459.1926846-1-sashal@kernel.org>
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 02/44] ext4: shrink race window in ext4_should_retry_alloc() Sasha Levin
@ 2021-03-25 11:24 ` Sasha Levin
  2021-03-25 14:30   ` Jan Kara
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 04/44] fs/ext4: fix integer overflow in s_log_groups_per_flex Sasha Levin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Sasha Levin @ 2021-03-25 11:24 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Jan Kara, Theodore Ts'o, Sasha Levin, linux-ext4

From: Jan Kara <jack@suse.cz>

[ Upstream commit 163f0ec1df33cf468509ff38cbcbb5eb0d7fac60 ]

Syzbot is reporting that ext4 can enter fs reclaim from kvmalloc() while
the transaction is started like:

  fs_reclaim_acquire+0x117/0x150 mm/page_alloc.c:4340
  might_alloc include/linux/sched/mm.h:193 [inline]
  slab_pre_alloc_hook mm/slab.h:493 [inline]
  slab_alloc_node mm/slub.c:2817 [inline]
  __kmalloc_node+0x5f/0x430 mm/slub.c:4015
  kmalloc_node include/linux/slab.h:575 [inline]
  kvmalloc_node+0x61/0xf0 mm/util.c:587
  kvmalloc include/linux/mm.h:781 [inline]
  ext4_xattr_inode_cache_find fs/ext4/xattr.c:1465 [inline]
  ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1508 [inline]
  ext4_xattr_set_entry+0x1ce6/0x3780 fs/ext4/xattr.c:1649
  ext4_xattr_ibody_set+0x78/0x2b0 fs/ext4/xattr.c:2224
  ext4_xattr_set_handle+0x8f4/0x13e0 fs/ext4/xattr.c:2380
  ext4_xattr_set+0x13a/0x340 fs/ext4/xattr.c:2493

This should be impossible since transaction start sets PF_MEMALLOC_NOFS.
Add some assertions to the code to catch if something isn't working as
expected early.
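
The condition being asserted can be illustrated with a user-space model.
Everything here is hypothetical (struct model_task, the flag value, and
the model_* helpers); the save/restore pair is only loosely patterned on
the kernel's memalloc_nofs_save()/memalloc_nofs_restore() scope API:

```c
#include <stdbool.h>

#define MODEL_PF_MEMALLOC_NOFS 0x1u	/* hypothetical flag value */

struct model_task {
	unsigned int flags;
	bool handle_valid;	/* stands in for a running journal handle */
};

/* Enter a NOFS scope; returns the previous state of the flag. */
unsigned int model_nofs_save(struct model_task *t)
{
	unsigned int old = t->flags & MODEL_PF_MEMALLOC_NOFS;

	t->flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope, putting the flag back to its saved state. */
void model_nofs_restore(struct model_task *t, unsigned int old)
{
	t->flags = (t->flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Same shape as the first WARN_ON_ONCE() condition added by the patch. */
bool would_warn(const struct model_task *t)
{
	return t->handle_valid && !(t->flags & MODEL_PF_MEMALLOC_NOFS);
}
```

In this model the check stays quiet as long as a valid handle only exists
between a save and its matching restore, and fires if a handle is valid
while the flag is clear.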

Link: https://lore.kernel.org/linux-ext4/000000000000563a0205bafb7970@google.com/
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210222171626.21884-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/xattr.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 372208500f4e..083c95126781 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1462,6 +1462,9 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
 	if (!ce)
 		return NULL;
 
+	WARN_ON_ONCE(ext4_handle_valid(journal_current_handle()) &&
+		     !(current->flags & PF_MEMALLOC_NOFS));
+
 	ea_data = kvmalloc(value_len, GFP_KERNEL);
 	if (!ea_data) {
 		mb_cache_entry_put(ea_inode_cache, ce);
@@ -2327,6 +2330,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
 			error = -ENOSPC;
 			goto cleanup;
 		}
+		WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS));
 	}
 
 	error = ext4_reserve_inode_write(handle, inode, &is.iloc);
-- 
2.30.1



* [PATCH AUTOSEL 5.11 04/44] fs/ext4: fix integer overflow in s_log_groups_per_flex
       [not found] <20210325112459.1926846-1-sashal@kernel.org>
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 02/44] ext4: shrink race window in ext4_should_retry_alloc() Sasha Levin
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 03/44] ext4: add reclaim checks to xattr code Sasha Levin
@ 2021-03-25 11:24 ` Sasha Levin
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 05/44] ext4: fix bh ref count on error paths Sasha Levin
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 41/44] ext4: do not iput inode under running transaction in ext4_rename() Sasha Levin
  4 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2021-03-25 11:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sabyrzhan Tasbolatov, syzbot+a8b4b0c60155e87e9484, Jan Kara,
	Theodore Ts'o, Sasha Levin, linux-ext4

From: Sabyrzhan Tasbolatov <snovitoll@gmail.com>

[ Upstream commit f91436d55a279f045987e8b8c1385585dca54be9 ]

syzbot found a UBSAN shift-out-of-bounds in ext4_mb_init [1], triggered
when 1 << sbi->s_es->s_log_groups_per_flex does not fit in UINT_MAX;
sbi->s_mb_prefetch is of unsigned integer type.

The shift exponent s_log_groups_per_flex must be less than 32. The
following check would itself trigger the UBSAN shift-out-of-bounds report:

if (1 << sbi->s_es->s_log_groups_per_flex >= UINT_MAX) {

So I'm checking the raw exponent instead; perhaps there is another way
to derive the maximum exponent from UINT_MAX. Also use min_t to make
sure the result is of uint type.
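
A minimal user-space sketch of the same idea: validate the exponent
before shifting rather than testing the shifted result.  The function
name groups_per_flex is made up for illustration, and unlike the patch
the sketch shifts an unsigned 1U so that an exponent of 31 is also well
defined in standard C:

```c
#include <limits.h>

/*
 * Compute 2^log_groups as an unsigned int, rejecting exponents that do
 * not fit.  Returns 0 on success, -1 if the shift would be out of range.
 */
int groups_per_flex(unsigned int log_groups, unsigned int *out)
{
	/* Width of unsigned int in bits (32 on common platforms). */
	const unsigned int width = sizeof(unsigned int) * CHAR_BIT;

	if (log_groups >= width)
		return -1;
	*out = 1U << log_groups;
	return 0;
}
```

With a 32-bit unsigned int, the exponent 60 from the syzbot report is
rejected here instead of reaching the shift.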

[1] UBSAN: shift-out-of-bounds in fs/ext4/mballoc.c:2713:24
shift exponent 60 is too large for 32-bit type 'int'
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x137/0x1be lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:148 [inline]
 __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
 ext4_mb_init_backend fs/ext4/mballoc.c:2713 [inline]
 ext4_mb_init+0x19bc/0x19f0 fs/ext4/mballoc.c:2898
 ext4_fill_super+0xc2ec/0xfbe0 fs/ext4/super.c:4983

Reported-by: syzbot+a8b4b0c60155e87e9484@syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210224095800.3350002-1-snovitoll@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/mballoc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 99bf091fee10..a02fadf4fc84 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2709,8 +2709,15 @@ static int ext4_mb_init_backend(struct super_block *sb)
 	}
 
 	if (ext4_has_feature_flex_bg(sb)) {
-		/* a single flex group is supposed to be read by a single IO */
-		sbi->s_mb_prefetch = min(1 << sbi->s_es->s_log_groups_per_flex,
+		/* a single flex group is supposed to be read by a single IO.
+		 * 2 ^ s_log_groups_per_flex != UINT_MAX as s_mb_prefetch is
+		 * unsigned integer, so the maximum shift is 32.
+		 */
+		if (sbi->s_es->s_log_groups_per_flex >= 32) {
+			ext4_msg(sb, KERN_ERR, "too many log groups per flexible block group");
+			goto err_freesgi;
+		}
+		sbi->s_mb_prefetch = min_t(uint, 1 << sbi->s_es->s_log_groups_per_flex,
 			BLK_MAX_SEGMENT_SIZE >> (sb->s_blocksize_bits - 9));
 		sbi->s_mb_prefetch *= 8; /* 8 prefetch IOs in flight at most */
 	} else {
-- 
2.30.1



* [PATCH AUTOSEL 5.11 05/44] ext4: fix bh ref count on error paths
       [not found] <20210325112459.1926846-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 04/44] fs/ext4: fix integer overflow in s_log_groups_per_flex Sasha Levin
@ 2021-03-25 11:24 ` Sasha Levin
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 41/44] ext4: do not iput inode under running transaction in ext4_rename() Sasha Levin
  4 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2021-03-25 11:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zhaolong Zhang, Theodore Ts'o, Sasha Levin, linux-ext4

From: Zhaolong Zhang <zhangzl2013@126.com>

[ Upstream commit c915fb80eaa6194fa9bd0a4487705cd5b0dda2f1 ]

__ext4_journalled_writepage should drop bhs' ref count on error paths
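
The pattern can be sketched in user space as follows: references are
released at the common exit label, guarded by whether they were taken,
so that every error path drops them.  All names (struct model_buf,
model_get, model_put, model_writepage, fail_step) are hypothetical and
only model the control flow:

```c
struct model_buf {
	int refcount;
};

void model_get(struct model_buf *b) { b->refcount++; }
void model_put(struct model_buf *b) { b->refcount--; }

/*
 * fail_step == 0: fail before any reference is taken
 * fail_step == 1: fail after the references are taken
 * anything else:  succeed
 */
int model_writepage(struct model_buf *bufs, int n, int fail_step)
{
	int have_refs = 0;
	int ret = 0;
	int i;

	if (fail_step == 0) {
		ret = -1;
		goto out;
	}

	for (i = 0; i < n; i++)
		model_get(&bufs[i]);
	have_refs = 1;

	if (fail_step == 1) {
		ret = -1;
		goto out;
	}

out:
	/* Drop the references on success and on every error path. */
	if (have_refs)
		for (i = 0; i < n; i++)
			model_put(&bufs[i]);
	return ret;
}
```

The have_refs guard plays the role of the page_bufs check in the patch:
the release runs only when references were actually acquired.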

Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Link: https://lore.kernel.org/r/1614678151-70481-1-git-send-email-zhangzl2013@126.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/inode.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c173c8405856..ffbd459e2b37 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1937,13 +1937,13 @@ static int __ext4_journalled_writepage(struct page *page,
 	if (!ret)
 		ret = err;
 
-	if (!ext4_has_inline_data(inode))
-		ext4_walk_page_buffers(NULL, page_bufs, 0, len,
-				       NULL, bput_one);
 	ext4_set_inode_state(inode, EXT4_STATE_JDATA);
 out:
 	unlock_page(page);
 out_no_pagelock:
+	if (!inline_data && page_bufs)
+		ext4_walk_page_buffers(NULL, page_bufs, 0, len,
+				       NULL, bput_one);
 	brelse(inode_bh);
 	return ret;
 }
-- 
2.30.1



* [PATCH AUTOSEL 5.11 41/44] ext4: do not iput inode under running transaction in ext4_rename()
       [not found] <20210325112459.1926846-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 05/44] ext4: fix bh ref count on error paths Sasha Levin
@ 2021-03-25 11:24 ` Sasha Levin
  4 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2021-03-25 11:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: zhangyi (F), Theodore Ts'o, Sasha Levin, linux-ext4

From: "zhangyi (F)" <yi.zhang@huawei.com>

[ Upstream commit 5dccdc5a1916d4266edd251f20bbbb113a5c495f ]

In ext4_rename(), when RENAME_WHITEOUT fails to add the new entry into
the directory, it ends up dropping the newly created whiteout inode under
the running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput
inode under running transaction"), we follow the assumption that evict()
does not get called from a transaction context, but ext4_rename() breaks
this assumption. Although it's not a real problem, it is better to obey
it, so this patch adds the inode to the orphan list and stops the
transaction before the final iput().

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/namei.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index df0368d578b1..6d954d681502 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3760,14 +3760,14 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	 */
 	retval = -ENOENT;
 	if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
-		goto end_rename;
+		goto release_bh;
 
 	new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
 				 &new.de, &new.inlined);
 	if (IS_ERR(new.bh)) {
 		retval = PTR_ERR(new.bh);
 		new.bh = NULL;
-		goto end_rename;
+		goto release_bh;
 	}
 	if (new.bh) {
 		if (!new.inode) {
@@ -3784,15 +3784,13 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		handle = ext4_journal_start(old.dir, EXT4_HT_DIR, credits);
 		if (IS_ERR(handle)) {
 			retval = PTR_ERR(handle);
-			handle = NULL;
-			goto end_rename;
+			goto release_bh;
 		}
 	} else {
 		whiteout = ext4_whiteout_for_rename(&old, credits, &handle);
 		if (IS_ERR(whiteout)) {
 			retval = PTR_ERR(whiteout);
-			whiteout = NULL;
-			goto end_rename;
+			goto release_bh;
 		}
 	}
 
@@ -3926,16 +3924,18 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 			ext4_setent(handle, &old,
 				old.inode->i_ino, old_file_type);
 			drop_nlink(whiteout);
+			ext4_orphan_add(handle, whiteout);
 		}
 		unlock_new_inode(whiteout);
+		ext4_journal_stop(handle);
 		iput(whiteout);
-
+	} else {
+		ext4_journal_stop(handle);
 	}
+release_bh:
 	brelse(old.dir_bh);
 	brelse(old.bh);
 	brelse(new.bh);
-	if (handle)
-		ext4_journal_stop(handle);
 	return retval;
 }
 
-- 
2.30.1



* Re: [PATCH AUTOSEL 5.11 03/44] ext4: add reclaim checks to xattr code
  2021-03-25 11:24 ` [PATCH AUTOSEL 5.11 03/44] ext4: add reclaim checks to xattr code Sasha Levin
@ 2021-03-25 14:30   ` Jan Kara
  2021-03-30 21:14     ` Sasha Levin
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2021-03-25 14:30 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, stable, Jan Kara, Theodore Ts'o, linux-ext4

Sasha, just be aware that this commit was added to help tracking down a
particular syzbot report. As such there's no point in carrying it in
-stable but there's no big harm either... Just one patch more.

								Honza

On Thu 25-03-21 07:24:18, Sasha Levin wrote:
> From: Jan Kara <jack@suse.cz>
> 
> [ Upstream commit 163f0ec1df33cf468509ff38cbcbb5eb0d7fac60 ]
> 
> Syzbot is reporting that ext4 can enter fs reclaim from kvmalloc() while
> the transaction is started like:
> 
>   fs_reclaim_acquire+0x117/0x150 mm/page_alloc.c:4340
>   might_alloc include/linux/sched/mm.h:193 [inline]
>   slab_pre_alloc_hook mm/slab.h:493 [inline]
>   slab_alloc_node mm/slub.c:2817 [inline]
>   __kmalloc_node+0x5f/0x430 mm/slub.c:4015
>   kmalloc_node include/linux/slab.h:575 [inline]
>   kvmalloc_node+0x61/0xf0 mm/util.c:587
>   kvmalloc include/linux/mm.h:781 [inline]
>   ext4_xattr_inode_cache_find fs/ext4/xattr.c:1465 [inline]
>   ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1508 [inline]
>   ext4_xattr_set_entry+0x1ce6/0x3780 fs/ext4/xattr.c:1649
>   ext4_xattr_ibody_set+0x78/0x2b0 fs/ext4/xattr.c:2224
>   ext4_xattr_set_handle+0x8f4/0x13e0 fs/ext4/xattr.c:2380
>   ext4_xattr_set+0x13a/0x340 fs/ext4/xattr.c:2493
> 
> This should be impossible since transaction start sets PF_MEMALLOC_NOFS.
> Add some assertions to the code to catch if something isn't working as
> expected early.
> 
> Link: https://lore.kernel.org/linux-ext4/000000000000563a0205bafb7970@google.com/
> Signed-off-by: Jan Kara <jack@suse.cz>
> Link: https://lore.kernel.org/r/20210222171626.21884-1-jack@suse.cz
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  fs/ext4/xattr.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 372208500f4e..083c95126781 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1462,6 +1462,9 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value,
>  	if (!ce)
>  		return NULL;
>  
> +	WARN_ON_ONCE(ext4_handle_valid(journal_current_handle()) &&
> +		     !(current->flags & PF_MEMALLOC_NOFS));
> +
>  	ea_data = kvmalloc(value_len, GFP_KERNEL);
>  	if (!ea_data) {
>  		mb_cache_entry_put(ea_inode_cache, ce);
> @@ -2327,6 +2330,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
>  			error = -ENOSPC;
>  			goto cleanup;
>  		}
> +		WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS));
>  	}
>  
>  	error = ext4_reserve_inode_write(handle, inode, &is.iloc);
> -- 
> 2.30.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH AUTOSEL 5.11 03/44] ext4: add reclaim checks to xattr code
  2021-03-25 14:30   ` Jan Kara
@ 2021-03-30 21:14     ` Sasha Levin
  0 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2021-03-30 21:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, stable, Theodore Ts'o, linux-ext4

On Thu, Mar 25, 2021 at 03:30:20PM +0100, Jan Kara wrote:
>Sasha, just be aware that this commit was added to help tracking down a
>particular syzbot report. As such there's no point in carrying it in
>-stable but there's no big harm either... Just one patch more.

Yup, I'd rather keep it to see if we get reports of this in stable
kernels. Better a (harmless) warning than a silent corruption.

-- 
Thanks,
Sasha

