* [PATCH v3 0/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
@ 2020-02-19 18:30 Eric Biggers
2020-02-19 18:30 ` [PATCH v3 1/2] ext4: rename s_journal_flag_rwsem to s_writepages_rwsem Eric Biggers
2020-02-19 18:30 ` [PATCH v3 2/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL Eric Biggers
From: Eric Biggers @ 2020-02-19 18:30 UTC
To: linux-ext4, Theodore Ts'o; +Cc: Jan Kara
This series fixes a race between writepages and enabling EXT4_EXTENTS_FL
that could cause a WARN_ON() in ext4_add_complete_io() to be hit. Patch
1 is a trivial rename in preparation for patch 2, which is the actual
fix. See patch 2 for the full details.
Changed in v3:
Do the renaming in a separate patch.
Changed in v2:
Instead of making ext4_writepages() read EXT4_EXTENTS_FL only once,
make it so that EXT4_EXTENTS_FL can't be changed while
ext4_writepages() is running.
Eric Biggers (2):
ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
fs/ext4/ext4.h | 7 +++++--
fs/ext4/inode.c | 14 +++++++-------
fs/ext4/migrate.c | 27 +++++++++++++++++++--------
fs/ext4/super.c | 6 +++---
4 files changed, 34 insertions(+), 20 deletions(-)
base-commit: c96dceeabf765d0b1b1f29c3bf50a5c01315b820
--
2.25.0
* [PATCH v3 1/2] ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
2020-02-19 18:30 [PATCH v3 0/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL Eric Biggers
@ 2020-02-19 18:30 ` Eric Biggers
2020-02-20 9:14 ` Jan Kara
2020-02-19 18:30 ` [PATCH v3 2/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL Eric Biggers
From: Eric Biggers @ 2020-02-19 18:30 UTC
To: linux-ext4, Theodore Ts'o; +Cc: Jan Kara
From: Eric Biggers <ebiggers@google.com>
In preparation for making s_journal_flag_rwsem synchronize
ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
flags (rather than just JOURNAL_DATA as it does currently), rename it to
s_writepages_rwsem.
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/inode.c | 14 +++++++-------
fs/ext4/super.c | 6 +++---
3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4441331d06cc4..487a7b430b9dd 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1553,7 +1553,7 @@ struct ext4_sb_info {
struct ratelimit_state s_msg_ratelimit_state;
/* Barrier between changing inodes' journal flags and writepages ops. */
- struct percpu_rw_semaphore s_journal_flag_rwsem;
+ struct percpu_rw_semaphore s_writepages_rwsem;
struct dax_device *s_daxdev;
#ifdef CONFIG_EXT4_DEBUG
unsigned long s_simulate_fail;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c04a15fc8b6ad..f49c48ea2f170 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2628,7 +2628,7 @@ static int ext4_writepages(struct address_space *mapping,
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;
- percpu_down_read(&sbi->s_journal_flag_rwsem);
+ percpu_down_read(&sbi->s_writepages_rwsem);
trace_ext4_writepages(inode, wbc);
/*
@@ -2849,7 +2849,7 @@ static int ext4_writepages(struct address_space *mapping,
out_writepages:
trace_ext4_writepages_result(inode, wbc, ret,
nr_to_write - wbc->nr_to_write);
- percpu_up_read(&sbi->s_journal_flag_rwsem);
+ percpu_up_read(&sbi->s_writepages_rwsem);
return ret;
}
@@ -2864,13 +2864,13 @@ static int ext4_dax_writepages(struct address_space *mapping,
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;
- percpu_down_read(&sbi->s_journal_flag_rwsem);
+ percpu_down_read(&sbi->s_writepages_rwsem);
trace_ext4_writepages(inode, wbc);
ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, wbc);
trace_ext4_writepages_result(inode, wbc, ret,
nr_to_write - wbc->nr_to_write);
- percpu_up_read(&sbi->s_journal_flag_rwsem);
+ percpu_up_read(&sbi->s_writepages_rwsem);
return ret;
}
@@ -5861,7 +5861,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
}
}
- percpu_down_write(&sbi->s_journal_flag_rwsem);
+ percpu_down_write(&sbi->s_writepages_rwsem);
jbd2_journal_lock_updates(journal);
/*
@@ -5878,7 +5878,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
err = jbd2_journal_flush(journal);
if (err < 0) {
jbd2_journal_unlock_updates(journal);
- percpu_up_write(&sbi->s_journal_flag_rwsem);
+ percpu_up_write(&sbi->s_writepages_rwsem);
return err;
}
ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
@@ -5886,7 +5886,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
ext4_set_aops(inode);
jbd2_journal_unlock_updates(journal);
- percpu_up_write(&sbi->s_journal_flag_rwsem);
+ percpu_up_write(&sbi->s_writepages_rwsem);
if (val)
up_write(&EXT4_I(inode)->i_mmap_sem);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b0b9150c97735..feb59c7ad395f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1054,7 +1054,7 @@ static void ext4_put_super(struct super_block *sb)
percpu_counter_destroy(&sbi->s_freeinodes_counter);
percpu_counter_destroy(&sbi->s_dirs_counter);
percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
- percpu_free_rwsem(&sbi->s_journal_flag_rwsem);
+ percpu_free_rwsem(&sbi->s_writepages_rwsem);
#ifdef CONFIG_QUOTA
for (i = 0; i < EXT4_MAXQUOTAS; i++)
kfree(get_qf_name(sb, sbi, i));
@@ -4600,7 +4600,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
err = percpu_counter_init(&sbi->s_dirtyclusters_counter, 0,
GFP_KERNEL);
if (!err)
- err = percpu_init_rwsem(&sbi->s_journal_flag_rwsem);
+ err = percpu_init_rwsem(&sbi->s_writepages_rwsem);
if (err) {
ext4_msg(sb, KERN_ERR, "insufficient memory");
@@ -4694,7 +4694,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
percpu_counter_destroy(&sbi->s_freeinodes_counter);
percpu_counter_destroy(&sbi->s_dirs_counter);
percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
- percpu_free_rwsem(&sbi->s_journal_flag_rwsem);
+ percpu_free_rwsem(&sbi->s_writepages_rwsem);
failed_mount5:
ext4_ext_release(sb);
ext4_release_system_zone(sb);
base-commit: c96dceeabf765d0b1b1f29c3bf50a5c01315b820
--
2.25.0
* [PATCH v3 2/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
2020-02-19 18:30 [PATCH v3 0/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL Eric Biggers
2020-02-19 18:30 ` [PATCH v3 1/2] ext4: rename s_journal_flag_rwsem to s_writepages_rwsem Eric Biggers
@ 2020-02-19 18:30 ` Eric Biggers
2020-02-20 9:15 ` Jan Kara
From: Eric Biggers @ 2020-02-19 18:30 UTC
To: linux-ext4, Theodore Ts'o; +Cc: Jan Kara
From: Eric Biggers <ebiggers@google.com>
If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
on it, the following warning in ext4_add_complete_io() can be hit:
WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120
Here's a minimal reproducer (not 100% reliable; root isn't required):
    while true; do
        sync
    done &
    while true; do
        rm -f file
        touch file
        chattr -e file
        echo X >> file
        chattr +e file
    done
The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
(which only returns true on extent-based files) is checked once to set
the number of reserved journal credits, and also again later to select
the flags for ext4_map_blocks() and copy the reserved journal handle to
ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set,
the first check can see dioread_nolock disabled while the later one can
see it enabled, causing the reserved handle to unexpectedly be NULL.
Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
related to doing so as well, fix this by synchronizing changing
EXT4_EXTENTS_FL with ext4_writepages() via the existing
s_writepages_rwsem (previously called s_journal_flag_rwsem).
This was originally reported by syzbot without a reproducer at
https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
but now that dioread_nolock is the default I also started seeing this
when running syzkaller locally.
Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
fs/ext4/ext4.h | 5 ++++-
fs/ext4/migrate.c | 27 +++++++++++++++++++--------
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 487a7b430b9dd..0a59006c621a0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1552,7 +1552,10 @@ struct ext4_sb_info {
struct ratelimit_state s_warning_ratelimit_state;
struct ratelimit_state s_msg_ratelimit_state;
- /* Barrier between changing inodes' journal flags and writepages ops. */
+ /*
+ * Barrier between writepages ops and changing any inode's JOURNAL_DATA
+ * or EXTENTS flag.
+ */
struct percpu_rw_semaphore s_writepages_rwsem;
struct dax_device *s_daxdev;
#ifdef CONFIG_EXT4_DEBUG
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 89725fa425732..fb6520f371355 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -407,6 +407,7 @@ static int free_ext_block(handle_t *handle, struct inode *inode)
int ext4_ext_migrate(struct inode *inode)
{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
handle_t *handle;
int retval = 0, i;
__le32 *i_data;
@@ -431,6 +432,8 @@ int ext4_ext_migrate(struct inode *inode)
*/
return retval;
+ percpu_down_write(&sbi->s_writepages_rwsem);
+
/*
* Worst case we can touch the allocation bitmaps, a bgd
* block, and a block to link in the orphan list. We do need
@@ -441,7 +444,7 @@ int ext4_ext_migrate(struct inode *inode)
if (IS_ERR(handle)) {
retval = PTR_ERR(handle);
- return retval;
+ goto out_unlock;
}
goal = (((inode->i_ino - 1) / EXT4_INODES_PER_GROUP(inode->i_sb)) *
EXT4_INODES_PER_GROUP(inode->i_sb)) + 1;
@@ -452,7 +455,7 @@ int ext4_ext_migrate(struct inode *inode)
if (IS_ERR(tmp_inode)) {
retval = PTR_ERR(tmp_inode);
ext4_journal_stop(handle);
- return retval;
+ goto out_unlock;
}
i_size_write(tmp_inode, i_size_read(inode));
/*
@@ -494,7 +497,7 @@ int ext4_ext_migrate(struct inode *inode)
*/
ext4_orphan_del(NULL, tmp_inode);
retval = PTR_ERR(handle);
- goto out;
+ goto out_tmp_inode;
}
ei = EXT4_I(inode);
@@ -576,10 +579,11 @@ int ext4_ext_migrate(struct inode *inode)
ext4_ext_tree_init(handle, tmp_inode);
out_stop:
ext4_journal_stop(handle);
-out:
+out_tmp_inode:
unlock_new_inode(tmp_inode);
iput(tmp_inode);
-
+out_unlock:
+ percpu_up_write(&sbi->s_writepages_rwsem);
return retval;
}
@@ -589,7 +593,8 @@ int ext4_ext_migrate(struct inode *inode)
int ext4_ind_migrate(struct inode *inode)
{
struct ext4_extent_header *eh;
- struct ext4_super_block *es = EXT4_SB(inode->i_sb)->s_es;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ struct ext4_super_block *es = sbi->s_es;
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_extent *ex;
unsigned int i, len;
@@ -613,9 +618,13 @@ int ext4_ind_migrate(struct inode *inode)
if (test_opt(inode->i_sb, DELALLOC))
ext4_alloc_da_blocks(inode);
+ percpu_down_write(&sbi->s_writepages_rwsem);
+
handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
- if (IS_ERR(handle))
- return PTR_ERR(handle);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto out_unlock;
+ }
down_write(&EXT4_I(inode)->i_data_sem);
ret = ext4_ext_check_inode(inode);
@@ -650,5 +659,7 @@ int ext4_ind_migrate(struct inode *inode)
errout:
ext4_journal_stop(handle);
up_write(&EXT4_I(inode)->i_data_sem);
+out_unlock:
+ percpu_up_write(&sbi->s_writepages_rwsem);
return ret;
}
--
2.25.0
* Re: [PATCH v3 1/2] ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
2020-02-19 18:30 ` [PATCH v3 1/2] ext4: rename s_journal_flag_rwsem to s_writepages_rwsem Eric Biggers
@ 2020-02-20 9:14 ` Jan Kara
2020-02-21 18:53 ` Theodore Y. Ts'o
From: Jan Kara @ 2020-02-20 9:14 UTC
To: Eric Biggers; +Cc: linux-ext4, Theodore Ts'o, Jan Kara
On Wed 19-02-20 10:30:46, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> In preparation for making s_journal_flag_rwsem synchronize
> ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
> flags (rather than just JOURNAL_DATA as it does currently), rename it to
> s_writepages_rwsem.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
The patch looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/ext4.h | 2 +-
> fs/ext4/inode.c | 14 +++++++-------
> fs/ext4/super.c | 6 +++---
> 3 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 4441331d06cc4..487a7b430b9dd 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1553,7 +1553,7 @@ struct ext4_sb_info {
> struct ratelimit_state s_msg_ratelimit_state;
>
> /* Barrier between changing inodes' journal flags and writepages ops. */
> - struct percpu_rw_semaphore s_journal_flag_rwsem;
> + struct percpu_rw_semaphore s_writepages_rwsem;
> struct dax_device *s_daxdev;
> #ifdef CONFIG_EXT4_DEBUG
> unsigned long s_simulate_fail;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c04a15fc8b6ad..f49c48ea2f170 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2628,7 +2628,7 @@ static int ext4_writepages(struct address_space *mapping,
> if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
> return -EIO;
>
> - percpu_down_read(&sbi->s_journal_flag_rwsem);
> + percpu_down_read(&sbi->s_writepages_rwsem);
> trace_ext4_writepages(inode, wbc);
>
> /*
> @@ -2849,7 +2849,7 @@ static int ext4_writepages(struct address_space *mapping,
> out_writepages:
> trace_ext4_writepages_result(inode, wbc, ret,
> nr_to_write - wbc->nr_to_write);
> - percpu_up_read(&sbi->s_journal_flag_rwsem);
> + percpu_up_read(&sbi->s_writepages_rwsem);
> return ret;
> }
>
> @@ -2864,13 +2864,13 @@ static int ext4_dax_writepages(struct address_space *mapping,
> if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
> return -EIO;
>
> - percpu_down_read(&sbi->s_journal_flag_rwsem);
> + percpu_down_read(&sbi->s_writepages_rwsem);
> trace_ext4_writepages(inode, wbc);
>
> ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, wbc);
> trace_ext4_writepages_result(inode, wbc, ret,
> nr_to_write - wbc->nr_to_write);
> - percpu_up_read(&sbi->s_journal_flag_rwsem);
> + percpu_up_read(&sbi->s_writepages_rwsem);
> return ret;
> }
>
> @@ -5861,7 +5861,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> }
> }
>
> - percpu_down_write(&sbi->s_journal_flag_rwsem);
> + percpu_down_write(&sbi->s_writepages_rwsem);
> jbd2_journal_lock_updates(journal);
>
> /*
> @@ -5878,7 +5878,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> err = jbd2_journal_flush(journal);
> if (err < 0) {
> jbd2_journal_unlock_updates(journal);
> - percpu_up_write(&sbi->s_journal_flag_rwsem);
> + percpu_up_write(&sbi->s_writepages_rwsem);
> return err;
> }
> ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
> @@ -5886,7 +5886,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> ext4_set_aops(inode);
>
> jbd2_journal_unlock_updates(journal);
> - percpu_up_write(&sbi->s_journal_flag_rwsem);
> + percpu_up_write(&sbi->s_writepages_rwsem);
>
> if (val)
> up_write(&EXT4_I(inode)->i_mmap_sem);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b0b9150c97735..feb59c7ad395f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1054,7 +1054,7 @@ static void ext4_put_super(struct super_block *sb)
> percpu_counter_destroy(&sbi->s_freeinodes_counter);
> percpu_counter_destroy(&sbi->s_dirs_counter);
> percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
> - percpu_free_rwsem(&sbi->s_journal_flag_rwsem);
> + percpu_free_rwsem(&sbi->s_writepages_rwsem);
> #ifdef CONFIG_QUOTA
> for (i = 0; i < EXT4_MAXQUOTAS; i++)
> kfree(get_qf_name(sb, sbi, i));
> @@ -4600,7 +4600,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> err = percpu_counter_init(&sbi->s_dirtyclusters_counter, 0,
> GFP_KERNEL);
> if (!err)
> - err = percpu_init_rwsem(&sbi->s_journal_flag_rwsem);
> + err = percpu_init_rwsem(&sbi->s_writepages_rwsem);
>
> if (err) {
> ext4_msg(sb, KERN_ERR, "insufficient memory");
> @@ -4694,7 +4694,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> percpu_counter_destroy(&sbi->s_freeinodes_counter);
> percpu_counter_destroy(&sbi->s_dirs_counter);
> percpu_counter_destroy(&sbi->s_dirtyclusters_counter);
> - percpu_free_rwsem(&sbi->s_journal_flag_rwsem);
> + percpu_free_rwsem(&sbi->s_writepages_rwsem);
> failed_mount5:
> ext4_ext_release(sb);
> ext4_release_system_zone(sb);
>
> base-commit: c96dceeabf765d0b1b1f29c3bf50a5c01315b820
> --
> 2.25.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v3 2/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
2020-02-19 18:30 ` [PATCH v3 2/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL Eric Biggers
@ 2020-02-20 9:15 ` Jan Kara
2020-02-21 18:53 ` Theodore Y. Ts'o
From: Jan Kara @ 2020-02-20 9:15 UTC
To: Eric Biggers; +Cc: linux-ext4, Theodore Ts'o, Jan Kara
On Wed 19-02-20 10:30:47, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
> on it, the following warning in ext4_add_complete_io() can be hit:
>
> WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120
>
> Here's a minimal reproducer (not 100% reliable) (root isn't required):
>
> while true; do
> sync
> done &
> while true; do
> rm -f file
> touch file
> chattr -e file
> echo X >> file
> chattr +e file
> done
>
> The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
> (which only returns true on extent-based files) is checked once to set
> the number of reserved journal credits, and also again later to select
> the flags for ext4_map_blocks() and copy the reserved journal handle to
> ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set,
> the first check can see dioread_nolock disabled while the later one can
> see it enabled, causing the reserved handle to unexpectedly be NULL.
>
> Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
> related to doing so as well, fix this by synchronizing changing
> EXT4_EXTENTS_FL with ext4_writepages() via the existing
> s_writepages_rwsem (previously called s_journal_flag_rwsem).
>
> This was originally reported by syzbot without a reproducer at
> https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
> but now that dioread_nolock is the default I also started seeing this
> when running syzkaller locally.
>
> Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
> Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
> Cc: stable@kernel.org
> Signed-off-by: Eric Biggers <ebiggers@google.com>
The patch looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/ext4.h | 5 ++++-
> fs/ext4/migrate.c | 27 +++++++++++++++++++--------
> 2 files changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 487a7b430b9dd..0a59006c621a0 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1552,7 +1552,10 @@ struct ext4_sb_info {
> struct ratelimit_state s_warning_ratelimit_state;
> struct ratelimit_state s_msg_ratelimit_state;
>
> - /* Barrier between changing inodes' journal flags and writepages ops. */
> + /*
> + * Barrier between writepages ops and changing any inode's JOURNAL_DATA
> + * or EXTENTS flag.
> + */
> struct percpu_rw_semaphore s_writepages_rwsem;
> struct dax_device *s_daxdev;
> #ifdef CONFIG_EXT4_DEBUG
> diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
> index 89725fa425732..fb6520f371355 100644
> --- a/fs/ext4/migrate.c
> +++ b/fs/ext4/migrate.c
> @@ -407,6 +407,7 @@ static int free_ext_block(handle_t *handle, struct inode *inode)
>
> int ext4_ext_migrate(struct inode *inode)
> {
> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> handle_t *handle;
> int retval = 0, i;
> __le32 *i_data;
> @@ -431,6 +432,8 @@ int ext4_ext_migrate(struct inode *inode)
> */
> return retval;
>
> + percpu_down_write(&sbi->s_writepages_rwsem);
> +
> /*
> * Worst case we can touch the allocation bitmaps, a bgd
> * block, and a block to link in the orphan list. We do need
> @@ -441,7 +444,7 @@ int ext4_ext_migrate(struct inode *inode)
>
> if (IS_ERR(handle)) {
> retval = PTR_ERR(handle);
> - return retval;
> + goto out_unlock;
> }
> goal = (((inode->i_ino - 1) / EXT4_INODES_PER_GROUP(inode->i_sb)) *
> EXT4_INODES_PER_GROUP(inode->i_sb)) + 1;
> @@ -452,7 +455,7 @@ int ext4_ext_migrate(struct inode *inode)
> if (IS_ERR(tmp_inode)) {
> retval = PTR_ERR(tmp_inode);
> ext4_journal_stop(handle);
> - return retval;
> + goto out_unlock;
> }
> i_size_write(tmp_inode, i_size_read(inode));
> /*
> @@ -494,7 +497,7 @@ int ext4_ext_migrate(struct inode *inode)
> */
> ext4_orphan_del(NULL, tmp_inode);
> retval = PTR_ERR(handle);
> - goto out;
> + goto out_tmp_inode;
> }
>
> ei = EXT4_I(inode);
> @@ -576,10 +579,11 @@ int ext4_ext_migrate(struct inode *inode)
> ext4_ext_tree_init(handle, tmp_inode);
> out_stop:
> ext4_journal_stop(handle);
> -out:
> +out_tmp_inode:
> unlock_new_inode(tmp_inode);
> iput(tmp_inode);
> -
> +out_unlock:
> + percpu_up_write(&sbi->s_writepages_rwsem);
> return retval;
> }
>
> @@ -589,7 +593,8 @@ int ext4_ext_migrate(struct inode *inode)
> int ext4_ind_migrate(struct inode *inode)
> {
> struct ext4_extent_header *eh;
> - struct ext4_super_block *es = EXT4_SB(inode->i_sb)->s_es;
> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + struct ext4_super_block *es = sbi->s_es;
> struct ext4_inode_info *ei = EXT4_I(inode);
> struct ext4_extent *ex;
> unsigned int i, len;
> @@ -613,9 +618,13 @@ int ext4_ind_migrate(struct inode *inode)
> if (test_opt(inode->i_sb, DELALLOC))
> ext4_alloc_da_blocks(inode);
>
> + percpu_down_write(&sbi->s_writepages_rwsem);
> +
> handle = ext4_journal_start(inode, EXT4_HT_MIGRATE, 1);
> - if (IS_ERR(handle))
> - return PTR_ERR(handle);
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + goto out_unlock;
> + }
>
> down_write(&EXT4_I(inode)->i_data_sem);
> ret = ext4_ext_check_inode(inode);
> @@ -650,5 +659,7 @@ int ext4_ind_migrate(struct inode *inode)
> errout:
> ext4_journal_stop(handle);
> up_write(&EXT4_I(inode)->i_data_sem);
> +out_unlock:
> + percpu_up_write(&sbi->s_writepages_rwsem);
> return ret;
> }
> --
> 2.25.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v3 1/2] ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
2020-02-20 9:14 ` Jan Kara
@ 2020-02-21 18:53 ` Theodore Y. Ts'o
From: Theodore Y. Ts'o @ 2020-02-21 18:53 UTC
To: Jan Kara; +Cc: Eric Biggers, linux-ext4
On Thu, Feb 20, 2020 at 10:14:58AM +0100, Jan Kara wrote:
> On Wed 19-02-20 10:30:46, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> >
> > In preparation for making s_journal_flag_rwsem synchronize
> > ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
> > flags (rather than just JOURNAL_DATA as it does currently), rename it to
> > s_writepages_rwsem.
> >
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
>
> The patch looks good to me. You can add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
Thanks, applied.
- Ted
* Re: [PATCH v3 2/2] ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
2020-02-20 9:15 ` Jan Kara
@ 2020-02-21 18:53 ` Theodore Y. Ts'o
From: Theodore Y. Ts'o @ 2020-02-21 18:53 UTC
To: Jan Kara; +Cc: Eric Biggers, linux-ext4
On Thu, Feb 20, 2020 at 10:15:48AM +0100, Jan Kara wrote:
> On Wed 19-02-20 10:30:47, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> >
> > If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
> > on it, the following warning in ext4_add_complete_io() can be hit:
> >
> > WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120
> >
> > Here's a minimal reproducer (not 100% reliable) (root isn't required):
> >
> > while true; do
> > sync
> > done &
> > while true; do
> > rm -f file
> > touch file
> > chattr -e file
> > echo X >> file
> > chattr +e file
> > done
> >
> > The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
> > (which only returns true on extent-based files) is checked once to set
> > the number of reserved journal credits, and also again later to select
> > the flags for ext4_map_blocks() and copy the reserved journal handle to
> > ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set,
> > the first check can see dioread_nolock disabled while the later one can
> > see it enabled, causing the reserved handle to unexpectedly be NULL.
> >
> > Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
> > related to doing so as well, fix this by synchronizing changing
> > EXT4_EXTENTS_FL with ext4_writepages() via the existing
> > s_writepages_rwsem (previously called s_journal_flag_rwsem).
> >
> > This was originally reported by syzbot without a reproducer at
> > https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
> > but now that dioread_nolock is the default I also started seeing this
> > when running syzkaller locally.
> >
> > Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
> > Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
> > Cc: stable@kernel.org
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
>
> The patch looks good to me. You can add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
Thanks, applied.
- Ted