* [PATCH 0/2] btrfs: bug fix for read-only scrub on read-only mount
@ 2021-12-16 11:47 Qu Wenruo
  2021-12-16 11:47 ` [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only Qu Wenruo
  2021-12-16 11:47 ` [PATCH 2/2] btrfs: output more debug message for uncommitted transaction Qu Wenruo
  0 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2021-12-16 11:47 UTC (permalink / raw)
  To: linux-btrfs

There is a long-standing bug where a read-only scrub on a read-only
mounted btrfs leaves an uncommitted transaction behind, which triggers
an ASSERT() at unmount time.

The first patch is the fix, while the second makes debugging similar
bugs easier.

Qu Wenruo (2):
  btrfs: don't start transaction for scrub if the fs is mounted
    read-only
  btrfs: output more debug message for uncommitted transaction

 fs/btrfs/block-group.c | 13 +++++++++++++
 fs/btrfs/disk-io.c     | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

-- 
2.34.1



* [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only
  2021-12-16 11:47 [PATCH 0/2] btrfs: bug fix for read-only scrub on read-only mount Qu Wenruo
@ 2021-12-16 11:47 ` Qu Wenruo
  2022-01-03 18:52   ` David Sterba
  2021-12-16 11:47 ` [PATCH 2/2] btrfs: output more debug message for uncommitted transaction Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2021-12-16 11:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: stable

[BUG]
The following simple script crashes btrfs at unmount time if
CONFIG_BTRFS_ASSERT is set.

 mkfs.btrfs -f $dev
 mount $dev $mnt
 xfs_io -f -c "pwrite 0 4k" $mnt/file
 umount $mnt
 mount -o ro $dev $mnt
 btrfs scrub start -Br $mnt
 umount $mnt

This triggers the ASSERT() introduced by commit
0a31daa4b602 ("btrfs: add assertion for empty list of transactions at
late stage of umount").

That commit is definitely not the cause; it just makes enough noise for
us developers to notice.

[CAUSE]
We start a transaction in the following call chain during scrub:

  scrub_enumerate_chunks()
  |- btrfs_inc_block_group_ro()
     |- btrfs_join_transaction()

However, on a read-only mount there is no running transaction at all,
so btrfs_join_transaction() starts a new one.

Furthermore, since the fs is mounted read-only, btrfs_sync_fs() will not
call btrfs_commit_super() to commit the new but empty transaction,
which leads to the ASSERT() being triggered.

The bug has probably been there for a long time; only the new ASSERT()
makes it noisy enough to be noticed.
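A minimal userspace sketch of the sequence above. This is a toy model, not kernel code: struct toy_fs and every toy_* function are hypothetical stand-ins for sb_rdonly(), fs_info->trans_list, btrfs_join_transaction() and the btrfs_sync_fs()/btrfs_commit_super() step, and the model assumes that only a read-write unmount commits (and thus removes) the running transaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the state involved; not kernel code. */
struct toy_fs {
	bool rdonly;	/* stands in for sb_rdonly(fs_info->sb) */
	bool running;	/* stands in for fs_info->running_transaction != NULL */
	int nr_trans;	/* stands in for the length of fs_info->trans_list */
};

/* Like btrfs_join_transaction(): join the running transaction, or start one. */
static void toy_join_transaction(struct toy_fs *fs)
{
	if (!fs->running) {
		fs->running = true;
		fs->nr_trans++;
	}
}

/* Unpatched btrfs_inc_block_group_ro(): always joins a transaction first. */
static void toy_inc_block_group_ro_unpatched(struct toy_fs *fs)
{
	toy_join_transaction(fs);
	/* ...mark the block group RO, then end the handle without committing... */
}

/* Unmount: only a read-write fs commits the pending transaction. */
static void toy_unmount_commit(struct toy_fs *fs)
{
	if (fs->rdonly)
		return;		/* no btrfs_commit_super() on a read-only mount */
	if (fs->running) {
		fs->running = false;
		fs->nr_trans--;
	}
}

/* The late-unmount check: ASSERT(list_empty(&fs_info->trans_list)). */
static bool toy_trans_list_empty(const struct toy_fs *fs)
{
	return fs->nr_trans == 0;
}
```

With rdonly set, calling toy_inc_block_group_ro_unpatched() and then toy_unmount_commit() leaves one transaction on the list, which is exactly the state the ASSERT() catches; with rdonly clear the list ends up empty.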

[FIX]
For a read-only scrub on a read-only mount, there is no need to start a
transaction or to allocate new chunks in btrfs_inc_block_group_ro().

Add an extra read-only mount check in btrfs_inc_block_group_ro(); if
the fs is mounted read-only, skip all chunk allocation and call
inc_block_group_ro() directly.
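The check can be sketched in the same kind of hypothetical toy model (userspace C, not kernel code; all toy_* names are invented). The real patch also holds fs_info->ro_block_group_mutex around inc_block_group_ro(); the sketch leaves locking out because it is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not kernel code. */
struct toy_fs {
	bool rdonly;	/* stands in for sb_rdonly(fs_info->sb) */
	bool running;	/* stands in for fs_info->running_transaction != NULL */
	int nr_trans;	/* stands in for the length of fs_info->trans_list */
	int bg_ro;	/* stands in for the block group's RO counter */
};

static void toy_join_transaction(struct toy_fs *fs)
{
	if (!fs->running) {
		fs->running = true;
		fs->nr_trans++;
	}
}

/* Stand-in for inc_block_group_ro(): the part that needs no transaction. */
static int toy_inc_block_group_ro_core(struct toy_fs *fs)
{
	fs->bg_ro++;
	return 0;
}

/* Patched btrfs_inc_block_group_ro(): read-only mounts take the shortcut. */
static int toy_inc_block_group_ro(struct toy_fs *fs)
{
	if (fs->rdonly)
		return toy_inc_block_group_ro_core(fs);	/* no transaction, no chunk allocation */

	toy_join_transaction(fs);
	/* ...possible chunk allocation under the transaction would go here... */
	return toy_inc_block_group_ro_core(fs);
}
```

On a read-only toy fs the block group still becomes RO, but nr_trans stays at zero, so nothing is left for the unmount-time check to find.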

The extra debug message at unmount for btrfs_fs_info::trans_list is
added separately by patch 2/2 of this series; sometimes just knowing
that there are no dirty metadata bytes in an uncommitted transaction
tells us a lot.

Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1db24e6d6d90..702219361b12 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
 	int ret;
 	bool dirty_bg_running;
 
+	/*
+	 * This can only happen when we are doing read-only scrub on read-only
+	 * mount.
+	 * In that case we should not start a new transaction on read-only fs.
+	 * Thus here we skip all chunk allocation.
+	 */
+	if (sb_rdonly(fs_info->sb)) {
+		mutex_lock(&fs_info->ro_block_group_mutex);
+		ret = inc_block_group_ro(cache, 0);
+		mutex_unlock(&fs_info->ro_block_group_mutex);
+		return ret;
+	}
+
 	do {
 		trans = btrfs_join_transaction(root);
 		if (IS_ERR(trans))
-- 
2.34.1



* [PATCH 2/2] btrfs: output more debug message for uncommitted transaction
  2021-12-16 11:47 [PATCH 0/2] btrfs: bug fix for read-only scrub on read-only mount Qu Wenruo
  2021-12-16 11:47 ` [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only Qu Wenruo
@ 2021-12-16 11:47 ` Qu Wenruo
  2022-01-03 19:02   ` David Sterba
  1 sibling, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2021-12-16 11:47 UTC (permalink / raw)
  To: linux-btrfs

Extra information, such as how many dirty metadata bytes an uncommitted
transaction has, can be very helpful for debugging.
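The dirty-byte count in the hunk below sums the inclusive [found_start, found_end] ranges returned by find_first_extent_bit(). A userspace sketch of that accumulation loop, where a sorted array of non-overlapping ranges and the hypothetical toy_find_first_range() stand in for trans->dirty_pages and find_first_extent_bit() (only its "0 means found" return convention is mimicked):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent_io_tree: sorted, non-overlapping ranges. */
struct toy_range {
	uint64_t start;
	uint64_t end;	/* inclusive, like extent_state::end */
};

/*
 * Mimics the return convention of find_first_extent_bit(): returns 0 and
 * fills *found_start/*found_end with the first range ending at or after
 * @start, or returns 1 if there is none.
 */
static int toy_find_first_range(const struct toy_range *ranges, size_t nr,
				uint64_t start, uint64_t *found_start,
				uint64_t *found_end)
{
	for (size_t i = 0; i < nr; i++) {
		if (ranges[i].end >= start) {
			*found_start = ranges[i].start;
			*found_end = ranges[i].end;
			return 0;
		}
	}
	return 1;
}

/* Same accumulation loop as warn_about_uncommitted_trans(). */
static uint64_t toy_count_dirty_bytes(const struct toy_range *ranges, size_t nr)
{
	uint64_t dirty_bytes = 0;
	uint64_t cur = 0;
	uint64_t found_start;
	uint64_t found_end;

	while (!toy_find_first_range(ranges, nr, cur, &found_start, &found_end)) {
		/* Ranges are inclusive, hence the "+ 1". */
		dirty_bytes += found_end + 1 - found_start;
		cur = found_end + 1;
	}
	return dirty_bytes;
}
```

Two 16KiB dirty ranges, { 0, 16383 } and { 65536, 81919 }, add up to 32768 bytes; the "+ 1" matters because both ends of each range are included.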

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5c598e124c25..25e0248e3c55 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4491,6 +4491,47 @@ int btrfs_commit_super(struct btrfs_fs_info *fs_info)
 	return btrfs_commit_transaction(trans);
 }
 
+static void warn_about_uncommitted_trans(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_transaction *trans;
+	struct btrfs_transaction *tmp;
+	bool found = false;
+
+	if (likely(list_empty(&fs_info->trans_list)))
+		return;
+
+	/*
+	 * This function is only called at the very end of close_ctree(),
+	 * thus no other running transaction, no need to take trans_lock.
+	 */
+	list_for_each_entry_safe(trans, tmp, &fs_info->trans_list, list) {
+		struct extent_state *cached = NULL;
+		u64 dirty_bytes = 0;
+		u64 cur = 0;
+		u64 found_start;
+		u64 found_end;
+
+		found = true;
+		while (!find_first_extent_bit(&trans->dirty_pages, cur,
+			&found_start, &found_end, EXTENT_DIRTY, &cached)) {
+			dirty_bytes += found_end + 1 - found_start;
+			cur = found_end + 1;
+		}
+		btrfs_warn(fs_info,
+	"transaction %llu (with %llu dirty metadata bytes) is not committed",
+			   trans->transid, dirty_bytes);
+		btrfs_cleanup_one_transaction(trans, fs_info);
+
+		if (trans == fs_info->running_transaction)
+			fs_info->running_transaction = NULL;
+		list_del_init(&trans->list);
+
+		btrfs_put_transaction(trans);
+		trace_btrfs_transaction_commit(fs_info);
+	}
+	ASSERT(!found);
+}
+
 void __cold close_ctree(struct btrfs_fs_info *fs_info)
 {
 	int ret;
@@ -4599,7 +4640,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	btrfs_stop_all_workers(fs_info);
 
 	/* We shouldn't have any transaction open at this point */
-	ASSERT(list_empty(&fs_info->trans_list));
+	warn_about_uncommitted_trans(fs_info);
 
 	clear_bit(BTRFS_FS_OPEN, &fs_info->flags);
 	free_root_pointers(fs_info, true);
-- 
2.34.1



* Re: [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only
  2021-12-16 11:47 ` [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only Qu Wenruo
@ 2022-01-03 18:52   ` David Sterba
  2022-01-03 23:52     ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2022-01-03 18:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, stable

On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 1db24e6d6d90..702219361b12 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
>  	int ret;
>  	bool dirty_bg_running;
>  
> +	/*
> +	 * This can only happen when we are doing read-only scrub on read-only
> +	 * mount.
> +	 * In that case we should not start a new transaction on read-only fs.
> +	 * Thus here we skip all chunk allocation.
> +	 */
> +	if (sb_rdonly(fs_info->sb)) {

Should this also verify or at least assert that do_chunk_alloc is not
set? The scrub code is used for replace that can set the parameter to
true.
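One hypothetical way to act on this suggestion, sketched as userspace C (not kernel code; the toy_* name and the choice of -EROFS are assumptions): have the read-only shortcut refuse a chunk-allocation request instead of silently ignoring it. In the kernel this would more likely be an ASSERT(!do_chunk_alloc) in the read-only branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: the read-only shortcut rejects requests that would
 * need chunk allocation, since allocating a chunk requires a transaction.
 * Returns 0 on success or a negative errno, kernel style.
 */
static int toy_inc_block_group_ro(bool fs_rdonly, bool do_chunk_alloc,
				  int *bg_ro, int *joins)
{
	if (fs_rdonly) {
		if (do_chunk_alloc)
			return -EROFS;	/* assumed error code for this case */
		(*bg_ro)++;		/* inc_block_group_ro() without a transaction */
		return 0;
	}
	(*joins)++;			/* counts btrfs_join_transaction() calls */
	(*bg_ro)++;
	return 0;
}
```

The design choice is between failing loudly (an error or assert) and quietly dropping do_chunk_alloc; the sketch fails loudly so a caller such as replace cannot mistake the shortcut for a successful allocation.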

> +		mutex_lock(&fs_info->ro_block_group_mutex);
> +		ret = inc_block_group_ro(cache, 0);
> +		mutex_unlock(&fs_info->ro_block_group_mutex);
> +		return ret;

So this is taking a shortcut and skips a few things done in the function
that use the transaction. I'm not sure how safe this is: it depends on
the read-only status of the superblock, which can change at any time, so
what are further calls to btrfs_inc_block_group_ro going to do regarding
the transaction?

> +	}
> +
>  	do {
>  		trans = btrfs_join_transaction(root);
>  		if (IS_ERR(trans))
> -- 
> 2.34.1


* Re: [PATCH 2/2] btrfs: output more debug message for uncommitted transaction
  2021-12-16 11:47 ` [PATCH 2/2] btrfs: output more debug message for uncommitted transaction Qu Wenruo
@ 2022-01-03 19:02   ` David Sterba
  0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2022-01-03 19:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Dec 16, 2021 at 07:47:36PM +0800, Qu Wenruo wrote:
> Extra information, such as how many dirty metadata bytes an uncommitted
> transaction has, can be very helpful for debugging.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

This does not depend on the first patch, so I'll apply it now.

> ---
>  fs/btrfs/disk-io.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 5c598e124c25..25e0248e3c55 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4491,6 +4491,47 @@ int btrfs_commit_super(struct btrfs_fs_info *fs_info)
>  	return btrfs_commit_transaction(trans);
>  }
>  
> +static void warn_about_uncommitted_trans(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_transaction *trans;
> +	struct btrfs_transaction *tmp;
> +	bool found = false;
> +
> +	if (likely(list_empty(&fs_info->trans_list)))
> +		return;
> +
> +	/*
> +	 * This function is only called at the very end of close_ctree(),
> +	 * thus no other running transaction, no need to take trans_lock.
> +	 */

I've added an assert

	ASSERT(test_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags));

just in case somebody uses it as a warning function outside of the
close_ctree() context.


* Re: [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only
  2022-01-03 18:52   ` David Sterba
@ 2022-01-03 23:52     ` Qu Wenruo
  2022-01-04 18:40       ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2022-01-03 23:52 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, stable



On 2022/1/4 02:52, David Sterba wrote:
> On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index 1db24e6d6d90..702219361b12 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
>>   	int ret;
>>   	bool dirty_bg_running;
>>
>> +	/*
>> +	 * This can only happen when we are doing read-only scrub on read-only
>> +	 * mount.
>> +	 * In that case we should not start a new transaction on read-only fs.
>> +	 * Thus here we skip all chunk allocation.
>> +	 */
>> +	if (sb_rdonly(fs_info->sb)) {
>
> Should this also verify or at least assert that do_chunk_alloc is not
> set? The scrub code is used for replace that can set the parameter to
> true.

Starting a replace requires a RW mount, so we don't need to worry about
replace in this case.

>
>> +		mutex_lock(&fs_info->ro_block_group_mutex);
>> +		ret = inc_block_group_ro(cache, 0);
>> +		mutex_unlock(&fs_info->ro_block_group_mutex);
>> +		return ret;
>
> So this is taking a shortcut and skips a few things done in the function
> that use the transaction. I'm not sure how safe this is: it depends on
> the read-only status of the superblock, which can change at any time, so
> what are further calls to btrfs_inc_block_group_ro going to do regarding
> the transaction?

By "any time" you mean remount. If that's your concern, I can make
remount stop a read-only scrub, just to be extra safe.

Also, only scrub and balance use this function, and balance requires
RW.

For scrub, if one scrub is already running (even a read-only one) and
the fs is then remounted RW, the next scrub run will return -EINPROGRESS
or a similar error.

So I don't think we need to worry too much about this.
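The argument relies on only one scrub being allowed to run at a time. A hypothetical userspace sketch of such a guard (invented toy_* names, not kernel code; the mail itself only says the second run gets -EINPROGRESS "or a similar error"):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical scrub bookkeeping; not kernel code. */
struct toy_scrub_state {
	bool running;
};

/* Start a scrub; a second start while one is running is refused. */
static int toy_scrub_start(struct toy_scrub_state *s)
{
	if (s->running)
		return -EINPROGRESS;
	s->running = true;
	return 0;
}

/* Finish the scrub so that a later start can succeed. */
static void toy_scrub_finish(struct toy_scrub_state *s)
{
	s->running = false;
}
```

A read-only scrub that is still running therefore blocks any new scrub, whatever the mount mode has become in the meantime.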

Thanks,
Qu


* Re: [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only
  2022-01-03 23:52     ` Qu Wenruo
@ 2022-01-04 18:40       ` David Sterba
  2022-01-04 22:13         ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2022-01-04 18:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, stable

On Tue, Jan 04, 2022 at 07:52:39AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/1/4 02:52, David Sterba wrote:
> > On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:
> >> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> >> index 1db24e6d6d90..702219361b12 100644
> >> --- a/fs/btrfs/block-group.c
> >> +++ b/fs/btrfs/block-group.c
> >> @@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
> >>   	int ret;
> >>   	bool dirty_bg_running;
> >>
> >> +	/*
> >> +	 * This can only happen when we are doing read-only scrub on read-only
> >> +	 * mount.
> >> +	 * In that case we should not start a new transaction on read-only fs.
> >> +	 * Thus here we skip all chunk allocation.
> >> +	 */
> >> +	if (sb_rdonly(fs_info->sb)) {
> >
> > Should this also verify or at least assert that do_chunk_alloc is not
> > set? The scrub code is used for replace that can set the parameter to
> > true.
> 
> Starting a replace requires a RW mount, so we don't need to worry about
> replace in this case.

What if replace starts on rw mount, and then it's flipped to read-only?
I don't see how this is prevented (like by mnt_want_write). It should
not cause any problems either, as it would not start the transaction.

> >> +		mutex_lock(&fs_info->ro_block_group_mutex);
> >> +		ret = inc_block_group_ro(cache, 0);
> >> +		mutex_unlock(&fs_info->ro_block_group_mutex);
> >> +		return ret;
> >
> > So this is taking a shortcut and skips a few things done in the function
> > that use the transaction. I'm not sure how safe this is: it depends on
> > the read-only status of the superblock, which can change at any time, so
> > what are further calls to btrfs_inc_block_group_ro going to do regarding
> > the transaction?
> 
> By "any time" you mean remount. If that's your concern, I can make
> remount stop a read-only scrub, just to be extra safe.

If scrub is running in the read-only mode then it's fine, the corner
cases I'm interested in are some mixture of read-write/read-only on the
filesystem and scrub and when they get out of sync.

> Also, only scrub and balance use this function, and balance requires
> RW.
> 
> For scrub, if one scrub is already running (even a read-only one) and
> the fs is then remounted RW, the next scrub run will return -EINPROGRESS
> or a similar error.
> 
> So I don't think we need to worry too much about this.

It's not about another scrub running, that won't work. But what if
scrub is started and then at some point the filesystem gets remounted
read-only? Both can be done without any notification by any system tool
or service. So if there's no problematic case, then ok; I'm probably not
understanding it completely yet, so I'm asking. If it works by accident
or there's a corner case left, I'd rather find it now.


* Re: [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only
  2022-01-04 18:40       ` David Sterba
@ 2022-01-04 22:13         ` Qu Wenruo
  2022-01-06 15:18           ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2022-01-04 22:13 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, stable



On 2022/1/5 02:40, David Sterba wrote:
> On Tue, Jan 04, 2022 at 07:52:39AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/1/4 02:52, David Sterba wrote:
>>> On Thu, Dec 16, 2021 at 07:47:35PM +0800, Qu Wenruo wrote:
>>>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>>>> index 1db24e6d6d90..702219361b12 100644
>>>> --- a/fs/btrfs/block-group.c
>>>> +++ b/fs/btrfs/block-group.c
>>>> @@ -2544,6 +2544,19 @@ int btrfs_inc_block_group_ro(struct btrfs_block_group *cache,
>>>>    	int ret;
>>>>    	bool dirty_bg_running;
>>>>
>>>> +	/*
>>>> +	 * This can only happen when we are doing read-only scrub on read-only
>>>> +	 * mount.
>>>> +	 * In that case we should not start a new transaction on read-only fs.
>>>> +	 * Thus here we skip all chunk allocation.
>>>> +	 */
>>>> +	if (sb_rdonly(fs_info->sb)) {
>>>
>>> Should this also verify or at least assert that do_chunk_alloc is not
>>> set? The scrub code is used for replace that can set the parameter to
>>> true.
>>
>> Starting a replace requires a RW mount, so we don't need to worry about
>> replace in this case.
>
> What if replace starts on rw mount, and then it's flipped to read-only?
> I don't see how this is prevented (like by mnt_want_write). It should
> not cause any problems either, as it would not start the transaction.

For this case, there are two entry points:

- Remount RO
   We will stop replace in that case.

- Some fs error (like a transaction abort)
   I believe any transaction start should fail in that case.
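A hypothetical userspace sketch of the two cases above (invented toy_* names, not kernel code; using -EROFS as the failure code is an assumption of the sketch): remounting read-only stops a running replace before the fs flips to RO, and once the fs is in an error state every transaction start fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model; not kernel code. */
struct toy_fs {
	bool rdonly;		/* stands in for sb_rdonly(fs_info->sb) */
	bool fs_error;		/* stands in for an aborted transaction / fs error state */
	bool replace_running;	/* stands in for a running dev-replace */
};

/* Case 1: remount read-only stops a running replace before flipping to RO. */
static void toy_remount_ro(struct toy_fs *fs)
{
	fs->replace_running = false;
	fs->rdonly = true;
}

/* Case 2: after an fs error, every transaction start fails. */
static int toy_start_transaction(const struct toy_fs *fs)
{
	if (fs->fs_error)
		return -EROFS;
	return 0;
}
```

In the model, a replace never observes the read-only flag while still running after a remount, and an errored fs cannot start the transaction that chunk allocation would need.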

Thanks,
Qu

* Re: [PATCH 1/2] btrfs: don't start transaction for scrub if the fs is mounted read-only
  2022-01-04 22:13         ` Qu Wenruo
@ 2022-01-06 15:18           ` David Sterba
  0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2022-01-06 15:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, stable

On Wed, Jan 05, 2022 at 06:13:09AM +0800, Qu Wenruo wrote:
> >>> Should this also verify or at least assert that do_chunk_alloc is not
> >>> set? The scrub code is used for replace that can set the parameter to
> >>> true.
> >>
> >> Starting a replace requires a RW mount, so we don't need to worry about
> >> replace in this case.
> >
> > What if replace starts on rw mount, and then it's flipped to read-only?
> > I don't see how this is prevented (like by mnt_want_write). It should
> > not cause any problems either, as it would not start the transaction.
> 
> For this case, there are two entry points:
> 
> - Remount RO
>    We will stop replace in that case.
> 
> - Some fs error (like a transaction abort)
>    I believe any transaction start should fail in that case.

Right, thanks, that was the missing piece.
