linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks
@ 2019-02-21  8:22 Qu Wenruo
  2019-02-21 14:25 ` Nikolay Borisov
  2019-02-26  7:38 ` Qu Wenruo
  0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-02-21  8:22 UTC (permalink / raw)
  To: linux-btrfs

There are a lot of error reports complaining about transid error in the
mail list.

Under most case, the on-disk transid is lower than expected transid.
This may indicate that some tree blocks are not written back to disk
before writing super blocks.

This patch will add a safe net for developers, by calling
btrfs_write_and_wait_transaction() before setting transaction unblocked
and double check btree_inode and dirty_pages io_tree, to ensure no tree
blocks are still dirty or under writeback.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
The reason for RFC is, I'm not sure why we currently call
btrfs_write_and_wait_transaction() after setting transaction UNBLOCKED.

It looks like an optimization, but I don't see much performance
difference during regression test.

I hope to move the call before we unblock transaction so we can do such
sanity check for all builds and hope to catch some clue of transid
error.
---
 fs/btrfs/transaction.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4ec2b660d014..30b7ed0bf873 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2213,6 +2213,44 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	btrfs_trans_release_chunk_metadata(trans);
 
+	/* Last safenet or developer to catch any unwritten tree blocks */
+	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+		u64 found_start = 0;
+		u64 found_end = 0;
+
+		ret = btrfs_write_and_wait_transaction(trans);
+		if (ret) {
+			btrfs_handle_fs_error(fs_info, ret,
+					      "Error while writing out transaction");
+			mutex_unlock(&fs_info->tree_log_mutex);
+			goto scrub_continue;
+		}
+
+		/* No dirty extent should exist in btree inode */
+		ret = test_range_bit(&trans->transaction->dirty_pages, 0,
+				(u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
+				0, NULL);
+		if (ret > 0) {
+			WARN(1,
+		"dirty_pages not fully written back, start=%llu len=%llu\n",
+			     found_start, found_end + 1 - found_start);
+			ret = -EUCLEAN;
+			mutex_unlock(&fs_info->tree_log_mutex);
+			goto scrub_continue;
+		}
+		ret = test_range_bit(&BTRFS_I(fs_info->btree_inode)->io_tree, 0,
+				     (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
+				     0, NULL);
+		if (ret > 0) {
+			WARN(1,
+		"btree io_tree not fully written back, start=%llu len=%llu\n",
+			     found_start, found_end + 1 - found_start);
+			ret = -EUCLEAN;
+			mutex_unlock(&fs_info->tree_log_mutex);
+			goto scrub_continue;
+		}
+	}
+
 	spin_lock(&fs_info->trans_lock);
 	cur_trans->state = TRANS_STATE_UNBLOCKED;
 	fs_info->running_transaction = NULL;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks
  2019-02-21  8:22 [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks Qu Wenruo
@ 2019-02-21 14:25 ` Nikolay Borisov
  2019-02-21 14:32   ` Qu Wenruo
  2019-02-26  7:38 ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2019-02-21 14:25 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 21.02.19 г. 10:22 ч., Qu Wenruo wrote:
> There are a lot of error reports complaining about transid error in the
> mail list.
> 
> Under most case, the on-disk transid is lower than expected transid.
> This may indicate that some tree blocks are not written back to disk
> before writing super blocks.
> 
> This patch will add a safe net for developers, by calling
> btrfs_write_and_wait_transaction() before setting transaction unblocked
> and double check btree_inode and dirty_pages io_tree, to ensure no tree
> blocks are still dirty or under writeback.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> The reason for RFC is, I'm not sure why we currently call
> btrfs_write_and_wait_transaction() after setting transaction UNBLOCKED.
> 
> It looks like an optimization, but I don't see much performance
> difference during regression test.
> 
> I hope to move the call before we unblock transaction so we can do such
> sanity check for all builds and hope to catch some clue of transid
> error.

Even current code ensures that all allocated blocks in the current
transaction (which is what all those EXTENT_DIRTY extents in the
dirty_pages tree ) are written before the new superblocks are.

Slight offtopic: In fact instead of playing games with the flags and
having an extent_io_tree called dirty_pages o_O it can be replaced with
a simple linked list that holds all newly allocated buffers so writing
all such buffers will result in simply iterating the list.

In any case this patch is buggy, see below on why

> ---
>  fs/btrfs/transaction.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 4ec2b660d014..30b7ed0bf873 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -2213,6 +2213,44 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  
>  	btrfs_trans_release_chunk_metadata(trans);
>  
> +	/* Last safenet or developer to catch any unwritten tree blocks */
> +	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
> +		u64 found_start = 0;
> +		u64 found_end = 0;
> +
> +		ret = btrfs_write_and_wait_transaction(trans);
> +		if (ret) {
> +			btrfs_handle_fs_error(fs_info, ret,
> +					      "Error while writing out transaction");
> +			mutex_unlock(&fs_info->tree_log_mutex);
> +			goto scrub_continue;
> +		}
> +
> +		/* No dirty extent should exist in btree inode */
> +		ret = test_range_bit(&trans->transaction->dirty_pages, 0,
> +				(u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,

Why do you check EXTENT_WRITEBACK, AFAICS that flag is not currently
used in the code and should perhahps be deleted? I don't see anything
setting it, it's only being checked for (as part of EXTENT_IOBITS).

Additionally this check is pointless because
btrfs_write_and_wait_transaction calls clear_btree_io_tree which purges
the tree.

> +				0, NULL);
> +		if (ret > 0) {
> +			WARN(1,
> +		"dirty_pages not fully written back, start=%llu len=%llu\n",
> +			     found_start, found_end + 1 - found_start);
> +			ret = -EUCLEAN;
> +			mutex_unlock(&fs_info->tree_log_mutex);
> +			goto scrub_continue;
> +		}
> +		ret = test_range_bit(&BTRFS_I(fs_info->btree_inode)->io_tree, 0,
> +				     (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
> +				     0, NULL);
> +		if (ret > 0) {
> +			WARN(1,
> +		"btree io_tree not fully written back, start=%llu len=%llu\n",
> +			     found_start, found_end + 1 - found_start);
> +			ret = -EUCLEAN;
> +			mutex_unlock(&fs_info->tree_log_mutex);
> +			goto scrub_continue;
> +		}
> +	}
> +
>  	spin_lock(&fs_info->trans_lock);
>  	cur_trans->state = TRANS_STATE_UNBLOCKED;
>  	fs_info->running_transaction = NULL;
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks
  2019-02-21 14:25 ` Nikolay Borisov
@ 2019-02-21 14:32   ` Qu Wenruo
  2019-02-21 15:01     ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2019-02-21 14:32 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2019/2/21 下午10:25, Nikolay Borisov wrote:
> 
> 
> On 21.02.19 г. 10:22 ч., Qu Wenruo wrote:
>> There are a lot of error reports complaining about transid error in the
>> mail list.
>>
>> Under most case, the on-disk transid is lower than expected transid.
>> This may indicate that some tree blocks are not written back to disk
>> before writing super blocks.
>>
>> This patch will add a safe net for developers, by calling
>> btrfs_write_and_wait_transaction() before setting transaction unblocked
>> and double check btree_inode and dirty_pages io_tree, to ensure no tree
>> blocks are still dirty or under writeback.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> The reason for RFC is, I'm not sure why we currently call
>> btrfs_write_and_wait_transaction() after setting transaction UNBLOCKED.
>>
>> It looks like an optimization, but I don't see much performance
>> difference during regression test.
>>
>> I hope to move the call before we unblock transaction so we can do such
>> sanity check for all builds and hope to catch some clue of transid
>> error.
> 
> Even current code ensures that all allocated blocks in the current
> transaction (which is what all those EXTENT_DIRTY extents in the
> dirty_pages tree ) are written before the new superblocks are.
> 
> Slight offtopic: In fact instead of playing games with the flags and
> having an extent_io_tree called dirty_pages o_O it can be replaced with
> a simple linked list that holds all newly allocated buffers so writing
> all such buffers will result in simply iterating the list.
> 
> In any case this patch is buggy, see below on why
> 
>> ---
>>  fs/btrfs/transaction.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 4ec2b660d014..30b7ed0bf873 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -2213,6 +2213,44 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>  
>>  	btrfs_trans_release_chunk_metadata(trans);
>>  
>> +	/* Last safenet or developer to catch any unwritten tree blocks */
>> +	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
>> +		u64 found_start = 0;
>> +		u64 found_end = 0;
>> +
>> +		ret = btrfs_write_and_wait_transaction(trans);
>> +		if (ret) {
>> +			btrfs_handle_fs_error(fs_info, ret,
>> +					      "Error while writing out transaction");
>> +			mutex_unlock(&fs_info->tree_log_mutex);
>> +			goto scrub_continue;
>> +		}
>> +
>> +		/* No dirty extent should exist in btree inode */
>> +		ret = test_range_bit(&trans->transaction->dirty_pages, 0,
>> +				(u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
> 
> Why do you check EXTENT_WRITEBACK, AFAICS that flag is not currently
> used in the code and should perhahps be deleted? I don't see anything
> setting it, it's only being checked for (as part of EXTENT_IOBITS).
> 
> Additionally this check is pointless because
> btrfs_write_and_wait_transaction calls clear_btree_io_tree which purges
> the tree.

But we still have BTRFS_I(fs_info->btree_inode)->io_tree, and that's the
main part of the check.

I don't really think the dirty_pages is really an important thing
compared to btree_inode.
If there is some way to find any dirty pages from an address_space, it
would be even better.

Thanks,
Qu

> 
>> +				0, NULL);
>> +		if (ret > 0) {
>> +			WARN(1,
>> +		"dirty_pages not fully written back, start=%llu len=%llu\n",
>> +			     found_start, found_end + 1 - found_start);
>> +			ret = -EUCLEAN;
>> +			mutex_unlock(&fs_info->tree_log_mutex);
>> +			goto scrub_continue;
>> +		}
>> +		ret = test_range_bit(&BTRFS_I(fs_info->btree_inode)->io_tree, 0,
>> +				     (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
>> +				     0, NULL);
>> +		if (ret > 0) {
>> +			WARN(1,
>> +		"btree io_tree not fully written back, start=%llu len=%llu\n",
>> +			     found_start, found_end + 1 - found_start);
>> +			ret = -EUCLEAN;
>> +			mutex_unlock(&fs_info->tree_log_mutex);
>> +			goto scrub_continue;
>> +		}
>> +	}
>> +
>>  	spin_lock(&fs_info->trans_lock);
>>  	cur_trans->state = TRANS_STATE_UNBLOCKED;
>>  	fs_info->running_transaction = NULL;
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks
  2019-02-21 14:32   ` Qu Wenruo
@ 2019-02-21 15:01     ` Nikolay Borisov
  0 siblings, 0 replies; 5+ messages in thread
From: Nikolay Borisov @ 2019-02-21 15:01 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 21.02.19 г. 16:32 ч., Qu Wenruo wrote:
> 
> 
> On 2019/2/21 下午10:25, Nikolay Borisov wrote:
>>
>>
>> On 21.02.19 г. 10:22 ч., Qu Wenruo wrote:
>>> There are a lot of error reports complaining about transid error in the
>>> mail list.
>>>
>>> Under most case, the on-disk transid is lower than expected transid.
>>> This may indicate that some tree blocks are not written back to disk
>>> before writing super blocks.
>>>
>>> This patch will add a safe net for developers, by calling
>>> btrfs_write_and_wait_transaction() before setting transaction unblocked
>>> and double check btree_inode and dirty_pages io_tree, to ensure no tree
>>> blocks are still dirty or under writeback.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>> The reason for RFC is, I'm not sure why we currently call
>>> btrfs_write_and_wait_transaction() after setting transaction UNBLOCKED.
>>>
>>> It looks like an optimization, but I don't see much performance
>>> difference during regression test.
>>>
>>> I hope to move the call before we unblock transaction so we can do such
>>> sanity check for all builds and hope to catch some clue of transid
>>> error.
>>
>> Even current code ensures that all allocated blocks in the current
>> transaction (which is what all those EXTENT_DIRTY extents in the
>> dirty_pages tree ) are written before the new superblocks are.
>>
>> Slight offtopic: In fact instead of playing games with the flags and
>> having an extent_io_tree called dirty_pages o_O it can be replaced with
>> a simple linked list that holds all newly allocated buffers so writing
>> all such buffers will result in simply iterating the list.
>>
>> In any case this patch is buggy, see below on why
>>
>>> ---
>>>  fs/btrfs/transaction.c | 38 ++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 38 insertions(+)
>>>
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 4ec2b660d014..30b7ed0bf873 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -2213,6 +2213,44 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>>  
>>>  	btrfs_trans_release_chunk_metadata(trans);
>>>  
>>> +	/* Last safenet or developer to catch any unwritten tree blocks */
>>> +	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
>>> +		u64 found_start = 0;
>>> +		u64 found_end = 0;
>>> +
>>> +		ret = btrfs_write_and_wait_transaction(trans);
>>> +		if (ret) {
>>> +			btrfs_handle_fs_error(fs_info, ret,
>>> +					      "Error while writing out transaction");
>>> +			mutex_unlock(&fs_info->tree_log_mutex);
>>> +			goto scrub_continue;
>>> +		}
>>> +
>>> +		/* No dirty extent should exist in btree inode */
>>> +		ret = test_range_bit(&trans->transaction->dirty_pages, 0,
>>> +				(u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
>>
>> Why do you check EXTENT_WRITEBACK, AFAICS that flag is not currently
>> used in the code and should perhahps be deleted? I don't see anything
>> setting it, it's only being checked for (as part of EXTENT_IOBITS).
>>
>> Additionally this check is pointless because
>> btrfs_write_and_wait_transaction calls clear_btree_io_tree which purges
>> the tree.
> 
> But we still have BTRFS_I(fs_info->btree_inode)->io_tree, and that's the
> main part of the check.

Okay, but the ->dirty_pages code is pointless. Also checking for
EXTENT_WRITEBACK on the io_tree is also pointless.
> 
> I don't really think the dirty_pages is really an important thing
> compared to btree_inode.
> If there is some way to find any dirty pages from an address_space, it
> would be even better.
> 
> Thanks,
> Qu
> 
>>
>>> +				0, NULL);
>>> +		if (ret > 0) {
>>> +			WARN(1,
>>> +		"dirty_pages not fully written back, start=%llu len=%llu\n",
>>> +			     found_start, found_end + 1 - found_start);
>>> +			ret = -EUCLEAN;
>>> +			mutex_unlock(&fs_info->tree_log_mutex);
>>> +			goto scrub_continue;
>>> +		}
>>> +		ret = test_range_bit(&BTRFS_I(fs_info->btree_inode)->io_tree, 0,
>>> +				     (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
>>> +				     0, NULL);
>>> +		if (ret > 0) {
>>> +			WARN(1,
>>> +		"btree io_tree not fully written back, start=%llu len=%llu\n",
>>> +			     found_start, found_end + 1 - found_start);
>>> +			ret = -EUCLEAN;
>>> +			mutex_unlock(&fs_info->tree_log_mutex);
>>> +			goto scrub_continue;
>>> +		}
>>> +	}
>>> +
>>>  	spin_lock(&fs_info->trans_lock);
>>>  	cur_trans->state = TRANS_STATE_UNBLOCKED;
>>>  	fs_info->running_transaction = NULL;
>>>
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks
  2019-02-21  8:22 [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks Qu Wenruo
  2019-02-21 14:25 ` Nikolay Borisov
@ 2019-02-26  7:38 ` Qu Wenruo
  1 sibling, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-02-26  7:38 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2463 bytes --]



On 2019/2/21 下午4:22, Qu Wenruo wrote:
> There are a lot of error reports complaining about transid error in the
> mail list.
> 
> Under most case, the on-disk transid is lower than expected transid.
> This may indicate that some tree blocks are not written back to disk
> before writing super blocks.
> 
> This patch will add a safe net for developers, by calling
> btrfs_write_and_wait_transaction() before setting transaction unblocked
> and double check btree_inode and dirty_pages io_tree, to ensure no tree
> blocks are still dirty or under writeback.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Please discard this patch.

As Nikolay suggested, the check doesn't make much sense.

>  
> +	/* Last safenet or developer to catch any unwritten tree blocks */
> +	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
> +		u64 found_start = 0;
> +		u64 found_end = 0;
> +
> +		ret = btrfs_write_and_wait_transaction(trans);
> +		if (ret) {
> +			btrfs_handle_fs_error(fs_info, ret,
> +					      "Error while writing out transaction");
> +			mutex_unlock(&fs_info->tree_log_mutex);
> +			goto scrub_continue;
> +		}
> +
> +		/* No dirty extent should exist in btree inode */
> +		ret = test_range_bit(&trans->transaction->dirty_pages, 0,
> +				(u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
> +				0, NULL);

For dirty_pages, btrfs_write_and_wait_transaction() will clear all bits.
So the check will always return 0.

> +		if (ret > 0) {
> +			WARN(1,
> +		"dirty_pages not fully written back, start=%llu len=%llu\n",
> +			     found_start, found_end + 1 - found_start);
> +			ret = -EUCLEAN;
> +			mutex_unlock(&fs_info->tree_log_mutex);
> +			goto scrub_continue;
> +		}
> +		ret = test_range_bit(&BTRFS_I(fs_info->btree_inode)->io_tree, 0,
> +				     (u64)-1, EXTENT_DIRTY | EXTENT_WRITEBACK,
> +				     0, NULL);

btree_inode doesn't really carry EXTENT_DIRTY bit.
It only carries EXTENT_LOCK, and are mostly for tree read locking.

So this check doesn't make much sense either.

Thanks,
Qu

> +		if (ret > 0) {
> +			WARN(1,
> +		"btree io_tree not fully written back, start=%llu len=%llu\n",
> +			     found_start, found_end + 1 - found_start);
> +			ret = -EUCLEAN;
> +			mutex_unlock(&fs_info->tree_log_mutex);
> +			goto scrub_continue;
> +		}
> +	}
> +
>  	spin_lock(&fs_info->trans_lock);
>  	cur_trans->state = TRANS_STATE_UNBLOCKED;
>  	fs_info->running_transaction = NULL;
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-02-26  7:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-21  8:22 [RFC PATCH] btrfs: Introduce developer-oriented check to ensure all tree blocks are written back before writing super blocks Qu Wenruo
2019-02-21 14:25 ` Nikolay Borisov
2019-02-21 14:32   ` Qu Wenruo
2019-02-21 15:01     ` Nikolay Borisov
2019-02-26  7:38 ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).