All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: dsterba@suse.cz, Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit
Date: Mon, 11 Jan 2021 10:33:42 +0200	[thread overview]
Message-ID: <52aef9a6-efc7-0820-7056-067e69c2a856@suse.com> (raw)
In-Reply-To: <20210108160109.GB6430@twin.jikos.cz>



On 8.01.21 г. 18:01 ч., David Sterba wrote:
> On Fri, Dec 18, 2020 at 02:24:20PM -0500, Josef Bacik wrote:
>> I've been running a stress test that runs 20 workers in their own
>> subvolume, which are running an fsstress instance with 4 threads per
>> worker, which is 80 total fsstress threads.  In addition to this I'm
>> running balance in the background as well as creating and deleting
>> snapshots.  This test takes around 12 hours to run normally, going
>> slower and slower as the test goes on.
>>
>> The reason for this is because fsstress is running fsync sometimes, and
>> because we're messing with block groups we often fall through to
>> btrfs_commit_transaction, so will often have 20-30 threads all calling
>> btrfs_commit_transaction at the same time.
>>
>> These all get stuck contending on the extent tree while they try to run
>> delayed refs during the initial part of the commit.
>>
>> This is suboptimal, really because the extent tree is a single point of
>> failure we only want one thread acting on that tree at once to reduce
>> lock contention.  Fix this by making the flushing mechanism a bit
>> operation, to make it easy to use test_and_set_bit() in order to make
>> sure only one task does this initial flush.
>>
>> Once we're into the transaction commit we only have one thread doing
>> delayed ref running, it's just this initial pre-flush that is
>> problematic.  With this patch my stress test takes around 90 minutes to
>> run, instead of 12 hours.
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>  fs/btrfs/delayed-ref.h | 12 ++++++------
>>  fs/btrfs/transaction.c | 33 ++++++++++++++++-----------------
>>  2 files changed, 22 insertions(+), 23 deletions(-)
>>
>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>> index 1c977e6d45dc..6e414785b56f 100644
>> --- a/fs/btrfs/delayed-ref.h
>> +++ b/fs/btrfs/delayed-ref.h
>> @@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
>>  	u64 offset;
>>  };
>>  
>> +enum btrfs_delayed_ref_flags {
>> +	/* Used to indicate that we are flushing delayed refs for the commit. */
>> +	BTRFS_DELAYED_REFS_FLUSHING,
>> +};
>> +
>>  struct btrfs_delayed_ref_root {
>>  	/* head ref rbtree */
>>  	struct rb_root_cached href_root;
>> @@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
>>  
>>  	u64 pending_csums;
>>  
>> -	/*
>> -	 * set when the tree is flushing before a transaction commit,
>> -	 * used by the throttling code to decide if new updates need
>> -	 * to be run right away
>> -	 */
>> -	int flushing;
>> +	unsigned long flags;
>>  
>>  	u64 run_delayed_start;
>>  
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index f51f9e39bcee..ccd37fbe5db1 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -909,9 +909,9 @@ bool btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
>>  {
>>  	struct btrfs_transaction *cur_trans = trans->transaction;
>>  
>> -	smp_mb();
>>  	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
>> -	    cur_trans->delayed_refs.flushing)
>> +	    test_bit(BTRFS_DELAYED_REFS_FLUSHING,
>> +		     &cur_trans->delayed_refs.flags))
>>  		return true;
>>  
>>  	return should_end_transaction(trans);
>> @@ -2043,23 +2043,22 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>>  	btrfs_trans_release_metadata(trans);
>>  	trans->block_rsv = NULL;
>>  
>> -	/* make a pass through all the delayed refs we have so far
>> -	 * any runnings procs may add more while we are here
>> -	 */
>> -	ret = btrfs_run_delayed_refs(trans, 0);
>> -	if (ret) {
>> -		btrfs_end_transaction(trans);
>> -		return ret;
>> -	}
>> -
>> -	cur_trans = trans->transaction;
>> -
>>  	/*
>> -	 * set the flushing flag so procs in this transaction have to
>> -	 * start sending their work down.
>> +	 * We only want one transaction commit doing the flushing so we do not
>> +	 * waste a bunch of time on lock contention on the extent root node.
>>  	 */
>> -	cur_trans->delayed_refs.flushing = 1;
>> -	smp_wmb();
> 
> This barrier obviously separates the flushing = 1 and the rest of the
> code, now implemented as test_and_set_bit, which implies full barrier.
> 
> However, hunk in btrfs_should_end_transaction removes the barrier and
> I'm not sure whether this is correct:
> 
> -	smp_mb();
>  	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
> -	    cur_trans->delayed_refs.flushing)
> +	    test_bit(BTRFS_DELAYED_REFS_FLUSHING,
> +		     &cur_trans->delayed_refs.flags))
>  		return true;
> 
> This is never called under locks so we don't have complete
> synchronization of neither the transaction state nor the flushing bit.
> btrfs_should_end_transaction is merely a hint and not called in critical
> places so we could probably afford to keep it without a barrier, or keep
> it with comment(s).

I think the point is moot in this case, because the test_bit either sees
the flag or it doesn't. It's not possible for the flag to be set AND
should_end_transaction return false that would be gross violation of
program correctness.

> 

  reply	other threads:[~2021-01-11  8:34 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-18 19:24 [PATCH v5 0/8] A variety of lock contention fixes Josef Bacik
2020-12-18 19:24 ` [PATCH v5 1/8] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
2020-12-18 19:24 ` [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit Josef Bacik
2021-01-08 16:01   ` David Sterba
2021-01-11  8:33     ` Nikolay Borisov [this message]
2021-01-11 21:50       ` David Sterba
2021-01-12  9:17         ` Nikolay Borisov
2021-01-26 17:36           ` David Sterba
2020-12-18 19:24 ` [PATCH v5 3/8] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
2020-12-18 19:24 ` [PATCH v5 4/8] btrfs: only run delayed refs once before committing Josef Bacik
2020-12-18 19:24 ` [PATCH v5 5/8] btrfs: move delayed ref flushing for qgroup into qgroup helper Josef Bacik
2020-12-18 19:24 ` [PATCH v5 6/8] btrfs: remove bogus BUG_ON in alloc_reserved_tree_block Josef Bacik
2020-12-18 19:24 ` [PATCH v5 7/8] btrfs: stop running all delayed refs during snapshot Josef Bacik
2020-12-18 19:24 ` [PATCH v5 8/8] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
2021-01-26 17:51 ` [PATCH v5 0/8] A variety of lock contention fixes David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52aef9a6-efc7-0820-7056-067e69c2a856@suse.com \
    --to=nborisov@suse.com \
    --cc=dsterba@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.