All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	Nikolay Borisov <nborisov@suse.com>
Subject: Re: [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit
Date: Fri, 8 Jan 2021 17:01:09 +0100	[thread overview]
Message-ID: <20210108160109.GB6430@twin.jikos.cz> (raw)
In-Reply-To: <9e47b11bdfe5b4905fdaa81e952de2e2466c6335.1608319304.git.josef@toxicpanda.com>

On Fri, Dec 18, 2020 at 02:24:20PM -0500, Josef Bacik wrote:
> I've been running a stress test that runs 20 workers in their own
> subvolume, which are running an fsstress instance with 4 threads per
> worker, which is 80 total fsstress threads.  In addition to this I'm
> running balance in the background as well as creating and deleting
> snapshots.  This test takes around 12 hours to run normally, going
> slower and slower as the test goes on.
> 
> The reason for this is because fsstress is running fsync sometimes, and
> because we're messing with block groups we often fall through to
> btrfs_commit_transaction, so will often have 20-30 threads all calling
> btrfs_commit_transaction at the same time.
> 
> These all get stuck contending on the extent tree while they try to run
> delayed refs during the initial part of the commit.
> 
> This is suboptimal, really because the extent tree is a single point of
> failure we only want one thread acting on that tree at once to reduce
> lock contention.  Fix this by making the flushing mechanism a bit
> operation, to make it easy to use test_and_set_bit() in order to make
> sure only one task does this initial flush.
> 
> Once we're into the transaction commit we only have one thread doing
> delayed ref running, it's just this initial pre-flush that is
> problematic.  With this patch my stress test takes around 90 minutes to
> run, instead of 12 hours.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/delayed-ref.h | 12 ++++++------
>  fs/btrfs/transaction.c | 33 ++++++++++++++++-----------------
>  2 files changed, 22 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 1c977e6d45dc..6e414785b56f 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
>  	u64 offset;
>  };
>  
> +enum btrfs_delayed_ref_flags {
> +	/* Used to indicate that we are flushing delayed refs for the commit. */
> +	BTRFS_DELAYED_REFS_FLUSHING,
> +};
> +
>  struct btrfs_delayed_ref_root {
>  	/* head ref rbtree */
>  	struct rb_root_cached href_root;
> @@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
>  
>  	u64 pending_csums;
>  
> -	/*
> -	 * set when the tree is flushing before a transaction commit,
> -	 * used by the throttling code to decide if new updates need
> -	 * to be run right away
> -	 */
> -	int flushing;
> +	unsigned long flags;
>  
>  	u64 run_delayed_start;
>  
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index f51f9e39bcee..ccd37fbe5db1 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -909,9 +909,9 @@ bool btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
>  {
>  	struct btrfs_transaction *cur_trans = trans->transaction;
>  
> -	smp_mb();
>  	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
> -	    cur_trans->delayed_refs.flushing)
> +	    test_bit(BTRFS_DELAYED_REFS_FLUSHING,
> +		     &cur_trans->delayed_refs.flags))
>  		return true;
>  
>  	return should_end_transaction(trans);
> @@ -2043,23 +2043,22 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  	btrfs_trans_release_metadata(trans);
>  	trans->block_rsv = NULL;
>  
> -	/* make a pass through all the delayed refs we have so far
> -	 * any runnings procs may add more while we are here
> -	 */
> -	ret = btrfs_run_delayed_refs(trans, 0);
> -	if (ret) {
> -		btrfs_end_transaction(trans);
> -		return ret;
> -	}
> -
> -	cur_trans = trans->transaction;
> -
>  	/*
> -	 * set the flushing flag so procs in this transaction have to
> -	 * start sending their work down.
> +	 * We only want one transaction commit doing the flushing so we do not
> +	 * waste a bunch of time on lock contention on the extent root node.
>  	 */
> -	cur_trans->delayed_refs.flushing = 1;
> -	smp_wmb();

This barrier obviously separates the flushing = 1 and the rest of the
code, now implemented as test_and_set_bit, which implies full barrier.

However, hunk in btrfs_should_end_transaction removes the barrier and
I'm not sure whether this is correct:

-	smp_mb();
 	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
-	    cur_trans->delayed_refs.flushing)
+	    test_bit(BTRFS_DELAYED_REFS_FLUSHING,
+		     &cur_trans->delayed_refs.flags))
 		return true;

This is never called under locks so we don't have complete
synchronization of neither the transaction state nor the flushing bit.
btrfs_should_end_transaction is merely a hint and not called in critical
places so we could probably afford to keep it without a barrier, or keep
it with comment(s).

  reply	other threads:[~2021-01-08 16:03 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-18 19:24 [PATCH v5 0/8] A variety of lock contention fixes Josef Bacik
2020-12-18 19:24 ` [PATCH v5 1/8] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
2020-12-18 19:24 ` [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit Josef Bacik
2021-01-08 16:01   ` David Sterba [this message]
2021-01-11  8:33     ` Nikolay Borisov
2021-01-11 21:50       ` David Sterba
2021-01-12  9:17         ` Nikolay Borisov
2021-01-26 17:36           ` David Sterba
2020-12-18 19:24 ` [PATCH v5 3/8] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
2020-12-18 19:24 ` [PATCH v5 4/8] btrfs: only run delayed refs once before committing Josef Bacik
2020-12-18 19:24 ` [PATCH v5 5/8] btrfs: move delayed ref flushing for qgroup into qgroup helper Josef Bacik
2020-12-18 19:24 ` [PATCH v5 6/8] btrfs: remove bogus BUG_ON in alloc_reserved_tree_block Josef Bacik
2020-12-18 19:24 ` [PATCH v5 7/8] btrfs: stop running all delayed refs during snapshot Josef Bacik
2020-12-18 19:24 ` [PATCH v5 8/8] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
2021-01-26 17:51 ` [PATCH v5 0/8] A variety of lock contention fixes David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210108160109.GB6430@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    --subject='Re: [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.