From: Nikolay Borisov <nborisov@suse.com>
To: dsterba@suse.cz, Josef Bacik <josef@toxicpanda.com>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit
Date: Mon, 11 Jan 2021 10:33:42 +0200 [thread overview]
Message-ID: <52aef9a6-efc7-0820-7056-067e69c2a856@suse.com> (raw)
In-Reply-To: <20210108160109.GB6430@twin.jikos.cz>
On 8.01.21 г. 18:01 ч., David Sterba wrote:
> On Fri, Dec 18, 2020 at 02:24:20PM -0500, Josef Bacik wrote:
>> I've been running a stress test that runs 20 workers in their own
>> subvolume, which are running an fsstress instance with 4 threads per
>> worker, which is 80 total fsstress threads. In addition to this I'm
>> running balance in the background as well as creating and deleting
>> snapshots. This test takes around 12 hours to run normally, going
>> slower and slower as the test goes on.
>>
>> The reason for this is because fsstress is running fsync sometimes, and
>> because we're messing with block groups we often fall through to
>> btrfs_commit_transaction, so will often have 20-30 threads all calling
>> btrfs_commit_transaction at the same time.
>>
>> These all get stuck contending on the extent tree while they try to run
>> delayed refs during the initial part of the commit.
>>
>> This is suboptimal, really because the extent tree is a single point of
>> failure we only want one thread acting on that tree at once to reduce
>> lock contention. Fix this by making the flushing mechanism a bit
>> operation, to make it easy to use test_and_set_bit() in order to make
>> sure only one task does this initial flush.
>>
>> Once we're into the transaction commit we only have one thread doing
>> delayed ref running, it's just this initial pre-flush that is
>> problematic. With this patch my stress test takes around 90 minutes to
>> run, instead of 12 hours.
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>> fs/btrfs/delayed-ref.h | 12 ++++++------
>> fs/btrfs/transaction.c | 33 ++++++++++++++++-----------------
>> 2 files changed, 22 insertions(+), 23 deletions(-)
>>
>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>> index 1c977e6d45dc..6e414785b56f 100644
>> --- a/fs/btrfs/delayed-ref.h
>> +++ b/fs/btrfs/delayed-ref.h
>> @@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
>> u64 offset;
>> };
>>
>> +enum btrfs_delayed_ref_flags {
>> + /* Used to indicate that we are flushing delayed refs for the commit. */
>> + BTRFS_DELAYED_REFS_FLUSHING,
>> +};
>> +
>> struct btrfs_delayed_ref_root {
>> /* head ref rbtree */
>> struct rb_root_cached href_root;
>> @@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
>>
>> u64 pending_csums;
>>
>> - /*
>> - * set when the tree is flushing before a transaction commit,
>> - * used by the throttling code to decide if new updates need
>> - * to be run right away
>> - */
>> - int flushing;
>> + unsigned long flags;
>>
>> u64 run_delayed_start;
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index f51f9e39bcee..ccd37fbe5db1 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -909,9 +909,9 @@ bool btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
>> {
>> struct btrfs_transaction *cur_trans = trans->transaction;
>>
>> - smp_mb();
>> if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
>> - cur_trans->delayed_refs.flushing)
>> + test_bit(BTRFS_DELAYED_REFS_FLUSHING,
>> + &cur_trans->delayed_refs.flags))
>> return true;
>>
>> return should_end_transaction(trans);
>> @@ -2043,23 +2043,22 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>> btrfs_trans_release_metadata(trans);
>> trans->block_rsv = NULL;
>>
>> - /* make a pass through all the delayed refs we have so far
>> - * any runnings procs may add more while we are here
>> - */
>> - ret = btrfs_run_delayed_refs(trans, 0);
>> - if (ret) {
>> - btrfs_end_transaction(trans);
>> - return ret;
>> - }
>> -
>> - cur_trans = trans->transaction;
>> -
>> /*
>> - * set the flushing flag so procs in this transaction have to
>> - * start sending their work down.
>> + * We only want one transaction commit doing the flushing so we do not
>> + * waste a bunch of time on lock contention on the extent root node.
>> */
>> - cur_trans->delayed_refs.flushing = 1;
>> - smp_wmb();
>
> This barrier obviously separates the flushing = 1 and the rest of the
> code, now implemented as test_and_set_bit, which implies full barrier.
>
> However, hunk in btrfs_should_end_transaction removes the barrier and
> I'm not sure whether this is correct:
>
> - smp_mb();
> if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
> - cur_trans->delayed_refs.flushing)
> + test_bit(BTRFS_DELAYED_REFS_FLUSHING,
> + &cur_trans->delayed_refs.flags))
> return true;
>
> This is never called under locks so we don't have complete
> synchronization of neither the transaction state nor the flushing bit.
> btrfs_should_end_transaction is merely a hint and not called in critical
> places so we could probably afford to keep it without a barrier, or keep
> it with comment(s).
I think the point is moot in this case, because the test_bit either sees
the flag or it doesn't. It's not possible for the flag to be set AND
should_end_transaction return false that would be gross violation of
program correctness.
>
next prev parent reply other threads:[~2021-01-11 8:34 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-18 19:24 [PATCH v5 0/8] A variety of lock contention fixes Josef Bacik
2020-12-18 19:24 ` [PATCH v5 1/8] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
2020-12-18 19:24 ` [PATCH v5 2/8] btrfs: only let one thread pre-flush delayed refs in commit Josef Bacik
2021-01-08 16:01 ` David Sterba
2021-01-11 8:33 ` Nikolay Borisov [this message]
2021-01-11 21:50 ` David Sterba
2021-01-12 9:17 ` Nikolay Borisov
2021-01-26 17:36 ` David Sterba
2020-12-18 19:24 ` [PATCH v5 3/8] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
2020-12-18 19:24 ` [PATCH v5 4/8] btrfs: only run delayed refs once before committing Josef Bacik
2020-12-18 19:24 ` [PATCH v5 5/8] btrfs: move delayed ref flushing for qgroup into qgroup helper Josef Bacik
2020-12-18 19:24 ` [PATCH v5 6/8] btrfs: remove bogus BUG_ON in alloc_reserved_tree_block Josef Bacik
2020-12-18 19:24 ` [PATCH v5 7/8] btrfs: stop running all delayed refs during snapshot Josef Bacik
2020-12-18 19:24 ` [PATCH v5 8/8] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
2021-01-26 17:51 ` [PATCH v5 0/8] A variety of lock contention fixes David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52aef9a6-efc7-0820-7056-067e69c2a856@suse.com \
--to=nborisov@suse.com \
--cc=dsterba@suse.cz \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).