From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit
Date: Fri, 16 Oct 2020 10:19:17 +0300
Message-ID: <8e2eef6b-9a38-63f2-f5ec-4f251e9ce6ce@suse.com>
In-Reply-To: <93459148-83d6-e5b6-f819-811833158750@toxicpanda.com>



On 15.10.20 at 23:26, Josef Bacik wrote:
> On 10/15/20 3:35 PM, Nikolay Borisov wrote:
>>
>>
>> On 15.10.20 at 21:25, Josef Bacik wrote:
>>> I've been running a stress test that runs 20 workers in their own
>>> subvolume, which are running an fsstress instance with 4 threads per
>>> worker, which is 80 total fsstress threads.  In addition to this I'm
>>> running balance in the background as well as creating and deleting
>>> snapshots.  This test takes around 12 hours to run normally, going
>>> slower and slower as the test goes on.
>>>
>>> The reason for this is because fsstress is running fsync sometimes, and
>>> because we're messing with block groups we often fall through to
>>> btrfs_commit_transaction, so will often have 20-30 threads all calling
>>> btrfs_commit_transaction at the same time.
>>>
>>> These all get stuck contending on the extent tree while they try to run
>>> delayed refs during the initial part of the commit.
>>>
>>> This is suboptimal, really because the extent tree is a single point of
>>> failure we only want one thread acting on that tree at once to reduce
>>> lock contention.  Fix this by making the flushing mechanism a bit
>>> operation, to make it easy to use test_and_set_bit() in order to make
>>> sure only one task does this initial flush.
>>>
>>> Once we're into the transaction commit we only have one thread doing
>>> delayed ref running, it's just this initial pre-flush that is
>>> problematic.  With this patch my stress test takes around 90 minutes to
>>> run, instead of 12 hours.
>>>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>   fs/btrfs/delayed-ref.h | 12 ++++++------
>>>   fs/btrfs/transaction.c | 32 ++++++++++++++++----------------
>>>   2 files changed, 22 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>>> index 1c977e6d45dc..6e414785b56f 100644
>>> --- a/fs/btrfs/delayed-ref.h
>>> +++ b/fs/btrfs/delayed-ref.h
>>> @@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
>>>       u64 offset;
>>>   };
>>>
>>> +enum btrfs_delayed_ref_flags {
>>> +    /* Used to indicate that we are flushing delayed refs for the commit. */
>>> +    BTRFS_DELAYED_REFS_FLUSHING,
>>> +};
>>> +
>>>   struct btrfs_delayed_ref_root {
>>>       /* head ref rbtree */
>>>       struct rb_root_cached href_root;
>>> @@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
>>>
>>>       u64 pending_csums;
>>>
>>> -    /*
>>> -     * set when the tree is flushing before a transaction commit,
>>> -     * used by the throttling code to decide if new updates need
>>> -     * to be run right away
>>> -     */
>>> -    int flushing;
>>> +    unsigned long flags;
>>>
>>>       u64 run_delayed_start;
>>>
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 52ada47aff50..e8e706def41c 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -872,7 +872,8 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
>>>
>>>       smp_mb();
>>
>> Is this memory barrier required now that you have removed the one in
>> btrfs_commit_transaction ?
>>
> 
> I had it in my head that we needed it for ->state too, but we don't,
> I'll fix that up.  Thanks,


I went through transaction.c and found another place where we have an
smp_mb() before checking cur_trans->state, in start_transaction:


        smp_mb();
        if (cur_trans->state >= TRANS_STATE_COMMIT_START &&
            may_wait_transaction(fs_info, type)) {
                current->journal_info = h;
                btrfs_commit_transaction(h);
                goto again;
        }

Shouldn't that smp_mb() also be removed?

> 
> Josef
> 
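
For readers following the thread, the "only one task does this initial flush" idea from the quoted patch description can be sketched in userspace C. This is not the patch's code: it uses C11 atomics as a stand-in for the kernel's test_and_set_bit(), and the names test_and_set_bit_sketch(), try_start_preflush() and DELAYED_REFS_FLUSHING are hypothetical, chosen only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit index, standing in for the flag added by the patch. */
enum { DELAYED_REFS_FLUSHING = 0 };

/*
 * Userspace analog of test_and_set_bit(): atomically set bit nr in *flags
 * and report whether it was already set before this call.
 */
static bool test_and_set_bit_sketch(unsigned int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/*
 * Returns true only for the first caller, which should then do the
 * pre-flush. Every later caller sees the bit already set and skips it.
 */
static bool try_start_preflush(atomic_ulong *flags)
{
	return !test_and_set_bit_sketch(DELAYED_REFS_FLUSHING, flags);
}
```

With a zero-initialized atomic_ulong, the first call to try_start_preflush() returns true and every subsequent call returns false, regardless of how many threads race on it. The atomic read-modify-write is what guarantees a single winner; whether a separate barrier is still needed for the plain ->state reads is the question raised above and is not addressed by this sketch.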

