All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] A variety of lock contention fixes
@ 2020-10-15 18:25 Josef Bacik
  2020-10-15 18:25 ` [PATCH 1/6] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 18:25 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

I've been running some stress tests recently in order to try and reproduce some
problems I've tripped over in relocation.  Most of this series is a reposting of
patches I wrote when debugging related issues for Zygo that got lost.  I've
updated one of them to make the lock contention even better, making it so I have
to ramp up my stress test loops because it now finishes way too fast.  Thanks,

Josef

Josef Bacik (6):
  btrfs: do not block on deleted bgs mutex in the cleaner
  btrfs: only let one thread pre-flush delayed refs in commit
  btrfs: delayed refs pre-flushing should only run the heads we have
  btrfs: only run delayed refs once before committing
  btrfs: run delayed refs less often in commit_cowonly_roots
  btrfs: stop running all delayed refs during snapshot

 fs/btrfs/block-group.c | 11 +++++--
 fs/btrfs/delayed-ref.h | 12 +++----
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/transaction.c | 73 ++++++++++++++++--------------------------
 4 files changed, 43 insertions(+), 55 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] btrfs: do not block on deleted bgs mutex in the cleaner
  2020-10-15 18:25 [PATCH 0/6] A variety of lock contention fixes Josef Bacik
@ 2020-10-15 18:25 ` Josef Bacik
  2020-10-16  7:26   ` Nikolay Borisov
  2020-10-15 18:25 ` [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit Josef Bacik
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 18:25 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While running some stress tests I started getting hung task messages.
This is because the delete unused bg's code has to take the
delete_unused_bgs_mutex to do it's work, which is taken by balance to
make sure we don't delete block groups while we're balancing.

The problem is a balance can take a while, and so we were getting hung
task warnings.  We don't need to block and run these things, and the
cleaner is needed to do other work, so trylock on this mutex and just
bail if we can't acquire it right away.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c0f1d6818df7..5e0e5843edf9 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1333,6 +1333,13 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
 		return;
 
+	/*
+	 * Long running balances can keep us blocked here for eternity, so
+	 * simply skip deletion if we're unable to get the mutex.
+	 */
+	if (!mutex_trylock(&fs_info->delete_unused_bgs_mutex))
+		return;
+
 	spin_lock(&fs_info->unused_bgs_lock);
 	while (!list_empty(&fs_info->unused_bgs)) {
 		int trimming;
@@ -1352,8 +1359,6 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 
 		btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
 
-		mutex_lock(&fs_info->delete_unused_bgs_mutex);
-
 		/* Don't want to race with allocators so take the groups_sem */
 		down_write(&space_info->groups_sem);
 
@@ -1499,11 +1504,11 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 end_trans:
 		btrfs_end_transaction(trans);
 next:
-		mutex_unlock(&fs_info->delete_unused_bgs_mutex);
 		btrfs_put_block_group(block_group);
 		spin_lock(&fs_info->unused_bgs_lock);
 	}
 	spin_unlock(&fs_info->unused_bgs_lock);
+	mutex_unlock(&fs_info->delete_unused_bgs_mutex);
 	return;
 
 flip_async:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit
  2020-10-15 18:25 [PATCH 0/6] A variety of lock contention fixes Josef Bacik
  2020-10-15 18:25 ` [PATCH 1/6] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
@ 2020-10-15 18:25 ` Josef Bacik
  2020-10-15 19:35   ` Nikolay Borisov
  2020-10-15 18:25 ` [PATCH 3/6] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 18:25 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I've been running a stress test that runs 20 workers in their own
subvolume, which are running an fsstress instance with 4 threads per
worker, which is 80 total fsstress threads.  In addition to this I'm
running balance in the background as well as creating and deleting
snapshots.  This test takes around 12 hours to run normally, going
slower and slower as the test goes on.

The reason for this is because fsstress is running fsync sometimes, and
because we're messing with block groups we often fall through to
btrfs_commit_transaction, so will often have 20-30 threads all calling
btrfs_commit_transaction at the same time.

These all get stuck contending on the extent tree while they try to run
delayed refs during the initial part of the commit.

This is suboptimal, really because the extent tree is a single point of
failure we only want one thread acting on that tree at once to reduce
lock contention.  Fix this by making the flushing mechanism a bit
operation, to make it easy to use test_and_set_bit() in order to make
sure only one task does this initial flush.

Once we're into the transaction commit we only have one thread doing
delayed ref running, it's just this initial pre-flush that is
problematic.  With this patch my stress test takes around 90 minutes to
run, instead of 12 hours.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/delayed-ref.h | 12 ++++++------
 fs/btrfs/transaction.c | 32 ++++++++++++++++----------------
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 1c977e6d45dc..6e414785b56f 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
 	u64 offset;
 };
 
+enum btrfs_delayed_ref_flags {
+	/* Used to indicate that we are flushing delayed refs for the commit. */
+	BTRFS_DELAYED_REFS_FLUSHING,
+};
+
 struct btrfs_delayed_ref_root {
 	/* head ref rbtree */
 	struct rb_root_cached href_root;
@@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
 
 	u64 pending_csums;
 
-	/*
-	 * set when the tree is flushing before a transaction commit,
-	 * used by the throttling code to decide if new updates need
-	 * to be run right away
-	 */
-	int flushing;
+	unsigned long flags;
 
 	u64 run_delayed_start;
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 52ada47aff50..e8e706def41c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -872,7 +872,8 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 
 	smp_mb();
 	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
-	    cur_trans->delayed_refs.flushing)
+	    test_bit(BTRFS_DELAYED_REFS_FLUSHING,
+		     &cur_trans->delayed_refs.flags))
 		return 1;
 
 	return should_end_transaction(trans);
@@ -2042,23 +2043,22 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
 
-	/* make a pass through all the delayed refs we have so far
-	 * any runnings procs may add more while we are here
-	 */
-	ret = btrfs_run_delayed_refs(trans, 0);
-	if (ret) {
-		btrfs_end_transaction(trans);
-		return ret;
-	}
-
-	cur_trans = trans->transaction;
-
 	/*
-	 * set the flushing flag so procs in this transaction have to
-	 * start sending their work down.
+	 * We only want one transaction commit doing the flushing so we do not
+	 * waste a bunch of time on lock contention on the extent root node.
 	 */
-	cur_trans->delayed_refs.flushing = 1;
-	smp_wmb();
+	if (!test_and_set_bit(BTRFS_DELAYED_REFS_FLUSHING,
+			      &cur_trans->delayed_refs.flags)) {
+		/*
+		 * make a pass through all the delayed refs we have so far
+		 * any runnings procs may add more while we are here
+		 */
+		ret = btrfs_run_delayed_refs(trans, 0);
+		if (ret) {
+			btrfs_end_transaction(trans);
+			return ret;
+		}
+	}
 
 	btrfs_create_pending_block_groups(trans);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/6] btrfs: delayed refs pre-flushing should only run the heads we have
  2020-10-15 18:25 [PATCH 0/6] A variety of lock contention fixes Josef Bacik
  2020-10-15 18:25 ` [PATCH 1/6] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
  2020-10-15 18:25 ` [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit Josef Bacik
@ 2020-10-15 18:25 ` Josef Bacik
  2020-10-15 18:26 ` [PATCH 4/6] btrfs: only run delayed refs once before committing Josef Bacik
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 18:25 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

Previously our delayed ref running used the total number of items as the
items to run.  However we changed that to number of heads to run with
the delayed_refs_rsv, as generally we want to run all of the operations
for one bytenr.

But with btrfs_run_delayed_refs(trans, 0) we set our count to 2x the
number of items that we have.  This is generally fine, but if we have
some operation generation loads of delayed refs while we're doing this
pre-flushing in the transaction commit, we'll just spin forever doing
delayed refs.

Fix this to simply pick the number of delayed refs we currently have,
that way we do not end up doing a lot of extra work that's being
generated in other threads.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5fd60b13f4f8..a7f0a1480cd9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2180,7 +2180,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 
 	delayed_refs = &trans->transaction->delayed_refs;
 	if (count == 0)
-		count = atomic_read(&delayed_refs->num_entries) * 2;
+		count = delayed_refs->num_heads_ready;
 
 again:
 #ifdef SCRAMBLE_DELAYED_REFS
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/6] btrfs: only run delayed refs once before committing
  2020-10-15 18:25 [PATCH 0/6] A variety of lock contention fixes Josef Bacik
                   ` (2 preceding siblings ...)
  2020-10-15 18:25 ` [PATCH 3/6] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
@ 2020-10-15 18:26 ` Josef Bacik
  2020-10-15 18:26 ` [PATCH 5/6] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
  2020-10-15 18:26 ` [PATCH 6/6] btrfs: stop running all delayed refs during snapshot Josef Bacik
  5 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 18:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Nikolay Borisov

We try to pre-flush the delayed refs when committing, because we want to
do as little work as possible in the critical section of the transaction
commit.

However doing this twice can lead to very long transaction commit delays
as other threads are allowed to continue to generate more delayed refs,
which potentially delays the commit by multiple minutes in very extreme
cases.

So simply stick to one pre-flush, and then continue the rest of the
transaction commit.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/transaction.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e8e706def41c..e4c7fa5076a7 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2062,12 +2062,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	btrfs_create_pending_block_groups(trans);
 
-	ret = btrfs_run_delayed_refs(trans, 0);
-	if (ret) {
-		btrfs_end_transaction(trans);
-		return ret;
-	}
-
 	if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
 		int run_it = 0;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] btrfs: run delayed refs less often in commit_cowonly_roots
  2020-10-15 18:25 [PATCH 0/6] A variety of lock contention fixes Josef Bacik
                   ` (3 preceding siblings ...)
  2020-10-15 18:26 ` [PATCH 4/6] btrfs: only run delayed refs once before committing Josef Bacik
@ 2020-10-15 18:26 ` Josef Bacik
  2020-10-16  7:35   ` Nikolay Borisov
  2020-10-15 18:26 ` [PATCH 6/6] btrfs: stop running all delayed refs during snapshot Josef Bacik
  5 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 18:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We love running delayed refs in commit_cowonly_roots, but it is a bit
excessive.  I was seeing cases of running 3 or 4 refs a few times in a
row during this time.  Instead simply update all of the roots first,
then run delayed refs, then handle the empty block groups case, and then
if we have any more dirty roots do the whole thing again.  This allows
us to be much more efficient with our delayed ref running, as we can
batch a few more operations at once.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e4c7fa5076a7..f4d55ac7f8f2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1651,12 +1651,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		goto fail;
 	}
 
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-	if (ret) {
-		btrfs_abort_transaction(trans, ret);
-		goto fail;
-	}
-
 	/*
 	 * Do special qgroup accounting for snapshot, as we do some qgroup
 	 * snapshot hack to do fast snapshot.
@@ -1704,12 +1698,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-	if (ret) {
-		btrfs_abort_transaction(trans, ret);
-		goto fail;
-	}
-
 fail:
 	pending->error = ret;
 dir_item_existed:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/6] btrfs: stop running all delayed refs during snapshot
  2020-10-15 18:25 [PATCH 0/6] A variety of lock contention fixes Josef Bacik
                   ` (4 preceding siblings ...)
  2020-10-15 18:26 ` [PATCH 5/6] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
@ 2020-10-15 18:26 ` Josef Bacik
  5 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 18:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This was added a very long time ago to work around issues with delayed
refs and with qgroups.  Both of these issues have since been properly
fixed, so all this does is cause a lot of lock contention with anybody
else who is running delayed refs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f4d55ac7f8f2..fc8f12e609a2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1189,10 +1189,6 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 	btrfs_tree_unlock(eb);
 	free_extent_buffer(eb);
 
-	if (ret)
-		return ret;
-
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
 	if (ret)
 		return ret;
 
@@ -1210,10 +1206,6 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 	if (ret)
 		return ret;
 
-	/* run_qgroups might have added some more refs */
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-	if (ret)
-		return ret;
 again:
 	while (!list_empty(&fs_info->dirty_cowonly_roots)) {
 		struct btrfs_root *root;
@@ -1228,15 +1220,24 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 		ret = update_cowonly_root(trans, root);
 		if (ret)
 			return ret;
-		ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-		if (ret)
-			return ret;
 	}
 
+	/* Now flush any delayed refs generated by updating all of the roots. */
+	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
+	if (ret)
+		return ret;
+
 	while (!list_empty(dirty_bgs) || !list_empty(io_bgs)) {
 		ret = btrfs_write_dirty_block_groups(trans);
 		if (ret)
 			return ret;
+
+		/*
+		 * We're writing the dirty block groups, which could generate
+		 * delayed refs, which could generate more dirty block groups,
+		 * so we want to keep this flushing in this loop to make sure
+		 * everything gets run.
+		 */
 		ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
 		if (ret)
 			return ret;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit
  2020-10-15 18:25 ` [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit Josef Bacik
@ 2020-10-15 19:35   ` Nikolay Borisov
  2020-10-15 20:26     ` Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-15 19:35 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 15.10.20 г. 21:25 ч., Josef Bacik wrote:
> I've been running a stress test that runs 20 workers in their own
> subvolume, which are running an fsstress instance with 4 threads per
> worker, which is 80 total fsstress threads.  In addition to this I'm
> running balance in the background as well as creating and deleting
> snapshots.  This test takes around 12 hours to run normally, going
> slower and slower as the test goes on.
> 
> The reason for this is because fsstress is running fsync sometimes, and
> because we're messing with block groups we often fall through to
> btrfs_commit_transaction, so will often have 20-30 threads all calling
> btrfs_commit_transaction at the same time.
> 
> These all get stuck contending on the extent tree while they try to run
> delayed refs during the initial part of the commit.
> 
> This is suboptimal, really because the extent tree is a single point of
> failure we only want one thread acting on that tree at once to reduce
> lock contention.  Fix this by making the flushing mechanism a bit
> operation, to make it easy to use test_and_set_bit() in order to make
> sure only one task does this initial flush.
> 
> Once we're into the transaction commit we only have one thread doing
> delayed ref running, it's just this initial pre-flush that is
> problematic.  With this patch my stress test takes around 90 minutes to
> run, instead of 12 hours.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/delayed-ref.h | 12 ++++++------
>  fs/btrfs/transaction.c | 32 ++++++++++++++++----------------
>  2 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 1c977e6d45dc..6e414785b56f 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
>  	u64 offset;
>  };
>  
> +enum btrfs_delayed_ref_flags {
> +	/* Used to indicate that we are flushing delayed refs for the commit. */
> +	BTRFS_DELAYED_REFS_FLUSHING,
> +};
> +
>  struct btrfs_delayed_ref_root {
>  	/* head ref rbtree */
>  	struct rb_root_cached href_root;
> @@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
>  
>  	u64 pending_csums;
>  
> -	/*
> -	 * set when the tree is flushing before a transaction commit,
> -	 * used by the throttling code to decide if new updates need
> -	 * to be run right away
> -	 */
> -	int flushing;
> +	unsigned long flags;
>  
>  	u64 run_delayed_start;
>  
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 52ada47aff50..e8e706def41c 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -872,7 +872,8 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
>  
>  	smp_mb();

Is this memory barrier required now that you have removed the one in
btrfs_commit_transaction ?

>  	if (cur_trans->state >= TRANS_STATE_COMMIT_START ||
> -	    cur_trans->delayed_refs.flushing)
> +	    test_bit(BTRFS_DELAYED_REFS_FLUSHING,
> +		     &cur_trans->delayed_refs.flags))
>  		return 1;
>  
>  	return should_end_transaction(trans);
> @@ -2042,23 +2043,22 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  	btrfs_trans_release_metadata(trans);
>  	trans->block_rsv = NULL;
>  
> -	/* make a pass through all the delayed refs we have so far
> -	 * any runnings procs may add more while we are here
> -	 */
> -	ret = btrfs_run_delayed_refs(trans, 0);
> -	if (ret) {
> -		btrfs_end_transaction(trans);
> -		return ret;
> -	}
> -
> -	cur_trans = trans->transaction;
> -
>  	/*
> -	 * set the flushing flag so procs in this transaction have to
> -	 * start sending their work down.
> +	 * We only want one transaction commit doing the flushing so we do not
> +	 * waste a bunch of time on lock contention on the extent root node.
>  	 */
> -	cur_trans->delayed_refs.flushing = 1;
> -	smp_wmb();
> +	if (!test_and_set_bit(BTRFS_DELAYED_REFS_FLUSHING,
> +			      &cur_trans->delayed_refs.flags)) {
> +		/*
> +		 * make a pass through all the delayed refs we have so far
> +		 * any runnings procs may add more while we are here
> +		 */
> +		ret = btrfs_run_delayed_refs(trans, 0);
> +		if (ret) {
> +			btrfs_end_transaction(trans);
> +			return ret;
> +		}
> +	}
>  
>  	btrfs_create_pending_block_groups(trans);
>  
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit
  2020-10-15 19:35   ` Nikolay Borisov
@ 2020-10-15 20:26     ` Josef Bacik
  2020-10-16  7:19       ` Nikolay Borisov
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2020-10-15 20:26 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs, kernel-team

On 10/15/20 3:35 PM, Nikolay Borisov wrote:
> 
> 
> On 15.10.20 г. 21:25 ч., Josef Bacik wrote:
>> I've been running a stress test that runs 20 workers in their own
>> subvolume, which are running an fsstress instance with 4 threads per
>> worker, which is 80 total fsstress threads.  In addition to this I'm
>> running balance in the background as well as creating and deleting
>> snapshots.  This test takes around 12 hours to run normally, going
>> slower and slower as the test goes on.
>>
>> The reason for this is because fsstress is running fsync sometimes, and
>> because we're messing with block groups we often fall through to
>> btrfs_commit_transaction, so will often have 20-30 threads all calling
>> btrfs_commit_transaction at the same time.
>>
>> These all get stuck contending on the extent tree while they try to run
>> delayed refs during the initial part of the commit.
>>
>> This is suboptimal, really because the extent tree is a single point of
>> failure we only want one thread acting on that tree at once to reduce
>> lock contention.  Fix this by making the flushing mechanism a bit
>> operation, to make it easy to use test_and_set_bit() in order to make
>> sure only one task does this initial flush.
>>
>> Once we're into the transaction commit we only have one thread doing
>> delayed ref running, it's just this initial pre-flush that is
>> problematic.  With this patch my stress test takes around 90 minutes to
>> run, instead of 12 hours.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/delayed-ref.h | 12 ++++++------
>>   fs/btrfs/transaction.c | 32 ++++++++++++++++----------------
>>   2 files changed, 22 insertions(+), 22 deletions(-)
>>
>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>> index 1c977e6d45dc..6e414785b56f 100644
>> --- a/fs/btrfs/delayed-ref.h
>> +++ b/fs/btrfs/delayed-ref.h
>> @@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
>>   	u64 offset;
>>   };
>>   
>> +enum btrfs_delayed_ref_flags {
>> +	/* Used to indicate that we are flushing delayed refs for the commit. */
>> +	BTRFS_DELAYED_REFS_FLUSHING,
>> +};
>> +
>>   struct btrfs_delayed_ref_root {
>>   	/* head ref rbtree */
>>   	struct rb_root_cached href_root;
>> @@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
>>   
>>   	u64 pending_csums;
>>   
>> -	/*
>> -	 * set when the tree is flushing before a transaction commit,
>> -	 * used by the throttling code to decide if new updates need
>> -	 * to be run right away
>> -	 */
>> -	int flushing;
>> +	unsigned long flags;
>>   
>>   	u64 run_delayed_start;
>>   
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 52ada47aff50..e8e706def41c 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -872,7 +872,8 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
>>   
>>   	smp_mb();
> 
> Is this memory barrier required now that you have removed the one in
> btrfs_commit_transaction ?
> 

I had it in my head that we needed it for ->state too, but we don't, I'll fix 
that up.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit
  2020-10-15 20:26     ` Josef Bacik
@ 2020-10-16  7:19       ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-16  7:19 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 15.10.20 г. 23:26 ч., Josef Bacik wrote:
> On 10/15/20 3:35 PM, Nikolay Borisov wrote:
>>
>>
>> On 15.10.20 г. 21:25 ч., Josef Bacik wrote:
>>> I've been running a stress test that runs 20 workers in their own
>>> subvolume, which are running an fsstress instance with 4 threads per
>>> worker, which is 80 total fsstress threads.  In addition to this I'm
>>> running balance in the background as well as creating and deleting
>>> snapshots.  This test takes around 12 hours to run normally, going
>>> slower and slower as the test goes on.
>>>
>>> The reason for this is because fsstress is running fsync sometimes, and
>>> because we're messing with block groups we often fall through to
>>> btrfs_commit_transaction, so will often have 20-30 threads all calling
>>> btrfs_commit_transaction at the same time.
>>>
>>> These all get stuck contending on the extent tree while they try to run
>>> delayed refs during the initial part of the commit.
>>>
>>> This is suboptimal, really because the extent tree is a single point of
>>> failure we only want one thread acting on that tree at once to reduce
>>> lock contention.  Fix this by making the flushing mechanism a bit
>>> operation, to make it easy to use test_and_set_bit() in order to make
>>> sure only one task does this initial flush.
>>>
>>> Once we're into the transaction commit we only have one thread doing
>>> delayed ref running, it's just this initial pre-flush that is
>>> problematic.  With this patch my stress test takes around 90 minutes to
>>> run, instead of 12 hours.
>>>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>   fs/btrfs/delayed-ref.h | 12 ++++++------
>>>   fs/btrfs/transaction.c | 32 ++++++++++++++++----------------
>>>   2 files changed, 22 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>>> index 1c977e6d45dc..6e414785b56f 100644
>>> --- a/fs/btrfs/delayed-ref.h
>>> +++ b/fs/btrfs/delayed-ref.h
>>> @@ -135,6 +135,11 @@ struct btrfs_delayed_data_ref {
>>>       u64 offset;
>>>   };
>>>   +enum btrfs_delayed_ref_flags {
>>> +    /* Used to indicate that we are flushing delayed refs for the
>>> commit. */
>>> +    BTRFS_DELAYED_REFS_FLUSHING,
>>> +};
>>> +
>>>   struct btrfs_delayed_ref_root {
>>>       /* head ref rbtree */
>>>       struct rb_root_cached href_root;
>>> @@ -158,12 +163,7 @@ struct btrfs_delayed_ref_root {
>>>         u64 pending_csums;
>>>   -    /*
>>> -     * set when the tree is flushing before a transaction commit,
>>> -     * used by the throttling code to decide if new updates need
>>> -     * to be run right away
>>> -     */
>>> -    int flushing;
>>> +    unsigned long flags;
>>>         u64 run_delayed_start;
>>>   diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 52ada47aff50..e8e706def41c 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -872,7 +872,8 @@ int btrfs_should_end_transaction(struct
>>> btrfs_trans_handle *trans)
>>>         smp_mb();
>>
>> Is this memory barrier required now that you have removed the one in
>> btrfs_commit_transaction ?
>>
> 
> I had it in my head that we needed it for ->state too, but we don't,
> I'll fix that up.  Thanks,


I went through transaction.c and found another place where we have an
smp_mb() before checking cur_trans->state, in start_transaction:


             smp_mb();

   1         if (cur_trans->state >= TRANS_STATE_COMMIT_START &&

   2             may_wait_transaction(fs_info, type)) {

   3                 current->journal_info = h;

   4                 btrfs_commit_transaction(h);

   5                 goto again;

   6         }

Shouldn't that smp_mb() also be removed?

> 
> Josef
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] btrfs: do not block on deleted bgs mutex in the cleaner
  2020-10-15 18:25 ` [PATCH 1/6] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
@ 2020-10-16  7:26   ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-16  7:26 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 15.10.20 г. 21:25 ч., Josef Bacik wrote:
> While running some stress tests I started getting hung task messages.
> This is because the delete unused bg's code has to take the
> delete_unused_bgs_mutex to do it's work, which is taken by balance to
> make sure we don't delete block groups while we're balancing.
> 
> The problem is a balance can take a while, and so we were getting hung
> task warnings.  We don't need to block and run these things, and the
> cleaner is needed to do other work, so trylock on this mutex and just
> bail if we can't acquire it right away.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Unused bgs are deleted also on remount and unmount (provided the fs is
not RO) so bailing in the cleaner in case of contention is not critical.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/6] btrfs: run delayed refs less often in commit_cowonly_roots
  2020-10-15 18:26 ` [PATCH 5/6] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
@ 2020-10-16  7:35   ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-16  7:35 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 15.10.20 г. 21:26 ч., Josef Bacik wrote:
> We love running delayed refs in commit_cowonly_roots, but it is a bit
> excessive.  I was seeing cases of running 3 or 4 refs a few times in a
> row during this time.  Instead simply update all of the roots first,
> then run delayed refs, then handle the empty block groups case, and then
> if we have any more dirty roots do the whole thing again.  This allows
> us to be much more efficient with our delayed ref running, as we can
> batch a few more operations at once.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>


I remember pointing this out but the commit messages of this and next
patch are transposed. Here you talk about commit_cowonly_roots but you
change create_pending_snapshot, analogically the same is happening in
next commit. Please fix it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-10-16  7:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15 18:25 [PATCH 0/6] A variety of lock contention fixes Josef Bacik
2020-10-15 18:25 ` [PATCH 1/6] btrfs: do not block on deleted bgs mutex in the cleaner Josef Bacik
2020-10-16  7:26   ` Nikolay Borisov
2020-10-15 18:25 ` [PATCH 2/6] btrfs: only let one thread pre-flush delayed refs in commit Josef Bacik
2020-10-15 19:35   ` Nikolay Borisov
2020-10-15 20:26     ` Josef Bacik
2020-10-16  7:19       ` Nikolay Borisov
2020-10-15 18:25 ` [PATCH 3/6] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
2020-10-15 18:26 ` [PATCH 4/6] btrfs: only run delayed refs once before committing Josef Bacik
2020-10-15 18:26 ` [PATCH 5/6] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
2020-10-16  7:35   ` Nikolay Borisov
2020-10-15 18:26 ` [PATCH 6/6] btrfs: stop running all delayed refs during snapshot Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.