[PATCH 0/5] Fix up some stupid delayed ref flushing behaviors

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors
@ 2020-03-13 21:12 Josef Bacik
  2020-03-13 21:12 ` [PATCH 1/5] btrfs: set delayed_refs.flushing for the first delayed ref flushing Josef Bacik
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 21:12 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While debugging Zygo's delayed ref problems it was clear there were a bunch of
cases that we're running delayed refs when we don't need to be, and they result
in a lot of weird latencies.

Each patch has their individual explanations.  But the gist of it is we run
delayed refs in a lot of arbitrary ways that have just accumulated throughout
the years, so clean up all of these so we can have more consistent performance.
Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] btrfs: set delayed_refs.flushing for the first delayed ref flushing
  2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
@ 2020-03-13 21:12 ` Josef Bacik
  2020-03-13 21:12 ` [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 21:12 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We need to stop any other delayed refs flushing operations from
happening if we're flushing delayed refs for transaction commit as we
don't want the lock contention to make everything go slower, especially
the transaction commit.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 53af0f55f5f9..cff767722a75 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2037,6 +2037,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;
 
+	/*
+	 * set the flushing flag so procs in this transaction have to
+	 * start sending their work down.
+	 */
+	cur_trans->delayed_refs.flushing = 1;
+	smp_wmb();
+
 	/* make a pass through all the delayed refs we have so far
 	 * any runnings procs may add more while we are here
 	 */
@@ -2048,13 +2055,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	cur_trans = trans->transaction;
 
-	/*
-	 * set the flushing flag so procs in this transaction have to
-	 * start sending their work down.
-	 */
-	cur_trans->delayed_refs.flushing = 1;
-	smp_wmb();
-
 	btrfs_create_pending_block_groups(trans);
 
 	ret = btrfs_run_delayed_refs(trans, 0);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have
  2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
  2020-03-13 21:12 ` [PATCH 1/5] btrfs: set delayed_refs.flushing for the first delayed ref flushing Josef Bacik
@ 2020-03-13 21:12 ` Josef Bacik
  2020-04-03 14:31   ` Nikolay Borisov
  2020-03-13 21:12 ` [PATCH 3/5] btrfs: only run delayed refs once before committing Josef Bacik
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 21:12 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Previously our delayed ref running used the total number of items as the
items to run.  However we changed that to number of heads to run with
the delayed_refs_rsv, as generally we want to run all of the operations
for one bytenr.

But with btrfs_run_delayed_refs(trans, 0) we set our count to 2x the
number of items that we have.  This is generally fine, but if we have
some operation generation loads of delayed refs while we're doing this
pre-flushing in the transaction commit, we'll just spin forever doing
delayed refs.

Fix this to simply pick the number of delayed refs we currently have,
that way we do not end up doing a lot of extra work that's being
generated in other threads.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8e5b49baad98..2925b3ad77a1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2196,7 +2196,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,

 	delayed_refs = &trans->transaction->delayed_refs;
 	if (count == 0)
-		count = atomic_read(&delayed_refs->num_entries) * 2;
+		count = delayed_refs->num_heads_ready;

 again:
 #ifdef SCRAMBLE_DELAYED_REFS
-- 
2.24.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] btrfs: only run delayed refs once before committing
  2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
  2020-03-13 21:12 ` [PATCH 1/5] btrfs: set delayed_refs.flushing for the first delayed ref flushing Josef Bacik
  2020-03-13 21:12 ` [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
@ 2020-03-13 21:12 ` Josef Bacik
  2020-04-03 14:34   ` Nikolay Borisov
  2020-03-13 21:12 ` [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 21:12 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We try to pre-flush the delayed refs when committing, because we want to
do as little work as possible in the critical section of the transaction
commit.

However doing this twice can lead to very long transaction commit delays
as other threads are allowed to continue to generate more delayed refs,
which potentially delays the commit by multiple minutes in very extreme
cases.

So simply stick to one pre-flush, and then continue the rest of the
transaction commit.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index cff767722a75..3e7fd8a934c1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2057,12 +2057,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	btrfs_create_pending_block_groups(trans);
 
-	ret = btrfs_run_delayed_refs(trans, 0);
-	if (ret) {
-		btrfs_end_transaction(trans);
-		return ret;
-	}
-
 	if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
 		int run_it = 0;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots
  2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
                   ` (2 preceding siblings ...)
  2020-03-13 21:12 ` [PATCH 3/5] btrfs: only run delayed refs once before committing Josef Bacik
@ 2020-03-13 21:12 ` Josef Bacik
  2020-04-03 14:43   ` Nikolay Borisov
  2020-03-13 21:12 ` [PATCH 5/5] btrfs: stop running all delayed refs during snapshot Josef Bacik
  2020-03-25 13:51 ` [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors David Sterba
  5 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 21:12 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We love running delayed refs in commit_cowonly_roots, but it is a bit
excessive.  I was seeing cases of running 3 or 4 refs a few times in a
row during this time.  Instead simply update all of the roots first,
then run delayed refs, then handle the empty block groups case, and then
if we have any more dirty roots do the whole thing again.  This allows
us to be much more efficient with our delayed ref running, as we can
batch a few more operations at once.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3e7fd8a934c1..c3b3b524b8c3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1646,12 +1646,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		goto fail;
 	}
 
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-	if (ret) {
-		btrfs_abort_transaction(trans, ret);
-		goto fail;
-	}
-
 	/*
 	 * Do special qgroup accounting for snapshot, as we do some qgroup
 	 * snapshot hack to do fast snapshot.
@@ -1698,12 +1692,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-	if (ret) {
-		btrfs_abort_transaction(trans, ret);
-		goto fail;
-	}
-
 fail:
 	pending->error = ret;
 dir_item_existed:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] btrfs: stop running all delayed refs during snapshot
  2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
                   ` (3 preceding siblings ...)
  2020-03-13 21:12 ` [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
@ 2020-03-13 21:12 ` Josef Bacik
  2020-04-03 14:46   ` Nikolay Borisov
  2020-03-25 13:51 ` [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors David Sterba
  5 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 21:12 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This was added a very long time ago to work around issues with delayed
refs and with qgroups.  Both of these issues have since been properly
fixed, so all this does is cause a lot of lock contention with anybody
else who is running delayed refs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c3b3b524b8c3..8d34d7e0adb6 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1184,10 +1184,6 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 	btrfs_tree_unlock(eb);
 	free_extent_buffer(eb);
 
-	if (ret)
-		return ret;
-
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
 	if (ret)
 		return ret;
 
@@ -1205,10 +1201,6 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 	if (ret)
 		return ret;
 
-	/* run_qgroups might have added some more refs */
-	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-	if (ret)
-		return ret;
 again:
 	while (!list_empty(&fs_info->dirty_cowonly_roots)) {
 		struct btrfs_root *root;
@@ -1223,15 +1215,24 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 		ret = update_cowonly_root(trans, root);
 		if (ret)
 			return ret;
-		ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
-		if (ret)
-			return ret;
 	}
 
+	/* Now flush any delayed refs generated by updating all of the roots. */
+	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
+	if (ret)
+		return ret;
+
 	while (!list_empty(dirty_bgs) || !list_empty(io_bgs)) {
 		ret = btrfs_write_dirty_block_groups(trans);
 		if (ret)
 			return ret;
+
+		/*
+		 * We're writing the dirty block groups, which could generate
+		 * delayed refs, which could generate more dirty block groups,
+		 * so we want to keep this flushing in this loop to make sure
+		 * everything gets run.
+		 */
 		ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
 		if (ret)
 			return ret;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors
  2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
                   ` (4 preceding siblings ...)
  2020-03-13 21:12 ` [PATCH 5/5] btrfs: stop running all delayed refs during snapshot Josef Bacik
@ 2020-03-25 13:51 ` David Sterba
  2020-03-25 14:12   ` Josef Bacik
  5 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2020-03-25 13:51 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Mar 13, 2020 at 05:12:15PM -0400, Josef Bacik wrote:
> While debugging Zygo's delayed ref problems it was clear there were a bunch of
> cases that we're running delayed refs when we don't need to be, and they result
> in a lot of weird latencies.
> 
> Each patch has their individual explanations.  But the gist of it is we run
> delayed refs in a lot of arbitrary ways that have just accumulated throughout
> the years, so clean up all of these so we can have more consistent performance.

It would be fine to remove the delayed refs being run from so many
places but I vaguely remember some patches adding them with "we have to
run delayed refs here or we will miss something and that would be a
corruption". The changelogs in patches from 3 on don't point out any
specific problems and I miss some reasoning about correctness, ideally
for each line of btrfs_run_delayed_refs removed.

As a worst case I really don't want to get to a situation where we start
getting reports that something broke because of the missing delayed
refs, followed by series of "oh yeah I forgot we need it here, add it
back".

The branch with this patchset is in for-next but I'm still not
comfortable with adding it to misc-next as I can't convince myself it's
safe, so more reviews are welcome.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors
  2020-03-25 13:51 ` [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors David Sterba
@ 2020-03-25 14:12   ` Josef Bacik
  2020-03-26 15:36     ` Zygo Blaxell
  0 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-03-25 14:12 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On 3/25/20 9:51 AM, David Sterba wrote:
> On Fri, Mar 13, 2020 at 05:12:15PM -0400, Josef Bacik wrote:
>> While debugging Zygo's delayed ref problems it was clear there were a bunch of
>> cases that we're running delayed refs when we don't need to be, and they result
>> in a lot of weird latencies.
>>
>> Each patch has their individual explanations.  But the gist of it is we run
>> delayed refs in a lot of arbitrary ways that have just accumulated throughout
>> the years, so clean up all of these so we can have more consistent performance.
> 
> It would be fine to remove the delayed refs being run from so many
> places but I vaguely remember some patches adding them with "we have to
> run delayed refs here or we will miss something and that would be a
> corruption". The changelogs in patches from 3 on don't point out any
> specific problems and I miss some reasoning about correctness, ideally
> for each line of btrfs_run_delayed_refs removed.
> 
> As a worst case I really don't want to get to a situation where we start
> getting reports that something broke because of the missing delayed
> refs, followed by series of "oh yeah I forgot we need it here, add it
> back".

Yeah I went through and checked each of these spots to see why we had them.  A 
lot of it had to do with how poorly delayed refs were run previously.  You could 
end up with weird ordering cases and missing our flags.

These problems are all gone now, we no longer have to run delayed refs to work 
around ordering weirdness because I fixed all of those problems.  Now these are 
just old relics of the past that need to die.  The only case where I didn't 
touch them is for qgroups, likely because it still matters for the before/after 
lookups there.

But everywhere else it was working around some deficiency in how we ran delayed 
refs, either in the ordering issues or space related.  Both those problems no 
longer exist, so we can drop these workarounds.

> 
> The branch with this patchset is in for-next but I'm still not
> comfortable with adding it to misc-next as I can't convince myself it's
> safe, so more reviews are welcome.
> 

Yeah I'm targeting the merge window after the upcoming one with these, there's 
still a lot more testing I want to get done.  I mostly threw them up because 
they were no longer blowing up constantly for Zygo, and I wanted Filipe to get 
an early look at them.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors
  2020-03-25 14:12   ` Josef Bacik
@ 2020-03-26 15:36     ` Zygo Blaxell
  2020-03-26 19:58       ` Josef Bacik
  0 siblings, 1 reply; 14+ messages in thread
From: Zygo Blaxell @ 2020-03-26 15:36 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, linux-btrfs, kernel-team

On Wed, Mar 25, 2020 at 10:12:30AM -0400, Josef Bacik wrote:
> On 3/25/20 9:51 AM, David Sterba wrote:
> > On Fri, Mar 13, 2020 at 05:12:15PM -0400, Josef Bacik wrote:
> > > While debugging Zygo's delayed ref problems it was clear there were a bunch of
> > > cases that we're running delayed refs when we don't need to be, and they result
> > > in a lot of weird latencies.
> > > 
> > > Each patch has their individual explanations.  But the gist of it is we run
> > > delayed refs in a lot of arbitrary ways that have just accumulated throughout
> > > the years, so clean up all of these so we can have more consistent performance.
> > 
> > It would be fine to remove the delayed refs being run from so many
> > places but I vaguely remember some patches adding them with "we have to
> > run delayed refs here or we will miss something and that would be a
> > corruption". The changelogs in patches from 3 on don't point out any
> > specific problems and I miss some reasoning about correctness, ideally
> > for each line of btrfs_run_delayed_refs removed.
> > 
> > As a worst case I really don't want to get to a situation where we start
> > getting reports that something broke because of the missing delayed
> > refs, followed by series of "oh yeah I forgot we need it here, add it
> > back".
> 
> Yeah I went through and checked each of these spots to see why we had them.
> A lot of it had to do with how poorly delayed refs were run previously.  You
> could end up with weird ordering cases and missing our flags.
> 
> These problems are all gone now, we no longer have to run delayed refs to
> work around ordering weirdness because I fixed all of those problems.  Now
> these are just old relics of the past that need to die.  The only case where
> I didn't touch them is for qgroups, likely because it still matters for the
> before/after lookups there.
> 
> But everywhere else it was working around some deficiency in how we ran
> delayed refs, either in the ordering issues or space related.  Both those
> problems no longer exist, so we can drop these workarounds.
> 
> > 
> > The branch with this patchset is in for-next but I'm still not
> > comfortable with adding it to misc-next as I can't convince myself it's
> > safe, so more reviews are welcome.
> > 
> 
> Yeah I'm targeting the merge window after the upcoming one with these,
> there's still a lot more testing I want to get done.  I mostly threw them up
> because they were no longer blowing up constantly for Zygo, and I wanted
> Filipe to get an early look at them.  Thanks,

No longer blowing up _constantly_, but there was definitely a 2-3 day
cadence between blowups last time I rebased.  Test runs were ending in
splats due to KASAN UAF bugs and bad unlock balances.  It doesn't seem
to be corrupting on-disk metadata, but my test VMs can't get anywhere
close to a week uptime under the full stress load yet.

I'd like to keep a test VM pointed at this as it makes it way upstream.
It's an important set of changes, but it has a high regression risk.
There are some big changes here, and that's going to expose all the gaps
in developers' knowledge of how stuff really works.

Do I just keep rebasing on for-next-<date>?

> Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors
  2020-03-26 15:36     ` Zygo Blaxell
@ 2020-03-26 19:58       ` Josef Bacik
  0 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-26 19:58 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: dsterba, linux-btrfs, kernel-team

On 3/26/20 11:36 AM, Zygo Blaxell wrote:
> On Wed, Mar 25, 2020 at 10:12:30AM -0400, Josef Bacik wrote:
>> On 3/25/20 9:51 AM, David Sterba wrote:
>>> On Fri, Mar 13, 2020 at 05:12:15PM -0400, Josef Bacik wrote:
>>>> While debugging Zygo's delayed ref problems it was clear there were a bunch of
>>>> cases that we're running delayed refs when we don't need to be, and they result
>>>> in a lot of weird latencies.
>>>>
>>>> Each patch has their individual explanations.  But the gist of it is we run
>>>> delayed refs in a lot of arbitrary ways that have just accumulated throughout
>>>> the years, so clean up all of these so we can have more consistent performance.
>>>
>>> It would be fine to remove the delayed refs being run from so many
>>> places but I vaguely remember some patches adding them with "we have to
>>> run delayed refs here or we will miss something and that would be a
>>> corruption". The changelogs in patches from 3 on don't point out any
>>> specific problems and I miss some reasoning about correctness, ideally
>>> for each line of btrfs_run_delayed_refs removed.
>>>
>>> As a worst case I really don't want to get to a situation where we start
>>> getting reports that something broke because of the missing delayed
>>> refs, followed by series of "oh yeah I forgot we need it here, add it
>>> back".
>>
>> Yeah I went through and checked each of these spots to see why we had them.
>> A lot of it had to do with how poorly delayed refs were run previously.  You
>> could end up with weird ordering cases and missing our flags.
>>
>> These problems are all gone now, we no longer have to run delayed refs to
>> work around ordering weirdness because I fixed all of those problems.  Now
>> these are just old relics of the past that need to die.  The only case where
>> I didn't touch them is for qgroups, likely because it still matters for the
>> before/after lookups there.
>>
>> But everywhere else it was working around some deficiency in how we ran
>> delayed refs, either in the ordering issues or space related.  Both those
>> problems no longer exist, so we can drop these workarounds.
>>
>>>
>>> The branch with this patchset is in for-next but I'm still not
>>> comfortable with adding it to misc-next as I can't convince myself it's
>>> safe, so more reviews are welcome.
>>>
>>
>> Yeah I'm targeting the merge window after the upcoming one with these,
>> there's still a lot more testing I want to get done.  I mostly threw them up
>> because they were no longer blowing up constantly for Zygo, and I wanted
>> Filipe to get an early look at them.  Thanks,
> 
> No longer blowing up _constantly_, but there was definitely a 2-3 day
> cadence between blowups last time I rebased.  Test runs were ending in
> splats due to KASAN UAF bugs and bad unlock balances.  It doesn't seem
> to be corrupting on-disk metadata, but my test VMs can't get anywhere
> close to a week uptime under the full stress load yet.
> 
> I'd like to keep a test VM pointed at this as it makes it way upstream.
> It's an important set of changes, but it has a high regression risk.
> There are some big changes here, and that's going to expose all the gaps
> in developers' knowledge of how stuff really works.
> 
> Do I just keep rebasing on for-next-<date>?
> 

I think so, I'm not sure what Dave has merged so far.  My own long term tests 
have uncovered a few bugs that I've been busy running down.  Once my long term 
tests no longer fall over I'm going to rebase everything onto the most recent 
devel branch and we can go from there.  Hopefully I'll have this last corner run 
down by tomorrow or early next week.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have
  2020-03-13 21:12 ` [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
@ 2020-04-03 14:31   ` Nikolay Borisov
  0 siblings, 0 replies; 14+ messages in thread
From: Nikolay Borisov @ 2020-04-03 14:31 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 13.03.20 г. 23:12 ч., Josef Bacik wrote:
> Previously our delayed ref running used the total number of items as the
> items to run.  However we changed that to number of heads to run with
> the delayed_refs_rsv, as generally we want to run all of the operations
> for one bytenr.
> 
> But with btrfs_run_delayed_refs(trans, 0) we set our count to 2x the
> number of items that we have.  This is generally fine, but if we have
> some operation generation loads of delayed refs while we're doing this
> pre-flushing in the transaction commit, we'll just spin forever doing
> delayed refs.
> 
> Fix this to simply pick the number of delayed refs we currently have,
> that way we do not end up doing a lot of extra work that's being
> generated in other threads.

Indeed there is a mismatch between delayed_refs->num_entries and what we
account in __btrfs_run_delayed_refs. In the function we count on a
per-head (which can include multiple delayed refs ops) granularity not
on per-refs-per-head. So this fix makes sense.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8e5b49baad98..2925b3ad77a1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2196,7 +2196,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>  
>  	delayed_refs = &trans->transaction->delayed_refs;
>  	if (count == 0)
> -		count = atomic_read(&delayed_refs->num_entries) * 2;
> +		count = delayed_refs->num_heads_ready;
>  
>  again:
>  #ifdef SCRAMBLE_DELAYED_REFS
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] btrfs: only run delayed refs once before committing
  2020-03-13 21:12 ` [PATCH 3/5] btrfs: only run delayed refs once before committing Josef Bacik
@ 2020-04-03 14:34   ` Nikolay Borisov
  0 siblings, 0 replies; 14+ messages in thread
From: Nikolay Borisov @ 2020-04-03 14:34 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 13.03.20 г. 23:12 ч., Josef Bacik wrote:
> We try to pre-flush the delayed refs when committing, because we want to
> do as little work as possible in the critical section of the transaction
> commit.
> 
> However doing this twice can lead to very long transaction commit delays
> as other threads are allowed to continue to generate more delayed refs,
> which potentially delays the commit by multiple minutes in very extreme
> cases.
> 
> So simply stick to one pre-flush, and then continue the rest of the
> transaction commit.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/transaction.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index cff767722a75..3e7fd8a934c1 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -2057,12 +2057,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>  
>  	btrfs_create_pending_block_groups(trans);
>  
> -	ret = btrfs_run_delayed_refs(trans, 0);
> -	if (ret) {
> -		btrfs_end_transaction(trans);
> -		return ret;
> -	}
> -
>  	if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
>  		int run_it = 0;
>  
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots
  2020-03-13 21:12 ` [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
@ 2020-04-03 14:43   ` Nikolay Borisov
  0 siblings, 0 replies; 14+ messages in thread
From: Nikolay Borisov @ 2020-04-03 14:43 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 13.03.20 г. 23:12 ч., Josef Bacik wrote:
> We love running delayed refs in commit_cowonly_roots, but it is a bit

You refer to commit_cowonly_roots but in fact are changing
create_pending_snapshot. Between calling
create_pending_snapshots->create_pending_snapshot and
commit_cowonly_roots we have 2 calls to btrfs_run_delayed_refs. My point
is by the time commit_cowonly_roots is called we've already long flushed
the delayed refs generated from create_pending_snapshot. IMO referring
to commit_cowonly_roots in this commit is wrong?


> excessive.  I was seeing cases of running 3 or 4 refs a few times in a
> row during this time.  Instead simply update all of the roots first,
By "simply update all of the roots" I assume you meant the call to
commit_fs_roots or the changes happening to the snapshoted roots in
create_pending_snapshot? If it's the latter I'd rather you made the text
more explicit by referring to the fact the refs are generated from
snapshots e.g.

Instead simply run all snapshot-related work, then drain delayed refs
resulting from this ...

> then run delayed refs, then handle the empty block groups case, and then
> if we have any more dirty roots do the whole thing again.  This allows
> us to be much more efficient with our delayed ref running, as we can
> batch a few more operations at once.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/transaction.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3e7fd8a934c1..c3b3b524b8c3 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1646,12 +1646,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  		goto fail;
>  	}
>  
> -	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
> -	if (ret) {
> -		btrfs_abort_transaction(trans, ret);
> -		goto fail;
> -	}
> -
>  	/*
>  	 * Do special qgroup accounting for snapshot, as we do some qgroup
>  	 * snapshot hack to do fast snapshot.
> @@ -1698,12 +1692,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  		}
>  	}
>  
> -	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
> -	if (ret) {
> -		btrfs_abort_transaction(trans, ret);
> -		goto fail;
> -	}
> -
>  fail:
>  	pending->error = ret;
>  dir_item_existed:
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] btrfs: stop running all delayed refs during snapshot
  2020-03-13 21:12 ` [PATCH 5/5] btrfs: stop running all delayed refs during snapshot Josef Bacik
@ 2020-04-03 14:46   ` Nikolay Borisov
  0 siblings, 0 replies; 14+ messages in thread
From: Nikolay Borisov @ 2020-04-03 14:46 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 13.03.20 г. 23:12 ч., Josef Bacik wrote:
> This was added a very long time ago to work around issues with delayed
> refs and with qgroups.  Both of these issues have since been properly
> fixed, so all this does is cause a lot of lock contention with anybody
> else who is running delayed refs.

Oh I see, the changelog for this and the previous patch got switched, no ?
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/transaction.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index c3b3b524b8c3..8d34d7e0adb6 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1184,10 +1184,6 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
>  	btrfs_tree_unlock(eb);
>  	free_extent_buffer(eb);
>  
> -	if (ret)
> -		return ret;
> -
> -	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
>  	if (ret)
>  		return ret;
>  
> @@ -1205,10 +1201,6 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
>  	if (ret)
>  		return ret;
>  
> -	/* run_qgroups might have added some more refs */
> -	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
> -	if (ret)
> -		return ret;
>  again:
>  	while (!list_empty(&fs_info->dirty_cowonly_roots)) {
>  		struct btrfs_root *root;
> @@ -1223,15 +1215,24 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
>  		ret = update_cowonly_root(trans, root);
>  		if (ret)
>  			return ret;
> -		ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
> -		if (ret)
> -			return ret;
>  	}
>  
> +	/* Now flush any delayed refs generated by updating all of the roots. */
> +	ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
> +	if (ret)
> +		return ret;
> +
>  	while (!list_empty(dirty_bgs) || !list_empty(io_bgs)) {
>  		ret = btrfs_write_dirty_block_groups(trans);
>  		if (ret)
>  			return ret;
> +
> +		/*
> +		 * We're writing the dirty block groups, which could generate
> +		 * delayed refs, which could generate more dirty block groups,
> +		 * so we want to keep this flushing in this loop to make sure
> +		 * everything gets run.
> +		 */
>  		ret = btrfs_run_delayed_refs(trans, (unsigned long)-1);
>  		if (ret)
>  			return ret;
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-03 14:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
2020-03-13 21:12 ` [PATCH 1/5] btrfs: set delayed_refs.flushing for the first delayed ref flushing Josef Bacik
2020-03-13 21:12 ` [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
2020-04-03 14:31   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 3/5] btrfs: only run delayed refs once before committing Josef Bacik
2020-04-03 14:34   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
2020-04-03 14:43   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 5/5] btrfs: stop running all delayed refs during snapshot Josef Bacik
2020-04-03 14:46   ` Nikolay Borisov
2020-03-25 13:51 ` [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors David Sterba
2020-03-25 14:12   ` Josef Bacik
2020-03-26 15:36     ` Zygo Blaxell
2020-03-26 19:58       ` Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).