linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: dsterba@suse.cz, linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors
Date: Thu, 26 Mar 2020 11:36:04 -0400
Message-ID: <20200326153603.GY13306@hungrycats.org>
In-Reply-To: <e28e8818-bd8e-707f-c749-d00f7a5c913a@toxicpanda.com>

On Wed, Mar 25, 2020 at 10:12:30AM -0400, Josef Bacik wrote:
> On 3/25/20 9:51 AM, David Sterba wrote:
> > On Fri, Mar 13, 2020 at 05:12:15PM -0400, Josef Bacik wrote:
> > > While debugging Zygo's delayed ref problems it was clear there were a bunch of
> > > cases that we're running delayed refs when we don't need to be, and they result
> > > in a lot of weird latencies.
> > > 
> > > Each patch has their individual explanations.  But the gist of it is we run
> > > delayed refs in a lot of arbitrary ways that have just accumulated throughout
> > > the years, so clean up all of these so we can have more consistent performance.
> > 
> > It would be fine to remove the delayed refs being run from so many
> > places but I vaguely remember some patches adding them with "we have to
> > run delayed refs here or we will miss something and that would be a
> > corruption". The changelogs in patches from 3 on don't point out any
> > specific problems and I miss some reasoning about correctness, ideally
> > for each line of btrfs_run_delayed_refs removed.
> > 
> > As a worst case I really don't want to get to a situation where we start
> > getting reports that something broke because of the missing delayed
> > refs, followed by series of "oh yeah I forgot we need it here, add it
> > back".
> 
> Yeah I went through and checked each of these spots to see why we had them.
> A lot of it had to do with how poorly delayed refs were run previously.  You
> could end up with weird ordering cases and missing our flags.
> 
> These problems are all gone now, we no longer have to run delayed refs to
> work around ordering weirdness because I fixed all of those problems.  Now
> these are just old relics of the past that need to die.  The only case where
> I didn't touch them is for qgroups, likely because it still matters for the
> before/after lookups there.
> 
> But everywhere else it was working around some deficiency in how we ran
> delayed refs, either in the ordering issues or space related.  Both those
> problems no longer exist, so we can drop these workarounds.
> 
> > 
> > The branch with this patchset is in for-next but I'm still not
> > comfortable with adding it to misc-next as I can't convince myself it's
> > safe, so more reviews are welcome.
> > 
> 
> Yeah I'm targeting the merge window after the upcoming one with these,
> there's still a lot more testing I want to get done.  I mostly threw them up
> because they were no longer blowing up constantly for Zygo, and I wanted
> Filipe to get an early look at them.  Thanks,

No longer blowing up _constantly_, but there was definitely a 2-3 day
cadence between blowups last time I rebased.  Test runs were ending in
splats due to KASAN UAF bugs and bad unlock balances.  It doesn't seem
to be corrupting on-disk metadata, but my test VMs can't get anywhere
close to a week uptime under the full stress load yet.

I'd like to keep a test VM pointed at this as it makes its way upstream.
It's an important set of changes, but it has a high regression risk.
There are some big changes here, and that's going to expose all the gaps
in developers' knowledge of how stuff really works.

Do I just keep rebasing on for-next-<date>?

> Josef

Thread overview: 14+ messages
2020-03-13 21:12 [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors Josef Bacik
2020-03-13 21:12 ` [PATCH 1/5] btrfs: set delayed_refs.flushing for the first delayed ref flushing Josef Bacik
2020-03-13 21:12 ` [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
2020-04-03 14:31   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 3/5] btrfs: only run delayed refs once before committing Josef Bacik
2020-04-03 14:34   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
2020-04-03 14:43   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 5/5] btrfs: stop running all delayed refs during snapshot Josef Bacik
2020-04-03 14:46   ` Nikolay Borisov
2020-03-25 13:51 ` [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors David Sterba
2020-03-25 14:12   ` Josef Bacik
2020-03-26 15:36     ` Zygo Blaxell [this message]
2020-03-26 19:58       ` Josef Bacik
