Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: dsterba@suse.cz, linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors
Date: Thu, 26 Mar 2020 15:58:52 -0400
Message-ID: <2f15a973-1f2e-33fa-aefd-93a1f7dce683@toxicpanda.com> (raw)
In-Reply-To: <20200326153603.GY13306@hungrycats.org>

On 3/26/20 11:36 AM, Zygo Blaxell wrote:
> On Wed, Mar 25, 2020 at 10:12:30AM -0400, Josef Bacik wrote:
>> On 3/25/20 9:51 AM, David Sterba wrote:
>>> On Fri, Mar 13, 2020 at 05:12:15PM -0400, Josef Bacik wrote:
>>>> While debugging Zygo's delayed ref problems it was clear there were a bunch of
>>>> cases that we're running delayed refs when we don't need to be, and they result
>>>> in a lot of weird latencies.
>>>>
>>>> Each patch has their individual explanations.  But the gist of it is we run
>>>> delayed refs in a lot of arbitrary ways that have just accumulated throughout
>>>> the years, so clean up all of these so we can have more consistent performance.
>>>
>>> It would be fine to remove the delayed refs being run from so many
>>> places but I vaguely remember some patches adding them with "we have to
>>> run delayed refs here or we will miss something and that would be a
>>> corruption". The changelogs in patches from 3 on don't point out any
>>> specific problems and I miss some reasoning about correctness, ideally
>>> for each line of btrfs_run_delayed_refs removed.
>>>
>>> As a worst case I really don't want to get to a situation where we start
>>> getting reports that something broke because of the missing delayed
>>> refs, followed by series of "oh yeah I forgot we need it here, add it
>>> back".
>>
>> Yeah I went through and checked each of these spots to see why we had them.
>> A lot of it had to do with how poorly delayed refs were run previously.  You
>> could end up with weird ordering cases and missing our flags.
>>
>> These problems are all gone now, we no longer have to run delayed refs to
>> work around ordering weirdness because I fixed all of those problems.  Now
>> these are just old relics of the past that need to die.  The only case where
>> I didn't touch them is for qgroups, likely because it still matters for the
>> before/after lookups there.
>>
>> But everywhere else it was working around some deficiency in how we ran
>> delayed refs, either in the ordering issues or space related.  Both those
>> problems no longer exist, so we can drop these workarounds.
>>
>>>
>>> The branch with this patchset is in for-next but I'm still not
>>> comfortable with adding it to misc-next as I can't convince myself it's
>>> safe, so more reviews are welcome.
>>>
>>
>> Yeah I'm targeting the merge window after the upcoming one with these,
>> there's still a lot more testing I want to get done.  I mostly threw them up
>> because they were no longer blowing up constantly for Zygo, and I wanted
>> Filipe to get an early look at them.  Thanks,
> 
> No longer blowing up _constantly_, but there was definitely a 2-3 day
> cadence between blowups last time I rebased.  Test runs were ending in
> splats due to KASAN UAF bugs and bad unlock balances.  It doesn't seem
> to be corrupting on-disk metadata, but my test VMs can't get anywhere
> close to a week uptime under the full stress load yet.
> 
> I'd like to keep a test VM pointed at this as it makes it way upstream.
> It's an important set of changes, but it has a high regression risk.
> There are some big changes here, and that's going to expose all the gaps
> in developers' knowledge of how stuff really works.
> 
> Do I just keep rebasing on for-next-<date>?
> 

I think so, I'm not sure what Dave has merged so far.  My own long term tests 
have uncovered a few bugs that I've been busy running down.  Once my long term 
tests no longer fall over I'm going to rebase everything onto the most recent 
devel branch and we can go from there.  Hopefully I'll have this last corner run 
down by tomorrow or early next week.  Thanks,

Josef

      reply index

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-13 21:12 Josef Bacik
2020-03-13 21:12 ` [PATCH 1/5] btrfs: set delayed_refs.flushing for the first delayed ref flushing Josef Bacik
2020-03-13 21:12 ` [PATCH 2/5] btrfs: delayed refs pre-flushing should only run the heads we have Josef Bacik
2020-04-03 14:31   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 3/5] btrfs: only run delayed refs once before committing Josef Bacik
2020-04-03 14:34   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 4/5] btrfs: run delayed refs less often in commit_cowonly_roots Josef Bacik
2020-04-03 14:43   ` Nikolay Borisov
2020-03-13 21:12 ` [PATCH 5/5] btrfs: stop running all delayed refs during snapshot Josef Bacik
2020-04-03 14:46   ` Nikolay Borisov
2020-03-25 13:51 ` [PATCH 0/5] Fix up some stupid delayed ref flushing behaviors David Sterba
2020-03-25 14:12   ` Josef Bacik
2020-03-26 15:36     ` Zygo Blaxell
2020-03-26 19:58       ` Josef Bacik [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2f15a973-1f2e-33fa-aefd-93a1f7dce683@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=dsterba@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git