From: "Darrick J. Wong" <djwong@kernel.org>
To: Chris Dunlop <chris@onthe.net.au>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: Highly reflinked and fragmented considered harmful?
Date: Mon, 9 May 2022 22:14:31 -0700 [thread overview]
Message-ID: <20220510051431.GZ27195@magnolia> (raw)
In-Reply-To: <20220510025541.GA192172@onthe.net.au>
On Tue, May 10, 2022 at 12:55:41PM +1000, Chris Dunlop wrote:
> Hi Dave,
>
> On Tue, May 10, 2022 at 09:09:18AM +1000, Dave Chinner wrote:
> > On Mon, May 09, 2022 at 12:46:59PM +1000, Chris Dunlop wrote:
> > > Is it to be expected that removing 29TB of highly reflinked and fragmented
> > > data could take days, the entire time blocking other tasks like "rm" and
> > > "df" on the same filesystem?
> ...
> > At some point, you have to pay the price of creating billions of
> > random fine-grained cross references in tens of TBs of data spread
> > across weeks and months of production. You don't notice the scale of
> > the cross-reference because it's taken weeks and months of normal
> > operations to get there. It's only when you finally have to perform
> > an operation that needs to iterate all those references that the
> > scale suddenly becomes apparent. XFS scales to really large numbers
> > without significant degradation, so people don't notice things like
> > object counts or cross references until something like this
> > happens.
> >
> > I don't think there's much we can do at the filesystem level to help
> > you at this point - the inode output in the transaction dump above
> > indicates that you haven't been using extent size hints to limit
> > fragmentation or extent share/COW sizes, so the damage is already
> > present and we can't really do anything to fix that up.
>
> Thanks for taking the time to provide a detailed and informative
> exposition, it certainly helps me understand what I'm asking of the fs, the
> areas that deserve more attention, and how to approach analyzing the
> situation.
>
> At this point I'm about 3 days from completing copying the data (from a
> snapshot of the troubled fs mounted with 'norecovery') over to a brand new
> fs. Unfortunately the new fs is also rmapbt=1 so I'll go through all the
> copying again (under more controlled circumstances) to get onto a rmapbt=0
> fs (losing the ability to do online repairs whenever that arrives -
> hopefully that won't come back to haunt me).
Hmm. Were most of the stuck processes running xfs_inodegc_flush? Maybe
we should try to switch that to something that will stop waiting after
30s, since most of the (non-fsfreeze) callers don't actually *require*
that the work actually finish, they're just trying to return accurate
space accounting to userspace.
> Out of interest:
>
> > > - with a reboot/remount, does the log replay continue from where it left
> > > off, or start again?
>
> Sorry, if you provided an answer to this, I didn't understand it.
>
> Basically the question is, if a recovery on mount were going to take 10
> hours, but the box rebooted and fs mounted again at 8 hours, would the
> recovery this time take 2 hours or once again 10 hours?
In theory yes, it'll restart where it left off, but if 10 seconds go by
and the extent count *hasn't changed* then yikes did we spend that
entire time doing refcount btree updates??
--D
> Cheers,
>
> Chris
next prev parent reply other threads:[~2022-05-10 5:14 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-09 2:46 Highly reflinked and fragmented considered harmful? Chris Dunlop
2022-05-09 23:09 ` Dave Chinner
2022-05-10 2:55 ` Chris Dunlop
2022-05-10 5:14 ` Darrick J. Wong [this message]
2022-05-10 4:07 ` Amir Goldstein
2022-05-10 5:10 ` Darrick J. Wong
2022-05-10 6:30 ` Chris Dunlop
2022-05-10 8:16 ` Dave Chinner
2022-05-10 19:19 ` Darrick J. Wong
2022-05-10 21:54 ` Dave Chinner
2022-05-11 0:37 ` Darrick J. Wong
2022-05-11 1:36 ` Dave Chinner
2022-05-11 2:16 ` Chris Dunlop
2022-05-11 2:52 ` Dave Chinner
2022-05-11 3:58 ` Chris Dunlop
2022-05-11 5:18 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220510051431.GZ27195@magnolia \
--to=djwong@kernel.org \
--cc=chris@onthe.net.au \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).