From: "Darrick J. Wong" <djwong@kernel.org>
To: Chris Dunlop <chris@onthe.net.au>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: Highly reflinked and fragmented considered harmful?
Date: Mon, 9 May 2022 22:14:31 -0700
Message-ID: <20220510051431.GZ27195@magnolia>
In-Reply-To: <20220510025541.GA192172@onthe.net.au>

On Tue, May 10, 2022 at 12:55:41PM +1000, Chris Dunlop wrote:
> Hi Dave,
> 
> On Tue, May 10, 2022 at 09:09:18AM +1000, Dave Chinner wrote:
> > On Mon, May 09, 2022 at 12:46:59PM +1000, Chris Dunlop wrote:
> > > Is it to be expected that removing 29TB of highly reflinked and fragmented
> > > data could take days, the entire time blocking other tasks like "rm" and
> > > "df" on the same filesystem?
> ...
> > At some point, you have to pay the price of creating billions of
> > random fine-grained cross references in tens of TBs of data spread
> > across weeks and months of production. You don't notice the scale of
> > the cross-reference because it's taken weeks and months of normal
> > operations to get there. It's only when you finally have to perform
> > an operation that needs to iterate all those references that the
> > scale suddenly becomes apparent. XFS scales to really large numbers
> > without significant degradation, so people don't notice things like
> > object counts or cross references until something like this
> > happens.
> > 
> > I don't think there's much we can do at the filesystem level to help
> > you at this point - the inode output in the transaction dump above
> > indicates that you haven't been using extent size hints to limit
> > fragmentation or extent share/COW sizes, so the damage is already
> > present and we can't really do anything to fix that up.
> 
> Thanks for taking the time to provide a detailed and informative
> exposition; it certainly helps me understand what I'm asking of the fs,
> the areas that deserve more attention, and how to approach analyzing the
> situation.
> 
> At this point I'm about 3 days from completing copying the data (from a
> snapshot of the troubled fs mounted with 'norecovery') over to a brand new
> fs. Unfortunately the new fs is also rmapbt=1, so I'll go through all the
> copying again (under more controlled circumstances) to get onto an
> rmapbt=0 fs (losing the ability to do online repairs whenever that
> arrives - hopefully that won't come back to haunt me).
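
If you do end up copying everything onto a fresh fs again, it'd be worth
setting extent size and CoW extent size hints on the destination
directories before the copy starts, per Dave's point above, so that
allocations (and COW staging) get rounded up to something much coarser
than single blocks.  Untested sketch of the ioctl involved; the 16MB
value is only an example and wants tuning for your workload, and
"xfs_io -c 'extsize 16m' -c 'cowextsize 16m' <dir>" does the same thing
from the command line:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_* */

/*
 * Set inheritable extent size and CoW extent size hints on a directory so
 * that files created underneath it pick them up.  Sizes are in bytes and
 * must be a multiple of the fs block size.
 */
static int set_hints(const char *dir, unsigned int bytes)
{
	struct fsxattr fsx;
	int fd = open(dir, O_RDONLY | O_DIRECTORY);

	if (fd < 0) {
		perror(dir);
		return -1;
	}
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		goto out_err;

	fsx.fsx_xflags |= FS_XFLAG_EXTSZINHERIT | FS_XFLAG_COWEXTSIZE;
	fsx.fsx_extsize = bytes;	/* allocation size hint */
	fsx.fsx_cowextsize = bytes;	/* COW allocation size hint */

	if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
		goto out_err;
	close(fd);
	return 0;
out_err:
	perror(dir);
	close(fd);
	return -1;
}

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
		return 1;
	}
	/* 16MB, purely as an example hint */
	return set_hints(argv[1], 16U << 20) ? 1 : 0;
}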

Hmm.  Were most of the stuck processes blocked in xfs_inodegc_flush?  Maybe
we should try to switch that to something that stops waiting after 30s,
since most of the (non-fsfreeze) callers don't actually *require* the
work to finish; they're just trying to return accurate space accounting
to userspace.
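
Roughly what I have in mind, as an untested sketch only: the
m_inodegc_queued counter, the m_inodegc_wait waitqueue, and the function
name below are all made up for illustration and aren't what's in the
tree today.  The idea is simply that statfs-ish callers get a bounded
wait, while fsfreeze keeps the existing unbounded flush since freeze
really does need every queued inactivation to finish.

/* imagined as living in fs/xfs/xfs_icache.c */
#define XFS_INODEGC_STATFS_TIMEOUT	(30 * HZ)

static bool
xfs_inodegc_flush_timeout(
	struct xfs_mount	*mp)
{
	/*
	 * Wait for the queued inode inactivation (inodegc) work to drain,
	 * but give up after 30s so that statfs callers get control back
	 * with slightly stale free space counts instead of blocking for
	 * hours behind a huge reflinked removal.
	 */
	return wait_event_timeout(mp->m_inodegc_wait,
			atomic_read(&mp->m_inodegc_queued) == 0,
			XFS_INODEGC_STATFS_TIMEOUT) > 0;
}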

> Out of interest:
> 
> > > - with a reboot/remount, does the log replay continue from where it left
> > > off, or start again?
> 
> Sorry, if you provided an answer to this, I didn't understand it.
> 
> Basically the question is: if a recovery on mount were going to take 10
> hours, but the box rebooted and the fs was mounted again at the 8-hour
> mark, would the recovery this time take 2 hours, or once again 10 hours?

In theory, yes, it'll restart from where it left off (so closer to 2
hours than 10 in your example).  But if 10 seconds go by and the extent
count *hasn't changed*, then yikes, did we spend that entire time doing
refcount btree updates??

--D

> Cheers,
> 
> Chris
