From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 08/21] xfs: defer iput on certain inodes while scrub / repair are running
Date: Fri, 29 Jun 2018 07:49:43 -0700
Message-ID: <20180629144943.GL5711@magnolia>
In-Reply-To: <20180628233721.GE2234@dastard>

On Fri, Jun 29, 2018 at 09:37:21AM +1000, Dave Chinner wrote:
> On Sun, Jun 24, 2018 at 12:24:20PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Destroying an incore inode sometimes requires some work to be done on
> > the inode.  For example, post-EOF blocks on a non-PREALLOC inode are
> > trimmed, and copy-on-write staging extents are freed.  This work is done
> > in separate transactions, which is bad for scrub and repair because (a)
> > we already have a transaction and can't nest them, and (b) if we've
> > frozen the filesystem for scrub/repair work, that (regular) transaction
> > allocation will block on the freeze.
> > 
> > Therefore, if we detect that work has to be done to destroy the incore
> > inode, we'll just hang on to the reference until after the scrub is
> > finished.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Darrick, I'll just repeat what we discussed on #xfs here so we have
> it in the archive and everyone else knows why this is probably going
> to be done differently.
> 
> I think we should move deferred inode inactivation processing into
> the background reclaim radix tree walker rather than introduce a
> special new "don't iput this inode yet" state. We're really only
> trying to prevent the transactions that xfs_inactive() may run
> through iput() when the filesystem is frozen, and we already stop
> background reclaim processing when the fs is frozen.
> 
> I've always intended that xfs_fs_destroy_inode() basically becomes a
> no-op that just queues the inode for final inactivation, freeing and
> reclaim - right now it only does the reclaim work in the background.
> I first proposed this back in ~2008 here:
> 
> http://xfs.org/index.php/Improving_inode_Caching#Inode_Unlink
> 
> At this point, it really only requires a new inode flag to indicate
> that it has an inactivation pending - we set that if xfs_inactive
> needs to do work before the inode can be reclaimed, and have a
> separate per-ag work queue that walks the inode radix tree finding
> reclaimable inodes that have the NEED_INACTIVATION inode flag set.
> This way background reclaim doesn't get stuck on them.
> 
> This has benefits for many operations e.g. bulk processing of
> inode inactivation and freeing either concurrently or after rm -rf
> rather than at unlink syscall exit, VFS inode cache shrinker never
> blocks on inactivation needing to run transactions, etc.
> 
> It also allows us to turn off inactivation on a per-AG basis,
> meaning that when we are rebuilding an AG structure in repair (e.g.
> the rmap btree) we can turn off inode inactivation and reclaim for
> that AG rather than needing to freeze the entire filesystem....
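
To make the proposal above concrete, here is a minimal userspace sketch of
the idea, not kernel code.  Everything prefixed toy_/TOY_ is hypothetical
and invented for illustration; only the NEED_INACTIVATION name comes from
the mail.  It assumes a flat array standing in for the per-AG inode radix
tree and a plain function standing in for the per-AG work queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags; only the NEED_INACTIVATION name is from the mail. */
#define TOY_RECLAIMABLE        (1u << 0)
#define TOY_NEED_INACTIVATION  (1u << 1)

struct toy_inode {
	unsigned int flags;
	unsigned int agno;	/* allocation group this inode lives in */
};

struct toy_ag {
	bool inactivation_enabled;	/* cleared while repair rebuilds this AG */
};

/*
 * Stand-in for the destroy_inode path: it only queues the inode and
 * never runs a transaction itself.
 */
void toy_destroy_inode(struct toy_inode *ip, bool needs_work)
{
	ip->flags |= TOY_RECLAIMABLE;
	if (needs_work)
		ip->flags |= TOY_NEED_INACTIVATION;
}

/*
 * Stand-in for one pass of the per-AG background worker.  Returns the
 * number of inodes inactivated; does nothing if the AG is turned off.
 */
size_t toy_inactivate_ag(const struct toy_ag *ags, unsigned int agno,
			 struct toy_inode *inodes, size_t nr)
{
	size_t done = 0;
	size_t i;

	if (!ags[agno].inactivation_enabled)
		return 0;

	for (i = 0; i < nr; i++) {
		struct toy_inode *ip = &inodes[i];

		if (ip->agno != agno)
			continue;
		if (!(ip->flags & TOY_NEED_INACTIVATION))
			continue;
		/* Only inodes already queued for reclaim may carry the flag. */
		assert(ip->flags & TOY_RECLAIMABLE);
		/* The real work (transactions) would happen here. */
		ip->flags &= ~TOY_NEED_INACTIVATION;
		done++;
	}
	return done;
}
```

In this sketch, an inode in a disabled AG simply keeps its flag until the AG
is re-enabled and the worker runs again, which is the per-AG alternative to
freezing the whole filesystem described above.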

So although I've been off playing a JavaScript monkey this week, I should
note that the past few months I've also been slowly combing through all
the past online repair fuzz test output to see what's still majorly
broken.  I've noticed that the bmbt fuzzers have a particular failure
pattern that leads to shutdown, which is:

1) Fuzz a bmbt.br_blockcount value to a large enough value that we now
have a giant post-eof extent.

2) Mount filesystem.

3) Run xfs_scrub, which loads said inode, checks the bad bmbt, and tells
userspace it's broken...

4) ...and releases the inode.

5) Memory reclaim or someone comes along and calls xfs_inactive, which
says "Hey, nice post-EOF extent, let's trim that off!"  The extent free
code then freaks out "ZOMG, that extent is already free!"

6) Bam, filesystem shuts down.

7) xfs_scrub retries the bmbt scrub, this time with IFLAG_REPAIR set,
but by now the fs has already gone down, and sadness.

I've had a thought lurking around in my head for a while that perhaps we
should have a second SKIP_INACTIVATION iflag that indicates that the
inode is corrupt and we should skip post-eof inactivation to avoid fs
shutdowns.  We'd still have to take the risk of cleaning out the cow
fork (because that metadata is never persisted) but we could at least
avoid a shutdown.
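
A rough sketch of what that check might look like, as a userspace toy
rather than a patch.  The toy_ types and helper are hypothetical; the only
name taken from the paragraph above is SKIP_INACTIVATION, and the sketch
assumes the flag gets set by scrub when it finds the inode corrupt.

```c
#include <stdbool.h>

/* Hypothetical flag bit; only the SKIP_INACTIVATION name is from the mail. */
#define TOY_SKIP_INACTIVATION  (1u << 0)

struct toy_inode {
	unsigned int flags;
	bool has_posteof_blocks;	/* blocks mapped past EOF */
	bool has_cow_blocks;		/* CoW staging extents in the cow fork */
};

/* What the inactivation path would decide to do for this inode. */
struct toy_inactive_plan {
	bool trim_posteof;
	bool cancel_cow;
};

struct toy_inactive_plan toy_plan_inactive(const struct toy_inode *ip)
{
	struct toy_inactive_plan plan = { false, false };

	/* The cow fork is incore only, so it is cleaned out regardless. */
	plan.cancel_cow = ip->has_cow_blocks;

	/* Don't trust the data fork mappings of an inode flagged as corrupt. */
	if (ip->has_posteof_blocks && !(ip->flags & TOY_SKIP_INACTIVATION))
		plan.trim_posteof = true;

	return plan;
}
```

With the flag set, the fuzzed inode from step 5 would keep its bogus
post-EOF extent until repair gets a chance to fix the bmbt, instead of
tripping the extent free code.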

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
