From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 16/25] xfs: scrub inodes
Date: Thu, 5 Oct 2017 18:13:39 +1100
Message-ID: <20171005071339.GN3666@dastard>
In-Reply-To: <20171005052219.GE7122@magnolia>

On Wed, Oct 04, 2017 at 10:22:19PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 05, 2017 at 03:04:52PM +1100, Dave Chinner wrote:
> > On Tue, Oct 03, 2017 at 01:42:30PM -0700, Darrick J. Wong wrote:
> > > +	error = xfs_iget(mp, NULL, sc->sm->sm_ino, XFS_IGET_UNTRUSTED,
> > > +			0, &ips);
> > 
> > I think we also want XFS_IGET_DONTCACHE here, so we don't trash the
> > inode cache with inodes that we use once for scrub and never touch
> > again.
> 
> I thought about adding this, but if we let the inodes fall out of the
> cache now then we'll just have to load them back in for the bmap checks,
> right?

Well, I'm looking at ensuring that we don't blow out the memory
side of things. We've still got the inode buffer in the buffer
cache, so I don't see why we should double cache these things
and then leave both cached copies hanging around after we've
finished with them. Leave the buffer around because we do a fair few
checks with it, but don't use excessive icache memory and trash the
working set if we can avoid it...

> > > +xfs_scrub_checkpoint_log(
> > > +	struct xfs_mount	*mp)
> > > +{
> > > +	int			error;
> > > +
> > > +	error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
> > > +	if (error)
> > > +		return error;
> > > +	xfs_ail_push_all_sync(mp->m_ail);
> > > +	return 0;
> > > +}
> > 
> > Oooo, that's a nasty thing to do on busy systems with large dirty
> > logs. I hope this is a "last resort" kind of thing....
> 
> It is; we only do this if the inobt says there's an inode there and the
> inode verifiers fail.

Ok, so why would pushing the log and the AIL make the verifier then
succeed? How likely is this to occur on a busy system?

> > > +/* Set us up with an inode. */
> > 
> > What state are we trying to get the inode into here? We grab all the
> > various locks, but we can still have data changing via mmap pages
> > that are already faulted in and delalloc extents in the incore
> > extent list that aren't reflected on disk...
> > 
> > A comment explaining what we expect here would be nice.
> 
> /* 
>  * Grab total control of the inode metadata.  It doesn't matter here if
>  * the file data is still changing, we just want exclusive access to the
>  * metadata.
>  */

*nod*

> > > +	/* Got the inode, lock it and we're ready to go. */
> > > +	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
> > > +	xfs_ilock(sc->ip, sc->ilock_flags);
> > > +	error = xfs_scrub_trans_alloc(sc->sm, mp, &sc->tp);
> > > +	if (error)
> > > +		goto out_unlock;
> > > +	sc->ilock_flags |= XFS_ILOCK_EXCL;
> > > +	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
> > 
> > Should the inode be joined to the transaction so that cancelling the
> > transaction unlocks the inode? Then the need for the ilock_flags
> > variable goes away....
> 
> This is the confluence of two semi-icky things: first, some of the
> scrubbers (particularly the dir and parent pointer scrubbers) will need
> to drop the ILOCK for short periods of time; later on, repair will want
> to keep the inode locked across all the repair transactions, so it makes
> more sense to control the lock and unlock directly.

Ok, I'll pass on this for now, see how the rest of the code falls
out.

> > > +	/* di_size */
> > > +	isize = be64_to_cpu(dip->di_size);
> > > +	if (isize & (1ULL << 63))
> > > +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > 
> > Should we be checking against the on disk format size, or the
> > mounted filesystem maximum size (i.e. mp->m_super->s_maxbytes)?
> > 32 or 64 bit systems are going to have different maximum valid file
> > sizes..
> 
> It's perfectly valid to 'truncate -s $((2 ** 60) foofile' so the only

Ugh. We can't do IO past 16TB on 32 bit systems, so I'm kinda
surprised truncate doesn't have the same s_maxbytes restriction...

> thing we can really check for here is that the upper bit isn't set
> (because the VFS does not check, but barfs on, files with that large of
> a size).

xfs_max_file_offset() sets the max file offset to 2^63 - 1, so it
looks like the lack of checking in truncate is the problem here,
not the IO path.

> > Directories have a maximum bound size, too - the data space, leaf
> > space and freespace space, each of which are 32GB in size, IIRC.
> > 
> > And symlinks have a different maximum size, too.
> 
> Fair enough, I'll expand the i_size checks, though ISTR the verifiers
> now check that for us.

If they do, then just drop a comment in there to say what is checked
by the verifier.
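
For illustration, the per-type size checks could look something like the
userspace sketch below. This is not from the patch: every name is made
up, and the directory and symlink limits are passed in as parameters
because I haven't checked which bounds the verifiers already enforce.
Only the top-bit test and the "other file types must have zero size"
test come from the code quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical file type tags standing in for the S_IS*() mode tests. */
enum sketch_ftype {
	SKETCH_FT_REG,
	SKETCH_FT_DIR,
	SKETCH_FT_LNK,
	SKETCH_FT_OTHER,	/* devices, fifos, sockets */
};

/*
 * Return true if the on-disk size is not plausible for the file type.
 * dir_max and symlink_max are caller-supplied limits (assumptions).
 */
bool
sketch_isize_corrupt(
	enum sketch_ftype	type,
	uint64_t		isize,
	uint64_t		dir_max,
	uint64_t		symlink_max)
{
	/* Top bit set: past the 2^63 - 1 maximum file offset. */
	if (isize & (1ULL << 63))
		return true;

	switch (type) {
	case SKETCH_FT_REG:
		/* Sparse files can be huge; nothing more to check here. */
		return false;
	case SKETCH_FT_DIR:
		return isize > dir_max;
	case SKETCH_FT_LNK:
		return isize > symlink_max;
	default:
		/* Anything that is not a dir, file or symlink has no size. */
		return isize != 0;
	}
}
```

A 2^60 byte regular file passes, as in the truncate example, while a
character device with a non-zero size is flagged.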

> > > +	if (!S_ISDIR(mode) && !S_ISREG(mode) && !S_ISLNK(mode) && isize != 0)
> > > +		xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > 
> > > +
> > > +	/* di_nblocks */
> > > +	if (flags2 & XFS_DIFLAG2_REFLINK) {
> > > +		; /* nblocks can exceed dblocks */
> > > +	} else if (flags & XFS_DIFLAG_REALTIME) {
> > > +		if (be64_to_cpu(dip->di_nblocks) >=
> > > +		    mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > 
> > That doesn't seem right. the file can be on either the data or the
> > rt device, so the maximum file blocks is the size of one device or
> > the other, not both combined.
> 
> di_nblocks is the sum of (data blocks + bmbt blocks + attr blocks),
> right?

Yeah, forgot it was more than just data extents.

> So in theory if you had a rt file with 1000 data blocks, 10 bmbt
> blocks to map the data blocks, and 100 attr blocks then di_nblocks has
> to be 1110.

Yup, but the additional metadata on the data device is not going to
be anywhere near the size of the data device.

/me shrugs

I can't think of an easy way to get a maximum block count, so I
guess that'll have to do...
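
As a worked version of the example above, here is a small userspace
sketch of the bound being discussed. The function name and parameters
are hypothetical, and the non-realtime branch is my assumption since
that part of the patch isn't quoted here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if di_nblocks is implausibly large.
 * dblocks/rblocks stand in for sb_dblocks/sb_rblocks.
 */
bool
sketch_nblocks_corrupt(
	bool		reflink,
	bool		realtime,
	uint64_t	nblocks,
	uint64_t	dblocks,
	uint64_t	rblocks)
{
	/* Shared extents are counted once per owner, so no useful bound. */
	if (reflink)
		return false;

	/*
	 * A realtime file keeps its data on the rt device but its bmbt and
	 * attr blocks on the data device, so both devices contribute.
	 */
	if (realtime)
		return nblocks >= dblocks + rblocks;

	/* Assumption: everything else lives on the data device only. */
	return nblocks >= dblocks;
}
```

With a 1050 block rt device and a 500 block data device, the rt file
from the example (1000 + 10 + 100 = 1110 blocks) is accepted, whereas a
check against the rt device alone would wrongly flag it.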

> > > +		if (nextents > fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_BTREE:
> > > +		if (nextents <= fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_LOCAL:
> > > +	case XFS_DINODE_FMT_DEV:
> > > +	case XFS_DINODE_FMT_UUID:
> > > +	default:
> > > +		if (nextents != 0)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	}
> > > +
> > > +	/* di_anextents */
> > > +	nextents = be16_to_cpu(dip->di_anextents);
> > > +	fork_recs =  XFS_DFORK_ASIZE(dip, mp) / sizeof(struct xfs_bmbt_rec);
> > > +	switch (dip->di_aformat) {
> > > +	case XFS_DINODE_FMT_EXTENTS:
> > > +		if (nextents > fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_BTREE:
> > > +		if (nextents <= fork_recs)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	case XFS_DINODE_FMT_LOCAL:
> > > +	case XFS_DINODE_FMT_DEV:
> > > +	case XFS_DINODE_FMT_UUID:
> > > +	default:
> > > +		if (nextents != 0)
> > > +			xfs_scrub_ino_set_corrupt(sc, ino, bp);
> > > +		break;
> > > +	}
> > 
> > Don't we need a check here first to see whether an attribute fork
> > exists or not?
> 
> Do you mean the xfs_inode_fork, or something else?

Something else. :P

> XFS_DFORK_ASIZE returns zero if !XFS_DFORK_Q which in turn is based on
> di_forkoff so we're really only checking that di_aformat makes sense
> given the number of extents and the size of the attr fork area.

Right, but if XFS_DFORK_ASIZE == 0, then dip->di_aformat *must* be
XFS_DINODE_FMT_EXTENTS. That's the only valid configuration when
there is no attribute fork present.

If there is an attribute fork present, then it can be XFS_DINODE_FMT_LOCAL,
EXTENTS or BTREE, and then the extent count needs checking.
XFS_DINODE_FMT_DEV and XFS_DINODE_FMT_UUID are both invalid for the
attribute fork.
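
Roughly, as a standalone sketch (hypothetical names throughout, with an
enum standing in for the XFS_DINODE_FMT_* values; the per-format extent
count tests simply mirror the ones in the quoted patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for XFS_DINODE_FMT_*; the numbering is illustrative only. */
enum sketch_fmt {
	SKETCH_FMT_DEV,
	SKETCH_FMT_LOCAL,
	SKETCH_FMT_EXTENTS,
	SKETCH_FMT_BTREE,
	SKETCH_FMT_UUID,
};

/*
 * Return true if the attr fork fields are inconsistent.
 * asize:     bytes available to the attr fork, 0 if there is none
 * fork_recs: number of extent records that fit inline in asize
 */
bool
sketch_attr_fork_corrupt(
	enum sketch_fmt	aformat,
	uint32_t	asize,
	uint32_t	nextents,
	uint32_t	fork_recs)
{
	/*
	 * No attr fork: EXTENTS is the only valid format. Requiring zero
	 * extents as well is my assumption; nothing could map them.
	 */
	if (asize == 0)
		return aformat != SKETCH_FMT_EXTENTS || nextents != 0;

	switch (aformat) {
	case SKETCH_FMT_LOCAL:
		return nextents != 0;
	case SKETCH_FMT_EXTENTS:
		return nextents > fork_recs;
	case SKETCH_FMT_BTREE:
		return nextents <= fork_recs;
	default:
		/* DEV and UUID are never valid for the attr fork. */
		return true;
	}
}
```

The point is that the "is there a fork at all" test comes first, so a
LOCAL or BTREE format with no fork area is caught before the extent
counts are looked at.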

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
