All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 13/25] xfs: scrub inode btrees
Date: Wed, 4 Oct 2017 22:47:14 -0700	[thread overview]
Message-ID: <20171005054714.GF7122@magnolia> (raw)
In-Reply-To: <20171005020810.GJ3666@dastard>

On Thu, Oct 05, 2017 at 01:08:10PM +1100, Dave Chinner wrote:
> On Tue, Oct 03, 2017 at 01:42:11PM -0700, Darrick J. Wong wrote:
> > +/*
> > + * Set us up to scrub inode btrees.
> > + * If we detect a discrepancy between the inobt and the inode,
> > + * try again after forcing logged inode cores out to disk.
> > + */
> > +int
> > +xfs_scrub_setup_ag_iallocbt(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip)
> > +{
> > +	return xfs_scrub_setup_ag_btree(sc, ip, sc->try_harder);
> > +}
> > +
> > +/* Inode btree scrubber. */
> > +
> > +/* Is this chunk worth checking? */
> > +STATIC bool
> > +xfs_scrub_iallocbt_chunk(
> > +	struct xfs_scrub_btree		*bs,
> > +	struct xfs_inobt_rec_incore	*irec,
> > +	xfs_agino_t			agino,
> > +	xfs_extlen_t			len)
> > +{
> > +	struct xfs_mount		*mp = bs->cur->bc_mp;
> > +	struct xfs_agf			*agf;
> > +	unsigned long long		rec_end;
> > +	xfs_agblock_t			eoag;
> > +	xfs_agblock_t			bno;
> > +
> > +	agf = XFS_BUF_TO_AGF(bs->sc->sa.agf_bp);
> > +	eoag = be32_to_cpu(agf->agf_length);

<nod>

> Probably should use the AGI for this.
> 
> > +	bno = XFS_AGINO_TO_AGBNO(mp, agino);
> > +	rec_end = (unsigned long long)bno + len;
> > +
> > +	if (bno >= mp->m_sb.sb_agblocks || bno >= eoag ||
> > +	    rec_end > mp->m_sb.sb_agblocks || rec_end > eoag) {
> 
> Same comment as last patch.

Yup yup.

> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> 
> I note there is no check on the length passed in for the inode
> chunk - should that be verified?

len is computed from constants, so in theory it should be safe since
if the geometry is all messed up then we likely wouldn't have been
able to mount.

OTOH it's pretty cheap to put in, so sure.

> > +
> > +/* Count the number of free inodes. */
> > +static unsigned int
> > +xfs_scrub_iallocbt_freecount(
> > +	xfs_inofree_t			freemask)
> > +{
> > +	int				bits = XFS_INODES_PER_CHUNK;
> > +	unsigned int			ret = 0;
> > +
> > +	while (bits--) {
> > +		if (freemask & 1)
> > +			ret++;
> > +		freemask >>= 1;
> > +	}
> 
> 
> Seems a little cumbersome. Perhaps a loop using xfs_next_bit()
> might be a bit faster, something like:
> 
> 	nextbit = xfs_next_bit(&freemask, 1, 0); 
> 	while (nextbit != -1) {
> 		ret++;
> 		nextbit = xfs_next_bit(&freemask, 1, nextbit + 1);
> 	}

<nod>  A pity there's no popcnt()...

> > +/* Check a particular inode with ir_free. */
> > +STATIC int
> > +xfs_scrub_iallocbt_check_cluster_freemask(
> > +	struct xfs_scrub_btree		*bs,
> > +	xfs_ino_t			fsino,
> > +	xfs_agino_t			chunkino,
> > +	xfs_agino_t			clusterino,
> > +	struct xfs_inobt_rec_incore	*irec,
> > +	struct xfs_buf			*bp)
> > +{
> > +	struct xfs_dinode		*dip;
> > +	struct xfs_mount		*mp = bs->cur->bc_mp;
> > +	bool				freemask_ok;
> > +	bool				inuse;
> > +	int				error;
> > +
> > +	dip = xfs_buf_offset(bp, clusterino * mp->m_sb.sb_inodesize);
> > +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC ||
> > +	    (dip->di_version >= 3 &&
> > +	     be64_to_cpu(dip->di_ino) != fsino + clusterino)) {
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		goto out;
> > +	}
> > +
> > +	freemask_ok = !!(irec->ir_free & XFS_INOBT_MASK(chunkino + clusterino));
> 
> No need for !!(...) for a bool type - the compiler will squash it
> down to 0/1 autmoatically.

<nod>

> > +	error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp,
> > +			fsino + clusterino, &inuse);
> > +	if (error == -ENODATA) {
> > +		/* Not cached, just read the disk buffer */
> 
> I think that is wrong. xfs_icache_inode_is_allocated() returns
> -ENOENT if the inode is not in cache....

I changed it to ENODATA so that we can tell the difference between
inode not in cache (ENODATA) and inode racing with unlink (ENOENT).

(Patch was sent to the ML a while ago and I omitted it from this posting...)

> > +		freemask_ok ^= !!(dip->di_mode);
> > +		if (!bs->sc->try_harder && !freemask_ok)
> > +			return -EDEADLOCK;
> > +	} else if (error < 0) {
> > +		/* Inode is only half assembled, don't bother. */
> > +		freemask_ok = true;
> 
> Or we had an IO error looking it up. i.e. -EAGAIN is the "half
> assembled" state (i.e. in the XFS_INEW state) or the half
> *disasembled* state (i.e. XFS_IRECLAIMABLE), anything
> else is an error...

<nod>

> > +	} else {
> > +		/* Inode is all there. */
> > +		freemask_ok ^= inuse;
> 
> So inuse is returned from a mode check after iget succeeds. The mode
> isn't zeroed until  /after/ XFS_IRECLAIMABLE is set, but it's also
> set before XFS_INEW is cleared.  IOWs, how can
> xfs_icache_inode_is_allocated() report anything
> other than inuse == true here? If that's the case, what's the point
> of the mode check inside xfs_icache_inode_is_allocated()?

I think you're right about this, I'll have a look in the morning.

> > +	}
> > +	if (!freemask_ok)
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +out:
> > +	return 0;
> > +}
> > +
> > +/* Make sure the free mask is consistent with what the inodes think. */
> > +STATIC int
> > +xfs_scrub_iallocbt_check_freemask(
> > +	struct xfs_scrub_btree		*bs,
> > +	struct xfs_inobt_rec_incore	*irec)
> > +{
> > +	struct xfs_owner_info		oinfo;
> > +	struct xfs_imap			imap;
> > +	struct xfs_mount		*mp = bs->cur->bc_mp;
> > +	struct xfs_dinode		*dip;
> > +	struct xfs_buf			*bp;
> > +	xfs_ino_t			fsino;
> > +	xfs_agino_t			nr_inodes;
> > +	xfs_agino_t			agino;
> > +	xfs_agino_t			chunkino;
> > +	xfs_agino_t			clusterino;
> > +	xfs_agblock_t			agbno;
> > +	int				blks_per_cluster;
> > +	uint16_t			holemask;
> > +	uint16_t			ir_holemask;
> > +	int				error = 0;
> > +
> > +	/* Make sure the freemask matches the inode records. */
> > +	blks_per_cluster = xfs_icluster_size_fsb(mp);
> > +	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
> 
> Does this setup and loop work for the case where we have 64k
> filesystem blocks and so two or more inode chunks per filesystem
> block (i.e. ppc64)? 

I think the answer is yes, at worst case we end up processing a block's
worth of inodes at a time.  The last time I ran scrub on ppc64 (last
week) it worked fine.

> > +/* Scrub an inobt/finobt record. */
> > +STATIC int
> > +xfs_scrub_iallocbt_helper(
> > +	struct xfs_scrub_btree		*bs,
> > +	union xfs_btree_rec		*rec)
> > +{
> > +	struct xfs_mount		*mp = bs->cur->bc_mp;
> > +	struct xfs_agi			*agi;
> > +	struct xfs_inobt_rec_incore	irec;
> > +	uint64_t			holes;
> > +	xfs_agino_t			agino;
> > +	xfs_agblock_t			agbno;
> > +	xfs_extlen_t			len;
> > +	int				holecount;
> > +	int				i;
> > +	int				error = 0;
> > +	unsigned int			real_freecount;
> > +	uint16_t			holemask;
> > +
> > +	xfs_inobt_btrec_to_irec(mp, rec, &irec);
> > +
> > +	if (irec.ir_count > XFS_INODES_PER_CHUNK ||
> > +	    irec.ir_freecount > XFS_INODES_PER_CHUNK)
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +
> > +	real_freecount = irec.ir_freecount +
> > +			(XFS_INODES_PER_CHUNK - irec.ir_count);
> > +	if (real_freecount != xfs_scrub_iallocbt_freecount(irec.ir_free))
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +
> > +	agi = XFS_BUF_TO_AGI(bs->sc->sa.agi_bp);
> > +	agino = irec.ir_startino;
> > +	agbno = XFS_AGINO_TO_AGBNO(mp, irec.ir_startino);
> > +	if (agbno >= be32_to_cpu(agi->agi_length)) {
> 
> Validate as per every other agbno?

<nod>

> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +		goto out;
> > +	}
> > +
> > +	if ((agbno & (xfs_ialloc_cluster_alignment(mp) - 1)) ||
> > +	    (agbno & (xfs_icluster_size_fsb(mp) - 1)))
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> 
> What's the magic masking checks being done here? (comment?)

/* Make sure this record is aligned to cluster and inoalignmnt size. */

> > +	/* Handle non-sparse inodes */
> > +	if (!xfs_inobt_issparse(irec.ir_holemask)) {
> > +		len = XFS_B_TO_FSB(mp,
> > +				XFS_INODES_PER_CHUNK * mp->m_sb.sb_inodesize);
> > +		if (irec.ir_count != XFS_INODES_PER_CHUNK)
> > +			xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +
> > +		if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
> > +			goto out;
> > +		goto check_freemask;
> > +	}
> > +
> > +	/* Check each chunk of a sparse inode cluster. */
> > +	holemask = irec.ir_holemask;
> > +	holecount = 0;
> > +	len = XFS_B_TO_FSB(mp,
> > +			XFS_INODES_PER_HOLEMASK_BIT * mp->m_sb.sb_inodesize);
> > +	holes = ~xfs_inobt_irec_to_allocmask(&irec);
> > +	if ((holes & irec.ir_free) != holes ||
> > +	    irec.ir_freecount > irec.ir_count)
> > +		xfs_scrub_btree_set_corrupt(bs->sc, bs->cur, 0);
> > +
> > +	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; holemask >>= 1,
> > +			i++, agino += XFS_INODES_PER_HOLEMASK_BIT) {
> 
> Urk. THat's a bit hard to read.
> 
> > +		if (holemask & 1) {
> > +			holecount += XFS_INODES_PER_HOLEMASK_BIT;
> > +			continue;
> > +		}
> > +
> > +		if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
> > +			break;
> > +	}
> 
> How about
> 
> 	for (i = 0; i < XFS_INOBT_HOLEMASK_BITS; i++) {
> 		if (holemask & 1) {
> 			holecount += XFS_INODES_PER_HOLEMASK_BIT;
> 		} else if (!xfs_scrub_iallocbt_chunk(bs, &irec, agino, len))
> 			break;
> 
> 		holemask >>= 1;
> 		agino += XFS_INODES_PER_HOLEMASK_BIT;
> 	}

Looks better than mine. :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-10-05  5:47 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-03 20:40 [PATCH v11 00/25] xfs: online scrub support Darrick J. Wong
2017-10-03 20:40 ` [PATCH 01/25] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
2017-10-03 20:41 ` [PATCH 02/25] xfs: dispatch metadata scrub subcommands Darrick J. Wong
2017-10-03 20:41 ` [PATCH 03/25] xfs: probe the scrub ioctl Darrick J. Wong
2017-10-03 23:32   ` Dave Chinner
2017-10-04  0:02     ` Darrick J. Wong
2017-10-04  1:56       ` Dave Chinner
2017-10-04  3:14         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 04/25] xfs: create helpers to record and deal with scrub problems Darrick J. Wong
2017-10-03 23:44   ` Dave Chinner
2017-10-04  0:56     ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 05/25] xfs: create helpers to scrub a metadata btree Darrick J. Wong
2017-10-03 23:49   ` Dave Chinner
2017-10-04  0:13     ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 06/25] xfs: scrub the shape of " Darrick J. Wong
2017-10-04  0:15   ` Dave Chinner
2017-10-04  3:51     ` Darrick J. Wong
2017-10-04  5:48       ` Dave Chinner
2017-10-04 17:48         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 07/25] xfs: scrub btree keys and records Darrick J. Wong
2017-10-04 20:52   ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 08/25] xfs: create helpers to scan an allocation group Darrick J. Wong
2017-10-04  0:46   ` Dave Chinner
2017-10-04  3:58     ` Darrick J. Wong
2017-10-04  5:59       ` Dave Chinner
2017-10-04 17:51         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 09/25] xfs: scrub the backup superblocks Darrick J. Wong
2017-10-04  0:57   ` Dave Chinner
2017-10-04  4:06     ` Darrick J. Wong
2017-10-04  6:13       ` Dave Chinner
2017-10-04 17:56         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 10/25] xfs: scrub AGF and AGFL Darrick J. Wong
2017-10-04  1:31   ` Dave Chinner
2017-10-04  4:21     ` Darrick J. Wong
2017-10-04  6:28       ` Dave Chinner
2017-10-04 17:57         ` Darrick J. Wong
2017-10-03 20:41 ` [PATCH 11/25] xfs: scrub the AGI Darrick J. Wong
2017-10-04  1:43   ` Dave Chinner
2017-10-04  4:25     ` Darrick J. Wong
2017-10-04  6:43       ` Dave Chinner
2017-10-04 18:02         ` Darrick J. Wong
2017-10-04 22:16           ` Dave Chinner
2017-10-04 23:12             ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 12/25] xfs: scrub free space btrees Darrick J. Wong
2017-10-05  0:59   ` Dave Chinner
2017-10-05  1:13     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 13/25] xfs: scrub inode btrees Darrick J. Wong
2017-10-05  2:08   ` Dave Chinner
2017-10-05  5:47     ` Darrick J. Wong [this message]
2017-10-05  7:22       ` Dave Chinner
2017-10-05 18:26         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 14/25] xfs: scrub rmap btrees Darrick J. Wong
2017-10-05  2:56   ` Dave Chinner
2017-10-05  5:02     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 15/25] xfs: scrub refcount btrees Darrick J. Wong
2017-10-05  2:59   ` Dave Chinner
2017-10-05  5:02     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 16/25] xfs: scrub inodes Darrick J. Wong
2017-10-05  4:04   ` Dave Chinner
2017-10-05  5:22     ` Darrick J. Wong
2017-10-05  7:13       ` Dave Chinner
2017-10-05 19:56         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 17/25] xfs: scrub inode block mappings Darrick J. Wong
2017-10-06  2:51   ` Dave Chinner
2017-10-06 17:00     ` Darrick J. Wong
2017-10-07 23:10       ` Dave Chinner
2017-10-08  3:54         ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 18/25] xfs: scrub directory/attribute btrees Darrick J. Wong
2017-10-06  5:07   ` Dave Chinner
2017-10-06 18:30     ` Darrick J. Wong
2017-10-03 20:42 ` [PATCH 19/25] xfs: scrub directory metadata Darrick J. Wong
2017-10-06  7:07   ` Dave Chinner
2017-10-06 19:45     ` Darrick J. Wong
2017-10-06 22:16       ` Dave Chinner
2017-10-03 20:42 ` [PATCH 20/25] xfs: scrub directory freespace Darrick J. Wong
2017-10-09  1:44   ` Dave Chinner
2017-10-09 22:54     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 21/25] xfs: scrub extended attributes Darrick J. Wong
2017-10-09  2:13   ` Dave Chinner
2017-10-09 21:14     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 22/25] xfs: scrub symbolic links Darrick J. Wong
2017-10-09  2:17   ` Dave Chinner
2017-10-03 20:43 ` [PATCH 23/25] xfs: scrub parent pointers Darrick J. Wong
2017-10-03 20:43 ` [PATCH 24/25] xfs: scrub realtime bitmap/summary Darrick J. Wong
2017-10-09  2:28   ` Dave Chinner
2017-10-09 20:24     ` Darrick J. Wong
2017-10-03 20:43 ` [PATCH 25/25] xfs: scrub quota information Darrick J. Wong
2017-10-09  2:51   ` Dave Chinner
2017-10-09 20:03     ` Darrick J. Wong
2017-10-09 22:17       ` Dave Chinner
2017-10-09 23:08         ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171005054714.GF7122@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.