linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 07/13] xfs: check if an inode is cached and allocated
Date: Tue, 6 Jun 2017 11:40:06 -0700
Message-ID: <20170606184006.GD5196@birch.djwong.org>
In-Reply-To: <20170606162813.GB55166@bfoster.bfoster>

On Tue, Jun 06, 2017 at 12:28:13PM -0400, Brian Foster wrote:
> On Fri, Jun 02, 2017 at 02:24:43PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Check the inode cache for a particular inode number.  If it's in the
> > cache, check that it's not currently being reclaimed.  If it's not being
> > reclaimed, return zero if the inode is allocated.  This function will be
> > used by various scrubbers to decide if the cache is more up to date
> > than the disk in terms of checking if an inode is allocated.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_icache.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_icache.h |    3 ++
> >  2 files changed, 86 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index f61c84f8..d610a7e 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -633,6 +633,89 @@ xfs_iget(
> >  }
> >  
> >  /*
> > + * "Is this a cached inode that's also allocated?"
> > + *
> > + * Look up an inode by number in the given file system.  If the inode is
> > + * in cache and isn't in purgatory, return 1 if the inode is allocated
> > + * and 0 if it is not.  For all other cases (not in cache, being torn
> > + * down, etc.), return a negative error code.
> > + *
> > + * (The caller has to prevent inode allocation activity.)
> > + */
> 
> Hmm.. so isn't the data returned here potentially invalid once we drop
> the inode reference? In other words, couldn't an inode where we return
> inuse == true be reclaimed immediately after? Perhaps I'm just not far
> enough along to understand how this is used. If that's the case, a note
> about the lifetime/rules of this value might be useful.

The comment could state more explicitly what we're assuming the caller
has done to prevent inode allocation or freeing activity.  The scrubber
that calls this function will have locked the AGI buffer for this AG so
that it can compare the inobt ir_free bits against di_mode to make sure
that there aren't any discrepancies.  Even if the inode is immediately
reclaimed/deleted after we release the inode, the corresponding inobt
update will block on the AGI until the scrubber finishes, so from the
scrubber's point of view things are still consistent.  If the scrubber
finds the inode in some intermediate state of being created or torn
down, it doesn't bother checking the free mask on the assumption that
the thread modifying the inode will ensure the consistency or shut down.

tldr: We assume the caller has the AGI locked so that inodes stay stable
wrt allocation or freeing, or only end up in an intermediate state;
we also assume the caller can handle inodes in an intermediate state.
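
To make that calling convention concrete, here is a minimal userspace
sketch (not kernel code) of how a scrubber-like caller might combine the
lookup result with the inobt free mask.  Everything in it is a stand-in:
mock_icache_lookup(), struct mock_inode and check_one_inode() are
hypothetical names, the cache is a plain array, and the AGI lock appears
only as a comment.  It assumes the return convention of the patch below
(0 with *inuse set, -ENOENT if not cached, -EAGAIN if in flux).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached inode; not the real struct xfs_inode. */
enum mock_state { MOCK_LIVE, MOCK_INTERMEDIATE, MOCK_ABSENT };

struct mock_inode {
	enum mock_state	state;
	uint16_t	mode;	/* 0 means "not allocated" */
};

/*
 * Models the return convention only: 0 with *inuse set for a stable
 * cached inode, -ENOENT if the inode is not cached, -EAGAIN if it is
 * being created or torn down.
 */
static int
mock_icache_lookup(const struct mock_inode *cache, unsigned int idx,
		   bool *inuse)
{
	switch (cache[idx].state) {
	case MOCK_ABSENT:
		return -ENOENT;
	case MOCK_INTERMEDIATE:
		return -EAGAIN;
	default:
		*inuse = cache[idx].mode != 0;
		return 0;
	}
}

/*
 * Caller side.  Assumption: the caller already holds the AGI buffer
 * lock, so free_mask (standing in for ir_free) cannot change under us.
 * Returns true if the cache agrees with the free mask or if the inode
 * could not be checked, false on a discrepancy.
 */
static bool
check_one_inode(const struct mock_inode *cache, uint64_t free_mask,
		unsigned int idx)
{
	bool	marked_free;
	bool	inuse = false;
	int	error;

	assert(idx < 64);
	marked_free = (free_mask >> idx) & 1;

	error = mock_icache_lookup(cache, idx, &inuse);
	if (error == -ENOENT || error == -EAGAIN)
		return true;	/* not cached or in flux: skip the check */
	if (error)
		return false;

	/* An in-use inode must not be marked free, and vice versa. */
	return inuse != marked_free;
}
```

With a four-entry cache holding one allocated inode, one free inode, one
in-flux inode and one uncached slot, only the first two are ever compared
against the mask; the other two are skipped regardless of what the mask
says.  A real caller would presumably fall back to the on-disk inode for
the uncached case rather than skipping it; the sketch leaves that out.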

> FWIW, I'm also kind of wondering if rather than open code the bits of
> the inode lookup, we could accomplish the same thing with a new flag to
> the existing xfs_iget() lookup mechanism that implements the associated
> semantics (i.e., don't read from disk, don't reinit, sort of a read-only
> semantic).

Originally it was just an iget flag, but the flag ended up special
casing a lot of the existing iget functionality.  Basically, we need to
disable the xfs_iget_cache_miss call; avoid the out_error_or_again case;
do our i_mode testing, release the inode, and jump out of the function
prior to the bit that can call xfs_setup_existing_inode; and change the
lock_flags assert to require lock_flags == 0 when we're just checking.

All that turned xfs_iget into such a muddy mess that I decided it was
cleaner to separate this specialized case into its own function and hope
that we're not really going to modify _iget a whole lot.
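
As a rough illustration of that trade-off, here is a toy userspace model
(not the real xfs_iget) contrasting the two shapes.  TOY_IGET_CHECK_ONLY,
struct toy_inode, toy_iget() and toy_inode_is_allocated() are all
hypothetical; the "normal" paths are reduced to one-line placeholders so
that only the special casing is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_IGET_CHECK_ONLY	(1u << 0)	/* hypothetical flag */

struct toy_inode {
	bool		cached;
	bool		in_flux;	/* new, reclaimable or in reclaim */
	unsigned int	mode;		/* 0 means "not allocated" */
	unsigned int	refcount;
};

/* Generic lookup with a check-only flag bolted on. */
static int
toy_iget(struct toy_inode *ip, unsigned int flags, unsigned int lock_flags,
	 bool *inuse)
{
	bool	check_only = flags & TOY_IGET_CHECK_ONLY;

	/* special case 1: a pure check must not ask for inode locks */
	if (check_only)
		assert(lock_flags == 0);

	if (!ip->cached) {
		/* special case 2: never read the inode in from disk */
		if (check_only)
			return -ENOENT;
		ip->cached = true;	/* placeholder for the cache-miss path */
	}

	if (ip->in_flux) {
		/* special case 3: report the state instead of retrying */
		if (check_only)
			return -EAGAIN;
		ip->in_flux = false;	/* placeholder for wait-and-retry */
	}

	if (check_only) {
		/* special case 4: test the mode and leave before setup */
		*inuse = ip->mode != 0;
		return 0;
	}

	ip->refcount++;		/* placeholder for setting up the inode */
	return 0;
}

/* The same check as a dedicated function: no flag, no shared paths. */
static int
toy_inode_is_allocated(const struct toy_inode *ip, bool *inuse)
{
	if (!ip->cached)
		return -ENOENT;
	if (ip->in_flux)
		return -EAGAIN;
	*inuse = ip->mode != 0;
	return 0;
}
```

Both variants give the same answers for the check-only case, but the
flag version needs four branches threaded through the general path,
while the dedicated function can take a const pointer and cannot disturb
the regular lookup at all.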

Anyway, thank you for the review!

--D

> 
> Brian
> 
> > +int
> > +xfs_icache_inode_is_allocated(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans	*tp,
> > +	xfs_ino_t		ino,
> > +	bool			*inuse)
> > +{
> > +	struct xfs_inode	*ip;
> > +	struct xfs_perag	*pag;
> > +	xfs_agino_t		agino;
> > +	int			ret = 0;
> > +
> > +	/* reject inode numbers outside existing AGs */
> > +	if (!ino || XFS_INO_TO_AGNO(mp, ino) >= mp->m_sb.sb_agcount)
> > +		return -EINVAL;
> > +
> > +	/* get the perag structure and ensure that it's inode capable */
> > +	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ino));
> > +	agino = XFS_INO_TO_AGINO(mp, ino);
> > +
> > +	rcu_read_lock();
> > +	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
> > +	if (!ip) {
> > +		ret = -ENOENT;
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * Is the inode being reused?  Is it new?  Is it being
> > +	 * reclaimed?  Is it being torn down?  For any of those cases,
> > +	 * fall back.
> > +	 */
> > +	spin_lock(&ip->i_flags_lock);
> > +	if (ip->i_ino != ino ||
> > +	    (ip->i_flags & (XFS_INEW | XFS_IRECLAIM | XFS_IRECLAIMABLE))) {
> > +		ret = -EAGAIN;
> > +		goto out_istate;
> > +	}
> > +
> > +	/*
> > +	 * If lookup is racing with unlink, jump out immediately.
> > +	 */
> > +	if (VFS_I(ip)->i_mode == 0) {
> > +		*inuse = false;
> > +		ret = 0;
> > +		goto out_istate;
> > +	}
> > +
> > +	/* If the VFS inode is being torn down, forget it. */
> > +	if (!igrab(VFS_I(ip))) {
> > +		ret = -EAGAIN;
> > +		goto out_istate;
> > +	}
> > +
> > +	/* We've got a live one. */
> > +	spin_unlock(&ip->i_flags_lock);
> > +	rcu_read_unlock();
> > +	xfs_perag_put(pag);
> > +
> > +	*inuse = !!(VFS_I(ip)->i_mode);
> > +	ret = 0;
> > +	IRELE(ip);
> > +
> > +	return ret;
> > +
> > +out_istate:
> > +	spin_unlock(&ip->i_flags_lock);
> > +out:
> > +	rcu_read_unlock();
> > +	xfs_perag_put(pag);
> > +	return ret;
> > +}
> > +
> > +/*
> >   * The inode lookup is done in batches to keep the amount of lock traffic and
> >   * radix tree lookups to a minimum. The batch size is a trade off between
> >   * lookup reduction and stack usage. This is in the reclaim path, so we can't
> > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> > index 9183f77..eadf718 100644
> > --- a/fs/xfs/xfs_icache.h
> > +++ b/fs/xfs/xfs_icache.h
> > @@ -126,4 +126,7 @@ xfs_fs_eofblocks_from_user(
> >  	return 0;
> >  }
> >  
> > +int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
> > +				  xfs_ino_t ino, bool *inuse);
> > +
> >  #endif
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
