Re: [RFC PATCH 2/2] xfs: don't access AGI on unlinked inodes if it can

From: Dave Chinner <david@fromorbit.com>
To: Gao Xiang <hsiangkao@redhat.com>
Cc: linux-xfs@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Brian Foster <bfoster@redhat.com>
Subject: Re: [RFC PATCH 2/2] xfs: don't access AGI on unlinked inodes if it can
Date: Thu, 9 Jul 2020 09:33:11 +1000	[thread overview]
Message-ID: <20200708233311.GP2005@dread.disaster.area> (raw)
In-Reply-To: <20200707135741.487-3-hsiangkao@redhat.com>

On Tue, Jul 07, 2020 at 09:57:41PM +0800, Gao Xiang wrote:
> Currently, we use AGI buffer lock to protect in-memory linked list for
> unlinked inodes but since it's not necessary to modify AGI unless the
> head of the unlinked list is modified. So let's removing the AGI buffer
> modification dependency if possible, including 1) adding another per-AG
> dedicated lock to protect the whole list and 2) inserting unlinked
> inodes from tail.
> 
> For 2), the tail of bucket 0 is now recorded in perag for xfs_iunlink()
> to use. xfs_iunlink_remove() still support old multiple short bucket
> lists for recovery code.

I would split this into two separate patches. One to move to a perag
based locking strategy, another to change from head to tail
addition as they are largely independent algorithmic changes.

> Note that some paths take AGI lock in its transaction in advance,
> so the proper locking order is only AGI lock -> unlinked list lock.

These paths should be documented in the commit message as well
as in code comments so the reviewer is aware of those code paths
and can verify that your assumptions about locking order are
correct.

> 
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/xfs_inode.c       | 251 ++++++++++++++++++++-------------------
>  fs/xfs/xfs_log_recover.c |   6 +
>  fs/xfs/xfs_mount.c       |   3 +
>  fs/xfs/xfs_mount.h       |   3 +
>  4 files changed, 144 insertions(+), 119 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 10565fa5ace4..d33e5b198534 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1994,182 +1994,195 @@ xfs_iunlink_update_bucket(
>  }
>  
>  /*
> - * Always insert at the head, so we only have to do a next inode lookup to
> - * update it's prev pointer. The AGI bucket will point at the one we are
> - * inserting.
> + * This is called when the inode's link count has gone to 0 or we are creating
> + * a tmpfile via O_TMPFILE.  The inode @ip must have nlink == 0.
> + *
> + * We place the on-disk inode on a list in the AGI.  It will be pulled from this
> + * list when the inode is freed.
>   */
> -static int
> -xfs_iunlink_insert_inode(

Hmmm. Flattening the code also make the patch harder to follow as it
combines code movement/rearrangement with algorithmic changes.  We
try to separate out code movement/rearrangement into their own
patches so that the movement is easy to verify by itself.

Also, the helper functions help document the separation of the
unlinked list manipulations from the from setup and locking
requirements for the list manipulations, and this largely undoes all
that. I added these helpers because it completely untangled the mess
that was present before the RFC patchset I posted. THat is, I
couldn't easily modify the existing code because it interleaved
the locking, the backref hash manipulations and
the on-disk list manipulations in ways I found difficult to
understand and manage. Short, simple, clear functions are much
better than long, multiple operation functions...

i.e. this:

xfs_iunlink()
{

	get locks
	do list insert
	drop locks
}

Is better for understanding, maintenance and future modification
than:

xfs_iunlink()
{

	get perag
	lock perag
	look at tail of list
	if (empty) {
		unlock perag
		read/lock AGI
		lock perag
		look at tail of list
		if (empty)
			do head insert
			goto out
	}
	do tail insert
out:
	update inode/pag tails
	unlock
	drop perag
}

It's trivial for a reader to understand what the first version of
xfs_iunlink() is going to do without needing to understand the
intraccies of the locking strategies. However, it takes time and
effort to undestand exactly waht the second one is doing because
it's not clear where lock ends and list modifications start, nor
what the locking rules are for the different modifications that are
being made. Essentially, it goes back to the complex
locking-intertwined-with-modification-algorithm problem the current
TOT code has.

I'd much prefer to see something like this:

/*
 * Inode allocation in the O_TMPFILE path defines the AGI/unlinked
 * list lock order as being AGI->perag unlinked list lock. We are
 * inverting it here as the fast path tail addition does not need to
 * modify the AGI at all. Hence we only need the AGI lock if the
 * tail is empty, but if we fail to get it without blocking then we
 * need to fall back to the slower, correct lock order.
 */
xfs_iunlink_insert_lock()
{
	get perag;
	lock_perag();
	if (!tail empty)
		return;
	if (trylock AGI)
		return;

	/*
	 * Slow path, need to lock AGI first. Don't even bother
	 * rechecking tail pointers or trying to optimise for
	 * minimal AGI lock hold time as racing unlink list mods
	 * will all block on the perag lock once we take that. They
	 * will then hit the !tail empty fast path and not require
	 * the AGI lock at all.
	 */
	lock AGI
	lock_perag()
	return;
}

The non-AGI locking fast path is slightly different in the remove
case, so we'll have a slightly different helper function in that
case which checks where the inode being removed is in the list.

In both cases, though, the unlock should be the same:

xfs_iunlink_unlock()
{
	/* Does not unlock AGI, ever. commit does that. */
	unlock perag
	put perag
}

This keeps the list locking completely separate from the list
manipulations and allows us to document the locking constraints and
reasons for why it is or isn't optimised for specific conditions
without cluttering up the list manipulations code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com