Re: [RFC PATCH 1/4] xfs: support deactivating AGs

From: Dave Chinner <david@fromorbit.com>
To: Gao Xiang <hsiangkao@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 1/4] xfs: support deactivating AGs
Date: Thu, 15 Apr 2021 13:42:55 +1000	[thread overview]
Message-ID: <20210415034255.GJ63242@dread.disaster.area> (raw)
In-Reply-To: <20210414195240.1802221-2-hsiangkao@redhat.com>

On Thu, Apr 15, 2021 at 03:52:37AM +0800, Gao Xiang wrote:
> To get rid of paralleled requests related to AGs which are pending
> for shrinking, mark these perags as inactive rather than playing
> with per-ag structures theirselves.
> 
> Since in that way, a per-ag lock can be used to stablize the inactive
> status together with agi/agf buffer lock (which is much easier than
> adding more complicated perag_{get, put} pairs..) Also, Such per-ags
> can be released / reused when unmountfs / growfs.
> 
> On the read side, pag_inactive_rwsem can be unlocked immediately after
> the agf or agi buffer lock is acquired. However, pag_inactive_rwsem
> can only be unlocked after the agf/agi buffer locks are all acquired
> with the inactive status on the write side.
> 
> XXX: maybe there are some missing cases.
> 
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c     | 16 +++++++++++++---
>  fs/xfs/libxfs/xfs_alloc.c  | 12 +++++++++++-
>  fs/xfs/libxfs/xfs_ialloc.c | 26 +++++++++++++++++++++++++-
>  fs/xfs/xfs_mount.c         |  2 ++
>  fs/xfs/xfs_mount.h         |  6 ++++++
>  5 files changed, 57 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index c68a36688474..ba5702e5c9ad 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -676,16 +676,24 @@ xfs_ag_get_geometry(
>  	if (agno >= mp->m_sb.sb_agcount)
>  		return -EINVAL;
>  
> +	pag = xfs_perag_get(mp, agno);
> +	down_read(&pag->pag_inactive_rwsem);

No need to encode the lock type in the lock name. We know it's a
rwsem from the lock API functions...

> +
> +	if (pag->pag_inactive) {
> +		error = -EBUSY;
> +		up_read(&pag->pag_inactive_rwsem);
> +		goto out;
> +	}

This looks kinda heavyweight. Having to take a rwsem whenever we do
a perag lookup to determine if we can access the perag completely
defeats the purpose of xfs_perag_get() being a lightweight, lockless
operation.

I suspect what we really want here is active/passive references like
are used for the superblock, and an API that hides the
implementation from all the callers.

> +
>  	/* Lock the AG headers. */
>  	error = xfs_ialloc_read_agi(mp, NULL, agno, &agi_bp);
> +	up_read(&pag->pag_inactive_rwsem);
>  	if (error)
> -		return error;
> +		goto out;
>  	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agf_bp);
>  	if (error)
>  		goto out_agi;

This seems to imply that we don't need to hold the inactive rwsem
to read the AGF. Either we need the semaphore held in read mode to
protect the inactive state for the entire operation we are
performing, or you haven't documented how the locking is supposed to
guarantee existence once the inactive_rwsem has been dropped.

Instead, lets' look at this as an active vs passive reference issue.
Currently everything takes passive references because we don't
enforce the reference count for anything - it's largely to tell us
that everything that took a reference also released it by the time
we tear down the perag.

So say that buffers take passive references because we need buffers
to be used while we tear down the AG, and that everything else that
does perag_get()...perag_put() in a loop or single logic context is
an active reference, we can actually use active references to
prevent a shrink from stepping into the perag while something else
is actively referencing the perag.

Hence we convert the sites in this patch to use
xfs_perag_get_active() ...  xfs_perag_put_active() pairs, and
everything else gets hidden away inside those functions.

That allows us to implement active references as an atomic counter
that is manipulated under RCU context to provide existence
guarantees for the structure.  i.e. something like this:

/*
 * Get an active reference to the perag for a given agno. If the AG
 * is in the process of being torn down, the active reference count
 * will be zero and this lookup will fail and return NULL.
 */
static inline struct xfs_perag *
xfs_perag_get_active(
	struct xfs_mount	*mp,
	xfs_agnumber_t		agno)
{
	struct xfs_perag        *pag;

	rcu_read_lock();
	pag = radix_tree_lookup(&mp->m_perag_tree, agno);
	if (pag) {
		if (!atomic_inc_not_zero(&pag->pag_active_ref))
			pag = NULL;
	}
	rcu_read_unlock();
	trace_xfs_perag_get(mp, agno, ref, _RET_IP_);
	return pag;
}

static inline void
xfs_perag_put_active(
	struct xfs_perag	*pag)
{
	atomic_dec(&pag->pag_active_ref);
}

For active references to work as an access barrier,
xfs_perag_initialise() needs to first take an active reference for
the mount when it is allocated. This makes the initial value
non-zero, so active references can be taken at any time after
initialisation.

When shrink needs to remove the AG, it first drops the mount's
active reference to the AG. This allows the active reference count
to fall to zero. It then waits for it to hit zero, and once it does
all future active reference lookups will fail to find this perag.
Hence shrink now can start the process of tearing down the AG
structures knowing that nobody is going to be using the AG for
per-ag based work...

This has no more overhead than the a passive lookup, and does not
require a rwsem or a separate inactive state flag to detect or
switch between active and inactive states. It's also trivially
reversable - the mount just has to take an active reference once the
perag has been re-initialised.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com