From: Gao Xiang <hsiangkao@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Brian Foster <bfoster@redhat.com>
Subject: Re: [RFC PATCH 2/2] xfs: don't access AGI on unlinked inodes if it can
Date: Thu, 9 Jul 2020 08:55:26 +0800 [thread overview]
Message-ID: <20200709005526.GC15249@xiangao.remote.csb> (raw)
In-Reply-To: <20200708233311.GP2005@dread.disaster.area>
On Thu, Jul 09, 2020 at 09:33:11AM +1000, Dave Chinner wrote:
> On Tue, Jul 07, 2020 at 09:57:41PM +0800, Gao Xiang wrote:
> > Currently, we use AGI buffer lock to protect in-memory linked list for
> > unlinked inodes but since it's not necessary to modify AGI unless the
> > head of the unlinked list is modified. So let's removing the AGI buffer
> > modification dependency if possible, including 1) adding another per-AG
> > dedicated lock to protect the whole list and 2) inserting unlinked
> > inodes from tail.
> >
> > For 2), the tail of bucket 0 is now recorded in perag for xfs_iunlink()
> > to use. xfs_iunlink_remove() still support old multiple short bucket
> > lists for recovery code.
>
> I would split this into two separate patches. One to move to a perag
> based locking strategy, another to change from head to tail
> addition as they are largely independent algorithmic changes.
Yes, that is much better from the perspective of spilting patches and
I thought that before. It seems that is like 2 steps but the proposed
target solution is as a whole (in other words, 2 steps are code-related)
and I'm not sure how large these code is sharable or can be inherited
but rather than introduce some code in patch 2 and then remove immediately
and turn into a new code in patch 3). I'm not sure how large logic could
be sharable between these 2 dependent steps so I didn't do that.
I will spilt patches in the next RFC version to make a try.
>
> > Note that some paths take AGI lock in its transaction in advance,
> > so the proper locking order is only AGI lock -> unlinked list lock.
>
> These paths should be documented in the commit message as well
> as in code comments so the reviewer is aware of those code paths
> and can verify that your assumptions about locking order are
> correct.
Yeah, It has been documented in the following.
+xfs_iunlink(
struct xfs_trans *tp,
- struct xfs_buf *agibp,
struct xfs_inode *ip)
{
struct xfs_mount *mp = tp->t_mountp;
- struct xfs_agi *agi = agibp->b_addr;
- xfs_agino_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+ xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
+ struct xfs_perag *pag;
+ struct xfs_inode *pip;
xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
- xfs_agino_t next_agino;
+ int error;
+
+ ASSERT(VFS_I(ip)->i_nlink == 0);
+ ASSERT(VFS_I(ip)->i_mode != 0);
+ trace_xfs_iunlink(ip);
+
+ pag = xfs_perag_get(mp, agno);
/*
- * Get the index into the agi hash table for the list this inode will
- * go on. Make sure the pointer isn't garbage and that this inode
- * isn't already on the list.
+ * some paths (e.g. xfs_create_tmpfile) could take AGI lock
+ * in this transaction in advance and the only locking order
+ * AGI buf lock -> pag_unlinked_mutex is safe.
>
> >
> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> > ---
> > fs/xfs/xfs_inode.c | 251 ++++++++++++++++++++-------------------
> > fs/xfs/xfs_log_recover.c | 6 +
> > fs/xfs/xfs_mount.c | 3 +
> > fs/xfs/xfs_mount.h | 3 +
> > 4 files changed, 144 insertions(+), 119 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 10565fa5ace4..d33e5b198534 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -1994,182 +1994,195 @@ xfs_iunlink_update_bucket(
> > }
> >
> > /*
> > - * Always insert at the head, so we only have to do a next inode lookup to
> > - * update it's prev pointer. The AGI bucket will point at the one we are
> > - * inserting.
> > + * This is called when the inode's link count has gone to 0 or we are creating
> > + * a tmpfile via O_TMPFILE. The inode @ip must have nlink == 0.
> > + *
> > + * We place the on-disk inode on a list in the AGI. It will be pulled from this
> > + * list when the inode is freed.
> > */
> > -static int
> > -xfs_iunlink_insert_inode(
>
> Hmmm. Flattening the code also make the patch harder to follow as it
> combines code movement/rearrangement with algorithmic changes. We
> try to separate out code movement/rearrangement into their own
> patches so that the movement is easy to verify by itself.
>
> Also, the helper functions help document the separation of the
> unlinked list manipulations from the from setup and locking
> requirements for the list manipulations, and this largely undoes all
> that. I added these helpers because it completely untangled the mess
> that was present before the RFC patchset I posted. THat is, I
> couldn't easily modify the existing code because it interleaved
> the locking, the backref hash manipulations and
> the on-disk list manipulations in ways I found difficult to
> understand and manage. Short, simple, clear functions are much
> better than long, multiple operation functions...
>
> i.e. this:
>
> xfs_iunlink()
> {
>
> get locks
> do list insert
> drop locks
> }
>
> Is better for understanding, maintenance and future modification
> than:
>
> xfs_iunlink()
> {
>
> get perag
> lock perag
> look at tail of list
> if (empty) {
> unlock perag
> read/lock AGI
> lock perag
> look at tail of list
> if (empty)
> do head insert
> goto out
> }
> do tail insert
> out:
> update inode/pag tails
> unlock
> drop perag
> }
>
> It's trivial for a reader to understand what the first version of
> xfs_iunlink() is going to do without needing to understand the
> intraccies of the locking strategies. However, it takes time and
> effort to undestand exactly waht the second one is doing because
> it's not clear where lock ends and list modifications start, nor
> what the locking rules are for the different modifications that are
> being made. Essentially, it goes back to the complex
> locking-intertwined-with-modification-algorithm problem the current
> TOT code has.
>
> I'd much prefer to see something like this:
>
> /*
> * Inode allocation in the O_TMPFILE path defines the AGI/unlinked
> * list lock order as being AGI->perag unlinked list lock. We are
> * inverting it here as the fast path tail addition does not need to
> * modify the AGI at all. Hence we only need the AGI lock if the
> * tail is empty, but if we fail to get it without blocking then we
> * need to fall back to the slower, correct lock order.
> */
> xfs_iunlink_insert_lock()
> {
> get perag;
> lock_perag();
> if (!tail empty)
> return;
> if (trylock AGI)
> return;
(adding some notes here, this patch doesn't use try lock here
finally but unlock perag and take AGI and relock and recheck tail_empty....
since the tail non-empty is rare...)
>
> /*
> * Slow path, need to lock AGI first. Don't even bother
> * rechecking tail pointers or trying to optimise for
> * minimal AGI lock hold time as racing unlink list mods
> * will all block on the perag lock once we take that. They
> * will then hit the !tail empty fast path and not require
> * the AGI lock at all.
> */
> lock AGI
> lock_perag()
> return;
> }
>
> The non-AGI locking fast path is slightly different in the remove
> case, so we'll have a slightly different helper function in that
> case which checks where the inode being removed is in the list.
>
> In both cases, though, the unlock should be the same:
>
> xfs_iunlink_unlock()
> {
> /* Does not unlock AGI, ever. commit does that. */
> unlock perag
> put perag
> }
>
> This keeps the list locking completely separate from the list
> manipulations and allows us to document the locking constraints and
> reasons for why it is or isn't optimised for specific conditions
> without cluttering up the list manipulations code.
Yeah, the _lock and _unlock pair is helpful, I will go for this direction.
Thanks,
Gao Xiang
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2020-07-09 0:55 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-07 13:57 [RFC PATCH 0/2] xfs: more unlinked inode list optimization v1 Gao Xiang
2020-07-07 13:57 ` [RFC PATCH 1/2] xfs: arrange all unlinked inodes into one list Gao Xiang
2020-07-08 22:33 ` Dave Chinner
2020-07-09 0:17 ` Gao Xiang
2020-07-07 13:57 ` [RFC PATCH 2/2] xfs: don't access AGI on unlinked inodes if it can Gao Xiang
2020-07-08 17:03 ` Darrick J. Wong
2020-07-08 23:40 ` Gao Xiang
2020-07-08 23:33 ` Dave Chinner
2020-07-09 0:55 ` Gao Xiang [this message]
2020-07-09 2:32 ` Dave Chinner
2020-07-09 10:36 ` Gao Xiang
2020-07-09 10:47 ` Gao Xiang
2020-07-09 22:36 ` Dave Chinner
2020-07-24 6:12 ` [RFC PATCH v2 0/3] xfs: more unlinked inode list optimization v2 Gao Xiang
2020-07-24 6:12 ` [RFC PATCH v2 1/3] xfs: arrange all unlinked inodes into one list Gao Xiang
2020-07-24 6:12 ` [RFC PATCH v2 2/3] xfs: introduce perag iunlink lock Gao Xiang
2020-07-24 6:12 ` [RFC PATCH v2 3/3] xfs: insert unlinked inodes from tail Gao Xiang
2020-08-18 13:30 ` [RFC PATCH v4 0/3] xfs: more unlinked inode list optimization v4 Gao Xiang
2020-08-18 13:30 ` [RFC PATCH v4 1/3] xfs: get rid of unused pagi_unlinked_hash Gao Xiang
2020-08-19 0:54 ` Darrick J. Wong
2020-08-21 1:09 ` Dave Chinner
2020-08-18 13:30 ` [RFC PATCH v4 2/3] xfs: introduce perag iunlink lock Gao Xiang
2020-08-19 1:06 ` Darrick J. Wong
2020-08-19 1:23 ` Gao Xiang
2020-08-18 13:30 ` [RFC PATCH v4 3/3] xfs: insert unlinked inodes from tail Gao Xiang
2020-08-19 0:53 ` [RFC PATCH v4 0/3] xfs: more unlinked inode list optimization v4 Darrick J. Wong
2020-08-19 1:14 ` Gao Xiang
2020-08-20 2:46 ` Darrick J. Wong
2020-08-20 4:01 ` Gao Xiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200709005526.GC15249@xiangao.remote.csb \
--to=hsiangkao@redhat.com \
--cc=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).