From: Yafang Shao <laoar.shao@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Michal Hocko <mhocko@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Linux MM <linux-mm@kvack.org>,
Yafang Shao <shaoyafang@didiglobal.com>
Subject: Re: [PATCH v4 2/2] xfs: avoid transaction reservation recursion
Date: Fri, 7 Aug 2020 12:11:48 +0800 [thread overview]
Message-ID: <CALOAHbCbyx8HSG2vRcK7dtDCk_abUiHwmXBgvNPRkrvnUsOZyw@mail.gmail.com> (raw)
In-Reply-To: <20200804233541.GE2114@dread.disaster.area>
On Wed, Aug 5, 2020 at 7:35 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Aug 01, 2020 at 11:46:32AM -0400, Yafang Shao wrote:
> > From: Yafang Shao <shaoyafang@didiglobal.com>
> >
> > PF_FSTRANS which is used to avoid transaction reservation recursion, is
> > dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> > PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> > memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> > means to avoid filesystem reclaim recursion. That change is subtle.
> > Let's take the exmple of the check of WARN_ON_ONCE(current->flags &
> > PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to
> > PF_MEMALLOC_NOFS is not proper.
> > Below comment is quoted from Dave,
> > > It wasn't for memory allocation recursion protection in XFS - it was for
> > > transaction reservation recursion protection by something trying to flush
> > > data pages while holding a transaction reservation. Doing
> > > this could deadlock the journal because the existing reservation
> > > could prevent the nested reservation for being able to reserve space
> > > in the journal and that is a self-deadlock vector.
> > > IOWs, this check is not protecting against memory reclaim recursion
> > > bugs at all (that's the previous check [1]). This check is
> > > protecting against the filesystem calling writepages directly from a
> > > context where it can self-deadlock.
> > > So what we are seeing here is that the PF_FSTRANS ->
> > > PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> > > about what type of error this check was protecting against.
> >
> > As a result, we should reintroduce PF_FSTRANS. As current->journal_info
> > isn't used in XFS, we can reuse it to indicate whehter the task is in
> > fstrans or not.
>
> IF we are just going to use ->journal_info, do it the simple way.
>
> > index 2d25bab68764..0795511f9e6a 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -2825,6 +2825,7 @@ xfs_btree_split_worker(
> > if (args->kswapd)
> > new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> >
> > + xfs_trans_context_start();
>
> Not really. We are transferring a transaction context here, not
> starting one. Hence this reads somewhat strangely.
>
> This implies that the helper function should be something like:
>
> xfs_trans_context_set(tp);
>
>
> > current_set_flags_nested(&pflags, new_pflags);
> >
> > args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> > @@ -2832,6 +2833,7 @@ xfs_btree_split_worker(
> > complete(args->done);
> >
> > current_restore_flags_nested(&pflags, new_pflags);
> > + xfs_trans_context_end();
>
> And this is more likely xfs_trans_context_clear(tp)
>
> Reasons for this will become clear soon...
>
> > }
> >
> > /*
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index b35611882ff9..39ef95acdd8e 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -63,6 +63,8 @@ xfs_setfilesize_trans_alloc(
> > * clear the flag here.
> > */
> > current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > + xfs_trans_context_end();
>
> Note the repeated pairing of functions in this patch?
>
> > +
> > return 0;
> > }
> >
> > @@ -125,6 +127,7 @@ xfs_setfilesize_ioend(
> > * thus we need to mark ourselves as being in a transaction manually.
> > * Similarly for freeze protection.
> > */
> > + xfs_trans_context_start();
> > current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
> >
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index 9f70d2f68e05..1192b660a968 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -111,6 +111,25 @@ typedef __u32 xfs_nlink_t;
> > #define current_restore_flags_nested(sp, f) \
> > (current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
> >
> > +static inline void xfs_trans_context_start(void)
> > +{
> > + long flags = (long)current->journal_info;
> > +
> > + /*
> > + * Reuse journal_info to indicate whehter the current is in fstrans
> > + * or not.
> > + */
> > + current->journal_info = (void *)(flags + 1);
> > +}
> > +
> > +static inline void xfs_trans_context_end(void)
> > +{
> > + long flags = (long)current->journal_info;
> > +
> > + WARN_ON_ONCE(flags <= 0);
> > + current->journal_info = ((void *)(flags - 1));
> > +}
>
> This is overly complex, and cannot be used for validation that we
> are clearing the transaction context we expect to be clearing. These
> are really "set" and "clear" operations, and for rolling
> transactions we are going to need an "update" operation, too.
>
> As per my comments about the previous patch, _set() would be done
> in xfs_trans_alloc(), _clear() would be done on the final
> xfs_trans_commit() or _cancel() and _update() would be done when the
> transaction rolls.
>
> Then we can roll in the NOFS updates, and we get these three helper
> functions that keep all the per-transaction thread state coherent:
>
> static inline void
> xfs_trans_context_set(struct xfs_trans *tp)
> {
> ASSERT(!current->journal_info);
> current->journal_info = tp;
> tp->t_flags = memalloc_nofs_save();
> }
>
> static inline void
> xfs_trans_context_update(struct xfs_trans *old, struct xfs_trans *new)
> {
> ASSERT(current->journal_info == old);
> current->journal_info = new;
> new->t_flags = old->t_flags;
> }
>
> static inline void
> xfs_trans_context_clear(struct xfs_trans *tp)
> {
> ASSERT(current->journal_info == tp);
> current->journal_info = NULL;
> memalloc_nofs_restore(tp->t_flags);
> }
>
Below helper will be used in fs/iomap/buffered-io.c, so I think we'd
better name it with fstrans_context_active() and put it in
include/linux/iomap.h, while regarding the other three helpers I think
we'd better put them in fs/xfs/xfs_trans.h.
> static bool
> xfs_trans_context_active(void)
> {
> return current->journal_info != NULL;
> }
>
Many thanks for the detailed explanation, I will update with your
suggestion in the next version.
--
Thanks
Yafang
prev parent reply other threads:[~2020-08-07 4:12 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-01 15:46 [PATCH v4 0/2] void xfs transaction reservation recursion Yafang Shao
2020-08-01 15:46 ` [PATCH v4 1/2] xfs: avoid double restore PF_MEMALLOC_NOFS if transaction reservation fails Yafang Shao
2020-08-04 23:20 ` Dave Chinner
2020-08-04 23:50 ` Matthew Wilcox
2020-08-05 1:28 ` Dave Chinner
2020-08-07 4:05 ` Yafang Shao
2020-08-01 15:46 ` [PATCH v4 2/2] xfs: avoid transaction reservation recursion Yafang Shao
2020-08-04 23:35 ` Dave Chinner
2020-08-07 4:11 ` Yafang Shao [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALOAHbCbyx8HSG2vRcK7dtDCk_abUiHwmXBgvNPRkrvnUsOZyw@mail.gmail.com \
--to=laoar.shao@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mhocko@kernel.org \
--cc=shaoyafang@didiglobal.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).