linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Michal Hocko <mhocko@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>,
	Yafang Shao <shaoyafang@didiglobal.com>
Subject: Re: [PATCH v4 2/2] xfs: avoid transaction reservation recursion
Date: Fri, 7 Aug 2020 12:11:48 +0800	[thread overview]
Message-ID: <CALOAHbCbyx8HSG2vRcK7dtDCk_abUiHwmXBgvNPRkrvnUsOZyw@mail.gmail.com> (raw)
In-Reply-To: <20200804233541.GE2114@dread.disaster.area>

On Wed, Aug 5, 2020 at 7:35 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Aug 01, 2020 at 11:46:32AM -0400, Yafang Shao wrote:
> > From: Yafang Shao <shaoyafang@didiglobal.com>
> >
> > PF_FSTRANS which is used to avoid transaction reservation recursion, is
> > dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> > PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> > memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> > means to avoid filesystem reclaim recursion. That change is subtle.
> > Let's take the exmple of the check of WARN_ON_ONCE(current->flags &
> > PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to
> > PF_MEMALLOC_NOFS is not proper.
> > Below comment is quoted from Dave,
> > > It wasn't for memory allocation recursion protection in XFS - it was for
> > > transaction reservation recursion protection by something trying to flush
> > > data pages while holding a transaction reservation. Doing
> > > this could deadlock the journal because the existing reservation
> > > could prevent the nested reservation for being able to reserve space
> > > in the journal and that is a self-deadlock vector.
> > > IOWs, this check is not protecting against memory reclaim recursion
> > > bugs at all (that's the previous check [1]). This check is
> > > protecting against the filesystem calling writepages directly from a
> > > context where it can self-deadlock.
> > > So what we are seeing here is that the PF_FSTRANS ->
> > > PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> > > about what type of error this check was protecting against.
> >
> > As a result, we should reintroduce PF_FSTRANS. As current->journal_info
> > isn't used in XFS, we can reuse it to indicate whehter the task is in
> > fstrans or not.
>
> IF we are just going to use ->journal_info, do it the simple way.
>
> > index 2d25bab68764..0795511f9e6a 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -2825,6 +2825,7 @@ xfs_btree_split_worker(
> >       if (args->kswapd)
> >               new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> >
> > +     xfs_trans_context_start();
>
> Not really. We are transferring a transaction context here, not
> starting one. Hence this reads somewhat strangely.
>
> This implies that the helper function should be something like:
>
>         xfs_trans_context_set(tp);
>
>
> >       current_set_flags_nested(&pflags, new_pflags);
> >
> >       args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> > @@ -2832,6 +2833,7 @@ xfs_btree_split_worker(
> >       complete(args->done);
> >
> >       current_restore_flags_nested(&pflags, new_pflags);
> > +     xfs_trans_context_end();
>
> And this is more likely xfs_trans_context_clear(tp)
>
> Reasons for this will become clear soon...
>
> >  }
> >
> >  /*
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index b35611882ff9..39ef95acdd8e 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -63,6 +63,8 @@ xfs_setfilesize_trans_alloc(
> >        * clear the flag here.
> >        */
> >       current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +     xfs_trans_context_end();
>
> Note the repeated pairing of functions in this patch?
>
> > +
> >       return 0;
> >  }
> >
> > @@ -125,6 +127,7 @@ xfs_setfilesize_ioend(
> >        * thus we need to mark ourselves as being in a transaction manually.
> >        * Similarly for freeze protection.
> >        */
> > +     xfs_trans_context_start();
> >       current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> >       __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
> >
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index 9f70d2f68e05..1192b660a968 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -111,6 +111,25 @@ typedef __u32                    xfs_nlink_t;
> >  #define current_restore_flags_nested(sp, f)  \
> >               (current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
> >
> > +static inline void xfs_trans_context_start(void)
> > +{
> > +     long flags = (long)current->journal_info;
> > +
> > +     /*
> > +      * Reuse journal_info to indicate whehter the current is in fstrans
> > +      * or not.
> > +      */
> > +     current->journal_info = (void *)(flags + 1);
> > +}
> > +
> > +static inline void xfs_trans_context_end(void)
> > +{
> > +     long flags = (long)current->journal_info;
> > +
> > +     WARN_ON_ONCE(flags <= 0);
> > +     current->journal_info = ((void *)(flags - 1));
> > +}
>
> This is overly complex, and cannot be used for validation that we
> are clearing the transaction context we expect to be clearing. These
> are really "set" and "clear" operations, and for rolling
> transactions we are going to need an "update" operation, too.
>
> As per my comments about the previous patch, _set() would be done
> in xfs_trans_alloc(), _clear() would be done on the final
> xfs_trans_commit() or _cancel() and _update() would be done when the
> transaction rolls.
>
> Then we can roll in the NOFS updates, and we get these three helper
> functions that keep all the per-transaction thread state coherent:
>
> static inline void
> xfs_trans_context_set(struct xfs_trans *tp)
> {
>         ASSERT(!current->journal_info);
>         current->journal_info = tp;
>         tp->t_flags = memalloc_nofs_save();
> }
>
> static inline void
> xfs_trans_context_update(struct xfs_trans *old, struct xfs_trans *new)
> {
>         ASSERT(current->journal_info == old);
>         current->journal_info = new;
>         new->t_flags = old->t_flags;
> }
>
> static inline void
> xfs_trans_context_clear(struct xfs_trans *tp)
> {
>         ASSERT(current->journal_info == tp);
>         current->journal_info = NULL;
>         memalloc_nofs_restore(tp->t_flags);
> }
>

Below helper will be used in fs/iomap/buffered-io.c, so I think we'd
better name it with fstrans_context_active() and put it in
include/linux/iomap.h, while regarding the other three helpers I think
we'd better put them in fs/xfs/xfs_trans.h.
> static bool
> xfs_trans_context_active(void)
> {
>         return current->journal_info != NULL;
> }
>

Many thanks for the detailed explanation, I will update with your
suggestion in the next version.

-- 
Thanks
Yafang

      reply	other threads:[~2020-08-07  4:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-01 15:46 [PATCH v4 0/2] void xfs transaction reservation recursion Yafang Shao
2020-08-01 15:46 ` [PATCH v4 1/2] xfs: avoid double restore PF_MEMALLOC_NOFS if transaction reservation fails Yafang Shao
2020-08-04 23:20   ` Dave Chinner
2020-08-04 23:50     ` Matthew Wilcox
2020-08-05  1:28       ` Dave Chinner
2020-08-07  4:05         ` Yafang Shao
2020-08-01 15:46 ` [PATCH v4 2/2] xfs: avoid transaction reservation recursion Yafang Shao
2020-08-04 23:35   ` Dave Chinner
2020-08-07  4:11     ` Yafang Shao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALOAHbCbyx8HSG2vRcK7dtDCk_abUiHwmXBgvNPRkrvnUsOZyw@mail.gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=shaoyafang@didiglobal.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).