All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Brian Foster <bfoster@redhat.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH] xfs: reintroduce PF_FSTRANS for transaction reservation recursion protection
Date: Mon, 22 Jun 2020 20:18:09 +0800	[thread overview]
Message-ID: <CALOAHbDaDVXjfeYL8TqkTFzSnnQvYV2aKTiKC25UBdcUBdvB9A@mail.gmail.com> (raw)
In-Reply-To: <20200621230420.GT2005@dread.disaster.area>

On Mon, Jun 22, 2020 at 7:04 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Jun 20, 2020 at 03:12:54AM -0400, Yafang Shao wrote:
> > PF_FSTRANS which is used to avoid transaction reservation recursion, is
> > dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> > PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> > memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> > means to avoid filesystem reclaim recursion. That change is subtle.
> > Let's take the exmple of the check of WARN_ON_ONCE(current->flags &
> > PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to
> > PF_MEMALLOC_NOFS is not proper.
> >
> > Bellow comment is quoted from Dave,
> > > It wasn't for memory allocation recursion protection in XFS - it was for
> > > transaction reservation recursion protection by something trying to flush
> > > data pages while holding a transaction reservation. Doing
> > > this could deadlock the journal because the existing reservation
> > > could prevent the nested reservation for being able to reserve space
> > > in the journal and that is a self-deadlock vector.
> > > IOWs, this check is not protecting against memory reclaim recursion
> > > bugs at all (that's the previous check [1]). This check is
> > > protecting against the filesystem calling writepages directly from a
> > > context where it can self-deadlock.
> > > So what we are seeing here is that the PF_FSTRANS ->
> > > PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> > > about what type of error this check was protecting against.
> >
> > Besides reintroducing PF_FSTRANS, there're some other improvements in this
> > patch,
> > - Remove useless MACRO current_clear_flags_nested(), current_pid() and
> >   current_test_flags().
> > - Remove useless memalloc_nofs_{save, restore} in __kmem_vmalloc()
> >
> > [1]. Bellow check is to avoid memory reclaim recursion.
> > if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> >       PF_MEMALLOC))
> >       goto redirty;
> >
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  fs/iomap/buffered-io.c    |  4 ++--
> >  fs/xfs/kmem.c             |  7 -------
> >  fs/xfs/kmem.h             |  2 +-
> >  fs/xfs/libxfs/xfs_btree.c |  2 +-
> >  fs/xfs/xfs_aops.c         |  4 ++--
> >  fs/xfs/xfs_linux.h        |  4 ----
> >  fs/xfs/xfs_trans.c        | 12 ++++++------
> >  include/linux/sched.h     |  1 +
> >  8 files changed, 13 insertions(+), 23 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index bcfc288..0f1945c 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1500,9 +1500,9 @@ static void iomap_writepage_end_bio(struct bio *bio)
> >
> >       /*
> >        * Given that we do not allow direct reclaim to call us, we should
> > -      * never be called in a recursive filesystem reclaim context.
> > +      * never be called while in a filesystem transaction.
> >        */
> > -     if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > +     if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
> >               goto redirty;
>
> This is OK, but the rest of the patch is not.
>
> I did not say "replace all XFS use of GFP_NOFS/KM_NOFS with
> PF_TRANS", which is what this patch does. The use of
> PF_MEMALLOC_NOFS within transactions is correct and valid and needs
> to remain. Replacing this with PF_FSTRANS effectively reverts all
> the simplifications and obviously self-documneting code that
> PF_MEMALLOC_NOFS provides us with.
>

Sorry about that, I misunderstood it. Will correct it in the next version.

> IOWs, PF_MEMALLOC_NOFS is used to indicate that this is a "no
> reclaim recursion" path and so it's use remains completely unchanged
> in XFS. PF_FSTRANS is to indicate this is a "no
> transaction recursion" path, which is a different thing and needs
> it's own specific annotation.
>

Thanks for the explanation.

> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index f136647..9875a23 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> > @@ -41,18 +41,11 @@
> >  static void *
> >  __kmem_vmalloc(size_t size, xfs_km_flags_t flags)
> >  {
> > -     unsigned nofs_flag = 0;
> >       void    *ptr;
> >       gfp_t   lflags = kmem_flags_convert(flags);
> >
> > -     if (flags & KM_NOFS)
> > -             nofs_flag = memalloc_nofs_save();
> > -
> >       ptr = __vmalloc(size, lflags);
> >
> > -     if (flags & KM_NOFS)
> > -             memalloc_nofs_restore(nofs_flag);
> > -
>
> This breaks both kmem_alloc_large() and kmem_alloc_io() if they are
> called from an explicit KM_NOFS context. vmalloc() does not respect
> the gfp flags that are passed to it and will always do GFP_KERNEL
> allocations deep down in the page table allocation code, and hence
> we must use memalloc_nofs_save() here if called in a KM_NOFS
> context.
>

I thought kmem_flags_convert() has already checked KM_NOFS so we don't
need to call memalloc_nofs_save(), but it seems I was wrong.
Thanks for the clarification.


> >       return ptr;
> >  }
> >
> > diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> > index 34cbcfd..ccc63de 100644
> > --- a/fs/xfs/kmem.h
> > +++ b/fs/xfs/kmem.h
> > @@ -34,7 +34,7 @@
> >       BUG_ON(flags & ~(KM_NOFS | KM_MAYFAIL | KM_ZERO | KM_NOLOCKDEP));
> >
> >       lflags = GFP_KERNEL | __GFP_NOWARN;
> > -     if (flags & KM_NOFS)
> > +     if (current->flags & PF_FSTRANS || flags & KM_NOFS)
> >               lflags &= ~__GFP_FS;
>
> No. If we are in a transaction context, PF_MEMALLOC_NOFS should be
> set. We got rid of all the PF_FSTRANS checks out of this code by
> moving to PF_MEMALLOC_NOFS, reverting this isn't an improvement.
>

Got it. Thanks.

> >
> >       /*
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 2d25bab..65d0afe 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -2814,7 +2814,7 @@ struct xfs_btree_split_args {
> >       struct xfs_btree_split_args     *args = container_of(work,
> >                                               struct xfs_btree_split_args, work);
> >       unsigned long           pflags;
> > -     unsigned long           new_pflags = PF_MEMALLOC_NOFS;
> > +     unsigned long           new_pflags = PF_FSTRANS;
>
>         new_pflags = PF_MEMALLOC_NOFS | PF_FSTRANS;
> >
> >       /*
> >        * we are in a transaction context here, but may also be doing work
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index b356118..02733eb 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -62,7 +62,7 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> >        * We hand off the transaction to the completion thread now, so
> >        * clear the flag here.
> >        */
> > -     current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +     current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
>
>         current_restore_flags_nested(PF_MEMALLOC_NOFS | PF_FSTRANS);
>

Thanks

> >       return 0;
> >  }
> >
> > @@ -125,7 +125,7 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> >        * thus we need to mark ourselves as being in a transaction manually.
> >        * Similarly for freeze protection.
> >        */
> > -     current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +     current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
>
>         current_set_flags_nested(PF_MEMALLOC_NOFS | PF_FSTRANS);
>

Thanks

> >       __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
> >
> >       /* we abort the update if there was an IO error */
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index 9f70d2f..ab737fe 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -102,12 +102,8 @@
> >  #define xfs_cowb_secs                xfs_params.cowb_timer.val
> >
> >  #define current_cpu()                (raw_smp_processor_id())
> > -#define current_pid()                (current->pid)
> > -#define current_test_flags(f)        (current->flags & (f))
> >  #define current_set_flags_nested(sp, f)              \
> >               (*(sp) = current->flags, current->flags |= (f))
> > -#define current_clear_flags_nested(sp, f)    \
> > -             (*(sp) = current->flags, current->flags &= ~(f))
> >  #define current_restore_flags_nested(sp, f)  \
> >               (current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
>
> Separate cleanup patch to remove unrelated definitions, please.
>

Sure, I will.

> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 3c94e5f..1c1b982 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -153,7 +153,7 @@
> >       bool                    rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> >
> >       /* Mark this thread as being in a transaction */
> > -     current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +     current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
> >
>
> And, again, PF_FSTRANS | PF_MEMALLOC_NOFS through this code.
>

Thanks



-- 
Thanks
Yafang

  reply	other threads:[~2020-06-22 12:18 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-20  7:12 [PATCH] xfs: reintroduce PF_FSTRANS for transaction reservation recursion protection Yafang Shao
2020-06-21 23:04 ` Dave Chinner
2020-06-22 12:18   ` Yafang Shao [this message]
2020-06-22 12:18     ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALOAHbDaDVXjfeYL8TqkTFzSnnQvYV2aKTiKC25UBdcUBdvB9A@mail.gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.