All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: "Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Yafang Shao" <laoar.shao@gmail.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages
Date: Tue, 16 Jun 2020 09:56:06 +0200	[thread overview]
Message-ID: <20200616075606.GB9499@dhcp22.suse.cz> (raw)
In-Reply-To: <20200615230605.GV2040@dread.disaster.area>

On Tue 16-06-20 09:06:05, Dave Chinner wrote:
> On Mon, Jun 15, 2020 at 04:53:31PM +0200, Michal Hocko wrote:
> > On Mon 15-06-20 16:25:52, Holger Hoffstätte wrote:
> > > On 2020-06-15 13:56, Yafang Shao wrote:
> > [...]
> > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > > index b356118..1ccfbf2 100644
> > > > --- a/fs/xfs/xfs_aops.c
> > > > +++ b/fs/xfs/xfs_aops.c
> > > > @@ -573,9 +573,21 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> > > >   	struct writeback_control *wbc)
> > > >   {
> > > >   	struct xfs_writepage_ctx wpc = { };
> > > > +	unsigned int nofs_flag;
> > > > +	int ret;
> > > >   	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> > > > -	return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
> > > > +
> > > > +	/*
> > > > +	 * We can allocate memory here while doing writeback on behalf of
> > > > +	 * memory reclaim.  To avoid memory allocation deadlocks set the
> > > > +	 * task-wide nofs context for the following operations.
> > > > +	 */
> > > > +	nofs_flag = memalloc_nofs_save();
> > > > +	ret = iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
> > > > +	memalloc_nofs_restore(nofs_flag);
> > > > +
> > > > +	return ret;
> > > >   }
> > > >   STATIC int
> > > > 
> > > 
> > > Not sure if I did something wrong, but while the previous version of this patch
> > > worked fine, this one gave me (with v2 removed obviously):
> > > 
> > > [  +0.000004] WARNING: CPU: 0 PID: 2811 at fs/iomap/buffered-io.c:1544 iomap_do_writepage+0x6b4/0x780
> > 
> > This corresponds to
> >         /*
> >          * Given that we do not allow direct reclaim to call us, we should
> >          * never be called in a recursive filesystem reclaim context.
> >          */
> >         if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> >                 goto redirty;
> > 
> > which effectivelly says that memalloc_nofs_save/restore cannot be used
> > for that code path.
> 
> No it doesn't. Everyone is ignoring the -context- of this code, most
> especially the previous 3 lines of code and it's comment:
> 
>         /*
>          * Refuse to write the page out if we are called from reclaim context.
>          *
>          * This avoids stack overflows when called from deeply used stacks in
>          * random callers for direct reclaim or memcg reclaim.  We explicitly
>          * allow reclaim from kswapd as the stack usage there is relatively low.
>          *
>          * This should never happen except in the case of a VM regression so
>          * warn about it.
>          */
>         if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
>                         PF_MEMALLOC))
>                 goto redirty;
> 
> You will see that we specifically avoid this path from reclaim
> context except for kswapd. And kswapd always runs with GFP_KERNEL
> context so we allow writeback from it, so it will pass both this
> check and the NOFS check above. 
> 
> IOws, we can't trigger to the WARN_ON_ONCE(current->flags &
> PF_MEMALLOC_NOFS)) check from a memory reclaim context: this
> PF_MEMALLOC_NOFS check here is not doing what people think it is.

You are right.

> History lesson time, eh?
> 
> The recursion protection here -used- to use PF_FSTRANS, not
> PF_MEMALLOC_NOFS. See commit 9070733b4efac ("xfs: abstract
> PF_FSTRANS to PF_MEMALLOC_NOFS"). This hunk is most instructive
> when you look at the comment:
> 
>          * Given that we do not allow direct reclaim to call us, we should
>          * never be called while in a filesystem transaction.
>          */
> -       if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
> +       if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
>                 goto redirty;
> 
> It wasn't for memory allocation recursion protection in XFS - it was
> for transaction reservation recursion protection by something trying
> to flush data pages while holding a transaction reservation. Doing
> this could deadlock the journal because the existing reservation
> could prevent the nested reservation for being able to reserve space
> in the journal and that is a self-deadlock vector.
> 
> IOWs, this check is not protecting against memory reclaim recursion
> bugs at all (that's the previous check I quoted). This check is
> protecting against the filesystem calling writepages directly from a
> context where it can self-deadlock.

Thanks for the clarification.

> So what we are seeing here is that the PF_FSTRANS ->
> PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> about what type of error this check was protecting against.

I have to admit that I am not familiar with the xfs code and the
PF_TRANS abstraction to PF_MEMALLOC_NOFS was mostly automatic and I
relied on xfs maintainers to tell me I was doing something stupid.
Now after your explanation it makes more sense that the warning is
indeed protecting from a different kind of issue.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2020-06-16  7:56 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-15 11:56 [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages Yafang Shao
2020-06-15 14:25 ` Holger Hoffstätte
2020-06-15 14:51   ` Yafang Shao
2020-06-15 14:51     ` Yafang Shao
2020-06-15 14:53   ` Michal Hocko
2020-06-15 15:07     ` Matthew Wilcox
2020-06-15 23:23       ` Dave Chinner
2020-06-15 15:08     ` Yafang Shao
2020-06-15 23:06     ` Dave Chinner
2020-06-16  7:56       ` Michal Hocko [this message]
2020-06-16 10:17       ` Yafang Shao
2020-06-16  8:16 ` Michal Hocko
2020-06-16  9:05   ` Yafang Shao
2020-06-16  9:05     ` Yafang Shao
2020-06-16  9:27     ` Michal Hocko
2020-06-16  9:39       ` Yafang Shao
2020-06-16  9:39         ` Yafang Shao
2020-06-16 10:48         ` Michal Hocko
2020-06-16 11:42           ` Yafang Shao
2020-06-16 11:42             ` Yafang Shao
2020-06-18  0:34           ` Dave Chinner
2020-06-18 11:04             ` Yafang Shao
2020-06-18 11:04               ` Yafang Shao
2020-06-22  1:23 ` [xfs] 59d77e81c5: WARNING:at_fs/iomap/buffered-io.c:#iomap_do_writepage kernel test robot
2020-06-22  1:23   ` kernel test robot
2020-06-22 12:20   ` Yafang Shao
2020-06-22 12:20     ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200616075606.GB9499@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=david@fromorbit.com \
    --cc=holger@applied-asynchrony.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.