linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: "Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Yafang Shao" <laoar.shao@gmail.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages
Date: Tue, 16 Jun 2020 09:56:06 +0200	[thread overview]
Message-ID: <20200616075606.GB9499@dhcp22.suse.cz> (raw)
In-Reply-To: <20200615230605.GV2040@dread.disaster.area>

On Tue 16-06-20 09:06:05, Dave Chinner wrote:
> On Mon, Jun 15, 2020 at 04:53:31PM +0200, Michal Hocko wrote:
> > On Mon 15-06-20 16:25:52, Holger Hoffstätte wrote:
> > > On 2020-06-15 13:56, Yafang Shao wrote:
> > [...]
> > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > > index b356118..1ccfbf2 100644
> > > > --- a/fs/xfs/xfs_aops.c
> > > > +++ b/fs/xfs/xfs_aops.c
> > > > @@ -573,9 +573,21 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
> > > >   	struct writeback_control *wbc)
> > > >   {
> > > >   	struct xfs_writepage_ctx wpc = { };
> > > > +	unsigned int nofs_flag;
> > > > +	int ret;
> > > >   	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> > > > -	return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
> > > > +
> > > > +	/*
> > > > +	 * We can allocate memory here while doing writeback on behalf of
> > > > +	 * memory reclaim.  To avoid memory allocation deadlocks set the
> > > > +	 * task-wide nofs context for the following operations.
> > > > +	 */
> > > > +	nofs_flag = memalloc_nofs_save();
> > > > +	ret = iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
> > > > +	memalloc_nofs_restore(nofs_flag);
> > > > +
> > > > +	return ret;
> > > >   }
> > > >   STATIC int
> > > > 
> > > 
> > > Not sure if I did something wrong, but while the previous version of this patch
> > > worked fine, this one gave me (with v2 removed obviously):
> > > 
> > > [  +0.000004] WARNING: CPU: 0 PID: 2811 at fs/iomap/buffered-io.c:1544 iomap_do_writepage+0x6b4/0x780
> > 
> > This corresponds to
> >         /*
> >          * Given that we do not allow direct reclaim to call us, we should
> >          * never be called in a recursive filesystem reclaim context.
> >          */
> >         if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> >                 goto redirty;
> > 
> > which effectivelly says that memalloc_nofs_save/restore cannot be used
> > for that code path.
> 
> No it doesn't. Everyone is ignoring the -context- of this code, most
> especially the previous 3 lines of code and it's comment:
> 
>         /*
>          * Refuse to write the page out if we are called from reclaim context.
>          *
>          * This avoids stack overflows when called from deeply used stacks in
>          * random callers for direct reclaim or memcg reclaim.  We explicitly
>          * allow reclaim from kswapd as the stack usage there is relatively low.
>          *
>          * This should never happen except in the case of a VM regression so
>          * warn about it.
>          */
>         if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
>                         PF_MEMALLOC))
>                 goto redirty;
> 
> You will see that we specifically avoid this path from reclaim
> context except for kswapd. And kswapd always runs with GFP_KERNEL
> context so we allow writeback from it, so it will pass both this
> check and the NOFS check above. 
> 
> IOws, we can't trigger to the WARN_ON_ONCE(current->flags &
> PF_MEMALLOC_NOFS)) check from a memory reclaim context: this
> PF_MEMALLOC_NOFS check here is not doing what people think it is.

You are right.

> History lesson time, eh?
> 
> The recursion protection here -used- to use PF_FSTRANS, not
> PF_MEMALLOC_NOFS. See commit 9070733b4efac ("xfs: abstract
> PF_FSTRANS to PF_MEMALLOC_NOFS"). This hunk is most instructive
> when you look at the comment:
> 
>          * Given that we do not allow direct reclaim to call us, we should
>          * never be called while in a filesystem transaction.
>          */
> -       if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
> +       if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
>                 goto redirty;
> 
> It wasn't for memory allocation recursion protection in XFS - it was
> for transaction reservation recursion protection by something trying
> to flush data pages while holding a transaction reservation. Doing
> this could deadlock the journal because the existing reservation
> could prevent the nested reservation for being able to reserve space
> in the journal and that is a self-deadlock vector.
> 
> IOWs, this check is not protecting against memory reclaim recursion
> bugs at all (that's the previous check I quoted). This check is
> protecting against the filesystem calling writepages directly from a
> context where it can self-deadlock.

Thanks for the clarification.

> So what we are seeing here is that the PF_FSTRANS ->
> PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> about what type of error this check was protecting against.

I have to admit that I am not familiar with the xfs code and the
PF_TRANS abstraction to PF_MEMALLOC_NOFS was mostly automatic and I
relied on xfs maintainers to tell me I was doing something stupid.
Now after your explanation it makes more sense that the warning is
indeed protecting from a different kind of issue.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2020-06-16  7:56 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-15 11:56 [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages Yafang Shao
2020-06-15 14:25 ` Holger Hoffstätte
2020-06-15 14:51   ` Yafang Shao
2020-06-15 14:53   ` Michal Hocko
2020-06-15 15:07     ` Matthew Wilcox
2020-06-15 23:23       ` Dave Chinner
2020-06-15 15:08     ` Yafang Shao
2020-06-15 23:06     ` Dave Chinner
2020-06-16  7:56       ` Michal Hocko [this message]
2020-06-16 10:17       ` Yafang Shao
2020-06-16  8:16 ` Michal Hocko
2020-06-16  9:05   ` Yafang Shao
2020-06-16  9:27     ` Michal Hocko
2020-06-16  9:39       ` Yafang Shao
2020-06-16 10:48         ` Michal Hocko
2020-06-16 11:42           ` Yafang Shao
2020-06-18  0:34           ` Dave Chinner
2020-06-18 11:04             ` Yafang Shao
2020-06-22  1:23 ` [xfs] 59d77e81c5: WARNING:at_fs/iomap/buffered-io.c:#iomap_do_writepage kernel test robot
2020-06-22 12:20   ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200616075606.GB9499@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=david@fromorbit.com \
    --cc=holger@applied-asynchrony.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).