All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: akpm@linux-foundation.org, david@fromorbit.com,
	hch@infradead.org, willy@infradead.org, mhocko@kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion
Date: Tue, 3 Nov 2020 16:16:49 -0800	[thread overview]
Message-ID: <20201104001649.GN7123@magnolia> (raw)
In-Reply-To: <20201103131754.94949-3-laoar.shao@gmail.com>

On Tue, Nov 03, 2020 at 09:17:54PM +0800, Yafang Shao wrote:
> PF_FSTRANS which is used to avoid transaction reservation recursion, is
> dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> means to avoid filesystem reclaim recursion. That change is subtle.
> Let's take the exmple of the check of WARN_ON_ONCE(current->flags &
> PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to
> PF_MEMALLOC_NOFS is not proper.
> Below comment is quoted from Dave,
> > It wasn't for memory allocation recursion protection in XFS - it was for
> > transaction reservation recursion protection by something trying to flush
> > data pages while holding a transaction reservation. Doing
> > this could deadlock the journal because the existing reservation
> > could prevent the nested reservation for being able to reserve space
> > in the journal and that is a self-deadlock vector.
> > IOWs, this check is not protecting against memory reclaim recursion
> > bugs at all (that's the previous check [1]). This check is
> > protecting against the filesystem calling writepages directly from a
> > context where it can self-deadlock.
> > So what we are seeing here is that the PF_FSTRANS ->
> > PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> > about what type of error this check was protecting against.
> 
> As a result, we should reintroduce PF_FSTRANS. As current->journal_info
> isn't used in XFS, we can reuse it to indicate whehter the task is in
> fstrans or not, Per Willy. To achieve that, four new helpers are introduce
> in this patch, per Dave:
> - xfs_trans_context_set()
>   Used in xfs_trans_alloc()
> - xfs_trans_context_clear()
>   Used in xfs_trans_commit() and xfs_trans_cancel()
> - xfs_trans_context_update()
>   Used in xfs_trans_roll()
> - xfs_trans_context_active()
>   To check whehter current is in fs transcation or not
> [1]. Below check is to avoid memory reclaim recursion.
> if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
>         PF_MEMALLOC))
>         goto redirty;
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Urrrrk, I found some problems with this patch while testing.  xfs/141
blows up with:

XFS: Assertion failed: current->journal_info == tp, file:
fs/xfs/xfs_trans.h, line: 289

The call trace is very garbled, but I think it is:

+[ 1815.870749]  __xfs_trans_commit+0x4df/0x680 [xfs]
+[ 1815.871342]  xfs_symlink+0x5ec/0xac0 [xfs]
+[ 1815.871834]  ? lock_release+0x20d/0x450
+[ 1815.872280]  ? get_cached_acl+0x32/0x250
+[ 1815.872847]  xfs_vn_symlink+0x8d/0x1b0 [xfs]
+[ 1815.873742]  vfs_symlink+0xc7/0x150
+[ 1815.874356]  do_symlinkat+0x83/0x110
+[ 1815.874788]  do_syscall_64+0x31/0x40
+[ 1815.875204]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
+[ 1815.875781] RIP: 0033:0x7f2317fc6d7b


> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index c94e71f..b272d07 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -153,8 +153,6 @@
>  	int			error = 0;
>  	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
>  
> -	/* Mark this thread as being in a transaction */
> -	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
>  
>  	/*
>  	 * Attempt to reserve the needed disk blocks by decrementing
> @@ -163,10 +161,8 @@
>  	 */
>  	if (blocks > 0) {
>  		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> -		if (error != 0) {
> -			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +		if (error != 0)
>  			return -ENOSPC;
> -		}
>  		tp->t_blk_res += blocks;
>  	}
>  
> @@ -241,8 +237,6 @@
>  		tp->t_blk_res = 0;
>  	}
>  
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> -
>  	return error;
>  }
>  
> @@ -284,6 +278,8 @@
>  	INIT_LIST_HEAD(&tp->t_dfops);
>  	tp->t_firstblock = NULLFSBLOCK;
>  
> +	/* Mark this thread as being in a transaction */
> +	xfs_trans_context_set(tp);
>  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
>  	if (error) {
>  		xfs_trans_cancel(tp);

You're missing a xfs_trans_context_clear() call here.

> @@ -878,7 +874,8 @@
>  
>  	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
>  
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	if (!regrant)
> +		xfs_trans_context_clear(tp);
>  	xfs_trans_free(tp);
>  
>  	/*
> @@ -910,7 +907,8 @@
>  			xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
>  		tp->t_ticket = NULL;
>  	}
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +
> +	xfs_trans_context_clear(tp);
>  	xfs_trans_free_items(tp, !!error);
>  	xfs_trans_free(tp);
>  
> @@ -971,7 +969,7 @@
>  	}
>  
>  	/* mark this thread as no longer being in a transaction */
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_clear(tp);
>  
>  	xfs_trans_free_items(tp, dirty);
>  	xfs_trans_free(tp);
> @@ -1013,6 +1011,7 @@
>  	if (error)
>  		return error;
>  
> +	xfs_trans_context_update(trans, *tpp);

Two bugs here: First, xfs_trans_commit freed @trans, which means that
the assertion commits a UAF error.  Second, if the transaction is
synchronous and the xfs_log_force_lsn at the bottom of
__xfs_trans_commit fails, we'll abort everything without clearing
current->journal_info or restoring the memalloc flags.

Personally I think you should just clear the context from xfs_trans_free
if current->journal_info points to the transaction being freed.  I
/think/ you could fix this with the attached patch; what do you think?

--D

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index b272d0767c87..09ae5c181299 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -67,6 +67,11 @@ xfs_trans_free(
 	xfs_extent_busy_sort(&tp->t_busy);
 	xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
 
+	/* Detach the transaction from this thread. */
+	ASSERT(current->journal_info != NULL);
+	if (current->journal_info == tp)
+		xfs_trans_context_clear(tp);
+
 	trace_xfs_trans_free(tp, _RET_IP_);
 	if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
 		sb_end_intwrite(tp->t_mountp->m_super);
@@ -119,7 +124,11 @@ xfs_trans_dup(
 
 	ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
 	tp->t_rtx_res = tp->t_rtx_res_used;
+
+	/* Associate the new transaction with this thread. */
+	ASSERT(current->journal_info == tp);
 	ntp->t_pflags = tp->t_pflags;
+	current->journal_info = ntp;
 
 	/* move deferred ops over to the new tp */
 	xfs_defer_move(ntp, tp);
@@ -874,8 +883,6 @@ __xfs_trans_commit(
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	if (!regrant)
-		xfs_trans_context_clear(tp);
 	xfs_trans_free(tp);
 
 	/*
@@ -908,7 +915,6 @@ __xfs_trans_commit(
 		tp->t_ticket = NULL;
 	}
 
-	xfs_trans_context_clear(tp);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -968,9 +974,6 @@ xfs_trans_cancel(
 		tp->t_ticket = NULL;
 	}
 
-	/* mark this thread as no longer being in a transaction */
-	xfs_trans_context_clear(tp);
-
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
 }
@@ -1011,7 +1014,6 @@ xfs_trans_roll(
 	if (error)
 		return error;
 
-	xfs_trans_context_update(trans, *tpp);
 	/*
 	 * Reserve space in the log for the next transaction.
 	 * This also pushes items in the "AIL", the list of logged items,
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index c4877afcb8b9..ceb530bf5c4b 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -276,13 +276,6 @@ xfs_trans_context_set(struct xfs_trans *tp)
 	tp->t_pflags = memalloc_nofs_save();
 }
 
-static inline void
-xfs_trans_context_update(struct xfs_trans *old, struct xfs_trans *new)
-{
-	ASSERT(current->journal_info == old);
-	current->journal_info = new;
-}
-
 static inline void
 xfs_trans_context_clear(struct xfs_trans *tp)
 {

  reply	other threads:[~2020-11-04  0:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-03 13:17 [PATCH v8 resend 0/2] avoid xfs transaction reservation recursion Yafang Shao
2020-11-03 13:17 ` [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd Yafang Shao
2020-11-03 19:48   ` Darrick J. Wong
2020-11-04 14:17     ` Yafang Shao
2020-11-04 14:17       ` Yafang Shao
2020-11-04 16:26       ` Darrick J. Wong
2020-11-04 17:50         ` Amy Parker
2020-11-04 17:50           ` Amy Parker
2020-11-05 13:04         ` Yafang Shao
2020-11-05 13:04           ` Yafang Shao
2020-11-03 13:17 ` [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion Yafang Shao
2020-11-04  0:16   ` Darrick J. Wong [this message]
2020-11-04 14:11     ` Yafang Shao
2020-11-04 14:11       ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201104001649.GN7123@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.