linux-mm.kvack.org archive mirror
* [PATCH v11 0/4] xfs: avoid transaction reservation recursion
@ 2020-12-08 12:28 Yafang Shao
  2020-12-08 12:28 ` [PATCH v11 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Yafang Shao @ 2020-12-08 12:28 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao

PF_FSTRANS, which was used to avoid transaction reservation recursion, has
been dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
memalloc_nofs_{save,restore} API"), and replaced by PF_MEMALLOC_NOFS, which
is meant to avoid filesystem reclaim recursion.

As these two flags have different meanings, we'd better reintroduce
PF_FSTRANS. To avoid wasting space in the PF_* flags of task_struct,
we can reuse current->journal_info for that, per Willy. As the
check for transaction reservation recursion is used by XFS only, we can
move the check into xfs_vm_writepage(s), per Dave.

Patch #1 and #2 switch to the memalloc_nofs_{save,restore} API.
Patch #1 is picked from Willy's patchset "Overhaul memalloc_no*"[1].

Patch #3 refactors the xfs_trans context, which is activated when
xfs_trans is allocated and deactivated when xfs_trans is freed.

Patch #4 implements reusing current->journal_info to avoid transaction
reservation recursion.

No obvious error occurred after running xfstests.

[1]. https://lore.kernel.org/linux-mm/20200625113122.7540-1-willy@infradead.org

v11:
- add the warning at the callsite of xfs_trans_context_active()
- improve the commit log of patch #2

v10:
- refactor the code, per Dave.

v9:
- rebase it on xfs tree.
- Darrick fixed an error that occurred in xfs/141
- run xfstests, and no obvious error occurred.

v8:
- check xfs_trans_context_active() in xfs_vm_writepage(s), per Dave.

v7:
- check fstrans recursion for XFS only, by introducing a new member in
  struct writeback_control.

v6:
- add Michal's ack and comment in patch #1.

v5:
- pick one of Willy's patches
- introduce four new helpers, per Dave

v4:
- retitle from "xfs: introduce task->in_fstrans for transaction reservation
  recursion protection"
- reuse current->journal_info, per Willy

Matthew Wilcox (Oracle) (1):
  mm: Add become_kswapd and restore_kswapd

Yafang Shao (3):
  xfs: use memalloc_nofs_{save,restore} in xfs transaction
  xfs: refactor the usage around xfs_trans_context_{set,clear}
  xfs: use current->journal_info to avoid transaction reservation
    recursion

 fs/iomap/buffered-io.c    |  7 -------
 fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
 fs/xfs/xfs_aops.c         | 21 +++++++++++++++++++--
 fs/xfs/xfs_linux.h        |  4 ----
 fs/xfs/xfs_trans.c        | 24 +++++++++++-------------
 fs/xfs/xfs_trans.h        | 34 ++++++++++++++++++++++++++++++++++
 include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
 mm/vmscan.c               | 16 +---------------
 8 files changed, 96 insertions(+), 47 deletions(-)

-- 
2.18.4




* [PATCH v11 1/4] mm: Add become_kswapd and restore_kswapd
  2020-12-08 12:28 [PATCH v11 0/4] xfs: avoid transaction reservation recursion Yafang Shao
@ 2020-12-08 12:28 ` Yafang Shao
  2020-12-08 12:28 ` [PATCH v11 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction Yafang Shao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2020-12-08 12:28 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Michal Hocko,
	Christoph Hellwig, Yafang Shao

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Since XFS needs to pretend to be kswapd in some of its worker threads,
create methods to save & restore kswapd state.  Don't bother restoring
kswapd state in kswapd -- the only time we reach this code is when we're
exiting and the task_struct is about to be destroyed anyway.
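
The save/restore behaviour described above can be modelled in userspace.
What follows is only a sketch: the bit values, the task_flags variable
standing in for current->flags, and the *_model and kswapd_round_trip
names are assumptions made for illustration, not kernel definitions. It
shows that restoring clears only the bits that were not already set when
the thread became kswapd.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's real PF_* constants differ. */
#define PF_MEMALLOC	0x1UL
#define PF_SWAPWRITE	0x2UL
#define PF_KSWAPD	0x4UL
#define KSWAPD_PF_FLAGS	(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)

/* Stand-in for current->flags of the running task. */
static unsigned long task_flags;

/* Set all kswapd bits and return the subset that was already set. */
static unsigned long become_kswapd_model(void)
{
	unsigned long old = task_flags & KSWAPD_PF_FLAGS;

	task_flags |= KSWAPD_PF_FLAGS;
	return old;
}

/* Clear only the kswapd bits that were not set before the save. */
static void restore_kswapd_model(unsigned long old)
{
	task_flags &= ~(old ^ KSWAPD_PF_FLAGS);
}

/* Run one become/restore cycle from a starting state; return the result. */
unsigned long kswapd_round_trip(unsigned long initial)
{
	unsigned long old;

	task_flags = initial;
	old = become_kswapd_model();
	/* While "being kswapd", all three bits must be set. */
	assert((task_flags & KSWAPD_PF_FLAGS) == KSWAPD_PF_FLAGS);
	restore_kswapd_model(old);
	return task_flags;
}
```

In this model a round trip leaves the flags exactly as they started,
whether or not some of the kswapd bits were set beforehand, and bits
outside KSWAPD_PF_FLAGS are never touched.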

Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
 include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
 mm/vmscan.c               | 16 +---------------
 3 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2d25bab68764..a04a44238aab 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -2813,8 +2813,9 @@ xfs_btree_split_worker(
 {
 	struct xfs_btree_split_args	*args = container_of(work,
 						struct xfs_btree_split_args, work);
+	bool			is_kswapd = args->kswapd;
 	unsigned long		pflags;
-	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
+	int			memalloc_nofs;
 
 	/*
 	 * we are in a transaction context here, but may also be doing work
@@ -2822,16 +2823,17 @@ xfs_btree_split_worker(
 	 * temporarily to ensure that we don't block waiting for memory reclaim
 	 * in any way.
 	 */
-	if (args->kswapd)
-		new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
-
-	current_set_flags_nested(&pflags, new_pflags);
+	if (is_kswapd)
+		pflags = become_kswapd();
+	memalloc_nofs = memalloc_nofs_save();
 
 	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
 					 args->key, args->curp, args->stat);
 	complete(args->done);
 
-	current_restore_flags_nested(&pflags, new_pflags);
+	memalloc_nofs_restore(memalloc_nofs);
+	if (is_kswapd)
+		restore_kswapd(pflags);
 }
 
 /*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index d5ece7a9a403..2faf03e79a1e 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
 }
 #endif
 
+/*
+ * Tell the memory management code that this thread is working on behalf
+ * of background memory reclaim (like kswapd).  That means that it will
+ * get access to memory reserves should it need to allocate memory in
+ * order to make forward progress.  With this great power comes great
+ * responsibility to not exhaust those reserves.
+ */
+#define KSWAPD_PF_FLAGS		(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
+
+static inline unsigned long become_kswapd(void)
+{
+	unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
+
+	current->flags |= KSWAPD_PF_FLAGS;
+
+	return flags;
+}
+
+static inline void restore_kswapd(unsigned long flags)
+{
+	current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
+}
+
 #ifdef CONFIG_MEMCG
 DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
 /**
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1b8f0e059767..77bc1dda75bf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3869,19 +3869,7 @@ static int kswapd(void *p)
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(tsk, cpumask);
 
-	/*
-	 * Tell the memory management that we're a "memory allocator",
-	 * and that if we need more memory we should get access to it
-	 * regardless (see "__alloc_pages()"). "kswapd" should
-	 * never get caught in the normal page freeing logic.
-	 *
-	 * (Kswapd normally doesn't need memory anyway, but sometimes
-	 * you need a small amount of memory in order to be able to
-	 * page out something else, and this flag essentially protects
-	 * us from recursively trying to free more memory as we're
-	 * trying to free the first piece of memory in the first place).
-	 */
-	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
+	become_kswapd();
 	set_freezable();
 
 	WRITE_ONCE(pgdat->kswapd_order, 0);
@@ -3931,8 +3919,6 @@ static int kswapd(void *p)
 			goto kswapd_try_sleep;
 	}
 
-	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
-
 	return 0;
 }
 
-- 
2.18.4




* [PATCH v11 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction
  2020-12-08 12:28 [PATCH v11 0/4] xfs: avoid transaction reservation recursion Yafang Shao
  2020-12-08 12:28 ` [PATCH v11 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
@ 2020-12-08 12:28 ` Yafang Shao
  2020-12-08 19:02   ` Darrick J. Wong
  2020-12-08 12:28 ` [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
  2020-12-08 12:28 ` [PATCH v11 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
  3 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-12-08 12:28 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao,
	Christoph Hellwig

Introduce a new API to mark the start and end of XFS transactions.
For now, just save and restore the memalloc_nofs flags.

The new helpers are as follows:
- xfs_trans_context_set
  Mark the start of XFS transactions
- xfs_trans_context_clear
  Mark the end of XFS transactions
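
As a rough userspace sketch of how the two helpers pair up, assuming the
scoped NOFS API saves the previous state of the flag and restores exactly
that state. The bit value, the task_flags variable, struct trans_model and
every *_model name are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

/* Hypothetical bit value; the kernel's real PF_MEMALLOC_NOFS differs. */
#define PF_MEMALLOC_NOFS 0x1U

/* Stand-in for current->flags. */
static unsigned int task_flags;

/* Minimal stand-in for struct xfs_trans: only the saved flag state. */
struct trans_model {
	unsigned int t_pflags;
};

/* Model of memalloc_nofs_save(): set NOFS, return its previous state. */
static unsigned int nofs_save_model(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Model of memalloc_nofs_restore(): put the NOFS bit back as it was. */
static void nofs_restore_model(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

/* Mark the start of a transaction: remember the old NOFS state in tp. */
void trans_context_set_model(struct trans_model *tp)
{
	tp->t_pflags = nofs_save_model();
}

/* Mark the end of a transaction: restore the state remembered in tp. */
void trans_context_clear_model(struct trans_model *tp)
{
	nofs_restore_model(tp->t_pflags);
}

/* Returns 1 if the task is currently in NOFS scope, else 0. */
int in_nofs_scope(void)
{
	return (task_flags & PF_MEMALLOC_NOFS) != 0;
}
```

Because each transaction stores the previous state rather than blindly
clearing the bit, a set/clear pair nested inside another pair leaves the
outer NOFS scope intact in this model.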

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/xfs_aops.c  |  4 ++--
 fs/xfs/xfs_linux.h |  4 ----
 fs/xfs/xfs_trans.c | 13 +++++++------
 fs/xfs/xfs_trans.h | 12 ++++++++++++
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4304c6416fbb..2371187b7615 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -62,7 +62,7 @@ xfs_setfilesize_trans_alloc(
 	 * We hand off the transaction to the completion thread now, so
 	 * clear the flag here.
 	 */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 	return 0;
 }
 
@@ -125,7 +125,7 @@ xfs_setfilesize_ioend(
 	 * thus we need to mark ourselves as being in a transaction manually.
 	 * Similarly for freeze protection.
 	 */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_set(tp);
 	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
 
 	/* we abort the update if there was an IO error */
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 5b7a1e201559..6ab0f8043c73 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -102,10 +102,6 @@ typedef __u32			xfs_nlink_t;
 #define xfs_cowb_secs		xfs_params.cowb_timer.val
 
 #define current_cpu()		(raw_smp_processor_id())
-#define current_set_flags_nested(sp, f)		\
-		(*(sp) = current->flags, current->flags |= (f))
-#define current_restore_flags_nested(sp, f)	\
-		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
 
 #define NBBY		8		/* number of bits per byte */
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index c94e71f741b6..11d390f0d3f2 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -154,7 +154,7 @@ xfs_trans_reserve(
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 
 	/* Mark this thread as being in a transaction */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_set(tp);
 
 	/*
 	 * Attempt to reserve the needed disk blocks by decrementing
@@ -164,7 +164,7 @@ xfs_trans_reserve(
 	if (blocks > 0) {
 		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
 		if (error != 0) {
-			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+			xfs_trans_context_clear(tp);
 			return -ENOSPC;
 		}
 		tp->t_blk_res += blocks;
@@ -241,7 +241,7 @@ xfs_trans_reserve(
 		tp->t_blk_res = 0;
 	}
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 
 	return error;
 }
@@ -878,7 +878,7 @@ __xfs_trans_commit(
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 	xfs_trans_free(tp);
 
 	/*
@@ -910,7 +910,8 @@ __xfs_trans_commit(
 			xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
 		tp->t_ticket = NULL;
 	}
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+
+	xfs_trans_context_clear(tp);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -971,7 +972,7 @@ xfs_trans_cancel(
 	}
 
 	/* mark this thread as no longer being in a transaction */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 084658946cc8..44b11c64a15e 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -268,4 +268,16 @@ xfs_trans_item_relog(
 	return lip->li_ops->iop_relog(lip, tp);
 }
 
+static inline void
+xfs_trans_context_set(struct xfs_trans *tp)
+{
+	tp->t_pflags = memalloc_nofs_save();
+}
+
+static inline void
+xfs_trans_context_clear(struct xfs_trans *tp)
+{
+	memalloc_nofs_restore(tp->t_pflags);
+}
+
 #endif	/* __XFS_TRANS_H__ */
-- 
2.18.4




* [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-08 12:28 [PATCH v11 0/4] xfs: avoid transaction reservation recursion Yafang Shao
  2020-12-08 12:28 ` [PATCH v11 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
  2020-12-08 12:28 ` [PATCH v11 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction Yafang Shao
@ 2020-12-08 12:28 ` Yafang Shao
  2020-12-08 18:59   ` Darrick J. Wong
  2020-12-08 12:28 ` [PATCH v11 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
  3 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-12-08 12:28 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao,
	Christoph Hellwig

The xfs_trans context should be active once the transaction is allocated,
and inactive once it is freed.

So the call sites of these two helpers are refactored as follows:
- xfs_trans_context_set()
  Used in xfs_trans_alloc()
- xfs_trans_context_clear()
  Used in xfs_trans_free()

This patch is based on Darrick's work to fix the issue in xfs/141 in the
earlier version. [1]

1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/xfs_trans.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 11d390f0d3f2..fe20398a214e 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -67,6 +67,9 @@ xfs_trans_free(
 	xfs_extent_busy_sort(&tp->t_busy);
 	xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
 
+	/* Detach the transaction from this thread. */
+	xfs_trans_context_clear(tp);
+
 	trace_xfs_trans_free(tp, _RET_IP_);
 	if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
 		sb_end_intwrite(tp->t_mountp->m_super);
@@ -153,9 +156,6 @@ xfs_trans_reserve(
 	int			error = 0;
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 
-	/* Mark this thread as being in a transaction */
-	xfs_trans_context_set(tp);
-
 	/*
 	 * Attempt to reserve the needed disk blocks by decrementing
 	 * the number needed from the number available.  This will
@@ -163,10 +163,9 @@ xfs_trans_reserve(
 	 */
 	if (blocks > 0) {
 		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
-		if (error != 0) {
-			xfs_trans_context_clear(tp);
+		if (error != 0)
 			return -ENOSPC;
-		}
+
 		tp->t_blk_res += blocks;
 	}
 
@@ -241,8 +240,6 @@ xfs_trans_reserve(
 		tp->t_blk_res = 0;
 	}
 
-	xfs_trans_context_clear(tp);
-
 	return error;
 }
 
@@ -284,6 +281,8 @@ xfs_trans_alloc(
 	INIT_LIST_HEAD(&tp->t_dfops);
 	tp->t_firstblock = NULLFSBLOCK;
 
+	/* Mark this thread as being in a transaction */
+	xfs_trans_context_set(tp);
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
 	if (error) {
 		xfs_trans_cancel(tp);
@@ -878,7 +877,6 @@ __xfs_trans_commit(
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	xfs_trans_context_clear(tp);
 	xfs_trans_free(tp);
 
 	/*
@@ -911,7 +909,6 @@ __xfs_trans_commit(
 		tp->t_ticket = NULL;
 	}
 
-	xfs_trans_context_clear(tp);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -971,9 +968,6 @@ xfs_trans_cancel(
 		tp->t_ticket = NULL;
 	}
 
-	/* mark this thread as no longer being in a transaction */
-	xfs_trans_context_clear(tp);
-
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
 }
-- 
2.18.4




* [PATCH v11 4/4] xfs: use current->journal_info to avoid transaction reservation recursion
  2020-12-08 12:28 [PATCH v11 0/4] xfs: avoid transaction reservation recursion Yafang Shao
                   ` (2 preceding siblings ...)
  2020-12-08 12:28 ` [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
@ 2020-12-08 12:28 ` Yafang Shao
  2020-12-08 18:59   ` Darrick J. Wong
  3 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-12-08 12:28 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao,
	Christoph Hellwig

PF_FSTRANS, which was used to avoid transaction reservation recursion, has
been dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
memalloc_nofs_{save,restore} API"), and replaced by PF_MEMALLOC_NOFS, which
is meant to avoid filesystem reclaim recursion.

As these two flags have different meanings, we'd better reintroduce
PF_FSTRANS. To avoid wasting space in the PF_* flags of task_struct,
we can reuse current->journal_info for that, per Willy. As the
check for transaction reservation recursion is used by XFS only, we can
move the check into xfs_vm_writepage(s), per Dave.

To better abstract that behavior, two new helpers are introduced, as
follows:
- xfs_trans_context_active
  Check whether current is in an fs transaction or not
- xfs_trans_context_swap
  Transfer the transaction context when rolling a permanent transaction

These two new helpers are introduced in xfs_trans.h.
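
The ownership hand-off can be sketched in userspace as below. This is an
illustrative model only: the journal_info variable stands in for
current->journal_info, struct trans_model for struct xfs_trans, and all
*_model names are hypothetical. The NOFS flag handling is left out so that
only the pointer tracking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for struct xfs_trans; contents are irrelevant here. */
struct trans_model {
	int id;
};

/* Stand-in for current->journal_info. */
static void *journal_info;

/* True while some transaction is attached to this "thread". */
bool trans_context_active_model(void)
{
	return journal_info != NULL;
}

/* Attach tp; attaching while another transaction is attached is a bug. */
void trans_context_set_model(struct trans_model *tp)
{
	assert(journal_info == NULL);
	journal_info = tp;
}

/* Detach tp; only the currently attached transaction may detach. */
void trans_context_clear_model(struct trans_model *tp)
{
	assert(journal_info == tp);
	journal_info = NULL;
}

/* Hand the context from the old transaction to the new one on a roll. */
void trans_context_swap_model(struct trans_model *tp, struct trans_model *ntp)
{
	assert(journal_info == tp);
	journal_info = ntp;
}
```

In this model the context stays active across a roll, and after the swap
only the new transaction can clear it, which mirrors the assertions in the
helpers added by this patch.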

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/iomap/buffered-io.c |  7 -------
 fs/xfs/xfs_aops.c      | 17 +++++++++++++++++
 fs/xfs/xfs_trans.c     |  3 +++
 fs/xfs/xfs_trans.h     | 22 ++++++++++++++++++++++
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 10cc7979ce38..3c53fa6ce64d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 			PF_MEMALLOC))
 		goto redirty;
 
-	/*
-	 * Given that we do not allow direct reclaim to call us, we should
-	 * never be called in a recursive filesystem reclaim context.
-	 */
-	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
-		goto redirty;
-
 	/*
 	 * Is this page beyond the end of the file?
 	 *
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2371187b7615..0da0242d42c3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -568,6 +568,16 @@ xfs_vm_writepage(
 {
 	struct xfs_writepage_ctx wpc = { };
 
+	/*
+	 * Given that we do not allow direct reclaim to call us, we should
+	 * never be called while in a filesystem transaction.
+	 */
+	if (WARN_ON_ONCE(xfs_trans_context_active())) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return 0;
+	}
+
 	return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
 }
 
@@ -579,6 +589,13 @@ xfs_vm_writepages(
 	struct xfs_writepage_ctx wpc = { };
 
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
+	/*
+	 * Given that we do not allow direct reclaim to call us, we should
+	 * never be called while in a filesystem transaction.
+	 */
+	if (WARN_ON_ONCE(xfs_trans_context_active()))
+		return 0;
+
 	return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
 }
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index fe20398a214e..08d4916ffb13 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -124,6 +124,9 @@ xfs_trans_dup(
 	tp->t_rtx_res = tp->t_rtx_res_used;
 	ntp->t_pflags = tp->t_pflags;
 
+	/* Associate the new transaction with this thread. */
+	xfs_trans_context_swap(tp, ntp);
+
 	/* move deferred ops over to the new tp */
 	xfs_defer_move(ntp, tp);
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 44b11c64a15e..d596a375e3bf 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -268,16 +268,38 @@ xfs_trans_item_relog(
 	return lip->li_ops->iop_relog(lip, tp);
 }
 
+static inline bool
+xfs_trans_context_active(void)
+{
+	/* Use journal_info to indicate current is in a transaction */
+	return current->journal_info != NULL;
+}
+
 static inline void
 xfs_trans_context_set(struct xfs_trans *tp)
 {
+	ASSERT(!current->journal_info);
+	current->journal_info = tp;
 	tp->t_pflags = memalloc_nofs_save();
 }
 
 static inline void
 xfs_trans_context_clear(struct xfs_trans *tp)
 {
+	ASSERT(current->journal_info == tp);
+	current->journal_info = NULL;
 	memalloc_nofs_restore(tp->t_pflags);
 }
 
+/*
+ * Transfer the transaction context when rolling a permanent
+ * transaction.
+ */
+static inline void
+xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
+{
+	ASSERT(current->journal_info == tp);
+	current->journal_info = ntp;
+}
+
 #endif	/* __XFS_TRANS_H__ */
-- 
2.18.4




* Re: [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-08 12:28 ` [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
@ 2020-12-08 18:59   ` Darrick J. Wong
       [not found]     ` <CALOAHbB1uKmQ7ns08KW4zH1ikqD0GAY_Y7VySzmTY0=LTEPURA@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2020-12-08 18:59 UTC (permalink / raw)
  To: Yafang Shao
  Cc: willy, david, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm,
	Christoph Hellwig

On Tue, Dec 08, 2020 at 08:28:23PM +0800, Yafang Shao wrote:
> The xfs_trans context should be active once the transaction is allocated,
> and inactive once it is freed.
> 
> So the call sites of these two helpers are refactored as follows:
> - xfs_trans_context_set()
>   Used in xfs_trans_alloc()
> - xfs_trans_context_clear()
>   Used in xfs_trans_free()
> 
> This patch is based on Darrick's work to fix the issue in xfs/141 in the
> earlier version. [1]
> 
> 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia
> 
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  fs/xfs/xfs_trans.c | 20 +++++++-------------
>  1 file changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 11d390f0d3f2..fe20398a214e 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -67,6 +67,9 @@ xfs_trans_free(
>  	xfs_extent_busy_sort(&tp->t_busy);
>  	xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
>  
> +	/* Detach the transaction from this thread. */
> +	xfs_trans_context_clear(tp);

Don't you need to check if tp is still the current transaction before
you clear PF_MEMALLOC_NOFS, now that the NOFS is bound to the lifespan
of the transaction itself instead of the reservation?

--D

> +
>  	trace_xfs_trans_free(tp, _RET_IP_);
>  	if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
>  		sb_end_intwrite(tp->t_mountp->m_super);
> @@ -153,9 +156,6 @@ xfs_trans_reserve(
>  	int			error = 0;
>  	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
>  
> -	/* Mark this thread as being in a transaction */
> -	xfs_trans_context_set(tp);
> -
>  	/*
>  	 * Attempt to reserve the needed disk blocks by decrementing
>  	 * the number needed from the number available.  This will
> @@ -163,10 +163,9 @@ xfs_trans_reserve(
>  	 */
>  	if (blocks > 0) {
>  		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> -		if (error != 0) {
> -			xfs_trans_context_clear(tp);
> +		if (error != 0)
>  			return -ENOSPC;
> -		}
> +
>  		tp->t_blk_res += blocks;
>  	}
>  
> @@ -241,8 +240,6 @@ xfs_trans_reserve(
>  		tp->t_blk_res = 0;
>  	}
>  
> -	xfs_trans_context_clear(tp);
> -
>  	return error;
>  }
>  
> @@ -284,6 +281,8 @@ xfs_trans_alloc(
>  	INIT_LIST_HEAD(&tp->t_dfops);
>  	tp->t_firstblock = NULLFSBLOCK;
>  
> +	/* Mark this thread as being in a transaction */
> +	xfs_trans_context_set(tp);
>  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
>  	if (error) {
>  		xfs_trans_cancel(tp);
> @@ -878,7 +877,6 @@ __xfs_trans_commit(
>  
>  	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
>  
> -	xfs_trans_context_clear(tp);
>  	xfs_trans_free(tp);
>  
>  	/*
> @@ -911,7 +909,6 @@ __xfs_trans_commit(
>  		tp->t_ticket = NULL;
>  	}
>  
> -	xfs_trans_context_clear(tp);
>  	xfs_trans_free_items(tp, !!error);
>  	xfs_trans_free(tp);
>  
> @@ -971,9 +968,6 @@ xfs_trans_cancel(
>  		tp->t_ticket = NULL;
>  	}
>  
> -	/* mark this thread as no longer being in a transaction */
> -	xfs_trans_context_clear(tp);
> -
>  	xfs_trans_free_items(tp, dirty);
>  	xfs_trans_free(tp);
>  }
> -- 
> 2.18.4
> 



* Re: [PATCH v11 4/4] xfs: use current->journal_info to avoid transaction reservation recursion
  2020-12-08 12:28 ` [PATCH v11 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
@ 2020-12-08 18:59   ` Darrick J. Wong
  2020-12-09  1:40     ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2020-12-08 18:59 UTC (permalink / raw)
  To: Yafang Shao
  Cc: willy, david, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm,
	Christoph Hellwig

On Tue, Dec 08, 2020 at 08:28:24PM +0800, Yafang Shao wrote:
> PF_FSTRANS, which was used to avoid transaction reservation recursion, has
> been dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> memalloc_nofs_{save,restore} API"), and replaced by PF_MEMALLOC_NOFS, which
> is meant to avoid filesystem reclaim recursion.
> 
> As these two flags have different meanings, we'd better reintroduce
> PF_FSTRANS. To avoid wasting space in the PF_* flags of task_struct,
> we can reuse current->journal_info for that, per Willy. As the
> check for transaction reservation recursion is used by XFS only, we can
> move the check into xfs_vm_writepage(s), per Dave.
> 
> To better abstract that behavior, two new helpers are introduced, as
> follows:
> - xfs_trans_context_active
>   Check whether current is in an fs transaction or not
> - xfs_trans_context_swap
>   Transfer the transaction context when rolling a permanent transaction
> 
> These two new helpers are introduced in xfs_trans.h.
> 
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Jeff Layton <jlayton@redhat.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  fs/iomap/buffered-io.c |  7 -------
>  fs/xfs/xfs_aops.c      | 17 +++++++++++++++++
>  fs/xfs/xfs_trans.c     |  3 +++
>  fs/xfs/xfs_trans.h     | 22 ++++++++++++++++++++++
>  4 files changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 10cc7979ce38..3c53fa6ce64d 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
>  			PF_MEMALLOC))
>  		goto redirty;
>  
> -	/*
> -	 * Given that we do not allow direct reclaim to call us, we should
> -	 * never be called in a recursive filesystem reclaim context.
> -	 */
> -	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> -		goto redirty;
> -
>  	/*
>  	 * Is this page beyond the end of the file?
>  	 *
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 2371187b7615..0da0242d42c3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -568,6 +568,16 @@ xfs_vm_writepage(
>  {
>  	struct xfs_writepage_ctx wpc = { };
>  
> +	/*
> +	 * Given that we do not allow direct reclaim to call us, we should
> +	 * never be called while in a filesystem transaction.
> +	 */
> +	if (WARN_ON_ONCE(xfs_trans_context_active())) {
> +		redirty_page_for_writepage(wbc, page);
> +		unlock_page(page);
> +		return 0;
> +	}
> +
>  	return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
>  }
>  
> @@ -579,6 +589,13 @@ xfs_vm_writepages(
>  	struct xfs_writepage_ctx wpc = { };
>  
>  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> +	/*
> +	 * Given that we do not allow direct reclaim to call us, we should
> +	 * never be called while in a filesystem transaction.
> +	 */
> +	if (WARN_ON_ONCE(xfs_trans_context_active()))
> +		return 0;
> +
>  	return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
>  }
>  
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index fe20398a214e..08d4916ffb13 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -124,6 +124,9 @@ xfs_trans_dup(
>  	tp->t_rtx_res = tp->t_rtx_res_used;
>  	ntp->t_pflags = tp->t_pflags;

This one line (ntp->t_pflags = tp->t_pflags) should move to
xfs_trans_context_swap.

--D

>  
> +	/* Associate the new transaction with this thread. */
> +	xfs_trans_context_swap(tp, ntp);
> +
>  	/* move deferred ops over to the new tp */
>  	xfs_defer_move(ntp, tp);
>  
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 44b11c64a15e..d596a375e3bf 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -268,16 +268,38 @@ xfs_trans_item_relog(
>  	return lip->li_ops->iop_relog(lip, tp);
>  }
>  
> +static inline bool
> +xfs_trans_context_active(void)
> +{
> +	/* Use journal_info to indicate current is in a transaction */
> +	return current->journal_info != NULL;
> +}
> +
>  static inline void
>  xfs_trans_context_set(struct xfs_trans *tp)
>  {
> +	ASSERT(!current->journal_info);
> +	current->journal_info = tp;
>  	tp->t_pflags = memalloc_nofs_save();
>  }
>  
>  static inline void
>  xfs_trans_context_clear(struct xfs_trans *tp)
>  {
> +	ASSERT(current->journal_info == tp);
> +	current->journal_info = NULL;
>  	memalloc_nofs_restore(tp->t_pflags);
>  }
>  
> +/*
> + * Transfer the transaction context when rolling a permanent
> + * transaction.
> + */
> +static inline void
> +xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> +{
> +	ASSERT(current->journal_info == tp);
> +	current->journal_info = ntp;
> +}
> +
>  #endif	/* __XFS_TRANS_H__ */
> -- 
> 2.18.4
> 



* Re: [PATCH v11 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction
  2020-12-08 12:28 ` [PATCH v11 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction Yafang Shao
@ 2020-12-08 19:02   ` Darrick J. Wong
  0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2020-12-08 19:02 UTC (permalink / raw)
  To: Yafang Shao
  Cc: willy, david, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm,
	Christoph Hellwig

On Tue, Dec 08, 2020 at 08:28:22PM +0800, Yafang Shao wrote:
> Introduce a new API to mark the start and end of XFS transactions.
> For now, just save and restore the memalloc_nofs flags.
> 
> The new helpers are as follows:
> - xfs_trans_context_set
>   Mark the start of XFS transactions
> - xfs_trans_context_clear
>   Mark the end of XFS transactions
> 
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  fs/xfs/xfs_aops.c  |  4 ++--
>  fs/xfs/xfs_linux.h |  4 ----
>  fs/xfs/xfs_trans.c | 13 +++++++------
>  fs/xfs/xfs_trans.h | 12 ++++++++++++
>  4 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 4304c6416fbb..2371187b7615 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -62,7 +62,7 @@ xfs_setfilesize_trans_alloc(
>  	 * We hand off the transaction to the completion thread now, so
>  	 * clear the flag here.
>  	 */
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_clear(tp);
>  	return 0;
>  }
>  
> @@ -125,7 +125,7 @@ xfs_setfilesize_ioend(
>  	 * thus we need to mark ourselves as being in a transaction manually.
>  	 * Similarly for freeze protection.
>  	 */
> -	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_set(tp);
>  	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
>  
>  	/* we abort the update if there was an IO error */
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index 5b7a1e201559..6ab0f8043c73 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -102,10 +102,6 @@ typedef __u32			xfs_nlink_t;
>  #define xfs_cowb_secs		xfs_params.cowb_timer.val
>  
>  #define current_cpu()		(raw_smp_processor_id())
> -#define current_set_flags_nested(sp, f)		\
> -		(*(sp) = current->flags, current->flags |= (f))
> -#define current_restore_flags_nested(sp, f)	\
> -		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
>  
>  #define NBBY		8		/* number of bits per byte */
>  
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index c94e71f741b6..11d390f0d3f2 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -154,7 +154,7 @@ xfs_trans_reserve(
>  	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
>  
>  	/* Mark this thread as being in a transaction */
> -	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_set(tp);
>  
>  	/*
>  	 * Attempt to reserve the needed disk blocks by decrementing
> @@ -164,7 +164,7 @@ xfs_trans_reserve(
>  	if (blocks > 0) {
>  		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
>  		if (error != 0) {
> -			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +			xfs_trans_context_clear(tp);
>  			return -ENOSPC;
>  		}
>  		tp->t_blk_res += blocks;
> @@ -241,7 +241,7 @@ xfs_trans_reserve(
>  		tp->t_blk_res = 0;
>  	}
>  
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_clear(tp);
>  
>  	return error;
>  }
> @@ -878,7 +878,7 @@ __xfs_trans_commit(
>  
>  	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
>  
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_clear(tp);
>  	xfs_trans_free(tp);
>  
>  	/*
> @@ -910,7 +910,8 @@ __xfs_trans_commit(
>  			xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
>  		tp->t_ticket = NULL;
>  	}
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +
> +	xfs_trans_context_clear(tp);
>  	xfs_trans_free_items(tp, !!error);
>  	xfs_trans_free(tp);
>  
> @@ -971,7 +972,7 @@ xfs_trans_cancel(
>  	}
>  
>  	/* mark this thread as no longer being in a transaction */
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_clear(tp);
>  
>  	xfs_trans_free_items(tp, dirty);
>  	xfs_trans_free(tp);
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 084658946cc8..44b11c64a15e 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -268,4 +268,16 @@ xfs_trans_item_relog(
>  	return lip->li_ops->iop_relog(lip, tp);
>  }
>  
> +static inline void
> +xfs_trans_context_set(struct xfs_trans *tp)
> +{
> +	tp->t_pflags = memalloc_nofs_save();
> +}
> +
> +static inline void
> +xfs_trans_context_clear(struct xfs_trans *tp)
> +{
> +	memalloc_nofs_restore(tp->t_pflags);

It's a little strange to add the wrappers and convert the current->flags
modification macros to the memalloc_nofs_* functions in one patch, but
whatever, I'm more concerned about the things I complained about in the
next two patches.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +}
> +
>  #endif	/* __XFS_TRANS_H__ */
> -- 
> 2.18.4
> 



* Re: [PATCH v11 4/4] xfs: use current->journal_info to avoid transaction reservation recursion
  2020-12-08 18:59   ` Darrick J. Wong
@ 2020-12-09  1:40     ` Yafang Shao
  0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2020-12-09  1:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, Michal Hocko,
	Andrew Morton, David Howells, jlayton, linux-fsdevel,
	linux-cachefs, linux-xfs, Linux MM, Christoph Hellwig

On Wed, Dec 9, 2020 at 3:00 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Tue, Dec 08, 2020 at 08:28:24PM +0800, Yafang Shao wrote:
> > PF_FSTRANS which is used to avoid transaction reservation recursion, is
> > dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> > PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> > memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> > means to avoid filesystem reclaim recursion.
> >
> > As these two flags have different meanings, we'd better reintroduce
> > PF_FSTRANS back. To avoid wasting the space of PF_* flags in task_struct,
> > we can reuse the current->journal_info to do that, per Willy. As the
> > check of transaction reservation recursion is used by XFS only, we can
> > move the check into xfs_vm_writepage(s), per Dave.
> >
> > To better abstract that behavior, two new helpers are introduced, as
> > follows:
> > - xfs_trans_context_active
> >   To check whether current is in an fs transaction or not
> > - xfs_trans_context_swap
> >   Transfer the transaction context when rolling a permanent transaction
> >
> > These two new helpers are introduced in xfs_trans.h.
> >
> > Cc: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Jeff Layton <jlayton@redhat.com>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  fs/iomap/buffered-io.c |  7 -------
> >  fs/xfs/xfs_aops.c      | 17 +++++++++++++++++
> >  fs/xfs/xfs_trans.c     |  3 +++
> >  fs/xfs/xfs_trans.h     | 22 ++++++++++++++++++++++
> >  4 files changed, 42 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 10cc7979ce38..3c53fa6ce64d 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
> >                       PF_MEMALLOC))
> >               goto redirty;
> >
> > -     /*
> > -      * Given that we do not allow direct reclaim to call us, we should
> > -      * never be called in a recursive filesystem reclaim context.
> > -      */
> > -     if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > -             goto redirty;
> > -
> >       /*
> >        * Is this page beyond the end of the file?
> >        *
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 2371187b7615..0da0242d42c3 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -568,6 +568,16 @@ xfs_vm_writepage(
> >  {
> >       struct xfs_writepage_ctx wpc = { };
> >
> > +     /*
> > +      * Given that we do not allow direct reclaim to call us, we should
> > +      * never be called while in a filesystem transaction.
> > +      */
> > +     if (WARN_ON_ONCE(xfs_trans_context_active())) {
> > +             redirty_page_for_writepage(wbc, page);
> > +             unlock_page(page);
> > +             return 0;
> > +     }
> > +
> >       return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
> >  }
> >
> > @@ -579,6 +589,13 @@ xfs_vm_writepages(
> >       struct xfs_writepage_ctx wpc = { };
> >
> >       xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> > +     /*
> > +      * Given that we do not allow direct reclaim to call us, we should
> > +      * never be called while in a filesystem transaction.
> > +      */
> > +     if (WARN_ON_ONCE(xfs_trans_context_active()))
> > +             return 0;
> > +
> >       return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
> >  }
> >
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index fe20398a214e..08d4916ffb13 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -124,6 +124,9 @@ xfs_trans_dup(
> >       tp->t_rtx_res = tp->t_rtx_res_used;
> >       ntp->t_pflags = tp->t_pflags;
>
> This one line (ntp->t_pflags = tp->t_pflags) should move to
> xfs_trans_context_swap.
>

Makes sense to me.
I will update it.


> --D
>
> >
> > +     /* Associate the new transaction with this thread. */
> > +     xfs_trans_context_swap(tp, ntp);
> > +
> >       /* move deferred ops over to the new tp */
> >       xfs_defer_move(ntp, tp);
> >
> > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > index 44b11c64a15e..d596a375e3bf 100644
> > --- a/fs/xfs/xfs_trans.h
> > +++ b/fs/xfs/xfs_trans.h
> > @@ -268,16 +268,38 @@ xfs_trans_item_relog(
> >       return lip->li_ops->iop_relog(lip, tp);
> >  }
> >
> > +static inline bool
> > +xfs_trans_context_active(void)
> > +{
> > +     /* Use journal_info to indicate current is in a transaction */
> > +     return current->journal_info != NULL;
> > +}
> > +
> >  static inline void
> >  xfs_trans_context_set(struct xfs_trans *tp)
> >  {
> > +     ASSERT(!current->journal_info);
> > +     current->journal_info = tp;
> >       tp->t_pflags = memalloc_nofs_save();
> >  }
> >
> >  static inline void
> >  xfs_trans_context_clear(struct xfs_trans *tp)
> >  {
> > +     ASSERT(current->journal_info == tp);
> > +     current->journal_info = NULL;
> >       memalloc_nofs_restore(tp->t_pflags);
> >  }
> >
> > +/*
> > + * Transfer the transaction context when rolling a permanent
> > + * transaction.
> > + */
> > +static inline void
> > +xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> > +{
> > +     ASSERT(current->journal_info == tp);
> > +     current->journal_info = ntp;
> > +}
> > +
> >  #endif       /* __XFS_TRANS_H__ */
> > --
> > 2.18.4
> >



--
Thanks
Yafang



* Re: [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
       [not found]     ` <CALOAHbB1uKmQ7ns08KW4zH1ikqD0GAY_Y7VySzmTY0=LTEPURA@mail.gmail.com>
@ 2020-12-09  3:53       ` Darrick J. Wong
  2020-12-09 10:43         ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2020-12-09  3:53 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, Michal Hocko,
	Andrew Morton, David Howells, jlayton, linux-fsdevel,
	linux-cachefs, linux-xfs, Linux MM, Christoph Hellwig

On Wed, Dec 09, 2020 at 09:47:38AM +0800, Yafang Shao wrote:
> On Wed, Dec 9, 2020 at 2:59 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Tue, Dec 08, 2020 at 08:28:23PM +0800, Yafang Shao wrote:
> > > The xfs_trans context should be active after it is allocated, and
> > > > deactivated when it is freed.
> > >
> > > So these two helpers are refactored as,
> > > - xfs_trans_context_set()
> > >   Used in xfs_trans_alloc()
> > > - xfs_trans_context_clear()
> > >   Used in xfs_trans_free()
> > >
> > > This patch is based on Darrick's work to fix the issue in xfs/141 in the
> > > earlier version. [1]
> > >
> > > 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia
> > >
> > > Cc: Darrick J. Wong <darrick.wong@oracle.com>
> > > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  fs/xfs/xfs_trans.c | 20 +++++++-------------
> > >  1 file changed, 7 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > index 11d390f0d3f2..fe20398a214e 100644
> > > --- a/fs/xfs/xfs_trans.c
> > > +++ b/fs/xfs/xfs_trans.c
> > > @@ -67,6 +67,9 @@ xfs_trans_free(
> > >       xfs_extent_busy_sort(&tp->t_busy);
> > >       xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
> > >
> > > +     /* Detach the transaction from this thread. */
> > > +     xfs_trans_context_clear(tp);
> >
> > Don't you need to check if tp is still the current transaction before
> > you clear PF_MEMALLOC_NOFS, now that the NOFS is bound to the lifespan
> > of the transaction itself instead of the reservation?
> >
> 
> In my testing, current->journal_info is always the same as tp here.
> I don't know in which case they would differ.

I don't know why you changed it from the previous version.

> It would be better if you could explain in detail.  Anyway I can add
> the check with your comment in the next version.

xfs_trans_alloc is called to allocate a transaction.  We set _NOFS and
save the old flags (which don't contain _NOFS) to this transaction.

thread logs some changes and calls xfs_trans_roll.

xfs_trans_roll calls xfs_trans_dup to duplicate the old transaction.

xfs_trans_dup allocates a new transaction, which sets PF_MEMALLOC_NOFS
and saves the current context flags (in which _NOFS is set) in the new
transaction.

xfs_trans_roll then commits the old transaction

xfs_trans_commit frees the old transaction

xfs_trans_free restores the old context (which didn't have _NOFS) and
now we've dropped NOFS incorrectly

now we move on with the new transaction, but in the wrong NOFS mode.

note that this becomes a lot more obvious once you start fiddling with
current->journal_info in the last patch.
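
The sequence described above can be reproduced with a small user-space model. This is only a sketch: the `sim_*` names, the `SIM_NOFS` bit value, and the two globals standing in for current->flags and current->journal_info are assumptions made for illustration, and the guarded variant is one possible shape of the check, not necessarily what a later revision uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NOFS 0x1u			/* stand-in for PF_MEMALLOC_NOFS */

struct sim_trans {
	unsigned int pflags;		/* models tp->t_pflags */
};

static unsigned int sim_flags;			/* models current->flags */
static struct sim_trans *sim_journal_info;	/* models current->journal_info */

/* Models memalloc_nofs_save(): return the old NOFS bit, then set it. */
static unsigned int sim_nofs_save(void)
{
	unsigned int old = sim_flags & SIM_NOFS;

	sim_flags |= SIM_NOFS;
	return old;
}

/* Models memalloc_nofs_restore(): put the saved NOFS bit back. */
static void sim_nofs_restore(unsigned int old)
{
	sim_flags = (sim_flags & ~SIM_NOFS) | old;
}

/* Unguarded roll: returns true if NOFS survives freeing the old tp. */
static bool sim_roll_unguarded(void)
{
	struct sim_trans tp, ntp;

	sim_flags = 0;
	tp.pflags = sim_nofs_save();	/* alloc: saves 0, sets NOFS */
	ntp.pflags = sim_nofs_save();	/* dup: saves NOFS, NOFS stays set */
	assert(ntp.pflags == SIM_NOFS);
	sim_nofs_restore(tp.pflags);	/* free old tp: restores 0 */
	return (sim_flags & SIM_NOFS) != 0;
}

/* Restore only if tp still owns the thread's transaction context. */
static void sim_context_clear(struct sim_trans *tp)
{
	if (sim_journal_info != tp)
		return;
	sim_journal_info = NULL;
	sim_nofs_restore(tp->pflags);
}

/* Guarded roll: true if NOFS is held across the roll and dropped after. */
static bool sim_roll_guarded(void)
{
	struct sim_trans tp, ntp;
	bool held;

	sim_flags = 0;
	sim_journal_info = &tp;
	tp.pflags = sim_nofs_save();	/* alloc: saves 0, sets NOFS */
	ntp.pflags = tp.pflags;		/* roll: hand the saved state ... */
	sim_journal_info = &ntp;	/* ... and the context to ntp */
	sim_context_clear(&tp);		/* free old tp: no-op, ntp is current */
	held = (sim_flags & SIM_NOFS) != 0;
	sim_context_clear(&ntp);	/* free new tp: restores 0 */
	return held && sim_flags == 0;
}
```

In this model sim_roll_unguarded() returns false, matching the "dropped NOFS incorrectly" step, while sim_roll_guarded() returns true because the old transaction no longer owns the context when it is freed.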

--D

> 
> >
> > > +
> > >       trace_xfs_trans_free(tp, _RET_IP_);
> > >       if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
> > >               sb_end_intwrite(tp->t_mountp->m_super);
> > > @@ -153,9 +156,6 @@ xfs_trans_reserve(
> > >       int                     error = 0;
> > >       bool                    rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> > >
> > > -     /* Mark this thread as being in a transaction */
> > > -     xfs_trans_context_set(tp);
> > > -
> > >       /*
> > >        * Attempt to reserve the needed disk blocks by decrementing
> > >        * the number needed from the number available.  This will
> > > @@ -163,10 +163,9 @@ xfs_trans_reserve(
> > >        */
> > >       if (blocks > 0) {
> > >               error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> > > -             if (error != 0) {
> > > -                     xfs_trans_context_clear(tp);
> > > +             if (error != 0)
> > >                       return -ENOSPC;
> > > -             }
> > > +
> > >               tp->t_blk_res += blocks;
> > >       }
> > >
> > > @@ -241,8 +240,6 @@ xfs_trans_reserve(
> > >               tp->t_blk_res = 0;
> > >       }
> > >
> > > -     xfs_trans_context_clear(tp);
> > > -
> > >       return error;
> > >  }
> > >
> > > @@ -284,6 +281,8 @@ xfs_trans_alloc(
> > >       INIT_LIST_HEAD(&tp->t_dfops);
> > >       tp->t_firstblock = NULLFSBLOCK;
> > >
> > > +     /* Mark this thread as being in a transaction */
> > > +     xfs_trans_context_set(tp);
> > >       error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > >       if (error) {
> > >               xfs_trans_cancel(tp);
> > > @@ -878,7 +877,6 @@ __xfs_trans_commit(
> > >
> > >       xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
> > >
> > > -     xfs_trans_context_clear(tp);
> > >       xfs_trans_free(tp);
> > >
> > >       /*
> > > @@ -911,7 +909,6 @@ __xfs_trans_commit(
> > >               tp->t_ticket = NULL;
> > >       }
> > >
> > > -     xfs_trans_context_clear(tp);
> > >       xfs_trans_free_items(tp, !!error);
> > >       xfs_trans_free(tp);
> > >
> > > @@ -971,9 +968,6 @@ xfs_trans_cancel(
> > >               tp->t_ticket = NULL;
> > >       }
> > >
> > > -     /* mark this thread as no longer being in a transaction */
> > > -     xfs_trans_context_clear(tp);
> > > -
> > >       xfs_trans_free_items(tp, dirty);
> > >       xfs_trans_free(tp);
> > >  }
> > > --
> > > 2.18.4
> > >
> 
> 
> 
> -- 
> Thanks
> Yafang



* Re: [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-09  3:53       ` Darrick J. Wong
@ 2020-12-09 10:43         ` Yafang Shao
  0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2020-12-09 10:43 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, Michal Hocko,
	Andrew Morton, David Howells, jlayton, linux-fsdevel,
	linux-cachefs, linux-xfs, Linux MM, Christoph Hellwig

On Wed, Dec 9, 2020 at 11:53 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Wed, Dec 09, 2020 at 09:47:38AM +0800, Yafang Shao wrote:
> > On Wed, Dec 9, 2020 at 2:59 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > >
> > > On Tue, Dec 08, 2020 at 08:28:23PM +0800, Yafang Shao wrote:
> > > > The xfs_trans context should be active after it is allocated, and
> > > > > deactivated when it is freed.
> > > >
> > > > So these two helpers are refactored as,
> > > > - xfs_trans_context_set()
> > > >   Used in xfs_trans_alloc()
> > > > - xfs_trans_context_clear()
> > > >   Used in xfs_trans_free()
> > > >
> > > > This patch is based on Darrick's work to fix the issue in xfs/141 in the
> > > > earlier version. [1]
> > > >
> > > > 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia
> > > >
> > > > Cc: Darrick J. Wong <darrick.wong@oracle.com>
> > > > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > > Cc: Dave Chinner <david@fromorbit.com>
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  fs/xfs/xfs_trans.c | 20 +++++++-------------
> > > >  1 file changed, 7 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > > index 11d390f0d3f2..fe20398a214e 100644
> > > > --- a/fs/xfs/xfs_trans.c
> > > > +++ b/fs/xfs/xfs_trans.c
> > > > @@ -67,6 +67,9 @@ xfs_trans_free(
> > > >       xfs_extent_busy_sort(&tp->t_busy);
> > > >       xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
> > > >
> > > > +     /* Detach the transaction from this thread. */
> > > > +     xfs_trans_context_clear(tp);
> > >
> > > Don't you need to check if tp is still the current transaction before
> > > you clear PF_MEMALLOC_NOFS, now that the NOFS is bound to the lifespan
> > > of the transaction itself instead of the reservation?
> > >
> >
> > In my testing, current->journal_info is always the same as tp here.
> > I don't know in which case they would differ.
>
> I don't know why you changed it from the previous version.
>

I should explain it in the change log. Sorry about that.

> > It would be better if you could explain in detail.  Anyway I can add
> > the check with your comment in the next version.
>
> xfs_trans_alloc is called to allocate a transaction.  We set _NOFS and
> save the old flags (which don't contain _NOFS) to this transaction.
>
> thread logs some changes and calls xfs_trans_roll.
>
> xfs_trans_roll calls xfs_trans_dup to duplicate the old transaction.
>
> xfs_trans_dup allocates a new transaction, which sets PF_MEMALLOC_NOFS
> and saves the current context flags (in which _NOFS is set) in the new
> transaction.
>
> xfs_trans_roll then commits the old transaction
>
> xfs_trans_commit frees the old transaction
>
> xfs_trans_free restores the old context (which didn't have _NOFS) and
> now we've dropped NOFS incorrectly
>
> now we move on with the new transaction, but in the wrong NOFS mode.
>
> note that this becomes a lot more obvious once you start fiddling with
> current->journal_info in the last patch.
>

Many thanks for the detailed explanation. I missed the rolling transaction.
I will add this check in the next version.

> --D
>
> >
> > >
> > > > +
> > > >       trace_xfs_trans_free(tp, _RET_IP_);
> > > >       if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
> > > >               sb_end_intwrite(tp->t_mountp->m_super);
> > > > @@ -153,9 +156,6 @@ xfs_trans_reserve(
> > > >       int                     error = 0;
> > > >       bool                    rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> > > >
> > > > -     /* Mark this thread as being in a transaction */
> > > > -     xfs_trans_context_set(tp);
> > > > -
> > > >       /*
> > > >        * Attempt to reserve the needed disk blocks by decrementing
> > > >        * the number needed from the number available.  This will
> > > > @@ -163,10 +163,9 @@ xfs_trans_reserve(
> > > >        */
> > > >       if (blocks > 0) {
> > > >               error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> > > > -             if (error != 0) {
> > > > -                     xfs_trans_context_clear(tp);
> > > > +             if (error != 0)
> > > >                       return -ENOSPC;
> > > > -             }
> > > > +
> > > >               tp->t_blk_res += blocks;
> > > >       }
> > > >
> > > > @@ -241,8 +240,6 @@ xfs_trans_reserve(
> > > >               tp->t_blk_res = 0;
> > > >       }
> > > >
> > > > -     xfs_trans_context_clear(tp);
> > > > -
> > > >       return error;
> > > >  }
> > > >
> > > > @@ -284,6 +281,8 @@ xfs_trans_alloc(
> > > >       INIT_LIST_HEAD(&tp->t_dfops);
> > > >       tp->t_firstblock = NULLFSBLOCK;
> > > >
> > > > +     /* Mark this thread as being in a transaction */
> > > > +     xfs_trans_context_set(tp);
> > > >       error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> > > >       if (error) {
> > > >               xfs_trans_cancel(tp);
> > > > @@ -878,7 +877,6 @@ __xfs_trans_commit(
> > > >
> > > >       xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
> > > >
> > > > -     xfs_trans_context_clear(tp);
> > > >       xfs_trans_free(tp);
> > > >
> > > >       /*
> > > > @@ -911,7 +909,6 @@ __xfs_trans_commit(
> > > >               tp->t_ticket = NULL;
> > > >       }
> > > >
> > > > -     xfs_trans_context_clear(tp);
> > > >       xfs_trans_free_items(tp, !!error);
> > > >       xfs_trans_free(tp);
> > > >
> > > > @@ -971,9 +968,6 @@ xfs_trans_cancel(
> > > >               tp->t_ticket = NULL;
> > > >       }
> > > >
> > > > -     /* mark this thread as no longer being in a transaction */
> > > > -     xfs_trans_context_clear(tp);
> > > > -
> > > >       xfs_trans_free_items(tp, dirty);
> > > >       xfs_trans_free(tp);
> > > >  }
> > > > --
> > > > 2.18.4
> > > >
> >
> >
> >
> > --
> > Thanks
> > Yafang



-- 
Thanks
Yafang



end of thread (~2020-12-09 10:44 UTC)

Thread overview: 11+ messages
2020-12-08 12:28 [PATCH v11 0/4] xfs: avoid transaction reservation recursion Yafang Shao
2020-12-08 12:28 ` [PATCH v11 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
2020-12-08 12:28 ` [PATCH v11 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction Yafang Shao
2020-12-08 19:02   ` Darrick J. Wong
2020-12-08 12:28 ` [PATCH v11 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
2020-12-08 18:59   ` Darrick J. Wong
     [not found]     ` <CALOAHbB1uKmQ7ns08KW4zH1ikqD0GAY_Y7VySzmTY0=LTEPURA@mail.gmail.com>
2020-12-09  3:53       ` Darrick J. Wong
2020-12-09 10:43         ` Yafang Shao
2020-12-08 12:28 ` [PATCH v11 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
2020-12-08 18:59   ` Darrick J. Wong
2020-12-09  1:40     ` Yafang Shao