linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v13 0/4] xfs: avoid transaction reservation recursion
@ 2020-12-17  1:11 Yafang Shao
  2020-12-17  1:11 ` [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Yafang Shao @ 2020-12-17  1:11 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao

PF_FSTRANS which is used to avoid transaction reservation recursion, is
dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
memalloc_nofs_{save,restore} API"), and replaced by PF_MEMALLOC_NOFS which
means to avoid filesystem reclaim recursion.

As these two flags have different meanings, we'd better reintroduce
PF_FSTRANS back. To avoid wasting the space of PF_* flags in task_struct,
we can reuse the current->journal_info to do that, per Willy. As the 
check of transaction reservation recursion is used by XFS only, we can 
move the check into xfs_vm_writepage(s), per Dave.

Patch #1 and #2 are to use the memalloc_nofs_{save,restore} API 
Patch #1 is picked form Willy's patchset "Overhaul memalloc_no*"[1]

Patch #3 is the refactor of xfs_trans context, which is activated when
xfs_trans is allocated and deactivated when xfs_trans is freed.

Patch #4 is the implementation of reusing current->journal_info to
avoid transaction reservation recursion.

No obvious error occurred after running xfstests.

[1]. https://lore.kernel.org/linux-mm/20200625113122.7540-1-willy@infradead.org

v13:
- move xfs_trans_context_swap() into patch #3 and set NOFS to the old
  transaction

v12:
Per Darrick's suggestion,
- add the check before calling xfs_trans_context_clear() in
  xfs_trans_context_free().
- move t_pflags into xfs_trans_context_swap()

v11:
- add the warning at the callsite of xfs_trans_context_active()
- improve the commit log of patch #2

v10:
- refactor the code, per Dave.

v9:
- rebase it on xfs tree.
- Darrick fixed an error occurred in xfs/141
- run xfstests, and no obvious error occurred.

v8:
- check xfs_trans_context_active() in xfs_vm_writepage(s), per Dave.

v7:
- check fstrans recursion for XFS only, by introducing a new member in
  struct writeback_control.

v6:
- add Michal's ack and comment in patch #1.

v5:
- pick one of Willy's patch
- introduce four new helpers, per Dave

v4:
- retitle from "xfs: introduce task->in_fstrans for transaction reservation
  recursion protection"
- reuse current->journal_info, per Willy

Matthew Wilcox (Oracle) (1):
  mm: Add become_kswapd and restore_kswapd

Yafang Shao (3):
  xfs: use memalloc_nofs_{save,restore} in xfs transaction
  xfs: refactor the usage around xfs_trans_context_{set,clear}
  xfs: use current->journal_info to avoid transaction reservation
    recursion

 fs/iomap/buffered-io.c    |  7 -------
 fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
 fs/xfs/xfs_aops.c         | 21 +++++++++++++++++++--
 fs/xfs/xfs_linux.h        |  4 ----
 fs/xfs/xfs_trans.c        | 26 ++++++++++++--------------
 fs/xfs/xfs_trans.h        | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
 mm/vmscan.c               | 16 +---------------
 8 files changed, 100 insertions(+), 48 deletions(-)

-- 
2.18.4



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd
  2020-12-17  1:11 [PATCH v13 0/4] xfs: avoid transaction reservation recursion Yafang Shao
@ 2020-12-17  1:11 ` Yafang Shao
  2020-12-17  3:06   ` Dave Chinner
  2020-12-17  1:11 ` [PATCH v13 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction Yafang Shao
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Yafang Shao @ 2020-12-17  1:11 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Michal Hocko,
	Christoph Hellwig, Yafang Shao

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Since XFS needs to pretend to be kswapd in some of its worker threads,
create methods to save & restore kswapd state.  Don't bother restoring
kswapd state in kswapd -- the only time we reach this code is when we're
exiting and the task_struct is about to be destroyed anyway.

Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
 include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
 mm/vmscan.c               | 16 +---------------
 3 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 51dbff9b0908..0f35b7a38e76 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -2813,8 +2813,9 @@ xfs_btree_split_worker(
 {
 	struct xfs_btree_split_args	*args = container_of(work,
 						struct xfs_btree_split_args, work);
+	bool			is_kswapd = args->kswapd;
 	unsigned long		pflags;
-	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
+	int			memalloc_nofs;
 
 	/*
 	 * we are in a transaction context here, but may also be doing work
@@ -2822,16 +2823,17 @@ xfs_btree_split_worker(
 	 * temporarily to ensure that we don't block waiting for memory reclaim
 	 * in any way.
 	 */
-	if (args->kswapd)
-		new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
-
-	current_set_flags_nested(&pflags, new_pflags);
+	if (is_kswapd)
+		pflags = become_kswapd();
+	memalloc_nofs = memalloc_nofs_save();
 
 	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
 					 args->key, args->curp, args->stat);
 	complete(args->done);
 
-	current_restore_flags_nested(&pflags, new_pflags);
+	memalloc_nofs_restore(memalloc_nofs);
+	if (is_kswapd)
+		restore_kswapd(pflags);
 }
 
 /*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index d5ece7a9a403..2faf03e79a1e 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
 }
 #endif
 
+/*
+ * Tell the memory management code that this thread is working on behalf
+ * of background memory reclaim (like kswapd).  That means that it will
+ * get access to memory reserves should it need to allocate memory in
+ * order to make forward progress.  With this great power comes great
+ * responsibility to not exhaust those reserves.
+ */
+#define KSWAPD_PF_FLAGS		(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
+
+static inline unsigned long become_kswapd(void)
+{
+	unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
+
+	current->flags |= KSWAPD_PF_FLAGS;
+
+	return flags;
+}
+
+static inline void restore_kswapd(unsigned long flags)
+{
+	current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
+}
+
 #ifdef CONFIG_MEMCG
 DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
 /**
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7b4e31eac2cf..15af99d1f3f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3870,19 +3870,7 @@ static int kswapd(void *p)
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(tsk, cpumask);
 
-	/*
-	 * Tell the memory management that we're a "memory allocator",
-	 * and that if we need more memory we should get access to it
-	 * regardless (see "__alloc_pages()"). "kswapd" should
-	 * never get caught in the normal page freeing logic.
-	 *
-	 * (Kswapd normally doesn't need memory anyway, but sometimes
-	 * you need a small amount of memory in order to be able to
-	 * page out something else, and this flag essentially protects
-	 * us from recursively trying to free more memory as we're
-	 * trying to free the first piece of memory in the first place).
-	 */
-	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
+	become_kswapd();
 	set_freezable();
 
 	WRITE_ONCE(pgdat->kswapd_order, 0);
@@ -3932,8 +3920,6 @@ static int kswapd(void *p)
 			goto kswapd_try_sleep;
 	}
 
-	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
-
 	return 0;
 }
 
-- 
2.18.4



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v13 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction
  2020-12-17  1:11 [PATCH v13 0/4] xfs: avoid transaction reservation recursion Yafang Shao
  2020-12-17  1:11 ` [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
@ 2020-12-17  1:11 ` Yafang Shao
  2020-12-17  1:11 ` [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
  2020-12-17  1:11 ` [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
  3 siblings, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2020-12-17  1:11 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao,
	Christoph Hellwig

Introduce a new API to mark the start and end of XFS transactions.
For now, just save and restore the memalloc_nofs flags.

The new helpers as follows,
- xfs_trans_context_set
  Mark the start of XFS transactions
- xfs_trans_context_clear
  Mark the end of XFS transactions

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/xfs_aops.c  |  4 ++--
 fs/xfs/xfs_linux.h |  4 ----
 fs/xfs/xfs_trans.c | 13 +++++++------
 fs/xfs/xfs_trans.h | 12 ++++++++++++
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4304c6416fbb..2371187b7615 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -62,7 +62,7 @@ xfs_setfilesize_trans_alloc(
 	 * We hand off the transaction to the completion thread now, so
 	 * clear the flag here.
 	 */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 	return 0;
 }
 
@@ -125,7 +125,7 @@ xfs_setfilesize_ioend(
 	 * thus we need to mark ourselves as being in a transaction manually.
 	 * Similarly for freeze protection.
 	 */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_set(tp);
 	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
 
 	/* we abort the update if there was an IO error */
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 5b7a1e201559..6ab0f8043c73 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -102,10 +102,6 @@ typedef __u32			xfs_nlink_t;
 #define xfs_cowb_secs		xfs_params.cowb_timer.val
 
 #define current_cpu()		(raw_smp_processor_id())
-#define current_set_flags_nested(sp, f)		\
-		(*(sp) = current->flags, current->flags |= (f))
-#define current_restore_flags_nested(sp, f)	\
-		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
 
 #define NBBY		8		/* number of bits per byte */
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index c94e71f741b6..11d390f0d3f2 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -154,7 +154,7 @@ xfs_trans_reserve(
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 
 	/* Mark this thread as being in a transaction */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_set(tp);
 
 	/*
 	 * Attempt to reserve the needed disk blocks by decrementing
@@ -164,7 +164,7 @@ xfs_trans_reserve(
 	if (blocks > 0) {
 		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
 		if (error != 0) {
-			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+			xfs_trans_context_clear(tp);
 			return -ENOSPC;
 		}
 		tp->t_blk_res += blocks;
@@ -241,7 +241,7 @@ xfs_trans_reserve(
 		tp->t_blk_res = 0;
 	}
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 
 	return error;
 }
@@ -878,7 +878,7 @@ __xfs_trans_commit(
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 	xfs_trans_free(tp);
 
 	/*
@@ -910,7 +910,8 @@ __xfs_trans_commit(
 			xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
 		tp->t_ticket = NULL;
 	}
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+
+	xfs_trans_context_clear(tp);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -971,7 +972,7 @@ xfs_trans_cancel(
 	}
 
 	/* mark this thread as no longer being in a transaction */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 084658946cc8..44b11c64a15e 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -268,4 +268,16 @@ xfs_trans_item_relog(
 	return lip->li_ops->iop_relog(lip, tp);
 }
 
+static inline void
+xfs_trans_context_set(struct xfs_trans *tp)
+{
+	tp->t_pflags = memalloc_nofs_save();
+}
+
+static inline void
+xfs_trans_context_clear(struct xfs_trans *tp)
+{
+	memalloc_nofs_restore(tp->t_pflags);
+}
+
 #endif	/* __XFS_TRANS_H__ */
-- 
2.18.4



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-17  1:11 [PATCH v13 0/4] xfs: avoid transaction reservation recursion Yafang Shao
  2020-12-17  1:11 ` [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
  2020-12-17  1:11 ` [PATCH v13 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction Yafang Shao
@ 2020-12-17  1:11 ` Yafang Shao
  2020-12-17 22:15   ` Dave Chinner
  2020-12-17  1:11 ` [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
  3 siblings, 1 reply; 15+ messages in thread
From: Yafang Shao @ 2020-12-17  1:11 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao,
	Christoph Hellwig

The xfs_trans context should be active after it is allocated, and
deactive when it is freed.
The rolling transaction should be specially considered, because in the
case when we clear the old transaction the thread's NOFS state shouldn't
be changed, as a result we have to set NOFS in the old transaction's
t_pflags in xfs_trans_context_swap().

So these helpers are refactored as,
- xfs_trans_context_set()
  Used in xfs_trans_alloc()
- xfs_trans_context_clear()
  Used in xfs_trans_free()

And a new helper is instroduced to handle the rolling transaction,
- xfs_trans_context_swap()
  Used in rolling transaction

This patch is based on Darrick's work to fix the issue in xfs/141 in the
earlier version. [1]

1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/xfs_trans.c | 25 +++++++++++--------------
 fs/xfs/xfs_trans.h | 13 +++++++++++++
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 11d390f0d3f2..aa213c3e2408 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -67,6 +67,10 @@ xfs_trans_free(
 	xfs_extent_busy_sort(&tp->t_busy);
 	xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
 
+
+	/* Detach the transaction from this thread. */
+	xfs_trans_context_clear(tp);
+
 	trace_xfs_trans_free(tp, _RET_IP_);
 	if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
 		sb_end_intwrite(tp->t_mountp->m_super);
@@ -119,7 +123,9 @@ xfs_trans_dup(
 
 	ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
 	tp->t_rtx_res = tp->t_rtx_res_used;
-	ntp->t_pflags = tp->t_pflags;
+
+	/* Associate the new transaction with this thread. */
+	xfs_trans_context_swap(tp, ntp);
 
 	/* move deferred ops over to the new tp */
 	xfs_defer_move(ntp, tp);
@@ -153,9 +159,6 @@ xfs_trans_reserve(
 	int			error = 0;
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 
-	/* Mark this thread as being in a transaction */
-	xfs_trans_context_set(tp);
-
 	/*
 	 * Attempt to reserve the needed disk blocks by decrementing
 	 * the number needed from the number available.  This will
@@ -163,10 +166,9 @@ xfs_trans_reserve(
 	 */
 	if (blocks > 0) {
 		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
-		if (error != 0) {
-			xfs_trans_context_clear(tp);
+		if (error != 0)
 			return -ENOSPC;
-		}
+
 		tp->t_blk_res += blocks;
 	}
 
@@ -241,8 +243,6 @@ xfs_trans_reserve(
 		tp->t_blk_res = 0;
 	}
 
-	xfs_trans_context_clear(tp);
-
 	return error;
 }
 
@@ -284,6 +284,8 @@ xfs_trans_alloc(
 	INIT_LIST_HEAD(&tp->t_dfops);
 	tp->t_firstblock = NULLFSBLOCK;
 
+	/* Mark this thread as being in a transaction */
+	xfs_trans_context_set(tp);
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
 	if (error) {
 		xfs_trans_cancel(tp);
@@ -878,7 +880,6 @@ __xfs_trans_commit(
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	xfs_trans_context_clear(tp);
 	xfs_trans_free(tp);
 
 	/*
@@ -911,7 +912,6 @@ __xfs_trans_commit(
 		tp->t_ticket = NULL;
 	}
 
-	xfs_trans_context_clear(tp);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -971,9 +971,6 @@ xfs_trans_cancel(
 		tp->t_ticket = NULL;
 	}
 
-	/* mark this thread as no longer being in a transaction */
-	xfs_trans_context_clear(tp);
-
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
 }
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 44b11c64a15e..12380eaaf7ce 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -280,4 +280,17 @@ xfs_trans_context_clear(struct xfs_trans *tp)
 	memalloc_nofs_restore(tp->t_pflags);
 }
 
+static inline void
+xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
+{
+	ntp->t_pflags = tp->t_pflags;
+	/*
+	 * For the rolling transaction, we have to set NOFS in the old
+	 * transaction's t_pflags so that when we clear the context on
+	 * the old transaction we don't actually change the thread's NOFS
+	 * state.
+	 */
+	tp->t_pflags = current->flags | PF_MEMALLOC_NOFS;
+}
+
 #endif	/* __XFS_TRANS_H__ */
-- 
2.18.4



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion
  2020-12-17  1:11 [PATCH v13 0/4] xfs: avoid transaction reservation recursion Yafang Shao
                   ` (2 preceding siblings ...)
  2020-12-17  1:11 ` [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
@ 2020-12-17  1:11 ` Yafang Shao
  2020-12-18  0:14   ` Dave Chinner
  3 siblings, 1 reply; 15+ messages in thread
From: Yafang Shao @ 2020-12-17  1:11 UTC (permalink / raw)
  To: darrick.wong, willy, david, hch, mhocko, akpm, dhowells, jlayton
  Cc: linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Yafang Shao,
	Christoph Hellwig

PF_FSTRANS which is used to avoid transaction reservation recursion, is
dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
PF_MEMALLOC_NOFS") and replaced by PF_MEMALLOC_NOFS which means to avoid
filesystem reclaim recursion.

As these two flags have different meanings, we'd better reintroduce
PF_FSTRANS back. To avoid wasting the space of PF_* flags in task_struct,
we can reuse the current->journal_info to do that, per Willy. As the
check of transaction reservation recursion is used by XFS only, we can
move the check into xfs_vm_writepage(s), per Dave.

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/iomap/buffered-io.c |  7 -------
 fs/xfs/xfs_aops.c      | 17 +++++++++++++++++
 fs/xfs/xfs_trans.h     | 26 +++++++++++++++++++-------
 3 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 10cc7979ce38..3c53fa6ce64d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 			PF_MEMALLOC))
 		goto redirty;
 
-	/*
-	 * Given that we do not allow direct reclaim to call us, we should
-	 * never be called in a recursive filesystem reclaim context.
-	 */
-	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
-		goto redirty;
-
 	/*
 	 * Is this page beyond the end of the file?
 	 *
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2371187b7615..0da0242d42c3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -568,6 +568,16 @@ xfs_vm_writepage(
 {
 	struct xfs_writepage_ctx wpc = { };
 
+	/*
+	 * Given that we do not allow direct reclaim to call us, we should
+	 * never be called while in a filesystem transaction.
+	 */
+	if (WARN_ON_ONCE(xfs_trans_context_active())) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return 0;
+	}
+
 	return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
 }
 
@@ -579,6 +589,13 @@ xfs_vm_writepages(
 	struct xfs_writepage_ctx wpc = { };
 
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
+	/*
+	 * Given that we do not allow direct reclaim to call us, we should
+	 * never be called while in a filesystem transaction.
+	 */
+	if (WARN_ON_ONCE(xfs_trans_context_active()))
+		return 0;
+
 	return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
 }
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 12380eaaf7ce..0c8140147b9b 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -268,29 +268,41 @@ xfs_trans_item_relog(
 	return lip->li_ops->iop_relog(lip, tp);
 }
 
+static inline bool
+xfs_trans_context_active(void)
+{
+	/* Use journal_info to indicate current is in a transaction */
+	return current->journal_info != NULL;
+}
+
 static inline void
 xfs_trans_context_set(struct xfs_trans *tp)
 {
+	ASSERT(!current->journal_info);
+	current->journal_info = tp;
 	tp->t_pflags = memalloc_nofs_save();
 }
 
 static inline void
 xfs_trans_context_clear(struct xfs_trans *tp)
 {
+	/*
+	 * If xfs_trans_context_swap() handed the NOFS context to a
+	 * new transaction we do not clear the context here.
+	 */
+	if (current->journal_info != tp)
+		return;
+
+	current->journal_info = NULL;
 	memalloc_nofs_restore(tp->t_pflags);
 }
 
 static inline void
 xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
 {
+	ASSERT(current->journal_info == tp);
+	current->journal_info = ntp;
 	ntp->t_pflags = tp->t_pflags;
-	/*
-	 * For the rolling transaction, we have to set NOFS in the old
-	 * transaction's t_pflags so that when we clear the context on
-	 * the old transaction we don't actually change the thread's NOFS
-	 * state.
-	 */
-	tp->t_pflags = current->flags | PF_MEMALLOC_NOFS;
 }
 
 #endif	/* __XFS_TRANS_H__ */
-- 
2.18.4



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd
  2020-12-17  1:11 ` [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
@ 2020-12-17  3:06   ` Dave Chinner
  2020-12-17  4:00     ` Matthew Wilcox
  2020-12-17  4:46     ` Yafang Shao
  0 siblings, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2020-12-17  3:06 UTC (permalink / raw)
  To: Yafang Shao
  Cc: darrick.wong, willy, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Michal Hocko,
	Christoph Hellwig

On Thu, Dec 17, 2020 at 09:11:54AM +0800, Yafang Shao wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Since XFS needs to pretend to be kswapd in some of its worker threads,
> create methods to save & restore kswapd state.  Don't bother restoring
> kswapd state in kswapd -- the only time we reach this code is when we're
> exiting and the task_struct is about to be destroyed anyway.
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
>  include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
>  mm/vmscan.c               | 16 +---------------
>  3 files changed, 32 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 51dbff9b0908..0f35b7a38e76 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -2813,8 +2813,9 @@ xfs_btree_split_worker(
>  {
>  	struct xfs_btree_split_args	*args = container_of(work,
>  						struct xfs_btree_split_args, work);
> +	bool			is_kswapd = args->kswapd;
>  	unsigned long		pflags;
> -	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
> +	int			memalloc_nofs;
>  
>  	/*
>  	 * we are in a transaction context here, but may also be doing work
> @@ -2822,16 +2823,17 @@ xfs_btree_split_worker(
>  	 * temporarily to ensure that we don't block waiting for memory reclaim
>  	 * in any way.
>  	 */
> -	if (args->kswapd)
> -		new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> -
> -	current_set_flags_nested(&pflags, new_pflags);
> +	if (is_kswapd)
> +		pflags = become_kswapd();
> +	memalloc_nofs = memalloc_nofs_save();
>  
>  	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
>  					 args->key, args->curp, args->stat);
>  	complete(args->done);
>  
> -	current_restore_flags_nested(&pflags, new_pflags);
> +	memalloc_nofs_restore(memalloc_nofs);
> +	if (is_kswapd)
> +		restore_kswapd(pflags);
>  }
>  
>  /*
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index d5ece7a9a403..2faf03e79a1e 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
>  }
>  #endif
>  
> +/*
> + * Tell the memory management code that this thread is working on behalf
> + * of background memory reclaim (like kswapd).  That means that it will
> + * get access to memory reserves should it need to allocate memory in
> + * order to make forward progress.  With this great power comes great
> + * responsibility to not exhaust those reserves.
> + */
> +#define KSWAPD_PF_FLAGS		(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
> +
> +static inline unsigned long become_kswapd(void)
> +{
> +	unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
> +
> +	current->flags |= KSWAPD_PF_FLAGS;
> +
> +	return flags;
> +}

You can get rid of the empty lines out of this function.

> +static inline void restore_kswapd(unsigned long flags)
> +{
> +	current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
> +}

Urk, that requires thinking about to determine whether it is
correct. And it is 3 runtime logic operations (^, ~ and &) too. The
way all the memalloc_*_restore() functions restore the previous
flags is obviously correct and only requires 2 runtime logic
operations because the compiler calculates the ~ operation on the
constant. So why do it differently here? i.e.:

	current->flags = (current->flags & ~KSWAPD_PF_FLAGS) | flags;

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3870,19 +3870,7 @@ static int kswapd(void *p)
>  	if (!cpumask_empty(cpumask))
>  		set_cpus_allowed_ptr(tsk, cpumask);
>  
> -	/*
> -	 * Tell the memory management that we're a "memory allocator",
> -	 * and that if we need more memory we should get access to it
> -	 * regardless (see "__alloc_pages()"). "kswapd" should
> -	 * never get caught in the normal page freeing logic.
> -	 *
> -	 * (Kswapd normally doesn't need memory anyway, but sometimes
> -	 * you need a small amount of memory in order to be able to
> -	 * page out something else, and this flag essentially protects
> -	 * us from recursively trying to free more memory as we're
> -	 * trying to free the first piece of memory in the first place).
> -	 */
> -	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> +	become_kswapd();
>  	set_freezable();
>  
>  	WRITE_ONCE(pgdat->kswapd_order, 0);
> @@ -3932,8 +3920,6 @@ static int kswapd(void *p)
>  			goto kswapd_try_sleep;
>  	}
>  
> -	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> -

Missing a restore_kswapd()?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd
  2020-12-17  3:06   ` Dave Chinner
@ 2020-12-17  4:00     ` Matthew Wilcox
  2020-12-17  4:46     ` Yafang Shao
  1 sibling, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2020-12-17  4:00 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Yafang Shao, darrick.wong, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm, Michal Hocko,
	Christoph Hellwig

On Thu, Dec 17, 2020 at 02:06:09PM +1100, Dave Chinner wrote:
> On Thu, Dec 17, 2020 at 09:11:54AM +0800, Yafang Shao wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > Since XFS needs to pretend to be kswapd in some of its worker threads,
> > create methods to save & restore kswapd state.  Don't bother restoring
> > kswapd state in kswapd -- the only time we reach this code is when we're
> > exiting and the task_struct is about to be destroyed anyway.
...
> > @@ -3932,8 +3920,6 @@ static int kswapd(void *p)
> >  			goto kswapd_try_sleep;
> >  	}
> >  
> > -	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> > -
> 
> Missing a restore_kswapd()?

Deliberately.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd
  2020-12-17  3:06   ` Dave Chinner
  2020-12-17  4:00     ` Matthew Wilcox
@ 2020-12-17  4:46     ` Yafang Shao
  1 sibling, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2020-12-17  4:46 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Matthew Wilcox, Christoph Hellwig, Michal Hocko,
	Andrew Morton, David Howells, jlayton, linux-fsdevel,
	linux-cachefs, linux-xfs, Linux MM, Michal Hocko,
	Christoph Hellwig

On Thu, Dec 17, 2020 at 11:06 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Dec 17, 2020 at 09:11:54AM +0800, Yafang Shao wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> >
> > Since XFS needs to pretend to be kswapd in some of its worker threads,
> > create methods to save & restore kswapd state.  Don't bother restoring
> > kswapd state in kswapd -- the only time we reach this code is when we're
> > exiting and the task_struct is about to be destroyed anyway.
> >
> > Cc: Dave Chinner <david@fromorbit.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
> >  include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
> >  mm/vmscan.c               | 16 +---------------
> >  3 files changed, 32 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 51dbff9b0908..0f35b7a38e76 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -2813,8 +2813,9 @@ xfs_btree_split_worker(
> >  {
> >       struct xfs_btree_split_args     *args = container_of(work,
> >                                               struct xfs_btree_split_args, work);
> > +     bool                    is_kswapd = args->kswapd;
> >       unsigned long           pflags;
> > -     unsigned long           new_pflags = PF_MEMALLOC_NOFS;
> > +     int                     memalloc_nofs;
> >
> >       /*
> >        * we are in a transaction context here, but may also be doing work
> > @@ -2822,16 +2823,17 @@ xfs_btree_split_worker(
> >        * temporarily to ensure that we don't block waiting for memory reclaim
> >        * in any way.
> >        */
> > -     if (args->kswapd)
> > -             new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > -
> > -     current_set_flags_nested(&pflags, new_pflags);
> > +     if (is_kswapd)
> > +             pflags = become_kswapd();
> > +     memalloc_nofs = memalloc_nofs_save();
> >
> >       args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> >                                        args->key, args->curp, args->stat);
> >       complete(args->done);
> >
> > -     current_restore_flags_nested(&pflags, new_pflags);
> > +     memalloc_nofs_restore(memalloc_nofs);
> > +     if (is_kswapd)
> > +             restore_kswapd(pflags);
> >  }
> >
> >  /*
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index d5ece7a9a403..2faf03e79a1e 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> >  }
> >  #endif
> >
> > +/*
> > + * Tell the memory management code that this thread is working on behalf
> > + * of background memory reclaim (like kswapd).  That means that it will
> > + * get access to memory reserves should it need to allocate memory in
> > + * order to make forward progress.  With this great power comes great
> > + * responsibility to not exhaust those reserves.
> > + */
> > +#define KSWAPD_PF_FLAGS              (PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
> > +
> > +static inline unsigned long become_kswapd(void)
> > +{
> > +     unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
> > +
> > +     current->flags |= KSWAPD_PF_FLAGS;
> > +
> > +     return flags;
> > +}
>
> You can get rid of the empty lines out of this function.
>
> > +static inline void restore_kswapd(unsigned long flags)
> > +{
> > +     current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
> > +}
>
> Urk, that requires thinking about to determine whether it is
> correct. And it is 3 runtime logic operations (^, ~ and &) too. The
> way all the memalloc_*_restore() functions restore the previous
> flags is obviously correct and only requires 2 runtime logic
> operations because the compiler calculates the ~ operation on the
> constant. So why do it differently here? i.e.:
>
>         current->flags = (current->flags & ~KSWAPD_PF_FLAGS) | flags;
>

I will change it as you suggested if Matthew doesn't have a different
opinion, Matthew ?


> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3870,19 +3870,7 @@ static int kswapd(void *p)
> >       if (!cpumask_empty(cpumask))
> >               set_cpus_allowed_ptr(tsk, cpumask);
> >
> > -     /*
> > -      * Tell the memory management that we're a "memory allocator",
> > -      * and that if we need more memory we should get access to it
> > -      * regardless (see "__alloc_pages()"). "kswapd" should
> > -      * never get caught in the normal page freeing logic.
> > -      *
> > -      * (Kswapd normally doesn't need memory anyway, but sometimes
> > -      * you need a small amount of memory in order to be able to
> > -      * page out something else, and this flag essentially protects
> > -      * us from recursively trying to free more memory as we're
> > -      * trying to free the first piece of memory in the first place).
> > -      */
> > -     tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > +     become_kswapd();
> >       set_freezable();
> >
> >       WRITE_ONCE(pgdat->kswapd_order, 0);
> > @@ -3932,8 +3920,6 @@ static int kswapd(void *p)
> >                       goto kswapd_try_sleep;
> >       }
> >
> > -     tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> > -
>
> Missing a restore_kswapd()?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com



-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-17  1:11 ` [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
@ 2020-12-17 22:15   ` Dave Chinner
  2020-12-17 23:06     ` Darrick J. Wong
  2020-12-19  0:28     ` Yafang Shao
  0 siblings, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2020-12-17 22:15 UTC (permalink / raw)
  To: Yafang Shao
  Cc: darrick.wong, willy, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm,
	Christoph Hellwig

On Thu, Dec 17, 2020 at 09:11:56AM +0800, Yafang Shao wrote:
> The xfs_trans context should be active after it is allocated, and
> deactive when it is freed.
> The rolling transaction should be specially considered, because in the
> case when we clear the old transaction the thread's NOFS state shouldn't
> be changed, as a result we have to set NOFS in the old transaction's
> t_pflags in xfs_trans_context_swap().
> 
> So these helpers are refactored as,
> - xfs_trans_context_set()
>   Used in xfs_trans_alloc()
> - xfs_trans_context_clear()
>   Used in xfs_trans_free()
> 
> And a new helper is instroduced to handle the rolling transaction,
> - xfs_trans_context_swap()
>   Used in rolling transaction
> 
> This patch is based on Darrick's work to fix the issue in xfs/141 in the
> earlier version. [1]
> 
> 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia

As I said in my last comments, this change of logic is not
necessary.  All we need to do is transfer the NOFS state to the new
transactions and *remove it from the old one*.

IOWs, all this patch should do is:

> @@ -119,7 +123,9 @@ xfs_trans_dup(
>  
>  	ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
>  	tp->t_rtx_res = tp->t_rtx_res_used;
> -	ntp->t_pflags = tp->t_pflags;
> +
> +	/* Associate the new transaction with this thread. */
> +	xfs_trans_context_swap(tp, ntp);
>  
>  	/* move deferred ops over to the new tp */
>  	xfs_defer_move(ntp, tp);

This, and

> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 44b11c64a15e..12380eaaf7ce 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -280,4 +280,17 @@ xfs_trans_context_clear(struct xfs_trans *tp)
>  	memalloc_nofs_restore(tp->t_pflags);
>  }
>  
> +static inline void
> +xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> +{

introduce this wrapper.

> +	ntp->t_pflags = tp->t_pflags;
> +	/*
> +	 * For the rolling transaction, we have to set NOFS in the old
> +	 * transaction's t_pflags so that when we clear the context on
> +	 * the old transaction we don't actually change the thread's NOFS
> +	 * state.
> +	 */
> +	tp->t_pflags = current->flags | PF_MEMALLOC_NOFS;
> +}

But not with this implementation.

Think for a minute, please. All we want to do is avoid clearing
the nofs state when we call xfs_trans_context_clear(tp) if the state
has been handed to another transaction.

Your current implementation hands the state to ntp, but *then leaves
it on tp* as well. So then you hack a PF_MEMALLOC_NOFS flag into
tp->t_pflags so that it doesn't clear that flag (abusing the masking
done during clearing). That's just nasty because it relies on
internal memalloc_nofs_restore() details for correct functionality.

The obvious solution: we've moved the saved process state to a
different context, so it is no longer needed for the current
transaction we are about to commit. So How about just clearing the
saved state from the original transaction when swappingi like so:

static inline void
xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
{
	ntp->t_pflags = tp->t_pflags;
	tp->t_flags = 0;
}

And now, when we go to clear the transaction context, we can simply
do this:

static inline void
xfs_trans_context_clear(struct xfs_trans *tp)
{
	if (tp->t_pflags)
		memalloc_nofs_restore(tp->t_pflags);
}

and the problem is solved. The NOFS state will follow the active
transaction and not be reset until the entire transaction chain is
completed.

In the next patch you can go and introduce current->journal_info
into just the wrapper functions, maintaining the same overall
logic.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-17 22:15   ` Dave Chinner
@ 2020-12-17 23:06     ` Darrick J. Wong
  2020-12-18  0:07       ` Dave Chinner
  2020-12-19  0:28     ` Yafang Shao
  1 sibling, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-12-17 23:06 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Yafang Shao, willy, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm,
	Christoph Hellwig

On Fri, Dec 18, 2020 at 09:15:09AM +1100, Dave Chinner wrote:
> On Thu, Dec 17, 2020 at 09:11:56AM +0800, Yafang Shao wrote:
> > The xfs_trans context should be active after it is allocated, and
> > deactive when it is freed.
> > The rolling transaction should be specially considered, because in the
> > case when we clear the old transaction the thread's NOFS state shouldn't
> > be changed, as a result we have to set NOFS in the old transaction's
> > t_pflags in xfs_trans_context_swap().
> > 
> > So these helpers are refactored as,
> > - xfs_trans_context_set()
> >   Used in xfs_trans_alloc()
> > - xfs_trans_context_clear()
> >   Used in xfs_trans_free()
> > 
> > And a new helper is instroduced to handle the rolling transaction,
> > - xfs_trans_context_swap()
> >   Used in rolling transaction
> > 
> > This patch is based on Darrick's work to fix the issue in xfs/141 in the
> > earlier version. [1]
> > 
> > 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia
> 
> As I said in my last comments, this change of logic is not
> necessary.  All we need to do is transfer the NOFS state to the new
> transactions and *remove it from the old one*.
> 
> IOWs, all this patch should do is:
> 
> > @@ -119,7 +123,9 @@ xfs_trans_dup(
> >  
> >  	ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
> >  	tp->t_rtx_res = tp->t_rtx_res_used;
> > -	ntp->t_pflags = tp->t_pflags;
> > +
> > +	/* Associate the new transaction with this thread. */
> > +	xfs_trans_context_swap(tp, ntp);
> >  
> >  	/* move deferred ops over to the new tp */
> >  	xfs_defer_move(ntp, tp);
> 
> This, and
> 
> > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > index 44b11c64a15e..12380eaaf7ce 100644
> > --- a/fs/xfs/xfs_trans.h
> > +++ b/fs/xfs/xfs_trans.h
> > @@ -280,4 +280,17 @@ xfs_trans_context_clear(struct xfs_trans *tp)
> >  	memalloc_nofs_restore(tp->t_pflags);
> >  }
> >  
> > +static inline void
> > +xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> > +{
> 
> introduce this wrapper.
> 
> > +	ntp->t_pflags = tp->t_pflags;
> > +	/*
> > +	 * For the rolling transaction, we have to set NOFS in the old
> > +	 * transaction's t_pflags so that when we clear the context on
> > +	 * the old transaction we don't actually change the thread's NOFS
> > +	 * state.
> > +	 */
> > +	tp->t_pflags = current->flags | PF_MEMALLOC_NOFS;
> > +}
> 
> But not with this implementation.
> 
> Think for a minute, please. All we want to do is avoid clearing
> the nofs state when we call xfs_trans_context_clear(tp) if the state
> has been handed to another transaction.
> 
> Your current implementation hands the state to ntp, but *then leaves
> it on tp* as well. So then you hack a PF_MEMALLOC_NOFS flag into
> tp->t_pflags so that it doesn't clear that flag (abusing the masking
> done during clearing). That's just nasty because it relies on
> internal memalloc_nofs_restore() details for correct functionality.
> 
> The obvious solution: we've moved the saved process state to a
> different context, so it is no longer needed for the current
> transaction we are about to commit. So How about just clearing the
> saved state from the original transaction when swappingi like so:
> 
> static inline void
> xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> {
> 	ntp->t_pflags = tp->t_pflags;
> 	tp->t_flags = 0;
> }
> 
> And now, when we go to clear the transaction context, we can simply
> do this:
> 
> static inline void
> xfs_trans_context_clear(struct xfs_trans *tp)
> {
> 	if (tp->t_pflags)
> 		memalloc_nofs_restore(tp->t_pflags);
> }
> 
> and the problem is solved. The NOFS state will follow the active
> transaction and not be reset until the entire transaction chain is
> completed.

Er... correct me if I'm wrong, but I thought t_pflags stores the old
state of current->flags from before we call xfs_trans_alloc?  So if we
call into xfs_trans_alloc with current->flags==0 and we commit the
transaction having not rolled, we won't unset the _NOFS state, and exit
back to userspace with _NOFS set.

I think the logic is correct here -- we transfer the old pflags value
from @tp to @ntp, which effectively puts @ntp in charge of restoring the
old pflags value; and then we set tp->t_pflags to current->flags, which
means that @tp will "restore" the current pflags value into pflags, which
is a nop.

--D

> In the next patch you can go and introduce current->journal_info
> into just the wrapper functions, maintaining the same overall
> logic.
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-17 23:06     ` Darrick J. Wong
@ 2020-12-18  0:07       ` Dave Chinner
  2020-12-19  0:31         ` Yafang Shao
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2020-12-18  0:07 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Yafang Shao, willy, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm,
	Christoph Hellwig

On Thu, Dec 17, 2020 at 03:06:27PM -0800, Darrick J. Wong wrote:
> On Fri, Dec 18, 2020 at 09:15:09AM +1100, Dave Chinner wrote:
> > The obvious solution: we've moved the saved process state to a
> > different context, so it is no longer needed for the current
> > transaction we are about to commit. So How about just clearing the
> > saved state from the original transaction when swappingi like so:
> > 
> > static inline void
> > xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> > {
> > 	ntp->t_pflags = tp->t_pflags;
> > 	tp->t_flags = 0;
> > }
> > 
> > And now, when we go to clear the transaction context, we can simply
> > do this:
> > 
> > static inline void
> > xfs_trans_context_clear(struct xfs_trans *tp)
> > {
> > 	if (tp->t_pflags)
> > 		memalloc_nofs_restore(tp->t_pflags);
> > }
> > 
> > and the problem is solved. The NOFS state will follow the active
> > transaction and not be reset until the entire transaction chain is
> > completed.
> 
> Er... correct me if I'm wrong, but I thought t_pflags stores the old
> state of current->flags from before we call xfs_trans_alloc?  So if we
> call into xfs_trans_alloc with current->flags==0 and we commit the
> transaction having not rolled, we won't unset the _NOFS state, and exit
> back to userspace with _NOFS set.

Minor thinko.

tp->t_pflags only stores a single bit of information w.r.t
PF_MEMALLOC_NOFS state, not the entire set of current task flags:
whether it should be cleared or not on a call to
memalloc_nofs_restore(). See memalloc_nofs_save():

static inline unsigned int memalloc_nofs_save(void)
{
        unsigned int flags = current->flags & PF_MEMALLOC_NOFS;
        current->flags |= PF_MEMALLOC_NOFS;
        return flags;
}

So, yeah, I got the t_pflags logic the wrong way around here - zero
means clear it, non-zero means don't clear it. IOWs, swap:

	ntp->t_pflags = tp->t_pflags;
	tp->t_flags = -1;

and clear:

	if (!tp->t_flags)
		memalloc_nofs_restore(tp->t_pflags);

This should only be temporary code, anyway. The next patch should
change this state clearing check to use current->journal_info, then
we aren't dependent on process flags state at all.

> I think the logic is correct here -- we transfer the old pflags value
> from @tp to @ntp, which effectively puts @ntp in charge of restoring the
> old pflags value; and then we set tp->t_pflags to current->flags, which
> means that @tp will "restore" the current pflags value into pflags, which
> is a nop.

That's pretty much what the current logic guarantees. Once the
wrappers provide this same guarantee, we can greatly simply the the
transaction context state management (i.e. set on alloc, swap on
dup, clear on free). And thread handoffs can just use clear/set
appropriately.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion
  2020-12-17  1:11 ` [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
@ 2020-12-18  0:14   ` Dave Chinner
  2020-12-19  0:16     ` Yafang Shao
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2020-12-18  0:14 UTC (permalink / raw)
  To: Yafang Shao
  Cc: darrick.wong, willy, hch, mhocko, akpm, dhowells, jlayton,
	linux-fsdevel, linux-cachefs, linux-xfs, linux-mm,
	Christoph Hellwig

On Thu, Dec 17, 2020 at 09:11:57AM +0800, Yafang Shao wrote:
> PF_FSTRANS which is used to avoid transaction reservation recursion, is
> dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> PF_MEMALLOC_NOFS") and replaced by PF_MEMALLOC_NOFS which means to avoid
> filesystem reclaim recursion.
> 
> As these two flags have different meanings, we'd better reintroduce
> PF_FSTRANS back. To avoid wasting the space of PF_* flags in task_struct,
> we can reuse the current->journal_info to do that, per Willy. As the
> check of transaction reservation recursion is used by XFS only, we can
> move the check into xfs_vm_writepage(s), per Dave.
> 
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Jeff Layton <jlayton@redhat.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  fs/iomap/buffered-io.c |  7 -------
>  fs/xfs/xfs_aops.c      | 17 +++++++++++++++++
>  fs/xfs/xfs_trans.h     | 26 +++++++++++++++++++-------
>  3 files changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 10cc7979ce38..3c53fa6ce64d 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
>  			PF_MEMALLOC))
>  		goto redirty;
>  
> -	/*
> -	 * Given that we do not allow direct reclaim to call us, we should
> -	 * never be called in a recursive filesystem reclaim context.
> -	 */
> -	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> -		goto redirty;
> -
>  	/*
>  	 * Is this page beyond the end of the file?
>  	 *
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 2371187b7615..0da0242d42c3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -568,6 +568,16 @@ xfs_vm_writepage(
>  {
>  	struct xfs_writepage_ctx wpc = { };
>  
> +	/*
> +	 * Given that we do not allow direct reclaim to call us, we should
> +	 * never be called while in a filesystem transaction.
> +	 */

Comment is wrong. This is not protecting against direct reclaim
recursion, this is protecting against writeback from within a
transaction context.

Best to remove the comment altogether, because it is largely
redundant.

> +	if (WARN_ON_ONCE(xfs_trans_context_active())) {
> +		redirty_page_for_writepage(wbc, page);
> +		unlock_page(page);
> +		return 0;
> +	}
> +
>  	return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
>  }
>  
> @@ -579,6 +589,13 @@ xfs_vm_writepages(
>  	struct xfs_writepage_ctx wpc = { };
>  
>  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> +	/*
> +	 * Given that we do not allow direct reclaim to call us, we should
> +	 * never be called while in a filesystem transaction.
> +	 */

same here.

> +	if (WARN_ON_ONCE(xfs_trans_context_active()))
> +		return 0;
> +
>  	return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
>  }
>  
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 12380eaaf7ce..0c8140147b9b 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -268,29 +268,41 @@ xfs_trans_item_relog(
>  	return lip->li_ops->iop_relog(lip, tp);
>  }
>  
> +static inline bool
> +xfs_trans_context_active(void)
> +{
> +	/* Use journal_info to indicate current is in a transaction */
> +	return current->journal_info != NULL;
> +}

Comment is not necessary.

> +
>  static inline void
>  xfs_trans_context_set(struct xfs_trans *tp)
>  {
> +	ASSERT(!current->journal_info);
> +	current->journal_info = tp;
>  	tp->t_pflags = memalloc_nofs_save();
>  }
>  
>  static inline void
>  xfs_trans_context_clear(struct xfs_trans *tp)
>  {
> +	/*
> +	 * If xfs_trans_context_swap() handed the NOFS context to a
> +	 * new transaction we do not clear the context here.
> +	 */

It's a transaction context, not a "NOFS context". Setting NOFS is
just something we implement inside the transaction context. More
correct would be:

	/*
	 * If we handed over the context via xfs_trans_context_swap() then 
	 * the context is no longer ours to clear.
	 */

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion
  2020-12-18  0:14   ` Dave Chinner
@ 2020-12-19  0:16     ` Yafang Shao
  0 siblings, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2020-12-19  0:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Matthew Wilcox, Christoph Hellwig, Michal Hocko,
	Andrew Morton, David Howells, jlayton, linux-fsdevel,
	linux-cachefs, linux-xfs, Linux MM, Christoph Hellwig

On Fri, Dec 18, 2020 at 8:14 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Dec 17, 2020 at 09:11:57AM +0800, Yafang Shao wrote:
> > PF_FSTRANS which is used to avoid transaction reservation recursion, is
> > dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> > PF_MEMALLOC_NOFS") and replaced by PF_MEMALLOC_NOFS which means to avoid
> > filesystem reclaim recursion.
> >
> > As these two flags have different meanings, we'd better reintroduce
> > PF_FSTRANS back. To avoid wasting the space of PF_* flags in task_struct,
> > we can reuse the current->journal_info to do that, per Willy. As the
> > check of transaction reservation recursion is used by XFS only, we can
> > move the check into xfs_vm_writepage(s), per Dave.
> >
> > Cc: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Jeff Layton <jlayton@redhat.com>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  fs/iomap/buffered-io.c |  7 -------
> >  fs/xfs/xfs_aops.c      | 17 +++++++++++++++++
> >  fs/xfs/xfs_trans.h     | 26 +++++++++++++++++++-------
> >  3 files changed, 36 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 10cc7979ce38..3c53fa6ce64d 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1458,13 +1458,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
> >                       PF_MEMALLOC))
> >               goto redirty;
> >
> > -     /*
> > -      * Given that we do not allow direct reclaim to call us, we should
> > -      * never be called in a recursive filesystem reclaim context.
> > -      */
> > -     if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > -             goto redirty;
> > -
> >       /*
> >        * Is this page beyond the end of the file?
> >        *
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 2371187b7615..0da0242d42c3 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -568,6 +568,16 @@ xfs_vm_writepage(
> >  {
> >       struct xfs_writepage_ctx wpc = { };
> >
> > +     /*
> > +      * Given that we do not allow direct reclaim to call us, we should
> > +      * never be called while in a filesystem transaction.
> > +      */
>
> Comment is wrong. This is not protecting against direct reclaim
> recursion, this is protecting against writeback from within a
> transaction context.
>

Ah, I forgot to change this comment after copy and paste. Thanks for
pointing it out.

> Best to remove the comment altogether, because it is largely
> redundant.
>

Sure, I will remove these comments.

> > +     if (WARN_ON_ONCE(xfs_trans_context_active())) {
> > +             redirty_page_for_writepage(wbc, page);
> > +             unlock_page(page);
> > +             return 0;
> > +     }
> > +
> >       return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
> >  }
> >
> > @@ -579,6 +589,13 @@ xfs_vm_writepages(
> >       struct xfs_writepage_ctx wpc = { };
> >
> >       xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> > +     /*
> > +      * Given that we do not allow direct reclaim to call us, we should
> > +      * never be called while in a filesystem transaction.
> > +      */
>
> same here.
>
> > +     if (WARN_ON_ONCE(xfs_trans_context_active()))
> > +             return 0;
> > +
> >       return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
> >  }
> >
> > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > index 12380eaaf7ce..0c8140147b9b 100644
> > --- a/fs/xfs/xfs_trans.h
> > +++ b/fs/xfs/xfs_trans.h
> > @@ -268,29 +268,41 @@ xfs_trans_item_relog(
> >       return lip->li_ops->iop_relog(lip, tp);
> >  }
> >
> > +static inline bool
> > +xfs_trans_context_active(void)
> > +{
> > +     /* Use journal_info to indicate current is in a transaction */
> > +     return current->journal_info != NULL;
> > +}
>
> Comment is not necessary.
>
> > +
> >  static inline void
> >  xfs_trans_context_set(struct xfs_trans *tp)
> >  {
> > +     ASSERT(!current->journal_info);
> > +     current->journal_info = tp;
> >       tp->t_pflags = memalloc_nofs_save();
> >  }
> >
> >  static inline void
> >  xfs_trans_context_clear(struct xfs_trans *tp)
> >  {
> > +     /*
> > +      * If xfs_trans_context_swap() handed the NOFS context to a
> > +      * new transaction we do not clear the context here.
> > +      */
>
> It's a transaction context, not a "NOFS context". Setting NOFS is
> just something we implement inside the transaction context. More
> correct would be:
>
>         /*
>          * If we handed over the context via xfs_trans_context_swap() then
>          * the context is no longer ours to clear.
>          */
>

Sure, I will change it.


-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-17 22:15   ` Dave Chinner
  2020-12-17 23:06     ` Darrick J. Wong
@ 2020-12-19  0:28     ` Yafang Shao
  1 sibling, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2020-12-19  0:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Matthew Wilcox, Christoph Hellwig, Michal Hocko,
	Andrew Morton, David Howells, jlayton, linux-fsdevel,
	linux-cachefs, linux-xfs, Linux MM, Christoph Hellwig

On Fri, Dec 18, 2020 at 6:15 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Dec 17, 2020 at 09:11:56AM +0800, Yafang Shao wrote:
> > The xfs_trans context should be active after it is allocated, and
> > deactive when it is freed.
> > The rolling transaction should be specially considered, because in the
> > case when we clear the old transaction the thread's NOFS state shouldn't
> > be changed, as a result we have to set NOFS in the old transaction's
> > t_pflags in xfs_trans_context_swap().
> >
> > So these helpers are refactored as,
> > - xfs_trans_context_set()
> >   Used in xfs_trans_alloc()
> > - xfs_trans_context_clear()
> >   Used in xfs_trans_free()
> >
> > And a new helper is instroduced to handle the rolling transaction,
> > - xfs_trans_context_swap()
> >   Used in rolling transaction
> >
> > This patch is based on Darrick's work to fix the issue in xfs/141 in the
> > earlier version. [1]
> >
> > 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia
>
> As I said in my last comments, this change of logic is not
> necessary.  All we need to do is transfer the NOFS state to the new
> transactions and *remove it from the old one*.
>

Thanks for the explanation, I will change it.

> IOWs, all this patch should do is:
>
> > @@ -119,7 +123,9 @@ xfs_trans_dup(
> >
> >       ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
> >       tp->t_rtx_res = tp->t_rtx_res_used;
> > -     ntp->t_pflags = tp->t_pflags;
> > +
> > +     /* Associate the new transaction with this thread. */
> > +     xfs_trans_context_swap(tp, ntp);
> >
> >       /* move deferred ops over to the new tp */
> >       xfs_defer_move(ntp, tp);
>
> This, and
>
> > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > index 44b11c64a15e..12380eaaf7ce 100644
> > --- a/fs/xfs/xfs_trans.h
> > +++ b/fs/xfs/xfs_trans.h
> > @@ -280,4 +280,17 @@ xfs_trans_context_clear(struct xfs_trans *tp)
> >       memalloc_nofs_restore(tp->t_pflags);
> >  }
> >
> > +static inline void
> > +xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> > +{
>
> introduce this wrapper.
>
> > +     ntp->t_pflags = tp->t_pflags;
> > +     /*
> > +      * For the rolling transaction, we have to set NOFS in the old
> > +      * transaction's t_pflags so that when we clear the context on
> > +      * the old transaction we don't actually change the thread's NOFS
> > +      * state.
> > +      */
> > +     tp->t_pflags = current->flags | PF_MEMALLOC_NOFS;
> > +}
>
> But not with this implementation.
>
> Think for a minute, please. All we want to do is avoid clearing
> the nofs state when we call xfs_trans_context_clear(tp) if the state
> has been handed to another transaction.
>
> Your current implementation hands the state to ntp, but *then leaves
> it on tp* as well. So then you hack a PF_MEMALLOC_NOFS flag into
> tp->t_pflags so that it doesn't clear that flag (abusing the masking
> done during clearing). That's just nasty because it relies on
> internal memalloc_nofs_restore() details for correct functionality.
>
> The obvious solution: we've moved the saved process state to a
> different context, so it is no longer needed for the current
> transaction we are about to commit. So How about just clearing the
> saved state from the original transaction when swappingi like so:
>
> static inline void
> xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> {
>         ntp->t_pflags = tp->t_pflags;
>         tp->t_flags = 0;
> }
>
> And now, when we go to clear the transaction context, we can simply
> do this:
>
> static inline void
> xfs_trans_context_clear(struct xfs_trans *tp)
> {
>         if (tp->t_pflags)
>                 memalloc_nofs_restore(tp->t_pflags);
> }
>
> and the problem is solved. The NOFS state will follow the active
> transaction and not be reset until the entire transaction chain is
> completed.
>
> In the next patch you can go and introduce current->journal_info
> into just the wrapper functions, maintaining the same overall
> logic.
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com



-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear}
  2020-12-18  0:07       ` Dave Chinner
@ 2020-12-19  0:31         ` Yafang Shao
  0 siblings, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2020-12-19  0:31 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Matthew Wilcox, Christoph Hellwig, Michal Hocko,
	Andrew Morton, David Howells, jlayton, linux-fsdevel,
	linux-cachefs, linux-xfs, Linux MM, Christoph Hellwig

On Fri, Dec 18, 2020 at 8:07 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Dec 17, 2020 at 03:06:27PM -0800, Darrick J. Wong wrote:
> > On Fri, Dec 18, 2020 at 09:15:09AM +1100, Dave Chinner wrote:
> > > The obvious solution: we've moved the saved process state to a
> > > different context, so it is no longer needed for the current
> > > transaction we are about to commit. So How about just clearing the
> > > saved state from the original transaction when swappingi like so:
> > >
> > > static inline void
> > > xfs_trans_context_swap(struct xfs_trans *tp, struct xfs_trans *ntp)
> > > {
> > >     ntp->t_pflags = tp->t_pflags;
> > >     tp->t_flags = 0;
> > > }
> > >
> > > And now, when we go to clear the transaction context, we can simply
> > > do this:
> > >
> > > static inline void
> > > xfs_trans_context_clear(struct xfs_trans *tp)
> > > {
> > >     if (tp->t_pflags)
> > >             memalloc_nofs_restore(tp->t_pflags);
> > > }
> > >
> > > and the problem is solved. The NOFS state will follow the active
> > > transaction and not be reset until the entire transaction chain is
> > > completed.
> >
> > Er... correct me if I'm wrong, but I thought t_pflags stores the old
> > state of current->flags from before we call xfs_trans_alloc?  So if we
> > call into xfs_trans_alloc with current->flags==0 and we commit the
> > transaction having not rolled, we won't unset the _NOFS state, and exit
> > back to userspace with _NOFS set.
>
> Minor thinko.
>
> tp->t_pflags only stores a single bit of information w.r.t
> PF_MEMALLOC_NOFS state, not the entire set of current task flags:
> whether it should be cleared or not on a call to
> memalloc_nofs_restore(). See memalloc_nofs_save():
>
> static inline unsigned int memalloc_nofs_save(void)
> {
>         unsigned int flags = current->flags & PF_MEMALLOC_NOFS;
>         current->flags |= PF_MEMALLOC_NOFS;
>         return flags;
> }
>
> So, yeah, I got the t_pflags logic the wrong way around here - zero
> means clear it, non-zero means don't clear it. IOWs, swap:
>
>         ntp->t_pflags = tp->t_pflags;
>         tp->t_flags = -1;
>
> and clear:
>
>         if (!tp->t_flags)
>                 memalloc_nofs_restore(tp->t_pflags);
>
> This should only be temporary code, anyway. The next patch should
> change this state clearing check to use current->journal_info, then
> we aren't dependent on process flags state at all.
>
> > I think the logic is correct here -- we transfer the old pflags value
> > from @tp to @ntp, which effectively puts @ntp in charge of restoring the
> > old pflags value; and then we set tp->t_pflags to current->flags, which
> > means that @tp will "restore" the current pflags value into pflags, which
> > is a nop.
>
> That's pretty much what the current logic guarantees. Once the
> wrappers provide this same guarantee, we can greatly simply the the
> transaction context state management (i.e. set on alloc, swap on
> dup, clear on free). And thread handoffs can just use clear/set
> appropriately.
>

Make sense to me.


-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-12-19  0:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-17  1:11 [PATCH v13 0/4] xfs: avoid transaction reservation recursion Yafang Shao
2020-12-17  1:11 ` [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd Yafang Shao
2020-12-17  3:06   ` Dave Chinner
2020-12-17  4:00     ` Matthew Wilcox
2020-12-17  4:46     ` Yafang Shao
2020-12-17  1:11 ` [PATCH v13 2/4] xfs: use memalloc_nofs_{save,restore} in xfs transaction Yafang Shao
2020-12-17  1:11 ` [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set,clear} Yafang Shao
2020-12-17 22:15   ` Dave Chinner
2020-12-17 23:06     ` Darrick J. Wong
2020-12-18  0:07       ` Dave Chinner
2020-12-19  0:31         ` Yafang Shao
2020-12-19  0:28     ` Yafang Shao
2020-12-17  1:11 ` [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion Yafang Shao
2020-12-18  0:14   ` Dave Chinner
2020-12-19  0:16     ` Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).