linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 resend 0/2] avoid xfs transaction reservation recursion
@ 2020-11-03 13:17 Yafang Shao
  2020-11-03 13:17 ` [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd Yafang Shao
  2020-11-03 13:17 ` [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion Yafang Shao
  0 siblings, 2 replies; 10+ messages in thread
From: Yafang Shao @ 2020-11-03 13:17 UTC (permalink / raw)
  To: akpm
  Cc: david, hch, darrick.wong, willy, mhocko, linux-mm, linux-fsdevel,
	linux-xfs, Yafang Shao

This patchset avoids transaction reservation recursion by reintroducing
the discarded PF_FSTRANS in a new way, suggested by Dave. In this new
implementation, four new helpers are introduced, which are
xfs_trans_context_{set, clear, update, active},
suggested by Dave. And re-using the task->journal_info to indicates
whehter the task is in fstrans or not, suggested by Willy

Patch #1 is picked from Willy's patchset "Overhaul memalloc_no*"[1]

[1]
https://lore.kernel.org/linux-mm/20200625113122.7540-1-willy@infradead.org/

v8:
- check xfs_trans_context_active() in xfs_vm_writepage(s), per Dave.

v7:
- check fstrans recursion for XFS only, by introducing a new member in
  struct writeback_control.

v6:
- add Michal's ack and comment in patch #1.

v5:
- pick one of Willy's patch
- introduce four new helpers, per Dave

v4:
- retitle from "xfs: introduce task->in_fstrans for transaction reservation
  recursion protection"
- reuse current->journal_info, per Willy


Matthew Wilcox (Oracle) (1):
  mm: Add become_kswapd and restore_kswapd

Yafang Shao (1):
  xfs: avoid transaction reservation recursion

 fs/iomap/buffered-io.c    |  7 -------
 fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
 fs/xfs/xfs_aops.c         | 23 +++++++++++++++++++++--
 fs/xfs/xfs_linux.h        |  4 ----
 fs/xfs/xfs_trans.c        | 19 +++++++++----------
 fs/xfs/xfs_trans.h        | 30 ++++++++++++++++++++++++++++++
 include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
 mm/vmscan.c               | 16 +---------------
 8 files changed, 92 insertions(+), 44 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd
  2020-11-03 13:17 [PATCH v8 resend 0/2] avoid xfs transaction reservation recursion Yafang Shao
@ 2020-11-03 13:17 ` Yafang Shao
  2020-11-03 19:48   ` Darrick J. Wong
  2020-11-03 13:17 ` [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion Yafang Shao
  1 sibling, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2020-11-03 13:17 UTC (permalink / raw)
  To: akpm
  Cc: david, hch, darrick.wong, willy, mhocko, linux-mm, linux-fsdevel,
	linux-xfs, Yafang Shao

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Since XFS needs to pretend to be kswapd in some of its worker threads,
create methods to save & restore kswapd state.  Don't bother restoring
kswapd state in kswapd -- the only time we reach this code is when we're
exiting and the task_struct is about to be destroyed anyway.

Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
 include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
 mm/vmscan.c               | 16 +---------------
 3 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2d25bab..a04a442 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -2813,8 +2813,9 @@ struct xfs_btree_split_args {
 {
 	struct xfs_btree_split_args	*args = container_of(work,
 						struct xfs_btree_split_args, work);
+	bool			is_kswapd = args->kswapd;
 	unsigned long		pflags;
-	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
+	int			memalloc_nofs;
 
 	/*
 	 * we are in a transaction context here, but may also be doing work
@@ -2822,16 +2823,17 @@ struct xfs_btree_split_args {
 	 * temporarily to ensure that we don't block waiting for memory reclaim
 	 * in any way.
 	 */
-	if (args->kswapd)
-		new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
-
-	current_set_flags_nested(&pflags, new_pflags);
+	if (is_kswapd)
+		pflags = become_kswapd();
+	memalloc_nofs = memalloc_nofs_save();
 
 	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
 					 args->key, args->curp, args->stat);
 	complete(args->done);
 
-	current_restore_flags_nested(&pflags, new_pflags);
+	memalloc_nofs_restore(memalloc_nofs);
+	if (is_kswapd)
+		restore_kswapd(pflags);
 }
 
 /*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index d5ece7a..2faf03e 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
 }
 #endif
 
+/*
+ * Tell the memory management code that this thread is working on behalf
+ * of background memory reclaim (like kswapd).  That means that it will
+ * get access to memory reserves should it need to allocate memory in
+ * order to make forward progress.  With this great power comes great
+ * responsibility to not exhaust those reserves.
+ */
+#define KSWAPD_PF_FLAGS		(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
+
+static inline unsigned long become_kswapd(void)
+{
+	unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
+
+	current->flags |= KSWAPD_PF_FLAGS;
+
+	return flags;
+}
+
+static inline void restore_kswapd(unsigned long flags)
+{
+	current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
+}
+
 #ifdef CONFIG_MEMCG
 DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
 /**
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1b8f0e0..77bc1dd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3869,19 +3869,7 @@ static int kswapd(void *p)
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(tsk, cpumask);
 
-	/*
-	 * Tell the memory management that we're a "memory allocator",
-	 * and that if we need more memory we should get access to it
-	 * regardless (see "__alloc_pages()"). "kswapd" should
-	 * never get caught in the normal page freeing logic.
-	 *
-	 * (Kswapd normally doesn't need memory anyway, but sometimes
-	 * you need a small amount of memory in order to be able to
-	 * page out something else, and this flag essentially protects
-	 * us from recursively trying to free more memory as we're
-	 * trying to free the first piece of memory in the first place).
-	 */
-	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
+	become_kswapd();
 	set_freezable();
 
 	WRITE_ONCE(pgdat->kswapd_order, 0);
@@ -3931,8 +3919,6 @@ static int kswapd(void *p)
 			goto kswapd_try_sleep;
 	}
 
-	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
-
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion
  2020-11-03 13:17 [PATCH v8 resend 0/2] avoid xfs transaction reservation recursion Yafang Shao
  2020-11-03 13:17 ` [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd Yafang Shao
@ 2020-11-03 13:17 ` Yafang Shao
  2020-11-04  0:16   ` Darrick J. Wong
  1 sibling, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2020-11-03 13:17 UTC (permalink / raw)
  To: akpm
  Cc: david, hch, darrick.wong, willy, mhocko, linux-mm, linux-fsdevel,
	linux-xfs, Yafang Shao

PF_FSTRANS which is used to avoid transaction reservation recursion, is
dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
means to avoid filesystem reclaim recursion. That change is subtle.
Let's take the exmple of the check of WARN_ON_ONCE(current->flags &
PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to
PF_MEMALLOC_NOFS is not proper.
Below comment is quoted from Dave,
> It wasn't for memory allocation recursion protection in XFS - it was for
> transaction reservation recursion protection by something trying to flush
> data pages while holding a transaction reservation. Doing
> this could deadlock the journal because the existing reservation
> could prevent the nested reservation for being able to reserve space
> in the journal and that is a self-deadlock vector.
> IOWs, this check is not protecting against memory reclaim recursion
> bugs at all (that's the previous check [1]). This check is
> protecting against the filesystem calling writepages directly from a
> context where it can self-deadlock.
> So what we are seeing here is that the PF_FSTRANS ->
> PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> about what type of error this check was protecting against.

As a result, we should reintroduce PF_FSTRANS. As current->journal_info
isn't used in XFS, we can reuse it to indicate whehter the task is in
fstrans or not, Per Willy. To achieve that, four new helpers are introduce
in this patch, per Dave:
- xfs_trans_context_set()
  Used in xfs_trans_alloc()
- xfs_trans_context_clear()
  Used in xfs_trans_commit() and xfs_trans_cancel()
- xfs_trans_context_update()
  Used in xfs_trans_roll()
- xfs_trans_context_active()
  To check whehter current is in fs transcation or not
[1]. Below check is to avoid memory reclaim recursion.
if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
        PF_MEMALLOC))
        goto redirty;

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>
---
 fs/iomap/buffered-io.c |  7 -------
 fs/xfs/xfs_aops.c      | 23 +++++++++++++++++++++--
 fs/xfs/xfs_linux.h     |  4 ----
 fs/xfs/xfs_trans.c     | 19 +++++++++----------
 fs/xfs/xfs_trans.h     | 30 ++++++++++++++++++++++++++++++
 5 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8180061..2f090b6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1469,13 +1469,6 @@ static void iomap_writepage_end_bio(struct bio *bio)
 		goto redirty;
 
 	/*
-	 * Given that we do not allow direct reclaim to call us, we should
-	 * never be called in a recursive filesystem reclaim context.
-	 */
-	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
-		goto redirty;
-
-	/*
 	 * Is this page beyond the end of the file?
 	 *
 	 * The page index is less than the end_index, adjust the end_offset
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 55d126d..b25196a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -62,7 +62,8 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
 	 * We hand off the transaction to the completion thread now, so
 	 * clear the flag here.
 	 */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
+
 	return 0;
 }
 
@@ -125,7 +126,7 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
 	 * thus we need to mark ourselves as being in a transaction manually.
 	 * Similarly for freeze protection.
 	 */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_set(tp);
 	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
 
 	/* we abort the update if there was an IO error */
@@ -564,6 +565,16 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
 {
 	struct xfs_writepage_ctx wpc = { };
 
+	/*
+	 * Given that we do not allow direct reclaim to call us, we should
+	 * never be called while in a filesystem transaction.
+	 */
+	if (xfs_trans_context_active()) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return 0;
+	}
+
 	return iomap_writepage(page, wbc, &wpc.ctx, &xfs_writeback_ops);
 }
 
@@ -575,6 +586,14 @@ static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
 	struct xfs_writepage_ctx wpc = { };
 
 	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
+
+	/*
+	 * Given that we do not allow direct reclaim to call us, we should
+	 * never be called while in a filesystem transaction.
+	 */
+	if (xfs_trans_context_active())
+		return 0;
+
 	return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops);
 }
 
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 5b7a1e2..6ab0f80 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -102,10 +102,6 @@
 #define xfs_cowb_secs		xfs_params.cowb_timer.val
 
 #define current_cpu()		(raw_smp_processor_id())
-#define current_set_flags_nested(sp, f)		\
-		(*(sp) = current->flags, current->flags |= (f))
-#define current_restore_flags_nested(sp, f)	\
-		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
 
 #define NBBY		8		/* number of bits per byte */
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index c94e71f..b272d07 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -153,8 +153,6 @@
 	int			error = 0;
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 
-	/* Mark this thread as being in a transaction */
-	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
 
 	/*
 	 * Attempt to reserve the needed disk blocks by decrementing
@@ -163,10 +161,8 @@
 	 */
 	if (blocks > 0) {
 		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
-		if (error != 0) {
-			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+		if (error != 0)
 			return -ENOSPC;
-		}
 		tp->t_blk_res += blocks;
 	}
 
@@ -241,8 +237,6 @@
 		tp->t_blk_res = 0;
 	}
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
-
 	return error;
 }
 
@@ -284,6 +278,8 @@
 	INIT_LIST_HEAD(&tp->t_dfops);
 	tp->t_firstblock = NULLFSBLOCK;
 
+	/* Mark this thread as being in a transaction */
+	xfs_trans_context_set(tp);
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
 	if (error) {
 		xfs_trans_cancel(tp);
@@ -878,7 +874,8 @@
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	if (!regrant)
+		xfs_trans_context_clear(tp);
 	xfs_trans_free(tp);
 
 	/*
@@ -910,7 +907,8 @@
 			xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
 		tp->t_ticket = NULL;
 	}
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+
+	xfs_trans_context_clear(tp);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -971,7 +969,7 @@
 	}
 
 	/* mark this thread as no longer being in a transaction */
-	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
+	xfs_trans_context_clear(tp);
 
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
@@ -1013,6 +1011,7 @@
 	if (error)
 		return error;
 
+	xfs_trans_context_update(trans, *tpp);
 	/*
 	 * Reserve space in the log for the next transaction.
 	 * This also pushes items in the "AIL", the list of logged items,
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 0846589..c4877afc 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -268,4 +268,34 @@ void		xfs_trans_buf_copy_type(struct xfs_buf *dst_bp,
 	return lip->li_ops->iop_relog(lip, tp);
 }
 
+static inline void
+xfs_trans_context_set(struct xfs_trans *tp)
+{
+	ASSERT(!current->journal_info);
+	current->journal_info = tp;
+	tp->t_pflags = memalloc_nofs_save();
+}
+
+static inline void
+xfs_trans_context_update(struct xfs_trans *old, struct xfs_trans *new)
+{
+	ASSERT(current->journal_info == old);
+	current->journal_info = new;
+}
+
+static inline void
+xfs_trans_context_clear(struct xfs_trans *tp)
+{
+	ASSERT(current->journal_info == tp);
+	current->journal_info = NULL;
+	memalloc_nofs_restore(tp->t_pflags);
+}
+
+static inline bool
+xfs_trans_context_active(void)
+{
+	/* Use journal_info to indicate current is in a transaction */
+	return current->journal_info != NULL;
+}
+
 #endif	/* __XFS_TRANS_H__ */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd
  2020-11-03 13:17 ` [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd Yafang Shao
@ 2020-11-03 19:48   ` Darrick J. Wong
  2020-11-04 14:17     ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2020-11-03 19:48 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, david, hch, willy, mhocko, linux-mm, linux-fsdevel, linux-xfs

On Tue, Nov 03, 2020 at 09:17:53PM +0800, Yafang Shao wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> Since XFS needs to pretend to be kswapd in some of its worker threads,
> create methods to save & restore kswapd state.  Don't bother restoring
> kswapd state in kswapd -- the only time we reach this code is when we're
> exiting and the task_struct is about to be destroyed anyway.
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
>  include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
>  mm/vmscan.c               | 16 +---------------
>  3 files changed, 32 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 2d25bab..a04a442 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -2813,8 +2813,9 @@ struct xfs_btree_split_args {
>  {
>  	struct xfs_btree_split_args	*args = container_of(work,
>  						struct xfs_btree_split_args, work);
> +	bool			is_kswapd = args->kswapd;
>  	unsigned long		pflags;
> -	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
> +	int			memalloc_nofs;
>  
>  	/*
>  	 * we are in a transaction context here, but may also be doing work
> @@ -2822,16 +2823,17 @@ struct xfs_btree_split_args {
>  	 * temporarily to ensure that we don't block waiting for memory reclaim
>  	 * in any way.
>  	 */
> -	if (args->kswapd)
> -		new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> -
> -	current_set_flags_nested(&pflags, new_pflags);
> +	if (is_kswapd)
> +		pflags = become_kswapd();
> +	memalloc_nofs = memalloc_nofs_save();
>  
>  	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
>  					 args->key, args->curp, args->stat);
>  	complete(args->done);
>  
> -	current_restore_flags_nested(&pflags, new_pflags);
> +	memalloc_nofs_restore(memalloc_nofs);
> +	if (is_kswapd)
> +		restore_kswapd(pflags);

Note that there's a trivial merge conflict with the mrlock_t removal
series.  I'll carry the fix in the tree, assuming that everything
passes.

--D

>  }
>  
>  /*
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index d5ece7a..2faf03e 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
>  }
>  #endif
>  
> +/*
> + * Tell the memory management code that this thread is working on behalf
> + * of background memory reclaim (like kswapd).  That means that it will
> + * get access to memory reserves should it need to allocate memory in
> + * order to make forward progress.  With this great power comes great
> + * responsibility to not exhaust those reserves.
> + */
> +#define KSWAPD_PF_FLAGS		(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
> +
> +static inline unsigned long become_kswapd(void)
> +{
> +	unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
> +
> +	current->flags |= KSWAPD_PF_FLAGS;
> +
> +	return flags;
> +}
> +
> +static inline void restore_kswapd(unsigned long flags)
> +{
> +	current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
> +}
> +
>  #ifdef CONFIG_MEMCG
>  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
>  /**
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1b8f0e0..77bc1dd 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3869,19 +3869,7 @@ static int kswapd(void *p)
>  	if (!cpumask_empty(cpumask))
>  		set_cpus_allowed_ptr(tsk, cpumask);
>  
> -	/*
> -	 * Tell the memory management that we're a "memory allocator",
> -	 * and that if we need more memory we should get access to it
> -	 * regardless (see "__alloc_pages()"). "kswapd" should
> -	 * never get caught in the normal page freeing logic.
> -	 *
> -	 * (Kswapd normally doesn't need memory anyway, but sometimes
> -	 * you need a small amount of memory in order to be able to
> -	 * page out something else, and this flag essentially protects
> -	 * us from recursively trying to free more memory as we're
> -	 * trying to free the first piece of memory in the first place).
> -	 */
> -	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> +	become_kswapd();
>  	set_freezable();
>  
>  	WRITE_ONCE(pgdat->kswapd_order, 0);
> @@ -3931,8 +3919,6 @@ static int kswapd(void *p)
>  			goto kswapd_try_sleep;
>  	}
>  
> -	tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> -
>  	return 0;
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion
  2020-11-03 13:17 ` [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion Yafang Shao
@ 2020-11-04  0:16   ` Darrick J. Wong
  2020-11-04 14:11     ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2020-11-04  0:16 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, david, hch, willy, mhocko, linux-mm, linux-fsdevel, linux-xfs

On Tue, Nov 03, 2020 at 09:17:54PM +0800, Yafang Shao wrote:
> PF_FSTRANS which is used to avoid transaction reservation recursion, is
> dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> means to avoid filesystem reclaim recursion. That change is subtle.
> Let's take the exmple of the check of WARN_ON_ONCE(current->flags &
> PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to
> PF_MEMALLOC_NOFS is not proper.
> Below comment is quoted from Dave,
> > It wasn't for memory allocation recursion protection in XFS - it was for
> > transaction reservation recursion protection by something trying to flush
> > data pages while holding a transaction reservation. Doing
> > this could deadlock the journal because the existing reservation
> > could prevent the nested reservation for being able to reserve space
> > in the journal and that is a self-deadlock vector.
> > IOWs, this check is not protecting against memory reclaim recursion
> > bugs at all (that's the previous check [1]). This check is
> > protecting against the filesystem calling writepages directly from a
> > context where it can self-deadlock.
> > So what we are seeing here is that the PF_FSTRANS ->
> > PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> > about what type of error this check was protecting against.
> 
> As a result, we should reintroduce PF_FSTRANS. As current->journal_info
> isn't used in XFS, we can reuse it to indicate whehter the task is in
> fstrans or not, Per Willy. To achieve that, four new helpers are introduce
> in this patch, per Dave:
> - xfs_trans_context_set()
>   Used in xfs_trans_alloc()
> - xfs_trans_context_clear()
>   Used in xfs_trans_commit() and xfs_trans_cancel()
> - xfs_trans_context_update()
>   Used in xfs_trans_roll()
> - xfs_trans_context_active()
>   To check whehter current is in fs transcation or not
> [1]. Below check is to avoid memory reclaim recursion.
> if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
>         PF_MEMALLOC))
>         goto redirty;
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Urrrrk, I found some problems with this patch while testing.  xfs/141
blows up with:

XFS: Assertion failed: current->journal_info == tp, file:
fs/xfs/xfs_trans.h, line: 289

The call trace is very garbled, but I think it is:

+[ 1815.870749]  __xfs_trans_commit+0x4df/0x680 [xfs]
+[ 1815.871342]  xfs_symlink+0x5ec/0xac0 [xfs]
+[ 1815.871834]  ? lock_release+0x20d/0x450
+[ 1815.872280]  ? get_cached_acl+0x32/0x250
+[ 1815.872847]  xfs_vn_symlink+0x8d/0x1b0 [xfs]
+[ 1815.873742]  vfs_symlink+0xc7/0x150
+[ 1815.874356]  do_symlinkat+0x83/0x110
+[ 1815.874788]  do_syscall_64+0x31/0x40
+[ 1815.875204]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
+[ 1815.875781] RIP: 0033:0x7f2317fc6d7b


> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index c94e71f..b272d07 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -153,8 +153,6 @@
>  	int			error = 0;
>  	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
>  
> -	/* Mark this thread as being in a transaction */
> -	current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
>  
>  	/*
>  	 * Attempt to reserve the needed disk blocks by decrementing
> @@ -163,10 +161,8 @@
>  	 */
>  	if (blocks > 0) {
>  		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> -		if (error != 0) {
> -			current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +		if (error != 0)
>  			return -ENOSPC;
> -		}
>  		tp->t_blk_res += blocks;
>  	}
>  
> @@ -241,8 +237,6 @@
>  		tp->t_blk_res = 0;
>  	}
>  
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> -
>  	return error;
>  }
>  
> @@ -284,6 +278,8 @@
>  	INIT_LIST_HEAD(&tp->t_dfops);
>  	tp->t_firstblock = NULLFSBLOCK;
>  
> +	/* Mark this thread as being in a transaction */
> +	xfs_trans_context_set(tp);
>  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
>  	if (error) {
>  		xfs_trans_cancel(tp);

You're missing a xfs_trans_context_clear() call here.

> @@ -878,7 +874,8 @@
>  
>  	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
>  
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	if (!regrant)
> +		xfs_trans_context_clear(tp);
>  	xfs_trans_free(tp);
>  
>  	/*
> @@ -910,7 +907,8 @@
>  			xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
>  		tp->t_ticket = NULL;
>  	}
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +
> +	xfs_trans_context_clear(tp);
>  	xfs_trans_free_items(tp, !!error);
>  	xfs_trans_free(tp);
>  
> @@ -971,7 +969,7 @@
>  	}
>  
>  	/* mark this thread as no longer being in a transaction */
> -	current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> +	xfs_trans_context_clear(tp);
>  
>  	xfs_trans_free_items(tp, dirty);
>  	xfs_trans_free(tp);
> @@ -1013,6 +1011,7 @@
>  	if (error)
>  		return error;
>  
> +	xfs_trans_context_update(trans, *tpp);

Two bugs here: First, xfs_trans_commit freed @trans, which means that
the assertion commits a UAF error.  Second, if the transaction is
synchronous and the xfs_log_force_lsn at the bottom of
__xfs_trans_commit fails, we'll abort everything without clearing
current->journal_info or restoring the memalloc flags.

Personally I think you should just clear the context from xfs_trans_free
if current->journal_info points to the transaction being freed.  I
/think/ you could fix this with the attached patch; what do you think?

--D

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index b272d0767c87..09ae5c181299 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -67,6 +67,11 @@ xfs_trans_free(
 	xfs_extent_busy_sort(&tp->t_busy);
 	xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
 
+	/* Detach the transaction from this thread. */
+	ASSERT(current->journal_info != NULL);
+	if (current->journal_info == tp)
+		xfs_trans_context_clear(tp);
+
 	trace_xfs_trans_free(tp, _RET_IP_);
 	if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
 		sb_end_intwrite(tp->t_mountp->m_super);
@@ -119,7 +124,11 @@ xfs_trans_dup(
 
 	ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
 	tp->t_rtx_res = tp->t_rtx_res_used;
+
+	/* Associate the new transaction with this thread. */
+	ASSERT(current->journal_info == tp);
 	ntp->t_pflags = tp->t_pflags;
+	current->journal_info = ntp;
 
 	/* move deferred ops over to the new tp */
 	xfs_defer_move(ntp, tp);
@@ -874,8 +883,6 @@ __xfs_trans_commit(
 
 	xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
 
-	if (!regrant)
-		xfs_trans_context_clear(tp);
 	xfs_trans_free(tp);
 
 	/*
@@ -908,7 +915,6 @@ __xfs_trans_commit(
 		tp->t_ticket = NULL;
 	}
 
-	xfs_trans_context_clear(tp);
 	xfs_trans_free_items(tp, !!error);
 	xfs_trans_free(tp);
 
@@ -968,9 +974,6 @@ xfs_trans_cancel(
 		tp->t_ticket = NULL;
 	}
 
-	/* mark this thread as no longer being in a transaction */
-	xfs_trans_context_clear(tp);
-
 	xfs_trans_free_items(tp, dirty);
 	xfs_trans_free(tp);
 }
@@ -1011,7 +1014,6 @@ xfs_trans_roll(
 	if (error)
 		return error;
 
-	xfs_trans_context_update(trans, *tpp);
 	/*
 	 * Reserve space in the log for the next transaction.
 	 * This also pushes items in the "AIL", the list of logged items,
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index c4877afcb8b9..ceb530bf5c4b 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -276,13 +276,6 @@ xfs_trans_context_set(struct xfs_trans *tp)
 	tp->t_pflags = memalloc_nofs_save();
 }
 
-static inline void
-xfs_trans_context_update(struct xfs_trans *old, struct xfs_trans *new)
-{
-	ASSERT(current->journal_info == old);
-	current->journal_info = new;
-}
-
 static inline void
 xfs_trans_context_clear(struct xfs_trans *tp)
 {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion
  2020-11-04  0:16   ` Darrick J. Wong
@ 2020-11-04 14:11     ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2020-11-04 14:11 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andrew Morton, Dave Chinner, Christoph Hellwig, Matthew Wilcox,
	Michal Hocko, Linux MM, linux-fsdevel, linux-xfs

On Wed, Nov 4, 2020 at 8:18 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Tue, Nov 03, 2020 at 09:17:54PM +0800, Yafang Shao wrote:
> > PF_FSTRANS which is used to avoid transaction reservation recursion, is
> > dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> > PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> > memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> > means to avoid filesystem reclaim recursion. That change is subtle.
> > Let's take the exmple of the check of WARN_ON_ONCE(current->flags &
> > PF_MEMALLOC_NOFS)) to explain why this abstraction from PF_FSTRANS to
> > PF_MEMALLOC_NOFS is not proper.
> > Below comment is quoted from Dave,
> > > It wasn't for memory allocation recursion protection in XFS - it was for
> > > transaction reservation recursion protection by something trying to flush
> > > data pages while holding a transaction reservation. Doing
> > > this could deadlock the journal because the existing reservation
> > > could prevent the nested reservation for being able to reserve space
> > > in the journal and that is a self-deadlock vector.
> > > IOWs, this check is not protecting against memory reclaim recursion
> > > bugs at all (that's the previous check [1]). This check is
> > > protecting against the filesystem calling writepages directly from a
> > > context where it can self-deadlock.
> > > So what we are seeing here is that the PF_FSTRANS ->
> > > PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> > > about what type of error this check was protecting against.
> >
> > As a result, we should reintroduce PF_FSTRANS. As current->journal_info
> > isn't used in XFS, we can reuse it to indicate whehter the task is in
> > fstrans or not, Per Willy. To achieve that, four new helpers are introduce
> > in this patch, per Dave:
> > - xfs_trans_context_set()
> >   Used in xfs_trans_alloc()
> > - xfs_trans_context_clear()
> >   Used in xfs_trans_commit() and xfs_trans_cancel()
> > - xfs_trans_context_update()
> >   Used in xfs_trans_roll()
> > - xfs_trans_context_active()
> >   To check whehter current is in fs transcation or not
> > [1]. Below check is to avoid memory reclaim recursion.
> > if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> >         PF_MEMALLOC))
> >         goto redirty;
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>
> Urrrrk, I found some problems with this patch while testing.  xfs/141
> blows up with:
>
> XFS: Assertion failed: current->journal_info == tp, file:
> fs/xfs/xfs_trans.h, line: 289
>
> The call trace is very garbled, but I think it is:
>
> +[ 1815.870749]  __xfs_trans_commit+0x4df/0x680 [xfs]
> +[ 1815.871342]  xfs_symlink+0x5ec/0xac0 [xfs]
> +[ 1815.871834]  ? lock_release+0x20d/0x450
> +[ 1815.872280]  ? get_cached_acl+0x32/0x250
> +[ 1815.872847]  xfs_vn_symlink+0x8d/0x1b0 [xfs]
> +[ 1815.873742]  vfs_symlink+0xc7/0x150
> +[ 1815.874356]  do_symlinkat+0x83/0x110
> +[ 1815.874788]  do_syscall_64+0x31/0x40
> +[ 1815.875204]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +[ 1815.875781] RIP: 0033:0x7f2317fc6d7b
>
>
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index c94e71f..b272d07 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -153,8 +153,6 @@
> >       int                     error = 0;
> >       bool                    rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> >
> > -     /* Mark this thread as being in a transaction */
> > -     current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> >
> >       /*
> >        * Attempt to reserve the needed disk blocks by decrementing
> > @@ -163,10 +161,8 @@
> >        */
> >       if (blocks > 0) {
> >               error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> > -             if (error != 0) {
> > -                     current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +             if (error != 0)
> >                       return -ENOSPC;
> > -             }
> >               tp->t_blk_res += blocks;
> >       }
> >
> > @@ -241,8 +237,6 @@
> >               tp->t_blk_res = 0;
> >       }
> >
> > -     current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > -
> >       return error;
> >  }
> >
> > @@ -284,6 +278,8 @@
> >       INIT_LIST_HEAD(&tp->t_dfops);
> >       tp->t_firstblock = NULLFSBLOCK;
> >
> > +     /* Mark this thread as being in a transaction */
> > +     xfs_trans_context_set(tp);
> >       error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> >       if (error) {
> >               xfs_trans_cancel(tp);
>
> You're missing a xfs_trans_context_clear() call here.
>
> > @@ -878,7 +874,8 @@
> >
> >       xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
> >
> > -     current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +     if (!regrant)
> > +             xfs_trans_context_clear(tp);
> >       xfs_trans_free(tp);
> >
> >       /*
> > @@ -910,7 +907,8 @@
> >                       xfs_log_ticket_ungrant(mp->m_log, tp->t_ticket);
> >               tp->t_ticket = NULL;
> >       }
> > -     current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +
> > +     xfs_trans_context_clear(tp);
> >       xfs_trans_free_items(tp, !!error);
> >       xfs_trans_free(tp);
> >
> > @@ -971,7 +969,7 @@
> >       }
> >
> >       /* mark this thread as no longer being in a transaction */
> > -     current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +     xfs_trans_context_clear(tp);
> >
> >       xfs_trans_free_items(tp, dirty);
> >       xfs_trans_free(tp);
> > @@ -1013,6 +1011,7 @@
> >       if (error)
> >               return error;
> >
> > +     xfs_trans_context_update(trans, *tpp);
>
> Two bugs here: First, xfs_trans_commit freed @trans, which means that
> the assertion commits a UAF error.  Second, if the transaction is
> synchronous and the xfs_log_force_lsn at the bottom of
> __xfs_trans_commit fails, we'll abort everything without clearing
> current->journal_info or restoring the memalloc flags.
>
> Personally I think you should just clear the context from xfs_trans_free
> if current->journal_info points to the transaction being freed.  I
> /think/ you could fix this with the attached patch; what do you think?
>

Thanks for catching this issue and the fix for it.
I will run xfstests with your fix.

> --D
>
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index b272d0767c87..09ae5c181299 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -67,6 +67,11 @@ xfs_trans_free(
>         xfs_extent_busy_sort(&tp->t_busy);
>         xfs_extent_busy_clear(tp->t_mountp, &tp->t_busy, false);
>
> +       /* Detach the transaction from this thread. */
> +       ASSERT(current->journal_info != NULL);
> +       if (current->journal_info == tp)
> +               xfs_trans_context_clear(tp);
> +
>         trace_xfs_trans_free(tp, _RET_IP_);
>         if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
>                 sb_end_intwrite(tp->t_mountp->m_super);
> @@ -119,7 +124,11 @@ xfs_trans_dup(
>
>         ntp->t_rtx_res = tp->t_rtx_res - tp->t_rtx_res_used;
>         tp->t_rtx_res = tp->t_rtx_res_used;
> +
> +       /* Associate the new transaction with this thread. */
> +       ASSERT(current->journal_info == tp);
>         ntp->t_pflags = tp->t_pflags;
> +       current->journal_info = ntp;
>
>         /* move deferred ops over to the new tp */
>         xfs_defer_move(ntp, tp);
> @@ -874,8 +883,6 @@ __xfs_trans_commit(
>
>         xfs_log_commit_cil(mp, tp, &commit_lsn, regrant);
>
> -       if (!regrant)
> -               xfs_trans_context_clear(tp);
>         xfs_trans_free(tp);
>
>         /*
> @@ -908,7 +915,6 @@ __xfs_trans_commit(
>                 tp->t_ticket = NULL;
>         }
>
> -       xfs_trans_context_clear(tp);
>         xfs_trans_free_items(tp, !!error);
>         xfs_trans_free(tp);
>
> @@ -968,9 +974,6 @@ xfs_trans_cancel(
>                 tp->t_ticket = NULL;
>         }
>
> -       /* mark this thread as no longer being in a transaction */
> -       xfs_trans_context_clear(tp);
> -
>         xfs_trans_free_items(tp, dirty);
>         xfs_trans_free(tp);
>  }
> @@ -1011,7 +1014,6 @@ xfs_trans_roll(
>         if (error)
>                 return error;
>
> -       xfs_trans_context_update(trans, *tpp);
>         /*
>          * Reserve space in the log for the next transaction.
>          * This also pushes items in the "AIL", the list of logged items,
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index c4877afcb8b9..ceb530bf5c4b 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -276,13 +276,6 @@ xfs_trans_context_set(struct xfs_trans *tp)
>         tp->t_pflags = memalloc_nofs_save();
>  }
>
> -static inline void
> -xfs_trans_context_update(struct xfs_trans *old, struct xfs_trans *new)
> -{
> -       ASSERT(current->journal_info == old);
> -       current->journal_info = new;
> -}
> -
>  static inline void
>  xfs_trans_context_clear(struct xfs_trans *tp)
>  {



-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd
  2020-11-03 19:48   ` Darrick J. Wong
@ 2020-11-04 14:17     ` Yafang Shao
  2020-11-04 16:26       ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2020-11-04 14:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andrew Morton, Dave Chinner, Christoph Hellwig, Matthew Wilcox,
	Michal Hocko, Linux MM, linux-fsdevel, linux-xfs

On Wed, Nov 4, 2020 at 3:48 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Tue, Nov 03, 2020 at 09:17:53PM +0800, Yafang Shao wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> >
> > Since XFS needs to pretend to be kswapd in some of its worker threads,
> > create methods to save & restore kswapd state.  Don't bother restoring
> > kswapd state in kswapd -- the only time we reach this code is when we're
> > exiting and the task_struct is about to be destroyed anyway.
> >
> > Cc: Dave Chinner <david@fromorbit.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
> >  include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
> >  mm/vmscan.c               | 16 +---------------
> >  3 files changed, 32 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 2d25bab..a04a442 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -2813,8 +2813,9 @@ struct xfs_btree_split_args {
> >  {
> >       struct xfs_btree_split_args     *args = container_of(work,
> >                                               struct xfs_btree_split_args, work);
> > +     bool                    is_kswapd = args->kswapd;
> >       unsigned long           pflags;
> > -     unsigned long           new_pflags = PF_MEMALLOC_NOFS;
> > +     int                     memalloc_nofs;
> >
> >       /*
> >        * we are in a transaction context here, but may also be doing work
> > @@ -2822,16 +2823,17 @@ struct xfs_btree_split_args {
> >        * temporarily to ensure that we don't block waiting for memory reclaim
> >        * in any way.
> >        */
> > -     if (args->kswapd)
> > -             new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > -
> > -     current_set_flags_nested(&pflags, new_pflags);
> > +     if (is_kswapd)
> > +             pflags = become_kswapd();
> > +     memalloc_nofs = memalloc_nofs_save();
> >
> >       args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> >                                        args->key, args->curp, args->stat);
> >       complete(args->done);
> >
> > -     current_restore_flags_nested(&pflags, new_pflags);
> > +     memalloc_nofs_restore(memalloc_nofs);
> > +     if (is_kswapd)
> > +             restore_kswapd(pflags);
>
> Note that there's a trivial merge conflict with the mrlock_t removal
> series.  I'll carry the fix in the tree, assuming that everything
> passes.
>

This patchset is based on Andrew's tree currently.
Seems I should rebase this patchset on your tree instead of Andrew's tree ?

> --D
>
> >  }
> >
> >  /*
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index d5ece7a..2faf03e 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> >  }
> >  #endif
> >
> > +/*
> > + * Tell the memory management code that this thread is working on behalf
> > + * of background memory reclaim (like kswapd).  That means that it will
> > + * get access to memory reserves should it need to allocate memory in
> > + * order to make forward progress.  With this great power comes great
> > + * responsibility to not exhaust those reserves.
> > + */
> > +#define KSWAPD_PF_FLAGS              (PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
> > +
> > +static inline unsigned long become_kswapd(void)
> > +{
> > +     unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
> > +
> > +     current->flags |= KSWAPD_PF_FLAGS;
> > +
> > +     return flags;
> > +}
> > +
> > +static inline void restore_kswapd(unsigned long flags)
> > +{
> > +     current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
> > +}
> > +
> >  #ifdef CONFIG_MEMCG
> >  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> >  /**
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1b8f0e0..77bc1dd 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3869,19 +3869,7 @@ static int kswapd(void *p)
> >       if (!cpumask_empty(cpumask))
> >               set_cpus_allowed_ptr(tsk, cpumask);
> >
> > -     /*
> > -      * Tell the memory management that we're a "memory allocator",
> > -      * and that if we need more memory we should get access to it
> > -      * regardless (see "__alloc_pages()"). "kswapd" should
> > -      * never get caught in the normal page freeing logic.
> > -      *
> > -      * (Kswapd normally doesn't need memory anyway, but sometimes
> > -      * you need a small amount of memory in order to be able to
> > -      * page out something else, and this flag essentially protects
> > -      * us from recursively trying to free more memory as we're
> > -      * trying to free the first piece of memory in the first place).
> > -      */
> > -     tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > +     become_kswapd();
> >       set_freezable();
> >
> >       WRITE_ONCE(pgdat->kswapd_order, 0);
> > @@ -3931,8 +3919,6 @@ static int kswapd(void *p)
> >                       goto kswapd_try_sleep;
> >       }
> >
> > -     tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> > -
> >       return 0;
> >  }
> >
> > --
> > 1.8.3.1
> >



-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd
  2020-11-04 14:17     ` Yafang Shao
@ 2020-11-04 16:26       ` Darrick J. Wong
  2020-11-04 17:50         ` Amy Parker
  2020-11-05 13:04         ` Yafang Shao
  0 siblings, 2 replies; 10+ messages in thread
From: Darrick J. Wong @ 2020-11-04 16:26 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Dave Chinner, Christoph Hellwig, Matthew Wilcox,
	Michal Hocko, Linux MM, linux-fsdevel, linux-xfs

On Wed, Nov 04, 2020 at 10:17:16PM +0800, Yafang Shao wrote:
> On Wed, Nov 4, 2020 at 3:48 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Tue, Nov 03, 2020 at 09:17:53PM +0800, Yafang Shao wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > >
> > > Since XFS needs to pretend to be kswapd in some of its worker threads,
> > > create methods to save & restore kswapd state.  Don't bother restoring
> > > kswapd state in kswapd -- the only time we reach this code is when we're
> > > exiting and the task_struct is about to be destroyed anyway.
> > >
> > > Cc: Dave Chinner <david@fromorbit.com>
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
> > >  include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
> > >  mm/vmscan.c               | 16 +---------------
> > >  3 files changed, 32 insertions(+), 21 deletions(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > index 2d25bab..a04a442 100644
> > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > @@ -2813,8 +2813,9 @@ struct xfs_btree_split_args {
> > >  {
> > >       struct xfs_btree_split_args     *args = container_of(work,
> > >                                               struct xfs_btree_split_args, work);
> > > +     bool                    is_kswapd = args->kswapd;
> > >       unsigned long           pflags;
> > > -     unsigned long           new_pflags = PF_MEMALLOC_NOFS;
> > > +     int                     memalloc_nofs;
> > >
> > >       /*
> > >        * we are in a transaction context here, but may also be doing work
> > > @@ -2822,16 +2823,17 @@ struct xfs_btree_split_args {
> > >        * temporarily to ensure that we don't block waiting for memory reclaim
> > >        * in any way.
> > >        */
> > > -     if (args->kswapd)
> > > -             new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > > -
> > > -     current_set_flags_nested(&pflags, new_pflags);
> > > +     if (is_kswapd)
> > > +             pflags = become_kswapd();
> > > +     memalloc_nofs = memalloc_nofs_save();
> > >
> > >       args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> > >                                        args->key, args->curp, args->stat);
> > >       complete(args->done);
> > >
> > > -     current_restore_flags_nested(&pflags, new_pflags);
> > > +     memalloc_nofs_restore(memalloc_nofs);
> > > +     if (is_kswapd)
> > > +             restore_kswapd(pflags);
> >
> > Note that there's a trivial merge conflict with the mrlock_t removal
> > series.  I'll carry the fix in the tree, assuming that everything
> > passes.
> >
> 
> This patchset is based on Andrew's tree currently.
> Seems I should rebase this patchset on your tree instead of Andrew's tree ?

That depends on whether or not you want me to push these two patches
through the xfs tree or if they're going through Andrew (Morton?)'s
quiltset.

Frankly I'd rather take them via xfs since most of the diff is xfs...

--D

> > --D
> >
> > >  }
> > >
> > >  /*
> > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > index d5ece7a..2faf03e 100644
> > > --- a/include/linux/sched/mm.h
> > > +++ b/include/linux/sched/mm.h
> > > @@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> > >  }
> > >  #endif
> > >
> > > +/*
> > > + * Tell the memory management code that this thread is working on behalf
> > > + * of background memory reclaim (like kswapd).  That means that it will
> > > + * get access to memory reserves should it need to allocate memory in
> > > + * order to make forward progress.  With this great power comes great
> > > + * responsibility to not exhaust those reserves.
> > > + */
> > > +#define KSWAPD_PF_FLAGS              (PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
> > > +
> > > +static inline unsigned long become_kswapd(void)
> > > +{
> > > +     unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
> > > +
> > > +     current->flags |= KSWAPD_PF_FLAGS;
> > > +
> > > +     return flags;
> > > +}
> > > +
> > > +static inline void restore_kswapd(unsigned long flags)
> > > +{
> > > +     current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
> > > +}
> > > +
> > >  #ifdef CONFIG_MEMCG
> > >  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> > >  /**
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 1b8f0e0..77bc1dd 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -3869,19 +3869,7 @@ static int kswapd(void *p)
> > >       if (!cpumask_empty(cpumask))
> > >               set_cpus_allowed_ptr(tsk, cpumask);
> > >
> > > -     /*
> > > -      * Tell the memory management that we're a "memory allocator",
> > > -      * and that if we need more memory we should get access to it
> > > -      * regardless (see "__alloc_pages()"). "kswapd" should
> > > -      * never get caught in the normal page freeing logic.
> > > -      *
> > > -      * (Kswapd normally doesn't need memory anyway, but sometimes
> > > -      * you need a small amount of memory in order to be able to
> > > -      * page out something else, and this flag essentially protects
> > > -      * us from recursively trying to free more memory as we're
> > > -      * trying to free the first piece of memory in the first place).
> > > -      */
> > > -     tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > > +     become_kswapd();
> > >       set_freezable();
> > >
> > >       WRITE_ONCE(pgdat->kswapd_order, 0);
> > > @@ -3931,8 +3919,6 @@ static int kswapd(void *p)
> > >                       goto kswapd_try_sleep;
> > >       }
> > >
> > > -     tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> > > -
> > >       return 0;
> > >  }
> > >
> > > --
> > > 1.8.3.1
> > >
> 
> 
> 
> -- 
> Thanks
> Yafang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd
  2020-11-04 16:26       ` Darrick J. Wong
@ 2020-11-04 17:50         ` Amy Parker
  2020-11-05 13:04         ` Yafang Shao
  1 sibling, 0 replies; 10+ messages in thread
From: Amy Parker @ 2020-11-04 17:50 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Yafang Shao, Andrew Morton, Dave Chinner, Christoph Hellwig,
	Matthew Wilcox, Michal Hocko, Linux MM, linux-fsdevel, linux-xfs

On Wed, Nov 4, 2020 at 8:30 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Wed, Nov 04, 2020 at 10:17:16PM +0800, Yafang Shao wrote:
> > On Wed, Nov 4, 2020 at 3:48 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > >
> > > On Tue, Nov 03, 2020 at 09:17:53PM +0800, Yafang Shao wrote:
> > > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > >
> > > > Since XFS needs to pretend to be kswapd in some of its worker threads,
> > > > create methods to save & restore kswapd state.  Don't bother restoring
> > > > kswapd state in kswapd -- the only time we reach this code is when we're
> > > > exiting and the task_struct is about to be destroyed anyway.
> > > >
> > > > Cc: Dave Chinner <david@fromorbit.com>
> > > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
> > > >  include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
> > > >  mm/vmscan.c               | 16 +---------------
> > > >  3 files changed, 32 insertions(+), 21 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > > index 2d25bab..a04a442 100644
> > > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > > @@ -2813,8 +2813,9 @@ struct xfs_btree_split_args {
> > > >  {
> > > >       struct xfs_btree_split_args     *args = container_of(work,
> > > >                                               struct xfs_btree_split_args, work);
> > > > +     bool                    is_kswapd = args->kswapd;
> > > >       unsigned long           pflags;
> > > > -     unsigned long           new_pflags = PF_MEMALLOC_NOFS;
> > > > +     int                     memalloc_nofs;
> > > >
> > > >       /*
> > > >        * we are in a transaction context here, but may also be doing work
> > > > @@ -2822,16 +2823,17 @@ struct xfs_btree_split_args {
> > > >        * temporarily to ensure that we don't block waiting for memory reclaim
> > > >        * in any way.
> > > >        */
> > > > -     if (args->kswapd)
> > > > -             new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > > > -
> > > > -     current_set_flags_nested(&pflags, new_pflags);
> > > > +     if (is_kswapd)
> > > > +             pflags = become_kswapd();
> > > > +     memalloc_nofs = memalloc_nofs_save();
> > > >
> > > >       args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> > > >                                        args->key, args->curp, args->stat);
> > > >       complete(args->done);
> > > >
> > > > -     current_restore_flags_nested(&pflags, new_pflags);
> > > > +     memalloc_nofs_restore(memalloc_nofs);
> > > > +     if (is_kswapd)
> > > > +             restore_kswapd(pflags);
> > >
> > > Note that there's a trivial merge conflict with the mrlock_t removal
> > > series.  I'll carry the fix in the tree, assuming that everything
> > > passes.
> > >
> >
> > This patchset is based on Andrew's tree currently.
> > Seems I should rebase this patchset on your tree instead of Andrew's tree ?
>
> That depends on whether or not you want me to push these two patches
> through the xfs tree or if they're going through Andrew (Morton?)'s
> quiltset.

I think they could go through either, but the XFS tree might be ever-so-slightly
better.

>
> Frankly I'd rather take them via xfs since most of the diff is xfs...

Yeah, it's an XFS patch so let's throw it through the XFS tree.

>
> --D
>
> > > --D
> > >
> > > >  }
> > > >
> > > >  /*
> > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > > index d5ece7a..2faf03e 100644
> > > > --- a/include/linux/sched/mm.h
> > > > +++ b/include/linux/sched/mm.h
> > > > @@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> > > >  }
> > > >  #endif
> > > >
> > > > +/*
> > > > + * Tell the memory management code that this thread is working on behalf
> > > > + * of background memory reclaim (like kswapd).  That means that it will
> > > > + * get access to memory reserves should it need to allocate memory in
> > > > + * order to make forward progress.  With this great power comes great
> > > > + * responsibility to not exhaust those reserves.
> > > > + */
> > > > +#define KSWAPD_PF_FLAGS              (PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
> > > > +
> > > > +static inline unsigned long become_kswapd(void)
> > > > +{
> > > > +     unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
> > > > +
> > > > +     current->flags |= KSWAPD_PF_FLAGS;
> > > > +
> > > > +     return flags;
> > > > +}
> > > > +
> > > > +static inline void restore_kswapd(unsigned long flags)
> > > > +{
> > > > +     current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
> > > > +}
> > > > +
> > > >  #ifdef CONFIG_MEMCG
> > > >  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> > > >  /**
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 1b8f0e0..77bc1dd 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -3869,19 +3869,7 @@ static int kswapd(void *p)
> > > >       if (!cpumask_empty(cpumask))
> > > >               set_cpus_allowed_ptr(tsk, cpumask);
> > > >
> > > > -     /*
> > > > -      * Tell the memory management that we're a "memory allocator",
> > > > -      * and that if we need more memory we should get access to it
> > > > -      * regardless (see "__alloc_pages()"). "kswapd" should
> > > > -      * never get caught in the normal page freeing logic.
> > > > -      *
> > > > -      * (Kswapd normally doesn't need memory anyway, but sometimes
> > > > -      * you need a small amount of memory in order to be able to
> > > > -      * page out something else, and this flag essentially protects
> > > > -      * us from recursively trying to free more memory as we're
> > > > -      * trying to free the first piece of memory in the first place).
> > > > -      */
> > > > -     tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > > > +     become_kswapd();
> > > >       set_freezable();
> > > >
> > > >       WRITE_ONCE(pgdat->kswapd_order, 0);
> > > > @@ -3931,8 +3919,6 @@ static int kswapd(void *p)
> > > >                       goto kswapd_try_sleep;
> > > >       }
> > > >
> > > > -     tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> > > > -
> > > >       return 0;
> > > >  }
> > > >
> > > > --
> > > > 1.8.3.1
> > > >
> >
> >
> >
> > --
> > Thanks
> > Yafang

Best regards,
Amy Parker
(they/them)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd
  2020-11-04 16:26       ` Darrick J. Wong
  2020-11-04 17:50         ` Amy Parker
@ 2020-11-05 13:04         ` Yafang Shao
  1 sibling, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2020-11-05 13:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andrew Morton, Dave Chinner, Christoph Hellwig, Matthew Wilcox,
	Michal Hocko, Linux MM, linux-fsdevel, linux-xfs

On Thu, Nov 5, 2020 at 12:26 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Wed, Nov 04, 2020 at 10:17:16PM +0800, Yafang Shao wrote:
> > On Wed, Nov 4, 2020 at 3:48 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > >
> > > On Tue, Nov 03, 2020 at 09:17:53PM +0800, Yafang Shao wrote:
> > > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > > >
> > > > Since XFS needs to pretend to be kswapd in some of its worker threads,
> > > > create methods to save & restore kswapd state.  Don't bother restoring
> > > > kswapd state in kswapd -- the only time we reach this code is when we're
> > > > exiting and the task_struct is about to be destroyed anyway.
> > > >
> > > > Cc: Dave Chinner <david@fromorbit.com>
> > > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_btree.c | 14 ++++++++------
> > > >  include/linux/sched/mm.h  | 23 +++++++++++++++++++++++
> > > >  mm/vmscan.c               | 16 +---------------
> > > >  3 files changed, 32 insertions(+), 21 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > > index 2d25bab..a04a442 100644
> > > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > > @@ -2813,8 +2813,9 @@ struct xfs_btree_split_args {
> > > >  {
> > > >       struct xfs_btree_split_args     *args = container_of(work,
> > > >                                               struct xfs_btree_split_args, work);
> > > > +     bool                    is_kswapd = args->kswapd;
> > > >       unsigned long           pflags;
> > > > -     unsigned long           new_pflags = PF_MEMALLOC_NOFS;
> > > > +     int                     memalloc_nofs;
> > > >
> > > >       /*
> > > >        * we are in a transaction context here, but may also be doing work
> > > > @@ -2822,16 +2823,17 @@ struct xfs_btree_split_args {
> > > >        * temporarily to ensure that we don't block waiting for memory reclaim
> > > >        * in any way.
> > > >        */
> > > > -     if (args->kswapd)
> > > > -             new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > > > -
> > > > -     current_set_flags_nested(&pflags, new_pflags);
> > > > +     if (is_kswapd)
> > > > +             pflags = become_kswapd();
> > > > +     memalloc_nofs = memalloc_nofs_save();
> > > >
> > > >       args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> > > >                                        args->key, args->curp, args->stat);
> > > >       complete(args->done);
> > > >
> > > > -     current_restore_flags_nested(&pflags, new_pflags);
> > > > +     memalloc_nofs_restore(memalloc_nofs);
> > > > +     if (is_kswapd)
> > > > +             restore_kswapd(pflags);
> > >
> > > Note that there's a trivial merge conflict with the mrlock_t removal
> > > series.  I'll carry the fix in the tree, assuming that everything
> > > passes.
> > >
> >
> > This patchset is based on Andrew's tree currently.
> > Seems I should rebase this patchset on your tree instead of Andrew's tree ?
>
> That depends on whether or not you want me to push these two patches
> through the xfs tree or if they're going through Andrew (Morton?)'s
> quiltset.
>
> Frankly I'd rather take them via xfs since most of the diff is xfs...
>

Sure, I will rebase in on the xfs tree.

> > >
> > > >  }
> > > >
> > > >  /*
> > > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > > index d5ece7a..2faf03e 100644
> > > > --- a/include/linux/sched/mm.h
> > > > +++ b/include/linux/sched/mm.h
> > > > @@ -278,6 +278,29 @@ static inline void memalloc_nocma_restore(unsigned int flags)
> > > >  }
> > > >  #endif
> > > >
> > > > +/*
> > > > + * Tell the memory management code that this thread is working on behalf
> > > > + * of background memory reclaim (like kswapd).  That means that it will
> > > > + * get access to memory reserves should it need to allocate memory in
> > > > + * order to make forward progress.  With this great power comes great
> > > > + * responsibility to not exhaust those reserves.
> > > > + */
> > > > +#define KSWAPD_PF_FLAGS              (PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD)
> > > > +
> > > > +static inline unsigned long become_kswapd(void)
> > > > +{
> > > > +     unsigned long flags = current->flags & KSWAPD_PF_FLAGS;
> > > > +
> > > > +     current->flags |= KSWAPD_PF_FLAGS;
> > > > +
> > > > +     return flags;
> > > > +}
> > > > +
> > > > +static inline void restore_kswapd(unsigned long flags)
> > > > +{
> > > > +     current->flags &= ~(flags ^ KSWAPD_PF_FLAGS);
> > > > +}
> > > > +
> > > >  #ifdef CONFIG_MEMCG
> > > >  DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> > > >  /**
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 1b8f0e0..77bc1dd 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -3869,19 +3869,7 @@ static int kswapd(void *p)
> > > >       if (!cpumask_empty(cpumask))
> > > >               set_cpus_allowed_ptr(tsk, cpumask);
> > > >
> > > > -     /*
> > > > -      * Tell the memory management that we're a "memory allocator",
> > > > -      * and that if we need more memory we should get access to it
> > > > -      * regardless (see "__alloc_pages()"). "kswapd" should
> > > > -      * never get caught in the normal page freeing logic.
> > > > -      *
> > > > -      * (Kswapd normally doesn't need memory anyway, but sometimes
> > > > -      * you need a small amount of memory in order to be able to
> > > > -      * page out something else, and this flag essentially protects
> > > > -      * us from recursively trying to free more memory as we're
> > > > -      * trying to free the first piece of memory in the first place).
> > > > -      */
> > > > -     tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > > > +     become_kswapd();
> > > >       set_freezable();
> > > >
> > > >       WRITE_ONCE(pgdat->kswapd_order, 0);
> > > > @@ -3931,8 +3919,6 @@ static int kswapd(void *p)
> > > >                       goto kswapd_try_sleep;
> > > >       }
> > > >
> > > > -     tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
> > > > -
> > > >       return 0;
> > > >  }
> > > >
> > > > --
> > > > 1.8.3.1
> > > >
> >
> >
> >
> > --
> > Thanks
> > Yafang



-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-11-05 13:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03 13:17 [PATCH v8 resend 0/2] avoid xfs transaction reservation recursion Yafang Shao
2020-11-03 13:17 ` [PATCH v8 resend 1/2] mm: Add become_kswapd and restore_kswapd Yafang Shao
2020-11-03 19:48   ` Darrick J. Wong
2020-11-04 14:17     ` Yafang Shao
2020-11-04 16:26       ` Darrick J. Wong
2020-11-04 17:50         ` Amy Parker
2020-11-05 13:04         ` Yafang Shao
2020-11-03 13:17 ` [PATCH v8 resend 2/2] xfs: avoid transaction reservation recursion Yafang Shao
2020-11-04  0:16   ` Darrick J. Wong
2020-11-04 14:11     ` Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).