* [PATCH v2 0/3] Ignore non-LRU-based reclaim in memcg reclaim
From: Yosry Ahmed @ 2023-03-09  9:31 UTC (permalink / raw)
  To: Alexander Viro, Darrick J. Wong, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

Upon running some proactive reclaim tests using memory.reclaim, we
noticed some tests flaking where writing to memory.reclaim would be
successful even though we did not reclaim the requested amount fully.
Looking further into it, I discovered that *sometimes* we over-report
the number of reclaimed pages in memcg reclaim.
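
For context, the flaking tests boil down to something like the following
minimal sketch (the cgroup path and the requested amount are made up; the
real tests do more setup around this). Writing to memory.reclaim is
expected to fail with EAGAIN when the kernel cannot reclaim the full
requested amount, so a successful write that did not actually free that
much memory is exactly the flake we were chasing:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical cgroup; the real tests create and populate their own. */
	const char *path = "/sys/fs/cgroup/test/memory.reclaim";
	const char *request = "104857600";	/* ask to reclaim 100M */
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return 1;

	/*
	 * The write fails with EAGAIN if the kernel could not reclaim the
	 * full amount; success is supposed to mean the request was met.
	 */
	if (write(fd, request, strlen(request)) < 0)
		perror("memory.reclaim");
	else
		printf("kernel reports the requested amount was reclaimed\n");

	close(fd);
	return 0;
}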

Pages reclaimed through means other than LRU-based reclaim are tracked
via reclaim_state in struct scan_control, which is stashed in the
current task_struct. These pages are added to the number of pages
reclaimed through the LRUs. For memcg reclaim, these pages generally
cannot be linked to the memcg under reclaim, so they can inflate the
reported count of reclaimed pages. This short series tries to address
that.
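
For reference, this is roughly what the accounting looks like today (a
condensed sketch of the existing shrink_node() logic in mm/vmscan.c,
with everything unrelated stripped out; not the exact code):

static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
{
	struct reclaim_state *reclaim_state = current->reclaim_state;

	shrink_node_memcgs(pgdat, sc);

	/*
	 * reclaimed_slab is bumped by the slab allocators, inode pruning
	 * and xfs buffer freeing via current->reclaim_state, and is folded
	 * into sc->nr_reclaimed unconditionally, even when sc targets a
	 * single memcg and the freed pages were never charged to it.
	 */
	if (reclaim_state) {
		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
		reclaim_state->reclaimed_slab = 0;
	}
}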

Patches 1-2 are just refactoring: they add helpers that wrap some
operations on current->reclaim_state, and rename
reclaim_state->reclaimed_slab to reclaim_state->reclaimed.

Patch 3 ignores pages reclaimed outside of LRU reclaim in memcg
reclaim. The pages are uncharged anyway, so even if we end up
under-reporting reclaimed pages, we will still make progress during
charging.
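
To illustrate why under-reporting is fine, memcg charging already drives
reclaim in a retry loop, roughly like the following (a heavily condensed,
illustrative sketch; try_charge_sketch is a made-up name, the helper
names are real but the way they are composed here is simplified compared
to the actual mm/memcontrol.c charge path):

static int try_charge_sketch(struct mem_cgroup *memcg, gfp_t gfp_mask,
			     unsigned int nr_pages)
{
	int retries = MAX_RECLAIM_RETRIES;
	struct page_counter *fail;

	while (retries--) {
		/*
		 * Pages freed outside the LRUs are uncharged no matter what
		 * reclaim reports, so a later attempt simply sees the room.
		 */
		if (page_counter_try_charge(&memcg->memory, nr_pages, &fail))
			return 0;

		/* An underestimated return value only costs us a retry. */
		if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
						  gfp_mask, 0))
			break;
	}
	return -ENOMEM;
}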

Do not let the diff stat deceive you: the core of this series is patch
3, which changes a single line of code. All the rest is refactoring and
one huge comment.

v1 -> v2:
- Renamed report_freed_pages() to mm_account_reclaimed_pages(), as
  suggested by Dave Chinner. There were discussions about leaving the
  current->reclaim_state update open-coded, as it is not worth hiding
  the current dereference just to remove one line, but I'd rather have
  the logic contained within mm/vmscan.c so that the next person who
  changes this logic doesn't have to touch 7 different files.
- Renamed add_non_vmscan_reclaimed() to flush_reclaim_state() (Johannes
  Weiner).
- Added more context about how this problem was found in the cover
  letter (Johannes Weiner).
- Added a patch to move set_task_reclaim_state() below the definition of
  cgroup_reclaim(), and added additional helpers in the same position.
  This way all the helpers for reclaim_state live together, and there is
  no need to declare cgroup_reclaim() early or move its definition
  around to call it from flush_reclaim_state(). This should also fix the
  build error reported by the bot in !CONFIG_MEMCG.

RFC -> v1:
- Exported report_freed_pages() in case XFS is built as a module (Matthew
  Wilcox).
- Renamed reclaimed_slab to reclaimed in previously missed MGLRU code.
- Refactored the use of reclaim_state to update sc->nr_reclaimed into a
  helper, and added an XL comment explaining why we ignore
  reclaim_state->reclaimed in memcg reclaim (Johannes Weiner).

Yosry Ahmed (3):
  mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim()
  mm: vmscan: refactor updating reclaimed pages in reclaim_state
  mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

 fs/inode.c           |  3 +-
 fs/xfs/xfs_buf.c     |  3 +-
 include/linux/swap.h |  5 ++-
 mm/slab.c            |  3 +-
 mm/slob.c            |  6 +--
 mm/slub.c            |  5 +--
 mm/vmscan.c          | 88 +++++++++++++++++++++++++++++++++++---------
 7 files changed, 81 insertions(+), 32 deletions(-)

-- 
2.40.0.rc0.216.gc4246ad0f0-goog



* [PATCH v2 1/3] mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim()
From: Yosry Ahmed @ 2023-03-09  9:31 UTC (permalink / raw)
  To: Alexander Viro, Darrick J. Wong, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

set_task_reclaim_state() is currently defined in mm/vmscan.c above
an #ifdef CONFIG_MEMCG block where cgroup_reclaim() is defined. We are
about to add some more helpers that operate on reclaim_state, and will
need to use cgroup_reclaim(). Move set_task_reclaim_state() after
the #ifdef CONFIG_MEMCG block containing the definition of
cgroup_reclaim() to keep helpers operating on reclaim_state together.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/vmscan.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c1c5e8b24b8..fef7d1c0f82b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -188,18 +188,6 @@ struct scan_control {
  */
 int vm_swappiness = 60;
 
-static void set_task_reclaim_state(struct task_struct *task,
-				   struct reclaim_state *rs)
-{
-	/* Check for an overwrite */
-	WARN_ON_ONCE(rs && task->reclaim_state);
-
-	/* Check for the nulling of an already-nulled member */
-	WARN_ON_ONCE(!rs && !task->reclaim_state);
-
-	task->reclaim_state = rs;
-}
-
 LIST_HEAD(shrinker_list);
 DECLARE_RWSEM(shrinker_rwsem);
 
@@ -511,6 +499,18 @@ static bool writeback_throttling_sane(struct scan_control *sc)
 }
 #endif
 
+static void set_task_reclaim_state(struct task_struct *task,
+				   struct reclaim_state *rs)
+{
+	/* Check for an overwrite */
+	WARN_ON_ONCE(rs && task->reclaim_state);
+
+	/* Check for the nulling of an already-nulled member */
+	WARN_ON_ONCE(!rs && !task->reclaim_state);
+
+	task->reclaim_state = rs;
+}
+
 static long xchg_nr_deferred(struct shrinker *shrinker,
 			     struct shrink_control *sc)
 {
-- 
2.40.0.rc0.216.gc4246ad0f0-goog



* [PATCH v2 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state
From: Yosry Ahmed @ 2023-03-09  9:31 UTC (permalink / raw)
  To: Alexander Viro, Darrick J. Wong, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

During reclaim, pages reclaimed through means other than LRU-based
reclaim are tracked in scan_control->reclaim_state->reclaimed_slab; a
pointer to reclaim_state is stashed in the current task_struct.

However, we track more than just reclaimed slab pages through this. It
also covers clean file pages dropped through pruned inodes, and freed
xfs buffer pages. Rename reclaimed_slab to reclaimed, and add a helper
function that wraps updating it through current, so that future
changes to this logic are contained within mm/vmscan.c.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 fs/inode.c           |  3 +--
 fs/xfs/xfs_buf.c     |  3 +--
 include/linux/swap.h |  5 ++++-
 mm/slab.c            |  3 +--
 mm/slob.c            |  6 ++----
 mm/slub.c            |  5 ++---
 mm/vmscan.c          | 36 ++++++++++++++++++++++++++++++------
 7 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 4558dc2f1355..e60fcc41faf1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -864,8 +864,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 				__count_vm_events(KSWAPD_INODESTEAL, reap);
 			else
 				__count_vm_events(PGINODESTEAL, reap);
-			if (current->reclaim_state)
-				current->reclaim_state->reclaimed_slab += reap;
+			mm_account_reclaimed_pages(reap);
 		}
 		iput(inode);
 		spin_lock(lru_lock);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 54c774af6e1c..060079f1e966 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -286,8 +286,7 @@ xfs_buf_free_pages(
 		if (bp->b_pages[i])
 			__free_page(bp->b_pages[i]);
 	}
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += bp->b_page_count;
+	report_freed_pages(bp->b_page_count);
 
 	if (bp->b_pages != bp->b_page_array)
 		kmem_free(bp->b_pages);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 209a425739a9..589ea2731931 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -153,13 +153,16 @@ union swap_header {
  * memory reclaim
  */
 struct reclaim_state {
-	unsigned long reclaimed_slab;
+	/* pages reclaimed outside of LRU-based reclaim */
+	unsigned long reclaimed;
 #ifdef CONFIG_LRU_GEN
 	/* per-thread mm walk data */
 	struct lru_gen_mm_walk *mm_walk;
 #endif
 };
 
+void mm_account_reclaimed_pages(unsigned long pages);
+
 #ifdef __KERNEL__
 
 struct address_space;
diff --git a/mm/slab.c b/mm/slab.c
index dabc2a671fc6..64bf1de817b2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1392,8 +1392,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
 	smp_wmb();
 	__folio_clear_slab(folio);
 
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
+	mm_account_reclaimed_pages(1 << order);
 	unaccount_slab(slab, order, cachep);
 	__free_pages(&folio->page, order);
 }
diff --git a/mm/slob.c b/mm/slob.c
index fe567fcfa3a3..79cc8680c973 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -61,7 +61,7 @@
 #include <linux/slab.h>
 
 #include <linux/mm.h>
-#include <linux/swap.h> /* struct reclaim_state */
+#include <linux/swap.h> /* mm_account_reclaimed_pages() */
 #include <linux/cache.h>
 #include <linux/init.h>
 #include <linux/export.h>
@@ -211,9 +211,7 @@ static void slob_free_pages(void *b, int order)
 {
 	struct page *sp = virt_to_page(b);
 
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
-
+	mm_account_reclaimed_pages(1 << order);
 	mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
 			    -(PAGE_SIZE << order));
 	__free_pages(sp, order);
diff --git a/mm/slub.c b/mm/slub.c
index 39327e98fce3..7aa30eef8235 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -11,7 +11,7 @@
  */
 
 #include <linux/mm.h>
-#include <linux/swap.h> /* struct reclaim_state */
+#include <linux/swap.h> /* mm_account_reclaimed_pages() */
 #include <linux/module.h>
 #include <linux/bit_spinlock.h>
 #include <linux/interrupt.h>
@@ -2063,8 +2063,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
 	/* Make the mapping reset visible before clearing the flag */
 	smp_wmb();
 	__folio_clear_slab(folio);
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += pages;
+	mm_account_reclaimed_pages(pages);
 	unaccount_slab(slab, order, s);
 	__free_pages(&folio->page, order);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fef7d1c0f82b..a3e38851b34a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -511,6 +511,34 @@ static void set_task_reclaim_state(struct task_struct *task,
 	task->reclaim_state = rs;
 }
 
+/*
+ * mm_account_reclaimed_pages(): account reclaimed pages outside of LRU-based
+ * reclaim
+ * @pages: number of pages reclaimed
+ *
+ * If the current process is undergoing a reclaim operation, increment the
+ * number of reclaimed pages by @pages.
+ */
+void mm_account_reclaimed_pages(unsigned long pages)
+{
+	if (current->reclaim_state)
+		current->reclaim_state->reclaimed += pages;
+}
+EXPORT_SYMBOL(mm_account_reclaimed_pages);
+
+/*
+ * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
+ * scan_control->nr_reclaimed.
+ */
+static void flush_reclaim_state(struct scan_control *sc,
+				struct reclaim_state *rs)
+{
+	if (rs) {
+		sc->nr_reclaimed += rs->reclaimed;
+		rs->reclaimed = 0;
+	}
+}
+
 static long xchg_nr_deferred(struct shrinker *shrinker,
 			     struct shrink_control *sc)
 {
@@ -5346,8 +5374,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 		vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
 			   sc->nr_reclaimed - reclaimed);
 
-	sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
-	current->reclaim_state->reclaimed_slab = 0;
+	flush_reclaim_state(sc, current->reclaim_state);
 
 	return success ? MEMCG_LRU_YOUNG : 0;
 }
@@ -6472,10 +6499,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 
 	shrink_node_memcgs(pgdat, sc);
 
-	if (reclaim_state) {
-		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
-		reclaim_state->reclaimed_slab = 0;
-	}
+	flush_reclaim_state(sc, reclaim_state);
 
 	/* Record the subtree's reclaim efficiency */
 	if (!sc->proactive)
-- 
2.40.0.rc0.216.gc4246ad0f0-goog



* [PATCH v2 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
From: Yosry Ahmed @ 2023-03-09  9:31 UTC (permalink / raw)
  To: Alexander Viro, Darrick J. Wong, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

We keep track of different types of reclaimed pages through
reclaim_state->reclaimed, and we add them to the reported number of
reclaimed pages. For non-memcg reclaim, this makes sense. For memcg
reclaim, we have no clue if those pages are charged to the memcg under
reclaim.

Slab pages are shared by different memcgs, so a freed slab page may
have only been partially charged to the memcg under reclaim. The same
goes for clean file pages from pruned inodes (on highmem systems) and
xfs buffer pages; there is currently no simple way to link them to the
memcg under reclaim.

Stop reporting those freed pages as reclaimed pages during memcg
reclaim. This should make the return value of writing to
memory.reclaim more accurate, and may help reduce unnecessary reclaim
retries during memcg charging.

Generally, this should make the return value of
try_to_free_mem_cgroup_pages() more accurate. In some limited cases
(e.g. when a freed slab page was mostly charged to the memcg under
reclaim), the return value of try_to_free_mem_cgroup_pages() can be
underestimated, but this should be fine. The freed pages will be
uncharged anyway, and we can charge the memcg the next time around as
we usually do memcg reclaim in a retry loop.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/vmscan.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a3e38851b34a..bf9d8e175e92 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -533,7 +533,35 @@ EXPORT_SYMBOL(mm_account_reclaimed_pages);
 static void flush_reclaim_state(struct scan_control *sc,
 				struct reclaim_state *rs)
 {
-	if (rs) {
+	/*
+	 * Currently, reclaim_state->reclaimed includes three types of pages
+	 * freed outside of vmscan:
+	 * (1) Slab pages.
+	 * (2) Clean file pages from pruned inodes.
+	 * (3) XFS freed buffer pages.
+	 *
+	 * For all of these cases, we have no way of finding out whether these
+	 * pages were related to the memcg under reclaim. For example, a freed
+	 * slab page could have had only a single object charged to the memcg
+	 * under reclaim. Also, populated inodes are not on shrinker LRUs
+	 * anymore except on highmem systems.
+	 *
+	 * Instead of over-reporting the reclaimed pages in a memcg reclaim,
+	 * only count such pages in system-wide reclaim. This prevents
+	 * unnecessary retries during memcg charging and false positives from
+	 * proactive reclaim (memory.reclaim).
+	 *
+	 * For uncommon cases where the freed pages were actually significantly
+	 * charged to the memcg under reclaim, and we end up under-reporting, it
+	 * should be fine. The freed pages will be uncharged anyway, even if
+	 * they are not reported properly, and we will be able to make forward
+	 * progress in charging (which is usually in a retry loop).
+	 *
+	 * We can go one step further, and report the uncharged objcg pages in
+	 * memcg reclaim, to make reporting more accurate and reduce
+	 * under-reporting, but it's probably not worth the complexity for now.
+	 */
+	if (rs && !cgroup_reclaim(sc)) {
 		sc->nr_reclaimed += rs->reclaimed;
 		rs->reclaimed = 0;
 	}
-- 
2.40.0.rc0.216.gc4246ad0f0-goog



* Re: [PATCH v2 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state
From: Yosry Ahmed @ 2023-03-09  9:39 UTC (permalink / raw)
  To: Alexander Viro, Darrick J. Wong, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm

On Thu, Mar 9, 2023 at 1:31 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> During reclaim, pages reclaimed through means other than LRU-based
> reclaim are tracked in scan_control->reclaim_state->reclaimed_slab; a
> pointer to reclaim_state is stashed in the current task_struct.
>
> However, we track more than just reclaimed slab pages through this. It
> also covers clean file pages dropped through pruned inodes, and freed
> xfs buffer pages. Rename reclaimed_slab to reclaimed, and add a helper
> function that wraps updating it through current, so that future
> changes to this logic are contained within mm/vmscan.c.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  fs/inode.c           |  3 +--
>  fs/xfs/xfs_buf.c     |  3 +--
>  include/linux/swap.h |  5 ++++-
>  mm/slab.c            |  3 +--
>  mm/slob.c            |  6 ++----
>  mm/slub.c            |  5 ++---
>  mm/vmscan.c          | 36 ++++++++++++++++++++++++++++++------
>  7 files changed, 41 insertions(+), 20 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 4558dc2f1355..e60fcc41faf1 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -864,8 +864,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
>                                 __count_vm_events(KSWAPD_INODESTEAL, reap);
>                         else
>                                 __count_vm_events(PGINODESTEAL, reap);
> -                       if (current->reclaim_state)
> -                               current->reclaim_state->reclaimed_slab += reap;
> +                       mm_account_reclaimed_pages(reap);
>                 }
>                 iput(inode);
>                 spin_lock(lru_lock);
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 54c774af6e1c..060079f1e966 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -286,8 +286,7 @@ xfs_buf_free_pages(
>                 if (bp->b_pages[i])
>                         __free_page(bp->b_pages[i]);
>         }
> -       if (current->reclaim_state)
> -               current->reclaim_state->reclaimed_slab += bp->b_page_count;
> +       report_freed_pages(bp->b_page_count);


Ugh, I missed updating this one to mm_account_reclaimed_pages().

This fixup needs to be squashed here. I will include it in v3 if a
respin is needed; otherwise, I hope Andrew can squash it in.

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 060079f1e966..15d1e5a7c2d3 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -286,7 +286,7 @@ xfs_buf_free_pages(
                if (bp->b_pages[i])
                        __free_page(bp->b_pages[i]);
        }
-       report_freed_pages(bp->b_page_count);
+       mm_account_reclaimed_pages(bp->b_page_count);

        if (bp->b_pages != bp->b_page_array)
                kmem_free(bp->b_pages);



* Re: [PATCH v2 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state
From: kernel test robot @ 2023-03-10  2:37 UTC (permalink / raw)
  To: Yosry Ahmed, Alexander Viro, Darrick J. Wong, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao
  Cc: llvm, oe-kbuild-all, linux-fsdevel, linux-kernel, linux-xfs,
	linux-mm, Yosry Ahmed

Hi Yosry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on linus/master v6.3-rc1 next-20230309]
[cannot apply to akpm-mm/mm-everything vbabka-slab/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-vmscan-move-set_task_reclaim_state-after-cgroup_reclaim/20230309-173354
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link:    https://lore.kernel.org/r/20230309093109.3039327-3-yosryahmed%40google.com
patch subject: [PATCH v2 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state
config: powerpc-g5_defconfig (https://download.01.org/0day-ci/archive/20230310/202303101037.13QFIjZH-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 67409911353323ca5edf2049ef0df54132fa1ca7)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/63df60b27250f3f7a2892f99f27258b60a6e8e13
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yosry-Ahmed/mm-vmscan-move-set_task_reclaim_state-after-cgroup_reclaim/20230309-173354
        git checkout 63df60b27250f3f7a2892f99f27258b60a6e8e13
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/kernel/ drivers/net/usb/ fs/xfs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303101037.13QFIjZH-lkp@intel.com/

All errors (new ones prefixed by >>):

>> fs/xfs/xfs_buf.c:289:2: error: call to undeclared function 'report_freed_pages'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           report_freed_pages(bp->b_page_count);
           ^
   fs/xfs/xfs_buf.c:289:2: note: did you mean 'mempool_free_pages'?
   include/linux/mempool.h:102:6: note: 'mempool_free_pages' declared here
   void mempool_free_pages(void *element, void *pool_data);
        ^
   1 error generated.


vim +/report_freed_pages +289 fs/xfs/xfs_buf.c

   273	
   274	static void
   275	xfs_buf_free_pages(
   276		struct xfs_buf	*bp)
   277	{
   278		uint		i;
   279	
   280		ASSERT(bp->b_flags & _XBF_PAGES);
   281	
   282		if (xfs_buf_is_vmapped(bp))
   283			vm_unmap_ram(bp->b_addr, bp->b_page_count);
   284	
   285		for (i = 0; i < bp->b_page_count; i++) {
   286			if (bp->b_pages[i])
   287				__free_page(bp->b_pages[i]);
   288		}
 > 289		report_freed_pages(bp->b_page_count);
   290	
   291		if (bp->b_pages != bp->b_page_array)
   292			kmem_free(bp->b_pages);
   293		bp->b_pages = NULL;
   294		bp->b_flags &= ~_XBF_PAGES;
   295	}
   296	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [PATCH v2 0/3] Ignore non-LRU-based reclaim in memcg reclaim
From: Yosry Ahmed @ 2023-03-23  4:05 UTC (permalink / raw)
  To: Alexander Viro, Darrick J. Wong, Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm

Any thoughts on this respin?



