linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v3 0/3] Ignore non-LRU-based reclaim in memcg reclaim
@ 2023-03-31  7:08 Yosry Ahmed
  2023-03-31  7:08 ` [PATCH v3 1/3] mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim() Yosry Ahmed
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Yosry Ahmed @ 2023-03-31  7:08 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao, Dave Chinner
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

Upon running some proactive reclaim tests using memory.reclaim, we
noticed some tests flaking: writing to memory.reclaim would succeed
even though we had not fully reclaimed the requested amount. Looking
further into it, I discovered that *sometimes* we over-report the
number of reclaimed pages in memcg reclaim.
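
To see why over-reporting translates into a spuriously successful write:
the memory.reclaim write handler keeps calling
try_to_free_mem_cgroup_pages() until the reported total reaches the
requested amount. Roughly (a paraphrased sketch of the loop in
mm/memcontrol.c, not a verbatim copy):

/* Paraphrased sketch of the memory.reclaim write handler loop. */
while (nr_reclaimed < nr_to_reclaim) {
	unsigned long reclaimed;

	if (signal_pending(current))
		return -EINTR;

	reclaimed = try_to_free_mem_cgroup_pages(memcg,
				nr_to_reclaim - nr_reclaimed,
				GFP_KERNEL, reclaim_options);

	if (!reclaimed && !nr_retries--)
		return -EAGAIN;

	/* An inflated 'reclaimed' ends the loop too early. */
	nr_reclaimed += reclaimed;
}
return nbytes;

If try_to_free_mem_cgroup_pages() over-reports, nr_reclaimed hits
nr_to_reclaim before the requested amount has actually been reclaimed,
and the write returns success anyway.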

Pages reclaimed through means other than LRU-based reclaim are tracked
through reclaim_state in struct scan_control, which is stashed in the
current task_struct. These pages are added to the number of pages
reclaimed through the LRUs. For memcg reclaim, these pages generally
cannot be linked to the memcg under reclaim, so they can inflate the
reported count of reclaimed pages. This short series addresses that.
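
Concretely, the pre-series accounting path (condensed from the hunks in
the patches below; nr_freed stands in for whatever page count each
caller has at hand) looks like this:

/* Callers outside vmscan (slab, pruned inodes, XFS buffers) do: */
if (current->reclaim_state)
	current->reclaim_state->reclaimed_slab += nr_freed;

/* ... and shrink_node() later folds that into the LRU reclaim total: */
if (reclaim_state) {
	sc->nr_reclaimed += reclaim_state->reclaimed_slab;
	reclaim_state->reclaimed_slab = 0;
}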

Patches 1-2 are just refactoring: they add helpers that wrap some
operations on current->reclaim_state, and rename
reclaim_state->reclaimed_slab to reclaim_state->reclaimed.

Patch 3 ignores pages reclaimed outside of LRU reclaim in memcg reclaim.
The pages are uncharged anyway, so even if we end up under-reporting
reclaimed pages, we will still succeed in making progress during
charging.
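
For illustration, memcg charging retries along these lines (a simplified
sketch of the retry pattern, not the actual try_charge() code; arguments
are approximate):

/* Simplified sketch of the memcg charge retry pattern. */
while (nr_retries--) {
	if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
		return 0;	/* the freed (uncharged) pages made room */

	/* The return value only affects bookkeeping, not the uncharging. */
	try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, 0);
}
return -ENOMEM;

Even if the reclaim total is under-reported, the freed pages have
already been uncharged, so a subsequent charge attempt can still fit.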

Do not let the diff stat deceive you: the core of this series is patch 3,
which has a one-line code change. All the rest is refactoring and one
huge comment.

v2 -> v3:
- Fixed a compilation problem in patch 2 reported by the bot.
- Rebased on top of v6.3-rc2.

v1 -> v2:
- Renamed report_freed_pages() to mm_account_reclaimed_pages(), as
  suggested by Dave Chinner. There were discussions about leaving the
  current->reclaim_state update open-coded, since it's not worth hiding
  the current dereference just to remove one line, but I'd rather have
  the logic contained within mm/vmscan.c so that the next person who
  changes this logic doesn't have to touch 7 different files.
- Renamed add_non_vmscan_reclaimed() to flush_reclaim_state() (Johannes
  Weiner).
- Added more context about how this problem was found in the cover
  letter (Johannes Weiner).
- Added a patch to move set_task_reclaim_state() below the definition of
  cgroup_reclaim(), and added additional helpers in the same position.
  This way all the helpers for reclaim_state live together, and there is
  no need to declare cgroup_reclaim() early or move its definition
  around to call it from flush_reclaim_state(). This should also fix the
  build error reported by the bot in !CONFIG_MEMCG.

RFC -> v1:
- Exported report_freed_pages() in case XFS is built as a module (Matthew
  Wilcox).
- Renamed reclaimed_slab to reclaimed in previously missed MGLRU code.
- Refactored using reclaim_state to update sc->nr_reclaimed into a
  helper and added an XL comment explaining why we ignore
  reclaim_state->reclaimed in memcg reclaim (Johannes Weiner).

Yosry Ahmed (3):
  mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim()
  mm: vmscan: refactor updating reclaimed pages in reclaim_state
  mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

 fs/inode.c           |  3 +-
 fs/xfs/xfs_buf.c     |  3 +-
 include/linux/swap.h |  5 ++-
 mm/slab.c            |  3 +-
 mm/slob.c            |  6 +--
 mm/slub.c            |  5 +--
 mm/vmscan.c          | 88 +++++++++++++++++++++++++++++++++++---------
 7 files changed, 81 insertions(+), 32 deletions(-)

-- 
2.40.0.348.gf938b09366-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 1/3] mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim()
  2023-03-31  7:08 [PATCH v3 0/3] Ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
@ 2023-03-31  7:08 ` Yosry Ahmed
  2023-03-31  7:08 ` [PATCH v3 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state Yosry Ahmed
  2023-03-31  7:08 ` [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
  2 siblings, 0 replies; 8+ messages in thread
From: Yosry Ahmed @ 2023-03-31  7:08 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao, Dave Chinner
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

set_task_reclaim_state() is currently defined in mm/vmscan.c above
an #ifdef CONFIG_MEMCG block where cgroup_reclaim() is defined. We are
about to add some more helpers that operate on reclaim_state, and will
need to use cgroup_reclaim(). Move set_task_reclaim_state() after
the #ifdef CONFIG_MEMCG block containing the definition of
cgroup_reclaim() to keep helpers operating on reclaim_state together.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/vmscan.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c1c5e8b24b8f..fef7d1c0f82b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -188,18 +188,6 @@ struct scan_control {
  */
 int vm_swappiness = 60;
 
-static void set_task_reclaim_state(struct task_struct *task,
-				   struct reclaim_state *rs)
-{
-	/* Check for an overwrite */
-	WARN_ON_ONCE(rs && task->reclaim_state);
-
-	/* Check for the nulling of an already-nulled member */
-	WARN_ON_ONCE(!rs && !task->reclaim_state);
-
-	task->reclaim_state = rs;
-}
-
 LIST_HEAD(shrinker_list);
 DECLARE_RWSEM(shrinker_rwsem);
 
@@ -511,6 +499,18 @@ static bool writeback_throttling_sane(struct scan_control *sc)
 }
 #endif
 
+static void set_task_reclaim_state(struct task_struct *task,
+				   struct reclaim_state *rs)
+{
+	/* Check for an overwrite */
+	WARN_ON_ONCE(rs && task->reclaim_state);
+
+	/* Check for the nulling of an already-nulled member */
+	WARN_ON_ONCE(!rs && !task->reclaim_state);
+
+	task->reclaim_state = rs;
+}
+
 static long xchg_nr_deferred(struct shrinker *shrinker,
 			     struct shrink_control *sc)
 {
-- 
2.40.0.348.gf938b09366-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state
  2023-03-31  7:08 [PATCH v3 0/3] Ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
  2023-03-31  7:08 ` [PATCH v3 1/3] mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim() Yosry Ahmed
@ 2023-03-31  7:08 ` Yosry Ahmed
  2023-03-31 20:51   ` Dave Chinner
  2023-03-31  7:08 ` [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
  2 siblings, 1 reply; 8+ messages in thread
From: Yosry Ahmed @ 2023-03-31  7:08 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao, Dave Chinner
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

During reclaim, we keep track of pages reclaimed through means other
than LRU-based reclaim via scan_control->reclaim_state->reclaimed_slab;
a pointer to the reclaim_state is stashed in the current task_struct.

However, we track more than just reclaimed slab pages this way: we also
use it for clean file pages dropped through pruned inodes and for freed
XFS buffer pages. Rename reclaimed_slab to reclaimed, and add a helper
function that wraps updating it through current, so that future changes
to this logic are contained within mm/vmscan.c.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 fs/inode.c           |  3 +--
 fs/xfs/xfs_buf.c     |  3 +--
 include/linux/swap.h |  5 ++++-
 mm/slab.c            |  3 +--
 mm/slob.c            |  6 ++----
 mm/slub.c            |  5 ++---
 mm/vmscan.c          | 36 ++++++++++++++++++++++++++++++------
 7 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 4558dc2f13557..e60fcc41faf17 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -864,8 +864,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 				__count_vm_events(KSWAPD_INODESTEAL, reap);
 			else
 				__count_vm_events(PGINODESTEAL, reap);
-			if (current->reclaim_state)
-				current->reclaim_state->reclaimed_slab += reap;
+			mm_account_reclaimed_pages(reap);
 		}
 		iput(inode);
 		spin_lock(lru_lock);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 54c774af6e1c6..15d1e5a7c2d34 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -286,8 +286,7 @@ xfs_buf_free_pages(
 		if (bp->b_pages[i])
 			__free_page(bp->b_pages[i]);
 	}
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += bp->b_page_count;
+	mm_account_reclaimed_pages(bp->b_page_count);
 
 	if (bp->b_pages != bp->b_page_array)
 		kmem_free(bp->b_pages);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 209a425739a9f..589ea2731931f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -153,13 +153,16 @@ union swap_header {
  * memory reclaim
  */
 struct reclaim_state {
-	unsigned long reclaimed_slab;
+	/* pages reclaimed outside of LRU-based reclaim */
+	unsigned long reclaimed;
 #ifdef CONFIG_LRU_GEN
 	/* per-thread mm walk data */
 	struct lru_gen_mm_walk *mm_walk;
 #endif
 };
 
+void mm_account_reclaimed_pages(unsigned long pages);
+
 #ifdef __KERNEL__
 
 struct address_space;
diff --git a/mm/slab.c b/mm/slab.c
index dabc2a671fc6f..64bf1de817b24 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1392,8 +1392,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
 	smp_wmb();
 	__folio_clear_slab(folio);
 
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
+	mm_account_reclaimed_pages(1 << order);
 	unaccount_slab(slab, order, cachep);
 	__free_pages(&folio->page, order);
 }
diff --git a/mm/slob.c b/mm/slob.c
index fe567fcfa3a39..79cc8680c973c 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -61,7 +61,7 @@
 #include <linux/slab.h>
 
 #include <linux/mm.h>
-#include <linux/swap.h> /* struct reclaim_state */
+#include <linux/swap.h> /* mm_account_reclaimed_pages() */
 #include <linux/cache.h>
 #include <linux/init.h>
 #include <linux/export.h>
@@ -211,9 +211,7 @@ static void slob_free_pages(void *b, int order)
 {
 	struct page *sp = virt_to_page(b);
 
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += 1 << order;
-
+	mm_account_reclaimed_pages(1 << order);
 	mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
 			    -(PAGE_SIZE << order));
 	__free_pages(sp, order);
diff --git a/mm/slub.c b/mm/slub.c
index 39327e98fce34..7aa30eef82350 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -11,7 +11,7 @@
  */
 
 #include <linux/mm.h>
-#include <linux/swap.h> /* struct reclaim_state */
+#include <linux/swap.h> /* mm_account_reclaimed_pages() */
 #include <linux/module.h>
 #include <linux/bit_spinlock.h>
 #include <linux/interrupt.h>
@@ -2063,8 +2063,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
 	/* Make the mapping reset visible before clearing the flag */
 	smp_wmb();
 	__folio_clear_slab(folio);
-	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += pages;
+	mm_account_reclaimed_pages(pages);
 	unaccount_slab(slab, order, s);
 	__free_pages(&folio->page, order);
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fef7d1c0f82b2..a3e38851b34ac 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -511,6 +511,34 @@ static void set_task_reclaim_state(struct task_struct *task,
 	task->reclaim_state = rs;
 }
 
+/*
+ * mm_account_reclaimed_pages(): account reclaimed pages outside of LRU-based
+ * reclaim
+ * @pages: number of pages reclaimed
+ *
+ * If the current process is undergoing a reclaim operation, increment the
+ * number of reclaimed pages by @pages.
+ */
+void mm_account_reclaimed_pages(unsigned long pages)
+{
+	if (current->reclaim_state)
+		current->reclaim_state->reclaimed += pages;
+}
+EXPORT_SYMBOL(mm_account_reclaimed_pages);
+
+/*
+ * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
+ * scan_control->nr_reclaimed.
+ */
+static void flush_reclaim_state(struct scan_control *sc,
+				struct reclaim_state *rs)
+{
+	if (rs) {
+		sc->nr_reclaimed += rs->reclaimed;
+		rs->reclaimed = 0;
+	}
+}
+
 static long xchg_nr_deferred(struct shrinker *shrinker,
 			     struct shrink_control *sc)
 {
@@ -5346,8 +5374,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 		vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
 			   sc->nr_reclaimed - reclaimed);
 
-	sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
-	current->reclaim_state->reclaimed_slab = 0;
+	flush_reclaim_state(sc, current->reclaim_state);
 
 	return success ? MEMCG_LRU_YOUNG : 0;
 }
@@ -6472,10 +6499,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 
 	shrink_node_memcgs(pgdat, sc);
 
-	if (reclaim_state) {
-		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
-		reclaim_state->reclaimed_slab = 0;
-	}
+	flush_reclaim_state(sc, reclaim_state);
 
 	/* Record the subtree's reclaim efficiency */
 	if (!sc->proactive)
-- 
2.40.0.348.gf938b09366-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
  2023-03-31  7:08 [PATCH v3 0/3] Ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
  2023-03-31  7:08 ` [PATCH v3 1/3] mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim() Yosry Ahmed
  2023-03-31  7:08 ` [PATCH v3 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state Yosry Ahmed
@ 2023-03-31  7:08 ` Yosry Ahmed
  2023-03-31  7:25   ` Yu Zhao
  2 siblings, 1 reply; 8+ messages in thread
From: Yosry Ahmed @ 2023-03-31  7:08 UTC (permalink / raw)
  To: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao, Dave Chinner
  Cc: linux-fsdevel, linux-kernel, linux-xfs, linux-mm, Yosry Ahmed

We keep track of different types of reclaimed pages through
reclaim_state->reclaimed, and add them to the reported number of
reclaimed pages. For non-memcg reclaim, this makes sense. For memcg
reclaim, however, we have no clue whether those pages are charged to the
memcg under reclaim.

Slab pages are shared by different memcgs, so a freed slab page may have
been only partially charged to the memcg under reclaim. The same goes
for clean file pages from pruned inodes (on highmem systems) and XFS
buffer pages: there is currently no simple way to link them to the memcg
under reclaim.

Stop reporting those freed pages as reclaimed pages during memcg
reclaim. This should make the return value of writing to memory.reclaim
more accurate, and may help reduce unnecessary reclaim retries during
memcg charging.

Generally, this should make the return value of
try_to_free_mem_cgroup_pages() more accurate. In some limited cases
(e.g. when we free a slab page that was mostly charged to the memcg
under reclaim), the return value of try_to_free_mem_cgroup_pages() can
be underestimated, but that should be fine. The freed pages will be
uncharged anyway, and we can charge the memcg the next time around, as
we usually do memcg reclaim in a retry loop.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/vmscan.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a3e38851b34ac..bf9d8e175e92a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -533,7 +533,35 @@ EXPORT_SYMBOL(mm_account_reclaimed_pages);
 static void flush_reclaim_state(struct scan_control *sc,
 				struct reclaim_state *rs)
 {
-	if (rs) {
+	/*
+	 * Currently, reclaim_state->reclaimed includes three types of pages
+	 * freed outside of vmscan:
+	 * (1) Slab pages.
+	 * (2) Clean file pages from pruned inodes.
+	 * (3) XFS freed buffer pages.
+	 *
+	 * For all of these cases, we have no way of finding out whether these
+	 * pages were related to the memcg under reclaim. For example, a freed
+	 * slab page could have had only a single object charged to the memcg
+	 * under reclaim. Also, populated inodes are not on shrinker LRUs
+	 * anymore except on highmem systems.
+	 *
+	 * Instead of over-reporting the reclaimed pages in a memcg reclaim,
+	 * only count such pages in system-wide reclaim. This prevents
+	 * unnecessary retries during memcg charging and false positives from
+	 * proactive reclaim (memory.reclaim).
+	 *
+	 * For uncommon cases where the freed pages were actually significantly
+	 * charged to the memcg under reclaim, and we end up under-reporting, it
+	 * should be fine. The freed pages will be uncharged anyway, even if
+	 * they are not reported properly, and we will be able to make forward
+	 * progress in charging (which is usually in a retry loop).
+	 *
+	 * We can go one step further, and report the uncharged objcg pages in
+	 * memcg reclaim, to make reporting more accurate and reduce
+	 * under-reporting, but it's probably not worth the complexity for now.
+	 */
+	if (rs && !cgroup_reclaim(sc)) {
 		sc->nr_reclaimed += rs->reclaimed;
 		rs->reclaimed = 0;
 	}
-- 
2.40.0.348.gf938b09366-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
  2023-03-31  7:08 ` [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
@ 2023-03-31  7:25   ` Yu Zhao
  2023-03-31  7:30     ` Yosry Ahmed
  0 siblings, 1 reply; 8+ messages in thread
From: Yu Zhao @ 2023-03-31  7:25 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Dave Chinner,
	linux-fsdevel, linux-kernel, linux-xfs, linux-mm

On Fri, Mar 31, 2023 at 1:08 AM Yosry Ahmed <yosryahmed@google.com> wrote:

...

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a3e38851b34ac..bf9d8e175e92a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -533,7 +533,35 @@ EXPORT_SYMBOL(mm_account_reclaimed_pages);
>  static void flush_reclaim_state(struct scan_control *sc,
>                                 struct reclaim_state *rs)
>  {
> -       if (rs) {
> +       /*
> +        * Currently, reclaim_state->reclaimed includes three types of pages
> +        * freed outside of vmscan:
> +        * (1) Slab pages.
> +        * (2) Clean file pages from pruned inodes.
> +        * (3) XFS freed buffer pages.
> +        *
> +        * For all of these cases, we have no way of finding out whether these
> +        * pages were related to the memcg under reclaim. For example, a freed
> +        * slab page could have had only a single object charged to the memcg
> +        * under reclaim. Also, populated inodes are not on shrinker LRUs
> +        * anymore except on highmem systems.
> +        *
> +        * Instead of over-reporting the reclaimed pages in a memcg reclaim,
> +        * only count such pages in system-wide reclaim. This prevents
> +        * unnecessary retries during memcg charging and false positive from
> +        * proactive reclaim (memory.reclaim).

What happens when writing to the root memory.reclaim?

> +        *
> +        * For uncommon cases were the freed pages were actually significantly
> +        * charged to the memcg under reclaim, and we end up under-reporting, it
> +        * should be fine. The freed pages will be uncharged anyway, even if
> +        * they are not reported properly, and we will be able to make forward
> +        * progress in charging (which is usually in a retry loop).
> +        *
> +        * We can go one step further, and report the uncharged objcg pages in
> +        * memcg reclaim, to make reporting more accurate and reduce
> +        * under-reporting, but it's probably not worth the complexity for now.
> +        */
> +       if (rs && !cgroup_reclaim(sc)) {

To answer the question above, global_reclaim() would be preferred.
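
For context, the two helpers differ on root-cgroup reclaim, which is why
root memory.reclaim is the interesting case. A rough paraphrase of their
mm/vmscan.c definitions around v6.3 (not verbatim):

static bool cgroup_reclaim(struct scan_control *sc)
{
	return sc->target_mem_cgroup;
}

static bool global_reclaim(struct scan_control *sc)
{
	return !sc->target_mem_cgroup ||
	       mem_cgroup_is_root(sc->target_mem_cgroup);
}

With !cgroup_reclaim(sc), reclaim targeting the root memcg would also
stop counting these pages, even though everything in the system is
charged to the root; global_reclaim(sc) would keep counting them there.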

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
  2023-03-31  7:25   ` Yu Zhao
@ 2023-03-31  7:30     ` Yosry Ahmed
  0 siblings, 0 replies; 8+ messages in thread
From: Yosry Ahmed @ 2023-03-31  7:30 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Dave Chinner,
	linux-fsdevel, linux-kernel, linux-xfs, linux-mm

On Fri, Mar 31, 2023 at 12:25 AM Yu Zhao <yuzhao@google.com> wrote:
>
> On Fri, Mar 31, 2023 at 1:08 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> ...
>
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a3e38851b34ac..bf9d8e175e92a 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -533,7 +533,35 @@ EXPORT_SYMBOL(mm_account_reclaimed_pages);
> >  static void flush_reclaim_state(struct scan_control *sc,
> >                                 struct reclaim_state *rs)
> >  {
> > -       if (rs) {
> > +       /*
> > +        * Currently, reclaim_state->reclaimed includes three types of pages
> > +        * freed outside of vmscan:
> > +        * (1) Slab pages.
> > +        * (2) Clean file pages from pruned inodes.
> > +        * (3) XFS freed buffer pages.
> > +        *
> > +        * For all of these cases, we have no way of finding out whether these
> > +        * pages were related to the memcg under reclaim. For example, a freed
> > +        * slab page could have had only a single object charged to the memcg
> > +        * under reclaim. Also, populated inodes are not on shrinker LRUs
> > +        * anymore except on highmem systems.
> > +        *
> > +        * Instead of over-reporting the reclaimed pages in a memcg reclaim,
> > +        * only count such pages in system-wide reclaim. This prevents
> > +        * unnecessary retries during memcg charging and false positive from
> > +        * proactive reclaim (memory.reclaim).
>
> What happens when writing to the root memory.reclaim?
>
> > +        *
> > +        * For uncommon cases were the freed pages were actually significantly
> > +        * charged to the memcg under reclaim, and we end up under-reporting, it
> > +        * should be fine. The freed pages will be uncharged anyway, even if
> > +        * they are not reported properly, and we will be able to make forward
> > +        * progress in charging (which is usually in a retry loop).
> > +        *
> > +        * We can go one step further, and report the uncharged objcg pages in
> > +        * memcg reclaim, to make reporting more accurate and reduce
> > +        * under-reporting, but it's probably not worth the complexity for now.
> > +        */
> > +       if (rs && !cgroup_reclaim(sc)) {
>
> To answer the question above, global_reclaim() would be preferred.

Great point, global_reclaim() is fairly recent. I didn't see it
before. Thanks for pointing it out. I will change it for v4 -- will
wait for more feedback before respinning.

Thanks Yu!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state
  2023-03-31  7:08 ` [PATCH v3 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state Yosry Ahmed
@ 2023-03-31 20:51   ` Dave Chinner
  2023-03-31 22:11     ` Yosry Ahmed
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2023-03-31 20:51 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao, linux-fsdevel,
	linux-kernel, linux-xfs, linux-mm

On Fri, Mar 31, 2023 at 07:08:17AM +0000, Yosry Ahmed wrote:
> During reclaim, we keep track of pages reclaimed from other means than
> LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab,
> which we stash a pointer to in current task_struct.
> 
> However, we keep track of more than just reclaimed slab pages through
> this. We also use it for clean file pages dropped through pruned inodes,
> and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add
> a helper function that wraps updating it through current, so that future
> changes to this logic are contained within mm/vmscan.c.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
.....
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fef7d1c0f82b2..a3e38851b34ac 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -511,6 +511,34 @@ static void set_task_reclaim_state(struct task_struct *task,
>  	task->reclaim_state = rs;
>  }
>  
> +/*
> + * mm_account_reclaimed_pages(): account reclaimed pages outside of LRU-based
> + * reclaim
> + * @pages: number of pages reclaimed
> + *
> + * If the current process is undergoing a reclaim operation, increment the
> + * number of reclaimed pages by @pages.
> + */
> +void mm_account_reclaimed_pages(unsigned long pages)
> +{
> +	if (current->reclaim_state)
> +		current->reclaim_state->reclaimed += pages;
> +}
> +EXPORT_SYMBOL(mm_account_reclaimed_pages);

Shouldn't this be a static inline in a header file?

Then you don't need an EXPORT_SYMBOL() - which should really be
EXPORT_SYMBOL_GPL() - and callers don't add the overhead of a
function call for two lines of code....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state
  2023-03-31 20:51   ` Dave Chinner
@ 2023-03-31 22:11     ` Yosry Ahmed
  0 siblings, 0 replies; 8+ messages in thread
From: Yosry Ahmed @ 2023-03-31 22:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, Alexander Viro, Darrick J. Wong,
	Christoph Lameter, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, Matthew Wilcox (Oracle),
	Miaohe Lin, David Hildenbrand, Johannes Weiner, Peter Xu,
	NeilBrown, Shakeel Butt, Michal Hocko, Yu Zhao, linux-fsdevel,
	linux-kernel, linux-xfs, linux-mm

On Fri, Mar 31, 2023 at 1:51 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Mar 31, 2023 at 07:08:17AM +0000, Yosry Ahmed wrote:
> > During reclaim, we keep track of pages reclaimed from other means than
> > LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab,
> > which we stash a pointer to in current task_struct.
> >
> > However, we keep track of more than just reclaimed slab pages through
> > this. We also use it for clean file pages dropped through pruned inodes,
> > and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add
> > a helper function that wraps updating it through current, so that future
> > changes to this logic are contained within mm/vmscan.c.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> .....
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index fef7d1c0f82b2..a3e38851b34ac 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -511,6 +511,34 @@ static void set_task_reclaim_state(struct task_struct *task,
> >       task->reclaim_state = rs;
> >  }
> >
> > +/*
> > + * mm_account_reclaimed_pages(): account reclaimed pages outside of LRU-based
> > + * reclaim
> > + * @pages: number of pages reclaimed
> > + *
> > + * If the current process is undergoing a reclaim operation, increment the
> > + * number of reclaimed pages by @pages.
> > + */
> > +void mm_account_reclaimed_pages(unsigned long pages)
> > +{
> > +     if (current->reclaim_state)
> > +             current->reclaim_state->reclaimed += pages;
> > +}
> > +EXPORT_SYMBOL(mm_account_reclaimed_pages);
>
> Shouldn't this be a static inline in a header file?
>
> Then you don't need an EXPORT_SYMBOL() - which should really be
> EXPORT_SYMBOL_GPL() - and callers don't add the overhead of a
> function call for two lines of code....

Yes it should be. Thanks.

An earlier draft of this series had more going on in this function, so
I moved it to mm/vmscan.c, then I forgot to move it to the header
later. Will do that for v4.
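
A rough sketch of the inline variant being discussed, as it might look
in include/linux/swap.h (hypothetical, not what was posted; it would
also need current's reclaim_state pointer in scope, e.g. via
linux/sched.h):

/* Hypothetical inline variant for include/linux/swap.h. */
static inline void mm_account_reclaimed_pages(unsigned long pages)
{
	if (current->reclaim_state)
		current->reclaim_state->reclaimed += pages;
}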

>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-03-31 22:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-31  7:08 [PATCH v3 0/3] Ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
2023-03-31  7:08 ` [PATCH v3 1/3] mm: vmscan: move set_task_reclaim_state() after cgroup_reclaim() Yosry Ahmed
2023-03-31  7:08 ` [PATCH v3 2/3] mm: vmscan: refactor updating reclaimed pages in reclaim_state Yosry Ahmed
2023-03-31 20:51   ` Dave Chinner
2023-03-31 22:11     ` Yosry Ahmed
2023-03-31  7:08 ` [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim Yosry Ahmed
2023-03-31  7:25   ` Yu Zhao
2023-03-31  7:30     ` Yosry Ahmed
