All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] mm: memcg: fix protection of reclaim target memcg
@ 2022-11-23  9:21 Yosry Ahmed
  2022-11-23  9:21 ` [PATCH v2 1/3] mm: memcg: fix stale " Yosry Ahmed
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-23  9:21 UTC (permalink / raw)
  To: Shakeel Butt, Roman Gushchin, Johannes Weiner, Michal Hocko,
	Yu Zhao, Muchun Song
  Cc: Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm, Yosry Ahmed

This series fixes a bug in calculating the protection of the reclaim
target memcg where we end up using stale effective protection values from
the last reclaim operation, instead of completely ignoring the
protection of the reclaim target as intended. More detailed explanation
and examples in patch 1, which includes the fix.
Patches 2 & 3 introduce a selftest case that catches the bug.

v1 -> v2:
- Instead of adding a new helper, extended
  mem_cgroup_supports_protection() to check if the current memcg is the
  target memcg, renamed to mem_cgroup_unprotected() which is much easier
  to reason about (suggested by Roman).
- Add a selftest case to catch the bug (suggested by Roman).

Yosry Ahmed (3):
  mm: memcg: fix stale protection of reclaim target memcg
  selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  selftests: cgroup: make sure reclaim target memcg is unprotected

 include/linux/memcontrol.h                    | 31 ++++--
 mm/vmscan.c                                   | 11 ++-
 .../selftests/cgroup/test_memcontrol.c        | 96 ++++++++++++-------
 3 files changed, 87 insertions(+), 51 deletions(-)

-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] mm: memcg: fix stale protection of reclaim target memcg
  2022-11-23  9:21 [PATCH v2 0/3] mm: memcg: fix protection of reclaim target memcg Yosry Ahmed
@ 2022-11-23  9:21 ` Yosry Ahmed
  2022-11-24  0:40   ` Roman Gushchin
  2022-11-23  9:21 ` [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until() Yosry Ahmed
  2022-11-23  9:21 ` [PATCH v2 3/3] selftests: cgroup: make sure reclaim target memcg is unprotected Yosry Ahmed
  2 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-23  9:21 UTC (permalink / raw)
  To: Shakeel Butt, Roman Gushchin, Johannes Weiner, Michal Hocko,
	Yu Zhao, Muchun Song
  Cc: Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm, Yosry Ahmed

During reclaim, mem_cgroup_calculate_protection() is used to determine
the effective protection (emin and elow) values of a memcg. The
protection of the reclaim target is ignored, but we cannot set their
effective protection to 0 due to a limitation of the current
implementation (see comment in mem_cgroup_protection()). Instead,
we leave their effective protection values unchaged, and later ignore it
in mem_cgroup_protection().

However, mem_cgroup_protection() is called later in
shrink_lruvec()->get_scan_count(), which is after the
mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
result, the stale effective protection values of the target memcg may
lead us to skip reclaiming from the target memcg entirely, before
calling shrink_lruvec(). This can be even worse with recursive
protection, where the stale target memcg protection can be higher than
its standalone protection. See two examples below (a similar version of
example (a) is added to test_memcontrol in a later patch).

(a) A simple example with proactive reclaim is as follows. Consider the
following hierarchy:
ROOT
 |
 A
 |
 B (memory.min = 10M)

Consider the following scenario:
- B has memory.current = 10M.
- The system undergoes global reclaim (or memcg reclaim in A).
- In shrink_node_memcgs():
  - mem_cgroup_calculate_protection() calculates the effective min (emin)
    of B as 10M.
  - mem_cgroup_below_min() returns true for B, we do not reclaim from B.
- Now if we want to reclaim 5M from B using proactive reclaim
  (memory.reclaim), we should be able to, as the protection of the
  target memcg should be ignored.
- In shrink_node_memcgs():
  - mem_cgroup_calculate_protection() immediately returns for B without
    doing anything, as B is the target memcg, relying on
    mem_cgroup_protection() to ignore B's stale effective min (still 10M).
  - mem_cgroup_below_min() reads the stale effective min for B and we
    skip it instead of ignoring its protection as intended, as we never
    reach mem_cgroup_protection().

(b) An more complex example with recursive protection is as follows.
Consider the following hierarchy with memory_recursiveprot:
ROOT
 |
 A (memory.min = 50M)
 |
 B (memory.min = 10M, memory.high = 40M)

Consider the following scenario:
- B has memory.current = 35M.
- The system undergoes global reclaim (target memcg is NULL).
- B will have an effective min of 50M (all of A's unclaimed protection).
- B will not be reclaimed from.
- Now allocate 10M more memory in B, pushing it above it's high limit.
- The system undergoes memcg reclaim from B (target memcg is B).
- Like example (a), we do nothing in mem_cgroup_calculate_protection(),
  then call mem_cgroup_below_min(), which will read the stale effective
  min for B (50M) and skip it. In this case, it's even worse because we
  are not just considering B's standalone protection (10M), but we are
  reading a much higher stale protection (50M) which will cause us to not
  reclaim from B at all.

This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
e{low,min} state mutations from protection checks") which made
mem_cgroup_calculate_protection() only change the state without
returning any value. Before that commit, we used to return
MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
mem_cgroup_below_{min/low}() checks. After that commit we do not return
anything and we end up checking the min & low effective protections for
the target memcg, which are stale.

Update mem_cgroup_supports_protection() to also check if we are
reclaiming from the target, and rename it to mem_cgroup_unprotected()
(now returns true if we should not protect the memcg, much simpler logic).

Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/memcontrol.h | 31 +++++++++++++++++++++----------
 mm/vmscan.c                | 11 ++++++-----
 2 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e1644a24009c..d3c8203cab6c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -615,28 +615,32 @@ static inline void mem_cgroup_protection(struct mem_cgroup *root,
 void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 				     struct mem_cgroup *memcg);
 
-static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
+					  struct mem_cgroup *memcg)
 {
 	/*
 	 * The root memcg doesn't account charges, and doesn't support
-	 * protection.
+	 * protection. The target memcg's protection is ignored, see
+	 * mem_cgroup_calculate_protection() and mem_cgroup_protection()
 	 */
-	return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg);
-
+	return mem_cgroup_disabled() || mem_cgroup_is_root(memcg) ||
+		memcg == target;
 }
 
-static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
+					struct mem_cgroup *memcg)
 {
-	if (!mem_cgroup_supports_protection(memcg))
+	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
 	return READ_ONCE(memcg->memory.elow) >=
 		page_counter_read(&memcg->memory);
 }
 
-static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
+					struct mem_cgroup *memcg)
 {
-	if (!mem_cgroup_supports_protection(memcg))
+	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
 	return READ_ONCE(memcg->memory.emin) >=
@@ -1209,12 +1213,19 @@ static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 {
 }
 
-static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
+					  struct mem_cgroup *memcg)
+{
+	return true;
+}
+static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
+					struct mem_cgroup *memcg)
 {
 	return false;
 }
 
-static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
+static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
+					struct mem_cgroup *memcg)
 {
 	return false;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 04d8b88e5216..79ef0fe67518 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4486,7 +4486,7 @@ static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned
 
 	mem_cgroup_calculate_protection(NULL, memcg);
 
-	if (mem_cgroup_below_min(memcg))
+	if (mem_cgroup_below_min(NULL, memcg))
 		return false;
 
 	need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, swappiness, &nr_to_scan);
@@ -5047,8 +5047,9 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
 	DEFINE_MAX_SEQ(lruvec);
 	DEFINE_MIN_SEQ(lruvec);
 
-	if (mem_cgroup_below_min(memcg) ||
-	    (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
+	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg) ||
+	    (mem_cgroup_below_low(sc->target_mem_cgroup, memcg) &&
+	     !sc->memcg_low_reclaim))
 		return 0;
 
 	*need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, can_swap, &nr_to_scan);
@@ -6048,13 +6049,13 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
-		if (mem_cgroup_below_min(memcg)) {
+		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
 			 * If there is no reclaimable memory, OOM.
 			 */
 			continue;
-		} else if (mem_cgroup_below_low(memcg)) {
+		} else if (mem_cgroup_below_low(target_memcg, memcg)) {
 			/*
 			 * Soft protection.
 			 * Respect the protection only as long as
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  2022-11-23  9:21 [PATCH v2 0/3] mm: memcg: fix protection of reclaim target memcg Yosry Ahmed
  2022-11-23  9:21 ` [PATCH v2 1/3] mm: memcg: fix stale " Yosry Ahmed
@ 2022-11-23  9:21 ` Yosry Ahmed
  2022-11-24  1:03   ` Roman Gushchin
  2022-11-23  9:21 ` [PATCH v2 3/3] selftests: cgroup: make sure reclaim target memcg is unprotected Yosry Ahmed
  2 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-23  9:21 UTC (permalink / raw)
  To: Shakeel Butt, Roman Gushchin, Johannes Weiner, Michal Hocko,
	Yu Zhao, Muchun Song
  Cc: Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm, Yosry Ahmed

Refactor the code that drives writing to memory.reclaim (retrying, error
handling, etc) from test_memcg_reclaim() to a helper called
reclaim_until(), which proactively reclaims from a memcg until its
usage reaches a certain value.

This will be used in a following patch in another test.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 .../selftests/cgroup/test_memcontrol.c        | 85 +++++++++++--------
 1 file changed, 49 insertions(+), 36 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 8833359556f3..d4182e94945e 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -645,6 +645,53 @@ static int test_memcg_max(const char *root)
 	return ret;
 }
 
+/* Reclaim from @memcg until usage reaches @goal_usage */
+static bool reclaim_until(const char *memcg, long goal_usage)
+{
+	char buf[64];
+	int retries = 5;
+	int err;
+	long current, to_reclaim;
+
+	/* Nothing to do here */
+	if (cg_read_long(memcg, "memory.current") <= goal_usage)
+		return true;
+
+	while (true) {
+		current = cg_read_long(memcg, "memory.current");
+		to_reclaim = current - goal_usage;
+
+		/*
+		 * We only keep looping if we get -EAGAIN, which means we could
+		 * not reclaim the full amount. This means we got -EAGAIN when
+		 * we actually reclaimed the requested amount, so fail.
+		 */
+		if (to_reclaim <= 0)
+			break;
+
+		snprintf(buf, sizeof(buf), "%ld", to_reclaim);
+		err = cg_write(memcg, "memory.reclaim", buf);
+		if (!err) {
+			/*
+			 * If writing succeeds, then the written amount should have been
+			 * fully reclaimed (and maybe more).
+			 */
+			current = cg_read_long(memcg, "memory.current");
+			if (!values_close(current, goal_usage, 3) && current > goal_usage)
+				break;
+			return true;
+		}
+
+		/* The kernel could not reclaim the full amount, try again. */
+		if (err == -EAGAIN && retries--)
+			continue;
+
+		/* We got an unexpected error or ran out of retries. */
+		break;
+	}
+	return false;
+}
+
 /*
  * This test checks that memory.reclaim reclaims the given
  * amount of memory (from both anon and file, if possible).
@@ -653,8 +700,7 @@ static int test_memcg_reclaim(const char *root)
 {
 	int ret = KSFT_FAIL, fd, retries;
 	char *memcg;
-	long current, expected_usage, to_reclaim;
-	char buf[64];
+	long current, expected_usage;
 
 	memcg = cg_name(root, "memcg_test");
 	if (!memcg)
@@ -705,41 +751,8 @@ static int test_memcg_reclaim(const char *root)
 	 * Reclaim until current reaches 30M, this makes sure we hit both anon
 	 * and file if swap is enabled.
 	 */
-	retries = 5;
-	while (true) {
-		int err;
-
-		current = cg_read_long(memcg, "memory.current");
-		to_reclaim = current - MB(30);
-
-		/*
-		 * We only keep looping if we get EAGAIN, which means we could
-		 * not reclaim the full amount.
-		 */
-		if (to_reclaim <= 0)
-			goto cleanup;
-
-
-		snprintf(buf, sizeof(buf), "%ld", to_reclaim);
-		err = cg_write(memcg, "memory.reclaim", buf);
-		if (!err) {
-			/*
-			 * If writing succeeds, then the written amount should have been
-			 * fully reclaimed (and maybe more).
-			 */
-			current = cg_read_long(memcg, "memory.current");
-			if (!values_close(current, MB(30), 3) && current > MB(30))
-				goto cleanup;
-			break;
-		}
-
-		/* The kernel could not reclaim the full amount, try again. */
-		if (err == -EAGAIN && retries--)
-			continue;
-
-		/* We got an unexpected error or ran out of retries. */
+	if (!reclaim_until(memcg, MB(30)))
 		goto cleanup;
-	}
 
 	ret = KSFT_PASS;
 cleanup:
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] selftests: cgroup: make sure reclaim target memcg is unprotected
  2022-11-23  9:21 [PATCH v2 0/3] mm: memcg: fix protection of reclaim target memcg Yosry Ahmed
  2022-11-23  9:21 ` [PATCH v2 1/3] mm: memcg: fix stale " Yosry Ahmed
  2022-11-23  9:21 ` [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until() Yosry Ahmed
@ 2022-11-23  9:21 ` Yosry Ahmed
  2022-11-24  1:04   ` Roman Gushchin
  2 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-23  9:21 UTC (permalink / raw)
  To: Shakeel Butt, Roman Gushchin, Johannes Weiner, Michal Hocko,
	Yu Zhao, Muchun Song
  Cc: Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm, Yosry Ahmed

Make sure that we ignore protection of a memcg that is the target of
memcg reclaim.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 tools/testing/selftests/cgroup/test_memcontrol.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index d4182e94945e..bac3b91f1579 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -238,6 +238,8 @@ static int cg_test_proc_killed(const char *cgroup)
 	return -1;
 }
 
+static bool reclaim_until(const char *memcg, long goal_usage);
+
 /*
  * First, this test creates the following hierarchy:
  * A       memory.min = 0,    memory.max = 200M
@@ -266,6 +268,12 @@ static int cg_test_proc_killed(const char *cgroup)
  * unprotected memory in A available, and checks that:
  * a) memory.min protects pagecache even in this case,
  * b) memory.low allows reclaiming page cache with low events.
+ *
+ * Then we try to reclaim from A/B/C using memory.reclaim until its
+ * usage reaches 10M.
+ * This makes sure that:
+ * (a) We ignore the protection of the reclaim target memcg.
+ * (b) The previously calculated emin value (~29M) should be dismissed.
  */
 static int test_memcg_protection(const char *root, bool min)
 {
@@ -385,6 +393,9 @@ static int test_memcg_protection(const char *root, bool min)
 	if (!values_close(cg_read_long(parent[1], "memory.current"), MB(50), 3))
 		goto cleanup;
 
+	if (!reclaim_until(children[0], MB(10)))
+		goto cleanup;
+
 	if (min) {
 		ret = KSFT_PASS;
 		goto cleanup;
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] mm: memcg: fix stale protection of reclaim target memcg
  2022-11-23  9:21 ` [PATCH v2 1/3] mm: memcg: fix stale " Yosry Ahmed
@ 2022-11-24  0:40   ` Roman Gushchin
  2022-11-24  0:57     ` Yosry Ahmed
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2022-11-24  0:40 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 23, 2022 at 09:21:30AM +0000, Yosry Ahmed wrote:
> During reclaim, mem_cgroup_calculate_protection() is used to determine
> the effective protection (emin and elow) values of a memcg. The
> protection of the reclaim target is ignored, but we cannot set their
> effective protection to 0 due to a limitation of the current
> implementation (see comment in mem_cgroup_protection()). Instead,
> we leave their effective protection values unchaged, and later ignore it
> in mem_cgroup_protection().
> 
> However, mem_cgroup_protection() is called later in
> shrink_lruvec()->get_scan_count(), which is after the
> mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
> result, the stale effective protection values of the target memcg may
> lead us to skip reclaiming from the target memcg entirely, before
> calling shrink_lruvec(). This can be even worse with recursive
> protection, where the stale target memcg protection can be higher than
> its standalone protection. See two examples below (a similar version of
> example (a) is added to test_memcontrol in a later patch).
> 
> (a) A simple example with proactive reclaim is as follows. Consider the
> following hierarchy:
> ROOT
>  |
>  A
>  |
>  B (memory.min = 10M)
> 
> Consider the following scenario:
> - B has memory.current = 10M.
> - The system undergoes global reclaim (or memcg reclaim in A).
> - In shrink_node_memcgs():
>   - mem_cgroup_calculate_protection() calculates the effective min (emin)
>     of B as 10M.
>   - mem_cgroup_below_min() returns true for B, we do not reclaim from B.
> - Now if we want to reclaim 5M from B using proactive reclaim
>   (memory.reclaim), we should be able to, as the protection of the
>   target memcg should be ignored.
> - In shrink_node_memcgs():
>   - mem_cgroup_calculate_protection() immediately returns for B without
>     doing anything, as B is the target memcg, relying on
>     mem_cgroup_protection() to ignore B's stale effective min (still 10M).
>   - mem_cgroup_below_min() reads the stale effective min for B and we
>     skip it instead of ignoring its protection as intended, as we never
>     reach mem_cgroup_protection().
> 
> (b) An more complex example with recursive protection is as follows.
> Consider the following hierarchy with memory_recursiveprot:
> ROOT
>  |
>  A (memory.min = 50M)
>  |
>  B (memory.min = 10M, memory.high = 40M)
> 
> Consider the following scenario:
> - B has memory.current = 35M.
> - The system undergoes global reclaim (target memcg is NULL).
> - B will have an effective min of 50M (all of A's unclaimed protection).
> - B will not be reclaimed from.
> - Now allocate 10M more memory in B, pushing it above it's high limit.
> - The system undergoes memcg reclaim from B (target memcg is B).
> - Like example (a), we do nothing in mem_cgroup_calculate_protection(),
>   then call mem_cgroup_below_min(), which will read the stale effective
>   min for B (50M) and skip it. In this case, it's even worse because we
>   are not just considering B's standalone protection (10M), but we are
>   reading a much higher stale protection (50M) which will cause us to not
>   reclaim from B at all.
> 
> This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
> e{low,min} state mutations from protection checks") which made
> mem_cgroup_calculate_protection() only change the state without
> returning any value. Before that commit, we used to return
> MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
> mem_cgroup_below_{min/low}() checks. After that commit we do not return
> anything and we end up checking the min & low effective protections for
> the target memcg, which are stale.
> 
> Update mem_cgroup_supports_protection() to also check if we are
> reclaiming from the target, and rename it to mem_cgroup_unprotected()
> (now returns true if we should not protect the memcg, much simpler logic).
> 
> Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Thank you!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/3] mm: memcg: fix stale protection of reclaim target memcg
  2022-11-24  0:40   ` Roman Gushchin
@ 2022-11-24  0:57     ` Yosry Ahmed
  0 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-24  0:57 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 23, 2022 at 4:40 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Nov 23, 2022 at 09:21:30AM +0000, Yosry Ahmed wrote:
> > During reclaim, mem_cgroup_calculate_protection() is used to determine
> > the effective protection (emin and elow) values of a memcg. The
> > protection of the reclaim target is ignored, but we cannot set their
> > effective protection to 0 due to a limitation of the current
> > implementation (see comment in mem_cgroup_protection()). Instead,
> > we leave their effective protection values unchaged, and later ignore it
> > in mem_cgroup_protection().
> >
> > However, mem_cgroup_protection() is called later in
> > shrink_lruvec()->get_scan_count(), which is after the
> > mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
> > result, the stale effective protection values of the target memcg may
> > lead us to skip reclaiming from the target memcg entirely, before
> > calling shrink_lruvec(). This can be even worse with recursive
> > protection, where the stale target memcg protection can be higher than
> > its standalone protection. See two examples below (a similar version of
> > example (a) is added to test_memcontrol in a later patch).
> >
> > (a) A simple example with proactive reclaim is as follows. Consider the
> > following hierarchy:
> > ROOT
> >  |
> >  A
> >  |
> >  B (memory.min = 10M)
> >
> > Consider the following scenario:
> > - B has memory.current = 10M.
> > - The system undergoes global reclaim (or memcg reclaim in A).
> > - In shrink_node_memcgs():
> >   - mem_cgroup_calculate_protection() calculates the effective min (emin)
> >     of B as 10M.
> >   - mem_cgroup_below_min() returns true for B, we do not reclaim from B.
> > - Now if we want to reclaim 5M from B using proactive reclaim
> >   (memory.reclaim), we should be able to, as the protection of the
> >   target memcg should be ignored.
> > - In shrink_node_memcgs():
> >   - mem_cgroup_calculate_protection() immediately returns for B without
> >     doing anything, as B is the target memcg, relying on
> >     mem_cgroup_protection() to ignore B's stale effective min (still 10M).
> >   - mem_cgroup_below_min() reads the stale effective min for B and we
> >     skip it instead of ignoring its protection as intended, as we never
> >     reach mem_cgroup_protection().
> >
> > (b) An more complex example with recursive protection is as follows.
> > Consider the following hierarchy with memory_recursiveprot:
> > ROOT
> >  |
> >  A (memory.min = 50M)
> >  |
> >  B (memory.min = 10M, memory.high = 40M)
> >
> > Consider the following scenario:
> > - B has memory.current = 35M.
> > - The system undergoes global reclaim (target memcg is NULL).
> > - B will have an effective min of 50M (all of A's unclaimed protection).
> > - B will not be reclaimed from.
> > - Now allocate 10M more memory in B, pushing it above it's high limit.
> > - The system undergoes memcg reclaim from B (target memcg is B).
> > - Like example (a), we do nothing in mem_cgroup_calculate_protection(),
> >   then call mem_cgroup_below_min(), which will read the stale effective
> >   min for B (50M) and skip it. In this case, it's even worse because we
> >   are not just considering B's standalone protection (10M), but we are
> >   reading a much higher stale protection (50M) which will cause us to not
> >   reclaim from B at all.
> >
> > This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
> > e{low,min} state mutations from protection checks") which made
> > mem_cgroup_calculate_protection() only change the state without
> > returning any value. Before that commit, we used to return
> > MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
> > mem_cgroup_below_{min/low}() checks. After that commit we do not return
> > anything and we end up checking the min & low effective protections for
> > the target memcg, which are stale.
> >
> > Update mem_cgroup_supports_protection() to also check if we are
> > reclaiming from the target, and rename it to mem_cgroup_unprotected()
> > (now returns true if we should not protect the memcg, much simpler logic).
> >
> > Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> Thank you!

Thanks for reviewing!

Do you think we need a CC to stable here?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  2022-11-23  9:21 ` [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until() Yosry Ahmed
@ 2022-11-24  1:03   ` Roman Gushchin
  2022-11-24  3:16     ` Yosry Ahmed
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2022-11-24  1:03 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 23, 2022 at 09:21:31AM +0000, Yosry Ahmed wrote:
> Refactor the code that drives writing to memory.reclaim (retrying, error
> handling, etc) from test_memcg_reclaim() to a helper called
> reclaim_until(), which proactively reclaims from a memcg until its
> usage reaches a certain value.
> 
> This will be used in a following patch in another test.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  .../selftests/cgroup/test_memcontrol.c        | 85 +++++++++++--------
>  1 file changed, 49 insertions(+), 36 deletions(-)
> 
> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> index 8833359556f3..d4182e94945e 100644
> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> @@ -645,6 +645,53 @@ static int test_memcg_max(const char *root)
>  	return ret;


The code below looks correct, but can be simplified a bit.
And btw thank you for adding a test!

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
(idk if you want invest your time in further simplication of this code,
it was this way before this patch, so up to you).

>  }
>  
> +/* Reclaim from @memcg until usage reaches @goal_usage */
> +static bool reclaim_until(const char *memcg, long goal_usage)
> +{
> +	char buf[64];
> +	int retries = 5;
> +	int err;
> +	long current, to_reclaim;
> +
> +	/* Nothing to do here */
> +	if (cg_read_long(memcg, "memory.current") <= goal_usage)
> +		return true;
> +
> +	while (true) {
> +		current = cg_read_long(memcg, "memory.current");
> +		to_reclaim = current - goal_usage;
> +
> +		/*
> +		 * We only keep looping if we get -EAGAIN, which means we could
> +		 * not reclaim the full amount. This means we got -EAGAIN when
> +		 * we actually reclaimed the requested amount, so fail.
> +		 */
> +		if (to_reclaim <= 0)
> +			break;
> +
> +		snprintf(buf, sizeof(buf), "%ld", to_reclaim);
> +		err = cg_write(memcg, "memory.reclaim", buf);
> +		if (!err) {
> +			/*
> +			 * If writing succeeds, then the written amount should have been
> +			 * fully reclaimed (and maybe more).
> +			 */
> +			current = cg_read_long(memcg, "memory.current");
> +			if (!values_close(current, goal_usage, 3) && current > goal_usage)
> +				break;

There are 3 places in this function where memory.current is read and compared
to goal_usage. I believe only one can be left.

> +			return true;
> +		}
> +
> +		/* The kernel could not reclaim the full amount, try again. */
> +		if (err == -EAGAIN && retries--)
> +			continue;
> +
> +		/* We got an unexpected error or ran out of retries. */
> +		break;

if (err != -EAGAIN || retries--)
	break;

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/3] selftests: cgroup: make sure reclaim target memcg is unprotected
  2022-11-23  9:21 ` [PATCH v2 3/3] selftests: cgroup: make sure reclaim target memcg is unprotected Yosry Ahmed
@ 2022-11-24  1:04   ` Roman Gushchin
  0 siblings, 0 replies; 13+ messages in thread
From: Roman Gushchin @ 2022-11-24  1:04 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 23, 2022 at 09:21:32AM +0000, Yosry Ahmed wrote:
> Make sure that we ignore protection of a memcg that is the target of
> memcg reclaim.
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  tools/testing/selftests/cgroup/test_memcontrol.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Thank you!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  2022-11-24  1:03   ` Roman Gushchin
@ 2022-11-24  3:16     ` Yosry Ahmed
  2022-11-29 19:42       ` Yosry Ahmed
  0 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-24  3:16 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 23, 2022 at 5:03 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Nov 23, 2022 at 09:21:31AM +0000, Yosry Ahmed wrote:
> > Refactor the code that drives writing to memory.reclaim (retrying, error
> > handling, etc) from test_memcg_reclaim() to a helper called
> > reclaim_until(), which proactively reclaims from a memcg until its
> > usage reaches a certain value.
> >
> > This will be used in a following patch in another test.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  .../selftests/cgroup/test_memcontrol.c        | 85 +++++++++++--------
> >  1 file changed, 49 insertions(+), 36 deletions(-)
> >
> > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> > index 8833359556f3..d4182e94945e 100644
> > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > @@ -645,6 +645,53 @@ static int test_memcg_max(const char *root)
> >       return ret;
>
>
> The code below looks correct, but can be simplified a bit.
> And btw thank you for adding a test!
>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> (idk if you want invest your time in further simplication of this code,
> it was this way before this patch, so up to you).

I don't "want" to, but the voices in my head won't shut up until I do so..

Here's a patch that simplifies the code, I inlined it here to avoid
sending a new version. If it looks good to you, it can be squashed
into this patch or merged separately (whatever you and Andrew prefer).
I can also send it in a separate thread if preferred.

From 18c40d61dac05b33cfc9233b17979b54422ed7c5 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 24 Nov 2022 02:21:12 +0000
Subject: [PATCH] selftests: cgroup: simplify memcg reclaim code

Simplify the code for the reclaim_until() helper used for memcg reclaim
through memory.reclaim.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 .../selftests/cgroup/test_memcontrol.c        | 65 ++++++++++---------
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c
b/tools/testing/selftests/cgroup/test_memcontrol.c
index bac3b91f1579..2e2bde44a6f7 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -17,6 +17,7 @@
 #include <netdb.h>
 #include <errno.h>
 #include <sys/mman.h>
+#include <limits.h>

 #include "../kselftest.h"
 #include "cgroup_util.h"
@@ -656,51 +657,51 @@ static int test_memcg_max(const char *root)
        return ret;
 }

-/* Reclaim from @memcg until usage reaches @goal_usage */
+/*
+ * Reclaim from @memcg until usage reaches @goal_usage by writing to
+ * memory.reclaim.
+ *
+ * This function will return false if the usage is already below the
+ * goal.
+ *
+ * This function assumes that writing to memory.reclaim is the only
+ * source of change in memory.current (no concurrent allocations or
+ * reclaim).
+ *
+ * This function makes sure memory.reclaim is sane. It will return
+ * false if memory.reclaim's error codes do not make sense, even if
+ * the usage goal was satisfied.
+ */
 static bool reclaim_until(const char *memcg, long goal_usage)
 {
        char buf[64];
        int retries = 5;
-       int err;
+       int err = INT_MAX;
        long current, to_reclaim;

-       /* Nothing to do here */
-       if (cg_read_long(memcg, "memory.current") <= goal_usage)
-               return true;
-
        while (true) {
                current = cg_read_long(memcg, "memory.current");
-               to_reclaim = current - goal_usage;

-               /*
-                * We only keep looping if we get -EAGAIN, which means we could
-                * not reclaim the full amount. This means we got -EAGAIN when
-                * we actually reclaimed the requested amount, so fail.
-                */
-               if (to_reclaim <= 0)
-                       break;
+               /* First iteration*/
+               if (err == INT_MAX) {
+                       if (current <= goal_usage)
+                               return false;
+               /* Write successful, check reclaimed amount */
+               } else if (!err) {
+                       return current <= goal_usage ||
+                               values_close(current, goal_usage, 3);
+               /* Unexpected error, or ran out of retries */
+               } else if (err != -EAGAIN || !retries--) {
+                       return false;
+               /* EAGAIN -> retry, but check for false negatives */
+               } else if (current <= goal_usage) {
+                       return false;
+               }

+               to_reclaim = current - goal_usage;
                snprintf(buf, sizeof(buf), "%ld", to_reclaim);
                err = cg_write(memcg, "memory.reclaim", buf);
-               if (!err) {
-                       /*
-                        * If writing succeeds, then the written
amount should have been
-                        * fully reclaimed (and maybe more).
-                        */
-                       current = cg_read_long(memcg, "memory.current");
-                       if (!values_close(current, goal_usage, 3) &&
current > goal_usage)
-                               break;
-                       return true;
-               }
-
-               /* The kernel could not reclaim the full amount, try again. */
-               if (err == -EAGAIN && retries--)
-                       continue;
-
-               /* We got an unexpected error or ran out of retries. */
-               break;
        }
-       return false;
 }

 /*
--
2.38.1.584.g0f3c55d4c2-goog

>
> >  }
> >
> > +/* Reclaim from @memcg until usage reaches @goal_usage */
> > +static bool reclaim_until(const char *memcg, long goal_usage)
> > +{
> > +     char buf[64];
> > +     int retries = 5;
> > +     int err;
> > +     long current, to_reclaim;
> > +
> > +     /* Nothing to do here */
> > +     if (cg_read_long(memcg, "memory.current") <= goal_usage)
> > +             return true;
> > +
> > +     while (true) {
> > +             current = cg_read_long(memcg, "memory.current");
> > +             to_reclaim = current - goal_usage;
> > +
> > +             /*
> > +              * We only keep looping if we get -EAGAIN, which means we could
> > +              * not reclaim the full amount. This means we got -EAGAIN when
> > +              * we actually reclaimed the requested amount, so fail.
> > +              */
> > +             if (to_reclaim <= 0)
> > +                     break;
> > +
> > +             snprintf(buf, sizeof(buf), "%ld", to_reclaim);
> > +             err = cg_write(memcg, "memory.reclaim", buf);
> > +             if (!err) {
> > +                     /*
> > +                      * If writing succeeds, then the written amount should have been
> > +                      * fully reclaimed (and maybe more).
> > +                      */
> > +                     current = cg_read_long(memcg, "memory.current");
> > +                     if (!values_close(current, goal_usage, 3) && current > goal_usage)
> > +                             break;
>
> There are 3 places in this function where memory.current is read and compared
> to goal_usage. I believe only one can be left.
>
> > +                     return true;
> > +             }
> > +
> > +             /* The kernel could not reclaim the full amount, try again. */
> > +             if (err == -EAGAIN && retries--)
> > +                     continue;
> > +
> > +             /* We got an unexpected error or ran out of retries. */
> > +             break;
>
> if (err != -EAGAIN || retries--)
>         break;
>
> Thanks!

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  2022-11-24  3:16     ` Yosry Ahmed
@ 2022-11-29 19:42       ` Yosry Ahmed
  2022-11-30 17:19         ` Roman Gushchin
  0 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-29 19:42 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 23, 2022 at 7:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Nov 23, 2022 at 5:03 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Wed, Nov 23, 2022 at 09:21:31AM +0000, Yosry Ahmed wrote:
> > > Refactor the code that drives writing to memory.reclaim (retrying, error
> > > handling, etc) from test_memcg_reclaim() to a helper called
> > > reclaim_until(), which proactively reclaims from a memcg until its
> > > usage reaches a certain value.
> > >
> > > This will be used in a following patch in another test.
> > >
> > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > ---
> > >  .../selftests/cgroup/test_memcontrol.c        | 85 +++++++++++--------
> > >  1 file changed, 49 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > index 8833359556f3..d4182e94945e 100644
> > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > @@ -645,6 +645,53 @@ static int test_memcg_max(const char *root)
> > >       return ret;
> >
> >
> > The code below looks correct, but can be simplified a bit.
> > And btw thank you for adding a test!
> >
> > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > (idk if you want invest your time in further simplication of this code,
> > it was this way before this patch, so up to you).
>
> I don't "want" to, but the voices in my head won't shut up until I do so..
>
> Here's a patch that simplifies the code, I inlined it here to avoid
> sending a new version. If it looks good to you, it can be squashed
> into this patch or merged separately (whatever you and Andrew prefer).
> I can also send it in a separate thread if preferred.

Roman, any thoughts on this?

>
>
> From 18c40d61dac05b33cfc9233b17979b54422ed7c5 Mon Sep 17 00:00:00 2001
> From: Yosry Ahmed <yosryahmed@google.com>
> Date: Thu, 24 Nov 2022 02:21:12 +0000
> Subject: [PATCH] selftests: cgroup: simplify memcg reclaim code
>
> Simplify the code for the reclaim_until() helper used for memcg reclaim
> through memory.reclaim.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  .../selftests/cgroup/test_memcontrol.c        | 65 ++++++++++---------
>  1 file changed, 33 insertions(+), 32 deletions(-)
>
> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c
> b/tools/testing/selftests/cgroup/test_memcontrol.c
> index bac3b91f1579..2e2bde44a6f7 100644
> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> @@ -17,6 +17,7 @@
>  #include <netdb.h>
>  #include <errno.h>
>  #include <sys/mman.h>
> +#include <limits.h>
>
>  #include "../kselftest.h"
>  #include "cgroup_util.h"
> @@ -656,51 +657,51 @@ static int test_memcg_max(const char *root)
>         return ret;
>  }
>
> -/* Reclaim from @memcg until usage reaches @goal_usage */
> +/*
> + * Reclaim from @memcg until usage reaches @goal_usage by writing to
> + * memory.reclaim.
> + *
> + * This function will return false if the usage is already below the
> + * goal.
> + *
> + * This function assumes that writing to memory.reclaim is the only
> + * source of change in memory.current (no concurrent allocations or
> + * reclaim).
> + *
> + * This function makes sure memory.reclaim is sane. It will return
> + * false if memory.reclaim's error codes do not make sense, even if
> + * the usage goal was satisfied.
> + */
>  static bool reclaim_until(const char *memcg, long goal_usage)
>  {
>         char buf[64];
>         int retries = 5;
> -       int err;
> +       int err = INT_MAX;
>         long current, to_reclaim;
>
> -       /* Nothing to do here */
> -       if (cg_read_long(memcg, "memory.current") <= goal_usage)
> -               return true;
> -
>         while (true) {
>                 current = cg_read_long(memcg, "memory.current");
> -               to_reclaim = current - goal_usage;
>
> -               /*
> -                * We only keep looping if we get -EAGAIN, which means we could
> -                * not reclaim the full amount. This means we got -EAGAIN when
> -                * we actually reclaimed the requested amount, so fail.
> -                */
> -               if (to_reclaim <= 0)
> -                       break;
> +               /* First iteration*/
> +               if (err == INT_MAX) {
> +                       if (current <= goal_usage)
> +                               return false;
> +               /* Write successful, check reclaimed amount */
> +               } else if (!err) {
> +                       return current <= goal_usage ||
> +                               values_close(current, goal_usage, 3);
> +               /* Unexpected error, or ran out of retries */
> +               } else if (err != -EAGAIN || !retries--) {
> +                       return false;
> +               /* EAGAIN -> retry, but check for false negatives */
> +               } else if (current <= goal_usage) {
> +                       return false;
> +               }
>
> +               to_reclaim = current - goal_usage;
>                 snprintf(buf, sizeof(buf), "%ld", to_reclaim);
>                 err = cg_write(memcg, "memory.reclaim", buf);
> -               if (!err) {
> -                       /*
> -                        * If writing succeeds, then the written
> amount should have been
> -                        * fully reclaimed (and maybe more).
> -                        */
> -                       current = cg_read_long(memcg, "memory.current");
> -                       if (!values_close(current, goal_usage, 3) &&
> current > goal_usage)
> -                               break;
> -                       return true;
> -               }
> -
> -               /* The kernel could not reclaim the full amount, try again. */
> -               if (err == -EAGAIN && retries--)
> -                       continue;
> -
> -               /* We got an unexpected error or ran out of retries. */
> -               break;
>         }
> -       return false;
>  }
>
>  /*
> --
> 2.38.1.584.g0f3c55d4c2-goog
>
> >
> > >  }
> > >
> > > +/* Reclaim from @memcg until usage reaches @goal_usage */
> > > +static bool reclaim_until(const char *memcg, long goal_usage)
> > > +{
> > > +     char buf[64];
> > > +     int retries = 5;
> > > +     int err;
> > > +     long current, to_reclaim;
> > > +
> > > +     /* Nothing to do here */
> > > +     if (cg_read_long(memcg, "memory.current") <= goal_usage)
> > > +             return true;
> > > +
> > > +     while (true) {
> > > +             current = cg_read_long(memcg, "memory.current");
> > > +             to_reclaim = current - goal_usage;
> > > +
> > > +             /*
> > > +              * We only keep looping if we get -EAGAIN, which means we could
> > > +              * not reclaim the full amount. This means we got -EAGAIN when
> > > +              * we actually reclaimed the requested amount, so fail.
> > > +              */
> > > +             if (to_reclaim <= 0)
> > > +                     break;
> > > +
> > > +             snprintf(buf, sizeof(buf), "%ld", to_reclaim);
> > > +             err = cg_write(memcg, "memory.reclaim", buf);
> > > +             if (!err) {
> > > +                     /*
> > > +                      * If writing succeeds, then the written amount should have been
> > > +                      * fully reclaimed (and maybe more).
> > > +                      */
> > > +                     current = cg_read_long(memcg, "memory.current");
> > > +                     if (!values_close(current, goal_usage, 3) && current > goal_usage)
> > > +                             break;
> >
> > There are 3 places in this function where memory.current is read and compared
> > to goal_usage. I believe only one can be left.
> >
> > > +                     return true;
> > > +             }
> > > +
> > > +             /* The kernel could not reclaim the full amount, try again. */
> > > +             if (err == -EAGAIN && retries--)
> > > +                     continue;
> > > +
> > > +             /* We got an unexpected error or ran out of retries. */
> > > +             break;
> >
> > if (err != -EAGAIN || retries--)
> >         break;
> >
> > Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  2022-11-29 19:42       ` Yosry Ahmed
@ 2022-11-30 17:19         ` Roman Gushchin
  2022-11-30 18:25           ` Yosry Ahmed
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2022-11-30 17:19 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Tue, Nov 29, 2022 at 11:42:31AM -0800, Yosry Ahmed wrote:
> On Wed, Nov 23, 2022 at 7:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Wed, Nov 23, 2022 at 5:03 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > On Wed, Nov 23, 2022 at 09:21:31AM +0000, Yosry Ahmed wrote:
> > > > Refactor the code that drives writing to memory.reclaim (retrying, error
> > > > handling, etc) from test_memcg_reclaim() to a helper called
> > > > reclaim_until(), which proactively reclaims from a memcg until its
> > > > usage reaches a certain value.
> > > >
> > > > This will be used in a following patch in another test.
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > > ---
> > > >  .../selftests/cgroup/test_memcontrol.c        | 85 +++++++++++--------
> > > >  1 file changed, 49 insertions(+), 36 deletions(-)
> > > >
> > > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > index 8833359556f3..d4182e94945e 100644
> > > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > @@ -645,6 +645,53 @@ static int test_memcg_max(const char *root)
> > > >       return ret;
> > >
> > >
> > > The code below looks correct, but can be simplified a bit.
> > > And btw thank you for adding a test!
> > >
> > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > > (idk if you want invest your time in further simplication of this code,
> > > it was this way before this patch, so up to you).
> >
> > I don't "want" to, but the voices in my head won't shut up until I do so..
> >
> > Here's a patch that simplifies the code, I inlined it here to avoid
> > sending a new version. If it looks good to you, it can be squashed
> > into this patch or merged separately (whatever you and Andrew prefer).
> > I can also send it in a separate thread if preferred.
> 
> Roman, any thoughts on this?
> 
> >
> >
> > From 18c40d61dac05b33cfc9233b17979b54422ed7c5 Mon Sep 17 00:00:00 2001
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Date: Thu, 24 Nov 2022 02:21:12 +0000
> > Subject: [PATCH] selftests: cgroup: simplify memcg reclaim code
> >
> > Simplify the code for the reclaim_until() helper used for memcg reclaim
> > through memory.reclaim.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  .../selftests/cgroup/test_memcontrol.c        | 65 ++++++++++---------
> >  1 file changed, 33 insertions(+), 32 deletions(-)
> >
> > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c
> > b/tools/testing/selftests/cgroup/test_memcontrol.c
> > index bac3b91f1579..2e2bde44a6f7 100644
> > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > @@ -17,6 +17,7 @@
> >  #include <netdb.h>
> >  #include <errno.h>
> >  #include <sys/mman.h>
> > +#include <limits.h>
> >
> >  #include "../kselftest.h"
> >  #include "cgroup_util.h"
> > @@ -656,51 +657,51 @@ static int test_memcg_max(const char *root)
> >         return ret;
> >  }
> >
> > -/* Reclaim from @memcg until usage reaches @goal_usage */
> > +/*
> > + * Reclaim from @memcg until usage reaches @goal_usage by writing to
> > + * memory.reclaim.
> > + *
> > + * This function will return false if the usage is already below the
> > + * goal.
> > + *
> > + * This function assumes that writing to memory.reclaim is the only
> > + * source of change in memory.current (no concurrent allocations or
> > + * reclaim).
> > + *
> > + * This function makes sure memory.reclaim is sane. It will return
> > + * false if memory.reclaim's error codes do not make sense, even if
> > + * the usage goal was satisfied.
> > + */
> >  static bool reclaim_until(const char *memcg, long goal_usage)
> >  {
> >         char buf[64];
> >         int retries = 5;
> > -       int err;
> > +       int err = INT_MAX;
> >         long current, to_reclaim;
> >
> > -       /* Nothing to do here */
> > -       if (cg_read_long(memcg, "memory.current") <= goal_usage)
> > -               return true;
> > -

Hi Yosry!

Thank you for working on this!
I feel like it's still way more complex than it can be.
How about something like this? (completely untested, treat is
as a pseudo-code).


{
	...
	bool ret = false;

	for (retries = 5; retries > 0; retries--) {
		current = cg_read_long(memcg, "memory.current");

		if (current <= goal) // replace with values_close?
			break;

		to_reclaim = current - goal_usage;
		snprintf(buf, sizeof(buf), "%ld", to_reclaim);
		err = cg_write(memcg, "memory.reclaim", buf);
		if (!err)
			ret = true;
		else if (err != -AGAIN)
			break;
	}

	return ret;
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  2022-11-30 17:19         ` Roman Gushchin
@ 2022-11-30 18:25           ` Yosry Ahmed
  2022-12-02  3:19             ` Yosry Ahmed
  0 siblings, 1 reply; 13+ messages in thread
From: Yosry Ahmed @ 2022-11-30 18:25 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 30, 2022 at 9:20 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Tue, Nov 29, 2022 at 11:42:31AM -0800, Yosry Ahmed wrote:
> > On Wed, Nov 23, 2022 at 7:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > On Wed, Nov 23, 2022 at 5:03 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Wed, Nov 23, 2022 at 09:21:31AM +0000, Yosry Ahmed wrote:
> > > > > Refactor the code that drives writing to memory.reclaim (retrying, error
> > > > > handling, etc) from test_memcg_reclaim() to a helper called
> > > > > reclaim_until(), which proactively reclaims from a memcg until its
> > > > > usage reaches a certain value.
> > > > >
> > > > > This will be used in a following patch in another test.
> > > > >
> > > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > > > ---
> > > > >  .../selftests/cgroup/test_memcontrol.c        | 85 +++++++++++--------
> > > > >  1 file changed, 49 insertions(+), 36 deletions(-)
> > > > >
> > > > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > > index 8833359556f3..d4182e94945e 100644
> > > > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > > @@ -645,6 +645,53 @@ static int test_memcg_max(const char *root)
> > > > >       return ret;
> > > >
> > > >
> > > > The code below looks correct, but can be simplified a bit.
> > > > And btw thank you for adding a test!
> > > >
> > > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > > > (idk if you want invest your time in further simplication of this code,
> > > > it was this way before this patch, so up to you).
> > >
> > > I don't "want" to, but the voices in my head won't shut up until I do so..
> > >
> > > Here's a patch that simplifies the code, I inlined it here to avoid
> > > sending a new version. If it looks good to you, it can be squashed
> > > into this patch or merged separately (whatever you and Andrew prefer).
> > > I can also send it in a separate thread if preferred.
> >
> > Roman, any thoughts on this?
> >
> > >
> > >
> > > From 18c40d61dac05b33cfc9233b17979b54422ed7c5 Mon Sep 17 00:00:00 2001
> > > From: Yosry Ahmed <yosryahmed@google.com>
> > > Date: Thu, 24 Nov 2022 02:21:12 +0000
> > > Subject: [PATCH] selftests: cgroup: simplify memcg reclaim code
> > >
> > > Simplify the code for the reclaim_until() helper used for memcg reclaim
> > > through memory.reclaim.
> > >
> > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > ---
> > >  .../selftests/cgroup/test_memcontrol.c        | 65 ++++++++++---------
> > >  1 file changed, 33 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > index bac3b91f1579..2e2bde44a6f7 100644
> > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > @@ -17,6 +17,7 @@
> > >  #include <netdb.h>
> > >  #include <errno.h>
> > >  #include <sys/mman.h>
> > > +#include <limits.h>
> > >
> > >  #include "../kselftest.h"
> > >  #include "cgroup_util.h"
> > > @@ -656,51 +657,51 @@ static int test_memcg_max(const char *root)
> > >         return ret;
> > >  }
> > >
> > > -/* Reclaim from @memcg until usage reaches @goal_usage */
> > > +/*
> > > + * Reclaim from @memcg until usage reaches @goal_usage by writing to
> > > + * memory.reclaim.
> > > + *
> > > + * This function will return false if the usage is already below the
> > > + * goal.
> > > + *
> > > + * This function assumes that writing to memory.reclaim is the only
> > > + * source of change in memory.current (no concurrent allocations or
> > > + * reclaim).
> > > + *
> > > + * This function makes sure memory.reclaim is sane. It will return
> > > + * false if memory.reclaim's error codes do not make sense, even if
> > > + * the usage goal was satisfied.
> > > + */
> > >  static bool reclaim_until(const char *memcg, long goal_usage)
> > >  {
> > >         char buf[64];
> > >         int retries = 5;
> > > -       int err;
> > > +       int err = INT_MAX;
> > >         long current, to_reclaim;
> > >
> > > -       /* Nothing to do here */
> > > -       if (cg_read_long(memcg, "memory.current") <= goal_usage)
> > > -               return true;
> > > -
>
> Hi Yosry!
>
> Thank you for working on this!
> I feel like it's still way more complex than it can be.
> How about something like this? (completely untested, treat is
> as a pseudo-code).

Thanks Roman!

This looks much simpler, and it nicely and subtly catches the false
negative case (where we return -EAGAIN but have actually reclaimed the
requested amount), but I think it doesn't catch the false positive
case (where memory.reclaim returns 0 but hasn't reclaimed enough
memory). In this case I think we will just keep retrying and ignore
the false positive?

Maybe with the following added check?

>
>
> {
>         ...
>         bool ret = false;
>
>         for (retries = 5; retries > 0; retries--) {
>                 current = cg_read_long(memcg, "memory.current");
>
>                 if (current <= goal) // replace with values_close?
>                         break;
                   else if (ret) { // false positive?
                            ret = false;
                            break;
                    }
>
>                 to_reclaim = current - goal_usage;
>                 snprintf(buf, sizeof(buf), "%ld", to_reclaim);
>                 err = cg_write(memcg, "memory.reclaim", buf);
>                 if (!err)
>                         ret = true;
>                 else if (err != -AGAIN)
>                         break;
>         }
>
>         return ret;
> }

Also, please let me know if you prefer that I send this cleanup in the
same thread like the above, in a completely separate patch that
depends on this series, or have it squashed into this patch in a v3.

Thanks again!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  2022-11-30 18:25           ` Yosry Ahmed
@ 2022-12-02  3:19             ` Yosry Ahmed
  0 siblings, 0 replies; 13+ messages in thread
From: Yosry Ahmed @ 2022-12-02  3:19 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Yu Zhao,
	Muchun Song, Matthew Wilcox (Oracle),
	Vasily Averin, Vlastimil Babka, Chris Down, linux-kernel,
	linux-mm

On Wed, Nov 30, 2022 at 10:25 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Nov 30, 2022 at 9:20 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Tue, Nov 29, 2022 at 11:42:31AM -0800, Yosry Ahmed wrote:
> > > On Wed, Nov 23, 2022 at 7:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > On Wed, Nov 23, 2022 at 5:03 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > > >
> > > > > On Wed, Nov 23, 2022 at 09:21:31AM +0000, Yosry Ahmed wrote:
> > > > > > Refactor the code that drives writing to memory.reclaim (retrying, error
> > > > > > handling, etc) from test_memcg_reclaim() to a helper called
> > > > > > reclaim_until(), which proactively reclaims from a memcg until its
> > > > > > usage reaches a certain value.
> > > > > >
> > > > > > This will be used in a following patch in another test.
> > > > > >
> > > > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > > > > ---
> > > > > >  .../selftests/cgroup/test_memcontrol.c        | 85 +++++++++++--------
> > > > > >  1 file changed, 49 insertions(+), 36 deletions(-)
> > > > > >
> > > > > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > > > index 8833359556f3..d4182e94945e 100644
> > > > > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > > > @@ -645,6 +645,53 @@ static int test_memcg_max(const char *root)
> > > > > >       return ret;
> > > > >
> > > > >
> > > > > The code below looks correct, but can be simplified a bit.
> > > > > And btw thank you for adding a test!
> > > > >
> > > > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > > > > (idk if you want invest your time in further simplication of this code,
> > > > > it was this way before this patch, so up to you).
> > > >
> > > > I don't "want" to, but the voices in my head won't shut up until I do so..
> > > >
> > > > Here's a patch that simplifies the code, I inlined it here to avoid
> > > > sending a new version. If it looks good to you, it can be squashed
> > > > into this patch or merged separately (whatever you and Andrew prefer).
> > > > I can also send it in a separate thread if preferred.
> > >
> > > Roman, any thoughts on this?
> > >
> > > >
> > > >
> > > > From 18c40d61dac05b33cfc9233b17979b54422ed7c5 Mon Sep 17 00:00:00 2001
> > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > Date: Thu, 24 Nov 2022 02:21:12 +0000
> > > > Subject: [PATCH] selftests: cgroup: simplify memcg reclaim code
> > > >
> > > > Simplify the code for the reclaim_until() helper used for memcg reclaim
> > > > through memory.reclaim.
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > > > ---
> > > >  .../selftests/cgroup/test_memcontrol.c        | 65 ++++++++++---------
> > > >  1 file changed, 33 insertions(+), 32 deletions(-)
> > > >
> > > > diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > index bac3b91f1579..2e2bde44a6f7 100644
> > > > --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> > > > @@ -17,6 +17,7 @@
> > > >  #include <netdb.h>
> > > >  #include <errno.h>
> > > >  #include <sys/mman.h>
> > > > +#include <limits.h>
> > > >
> > > >  #include "../kselftest.h"
> > > >  #include "cgroup_util.h"
> > > > @@ -656,51 +657,51 @@ static int test_memcg_max(const char *root)
> > > >         return ret;
> > > >  }
> > > >
> > > > -/* Reclaim from @memcg until usage reaches @goal_usage */
> > > > +/*
> > > > + * Reclaim from @memcg until usage reaches @goal_usage by writing to
> > > > + * memory.reclaim.
> > > > + *
> > > > + * This function will return false if the usage is already below the
> > > > + * goal.
> > > > + *
> > > > + * This function assumes that writing to memory.reclaim is the only
> > > > + * source of change in memory.current (no concurrent allocations or
> > > > + * reclaim).
> > > > + *
> > > > + * This function makes sure memory.reclaim is sane. It will return
> > > > + * false if memory.reclaim's error codes do not make sense, even if
> > > > + * the usage goal was satisfied.
> > > > + */
> > > >  static bool reclaim_until(const char *memcg, long goal_usage)
> > > >  {
> > > >         char buf[64];
> > > >         int retries = 5;
> > > > -       int err;
> > > > +       int err = INT_MAX;
> > > >         long current, to_reclaim;
> > > >
> > > > -       /* Nothing to do here */
> > > > -       if (cg_read_long(memcg, "memory.current") <= goal_usage)
> > > > -               return true;
> > > > -
> >
> > Hi Yosry!
> >
> > Thank you for working on this!
> > I feel like it's still way more complex than it can be.
> > How about something like this? (completely untested, treat is
> > as a pseudo-code).
>
> Thanks Roman!
>
> This looks much simpler, and it nicely and subtly catches the false
> negative case (where we return -EAGAIN but have actually reclaimed the
> requested amount), but I think it doesn't catch the false positive
> case (where memory.reclaim returns 0 but hasn't reclaimed enough
> memory). In this case I think we will just keep retrying and ignore
> the false positive?
>
> Maybe with the following added check?
>
> >
> >
> > {
> >         ...
> >         bool ret = false;
> >
> >         for (retries = 5; retries > 0; retries--) {
> >                 current = cg_read_long(memcg, "memory.current");
> >
> >                 if (current <= goal) // replace with values_close?
> >                         break;
>                    else if (ret) { // false positive?
>                             ret = false;
>                             break;
>                     }
> >
> >                 to_reclaim = current - goal_usage;
> >                 snprintf(buf, sizeof(buf), "%ld", to_reclaim);
> >                 err = cg_write(memcg, "memory.reclaim", buf);
> >                 if (!err)
> >                         ret = true;
> >                 else if (err != -AGAIN)
> >                         break;
> >         }
> >
> >         return ret;
> > }
>
> Also, please let me know if you prefer that I send this cleanup in the
> same thread like the above, in a completely separate patch that
> depends on this series, or have it squashed into this patch in a v3.
>
> Thanks again!

I realized I missed a few folks in the CC of this version anyway. Sent
v3 with the suggested refactoring (+ the missing check for false
positives) squashed into this patch. Also included your review tags on
patches 1 & 3 (patch 2 was almost rewritten according to your
suggestions, so I dropped the review tag and added a suggested tag):

https://lore.kernel.org/lkml/20221202031512.1365483-1-yosryahmed@google.com/

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-12-02  3:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-23  9:21 [PATCH v2 0/3] mm: memcg: fix protection of reclaim target memcg Yosry Ahmed
2022-11-23  9:21 ` [PATCH v2 1/3] mm: memcg: fix stale " Yosry Ahmed
2022-11-24  0:40   ` Roman Gushchin
2022-11-24  0:57     ` Yosry Ahmed
2022-11-23  9:21 ` [PATCH v2 2/3] selftests: cgroup: refactor proactive reclaim code to reclaim_until() Yosry Ahmed
2022-11-24  1:03   ` Roman Gushchin
2022-11-24  3:16     ` Yosry Ahmed
2022-11-29 19:42       ` Yosry Ahmed
2022-11-30 17:19         ` Roman Gushchin
2022-11-30 18:25           ` Yosry Ahmed
2022-12-02  3:19             ` Yosry Ahmed
2022-11-23  9:21 ` [PATCH v2 3/3] selftests: cgroup: make sure reclaim target memcg is unprotected Yosry Ahmed
2022-11-24  1:04   ` Roman Gushchin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.