All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-17 18:05 Johannes Weiner
  2021-08-17 18:44 ` Rik van Riel
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Johannes Weiner @ 2021-08-17 18:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Leon Yang, Chris Down, Roman Gushchin, Michal Hocko, linux-mm,
	cgroups, linux-kernel, kernel-team

We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups. This is unexpected and undesirable as memory.low
is supposed to express non-OOMing memory priorities between cgroups.

The reason for this is proportional memory.low reclaim. When cgroups
are below their memory.low threshold, reclaim passes them over in the
first round, and then retries if it couldn't find pages anywhere else.
But when cgroups are slighly above their memory.low setting, page scan
force is scaled down and diminished in proportion to the overage, to
the point where it can cause reclaim to fail as well - only in that
case we currently don't retry, and instead trigger OOM.

To fix this, hook proportional reclaim into the same retry logic we
have in place for when cgroups are skipped entirely. This way if
reclaim fails and some cgroups were scanned with dimished pressure,
we'll try another full-force cycle before giving up and OOMing.

Reported-by: Leon Yang <lnyng@fb.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 29 +++++++++++++++--------------
 mm/vmscan.c                | 27 +++++++++++++++++++--------
 2 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bfe5c486f4ad..24797929d8a1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -612,12 +612,15 @@ static inline bool mem_cgroup_disabled(void)
 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
-						  struct mem_cgroup *memcg,
-						  bool in_low_reclaim)
+static inline void mem_cgroup_protection(struct mem_cgroup *root,
+					 struct mem_cgroup *memcg,
+					 unsigned long *min,
+					 unsigned long *low)
 {
+	*min = *low = 0;
+
 	if (mem_cgroup_disabled())
-		return 0;
+		return;
 
 	/*
 	 * There is no reclaim protection applied to a targeted reclaim.
@@ -653,13 +656,10 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
 	 *
 	 */
 	if (root == memcg)
-		return 0;
-
-	if (in_low_reclaim)
-		return READ_ONCE(memcg->memory.emin);
+		return;
 
-	return max(READ_ONCE(memcg->memory.emin),
-		   READ_ONCE(memcg->memory.elow));
+	*min = READ_ONCE(memcg->memory.emin);
+	*low = READ_ONCE(memcg->memory.elow);
 }
 
 void mem_cgroup_calculate_protection(struct mem_cgroup *root,
@@ -1147,11 +1147,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
 {
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
-						  struct mem_cgroup *memcg,
-						  bool in_low_reclaim)
+static inline void mem_cgroup_protection(struct mem_cgroup *root,
+					 struct mem_cgroup *memcg,
+					 unsigned long *min,
+					 unsigned long *low)
 {
-	return 0;
+	*min = *low = 0;
 }
 
 static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4620df62f0ff..701106e1829c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -100,9 +100,12 @@ struct scan_control {
 	unsigned int may_swap:1;
 
 	/*
-	 * Cgroups are not reclaimed below their configured memory.low,
-	 * unless we threaten to OOM. If any cgroups are skipped due to
-	 * memory.low and nothing was reclaimed, go back for memory.low.
+	 * Cgroup memory below memory.low is protected as long as we
+	 * don't threaten to OOM. If any cgroup is reclaimed at
+	 * reduced force or passed over entirely due to its memory.low
+	 * setting (memcg_low_skipped), and nothing is reclaimed as a
+	 * result, then go back back for one more cycle that reclaims
+	 * the protected memory (memcg_low_reclaim) to avert OOM.
 	 */
 	unsigned int memcg_low_reclaim:1;
 	unsigned int memcg_low_skipped:1;
@@ -2537,15 +2540,14 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	for_each_evictable_lru(lru) {
 		int file = is_file_lru(lru);
 		unsigned long lruvec_size;
+		unsigned long low, min;
 		unsigned long scan;
-		unsigned long protection;
 
 		lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
-		protection = mem_cgroup_protection(sc->target_mem_cgroup,
-						   memcg,
-						   sc->memcg_low_reclaim);
+		mem_cgroup_protection(sc->target_mem_cgroup, memcg,
+				      &min, &low);
 
-		if (protection) {
+		if (min || low) {
 			/*
 			 * Scale a cgroup's reclaim pressure by proportioning
 			 * its current usage to its memory.low or memory.min
@@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			 * hard protection.
 			 */
 			unsigned long cgroup_size = mem_cgroup_size(memcg);
+			unsigned long protection;
+
+			/* memory.low scaling, make sure we retry before OOM */
+			if (!sc->memcg_low_reclaim && low > min) {
+				protection = low;
+				sc->memcg_low_skipped = 1;
+			} else {
+				protection = min;
+			}
 
 			/* Avoid TOCTOU with earlier protection check */
 			cgroup_size = max(cgroup_size, protection);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
@ 2021-08-17 18:44 ` Rik van Riel
  2021-08-17 19:10   ` Shakeel Butt
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2021-08-17 18:44 UTC (permalink / raw)
  To: hannes, akpm
  Cc: chris, Leon Yang, mhocko, linux-mm, cgroups, Roman Gushchin,
	linux-kernel, Kernel Team

On Tue, 2021-08-17 at 14:05 -0400, Johannes Weiner wrote:
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.

Good catch.

> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Rik van Riel <riel@surriel.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
@ 2021-08-17 19:10   ` Shakeel Butt
  2021-08-17 19:10   ` Shakeel Butt
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Shakeel Butt @ 2021-08-17 19:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
	Michal Hocko, Linux MM, Cgroups, LKML, Kernel Team

On Tue, Aug 17, 2021 at 11:03 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
>
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan

*slightly

> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
>
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,

*diminished

> we'll try another full-force cycle before giving up and OOMing.
>
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Should this be considered for stable?

Reviewed-by: Shakeel Butt <shakeelb@google.com>

[...]
>
>  static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4620df62f0ff..701106e1829c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -100,9 +100,12 @@ struct scan_control {
>         unsigned int may_swap:1;
>
>         /*
> -        * Cgroups are not reclaimed below their configured memory.low,
> -        * unless we threaten to OOM. If any cgroups are skipped due to
> -        * memory.low and nothing was reclaimed, go back for memory.low.
> +        * Cgroup memory below memory.low is protected as long as we
> +        * don't threaten to OOM. If any cgroup is reclaimed at
> +        * reduced force or passed over entirely due to its memory.low
> +        * setting (memcg_low_skipped), and nothing is reclaimed as a
> +        * result, then go back back for one more cycle that reclaims

*back

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
@ 2021-08-17 19:10   ` Shakeel Butt
  0 siblings, 0 replies; 15+ messages in thread
From: Shakeel Butt @ 2021-08-17 19:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
	Michal Hocko, Linux MM, Cgroups, LKML, Kernel Team

On Tue, Aug 17, 2021 at 11:03 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
>
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan

*slightly

> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
>
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,

*diminished

> we'll try another full-force cycle before giving up and OOMing.
>
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Should this be considered for stable?

Reviewed-by: Shakeel Butt <shakeelb@google.com>

[...]
>
>  static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4620df62f0ff..701106e1829c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -100,9 +100,12 @@ struct scan_control {
>         unsigned int may_swap:1;
>
>         /*
> -        * Cgroups are not reclaimed below their configured memory.low,
> -        * unless we threaten to OOM. If any cgroups are skipped due to
> -        * memory.low and nothing was reclaimed, go back for memory.low.
> +        * Cgroup memory below memory.low is protected as long as we
> +        * don't threaten to OOM. If any cgroup is reclaimed at
> +        * reduced force or passed over entirely due to its memory.low
> +        * setting (memcg_low_skipped), and nothing is reclaimed as a
> +        * result, then go back back for one more cycle that reclaims

*back


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
  2021-08-17 18:44 ` Rik van Riel
  2021-08-17 19:10   ` Shakeel Butt
@ 2021-08-17 19:14 ` Andrew Morton
  2021-08-17 19:45 ` Roman Gushchin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2021-08-17 19:14 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Leon Yang, Chris Down, Roman Gushchin, Michal Hocko, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue, 17 Aug 2021 14:05:06 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
> 
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
> 
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.

Which kernel version(s) do you think need this?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
                   ` (2 preceding siblings ...)
  2021-08-17 19:14 ` Andrew Morton
@ 2021-08-17 19:45 ` Roman Gushchin
  2021-08-18 14:15   ` Johannes Weiner
  2021-08-18 20:18 ` Chris Down
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Roman Gushchin @ 2021-08-17 19:45 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Chris Down, Michal Hocko, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner wrote:
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
> 
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
> 
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.
> 
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Roman Gushchin <guro@fb.com>

I guess it's a stable material, so maybe adding:
Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")

?


Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 19:45 ` Roman Gushchin
@ 2021-08-18 14:15   ` Johannes Weiner
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2021-08-18 14:15 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Leon Yang, Chris Down, Michal Hocko, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue, Aug 17, 2021 at 12:45:18PM -0700, Roman Gushchin wrote:
> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner wrote:
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> > 
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slighly above their memory.low setting, page scan
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> > 
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with dimished pressure,
> > we'll try another full-force cycle before giving up and OOMing.
> > 
> > Reported-by: Leon Yang <lnyng@fb.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Acked-by: Roman Gushchin <guro@fb.com>

Thank you.

> I guess it's a stable material, so maybe adding:
> Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")

Yes, that Fixes makes sense. Plus:

Cc: <stable@vger.kernel.org> # 5.4+

I initially didn't tag it because the issue is over two years old and
we've had no other reports of this. But thinking about it, it's
probably more a lack of users rather than severity. At FB we only
noticed with a recent rollout of memory_recursiveprot
(8a931f801340c2be10552c7b5622d5f4852f3a36) because we didn't have
working memory.low configurations before that. But now that we do
notice, it's a problem worth fixing. So yes, stable makes sense.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 19:10   ` Shakeel Butt
  (?)
@ 2021-08-18 14:16   ` Johannes Weiner
  -1 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2021-08-18 14:16 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
	Michal Hocko, Linux MM, Cgroups, LKML, Kernel Team

On Tue, Aug 17, 2021 at 12:10:16PM -0700, Shakeel Butt wrote:
> On Tue, Aug 17, 2021 at 11:03 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> >
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slighly above their memory.low setting, page scan
> 
> *slightly
> 
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> >
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with dimished pressure,
> 
> *diminished

Oops. Andrew, would you mind folding these into the checkpatch fixlet?

> > we'll try another full-force cycle before giving up and OOMing.
> >
> > Reported-by: Leon Yang <lnyng@fb.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Should this be considered for stable?

Yes, I think so after all. Please see my reply to Roman.

> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Thanks Shakeel!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
                   ` (3 preceding siblings ...)
  2021-08-17 19:45 ` Roman Gushchin
@ 2021-08-18 20:18 ` Chris Down
  2021-08-19 15:01 ` Michal Hocko
  2021-08-23 16:09 ` Michal Koutný
  6 siblings, 0 replies; 15+ messages in thread
From: Chris Down @ 2021-08-18 20:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Roman Gushchin, Michal Hocko, linux-mm,
	cgroups, linux-kernel, kernel-team

Johannes Weiner writes:
>We've noticed occasional OOM killing when memory.low settings are in
>effect for cgroups. This is unexpected and undesirable as memory.low
>is supposed to express non-OOMing memory priorities between cgroups.
>
>The reason for this is proportional memory.low reclaim. When cgroups
>are below their memory.low threshold, reclaim passes them over in the
>first round, and then retries if it couldn't find pages anywhere else.
>But when cgroups are slighly above their memory.low setting, page scan
>force is scaled down and diminished in proportion to the overage, to
>the point where it can cause reclaim to fail as well - only in that
>case we currently don't retry, and instead trigger OOM.
>
>To fix this, hook proportional reclaim into the same retry logic we
>have in place for when cgroups are skipped entirely. This way if
>reclaim fails and some cgroups were scanned with dimished pressure,
>we'll try another full-force cycle before giving up and OOMing.
>
>Reported-by: Leon Yang <lnyng@fb.com>
>Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks for tracking this down! Agreed that this looks like a good stable 
candidate.

Acked-by: Chris Down <chris@chrisdown.name>

>---
> include/linux/memcontrol.h | 29 +++++++++++++++--------------
> mm/vmscan.c                | 27 +++++++++++++++++++--------
> 2 files changed, 34 insertions(+), 22 deletions(-)
>
>diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>index bfe5c486f4ad..24797929d8a1 100644
>--- a/include/linux/memcontrol.h
>+++ b/include/linux/memcontrol.h
>@@ -612,12 +612,15 @@ static inline bool mem_cgroup_disabled(void)
> 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>-						  struct mem_cgroup *memcg,
>-						  bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+					 struct mem_cgroup *memcg,
>+					 unsigned long *min,
>+					 unsigned long *low)
> {
>+	*min = *low = 0;
>+
> 	if (mem_cgroup_disabled())
>-		return 0;
>+		return;
>
> 	/*
> 	 * There is no reclaim protection applied to a targeted reclaim.
>@@ -653,13 +656,10 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
> 	 *
> 	 */
> 	if (root == memcg)
>-		return 0;
>-
>-	if (in_low_reclaim)
>-		return READ_ONCE(memcg->memory.emin);
>+		return;
>
>-	return max(READ_ONCE(memcg->memory.emin),
>-		   READ_ONCE(memcg->memory.elow));
>+	*min = READ_ONCE(memcg->memory.emin);
>+	*low = READ_ONCE(memcg->memory.elow);
> }
>
> void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>@@ -1147,11 +1147,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> {
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>-						  struct mem_cgroup *memcg,
>-						  bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+					 struct mem_cgroup *memcg,
>+					 unsigned long *min,
>+					 unsigned long *low)
> {
>-	return 0;
>+	*min = *low = 0;
> }
>
> static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index 4620df62f0ff..701106e1829c 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -100,9 +100,12 @@ struct scan_control {
> 	unsigned int may_swap:1;
>
> 	/*
>-	 * Cgroups are not reclaimed below their configured memory.low,
>-	 * unless we threaten to OOM. If any cgroups are skipped due to
>-	 * memory.low and nothing was reclaimed, go back for memory.low.
>+	 * Cgroup memory below memory.low is protected as long as we
>+	 * don't threaten to OOM. If any cgroup is reclaimed at
>+	 * reduced force or passed over entirely due to its memory.low
>+	 * setting (memcg_low_skipped), and nothing is reclaimed as a
>+	 * result, then go back back for one more cycle that reclaims
>+	 * the protected memory (memcg_low_reclaim) to avert OOM.
> 	 */
> 	unsigned int memcg_low_reclaim:1;
> 	unsigned int memcg_low_skipped:1;
>@@ -2537,15 +2540,14 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> 	for_each_evictable_lru(lru) {
> 		int file = is_file_lru(lru);
> 		unsigned long lruvec_size;
>+		unsigned long low, min;
> 		unsigned long scan;
>-		unsigned long protection;
>
> 		lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
>-		protection = mem_cgroup_protection(sc->target_mem_cgroup,
>-						   memcg,
>-						   sc->memcg_low_reclaim);
>+		mem_cgroup_protection(sc->target_mem_cgroup, memcg,
>+				      &min, &low);
>
>-		if (protection) {
>+		if (min || low) {
> 			/*
> 			 * Scale a cgroup's reclaim pressure by proportioning
> 			 * its current usage to its memory.low or memory.min
>@@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> 			 * hard protection.
> 			 */
> 			unsigned long cgroup_size = mem_cgroup_size(memcg);
>+			unsigned long protection;
>+
>+			/* memory.low scaling, make sure we retry before OOM */
>+			if (!sc->memcg_low_reclaim && low > min) {
>+				protection = low;
>+				sc->memcg_low_skipped = 1;
>+			} else {
>+				protection = min;
>+			}
>
> 			/* Avoid TOCTOU with earlier protection check */
> 			cgroup_size = max(cgroup_size, protection);
>-- 
>2.32.0
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
                   ` (4 preceding siblings ...)
  2021-08-18 20:18 ` Chris Down
@ 2021-08-19 15:01 ` Michal Hocko
  2021-08-19 20:38   ` Johannes Weiner
  2021-08-23 16:09 ` Michal Koutný
  6 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2021-08-19 15:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin, linux-mm,
	cgroups, linux-kernel, kernel-team

On Tue 17-08-21 14:05:06, Johannes Weiner wrote:
> We've noticed occasional OOM killing when memory.low settings are in
> effect for cgroups. This is unexpected and undesirable as memory.low
> is supposed to express non-OOMing memory priorities between cgroups.
> 
> The reason for this is proportional memory.low reclaim. When cgroups
> are below their memory.low threshold, reclaim passes them over in the
> first round, and then retries if it couldn't find pages anywhere else.
> But when cgroups are slighly above their memory.low setting, page scan
> force is scaled down and diminished in proportion to the overage, to
> the point where it can cause reclaim to fail as well - only in that
> case we currently don't retry, and instead trigger OOM.
> 
> To fix this, hook proportional reclaim into the same retry logic we
> have in place for when cgroups are skipped entirely. This way if
> reclaim fails and some cgroups were scanned with dimished pressure,
> we'll try another full-force cycle before giving up and OOMing.
> 
> Reported-by: Leon Yang <lnyng@fb.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Although I have to say that the code is quite tricky and it deserves
more comments. See below.

[...]
> @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  			 * hard protection.
>  			 */
>  			unsigned long cgroup_size = mem_cgroup_size(memcg);
> +			unsigned long protection;
> +
> +			/* memory.low scaling, make sure we retry before OOM */
> +			if (!sc->memcg_low_reclaim && low > min) {
> +				protection = low;
> +				sc->memcg_low_skipped = 1;
> +			} else {
> +				protection = min;
> +			}

Just by looking at this in isolation one could be really curious how
does this not break the low memory protection altogether. The logic is
spread over 3 different places.

Would something like the following be more understandable?

			/*
			 * Low limit protected memcgs are already excluded at
			 * a higher level (shrink_node_memcgs) but scaling
			 * down the reclaim target can result in hard to
			 * reclaim and premature OOM. We do not have a full
			 * picture here so we cannot really judge this
			 * sutuation here but pro-actively flag this scenario
			 * and let do_try_to_free_pages to retry if
			 * there is no progress.
			 */
>  
>  			/* Avoid TOCTOU with earlier protection check */
>  			cgroup_size = max(cgroup_size, protection);
> -- 
> 2.32.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-19 15:01 ` Michal Hocko
@ 2021-08-19 20:38   ` Johannes Weiner
  2021-08-20 15:44     ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Weiner @ 2021-08-19 20:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin, linux-mm,
	cgroups, linux-kernel, kernel-team

On Thu, Aug 19, 2021 at 05:01:38PM +0200, Michal Hocko wrote:
> On Tue 17-08-21 14:05:06, Johannes Weiner wrote:
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> > 
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slighly above their memory.low setting, page scan
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> > 
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with dimished pressure,
> > we'll try another full-force cycle before giving up and OOMing.
> > 
> > Reported-by: Leon Yang <lnyng@fb.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks

> 
> Although I have to say that the code is quite tricky and it deserves
> more comments. See below.
> 
> [...]
> > @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> >  			 * hard protection.
> >  			 */
> >  			unsigned long cgroup_size = mem_cgroup_size(memcg);
> > +			unsigned long protection;
> > +
> > +			/* memory.low scaling, make sure we retry before OOM */
> > +			if (!sc->memcg_low_reclaim && low > min) {
> > +				protection = low;
> > +				sc->memcg_low_skipped = 1;
> > +			} else {
> > +				protection = min;
> > +			}
> 
> Just by looking at this in isolation one could be really curious how
> does this not break the low memory protection altogether.

You're right, it's a bit too terse.

> The logic is spread over 3 different places.
> 
> Would something like the following be more understandable?
> 
> 			/*
> 			 * Low limit protected memcgs are already excluded at
> 			 * a higher level (shrink_node_memcgs) but scaling
> 			 * down the reclaim target can result in hard to
> 			 * reclaim and premature OOM. We do not have a full
> 			 * picture here so we cannot really judge this
> 			 * sutuation here but pro-actively flag this scenario
> 			 * and let do_try_to_free_pages to retry if
> 			 * there is no progress.
> 			 */

I've been drafting around with this, but it seems to say the same
thing as the comment I put into struct scan_control already:

	/*
	 * Cgroup memory below memory.low is protected as long as we
	 * don't threaten to OOM. If any cgroup is reclaimed at
	 * reduced force or passed over entirely due to its memory.low
	 * setting (memcg_low_skipped), and nothing is reclaimed as a
	 * result, then go back back for one more cycle that reclaims
	 * the protected memory (memcg_low_reclaim) to avert OOM.
	 */

How about a brief version of this with a pointer to the original?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 701106e1829c..c32d686719d5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2580,7 +2580,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			unsigned long cgroup_size = mem_cgroup_size(memcg);
 			unsigned long protection;
 
-			/* memory.low scaling, make sure we retry before OOM */
+			/*
+			 * Soft protection must not cause reclaim failure. Let
+			 * the upper level know if we skipped pages during the
+			 * first pass, so it can retry if necessary. See the
+			 * struct scan_control definition of those flags.
+			 */
 			if (!sc->memcg_low_reclaim && low > min) {
 				protection = low;
 				sc->memcg_low_skipped = 1;
@@ -2853,16 +2858,16 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		if (mem_cgroup_below_min(memcg)) {
 			/*
-			 * Hard protection.
-			 * If there is no reclaimable memory, OOM.
+			 * Hard protection. Always respected. If there is not
+			 * enough reclaimable memory elsewhere, it's an OOM.
 			 */
 			continue;
 		} else if (mem_cgroup_below_low(memcg)) {
 			/*
-			 * Soft protection.
-			 * Respect the protection only as long as
-			 * there is an unprotected supply
-			 * of reclaimable memory from other cgroups.
+			 * Soft protection must not cause reclaim failure. Let
+			 * the upper level know if we skipped pages during the
+			 * first pass, so it can retry if necessary. See the
+			 * struct scan_control definition of those flags.
 			 */
 			if (!sc->memcg_low_reclaim) {
 				sc->memcg_low_skipped = 1;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-19 20:38   ` Johannes Weiner
@ 2021-08-20 15:44     ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2021-08-20 15:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin, linux-mm,
	cgroups, linux-kernel, kernel-team

On Thu 19-08-21 16:38:59, Johannes Weiner wrote:
> On Thu, Aug 19, 2021 at 05:01:38PM +0200, Michal Hocko wrote:
[...]
> > The logic is spread over 3 different places.
> > 
> > Would something like the following be more understandable?
> > 
> > 			/*
> > 			 * Low limit protected memcgs are already excluded at
> > 			 * a higher level (shrink_node_memcgs) but scaling
> > 			 * down the reclaim target can result in hard to
> > 			 * reclaim and premature OOM. We do not have a full
> > 			 * picture here so we cannot really judge this
> > 			 * sutuation here but pro-actively flag this scenario
> > 			 * and let do_try_to_free_pages to retry if
> > 			 * there is no progress.
> > 			 */
> 
> I've been drafting around with this, but it seems to say the same
> thing as the comment I put into struct scan_control already:
> 
> 	/*
> 	 * Cgroup memory below memory.low is protected as long as we
> 	 * don't threaten to OOM. If any cgroup is reclaimed at
> 	 * reduced force or passed over entirely due to its memory.low
> 	 * setting (memcg_low_skipped), and nothing is reclaimed as a
> 	 * result, then go back back for one more cycle that reclaims
> 	 * the protected memory (memcg_low_reclaim) to avert OOM.
> 	 */
> 
> How about a brief version of this with a pointer to the original?
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 701106e1829c..c32d686719d5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2580,7 +2580,12 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  			unsigned long cgroup_size = mem_cgroup_size(memcg);
>  			unsigned long protection;
>  
> -			/* memory.low scaling, make sure we retry before OOM */
> +			/*
> +			 * Soft protection must not cause reclaim failure. Let
> +			 * the upper level know if we skipped pages during the
> +			 * first pass, so it can retry if necessary. See the
> +			 * struct scan_control definition of those flags.
> +			 */
>  			if (!sc->memcg_low_reclaim && low > min) {
>  				protection = low;
>  				sc->memcg_low_skipped = 1;
> @@ -2853,16 +2858,16 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  
>  		if (mem_cgroup_below_min(memcg)) {
>  			/*
> -			 * Hard protection.
> -			 * If there is no reclaimable memory, OOM.
> +			 * Hard protection. Always respected. If there is not
> +			 * enough reclaimable memory elsewhere, it's an OOM.
>  			 */
>  			continue;
>  		} else if (mem_cgroup_below_low(memcg)) {
>  			/*
> -			 * Soft protection.
> -			 * Respect the protection only as long as
> -			 * there is an unprotected supply
> -			 * of reclaimable memory from other cgroups.
> +			 * Soft protection must not cause reclaim failure. Let
> +			 * the upper level know if we skipped pages during the
> +			 * first pass, so it can retry if necessary. See the
> +			 * struct scan_control definition of those flags.
>  			 */
>  			if (!sc->memcg_low_reclaim) {
>  				sc->memcg_low_skipped = 1;

Yes, this makes the situation more explicit. I still see some advantage
to be explicit about those other layers as this will be easier to follow
the code but I will certainly not insist.

Andrew has already sent your original patch to Linus so this will need
to go as a separate patch. For that
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
                   ` (5 preceding siblings ...)
  2021-08-19 15:01 ` Michal Hocko
@ 2021-08-23 16:09 ` Michal Koutný
  2021-08-23 17:48   ` Johannes Weiner
  6 siblings, 1 reply; 15+ messages in thread
From: Michal Koutný @ 2021-08-23 16:09 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
	Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team

Hello

(and sorry for a belated reply).

On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> [...]
> +			/* memory.low scaling, make sure we retry before OOM */
> +			if (!sc->memcg_low_reclaim && low > min) {
> +				protection = low;
> +				sc->memcg_low_skipped = 1;

IIUC, this won't result in memory.events:low increment although the
effect is similar (breaching (partial) memory.low protection) and signal
to the user is comparable (overcommited memory.low).

Admittedly, this patch's behavior adheres to the current documentation
(Documentation/admin-guide/cgroup-v2.rst):

> The number of times the cgroup is reclaimed due to high memory
> pressure even though its usage is under the low boundary,

however, that definition might not be what the useful indicator would
be now.
Is it worth including these partial breaches into memory.events:low?

Regards,
Michal

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-23 16:09 ` Michal Koutný
@ 2021-08-23 17:48   ` Johannes Weiner
  2021-08-24 13:01     ` Michal Koutný
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Weiner @ 2021-08-23 17:48 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
	Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team

Hi Michal,

On Mon, Aug 23, 2021 at 06:09:29PM +0200, Michal Koutný wrote:
> Hello
> 
> (and sorry for a belated reply).

It's never too late, thanks for taking a look.

> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > @@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> > [...]
> > +			/* memory.low scaling, make sure we retry before OOM */
> > +			if (!sc->memcg_low_reclaim && low > min) {
> > +				protection = low;
> > +				sc->memcg_low_skipped = 1;
> 
> IIUC, this won't result in memory.events:low increment although the
> effect is similar (breaching (partial) memory.low protection) and signal
> to the user is comparable (overcommited memory.low).

Good observation. I think you're right, we should probably count such
partial breaches as LOW events as well.

Note that this isn't new behavior. My patch merely moved this part
from mem_cgroup_protection():

-       if (in_low_reclaim)
-               return READ_ONCE(memcg->memory.emin);

Even before, if we retried due to just one (possibly insignificant)
cgroup below low, we'd ignore proportional reclaim and partially
breach ALL protected cgroups, while only counting a low event for the
one group that is usage < low.

> Admittedly, this patch's behavior adheres to the current documentation
> (Documentation/admin-guide/cgroup-v2.rst):
> 
> > The number of times the cgroup is reclaimed due to high memory
> > pressure even though its usage is under the low boundary,
> 
> however, that definition might not be what the useful indicator would
> be now.
> Is it worth including these partial breaches into memory.events:low?

I think it is. How about:

"The number of times the cgroup's memory.low-protected memory was
reclaimed in order to avoid OOM during high memory pressure."

And adding a MEMCG_LOW event to partial breaches. BTW, the comment
block above this code is also out-of-date, because it says we're
honoring memory.low on the retries, but that's not the case.

I'll prepare a follow-up patch for these 3 things as well as the more
verbose comment that Michal Hocko asked for on the retry logic.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  2021-08-23 17:48   ` Johannes Weiner
@ 2021-08-24 13:01     ` Michal Koutný
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Koutný @ 2021-08-24 13:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Leon Yang, Chris Down, Roman Gushchin,
	Michal Hocko, linux-mm, cgroups, linux-kernel, kernel-team

On Mon, Aug 23, 2021 at 01:48:43PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Note that this isn't new behavior.

Understood, there may be a difference between:
a) a cgroup where the protected reserve was detected (this changed),
b) a cgroup where the protected memory is reclaimed.

> "The number of times the cgroup's memory.low-protected memory was
> reclaimed in order to avoid OOM during high memory pressure."

Yes, this is what I meant (i.e. events for the case b) above).

Michal

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-08-24 13:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-17 18:05 [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim Johannes Weiner
2021-08-17 18:44 ` Rik van Riel
2021-08-17 19:10 ` Shakeel Butt
2021-08-17 19:10   ` Shakeel Butt
2021-08-18 14:16   ` Johannes Weiner
2021-08-17 19:14 ` Andrew Morton
2021-08-17 19:45 ` Roman Gushchin
2021-08-18 14:15   ` Johannes Weiner
2021-08-18 20:18 ` Chris Down
2021-08-19 15:01 ` Michal Hocko
2021-08-19 20:38   ` Johannes Weiner
2021-08-20 15:44     ` Michal Hocko
2021-08-23 16:09 ` Michal Koutný
2021-08-23 17:48   ` Johannes Weiner
2021-08-24 13:01     ` Michal Koutný

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.