Linux-mm Archive on lore.kernel.org
* [PATCH 1/2] mm, memcg: Fix corruption on 64-bit divisor in memory.high throttling
@ 2020-03-12 18:02 Chris Down
  2020-03-12 18:03 ` [PATCH 2/2] mm, memcg: Throttle allocators based on ancestral memory.high Chris Down
  2020-03-16 16:19 ` [PATCH 1/2] mm, memcg: Fix corruption on 64-bit divisor in memory.high throttling Johannes Weiner
  0 siblings, 2 replies; 4+ messages in thread
From: Chris Down @ 2020-03-12 18:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Tejun Heo, linux-mm, cgroups, linux-kernel, kernel-team

0e4b01df8659 had a bunch of fixups to use the right division method.
However, it seems that after all that it still wasn't right -- div_u64
takes a 32-bit divisor, so a 64-bit clamped_high is silently truncated.

The headroom is still large (2^32 pages), so on mundane systems you
won't hit this, but this should definitely be fixed.
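
To make the truncation concrete, here is a minimal userspace sketch. The
model_* helpers are hypothetical stand-ins that only mirror the argument
widths of the kernel's div_u64() and div64_u64(), and the value 20 for
MEMCG_DELAY_PRECISION_SHIFT is an assumption, since the hunk does not show
the define.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of MEMCG_DELAY_PRECISION_SHIFT; not shown in the hunk. */
#define PRECISION_SHIFT 20

/* Hypothetical stand-in for div_u64(): the divisor is only 32 bits wide. */
static uint64_t model_div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Hypothetical stand-in for div64_u64(): both operands are 64 bits wide. */
static uint64_t model_div64_u64(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Overage as a fixed-point fraction of high, using the narrow divisor. */
static uint64_t overage_narrow(uint64_t usage, uint64_t high)
{
	/*
	 * The cast spells out the conversion that happens implicitly when a
	 * 64-bit value is passed as a 32-bit parameter: the upper 32 bits of
	 * high are dropped.
	 */
	return model_div_u64((usage - high) << PRECISION_SHIFT, (uint32_t)high);
}

/* The same calculation with a full 64-bit divisor. */
static uint64_t overage_wide(uint64_t usage, uint64_t high)
{
	return model_div64_u64((usage - high) << PRECISION_SHIFT, high);
}
```

With high at 2^32 + 1 pages, the narrow variant divides by 1 instead of by
high, so a tiny relative overage is reported as a huge one. Below 2^32 pages
the two variants agree.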

Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org # 5.4.x
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..a70206e516fe 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2339,7 +2339,7 @@ void mem_cgroup_handle_over_high(void)
 	 */
 	clamped_high = max(high, 1UL);
 
-	overage = div_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
+	overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
 			  clamped_high);
 
 	penalty_jiffies = ((u64)overage * overage * HZ)
-- 
2.25.1




* [PATCH 2/2] mm, memcg: Throttle allocators based on ancestral memory.high
  2020-03-12 18:02 [PATCH 1/2] mm, memcg: Fix corruption on 64-bit divisor in memory.high throttling Chris Down
@ 2020-03-12 18:03 ` Chris Down
  2020-03-16 16:21   ` Johannes Weiner
  2020-03-16 16:19 ` [PATCH 1/2] mm, memcg: Fix corruption on 64-bit divisor in memory.high throttling Johannes Weiner
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Down @ 2020-03-12 18:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Tejun Heo, linux-mm, cgroups, linux-kernel, kernel-team

Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage. However, it's possible that we are being
reclaimed as a result of hitting an ancestor's memory.high and should be
penalised based on that instead.

This patch changes memory.high overage throttling to use the largest
overage among the cgroup and its ancestors when considering how many
penalty jiffies to charge. This makes sure that we penalise poorly
behaving cgroups in the same way regardless of the level of the
hierarchy at which memory.high was breached.
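
The ancestor walk and the resulting penalty can be sketched in userspace
roughly as follows. Everything here is illustrative: struct model_cgroup,
MODEL_HZ and the helper names are hypothetical, the value 20 for
MEMCG_DELAY_PRECISION_SHIFT and 2*HZ for MEMCG_MAX_HIGH_DELAY_JIFFIES are
assumptions, and the scaling by the task's own nr_pages is left out. The
sketch also skips levels that are not over their limit, because the unsigned
subtraction would otherwise wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRECISION_SHIFT 20             /* assumed MEMCG_DELAY_PRECISION_SHIFT */
#define SCALING_SHIFT   14             /* MEMCG_DELAY_SCALING_SHIFT, as in the diff context */
#define MODEL_HZ        1000ULL        /* hypothetical tick rate */
#define MAX_DELAY       (2 * MODEL_HZ) /* assumed MEMCG_MAX_HIGH_DELAY_JIFFIES */

/* Hypothetical userspace stand-in for struct mem_cgroup. */
struct model_cgroup {
	struct model_cgroup *parent; /* NULL marks the root */
	uint64_t usage;              /* pages currently charged */
	uint64_t high;               /* memory.high, in pages */
};

/* Largest fixed-point overage among cg and its non-root ancestors. */
static uint64_t max_overage(const struct model_cgroup *cg)
{
	uint64_t max = 0;

	assert(cg != NULL);
	do {
		/* Treat a limit of 0 as 1 page to avoid dividing by zero. */
		uint64_t high = cg->high ? cg->high : 1;
		uint64_t overage;

		/* Guard added in this sketch: usage - high would wrap. */
		if (cg->usage <= high)
			continue;

		overage = ((cg->usage - high) << PRECISION_SHIFT) / high;
		if (overage > max)
			max = overage;
	} while ((cg = cg->parent) && cg->parent); /* stop before the root */

	return max;
}

/* Quadratic penalty in jiffies, clamped to the maximum delay. */
static uint64_t penalty_jiffies(const struct model_cgroup *cg)
{
	uint64_t overage = max_overage(cg);
	uint64_t penalty = overage * overage * MODEL_HZ;

	penalty >>= PRECISION_SHIFT;
	penalty >>= SCALING_SHIFT;
	return penalty < MAX_DELAY ? penalty : MAX_DELAY;
}
```

In this model, a child that is well under its own memory.high but sits below
a parent that is 1/8 over its limit is still charged the parent's penalty,
which is the behaviour the patch is after.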

Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org # 5.4.x
---
 mm/memcontrol.c | 93 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 58 insertions(+), 35 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a70206e516fe..46d649241a21 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2297,28 +2297,41 @@ static void high_work_func(struct work_struct *work)
  #define MEMCG_DELAY_SCALING_SHIFT 14
 
 /*
- * Scheduled by try_charge() to be executed from the userland return path
- * and reclaims memory over the high limit.
+ * Get the number of jiffies that we should penalise a mischievous cgroup which
+ * is exceeding its memory.high by checking both it and its ancestors.
  */
-void mem_cgroup_handle_over_high(void)
+static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
+					  unsigned int nr_pages)
 {
-	unsigned long usage, high, clamped_high;
-	unsigned long pflags;
-	unsigned long penalty_jiffies, overage;
-	unsigned int nr_pages = current->memcg_nr_pages_over_high;
-	struct mem_cgroup *memcg;
+	unsigned long penalty_jiffies;
+	u64 max_overage = 0;
 
-	if (likely(!nr_pages))
-		return;
+	do {
+		unsigned long usage, high;
+		u64 overage;
 
-	memcg = get_mem_cgroup_from_mm(current->mm);
-	reclaim_high(memcg, nr_pages, GFP_KERNEL);
-	current->memcg_nr_pages_over_high = 0;
+		usage = page_counter_read(&memcg->memory);
+		high = READ_ONCE(memcg->high);
+
+		/*
+		 * Prevent division by 0 in overage calculation by acting as if
+		 * it was a threshold of 1 page
+		 */
+		high = max(high, 1UL);
+
+		overage = usage - high;
+		overage <<= MEMCG_DELAY_PRECISION_SHIFT;
+		overage = div64_u64(overage, high);
+
+		if (overage > max_overage)
+			max_overage = overage;
+	} while ((memcg = parent_mem_cgroup(memcg)) &&
+		 !mem_cgroup_is_root(memcg));
+
+	if (!max_overage)
+		return 0;
 
 	/*
-	 * memory.high is breached and reclaim is unable to keep up. Throttle
-	 * allocators proactively to slow down excessive growth.
-	 *
 	 * We use overage compared to memory.high to calculate the number of
 	 * jiffies to sleep (penalty_jiffies). Ideally this value should be
 	 * fairly lenient on small overages, and increasingly harsh when the
@@ -2326,24 +2339,9 @@ void mem_cgroup_handle_over_high(void)
 	 * its crazy behaviour, so we exponentially increase the delay based on
 	 * overage amount.
 	 */
-
-	usage = page_counter_read(&memcg->memory);
-	high = READ_ONCE(memcg->high);
-
-	if (usage <= high)
-		goto out;
-
-	/*
-	 * Prevent division by 0 in overage calculation by acting as if it was a
-	 * threshold of 1 page
-	 */
-	clamped_high = max(high, 1UL);
-
-	overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
-			  clamped_high);
-
-	penalty_jiffies = ((u64)overage * overage * HZ)
-		>> (MEMCG_DELAY_PRECISION_SHIFT + MEMCG_DELAY_SCALING_SHIFT);
+	penalty_jiffies = max_overage * max_overage * HZ;
+	penalty_jiffies >>= MEMCG_DELAY_PRECISION_SHIFT;
+	penalty_jiffies >>= MEMCG_DELAY_SCALING_SHIFT;
 
 	/*
 	 * Factor in the task's own contribution to the overage, such that four
@@ -2360,7 +2358,32 @@ void mem_cgroup_handle_over_high(void)
 	 * application moving forwards and also permit diagnostics, albeit
 	 * extremely slowly.
 	 */
-	penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+	return min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+}
+
+/*
+ * Scheduled by try_charge() to be executed from the userland return path
+ * and reclaims memory over the high limit.
+ */
+void mem_cgroup_handle_over_high(void)
+{
+	unsigned long penalty_jiffies;
+	unsigned long pflags;
+	unsigned int nr_pages = current->memcg_nr_pages_over_high;
+	struct mem_cgroup *memcg;
+
+	if (likely(!nr_pages))
+		return;
+
+	memcg = get_mem_cgroup_from_mm(current->mm);
+	reclaim_high(memcg, nr_pages, GFP_KERNEL);
+	current->memcg_nr_pages_over_high = 0;
+
+	/*
+	 * memory.high is breached and reclaim is unable to keep up. Throttle
+	 * allocators proactively to slow down excessive growth.
+	 */
+	penalty_jiffies = calculate_high_delay(memcg, nr_pages);
 
 	/*
 	 * Don't sleep if the amount of jiffies this memcg owes us is so low
-- 
2.25.1




* Re: [PATCH 1/2] mm, memcg: Fix corruption on 64-bit divisor in memory.high throttling
  2020-03-12 18:02 [PATCH 1/2] mm, memcg: Fix corruption on 64-bit divisor in memory.high throttling Chris Down
  2020-03-12 18:03 ` [PATCH 2/2] mm, memcg: Throttle allocators based on ancestral memory.high Chris Down
@ 2020-03-16 16:19 ` Johannes Weiner
  1 sibling, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2020-03-16 16:19 UTC (permalink / raw)
  To: Chris Down
  Cc: Andrew Morton, Tejun Heo, linux-mm, cgroups, linux-kernel, kernel-team

On Thu, Mar 12, 2020 at 06:02:54PM +0000, Chris Down wrote:
> 0e4b01df8659 had a bunch of fixups to use the right division method.
> However, it seems that after all that it still wasn't right -- div_u64
> takes a 32-bit divisor, so a 64-bit clamped_high is silently truncated.
> 
> The headroom is still large (2^32 pages), so on mundane systems you
> won't hit this, but this should definitely be fixed.
> 
> Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Chris Down <chris@chrisdown.name>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: cgroups@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: kernel-team@fb.com
> Cc: stable@vger.kernel.org # 5.4.x

div_u64 versus div64_u64 is really a handgrenade. We just fixed a
bunch of those in psi as well.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>



* Re: [PATCH 2/2] mm, memcg: Throttle allocators based on ancestral memory.high
  2020-03-12 18:03 ` [PATCH 2/2] mm, memcg: Throttle allocators based on ancestral memory.high Chris Down
@ 2020-03-16 16:21   ` Johannes Weiner
  0 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2020-03-16 16:21 UTC (permalink / raw)
  To: Chris Down
  Cc: Andrew Morton, Tejun Heo, linux-mm, cgroups, linux-kernel, kernel-team

On Thu, Mar 12, 2020 at 06:03:04PM +0000, Chris Down wrote:
> Prior to this commit, we only directly check the affected cgroup's
> memory.high against its usage. However, it's possible that we are being
> reclaimed as a result of hitting an ancestor's memory.high and should be
> penalised based on that instead.
> 
> This patch changes memory.high overage throttling to use the largest
> overage among the cgroup and its ancestors when considering how many
> penalty jiffies to charge. This makes sure that we penalise poorly
> behaving cgroups in the same way regardless of the level of the
> hierarchy at which memory.high was breached.
> 
> Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Chris Down <chris@chrisdown.name>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: cgroups@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: kernel-team@fb.com
> Cc: stable@vger.kernel.org # 5.4.x

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


