From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D65FC433E0 for ; Tue, 2 Jun 2020 04:49:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3250E20772 for ; Tue, 2 Jun 2020 04:49:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="HKyZP3Fa" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3250E20772 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0CF3E28003A; Tue, 2 Jun 2020 00:49:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A89C280012; Tue, 2 Jun 2020 00:49:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ED6DB28003A; Tue, 2 Jun 2020 00:49:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0198.hostedemail.com [216.40.44.198]) by kanga.kvack.org (Postfix) with ESMTP id D5419280012 for ; Tue, 2 Jun 2020 00:49:44 -0400 (EDT) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 91BFA181AEF10 for ; Tue, 2 Jun 2020 04:49:44 +0000 (UTC) X-FDA: 76883043888.04.fruit59_37e73cb0f9405 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin04.hostedemail.com (Postfix) with ESMTP id 78B8580061E1 for ; Tue, 2 Jun 2020 04:49:44 +0000 (UTC) X-HE-Tag: fruit59_37e73cb0f9405 X-Filterd-Recvd-Size: 6206 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Tue, 2 Jun 2020 04:49:44 +0000 (UTC) Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id E3CEB20734; Tue, 2 Jun 2020 04:49:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1591073383; bh=OZquwXFKHuTclgKzOR7UhzeJOP5F0DIZIspIIviekVU=; h=Date:From:To:Subject:In-Reply-To:From; b=HKyZP3Fa3Eig09J56lQyC4JeFMDUjUzGfBzacD5r6uKYR/BQMtW3jvEHeRt/a0aJ9 y7LZQ46in811DuBjv53w8d9XsMCqh7jXCr0wz7goPiueVJcyALE7n9rYqn2eKMuCu3 rIDE9CyuVSCeC0n6cx+DT52BCUMAVT+xzWvSAhlc= Date: Mon, 01 Jun 2020 21:49:42 -0700 From: Andrew Morton To: akpm@linux-foundation.org, chris@chrisdown.name, hannes@cmpxchg.org, hughd@google.com, kuba@kernel.org, linux-mm@kvack.org, mhocko@kernel.org, mm-commits@vger.kernel.org, shakeelb@google.com, tj@kernel.org, torvalds@linux-foundation.org Subject: [patch 079/128] mm/memcg: prepare for swap over-high accounting and penalty calculation Message-ID: <20200602044942._uiPP5Mr0%akpm@linux-foundation.org> In-Reply-To: <20200601214457.919c35648e96a2b46b573fe1@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Queue-Id: 78B8580061E1 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Jakub Kicinski Subject: mm/memcg: prepare for swap over-high accounting and penalty calculation Patch series "memcg: Slow down swap allocation as the available space gets depleted", v6. Tejun describes the problem as follows: When swap runs out, there's an abrupt change in system behavior - the anonymous memory suddenly becomes unmanageable which readily breaks any sort of memory isolation and can bring down the whole system. To avoid that, oomd [1] monitors free swap space and triggers kills when it drops below the specific threshold (e.g. 15%). While this works, it's far from ideal: - Depending on IO performance and total swap size, a given headroom might not be enough or too much. - oomd has to monitor swap depletion in addition to the usual pressure metrics and it currently doesn't consider memory.swap.max. Solve this by adapting parts of the approach that memory.high uses - slow down allocation as the resource gets depleted turning the depletion behavior from abrupt cliff one to gradual degradation observable through memory pressure metric. [1] https://github.com/facebookincubator/oomd This patch (of 4): Slice the memory overage calculation logic a little bit so we can reuse it to apply a similar penalty to the swap. The logic which accesses the memory-specific fields (use and high values) has to be taken out of calculate_high_delay(). Link: http://lkml.kernel.org/r/20200527195846.102707-1-kuba@kernel.org Link: http://lkml.kernel.org/r/20200527195846.102707-2-kuba@kernel.org Signed-off-by: Jakub Kicinski Reviewed-by: Shakeel Butt Acked-by: Johannes Weiner Cc: Hugh Dickins Cc: Chris Down Cc: Michal Hocko Cc: Tejun Heo Signed-off-by: Andrew Morton --- mm/memcontrol.c | 62 +++++++++++++++++++++++++--------------------- 1 file changed, 35 insertions(+), 27 deletions(-) --- a/mm/memcontrol.c~mm-prepare-for-swap-over-high-accounting-and-penalty-calculation +++ a/mm/memcontrol.c @@ -2321,41 +2321,48 @@ static void high_work_func(struct work_s #define MEMCG_DELAY_PRECISION_SHIFT 20 #define MEMCG_DELAY_SCALING_SHIFT 14 -/* - * Get the number of jiffies that we should penalise a mischievous cgroup which - * is exceeding its memory.high by checking both it and its ancestors. - */ -static unsigned long calculate_high_delay(struct mem_cgroup *memcg, - unsigned int nr_pages) +static u64 calculate_overage(unsigned long usage, unsigned long high) { - unsigned long penalty_jiffies; - u64 max_overage = 0; + u64 overage; - do { - unsigned long usage, high; - u64 overage; + if (usage <= high) + return 0; - usage = page_counter_read(&memcg->memory); - high = READ_ONCE(memcg->high); + /* + * Prevent division by 0 in overage calculation by acting as if + * it was a threshold of 1 page + */ + high = max(high, 1UL); - if (usage <= high) - continue; + overage = usage - high; + overage <<= MEMCG_DELAY_PRECISION_SHIFT; + return div64_u64(overage, high); +} - /* - * Prevent division by 0 in overage calculation by acting as if - * it was a threshold of 1 page - */ - high = max(high, 1UL); - - overage = usage - high; - overage <<= MEMCG_DELAY_PRECISION_SHIFT; - overage = div64_u64(overage, high); +static u64 mem_find_max_overage(struct mem_cgroup *memcg) +{ + u64 overage, max_overage = 0; - if (overage > max_overage) - max_overage = overage; + do { + overage = calculate_overage(page_counter_read(&memcg->memory), + READ_ONCE(memcg->high)); + max_overage = max(overage, max_overage); } while ((memcg = parent_mem_cgroup(memcg)) && !mem_cgroup_is_root(memcg)); + return max_overage; +} + +/* + * Get the number of jiffies that we should penalise a mischievous cgroup which + * is exceeding its memory.high by checking both it and its ancestors. + */ +static unsigned long calculate_high_delay(struct mem_cgroup *memcg, + unsigned int nr_pages, + u64 max_overage) +{ + unsigned long penalty_jiffies; + if (!max_overage) return 0; @@ -2411,7 +2418,8 @@ void mem_cgroup_handle_over_high(void) * memory.high is breached and reclaim is unable to keep up. Throttle * allocators proactively to slow down excessive growth. */ - penalty_jiffies = calculate_high_delay(memcg, nr_pages); + penalty_jiffies = calculate_high_delay(memcg, nr_pages, + mem_find_max_overage(memcg)); /* * Don't sleep if the amount of jiffies this memcg owes us is so low _