From: Johannes Weiner
To: Andrew Morton
Cc: Roman Gushchin, Michal Hocko, Tejun Heo, linux-mm@kvack.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v2 1/3] mm: memcontrol: fix memory.low proportional distribution
Date: Thu, 19 Dec 2019 15:07:16 -0500
Message-Id: <20191219200718.15696-2-hannes@cmpxchg.org>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20191219200718.15696-1-hannes@cmpxchg.org>
References: <20191219200718.15696-1-hannes@cmpxchg.org>

When memory.low is overcommitted - i.e. the children claim more
protection than their shared ancestor grants them - the allowance is
distributed in proportion to how much each sibling uses their own
declared protection:

	low_usage = min(memory.low, memory.current)
	elow = parent_elow * (low_usage / siblings_low_usage)

However, siblings_low_usage is not the sum of all low_usages. It sums
up the usages of *only those cgroups that are within their
memory.low*. That means that low_usage can be *bigger* than
siblings_low_usage, and consequently the total protection afforded to
the children can be bigger than what the ancestor grants the subtree.

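As a rough illustration of the formula above, here is a minimal,
self-contained sketch in C. It also applies the cap to memory.low that
the code comment touched by this patch describes. The helper names, the
GIB() macro and the MiB units are hypothetical choices for this sketch
and are not kernel code.

```c
#include <stdint.h>

/* Sizes are in MiB so the arithmetic stays in integers. */
#define GIB(x) ((uint64_t)(x) * 1024)

/* low_usage = min(memory.low, memory.current) */
uint64_t low_usage(uint64_t low, uint64_t current)
{
	return low < current ? low : current;
}

/*
 * elow = min(memory.low, parent_elow * low_usage / siblings_low_usage)
 *
 * Falling back to memory.low when either factor is zero is an
 * assumption of this sketch, made to avoid dividing by zero.
 */
uint64_t effective_low(uint64_t low, uint64_t current,
		       uint64_t parent_elow, uint64_t siblings_low_usage)
{
	uint64_t usage = low_usage(low, current);
	uint64_t share;

	if (!usage || !siblings_low_usage)
		return low;

	share = parent_elow * usage / siblings_low_usage;
	return share < low ? share : low;
}
```

With parent_elow = 10G, a child with memory.low = 10G and
memory.current = 20G, and a siblings_low_usage of only 8G, the raw
share is 12.5G, which is more than the parent grants in total; the
helper returns it capped to the child's own 10G.
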
Consider three groups where two are in excess of their protection:

  A/memory.low = 10G
  A/A1/memory.low = 10G, memory.current = 20G
  A/A2/memory.low = 10G, memory.current = 20G
  A/A3/memory.low = 10G, memory.current =  8G
  siblings_low_usage = 8G (only A3 contributes)

  A1/elow = parent_elow(10G) * low_usage(10G) / siblings_low_usage(8G) = 12.5G -> 10G
  A2/elow = parent_elow(10G) * low_usage(10G) / siblings_low_usage(8G) = 12.5G -> 10G
  A3/elow = parent_elow(10G) * low_usage(8G)  / siblings_low_usage(8G) = 10.0G

  (the 12.5G are capped to the explicit memory.low setting of 10G)

With that, the sum of all awarded protection below A is 30G, when A
only grants 10G for the entire subtree.

What does this mean in practice? A1 and A2 would still be in excess of
their 10G allowance and would be reclaimed, whereas A3 would not. As
they eventually drop below their protection setting, they would be
counted in siblings_low_usage again and the error would right itself.

When reclaim was applied in a binary fashion (cgroup is reclaimed when
it's above its protection, otherwise it's skipped) this would actually
work out just fine. However, since 1bc63fb1272b ("mm, memcg: make scan
aggression always exclude protection"), reclaim pressure is scaled to
how much a cgroup is above its protection. As a result this
calculation error unduly skews pressure away from A1 and A2 toward the
rest of the system.

But why did we do it like this in the first place?

The reasoning behind exempting groups in excess from
siblings_low_usage was to go after them first during reclaim in an
overcommitted subtree:

  A/memory.low = 2G, memory.current = 4G
  A/A1/memory.low = 3G, memory.current = 2G
  A/A2/memory.low = 1G, memory.current = 2G

  siblings_low_usage = 2G (only A1 contributes)
  A1/elow = parent_elow(2G) * low_usage(2G) / siblings_low_usage(2G) = 2G
  A2/elow = parent_elow(2G) * low_usage(1G) / siblings_low_usage(2G) = 1G

While the children combined are overcommitting A and are technically
both at fault, A2 is actively declaring unprotected memory and we
would like to reclaim that first.

However, while this sounds like a noble goal on the face of it, it
doesn't make much difference in actual memory distribution: Because A
is overcommitted, reclaim will not stop once A2 gets pushed back to
within its allowance; we'll have to reclaim A1 either way. The end
result is still that protection is distributed proportionally, with A1
getting 3/4 (1.5G) and A2 getting 1/4 (0.5G) of A's allowance.

[ If A weren't overcommitted, it wouldn't make a difference since each
  cgroup would just get the protection it declares:

  A/memory.low = 2G, memory.current = 3G
  A/A1/memory.low = 1G, memory.current = 1G
  A/A2/memory.low = 1G, memory.current = 2G

  With the current calculation:

  siblings_low_usage = 1G (only A1 contributes)
  A1/elow = parent_elow(2G) * low_usage(1G) / siblings_low_usage(1G) = 2G -> 1G
  A2/elow = parent_elow(2G) * low_usage(1G) / siblings_low_usage(1G) = 2G -> 1G

  Including excess groups in siblings_low_usage:

  siblings_low_usage = 2G
  A1/elow = parent_elow(2G) * low_usage(1G) / siblings_low_usage(2G) = 1G -> 1G
  A2/elow = parent_elow(2G) * low_usage(1G) / siblings_low_usage(2G) = 1G -> 1G ]

Simplify the calculation and fix the proportional reclaim bug by
including excess cgroups in siblings_low_usage.

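The two ways of accumulating siblings_low_usage discussed above can be
compared with a small userspace model. This is only a sketch: struct
group, protected_before(), protected_after() and sum_protected() are
hypothetical names for illustration, and sizes are plain MiB values
rather than the page counters the kernel uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one child cgroup; sizes in MiB. */
struct group {
	uint64_t low;	/* memory.low */
	uint64_t usage;	/* memory.current */
};

typedef uint64_t (*protected_fn)(const struct group *g);

/* Old rule: a group above its memory.low contributes nothing. */
uint64_t protected_before(const struct group *g)
{
	return g->usage <= g->low ? g->usage : 0;
}

/* New rule: every group contributes min(usage, memory.low). */
uint64_t protected_after(const struct group *g)
{
	return g->usage < g->low ? g->usage : g->low;
}

/* siblings_low_usage: sum of the children's contributions under one rule. */
uint64_t sum_protected(const struct group *groups, size_t n, protected_fn rule)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += rule(&groups[i]);
	return sum;
}
```

Fed with the three-group example (10G/20G, 10G/20G, 10G/8G), the old
rule sums to 8G while the new rule sums to 28G, so no single
low_usage can exceed the denominator anymore.
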
Signed-off-by: Johannes Weiner
---
 mm/memcontrol.c   |  4 +---
 mm/page_counter.c | 12 ++----------
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c5b5f74cfd4d..874a0b00f89b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6236,9 +6236,7 @@ struct cgroup_subsys memory_cgrp_subsys = {
  *    elow = min( memory.low, parent->elow * ------------------ ),
  *                                           siblings_low_usage
  *
- *             | memory.current, if memory.current < memory.low
- * low_usage = |
- *	       | 0, otherwise.
+ * low_usage = min(memory.low, memory.current)
  *
  *
  * Such definition of the effective memory.low provides the expected
diff --git a/mm/page_counter.c b/mm/page_counter.c
index de31470655f6..75d53f15f040 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -23,11 +23,7 @@ static void propagate_protected_usage(struct page_counter *c,
 		return;
 
 	if (c->min || atomic_long_read(&c->min_usage)) {
-		if (usage <= c->min)
-			protected = usage;
-		else
-			protected = 0;
-
+		protected = min(usage, c->min);
 		old_protected = atomic_long_xchg(&c->min_usage, protected);
 		delta = protected - old_protected;
 		if (delta)
@@ -35,11 +31,7 @@ static void propagate_protected_usage(struct page_counter *c,
 	}
 
 	if (c->low || atomic_long_read(&c->low_usage)) {
-		if (usage <= c->low)
-			protected = usage;
-		else
-			protected = 0;
-
+		protected = min(usage, c->low);
 		old_protected = atomic_long_xchg(&c->low_usage, protected);
 		delta = protected - old_protected;
 		if (delta)
-- 
2.24.1