MIME-Version: 1.0
References: <20200709194718.189231-1-guro@fb.com> <20200710122917.GB3022@dhcp22.suse.cz> <20200710184205.GB350256@carbon.dhcp.thefacebook.com>
In-Reply-To: <20200710184205.GB350256@carbon.dhcp.thefacebook.com>
From: Shakeel Butt
Date: Fri, 10 Jul 2020 12:19:37 -0700
Subject: Re: [PATCH] mm: memcontrol: avoid workload stalls when lowering memory.high
To: Roman Gushchin
Cc: Michal Hocko, Andrew Morton, Johannes Weiner, Linux MM, Kernel Team, LKML, Domas Mituzas, Tejun Heo, Chris Down
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On Fri, Jul 10, 2020 at 11:42 AM Roman Gushchin wrote:
>
> On Fri, Jul 10, 2020 at 07:12:22AM -0700, Shakeel Butt wrote:
> > On Fri, Jul 10, 2020 at 5:29 AM Michal Hocko wrote:
> > >
> > > On Thu 09-07-20 12:47:18, Roman Gushchin wrote:
> > > > Memory.high limit is implemented in a way such that the kernel
> > > > penalizes all threads which are allocating a memory over the limit.
> > > > Forcing all threads into the synchronous reclaim and adding some
> > > > artificial delays allows to slow down the memory consumption and
> > > > potentially give some time for userspace oom handlers/resource control
> > > > agents to react.
> > > >
> > > > It works nicely if the memory usage is hitting the limit from below,
> > > > however it works sub-optimal if a user adjusts memory.high to a value
> > > > way below the current memory usage. It basically forces all workload
> > > > threads (doing any memory allocations) into the synchronous reclaim
> > > > and sleep. This makes the workload completely unresponsive for
> > > > a long period of time and can also lead to a system-wide contention on
> > > > lru locks. It can happen even if the workload is not actually tight on
> > > > memory and has, for example, a ton of cold pagecache.
> > > >
> > > > In the current implementation writing to memory.high causes an atomic
> > > > update of page counter's high value followed by an attempt to reclaim
> > > > enough memory to fit into the new limit. To fix the problem described
> > > > above, all we need is to change the order of execution: try to push
> > > > the memory usage under the limit first, and only then set the new
> > > > high limit.
> > >
> > > Shakeel would this help with your pro-active reclaim usecase? It would
> > > require to reset the high limit right after the reclaim returns which is
> > > quite ugly but it would at least not require a completely new interface.
> > > You would simply do
> > >         high = current - to_reclaim
> > >         echo $high > memory.high
> > >         echo infinity > memory.high # To prevent direct reclaim
> > >                                     # allocation stalls
> > >
> >
> > This will reduce the chance of stalls but the interface is still
> > non-delegatable i.e. applications can not change their own memory.high
> > for the use-cases like application controlled proactive reclaim and
> > uswapd.
>
> Can you, please, elaborate a bit more on this? I didn't understand
> why.
>

Sure. Do we want memory.high to be a CFTYPE_NS_DELEGATABLE type file?
I don't think so; otherwise any job on a system could change its own
memory.high and adversely impact the isolation and memory scheduling
of the system.

Next, we have to agree that there are valid use-cases for allowing
applications to reclaim from their own cgroups, and I think uswapd and
proactive reclaim are valid use-cases. Let's suppose memory.high is
the only way to trigger reclaim, but the application cannot write to
its top-level memory.high. It then has to create a dummy cgroup whose
memory.high it can write to, and move itself into that dummy cgroup,
in order to use memory.high to trigger reclaim for
uswapd/proactive-reclaim.
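
For illustration only, the sequence Michal suggests above (lower
memory.high by the amount to reclaim, then reset it right away) could
be sketched in Python roughly as follows. This is a sketch under
assumptions, not part of the patch: the function name
proactive_reclaim and the directory argument are hypothetical, the
caller is assumed to have write access to memory.high (which is
exactly the delegation problem discussed above), and the old limit is
restored instead of being unconditionally reset to unlimited. Note
that cgroup v2 spells "no limit" as "max".

```python
from pathlib import Path


def proactive_reclaim(cgroup_dir: str, to_reclaim: int) -> int:
    """Lower memory.high by to_reclaim bytes, then restore the old limit.

    Hypothetical helper: cgroup_dir is the path of a cgroup v2 directory
    whose memory.high the caller is allowed to write. Returns the
    temporary limit that was written.
    """
    cg = Path(cgroup_dir)
    high_file = cg / "memory.high"

    # Remember the existing limit ("max" means unlimited in cgroup v2).
    previous = high_file.read_text().strip()

    # high = current - to_reclaim, clamped so it never goes negative.
    current = int((cg / "memory.current").read_text())
    target = max(current - to_reclaim, 0)

    try:
        # On a real cgroup, this write makes the kernel attempt reclaim
        # down to the new limit before the write returns.
        high_file.write_text(f"{target}\n")
    finally:
        # Put the old limit back immediately so later allocations are
        # not throttled against the temporary value.
        high_file.write_text(f"{previous}\n")
    return target
```

A caller would invoke it as, for example,
proactive_reclaim("/sys/fs/cgroup/myjob", 64 << 20), where the cgroup
path is a made-up example. With the reordering proposed in the patch,
the window between the two writes is where allocating threads could
otherwise be pushed into synchronous reclaim.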