References: <20200417010617.927266-1-kuba@kernel.org> <20200417162355.GA43469@mtj.thefacebook.com>
In-Reply-To: <20200417162355.GA43469@mtj.thefacebook.com>
From: Shakeel Butt
Date: Fri, 17 Apr 2020 10:18:15 -0700
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available space gets depleted
To: Tejun Heo
Cc: Jakub Kicinski, Andrew Morton, Linux MM, Kernel Team, Johannes Weiner, Chris Down, Cgroups

Hi Tejun,

On Fri, Apr 17, 2020 at 9:23 AM Tejun Heo wrote:
>
> Hello,
>
> On Fri, Apr 17, 2020 at 09:11:33AM -0700, Shakeel Butt wrote:
> > On Thu, Apr 16, 2020 at 6:06 PM Jakub Kicinski wrote:
> > >
> > > Tejun describes the problem as follows:
> > >
> > > When swap runs out, there's an abrupt change in system behavior -
> > > the anonymous memory suddenly becomes unmanageable which readily
> > > breaks any sort of memory isolation and can bring down the whole
> > > system.
> >
> > Can you please add more info on this abrupt change in system behavior
> > and what do you mean by anon memory becoming unmanageable?
>
> In the sense that anonymous memory becomes essentially memlocked.
>
> > Once the system is in global reclaim and doing swapping the memory
> > isolation is already broken. Here I am assuming you are talking about
>
> There currently are issues with anonymous memory management which makes them
> different / worse than page cache but I don't follow why swapping
> necessarily means that isolation is broken. Page refaults don't indicate
> that memory isolation is broken after all.
>

Sorry, I meant the performance isolation. Direct reclaim does not
really differentiate who to stall and whose CPU to use.

> > memcg limit reclaim and memcg limits are overcommitted. Shouldn't
> > running out of swap will trigger the OOM earlier which should be
> > better than impacting the whole system.
>
> The primary scenario which was being considered was undercommitted
> protections but I don't think that makes any relevant differences.
>

What are undercommitted protections? Does it mean there is still swap
available on the system but the memcg is hitting its swap limit?

> This is exactly similar to delay injection for memory.high. What's desired
> is slowing down the workload as the available resource is depleted so that
> the resource shortage presents as gradual degradation of performance and
> matching increase in resource PSI. This allows the situation to be detected
> and handled from userland while avoiding sudden and unpredictable behavior
> changes.
>

Let me try to understand this with an example. Memcg 'A' has
memory.high = 100 MiB, memory.max = 150 MiB and memory.swap.max = 50
MiB. When A's usage goes over 100 MiB, it will reclaim anon, file and
kmem. The anon will go to swap and increase A's swap usage until it
hits the limit. Now A's reclaim_high has fewer things (file & kmem) to
reclaim, but mem_cgroup_handle_over_high() will keep A's increase in
usage in check.

So, my question is: should the slowdown by memory.high depend on the
reclaimable memory? If there is no reclaimable memory and the job hits
memory.high, should the kernel slow it down to a crawl until the PSI
monitor comes and decides what to do?

If I understand correctly, the problem is that the kernel slowdown is
not successful when reclaimable memory is very low. Please correct me
if I am wrong.

Shakeel
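
The memcg 'A' example above can be sketched as a toy model. This is
not kernel code and does not reflect how reclaim is actually
implemented; the Memcg class, its fields and the 20 MiB of initial
file cache are assumptions made purely for illustration (kmem is left
out, and swapped-out pages are never faulted back in). It only shows
how the reclaimable amount shrinks to zero once memory.swap.max is
reached and the file cache is gone, after which usage can only grow
past memory.high.

```python
from dataclasses import dataclass


@dataclass
class Memcg:
    """Toy model of memcg 'A'; all sizes are in MiB (hypothetical helper)."""
    high: int = 100      # memory.high
    hard_max: int = 150  # memory.max (not enforced in this sketch)
    swap_max: int = 50   # memory.swap.max
    anon: int = 0        # resident anonymous memory
    file: int = 0        # page cache
    swap: int = 0        # swap currently in use

    @property
    def usage(self) -> int:
        return self.anon + self.file

    @property
    def reclaimable(self) -> int:
        # File pages can be dropped; anon can only go while swap room remains.
        return self.file + min(self.anon, self.swap_max - self.swap)

    def charge_anon(self, mib: int) -> None:
        """Allocate anonymous memory, then try to get back under 'high'."""
        self.anon += mib
        self.reclaim_high()

    def reclaim_high(self) -> None:
        over = self.usage - self.high
        if over <= 0:
            return
        # Assumption: swap out anon first, then drop file cache.
        to_swap = min(over, self.anon, self.swap_max - self.swap)
        self.anon -= to_swap
        self.swap += to_swap
        over -= to_swap
        self.file -= min(over, self.file)


if __name__ == "__main__":
    cg = Memcg(file=20)
    for step in range(1, 21):
        cg.charge_anon(10)
        print(step, cg.usage, cg.swap, cg.file, cg.reclaimable)
```

In this model the per-step output has the columns step, usage, swap,
file and reclaimable. Once reclaimable reaches zero, nothing the
reclaim path does can bring usage back under memory.high, which is the
situation the question above is asking about.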