From: Shakeel Butt
Date: Fri, 17 Apr 2020 10:51:10 -0700
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available space gets depleted
To: Tejun Heo
Cc: Jakub Kicinski, Andrew Morton, Linux MM, Kernel Team, Johannes Weiner, Chris Down, Cgroups
In-Reply-To: <20200417173615.GB43469@mtj.thefacebook.com>

On Fri, Apr 17, 2020 at 10:36 AM Tejun Heo wrote:
>
> Hello,
>
> On Fri, Apr 17, 2020 at 10:18:15AM -0700, Shakeel Butt wrote:
> > > There currently are issues with anonymous memory management which makes them
> > > different / worse than page cache but I don't follow why swapping
> > > necessarily means that isolation is broken. Page refaults don't indicate
> > > that memory isolation is broken after all.
> >
> > Sorry, I meant the performance isolation. Direct reclaim does not
> > really differentiate who to stall and whose CPU to use.
>
> Can you please elaborate concrete scenarios? I'm having a hard time seeing
> differences from page cache.
>

Oh, I was talking about global reclaim here. In global reclaim, any
task can be throttled (throttle_direct_reclaim()). Memory freed using
the CPU of high-priority, low-latency jobs can be stolen by
low-priority batch jobs.

> > > > memcg limit reclaim and memcg limits are overcommitted. Shouldn't
> > > > running out of swap trigger the OOM earlier, which should be
> > > > better than impacting the whole system?
> > >
> > > The primary scenario which was being considered was undercommitted
> > > protections but I don't think that makes any relevant differences.
> > >
> >
> > What are undercommitted protections? Does it mean there is still swap
> > available on the system but the memcg is hitting its swap limit?
>
> Hahaha, I assumed you were talking about memory.high/max and was saying that
> the primary scenarios that were being considered was usage of memory.low
> interacting with swap. Again, can you please give a concrete example so
> that we don't misunderstand each other?
>
> > > This is exactly similar to delay injection for memory.high. What's desired
> > > is slowing down the workload as the available resource is depleted so that
> > > the resource shortage presents as gradual degradation of performance and
> > > matching increase in resource PSI. This allows the situation to be detected
> > > and handled from userland while avoiding sudden and unpredictable behavior
> > > changes.
> > >
> >
> > Let me try to understand this with an example. Memcg 'A' has
>
> Ah, you already went there. Great.
>
> > memory.high = 100 MiB, memory.max = 150 MiB and memory.swap.max = 50
> > MiB. When A's usage goes over 100 MiB, it will reclaim the anon, file
> > and kmem. The anon will go to swap and increase its swap usage until
> > it hits the limit. Now the 'A' reclaim_high has fewer things (file &
> > kmem) to reclaim but the mem_cgroup_handle_over_high() will keep A's
> > increase in usage in check.
> >
> > So, my question is: should the slowdown by memory.high depend on the
> > reclaimable memory? If there is no reclaimable memory and the job hits
> > memory.high, should the kernel slow it down to a crawl until the PSI
> > monitor comes and decides what to do? If I understand correctly, the
> > problem is the kernel slowdown is not successful when reclaimable
> > memory is very low. Please correct me if I am wrong.
>
> In combination with memory.high, swap slowdown may not be necessary because
> memory.high's slow down mechanism is already there to handle "can't swap"
> scenario whether that's because swap is disabled wholesale, limited or
> depleted. However, please consider the following scenario.
>
> cgroup A has memory.low protection and no other restrictions.
> cgroup B has no protection and has access to swap. When B's memory starts
> bloating and gets the system under memory contention, it'll start consuming
> swap until it can't. When swap becomes depleted for B, there's nothing
> holding it back and B will start eating into A's protection.
>

In this example, does 'B' have memory.high and memory.max set? And by
'A' having no other restrictions, I am assuming you meant unlimited
high and max for 'A'? Can 'A' use memory.min?

> The proposed mechanism just plugs another vector for the same condition
> where anonymous memory management breaks down because they can no longer be
> reclaimed due to swap unavailability.
>
> Thanks.
>
> --
> tejun
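
For reference, the limits in the memcg 'A' example quoted above
(memory.high = 100 MiB, memory.max = 150 MiB, memory.swap.max = 50 MiB)
can be written out as the byte values that the cgroup v2 interface files
take. This is only an illustrative sketch: the helper names are
hypothetical, the headroom calculation is a simplification and not the
kernel's reclaim logic, and nothing is written to a real cgroup.

```python
MIB = 1024 * 1024


def memcg_settings(high_mib, max_mib, swap_max_mib):
    """Map cgroup v2 memory interface file names to byte values (as strings)."""
    return {
        "memory.high": str(high_mib * MIB),
        "memory.max": str(max_mib * MIB),
        "memory.swap.max": str(swap_max_mib * MIB),
    }


def swap_headroom(swap_max_bytes, swap_current_bytes):
    """Bytes of anon memory that could still go to swap before the limit.

    Simplified model (an assumption): once this reaches 0, only file and
    kmem remain reclaimable for the cgroup, as described in the example.
    """
    return max(0, swap_max_bytes - swap_current_bytes)


# Values from the memcg 'A' example in the thread.
settings = memcg_settings(high_mib=100, max_mib=150, swap_max_mib=50)
for name, value in settings.items():
    # On a real system these would be written to files in the cgroup's
    # directory; the directory itself is not specified in the thread.
    print(f"{name} = {value}")

# Swap limit reached: no anon memory can be pushed out any more.
print(swap_headroom(int(settings["memory.swap.max"]), 50 * MIB))
```

With the swap headroom at zero, 'A' staying near memory.high relies on
file and kmem reclaim plus the memory.high slowdown, which is the
situation the question above is asking about.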