Date: Tue, 12 May 2020 10:55:36 -0700
From: Jakub Kicinski
To: Michal Hocko
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, kernel-team@fb.com, tj@kernel.org, hannes@cmpxchg.org, chris@chrisdown.name, cgroups@vger.kernel.org, shakeelb@google.com
Subject: Re: [PATCH mm v2 3/3] mm: automatically penalize tasks with high swap use
Message-ID: <20200512105536.748da94e@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <20200512072634.GP29153@dhcp22.suse.cz>
References: <20200511225516.2431921-1-kuba@kernel.org> <20200511225516.2431921-4-kuba@kernel.org> <20200512072634.GP29153@dhcp22.suse.cz>

On Tue, 12 May 2020 09:26:34 +0200 Michal Hocko wrote:
> On Mon 11-05-20 15:55:16, Jakub Kicinski wrote:
> > Use swap.high when deciding if swap is full.
>
> Please be more specific why.

How about:

  Use swap.high when deciding if swap is full to influence ongoing
  swap reclaim in a best effort manner.

> > Perform reclaim and count memory over high events.
>
> Please expand on this and explain how this is working and why the
> semantic is subtly different from MEMCG_HIGH. I suspect the reason
> is that there is no reclaim for the swap so you are only emitting an
> event on the memcg which is actually throttled. This is in line with
> memory.high but the difference is that we do reclaim each memcg subtree
> in the high limit excess. That means that the counter tells us how many
> times the specific memcg was in excess which would be impossible with
> your implementation.

Right, with memory all cgroups over high get penalized with the extra
reclaim work. For swap we just have the delay, so the event is
associated with the worst offender; anything lower didn't really matter.

But it's easy enough to change if you prefer. Otherwise I'll just add
this to the commit message:

  Count swap over high events. Note that unlike memory over high events
  we only count them for the worst offender. This is because the
  delay penalties for both swap and memory over high are not cumulative,
  i.e. we use the max delay.

> I would also suggest to explain or ideally even separate the swap
> penalty scaling logic to a separate patch. What kind of data it is based
> on?

It's a hard thing to get production data for since, as we mentioned, we
don't expect the limit to be hit. It was more of a process of
experimentation and finding a gradual slope that "felt right"...

Is there a more scientific process we can follow here? We want the
delay to be small at first for the first few pages and then grow to make
sure we stop the task from going too much over high. The square
function works pretty well IMHO.
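
To make the shape of the curve concrete, here is a rough sketch of the two
ideas above: a penalty that grows with the square of the relative overage,
and taking the max of the memory and swap penalties rather than their sum.
This is illustrative Python, not the patch code. The function names, the
2 second cap and the scale factor are all made up for the example.

```python
# Illustrative sketch only. Names and constants are assumptions made up
# for this example; they are not taken from the patch.
MAX_DELAY_MS = 2000    # hypothetical cap on a single throttling delay
SCALE_MS = 100_000     # hypothetical delay at 100% overage, before capping


def overage(usage: int, high: int) -> float:
    """Fraction by which usage exceeds the high limit, 0.0 if not over."""
    if usage <= high:
        return 0.0
    # Guard against a high limit of zero.
    return (usage - high) / max(high, 1)


def penalty_ms(usage: int, high: int) -> float:
    """Quadratic penalty: tiny just above high, steep further out."""
    over = overage(usage, high)
    return min(over * over * SCALE_MS, MAX_DELAY_MS)


def throttle_delay_ms(mem_usage: int, mem_high: int,
                      swap_usage: int, swap_high: int) -> float:
    """Penalties are not cumulative: use the larger of the two."""
    return max(penalty_ms(mem_usage, mem_high),
               penalty_ms(swap_usage, swap_high))
```

With these made-up constants, being 1% over high costs about 10 ms and 2%
over costs about 40 ms, so doubling the overage quadruples the delay until
the cap is reached. That is the "small at first, then grow" behaviour
described above.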