From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jianlin Lv
To: tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org,
	corbet@lwn.net, mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeelb@google.com, muchun.song@linux.dev,
	akpm@linux-foundation.org, yosryahmed@google.com,
	willy@infradead.org, linmiaohe@huawei.com,
	wangkefeng.wang@huawei.com, laoar.shao@gmail.com,
	yuzhao@google.com, wuyun.abel@bytedance.com, david@redhat.com,
	ying.huang@intel.com, peterx@redhat.com, vishal.moola@gmail.com,
	hughd@google.com
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, jianlv@ebay.com,
	iecedge@gmail.com
Subject: [PATCH] memcg: add interface to force disable swap
Date: Sat, 7 Oct 2023 21:09:05 +0800
Message-ID: <20231007130905.78554-1-jianlv@ebay.com>
X-Mailer: git-send-email 2.42.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Jianlin Lv

Global reclaim will swap even if swappiness is set to 0. In some cases,
users want to completely disable swap for specific processes. One
scenario is the JVM: if its memory pages are swapped out, performance
degrades noticeably and GC pauses tend to increase to levels that most
applications cannot tolerate.
Disabling swap-out for specific processes only would address the JVM GC
pause issue while keeping memory reclaim pressure manageable.

This patch adds a "memory.swap_force_disable" control file that disables
swap for a non-root cgroup. Once a process is attached to the cgroup,
'echo 1 > memory.swap_force_disable' prevents its anon pages from being
swapped out. The patch also adds the read and write handlers for the
control file.

Signed-off-by: Jianlin Lv
---
 .../admin-guide/cgroup-v1/memory.rst          | 15 ++++++++++
 include/linux/memcontrol.h                    |  1 +
 include/linux/swap.h                          | 15 ++++++++++
 mm/memcontrol.c                               | 28 +++++++++++++++++++
 mm/vmscan.c                                   |  3 +-
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index ff456871bf4b..be84b98bc6fe 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -86,6 +86,7 @@ Brief summary of control files.
  memory.pressure_level               set memory pressure notifications
  memory.swappiness                   set/show swappiness parameter of vmscan
                                      (See sysctl's vm.swappiness)
+ memory.swap_force_disable           set/show force disable swap
  memory.move_charge_at_immigrate     set/show controls of moving charges
                                      This knob is deprecated and shouldn't be
                                      used.
@@ -615,6 +616,20 @@ enforces that 0 swappiness really prevents from any swapping even if
 there is a swap storage available. This might lead to memcg OOM killer
 if there are no file pages to reclaim.
 
+swap_force_disable allows a control group to disable swap even if swap
+storage is available. This feature is disabled by default. If you want to
+disable swap for specified processes, swap_force_disable can be set up with the
+following commands::
+
+	# cd /sys/fs/cgroup/memory/
+	# mkdir test
+	# cd test
+	# echo 1 > memory.swap_force_disable
+	# echo > cgroup.procs
+
+.. note::
+   swap_force_disable only takes effect for non-root cgroups.
+
 5.4 failcnt
 -----------
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e4e24da16d2c..b26dcb0756c0 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -246,6 +246,7 @@ struct mem_cgroup {
 	int		under_oom;
 
 	int	swappiness;
+	int	swap_force_disable;
 	/* OOM-Killer disable */
 	int		oom_kill_disable;
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 493487ed7c38..b202de576984 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -624,6 +624,21 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 }
 #endif
 
+#ifdef CONFIG_MEMCG
+static inline int mem_cgroup_swap_force_disable(struct mem_cgroup *memcg)
+{
+	if (mem_cgroup_disabled() || mem_cgroup_is_root(memcg))
+		return 0;
+
+	return memcg->swap_force_disable;
+}
+#else
+static inline int mem_cgroup_swap_force_disable(struct mem_cgroup *memcg)
+{
+	return 0;
+}
+#endif
+
 #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
 void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp);
 static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5b009b233ab8..024750444c79 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4196,6 +4196,28 @@ static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css,
 	return 0;
 }
 
+static u64 mem_cgroup_swap_force_disable_read(struct cgroup_subsys_state *css,
+				      struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	return mem_cgroup_swap_force_disable(memcg);
+}
+
+static int mem_cgroup_swap_force_disable_write(struct cgroup_subsys_state *css,
+				       struct cftype *cft, u64 val)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	/* cannot set to root cgroup and only 0 and 1 are allowed */
+	if (mem_cgroup_is_root(memcg) || !((val == 0) || (val == 1)))
+		return -EINVAL;
+
+	memcg->swap_force_disable = val;
+
+	return 0;
+}
+
 static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
 {
 	struct mem_cgroup_threshold_ary *t;
@@ -5064,6 +5086,11 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
 	},
+	{
+		.name = "swap_force_disable",
+		.read_u64 = mem_cgroup_swap_force_disable_read,
+		.write_u64 = mem_cgroup_swap_force_disable_write,
+	},
 	{
 		.name = "move_charge_at_immigrate",
 		.read_u64 = mem_cgroup_move_charge_read,
@@ -5367,6 +5394,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
 	if (parent) {
 		WRITE_ONCE(memcg->swappiness, mem_cgroup_swappiness(parent));
+		WRITE_ONCE(memcg->swap_force_disable, mem_cgroup_swap_force_disable(parent));
 		WRITE_ONCE(memcg->oom_kill_disable, READ_ONCE(parent->oom_kill_disable));
 
 		page_counter_init(&memcg->memory, &parent->memory);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6f13394b112e..5fdb4ac07007 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3029,6 +3029,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long anon_cost, file_cost, total_cost;
 	int swappiness = mem_cgroup_swappiness(memcg);
+	int swap_force_disable = mem_cgroup_swap_force_disable(memcg);
 	u64 fraction[ANON_AND_FILE];
 	u64 denominator = 0;	/* gcc */
 	enum scan_balance scan_balance;
@@ -3036,7 +3037,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	enum lru_list lru;
 
 	/* If we have no swap space, do not bother scanning anon folios. */
-	if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id, sc)) {
+	if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id, sc) || swap_force_disable) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
-- 
2.34.1
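
For readers who want to reason about the new knob outside the kernel, the
following is a minimal userspace sketch of the three pieces of logic in the
patch above: the read helper, the write handler's validation, and the early
exit added to get_scan_count(). It is only a model under stated assumptions.
The names struct memcg_model, model_swap_force_disable(),
model_swap_force_disable_write(), model_scan_balance() and the enum values
are hypothetical and do not exist in the kernel, the mem_cgroup_disabled()
check is left out, and MODEL_SCAN_CONTINUE merely stands for "fall through
to the remaining get_scan_count() heuristics", which are not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mem_cgroup (not the kernel type). */
struct memcg_model {
	bool is_root;		/* models mem_cgroup_is_root() */
	int swap_force_disable;	/* 0 or 1, as accepted by the patch */
};

/* Hypothetical outcome of the early check in get_scan_count(). */
enum scan_balance_model {
	MODEL_SCAN_FILE,	/* anon folios are skipped, only file pages scanned */
	MODEL_SCAN_CONTINUE,	/* the remaining heuristics would run (not modelled) */
};

/* Models mem_cgroup_swap_force_disable(): the root group always reports 0. */
int model_swap_force_disable(const struct memcg_model *memcg)
{
	assert(memcg != NULL);	/* this sketch requires a valid group */
	if (memcg->is_root)
		return 0;
	return memcg->swap_force_disable;
}

/* Models the write handler: reject the root group and values other than 0 or 1. */
int model_swap_force_disable_write(struct memcg_model *memcg,
				   unsigned long long val)
{
	assert(memcg != NULL);
	if (memcg->is_root || val > 1)
		return -EINVAL;
	memcg->swap_force_disable = (int)val;
	return 0;
}

/* Models the condition the patch extends at the top of get_scan_count(). */
enum scan_balance_model model_scan_balance(const struct memcg_model *memcg,
					   bool may_swap, bool can_reclaim_anon)
{
	if (!may_swap || !can_reclaim_anon || model_swap_force_disable(memcg))
		return MODEL_SCAN_FILE;
	return MODEL_SCAN_CONTINUE;
}
```

In this model, writing 1 to a non-root group makes model_scan_balance()
return MODEL_SCAN_FILE regardless of the other two inputs, while the same
write to the root group fails with -EINVAL and leaves its behaviour
unchanged.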