References: <20220309021230.721028-1-yuzhao@google.com> <20220309021230.721028-12-yuzhao@google.com>
In-Reply-To: <20220309021230.721028-12-yuzhao@google.com>
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 22 Mar 2022 20:22:51 +1300
Subject: Re: [PATCH v9 11/14] mm: multi-gen LRU: thrashing prevention
To: Yu Zhao
Cc: Andrew Morton, Linus Torvalds, Andi Kleen, Aneesh Kumar,
    Catalin Marinas, Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes,
    Johannes Weiner, Jonathan Corbet, Matthew Wilcox, Mel Gorman,
    Michael Larabel, Michal Hocko, Mike Rapoport, Rik van Riel,
    Vlastimil Babka, Will Deacon, Ying Huang, LAK, Linux Doc Mailing List,
    LKML, Linux-MM, Kernel Page Reclaim v2, x86, Brian Geffon,
    Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett,
    Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte,
    Konstantin Kharlamov, Shuang Zhai, Sofia Trinh, Vaibhav Jain

On Wed, Mar 9, 2022 at 3:48 PM Yu Zhao wrote:
>
> Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention, as
> requested by many desktop users [1].
>
> When set to value N, it prevents the working set of N milliseconds
> from getting evicted. The OOM killer is triggered if this working set
> cannot be kept in memory. Based on the average human detectable lag
> (~100ms), N=1000 usually eliminates intolerable lags due to thrashing.
> Larger values like N=3000 make lags less noticeable at the risk of
> premature OOM kills.
>
> Compared with the size-based approach, e.g., [2], this time-based
> approach has the following advantages:
> 1. It is easier to configure because it is agnostic to applications
>    and memory sizes.
> 2. It is more reliable because it is directly wired to the OOM killer.
>

How are userspace OOM daemons such as Android lmkd and systemd-oomd
supposed to work with this time-based OOM killer? Should only one of
min_ttl_ms and a userspace daemon be enabled, or should both be enabled
at the same time?

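For context, the condition under which the quoted patch invokes the OOM
killer can be sketched in userspace roughly as follows. This is a
simplified model under stated assumptions, not the kernel code: ticks_t,
ticks_after(), oldest_gen_protected() and should_oom() are hypothetical
names, the comparison is modeled on the kernel's time_after() idiom, and
the memcg protection and empty-lruvec cases that also make age_lruvec()
return false are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the type of the kernel's jiffies counter. */
typedef unsigned long ticks_t;

/*
 * Wraparound-safe "a is after b", modeled on the kernel's time_after():
 * the unsigned difference is reinterpreted as a signed value.
 */
static bool ticks_after(ticks_t a, ticks_t b)
{
	return (long)(b - a) < 0;
}

/*
 * Models the check added to age_lruvec(): the oldest generation is still
 * protected while birth + min_ttl lies in the future. min_ttl == 0 means
 * the protection is disabled.
 */
static bool oldest_gen_protected(ticks_t now, ticks_t birth, ticks_t min_ttl)
{
	if (!min_ttl)
		return false;
	return ticks_after(birth + min_ttl, now);
}

/*
 * Models the decision in lru_gen_age_node(): OOM is considered only when
 * no lruvec could be aged. In this simplified model, that means the oldest
 * generation of every lruvec is younger than min_ttl.
 */
static bool should_oom(const ticks_t *births, int nr, ticks_t now,
		       ticks_t min_ttl)
{
	assert(births != NULL || nr == 0);

	for (int i = 0; i < nr; i++) {
		if (!oldest_gen_protected(now, births[i], min_ttl))
			return false;	/* at least one lruvec can be aged */
	}
	return true;
}
```

In this model a userspace daemon only observes the outcome: once every
lruvec's oldest generation is younger than min_ttl, the kernel path fires
regardless of what the daemon has decided, which is why the interaction
between the two matters.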
> [1] https://lore.kernel.org/lkml/Ydza%2FzXKY9ATRoh6@google.com/
> [2] https://lore.kernel.org/lkml/20211130201652.2218636d@mail.inbox.lv/
>
> Signed-off-by: Yu Zhao
> Acked-by: Brian Geffon
> Acked-by: Jan Alexander Steffens (heftig)
> Acked-by: Oleksandr Natalenko
> Acked-by: Steven Barrett
> Acked-by: Suleiman Souhlal
> Tested-by: Daniel Byrne
> Tested-by: Donald Carr
> Tested-by: Holger Hoffstätte
> Tested-by: Konstantin Kharlamov
> Tested-by: Shuang Zhai
> Tested-by: Sofia Trinh
> Tested-by: Vaibhav Jain
> ---
>  include/linux/mmzone.h |  2 ++
>  mm/vmscan.c            | 69 +++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 67 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 116c9237e401..f98f9ce50e67 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -403,6 +403,8 @@ struct lru_gen_struct {
>         unsigned long max_seq;
>         /* the eviction increments the oldest generation numbers */
>         unsigned long min_seq[ANON_AND_FILE];
> +       /* the birth time of each generation in jiffies */
> +       unsigned long timestamps[MAX_NR_GENS];
>         /* the multi-gen LRU lists */
>         struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
>         /* the sizes of the above lists */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 55cc7d6b018b..6aa083b8bb26 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4229,6 +4229,7 @@ static void inc_max_seq(struct lruvec *lruvec)
>         for (type = 0; type < ANON_AND_FILE; type++)
>                 reset_ctrl_pos(lruvec, type, false);
>
> +       WRITE_ONCE(lrugen->timestamps[next], jiffies);
>         /* make sure preceding modifications appear */
>         smp_store_release(&lrugen->max_seq, lrugen->max_seq + 1);
>
> @@ -4340,7 +4341,8 @@ static long get_nr_evictable(struct lruvec *lruvec, unsigned long max_seq,
>         return total > 0 ? total : 0;
>  }
>
> -static void age_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> +static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc,
> +                      unsigned long min_ttl)
>  {
>         bool need_aging;
>         long nr_to_scan;
> @@ -4349,14 +4351,22 @@ static void age_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>         DEFINE_MAX_SEQ(lruvec);
>         DEFINE_MIN_SEQ(lruvec);
>
> +       if (min_ttl) {
> +               int gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
> +               unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
> +
> +               if (time_is_after_jiffies(birth + min_ttl))
> +                       return false;
> +       }
> +
>         mem_cgroup_calculate_protection(NULL, memcg);
>
>         if (mem_cgroup_below_min(memcg))
> -               return;
> +               return false;
>
>         nr_to_scan = get_nr_evictable(lruvec, max_seq, min_seq, swappiness, &need_aging);
>         if (!nr_to_scan)
> -               return;
> +               return false;
>
>         nr_to_scan >>= sc->priority;
>
> @@ -4365,11 +4375,18 @@ static void age_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>
>         if (nr_to_scan && need_aging && (!mem_cgroup_below_low(memcg) || sc->memcg_low_reclaim))
>                 try_to_inc_max_seq(lruvec, max_seq, sc, swappiness, false);
> +
> +       return true;
>  }
>
> +/* to protect the working set of the last N jiffies */
> +static unsigned long lru_gen_min_ttl __read_mostly;
> +
>  static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
>  {
>         struct mem_cgroup *memcg;
> +       bool success = false;
> +       unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
>
>         VM_BUG_ON(!current_is_kswapd());
>
> @@ -4395,12 +4412,29 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
>         do {
>                 struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
>
> -               age_lruvec(lruvec, sc);
> +               if (age_lruvec(lruvec, sc, min_ttl))
> +                       success = true;
>
>                 cond_resched();
>         } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
>
>         current->reclaim_state->mm_walk = NULL;
> +
> +       /*
> +        * The main goal is to OOM kill if every generation from all memcgs is
> +        * younger than min_ttl. However, another theoretical possibility is all
> +        * memcgs are either below min or empty.
> +        */
> +       if (!success && mutex_trylock(&oom_lock)) {
> +               struct oom_control oc = {
> +                       .gfp_mask = sc->gfp_mask,
> +                       .order = sc->order,
> +               };
> +
> +               out_of_memory(&oc);
> +
> +               mutex_unlock(&oom_lock);
> +       }
>  }
>
>  /*
> @@ -5112,6 +5146,28 @@ static void lru_gen_change_state(bool enable)
>   *                          sysfs interface
>   ******************************************************************************/
>
> +static ssize_t show_min_ttl(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
> +{
> +       return sprintf(buf, "%u\n", jiffies_to_msecs(READ_ONCE(lru_gen_min_ttl)));
> +}
> +
> +static ssize_t store_min_ttl(struct kobject *kobj, struct kobj_attribute *attr,
> +                            const char *buf, size_t len)
> +{
> +       unsigned int msecs;
> +
> +       if (kstrtouint(buf, 0, &msecs))
> +               return -EINVAL;
> +
> +       WRITE_ONCE(lru_gen_min_ttl, msecs_to_jiffies(msecs));
> +
> +       return len;
> +}
> +
> +static struct kobj_attribute lru_gen_min_ttl_attr = __ATTR(
> +       min_ttl_ms, 0644, show_min_ttl, store_min_ttl
> +);
> +
>  static ssize_t show_enable(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
>  {
>         unsigned int caps = 0;
> @@ -5160,6 +5216,7 @@ static struct kobj_attribute lru_gen_enabled_attr = __ATTR(
>  );
>
>  static struct attribute *lru_gen_attrs[] = {
> +       &lru_gen_min_ttl_attr.attr,
>         &lru_gen_enabled_attr.attr,
>         NULL
>  };
> @@ -5175,12 +5232,16 @@ static struct attribute_group lru_gen_attr_group = {
>
>  void lru_gen_init_lruvec(struct lruvec *lruvec)
>  {
> +       int i;
>         int gen, type, zone;
>         struct lru_gen_struct *lrugen = &lruvec->lrugen;
>
>         lrugen->max_seq = MIN_NR_GENS + 1;
>         lrugen->enabled = lru_gen_enabled();
>
> +       for (i = 0; i <= MIN_NR_GENS + 1; i++)
> +               lrugen->timestamps[i] = jiffies;
> +
>         for_each_gen_type_zone(gen, type, zone)
>                 INIT_LIST_HEAD(&lrugen->lists[gen][type][zone]);
>
> --
> 2.35.1.616.g0bdcbb4464-goog
>

Thanks
Barry