From: Hillf Danton
To: Roman Gushchin
Cc: MM, Matthew Wilcox, Dave Chinner, Mel Gorman, Stephen Brennan,
	Yu Zhao, David Hildenbrand, LKML
Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker
Date: Sun, 3 Apr 2022 08:56:18 +0800
Message-Id: <20220403005618.5263-1-hdanton@sina.com>
References: <20220402072103.5140-1-hdanton@sina.com>

On Sat, 2 Apr 2022 10:54:36 -0700 Roman Gushchin wrote:
> Hello Hillf!
> 

Hello Roman,

> Thank you for sharing it, really interesting! I'm actually working on
> the same problem.

Good to know you have some interest in it. Feel free to let me know if
you would like to take it over, to avoid duplicated work on both sides.

> 
> No code to share yet, but here are some of my thoughts:
> 1) If there is a "natural" memory pressure, no additional slab
> scanning is needed.

Agree - the periodic shrinker can be canceled once kswapd wakes up (see
the PS at the end for a sketch of the cancel half).

> 2) From a power perspective it's better to scan more at once, but
> less often.

The proposed shrinker is a catapult on the vmscan side: it does not
know where the cold slab objects are piling up in Dave's backyard, but
he is free to take different actions than for the regular shrinker.
IOW this shrinker alone does not make much sense - shooting the six
birds still takes a stone on the slab owner side.
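For illustration only, such a stone could look like the sketch below: a
scan_objects() callback keying off the new flag. foo_cache_scan(),
trim_cold_objects() and foo_cache_do_scan() are made-up names standing
in for whatever cache-specific walk the owner prefers.

	static unsigned long foo_cache_scan(struct shrinker *shrink,
					    struct shrink_control *sc)
	{
		if (sc->periodic)
			/*
			 * Periodic scan: recycle only objects that
			 * have been cold for 30+ seconds, or take no
			 * action at all if nothing is piling up.
			 */
			return trim_cold_objects(sc->nr_to_scan);

		/* regular reclaim pressure: behave as today */
		return foo_cache_do_scan(sc);
	}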
It currently scans *every* slab cache at an arbitrary frequency - once
every 30 seconds, though I am open to a minute or whatever.

> 3) Maybe we need a feedback loop with the slab allocator: e.g. if
> slabs are almost full it makes more sense to do proactive scanning
> and free up some memory; otherwise we'll end up allocating more
> slabs. But it's tricky.

There are 31 bits still available in the periodic flag added to the
shrink control; they could carry that sort of feedback (a speculative
layout follows the shrinker.h hunk below).

> 4) If the scanning is not resulting in any memory reclaim, maybe we
> should (temporarily) exclude the corresponding shrinker from the
> scanning.

Given the periodic flag, Dave is free to ignore the scan request, and
the scan result is currently dropped on the vmscan side, because what
is targeted is the cold slab objects - those inactive for more than 30
seconds, for instance - in every slab cache, rather than kswapd's cake.

BR
Hillf

> 
> Thanks!
> 
> > On Apr 2, 2022, at 12:21 AM, Hillf Danton wrote:
> > 
> > To mitigate the pain of having "several millions" of negative
> > dentries in a single directory [1] for example, add the periodic
> > slab shrinker that runs independently of direct and background
> > reclaimers, in a bid to recycle the slab objects that have been
> > cold for more than 30 seconds.
> > 
> > Q: Why is it needed?
> > A: Kswapd may take a nap as long as 30 minutes.
> > 
> > Add a periodic flag to the shrink control to let cache owners know
> > that this is the periodic shrinker, equal to the regular one
> > running at the lowest reclaim priority, and that they are free to
> > take no action if no one-off objects are piling up.
> > 
> > Only for thoughts now.
> > 
> > Hillf
> > 
> > [1] https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/
> > 
> > --- x/include/linux/shrinker.h
> > +++ y/include/linux/shrinker.h
> > @@ -14,6 +14,7 @@ struct shrink_control {
> > 
> >  	/* current node being shrunk (for NUMA aware shrinkers) */
> >  	int nid;
> > +	int periodic;
> > 
> >  	/*
> >  	 * How many objects scan_objects should scan and try to reclaim.
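Not in the patch, but to make the spare 31 bits mentioned above
concrete, the int could be split into a scan marker plus feedback
hints along these lines (names purely speculative):

	/*
	 * Speculative layout for shrink_control::periodic, not part
	 * of the patch: bit 0 marks the periodic scan, and the
	 * remaining 31 bits are free to carry feedback such as a
	 * slab-fullness hint for point 3 above.
	 */
	#define SHRINK_PERIODIC		(1 << 0)
	#define SHRINK_HINT_NEARLY_FULL	(1 << 1)	/* hypothetical */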
> > --- x/mm/vmscan.c
> > +++ y/mm/vmscan.c
> > @@ -781,6 +781,8 @@ static unsigned long do_shrink_slab(stru
> >  		scanned += shrinkctl->nr_scanned;
> > 
> >  		cond_resched();
> > +		if (shrinkctl->periodic)
> > +			break;
> >  	}
> > 
> >  	/*
> > @@ -906,7 +908,8 @@ static unsigned long shrink_slab_memcg(g
> >   */
> >  static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> >  				 struct mem_cgroup *memcg,
> > -				 int priority)
> > +				 int priority,
> > +				 int periodic)
> >  {
> >  	unsigned long ret, freed = 0;
> >  	struct shrinker *shrinker;
> > @@ -929,6 +932,7 @@ static unsigned long shrink_slab(gfp_t g
> >  			.gfp_mask = gfp_mask,
> >  			.nid = nid,
> >  			.memcg = memcg,
> > +			.periodic = periodic,
> >  		};
> > 
> >  		ret = do_shrink_slab(&sc, shrinker, priority);
> > @@ -952,7 +956,7 @@ out:
> >  	return freed;
> >  }
> > 
> > -static void drop_slab_node(int nid)
> > +static void drop_slab_node(int nid, int periodic)
> >  {
> >  	unsigned long freed;
> >  	int shift = 0;
> > @@ -966,19 +970,31 @@ static void drop_slab_node(int nid)
> >  		freed = 0;
> >  		memcg = mem_cgroup_iter(NULL, NULL, NULL);
> >  		do {
> > -			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
> > +			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0, periodic);
> >  		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> >  	} while ((freed >> shift++) > 1);
> >  }
> > 
> > -void drop_slab(void)
> > +static void __drop_slab(int periodic)
> >  {
> >  	int nid;
> > 
> >  	for_each_online_node(nid)
> > -		drop_slab_node(nid);
> > +		drop_slab_node(nid, periodic);
> > +}
> > +
> > +void drop_slab(void)
> > +{
> > +	__drop_slab(0);
> >  }
> > 
> > +static void periodic_slab_shrinker_workfn(struct work_struct *work)
> > +{
> > +	__drop_slab(1);
> > +	queue_delayed_work(system_unbound_wq, to_delayed_work(work), 30*HZ);
> > +}
> > +static DECLARE_DELAYED_WORK(periodic_slab_shrinker, periodic_slab_shrinker_workfn);
> > +
> >  static inline int is_page_cache_freeable(struct folio *folio)
> >  {
> >  	/*
> > @@ -3098,7 +3114,7 @@ static void shrink_node_memcgs(pg_data_t
> >  		shrink_lruvec(lruvec, sc);
> > 
> >  		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> > -			    sc->priority);
> > +			    sc->priority, 0);
> > 
> >  		/* Record the group's reclaim efficiency */
> >  		vmpressure(sc->gfp_mask, memcg, false,
> > @@ -4354,8 +4370,11 @@ static void kswapd_try_to_sleep(pg_data_
> >  	 */
> >  	set_pgdat_percpu_threshold(pgdat, calculate_normal_threshold);
> > 
> > -	if (!kthread_should_stop())
> > +	if (!kthread_should_stop()) {
> > +		queue_delayed_work(system_unbound_wq,
> > +				   &periodic_slab_shrinker, 60*HZ);
> >  		schedule();
> > +	}
> > 
> >  	set_pgdat_percpu_threshold(pgdat, calculate_pressure_threshold);
> >  } else {
> > --
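PS the cancel half of point 1 is not wired up in the diff above; once
kswapd wakes up it could be as simple as the line below, say at the
entry of balance_pgdat() (an untested sketch, not part of the patch):

	/*
	 * kswapd is awake, so natural memory pressure is being
	 * handled and the periodic scan can wait until the next nap.
	 */
	cancel_delayed_work(&periodic_slab_shrinker);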