From: Hillf Danton <hdanton@sina.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: MM <linux-mm@kvack.org>, Matthew Wilcox <willy@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Stephen Brennan <stephen.s.brennan@oracle.com>,
	Yu Zhao <yuzhao@google.com>, David Hildenbrand <david@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] mm/vmscan: add periodic slab shrinker
Date: Sun,  3 Apr 2022 08:56:18 +0800	[thread overview]
Message-ID: <20220403005618.5263-1-hdanton@sina.com> (raw)
In-Reply-To: <CE7BB198-BF09-4D9F-AE99-85324B81E472@linux.dev>

On Sat, 2 Apr 2022 10:54:36 -0700 Roman Gushchin wrote:
> Hello Hillf!
> 
Hello Roman,

> Thank you for sharing it, really interesting! I'm actually working on the same problem.

Good to know you are interested in it.
Feel free to let me know if you would like to take it over, to avoid
duplicating work on both sides.

> 
> No code to share yet, but here are some of my thoughts:
> 1) If there is a "natural" memory pressure, no additional slab scanning is needed.

Agree - the periodic shrinker can be canceled once kswapd wakes up.

> 2) From a power perspective it's better to scan more at once, but less often.

The proposed shrinker is a catapult on the vmscan side: it does not know
where the cold slab objects are piling up on the cache-owner side (Dave's
backyard), and the owner is free to react differently than to the regular
shrinker. IOW this shrinker alone does not accomplish much without
cooperation from the slab owners.

It currently scans *every* slab cache at an arbitrary frequency, once
every 30 seconds - I am open to a minute or whatever works better.

> 3) Maybe we need a feedback loop with the slab allocator: e.g. if slabs are almost
> full there is more sense to do a proactive scanning and free up some memory,
> otherwise we'll end up allocating more slabs. But it's tricky.

There are 31 bits available in the periodic flag added to shrink control.

> 4) If the scanning is not resulting in any memory reclaim, maybe we should
> (temporarily) exclude the corresponding shrinker from the scanning.

Given the periodic flag, the cache owner is free to ignore the scan
request, and the scan result is currently dropped on the vmscan side,
because the target is the cold slab objects (those inactive for more than
30 seconds, for instance) in every slab cache, rather than kswapd's
reclaim accounting.

BR
Hillf
> 
> Thanks!
> 
> > On Apr 2, 2022, at 12:21 AM, Hillf Danton <hdanton@sina.com> wrote:
> > 
> > To mitigate the pain of having "several millions" of negative dentries in
> > a single directory [1] for example, add the periodic slab shrinker that
> > runs independently of direct and background reclaimers in a bid to recycle
> > the slab objects that have been cold for more than 30 seconds.
> > 
> > Q: Why is it needed?
> > A: Kswapd may take a nap as long as 30 minutes.
> > 
> > Add a periodic flag to shrink control to let cache owners know this is the
> > periodic shrinker, which is equivalent to the regular one running at the
> > lowest reclaim priority, and they are free to take no action unless
> > one-off objects are piling up.
> > 
> > Only for thoughts now.
> > 
> > Hillf
> > 
> > [1] https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/
> > 
> > --- x/include/linux/shrinker.h
> > +++ y/include/linux/shrinker.h
> > @@ -14,6 +14,7 @@ struct shrink_control {
> > 
> >    /* current node being shrunk (for NUMA aware shrinkers) */
> >    int nid;
> > +    int periodic;
> > 
> >    /*
> >     * How many objects scan_objects should scan and try to reclaim.
> > --- x/mm/vmscan.c
> > +++ y/mm/vmscan.c
> > @@ -781,6 +781,8 @@ static unsigned long do_shrink_slab(stru
> >        scanned += shrinkctl->nr_scanned;
> > 
> >        cond_resched();
> > +        if (shrinkctl->periodic)
> > +            break;
> >    }
> > 
> >    /*
> > @@ -906,7 +908,8 @@ static unsigned long shrink_slab_memcg(g
> >  */
> > static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> >                 struct mem_cgroup *memcg,
> > -                 int priority)
> > +                 int priority,
> > +                 int periodic)
> > {
> >    unsigned long ret, freed = 0;
> >    struct shrinker *shrinker;
> > @@ -929,6 +932,7 @@ static unsigned long shrink_slab(gfp_t g
> >            .gfp_mask = gfp_mask,
> >            .nid = nid,
> >            .memcg = memcg,
> > +            .periodic = periodic,
> >        };
> > 
> >        ret = do_shrink_slab(&sc, shrinker, priority);
> > @@ -952,7 +956,7 @@ out:
> >    return freed;
> > }
> > 
> > -static void drop_slab_node(int nid)
> > +static void drop_slab_node(int nid, int periodic)
> > {
> >    unsigned long freed;
> >    int shift = 0;
> > @@ -966,19 +970,31 @@ static void drop_slab_node(int nid)
> >        freed = 0;
> >        memcg = mem_cgroup_iter(NULL, NULL, NULL);
> >        do {
> > -            freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
> > +            freed += shrink_slab(GFP_KERNEL, nid, memcg, 0, periodic);
> >        } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> >    } while ((freed >> shift++) > 1);
> > }
> > 
> > -void drop_slab(void)
> > +static void __drop_slab(int periodic)
> > {
> >    int nid;
> > 
> >    for_each_online_node(nid)
> > -        drop_slab_node(nid);
> > +        drop_slab_node(nid, periodic);
> > +}
> > +
> > +void drop_slab(void)
> > +{
> > +    __drop_slab(0);
> > }
> > 
> > +static void periodic_slab_shrinker_workfn(struct work_struct *work)
> > +{
> > +    __drop_slab(1);
> > +    queue_delayed_work(system_unbound_wq, to_delayed_work(work), 30*HZ);
> > +}
> > +static DECLARE_DELAYED_WORK(periodic_slab_shrinker, periodic_slab_shrinker_workfn);
> > +
> > static inline int is_page_cache_freeable(struct folio *folio)
> > {
> >    /*
> > @@ -3098,7 +3114,7 @@ static void shrink_node_memcgs(pg_data_t
> >        shrink_lruvec(lruvec, sc);
> > 
> >        shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> > -                sc->priority);
> > +                sc->priority, 0);
> > 
> >        /* Record the group's reclaim efficiency */
> >        vmpressure(sc->gfp_mask, memcg, false,
> > @@ -4354,8 +4370,11 @@ static void kswapd_try_to_sleep(pg_data_
> >         */
> >        set_pgdat_percpu_threshold(pgdat, calculate_normal_threshold);
> > 
> > -        if (!kthread_should_stop())
> > +        if (!kthread_should_stop()) {
> > +            queue_delayed_work(system_unbound_wq,
> > +                        &periodic_slab_shrinker, 60*HZ);
> >            schedule();
> > +        }
> > 
> >        set_pgdat_percpu_threshold(pgdat, calculate_pressure_threshold);
> >    } else {
> > --


Thread overview: 22+ messages
2022-04-02  7:21 [RFC] mm/vmscan: add periodic slab shrinker Hillf Danton
2022-04-02 17:54 ` Roman Gushchin
2022-04-03  0:56   ` Hillf Danton [this message]
2022-04-04  1:09     ` Dave Chinner
2022-04-04  5:14       ` Hillf Danton
2022-04-04 18:32         ` Roman Gushchin
2022-04-04 19:08       ` Roman Gushchin
2022-04-05  5:17         ` Dave Chinner
2022-04-05 16:35           ` Roman Gushchin
2022-04-05 20:58             ` Yang Shi
2022-04-05 21:21               ` Matthew Wilcox
2022-04-06  0:01                 ` Dave Chinner
2022-04-06  4:14                   ` Hillf Danton
2022-04-21 19:03                   ` Kent Overstreet
2022-04-21 23:55                     ` Dave Chinner
2022-04-05 21:31               ` Roman Gushchin
2022-04-06  0:11                 ` Dave Chinner
2022-04-05 17:22       ` Stephen Brennan
2022-04-05 21:18         ` Matthew Wilcox
2022-04-05 23:54           ` Dave Chinner
2022-04-06  1:06             ` Stephen Brennan
2022-04-06  3:52               ` Dave Chinner
