linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Huang Ying <ying.huang@intel.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Yosry Ahmed <yosryahmed@google.com>,
	weixugc@google.com, fvdl@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems
Date: Thu, 8 Dec 2022 09:09:39 +0100	[thread overview]
Message-ID: <Y5Gbwwp7AlFiltuu@dhcp22.suse.cz> (raw)
In-Reply-To: <CAHS8izMKK107wVFSJvg36nQ=WzXd8_cjYBtR0p47L+XLYUSsqA@mail.gmail.com>

On Wed 07-12-22 13:43:55, Mina Almasry wrote:
> On Wed, Dec 7, 2022 at 3:12 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > Anyway a proper nr_reclaimed tracking should be rather straightforward
> > but I do not expect to make a big difference in practice
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 026199c047e0..1b7f2d8cb128 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1633,7 +1633,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >         LIST_HEAD(ret_folios);
> >         LIST_HEAD(free_folios);
> >         LIST_HEAD(demote_folios);
> > -       unsigned int nr_reclaimed = 0;
> > +       unsigned int nr_reclaimed = 0, nr_demoted = 0;
> >         unsigned int pgactivate = 0;
> >         bool do_demote_pass;
> >         struct swap_iocb *plug = NULL;
> > @@ -2065,8 +2065,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >         }
> >         /* 'folio_list' is always empty here */
> >
> > -       /* Migrate folios selected for demotion */
> > -       nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
> > +       /*
> > +        * Migrate folios selected for demotion.
> > +        * Do not consider demoted pages to be reclaimed for the memcg reclaim
> > +        * because no charges are really freed during the migration. Global
> > +        * reclaim aims at releasing memory from nodes/zones so consider
> > +        * demotion to reclaim memory.
> > +        */
> > +       nr_demoted += demote_folio_list(&demote_folios, pgdat);
> > +       if (!cgroup_reclaim(sc))
> > +               nr_reclaimed += nr_demoted;
> > +
> >         /* Folios that could not be demoted are still in @demote_folios */
> >         if (!list_empty(&demote_folios)) {
> >                 /* Folios which weren't demoted go back on @folio_list for retry: */
> >
> > [...]
> 
> Thank you again, but this patch breaks the memory.reclaim nodes arg
> for me. This is my test case. I run it on a machine with 2 memory
> tiers.
> 
> Memory tier 1= nodes 0-2
> Memory tier 2= node 3
> 
>     mkdir -p /sys/fs/cgroup/unified/test
>     cd /sys/fs/cgroup/unified/test
>     echo $$ > cgroup.procs
>     head -c 500m /dev/random > /tmp/testfile
>     echo $$ > /sys/fs/cgroup/unified/cgroup.procs
>     echo "1m nodes=0-2" > memory.reclaim
> 
> In my opinion the expected behavior is for the kernel to demote 1mb of
> memory from nodes 0-2 to node 3.
> 
> Actual behavior on the tip of mm-unstable is as expected.
> 
> Actual behavior with your patch cherry-picked to mm-unstable is that
> the kernel demotes all 500mb of memory from nodes 0-2 to node 3, and
> returns -EAGAIN to the user. This may be the correct behavior you're
> intending, but it completely breaks the use case I implemented the
> nodes= arg for and listed on the commit message of that change.

Yes, strictly speaking the behavior is correct albeit unexpected. You
have told the kernel to _reclaim_ that much memory but demotion are
simply aging handling rather than a reclaim if the demotion target has a
lot of memory free. This would be the case without any nodemask as well
btw.

I am worried this will popping up again and again. I thought your nodes
subset approach could deal with this but I have overlooked one important
thing in your patch. The user provided nodemask controls where to
reclaim from but it doesn't constrain demotion targets. Is this
intentional? Would it actually make more sense to control demotion by
addint demotion nodes into the nodemask?

-- 
Michal Hocko
SUSE Labs


  reply	other threads:[~2022-12-08  8:09 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-06  2:34 [PATCH v3] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems Mina Almasry
2022-12-06  3:13 ` Huang, Ying
2022-12-06  4:15   ` Mina Almasry
2022-12-06  5:22     ` Huang, Ying
2022-12-06 12:20 ` Michal Hocko
2022-12-06 16:06   ` Mina Almasry
2022-12-06 19:55     ` Michal Hocko
2022-12-07  1:22       ` Huang, Ying
2022-12-07  1:55       ` Mina Almasry
2022-12-07 11:12         ` Michal Hocko
2022-12-07 21:43           ` Mina Almasry
2022-12-08  8:09             ` Michal Hocko [this message]
2022-12-08  9:00               ` Mina Almasry
2022-12-08 11:54                 ` Michal Hocko
2022-12-09  0:59                   ` Wei Xu
2022-12-09  8:08                     ` Michal Hocko
2022-12-09 16:41                       ` Wei Xu
2022-12-09 21:16                         ` Michal Hocko
2022-12-09 21:39                           ` Mina Almasry
2022-12-12  8:33                             ` Michal Hocko
2022-12-10  8:01                           ` Wei Xu
2022-12-12  8:36                             ` Michal Hocko
2022-12-06 15:15 ` kernel test robot
2022-12-06 18:17 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5Gbwwp7AlFiltuu@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=fvdl@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=songmuchun@bytedance.com \
    --cc=weixugc@google.com \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).