linux-mm.kvack.org archive mirror
From: Yang Shi <shy828301@gmail.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	 Yang Shi <yang.shi@linux.alibaba.com>,
	David Rientjes <rientjes@google.com>,
	 Huang Ying <ying.huang@intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [RFC][PATCH 08/13] mm/migrate: demote pages during reclaim
Date: Tue, 2 Feb 2021 14:45:29 -0800
Message-ID: <CAHbLzkrSYsoVV1eHHO9kWv2xe96qmAt6dmC_FsBrydsZxGLvew@mail.gmail.com>
In-Reply-To: <20210202115516.GC12139@linux>

On Tue, Feb 2, 2021 at 3:55 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Mon, Jan 25, 2021 at 04:34:27PM -0800, Dave Hansen wrote:
> >
> > From: Dave Hansen <dave.hansen@linux.intel.com>
> >
> > This is mostly derived from a patch from Yang Shi:
> >
> >       https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang.shi@linux.alibaba.com/
> >
> > Add code to the reclaim path (shrink_page_list()) to "demote" data
> > to another NUMA node instead of discarding the data.  This always
> > avoids the cost of I/O needed to read the page back in and sometimes
> > avoids the writeout cost when the page is dirty.
> >
> > A second pass through shrink_page_list() will be made if any demotions
> > fail.  This essentially falls back to normal reclaim behavior in the
> > case that demotions fail.  Previous versions of this patch may have
> > simply failed to reclaim pages which were eligible for demotion but
> > were unable to be demoted in practice.
> >
> > Note: This just adds the start of infrastructure for migration. It is
> > actually disabled next to the FIXME in migrate_demote_page_ok().
> >
> > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Yang Shi <yang.shi@linux.alibaba.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Huang Ying <ying.huang@intel.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: osalvador <osalvador@suse.de>
> >
> > --
> >
> > changes from 202010:
> >  * add MR_NUMA_MISPLACED to trace MIGRATE_REASON define
> >  * make migrate_demote_page_ok() static, remove 'sc' arg until
> >    later patch
> >  * remove unnecessary alloc_demote_page() hugetlb warning
> >  * Simplify alloc_demote_page() gfp mask.  Depend on
> >    __GFP_NORETRY to make it lightweight instead of fancier
> >    stuff like leaving out __GFP_IO/FS.
> >  * Allocate migration page with alloc_migration_target()
> >    instead of allocating directly.
> > changes from 20200730:
> >  * Add another pass through shrink_page_list() when demotion
> >    fails.
> > ---
>
> [...]
>
> > +static struct page *alloc_demote_page(struct page *page, unsigned long node)
> > +{
> > +        struct migration_target_control mtc = {
> > +             /*
> > +              * Fail quickly and quietly.  Page will likely
> > +              * just be discarded instead of migrated.
> > +              */
> > +             .gfp_mask = GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOWARN,
> > +             .nid = node
> > +     };
> > +
> > +        return alloc_migration_target(page, (unsigned long)&mtc);
> > +}
>
> Migration of THP pages will enable direct reclaim. I guess that is fine, right?
> AFAIK, direct reclaim will only be tried once with __GFP_NORETRY.
>
> > +
> > +/*
> > + * Take pages on @demote_pages and attempt to demote them to
> > + * another node.  Pages which are not demoted are left on
> > + * @demote_pages.
> > + */
> > +static unsigned int demote_page_list(struct list_head *demote_pages,
> > +                                  struct pglist_data *pgdat,
> > +                                  struct scan_control *sc)
> > +{
> > +     int target_nid = next_demotion_node(pgdat->node_id);
> > +     unsigned int nr_succeeded = 0;
> > +     int err;
> > +
> > +     if (list_empty(demote_pages))
> > +             return 0;
> > +
> > +     /* Demotion ignores all cpuset and mempolicy settings */
> > +     err = migrate_pages(demote_pages, alloc_demote_page, NULL,
> > +                         target_nid, MIGRATE_ASYNC, MR_DEMOTION,
> > +                         &nr_succeeded);
> > +
> > +     return nr_succeeded;
> > +}
> > +
> >  /*
> >   * shrink_page_list() returns the number of reclaimed pages
> >   */
> > @@ -1078,12 +1135,15 @@ static unsigned int shrink_page_list(str
> >  {
> >       LIST_HEAD(ret_pages);
> >       LIST_HEAD(free_pages);
> > +     LIST_HEAD(demote_pages);
> >       unsigned int nr_reclaimed = 0;
> >       unsigned int pgactivate = 0;
> > +     bool do_demote_pass = true;
> >
> >       memset(stat, 0, sizeof(*stat));
> >       cond_resched();
> >
> > +retry:
> >       while (!list_empty(page_list)) {
> >               struct address_space *mapping;
> >               struct page *page;
> > @@ -1233,6 +1293,16 @@ static unsigned int shrink_page_list(str
> >               }
> >
> >               /*
> > +              * Before reclaiming the page, try to relocate
> > +              * its contents to another node.
> > +              */
> > +             if (do_demote_pass && migrate_demote_page_ok(page)) {
> > +                     list_add(&page->lru, &demote_pages);
> > +                     unlock_page(page);
> > +                     continue;
> > +             }
>
> Should we keep it simple for now and only try to demote those pages that are
> free of cpusets and memory policies?
> Actually, demoting those pages to a CPU or a NUMA node that does not fall into
> their set would violate those constraints, right?

Yes, this has been discussed since the very beginning. There is no
easy way to determine the memory placement policy (cpuset and
mempolicy) from a "page" alone. I think this also rules out "demote
only those pages that are free of cpusets and memory policies", since
we cannot tell which pages those are.

The conclusion was that the violation should be fine for now, and the
demotion feature is opt-in via a new node reclaim mode.
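
To illustrate the point above, here is a minimal userspace sketch (not
kernel code) of a demotion check that can only consult what a page
carries. Every name in it is hypothetical: the sketch_ helpers, the
SKETCH_RECLAIM_DEMOTE bit and the example node order are assumptions
made for illustration, and the real opt-in bit is defined by a later
patch in the series.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NO_NODE        (-1)
#define SKETCH_MAX_NODES      4
/* Hypothetical opt-in bit; the real name and value are not taken from the series. */
#define SKETCH_RECLAIM_DEMOTE (1u << 3)

/*
 * Hypothetical model of a page. It records only the node it sits on;
 * it has no pointer to a task, cpuset or mempolicy.
 */
struct sketch_page {
	int nid;
};

/* Assumed demotion order: node 0 -> 2, node 1 -> 3, nodes 2 and 3 are terminal. */
static const int sketch_demotion_target[SKETCH_MAX_NODES] = {
	2, 3, SKETCH_NO_NODE, SKETCH_NO_NODE
};

static int sketch_next_demotion_node(int nid)
{
	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	return sketch_demotion_target[nid];
}

/*
 * Decide whether a page may be demoted. The only inputs are the page's
 * current node and the global reclaim mode, so placement policy cannot
 * take part in the decision.
 */
static bool sketch_demote_page_ok(const struct sketch_page *page,
				  unsigned int reclaim_mode)
{
	if (!(reclaim_mode & SKETCH_RECLAIM_DEMOTE))
		return false;	/* demotion was not opted in */

	return sketch_next_demotion_node(page->nid) != SKETCH_NO_NODE;
}
```

With the mode bit clear, nothing is demoted and existing behavior is
unchanged; with it set, eligibility depends only on whether the page's
node has a demotion target.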

> So I think we should leave those pages alone for now.
>
> > +
> > +             /*
> >                * Anonymous process memory has backing store?
> >                * Try to allocate it some swap space here.
> >                * Lazyfree page could be freed directly
> > @@ -1479,6 +1549,17 @@ keep:
> >               list_add(&page->lru, &ret_pages);
> >               VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
> >       }
> > +     /* 'page_list' is always empty here */
> > +
> > +     /* Migrate pages selected for demotion */
> > +     nr_reclaimed += demote_page_list(&demote_pages, pgdat, sc);
> > +     /* Pages that could not be demoted are still in @demote_pages */
> > +     if (!list_empty(&demote_pages)) {
> > +             /* Pages which failed to be demoted go back on @page_list for retry: */
> > +             list_splice_init(&demote_pages, page_list);
> > +             do_demote_pass = false;
> > +             goto retry;
> > +     }
> >
> >       pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
> >
> > _
> >
>
> --
> Oscar Salvador
> SUSE L3
>

