All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	David Rientjes <rientjes@google.com>,
	Huang Ying <ying.huang@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order
Date: Tue, 2 Feb 2021 09:46:39 -0800	[thread overview]
Message-ID: <CAHbLzkqrPvY4Zb17AGJi1Zi7OV9WDUTEpj5DpfWY9c2WHzPBYw@mail.gmail.com> (raw)
In-Reply-To: <317d4c23-76a7-b653-87a4-bab642fa1717@intel.com>

On Mon, Feb 1, 2021 at 11:13 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/29/21 12:46 PM, Yang Shi wrote:
> ...
> >>  int next_demotion_node(int node)
> >>  {
> >> -       return node_demotion[node];
> >> +       /*
> >> +        * node_demotion[] is updated without excluding
> >> +        * this function from running.  READ_ONCE() avoids
> >> +        * reading multiple, inconsistent 'node' values
> >> +        * during an update.
> >> +        */
> >
> > Don't we need a smp_rmb() here? The single write barrier might be not
> > enough in migration target set. Typically a write barrier should be
> > used in pairs with a read barrier.
>
> I don't think we need one, practically.
>
> Since there is no locking against node_demotion[] updates, although a
> smp_rmb() would ensure that this read is up-to-date, it could change
> freely after the smp_rmb().

Yes, but this should be able to guarantee we see "disable + after"
state. Isn't it more preferred?

>
> In other words, smp_rmb() would shrink the window where a "stale" read
> could occur but would not eliminate it.
>
> >> +       return READ_ONCE(node_demotion[node]);
> >
> > Why not consolidate the patch #4 in this patch? The patch #4 just add
> > the definition of node_demotion and the function, then the function is
> > changed in this patch, and the function is not used by anyone between
> > the adding and changing.
>
> I really wanted to highlight that the locking scheme and the READ_ONCE()
> (or lack thereof) was specifically due to how node_demotion[] was being
> updated.
>
> The READ_ONCE() is not, for instance, inherent to the data structure.
>
> ...
> >> +/*
> >> + * When memory fills up on a node, memory contents can be
> >> + * automatically migrated to another node instead of
> >> + * discarded at reclaim.
> >> + *
> >> + * Establish a "migration path" which will start at nodes
> >> + * with CPUs and will follow the priorities used to build the
> >> + * page allocator zonelists.
> >> + *
> >> + * The difference here is that cycles must be avoided.  If
> >> + * node0 migrates to node1, then neither node1, nor anything
> >> + * node1 migrates to can migrate to node0.
> >> + *
> >> + * This function can run simultaneously with readers of
> >> + * node_demotion[].  However, it can not run simultaneously
> >> + * with itself.  Exclusion is provided by memory hotplug events
> >> + * being single-threaded.
> >
> > Maybe an example diagram for the physical topology and how the
> > migration target is generated in the comment seems helpful to
> > understand the code.
>
> Sure.  Were you thinking of a code comment, or enhanced changelog?

I'd prefer a code comment.

>
> Let's say there's a system with two sockets each with the same three
> classes of memory: fast, medium and slow.  Each memory class is placed
> in its own NUMA node and the CPUs are attached to the fast memory.  That
> leaves 6 NUMA nodes (0-5):
>
>         Socket A: 0, 1, 2
>         Socket B: 3, 4, 5
>
> The migration path for this configuration path would start on the nodes
> with the processors and fast memory, progress through medium and end
> with the slow memory:
>
>         0 -> 1 -> 2 -> stop
>         3 -> 4 -> 5 -> stop
>
> This is represented in the node_demotion[] like this:
>
>         {  1, // Node 0 migrates to 1
>            2, // Node 1 migrates to 2
>           -1, // Node 2 does not migrate
>            4, // Node 3 migrates to 1
>            5, // Node 4 migrates to 2
>           -1} // Node 5 does not migrate
>
> Is that what you were thinking of?

Perfect.

>
> ...
> >> +again:
> >> +       this_pass = next_pass;
> >> +       next_pass = NODE_MASK_NONE;
> >> +       /*
> >> +        * To avoid cycles in the migration "graph", ensure
> >> +        * that migration sources are not future targets by
> >> +        * setting them in 'used_targets'.  Do this only
> >> +        * once per pass so that multiple source nodes can
> >> +        * share a target node.
> >> +        *
> >> +        * 'used_targets' will become unavailable in future
> >> +        * passes.  This limits some opportunities for
> >> +        * multiple source nodes to share a desintation.
> >
> > s/desination/destination
>
> Fixed, thanks.
>
> >> +        */
> >> +       nodes_or(used_targets, used_targets, this_pass);
> >> +       for_each_node_mask(node, this_pass) {
> >> +               int target_node = establish_migrate_target(node, &used_targets);
> >> +
> >> +               if (target_node == NUMA_NO_NODE)
> >> +                       continue;
> >> +
> >> +               /* Visit targets from this pass in the next pass: */
> >> +               node_set(target_node, next_pass);
> >> +       }
> >> +       /* Is another pass necessary? */
> >> +       if (!nodes_empty(next_pass))
> >> +               goto again;
> >> +}
> >> +
> >> +void set_migration_target_nodes(void)
> >
> > It seems this function is not called outside migrate.c, so it should be static.
>
> Fixed, thanks.

  parent reply	other threads:[~2021-02-02 17:50 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-26  0:34 [RFC][PATCH 00/13] [v5] Migrate Pages in lieu of discard Dave Hansen
2021-01-26  0:34 ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 01/13] mm/vmscan: restore zone_reclaim_mode ABI Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-02-10  9:42   ` Oscar Salvador
2021-01-26  0:34 ` [RFC][PATCH 02/13] mm/vmscan: move RECLAIM* bits to uapi header Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-02-10  9:44   ` Oscar Salvador
2021-01-26  0:34 ` [RFC][PATCH 03/13] mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-31  1:10   ` David Rientjes
2021-01-31  1:10     ` David Rientjes
2021-02-10  9:54   ` Oscar Salvador
2021-01-26  0:34 ` [RFC][PATCH 04/13] mm/numa: node demotion data structure and lookup Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-31  1:19   ` David Rientjes
2021-01-31  1:19     ` David Rientjes
2021-02-01 17:49     ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 05/13] mm/numa: automatically generate node migration order Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-29 20:46   ` Yang Shi
2021-01-29 20:46     ` Yang Shi
2021-02-01 19:13     ` Dave Hansen
2021-02-02 11:43       ` Oscar Salvador
2021-02-02 17:46       ` Yang Shi [this message]
2021-02-02 17:46         ` Yang Shi
2021-02-03  0:43         ` Dave Hansen
2021-02-04  0:26           ` Yang Shi
2021-02-04  0:26             ` Yang Shi
2021-01-26  0:34 ` [RFC][PATCH 06/13] mm/migrate: update migration order during on hotplug events Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-29 20:59   ` Yang Shi
2021-01-29 20:59     ` Yang Shi
2021-02-02 11:42   ` Oscar Salvador
2021-02-09 23:45     ` Dave Hansen
2021-02-10  8:55       ` Oscar Salvador
2021-01-26  0:34 ` [RFC][PATCH 07/13] mm/migrate: make migrate_pages() return nr_succeeded Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-29 21:04   ` Yang Shi
2021-01-29 21:04     ` Yang Shi
2021-02-09 23:41     ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 08/13] mm/migrate: demote pages during reclaim Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-02-02 11:55   ` Oscar Salvador
2021-02-02 22:45     ` Yang Shi
2021-02-02 22:45       ` Yang Shi
2021-02-02 22:56       ` Dave Hansen
2021-02-02 18:22   ` Yang Shi
2021-02-02 18:22     ` Yang Shi
2021-02-02 18:34     ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 09/13] mm/vmscan: add page demotion counter Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 10/13] mm/vmscan: add helper for querying ability to age anonymous pages Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 11/13] mm/vmscan: Consider anonymous pages without swap Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-02-02 18:56   ` Yang Shi
2021-02-02 18:56     ` Yang Shi
2021-02-02 21:35     ` Dave Hansen
2021-02-02 22:35       ` Yang Shi
2021-02-02 22:35         ` Yang Shi
2021-01-26  0:34 ` [RFC][PATCH 12/13] mm/vmscan: never demote for memcg reclaim Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-26  0:34 ` [RFC][PATCH 13/13] mm/migrate: new zone_reclaim_mode to enable reclaim migration Dave Hansen
2021-01-26  0:34   ` Dave Hansen
2021-01-31  1:13 ` [RFC][PATCH 00/13] [v5] Migrate Pages in lieu of discard David Rientjes
2021-01-31  1:13   ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHbLzkqrPvY4Zb17AGJi1Zi7OV9WDUTEpj5DpfWY9c2WHzPBYw@mail.gmail.com \
    --to=shy828301@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=rientjes@google.com \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.