From: "Huang, Ying" <ying.huang@intel.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, Yang Shi <shy828301@gmail.com>,
	Michal Hocko <mhocko@suse.com>, Wei Xu <weixugc@google.com>,
	David Rientjes <rientjes@google.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"David Hildenbrand" <david@redhat.com>,
	osalvador <osalvador@suse.de>
Subject: Re: [PATCH -V8 02/10] mm/numa: automatically generate node migration order
Date: Tue, 22 Jun 2021 09:14:29 +0800
Message-ID: <87sg1an1je.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <2AA3D792-7F14-4297-8EDD-3B5A7B31AECA@nvidia.com> (Zi Yan's message of "Mon, 21 Jun 2021 10:50:14 -0400")

Zi Yan <ziy@nvidia.com> writes:

> On 19 Jun 2021, at 4:18, Huang, Ying wrote:
>
>> Zi Yan <ziy@nvidia.com> writes:
>>
>>> On 18 Jun 2021, at 2:15, Huang Ying wrote:

[snip]

>>>> +/*
>>>> + * When memory fills up on a node, memory contents can be
>>>> + * automatically migrated to another node instead of
>>>> + * discarded at reclaim.
>>>> + *
>>>> + * Establish a "migration path" which will start at nodes
>>>> + * with CPUs and will follow the priorities used to build the
>>>> + * page allocator zonelists.
>>>> + *
>>>> + * The difference here is that cycles must be avoided.  If
>>>> + * node0 migrates to node1, then neither node1, nor anything
>>>> + * node1 migrates to can migrate to node0.
>>>> + *
>>>> + * This function can run simultaneously with readers of
>>>> + * node_demotion[].  However, it can not run simultaneously
>>>> + * with itself.  Exclusion is provided by memory hotplug events
>>>> + * being single-threaded.
>>>> + */
>>>> +static void __set_migration_target_nodes(void)
>>>> +{
>>>> +	nodemask_t next_pass	= NODE_MASK_NONE;
>>>> +	nodemask_t this_pass	= NODE_MASK_NONE;
>>>> +	nodemask_t used_targets = NODE_MASK_NONE;
>>>> +	int node;
>>>> +
>>>> +	/*
>>>> +	 * Avoid any oddities like cycles that could occur
>>>> +	 * from changes in the topology.  This will leave
>>>> +	 * a momentary gap when migration is disabled.
>>>> +	 */
>>>> +	disable_all_migrate_targets();
>>>> +
>>>> +	/*
>>>> +	 * Ensure that the "disable" is visible across the system.
>>>> +	 * Readers will see either a combination of before+disable
>>>> +	 * state or disable+after.  They will never see before and
>>>> +	 * after state together.
>>>> +	 *
>>>> +	 * The before+after state together might have cycles and
>>>> +	 * could cause readers to do things like loop until this
>>>> +	 * function finishes.  This ensures they can only see a
>>>> +	 * single "bad" read and would, for instance, only loop
>>>> +	 * once.
>>>> +	 */
>>>> +	smp_wmb();
>>>> +
>>>> +	/*
>>>> +	 * Allocations go close to CPUs, first.  Assume that
>>>> +	 * the migration path starts at the nodes with CPUs.
>>>> +	 */
>>>> +	next_pass = node_states[N_CPU];
>>>
>>> Is there a plan to allow users to change where the migration
>>> path starts? Or, one step further, to provide an interface
>>> that lets users specify the demotion path? Something like
>>> /sys/devices/system/node/node*/node_demotion.
>>
>> I don't think that's necessary at least for now.  Do you know any real
>> world use case for this?
>
> In our P9+Volta system, GPU memory is exposed as a NUMA node.
> For GPU workloads whose data size exceeds the GPU memory size,
> it would be very helpful to allow pages in GPU memory to be migrated/demoted
> to CPU memory. With your current assumption, GPU memory -> CPU memory
> demotion does not seem possible, right? The same applies to any
> system that exposes device memory as a NUMA node and runs workloads
> on the device, using CPU memory as a lower memory tier than the device
> memory.

Thanks a lot for your use case!  A user-specified demotion path appears
to be one possible way to satisfy your requirement, and I think it can
be enabled on top of this patchset.  However, we have no specific plan
to work on that, at least for now.
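
To make the limitation discussed above concrete, here is a minimal
userspace sketch of the pass-based ordering described in the quoted
comment.  It is not the kernel code: build_demotion_order(),
nearest_unused_node(), MAX_NODES, NO_NODE and the start_mask parameter
are all hypothetical names chosen for illustration, and the distance
matrix is a simplified stand-in for the NUMA distances the zonelist
code uses.

```c
#include <assert.h>

#define MAX_NODES 8
#define NO_NODE   (-1)

/*
 * Hypothetical stand-in for the kernel's best-node search: return the
 * nearest node that is not yet in *used, and mark it as used.
 */
static int nearest_unused_node(int node, int nr_nodes,
			       int dist[MAX_NODES][MAX_NODES],
			       unsigned int *used)
{
	int best = NO_NODE;

	for (int n = 0; n < nr_nodes; n++) {
		if (*used & (1u << n))
			continue;
		if (best == NO_NODE || dist[node][n] < dist[node][best])
			best = n;
	}
	if (best != NO_NODE)
		*used |= 1u << best;
	return best;
}

/*
 * Build a cycle-free demotion order.  start_mask plays the role of
 * node_states[N_CPU] in the patch; making it a parameter is an
 * assumption of this sketch, not something the patch provides.
 */
static void build_demotion_order(int nr_nodes, unsigned int start_mask,
				 int dist[MAX_NODES][MAX_NODES],
				 int demotion[MAX_NODES])
{
	unsigned int next_pass = start_mask;
	unsigned int used = 0;

	assert(nr_nodes > 0 && nr_nodes <= MAX_NODES);

	/* Start with every node having no demotion target. */
	for (int n = 0; n < nr_nodes; n++)
		demotion[n] = NO_NODE;

	while (next_pass) {
		unsigned int this_pass = next_pass;

		next_pass = 0;
		/* Sources of this pass may never become targets later. */
		used |= this_pass;

		for (int n = 0; n < nr_nodes; n++) {
			int target;

			if (!(this_pass & (1u << n)))
				continue;
			target = nearest_unused_node(n, nr_nodes, dist, &used);
			if (target == NO_NODE)
				continue;
			demotion[n] = target;
			/* Targets become the sources of the next pass. */
			next_pass |= 1u << target;
		}
	}
}
```

In this model, take node 0 as CPU+DRAM and node 1 as CPU-less GPU
memory.  Starting from the CPU node (start_mask = 1u << 0) gives
node 0 -> node 1, and node 1 gets no target, which matches the concern
raised above.  Starting from node 1 instead (start_mask = 1u << 1)
gives node 1 -> node 0, which is roughly what a user-specified starting
point would have to achieve.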

Best Regards,
Huang, Ying
