linux-kernel.vger.kernel.org archive mirror
From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: mhocko@suse.com, vbabka@suse.cz, mgorman@suse.de,
	minchan@kernel.org, aneesh.kumar@linux.vnet.ibm.com,
	bsingharora@gmail.com, srikar@linux.vnet.ibm.com,
	haren@linux.vnet.ibm.com, jglisse@redhat.com,
	dan.j.williams@intel.com
Subject: Re: [RFC V2 03/12] mm: Change generic FALLBACK zonelist creation process
Date: Tue, 31 Jan 2017 07:06:20 +0530
Message-ID: <434aa74c-e917-490e-85ab-8c67b1a82d95@linux.vnet.ibm.com>
In-Reply-To: <07bd439c-6270-b219-227b-4079d36a2788@intel.com>

On 01/30/2017 11:04 PM, Dave Hansen wrote:
> On 01/29/2017 07:35 PM, Anshuman Khandual wrote:
>> * CDM node's zones are not part of any other node's FALLBACK zonelist
>> * CDM node's FALLBACK list contains its own memory zones followed by
>>   all system RAM zones in regular order as before
>> * CDM node's zones are part of its own NOFALLBACK zonelist
> 
> This seems like a sane policy for the system that you're describing.
> But, it's still a policy, and it's rather hard-coded into the kernel.

Right. In the original RFC, which I posted in October, I had thought
about this issue and created 'pglist_data->coherent_device' as a u64
element where each bit in the mask could indicate a specific policy
request for the hot plugged coherent device. But it looked too
complicated for the moment, in the absence of any other potential
coherent memory HW that really requires anything other than isolation
and an explicit allocation method.

> Let's say we had a CDM node with 100x more RAM than the rest of the
> system and it was just as fast as the rest of the RAM.  Would we still
> want it isolated like this?  Or would we want a different policy?

In this particular case, this CDM could be hot plugged into the
system as a normal NUMA node (I don't see any reason why it should
not be treated as a normal NUMA node), but I do understand the need
for different policies for different kinds of coherent memory.

But then the counterargument is: don't we want to keep this 100x
larger memory isolated for some special purpose, to be used by
specific applications?

There is a sense that if these non system RAM memories are coherent
and similar, there cannot be much difference between what they would
expect from the kernel.

> 
> Why do we need this hard-coded along with the cpuset stuff later in the
> series?  Doesn't taking a node out of the cpuset also take it out of the
> fallback lists?

There are two mutually exclusive approaches which are described in
this patch series.

(1) zonelist modification based approach
(2) cpuset restriction based approach

As mentioned in the cover letter,

"
NOTE: These two sets of patches are mutually exclusive and
represent two different approaches. Only one of these sets should be
applied at any point in time.

Set1:
  mm: Change generic FALLBACK zonelist creation process
  mm: Change mbind(MPOL_BIND) implementation for CDM nodes

Set2:
  cpuset: Add cpuset_inc() inside cpuset_init()
  mm: Exclude CDM nodes from task->mems_allowed and root cpuset
  mm: Ignore cpuset enforcement when allocation flag has __GFP_THISNODE
"

> 
>>  	while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +		/*
>> +		 * CDM node's own zones should not be part of any other
>> +		 * node's fallback zonelist but only it's own fallback
>> +		 * zonelist.
>> +		 */
>> +		if (is_cdm_node(node) && (pgdat->node_id != node))
>> +			continue;
>> +#endif
> 
> On a superficial note: Isn't that #ifdef unnecessary?  is_cdm_node() has
> a 'return 0' stub when the config option is off anyway.

Right, will fix it up.


Thread overview: 58+ messages
2017-01-30  3:35 [RFC V2 00/12] Define coherent device memory node Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 01/12] mm: Define coherent device memory (CDM) node Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 02/12] mm: Isolate HugeTLB allocations away from CDM nodes Anshuman Khandual
2017-01-30 17:19   ` Dave Hansen
2017-01-31  1:03     ` Anshuman Khandual
2017-01-31  1:37       ` Dave Hansen
2017-02-01 13:59         ` Anshuman Khandual
2017-02-01 19:01           ` Dave Hansen
2017-01-30  3:35 ` [RFC V2 03/12] mm: Change generic FALLBACK zonelist creation process Anshuman Khandual
2017-01-30 17:34   ` Dave Hansen
2017-01-31  1:36     ` Anshuman Khandual [this message]
2017-01-31  1:57       ` Dave Hansen
2017-01-31  7:25         ` John Hubbard
2017-01-31 18:04           ` Dave Hansen
2017-01-31 19:14             ` David Nellans
2017-02-01  6:56             ` Anshuman Khandual
2017-02-01  6:46           ` Anshuman Khandual
2017-02-01  6:40         ` Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 04/12] mm: Change mbind(MPOL_BIND) implementation for CDM nodes Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 05/12] cpuset: Add cpuset_inc() inside cpuset_init() Anshuman Khandual
2017-01-30 17:36   ` Dave Hansen
2017-01-30 20:30   ` Mel Gorman
2017-01-31 14:22     ` [RFC] cpuset: Enable changing of top_cpuset's mems_allowed nodemask Anshuman Khandual
2017-01-31 16:00       ` Mel Gorman
2017-02-01  7:31         ` Anshuman Khandual
2017-02-01  8:53           ` Michal Hocko
2017-02-01  9:18           ` Mel Gorman
2017-01-31 14:36     ` [RFC V2 05/12] cpuset: Add cpuset_inc() inside cpuset_init() Vlastimil Babka
2017-01-31 15:30       ` Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 06/12] mm: Exclude CDM nodes from task->mems_allowed and root cpuset Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 07/12] mm: Ignore cpuset enforcement when allocation flag has __GFP_THISNODE Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 08/12] mm: Add new VMA flag VM_CDM Anshuman Khandual
2017-01-30 18:52   ` Jerome Glisse
2017-01-31  4:22     ` Anshuman Khandual
2017-01-31  6:05       ` Jerome Glisse
2017-01-30  3:35 ` [RFC V2 09/12] mm: Exclude CDM marked VMAs from auto NUMA Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 10/12] mm: Ignore madvise(MADV_MERGEABLE) request for VM_CDM marked VMAs Anshuman Khandual
2017-01-30  3:35 ` [RFC V2 11/12] mm: Tag VMA with VM_CDM flag during page fault Anshuman Khandual
2017-01-30 17:51   ` Dave Hansen
2017-01-31  5:10     ` Anshuman Khandual
2017-01-31 17:54       ` Dave Hansen
2017-01-30  3:35 ` [RFC V2 12/12] mm: Tag VMA with VM_CDM flag explicitly during mbind(MPOL_BIND) Anshuman Khandual
2017-01-30 17:54   ` Dave Hansen
2017-01-31  4:36     ` Anshuman Khandual
2017-02-07 18:07       ` Dave Hansen
2017-02-08 14:13         ` Anshuman Khandual
2017-02-08 15:04         ` Jerome Glisse
2017-01-30  3:35 ` [DEBUG 13/21] powerpc/mm: Identify coherent device memory nodes during platform init Anshuman Khandual
2017-01-30  3:35 ` [DEBUG 14/21] powerpc/mm: Create numa nodes for hotplug memory Anshuman Khandual
2017-01-30  3:35 ` [DEBUG 15/21] powerpc/mm: Enable CONFIG_MOVABLE_NODE for PPC64 platform Anshuman Khandual
2017-01-30  3:35 ` [DEBUG 16/21] mm: Enable CONFIG_MOVABLE_NODE on powerpc Anshuman Khandual
2017-01-30  3:35 ` [DEBUG 17/21] mm: Export definition of 'zone_names' array through mmzone.h Anshuman Khandual
2017-01-30  3:35 ` [DEBUG 18/21] mm: Add debugfs interface to dump each node's zonelist information Anshuman Khandual
2017-01-30  3:36 ` [DEBUG 19/21] mm: Add migrate_virtual_range migration interface Anshuman Khandual
2017-01-30  3:36 ` [DEBUG 20/21] drivers: Add two drivers for coherent device memory tests Anshuman Khandual
2017-01-30  3:36 ` [DEBUG 21/21] selftests/powerpc: Add a script to perform random VMA migrations Anshuman Khandual
2017-01-31  5:48 ` [RFC V2 00/12] Define coherent device memory node Anshuman Khandual
2017-01-31  6:15   ` Jerome Glisse
