linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Keith Busch <keith.busch@intel.com>,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Rafael Wysocki <rafael@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH 1/7] node: Link memory nodes to their compute nodes
Date: Mon, 19 Nov 2018 08:16:02 +0530	[thread overview]
Message-ID: <79caebd8-ebf1-58b5-31e7-ead3626a1ec7@arm.com> (raw)
In-Reply-To: <20181114224921.12123-2-keith.busch@intel.com>



On 11/15/2018 04:19 AM, Keith Busch wrote:
> Memory-only nodes will often have affinity to a compute node, and
> platforms have ways to express that locality relationship.

It may not have a local affinity to any compute node but it might have a
valid NUMA distance from all available compute nodes. This is particularly
true when the coherent device memory which is accessible from all available
compute nodes without having local affinity to any compute node other than
the device compute which may or not be represented as a NUMA node in itself.

But in case of normally system memory also, a memory only node might be far
from other CPU nodes and may not have CPUs of it's own. In that case there
is no local affinity anyways.

> 
> A node containing CPUs or other DMA devices that can initiate memory
> access are referred to as "memory iniators". A "memory target" is a

Memory initiators should also include heterogeneous compute elements like
GPU cores, FPGA elements etc apart from CPU and DMA engines.

> node that provides at least one phyiscal address range accessible to a
> memory initiator.

This definition for "memory target" makes sense. Coherent accesses within
PA range from all possible "memory initiators" which should also include
heterogeneous compute elements as mentioned before.

> 
> In preparation for these systems, provide a new kernel API to link
> the target memory node to its initiator compute node with symlinks to
> each other.

Makes sense but how would we really define NUMA placement for various
heterogeneous compute elements which are connected differently to the
system bus differently than the CPU and DMA. 

> 
> The following example shows the new sysfs hierarchy setup for memory node
> 'Y' local to commpute node 'X':
> 
>   # ls -l /sys/devices/system/node/nodeX/initiator*
>   /sys/devices/system/node/nodeX/targetY -> ../nodeY
> 
>   # ls -l /sys/devices/system/node/nodeY/target*
>   /sys/devices/system/node/nodeY/initiatorX -> ../nodeX

This inter linking makes sense but once we are able to define all possible
memory initiators and memory targets as NUMA nodes (which might not very
trivial) taking into account heterogeneous compute environment. But this
linking at least establishes the coherency relationship between memory
initiators and memory targets.

> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/base/node.c  | 32 ++++++++++++++++++++++++++++++++
>  include/linux/node.h |  2 ++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 86d6cd92ce3d..a9b7512a9502 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -372,6 +372,38 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid)
>  				 kobject_name(&node_devices[nid]->dev.kobj));
>  }
>  
> +int register_memory_node_under_compute_node(unsigned int m, unsigned int p)
> +{
> +	int ret;
> +	char initiator[20], target[17];

20, 17 seems arbitrary here.

> +
> +	if (!node_online(p) || !node_online(m))
> +		return -ENODEV;

Just wondering how a NUMA node for group of GPU compute elements will look
like which are not manage by kernel but are still memory initiators having
access to a number of memory targets.

> +	if (m == p)
> +		return 0;

Why skip ? Should not we link memory target to it's own node which can be
it's memory initiator as well. Caller of this linking function might decide
on whether the memory target is accessible from same NUMA node as a memory
initiator or not.

> +
> +	snprintf(initiator, sizeof(initiator), "initiator%d", p);
> +	snprintf(target, sizeof(target), "target%d", m);
> +
> +	ret = sysfs_create_link(&node_devices[p]->dev.kobj,
> +				&node_devices[m]->dev.kobj,
> +				target);
> +	if (ret)
> +		return ret;
> +
> +	ret = sysfs_create_link(&node_devices[m]->dev.kobj,
> +				&node_devices[p]->dev.kobj,
> +				initiator);
> +	if (ret)
> +		goto err;
> +
> +	return 0;
> + err:
> +	sysfs_remove_link(&node_devices[p]->dev.kobj,
> +			  kobject_name(&node_devices[m]->dev.kobj));
> +	return ret;
> +}
> +
>  int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
>  {
>  	struct device *obj;
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 257bb3d6d014..1fd734a3fb3f 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -75,6 +75,8 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>  extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>  					   unsigned long phys_index);
>  
> +extern int register_memory_node_under_compute_node(unsigned int m, unsigned int p);
> +
>  #ifdef CONFIG_HUGETLBFS
>  extern void register_hugetlbfs_with_node(node_registration_func_t doregister,
>  					 node_registration_func_t unregister);
>
The code is all good but as mentioned before the primary concern is whether
this semantics will be able to correctly represent all possible present and
future heterogeneous compute environments with multi attribute memory. This
is going to be a kernel API. So apart from various NUMA representation for
all possible kinds, the interface has to be abstract with generic elements
and room for future extension.

      parent reply	other threads:[~2018-11-19  2:46 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-14 22:49 [PATCH 1/7] node: Link memory nodes to their compute nodes Keith Busch
2018-11-14 22:49 ` [PATCH 2/7] node: Add heterogenous memory performance Keith Busch
2018-11-19  3:35   ` Anshuman Khandual
2018-11-19 15:46     ` Keith Busch
2018-11-22 13:22       ` Anshuman Khandual
2018-11-27  7:00   ` Dan Williams
2018-11-27 17:42     ` Dan Williams
2018-11-27 17:44     ` Keith Busch
2018-11-14 22:49 ` [PATCH 3/7] doc/vm: New documentation for " Keith Busch
2018-11-15 12:59   ` Jonathan Cameron
2018-12-10 16:12     ` Dan Williams
2018-11-20 13:51   ` Mike Rapoport
2018-11-20 15:31     ` Keith Busch
2018-11-14 22:49 ` [PATCH 4/7] node: Add memory caching attributes Keith Busch
2018-11-15  0:40   ` Dave Hansen
2018-11-19  4:14   ` Anshuman Khandual
2018-11-19 23:06     ` Keith Busch
2018-11-22 13:29       ` Anshuman Khandual
2018-11-26 15:14         ` Keith Busch
2018-11-26 19:06   ` Greg Kroah-Hartman
2018-11-26 19:53     ` Keith Busch
2018-11-26 19:06   ` Greg Kroah-Hartman
2018-11-14 22:49 ` [PATCH 5/7] doc/vm: New documentation for memory cache Keith Busch
2018-11-15  0:41   ` Dave Hansen
2018-11-15 13:16   ` Jonathan Cameron
2018-11-20 13:53   ` Mike Rapoport
2018-11-14 22:49 ` [PATCH 6/7] acpi: Create subtable parsing infrastructure Keith Busch
2018-11-19  9:58   ` Rafael J. Wysocki
2018-11-19 18:36     ` Keith Busch
2018-11-14 22:49 ` [PATCH 7/7] acpi/hmat: Parse and report heterogeneous memory Keith Busch
2018-11-15 13:57 ` [PATCH 1/7] node: Link memory nodes to their compute nodes Matthew Wilcox
2018-11-15 14:59   ` Keith Busch
2018-11-15 17:50     ` Dan Williams
2018-11-19  3:04       ` Anshuman Khandual
2018-11-15 20:36     ` Matthew Wilcox
2018-11-16 18:32       ` Keith Busch
2018-11-19  3:15         ` Anshuman Khandual
2018-11-19 15:49           ` Keith Busch
2018-12-04 15:43         ` Aneesh Kumar K.V
2018-12-04 16:54           ` Keith Busch
2018-11-16 22:55       ` Dan Williams
2018-11-19  2:52     ` Anshuman Khandual
2018-11-19  2:46 ` Anshuman Khandual [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=79caebd8-ebf1-58b5-31e7-ead3626a1ec7@arm.com \
    --to=anshuman.khandual@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=keith.busch@intel.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).