From: Balbir Singh <bsingharora@gmail.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-kernel@vger.kernel.org
Cc: "Anaczkowski, Lukasz" <lukasz.anaczkowski@intel.com>,
"Box, David E" <david.e.box@intel.com>,
"Kogut, Jaroslaw" <Jaroslaw.Kogut@intel.com>,
"Lahtinen, Joonas" <joonas.lahtinen@intel.com>,
"Moore, Robert" <robert.moore@intel.com>,
"Nachimuthu, Murugasamy" <murugasamy.nachimuthu@intel.com>,
"Odzioba, Lukasz" <lukasz.odzioba@intel.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
"Schmauss, Erik" <erik.schmauss@intel.com>,
"Verma, Vishal L" <vishal.l.verma@intel.com>,
"Zheng, Lv" <lv.zheng@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jerome Glisse <jglisse@redhat.com>, Len Brown <lenb@kernel.org>,
Tim Chen <tim.c.chen@linux.intel.com>,
devel@acpica.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org,
linux-nvdimm@lists.01.org
Subject: Re: [RFC v2 0/5] surface heterogeneous memory performance information
Date: Fri, 07 Jul 2017 16:27:16 +1000
Message-ID: <1499408836.23251.3.camel@gmail.com>
In-Reply-To: <20170706215233.11329-1-ross.zwisler@linux.intel.com>
On Thu, 2017-07-06 at 15:52 -0600, Ross Zwisler wrote:
> ==== Quick Summary ====
>
> Platforms in the very near future will have multiple types of memory
> attached to a single CPU. These disparate memory ranges will have some
> characteristics in common, such as CPU cache coherence, but they can have
> wide ranges of performance both in terms of latency and bandwidth.
>
> For example, consider a system that contains persistent memory, standard
> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
> There could potentially be an order of magnitude or more difference in
> performance between the slowest and fastest memory attached to that CPU.
>
> With the current Linux code, NUMA nodes are CPU-centric, so all the memory
> attached to a given CPU will be lumped into the same NUMA node. This makes
> it very difficult for userspace applications to understand the performance
> of different memory ranges on a given CPU.
>
> We solve this issue by providing userspace with performance information on
> individual memory ranges. This performance information is exposed via
> sysfs:
>
> # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
> mem_tgt2/firmware_id:1
> mem_tgt2/is_cached:0
> mem_tgt2/is_enabled:1
> mem_tgt2/is_isolated:0
Could you please explain these characteristics? Are they covered in
the patches to follow?
> mem_tgt2/phys_addr_base:0x0
> mem_tgt2/phys_length_bytes:0x800000000
> mem_tgt2/local_init/read_bw_MBps:30720
> mem_tgt2/local_init/read_lat_nsec:100
> mem_tgt2/local_init/write_bw_MBps:30720
> mem_tgt2/local_init/write_lat_nsec:100
>
How do these numbers compare to normal system memory?
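
(For context, here is roughly how I'd expect an application to consume
these attributes -- a minimal sketch on my part; only the attribute
names come from your example above, the sysfs root path is a guess:)

    /* Sketch: read one performance attribute of a memory target.
     * HMEM_ROOT is an assumed path; attribute names are taken from
     * the example output above.
     */
    #include <stdio.h>

    #define HMEM_ROOT "/sys/devices/system/hmem"	/* assumed */

    static long read_attr(const char *attr)
    {
            char path[256];
            FILE *f;
            long val = -1;

            snprintf(path, sizeof(path),
                     HMEM_ROOT "/mem_tgt2/local_init/%s", attr);
            f = fopen(path, "r");
            if (!f)
                    return -1;
            if (fscanf(f, "%ld", &val) != 1)
                    val = -1;
            fclose(f);
            return val;
    }

    int main(void)
    {
            printf("read latency  : %ld nsec\n", read_attr("read_lat_nsec"));
            printf("read bandwidth: %ld MB/s\n", read_attr("read_bw_MBps"));
            return 0;
    }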
> This allows applications to easily find the memory that they want to use.
> We expect that the existing NUMA APIs will be enhanced to use this new
> information so that applications can continue to use them to select their
> desired memory.
>
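(For comparison, today an application picks memory by node number
through libnuma -- a sketch of the current API, with the node number
made up; presumably the enhanced APIs would let it ask for "lowest
latency" or "highest bandwidth" instead:)

    /* Sketch: current node-number-based selection via libnuma.
     * Compile with -lnuma; node 0 is an arbitrary choice here.
     */
    #include <numa.h>

    int main(void)
    {
            void *buf;

            if (numa_available() < 0)
                    return 1;

            buf = numa_alloc_onnode(1UL << 20, 0);  /* 1 MiB on node 0 */
            if (!buf)
                    return 1;
            /* ... use the memory ... */
            numa_free(buf, 1UL << 20);
            return 0;
    }
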
> This series is built upon acpica-1705:
>
> https://github.com/zetalog/linux/commits/acpica-1705
>
> And you can find a working tree here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/zwisler/linux.git/log/?h=hmem_sysfs
>
> ==== Lots of Details ====
>
> This patch set is only concerned with CPU-addressable memory types, not
> on-device memory like what we have with Jerome Glisse's HMM series:
>
> https://lwn.net/Articles/726691/
>
> This patch set works by enabling the Heterogeneous Memory Attribute
> Table (HMAT), newly defined in ACPI 6.2. One major conceptual change
> in ACPI 6.2 related to this work is that proximity domains no longer need
> to contain a processor. We can now have memory-only proximity domains,
> which means that we can now have memory-only Linux NUMA nodes.
>
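(If I follow, such a memory-only node would look like a normal NUMA
node with an empty cpulist. A sketch of how userspace could detect
that -- the node number and the "empty file" behaviour are my
assumptions:)

    /* Sketch: detect a memory-only (CPU-less) NUMA node by its empty
     * cpulist; returns 1 if memory-only, 0 if it has CPUs, -1 on error.
     */
    #include <stdio.h>

    static int node_is_memory_only(int node)
    {
            char path[64], buf[16] = "";
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/cpulist", node);
            f = fopen(path, "r");
            if (!f)
                    return -1;
            if (!fgets(buf, sizeof(buf), f))
                    buf[0] = '\0';
            fclose(f);
            return buf[0] == '\0' || buf[0] == '\n';
    }
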
> Here is an example configuration where we have a single processor, one
> range of regular memory and one range of HBM:
>
> +---------------+ +----------------+
> | Processor | | Memory |
> | prox domain 0 +---+ prox domain 1 |
> | NUMA node 1 | | NUMA node 2 |
> +-------+-------+ +----------------+
> |
> +-------+----------+
> | HBM |
> | prox domain 2 |
> | NUMA node 0 |
> +------------------+
>
> This gives us one initiator (the processor) and two targets (the two memory
> ranges). Each of these three has its own ACPI proximity domain and
> associated Linux NUMA node. Note also that while there is a 1:1 mapping
> from each proximity domain to each NUMA node, the numbers don't necessarily
> match up. Additionally we can have extra NUMA nodes that don't map back to
> ACPI proximity domains.
Could you expand on proximity domains? Are they the same as node
distance, or is this ACPI terminology for something more?
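
(For reference, the closest thing we have today is the SLIT-derived
distance matrix exported in /sys/devices/system/node/nodeX/distance --
a quick sketch of reading it, node number arbitrary:)

    /* Sketch: print the SLIT-style distances from node0 to all nodes,
     * for contrast with the HMAT bandwidth/latency data.
     */
    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/devices/system/node/node0/distance", "r");
            char line[256];

            if (!f)
                    return 1;
            if (fgets(line, sizeof(line), f))
                    printf("node0 distances: %s", line);
            fclose(f);
            return 0;
    }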
>
> The above configuration could also have the processor and one of the two
> memory ranges sharing a proximity domain and NUMA node, but for the
> purposes of the HMAT the two memory ranges will always need to be
> separated.
>
> The overall goal of this series and of the HMAT is to allow users to
> identify memory using its performance characteristics. This can broadly be
> done in one of two ways:
>
> Option 1: Provide the user with a way to map between proximity domains and
> NUMA nodes and a way to access the HMAT directly (probably via
> /sys/firmware/acpi/tables). Then, through possibly a library and a daemon,
> provide an API so that applications can either request information about
> memory ranges, or request memory allocations that meet a given set of
> performance characteristics.
>
> Option 2: Provide the user with HMAT performance data directly in sysfs,
> allowing applications to directly access it without the need for the
> library and daemon.
>
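(For option 1, I assume reading the raw table would work like any
other ACPI table export -- a sketch, assuming the usual
/sys/firmware/acpi/tables path and the standard 36-byte ACPI table
header, which carries the table length at byte offset 4:)

    /* Sketch: option-1 style access to the raw HMAT. Needs root;
     * the length field is little-endian at offset 4 of the header.
     */
    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/firmware/acpi/tables/HMAT", "rb");
            unsigned char hdr[36];

            if (!f)
                    return 1;
            if (fread(hdr, 1, sizeof(hdr), f) == sizeof(hdr))
                    printf("HMAT length: %u bytes\n",
                           hdr[4] | hdr[5] << 8 | hdr[6] << 16 |
                           (unsigned int)hdr[7] << 24);
            fclose(f);
            return 0;
    }
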
> The kernel work for option 1 is started by patches 1-3. These just surface
> the minimal amount of information in sysfs to allow userspace to map
> between proximity domains and NUMA nodes so that the raw data in the HMAT
> table can be understood.
>
> Patches 4 and 5 enable option 2, adding performance information from the
> HMAT to sysfs. The second option is complicated by the amount of HMAT data
> that could be present in very large systems, so in this series we only
> surface performance information for local (initiator,target) pairings. The
> changelog for patch 5 discusses this in detail.
>
> The naming collision between Jerome's "Heterogeneous Memory Management
> (HMM)" and this "Heterogeneous Memory (HMEM)" series is unfortunate, but I
> was trying to stick with the word "Heterogeneous" because of the naming of
> the ACPI 6.2 Heterogeneous Memory Attribute Table. Suggestions for
> better naming are welcome.
>
Balbir Singh.