linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, "Anaczkowski,
	Lukasz" <lukasz.anaczkowski@intel.com>,
	"Box, David E" <david.e.box@intel.com>,
	"Kogut, Jaroslaw" <Jaroslaw.Kogut@intel.com>,
	"Koss, Marcin" <marcin.koss@intel.com>,
	"Koziej, Artur" <artur.koziej@intel.com>,
	"Lahtinen, Joonas" <joonas.lahtinen@intel.com>,
	"Moore, Robert" <robert.moore@intel.com>,
	"Nachimuthu, Murugasamy" <murugasamy.nachimuthu@intel.com>,
	"Odzioba, Lukasz" <lukasz.odzioba@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Schmauss, Erik" <erik.schmauss@intel.com>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	"Zheng, Lv" <lv.zheng@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Balbir Singh <bsingharora@gmail.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>, Len Brown <lenb@kernel.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	devel@acpica.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, linux-api@vger.kernel.org
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
Date: Thu, 14 Dec 2017 14:00:32 +0100	[thread overview]
Message-ID: <20171214130032.GK16951@dhcp22.suse.cz> (raw)
In-Reply-To: <20171214021019.13579-1-ross.zwisler@linux.intel.com>

[CC linix-api]

On Wed 13-12-17 19:10:16, Ross Zwisler wrote:
> This is the third revision of my patches adding a sysfs representation
> of the ACPI Heterogeneous Memory Attribute Table (HMAT).  These patches
> are based on v4.15-rc3 and a working tree can be found here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/zwisler/linux.git/log/?h=hmat_v3
> 
> My goal is to get these patches merged for v4.16.

Has actually reviewed the overal design already for this to be 4.16
thing? I do not see any acks/reviewed-bys in any of the patches...

> Changes from previous version (https://lkml.org/lkml/2017/7/6/749):

... comments on this last posting are touching the surface rather than
really discuss the overal design.

>  - Changed "HMEM" to "HMAT" and "hmem" to "hmat" throughout to make sure
>    that this effort doesn't get confused with Jerome's HMM work and to
>    make it clear that this enabling is tightly tied to the ACPI HMAT
>    table.  (John Hubbard)
> 
>  - Moved the link in the initiator (i.e. mem_init0/mem_tgt2) from
>    pointing to the "mem_tgt2/local_init" attribute group to instead
>    point at the mem_tgt2 target itself.  (Brice Goglin)
> 
>  - Simplified the contents of both the initiators and the targets so
>    that we just symlink to the NUMA node and don't duplicate
>    information.  For initiators this means that we no longer enumerate
>    CPUs, and for targets this means that we don't provide physical
>    address start and length information.  All of this is already
>    available in the NUMA node directory itself (i.e.
>    /sys/devices/system/node/node0), and it already accounts for the fact
>    that both multiple CPUs and multiple memory regions can be owned by a
>    given NUMA node.  Also removed some extra attributes (is_enabled,
>    is_isolated) which I don't think are useful at this point in time.
> 
> I have tested this against many different configs that I implemented
> using qemu.

What is the testing procedure? How can I setup qemu to simlate such HW?

[Keeping the rest of the email for linux-api reference]

> ---
> 
> ==== Quick Summary ====
> 
> Platforms exist today which have multiple types of memory attached to a
> single CPU.  These disparate memory ranges have some characteristics in
> common, such as CPU cache coherence, but they can have wide ranges of
> performance both in terms of latency and bandwidth.
> 
> For example, consider a system that contains persistent memory, standard
> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
> There could potentially be an order of magnitude or more difference in
> performance between the slowest and fastest memory attached to that CPU.
> 
> With the current Linux code NUMA nodes are CPU-centric, so all the memory
> attached to a given CPU will be lumped into the same NUMA node.  This makes
> it very difficult for userspace applications to understand the performance
> of different memory ranges on a given CPU.
> 
> We solve this issue by providing userspace with performance information on
> individual memory ranges.  This performance information is exposed via
> sysfs:
> 
>   # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
>   mem_tgt2/firmware_id:1
>   mem_tgt2/is_cached:0
>   mem_tgt2/local_init/read_bw_MBps:40960
>   mem_tgt2/local_init/read_lat_nsec:50
>   mem_tgt2/local_init/write_bw_MBps:40960
>   mem_tgt2/local_init/write_lat_nsec:50
> 
> This allows applications to easily find the memory that they want to use.
> We expect that the existing NUMA APIs will be enhanced to use this new
> information so that applications can continue to use them to select their
> desired memory.

How? Could you provide some examples?

> ==== Lots of Details ====
> 
> This patch set provides a sysfs representation of parts of the
> Heterogeneous Memory Attribute Table (HMAT), newly defined in ACPI 6.2.
> One major conceptual change in ACPI 6.2 related to this work is that
> proximity domains no longer need to contain a processor.  We can now
> have memory-only proximity domains, which means that we can now have
> memory-only Linux NUMA nodes.
> 
> Here is an example configuration where we have a single processor, one
> range of regular memory and one range of HBM:
> 
>   +---------------+   +----------------+
>   | Processor     |   | Memory         |
>   | prox domain 0 +---+ prox domain 1  |
>   | NUMA node 1   |   | NUMA node 2    |
>   +-------+-------+   +----------------+
>           |
>   +-------+----------+
>   | HBM              |
>   | prox domain 2    |
>   | NUMA node 0      |
>   +------------------+
> 
> This gives us one initiator (the processor) and two targets (the two memory
> ranges).  Each of these three has its own ACPI proximity domain and
> associated Linux NUMA node.  Note also that while there is a 1:1 mapping
> from each proximity domain to each NUMA node, the numbers don't necessarily
> match up.  Additionally we can have extra NUMA nodes that don't map back to
> ACPI proximity domains.
> 
> The above configuration could also have the processor and one of the two
> memory ranges sharing a proximity domain and NUMA node, but for the
> purposes of the HMAT the two memory ranges will need to be separated.
> 
> The overall goal of this series and of the HMAT is to allow users to
> identify memory using its performance characteristics.  This is
> complicated by the amount of HMAT data that could be present in very
> large systems, so in this series we only surface performance information
> for local (initiator,target) pairings.  The changelog for patch 5
> discusses this in detail.
> 
> Ross Zwisler (3):
>   acpi: HMAT support in acpi_parse_entries_array()
>   hmat: add heterogeneous memory sysfs support
>   hmat: add performance attributes
> 
>  MAINTAINERS                         |   6 +
>  drivers/acpi/Kconfig                |   1 +
>  drivers/acpi/Makefile               |   1 +
>  drivers/acpi/hmat/Kconfig           |   7 +
>  drivers/acpi/hmat/Makefile          |   2 +
>  drivers/acpi/hmat/core.c            | 797 ++++++++++++++++++++++++++++++++++++
>  drivers/acpi/hmat/hmat.h            |  64 +++
>  drivers/acpi/hmat/initiator.c       |  43 ++
>  drivers/acpi/hmat/perf_attributes.c |  56 +++
>  drivers/acpi/hmat/target.c          |  55 +++
>  drivers/acpi/tables.c               |  52 ++-
>  11 files changed, 1073 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/acpi/hmat/Kconfig
>  create mode 100644 drivers/acpi/hmat/Makefile
>  create mode 100644 drivers/acpi/hmat/core.c
>  create mode 100644 drivers/acpi/hmat/hmat.h
>  create mode 100644 drivers/acpi/hmat/initiator.c
>  create mode 100644 drivers/acpi/hmat/perf_attributes.c
>  create mode 100644 drivers/acpi/hmat/target.c
> 
> -- 
> 2.14.3
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2017-12-14 13:00 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-14  2:10 [PATCH v3 0/3] create sysfs representation of ACPI HMAT Ross Zwisler
2017-12-14  2:10 ` [PATCH v3 1/3] acpi: HMAT support in acpi_parse_entries_array() Ross Zwisler
2017-12-15  0:49   ` Rafael J. Wysocki
2017-12-15  1:10   ` Dan Williams
2017-12-16  1:53     ` Rafael J. Wysocki
2017-12-16  1:57       ` Dan Williams
2017-12-16  2:15         ` Rafael J. Wysocki
2017-12-14  2:10 ` [PATCH v3 2/3] hmat: add heterogeneous memory sysfs support Ross Zwisler
2017-12-15  0:52   ` Rafael J. Wysocki
2017-12-15 20:53     ` Ross Zwisler
2017-12-14  2:10 ` [PATCH v3 3/3] hmat: add performance attributes Ross Zwisler
2017-12-14 13:00 ` Michal Hocko [this message]
2017-12-18 20:35   ` [PATCH v3 0/3] create sysfs representation of ACPI HMAT Ross Zwisler
2017-12-20 16:41     ` Ross Zwisler
2017-12-21 13:18       ` Michal Hocko
2017-12-20 18:19     ` Matthew Wilcox
2017-12-20 20:22       ` Dave Hansen
2017-12-20 21:16         ` Matthew Wilcox
2017-12-20 21:24           ` Ross Zwisler
2017-12-20 22:29             ` Dan Williams
2017-12-20 22:41               ` Ross Zwisler
2017-12-21 20:31                 ` Brice Goglin
2017-12-22 22:53                   ` Dan Williams
2017-12-22 23:22                     ` Ross Zwisler
2017-12-22 23:57                       ` Dan Williams
2017-12-23  1:14                         ` Rafael J. Wysocki
2017-12-27  9:10                     ` Brice Goglin
2017-12-30  6:58                       ` Matthew Wilcox
2017-12-30  9:19                         ` Brice Goglin
2017-12-20 21:13       ` Ross Zwisler
2017-12-21  1:41         ` Elliott, Robert (Persistent Memory)
2017-12-22 21:46           ` Ross Zwisler
2017-12-21 12:50       ` Michael Ellerman
2017-12-22  3:09 ` Anshuman Khandual
2017-12-22 10:31   ` Kogut, Jaroslaw
2017-12-22 14:37     ` Anshuman Khandual
2017-12-22 17:13   ` Dave Hansen
2017-12-23  5:14     ` Anshuman Khandual
2017-12-22 22:13   ` Ross Zwisler
2017-12-23  6:56     ` Anshuman Khandual
2017-12-22 22:31   ` Ross Zwisler
2017-12-25  2:05     ` Liubo(OS Lab)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171214130032.GK16951@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=Jaroslaw.Kogut@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=artur.koziej@intel.com \
    --cc=brice.goglin@gmail.com \
    --cc=bsingharora@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david.e.box@intel.com \
    --cc=devel@acpica.org \
    --cc=erik.schmauss@intel.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=joonas.lahtinen@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=lukasz.anaczkowski@intel.com \
    --cc=lukasz.odzioba@intel.com \
    --cc=lv.zheng@intel.com \
    --cc=marcin.koss@intel.com \
    --cc=murugasamy.nachimuthu@intel.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=robert.moore@intel.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).