linux-kernel.vger.kernel.org archive mirror
From: "Kogut, Jaroslaw" <Jaroslaw.Kogut@intel.com>
To: Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Anaczkowski, Lukasz" <lukasz.anaczkowski@intel.com>,
	"Box, David E" <david.e.box@intel.com>,
	"Koss, Marcin" <marcin.koss@intel.com>,
	"Koziej, Artur" <artur.koziej@intel.com>,
	"Lahtinen, Joonas" <joonas.lahtinen@intel.com>,
	"Moore, Robert" <robert.moore@intel.com>,
	"Nachimuthu, Murugasamy" <murugasamy.nachimuthu@intel.com>,
	"Odzioba, Lukasz" <lukasz.odzioba@intel.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Schmauss, Erik" <erik.schmauss@intel.com>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	"Zheng, Lv" <lv.zheng@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Balbir Singh <bsingharora@gmail.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>, Len Brown <lenb@kernel.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	"devel@acpica.org" <devel@acpica.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: RE: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
Date: Fri, 22 Dec 2017 10:31:05 +0000
Message-ID: <34EF90DF7C7F0647A403B771519912C7F5382CF3@irsmsx111.ger.corp.intel.com>
In-Reply-To: <2d6420f7-0a95-adfe-7390-a2aea4385ab2@linux.vnet.ibm.com>

> ... first thinking about redesigning the NUMA for
> heterogeneous memory may not be a good idea. Will look into this further.

I agree with the comment that a direction for handling heterogeneous memory systems should be defined first.

> https://linuxplumbersconf.org/2017/ocw//system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf

What I miss in the presentation is the user's perspective on the new approach, e.g.:
- How does an application developer see and understand the heterogeneous memory system?
- How does an application developer use the heterogeneous memory system?
- What changes are made to the APIs and sysfs interfaces?

On the other hand, if we assume that each separate memory NUMA node has distinct memory capabilities/attributes from the standpoint of a particular CPU, it is easy to explain to users how to describe and handle heterogeneous memory.
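Under this model, choosing memory reduces to comparing per-node attributes as seen from the calling CPU. The following is a minimal sketch under that assumption: the `node_attrs` table and `pick_fastest_node()` are hypothetical, and the field names are simply borrowed from the sysfs attributes in Ross's summary (read_bw_MBps, read_lat_nsec). How such a table gets populated is left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node view from one CPU (initiator); field names borrow
 * the sysfs attribute names from the HMAT patch set summary. */
struct node_attrs {
    int node;                    /* NUMA node id */
    unsigned long read_bw_MBps;  /* read bandwidth, MB/s */
    unsigned long read_lat_nsec; /* read latency, ns */
};

/* Return the node with the highest read bandwidth, preferring lower read
 * latency on ties; return -1 for an empty table. */
int pick_fastest_node(const struct node_attrs *nodes, size_t n)
{
    const struct node_attrs *best = NULL;

    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct node_attrs *c = &nodes[i];

        if (best == NULL || c->read_bw_MBps > best->read_bw_MBps ||
            (c->read_bw_MBps == best->read_bw_MBps &&
             c->read_lat_nsec < best->read_lat_nsec))
            best = c;
    }
    return best ? best->node : -1;
}
```

Ranking by bandwidth first is only one arbitrary choice for illustration; a real policy would likely weigh latency, bandwidth and capacity differently per workload.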

Of course, the current NUMA design in the kernel is insufficient today in the following areas:
- Exposing memory attributes that describe a heterogeneous memory system
- Interfaces for using the heterogeneous memory system, e.g. more sophisticated policies
- Internal memory-management mechanisms, e.g. automigration, and possibly others

> -----Original Message-----
> From: Anshuman Khandual [mailto:khandual@linux.vnet.ibm.com]
> Sent: Friday, December 22, 2017 4:10 AM
> To: Ross Zwisler <ross.zwisler@linux.intel.com>; linux-kernel@vger.kernel.org
> Cc: Anaczkowski, Lukasz <lukasz.anaczkowski@intel.com>; Box, David E
> <david.e.box@intel.com>; Kogut, Jaroslaw <Jaroslaw.Kogut@intel.com>; Koss,
> Marcin <marcin.koss@intel.com>; Koziej, Artur <artur.koziej@intel.com>;
> Lahtinen, Joonas <joonas.lahtinen@intel.com>; Moore, Robert
> <robert.moore@intel.com>; Nachimuthu, Murugasamy
> <murugasamy.nachimuthu@intel.com>; Odzioba, Lukasz
> <lukasz.odzioba@intel.com>; Wysocki, Rafael J <rafael.j.wysocki@intel.com>;
> Rafael J. Wysocki <rjw@rjwysocki.net>; Schmauss, Erik
> <erik.schmauss@intel.com>; Verma, Vishal L <vishal.l.verma@intel.com>;
> Zheng, Lv <lv.zheng@intel.com>; Andrew Morton <akpm@linux-foundation.org>;
> Balbir Singh <bsingharora@gmail.com>; Brice Goglin
> <brice.goglin@gmail.com>; Williams, Dan J <dan.j.williams@intel.com>;
> Hansen, Dave <dave.hansen@intel.com>; Jerome Glisse <jglisse@redhat.com>;
> John Hubbard <jhubbard@nvidia.com>; Len Brown <lenb@kernel.org>; Tim
> Chen <tim.c.chen@linux.intel.com>; devel@acpica.org;
> linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org
> Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
> 
> On 12/14/2017 07:40 AM, Ross Zwisler wrote:
> > ==== Quick Summary ====
> >
> > Platforms exist today which have multiple types of memory attached to
> > a single CPU.  These disparate memory ranges have some characteristics
> > in common, such as CPU cache coherence, but they can have wide ranges
> > of performance both in terms of latency and bandwidth.
> 
> Right.
> 
> >
> > For example, consider a system that contains persistent memory,
> > standard DDR memory and High Bandwidth Memory (HBM), all attached to
> > the same CPU.
> > There could potentially be an order of magnitude or more difference in
> > performance between the slowest and fastest memory attached to that CPU.
> 
> Right.
> 
> >
> > With the current Linux code, NUMA nodes are CPU-centric, so all the
> > memory attached to a given CPU will be lumped into the same NUMA node.
> > This makes it very difficult for userspace applications to understand
> > the performance of different memory ranges on a given CPU.
> 
> Right, but that might require fundamental changes to the NUMA
> representation. Plugging those memories in as separate NUMA nodes,
> identifying them through sysfs and allocating from them through mbind()
> seems like a short-term solution.
> 
> Though if we decide to go in this direction, a sysfs interface or something
> similar is required to enumerate memory properties.
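For the short-term approach described here (each memory type exposed as its own NUMA node and allocated from with mbind()), userspace could look roughly like the sketch below. It assumes Linux with NUMA support and that the desired memory is already NUMA node `node`; it calls the raw mbind(2) system call rather than libnuma's <numaif.h> wrapper, and the helper names and the 64-node mask size are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* MPOL_BIND as defined in <linux/mempolicy.h>; spelled out so the sketch
 * does not depend on libnuma headers. */
#define SKETCH_MPOL_BIND 2
#define SKETCH_MAX_NODES 64 /* hypothetical upper bound on node ids */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS ((SKETCH_MAX_NODES + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Build a nodemask with only the bit for `node` set; -1 if out of range. */
int nodemask_for(int node, unsigned long mask[MASK_LONGS])
{
    if (node < 0 || node >= SKETCH_MAX_NODES)
        return -1;
    memset(mask, 0, MASK_LONGS * sizeof(unsigned long));
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
    return 0;
}

/* Map `len` bytes of anonymous memory and bind them to `node` with
 * MPOL_BIND before first touch. Returns NULL with errno set on failure. */
void *alloc_on_node(size_t len, int node)
{
    unsigned long mask[MASK_LONGS];
    void *p;

    assert(len > 0);
    if (nodemask_for(node, mask) != 0) {
        errno = EINVAL;
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* The kernel reads maxnode - 1 bits, so pass one more than the mask
     * size (the same convention libnuma uses). */
    if (syscall(SYS_mbind, p, len, SKETCH_MPOL_BIND, mask,
                (unsigned long)SKETCH_MAX_NODES + 1, 0UL) != 0) {
        int saved = errno;

        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

Pages of the mapping are allocated on `node` when first touched; whether node ids map to memory types the way the application expects is exactly the information a sysfs attribute interface would have to provide.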
> 
> >
> > We solve this issue by providing userspace with performance
> > information on individual memory ranges.  This performance information
> > is exposed via sysfs:
> >
> >   # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
> >   mem_tgt2/firmware_id:1
> >   mem_tgt2/is_cached:0
> >   mem_tgt2/local_init/read_bw_MBps:40960
> >   mem_tgt2/local_init/read_lat_nsec:50
> >   mem_tgt2/local_init/write_bw_MBps:40960
> >   mem_tgt2/local_init/write_lat_nsec:50
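An application consuming these attributes only needs to read small decimal files. The sketch below assumes the layout shown in the listing (one value per file under an initiator directory such as mem_tgt2/local_init); since the absolute sysfs location is not shown in the quote, the directory is a caller-supplied parameter, and `read_attr_ul()`, `read_perf_attrs()` and `struct perf_attrs` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one unsigned decimal value from "<dir>/<name>".
 * Returns 0 on success, -1 on a missing file or non-numeric content. */
int read_attr_ul(const char *dir, const char *name, unsigned long *out)
{
    char path[4096], buf[64], *end;
    unsigned long v;
    FILE *f;
    int n;

    assert(dir && name && out);
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (buf[0] < '0' || buf[0] > '9') /* reject signs and whitespace */
        return -1;
    errno = 0;
    v = strtoul(buf, &end, 10);
    /* Accept the number followed by an optional trailing newline only. */
    if (errno != 0 || (*end != '\n' && *end != '\0'))
        return -1;
    *out = v;
    return 0;
}

/* The four performance attributes of one initiator/target pair. */
struct perf_attrs {
    unsigned long read_bw_MBps, write_bw_MBps;
    unsigned long read_lat_nsec, write_lat_nsec;
};

/* Fill all four values from one initiator directory; -1 on any error. */
int read_perf_attrs(const char *init_dir, struct perf_attrs *p)
{
    if (read_attr_ul(init_dir, "read_bw_MBps", &p->read_bw_MBps) ||
        read_attr_ul(init_dir, "write_bw_MBps", &p->write_bw_MBps) ||
        read_attr_ul(init_dir, "read_lat_nsec", &p->read_lat_nsec) ||
        read_attr_ul(init_dir, "write_lat_nsec", &p->write_lat_nsec))
        return -1;
    return 0;
}
```

For the listing above, a caller would pass the path of mem_tgt2/local_init and get back 40960 MB/s and 50 ns for both reads and writes.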
> 
> I might have missed discussions from earlier versions: why do we have this
> kind of "source --> target" model? Will we list properties for every possible
> "source --> target" pair on the system? Right now it shows only bandwidth and
> latency properties; can it accommodate other properties in the future?
> 
> >
> > This allows applications to easily find the memory that they want to use.
> > We expect that the existing NUMA APIs will be enhanced to use this new
> > information so that applications can continue to use them to select
> > their desired memory.
> 
> I had presented a proposal for NUMA redesign at the Plumbers Conference this
> year, in which various memory devices with different kinds of memory attributes
> can be represented in the kernel and used explicitly from user space.
> Here is the link to the proposal if you are interested. The proposal is very
> intrusive, and I don't have an RFC for it yet for discussion here.
> 
> https://linuxplumbersconf.org/2017/ocw//system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf
> 
> Problem is, designing the sysfs interface for memory attribute detection from
> user space without first thinking about redesigning the NUMA for
> heterogeneous memory may not be a good idea. Will look into this further.

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | District Court Gdansk-North | VII Commercial Division of the National Court Register - KRS 101882 | NIP 957-07-52-316 | Share capital 200.000 PLN.

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.


Thread overview: 42+ messages
2017-12-14  2:10 [PATCH v3 0/3] create sysfs representation of ACPI HMAT Ross Zwisler
2017-12-14  2:10 ` [PATCH v3 1/3] acpi: HMAT support in acpi_parse_entries_array() Ross Zwisler
2017-12-15  0:49   ` Rafael J. Wysocki
2017-12-15  1:10   ` Dan Williams
2017-12-16  1:53     ` Rafael J. Wysocki
2017-12-16  1:57       ` Dan Williams
2017-12-16  2:15         ` Rafael J. Wysocki
2017-12-14  2:10 ` [PATCH v3 2/3] hmat: add heterogeneous memory sysfs support Ross Zwisler
2017-12-15  0:52   ` Rafael J. Wysocki
2017-12-15 20:53     ` Ross Zwisler
2017-12-14  2:10 ` [PATCH v3 3/3] hmat: add performance attributes Ross Zwisler
2017-12-14 13:00 ` [PATCH v3 0/3] create sysfs representation of ACPI HMAT Michal Hocko
2017-12-18 20:35   ` Ross Zwisler
2017-12-20 16:41     ` Ross Zwisler
2017-12-21 13:18       ` Michal Hocko
2017-12-20 18:19     ` Matthew Wilcox
2017-12-20 20:22       ` Dave Hansen
2017-12-20 21:16         ` Matthew Wilcox
2017-12-20 21:24           ` Ross Zwisler
2017-12-20 22:29             ` Dan Williams
2017-12-20 22:41               ` Ross Zwisler
2017-12-21 20:31                 ` Brice Goglin
2017-12-22 22:53                   ` Dan Williams
2017-12-22 23:22                     ` Ross Zwisler
2017-12-22 23:57                       ` Dan Williams
2017-12-23  1:14                         ` Rafael J. Wysocki
2017-12-27  9:10                     ` Brice Goglin
2017-12-30  6:58                       ` Matthew Wilcox
2017-12-30  9:19                         ` Brice Goglin
2017-12-20 21:13       ` Ross Zwisler
2017-12-21  1:41         ` Elliott, Robert (Persistent Memory)
2017-12-22 21:46           ` Ross Zwisler
2017-12-21 12:50       ` Michael Ellerman
2017-12-22  3:09 ` Anshuman Khandual
2017-12-22 10:31   ` Kogut, Jaroslaw [this message]
2017-12-22 14:37     ` Anshuman Khandual
2017-12-22 17:13   ` Dave Hansen
2017-12-23  5:14     ` Anshuman Khandual
2017-12-22 22:13   ` Ross Zwisler
2017-12-23  6:56     ` Anshuman Khandual
2017-12-22 22:31   ` Ross Zwisler
2017-12-25  2:05     ` Liubo(OS Lab)
