linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Anaczkowski, Lukasz" <lukasz.anaczkowski@intel.com>,
	"Box, David E" <david.e.box@intel.com>,
	"Kogut, Jaroslaw" <Jaroslaw.Kogut@intel.com>,
	"Koss, Marcin" <marcin.koss@intel.com>,
	"Koziej, Artur" <artur.koziej@intel.com>,
	"Lahtinen, Joonas" <joonas.lahtinen@intel.com>,
	"Moore, Robert" <robert.moore@intel.com>,
	"Nachimuthu, Murugasamy" <murugasamy.nachimuthu@intel.com>,
	"Odzioba, Lukasz" <lukasz.odzioba@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Schmauss, Erik" <erik.schmauss@intel.com>,
	"Verma, Vishal L" <vishal.l.verma@intel.com>,
	"Zheng, Lv" <lv.zheng@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Balbir Singh <bsingharora@gmail.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>, Len Brown <lenb@kernel.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	devel@acpica.org, Linux ACPI <linux-acpi@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Linux API <linux-api@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
Date: Wed, 20 Dec 2017 15:41:05 -0700	[thread overview]
Message-ID: <20171220224105.GA27258@linux.intel.com> (raw)
In-Reply-To: <CAPcyv4gTknp=0yQnVrrB5Ui+mJE_x-wdkV86UD4hsYnx3CAjfA@mail.gmail.com>

On Wed, Dec 20, 2017 at 02:29:56PM -0800, Dan Williams wrote:
> On Wed, Dec 20, 2017 at 1:24 PM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
> > On Wed, Dec 20, 2017 at 01:16:49PM -0800, Matthew Wilcox wrote:
> >> On Wed, Dec 20, 2017 at 12:22:21PM -0800, Dave Hansen wrote:
> >> > On 12/20/2017 10:19 AM, Matthew Wilcox wrote:
> >> > > I don't know what the right interface is, but my laptop has a set of
> >> > > /sys/devices/system/memory/memoryN/ directories.  Perhaps this is the
> >> > > right place to expose write_bw (etc).
> >> >
> >> > Those directories are already too redundant and wasteful.  I think we'd
> >> > really rather not add to them.  In addition, it's technically possible
> >> > to have a memory section span NUMA nodes and have different performance
> >> > properties, which make it impossible to represent there.
> >> >
> >> > In any case, ACPI PXM's (Proximity Domains) are guaranteed to have
> >> > uniform performance properties in the HMAT, and we just so happen to
> >> > always create one NUMA node per PXM.  So, NUMA nodes really are a good fit.
> >>
> >> I think you're missing my larger point which is that I don't think this
> >> should be exposed to userspace as an ACPI feature.  Because if you do,
> >> then it'll also be exposed to userspace as an openfirmware feature.
> >> And sooner or later a devicetree feature.  And then writing a portable
> >> program becomes an exercise in suffering.
> >>
> >> So, what's the right place in sysfs that isn't tied to ACPI?  A new
> >> directory or set of directories under /sys/devices/system/memory/ ?
> >
> > Oh, the current location isn't at all tied to acpi except that it happens to
> > be named 'hmat'.  When it was all named 'hmem' it was just:
> >
> > /sys/devices/system/hmem
> >
> > Which has no ACPI-isms at all.  I'm happy to move it under
> > /sys/devices/system/memory/hmat if that's helpful, but I think we still have
> > the issue that the data represented therein is still pulled right from the
> > HMAT, and I don't know how to abstract it into something more platform
> > agnostic until I know what data is provided by those other platforms.
> >
> > For example, the HMAT provides latency information and bandwidth information
> > for both reads and writes.  Will the devicetree/openfirmware/etc version have
> > this same info, or will it be just different enough that it won't translate
> > into whatever I choose to stick in sysfs?
> 
> For the initial implementation do we need to have a representation of
> all the performance data? Given that
> /sys/devices/system/node/nodeX/distance is the only generic
> performance attribute published by the kernel today it is already the
> case that applications that need to target specific memories need to
> go parse information that is not provided by the kernel by default.
> The question is can those specialized applications stay special and go
> parse the platform specific data sources, like raw HMAT, directly, or
> do we expect general purpose applications to make use of this data? I
> think a firmware-id to numa-node translation facility
> (/sys/devices/system/node/nodeX/fwid) is a simple start that we can
> build on with more information as specific use cases arise.

We don't represent all the performance data, we only represent the data for
local initiator/target pairs.  I do think that this is useful to have in sysfs
because it provides a way to easily answer the most commonly asked questions
(or at least what I'm guessing will be the most commmonly asked queststions),
i.e. "given a CPU, what are the speeds of the various types of memory attached
to it", and "given a chunk of memory, how fast is it and to which CPU is it
local"?  By providing this base level of information I'm hoping to prevent
most applications from having to parse the HMAT directly.

The question of whether or not to include this local performance information
was one of the main questions of the initial RFC patch series, and I did get
feedback (albiet off-list) that the local performance information was
valuable to at least some users.  I did intentionally structure my (now very
short) set so that the performance information was added as a separate patch,
so we can get to the place you're talking about where we only provide firmware
id <=> proximity domain mappings by just leaving off the last patch in the
series.

I'm personally still of the opinion though that this last patch does add
value.

  reply	other threads:[~2017-12-20 22:41 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-14  2:10 [PATCH v3 0/3] create sysfs representation of ACPI HMAT Ross Zwisler
2017-12-14  2:10 ` [PATCH v3 1/3] acpi: HMAT support in acpi_parse_entries_array() Ross Zwisler
2017-12-15  0:49   ` Rafael J. Wysocki
2017-12-15  1:10   ` Dan Williams
2017-12-16  1:53     ` Rafael J. Wysocki
2017-12-16  1:57       ` Dan Williams
2017-12-16  2:15         ` Rafael J. Wysocki
2017-12-14  2:10 ` [PATCH v3 2/3] hmat: add heterogeneous memory sysfs support Ross Zwisler
2017-12-15  0:52   ` Rafael J. Wysocki
2017-12-15 20:53     ` Ross Zwisler
2017-12-14  2:10 ` [PATCH v3 3/3] hmat: add performance attributes Ross Zwisler
2017-12-14 13:00 ` [PATCH v3 0/3] create sysfs representation of ACPI HMAT Michal Hocko
2017-12-18 20:35   ` Ross Zwisler
2017-12-20 16:41     ` Ross Zwisler
2017-12-21 13:18       ` Michal Hocko
2017-12-20 18:19     ` Matthew Wilcox
2017-12-20 20:22       ` Dave Hansen
2017-12-20 21:16         ` Matthew Wilcox
2017-12-20 21:24           ` Ross Zwisler
2017-12-20 22:29             ` Dan Williams
2017-12-20 22:41               ` Ross Zwisler [this message]
2017-12-21 20:31                 ` Brice Goglin
2017-12-22 22:53                   ` Dan Williams
2017-12-22 23:22                     ` Ross Zwisler
2017-12-22 23:57                       ` Dan Williams
2017-12-23  1:14                         ` Rafael J. Wysocki
2017-12-27  9:10                     ` Brice Goglin
2017-12-30  6:58                       ` Matthew Wilcox
2017-12-30  9:19                         ` Brice Goglin
2017-12-20 21:13       ` Ross Zwisler
2017-12-21  1:41         ` Elliott, Robert (Persistent Memory)
2017-12-22 21:46           ` Ross Zwisler
2017-12-21 12:50       ` Michael Ellerman
2017-12-22  3:09 ` Anshuman Khandual
2017-12-22 10:31   ` Kogut, Jaroslaw
2017-12-22 14:37     ` Anshuman Khandual
2017-12-22 17:13   ` Dave Hansen
2017-12-23  5:14     ` Anshuman Khandual
2017-12-22 22:13   ` Ross Zwisler
2017-12-23  6:56     ` Anshuman Khandual
2017-12-22 22:31   ` Ross Zwisler
2017-12-25  2:05     ` Liubo(OS Lab)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171220224105.GA27258@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=Jaroslaw.Kogut@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=artur.koziej@intel.com \
    --cc=brice.goglin@gmail.com \
    --cc=bsingharora@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david.e.box@intel.com \
    --cc=devel@acpica.org \
    --cc=erik.schmauss@intel.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=joonas.lahtinen@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lukasz.anaczkowski@intel.com \
    --cc=lukasz.odzioba@intel.com \
    --cc=lv.zheng@intel.com \
    --cc=marcin.koss@intel.com \
    --cc=mhocko@kernel.org \
    --cc=murugasamy.nachimuthu@intel.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=robert.moore@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).