linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Yang Shi <yang.shi@linux.alibaba.com>,
	Michal Hocko <mhocko@suse.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Fengguang Wu <fengguang.wu@intel.com>,
	"Du, Fan" <fan.du@intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node
Date: Mon, 25 Mar 2019 12:29:48 -0700	[thread overview]
Message-ID: <CAPcyv4gNrFOQJhKUV7crZqNfg8LQFZRVO04Z+Fo50kzswVQ=TA@mail.gmail.com> (raw)
In-Reply-To: <3df2bf0e-0b1d-d299-3b8e-51c306cdc559@inria.fr>

On Mon, Mar 25, 2019 at 10:45 AM Brice Goglin <Brice.Goglin@inria.fr> wrote:
>
> Le 25/03/2019 à 17:56, Dan Williams a écrit :
> >
> > I'm generally against the concept that a "pmem" or "type" flag should
> > indicate anything about the expected performance of the address range.
> > The kernel should explicitly look to the HMAT for performance data and
> > not otherwise make type-based performance assumptions.
>
>
> Oh sorry, I didn't mean to have the kernel use such a flag to decide of
> placement, but rather to expose more information to userspace to clarify
> what all these nodes are about when userspace will decide where to
> allocate things.

I understand, but I'm concerned about the risk of userspace developing
vendor-specific, or generation-specific policies around a coarse type
identifier. I think the lack of type specificity is a feature rather
than a gap, because it requires userspace to consider deeper
information.

Perhaps "path" might be a suitable replacement identifier rather than
type. I.e. memory that originates from an ACPI.NFIT root device is
likely "pmem".

> I understand that current NVDIMM-F are not slower than DDR and HMAT
> would better describe this than a flag. But I have seen so many buggy or
> dummy SLIT tables in the past that I wonder if we can expect HMAT to be
> widely available (and correct).

That's always a fear that the platform BIOS will try to game OS
behavior. However, that was the reason that HMAT was defined to
indicate actual performance values rather than relative. It is
hopefully harder to game than the relative SLIT values, but I'l  grant
you it's now impossible.

> Is there a safe fallback in case of missing or buggy HMAT? For instance,
> is DDR supposed to be listed before NVDIMM (or HBM) in SRAT?

One fallback might be to make some of these sysfs attributes writable
so userspace can correct the situation, but I'm otherwise unclear of
what you mean by "safe". If a platform has hard dependencies on
correctly enumerating memory performance capabilities then there's not
much the kernel can do if the HMAT is botched. I would expect the
general case is that the performance capabilities are a soft
dependency. but things still work if the data is wrong.

  reply	other threads:[~2019-03-25 19:30 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-23  4:44 [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node Yang Shi
2019-03-23  4:44 ` [PATCH 01/10] mm: control memory placement by nodemask for two tier main memory Yang Shi
2019-03-23 17:21   ` Dan Williams
2019-03-25 19:28     ` Yang Shi
2019-03-25 23:18       ` Dan Williams
2019-03-25 23:36         ` Yang Shi
2019-03-25 23:42           ` Dan Williams
2019-03-23  4:44 ` [PATCH 02/10] mm: mempolicy: introduce MPOL_HYBRID policy Yang Shi
2019-03-23  4:44 ` [PATCH 03/10] mm: mempolicy: promote page to DRAM for MPOL_HYBRID Yang Shi
2019-03-23  4:44 ` [PATCH 04/10] mm: numa: promote pages to DRAM when it is accessed twice Yang Shi
2019-03-29  0:31   ` kbuild test robot
2019-03-23  4:44 ` [PATCH 05/10] mm: page_alloc: make find_next_best_node could skip DRAM node Yang Shi
2019-03-23  4:44 ` [PATCH 06/10] mm: vmscan: demote anon DRAM pages to PMEM node Yang Shi
2019-03-23  6:03   ` Zi Yan
2019-03-25 21:49     ` Yang Shi
2019-03-24 22:20   ` Keith Busch
2019-03-25 19:49     ` Yang Shi
2019-03-27  0:35       ` Keith Busch
2019-03-27  3:41         ` Yang Shi
2019-03-27 13:08           ` Keith Busch
2019-03-27 17:00             ` Zi Yan
2019-03-27 17:05               ` Dave Hansen
2019-03-27 17:48                 ` Zi Yan
2019-03-27 18:00                   ` Dave Hansen
2019-03-27 20:37                     ` Zi Yan
2019-03-27 20:42                       ` Dave Hansen
2019-03-28 21:59             ` Yang Shi
2019-03-28 22:45               ` Keith Busch
2019-03-23  4:44 ` [PATCH 07/10] mm: vmscan: add page demotion counter Yang Shi
2019-03-23  4:44 ` [PATCH 08/10] mm: numa: add page promotion counter Yang Shi
2019-03-23  4:44 ` [PATCH 09/10] doc: add description for MPOL_HYBRID mode Yang Shi
2019-03-23  4:44 ` [PATCH 10/10] doc: elaborate the PMEM allocation rule Yang Shi
2019-03-25 16:15 ` [RFC PATCH 0/10] Another Approach to Use PMEM as NUMA Node Brice Goglin
2019-03-25 16:56   ` Dan Williams
2019-03-25 17:45     ` Brice Goglin
2019-03-25 19:29       ` Dan Williams [this message]
2019-03-25 23:09         ` Brice Goglin
2019-03-25 23:37           ` Dan Williams
2019-03-26 12:19             ` Jonathan Cameron
2019-03-25 20:04   ` Yang Shi
2019-03-26 13:58 ` Michal Hocko
2019-03-26 18:33   ` Yang Shi
2019-03-26 18:37     ` Michal Hocko
2019-03-27  2:58       ` Yang Shi
2019-03-27  9:01         ` Michal Hocko
2019-03-27 17:34           ` Dan Williams
2019-03-27 18:59             ` Yang Shi
2019-03-27 20:09               ` Michal Hocko
2019-03-28  2:09                 ` Yang Shi
2019-03-28  6:58                   ` Michal Hocko
2019-03-28 18:58                     ` Yang Shi
2019-03-28 19:12                       ` Michal Hocko
2019-03-28 19:40                         ` Yang Shi
2019-03-28 20:40                           ` Michal Hocko
2019-03-28  8:21                   ` Dan Williams
2019-03-27 20:14               ` Dave Hansen
2019-03-27 20:35             ` Matthew Wilcox
2019-03-27 20:40               ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4gNrFOQJhKUV7crZqNfg8LQFZRVO04Z+Fo50kzswVQ=TA@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=Brice.Goglin@inria.fr \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=fan.du@intel.com \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=riel@surriel.com \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).