linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Keith Busch <keith.busch@intel.com>,
	Vishal L Verma <vishal.l.verma@intel.com>,
	X86 ML <x86@kernel.org>, Linux MM <linux-mm@kvack.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>, <linuxarm@huawei.com>
Subject: Re: [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device
Date: Fri, 5 Apr 2019 18:39:45 +0100	[thread overview]
Message-ID: <20190405183945.0000155f@huawei.com> (raw)
In-Reply-To: <CAPcyv4hxBFcJKbVVgNiE4UYXZS4XY9hfE8W9mN+VrcWS9AvJLw@mail.gmail.com>

On Fri, 5 Apr 2019 09:56:22 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Fri, Apr 5, 2019 at 9:24 AM Jonathan Cameron
> <jonathan.cameron@huawei.com> wrote:
> >
> > On Fri, 5 Apr 2019 08:43:03 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >  
> > > On Fri, Apr 5, 2019 at 4:19 AM Jonathan Cameron
> > > <jonathan.cameron@huawei.com> wrote:  
> > > >
> > > > On Thu, 4 Apr 2019 12:08:49 -0700
> > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > >  
> > > > > Memory that has been tagged EFI_SPECIAL_PURPOSE, and has performance
> > > > > properties described by the ACPI HMAT is expected to have an application
> > > > > specific consumer.
> > > > >
> > > > > Those consumers may want 100% of the memory capacity to be reserved from
> > > > > any usage by the kernel. By default, with this enabling, a platform
> > > > > device is created to represent this differentiated resource.
> > > > >
> > > > > A follow on change arranges for device-dax to claim these devices by
> > > > > default and provide an mmap interface for the target application.
> > > > > However, if the administrator prefers that some or all of the special
> > > > > purpose memory is made available to the core-mm the device-dax hotplug
> > > > > facility can be used to online the memory with its own numa node.
> > > > >
> > > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > > > Cc: Len Brown <lenb@kernel.org>
> > > > > Cc: Keith Busch <keith.busch@intel.com>
> > > > > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> > > >
> > > > Hi Dan,
> > > >
> > > > Great to see you getting this discussion going so fast and in
> > > > general the approach makes sense to me.
> > > >
> > > > I'm a little confused why HMAT has anything to do with this.
> > > > SPM is defined either via the attribute in SRAT SPA entries,
> > > > EF_MEMORY_SP or via the EFI memory map.
> > > >
> > > > Whether it is in HMAT or not isn't all that relevant.
> > > > Back in the days of the reservation hint (so before yesterday :)
> > > > it was relevant obviously but that's no longer true.
> > > >
> > > > So what am I missing?  
> > >
> > > It's a good question, and an assumption I should have explicitly
> > > declared in the changelog. The problem with EFI_MEMORY_SP is the same
> > > as the problem with the EfiPersistentMemory type, it isn't precise
> > > enough on its own for the kernel to delineate 'type' or
> > > device/replaceable-unit boundaries. For example, I expect one
> > > EFI_MEMORY_SP range of a specific type may be contiguous with another
> > > range of a different type. Similar to the NFIT there is no requirement
> > > in the specification that platform firmware inject multiple range
> > > entries. Instead that precision is left to the SRAT + HMAT, or the
> > > NFIT in the case of PMEM.  
> >
> > Absolutely, as long as they are all SPM, they could be anywhere in
> > the system.
> >  
> > >
> > > Conversely, and thinking through this a bit more, if a memory range is
> > > "special", but the platform fails to enumerate it in HMAT I think
> > > Linux should scream loudly that the firmware is broken and leave the
> > > range alone. The "scream loudly" piece is missing in the current set,
> > > but the "leave the range alone" functionality is included.  
> >
> > I am certainly keen on screaming if the various entries are inconsistent
> > but am not sure they necessarily are here.
> >
> > So there are a couple of ways we could get an SPM range defined.
> > The key thing here is that firmware should be attempting to describe
> > what it has to some degree somewhere.  If not it won't get a good
> > result ;)  So if there is no SRAT then you are on your own. SCREAM!
> >
> > 1. Directly in the memory map.  If there is no other information then
> >    tough luck the kernel can only sensibly handle it as one device.
> >    Or not at all, which seems like a reasonable decision to me.
> >    SCREAM
> >
> > 2. In memory map + a proximity domain entry in SRAT.  Given memory
> >    with different characteristics should be in different proximity
> >    domains anyway - this should be fairly precise. The slight snag
> >    here is that the fine grained nature of SRAT is actually a side
> >    effect of HMAT, so not sure well platforms have traditional
> >    describe their more subtle differences.
> >
> > 3. In NFIT as NFIT SPA carries the memory attribute.  Not sure if
> >    we should scream if this disagrees with the memory map.
> >
> > 4. In HMAT?  Now this changed in ACPI 6.3 to clean up the 'messy'
> >    prior relationship between it and SRAT.  Now HMAT no longer has
> >    memory address ranges as you observed.  That means, to describe
> >    properties of memory, it has to use the proximity domains of
> >    SRAT.  It provides lots of additional info about those domains
> >    but it is SRAT that defines them.
> >
> > So I would argue that HMAT itself doesn't tell us anything useful.
> > SRAT certainly does though so I think this should be coming from
> > SRAT (or NFIT as that also defines the required precision)  
> 
> I agree, yes, SRAT by itself is sufficient for this "precision"
> concern. However, do we, core Linux developers, really want to
> encourage platform vendors that they can ignore deploying HMAT data
> and get Linux to honor that sub-case for EFI_MEMORY_SP? My personal
> experience is that platform firmware will take advantage of almost any
> opportunity to minimize the data it provides to the OS. The only hard
> lever Linux has to encourage platform firmware to give complete data
> is to decline to support configurations that have incomplete data.
> 

If we decide as a community that this is the way we want to go, I'm
happy to politely point it out to our firmware people (who are a more
proactive group on detailed system descriptions than many!)

If we make this a clearly stated policy, perhaps via some comments
in the code or Documentation/ that that would be even better
and avoid people taking the 'but you could support my firmware'
line in the future.

I'll see if I can reach out to other OS vendors as well so we
can present a unified front on this (perhaps after a few days, just
in case we have any dissenting voices here!)

Thanks,

Jonathan


  reply	other threads:[~2019-04-05 17:40 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-04 19:08 [RFC PATCH 0/5] EFI Special Purpose Memory Support Dan Williams
2019-04-04 19:08 ` [RFC PATCH 1/5] efi: Detect UEFI 2.8 Special Purpose Memory Dan Williams
2019-04-06  4:21   ` Ard Biesheuvel
2019-04-09 16:43     ` Dan Williams
2019-04-09 17:21       ` Ard Biesheuvel
2019-04-10  2:10         ` Dan Williams
2019-04-12 20:43           ` Ard Biesheuvel
2019-04-12 21:18             ` Dan Williams
2019-04-15 11:43       ` Enrico Weigelt, metux IT consult
2019-04-04 19:08 ` [RFC PATCH 2/5] lib/memregion: Uplevel the pmem "region" ida to a global allocator Dan Williams
2019-04-04 19:32   ` Matthew Wilcox
2019-04-04 21:02     ` Dan Williams
2019-04-04 19:08 ` [RFC PATCH 3/5] acpi/hmat: Track target address ranges Dan Williams
2019-04-04 20:58   ` Keith Busch
2019-04-04 20:58     ` Dan Williams
2019-04-04 19:08 ` [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device Dan Williams
2019-04-05 11:18   ` Jonathan Cameron
2019-04-05 15:43     ` Dan Williams
2019-04-05 16:23       ` Jonathan Cameron
2019-04-05 16:56         ` Dan Williams
2019-04-05 17:39           ` Jonathan Cameron [this message]
2019-04-09 12:13   ` Christoph Hellwig
2019-04-09 14:49     ` Dan Williams
2019-04-04 19:08 ` [RFC PATCH 5/5] device-dax: Add a driver for "hmem" devices Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190405183945.0000155f@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=dan.j.williams@intel.com \
    --cc=keith.busch@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linuxarm@huawei.com \
    --cc=rjw@rjwysocki.net \
    --cc=vishal.l.verma@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).