From: Dan Williams <dan.j.williams@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: Vikram Sethi <vsethi@nvidia.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"Natu, Mahesh" <mahesh.natu@intel.com>,
	"Rudoff, Andy" <andy.rudoff@intel.com>,
	Jeff Smith <JSMITH@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	"jglisse@redhat.com" <jglisse@redhat.com>,
	Linux MM <linux-mm@kvack.org>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>
Subject: Re: Onlining CXL Type2 device coherent memory
Date: Sat, 31 Oct 2020 09:51:23 -0700	[thread overview]
Message-ID: <CAPcyv4jX1tedjuU-vCSKgvhQeNFukyq9d0ddmsk7jAjWMX+iBQ@mail.gmail.com> (raw)
In-Reply-To: <451b2571-c3e8-97d8-bfd0-f8054a1b75c5@redhat.com>

On Sat, Oct 31, 2020 at 3:21 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 30.10.20 21:37, Dan Williams wrote:
> > On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> >>
> >> Hello,
> >>
> >> I wanted to kick off a discussion on how Linux onlining of coherent memory,
> >> aka Host-managed Device Memory (HDM), will work for CXL [1] type 2 devices
> >> that are available/plugged in at boot. A type 2 CXL device can simply be
> >> thought of as an accelerator with coherent device memory that also has a
> >> CXL.cache to cache system memory.
> >>
> >> One could envision that BIOS/UEFI could expose the HDM in the EFI memory map
> >> as conventional memory, as well as in ACPI SRAT/SLIT/HMAT. However, at least
> >> on some architectures (arm64), EFI conventional memory that is present at
> >> kernel boot cannot be offlined later, so this may not be suitable on all
> >> architectures.
> >
> > That seems an odd restriction. Adding David, linux-mm, and linux-acpi, as
> > they might be interested in / have comments on this restriction as well.
> >
>
> I am missing some important details.
>
> a) What happens after offlining? Will the memory be remove_memory()'ed?
> Will the device get physically unplugged?
>
> b) What's the general purpose of the memory and its intended usage when
> *not* exposed as system RAM? What's the main point of treating it like
> ordinary system RAM by default?
>
> Also, can you be sure that you can offline that memory? If it's
> ZONE_NORMAL (as is usual for all system RAM in the initial map), there
> are no such guarantees, especially once the system has run for long
> enough, but also in other cases (e.g., shuffling), or if allocation
> policies change in the future.
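
For reference, offlining is attempted per memory block via sysfs, and it
simply fails (typically with -EBUSY) when the block contains unmovable
allocations; the block number below is illustrative:

  echo offline > /sys/devices/system/memory/memory42/state
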
>
> So I *guess* you would already have to use kernel cmdline hacks like
> "movablecore" to make it work. In that case, you can directly specify
> what you *actually* want (which I am not yet sure I completely
> understand) - e.g., something like "memmap=16G!16G" ... or something
> similar.
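
As a hedged illustration of what such cmdline directives look like (the
sizes and offsets are purely illustrative):

  movablecore=16G   keep 16G of boot memory in ZONE_MOVABLE, so it
                    remains offlineable
  memmap=16G!16G    carve 16G at offset 16G out of the map as a
                    persistent-memory-like range for a driver to claim
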
>
> I consider offlining+removing *boot* memory when the goal is not to
> physically unplug it (e.g., a DIMM actually getting removed) an abuse of
> the memory hotunplug infrastructure. It's a different thing when memory
> is added manually, as dax_kmem does via add_memory_driver_managed().
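
As a point of reference, a minimal sketch of that dax_kmem-style flow
from a driver's perspective; the node id, range, and driver name are
illustrative, and the four-argument signature shown is the v5.8-era one
(newer kernels add an mhp_t flags argument):

  #include <linux/memory_hotplug.h>
  #include <linux/types.h>

  /*
   * Hedged sketch: hot-add an accelerator's coherent memory range as
   * driver-managed "System RAM", modeled on what dax_kmem does. The
   * nid/start/size values and the "hdm_example" name are illustrative.
   */
  static int hdm_example_add_memory(int nid, u64 start, u64 size)
  {
          /*
           * The resource name must have the form "System RAM ($DRIVER)".
           * The range is flagged IORESOURCE_SYSRAM_DRIVER_MANAGED so
           * that, e.g., kexec_file_load() does not treat it as ordinary
           * boot memory.
           */
          return add_memory_driver_managed(nid, start, size,
                                           "System RAM (hdm_example)");
  }

A driver that wants a private carve-out would simply pass only the
remainder of the HDM range here.
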
>
>
> Now, back to your original question: arm64 does not support physically
> unplugging DIMMs that were part of the initial map. If you'd reboot
> after unplugging a DIMM, your system would crash. We achieve that by
> disallowing the offlining of boot memory - we could also try to handle it in
> ACPI code. But again, most uses of offlining+removing boot memory are
> abusing the memory hotunplug infrastructure and should rather be solved
> cleanly via a different mechanism (firmware, kernel cmdline, ...).
>
> Just recently discussed in
>
> https://lkml.kernel.org/r/de8388df2fbc5a6a33aab95831ba7db4@codeaurora.org
>
> >> Further, the device driver associated with the type 2 device/accelerator may
> >> want to save off a chunk of HDM for driver-private use.
> >> So it seems the more appropriate model may be something like the dev-dax
> >> model, where the device driver probe/open calls add_memory_driver_managed(),
> >> and the driver could choose how much of the HDM it wants to reserve and how
> >> much to make generally available for application mmap/malloc.
> >
> > Sure, it can always be driver managed. The trick will be getting the
> > platform firmware to agree to not map it by default, but I suspect
> > you'll have a hard time convincing platform-firmware to take that
> > stance. The BIOS does not know, and should not care, what OS is booting
> > when it produces the memory map. So I think CXL memory unplug after
> > the fact is more realistic than trying to get the BIOS not to map it.
> > So, to me it looks like arm64 needs to reconsider its unplug stance.
>
> My personal opinion is, if memory isn't just "ordinary system RAM", then
> let the system know early that memory is special (as we do with
> soft-reserved).
>
> Ideally, you could configure the firmware (e.g., via BIOS setup) on what
> to do; that's the cleanest solution, but I can understand that's rather
> hard to achieve.

Yes, my hope, which is about the most influence I can have on
platform-firmware implementations, is that firmware marks CXL-attached
memory as soft-reserved by default and allows OS policy to decide where
it goes. Barring that, for the configuration that Vikram mentioned, the
only other way to get this differentiated / not-ordinary system-ram
back to being driver managed would be to unplug it. The soft-reserved
path is cleaner.
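
For illustration, that soft-reserved flow can already be emulated on x86
and then driven from userspace, assuming a kernel built with the
soft-reserve / device-dax (hmem) support; the sizes, offsets, and device
name below are illustrative:

  efi_fake_mem=16G@16G:0x40000    boot option tagging the range with the
                                  EFI_MEMORY_SP ("soft reserved") attribute

  daxctl reconfigure-device --mode=system-ram dax0.0

The tagged range shows up as a "Soft Reserved" resource claimed by
device-dax, and the daxctl command then hands it to dax_kmem, which
onlines it as system RAM via add_memory_driver_managed().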

