linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Schofield, Alison" <alison.schofield@intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	linux-cxl@vger.kernel.org,
	Linux ACPI <linux-acpi@vger.kernel.org>
Subject: Re: [PATCH] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
Date: Mon, 18 Oct 2021 11:15:36 -0700	[thread overview]
Message-ID: <CAPcyv4g=gVeJtSAMPH5VTZfDk+eoL0zkgnQMny=T+xX8RyQKjQ@mail.gmail.com> (raw)
In-Reply-To: <20211018102538.00007023@Huawei.com>

On Mon, Oct 18, 2021 at 2:25 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Fri, 15 Oct 2021 11:58:36 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Fri, Oct 15, 2021 at 10:00 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Fri, 8 Oct 2021 18:53:39 -0700
> > > <alison.schofield@intel.com> wrote:
> > >
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > > > subtable may be assigned to a NUMA node. Since there is no
> > > > requirement that the SRAT be comprehensive for CXL memory another
> > > > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > > > in the SRAT.
> > > >
> > > > Use the CXL Fixed Memory Window Structure's (CFMWS) of the ACPI CXL
> > > > Early Discovery Table (CEDT) to find all CXL memory ranges. Create a
> > > > NUMA node for each range that is not already assigned to a NUMA node.
> > > > Add a memblk attaching its host physical address range to the node.
> > > >
> > > > Note that these ranges may not actually map any memory at boot time.
> > > > They may describe persistent capacity or may be present to enable
> > > > hot-plug.
> > > >
> > > > Consumers can use phys_to_target_node() to discover the NUMA node.
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > Hi Alison,
> > >
> > > I'm not sure that a CFMWS entry should map to a single NUMA node...
> > >
> > > Each entry corresponds to a contiguous HPA range into which CXL devices
> > > below a set of ports (if interleaved) or one port should be mapped.
> > >
> > > That could be multiple devices, each with it's own performance characteristics,
> > > or potentially a mix of persistent and volatile memory on a system with limited
> > > qtg groups.
> > >
> > > Maybe it's the best we can do though given information available
> > > before any devices are present.
> > >
> >
> > Regardless of the performance of the individual devices they can only
> > map to one of the available CFMWS entries. So the maximum number of
> > degrees of freedom is one node per CFMWS. Now if you have only one
> > entry to pick from, but have interleave sets with widely different
> > performance characteristics to online it becomes a policy decision
> > about whether to force map those interleave sets into the same node,
> > and that policy can be maintained outside the kernel.
> >
> > The alternative is to rework NUMA nodes to be something that can be
> > declared dynamically as currently there are assumptions throughout the
> > kernel that num_possible_nodes() is statically determined early in
> > boot. I am not seeing strong evidence that complexity needs to be
> > tackled in the near term, and "NUMA-node per CFMWS" should (famous
> > last words) serve CXL needs for the foreseeable future.
>
> I'm less optimistic we won't end up revisiting this in the medium
> term but can tackle that when we have better visibility of what
> people are actually building.

Agree. When we were game planning this patch internally the 2 options
were, build full support for defining new NUMA nodes after boot, or
just extend the boot-time NUMA node possibilities minimally by the
declared degrees of freedom in the CFMWS. The latter path was taken
because it gets us "80%" of what CXL needs without precluding going
the former path later if that remaining "20% proves critical to add
finer grained dynamic support.

      reply	other threads:[~2021-10-18 18:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-09  1:53 [PATCH] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT alison.schofield
2021-10-09  2:00 ` Alison Schofield
2021-10-09  3:56 ` kernel test robot
2021-10-09 13:23 ` kernel test robot
2021-10-11 17:13 ` Ira Weiny
2021-10-11 22:00   ` Alison Schofield
2021-10-14  0:42     ` Dan Williams
2021-10-13 23:18 ` Dan Williams
2021-10-15 16:59 ` Jonathan Cameron
2021-10-15 18:58   ` Dan Williams
2021-10-18  9:25     ` Jonathan Cameron
2021-10-18 18:15       ` Dan Williams [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4g=gVeJtSAMPH5VTZfDk+eoL0zkgnQMny=T+xX8RyQKjQ@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).