From: Dan Williams <dan.j.williams@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Schofield, Alison" <alison.schofield@intel.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Ben Widawsky <ben.widawsky@intel.com>,
linux-cxl@vger.kernel.org,
Linux ACPI <linux-acpi@vger.kernel.org>
Subject: Re: [PATCH] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
Date: Mon, 18 Oct 2021 11:15:36 -0700
Message-ID: <CAPcyv4g=gVeJtSAMPH5VTZfDk+eoL0zkgnQMny=T+xX8RyQKjQ@mail.gmail.com>
In-Reply-To: <20211018102538.00007023@Huawei.com>
On Mon, Oct 18, 2021 at 2:25 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Fri, 15 Oct 2021 11:58:36 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Fri, Oct 15, 2021 at 10:00 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Fri, 8 Oct 2021 18:53:39 -0700
> > > <alison.schofield@intel.com> wrote:
> > >
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > > > subtable may be assigned to a NUMA node. Since there is no
> > > > requirement that the SRAT be comprehensive for CXL memory, another
> > > > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > > > in the SRAT.
> > > >
> > > > Use the CXL Fixed Memory Window Structures (CFMWS) of the ACPI CXL
> > > > Early Discovery Table (CEDT) to find all CXL memory ranges. Create a
> > > > NUMA node for each range that is not already assigned to a NUMA node.
> > > > Add a memblk attaching its host physical address range to the node.
> > > >
> > > > Note that these ranges may not actually map any memory at boot time.
> > > > They may describe persistent capacity or may be present to enable
> > > > hot-plug.
> > > >
> > > > Consumers can use phys_to_target_node() to discover the NUMA node.
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > Hi Alison,
> > >
> > > I'm not sure that a CFMWS entry should map to a single NUMA node...
> > >
> > > Each entry corresponds to a contiguous HPA range into which CXL devices
> > > below a set of ports (if interleaved) or one port should be mapped.
> > >
> > > That could be multiple devices, each with its own performance characteristics,
> > > or potentially a mix of persistent and volatile memory on a system with a limited
> > > number of QoS Throttling Groups (QTGs).
> > >
> > > Maybe it's the best we can do though given information available
> > > before any devices are present.
> > >
> >
> > Regardless of the performance of the individual devices, they can only
> > map to one of the available CFMWS entries. So the maximum number of
> > degrees of freedom is one node per CFMWS. Now, if you have only one
> > entry to pick from but need to online interleave sets with widely
> > different performance characteristics, it becomes a policy decision
> > whether to force-map those interleave sets into the same node, and
> > that policy can be maintained outside the kernel.
> >
> > The alternative is to rework NUMA nodes to be something that can be
> > declared dynamically, since there are currently assumptions throughout
> > the kernel that num_possible_nodes() is statically determined early in
> > boot. I am not seeing strong evidence that this complexity needs to be
> > tackled in the near term, and "NUMA node per CFMWS" should (famous
> > last words) serve CXL needs for the foreseeable future.
>
> I'm less optimistic we won't end up revisiting this in the medium
> term but can tackle that when we have better visibility of what
> people are actually building.

Agreed. When we were game-planning this patch internally, the two options
were: build full support for defining new NUMA nodes after boot, or
just extend the boot-time NUMA node possibilities minimally by the
degrees of freedom declared in the CFMWS. The latter path was taken
because it gets us "80%" of what CXL needs without precluding the
former path later, if that remaining "20%" proves critical enough to
justify finer-grained dynamic support.