From: Dan Williams <dan.j.williams@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-cxl@vger.kernel.org,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	"Natu, Mahesh" <mahesh.natu@intel.com>,
	Chet R Douglas <chet.r.douglas@intel.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	Vishal L Verma <vishal.l.verma@intel.com>
Subject: Re: [RFC] ACPI Code First ECR: Generic Target
Date: Tue, 16 Feb 2021 08:29:01 -0800
Message-ID: <CAPcyv4iv9kFLU7U9=VpYJZOiahUWJAZ_J_ZWCrGy1Lgqq+07kg@mail.gmail.com>
In-Reply-To: <20210216110643.000071f0@Huawei.com>

On Tue, Feb 16, 2021 at 3:08 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
[..]
> > Why does GI need anything more than acpi_map_pxm_to_node() to have a
> > node number assigned?
>
> It might have been possible (with limitations) to do it by making multiple
> proximity domains map to a single numa node, along with some additional
> functionality to allow it to retrieve the real node for aware drivers,
> but seeing as we already had the memoryless node infrastructure in place,
> it fitted more naturally into that scheme.  The introduction of GI to the
> ACPI spec, and indeed to the kernel, was originally driven by the needs of
> CCIX (before CXL was public) with CCIX's symmetric view of initiators
> (CPU or other) + a few other existing situations where we'd been
> papering over the topology for years and paying a cost in custom
> load balancing in drivers etc. That more symmetric view meant that the
> natural approach was to treat these as memoryless nodes.
>
> The full handling of nodes is needed to deal with situations like
> the following contrived setup. With a few interconnect
> links I haven't bothered drawing, there are existing systems where
> a portion of the topology looks like this:
>
>
>     RAM                              RAM             RAM
>      |                                |               |
>  --------        ---------        --------        --------
> | a      |      | b       |      | c      |      | d      |
> |   CPUs |------|  PCI RC |------| CPUs   |------|  CPUs  |
> |        |      |         |      |        |      |        |
>  --------        ---------        --------        --------
>                      |
>                   PCI EP
>
> We need the GI representation to allow an "aware" driver to understand
> that the PCI EP is equidistant from the CPUs and RAM on (a) and (c)
> (and that using allocations from (d) is a bad idea).  This would be
> the same as a driver running on a PCI RC attached to a memoryless
> CPU node (you would hope no one would build one of those, but I've seen
> them occasionally).  Such an aware driver carefully places both memory
> and processing threads / interrupts etc to balance the load.

That's an explanation for why GI exists, not an explanation for why a
GI needs anything more than a translation to a Linux NUMA node number
and an API to look up distance.
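
As a rough illustration of "a node number plus a distance lookup", here
is a user-space toy model of the (a)/(b)/(c)/(d) topology quoted above.
It is only a sketch: the toy_* names are hypothetical, and the
SLIT-style values (10 for local, +10 per hop along the drawn links) are
assumptions, not taken from any real firmware table or kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node ids for the quoted diagram; (b) is the memoryless GI node. */
enum { NODE_A, NODE_B, NODE_C, NODE_D, NR_NODES };

/* Assumed SLIT-style matrix: 10 is local, each drawn hop adds 10. */
static const int toy_distance[NR_NODES][NR_NODES] = {
	/*          a   b   c   d */
	/* a */ { 10, 20, 30, 40 },
	/* b */ { 20, 10, 20, 30 },
	/* c */ { 30, 20, 10, 20 },
	/* d */ { 40, 30, 20, 10 },
};

/* Which toy nodes have RAM attached. */
static const int toy_has_memory[NR_NODES] = { 1, 0, 1, 1 };

/* Distance lookup between two toy node numbers. */
static int toy_node_distance(int from, int to)
{
	assert(from >= 0 && from < NR_NODES);
	assert(to >= 0 && to < NR_NODES);
	return toy_distance[from][to];
}

/*
 * Fill out[] with every memory-bearing node at the minimum distance
 * from 'from' and return how many were found.
 */
static size_t toy_nearest_memory_nodes(int from, int out[NR_NODES])
{
	int best = -1;
	size_t n = 0;

	for (int node = 0; node < NR_NODES; node++) {
		if (!toy_has_memory[node])
			continue;
		int d = toy_node_distance(from, node);
		if (best < 0 || d < best) {
			best = d;	/* new minimum: restart the list */
			n = 0;
		}
		if (d == best)
			out[n++] = node;
	}
	return n;
}
```

Calling toy_nearest_memory_nodes(NODE_B, out) in this model yields (a)
and (c) as equally near, with (d) excluded, using nothing beyond the
node number and the distance table.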

>
> In pre-GI days, we could just drop (b) into (a) or (c) and not worry about
> it, but that comes with a large performance cost (20% plus on network throughput
> on some of our more crazy systems, due to it appearing that balancing
> memory load across (a) and (c) doesn't make sense).  Also, if we happened
> to drop it into (c) then once we run out of space on (c) we'll start
> using (d) which is a bad idea.
>
> With GI nodes, you need an unaware PCI driver to work well, and such
> drivers will use allocations linked to the particular NUMA node they are in.
> The kernel needs to know a reasonable place to shunt them to and in
> more complex topologies the zone list may not correspond to that of
> any other node.

The kernel "needs"? No, it doesn't. Look at the "target_node" handling
for PMEM. Those nodes start offline, their distance can still be
determined, and a node only comes online when it becomes memory.

The only case where I can see GI needing anything more than the
equivalent of "target_node" is when the scheduler can submit jobs to GI
initiators as it does to a CPU. Otherwise, GI is just a seed for a node
number plus NUMA distance.
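
The "seed for a node number" idea can be sketched as a toy user-space
model, loosely in the spirit of the "target_node" behaviour described
above. Everything here is an assumption made for illustration: the
toy_* helpers are hypothetical, this is not the kernel's
acpi_map_pxm_to_node() or the PMEM code, and the distance formula
simply pretends the nodes sit on a line.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PXM 8

/* 0 means "unmapped"; otherwise the stored value is node id + 1. */
static int toy_pxm_map[TOY_MAX_PXM];
static bool toy_online[TOY_MAX_PXM];
static int toy_next_node;

/* Assign (or look up) a node id for a proximity domain. No onlining. */
static int toy_map_pxm_to_node(int pxm)
{
	assert(pxm >= 0 && pxm < TOY_MAX_PXM);
	if (!toy_pxm_map[pxm])
		toy_pxm_map[pxm] = ++toy_next_node;
	return toy_pxm_map[pxm] - 1;
}

/* Assumed distance: 10 for local, +10 per step between node ids. */
static int toy_distance(int a, int b)
{
	int hops = a > b ? a - b : b - a;

	return 10 + 10 * hops;
}

/* In this model a node only goes online once memory is added to it. */
static void toy_add_memory(int node)
{
	assert(node >= 0 && node < TOY_MAX_PXM);
	toy_online[node] = true;
}

static bool toy_node_online(int node)
{
	assert(node >= 0 && node < TOY_MAX_PXM);
	return toy_online[node];
}
```

In this model a GI-only proximity domain gets a stable node id and a
usable distance while staying offline; only toy_add_memory() flips it
online, and no zone list or other per-node metadata is involved.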

>   In a CCIX world for example, a GI can sit between
> a pair of Home Agents with memory, and the host on the other side of
> them.  We had a lot of fun working through these cases back when drawing
> up the ACPI changes to support them. :)
>

Yes, I can imagine several interesting ACPI cases, but I am still
struggling to justify the GI zone list metadata.
