All of lore.kernel.org
 help / color / mirror / Atom feed
* [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
@ 2021-04-10  5:34 Williams, Dan J
  2021-04-12 13:32 ` Jonathan Cameron
  2021-04-19 22:55 ` Vikram Sethi
  0 siblings, 2 replies; 9+ messages in thread
From: Williams, Dan J @ 2021-04-10  5:34 UTC (permalink / raw)
  To: linux-cxl, linux-acpi
  Cc: Natu, Mahesh, Jonathan.Cameron, Douglas, Chet R, Verma, Vishal L,
	Widawsky, Ben

Changes since v1 [1]:
  * Rename Generic Target to Generic Port and make a new distinct SRAT
    type independent of Generic Initiator (Jonathan)
  * Clarify that this new "Port" concept is not limited to CXL. It is a
    generic way to describe the performance of static paths to
    dynamically added system memory (Mahesh)
  * Fixes and cleanups (Chet)

[1]: 
http://lore.kernel.org/r/CAPcyv4gmd_cygXK0PpGkXmJLC3_ctEpRvpi5P-QcuXusFX5oNQ@mail.gmail.com

---

While the platform BIOS is able to describe the performance
characteristics of CXL memory that is present at boot, it is unable to
statically enumerate the performance of CXL memory hot inserted
post-boot. The OS can enumerate most of the characteristics from link
registers and CDAT, but the performance from the CPU to the host
bridge, for example, is not enumerated by PCIE or CXL. Introduce an
ACPI mechanism for this purpose.

---

# Title: Introduce a Generic Port for hotplug memory buses like CXL

# Status: Draft v2

# Document: ACPI Specification 6.4

# License
SPDX-License Identifier: CC-BY-4.0

# Submitter:
* Sponsor: Dan Williams, Intel
* Creators/Contributors:
    * Mahesh Natu, Intel
    * Chet Douglas, Intel
    * Deepak Shivakumar, Intel
    * Jonathan Cameron, Huawei

# Changes since v1
  * Rename Generic Target to Generic Port and make a new distinct SRAT
    type independent of Generic Initiator (Jonathan)
  * Clarify that this new "Port" concept is not limited to CXL. It is a
    generic way to describe the performance of static paths to
    dynamically added system memory (Mahesh)

# Summary of the Change
Introduce a new "Generic Port" type to the SRAT to describe the
performance from CPU and other initiator domains to the root of a CXL
topology, or any other topology that might dynamically add system memory
behind the "Port". This is in support of, but not limited to, the OS
being able to enumerate the performance topology for dynamically added /
discovered CXL Memory Device endpoints.

# Benefits of the Change
Consider the case of a system with a set of CXL Host Bridges (ACPI0016),
and some endpoints attached at boot. In that scenario the platform
firmware is able to enumerate those devices, enumerate and map CXL
memory into the system physical memory address space, and generate the
typical static SRAT/SLIT/HMAT set of tables describing CXL attached
memory. Now, consider the case where devices are dynamically added and
enumerated post boot, i.e. post generation of the static memory tables.
In this scenario platform firmware is unable to perform the end-to-end
enumeration necessary to populate SRAT and HMAT for the endpoints that
may be hot-inserted behind those bridges post power-on. The
address-range is unknown so SRAT can not be pre-populated in the typical
way that hotplug system memory is enumerated. Even if a static address
range was set aside for future hotplug the performance is unknown (no
CDAT nor interleave configuration) so HMAT can not be pre-populated.

However, what is known to platform firmware that generates the
SRAT/SLIT/HMAT and is the performance characteristics of the path
between CPU and Generic Initiators to the Generic Port (e.g. CXL Host
Bridge). With the addition of a Generic Port proximity domain to the
SRAT then the SLIT and HMAT can enumerate the platform-static component
of a given edge in the platform-performance topology graph. It enables
the OS to build out a performance mapping for system memory address
ranges dynamically discovered, or provisioned, behind a Generic Port.
The OS mapping takes into account the Generic Port performance (as
either an initiator or a target), the interleave configuration, and the
bus enumerable performance characteristics (link latency, bandwidth,
switch traversals) to supplement the static HMAT data enumerated at
boot.

# Impact of the Change
A new SRAT type requires non-conforming system software to ignore the
new type in the SRAT, ignore any coordinate in the SLIT that includes
the associated port's proximity domain, and ignore any coordinate in the
HMAT that includes the port's proximity domain as either an initiator or
a target.

In contrast, conforming system software need only consult the Generic
Port data to optionally extend the enumeration and distinguish Port
attached initiators and memory targets from the existing set of
enumerated proximity domains.

A conforming implementation also has the option to ignore the Generic Port
contribution to the performance,

in either a row, or col  to be considered by system software that parses
SRAT, SLIT, and HMAT. Given that the OS still needs to dynamically
enumerate and instantiate the memory ranges and initiators behind the
Generic Port. The assumption is that operating systems that do not
support native CXL enumeration will ignore this data in the HMAT, while
CXL native enumeration aware environments will use this fragment of the
performance path to calculate the performance characteristics.

# References
* Compute Express Link Specification v2.0,
<https://www.computeexpresslink.org/>

# Detailed Description of the Change

* Section 5.2.16 System Resource Affinity Table (SRAT) add another
  bullet for Generic Ports:

	* generic ports (e.g. host bridges that can dynamically discover
	  new initiators and instantiate new memory range targets)

* Add new section 5.2.16.7 Generic Port Affinity Structure:
  The Generic Port Affinity Structure provides an association between a
  proximity domain number and a device handle representing a Generic
  Port (e.g. CXL Host Bridge, or similar device that hosts a dynamic
  topology of memory ranges and/or initiators).

  Support of Generic Port Affinity Structures by an OSPM is optional.

* Add a table describing the Generic Port Affinity Structure (Table
  5.xx):


| Field  | Byte Length | Byte Offset | Description                  |
| :----- | :---        | :---        | :--------------------------- |
| Type   | 1           | 0           | 6 Generic Port Structure     |
| Length | 1           | 1           | 32                           |
| Reserved | 1         | 2           | Reserved and must be zero    |
| Device Handle Type | 1 | 3 | Device Handle Type: See 5.2.16.6 Generic Initiator Affinity Structure for the possible device handle types and their format. |
| Proximity Domain | 4 | 4 | The proximity domain to identify the performance of this port in the HMAT. |
| Device Handle | 16   | 8           | Device Handle of the Generic Port, see Table 5.57 and 5.58 for a description of this field. |
| Flags  | 4           | 24          | See table 5.59 for a description of this field. |
| Reserved | 4         | 28          | Reserved and must be zero.   |

* Replace all instances of "Initiator" with "Initiator / Port" in "Table
  5.59 Flags - Generic Initiator Affinity Structure", including the
  table name.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-10  5:34 [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory Williams, Dan J
@ 2021-04-12 13:32 ` Jonathan Cameron
  2021-04-19 22:55 ` Vikram Sethi
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2021-04-12 13:32 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-cxl, linux-acpi, Natu, Mahesh, Douglas, Chet R, Verma,
	Vishal L, Widawsky, Ben

On Sat, 10 Apr 2021 05:34:29 +0000
"Williams, Dan J" <dan.j.williams@intel.com> wrote:

> Changes since v1 [1]:
>   * Rename Generic Target to Generic Port and make a new distinct SRAT
>     type independent of Generic Initiator (Jonathan)
>   * Clarify that this new "Port" concept is not limited to CXL. It is a
>     generic way to describe the performance of static paths to
>     dynamically added system memory (Mahesh)
>   * Fixes and cleanups (Chet)
> 
> [1]: 
> http://lore.kernel.org/r/CAPcyv4gmd_cygXK0PpGkXmJLC3_ctEpRvpi5P-QcuXusFX5oNQ@mail.gmail.com

Hi Dan,

I'm fine with the actual changes.  A few editorial suggestions inline.

Description wise, it rightly focuses on the case of memory on CXL.
I'm a little worried people might miss the fact it enables the other
direction, e.g. GI on CXL accessing system memory. Perhaps that needs
calling out a little more in the intro?

> 
> ---
> 
> While the platform BIOS is able to describe the performance
> characteristics of CXL memory that is present at boot, it is unable to
> statically enumerate the performance of CXL memory hot inserted
> post-boot. The OS can enumerate most of the characteristics from link
> registers and CDAT, but the performance from the CPU to the host
> bridge, for example, is not enumerated by PCIE or CXL. Introduce an
> ACPI mechanism for this purpose.
> 
> ---
> 
> # Title: Introduce a Generic Port for hotplug memory buses like CXL
> 
> # Status: Draft v2
> 
> # Document: ACPI Specification 6.4
> 
> # License
> SPDX-License Identifier: CC-BY-4.0
> 
> # Submitter:
> * Sponsor: Dan Williams, Intel
> * Creators/Contributors:
>     * Mahesh Natu, Intel
>     * Chet Douglas, Intel
>     * Deepak Shivakumar, Intel
>     * Jonathan Cameron, Huawei
> 
> # Changes since v1
>   * Rename Generic Target to Generic Port and make a new distinct SRAT
>     type independent of Generic Initiator (Jonathan)
>   * Clarify that this new "Port" concept is not limited to CXL. It is a
>     generic way to describe the performance of static paths to
>     dynamically added system memory (Mahesh)
> 
> # Summary of the Change
> Introduce a new "Generic Port" type to the SRAT to describe the
> performance from CPU and other initiator domains to the root of a CXL
> topology, or any other topology that might dynamically add system memory
> behind the "Port". This is in support of, but not limited to, the OS
> being able to enumerate the performance topology for dynamically added /
> discovered CXL Memory Device endpoints.
> 
> # Benefits of the Change
> Consider the case of a system with a set of CXL Host Bridges (ACPI0016),
> and some endpoints attached at boot. In that scenario the platform
> firmware is able to enumerate those devices, enumerate and map CXL
> memory into the system physical memory address space, and generate the
> typical static SRAT/SLIT/HMAT set of tables describing CXL attached
> memory. Now, consider the case where devices are dynamically added and
> enumerated post boot, i.e. post generation of the static memory tables.
> In this scenario platform firmware is unable to perform the end-to-end
> enumeration necessary to populate SRAT and HMAT for the endpoints that
> may be hot-inserted behind those bridges post power-on. The
> address-range is unknown so SRAT can not be pre-populated in the typical
> way that hotplug system memory is enumerated. Even if a static address
> range was set aside for future hotplug the performance is unknown (no
> CDAT nor interleave configuration) so HMAT can not be pre-populated.
> 
> However, what is known to platform firmware that generates the
> SRAT/SLIT/HMAT and is the performance characteristics of the path

HMAT is (drop the and)

> between CPU and Generic Initiators to the Generic Port (e.g. CXL Host
> Bridge). With the addition of a Generic Port proximity domain to the

Generic Port Affinity Structure

> SRAT then the SLIT and HMAT can enumerate the platform-static component
> of a given edge in the platform-performance topology graph. It enables
> the OS to build out a performance mapping for system memory address
> ranges dynamically discovered, or provisioned, behind a Generic Port.
> The OS mapping takes into account the Generic Port performance (as
> either an initiator or a target), the interleave configuration, and the
> bus enumerable performance characteristics (link latency, bandwidth,
> switch traversals) to supplement the static HMAT data enumerated at
> boot.
> 
> # Impact of the Change
> A new SRAT type requires non-conforming system software to ignore the
> new type in the SRAT, ignore any coordinate in the SLIT that includes
> the associated port's proximity domain, and ignore any coordinate in the
> HMAT that includes the port's proximity domain as either an initiator or
> a target.

Rereading the following comments after writing them, I think we are probably best
just ignoring the corners it refers to, but have left it here in case it comes up in
discussions.

Initially I wondered if we needed a note here to explain that we don't
need an explicit (i.e. _OSC) way to discover support for Generic Ports
because we don't expect to run into the issue with _PXM results in DSDT
needing to be modified based on that support (as we don't expect _PXM
to ever refer to a proximity domain containing only a Generic Port).

Perhaps that is somewhat of a backwards argument.  Does it
make sense to add the constraint that a proximity domain when it only
contains a Generic Port should not be used in _PXM?
"
6.2.14 _PXM (Proximity)
This optional object is used to describe proximity domain associations
within a machine. _PXM evaluates to an integer that identifies a device
as belonging to a Proximity Domain defined in the System Resource
Affinity Table (SRAT).
"

Now it's already possible to use Affinity structures in SRAT to define
domains that will result in craziness.  E.g. an ITS that is in a domain
on it's own.  So perhaps this isn't the time to make that more explicit.

> 
> In contrast, conforming system software need only consult the Generic
> Port data to optionally extend the enumeration and distinguish Port
> attached initiators and memory targets from the existing set of
> enumerated proximity domains.
> 
> A conforming implementation also has the option to ignore the Generic Port
> contribution to the performance,
> 
> in either a row, or col  to be considered by system software that parses

Odd line break above and sentence doesn't parse.  Editing issue perhaps?

> SRAT, SLIT, and HMAT. Given that the OS still needs to dynamically
> enumerate and instantiate the memory ranges and initiators behind the
> Generic Port.

"Given" suggests something as a result of, or perhaps the intent here
"Given that, the OS..."

> The assumption is that operating systems that do not
> support native CXL enumeration will ignore this data in the HMAT, while
> CXL native enumeration aware environments will use this fragment of the
> performance path to calculate the performance characteristics.
> 
> # References
> * Compute Express Link Specification v2.0,
> <https://www.computeexpresslink.org/>
> 
> # Detailed Description of the Change
> 
> * Section 5.2.16 System Resource Affinity Table (SRAT) add another
>   bullet for Generic Ports:
> 
> 	* generic ports (e.g. host bridges that can dynamically discover
> 	  new initiators and instantiate new memory range targets)

That implies the host bridge is responsible for the work of discovering
the initiators and memory range targets.  Perhaps reword along the lines of

e.g. host bridges beyond which new initiators and memory range targets
may be dynamically discovered.

> 
> * Add new section 5.2.16.7 Generic Port Affinity Structure:
>   The Generic Port Affinity Structure provides an association between a
>   proximity domain number and a device handle representing a Generic
>   Port (e.g. CXL Host Bridge, or similar device that hosts a dynamic
>   topology of memory ranges and/or initiators).
> 
>   Support of Generic Port Affinity Structures by an OSPM is optional.
> 
> * Add a table describing the Generic Port Affinity Structure (Table
>   5.xx):
> 
> 
> | Field  | Byte Length | Byte Offset | Description                  |
> | :----- | :---        | :---        | :--------------------------- |
> | Type   | 1           | 0           | 6 Generic Port Structure     |
> | Length | 1           | 1           | 32                           |
> | Reserved | 1         | 2           | Reserved and must be zero    |
> | Device Handle Type | 1 | 3 | Device Handle Type: See 5.2.16.6 Generic Initiator Affinity Structure for the possible device handle types and their format. |
> | Proximity Domain | 4 | 4 | The proximity domain to identify the performance of this port in the HMAT. |

Perhaps make that vague to allow for SLIT as well?  I'd imagine any sane
OS will rely on HMAT being available but that constraint doesn't exist yet
in the ACPI spec (I wish it did!)

> | Device Handle | 16   | 8           | Device Handle of the Generic Port, see Table 5.57 and 5.58 for a description of this field. |
> | Flags  | 4           | 24          | See table 5.59 for a description of this field. |
> | Reserved | 4         | 28          | Reserved and must be zero.   |
> 
> * Replace all instances of "Initiator" with "Initiator / Port" in "Table
>   5.59 Flags - Generic Initiator Affinity Structure", including the
>   table name.

Looks to me like all the text around HMAT and SLIT is 'broad' enough that we don't
need to modify anything there which definitely makes this a cleaner change than
I expected.

Good stuff,

Jonathan


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-10  5:34 [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory Williams, Dan J
  2021-04-12 13:32 ` Jonathan Cameron
@ 2021-04-19 22:55 ` Vikram Sethi
  2021-04-20  3:19   ` Dan Williams
  1 sibling, 1 reply; 9+ messages in thread
From: Vikram Sethi @ 2021-04-19 22:55 UTC (permalink / raw)
  To: Williams, Dan J, linux-cxl, linux-acpi
  Cc: Natu, Mahesh, Jonathan.Cameron, Douglas, Chet R, Verma, Vishal L,
	Widawsky, Ben, Samer El-Haj-Mahmoud, Thanu Rangarajan

Hi Dan, 
> From: Williams, Dan J <dan.j.williams@intel.com>
> While the platform BIOS is able to describe the performance
> characteristics of CXL memory that is present at boot, it is unable to
> statically enumerate the performance of CXL memory hot inserted
> post-boot. The OS can enumerate most of the characteristics from link
> registers and CDAT, but the performance from the CPU to the host
> bridge, for example, is not enumerated by PCIE or CXL. Introduce an
> ACPI mechanism for this purpose.
> 
> ---
> 
> # Title: Introduce a Generic Port for hotplug memory buses like CXL
> 
> # Status: Draft v2
> 
> # Document: ACPI Specification 6.4
> 
> # License
> SPDX-License Identifier: CC-BY-4.0
> 
> # Submitter:
> * Sponsor: Dan Williams, Intel
> * Creators/Contributors:
>     * Mahesh Natu, Intel
>     * Chet Douglas, Intel
>     * Deepak Shivakumar, Intel
>     * Jonathan Cameron, Huawei
> 
> # Changes since v1
>   * Rename Generic Target to Generic Port and make a new distinct SRAT
>     type independent of Generic Initiator (Jonathan)
>   * Clarify that this new "Port" concept is not limited to CXL. It is a
>     generic way to describe the performance of static paths to
>     dynamically added system memory (Mahesh)
> 
> # Summary of the Change
> Introduce a new "Generic Port" type to the SRAT to describe the
> performance from CPU and other initiator domains to the root of a CXL
> topology, or any other topology that might dynamically add system memory
> behind the "Port". This is in support of, but not limited to, the OS
> being able to enumerate the performance topology for dynamically added /
> discovered CXL Memory Device endpoints.
> 
> # Benefits of the Change
> Consider the case of a system with a set of CXL Host Bridges (ACPI0016),
> and some endpoints attached at boot. In that scenario the platform
> firmware is able to enumerate those devices, enumerate and map CXL
> memory into the system physical memory address space, and generate the
> typical static SRAT/SLIT/HMAT set of tables describing CXL attached
> memory. Now, consider the case where devices are dynamically added and
> enumerated post boot, i.e. post generation of the static memory tables.
> In this scenario platform firmware is unable to perform the end-to-end
> enumeration necessary to populate SRAT and HMAT for the endpoints that
> may be hot-inserted behind those bridges post power-on. The
> address-range is unknown so SRAT can not be pre-populated in the typical
> way that hotplug system memory is enumerated. Even if a static address
> range was set aside for future hotplug the performance is unknown (no
> CDAT nor interleave configuration) so HMAT can not be pre-populated.
> 
> However, what is known to platform firmware that generates the
> SRAT/SLIT/HMAT and is the performance characteristics of the path
> between CPU and Generic Initiators to the Generic Port (e.g. CXL Host
> Bridge). With the addition of a Generic Port proximity domain to the
> SRAT then the SLIT and HMAT can enumerate the platform-static component
> of a given edge in the platform-performance topology graph. It enables
> the OS to build out a performance mapping for system memory address
> ranges dynamically discovered, or provisioned, behind a Generic Port.
> The OS mapping takes into account the Generic Port performance (as
> either an initiator or a target), the interleave configuration, and the
> bus enumerable performance characteristics (link latency, bandwidth,
> switch traversals) to supplement the static HMAT data enumerated at
> boot.
> 
> # Impact of the Change
> A new SRAT type requires non-conforming system software to ignore the
> new type in the SRAT, ignore any coordinate in the SLIT that includes
> the associated port's proximity domain, and ignore any coordinate in the
> HMAT that includes the port's proximity domain as either an initiator or
> a target.
> 
> In contrast, conforming system software need only consult the Generic
> Port data to optionally extend the enumeration and distinguish Port
> attached initiators and memory targets from the existing set of
> enumerated proximity domains.
> 
> A conforming implementation also has the option to ignore the Generic Port
> contribution to the performance,
> 
> in either a row, or col  to be considered by system software that parses
> SRAT, SLIT, and HMAT. Given that the OS still needs to dynamically
> enumerate and instantiate the memory ranges and initiators behind the
> Generic Port. The assumption is that operating systems that do not
> support native CXL enumeration will ignore this data in the HMAT, while
> CXL native enumeration aware environments will use this fragment of the
> performance path to calculate the performance characteristics.
> 
> # References
> * Compute Express Link Specification v2.0,
> <https://www.computeexpresslink.org/>
> 
> # Detailed Description of the Change
> 
> * Section 5.2.16 System Resource Affinity Table (SRAT) add another
>   bullet for Generic Ports:
> 
>         * generic ports (e.g. host bridges that can dynamically discover
>           new initiators and instantiate new memory range targets)
> 
> * Add new section 5.2.16.7 Generic Port Affinity Structure:
>   The Generic Port Affinity Structure provides an association between a
>   proximity domain number and a device handle representing a Generic
>   Port (e.g. CXL Host Bridge, or similar device that hosts a dynamic
>   topology of memory ranges and/or initiators).
> 
>   Support of Generic Port Affinity Structures by an OSPM is optional.
> 
> * Add a table describing the Generic Port Affinity Structure (Table
>   5.xx):
> 
> 
> | Field  | Byte Length | Byte Offset | Description                  |
> | :----- | :---        | :---        | :--------------------------- |
> | Type   | 1           | 0           | 6 Generic Port Structure     |
> | Length | 1           | 1           | 32                           |
> | Reserved | 1         | 2           | Reserved and must be zero    |
> | Device Handle Type | 1 | 3 | Device Handle Type: See 5.2.16.6 Generic Initiator
> Affinity Structure for the possible device handle types and their format. |
> | Proximity Domain | 4 | 4 | The proximity domain to identify the performance of
> this port in the HMAT. |
> | Device Handle | 16   | 8           | Device Handle of the Generic Port, see Table
> 5.57 and 5.58 for a description of this field. |
> | Flags  | 4           | 24          | See table 5.59 for a description of this field. |
> | Reserved | 4         | 28          | Reserved and must be zero.   |
> 
> * Replace all instances of "Initiator" with "Initiator / Port" in "Table
>   5.59 Flags - Generic Initiator Affinity Structure", including the
>   table name.

I wanted to discuss the implications of a CXL host bridge implementation that 
does not set the "Architectural Transactions" bit/flag in the aforementioned 
Flags in Table 5.59.

Since the kernel would be expecting all "System RAM" to have equivalent 
Functional properties, if HDM cannot have all the same functionality, then in
the absence of ISA specific ACPI tables clarifying what architectural feature isn't
supported, the kernel may be forced to not online the HDM memory as system 
RAM. If there is more granular expression of what features are lacking in a ISA
Specific table (eg Memory Tagging/Memory Protection keys not supported),
the kernel could choose to not enable that feature in all of system RAM (if 
discrepancy discovered at boot), but still online the HDM as System RAM.

To that end, it may be useful to clarify this to host vendors by way of an 
Implementation note (ideally in the CXL specification). Something like:
"CXL hosts are encouraged to support all architectural features in HDM 
as they do in CPU attached memory to avoid either the memory from 
being onlined as System RAM, or the architectural feature being disabled. 
Hosts must indicate architectural parity for HDM in ACPI SRAT 
“Generic Port” flags “Architectural transactions” bit by setting it to 1. 
A port that sets this bit to 0 will need ISA specific ways/ACPI tables to 
describe which specific ISA features would not work in HDM, so an OS 
could disable that memory or that feature."

Thoughts?

Thanks,
Vikram

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-19 22:55 ` Vikram Sethi
@ 2021-04-20  3:19   ` Dan Williams
  2021-04-20  4:22     ` Vikram Sethi
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2021-04-20  3:19 UTC (permalink / raw)
  To: Vikram Sethi
  Cc: linux-cxl, linux-acpi, Natu, Mahesh, Jonathan.Cameron, Douglas,
	Chet R, Verma, Vishal L, Widawsky, Ben, Samer El-Haj-Mahmoud,
	Thanu Rangarajan

On Mon, Apr 19, 2021 at 3:56 PM Vikram Sethi <vsethi@nvidia.com> wrote:
[..]
> > * Replace all instances of "Initiator" with "Initiator / Port" in "Table
> >   5.59 Flags - Generic Initiator Affinity Structure", including the
> >   table name.
>
> I wanted to discuss the implications of a CXL host bridge implementation that
> does not set the "Architectural Transactions" bit/flag in the aforementioned
> Flags in Table 5.59.
>
> Since the kernel would be expecting all "System RAM" to have equivalent
> Functional properties, if HDM cannot have all the same functionality, then in
> the absence of ISA specific ACPI tables clarifying what architectural feature isn't
> supported, the kernel may be forced to not online the HDM memory as system
> RAM. If there is more granular expression of what features are lacking in a ISA
> Specific table (eg Memory Tagging/Memory Protection keys not supported),
> the kernel could choose to not enable that feature in all of system RAM (if
> discrepancy discovered at boot), but still online the HDM as System RAM.
>
> To that end, it may be useful to clarify this to host vendors by way of an
> Implementation note (ideally in the CXL specification). Something like:
> "CXL hosts are encouraged to support all architectural features in HDM
> as they do in CPU attached memory to avoid either the memory from
> being onlined as System RAM, or the architectural feature being disabled.
> Hosts must indicate architectural parity for HDM in ACPI SRAT
> “Generic Port” flags “Architectural transactions” bit by setting it to 1.
> A port that sets this bit to 0 will need ISA specific ways/ACPI tables to
> describe which specific ISA features would not work in HDM, so an OS
> could disable that memory or that feature."
>
> Thoughts?

The problem, as you know, is that those features are already defined
without those "ISA specific ways / ACPI tables". I think it simply
must be the case that the only agent in the system that is aware of
the intersection of capabilities between ISA and CXL (platform
firmware) must mitigate the conflict. I.e. so that the CXL
infrastructure need not worry about ISA feature capability and vice
versa. To me, this looks like a platform firmware pre-boot
configuration menu / switch that turns off CXL (declines to publish
ACPI0016 devices) if incompatible ISA feature "X" is enabled, or the
reverse turns off ISA feature "X" if CXL is enabled. In other words,
the conflict needs to be resolved before the OS boots, setting this
bit to 0 is not a viable option for mitigating the conflict because
there is no requirement for the OS to even look at this flag.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-20  3:19   ` Dan Williams
@ 2021-04-20  4:22     ` Vikram Sethi
  2021-04-20  5:08       ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Vikram Sethi @ 2021-04-20  4:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-cxl, linux-acpi, Natu, Mahesh, Jonathan.Cameron, Douglas,
	Chet R, Verma, Vishal L, Widawsky, Ben, Samer El-Haj-Mahmoud,
	Thanu Rangarajan

> From: Dan Williams <dan.j.williams@intel.com>
> On Mon, Apr 19, 2021 at 3:56 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> [..]
> > > * Replace all instances of "Initiator" with "Initiator / Port" in "Table
> > >   5.59 Flags - Generic Initiator Affinity Structure", including the
> > >   table name.
> >
> > I wanted to discuss the implications of a CXL host bridge implementation that
> > does not set the "Architectural Transactions" bit/flag in the aforementioned
> > Flags in Table 5.59.
> >
> > Since the kernel would be expecting all "System RAM" to have equivalent
> > Functional properties, if HDM cannot have all the same functionality, then in
> > the absence of ISA specific ACPI tables clarifying what architectural feature isn't
> > supported, the kernel may be forced to not online the HDM memory as system
> > RAM. If there is more granular expression of what features are lacking in a ISA
> > Specific table (eg Memory Tagging/Memory Protection keys not supported),
> > the kernel could choose to not enable that feature in all of system RAM (if
> > discrepancy discovered at boot), but still online the HDM as System RAM.
> >
> > To that end, it may be useful to clarify this to host vendors by way of an
> > Implementation note (ideally in the CXL specification). Something like:
> > "CXL hosts are encouraged to support all architectural features in HDM
> > as they do in CPU attached memory to avoid either the memory from
> > being onlined as System RAM, or the architectural feature being disabled.
> > Hosts must indicate architectural parity for HDM in ACPI SRAT
> > “Generic Port” flags “Architectural transactions” bit by setting it to 1.
> > A port that sets this bit to 0 will need ISA specific ways/ACPI tables to
> > describe which specific ISA features would not work in HDM, so an OS
> > could disable that memory or that feature."
> >
> > Thoughts?
> 
> The problem, as you know, is that those features are already defined
> without those "ISA specific ways / ACPI tables". I think it simply
> must be the case that the only agent in the system that is aware of
> the intersection of capabilities between ISA and CXL (platform
> firmware) must mitigate the conflict. I.e. so that the CXL
> infrastructure need not worry about ISA feature capability and vice
> versa. To me, this looks like a platform firmware pre-boot
> configuration menu / switch that turns off CXL (declines to publish
> ACPI0016 devices) if incompatible ISA feature "X" is enabled, or the
> reverse turns off ISA feature "X" if CXL is enabled. In other words,
> the conflict needs to be resolved before the OS boots, setting this
> bit to 0 is not a viable option for mitigating the conflict because
> there is no requirement for the OS to even look at this flag.

Leaving it to Firmware is easier for the OS, but could be a couple 
of issues with that:
Platform firmware may not have a way of disabling ISA feature
if it is directly visible to the OS via CPU ID registers, and the
registers can't be trapped to some FW and values adjusted 
on access
Platform Firmware may not know if the OS supports a specific
Feature (code may not exist or not default option etc) so it
may be premature/suboptimal to disable CXL hostbridge 
altogether. Although I suppose a UEFI variable type knob
could be adjusted in this case and take effect on reboot.

Also, for some *future* ISA features where it may be possible and
practical to define ISA feature support discovery per NUMA 
node/address range w/ ACPI (prior to userspace ABI being set),
the platform would want to enable the CXL host bridge and leave
selective enablement of the feature to the OS. Yes, this is messy
and best avoided, but it may make sense depending on ISA
feature and how messy it makes user space. I'm personally
not in favor of this latter option, but I'm told this was discussed
in other Coherent interconnect forums and chosen as a path 
forward.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-20  4:22     ` Vikram Sethi
@ 2021-04-20  5:08       ` Dan Williams
  2021-04-20  9:03         ` Jonathan Cameron
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2021-04-20  5:08 UTC (permalink / raw)
  To: Vikram Sethi
  Cc: linux-cxl, linux-acpi, Natu, Mahesh, Jonathan.Cameron, Douglas,
	Chet R, Verma, Vishal L, Widawsky, Ben, Samer El-Haj-Mahmoud,
	Thanu Rangarajan

On Mon, Apr 19, 2021 at 9:22 PM Vikram Sethi <vsethi@nvidia.com> wrote:
>
> > From: Dan Williams <dan.j.williams@intel.com>
> > On Mon, Apr 19, 2021 at 3:56 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> > [..]
> > > > * Replace all instances of "Initiator" with "Initiator / Port" in "Table
> > > >   5.59 Flags - Generic Initiator Affinity Structure", including the
> > > >   table name.
> > >
> > > I wanted to discuss the implications of a CXL host bridge implementation that
> > > does not set the "Architectural Transactions" bit/flag in the aforementioned
> > > Flags in Table 5.59.
> > >
> > > Since the kernel would be expecting all "System RAM" to have equivalent
> > > Functional properties, if HDM cannot have all the same functionality, then in
> > > the absence of ISA specific ACPI tables clarifying what architectural feature isn't
> > > supported, the kernel may be forced to not online the HDM memory as system
> > > RAM. If there is more granular expression of what features are lacking in a ISA
> > > Specific table (eg Memory Tagging/Memory Protection keys not supported),
> > > the kernel could choose to not enable that feature in all of system RAM (if
> > > discrepancy discovered at boot), but still online the HDM as System RAM.
> > >
> > > To that end, it may be useful to clarify this to host vendors by way of an
> > > Implementation note (ideally in the CXL specification). Something like:
> > > "CXL hosts are encouraged to support all architectural features in HDM
> > > as they do in CPU attached memory to avoid either the memory from
> > > being onlined as System RAM, or the architectural feature being disabled.
> > > Hosts must indicate architectural parity for HDM in ACPI SRAT
> > > “Generic Port” flags “Architectural transactions” bit by setting it to 1.
> > > A port that sets this bit to 0 will need ISA specific ways/ACPI tables to
> > > describe which specific ISA features would not work in HDM, so an OS
> > > could disable that memory or that feature."
> > >
> > > Thoughts?
> >
> > The problem, as you know, is that those features are already defined
> > without those "ISA specific ways / ACPI tables". I think it simply
> > must be the case that the only agent in the system that is aware of
> > the intersection of capabilities between ISA and CXL (platform
> > firmware) must mitigate the conflict. I.e. so that the CXL
> > infrastructure need not worry about ISA feature capability and vice
> > versa. To me, this looks like a platform firmware pre-boot
> > configuration menu / switch that turns off CXL (declines to publish
> > ACPI0016 devices) if incompatible ISA feature "X" is enabled, or the
> > reverse turns off ISA feature "X" if CXL is enabled. In other words,
> > the conflict needs to be resolved before the OS boots, setting this
> > bit to 0 is not a viable option for mitigating the conflict because
> > there is no requirement for the OS to even look at this flag.
>
> Leaving it to Firmware is easier for the OS, but could be a couple
> of issues with that:
> Platform firmware may not have a way of disabling ISA feature
> if it is directly visible to the OS via CPU ID registers, and the
> registers can't be trapped to some FW and values adjusted
> on access
> Platform Firmware may not know if the OS supports a specific
> Feature (code may not exist or not default option etc) so it
> may be premature/suboptimal to disable CXL hostbridge
> altogether. Although I suppose a UEFI variable type knob
> could be adjusted in this case and take effect on reboot.
>
> Also, for some *future* ISA features where it may be possible and
> practical to define ISA feature support discovery per NUMA
> node/address range w/ ACPI (prior to userspace ABI being set),
> the platform would want to enable the CXL host bridge and leave
> selective enablement of the feature to the OS. Yes, this is messy
> and best avoided, but it may make sense depending on ISA
> feature and how messy it makes user space. I'm personally
> not in favor of this latter option, but I'm told this was discussed
> in other Coherent interconnect forums and chosen as a path
> forward.

I think it's reasonable for new stuff to define _OSC or other opt-in
requirements to allow the OS to manage ISA vs CXL conflict policy. For
existing conflicts the only reliable mechanism is decline to publish
ACPI0016 if platform firmware can enumerate an ISA feature that it is
not supported on CXL. So I think the proposal here is a recommendation
for platform firmware implementations that they are responsible for
this conflict resolution unless / until other mechanisms arrive.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-20  5:08       ` Dan Williams
@ 2021-04-20  9:03         ` Jonathan Cameron
  2021-04-20 16:53           ` Vikram Sethi
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Cameron @ 2021-04-20  9:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vikram Sethi, linux-cxl, linux-acpi, Natu, Mahesh, Douglas,
	Chet R, Verma, Vishal L, Widawsky, Ben, Samer El-Haj-Mahmoud,
	Thanu Rangarajan

On Mon, 19 Apr 2021 22:08:39 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Mon, Apr 19, 2021 at 9:22 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> >  
> > > From: Dan Williams <dan.j.williams@intel.com>
> > > On Mon, Apr 19, 2021 at 3:56 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> > > [..]  
> > > > > * Replace all instances of "Initiator" with "Initiator / Port" in "Table
> > > > >   5.59 Flags - Generic Initiator Affinity Structure", including the
> > > > >   table name.  
> > > >
> > > > I wanted to discuss the implications of a CXL host bridge implementation that
> > > > does not set the "Architectural Transactions" bit/flag in the aforementioned
> > > > Flags in Table 5.59.
> > > >
> > > > Since the kernel would be expecting all "System RAM" to have equivalent
> > > > Functional properties, if HDM cannot have all the same functionality, then in
> > > > the absence of ISA specific ACPI tables clarifying what architectural feature isn't
> > > > supported, the kernel may be forced to not online the HDM memory as system
> > > > RAM. If there is more granular expression of what features are lacking in a ISA
> > > > Specific table (eg Memory Tagging/Memory Protection keys not supported),
> > > > the kernel could choose to not enable that feature in all of system RAM (if
> > > > discrepancy discovered at boot), but still online the HDM as System RAM.
> > > >
> > > > To that end, it may be useful to clarify this to host vendors by way of an
> > > > Implementation note (ideally in the CXL specification). Something like:
> > > > "CXL hosts are encouraged to support all architectural features in HDM
> > > > as they do in CPU attached memory to avoid either the memory from
> > > > being onlined as System RAM, or the architectural feature being disabled.
> > > > Hosts must indicate architectural parity for HDM in ACPI SRAT
> > > > “Generic Port” flags “Architectural transactions” bit by setting it to 1.
> > > > A port that sets this bit to 0 will need ISA specific ways/ACPI tables to
> > > > describe which specific ISA features would not work in HDM, so an OS
> > > > could disable that memory or that feature."
> > > >
> > > > Thoughts?  
> > >
> > > The problem, as you know, is that those features are already defined
> > > without those "ISA specific ways / ACPI tables". I think it simply
> > > must be the case that the only agent in the system that is aware of
> > > the intersection of capabilities between ISA and CXL (platform
> > > firmware) must mitigate the conflict. I.e. so that the CXL
> > > infrastructure need not worry about ISA feature capability and vice
> > > versa. To me, this looks like a platform firmware pre-boot
> > > configuration menu / switch that turns off CXL (declines to publish
> > > ACPI0016 devices) if incompatible ISA feature "X" is enabled, or the
> > > reverse turns off ISA feature "X" if CXL is enabled. In other words,
> > > the conflict needs to be resolved before the OS boots, setting this
> > > bit to 0 is not a viable option for mitigating the conflict because
> > > there is no requirement for the OS to even look at this flag.  
> >
> > Leaving it to Firmware is easier for the OS, but could be a couple
> > of issues with that:
> > Platform firmware may not have a way of disabling ISA feature
> > if it is directly visible to the OS via CPU ID registers, and the
> > registers can't be trapped to some FW and values adjusted
> > on access
> > Platform Firmware may not know if the OS supports a specific
> > Feature (code may not exist or not default option etc) so it
> > may be premature/suboptimal to disable CXL hostbridge
> > altogether. Although I suppose a UEFI variable type knob
> > could be adjusted in this case and take effect on reboot.
> >
> > Also, for some *future* ISA features where it may be possible and
> > practical to define ISA feature support discovery per NUMA
> > node/address range w/ ACPI (prior to userspace ABI being set),
> > the platform would want to enable the CXL host bridge and leave
> > selective enablement of the feature to the OS. Yes, this is messy
> > and best avoided, but it may make sense depending on ISA
> > feature and how messy it makes user space. I'm personally
> > not in favor of this latter option, but I'm told this was discussed
> > in other Coherent interconnect forums and chosen as a path
> > forward.  
> 
> I think it's reasonable for new stuff to define _OSC or other opt-in
> requirements to allow the OS to manage ISA vs CXL conflict policy. For
> existing conflicts the only reliable mechanism is decline to publish
> ACPI0016 if platform firmware can enumerate an ISA feature that it is
> not supported on CXL. So I think the proposal here is a recommendation
> for platform firmware implementations that they are responsible for
> this conflict resolution unless / until other mechanisms arrive.

Agreed with one addition.  It should be possible to retrofit negotiation
for existing features as well. Default policy should be that it's firmware's
problem but if the OS uses _OSC to negotiate something else then it may
be possible to be more flexible. As long as the default is safe, relaxing
that can happen once mechanisms are defined.  The actual decision on
whether to enable ACPI0016 can for example be pushed into AML code.




^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-20  9:03         ` Jonathan Cameron
@ 2021-04-20 16:53           ` Vikram Sethi
  2021-04-20 17:30             ` Jonathan Cameron
  0 siblings, 1 reply; 9+ messages in thread
From: Vikram Sethi @ 2021-04-20 16:53 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: linux-cxl, linux-acpi, Natu, Mahesh, Douglas, Chet R, Verma,
	Vishal L, Widawsky, Ben, Samer El-Haj-Mahmoud, Thanu Rangarajan,
	Lorenzo Pieralisi, Mark Hairgrove



> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Sent: Tuesday, April 20, 2021 4:04 AM
> To: Dan Williams <dan.j.williams@intel.com>
> Cc: Vikram Sethi <vsethi@nvidia.com>; linux-cxl@vger.kernel.org; linux-
> acpi@vger.kernel.org; Natu, Mahesh <mahesh.natu@intel.com>; Douglas, Chet R
> <chet.r.douglas@intel.com>; Verma, Vishal L <vishal.l.verma@intel.com>;
> Widawsky, Ben <ben.widawsky@intel.com>; Samer El-Haj-Mahmoud <Samer.El-
> Haj-Mahmoud@arm.com>; Thanu Rangarajan <Thanu.Rangarajan@arm.com>
> Subject: Re: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug
> memory
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, 19 Apr 2021 22:08:39 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > On Mon, Apr 19, 2021 at 9:22 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> > >
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > > On Mon, Apr 19, 2021 at 3:56 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> > > > [..]
> > > > > > * Replace all instances of "Initiator" with "Initiator / Port" in "Table
> > > > > >   5.59 Flags - Generic Initiator Affinity Structure", including the
> > > > > >   table name.
> > > > >
> > > > > I wanted to discuss the implications of a CXL host bridge implementation
> that
> > > > > does not set the "Architectural Transactions" bit/flag in the aforementioned
> > > > > Flags in Table 5.59.
> > > > >
> > > > > Since the kernel would be expecting all "System RAM" to have equivalent
> > > > > Functional properties, if HDM cannot have all the same functionality, then
> in
> > > > > the absence of ISA specific ACPI tables clarifying what architectural feature
> isn't
> > > > > supported, the kernel may be forced to not online the HDM memory as
> system
> > > > > RAM. If there is more granular expression of what features are lacking in a
> ISA
> > > > > Specific table (eg Memory Tagging/Memory Protection keys not
> supported),
> > > > > the kernel could choose to not enable that feature in all of system RAM (if
> > > > > discrepancy discovered at boot), but still online the HDM as System RAM.
> > > > >
> > > > > To that end, it may be useful to clarify this to host vendors by way of an
> > > > > Implementation note (ideally in the CXL specification). Something like:
> > > > > "CXL hosts are encouraged to support all architectural features in HDM
> > > > > as they do in CPU attached memory to avoid either the memory from
> > > > > being onlined as System RAM, or the architectural feature being disabled.
> > > > > Hosts must indicate architectural parity for HDM in ACPI SRAT
> > > > > “Generic Port” flags “Architectural transactions” bit by setting it to 1.
> > > > > A port that sets this bit to 0 will need ISA specific ways/ACPI tables to
> > > > > describe which specific ISA features would not work in HDM, so an OS
> > > > > could disable that memory or that feature."
> > > > >
> > > > > Thoughts?
> > > >
> > > > The problem, as you know, is that those features are already defined
> > > > without those "ISA specific ways / ACPI tables". I think it simply
> > > > must be the case that the only agent in the system that is aware of
> > > > the intersection of capabilities between ISA and CXL (platform
> > > > firmware) must mitigate the conflict. I.e. so that the CXL
> > > > infrastructure need not worry about ISA feature capability and vice
> > > > versa. To me, this looks like a platform firmware pre-boot
> > > > configuration menu / switch that turns off CXL (declines to publish
> > > > ACPI0016 devices) if incompatible ISA feature "X" is enabled, or the
> > > > reverse turns off ISA feature "X" if CXL is enabled. In other words,
> > > > the conflict needs to be resolved before the OS boots, setting this
> > > > bit to 0 is not a viable option for mitigating the conflict because
> > > > there is no requirement for the OS to even look at this flag.
> > >
> > > Leaving it to Firmware is easier for the OS, but could be a couple
> > > of issues with that:
> > > Platform firmware may not have a way of disabling ISA feature
> > > if it is directly visible to the OS via CPU ID registers, and the
> > > registers can't be trapped to some FW and values adjusted
> > > on access
> > > Platform Firmware may not know if the OS supports a specific
> > > Feature (code may not exist or not default option etc) so it
> > > may be premature/suboptimal to disable CXL hostbridge
> > > altogether. Although I suppose a UEFI variable type knob
> > > could be adjusted in this case and take effect on reboot.
> > >
> > > Also, for some *future* ISA features where it may be possible and
> > > practical to define ISA feature support discovery per NUMA
> > > node/address range w/ ACPI (prior to userspace ABI being set),
> > > the platform would want to enable the CXL host bridge and leave
> > > selective enablement of the feature to the OS. Yes, this is messy
> > > and best avoided, but it may make sense depending on ISA
> > > feature and how messy it makes user space. I'm personally
> > > not in favor of this latter option, but I'm told this was discussed
> > > in other Coherent interconnect forums and chosen as a path
> > > forward.
> >
> > I think it's reasonable for new stuff to define _OSC or other opt-in
> > requirements to allow the OS to manage ISA vs CXL conflict policy. For
> > existing conflicts the only reliable mechanism is decline to publish
> > ACPI0016 if platform firmware can enumerate an ISA feature that it is
> > not supported on CXL. So I think the proposal here is a recommendation
> > for platform firmware implementations that they are responsible for
> > this conflict resolution unless / until other mechanisms arrive.
> 
> Agreed with one addition.  It should be possible to retrofit negotiation
> for existing features as well. Default policy should be that it's firmware's
> problem but if the OS uses _OSC to negotiate something else then it may
> be possible to be more flexible. As long as the default is safe, relaxing
> that can happen once mechanisms are defined.  The actual decision on
> whether to enable ACPI0016 can for example be pushed into AML code.
> 

Relaxations could happen only when ABI to userspace isn't already set
and in use though, no?

What about Coherent device memory in type 2 devices? That is not EFI
conventional memory but once some part has been onlined by a driver by
calling something like add_memory_driver_managed, that part should 
have all the properties of System RAM as well. Not clear that Platform
firmware would know if a driver intents to online some or all of that
memory. 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory
  2021-04-20 16:53           ` Vikram Sethi
@ 2021-04-20 17:30             ` Jonathan Cameron
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2021-04-20 17:30 UTC (permalink / raw)
  To: Vikram Sethi
  Cc: Dan Williams, linux-cxl, linux-acpi, Natu, Mahesh, Douglas,
	Chet R, Verma, Vishal L, Widawsky, Ben, Samer El-Haj-Mahmoud,
	Thanu Rangarajan, Lorenzo Pieralisi, Mark Hairgrove

On Tue, 20 Apr 2021 16:53:36 +0000
Vikram Sethi <vsethi@nvidia.com> wrote:

> > -----Original Message-----
> > From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> > Sent: Tuesday, April 20, 2021 4:04 AM
> > To: Dan Williams <dan.j.williams@intel.com>
> > Cc: Vikram Sethi <vsethi@nvidia.com>; linux-cxl@vger.kernel.org; linux-
> > acpi@vger.kernel.org; Natu, Mahesh <mahesh.natu@intel.com>; Douglas, Chet R
> > <chet.r.douglas@intel.com>; Verma, Vishal L <vishal.l.verma@intel.com>;
> > Widawsky, Ben <ben.widawsky@intel.com>; Samer El-Haj-Mahmoud <Samer.El-
> > Haj-Mahmoud@arm.com>; Thanu Rangarajan <Thanu.Rangarajan@arm.com>
> > Subject: Re: [ACPI Code First ECN v2]: Generic Port, performace data for hotplug
> > memory
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Mon, 19 Apr 2021 22:08:39 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > On Mon, Apr 19, 2021 at 9:22 PM Vikram Sethi <vsethi@nvidia.com> wrote:  
> > > >  
> > > > > From: Dan Williams <dan.j.williams@intel.com>
> > > > > On Mon, Apr 19, 2021 at 3:56 PM Vikram Sethi <vsethi@nvidia.com> wrote:
> > > > > [..]  
> > > > > > > * Replace all instances of "Initiator" with "Initiator / Port" in "Table
> > > > > > >   5.59 Flags - Generic Initiator Affinity Structure", including the
> > > > > > >   table name.  
> > > > > >
> > > > > > I wanted to discuss the implications of a CXL host bridge implementation  
> > that  
> > > > > > does not set the "Architectural Transactions" bit/flag in the aforementioned
> > > > > > Flags in Table 5.59.
> > > > > >
> > > > > > Since the kernel would be expecting all "System RAM" to have equivalent
> > > > > > Functional properties, if HDM cannot have all the same functionality, then  
> > in  
> > > > > > the absence of ISA specific ACPI tables clarifying what architectural feature  
> > isn't  
> > > > > > supported, the kernel may be forced to not online the HDM memory as  
> > system  
> > > > > > RAM. If there is more granular expression of what features are lacking in a  
> > ISA  
> > > > > > Specific table (eg Memory Tagging/Memory Protection keys not  
> > supported),  
> > > > > > the kernel could choose to not enable that feature in all of system RAM (if
> > > > > > discrepancy discovered at boot), but still online the HDM as System RAM.
> > > > > >
> > > > > > To that end, it may be useful to clarify this to host vendors by way of an
> > > > > > Implementation note (ideally in the CXL specification). Something like:
> > > > > > "CXL hosts are encouraged to support all architectural features in HDM
> > > > > > as they do in CPU attached memory to avoid either the memory from
> > > > > > being onlined as System RAM, or the architectural feature being disabled.
> > > > > > Hosts must indicate architectural parity for HDM in ACPI SRAT
> > > > > > “Generic Port” flags “Architectural transactions” bit by setting it to 1.
> > > > > > A port that sets this bit to 0 will need ISA specific ways/ACPI tables to
> > > > > > describe which specific ISA features would not work in HDM, so an OS
> > > > > > could disable that memory or that feature."
> > > > > >
> > > > > > Thoughts?  
> > > > >
> > > > > The problem, as you know, is that those features are already defined
> > > > > without those "ISA specific ways / ACPI tables". I think it simply
> > > > > must be the case that the only agent in the system that is aware of
> > > > > the intersection of capabilities between ISA and CXL (platform
> > > > > firmware) must mitigate the conflict. I.e. so that the CXL
> > > > > infrastructure need not worry about ISA feature capability and vice
> > > > > versa. To me, this looks like a platform firmware pre-boot
> > > > > configuration menu / switch that turns off CXL (declines to publish
> > > > > ACPI0016 devices) if incompatible ISA feature "X" is enabled, or the
> > > > > reverse turns off ISA feature "X" if CXL is enabled. In other words,
> > > > > the conflict needs to be resolved before the OS boots, setting this
> > > > > bit to 0 is not a viable option for mitigating the conflict because
> > > > > there is no requirement for the OS to even look at this flag.  
> > > >
> > > > Leaving it to Firmware is easier for the OS, but could be a couple
> > > > of issues with that:
> > > > Platform firmware may not have a way of disabling ISA feature
> > > > if it is directly visible to the OS via CPU ID registers, and the
> > > > registers can't be trapped to some FW and values adjusted
> > > > on access
> > > > Platform Firmware may not know if the OS supports a specific
> > > > Feature (code may not exist or not default option etc) so it
> > > > may be premature/suboptimal to disable CXL hostbridge
> > > > altogether. Although I suppose a UEFI variable type knob
> > > > could be adjusted in this case and take effect on reboot.
> > > >
> > > > Also, for some *future* ISA features where it may be possible and
> > > > practical to define ISA feature support discovery per NUMA
> > > > node/address range w/ ACPI (prior to userspace ABI being set),
> > > > the platform would want to enable the CXL host bridge and leave
> > > > selective enablement of the feature to the OS. Yes, this is messy
> > > > and best avoided, but it may make sense depending on ISA
> > > > feature and how messy it makes user space. I'm personally
> > > > not in favor of this latter option, but I'm told this was discussed
> > > > in other Coherent interconnect forums and chosen as a path
> > > > forward.  
> > >
> > > I think it's reasonable for new stuff to define _OSC or other opt-in
> > > requirements to allow the OS to manage ISA vs CXL conflict policy. For
> > > existing conflicts the only reliable mechanism is decline to publish
> > > ACPI0016 if platform firmware can enumerate an ISA feature that it is
> > > not supported on CXL. So I think the proposal here is a recommendation
> > > for platform firmware implementations that they are responsible for
> > > this conflict resolution unless / until other mechanisms arrive.  
> > 
> > Agreed with one addition.  It should be possible to retrofit negotiation
> > for existing features as well. Default policy should be that it's firmware's
> > problem but if the OS uses _OSC to negotiate something else then it may
> > be possible to be more flexible. As long as the default is safe, relaxing
> > that can happen once mechanisms are defined.  The actual decision on
> > whether to enable ACPI0016 can for example be pushed into AML code.
> >   
> 
> Relaxations could happen only when ABI to userspace isn't already set
> and in use though, no?

Indeed, though here we are talking about relaxing from:
- Firmware prevents the enabling of a feature (ISA feature or CXL) to...
- OS negotiates with the firmware to say it knows more (via some new method)
  and hence does want the feature enabled.

So userspace ABI doesn't change, things simply become available that were
previously not on this particular system (because firmware had to hide them)

> 
> What about Coherent device memory in type 2 devices? That is not EFI
> conventional memory but once some part has been onlined by a driver by
> calling something like add_memory_driver_managed, that part should 
> have all the properties of System RAM as well. Not clear that Platform
> firmware would know if a driver intents to online some or all of that
> memory. 

Type 2 devices are a pain to talk about in any remotely general fashion,
but it's certainly possible that their drivers might do something
different after negotiating with the firmware for permission to do so.
As you say though, firmware can't know if a driver intends to online that
memory, so as the potential is there, the safe option is not expose that
device at all (I think we'll fairly quickly develop a way to make this work
given alternative is turn lots of things off :)

My guess is it would be system wide.  So taking example of Memory tagging:
If the platform doesn't support that feature for all CXL attached memory
found (or potentially hotplugged), today it has to turn it off, or
refuse to expose connected CXL devices (it sets ACPI0016 _STA to say it's
not there or something like that).

If later, we get some new mechanism that allows the OS to handle this
mismatch, it can use _OSC or similar to tell the firmware it can handle
this case.  The firmware can then report the ACPI0016 as present.

The AML stuff is similar to the _OSC related _PXM manipulation that let
us safely introduce Generic Initiators - there we rely on the _OSC being
called before and _PXM calls are made and have the _PXM results change
dependent on result of that _OSC negotiation of capabilities.

Jonathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-04-20 17:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-10  5:34 [ACPI Code First ECN v2]: Generic Port, performace data for hotplug memory Williams, Dan J
2021-04-12 13:32 ` Jonathan Cameron
2021-04-19 22:55 ` Vikram Sethi
2021-04-20  3:19   ` Dan Williams
2021-04-20  4:22     ` Vikram Sethi
2021-04-20  5:08       ` Dan Williams
2021-04-20  9:03         ` Jonathan Cameron
2021-04-20 16:53           ` Vikram Sethi
2021-04-20 17:30             ` Jonathan Cameron

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.