* [PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
@ 2022-10-07 15:21 ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

Whilst I have carried on Huai-Cheng Kuo's series version numbering and
naming, there have been very substantial changes since v6, so I would
suggest a fresh review makes sense for anyone who has looked at this before.
In particular, if the Avery Design folks could check I haven't broken
anything, that would be great.

For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbrowy@avery-design.com/

Summary of changes:
1) Linux headers definitions for DOE are now upstream so drop that patch.
2) Add CDAT for switch upstream port.
3) Generate 'plausible' default CDAT tables when a file is not provided.
4) General refactoring to calculate the correct table sizes and allocate
   based on that rather than copying from a local static array.
5) Changes from earlier reviews such as matching QEMU type naming style.
6) Moved compliance and SPDM use cases to future patch sets.

Sign-offs on these are complex because the patches were originally developed
by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan
Cameron who made substantial changes.

Huai-Cheng Kuo / Chris Browy, please confirm you are still happy to maintain this
code as per the original MAINTAINERS entry.

What's here?

This series brings generic PCI Express Data Object Exchange (DOE) support.
DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI
config space, exposed via a PCIe Extended Capability Structure.
The PCIe spec defines several protocols (including one to discover what
protocols a given DOE instance supports) and other specifications such as
CXL define additional protocols using their own vendor IDs.
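
To make the vendor ID / protocol pairing concrete, here is a minimal
standalone sketch of how a DOE data object header can be encoded and how a
discovery request might be built. It follows the header layout used in
patch 1 (vendor ID in bits 15:0 and data object type in bits 23:16 of the
first DWORD, an 18-bit length in the second), but it does not use any QEMU
headers, and the names doe_header1(), doe_obj_len_dw() and
doe_build_discovery_req() are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOE_VENDOR_ID_PCI_SIG  0x0001u     /* value used in patch 1 */
#define DOE_TYPE_DISCOVERY     0x00u       /* value used in patch 1 */
#define DOE_DW_SIZE_MAX        (1u << 18)  /* largest object, in DWORDs */
#define DOE_DISCOVERY_REQ_DW   3u          /* 2 header DWORDs + 1 payload */

/* First header DWORD: vendor ID in bits 15:0, object type in bits 23:16. */
static uint32_t doe_header1(uint16_t vendor_id, uint8_t type)
{
    return ((uint32_t)type << 16) | vendor_id;
}

/*
 * Decode the length from the second header DWORD. Only the low 18 bits
 * are valid, and a value of 0 encodes the maximum of 2^18 DWORDs.
 */
static uint32_t doe_obj_len_dw(uint32_t header2)
{
    uint32_t len = header2 & (DOE_DW_SIZE_MAX - 1);

    return len ? len : DOE_DW_SIZE_MAX;
}

/* Fill a 3-DWORD discovery request asking about the protocol at 'index'. */
static void doe_build_discovery_req(uint32_t *req, uint8_t index)
{
    assert(req != NULL);
    req[0] = doe_header1(DOE_VENDOR_ID_PCI_SIG, DOE_TYPE_DISCOVERY);
    req[1] = DOE_DISCOVERY_REQ_DW;  /* length in DWORDs, header included */
    req[2] = index;                 /* index lives in the low byte */
}
```

A guest would write these DWORDs one at a time to the write mailbox
register and then set GO in the control register.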

In this series we make use of DOE to support the CXL spec defined
Table Access Protocol, specifically to provide access to CDAT - a
table defined in a specification hosted by the UEFI Forum that
provides runtime discoverability of the sort of information
that would otherwise be available in firmware tables (memory types,
latency and bandwidth information etc).

The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
The version merged did not support interrupts (earlier versions did,
so that support in the emulation was tested a while back).

This series provides CDAT emulation for CXL switch upstream ports
and CXL type 3 memory devices. Note that additional Linux kernel patches
are needed to exercise the switch support:
https://lore.kernel.org/linux-cxl/20220503153449.4088-1-Jonathan.Cameron@huawei.com/
(I'll post a new version of that support shortly)

Additional protocols will be supported by follow-on patch sets:
* CXL compliance protocol.
* CMA / SPDM device attestation.
(Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will refresh
that tree next week)

Huai-Cheng Kuo (3):
  hw/pci: PCIe Data Object Exchange emulation
  hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
  hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange

Jonathan Cameron (2):
  hw/mem/cxl-type3: Add MSIX support
  hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE

 MAINTAINERS                    |   7 +
 hw/cxl/cxl-cdat.c              | 222 ++++++++++++++++++++
 hw/cxl/meson.build             |   1 +
 hw/mem/cxl_type3.c             | 236 +++++++++++++++++++++
 hw/pci-bridge/cxl_upstream.c   | 182 +++++++++++++++-
 hw/pci/meson.build             |   1 +
 hw/pci/pcie_doe.c              | 367 +++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_cdat.h      | 166 +++++++++++++++
 include/hw/cxl/cxl_component.h |   7 +
 include/hw/cxl/cxl_device.h    |   3 +
 include/hw/cxl/cxl_pci.h       |   1 +
 include/hw/pci/pci_ids.h       |   3 +
 include/hw/pci/pcie.h          |   1 +
 include/hw/pci/pcie_doe.h      | 123 +++++++++++
 include/hw/pci/pcie_regs.h     |   4 +
 15 files changed, 1323 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/cxl-cdat.c
 create mode 100644 hw/pci/pcie_doe.c
 create mode 100644 include/hw/cxl/cxl_cdat.h
 create mode 100644 include/hw/pci/pcie_doe.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v7 1/5] hw/pci: PCIe Data Object Exchange emulation
  2022-10-07 15:21 ` Jonathan Cameron via
@ 2022-10-07 15:21   ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

Emulation of PCIe Data Object Exchange (DOE)
PCIe Base Specification r6.0, section 6.30 Data Object Exchange

Supports multiple DOE PCIe Extended Capabilities for a single PCIe
device. For each capability, a static array of DOEProtocol should be passed
to pcie_doe_init(), terminated by an entry whose vendor_id is 0. The
protocols in that array will be registered under the DOE capability
structure. Each protocol entry provides a vendor ID, a data object type,
and a callback function (handle_request()) that defines how a DOE request
for that protocol is handled.
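
As a rough illustration of the table-plus-callback arrangement described
above, the following standalone sketch models a zero-terminated protocol
array and the lookup that matches the first header DWORD of a request
against it. The struct, the handler and the vendor ID / type values are
hypothetical stand-ins; they are not the QEMU DOEProtocol type or any real
protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a protocol entry; not the QEMU type. */
typedef bool (*doe_handler_fn)(const uint32_t *req, uint32_t len_dw);

struct doe_protocol {
    uint16_t vendor_id;      /* 0 terminates the table */
    uint8_t data_obj_type;
    doe_handler_fn handle_request;
};

/* Example handler: accept any request holding at least the 2 header DWORDs. */
static bool example_handler(const uint32_t *req, uint32_t len_dw)
{
    (void)req;
    return len_dw >= 2;
}

/* 0xabcd / type 2 are made-up values chosen only for this sketch. */
static const struct doe_protocol example_protocols[] = {
    { 0xabcd, 2, example_handler },
    { 0, 0, NULL },          /* terminator */
};

/* Count entries up to, but not including, the terminator. */
static size_t doe_count_protocols(const struct doe_protocol *p)
{
    size_t n = 0;

    assert(p != NULL);
    while (p[n].vendor_id) {
        n++;
    }
    return n;
}

/* Return the handler whose vendor ID and type match header DWORD 1. */
static doe_handler_fn doe_find_handler(const struct doe_protocol *p,
                                       uint32_t header1)
{
    assert(p != NULL);
    for (; p->vendor_id; p++) {
        uint32_t key = ((uint32_t)p->data_obj_type << 16) | p->vendor_id;

        if (key == header1) {
            return p->handle_request;
        }
    }
    return NULL;             /* unknown protocol: request is discarded */
}
```

Terminating on vendor_id 0 keeps the registration call free of a separate
count argument, at the cost of 0 not being usable as a vendor ID.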

pcie_doe_{read/write}_config() must be called from the corresponding PCI
device's config_read/write() handler to enable DOE access.
pcie_doe_read_config() returns false if the pci_config_read() offset is
not within the DOE capability range. pcie_doe_write_config() has no effect
if the address is not within the related DOE PCIe extended capability.
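
The chaining described above can be sketched with a small standalone model:
the device's config read first offers the access to the DOE helper and only
falls back to the default path when the helper reports that the address is
outside the DOE registers. The range check mirrors the one in this patch
(the 4-byte extended capability header is excluded), but struct toy_dev and
the toy_*() functions are hypothetical and greatly simplified; they are not
the QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DOE_CAP_SIZEOF 24u   /* same value as PCI_DOE_SIZEOF in the patch */
#define DOE_HDR_SIZE    4u   /* extended capability header, not DOE's job */

/* True if addr falls in the DOE registers that follow the cap header. */
static bool doe_covers(uint16_t cap_offset, uint32_t addr)
{
    uint32_t first = cap_offset + DOE_HDR_SIZE;
    uint32_t last = cap_offset + DOE_CAP_SIZEOF - 1;

    return addr >= first && addr <= last;
}

/* Hypothetical device with a flat config space and one DOE capability. */
struct toy_dev {
    uint16_t doe_offset;
    uint32_t doe_value;      /* returned for any DOE register in this toy */
    uint8_t config[4096];
};

/* DOE helper: returns false when the access is not for DOE. */
static bool toy_doe_read(const struct toy_dev *d, uint32_t addr,
                         uint32_t *val)
{
    if (!doe_covers(d->doe_offset, addr)) {
        return false;
    }
    *val = d->doe_value;
    return true;
}

/* Device handler: try DOE first, then fall back to the default read. */
static uint32_t toy_config_read(const struct toy_dev *d, uint32_t addr)
{
    uint32_t val;

    assert(addr < sizeof(d->config));
    if (toy_doe_read(d, addr, &val)) {
        return val;
    }
    return d->config[addr];  /* default path, byte-wide for brevity */
}
```

With several DOE capabilities on one device, the same pattern would simply
try each helper in turn before falling back.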

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 MAINTAINERS                |   7 +
 hw/pci/meson.build         |   1 +
 hw/pci/pcie_doe.c          | 367 +++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci_ids.h   |   3 +
 include/hw/pci/pcie.h      |   1 +
 include/hw/pci/pcie_doe.h  | 123 +++++++++++++
 include/hw/pci/pcie_regs.h |   4 +
 7 files changed, 506 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e1530b51a2..9c8d9280a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1832,6 +1832,13 @@ F: qapi/pci.json
 F: docs/pci*
 F: docs/specs/*pci*
 
+PCIE DOE
+M: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
+M: Chris Browy <cbrowy@avery-design.com>
+S: Supported
+F: include/hw/pci/pcie_doe.h
+F: hw/pci/pcie_doe.c
+
 ACPI/SMBIOS
 M: Michael S. Tsirkin <mst@redhat.com>
 M: Igor Mammedov <imammedo@redhat.com>
diff --git a/hw/pci/meson.build b/hw/pci/meson.build
index bcc9c75919..5aff7ed1c6 100644
--- a/hw/pci/meson.build
+++ b/hw/pci/meson.build
@@ -13,6 +13,7 @@ pci_ss.add(files(
 # allow plugging PCIe devices into PCI buses, include them even if
 # CONFIG_PCI_EXPRESS=n.
 pci_ss.add(files('pcie.c', 'pcie_aer.c'))
+pci_ss.add(files('pcie_doe.c'))
 softmmu_ss.add(when: 'CONFIG_PCI_EXPRESS', if_true: files('pcie_port.c', 'pcie_host.c'))
 softmmu_ss.add_all(when: 'CONFIG_PCI', if_true: pci_ss)
 
diff --git a/hw/pci/pcie_doe.c b/hw/pci/pcie_doe.c
new file mode 100644
index 0000000000..2210f86968
--- /dev/null
+++ b/hw/pci/pcie_doe.c
@@ -0,0 +1,367 @@
+/*
+ * PCIe Data Object Exchange
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/range.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_doe.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+
+#define DWORD_BYTE 4
+
+typedef struct DoeDiscoveryReq {
+    DOEHeader header;
+    uint8_t index;
+    uint8_t reserved[3];
+} QEMU_PACKED DoeDiscoveryReq;
+
+typedef struct DoeDiscoveryRsp {
+    DOEHeader header;
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    uint8_t next_index;
+} QEMU_PACKED DoeDiscoveryRsp;
+
+static bool pcie_doe_discovery(DOECap *doe_cap)
+{
+    DoeDiscoveryReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    DoeDiscoveryRsp rsp;
+    uint8_t index = req->index;
+    DOEProtocol *prot;
+
+    /* Discard request if it is shorter than DoeDiscoveryReq */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(DoeDiscoveryReq), DWORD_BYTE)) {
+        return false;
+    }
+
+    rsp.header = (DOEHeader) {
+        .vendor_id = PCI_VENDOR_ID_PCI_SIG,
+        .data_obj_type = PCI_SIG_DOE_DISCOVERY,
+        .length = DIV_ROUND_UP(sizeof(DoeDiscoveryRsp), DWORD_BYTE),
+    };
+
+    /* Point to the requested protocol, index 0 must be Discovery */
+    if (index == 0) {
+        rsp.vendor_id = PCI_VENDOR_ID_PCI_SIG;
+        rsp.data_obj_type = PCI_SIG_DOE_DISCOVERY;
+    } else {
+        if (index < doe_cap->protocol_num) {
+            prot = &doe_cap->protocols[index - 1];
+            rsp.vendor_id = prot->vendor_id;
+            rsp.data_obj_type = prot->data_obj_type;
+        } else {
+            rsp.vendor_id = 0xFFFF;
+            rsp.data_obj_type = 0xFF;
+        }
+    }
+
+    if (index + 1 == doe_cap->protocol_num) {
+        rsp.next_index = 0;
+    } else {
+        rsp.next_index = index + 1;
+    }
+
+    pcie_doe_set_rsp(doe_cap, &rsp);
+
+    return true;
+}
+
+static void pcie_doe_reset_mbox(DOECap *st)
+{
+    st->read_mbox_idx = 0;
+    st->read_mbox_len = 0;
+    st->write_mbox_len = 0;
+
+    memset(st->read_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+    memset(st->write_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+}
+
+void pcie_doe_init(PCIDevice *dev, DOECap *doe_cap, uint16_t offset,
+                   DOEProtocol *protocols, bool intr, uint16_t vec)
+{
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_DOE, 0x1, offset,
+                        PCI_DOE_SIZEOF);
+
+    doe_cap->pdev = dev;
+    doe_cap->offset = offset;
+
+    if (intr && (msi_present(dev) || msix_present(dev))) {
+        doe_cap->cap.intr = intr;
+        doe_cap->cap.vec = vec;
+    }
+
+    doe_cap->write_mbox = g_malloc0(PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+    doe_cap->read_mbox = g_malloc0(PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+
+    pcie_doe_reset_mbox(doe_cap);
+
+    doe_cap->protocols = protocols;
+    for (; protocols->vendor_id; protocols++) {
+        doe_cap->protocol_num++;
+    }
+    assert(doe_cap->protocol_num < PCI_DOE_PROTOCOL_NUM_MAX);
+
+    /* Increment to allow for the discovery protocol */
+    doe_cap->protocol_num++;
+}
+
+void pcie_doe_fini(DOECap *doe_cap)
+{
+    g_free(doe_cap->read_mbox);
+    g_free(doe_cap->write_mbox);
+    g_free(doe_cap);
+}
+
+uint32_t pcie_doe_build_protocol(DOEProtocol *p)
+{
+    return DATA_OBJ_BUILD_HEADER1(p->vendor_id, p->data_obj_type);
+}
+
+void *pcie_doe_get_write_mbox_ptr(DOECap *doe_cap)
+{
+    return doe_cap->write_mbox;
+}
+
+/*
+ * Copy the response to read mailbox buffer
+ * This might be called in self-defined handle_request() if a DOE response is
+ * required in the corresponding protocol
+ */
+void pcie_doe_set_rsp(DOECap *doe_cap, void *rsp)
+{
+    uint32_t len = pcie_doe_get_obj_len(rsp);
+
+    memcpy(doe_cap->read_mbox + doe_cap->read_mbox_len, rsp, len * DWORD_BYTE);
+    doe_cap->read_mbox_len += len;
+}
+
+uint32_t pcie_doe_get_obj_len(void *obj)
+{
+    uint32_t len;
+
+    if (!obj) {
+        return 0;
+    }
+
+    /* Only lower 18 bits are valid */
+    len = DATA_OBJ_LEN_MASK(((DOEHeader *)obj)->length);
+
+    /* PCIe r6.0 Table 6.29: a value of 00000h indicates 2^18 DW */
+    return (len) ? len : PCI_DOE_DW_SIZE_MAX;
+}
+
+static void pcie_doe_irq_assert(DOECap *doe_cap)
+{
+    PCIDevice *dev = doe_cap->pdev;
+
+    if (doe_cap->cap.intr && doe_cap->ctrl.intr) {
+        if (doe_cap->status.intr) {
+            return;
+        }
+        doe_cap->status.intr = 1;
+
+        if (msix_enabled(dev)) {
+            msix_notify(dev, doe_cap->cap.vec);
+        } else if (msi_enabled(dev)) {
+            msi_notify(dev, doe_cap->cap.vec);
+        }
+    }
+}
+
+static void pcie_doe_set_ready(DOECap *doe_cap, bool rdy)
+{
+    doe_cap->status.ready = rdy;
+
+    if (rdy) {
+        pcie_doe_irq_assert(doe_cap);
+    }
+}
+
+static void pcie_doe_set_error(DOECap *doe_cap, bool err)
+{
+    doe_cap->status.error = err;
+
+    if (err) {
+        pcie_doe_irq_assert(doe_cap);
+    }
+}
+
+/*
+ * Check incoming request in write_mbox for protocol format
+ */
+static void pcie_doe_prepare_rsp(DOECap *doe_cap)
+{
+    bool success = false;
+    int p;
+    bool (*handle_request)(DOECap *) = NULL;
+
+    if (doe_cap->status.error) {
+        return;
+    }
+
+    if (doe_cap->write_mbox[0] ==
+        DATA_OBJ_BUILD_HEADER1(PCI_VENDOR_ID_PCI_SIG, PCI_SIG_DOE_DISCOVERY)) {
+        handle_request = pcie_doe_discovery;
+    } else {
+        for (p = 0; p < doe_cap->protocol_num - 1; p++) {
+            if (doe_cap->write_mbox[0] ==
+                pcie_doe_build_protocol(&doe_cap->protocols[p])) {
+                handle_request = doe_cap->protocols[p].handle_request;
+                break;
+            }
+        }
+    }
+
+    /*
+     * PCIe r6 DOE 6.30.1:
+     * If the number of DW transferred does not match the
+     * indicated Length for a data object, then the
+     * data object must be silently discarded.
+     */
+    if (handle_request && (doe_cap->write_mbox_len ==
+        pcie_doe_get_obj_len(pcie_doe_get_write_mbox_ptr(doe_cap)))) {
+        success = handle_request(doe_cap);
+    }
+
+    if (success) {
+        pcie_doe_set_ready(doe_cap, 1);
+    } else {
+        pcie_doe_reset_mbox(doe_cap);
+    }
+}
+
+/*
+ * Read from DOE config space.
+ * Return false if the address is not within the DOE_CAP range.
+ */
+bool pcie_doe_read_config(DOECap *doe_cap, uint32_t addr, int size,
+                          uint32_t *buf)
+{
+    uint32_t shift;
+    uint16_t doe_offset = doe_cap->offset;
+
+    if (!range_covers_byte(doe_offset + PCI_EXP_DOE_CAP,
+                           PCI_DOE_SIZEOF - 4, addr)) {
+        return false;
+    }
+
+    addr -= doe_offset;
+    *buf = 0;
+
+    if (range_covers_byte(PCI_EXP_DOE_CAP, DWORD_BYTE, addr)) {
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_REG, INTR_SUPP,
+                          doe_cap->cap.intr);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_REG, DOE_INTR_MSG_NUM,
+                          doe_cap->cap.vec);
+    } else if (range_covers_byte(PCI_EXP_DOE_CTRL, DWORD_BYTE, addr)) {
+        /* Must return ABORT=0 and GO=0 */
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_CONTROL, DOE_INTR_EN,
+                          doe_cap->ctrl.intr);
+    } else if (range_covers_byte(PCI_EXP_DOE_STATUS, DWORD_BYTE, addr)) {
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_BUSY,
+                          doe_cap->status.busy);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_INTR_STATUS,
+                          doe_cap->status.intr);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_ERROR,
+                          doe_cap->status.error);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DATA_OBJ_RDY,
+                          doe_cap->status.ready);
+    /* Mailbox should be DW accessed */
+    } else if (addr == PCI_EXP_DOE_RD_DATA_MBOX && size == DWORD_BYTE) {
+        if (doe_cap->status.ready && !doe_cap->status.error) {
+            *buf = doe_cap->read_mbox[doe_cap->read_mbox_idx];
+        }
+    }
+
+    /* Process Alignment */
+    shift = addr % DWORD_BYTE;
+    *buf = extract32(*buf, shift * 8, size * 8);
+
+    return true;
+}
+
+/*
+ * Write to DOE config space.
+ * Return early if the address is not within the DOE_CAP range or an abort
+ * is received.
+ */
+void pcie_doe_write_config(DOECap *doe_cap,
+                           uint32_t addr, uint32_t val, int size)
+{
+    uint16_t doe_offset = doe_cap->offset;
+    uint32_t shift;
+
+    if (!range_covers_byte(doe_offset + PCI_EXP_DOE_CAP,
+                           PCI_DOE_SIZEOF - 4, addr)) {
+        return;
+    }
+
+    /* Process Alignment */
+    shift = addr % DWORD_BYTE;
+    addr -= (doe_offset + shift);
+    val = deposit32(val, shift * 8, size * 8, val);
+
+    switch (addr) {
+    case PCI_EXP_DOE_CTRL:
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_ABORT)) {
+            pcie_doe_set_ready(doe_cap, 0);
+            pcie_doe_set_error(doe_cap, 0);
+            pcie_doe_reset_mbox(doe_cap);
+            return;
+        }
+
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_GO)) {
+            pcie_doe_prepare_rsp(doe_cap);
+        }
+
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_INTR_EN)) {
+            doe_cap->ctrl.intr = 1;
+        /* Clear interrupt bit located within the first byte */
+        } else if (shift == 0) {
+            doe_cap->ctrl.intr = 0;
+        }
+        break;
+    case PCI_EXP_DOE_STATUS:
+        if (FIELD_EX32(val, PCI_DOE_CAP_STATUS, DOE_INTR_STATUS)) {
+            doe_cap->status.intr = 0;
+        }
+        break;
+    case PCI_EXP_DOE_RD_DATA_MBOX:
+        /* Mailbox should be DW accessed */
+        if (size != DWORD_BYTE) {
+            return;
+        }
+        doe_cap->read_mbox_idx++;
+        if (doe_cap->read_mbox_idx == doe_cap->read_mbox_len) {
+            pcie_doe_reset_mbox(doe_cap);
+            pcie_doe_set_ready(doe_cap, 0);
+        } else if (doe_cap->read_mbox_idx > doe_cap->read_mbox_len) {
+            /* Underflow */
+            pcie_doe_set_error(doe_cap, 1);
+        }
+        break;
+    case PCI_EXP_DOE_WR_DATA_MBOX:
+        /* Mailbox should be DW accessed */
+        if (size != DWORD_BYTE) {
+            return;
+        }
+        doe_cap->write_mbox[doe_cap->write_mbox_len] = val;
+        doe_cap->write_mbox_len++;
+        break;
+    case PCI_EXP_DOE_CAP:
+        /* fallthrough */
+    default:
+        break;
+    }
+}
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index d5ddea558b..bc9f834fd1 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -157,6 +157,9 @@
 
 /* Vendors and devices.  Sort key: vendor first, device next. */
 
+/* Ref: PCIe r6.0 Table 6-32 */
+#define PCI_VENDOR_ID_PCI_SIG            0x0001
+
 #define PCI_VENDOR_ID_LSI_LOGIC          0x1000
 #define PCI_DEVICE_ID_LSI_53C810         0x0001
 #define PCI_DEVICE_ID_LSI_53C895A        0x0012
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 798a262a0a..698d3de851 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -26,6 +26,7 @@
 #include "hw/pci/pcie_aer.h"
 #include "hw/pci/pcie_sriov.h"
 #include "hw/hotplug.h"
+#include "hw/pci/pcie_doe.h"
 
 typedef enum {
     /* for attention and power indicator */
diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
new file mode 100644
index 0000000000..ba4d8b03bd
--- /dev/null
+++ b/include/hw/pci/pcie_doe.h
@@ -0,0 +1,123 @@
+/*
+ * PCIe Data Object Exchange
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef PCIE_DOE_H
+#define PCIE_DOE_H
+
+#include "qemu/range.h"
+#include "qemu/typedefs.h"
+#include "hw/register.h"
+
+/*
+ * Reference:
+ * PCIe r6.0 - 7.9.24 Data Object Exchange Extended Capability
+ */
+/* Capabilities Register - r6.0 7.9.24.2 */
+#define PCI_EXP_DOE_CAP             0x04
+REG32(PCI_DOE_CAP_REG, 0)
+    FIELD(PCI_DOE_CAP_REG, INTR_SUPP, 0, 1)
+    FIELD(PCI_DOE_CAP_REG, DOE_INTR_MSG_NUM, 1, 11)
+
+/* Control Register - r6.0 7.9.24.3 */
+#define PCI_EXP_DOE_CTRL            0x08
+REG32(PCI_DOE_CAP_CONTROL, 0)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_ABORT, 0, 1)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_INTR_EN, 1, 1)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_GO, 31, 1)
+
+/* Status Register - r6.0 7.9.24.4 */
+#define PCI_EXP_DOE_STATUS          0x0c
+REG32(PCI_DOE_CAP_STATUS, 0)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_BUSY, 0, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_INTR_STATUS, 1, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_ERROR, 2, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DATA_OBJ_RDY, 31, 1)
+
+/* Write Data Mailbox Register - r6.0 7.9.24.5 */
+#define PCI_EXP_DOE_WR_DATA_MBOX    0x10
+
+/* Read Data Mailbox Register - r6.0 7.9.24.6 */
+#define PCI_EXP_DOE_RD_DATA_MBOX    0x14
+
+/* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
+#define PCI_SIG_DOE_DISCOVERY       0x00
+
+#define PCI_DOE_DW_SIZE_MAX         (1 << 18)
+#define PCI_DOE_PROTOCOL_NUM_MAX    256
+
+#define DATA_OBJ_BUILD_HEADER1(v, p)    (((p) << 16) | (v))
+#define DATA_OBJ_LEN_MASK(len)          ((len) & (PCI_DOE_DW_SIZE_MAX - 1))
+
+typedef struct DOEHeader DOEHeader;
+typedef struct DOEProtocol DOEProtocol;
+typedef struct DOECap DOECap;
+
+struct DOEHeader {
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    uint8_t reserved;
+    uint32_t length;
+} QEMU_PACKED;
+
+/* Protocol infos and rsp function callback */
+struct DOEProtocol {
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    bool (*handle_request)(DOECap *);
+};
+
+struct DOECap {
+    /* Owner */
+    PCIDevice *pdev;
+
+    uint16_t offset;
+
+    struct {
+        bool intr;
+        uint16_t vec;
+    } cap;
+
+    struct {
+        bool abort;
+        bool intr;
+        bool go;
+    } ctrl;
+
+    struct {
+        bool busy;
+        bool intr;
+        bool error;
+        bool ready;
+    } status;
+
+    uint32_t *write_mbox;
+    uint32_t *read_mbox;
+
+    /* Mailbox position indicator */
+    uint32_t read_mbox_idx;
+    uint32_t read_mbox_len;
+    uint32_t write_mbox_len;
+
+    /* Protocols and its callback response */
+    DOEProtocol *protocols;
+    uint16_t protocol_num;
+};
+
+void pcie_doe_init(PCIDevice *pdev, DOECap *doe_cap, uint16_t offset,
+                   DOEProtocol *protocols, bool intr, uint16_t vec);
+void pcie_doe_fini(DOECap *doe_cap);
+bool pcie_doe_read_config(DOECap *doe_cap, uint32_t addr, int size,
+                          uint32_t *buf);
+void pcie_doe_write_config(DOECap *doe_cap, uint32_t addr,
+                           uint32_t val, int size);
+uint32_t pcie_doe_build_protocol(DOEProtocol *p);
+void *pcie_doe_get_write_mbox_ptr(DOECap *doe_cap);
+void pcie_doe_set_rsp(DOECap *doe_cap, void *rsp);
+uint32_t pcie_doe_get_obj_len(void *obj);
+#endif /* PCIE_DOE_H */
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 1db86b0ec4..963dc2e170 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -179,4 +179,8 @@ typedef enum PCIExpLinkWidth {
 #define PCI_ACS_VER                     0x1
 #define PCI_ACS_SIZEOF                  8
 
+/* DOE Capability Register Fields */
+#define PCI_DOE_VER                     0x1
+#define PCI_DOE_SIZEOF                  24
+
 #endif /* QEMU_PCIE_REGS_H */
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 1/5] hw/pci: PCIe Data Object Exchange emulation
@ 2022-10-07 15:21   ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron via @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

Emulation of PCIe Data Object Exchange (DOE)
PCIe Base Specification r6.0, section 6.30 Data Object Exchange

Supports multiple DOE PCIe Extended Capabilities for a single PCIe
device. For each capability, a static array of DOEProtocol should be passed
to pcie_doe_init(), terminated by an entry whose vendor_id is 0. The
protocols in that array will be registered under the DOE capability
structure. Each protocol entry provides a vendor ID, a data object type,
and a callback function (handle_request()) that defines how a DOE request
for that protocol is handled.

pcie_doe_{read/write}_config() must be called from the corresponding PCI
device's config_read/write() handler to enable DOE access.
pcie_doe_read_config() returns false if the pci_config_read() offset is
not within the DOE capability range. pcie_doe_write_config() has no effect
if the address is not within the related DOE PCIe extended capability.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 MAINTAINERS                |   7 +
 hw/pci/meson.build         |   1 +
 hw/pci/pcie_doe.c          | 367 +++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci_ids.h   |   3 +
 include/hw/pci/pcie.h      |   1 +
 include/hw/pci/pcie_doe.h  | 123 +++++++++++++
 include/hw/pci/pcie_regs.h |   4 +
 7 files changed, 506 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e1530b51a2..9c8d9280a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1832,6 +1832,13 @@ F: qapi/pci.json
 F: docs/pci*
 F: docs/specs/*pci*
 
+PCIE DOE
+M: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
+M: Chris Browy <cbrowy@avery-design.com>
+S: Supported
+F: include/hw/pci/pcie_doe.h
+F: hw/pci/pcie_doe.c
+
 ACPI/SMBIOS
 M: Michael S. Tsirkin <mst@redhat.com>
 M: Igor Mammedov <imammedo@redhat.com>
diff --git a/hw/pci/meson.build b/hw/pci/meson.build
index bcc9c75919..5aff7ed1c6 100644
--- a/hw/pci/meson.build
+++ b/hw/pci/meson.build
@@ -13,6 +13,7 @@ pci_ss.add(files(
 # allow plugging PCIe devices into PCI buses, include them even if
 # CONFIG_PCI_EXPRESS=n.
 pci_ss.add(files('pcie.c', 'pcie_aer.c'))
+pci_ss.add(files('pcie_doe.c'))
 softmmu_ss.add(when: 'CONFIG_PCI_EXPRESS', if_true: files('pcie_port.c', 'pcie_host.c'))
 softmmu_ss.add_all(when: 'CONFIG_PCI', if_true: pci_ss)
 
diff --git a/hw/pci/pcie_doe.c b/hw/pci/pcie_doe.c
new file mode 100644
index 0000000000..2210f86968
--- /dev/null
+++ b/hw/pci/pcie_doe.c
@@ -0,0 +1,367 @@
+/*
+ * PCIe Data Object Exchange
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/range.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_doe.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+
+#define DWORD_BYTE 4
+
+typedef struct DoeDiscoveryReq {
+    DOEHeader header;
+    uint8_t index;
+    uint8_t reserved[3];
+} QEMU_PACKED DoeDiscoveryReq;
+
+typedef struct DoeDiscoveryRsp {
+    DOEHeader header;
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    uint8_t next_index;
+} QEMU_PACKED DoeDiscoveryRsp;
+
+static bool pcie_doe_discovery(DOECap *doe_cap)
+{
+    DoeDiscoveryReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    DoeDiscoveryRsp rsp;
+    uint8_t index = req->index;
+    DOEProtocol *prot;
+
+    /* Discard request if it is shorter than DoeDiscoveryReq */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(DoeDiscoveryReq), DWORD_BYTE)) {
+        return false;
+    }
+
+    rsp.header = (DOEHeader) {
+        .vendor_id = PCI_VENDOR_ID_PCI_SIG,
+        .data_obj_type = PCI_SIG_DOE_DISCOVERY,
+        .length = DIV_ROUND_UP(sizeof(DoeDiscoveryRsp), DWORD_BYTE),
+    };
+
+    /* Point to the requested protocol, index 0 must be Discovery */
+    if (index == 0) {
+        rsp.vendor_id = PCI_VENDOR_ID_PCI_SIG;
+        rsp.data_obj_type = PCI_SIG_DOE_DISCOVERY;
+    } else {
+        if (index < doe_cap->protocol_num) {
+            prot = &doe_cap->protocols[index - 1];
+            rsp.vendor_id = prot->vendor_id;
+            rsp.data_obj_type = prot->data_obj_type;
+        } else {
+            rsp.vendor_id = 0xFFFF;
+            rsp.data_obj_type = 0xFF;
+        }
+    }
+
+    if (index + 1 == doe_cap->protocol_num) {
+        rsp.next_index = 0;
+    } else {
+        rsp.next_index = index + 1;
+    }
+
+    pcie_doe_set_rsp(doe_cap, &rsp);
+
+    return true;
+}
+
+static void pcie_doe_reset_mbox(DOECap *st)
+{
+    st->read_mbox_idx = 0;
+    st->read_mbox_len = 0;
+    st->write_mbox_len = 0;
+
+    memset(st->read_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+    memset(st->write_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+}
+
+void pcie_doe_init(PCIDevice *dev, DOECap *doe_cap, uint16_t offset,
+                   DOEProtocol *protocols, bool intr, uint16_t vec)
+{
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_DOE, 0x1, offset,
+                        PCI_DOE_SIZEOF);
+
+    doe_cap->pdev = dev;
+    doe_cap->offset = offset;
+
+    if (intr && (msi_present(dev) || msix_present(dev))) {
+        doe_cap->cap.intr = intr;
+        doe_cap->cap.vec = vec;
+    }
+
+    doe_cap->write_mbox = g_malloc0(PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+    doe_cap->read_mbox = g_malloc0(PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+
+    pcie_doe_reset_mbox(doe_cap);
+
+    doe_cap->protocols = protocols;
+    for (; protocols->vendor_id; protocols++) {
+        doe_cap->protocol_num++;
+    }
+    assert(doe_cap->protocol_num < PCI_DOE_PROTOCOL_NUM_MAX);
+
+    /* Increment to allow for the discovery protocol */
+    doe_cap->protocol_num++;
+}
+
+void pcie_doe_fini(DOECap *doe_cap)
+{
+    g_free(doe_cap->read_mbox);
+    g_free(doe_cap->write_mbox);
+    g_free(doe_cap);
+}
+
+uint32_t pcie_doe_build_protocol(DOEProtocol *p)
+{
+    return DATA_OBJ_BUILD_HEADER1(p->vendor_id, p->data_obj_type);
+}
+
+void *pcie_doe_get_write_mbox_ptr(DOECap *doe_cap)
+{
+    return doe_cap->write_mbox;
+}
+
+/*
+ * Copy the response into the read mailbox buffer.
+ * This may be called from a protocol's handle_request() callback when that
+ * protocol requires a DOE response.
+ */
+void pcie_doe_set_rsp(DOECap *doe_cap, void *rsp)
+{
+    uint32_t len = pcie_doe_get_obj_len(rsp);
+
+    memcpy(doe_cap->read_mbox + doe_cap->read_mbox_len, rsp, len * DWORD_BYTE);
+    doe_cap->read_mbox_len += len;
+}
+
+uint32_t pcie_doe_get_obj_len(void *obj)
+{
+    uint32_t len;
+
+    if (!obj) {
+        return 0;
+    }
+
+    /* Only lower 18 bits are valid */
+    len = DATA_OBJ_LEN_MASK(((DOEHeader *)obj)->length);
+
+    /* PCIe r6.0 Table 6.29: a value of 00000h indicates 2^18 DW */
+    return (len) ? len : PCI_DOE_DW_SIZE_MAX;
+}
+
+static void pcie_doe_irq_assert(DOECap *doe_cap)
+{
+    PCIDevice *dev = doe_cap->pdev;
+
+    if (doe_cap->cap.intr && doe_cap->ctrl.intr) {
+        if (doe_cap->status.intr) {
+            return;
+        }
+        doe_cap->status.intr = 1;
+
+        if (msix_enabled(dev)) {
+            msix_notify(dev, doe_cap->cap.vec);
+        } else if (msi_enabled(dev)) {
+            msi_notify(dev, doe_cap->cap.vec);
+        }
+    }
+}
+
+static void pcie_doe_set_ready(DOECap *doe_cap, bool rdy)
+{
+    doe_cap->status.ready = rdy;
+
+    if (rdy) {
+        pcie_doe_irq_assert(doe_cap);
+    }
+}
+
+static void pcie_doe_set_error(DOECap *doe_cap, bool err)
+{
+    doe_cap->status.error = err;
+
+    if (err) {
+        pcie_doe_irq_assert(doe_cap);
+    }
+}
+
+/*
+ * Check incoming request in write_mbox for protocol format
+ */
+static void pcie_doe_prepare_rsp(DOECap *doe_cap)
+{
+    bool success = false;
+    int p;
+    bool (*handle_request)(DOECap *) = NULL;
+
+    if (doe_cap->status.error) {
+        return;
+    }
+
+    if (doe_cap->write_mbox[0] ==
+        DATA_OBJ_BUILD_HEADER1(PCI_VENDOR_ID_PCI_SIG, PCI_SIG_DOE_DISCOVERY)) {
+        handle_request = pcie_doe_discovery;
+    } else {
+        for (p = 0; p < doe_cap->protocol_num - 1; p++) {
+            if (doe_cap->write_mbox[0] ==
+                pcie_doe_build_protocol(&doe_cap->protocols[p])) {
+                handle_request = doe_cap->protocols[p].handle_request;
+                break;
+            }
+        }
+    }
+
+    /*
+     * PCIe r6 DOE 6.30.1:
+     * If the number of DW transferred does not match the
+     * indicated Length for a data object, then the
+     * data object must be silently discarded.
+     */
+    if (handle_request && (doe_cap->write_mbox_len ==
+        pcie_doe_get_obj_len(pcie_doe_get_write_mbox_ptr(doe_cap)))) {
+        success = handle_request(doe_cap);
+    }
+
+    if (success) {
+        pcie_doe_set_ready(doe_cap, 1);
+    } else {
+        pcie_doe_reset_mbox(doe_cap);
+    }
+}
+
+/*
+ * Read from DOE config space.
+ * Return false if the address is not within the DOE_CAP range.
+ */
+bool pcie_doe_read_config(DOECap *doe_cap, uint32_t addr, int size,
+                          uint32_t *buf)
+{
+    uint32_t shift;
+    uint16_t doe_offset = doe_cap->offset;
+
+    if (!range_covers_byte(doe_offset + PCI_EXP_DOE_CAP,
+                           PCI_DOE_SIZEOF - 4, addr)) {
+        return false;
+    }
+
+    addr -= doe_offset;
+    *buf = 0;
+
+    if (range_covers_byte(PCI_EXP_DOE_CAP, DWORD_BYTE, addr)) {
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_REG, INTR_SUPP,
+                          doe_cap->cap.intr);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_REG, DOE_INTR_MSG_NUM,
+                          doe_cap->cap.vec);
+    } else if (range_covers_byte(PCI_EXP_DOE_CTRL, DWORD_BYTE, addr)) {
+        /* Must return ABORT=0 and GO=0 */
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_CONTROL, DOE_INTR_EN,
+                          doe_cap->ctrl.intr);
+    } else if (range_covers_byte(PCI_EXP_DOE_STATUS, DWORD_BYTE, addr)) {
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_BUSY,
+                          doe_cap->status.busy);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_INTR_STATUS,
+                          doe_cap->status.intr);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_ERROR,
+                          doe_cap->status.error);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DATA_OBJ_RDY,
+                          doe_cap->status.ready);
+    /* Mailbox should be DW accessed */
+    } else if (addr == PCI_EXP_DOE_RD_DATA_MBOX && size == DWORD_BYTE) {
+        if (doe_cap->status.ready && !doe_cap->status.error) {
+            *buf = doe_cap->read_mbox[doe_cap->read_mbox_idx];
+        }
+    }
+
+    /* Process Alignment */
+    shift = addr % DWORD_BYTE;
+    *buf = extract32(*buf, shift * 8, size * 8);
+
+    return true;
+}
+
+/*
+ * Write to DOE config space.
+ * Return early if the address is outside the DOE_CAP range or on abort.
+ */
+void pcie_doe_write_config(DOECap *doe_cap,
+                           uint32_t addr, uint32_t val, int size)
+{
+    uint16_t doe_offset = doe_cap->offset;
+    uint32_t shift;
+
+    if (!range_covers_byte(doe_offset + PCI_EXP_DOE_CAP,
+                           PCI_DOE_SIZEOF - 4, addr)) {
+        return;
+    }
+
+    /* Process Alignment */
+    shift = addr % DWORD_BYTE;
+    addr -= (doe_offset + shift);
+    val = deposit32(val, shift * 8, size * 8, val);
+
+    switch (addr) {
+    case PCI_EXP_DOE_CTRL:
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_ABORT)) {
+            pcie_doe_set_ready(doe_cap, 0);
+            pcie_doe_set_error(doe_cap, 0);
+            pcie_doe_reset_mbox(doe_cap);
+            return;
+        }
+
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_GO)) {
+            pcie_doe_prepare_rsp(doe_cap);
+        }
+
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_INTR_EN)) {
+            doe_cap->ctrl.intr = 1;
+        /* Clear interrupt bit located within the first byte */
+        } else if (shift == 0) {
+            doe_cap->ctrl.intr = 0;
+        }
+        break;
+    case PCI_EXP_DOE_STATUS:
+        if (FIELD_EX32(val, PCI_DOE_CAP_STATUS, DOE_INTR_STATUS)) {
+            doe_cap->status.intr = 0;
+        }
+        break;
+    case PCI_EXP_DOE_RD_DATA_MBOX:
+        /* Mailbox should be DW accessed */
+        if (size != DWORD_BYTE) {
+            return;
+        }
+        doe_cap->read_mbox_idx++;
+        if (doe_cap->read_mbox_idx == doe_cap->read_mbox_len) {
+            pcie_doe_reset_mbox(doe_cap);
+            pcie_doe_set_ready(doe_cap, 0);
+        } else if (doe_cap->read_mbox_idx > doe_cap->read_mbox_len) {
+            /* Underflow */
+            pcie_doe_set_error(doe_cap, 1);
+        }
+        break;
+    case PCI_EXP_DOE_WR_DATA_MBOX:
+        /* Mailbox should be DW accessed */
+        if (size != DWORD_BYTE) {
+            return;
+        }
+        doe_cap->write_mbox[doe_cap->write_mbox_len] = val;
+        doe_cap->write_mbox_len++;
+        break;
+    case PCI_EXP_DOE_CAP:
+        /* fallthrough */
+    default:
+        break;
+    }
+}
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index d5ddea558b..bc9f834fd1 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -157,6 +157,9 @@
 
 /* Vendors and devices.  Sort key: vendor first, device next. */
 
+/* Ref: PCIe r6.0 Table 6-32 */
+#define PCI_VENDOR_ID_PCI_SIG            0x0001
+
 #define PCI_VENDOR_ID_LSI_LOGIC          0x1000
 #define PCI_DEVICE_ID_LSI_53C810         0x0001
 #define PCI_DEVICE_ID_LSI_53C895A        0x0012
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 798a262a0a..698d3de851 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -26,6 +26,7 @@
 #include "hw/pci/pcie_aer.h"
 #include "hw/pci/pcie_sriov.h"
 #include "hw/hotplug.h"
+#include "hw/pci/pcie_doe.h"
 
 typedef enum {
     /* for attention and power indicator */
diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
new file mode 100644
index 0000000000..ba4d8b03bd
--- /dev/null
+++ b/include/hw/pci/pcie_doe.h
@@ -0,0 +1,123 @@
+/*
+ * PCIe Data Object Exchange
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef PCIE_DOE_H
+#define PCIE_DOE_H
+
+#include "qemu/range.h"
+#include "qemu/typedefs.h"
+#include "hw/register.h"
+
+/*
+ * Reference:
+ * PCIe r6.0 - 7.9.24 Data Object Exchange Extended Capability
+ */
+/* Capabilities Register - r6.0 7.9.24.2 */
+#define PCI_EXP_DOE_CAP             0x04
+REG32(PCI_DOE_CAP_REG, 0)
+    FIELD(PCI_DOE_CAP_REG, INTR_SUPP, 0, 1)
+    FIELD(PCI_DOE_CAP_REG, DOE_INTR_MSG_NUM, 1, 11)
+
+/* Control Register - r6.0 7.9.24.3 */
+#define PCI_EXP_DOE_CTRL            0x08
+REG32(PCI_DOE_CAP_CONTROL, 0)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_ABORT, 0, 1)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_INTR_EN, 1, 1)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_GO, 31, 1)
+
+/* Status Register - r6.0 7.9.24.4 */
+#define PCI_EXP_DOE_STATUS          0x0c
+REG32(PCI_DOE_CAP_STATUS, 0)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_BUSY, 0, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_INTR_STATUS, 1, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_ERROR, 2, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DATA_OBJ_RDY, 31, 1)
+
+/* Write Data Mailbox Register - r6.0 7.9.24.5 */
+#define PCI_EXP_DOE_WR_DATA_MBOX    0x10
+
+/* Read Data Mailbox Register - r6.0 7.9.24.6 */
+#define PCI_EXP_DOE_RD_DATA_MBOX    0x14
+
+/* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
+#define PCI_SIG_DOE_DISCOVERY       0x00
+
+#define PCI_DOE_DW_SIZE_MAX         (1 << 18)
+#define PCI_DOE_PROTOCOL_NUM_MAX    256
+
+#define DATA_OBJ_BUILD_HEADER1(v, p)    (((p) << 16) | (v))
+#define DATA_OBJ_LEN_MASK(len)          ((len) & (PCI_DOE_DW_SIZE_MAX - 1))
+
+typedef struct DOEHeader DOEHeader;
+typedef struct DOEProtocol DOEProtocol;
+typedef struct DOECap DOECap;
+
+struct DOEHeader {
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    uint8_t reserved;
+    uint32_t length;
+} QEMU_PACKED;
+
+/* Protocol information and response callback */
+struct DOEProtocol {
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    bool (*handle_request)(DOECap *);
+};
+
+struct DOECap {
+    /* Owner */
+    PCIDevice *pdev;
+
+    uint16_t offset;
+
+    struct {
+        bool intr;
+        uint16_t vec;
+    } cap;
+
+    struct {
+        bool abort;
+        bool intr;
+        bool go;
+    } ctrl;
+
+    struct {
+        bool busy;
+        bool intr;
+        bool error;
+        bool ready;
+    } status;
+
+    uint32_t *write_mbox;
+    uint32_t *read_mbox;
+
+    /* Mailbox position indicator */
+    uint32_t read_mbox_idx;
+    uint32_t read_mbox_len;
+    uint32_t write_mbox_len;
+
+    /* Supported protocols and their response callbacks */
+    DOEProtocol *protocols;
+    uint16_t protocol_num;
+};
+
+void pcie_doe_init(PCIDevice *pdev, DOECap *doe_cap, uint16_t offset,
+                   DOEProtocol *protocols, bool intr, uint16_t vec);
+void pcie_doe_fini(DOECap *doe_cap);
+bool pcie_doe_read_config(DOECap *doe_cap, uint32_t addr, int size,
+                          uint32_t *buf);
+void pcie_doe_write_config(DOECap *doe_cap, uint32_t addr,
+                           uint32_t val, int size);
+uint32_t pcie_doe_build_protocol(DOEProtocol *p);
+void *pcie_doe_get_write_mbox_ptr(DOECap *doe_cap);
+void pcie_doe_set_rsp(DOECap *doe_cap, void *rsp);
+uint32_t pcie_doe_get_obj_len(void *obj);
+#endif /* PCIE_DOE_H */
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 1db86b0ec4..963dc2e170 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -179,4 +179,8 @@ typedef enum PCIExpLinkWidth {
 #define PCI_ACS_VER                     0x1
 #define PCI_ACS_SIZEOF                  8
 
+/* DOE Capability Register Fields */
+#define PCI_DOE_VER                     0x1
+#define PCI_DOE_SIZEOF                  24
+
 #endif /* QEMU_PCIE_REGS_H */
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 2/5] hw/mem/cxl-type3: Add MSIX support
  2022-10-07 15:21 ` Jonathan Cameron via
@ 2022-10-07 15:21   ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

This will be used by several upcoming patch sets so break it out
such that it doesn't matter which one lands first.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/mem/cxl_type3.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a71bf1afeb..568c9d62f5 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -13,6 +13,7 @@
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
 #include "hw/cxl/cxl.h"
+#include "hw/pci/msix.h"
 
 /*
  * Null value of all Fs suggested by IEEE RA guidelines for use of
@@ -146,6 +147,8 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     ComponentRegisters *regs = &cxl_cstate->crb;
     MemoryRegion *mr = &regs->component_registers;
     uint8_t *pci_conf = pci_dev->config;
+    unsigned short msix_num = 1;
+    int i;
 
     if (!cxl_setup_memory(ct3d, errp)) {
         return;
@@ -180,6 +183,12 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
                      PCI_BASE_ADDRESS_SPACE_MEMORY |
                          PCI_BASE_ADDRESS_MEM_TYPE_64,
                      &ct3d->cxl_dstate.device_registers);
+
+    /* MSI(-X) Initialization */
+    msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
+    for (i = 0; i < msix_num; i++) {
+        msix_vector_use(pci_dev, i);
+    }
 }
 
 static void ct3_exit(PCIDevice *pci_dev)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
  2022-10-07 15:21 ` Jonathan Cameron via
@ 2022-10-07 15:21   ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

Add the Data Object Exchange (DOE) implementation of the CXL Coherent
Device Attribute Table (CDAT). The implementation follows the "Coherent
Device Attribute Table Specification, Rev. 1.02, Oct. 2020" and the
"Compute Express Link Specification, Rev. 2.0, Oct. 2020".

This patch adds core support that will be shared by both
end-points and switch port emulation.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
Changes since RFC:
- Split out library code from the specific device.
---
 hw/cxl/cxl-cdat.c              | 222 +++++++++++++++++++++++++++++++++
 hw/cxl/meson.build             |   1 +
 include/hw/cxl/cxl_cdat.h      | 165 ++++++++++++++++++++++++
 include/hw/cxl/cxl_component.h |   7 ++
 include/hw/cxl/cxl_device.h    |   3 +
 include/hw/cxl/cxl_pci.h       |   1 +
 6 files changed, 399 insertions(+)

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
new file mode 100644
index 0000000000..137178632b
--- /dev/null
+++ b/hw/cxl/cxl-cdat.c
@@ -0,0 +1,222 @@
+/*
+ * CXL CDAT Structure
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+
+static void cdat_len_check(CDATSubHeader *hdr, Error **errp)
+{
+    assert(hdr->length);
+    assert(hdr->reserved == 0);
+
+    switch (hdr->type) {
+    case CDAT_TYPE_DSMAS:
+        assert(hdr->length == sizeof(CDATDsmas));
+        break;
+    case CDAT_TYPE_DSLBIS:
+        assert(hdr->length == sizeof(CDATDslbis));
+        break;
+    case CDAT_TYPE_DSMSCIS:
+        assert(hdr->length == sizeof(CDATDsmscis));
+        break;
+    case CDAT_TYPE_DSIS:
+        assert(hdr->length == sizeof(CDATDsis));
+        break;
+    case CDAT_TYPE_DSEMTS:
+        assert(hdr->length == sizeof(CDATDsemts));
+        break;
+    case CDAT_TYPE_SSLBIS:
+        assert(hdr->length >= sizeof(CDATSslbisHeader));
+        assert((hdr->length - sizeof(CDATSslbisHeader)) %
+               sizeof(CDATSslbe) == 0);
+        break;
+    default:
+        error_setg(errp, "Type %d is reserved", hdr->type);
+    }
+}
+
+static void ct3_build_cdat(CDATObject *cdat, Error **errp)
+{
+    g_autofree CDATTableHeader *cdat_header = NULL;
+    g_autofree CDATEntry *cdat_st = NULL;
+    uint8_t sum = 0;
+    int ent, i;
+
+    /* Use default table if fopen == NULL */
+    assert(cdat->build_cdat_table);
+
+    cdat_header = g_malloc0(sizeof(*cdat_header));
+    if (!cdat_header) {
+        error_setg(errp, "Failed to allocate CDAT header");
+        return;
+    }
+
+    cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf, cdat->private);
+
+    if (!cdat->built_buf_len) {
+        /* Build later as not all data available yet */
+        cdat->to_update = true;
+        return;
+    }
+    cdat->to_update = false;
+
+    cdat_st = g_malloc0(sizeof(*cdat_st) * (cdat->built_buf_len + 1));
+    if (!cdat_st) {
+        error_setg(errp, "Failed to allocate CDAT entry array");
+        return;
+    }
+
+    /* Entry 0 for CDAT header, starts with Entry 1 */
+    for (ent = 1; ent < cdat->built_buf_len + 1; ent++) {
+        CDATSubHeader *hdr = cdat->built_buf[ent - 1];
+        uint8_t *buf = (uint8_t *)cdat->built_buf[ent - 1];
+
+        cdat_st[ent].base = hdr;
+        cdat_st[ent].length = hdr->length;
+
+        cdat_header->length += hdr->length;
+        for (i = 0; i < hdr->length; i++) {
+            sum += buf[i];
+        }
+    }
+
+    /* CDAT header */
+    cdat_header->revision = CXL_CDAT_REV;
+    /* For now, no runtime updates */
+    cdat_header->sequence = 0;
+    cdat_header->length += sizeof(CDATTableHeader);
+    sum += cdat_header->revision + cdat_header->sequence +
+        cdat_header->length;
+    /* Sum of all bytes including checksum must be 0 */
+    cdat_header->checksum = ~sum + 1;
+
+    cdat_st[0].base = g_steal_pointer(&cdat_header);
+    cdat_st[0].length = sizeof(*cdat_header);
+    cdat->entry_len = 1 + cdat->built_buf_len;
+    cdat->entry = g_steal_pointer(&cdat_st);
+}
+
+static void ct3_load_cdat(CDATObject *cdat, Error **errp)
+{
+    g_autofree CDATEntry *cdat_st = NULL;
+    uint8_t sum = 0;
+    int num_ent;
+    int i = 0, ent = 1, file_size = 0;
+    CDATSubHeader *hdr;
+    FILE *fp = NULL;
+
+    /* Read CDAT file and create its cache */
+    fp = fopen(cdat->filename, "r");
+    if (!fp) {
+        error_setg(errp, "CDAT: Unable to open file");
+        return;
+    }
+
+    fseek(fp, 0, SEEK_END);
+    file_size = ftell(fp);
+    fseek(fp, 0, SEEK_SET);
+    cdat->buf = g_malloc0(file_size);
+
+    if (fread(cdat->buf, file_size, 1, fp) == 0) {
+        error_setg(errp, "CDAT: File read failed");
+        fclose(fp);
+        return;
+    }
+    fclose(fp);
+
+    if (file_size < sizeof(CDATTableHeader)) {
+        error_setg(errp, "CDAT: File too short");
+        return;
+    }
+    i = sizeof(CDATTableHeader);
+    num_ent = 1;
+    while (i < file_size) {
+        hdr = (CDATSubHeader *)(cdat->buf + i);
+        cdat_len_check(hdr, errp);
+        i += hdr->length;
+        num_ent++;
+    }
+    if (i != file_size) {
+        error_setg(errp, "CDAT: File length mismatch");
+        return;
+    }
+
+    cdat_st = g_malloc0(sizeof(*cdat_st) * num_ent);
+    if (!cdat_st) {
+        error_setg(errp, "CDAT: Failed to allocate entry array");
+        return;
+    }
+
+    /* Set CDAT header, Entry = 0 */
+    cdat_st[0].base = cdat->buf;
+    cdat_st[0].length = sizeof(CDATTableHeader);
+    i = 0;
+
+    while (i < cdat_st[0].length) {
+        sum += cdat->buf[i++];
+    }
+
+    /* Read CDAT structures */
+    while (i < file_size) {
+        hdr = (CDATSubHeader *)(cdat->buf + i);
+        cdat_len_check(hdr, errp);
+
+        cdat_st[ent].base = hdr;
+        cdat_st[ent].length = hdr->length;
+
+        while (cdat->buf + i <
+               (uint8_t *)cdat_st[ent].base + cdat_st[ent].length) {
+            assert(i < file_size);
+            sum += cdat->buf[i++];
+        }
+
+        ent++;
+    }
+
+    if (sum != 0) {
+        warn_report("CDAT: Found checksum mismatch in %s", cdat->filename);
+    }
+    cdat->entry_len = num_ent;
+    cdat->entry = g_steal_pointer(&cdat_st);
+}
+
+void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    if (cdat->filename) {
+        ct3_load_cdat(cdat, errp);
+    } else {
+        ct3_build_cdat(cdat, errp);
+    }
+}
+
+void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    if (cdat->to_update) {
+        ct3_build_cdat(cdat, errp);
+    }
+}
+
+void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    g_free(cdat->entry);
+    if (cdat->built_buf)
+        cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
+                              cdat->private);
+    if (cdat->buf)
+        g_free(cdat->buf);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index f117b99949..cfa95ffd40 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -4,6 +4,7 @@ softmmu_ss.add(when: 'CONFIG_CXL',
                    'cxl-device-utils.c',
                    'cxl-mailbox-utils.c',
                    'cxl-host.c',
+                   'cxl-cdat.c',
                ),
                if_false: files(
                    'cxl-host-stubs.c',
diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
new file mode 100644
index 0000000000..fdb1fa98f4
--- /dev/null
+++ b/include/hw/cxl/cxl_cdat.h
@@ -0,0 +1,165 @@
+/*
+ * CXL CDAT Structure
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_CDAT_H
+#define CXL_CDAT_H
+
+#include "hw/cxl/cxl_pci.h"
+
+/*
+ * Reference:
+ *   Coherent Device Attribute Table (CDAT) Specification, Rev. 1.02, Oct. 2020
+ *   Compute Express Link (CXL) Specification, Rev. 2.0, Oct. 2020
+ */
+/* Table Access DOE - CXL 8.1.11 */
+#define CXL_DOE_TABLE_ACCESS      2
+#define CXL_DOE_PROTOCOL_CDAT     ((CXL_DOE_TABLE_ACCESS << 16) | CXL_VENDOR_ID)
+
+/* Read Entry - CXL 8.1.11.1 */
+#define CXL_DOE_TAB_TYPE_CDAT 0
+#define CXL_DOE_TAB_ENT_MAX 0xFFFF
+
+/* Read Entry Request - CXL 8.1.11.1 Table 134 */
+#define CXL_DOE_TAB_REQ 0
+typedef struct CDATReq {
+    DOEHeader header;
+    uint8_t req_code;
+    uint8_t table_type;
+    uint16_t entry_handle;
+} QEMU_PACKED CDATReq;
+
+/* Read Entry Response - CXL 8.1.11.1 Table 135 */
+#define CXL_DOE_TAB_RSP 0
+typedef struct CDATRsp {
+    DOEHeader header;
+    uint8_t rsp_code;
+    uint8_t table_type;
+    uint16_t entry_handle;
+} QEMU_PACKED CDATRsp;
+
+/* CDAT Table Format - CDAT Table 1 */
+#define CXL_CDAT_REV 1
+typedef struct CDATTableHeader {
+    uint32_t length;
+    uint8_t revision;
+    uint8_t checksum;
+    uint8_t reserved[6];
+    uint32_t sequence;
+} QEMU_PACKED CDATTableHeader;
+
+/* CDAT Structure Types - CDAT Table 2 */
+typedef enum {
+    CDAT_TYPE_DSMAS = 0,
+    CDAT_TYPE_DSLBIS = 1,
+    CDAT_TYPE_DSMSCIS = 2,
+    CDAT_TYPE_DSIS = 3,
+    CDAT_TYPE_DSEMTS = 4,
+    CDAT_TYPE_SSLBIS = 5,
+} CDATType;
+
+typedef struct CDATSubHeader {
+    uint8_t type;
+    uint8_t reserved;
+    uint16_t length;
+} CDATSubHeader;
+
+/* Device Scoped Memory Affinity Structure - CDAT Table 3 */
+typedef struct CDATDsmas {
+    CDATSubHeader header;
+    uint8_t DSMADhandle;
+    uint8_t flags;
+#define CDAT_DSMAS_FLAG_NV              (1 << 2)
+#define CDAT_DSMAS_FLAG_SHAREABLE       (1 << 3)
+#define CDAT_DSMAS_FLAG_HW_COHERENT     (1 << 4)
+#define CDAT_DSMAS_FLAG_DYNAMIC_CAP     (1 << 5)
+    uint16_t reserved;
+    uint64_t DPA_base;
+    uint64_t DPA_length;
+} QEMU_PACKED CDATDsmas;
+
+/* Device Scoped Latency and Bandwidth Information Structure - CDAT Table 5 */
+typedef struct CDATDslbis {
+    CDATSubHeader header;
+    uint8_t handle;
+    /* Definitions of these fields refer directly to HMAT fields */
+    uint8_t flags;
+    uint8_t data_type;
+    uint8_t reserved;
+    uint64_t entry_base_unit;
+    uint16_t entry[3];
+    uint16_t reserved2;
+} QEMU_PACKED CDATDslbis;
+
+/* Device Scoped Memory Side Cache Information Structure - CDAT Table 6 */
+typedef struct CDATDsmscis {
+    CDATSubHeader header;
+    uint8_t DSMAS_handle;
+    uint8_t reserved[3];
+    uint64_t memory_side_cache_size;
+    uint32_t cache_attributes;
+} QEMU_PACKED CDATDsmscis;
+
+/* Device Scoped Initiator Structure - CDAT Table 7 */
+typedef struct CDATDsis {
+    CDATSubHeader header;
+    uint8_t flags;
+    uint8_t handle;
+    uint16_t reserved;
+} QEMU_PACKED CDATDsis;
+
+/* Device Scoped EFI Memory Type Structure - CDAT Table 8 */
+typedef struct CDATDsemts {
+    CDATSubHeader header;
+    uint8_t DSMAS_handle;
+    uint8_t EFI_memory_type_attr;
+    uint16_t reserved;
+    uint64_t DPA_offset;
+    uint64_t DPA_length;
+} QEMU_PACKED CDATDsemts;
+
+/* Switch Scoped Latency and Bandwidth Information Structure - CDAT Table 9 */
+typedef struct CDATSslbisHeader {
+    CDATSubHeader header;
+    uint8_t data_type;
+    uint8_t reserved[3];
+    uint64_t entry_base_unit;
+} QEMU_PACKED CDATSslbisHeader;
+
+/* Switch Scoped Latency and Bandwidth Entry - CDAT Table 10 */
+typedef struct CDATSslbe {
+    uint16_t port_x_id;
+    uint16_t port_y_id;
+    uint16_t latency_bandwidth;
+    uint16_t reserved;
+} QEMU_PACKED CDATSslbe;
+
+typedef struct CDATSslbis {
+    CDATSslbisHeader sslbis_header;
+    CDATSslbe sslbe[];
+} CDATSslbis;
+
+typedef struct CDATEntry {
+    void *base;
+    uint32_t length;
+} CDATEntry;
+
+typedef struct CDATObject {
+    CDATEntry *entry;
+    int entry_len;
+
+    int (*build_cdat_table)(CDATSubHeader ***cdat_table, void *priv);
+    void (*free_cdat_table)(CDATSubHeader **, int num, void *priv);
+    bool to_update;
+    void *private;
+    char *filename;
+    uint8_t *buf;
+    struct CDATSubHeader **built_buf;
+    int built_buf_len;
+} CDATObject;
+#endif /* CXL_CDAT_H */
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 94ec2f07d7..34075cfb72 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -19,6 +19,7 @@
 #include "qemu/range.h"
 #include "qemu/typedefs.h"
 #include "hw/register.h"
+#include "qapi/error.h"
 
 enum reg_type {
     CXL2_DEVICE,
@@ -184,6 +185,8 @@ typedef struct cxl_component {
             struct PCIDevice *pdev;
         };
     };
+
+    CDATObject cdat;
 } CXLComponentState;
 
 void cxl_component_register_block_init(Object *obj,
@@ -220,4 +223,8 @@ static inline hwaddr cxl_decode_ig(int ig)
 
 CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
 
+void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp);
+void cxl_doe_cdat_release(CXLComponentState *cxl_cstate);
+void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp);
+
 #endif
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e4d221cdb3..449b0edfe9 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -243,6 +243,9 @@ struct CXLType3Dev {
     AddressSpace hostmem_as;
     CXLComponentState cxl_cstate;
     CXLDeviceState cxl_dstate;
+
+    /* DOE */
+    DOECap doe_cdat;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
index 01cf002096..3cb79eca1e 100644
--- a/include/hw/cxl/cxl_pci.h
+++ b/include/hw/cxl/cxl_pci.h
@@ -13,6 +13,7 @@
 #include "qemu/compiler.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pcie.h"
+#include "hw/cxl/cxl_cdat.h"
 
 #define CXL_VENDOR_ID 0x1e98
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
@ 2022-10-07 15:21   ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron via @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

Add the Data Object Exchange (DOE) implementation of the CXL Coherent
Device Attribute Table (CDAT). The implementation follows "Coherent
Device Attribute Table Specification, Rev. 1.02, Oct. 2020" and "Compute
Express Link Specification, Rev. 2.0, Oct. 2020".

This patch adds core support that will be shared by both
endpoint and switch port emulation.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
Changes since RFC:
- Split out library code from specific device.
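
For anyone preparing a binary table for testing, the layout and checksum
rule the library relies on can be sketched as below. This is a minimal,
standalone illustration and not code from the patch: the helper names
(byte_sum, cdat_finalize, cdat_count_entries, cdat_sketch_selftest) are
hypothetical, the offsets mirror the packed CDATTableHeader and
CDATSubHeader definitions, and little-endian field encoding is assumed.
The checksum is taken over every byte of the table so that the total,
including the checksum byte, is zero modulo 256.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets mirror the packed CDATTableHeader / CDATSubHeader layouts. */
#define CDAT_HDR_LEN   16  /* length(4) rev(1) checksum(1) reserved(6) sequence(4) */
#define CDAT_CSUM_OFF  5
#define CDAT_SUB_LEN   4   /* type(1) reserved(1) length(2) */

/* Sum of all bytes, modulo 256. */
static uint8_t byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Fill in length, revision and checksum (little-endian assumed). */
static void cdat_finalize(uint8_t *table, uint32_t total_len)
{
    table[0] = total_len & 0xff;
    table[1] = (total_len >> 8) & 0xff;
    table[2] = (total_len >> 16) & 0xff;
    table[3] = (total_len >> 24) & 0xff;
    table[4] = 1;                       /* revision, as CXL_CDAT_REV */
    table[CDAT_CSUM_OFF] = 0;           /* excluded while summing */
    table[CDAT_CSUM_OFF] = (uint8_t)(0u - byte_sum(table, total_len));
}

/* Walk the sub-headers; return the structure count, or -1 if malformed. */
static int cdat_count_entries(const uint8_t *table, size_t len)
{
    size_t off = CDAT_HDR_LEN;
    int count = 0;

    if (len < CDAT_HDR_LEN) {
        return -1;
    }
    while (off < len) {
        size_t sub_len;

        if (len - off < CDAT_SUB_LEN) {
            return -1;
        }
        sub_len = table[off + 2] | ((size_t)table[off + 3] << 8);
        if (sub_len < CDAT_SUB_LEN || sub_len > len - off) {
            return -1;
        }
        off += sub_len;
        count++;
    }
    return count;
}

/* Hypothetical self-check: header plus one 8-byte DSIS structure. */
static int cdat_sketch_selftest(void)
{
    uint8_t table[CDAT_HDR_LEN + 8] = { 0 };
    uint8_t *dsis = table + CDAT_HDR_LEN;

    dsis[0] = 3;    /* CDAT_TYPE_DSIS */
    dsis[2] = 8;    /* sub-header length, low byte */
    cdat_finalize(table, sizeof(table));

    if (byte_sum(table, sizeof(table)) != 0) {
        return -1;
    }
    return cdat_count_entries(table, sizeof(table)) == 1 ? 0 : -1;
}
```

Summing the raw bytes rather than the field values keeps the result
correct once the table length exceeds 255 bytes.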
---
 hw/cxl/cxl-cdat.c              | 222 +++++++++++++++++++++++++++++++++
 hw/cxl/meson.build             |   1 +
 include/hw/cxl/cxl_cdat.h      | 165 ++++++++++++++++++++++++
 include/hw/cxl/cxl_component.h |   7 ++
 include/hw/cxl/cxl_device.h    |   3 +
 include/hw/cxl/cxl_pci.h       |   1 +
 6 files changed, 399 insertions(+)

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
new file mode 100644
index 0000000000..137178632b
--- /dev/null
+++ b/hw/cxl/cxl-cdat.c
@@ -0,0 +1,222 @@
+/*
+ * CXL CDAT Structure
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+
+static void cdat_len_check(CDATSubHeader *hdr, Error **errp)
+{
+    assert(hdr->length);
+    assert(hdr->reserved == 0);
+
+    switch (hdr->type) {
+    case CDAT_TYPE_DSMAS:
+        assert(hdr->length == sizeof(CDATDsmas));
+        break;
+    case CDAT_TYPE_DSLBIS:
+        assert(hdr->length == sizeof(CDATDslbis));
+        break;
+    case CDAT_TYPE_DSMSCIS:
+        assert(hdr->length == sizeof(CDATDsmscis));
+        break;
+    case CDAT_TYPE_DSIS:
+        assert(hdr->length == sizeof(CDATDsis));
+        break;
+    case CDAT_TYPE_DSEMTS:
+        assert(hdr->length == sizeof(CDATDsemts));
+        break;
+    case CDAT_TYPE_SSLBIS:
+        assert(hdr->length >= sizeof(CDATSslbisHeader));
+        assert((hdr->length - sizeof(CDATSslbisHeader)) %
+               sizeof(CDATSslbe) == 0);
+        break;
+    default:
+        error_setg(errp, "Type %d is reserved", hdr->type);
+    }
+}
+
+static void ct3_build_cdat(CDATObject *cdat, Error **errp)
+{
+    g_autofree CDATTableHeader *cdat_header = NULL;
+    g_autofree CDATEntry *cdat_st = NULL;
+    uint8_t sum = 0;
+    int ent, i;
+
+    /* No CDAT file was given, so build the default table */
+    assert(cdat->build_cdat_table);
+
+    cdat_header = g_malloc0(sizeof(*cdat_header));
+    if (!cdat_header) {
+        error_setg(errp, "Failed to allocate CDAT header");
+        return;
+    }
+
+    cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf, cdat->private);
+
+    if (!cdat->built_buf_len) {
+        /* Build later as not all data available yet */
+        cdat->to_update = true;
+        return;
+    }
+    cdat->to_update = false;
+
+    cdat_st = g_malloc0(sizeof(*cdat_st) * (cdat->built_buf_len + 1));
+    if (!cdat_st) {
+        error_setg(errp, "Failed to allocate CDAT entry array");
+        return;
+    }
+
+    /* Entry 0 for CDAT header, starts with Entry 1 */
+    for (ent = 1; ent < cdat->built_buf_len + 1; ent++) {
+        CDATSubHeader *hdr = cdat->built_buf[ent - 1];
+        uint8_t *buf = (uint8_t *)cdat->built_buf[ent - 1];
+
+        cdat_st[ent].base = hdr;
+        cdat_st[ent].length = hdr->length;
+
+        cdat_header->length += hdr->length;
+        for (i = 0; i < hdr->length; i++) {
+            sum += buf[i];
+        }
+    }
+
+    /* CDAT header */
+    cdat_header->revision = CXL_CDAT_REV;
+    /* For now, no runtime updates */
+    cdat_header->sequence = 0;
+    cdat_header->length += sizeof(CDATTableHeader);
+    sum += cdat_header->revision + cdat_header->sequence +
+        cdat_header->length;
+    /* Sum of all bytes including checksum must be 0 */
+    cdat_header->checksum = ~sum + 1;
+
+    cdat_st[0].base = g_steal_pointer(&cdat_header);
+    cdat_st[0].length = sizeof(*cdat_header);
+    cdat->entry_len = 1 + cdat->built_buf_len;
+    cdat->entry = g_steal_pointer(&cdat_st);
+}
+
+static void ct3_load_cdat(CDATObject *cdat, Error **errp)
+{
+    g_autofree CDATEntry *cdat_st = NULL;
+    uint8_t sum = 0;
+    int num_ent;
+    int i = 0, ent = 1, file_size = 0;
+    CDATSubHeader *hdr;
+    FILE *fp = NULL;
+
+    /* Read CDAT file and create its cache */
+    fp = fopen(cdat->filename, "r");
+    if (!fp) {
+        error_setg(errp, "CDAT: Unable to open file");
+        return;
+    }
+
+    fseek(fp, 0, SEEK_END);
+    file_size = ftell(fp);
+    fseek(fp, 0, SEEK_SET);
+    cdat->buf = g_malloc0(file_size);
+
+    if (fread(cdat->buf, file_size, 1, fp) == 0) {
+        error_setg(errp, "CDAT: File read failed");
+        return;
+    }
+
+    fclose(fp);
+
+    if (file_size < sizeof(CDATTableHeader)) {
+        error_setg(errp, "CDAT: File too short");
+        return;
+    }
+    i = sizeof(CDATTableHeader);
+    num_ent = 1;
+    while (i < file_size) {
+        hdr = (CDATSubHeader *)(cdat->buf + i);
+        cdat_len_check(hdr, errp);
+        i += hdr->length;
+        num_ent++;
+    }
+    if (i != file_size) {
+        error_setg(errp, "CDAT: File length mismatch");
+        return;
+    }
+
+    cdat_st = g_malloc0(sizeof(*cdat_st) * num_ent);
+    if (!cdat_st) {
+        error_setg(errp, "CDAT: Failed to allocate entry array");
+        return;
+    }
+
+    /* Set CDAT header, Entry = 0 */
+    cdat_st[0].base = cdat->buf;
+    cdat_st[0].length = sizeof(CDATTableHeader);
+    i = 0;
+
+    while (i < cdat_st[0].length) {
+        sum += cdat->buf[i++];
+    }
+
+    /* Read CDAT structures */
+    while (i < file_size) {
+        hdr = (CDATSubHeader *)(cdat->buf + i);
+        cdat_len_check(hdr, errp);
+
+        cdat_st[ent].base = hdr;
+        cdat_st[ent].length = hdr->length;
+
+        while (cdat->buf + i <
+               (uint8_t *)cdat_st[ent].base + cdat_st[ent].length) {
+            assert(i < file_size);
+            sum += cdat->buf[i++];
+        }
+
+        ent++;
+    }
+
+    if (sum != 0) {
+        warn_report("CDAT: Found checksum mismatch in %s", cdat->filename);
+    }
+    cdat->entry_len = num_ent;
+    cdat->entry = g_steal_pointer(&cdat_st);
+}
+
+void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    if (cdat->filename) {
+        ct3_load_cdat(cdat, errp);
+    } else {
+        ct3_build_cdat(cdat, errp);
+    }
+}
+
+void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    if (cdat->to_update) {
+        ct3_build_cdat(cdat, errp);
+    }
+}
+
+void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    g_free(cdat->entry);
+    if (cdat->built_buf)
+        cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
+                              cdat->private);
+    if (cdat->buf)
+        g_free(cdat->buf);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index f117b99949..cfa95ffd40 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -4,6 +4,7 @@ softmmu_ss.add(when: 'CONFIG_CXL',
                    'cxl-device-utils.c',
                    'cxl-mailbox-utils.c',
                    'cxl-host.c',
+                   'cxl-cdat.c',
                ),
                if_false: files(
                    'cxl-host-stubs.c',
diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
new file mode 100644
index 0000000000..fdb1fa98f4
--- /dev/null
+++ b/include/hw/cxl/cxl_cdat.h
@@ -0,0 +1,165 @@
+/*
+ * CXL CDAT Structure
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_CDAT_H
+#define CXL_CDAT_H
+
+#include "hw/cxl/cxl_pci.h"
+
+/*
+ * Reference:
+ *   Coherent Device Attribute Table (CDAT) Specification, Rev. 1.02, Oct. 2020
+ *   Compute Express Link (CXL) Specification, Rev. 2.0, Oct. 2020
+ */
+/* Table Access DOE - CXL 8.1.11 */
+#define CXL_DOE_TABLE_ACCESS      2
+#define CXL_DOE_PROTOCOL_CDAT     ((CXL_DOE_TABLE_ACCESS << 16) | CXL_VENDOR_ID)
+
+/* Read Entry - CXL 8.1.11.1 */
+#define CXL_DOE_TAB_TYPE_CDAT 0
+#define CXL_DOE_TAB_ENT_MAX 0xFFFF
+
+/* Read Entry Request - CXL 8.1.11.1 Table 134 */
+#define CXL_DOE_TAB_REQ 0
+typedef struct CDATReq {
+    DOEHeader header;
+    uint8_t req_code;
+    uint8_t table_type;
+    uint16_t entry_handle;
+} QEMU_PACKED CDATReq;
+
+/* Read Entry Response - CXL 8.1.11.1 Table 135 */
+#define CXL_DOE_TAB_RSP 0
+typedef struct CDATRsp {
+    DOEHeader header;
+    uint8_t rsp_code;
+    uint8_t table_type;
+    uint16_t entry_handle;
+} QEMU_PACKED CDATRsp;
+
+/* CDAT Table Format - CDAT Table 1 */
+#define CXL_CDAT_REV 1
+typedef struct CDATTableHeader {
+    uint32_t length;
+    uint8_t revision;
+    uint8_t checksum;
+    uint8_t reserved[6];
+    uint32_t sequence;
+} QEMU_PACKED CDATTableHeader;
+
+/* CDAT Structure Types - CDAT Table 2 */
+typedef enum {
+    CDAT_TYPE_DSMAS = 0,
+    CDAT_TYPE_DSLBIS = 1,
+    CDAT_TYPE_DSMSCIS = 2,
+    CDAT_TYPE_DSIS = 3,
+    CDAT_TYPE_DSEMTS = 4,
+    CDAT_TYPE_SSLBIS = 5,
+} CDATType;
+
+typedef struct CDATSubHeader {
+    uint8_t type;
+    uint8_t reserved;
+    uint16_t length;
+} CDATSubHeader;
+
+/* Device Scoped Memory Affinity Structure - CDAT Table 3 */
+typedef struct CDATDsmas {
+    CDATSubHeader header;
+    uint8_t DSMADhandle;
+    uint8_t flags;
+#define CDAT_DSMAS_FLAG_NV              (1 << 2)
+#define CDAT_DSMAS_FLAG_SHAREABLE       (1 << 3)
+#define CDAT_DSMAS_FLAG_HW_COHERENT     (1 << 4)
+#define CDAT_DSMAS_FLAG_DYNAMIC_CAP     (1 << 5)
+    uint16_t reserved;
+    uint64_t DPA_base;
+    uint64_t DPA_length;
+} QEMU_PACKED CDATDsmas;
+
+/* Device Scoped Latency and Bandwidth Information Structure - CDAT Table 5 */
+typedef struct CDATDslbis {
+    CDATSubHeader header;
+    uint8_t handle;
+    /* Definitions of these fields refer directly to HMAT fields */
+    uint8_t flags;
+    uint8_t data_type;
+    uint8_t reserved;
+    uint64_t entry_base_unit;
+    uint16_t entry[3];
+    uint16_t reserved2;
+} QEMU_PACKED CDATDslbis;
+
+/* Device Scoped Memory Side Cache Information Structure - CDAT Table 6 */
+typedef struct CDATDsmscis {
+    CDATSubHeader header;
+    uint8_t DSMAS_handle;
+    uint8_t reserved[3];
+    uint64_t memory_side_cache_size;
+    uint32_t cache_attributes;
+} QEMU_PACKED CDATDsmscis;
+
+/* Device Scoped Initiator Structure - CDAT Table 7 */
+typedef struct CDATDsis {
+    CDATSubHeader header;
+    uint8_t flags;
+    uint8_t handle;
+    uint16_t reserved;
+} QEMU_PACKED CDATDsis;
+
+/* Device Scoped EFI Memory Type Structure - CDAT Table 8 */
+typedef struct CDATDsemts {
+    CDATSubHeader header;
+    uint8_t DSMAS_handle;
+    uint8_t EFI_memory_type_attr;
+    uint16_t reserved;
+    uint64_t DPA_offset;
+    uint64_t DPA_length;
+} QEMU_PACKED CDATDsemts;
+
+/* Switch Scoped Latency and Bandwidth Information Structure - CDAT Table 9 */
+typedef struct CDATSslbisHeader {
+    CDATSubHeader header;
+    uint8_t data_type;
+    uint8_t reserved[3];
+    uint64_t entry_base_unit;
+} QEMU_PACKED CDATSslbisHeader;
+
+/* Switch Scoped Latency and Bandwidth Entry - CDAT Table 10 */
+typedef struct CDATSslbe {
+    uint16_t port_x_id;
+    uint16_t port_y_id;
+    uint16_t latency_bandwidth;
+    uint16_t reserved;
+} QEMU_PACKED CDATSslbe;
+
+typedef struct CDATSslbis {
+    CDATSslbisHeader sslbis_header;
+    CDATSslbe sslbe[];
+} CDATSslbis;
+
+typedef struct CDATEntry {
+    void *base;
+    uint32_t length;
+} CDATEntry;
+
+typedef struct CDATObject {
+    CDATEntry *entry;
+    int entry_len;
+
+    int (*build_cdat_table)(CDATSubHeader ***cdat_table, void *priv);
+    void (*free_cdat_table)(CDATSubHeader **, int num, void *priv);
+    bool to_update;
+    void *private;
+    char *filename;
+    uint8_t *buf;
+    struct CDATSubHeader **built_buf;
+    int built_buf_len;
+} CDATObject;
+#endif /* CXL_CDAT_H */
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 94ec2f07d7..34075cfb72 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -19,6 +19,7 @@
 #include "qemu/range.h"
 #include "qemu/typedefs.h"
 #include "hw/register.h"
+#include "qapi/error.h"
 
 enum reg_type {
     CXL2_DEVICE,
@@ -184,6 +185,8 @@ typedef struct cxl_component {
             struct PCIDevice *pdev;
         };
     };
+
+    CDATObject cdat;
 } CXLComponentState;
 
 void cxl_component_register_block_init(Object *obj,
@@ -220,4 +223,8 @@ static inline hwaddr cxl_decode_ig(int ig)
 
 CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
 
+void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp);
+void cxl_doe_cdat_release(CXLComponentState *cxl_cstate);
+void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp);
+
 #endif
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e4d221cdb3..449b0edfe9 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -243,6 +243,9 @@ struct CXLType3Dev {
     AddressSpace hostmem_as;
     CXLComponentState cxl_cstate;
     CXLDeviceState cxl_dstate;
+
+    /* DOE */
+    DOECap doe_cdat;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
index 01cf002096..3cb79eca1e 100644
--- a/include/hw/cxl/cxl_pci.h
+++ b/include/hw/cxl/cxl_pci.h
@@ -13,6 +13,7 @@
 #include "qemu/compiler.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pcie.h"
+#include "hw/cxl/cxl_cdat.h"
 
 #define CXL_VENDOR_ID 0x1e98
 
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-07 15:21 ` Jonathan Cameron via
@ 2022-10-07 15:21   ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
to the "-device cxl-type3" command line option. The file must provide
the whole CDAT table in binary form. The other is to use the default,
which provides some 'reasonable' numbers based on the type and size of
the memory.

The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c at
capability offset 0x190. The OS can issue config reads and writes to
this capability range to request the CDAT data.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
Changes since RFC:
- Break out type 3 user of library as separate patch.
- Change reported data for default to be based on the options provided
  for the type 3 device.
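
The way a requester is expected to walk the table through the DOE
mailbox can be sketched as below. This is a standalone illustration
under stated assumptions, not the device code: struct entry mirrors
CDATEntry, and read_entry, walk_table and rsp_len_dw are hypothetical
names. As in the response handler, each reply carries the handle of the
next entry and 0xFFFF marks the last one. The range check on the
requested handle is an addition made by this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define TAB_ENT_MAX 0xFFFF  /* mirrors CXL_DOE_TAB_ENT_MAX */
#define DWORD_BYTE  4

/* Mirrors CDATEntry: one table header or one CDAT structure. */
struct entry {
    const void *base;
    uint32_t length;
};

/*
 * Hypothetical responder: look up one entry and report the next handle.
 * Returns 0 on success, -1 if the handle is out of range.
 */
static int read_entry(const struct entry *tab, int n, uint16_t handle,
                      const struct entry **out, uint16_t *next)
{
    if (handle >= n) {
        return -1;
    }
    *out = &tab[handle];
    *next = (handle < n - 1) ? handle + 1 : TAB_ENT_MAX;
    return 0;
}

/*
 * Requester loop: start at handle 0 and follow the returned handles
 * until the terminator. Returns the number of entries read, or -1.
 */
static int walk_table(const struct entry *tab, int n, uint32_t *total_bytes)
{
    uint16_t handle = 0;
    int count = 0;

    *total_bytes = 0;
    while (handle != TAB_ENT_MAX) {
        const struct entry *e;
        uint16_t next;

        if (read_entry(tab, n, handle, &e, &next) < 0) {
            return -1;
        }
        *total_bytes += e->length;
        count++;
        handle = next;
    }
    return count;
}

/* DOE object length is expressed in DWORDs, rounded up. */
static uint32_t rsp_len_dw(uint32_t hdr_bytes, uint32_t payload_bytes)
{
    return (hdr_bytes + payload_bytes + DWORD_BYTE - 1) / DWORD_BYTE;
}
```

With the default table, entry 0 is the CDAT header and entries 1
onwards are the DSMAS, DSLBIS and DSEMTS structures in that order.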
---
 hw/mem/cxl_type3.c | 227 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 568c9d62f5..3fa5d70662 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -12,9 +12,218 @@
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
+#include "sysemu/numa.h"
 #include "hw/cxl/cxl.h"
 #include "hw/pci/msix.h"
 
+#define DWORD_BYTE 4
+
+static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
+                                void *priv)
+{
+    g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
+    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
+    g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
+    CXLType3Dev *ct3d = priv;
+    int len = 0;
+    int i = 0;
+    int next_dsmad_handle = 0;
+    int nonvolatile_dsmad = -1;
+    int dslbis_nonvolatile_num = 4;
+    MemoryRegion *mr;
+
+    /* Non volatile aspects */
+    if (ct3d->hostmem) {
+        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
+        if (!dsmas_nonvolatile) {
+            return -ENOMEM;
+        }
+        nonvolatile_dsmad = next_dsmad_handle++;
+        mr = host_memory_backend_get_memory(ct3d->hostmem);
+        if (!mr) {
+            return -EINVAL;
+        }
+        *dsmas_nonvolatile = (CDATDsmas) {
+            .header = {
+                .type = CDAT_TYPE_DSMAS,
+                .length = sizeof(*dsmas_nonvolatile),
+            },
+            .DSMADhandle = nonvolatile_dsmad,
+            .flags = CDAT_DSMAS_FLAG_NV,
+            .DPA_base = 0,
+            .DPA_length = int128_get64(mr->size),
+        };
+        len++;
+
+        /* For now, no memory side cache, plausiblish numbers */
+        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
+        if (!dslbis_nonvolatile)
+            return -ENOMEM;
+
+        dslbis_nonvolatile[0] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_READ_LATENCY,
+            .entry_base_unit = 10000, /* 10ns base */
+            .entry[0] = 15, /* 150ns */
+        };
+        len++;
+
+        dslbis_nonvolatile[1] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
+            .entry_base_unit = 10000,
+            .entry[0] = 25, /* 250ns */
+        };
+        len++;
+
+        dslbis_nonvolatile[2] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
+            .entry_base_unit = 1000, /* GB/s */
+            .entry[0] = 16,
+        };
+        len++;
+
+        dslbis_nonvolatile[3] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
+            .entry_base_unit = 1000, /* GB/s */
+            .entry[0] = 16,
+        };
+        len++;
+
+        mr = host_memory_backend_get_memory(ct3d->hostmem);
+        if (!mr) {
+            return -EINVAL;
+        }
+        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
+        *dsemts_nonvolatile = (CDATDsemts) {
+            .header = {
+                .type = CDAT_TYPE_DSEMTS,
+                .length = sizeof(*dsemts_nonvolatile),
+            },
+            .DSMAS_handle = nonvolatile_dsmad,
+            .EFI_memory_type_attr = 2, /* Reserved - the non volatile from DSMAS matters */
+            .DPA_offset = 0,
+            .DPA_length = int128_get64(mr->size),
+        };
+        len++;
+    }
+
+    *cdat_table = g_malloc0(len * sizeof(*cdat_table));
+    /* Header always at start of structure */
+    if (dsmas_nonvolatile) {
+        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
+    }
+    if (dslbis_nonvolatile) {
+        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
+        int j;
+
+        for (j = 0; j < dslbis_nonvolatile_num; j++) {
+            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
+        }
+    }
+    if (dsemts_nonvolatile) {
+        (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
+    }
+
+    return len;
+}
+
+static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
+{
+    int i;
+
+    for (i = 0; i < num; i++) {
+        g_free(cdat_table[i]);
+    }
+    g_free(cdat_table);
+}
+
+static bool cxl_doe_cdat_rsp(DOECap *doe_cap)
+{
+    CDATObject *cdat = &CXL_TYPE3(doe_cap->pdev)->cxl_cstate.cdat;
+    uint16_t ent;
+    void *base;
+    uint32_t len;
+    CDATReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    CDATRsp rsp;
+
+    assert(cdat->entry_len);
+
+    /* Discard if request length mismatched */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(CDATReq), DWORD_BYTE)) {
+        return false;
+    }
+
+    ent = req->entry_handle;
+    base = cdat->entry[ent].base;
+    len = cdat->entry[ent].length;
+
+    rsp = (CDATRsp) {
+        .header = {
+            .vendor_id = CXL_VENDOR_ID,
+            .data_obj_type = CXL_DOE_TABLE_ACCESS,
+            .reserved = 0x0,
+            .length = DIV_ROUND_UP((sizeof(rsp) + len), DWORD_BYTE),
+        },
+        .rsp_code = CXL_DOE_TAB_RSP,
+        .table_type = CXL_DOE_TAB_TYPE_CDAT,
+        .entry_handle = (ent < cdat->entry_len - 1) ?
+                        ent + 1 : CXL_DOE_TAB_ENT_MAX,
+    };
+
+    memcpy(doe_cap->read_mbox, &rsp, sizeof(rsp));
+    memcpy(doe_cap->read_mbox + DIV_ROUND_UP(sizeof(rsp), DWORD_BYTE),
+           base, len);
+
+    doe_cap->read_mbox_len += rsp.header.length;
+
+    return true;
+}
+
+static uint32_t ct3d_config_read(PCIDevice *pci_dev, uint32_t addr, int size)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+    uint32_t val;
+
+    if (pcie_doe_read_config(&ct3d->doe_cdat, addr, size, &val)) {
+        return val;
+    }
+
+    return pci_default_read_config(pci_dev, addr, size);
+}
+
+static void ct3d_config_write(PCIDevice *pci_dev, uint32_t addr, uint32_t val,
+                              int size)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+
+    pcie_doe_write_config(&ct3d->doe_cdat, addr, val, size);
+    pci_default_write_config(pci_dev, addr, val, size);
+}
+
 /*
  * Null value of all Fs suggested by IEEE RA guidelines for use of
  * EU, OUI and CID
@@ -140,6 +349,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
     return true;
 }
 
+static DOEProtocol doe_cdat_prot[] = {
+    { CXL_VENDOR_ID, CXL_DOE_TABLE_ACCESS, cxl_doe_cdat_rsp },
+    { }
+};
+
 static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 {
     CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
@@ -189,6 +403,14 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     for (i = 0; i < msix_num; i++) {
         msix_vector_use(pci_dev, i);
     }
+
+    /* DOE Initialization */
+    pcie_doe_init(pci_dev, &ct3d->doe_cdat, 0x190, doe_cdat_prot, true, 0);
+
+    cxl_cstate->cdat.build_cdat_table = ct3_build_cdat_table;
+    cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
+    cxl_cstate->cdat.private = ct3d;
+    cxl_doe_cdat_init(cxl_cstate, errp);
 }
 
 static void ct3_exit(PCIDevice *pci_dev)
@@ -197,6 +419,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
     ComponentRegisters *regs = &cxl_cstate->crb;
 
+    cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
     address_space_destroy(&ct3d->hostmem_as);
 }
@@ -296,6 +519,7 @@ static Property ct3_props[] = {
     DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
+    DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -361,6 +585,9 @@ static void ct3_class_init(ObjectClass *oc, void *data)
     pc->device_id = 0xd93; /* LVF for now */
     pc->revision = 1;
 
+    pc->config_write = ct3d_config_write;
+    pc->config_read = ct3d_config_read;
+
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
     dc->desc = "CXL PMEM Device (Type 3)";
     dc->reset = ct3d_reset;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
@ 2022-10-07 15:21   ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron via @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
to the "-device cxl-type3" command line option. The file must provide
the whole CDAT table in binary form. The other is to use the default,
which provides some 'reasonable' numbers based on the type and size of
the memory.

The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c at
capability offset 0x190. The OS can issue config reads and writes to
this capability range to request the CDAT data.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
Changes since RFC:
- Break out type 3 user of library as separate patch.
- Change reported data for default to be based on the options provided
  for the type 3 device.
---
 hw/mem/cxl_type3.c | 227 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 568c9d62f5..3fa5d70662 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -12,9 +12,218 @@
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
+#include "sysemu/numa.h"
 #include "hw/cxl/cxl.h"
 #include "hw/pci/msix.h"
 
+#define DWORD_BYTE 4
+
+static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
+                                void *priv)
+{
+    g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
+    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
+    g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
+    CXLType3Dev *ct3d = priv;
+    int len = 0;
+    int i = 0;
+    int next_dsmad_handle = 0;
+    int nonvolatile_dsmad = -1;
+    int dslbis_nonvolatile_num = 4;
+    MemoryRegion *mr;
+
+    /* Non volatile aspects */
+    if (ct3d->hostmem) {
+        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
+        if (!dsmas_nonvolatile) {
+            return -ENOMEM;
+        }
+        nonvolatile_dsmad = next_dsmad_handle++;
+        mr = host_memory_backend_get_memory(ct3d->hostmem);
+        if (!mr) {
+            return -EINVAL;
+        }
+        *dsmas_nonvolatile = (CDATDsmas) {
+            .header = {
+                .type = CDAT_TYPE_DSMAS,
+                .length = sizeof(*dsmas_nonvolatile),
+            },
+            .DSMADhandle = nonvolatile_dsmad,
+            .flags = CDAT_DSMAS_FLAG_NV,
+            .DPA_base = 0,
+            .DPA_length = int128_get64(mr->size),
+        };
+        len++;
+
+        /* For now, no memory side cache, plausiblish numbers */
+        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
+        if (!dslbis_nonvolatile)
+            return -ENOMEM;
+
+        dslbis_nonvolatile[0] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_READ_LATENCY,
+            .entry_base_unit = 10000, /* 10ns base */
+            .entry[0] = 15, /* 150ns */
+        };
+        len++;
+
+        dslbis_nonvolatile[1] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
+            .entry_base_unit = 10000,
+            .entry[0] = 25, /* 250ns */
+        };
+        len++;
+
+        dslbis_nonvolatile[2] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
+            .entry_base_unit = 1000, /* GB/s */
+            .entry[0] = 16,
+        };
+        len++;
+
+        dslbis_nonvolatile[3] = (CDATDslbis) {
+            .header = {
+                .type = CDAT_TYPE_DSLBIS,
+                .length = sizeof(*dslbis_nonvolatile),
+            },
+            .handle = nonvolatile_dsmad,
+            .flags = HMAT_LB_MEM_MEMORY,
+            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
+            .entry_base_unit = 1000, /* GB/s */
+            .entry[0] = 16,
+        };
+        len++;
+
+        mr = host_memory_backend_get_memory(ct3d->hostmem);
+        if (!mr) {
+            return -EINVAL;
+        }
+        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
+        *dsemts_nonvolatile = (CDATDsemts) {
+            .header = {
+                .type = CDAT_TYPE_DSEMTS,
+                .length = sizeof(*dsemts_nonvolatile),
+            },
+            .DSMAS_handle = nonvolatile_dsmad,
+            .EFI_memory_type_attr = 2, /* Reserved - the non volatile from DSMAS matters */
+            .DPA_offset = 0,
+            .DPA_length = int128_get64(mr->size),
+        };
+        len++;
+    }
+
+    *cdat_table = g_malloc0(len * sizeof(*cdat_table));
+    /* Header always at start of structure */
+    if (dsmas_nonvolatile) {
+        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
+    }
+    if (dslbis_nonvolatile) {
+        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
+        int j;
+
+        for (j = 0; j < dslbis_nonvolatile_num; j++) {
+            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
+        }
+    }
+    if (dsemts_nonvolatile) {
+        (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
+    }
+
+    return len;
+}
+
+static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
+{
+    int i;
+
+    for (i = 0; i < num; i++) {
+        g_free(cdat_table[i]);
+    }
+    g_free(cdat_table);
+}
+
+static bool cxl_doe_cdat_rsp(DOECap *doe_cap)
+{
+    CDATObject *cdat = &CXL_TYPE3(doe_cap->pdev)->cxl_cstate.cdat;
+    uint16_t ent;
+    void *base;
+    uint32_t len;
+    CDATReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    CDATRsp rsp;
+
+    assert(cdat->entry_len);
+
+    /* Discard if request length mismatched */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(CDATReq), DWORD_BYTE)) {
+        return false;
+    }
+
+    ent = req->entry_handle;
+    base = cdat->entry[ent].base;
+    len = cdat->entry[ent].length;
+
+    rsp = (CDATRsp) {
+        .header = {
+            .vendor_id = CXL_VENDOR_ID,
+            .data_obj_type = CXL_DOE_TABLE_ACCESS,
+            .reserved = 0x0,
+            .length = DIV_ROUND_UP((sizeof(rsp) + len), DWORD_BYTE),
+        },
+        .rsp_code = CXL_DOE_TAB_RSP,
+        .table_type = CXL_DOE_TAB_TYPE_CDAT,
+        .entry_handle = (ent < cdat->entry_len - 1) ?
+                        ent + 1 : CXL_DOE_TAB_ENT_MAX,
+    };
+
+    memcpy(doe_cap->read_mbox, &rsp, sizeof(rsp));
+    memcpy(doe_cap->read_mbox + DIV_ROUND_UP(sizeof(rsp), DWORD_BYTE),
+           base, len);
+
+    doe_cap->read_mbox_len += rsp.header.length;
+
+    return true;
+}
+
+static uint32_t ct3d_config_read(PCIDevice *pci_dev, uint32_t addr, int size)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+    uint32_t val;
+
+    if (pcie_doe_read_config(&ct3d->doe_cdat, addr, size, &val)) {
+        return val;
+    }
+
+    return pci_default_read_config(pci_dev, addr, size);
+}
+
+static void ct3d_config_write(PCIDevice *pci_dev, uint32_t addr, uint32_t val,
+                              int size)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+
+    pcie_doe_write_config(&ct3d->doe_cdat, addr, val, size);
+    pci_default_write_config(pci_dev, addr, val, size);
+}
+
 /*
  * Null value of all Fs suggested by IEEE RA guidelines for use of
  * EU, OUI and CID
@@ -140,6 +349,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
     return true;
 }
 
+static DOEProtocol doe_cdat_prot[] = {
+    { CXL_VENDOR_ID, CXL_DOE_TABLE_ACCESS, cxl_doe_cdat_rsp },
+    { }
+};
+
 static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 {
     CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
@@ -189,6 +403,14 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     for (i = 0; i < msix_num; i++) {
         msix_vector_use(pci_dev, i);
     }
+
+    /* DOE Initialization */
+    pcie_doe_init(pci_dev, &ct3d->doe_cdat, 0x190, doe_cdat_prot, true, 0);
+
+    cxl_cstate->cdat.build_cdat_table = ct3_build_cdat_table;
+    cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
+    cxl_cstate->cdat.private = ct3d;
+    cxl_doe_cdat_init(cxl_cstate, errp);
 }
 
 static void ct3_exit(PCIDevice *pci_dev)
@@ -197,6 +419,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
     ComponentRegisters *regs = &cxl_cstate->crb;
 
+    cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
     address_space_destroy(&ct3d->hostmem_as);
 }
@@ -296,6 +519,7 @@ static Property ct3_props[] = {
     DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
+    DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -361,6 +585,9 @@ static void ct3_class_init(ObjectClass *oc, void *data)
     pc->device_id = 0xd93; /* LVF for now */
     pc->revision = 1;
 
+    pc->config_write = ct3d_config_write;
+    pc->config_read = ct3d_config_read;
+
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
     dc->desc = "CXL PMEM Device (Type 3)";
     dc->reset = ct3d_reset;
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE
  2022-10-07 15:21 ` Jonathan Cameron via
@ 2022-10-07 15:21   ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-07 15:21 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: linuxarm, ira.weiny

This Data Object Exchange Mailbox allows software to query the
latency and bandwidth between ports on the switch. For now
only provide information on routes between the upstream port and
each downstream port (not p2p).

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
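A note on the SSLBIS numbers used below, as a minimal sketch rather than
anything taken from the patch. It assumes the HMAT-style convention that
latency base units are picoseconds and bandwidth base units are MB/s, so
each reported value is entry_base_unit * latency_bandwidth. The helper
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the patch.
 * Assumption: latency entry_base_unit is expressed in picoseconds.
 */
uint64_t sslbis_latency_ns(uint64_t entry_base_unit, uint16_t entry)
{
    /* base unit (ps) * entry value, converted from ps to ns */
    return entry_base_unit * entry / 1000;
}

/* Assumption: bandwidth entry_base_unit is expressed in MB/s. */
uint64_t sslbis_bandwidth_mbps(uint64_t entry_base_unit, uint16_t entry)
{
    return entry_base_unit * entry;
}
```

With the defaults in build_cdat_table(), sslbis_latency_ns(10000, 15)
gives 150 and sslbis_bandwidth_mbps(1000, 16) gives 16000, which agrees
with the "150ns" and "16 GB/s" comments in the code.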
 hw/pci-bridge/cxl_upstream.c | 182 ++++++++++++++++++++++++++++++++++-
 include/hw/cxl/cxl_cdat.h    |   1 +
 2 files changed, 182 insertions(+), 1 deletion(-)

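For readers following cxl_doe_cdat_rsp() in the diff, the entry handle
chaining and the dword length rounding can be modelled in isolation.
This is an illustrative sketch only: TAB_ENT_MAX stands in for
CXL_DOE_TAB_ENT_MAX and its value of 0xFFFF is an assumption, and all
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TAB_ENT_MAX 0xFFFF /* assumed value of CXL_DOE_TAB_ENT_MAX */

/*
 * Handle reported in the response for entry 'ent' of a table with
 * 'entry_len' entries. Caller must ensure entry_len >= 1, as the
 * patch does with assert(cdat->entry_len).
 */
uint16_t cdat_next_handle(uint16_t ent, uint32_t entry_len)
{
    return (ent < entry_len - 1) ? ent + 1 : TAB_ENT_MAX;
}

/* Response length in dwords: header plus entry bytes, rounded up. */
uint32_t cdat_rsp_dwords(uint32_t rsp_hdr_bytes, uint32_t entry_bytes)
{
    return (rsp_hdr_bytes + entry_bytes + 3) / 4;
}

/* Number of requests needed to read every entry, starting at handle 0. */
uint32_t cdat_walk_count(uint32_t entry_len)
{
    uint16_t handle = 0;
    uint32_t reads = 0;

    do {
        reads++;
        handle = cdat_next_handle(handle, entry_len);
    } while (handle != TAB_ENT_MAX);

    return reads;
}
```

A requester starts at handle 0 and keeps issuing reads with the returned
handle until it sees the terminating value, so a table with three
entries takes three requests.
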
diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
index a83a3e81e4..9209c704ae 100644
--- a/hw/pci-bridge/cxl_upstream.c
+++ b/hw/pci-bridge/cxl_upstream.c
@@ -10,11 +10,12 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "hw/qdev-properties.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/pcie.h"
 #include "hw/pci/pcie_port.h"
 
-#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 1
+#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 2
 
 #define CXL_UPSTREAM_PORT_MSI_OFFSET 0x70
 #define CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET 0x90
@@ -28,6 +29,7 @@ typedef struct CXLUpstreamPort {
 
     /*< public >*/
     CXLComponentState cxl_cstate;
+    DOECap doe_cdat;
 } CXLUpstreamPort;
 
 CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp)
@@ -60,6 +62,9 @@ static void cxl_usp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
 static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
                                  uint32_t val, int len)
 {
+    CXLUpstreamPort *usp = CXL_USP(d);
+
+    pcie_doe_write_config(&usp->doe_cdat, address, val, len);
     pci_bridge_write_config(d, address, val, len);
     pcie_cap_flr_write_config(d, address, val, len);
     pcie_aer_write_config(d, address, val, len);
@@ -67,6 +72,18 @@ static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
     cxl_usp_dvsec_write_config(d, address, val, len);
 }
 
+static uint32_t cxl_usp_read_config(PCIDevice *d, uint32_t address, int len)
+{
+    CXLUpstreamPort *usp = CXL_USP(d);
+    uint32_t val;
+
+    if (pcie_doe_read_config(&usp->doe_cdat, address, len, &val)) {
+        return val;
+    }
+
+    return pci_default_read_config(d, address, len);
+}
+
 static void latch_registers(CXLUpstreamPort *usp)
 {
     uint32_t *reg_state = usp->cxl_cstate.crb.cache_mem_registers;
@@ -119,6 +136,155 @@ static void build_dvsecs(CXLComponentState *cxl)
                                REG_LOC_DVSEC_REVID, dvsec);
 }
 
+static bool cxl_doe_cdat_rsp(DOECap *doe_cap)
+{
+    CDATObject *cdat = &CXL_USP(doe_cap->pdev)->cxl_cstate.cdat;
+    uint16_t ent;
+    void *base;
+    uint32_t len;
+    CDATReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    CDATRsp rsp;
+
+    cxl_doe_cdat_update(&CXL_USP(doe_cap->pdev)->cxl_cstate, &error_fatal);
+    assert(cdat->entry_len);
+
+    /* Discard if request length mismatched */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(CDATReq), sizeof(uint32_t))) {
+        return false;
+    }
+
+    ent = req->entry_handle;
+    base = cdat->entry[ent].base;
+    len = cdat->entry[ent].length;
+
+    rsp = (CDATRsp) {
+        .header = {
+            .vendor_id = CXL_VENDOR_ID,
+            .data_obj_type = CXL_DOE_TABLE_ACCESS,
+            .reserved = 0x0,
+            .length = DIV_ROUND_UP((sizeof(rsp) + len), sizeof(uint32_t)),
+        },
+        .rsp_code = CXL_DOE_TAB_RSP,
+        .table_type = CXL_DOE_TAB_TYPE_CDAT,
+        .entry_handle = (ent < cdat->entry_len - 1) ?
+                        ent + 1 : CXL_DOE_TAB_ENT_MAX,
+    };
+
+    memcpy(doe_cap->read_mbox, &rsp, sizeof(rsp));
+    memcpy(doe_cap->read_mbox + DIV_ROUND_UP(sizeof(rsp), sizeof(uint32_t)),
+           base, len);
+
+    doe_cap->read_mbox_len += rsp.header.length;
+
+    return true;
+}
+
+static DOEProtocol doe_cdat_prot[] = {
+    { CXL_VENDOR_ID, CXL_DOE_TABLE_ACCESS, cxl_doe_cdat_rsp },
+    { }
+};
+
+static int build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
+{
+    g_autofree CDATSslbis *sslbis_latency = NULL;
+    g_autofree CDATSslbis *sslbis_bandwidth = NULL;
+    CXLUpstreamPort *us = CXL_USP(priv);
+    PCIBus *bus = &PCI_BRIDGE(us)->sec_bus;
+    int devfn, sslbis_size;
+    int len = 0;
+    int i = 0;
+    int count = 0;
+    uint16_t port_ids[256];
+
+    for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
+        PCIDevice *d = bus->devices[devfn];
+        PCIEPort *port;
+
+        if (!d || !pci_is_express(d) || !d->exp.exp_cap) {
+            continue;
+        }
+
+        /*
+         * Whilst the PCI express spec doesn't allow anything other than
+         * downstream ports on this bus, let us be a little paranoid
+         */
+        if (!object_dynamic_cast(OBJECT(d), TYPE_PCIE_PORT)) {
+            continue;
+        }
+
+        port = PCIE_PORT(d);
+        port_ids[count] = port->port;
+        count++;
+    }
+
+    /* May not yet have any ports - try again later */
+    if (count == 0) {
+        return 0;
+    }
+
+    sslbis_size = sizeof(CDATSslbis) + sizeof(*sslbis_latency->sslbe) * count;
+    sslbis_latency = g_malloc(sslbis_size);
+    *sslbis_latency = (CDATSslbis) {
+        .sslbis_header = {
+            .header = {
+                .type = CDAT_TYPE_SSLBIS,
+                .length = sslbis_size,
+            },
+            .data_type = HMATLB_DATA_TYPE_ACCESS_LATENCY,
+            .entry_base_unit = 10000,
+        },
+    };
+
+    for (i = 0; i < count; i++) {
+        sslbis_latency->sslbe[i] = (CDATSslbe) {
+            .port_x_id = CDAT_PORT_ID_USP,
+            .port_y_id = port_ids[i],
+            .latency_bandwidth = 15, /* 150ns */
+        };
+    }
+    len++;
+
+    sslbis_bandwidth = g_malloc(sslbis_size);
+    *sslbis_bandwidth = (CDATSslbis) {
+        .sslbis_header = {
+            .header = {
+                .type = CDAT_TYPE_SSLBIS,
+                .length = sslbis_size,
+            },
+            .data_type = HMATLB_DATA_TYPE_ACCESS_BANDWIDTH,
+            .entry_base_unit = 1000,
+        },
+    };
+
+    for (i = 0; i < count; i++) {
+        sslbis_bandwidth->sslbe[i] = (CDATSslbe) {
+            .port_x_id = CDAT_PORT_ID_USP,
+            .port_y_id = port_ids[i],
+            .latency_bandwidth = 16, /* 16 GB/s */
+        };
+    }
+    len++;
+    *cdat_table = g_malloc0(len * sizeof(**cdat_table));
+    /* Header always at start of structure */
+    i = 0;
+    (*cdat_table)[i++] = g_steal_pointer(&sslbis_latency);
+    (*cdat_table)[i++] = g_steal_pointer(&sslbis_bandwidth);
+
+    return len;
+}
+
+static void free_default_cdat_table(CDATSubHeader **cdat_table, int num,
+                                    void *priv)
+{
+    int i;
+
+    for (i = 0; i < num; i++) {
+        g_free(cdat_table[i]);
+    }
+    g_free(cdat_table);
+}
+
 static void cxl_usp_realize(PCIDevice *d, Error **errp)
 {
     PCIEPort *p = PCIE_PORT(d);
@@ -161,6 +327,13 @@ static void cxl_usp_realize(PCIDevice *d, Error **errp)
                      PCI_BASE_ADDRESS_MEM_TYPE_64,
                      component_bar);
 
+    pcie_doe_init(d, &usp->doe_cdat, cxl_cstate->dvsec_offset, doe_cdat_prot, true, 1);
+
+    cxl_cstate->cdat.build_cdat_table = build_cdat_table;
+    cxl_cstate->cdat.free_cdat_table = free_default_cdat_table;
+    cxl_cstate->cdat.private = d;
+    cxl_doe_cdat_init(cxl_cstate, errp);
+
     return;
 
 err_cap:
@@ -179,6 +352,11 @@ static void cxl_usp_exitfn(PCIDevice *d)
     pci_bridge_exitfn(d);
 }
 
+static Property cxl_upstream_props[] = {
+    DEFINE_PROP_STRING("cdat", CXLUpstreamPort, cxl_cstate.cdat.filename),
+    DEFINE_PROP_END_OF_LIST()
+};
+
 static void cxl_upstream_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
@@ -186,6 +364,7 @@ static void cxl_upstream_class_init(ObjectClass *oc, void *data)
 
     k->is_bridge = true;
     k->config_write = cxl_usp_write_config;
+    k->config_read = cxl_usp_read_config;
     k->realize = cxl_usp_realize;
     k->exit = cxl_usp_exitfn;
     k->vendor_id = 0x19e5; /* Huawei */
@@ -194,6 +373,7 @@ static void cxl_upstream_class_init(ObjectClass *oc, void *data)
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->desc = "CXL Switch Upstream Port";
     dc->reset = cxl_usp_reset;
+    device_class_set_props(dc, cxl_upstream_props);
 }
 
 static const TypeInfo cxl_usp_info = {
diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
index fdb1fa98f4..6d251dc0fb 100644
--- a/include/hw/cxl/cxl_cdat.h
+++ b/include/hw/cxl/cxl_cdat.h
@@ -131,6 +131,7 @@ typedef struct CDATSslbisHeader {
     uint64_t entry_base_unit;
 } QEMU_PACKED CDATSslbisHeader;
 
+#define CDAT_PORT_ID_USP 0x100
 /* Switch Scoped Latency and Bandwidth Entry - CDAT Table 10 */
 typedef struct CDATSslbe {
     uint16_t port_x_id;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
  2022-10-07 15:21 ` Jonathan Cameron via
@ 2022-10-10 10:30   ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-10 10:30 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: ira.weiny

On Fri, 7 Oct 2022 16:21:51 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> Whilst I have carried on Huai-Cheng Kuo's series version numbering and
> naming, there have been very substantial changes since v6 so I would
> suggest fresh review makes sense for anyone who has looked at this before.
> In particular, if the Avery design folks could check I haven't broken
> anything that would be great.

I forgot to run checkpatch on these and there is some white space that
will need cleaning up and one instance of missing brackets.
As that doesn't greatly affect review, I'll wait for a few days to see
if there is other feedback to incorporate in v8.

Sorry for the resulting noise!

These are now available at
https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-09
along with a bunch of other CXL features:
* Compliance DOE protocol
* SPDM / CMA over DOE support
* ARM64 support in general.
* Various small emulation additions.
* CPMU support

I'll add a few more features to similarly named branches over the next
week or so including initial support for standalone switch CCI mailboxes.

Jonathan
 
> 
> For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
> https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbrowy@avery-design.com/
> 
> Summary of changes:
> 1) Linux headers definitions for DOE are now upstream so drop that patch.
> 2) Add CDAT for switch upstream port.
> 3) Generate 'plausible' default CDAT tables when a file is not provided.
> 4) General refactoring to calculate the correct table sizes and allocate
>    based on that rather than copying from a local static array.
> 5) Changes from earlier reviews such as matching QEMU type naming style.
> 6) Moved compliance and SPDM usecases to future patch sets.
> 
> Sign-offs on these are complex because the patches were originally developed
> by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan
> Cameron who made substantial changes.
> 
> Huai-Cheng Kuo / Chris Browy, please confirm you are still happy to maintain this
> code as per the original MAINTAINERS entry.
> 
> What's here?
> 
> This series brings generic PCI Express Data Object Exchange (DOE) support.
> DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI
> config space via a PCIe Extended Capability Structure.
> The PCIe spec defines several protocols (including one to discover what
> protocols a given DOE instance supports) and other specifications such as
> CXL define additional protocols using their own vendor IDs.
> 
> In this series we make use of the DOE to support the CXL spec defined
> Table Access Protocol, specifically to provide access to CDAT - a
> table specified in a specification that is hosted by the UEFI forum
> and is used to provide runtime discoverability of the sort of information
> that would otherwise be available in firmware tables (memory types,
> latency and bandwidth information etc).
> 
> The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
> The version merged did not support interrupts (earlier versions did
> so that support in the emulation was tested a while back).
> 
> This series provides CDAT emulation for CXL switch upstream ports
> and CXL type 3 memory devices. Note that to exercise the switch support
> additional Linux kernel patches are needed.
> https://lore.kernel.org/linux-cxl/20220503153449.4088-1-Jonathan.Cameron@huawei.com/
> (I'll post a new version of that support shortly)
> 
> Additional protocols will be supported by follow on patch sets:
> * CXL compliance protocol.
> * CMA / SPDM device attestation.
> (Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will refresh
> that tree next week)
> 
> Huai-Cheng Kuo (3):
>   hw/pci: PCIe Data Object Exchange emulation
>   hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
>   hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
> 
> Jonathan Cameron (2):
>   hw/mem/cxl-type3: Add MSIX support
>   hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE
> 
>  MAINTAINERS                    |   7 +
>  hw/cxl/cxl-cdat.c              | 222 ++++++++++++++++++++
>  hw/cxl/meson.build             |   1 +
>  hw/mem/cxl_type3.c             | 236 +++++++++++++++++++++
>  hw/pci-bridge/cxl_upstream.c   | 182 +++++++++++++++-
>  hw/pci/meson.build             |   1 +
>  hw/pci/pcie_doe.c              | 367 +++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_cdat.h      | 166 +++++++++++++++
>  include/hw/cxl/cxl_component.h |   7 +
>  include/hw/cxl/cxl_device.h    |   3 +
>  include/hw/cxl/cxl_pci.h       |   1 +
>  include/hw/pci/pci_ids.h       |   3 +
>  include/hw/pci/pcie.h          |   1 +
>  include/hw/pci/pcie_doe.h      | 123 +++++++++++
>  include/hw/pci/pcie_regs.h     |   4 +
>  15 files changed, 1323 insertions(+), 1 deletion(-)
>  create mode 100644 hw/cxl/cxl-cdat.c
>  create mode 100644 hw/pci/pcie_doe.c
>  create mode 100644 include/hw/cxl/cxl_cdat.h
>  create mode 100644 include/hw/pci/pcie_doe.h
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
  2022-10-10 10:30   ` Jonathan Cameron via
  (?)
@ 2022-10-11  9:45   ` Huai-Cheng
  -1 siblings, 0 replies; 58+ messages in thread
From: Huai-Cheng @ 2022-10-11  9:45 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Chris Browy, ira.weiny

Hi Jonathan,

We've reviewed the DOE-related patches and everything looks good. We are
glad to continue maintaining the code.

Thanks for applying the changes.

Best Regards,
Huai-Cheng Kuo

On Mon, Oct 10, 2022 at 6:30 PM Jonathan Cameron <
Jonathan.Cameron@huawei.com> wrote:

> On Fri, 7 Oct 2022 16:21:51 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> > Whilst I have carried on Huai-Cheng Kuo's series version numbering and
> > naming, there have been very substantial changes since v6 so I would
> > suggest fresh review makes sense for anyone who has looked at this
> before.
> > In particularly if the Avery design folks could check I haven't broken
> > anything that would be great.
>
> I forgot to run checkpatch on these and there is some white space that
> will need cleaning up and one instance of missing brackets.
> As that doesn't greatly affect review, I'll wait for a few days to see
> if there is other feedback to incorporate in v8.
>
> Sorry for the resulting noise!
>
> These are now available at
> https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-09
> along with a bunch of other CXL features:
> * Compliance DOE protocol
> * SPDM / CMA over DOE support
> * ARM64 support in general.
> * Various small emulation additions.
> * CPMU support
>
> I'll add a few more features to similarly named branches over the next
> week or so including initial support for standalone switch CCI mailboxes.
>
> Jonathan
>
> >
> > For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
> >
> https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbrowy@avery-design.com/
> >
> > Summary of changes:
> > 1) Linux headers definitions for DOE are now upstream so drop that patch.
> > 2) Add CDAT for switch upstream port.
> > 3) Generate 'plausible' default CDAT tables when a file is not provided.
> > 4) General refactoring to calculate the correct table sizes and allocate
> >    based on that rather than copying from a local static array.
> > 5) Changes from earlier reviews such as matching QEMU type naming style.
> > 6) Moved compliance and SPDM usecases to future patch sets.
> >
> > Sign-offs on these are complex because the patches were originally
> developed
> > by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by
> Jonathan
> > Cameron who made substantial changes.
> >
> > Huai-Cheng Kuo / Chris Browy, please confirm you are still happy to
> maintain this
> > code as per the original MAINTAINERS entry.
> >
> > What's here?
> >
> > This series brings generic PCI Express Data Object Exchange support (DOE)
> > DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in
> PCI
> > config space via a PCIe Extended Capability Structure.
> > The PCIe spec defines several protocols (including one to discover what
> > protocols a given DOE instance supports) and other specification such as
> > CXL define additional protocols using their own vendor IDs.
> >
> > In this series we make use of the DOE to support the CXL spec defined
> > Table Access Protocol, specifically to provide access to CDAT - a
> > table specified in a specification that is hosted by the UEFI forum
> > and is used to provide runtime discoverability of the sort of information
> > that would otherwise be available in firmware tables (memory types,
> > latency and bandwidth information etc).
> >
> > The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
> > The version merged did not support interrupts (earlier versions did
> > so that support in the emulation was tested a while back).
> >
> > This series provides CDAT emulation for CXL switch upstream ports
> > and CXL type 3 memory devices. Note that to exercise the switch support
> > additional Linux kernel patches are needed.
> >
> https://lore.kernel.org/linux-cxl/20220503153449.4088-1-Jonathan.Cameron@huawei.com/
> > (I'll post a new version of that support shortly)
> >
> > Additional protocols will be supported by follow on patch sets:
> > * CXL compliance protocol.
> > * CMA / SPDM device attestation.
> > (Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will
> refresh
> > that tree next week)
> >
> > Huai-Cheng Kuo (3):
> >   hw/pci: PCIe Data Object Exchange emulation
> >   hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
> >   hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
> >
> > Jonathan Cameron (2):
> >   hw/mem/cxl-type3: Add MSIX support
> >   hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE
> >
> >  MAINTAINERS                    |   7 +
> >  hw/cxl/cxl-cdat.c              | 222 ++++++++++++++++++++
> >  hw/cxl/meson.build             |   1 +
> >  hw/mem/cxl_type3.c             | 236 +++++++++++++++++++++
> >  hw/pci-bridge/cxl_upstream.c   | 182 +++++++++++++++-
> >  hw/pci/meson.build             |   1 +
> >  hw/pci/pcie_doe.c              | 367 +++++++++++++++++++++++++++++++++
> >  include/hw/cxl/cxl_cdat.h      | 166 +++++++++++++++
> >  include/hw/cxl/cxl_component.h |   7 +
> >  include/hw/cxl/cxl_device.h    |   3 +
> >  include/hw/cxl/cxl_pci.h       |   1 +
> >  include/hw/pci/pci_ids.h       |   3 +
> >  include/hw/pci/pcie.h          |   1 +
> >  include/hw/pci/pcie_doe.h      | 123 +++++++++++
> >  include/hw/pci/pcie_regs.h     |   4 +
> >  15 files changed, 1323 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/cxl/cxl-cdat.c
> >  create mode 100644 hw/pci/pcie_doe.c
> >  create mode 100644 include/hw/cxl/cxl_cdat.h
> >  create mode 100644 include/hw/pci/pcie_doe.h
> >
>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-07 15:21 ` Jonathan Cameron via
                   ` (6 preceding siblings ...)
  (?)
@ 2022-10-11 21:19 ` Gregory Price
  2022-10-11 21:19   ` [PATCH 1/5] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL Gregory Price
                     ` (5 more replies)
  -1 siblings, 6 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-11 21:19 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

Summary of Changes:
1) Correction of PCI_CLASS from STORAGE_EXPRESS to MEMORY_CXL on init
2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
3) Refactor CDAT DSMAS Initialization for multi-region initialization
4) Multi-Region and Volatile Memory support for CXL Type-3 Devices
5) Test and Documentation updates

Developed with input from Jonathan Cameron and Davidlohr Bueso.

This series brings 2 features to CXL Type-3 Devices:
    1) Volatile Memory Region support
    2) Multi-Region support (1 Volatile, 1 Persistent)

In this series we implement multi-region and volatile region support
through 6 major changes to CXL devices:
    1) The single HostMemoryBackend [hostmem] has been replaced by two
       backends, [hostvmem] and [hostpmem], which store volatile and
       persistent memory respectively
    2) The single AddressSpace has been replaced by two AddressSpaces
       [hostvmem_as] and [hostpmem_as] to map respective memdevs.
    3) The size of each memory region and the total size are stored
       separately
    4) The CDAT and DVSEC memory map entries have been updated:
       a) if vmem is present, vmem is mapped at DPA(0)
       b) if pmem is present
          i)  and vmem is present, pmem is mapped at DPA(vmem->size)
          ii) else, pmem is mapped at DPA(0)
       c) partitioning of pmem is not supported in this patch set but
          has been discussed and this design should suffice.
    5) Read/Write functions have been updated to access AddressSpaces
       according to the mapping described in #4
    6) cxl-mailbox has been updated to report the respective size of
       volatile and persistent memory regions

CXL Spec (3.0) Section 8.2.9.8.2.0 - Get Partition Info
  Active Volatile Memory
    The device shall provide this volatile capacity starting at DPA 0
  Active Persistent Memory
    The device shall provide this persistent capacity starting at the
    DPA immediately following the volatile capacity
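
As a rough illustration of the DPA layout described above, here is a minimal
standalone sketch. It is not code from this series: struct dpa_layout,
pmem_dpa_base() and dpa_resolve() are hypothetical names, and the sketch
assumes that a region which is absent simply has size 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the two regions of a device, in bytes. */
struct dpa_layout {
    uint64_t vmem_size;   /* 0 if there is no volatile region */
    uint64_t pmem_size;   /* 0 if there is no persistent region */
};

/*
 * Volatile memory, when present, starts at DPA 0.  Persistent memory
 * starts immediately after it, which is DPA 0 when vmem_size is 0.
 */
static uint64_t pmem_dpa_base(const struct dpa_layout *l)
{
    return l->vmem_size;
}

/*
 * Resolve a DPA to a region and an offset within that region.
 * Returns false if the DPA lies beyond the end of the device.
 */
static bool dpa_resolve(const struct dpa_layout *l, uint64_t dpa,
                        bool *is_pmem, uint64_t *offset)
{
    assert(is_pmem && offset);

    if (dpa < l->vmem_size) {
        *is_pmem = false;
        *offset = dpa;
        return true;
    }
    /* dpa >= vmem_size here, so the subtraction cannot underflow */
    if (dpa - l->vmem_size < l->pmem_size) {
        *is_pmem = true;
        *offset = dpa - l->vmem_size;
        return true;
    }
    return false;
}
```

With a 256MiB volatile region and a 512MiB persistent region, DPA 0x10000000
would resolve to offset 0 of the persistent region under these assumptions.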

Partitioning of Persistent Memory regions may be supported in follow-on
patch sets.

Submitted as an extension to the CDAT emulation because the CDAT DSMAS
entry concerns memory mapping and is required to map memory regions
correctly in BIOS/EFI.

Gregory Price (5):
  hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL
  hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
  hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory
    Regions
  hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  cxl: update tests and documentation for new cxl properties

 docs/system/devices/cxl.rst |  53 ++++-
 hw/cxl/cxl-mailbox-utils.c  |  23 +-
 hw/mem/cxl_type3.c          | 449 +++++++++++++++++++++++-------------
 include/hw/cxl/cxl_device.h |  11 +-
 tests/qtest/cxl-test.c      |  81 ++++++-
 5 files changed, 416 insertions(+), 201 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 1/5] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL
  2022-10-11 21:19 ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
@ 2022-10-11 21:19   ` Gregory Price
  2022-10-11 21:19   ` [PATCH 2/5] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-11 21:19 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny,
	Jonathan Cameron

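The current code sets the class to PCI_CLASS_STORAGE_EXPRESS in class init
and then overrides it with PCI_CLASS_MEMORY_CXL in realize. Set the correct
class in class init and drop the override.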

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/mem/cxl_type3.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 3e7ca7a455..282f274266 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -535,7 +535,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     }
 
     pci_config_set_prog_interface(pci_conf, 0x10);
-    pci_config_set_class(pci_conf, PCI_CLASS_MEMORY_CXL);
 
     pcie_endpoint_cap_init(pci_dev, 0x80);
     if (ct3d->sn != UI64_NULL) {
@@ -763,7 +762,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
     pc->config_read = ct3d_config_read;
     pc->realize = ct3_realize;
     pc->exit = ct3_exit;
-    pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
+    pc->class_id = PCI_CLASS_MEMORY_CXL;
     pc->vendor_id = PCI_VENDOR_ID_INTEL;
     pc->device_id = 0xd93; /* LVF for now */
     pc->revision = 1;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 2/5] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
  2022-10-11 21:19 ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
  2022-10-11 21:19   ` [PATCH 1/5] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL Gregory Price
@ 2022-10-11 21:19   ` Gregory Price
  2022-10-11 21:19   ` [PATCH 3/5] hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory Regions Gregory Price
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-11 21:19 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

Remove usage of magic numbers when accessing capacity fields and replace
with CXL_CAPACITY_MULTIPLIER, matching the kernel definition.
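
For reference, the unit conversion this constant is used for can be sketched
as below. This is an illustrative standalone example rather than the code in
the diff: capacity_is_aligned() and capacity_to_units() are hypothetical
helpers, and the ULL suffix on the constant is an assumption of the sketch
(the patch defines it without one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 256 MiB, the same value as the (256 << 20) magic number it replaces */
#define CXL_CAPACITY_MULTIPLIER 0x10000000ULL

/* Capacity fields can only express whole multiples of 256 MiB. */
static bool capacity_is_aligned(uint64_t bytes)
{
    return (bytes % CXL_CAPACITY_MULTIPLIER) == 0;
}

/* Convert a byte count into the 256 MiB units used by capacity fields. */
static uint64_t capacity_to_units(uint64_t bytes)
{
    assert(capacity_is_aligned(bytes));
    return bytes / CXL_CAPACITY_MULTIPLIER;
}
```

A 1GiB region would be reported as 4 units, and a size that is not a
multiple of 256MiB is rejected before the division.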

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
---
 hw/cxl/cxl-mailbox-utils.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c7e1a88b44..776c8cbadc 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -14,6 +14,8 @@
 #include "qemu/log.h"
 #include "qemu/uuid.h"
 
+#define CXL_CAPACITY_MULTIPLIER   0x10000000 /* SZ_256M */
+
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
  *  1. Add the command set and cmd to the enum.
@@ -140,7 +142,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
     } QEMU_PACKED *fw_info;
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
-    if (cxl_dstate->pmem_size < (256 << 20)) {
+    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -285,7 +287,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
     uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -295,8 +297,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
     /* PMEM only */
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
-    id->total_capacity = size / (256 << 20);
-    id->persistent_capacity = size / (256 << 20);
+    id->total_capacity = size / CXL_CAPACITY_MULTIPLIER;
+    id->persistent_capacity = size / CXL_CAPACITY_MULTIPLIER;
     id->lsa_size = cvc->get_lsa_size(ct3d);
     id->poison_list_max_mer[1] = 0x1; /* 256 poison records */
 
@@ -317,14 +319,14 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
     uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
     /* PMEM only */
     part_info->active_vmem = 0;
     part_info->next_vmem = 0;
-    part_info->active_pmem = size / (256 << 20);
+    part_info->active_pmem = size / CXL_CAPACITY_MULTIPLIER;
     part_info->next_pmem = 0;
 
     *len = sizeof(*part_info);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 3/5] hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory Regions
  2022-10-11 21:19 ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
  2022-10-11 21:19   ` [PATCH 1/5] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL Gregory Price
  2022-10-11 21:19   ` [PATCH 2/5] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
@ 2022-10-11 21:19   ` Gregory Price
  2022-10-12 14:10       ` Jonathan Cameron via
  2022-10-11 21:19   ` [PATCH 4/5] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-11 21:19 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

This is a preparatory commit for enabling multiple memory regions within
a single CXL Type-3 device.  We will need to initialize multiple CDAT
DSMAS regions (and the corresponding DSLBIS and DSEMTS entries), so
generalize the initialization into a function.
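
The resulting table layout (all DSMAS entries, then four DSLBIS entries per
DSMAS, then one DSEMTS entry per DSMAS) can be summarised with the sketch
below. It only mirrors the index arithmetic used when freeing the table;
struct cdat_layout and cdat_layout_for() are hypothetical names that do not
appear in the patch.

```c
#include <assert.h>

/* read latency, write latency, read bandwidth, write bandwidth */
#define DSLBIS_PER_DSMAS 4

/* Hypothetical summary of where each sub-table starts in the CDAT table. */
struct cdat_layout {
    int dsmas_idx;   /* index of the first DSMAS entry */
    int dslbis_idx;  /* index of the first DSLBIS entry */
    int dsemts_idx;  /* index of the first DSEMTS entry */
    int total;       /* total number of entries in the table */
};

static struct cdat_layout cdat_layout_for(int dsmas_num)
{
    struct cdat_layout l;

    assert(dsmas_num > 0);
    l.dsmas_idx = 0;
    l.dslbis_idx = dsmas_num;
    l.dsemts_idx = dsmas_num + DSLBIS_PER_DSMAS * dsmas_num;
    l.total = l.dsemts_idx + dsmas_num;
    return l;
}
```

For a single region this gives 6 entries with the DSEMTS entry at index 5.
Each sub-table is one allocation, so only the first entry of each is freed.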

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/mem/cxl_type3.c | 275 +++++++++++++++++++++++++--------------------
 1 file changed, 154 insertions(+), 121 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 282f274266..dda78704c2 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -24,145 +24,178 @@
 #define UI64_NULL ~(0ULL)
 #define DWORD_BYTE 4
 
+static int ct3_build_dsmas(CDATDsmas *dsmas,
+                           CDATDslbis *dslbis,
+                           CDATDsemts *dsemts,
+                           MemoryRegion *mr,
+                           int dsmad_handle,
+                           bool is_pmem,
+                           uint64_t dpa_base)
+{
+    int len = 0;
+    /* len should be incremented for every entry */
+
+    /* Device Scoped Memory Affinity Structure */
+    *dsmas = (CDATDsmas) {
+        .header = {
+            .type = CDAT_TYPE_DSMAS,
+            .length = sizeof(*dsmas),
+        },
+        .DSMADhandle = dsmad_handle,
+        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0),
+        .DPA_base = dpa_base,
+        .DPA_length = int128_get64(mr->size),
+    };
+    len++;
+
+    /* For now, no memory side cache, plausiblish numbers */
+    dslbis[0] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis),
+        },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_READ_LATENCY,
+        .entry_base_unit = 10000, /* 10ns base */
+        .entry[0] = 15, /* 150ns */
+    };
+    len++;
+
+    dslbis[1] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis),
+        },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
+        .entry_base_unit = 10000,
+        .entry[0] = 25, /* 250ns */
+    };
+    len++;
+
+    dslbis[2] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis),
+            },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
+        .entry_base_unit = 1000, /* GB/s */
+        .entry[0] = 16,
+    };
+    len++;
+
+    dslbis[3] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis),
+        },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
+        .entry_base_unit = 1000, /* GB/s */
+        .entry[0] = 16,
+    };
+    len++;
+
+    *dsemts = (CDATDsemts) {
+        .header = {
+            .type = CDAT_TYPE_DSEMTS,
+            .length = sizeof(*dsemts),
+        },
+        .DSMAS_handle = dsmad_handle,
+        /* EFI_MEMORY_NV implies EfiReservedMemoryType */
+        .EFI_memory_type_attr = is_pmem ? 2 : 0,
+        /* Reserved - the non volatile from DSMAS matters */
+        .DPA_offset = 0,
+        .DPA_length = int128_get64(mr->size),
+    };
+    len++;
+    return len;
+}
+
 static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
                                 void *priv)
 {
-    g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
-    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
-    g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
+    g_autofree CDATDsmas *dsmas = NULL;
+    g_autofree CDATDslbis *dslbis = NULL;
+    g_autofree CDATDsemts *dsemts = NULL;
     CXLType3Dev *ct3d = priv;
-    int len = 0;
-    int i = 0;
-    int next_dsmad_handle = 0;
-    int nonvolatile_dsmad = -1;
-    int dslbis_nonvolatile_num = 4;
+    int cdat_len = 0;
+    int cdat_idx = 0, sub_idx = 0;
+    int dsmas_num, dslbis_num, dsemts_num;
+    int dsmad_handle = 0;
+    uint64_t dpa_base = 0;
     MemoryRegion *mr;
 
-    /* Non volatile aspects */
-    if (ct3d->hostmem) {
-        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
-        if (!dsmas_nonvolatile) {
-            return -ENOMEM;
-        }
-        nonvolatile_dsmad = next_dsmad_handle++;
-        mr = host_memory_backend_get_memory(ct3d->hostmem);
-        if (!mr) {
-            return -EINVAL;
-        }
-        *dsmas_nonvolatile = (CDATDsmas) {
-            .header = {
-                .type = CDAT_TYPE_DSMAS,
-                .length = sizeof(*dsmas_nonvolatile),
-            },
-            .DSMADhandle = nonvolatile_dsmad,
-            .flags = CDAT_DSMAS_FLAG_NV,
-            .DPA_base = 0,
-            .DPA_length = int128_get64(mr->size),
-        };
-        len++;
-
-        /* For now, no memory side cache, plausiblish numbers */
-        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
-        if (!dslbis_nonvolatile)
-            return -ENOMEM;
-
-        dslbis_nonvolatile[0] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_READ_LATENCY,
-            .entry_base_unit = 10000, /* 10ns base */
-            .entry[0] = 15, /* 150ns */
-        };
-        len++;
-
-        dslbis_nonvolatile[1] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
-            .entry_base_unit = 10000,
-            .entry[0] = 25, /* 250ns */
-        };
-        len++;
-       
-        dslbis_nonvolatile[2] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
-            .entry_base_unit = 1000, /* GB/s */
-            .entry[0] = 16,
-        };
-        len++;
-
-        dslbis_nonvolatile[3] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
-            .entry_base_unit = 1000, /* GB/s */
-            .entry[0] = 16,
-        };
-        len++;
-
-        mr = host_memory_backend_get_memory(ct3d->hostmem);
-        if (!mr) {
-            return -EINVAL;
-        }
-        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
-        *dsemts_nonvolatile = (CDATDsemts) {
-            .header = {
-                .type = CDAT_TYPE_DSEMTS,
-                .length = sizeof(*dsemts_nonvolatile),
-            },
-            .DSMAS_handle = nonvolatile_dsmad,
-            .EFI_memory_type_attr = 2, /* Reserved - the non volatile from DSMAS matters */
-            .DPA_offset = 0,
-            .DPA_length = int128_get64(mr->size),
-        };
-        len++;
+    if (!ct3d->hostmem | !host_memory_backend_get_memory(ct3d->hostmem)) {
+        return -EINVAL;
+    }
+
+    dsmas_num = 1;
+    dslbis_num = 4 * dsmas_num;
+    dsemts_num = dsmas_num;
+
+    dsmas = g_malloc(sizeof(*dsmas) * dsmas_num);
+    dslbis = g_malloc(sizeof(*dslbis) * dslbis_num);
+    dsemts = g_malloc(sizeof(*dsemts) * dsemts_num);
+
+    if (!dsmas || !dslbis || !dsemts) {
+        return -ENOMEM;
+    }
+
+    mr = host_memory_backend_get_memory(ct3d->hostmem);
+    cdat_len += ct3_build_dsmas(&dsmas[dsmad_handle],
+                                &dslbis[4 * dsmad_handle],
+                                &dsemts[dsmad_handle],
+                                mr,
+                                dsmad_handle,
+                                false,
+                                dpa_base);
+    dpa_base += mr->size;
+    dsmad_handle++;
+
+    /* Allocate and fill in the CDAT table */
+    *cdat_table = g_malloc0(cdat_len * sizeof(*cdat_table));
+    if (!*cdat_table) {
+        return -ENOMEM;
     }
 
-    *cdat_table = g_malloc0(len * sizeof(*cdat_table));
     /* Header always at start of structure */
-    if (dsmas_nonvolatile) {
-        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
+    CDATDsmas *dsmas_ent = g_steal_pointer(&dsmas);
+    for (sub_idx = 0; sub_idx < dsmas_num; sub_idx++) {
+        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dsmas_ent[sub_idx];
     }
-    if (dslbis_nonvolatile) {
-        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);        
-        int j;
 
-        for (j = 0; j < dslbis_nonvolatile_num; j++) {
-            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
-        }
+    CDATDslbis *dslbis_ent = g_steal_pointer(&dslbis);
+    for (sub_idx = 0; sub_idx < dslbis_num; sub_idx++) {
+        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dslbis_ent[sub_idx];
     }
-    if (dsemts_nonvolatile) {
-        (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
+
+    CDATDsemts *dsemts_ent = g_steal_pointer(&dsemts);
+    for (sub_idx = 0; sub_idx < dsemts_num; sub_idx++) {
+        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dsemts_ent[sub_idx];
     }
-    
-    return len;
+
+    return cdat_len;
 }
 
 static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
 {
-    int i;
+    int dsmas_num = 1;
+    int dslbis_idx = dsmas_num;
+    int dsemts_idx = dsmas_num + (dsmas_num * 4);
+
+    /* There are only 3 sub-tables to free: dsmas, dslbis, dsemts */
+    assert(num == (dsmas_num + (dsmas_num * 4) + (dsmas_num)));
+
+    g_free(cdat_table[0]);
+    g_free(cdat_table[dslbis_idx]);
+    g_free(cdat_table[dsemts_idx]);
 
-    for (i = 0; i < num; i++) {
-        g_free(cdat_table[i]);
-    }
     g_free(cdat_table);
 }
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 4/5] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
  2022-10-11 21:19 ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
                     ` (2 preceding siblings ...)
  2022-10-11 21:19   ` [PATCH 3/5] hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory Regions Gregory Price
@ 2022-10-11 21:19   ` Gregory Price
  2022-10-11 21:19   ` [PATCH 5/5] cxl: update tests and documentation for new cxl properties Gregory Price
  2022-10-11 22:20   ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Michael S. Tsirkin
  5 siblings, 0 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-11 21:19 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

This commit enables each CXL Type-3 device to contain one volatile
memory region and one persistent region.

Two new properties have been added to cxl-type3 device initialization:
    [volatile-memdev] and [persistent-memdev]

The existing [memdev] property is deprecated. It is treated as a
persistent memory region (although a user may back it with either a RAM
or a file backend). It cannot be used in combination with the new
[persistent-memdev] property.

Partitioning volatile memory from persistent memory is not yet supported.

Volatile memory is mapped at DPA(0x0), while Persistent memory is mapped
at DPA(vmem->size), per CXL Spec 8.2.9.8.2.0 - Get Partition Info.
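
The property rules above could be checked along the following lines. This is
a simplified sketch, not the realize code from this patch: struct
ct3_backends and ct3_resolve_backends() are hypothetical, backends are
represented by plain id strings, and rejecting a device with no backend at
all is an assumption based on the CDAT build returning -EINVAL in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the three properties; NULL means "not set". */
struct ct3_backends {
    const char *memdev;             /* deprecated, treated as persistent */
    const char *volatile_memdev;
    const char *persistent_memdev;
};

/*
 * Work out which backend serves the volatile and persistent regions.
 * Returns 0 on success, -1 if the combination of properties is invalid.
 */
static int ct3_resolve_backends(const struct ct3_backends *in,
                                const char **vmem, const char **pmem)
{
    assert(in && vmem && pmem);

    /* memdev and persistent-memdev are mutually exclusive */
    if (in->memdev && in->persistent_memdev) {
        return -1;
    }

    *vmem = in->volatile_memdev;
    *pmem = in->persistent_memdev ? in->persistent_memdev : in->memdev;

    /* Assumption: a device needs at least one region */
    if (!*vmem && !*pmem) {
        return -1;
    }
    return 0;
}
```

With only the legacy memdev set, the sketch yields a single persistent
region at DPA 0, which matches the behaviour before this series.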

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  21 ++--
 hw/mem/cxl_type3.c          | 197 ++++++++++++++++++++++++++----------
 include/hw/cxl/cxl_device.h |  11 +-
 3 files changed, 162 insertions(+), 67 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 776c8cbadc..88d33e9a37 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -142,7 +142,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
     } QEMU_PACKED *fw_info;
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
-    if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
+    if (cxl_dstate->mem_size < CXL_CAPACITY_MULTIPLIER) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -285,20 +285,20 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
 
     CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
-    uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
+    if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
     id = (void *)cmd->payload;
     memset(id, 0, sizeof(*id));
 
-    /* PMEM only */
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
-    id->total_capacity = size / CXL_CAPACITY_MULTIPLIER;
-    id->persistent_capacity = size / CXL_CAPACITY_MULTIPLIER;
+    id->total_capacity = cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER;
+    id->persistent_capacity = cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER;
+    id->volatile_capacity = cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER;
     id->lsa_size = cvc->get_lsa_size(ct3d);
     id->poison_list_max_mer[1] = 0x1; /* 256 poison records */
 
@@ -317,16 +317,15 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
         uint64_t next_pmem;
     } QEMU_PACKED *part_info = (void *)cmd->payload;
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
-    uint64_t size = cxl_dstate->pmem_size;
 
-    if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
+    if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
-    /* PMEM only */
-    part_info->active_vmem = 0;
+    part_info->active_vmem = cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER;
     part_info->next_vmem = 0;
-    part_info->active_pmem = size / CXL_CAPACITY_MULTIPLIER;
+    part_info->active_pmem = cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER;
     part_info->next_pmem = 0;
 
     *len = sizeof(*part_info);
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index dda78704c2..c371cd06e1 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -131,11 +131,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     uint64_t dpa_base = 0;
     MemoryRegion *mr;
 
-    if (!ct3d->hostmem | !host_memory_backend_get_memory(ct3d->hostmem)) {
+    if ((!ct3d->hostvmem && !ct3d->hostpmem) ||
+        (ct3d->hostvmem && !host_memory_backend_get_memory(ct3d->hostvmem)) ||
+        (ct3d->hostpmem && !host_memory_backend_get_memory(ct3d->hostpmem))) {
         return -EINVAL;
     }
 
-    dsmas_num = 1;
+    dsmas_num = (ct3d->hostvmem ? 1 : 0) + (ct3d->hostpmem ? 1 : 0);
     dslbis_num = 4 * dsmas_num;
     dsemts_num = dsmas_num;
 
@@ -147,16 +149,30 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         return -ENOMEM;
     }
 
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    cdat_len += ct3_build_dsmas(&dsmas[dsmad_handle],
-                                &dslbis[4 * dsmad_handle],
-                                &dsemts[dsmad_handle],
-                                mr,
-                                dsmad_handle,
-                                false,
-                                dpa_base);
-    dpa_base += mr->size;
-    dsmad_handle++;
+    if (ct3d->hostvmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostvmem);
+        cdat_len += ct3_build_dsmas(&dsmas[dsmad_handle],
+                        &dslbis[4 * dsmad_handle],
+                        &dsemts[dsmad_handle],
+                        mr,
+                        dsmad_handle,
+                        false,
+                        dpa_base);
+        dpa_base += mr->size;
+        dsmad_handle++;
+    }
+    if (ct3d->hostpmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostpmem);
+        cdat_len += ct3_build_dsmas(&dsmas[dsmad_handle],
+                        &dslbis[4 * dsmad_handle],
+                        &dsemts[dsmad_handle],
+                        mr,
+                        dsmad_handle,
+                        false,
+                        dpa_base);
+        dpa_base += mr->size;
+        dsmad_handle++;
+    }
 
     /* Allocate and fill in the CDAT table */
     *cdat_table = g_malloc0(cdat_len * sizeof(*cdat_table));
@@ -185,7 +201,8 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
 
 static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
 {
-    int dsmas_num = 1;
+    CXLType3Dev *ct3d = priv;
+    int dsmas_num = (ct3d->hostvmem ? 1 : 0) + (ct3d->hostpmem ? 1 : 0);
     int dslbis_idx = dsmas_num;
     int dsemts_idx = dsmas_num + (dsmas_num * 4);
 
@@ -386,16 +403,48 @@ static void build_dvsecs(CXLType3Dev *ct3d)
     CXLDVSECRegisterLocator *regloc_dvsec;
     uint8_t *dvsec;
     int i;
+    uint32_t range1_size_hi = 0, range1_size_lo = 0,
+             range1_base_hi = 0, range1_base_lo = 0,
+             range2_size_hi = 0, range2_size_lo = 0,
+             range2_base_hi = 0, range2_base_lo = 0;
+
+    /*
+     * Volatile memory is mapped at (0x0)
+     * Persistent memory is mapped at (volatile->size)
+     */
+    if (ct3d->hostvmem && ct3d->hostpmem) {
+        range1_size_hi = ct3d->hostvmem->size >> 32;
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                         (ct3d->hostvmem->size & 0xF0000000);
+        range1_base_hi = 0;
+        range1_base_lo = 0;
+        range2_size_hi = ct3d->hostpmem->size >> 32;
+        range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                         (ct3d->hostpmem->size & 0xF0000000);
+        range2_base_hi = ct3d->hostvmem->size >> 32;
+        range2_base_lo = ct3d->hostvmem->size & 0xF0000000;
+    } else {
+        HostMemoryBackend* hmbe = ct3d->hostvmem ?
+                                  ct3d->hostvmem : ct3d->hostpmem;
+        range1_size_hi = hmbe->size >> 32;
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+                         (hmbe->size & 0xF0000000);
+        range1_base_hi = 0;
+        range1_base_lo = 0;
+    }
 
     dvsec = (uint8_t *)&(CXLDVSECDevice){
         .cap = 0x1e,
         .ctrl = 0x2,
         .status2 = 0x2,
-        .range1_size_hi = ct3d->hostmem->size >> 32,
-        .range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
-        (ct3d->hostmem->size & 0xF0000000),
-        .range1_base_hi = 0,
-        .range1_base_lo = 0,
+        .range1_size_hi = range1_size_hi,
+        .range1_size_lo = range1_size_lo,
+        .range1_base_hi = range1_base_hi,
+        .range1_base_lo = range1_base_lo,
+        .range2_size_hi = range2_size_hi,
+        .range2_size_lo = range2_size_lo,
+        .range2_base_hi = range2_base_hi,
+        .range2_base_lo = range2_base_lo
     };
     cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
                                PCIE_CXL_DEVICE_DVSEC_LENGTH,
@@ -483,35 +532,57 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
     MemoryRegion *mr;
     char *name;
 
-    if (!ct3d->hostmem) {
-        error_setg(errp, "memdev property must be set");
+    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+        error_setg(errp, "at least one memdev property must be set");
+        return false;
+    } else if (ct3d->hostmem && ct3d->hostpmem) {
+        error_setg(errp, "[memdev] cannot be used with new "
+                         "[persistent-memdev] property");
         return false;
+    } else if (ct3d->hostmem) {
+        /* Use of hostmem property implies pmem */
+        ct3d->hostpmem = ct3d->hostmem;
+        ct3d->hostmem = NULL;
     }
 
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!mr) {
-        error_setg(errp, "memdev property must be set");
+    if (ct3d->hostpmem && !ct3d->lsa) {
+        error_setg(errp, "lsa property must be set for persistent devices");
         return false;
     }
-    memory_region_set_nonvolatile(mr, true);
-    memory_region_set_enabled(mr, true);
-    host_memory_backend_set_mapped(ct3d->hostmem, true);
 
-    if (ds->id) {
-        name = g_strdup_printf("cxl-type3-dpa-space:%s", ds->id);
-    } else {
-        name = g_strdup("cxl-type3-dpa-space");
+    if (ct3d->hostvmem)
+    {
+        mr = host_memory_backend_get_memory(ct3d->hostvmem);
+        memory_region_set_nonvolatile(mr, false);
+        memory_region_set_enabled(mr, true);
+        host_memory_backend_set_mapped(ct3d->hostvmem, true);
+        if (ds->id) {
+            name = g_strdup_printf("cxl-type3-dpa-vmem-space:%s", ds->id);
+        } else {
+            name = g_strdup("cxl-type3-dpa-vmem-space");
+        }
+        address_space_init(&ct3d->hostvmem_as, mr, name);
+        ct3d->cxl_dstate.vmem_size = mr->size;
+        ct3d->cxl_dstate.mem_size += mr->size;
+        g_free(name);
     }
-    address_space_init(&ct3d->hostmem_as, mr, name);
-    g_free(name);
 
-    ct3d->cxl_dstate.pmem_size = ct3d->hostmem->size;
-
-    if (!ct3d->lsa) {
-        error_setg(errp, "lsa property must be set");
-        return false;
+    if (ct3d->hostpmem)
+    {
+        mr = host_memory_backend_get_memory(ct3d->hostpmem);
+        memory_region_set_nonvolatile(mr, true);
+        memory_region_set_enabled(mr, true);
+        host_memory_backend_set_mapped(ct3d->hostpmem, true);
+        if (ds->id) {
+            name = g_strdup_printf("cxl-type3-dpa-pmem-space:%s", ds->id);
+        } else {
+            name = g_strdup("cxl-type3-dpa-pmem-space");
+        }
+        address_space_init(&ct3d->hostpmem_as, mr, name);
+        ct3d->cxl_dstate.pmem_size = mr->size;
+        ct3d->cxl_dstate.mem_size += mr->size;
+        g_free(name);
     }
-
     return true;
 }
 
@@ -627,7 +698,10 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     spdm_sock_fini(ct3d->doe_spdm.socket);
     g_free(regs->special_ops);
-    address_space_destroy(&ct3d->hostmem_as);
+    if (ct3d->hostvmem)
+        address_space_destroy(&ct3d->hostvmem_as);
+    if (ct3d->hostpmem)
+        address_space_destroy(&ct3d->hostpmem_as);
 }
 
 /* TODO: Support multiple HDM decoders and DPA skip */
@@ -667,11 +741,15 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
 {
     CXLType3Dev *ct3d = CXL_TYPE3(d);
     uint64_t dpa_offset;
-    MemoryRegion *mr;
+    MemoryRegion *vmr = NULL, *pmr = NULL;
+    AddressSpace* as;
 
-    /* TODO support volatile region */
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!mr) {
+    if (ct3d->hostvmem)
+        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+    if (ct3d->hostpmem)
+        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+
+    if (!vmr && !pmr) {
         return MEMTX_ERROR;
     }
 
@@ -679,11 +757,13 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
         return MEMTX_ERROR;
     }
 
-    if (dpa_offset > int128_get64(mr->size)) {
+    if (dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
         return MEMTX_ERROR;
     }
 
-    return address_space_read(&ct3d->hostmem_as, dpa_offset, attrs, data, size);
+    as = (vmr && (dpa_offset <= int128_get64(vmr->size))) ?
+        &ct3d->hostvmem_as : &ct3d->hostpmem_as;
+    return address_space_read(as, dpa_offset, attrs, data, size);
 }
 
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
@@ -691,10 +771,15 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 {
     CXLType3Dev *ct3d = CXL_TYPE3(d);
     uint64_t dpa_offset;
-    MemoryRegion *mr;
+    MemoryRegion *vmr = NULL, *pmr = NULL;
+    AddressSpace* as;
+
+    if (ct3d->hostvmem)
+        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+    if (ct3d->hostpmem)
+        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
 
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!mr) {
+    if (!vmr && !pmr) {
         return MEMTX_OK;
     }
 
@@ -702,11 +787,13 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
         return MEMTX_OK;
     }
 
-    if (dpa_offset > int128_get64(mr->size)) {
+    if (dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
         return MEMTX_OK;
     }
-    return address_space_write(&ct3d->hostmem_as, dpa_offset, attrs,
-                               &data, size);
+
+    as = (vmr && (dpa_offset <= int128_get64(vmr->size))) ?
+        &ct3d->hostvmem_as : &ct3d->hostpmem_as;
+    return address_space_write(as, dpa_offset, attrs, &data, size);
 }
 
 static void ct3d_reset(DeviceState *dev)
@@ -721,7 +808,11 @@ static void ct3d_reset(DeviceState *dev)
 
 static Property ct3_props[] = {
     DEFINE_PROP_LINK("memdev", CXLType3Dev, hostmem, TYPE_MEMORY_BACKEND,
-                     HostMemoryBackend *),
+                     HostMemoryBackend *), /* for backward compatibility */
+    DEFINE_PROP_LINK("persistent-memdev", CXLType3Dev, hostpmem,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
+    DEFINE_PROP_LINK("volatile-memdev", CXLType3Dev, hostvmem,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
@@ -804,7 +895,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
     pc->config_read = ct3d_config_read;
 
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
-    dc->desc = "CXL PMEM Device (Type 3)";
+    dc->desc = "CXL Memory Device (Type 3)";
     dc->reset = ct3d_reset;
     device_class_set_props(dc, ct3_props);
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 0f4e29345f..458853b373 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -141,8 +141,10 @@ typedef struct cxl_device_state {
         uint64_t host_set;
     } timestamp;
 
-    /* memory region for persistent memory, HDM */
+    /* memory region size, HDM */
+    uint64_t mem_size;
     uint64_t pmem_size;
+    uint64_t vmem_size;
 
     /* Move me later */
     CPMUState cpmu[CXL_NUM_CPMU_INSTANCES];
@@ -270,12 +272,15 @@ struct CXLType3Dev {
     PCIDevice parent_obj;
 
     /* Properties */
-    HostMemoryBackend *hostmem;
+    HostMemoryBackend *hostmem; /* deprecated */
+    HostMemoryBackend *hostvmem;
+    HostMemoryBackend *hostpmem;
     HostMemoryBackend *lsa;
     uint64_t sn;
 
     /* State */
-    AddressSpace hostmem_as;
+    AddressSpace hostvmem_as;
+    AddressSpace hostpmem_as;
     CXLComponentState cxl_cstate;
     CXLDeviceState cxl_dstate;
 
-- 
2.37.3



* [PATCH 5/5] cxl: update tests and documentation for new cxl properties
  2022-10-11 21:19 ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
                     ` (3 preceding siblings ...)
  2022-10-11 21:19   ` [PATCH 4/5] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
@ 2022-10-11 21:19   ` Gregory Price
  2022-10-11 22:20   ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Michael S. Tsirkin
  5 siblings, 0 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-11 21:19 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

Adds explicit examples for --persistent-memdev and --volatile-memdev

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 docs/system/devices/cxl.rst | 53 ++++++++++++++++++------
 tests/qtest/cxl-test.c      | 81 +++++++++++++++++++++++++++++++------
 2 files changed, 110 insertions(+), 24 deletions(-)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index f25783a4ec..9e165064c8 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -300,15 +300,36 @@ Example topology involving a switch::
 
 Example command lines
 ---------------------
-A very simple setup with just one directly attached CXL Type 3 device::
+A very simple setup with just one directly attached CXL Type 3 Persistent Memory device::
 
   qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
   ...
-  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
+  -object memory-backend-file,pmem=true,id=pmem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
+  -object memory-backend-file,pmem=true,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
+  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
+  -device cxl-type3,bus=root_port13,persistent-memdev=pmem0,lsa=cxl-lsa0,id=cxl-pmem0 \
+  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
+
+A very simple setup with just one directly attached CXL Type 3 Volatile Memory device::
+
+  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
+  ...
+  -object memory-backend-ram,id=vmem0,share=on,size=256M \
   -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
   -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
+  -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
+  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
+
+The same volatile setup may optionally include an LSA region::
+
+  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
+  ...
+  -object memory-backend-ram,id=vmem0,share=on,size=256M \
+  -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
+  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
+  -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,lsa=cxl-lsa0,id=cxl-vmem0 \
   -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
 
 A setup suitable for 4 way interleave. Only one fixed window provided, to enable 2 way
@@ -328,13 +349,13 @@ the CXL Type3 device directly attached (no switches).::
   -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
   -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
   -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
+  -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
   -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
-  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
+  -device cxl-type3,bus=root_port14,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
   -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
-  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
+  -device cxl-type3,bus=root_port15,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
   -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
-  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
+  -device cxl-type3,bus=root_port16,persistent-memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
   -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.targets.1=cxl.2,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k
 
 An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave::
@@ -354,15 +375,23 @@ An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave::
   -device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \
   -device cxl-upstream,bus=root_port0,id=us0 \
   -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-  -device cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,size=256M \
+  -device cxl-type3,bus=swport0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,size=256M \
   -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-  -device cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,size=256M \
+  -device cxl-type3,bus=swport1,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,size=256M \
   -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-  -device cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,size=256M \
+  -device cxl-type3,bus=swport2,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,size=256M \
   -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-  -device cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,size=256M \
+  -device cxl-type3,bus=swport3,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,size=256M \
   -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
 
+Deprecations
+------------
+
+The Type 3 device [memdev] attribute has been deprecated in favor
+of the [persistent-memdev] and [volatile-memdev] attributes. [memdev]
+will default to a persistent memory device for backward compatibility
+and is incapable of being used in combination with [persistent-memdev].
+
 Kernel Configuration Options
 ----------------------------
 
diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
index f0a8a4045d..1a7a25dc53 100644
--- a/tests/qtest/cxl-test.c
+++ b/tests/qtest/cxl-test.c
@@ -34,29 +34,44 @@
                  "-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
                  "-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
 
-#define QEMU_T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
-                 "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
-                 "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
+#define QEMU_T3D_DEPRECATED \
+  "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
+  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
+
+#define QEMU_T3D_PMEM \
+  "-object memory-backend-file,id=m0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp0,persistent-memdev=m0,lsa=lsa0,id=pmem0 "
+
+#define QEMU_T3D_VMEM \
+  "-object memory-backend-ram,id=mem0,size=256M " \
+  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=mem0 "
+
+#define QEMU_T3D_VMEM_LSA \
+  "-object memory-backend-ram,id=mem0,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
+  "-device cxl-type3,bus=rp0,volatile-memdev=mem0,lsa=lsa0,id=mem0 "
 
 #define QEMU_2T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "    \
                   "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
+                  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
                   "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
                   "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 "
+                  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 "
 
 #define QEMU_4T3D "-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
                   "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
+                  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
                   "-object memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "    \
                   "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 " \
+                  "-device cxl-type3,bus=rp1,persistent-memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 " \
                   "-object memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "    \
                   "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp2,memdev=cxl-mem2,lsa=lsa2,id=cxl-pmem2 " \
+                  "-device cxl-type3,bus=rp2,persistent-memdev=cxl-mem2,lsa=lsa2,id=cxl-pmem2 " \
                   "-object memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "    \
                   "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M "    \
-                  "-device cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "
+                  "-device cxl-type3,bus=rp3,persistent-memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "
 
 static void cxl_basic_hb(void)
 {
@@ -95,14 +110,53 @@ static void cxl_2root_port(void)
 }
 
 #ifdef CONFIG_POSIX
-static void cxl_t3d(void)
+static void cxl_t3d_deprecated(void)
+{
+    g_autoptr(GString) cmdline = g_string_new(NULL);
+    g_autofree const char *tmpfs = NULL;
+
+    tmpfs = g_dir_make_tmp("cxl-test-XXXXXX", NULL);
+
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_DEPRECATED,
+                    tmpfs, tmpfs);
+
+    qtest_start(cmdline->str);
+    qtest_end();
+}
+
+static void cxl_t3d_persistent(void)
+{
+    g_autoptr(GString) cmdline = g_string_new(NULL);
+    g_autofree const char *tmpfs = NULL;
+
+    tmpfs = g_dir_make_tmp("cxl-test-XXXXXX", NULL);
+
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_PMEM,
+                    tmpfs, tmpfs);
+
+    qtest_start(cmdline->str);
+    qtest_end();
+}
+
+static void cxl_t3d_volatile(void)
+{
+    g_autoptr(GString) cmdline = g_string_new(NULL);
+
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_VMEM);
+
+    qtest_start(cmdline->str);
+    qtest_end();
+}
+
+static void cxl_t3d_volatile_lsa(void)
 {
     g_autoptr(GString) cmdline = g_string_new(NULL);
     g_autofree const char *tmpfs = NULL;
 
     tmpfs = g_dir_make_tmp("cxl-test-XXXXXX", NULL);
 
-    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D, tmpfs, tmpfs);
+    g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D_VMEM_LSA,
+                    tmpfs);
 
     qtest_start(cmdline->str);
     qtest_end();
@@ -167,7 +221,10 @@ int main(int argc, char **argv)
         qtest_add_func("/pci/cxl/rp", cxl_root_port);
         qtest_add_func("/pci/cxl/rp_x2", cxl_2root_port);
 #ifdef CONFIG_POSIX
-        qtest_add_func("/pci/cxl/type3_device", cxl_t3d);
+        qtest_add_func("/pci/cxl/type3_device", cxl_t3d_deprecated);
+        qtest_add_func("/pci/cxl/type3_device_pmem", cxl_t3d_persistent);
+        qtest_add_func("/pci/cxl/type3_device_vmem", cxl_t3d_volatile);
+        qtest_add_func("/pci/cxl/type3_device_vmem_lsa", cxl_t3d_volatile_lsa);
         qtest_add_func("/pci/cxl/rp_x2_type3_x2", cxl_1pxb_2rp_2t3d);
         qtest_add_func("/pci/cxl/pxb_x2_root_port_x4_type3_x4",
                        cxl_2pxb_4rp_4t3d);
-- 
2.37.3



* Re: [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices
  2022-10-11 21:19 ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
                     ` (4 preceding siblings ...)
  2022-10-11 21:19   ` [PATCH 5/5] cxl: update tests and documentation for new cxl properties Gregory Price
@ 2022-10-11 22:20   ` Michael S. Tsirkin
  5 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2022-10-11 22:20 UTC (permalink / raw)
  To: Gregory Price
  Cc: jonathan.cameron, qemu-devel, linux-cxl, alison.schofield, dave,
	a.manzanares, bwidawsk, gregory.price, hchkuo, cbrowy, ira.weiny

On Tue, Oct 11, 2022 at 05:19:11PM -0400, Gregory Price wrote:
> Summary of Changes:
> 1) Correction of PCI_CLASS from STORAGE_EXPRESS to MEMORY_CXL on init
> 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> 3) Refactor CDAT DSMAS Initialization for multi-region initialization
> 4) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> 5) Test and Documentation updates
> 
> Developed with input from Jonathan Cameron and Davidlohr Bueso.
> 
> This series brings 2 features to CXL Type-3 Devices:
>     1) Volatile Memory Region support
>     2) Multi-Region support (1 Volatile, 1 Persistent)
> 
> In this series we implement multi-region and volatile region support
> through 6 major changes to CXL devices
>     1) The single HostMemoryBackend [hostmem] has been replaced by two
>        backends, [hostvmem] and [hostpmem], which back volatile and
>        persistent memory respectively
>     2) The single AddressSpace has been replaced by two AddressSpaces
>        [hostvmem_as] and [hostpmem_as] to map respective memdevs.
>     3) The size of each memory region and the total size are stored
>        separately
>     4) The CDAT and DVSEC memory map entries have been updated:
>        a) if vmem is present, vmem is mapped at DPA(0)
>        b) if pmem is present
>           i)  and vmem is present, pmem is mapped at DPA(vmem->size)
>           ii) else, pmem is mapped at DPA(0)
>        c) partitioning of pmem is not supported in this patch set but
>           has been discussed and this design should suffice.
>     5) Read/Write functions have been updated to access AddressSpaces
>        according to the mapping described in #4
>     6) cxl-mailbox has been updated to report the respective size of
>        volatile and persistent memory regions
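
To make the mapping in items 4 and 5 concrete, here is a minimal standalone
sketch of routing a DPA to the volatile or persistent region. It is not the
code from the patch: dpa_route(), struct dpa_target and enum dpa_region are
hypothetical names, and an absent region is modelled as size 0. It also
assumes that the offset handed to the pmem backing store should be relative
to the start of the persistent region, hence the subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region selector; not a QEMU type. */
enum dpa_region { DPA_REGION_NONE, DPA_REGION_VMEM, DPA_REGION_PMEM };

struct dpa_target {
    enum dpa_region region;
    uint64_t offset; /* offset relative to the start of the chosen region */
};

/*
 * Route a device physical address (DPA) to its backing region.
 * vmem occupies [0, vmem_size), pmem occupies
 * [vmem_size, vmem_size + pmem_size). A size of 0 means "not present".
 */
static struct dpa_target dpa_route(uint64_t dpa, uint64_t vmem_size,
                                   uint64_t pmem_size)
{
    struct dpa_target t = { DPA_REGION_NONE, 0 };

    /* The total capacity must be representable in 64 bits. */
    assert(pmem_size <= UINT64_MAX - vmem_size);

    if (dpa < vmem_size) {
        t.region = DPA_REGION_VMEM;
        t.offset = dpa;
    } else if (dpa - vmem_size < pmem_size) {
        /* dpa >= vmem_size here, so the subtraction cannot underflow. */
        t.region = DPA_REGION_PMEM;
        t.offset = dpa - vmem_size;
    }
    return t;
}
```

With toy sizes of 256 bytes of vmem and 512 bytes of pmem, DPA 256 lands at
offset 0 of pmem and DPA 768 is rejected. The comparisons are strict, so the
first byte past the volatile region already belongs to the persistent one.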
> 
> CXL Spec (3.0) Section 8.2.9.8.2.0 - Get Partition Info
>   Active Volatile Memory
>     The device shall provide this volatile capacity starting at DPA 0
>   Active Persistent Memory
>     The device shall provide this persistent capacity starting at the
>     DPA immediately following the volatile capacity
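
As a rough illustration of how the two capacities end up in the Get Partition
Info response, the sketch below divides each region size by the capacity
multiplier and refuses unaligned sizes. The struct and function names are
hypothetical, and the 256 MiB multiplier is an assumption about the value of
CXL_CAPACITY_MULTIPLIER rather than something stated in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: capacities are reported in 256 MiB units. */
#define CAPACITY_MULTIPLIER (256ULL << 20)

/* Hypothetical stand-in for the Get Partition Info payload fields. */
struct partition_info {
    uint64_t active_vmem;
    uint64_t next_vmem;
    uint64_t active_pmem;
    uint64_t next_pmem;
};

/*
 * Fill the payload from the region sizes in bytes.
 * Returns false if either size is not a multiple of the capacity unit.
 */
static bool fill_partition_info(uint64_t vmem_size, uint64_t pmem_size,
                                struct partition_info *out)
{
    assert(out != NULL);

    if (vmem_size % CAPACITY_MULTIPLIER || pmem_size % CAPACITY_MULTIPLIER) {
        return false;
    }

    out->active_vmem = vmem_size / CAPACITY_MULTIPLIER;
    out->next_vmem = 0; /* no pending repartition in this sketch */
    out->active_pmem = pmem_size / CAPACITY_MULTIPLIER;
    out->next_pmem = 0;
    return true;
}
```

Under that assumption, a device with 256 MiB of volatile and 512 MiB of
persistent memory would report active_vmem = 1 and active_pmem = 2.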
> 
> Partitioning of Persistent Memory regions may be supported in follow-up
> patch sets.
> 
> Submitted as an extension to the CDAT emulation because the CDAT DSMAS
> entry concerns memory mapping and is required to successfully map memory
> regions correctly in bios/efi.

As there will be a v8 of the CDAT patches, I expect this patchset will need
a rebase too.

> Gregory Price (5):
>   hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL
>   hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
>   hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory
>     Regions
>   hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)
>   cxl: update tests and documentation for new cxl properties
> 
>  docs/system/devices/cxl.rst |  53 ++++-
>  hw/cxl/cxl-mailbox-utils.c  |  23 +-
>  hw/mem/cxl_type3.c          | 449 +++++++++++++++++++++++-------------
>  include/hw/cxl/cxl_device.h |  11 +-
>  tests/qtest/cxl-test.c      |  81 ++++++-
>  5 files changed, 416 insertions(+), 201 deletions(-)
> 
> -- 
> 2.37.3



* Re: [PATCH 3/5] hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory Regions
  2022-10-11 21:19   ` [PATCH 3/5] hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory Regions Gregory Price
@ 2022-10-12 14:10       ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-12 14:10 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Tue, 11 Oct 2022 17:19:14 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> This is a preparatory commit for enabling multiple memory regions within
> a single CXL Type-3 device.  We will need to initialize multiple CDAT
> DSMAS regions (and subsequent DSLBIS, and DSEMTS entries), so generalize
> the initialization into a function.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Hi Gregory,

Main comment here is that DOE isn't in yet.  Some of the changes
you have made in here should be review comments on that series rather
than here.

I'm also not keen on taking the various allocations to arrays,
particularly when seeing the hacky result in the free routine.

Just spin some more pointers and 3 more allocations (once persistent is
added).

Jonathan

> ---
>  hw/mem/cxl_type3.c | 275 +++++++++++++++++++++++++--------------------
>  1 file changed, 154 insertions(+), 121 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 282f274266..dda78704c2 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -24,145 +24,178 @@
>  #define UI64_NULL ~(0ULL)
>  #define DWORD_BYTE 4
>  
> +static int ct3_build_dsmas(CDATDsmas *dsmas,
> +                           CDATDslbis *dslbis,
> +                           CDATDsemts *dsemts,
> +                           MemoryRegion *mr,
> +                           int dsmad_handle,
> +                           bool is_pmem,
> +                           uint64_t dpa_base)

Rewrap this to be just under 80 characters per line.

This is building a lot more than DSMAS.
Could rename it, or could break it down into functions
that deal with each type of entry.

> +{
> +    int len = 0;
> +    /* ttl_len should be incremented for every entry */

ttl_ ?

Given you now allocate outside of this function, probably
more appropriate to just add the entries up there.


> +
> +    /* Device Scoped Memory Affinity Structure */
> +    *dsmas = (CDATDsmas) {
> +        .header = {
> +            .type = CDAT_TYPE_DSMAS,
> +            .length = sizeof(*dsmas),
> +        },
> +        .DSMADhandle = dsmad_handle,
> +        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0),
> +        .DPA_base = dpa_base,
> +        .DPA_length = int128_get64(mr->size),
> +    };
> +    len++;
> +
> +    /* For now, no memory side cache, plausiblish numbers */
> +    dslbis[0] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +        },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_READ_LATENCY,
> +        .entry_base_unit = 10000, /* 10ns base */
> +        .entry[0] = 15, /* 150ns */
> +    };
> +    len++;
> +
> +    dslbis[1] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +        },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> +        .entry_base_unit = 10000,
> +        .entry[0] = 25, /* 250ns */
> +    };
> +    len++;
> +
> +    dslbis[2] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +            },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> +        .entry_base_unit = 1000, /* GB/s */
> +        .entry[0] = 16,
> +    };
> +    len++;
> +
> +    dslbis[3] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +        },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> +        .entry_base_unit = 1000, /* GB/s */
> +        .entry[0] = 16,
> +    };
> +    len++;
> +
> +    *dsemts = (CDATDsemts) {
> +        .header = {
> +            .type = CDAT_TYPE_DSEMTS,
> +            .length = sizeof(*dsemts),
> +        },
> +        .DSMAS_handle = dsmad_handle,
> +        /* EFI_MEMORY_NV implies EfiReservedMemoryType */
> +        .EFI_memory_type_attr = is_pmem ? 2 : 0,
> +        /* Reserved - the non volatile from DSMAS matters */
> +        .DPA_offset = 0,
> +        .DPA_length = int128_get64(mr->size),
> +    };
> +    len++;
> +    return len;
> +}
> +
>  static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>                                  void *priv)
>  {
> -    g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
> -    g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
> +    g_autofree CDATDsmas *dsmas = NULL;
> +    g_autofree CDATDslbis *dslbis = NULL;
> +    g_autofree CDATDsemts *dsemts = NULL;
>      CXLType3Dev *ct3d = priv;
> -    int len = 0;

There are changes in here that aren't necessary and just result
in a much harder diff to review.  Why rename len to cdat_len?

> -    int i = 0;
> -    int next_dsmad_handle = 0;
> -    int nonvolatile_dsmad = -1;
> -    int dslbis_nonvolatile_num = 4;
> +    int cdat_len = 0;
> +    int cdat_idx = 0, sub_idx = 0;
> +    int dsmas_num, dslbis_num, dsemts_num;
> +    int dsmad_handle = 0;
> +    uint64_t dpa_base = 0;
>      MemoryRegion *mr;
>  
> -    /* Non volatile aspects */
> -    if (ct3d->hostmem) {
> -        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> -        if (!dsmas_nonvolatile) {
> -            return -ENOMEM;
> -        }
> -        nonvolatile_dsmad = next_dsmad_handle++;
> -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> -        if (!mr) {
> -            return -EINVAL;
> -        }
> -        *dsmas_nonvolatile = (CDATDsmas) {
> -            .header = {
> -                .type = CDAT_TYPE_DSMAS,
> -                .length = sizeof(*dsmas_nonvolatile),
> -            },
> -            .DSMADhandle = nonvolatile_dsmad,
> -            .flags = CDAT_DSMAS_FLAG_NV,
> -            .DPA_base = 0,
> -            .DPA_length = int128_get64(mr->size),
> -        };
> -        len++;
> -
> -        /* For now, no memory side cache, plausiblish numbers */
> -        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> -        if (!dslbis_nonvolatile)
> -            return -ENOMEM;
> -
> -        dslbis_nonvolatile[0] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_READ_LATENCY,
> -            .entry_base_unit = 10000, /* 10ns base */
> -            .entry[0] = 15, /* 150ns */
> -        };
> -        len++;
> -
> -        dslbis_nonvolatile[1] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> -            .entry_base_unit = 10000,
> -            .entry[0] = 25, /* 250ns */
> -        };
> -        len++;
> -       
> -        dslbis_nonvolatile[2] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> -            .entry_base_unit = 1000, /* GB/s */
> -            .entry[0] = 16,
> -        };
> -        len++;
> -
> -        dslbis_nonvolatile[3] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> -            .entry_base_unit = 1000, /* GB/s */
> -            .entry[0] = 16,
> -        };
> -        len++;
> -
> -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> -        if (!mr) {
> -            return -EINVAL;
> -        }
> -        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> -        *dsemts_nonvolatile = (CDATDsemts) {
> -            .header = {
> -                .type = CDAT_TYPE_DSEMTS,
> -                .length = sizeof(*dsemts_nonvolatile),
> -            },
> -            .DSMAS_handle = nonvolatile_dsmad,
> -            .EFI_memory_type_attr = 2, /* Reserved - the non volatile from DSMAS matters */
> -            .DPA_offset = 0,
> -            .DPA_length = int128_get64(mr->size),
> -        };
> -        len++;
> +    if (!ct3d->hostmem | !host_memory_backend_get_memory(ct3d->hostmem)) {

I don't immediately see why we need this test here and didn't previously.  If it
should always have been there, put that as a review on the DOE patches not here.

> +        return -EINVAL;
> +    }
> +
> +    dsmas_num = 1;
> +    dslbis_num = 4 * dsmas_num;
> +    dsemts_num = dsmas_num;
> +
> +    dsmas = g_malloc(sizeof(*dsmas) * dsmas_num);

As we aren't likely to add any more entries after non-volatile,
I'm not convinced that making everything arrays and then indexing into them
is worthwhile, particularly as it's causing huge amounts of churn.

If this style of preallocate-then-fill makes more sense (I don't particularly
like that it breaks up the handling of each element), then propose that in
review of the original series.  Having this level of 'style' change to the
function here makes for a hard-to-read set.  We can still change the
original patch.  I'm not yet convinced it's worth making that change.

I think I'd rather see the allocation and fill all in the factored out function.


> +    dslbis = g_malloc(sizeof(*dslbis) * dslbis_num);
> +    dsemts = g_malloc(sizeof(*dsemts) * dsemts_num);
> +
> +    if (!dsmas || !dslbis || !dsemts) {
> +        return -ENOMEM;
> +    }
> +
> +    mr = host_memory_backend_get_memory(ct3d->hostmem);
> +    cdat_len += ct3_build_dsmas(&dsmas[dsmad_handle],

There isn't a specific connection between dsmad_handle and the array index.
This code kind of makes it look like there is one, when it's just that we've
decided to use handle 0 and later 1.

That's another reason I'd rather not do this with arrays.

> +                                &dslbis[4 * dsmad_handle],
> +                                &dsemts[dsmad_handle],
> +                                mr,
> +                                dsmad_handle,
> +                                false,
> +                                dpa_base);
> +    dpa_base += mr->size;
> +    dsmad_handle++;
> +
> +    /* Allocate and fill in the CDAT table */
> +    *cdat_table = g_malloc0(cdat_len * sizeof(*cdat_table));
> +    if (!*cdat_table) {
> +        return -ENOMEM;
>      }
>  
> -    *cdat_table = g_malloc0(len * sizeof(*cdat_table));

Looks like I'm missing an allocation failure check here in the original
code. Please put that in a review of that series rather than introducing
the change hidden down in here.

>      /* Header always at start of structure */
> -    if (dsmas_nonvolatile) {
> -        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
> +    CDATDsmas *dsmas_ent = g_steal_pointer(&dsmas);

We should not be introducing new local variables down here.
Style-wise, stick to the old-school C style of all definitions at the top
or within a scoped region {}.


> +    for (sub_idx = 0; sub_idx < dsmas_num; sub_idx++) {
> +        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dsmas_ent[sub_idx];

For a local index, j is fine.
Using a more specific name for what was i is fair enough.  That belongs
in review of the original patch given that it hasn't been accepted yet.

>      }
> -    if (dslbis_nonvolatile) {
> -        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);        
> -        int j;
>  
> -        for (j = 0; j < dslbis_nonvolatile_num; j++) {
> -            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
> -        }
> +    CDATDslbis *dslbis_ent = g_steal_pointer(&dslbis);
> +    for (sub_idx = 0; sub_idx < dslbis_num; sub_idx++) {
> +        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dslbis_ent[sub_idx];
>      }
> -    if (dsemts_nonvolatile) {
> -        (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
> +
> +    CDATDsemts *dsemts_ent = g_steal_pointer(&dsemts);
> +    for (sub_idx = 0; sub_idx < dsemts_num; sub_idx++) {
> +        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dsemts_ent[sub_idx];
>      }
> -    
> -    return len;
> +
> +    return cdat_len;
>  }
>  
>  static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
>  {
> -    int i;
> +    int dsmas_num = 1;
> +    int dslbis_idx = dsmas_num;
> +    int dsemts_idx = dsmas_num + (dsmas_num * 4);
> +
> +    /* There are only 3 sub-tables to free: dsmas, dslbis, dsemts */
> +    assert(num == (dsmas_num + (dsmas_num * 4) + (dsmas_num)));

This alone is a very good reason not to do allocations as arrays.
It looks extremely fragile to me.  Let's just pay the cost of a few
more small allocations to keep the code easier to maintain.


> +
> +    g_free(cdat_table[0]);
> +    g_free(cdat_table[dslbis_idx]);
> +    g_free(cdat_table[dsemts_idx]);
>  
> -    for (i = 0; i < num; i++) {
> -        g_free(cdat_table[i]);
> -    }
>      g_free(cdat_table);
>  }
>  



* Re: [PATCH 3/5] hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory Regions
@ 2022-10-12 14:10       ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron via @ 2022-10-12 14:10 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Tue, 11 Oct 2022 17:19:14 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> This is a preparatory commit for enabling multiple memory regions within
> a single CXL Type-3 device.  We will need to initialize multiple CDAT
> DSMAS regions (and subsequent DSLBIS, and DSEMTS entries), so generalize
> the initialization into a function.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Hi Gregory,

Main comment here is that DOE isn't in yet.  Some of the changes
you have made in here should be review comments on that series rather
than here.

I'm also not keen on taking the various allocations to arrays,
particularly when seeing the hacky result in the free routine.

Just spin some more pointers and 3 more allocations (once persistent is
added).

Jonathan

> ---
>  hw/mem/cxl_type3.c | 275 +++++++++++++++++++++++++--------------------
>  1 file changed, 154 insertions(+), 121 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 282f274266..dda78704c2 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -24,145 +24,178 @@
>  #define UI64_NULL ~(0ULL)
>  #define DWORD_BYTE 4
>  
> +static int ct3_build_dsmas(CDATDsmas *dsmas,
> +                           CDATDslbis *dslbis,
> +                           CDATDsemts *dsemts,
> +                           MemoryRegion *mr,
> +                           int dsmad_handle,
> +                           bool is_pmem,
> +                           uint64_t dpa_base)

Rewrap this to be just under 80 characters per line.

This is building a lot more than DSMAS.
Could rename it, or could break it down into functions
that deal with each type  of entry.
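
As a rough illustration of the second option, per-entry-type fill helpers
might look like the sketch below.  This is only a sketch: the Ex* structs,
the EX_* constants and the helper names are hypothetical simplified stand-ins,
not the QEMU CDAT definitions, and the caller is assumed to own the storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the CDAT entry types. */
enum { EX_TYPE_DSMAS = 0, EX_TYPE_DSLBIS = 1 };
enum { EX_DSMAS_FLAG_NV = 1 << 2 };

typedef struct ExHeader {
    uint8_t type;
    uint16_t length;
} ExHeader;

typedef struct ExDsmas {
    ExHeader header;
    uint8_t handle;
    uint8_t flags;
    uint64_t dpa_base;
    uint64_t dpa_length;
} ExDsmas;

typedef struct ExDslbis {
    ExHeader header;
    uint8_t handle;
    uint8_t data_type;
    uint64_t entry_base_unit;
    uint16_t entry;
} ExDslbis;

/* Fill one memory affinity entry; the caller owns the storage. */
static void ex_fill_dsmas(ExDsmas *dsmas, int handle, bool is_pmem,
                          uint64_t dpa_base, uint64_t dpa_length)
{
    assert(dsmas != NULL);
    *dsmas = (ExDsmas) {
        .header = { .type = EX_TYPE_DSMAS, .length = sizeof(*dsmas) },
        .handle = handle,
        .flags = is_pmem ? EX_DSMAS_FLAG_NV : 0,
        .dpa_base = dpa_base,
        .dpa_length = dpa_length,
    };
}

/* Fill one latency or bandwidth entry from its three varying values. */
static void ex_fill_dslbis(ExDslbis *dslbis, int handle, uint8_t data_type,
                           uint64_t base_unit, uint16_t value)
{
    assert(dslbis != NULL);
    *dslbis = (ExDslbis) {
        .header = { .type = EX_TYPE_DSLBIS, .length = sizeof(*dslbis) },
        .handle = handle,
        .data_type = data_type,
        .entry_base_unit = base_unit,
        .entry = value,
    };
}
```

With helpers of this shape, the four DSLBIS blocks would reduce to four calls
that differ only in data type, base unit and value.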

> +{
> +    int len = 0;
> +    /* ttl_len should be incremented for every entry */

ttl_ ?

Given you now allocate outside of this function, probably
more appropriate to just add the entries up there.


> +
> +    /* Device Scoped Memory Affinity Structure */
> +    *dsmas = (CDATDsmas) {
> +        .header = {
> +            .type = CDAT_TYPE_DSMAS,
> +            .length = sizeof(*dsmas),
> +        },
> +        .DSMADhandle = dsmad_handle,
> +        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0),
> +        .DPA_base = dpa_base,
> +        .DPA_length = int128_get64(mr->size),
> +    };
> +    len++;
> +
> +    /* For now, no memory side cache, plausiblish numbers */
> +    dslbis[0] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +        },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_READ_LATENCY,
> +        .entry_base_unit = 10000, /* 10ns base */
> +        .entry[0] = 15, /* 150ns */
> +    };
> +    len++;
> +
> +    dslbis[1] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +        },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> +        .entry_base_unit = 10000,
> +        .entry[0] = 25, /* 250ns */
> +    };
> +    len++;
> +
> +    dslbis[2] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +            },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> +        .entry_base_unit = 1000, /* GB/s */
> +        .entry[0] = 16,
> +    };
> +    len++;
> +
> +    dslbis[3] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis),
> +        },
> +        .handle = dsmad_handle,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> +        .entry_base_unit = 1000, /* GB/s */
> +        .entry[0] = 16,
> +    };
> +    len++;
> +
> +    *dsemts = (CDATDsemts) {
> +        .header = {
> +            .type = CDAT_TYPE_DSEMTS,
> +            .length = sizeof(*dsemts),
> +        },
> +        .DSMAS_handle = dsmad_handle,
> +        /* EFI_MEMORY_NV implies EfiReservedMemoryType */
> +        .EFI_memory_type_attr = is_pmem ? 2 : 0,
> +        /* Reserved - the non volatile from DSMAS matters */
> +        .DPA_offset = 0,
> +        .DPA_length = int128_get64(mr->size),
> +    };
> +    len++;
> +    return len;
> +}
> +
>  static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>                                  void *priv)
>  {
> -    g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
> -    g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
> +    g_autofree CDATDsmas *dsmas = NULL;
> +    g_autofree CDATDslbis *dslbis = NULL;
> +    g_autofree CDATDsemts *dsemts = NULL;
>      CXLType3Dev *ct3d = priv;
> -    int len = 0;

There are changes in here that aren't necessary and just result
in a much harder diff to review.  Why rename len to cdat_len?

> -    int i = 0;
> -    int next_dsmad_handle = 0;
> -    int nonvolatile_dsmad = -1;
> -    int dslbis_nonvolatile_num = 4;
> +    int cdat_len = 0;
> +    int cdat_idx = 0, sub_idx = 0;
> +    int dsmas_num, dslbis_num, dsemts_num;
> +    int dsmad_handle = 0;
> +    uint64_t dpa_base = 0;
>      MemoryRegion *mr;
>  
> -    /* Non volatile aspects */
> -    if (ct3d->hostmem) {
> -        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> -        if (!dsmas_nonvolatile) {
> -            return -ENOMEM;
> -        }
> -        nonvolatile_dsmad = next_dsmad_handle++;
> -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> -        if (!mr) {
> -            return -EINVAL;
> -        }
> -        *dsmas_nonvolatile = (CDATDsmas) {
> -            .header = {
> -                .type = CDAT_TYPE_DSMAS,
> -                .length = sizeof(*dsmas_nonvolatile),
> -            },
> -            .DSMADhandle = nonvolatile_dsmad,
> -            .flags = CDAT_DSMAS_FLAG_NV,
> -            .DPA_base = 0,
> -            .DPA_length = int128_get64(mr->size),
> -        };
> -        len++;
> -
> -        /* For now, no memory side cache, plausiblish numbers */
> -        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> -        if (!dslbis_nonvolatile)
> -            return -ENOMEM;
> -
> -        dslbis_nonvolatile[0] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_READ_LATENCY,
> -            .entry_base_unit = 10000, /* 10ns base */
> -            .entry[0] = 15, /* 150ns */
> -        };
> -        len++;
> -
> -        dslbis_nonvolatile[1] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> -            .entry_base_unit = 10000,
> -            .entry[0] = 25, /* 250ns */
> -        };
> -        len++;
> -       
> -        dslbis_nonvolatile[2] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> -            .entry_base_unit = 1000, /* GB/s */
> -            .entry[0] = 16,
> -        };
> -        len++;
> -
> -        dslbis_nonvolatile[3] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> -            .entry_base_unit = 1000, /* GB/s */
> -            .entry[0] = 16,
> -        };
> -        len++;
> -
> -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> -        if (!mr) {
> -            return -EINVAL;
> -        }
> -        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> -        *dsemts_nonvolatile = (CDATDsemts) {
> -            .header = {
> -                .type = CDAT_TYPE_DSEMTS,
> -                .length = sizeof(*dsemts_nonvolatile),
> -            },
> -            .DSMAS_handle = nonvolatile_dsmad,
> -            .EFI_memory_type_attr = 2, /* Reserved - the non volatile from DSMAS matters */
> -            .DPA_offset = 0,
> -            .DPA_length = int128_get64(mr->size),
> -        };
> -        len++;
> +    if (!ct3d->hostmem | !host_memory_backend_get_memory(ct3d->hostmem)) {

I don't immediately see why we need this test here and didn't previously.  If it
should always have been there, put that as a review on the DOE patches not here.

> +        return -EINVAL;
> +    }
> +
> +    dsmas_num = 1;
> +    dslbis_num = 4 * dsmas_num;
> +    dsemts_num = dsmas_num;
> +
> +    dsmas = g_malloc(sizeof(*dsmas) * dsmas_num);

As we aren't likely to add any more entries after non-volatile,
I'm not convinced that making everything arrays and then indexing into them
is worthwhile, particularly as it's causing huge amounts of churn.

If this style of preallocate-then-fill makes more sense (I don't particularly
like that it breaks up the handling of each element), then propose that in
review of the original series.  Having this level of 'style' change to the
function here makes for a hard-to-read set.  We can still change the
original patch.  I'm not yet convinced it's worth making that change.

I think I'd rather see the allocation and fill all in the factored out function.


> +    dslbis = g_malloc(sizeof(*dslbis) * dslbis_num);
> +    dsemts = g_malloc(sizeof(*dsemts) * dsemts_num);
> +
> +    if (!dsmas || !dslbis || !dsemts) {
> +        return -ENOMEM;
> +    }
> +
> +    mr = host_memory_backend_get_memory(ct3d->hostmem);
> +    cdat_len += ct3_build_dsmas(&dsmas[dsmad_handle],

There isn't a specific connection between dsmad_handle and the array index.
This code kind of makes it look like there is one, when it's just that we've
decided to use handle 0 and later 1.

That's another reason I'd rather not do this with arrays.

> +                                &dslbis[4 * dsmad_handle],
> +                                &dsemts[dsmad_handle],
> +                                mr,
> +                                dsmad_handle,
> +                                false,
> +                                dpa_base);
> +    dpa_base += mr->size;
> +    dsmad_handle++;
> +
> +    /* Allocate and fill in the CDAT table */
> +    *cdat_table = g_malloc0(cdat_len * sizeof(*cdat_table));
> +    if (!*cdat_table) {
> +        return -ENOMEM;
>      }
>  
> -    *cdat_table = g_malloc0(len * sizeof(*cdat_table));

Looks like I'm missing an allocation failure check here in the original
code. Please put that in a review of that series rather than introducing
the change hidden down in here.

>      /* Header always at start of structure */
> -    if (dsmas_nonvolatile) {
> -        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
> +    CDATDsmas *dsmas_ent = g_steal_pointer(&dsmas);

We should not be introducing new local variables down here.
Style-wise, stick to the old-school C style of all definitions at the top
or within a scoped region {}.


> +    for (sub_idx = 0; sub_idx < dsmas_num; sub_idx++) {
> +        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dsmas_ent[sub_idx];

For a local index, j is fine.
Using a more specific name for what was i is fair enough.  That belongs
in review of the original patch given that it hasn't been accepted yet.

>      }
> -    if (dslbis_nonvolatile) {
> -        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);        
> -        int j;
>  
> -        for (j = 0; j < dslbis_nonvolatile_num; j++) {
> -            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
> -        }
> +    CDATDslbis *dslbis_ent = g_steal_pointer(&dslbis);
> +    for (sub_idx = 0; sub_idx < dslbis_num; sub_idx++) {
> +        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dslbis_ent[sub_idx];
>      }
> -    if (dsemts_nonvolatile) {
> -        (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
> +
> +    CDATDsemts *dsemts_ent = g_steal_pointer(&dsemts);
> +    for (sub_idx = 0; sub_idx < dsemts_num; sub_idx++) {
> +        (*cdat_table)[cdat_idx++] = (CDATSubHeader*)&dsemts_ent[sub_idx];
>      }
> -    
> -    return len;
> +
> +    return cdat_len;
>  }
>  
>  static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
>  {
> -    int i;
> +    int dsmas_num = 1;
> +    int dslbis_idx = dsmas_num;
> +    int dsemts_idx = dsmas_num + (dsmas_num * 4);
> +
> +    /* There are only 3 sub-tables to free: dsmas, dslbis, dsemts */
> +    assert(num == (dsmas_num + (dsmas_num * 4) + (dsmas_num)));

This alone is a very good reason not to do allocations as arrays.
It looks extremely fragile to me.  Let's just pay the cost of a few
more small allocations to keep the code easier to maintain.


> +
> +    g_free(cdat_table[0]);
> +    g_free(cdat_table[dslbis_idx]);
> +    g_free(cdat_table[dsemts_idx]);
>  
> -    for (i = 0; i < num; i++) {
> -        g_free(cdat_table[i]);
> -    }
>      g_free(cdat_table);
>  }
>  




* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-07 15:21   ` Jonathan Cameron via
  (?)
@ 2022-10-12 16:01   ` Gregory Price
  2022-10-13 10:40       ` Jonathan Cameron via
  2022-10-13 10:56       ` Jonathan Cameron via
  -1 siblings, 2 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-12 16:01 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, linuxarm, ira.weiny

This code contains heap corruption on free, and I think should be
refactored to pre-allocate all the entries we're interested in putting
into the table.  This would flatten the code and simplify the error
handling steps.

Also, should we consider making a union with all the possible entries to
make entry allocation easier?  It may eat a few extra bytes of memory,
but it would simplify the allocation/cleanup code here further.
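
To make the union idea concrete, a minimal sketch follows.  The Ex* types and
ex_entry_new() are hypothetical simplified stand-ins, not the QEMU CDAT
structures; the point is only that every entry becomes one allocation of one
size, whatever its type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the CDAT entry types. */
typedef struct ExHeader {
    uint8_t type;
    uint16_t length;
} ExHeader;

typedef struct ExDsmas {
    ExHeader header;
    uint64_t dpa_base;
    uint64_t dpa_length;
} ExDsmas;

typedef struct ExDslbis {
    ExHeader header;
    uint8_t data_type;
    uint16_t entry;
} ExDslbis;

typedef struct ExDsemts {
    ExHeader header;
    uint8_t mem_type;
    uint64_t dpa_length;
} ExDsemts;

/* Every member starts with ExHeader, so header is valid for any entry. */
typedef union ExEntry {
    ExHeader header;
    ExDsmas dsmas;
    ExDslbis dslbis;
    ExDsemts dsemts;
} ExEntry;

/*
 * Allocate one zeroed entry of the union's size and set its header.
 * Returns NULL on allocation failure; the caller frees with free().
 */
static ExEntry *ex_entry_new(uint8_t type, uint16_t length)
{
    ExEntry *e;

    assert(length <= sizeof(ExEntry));
    e = calloc(1, sizeof(*e));
    if (!e) {
        return NULL;
    }
    e->header.type = type;
    e->header.length = length;
    return e;
}
```

The cost is that smaller entries waste the difference between their own size
and the largest member; the length field still records the real entry size.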

Given that every allocation has to be checked, I'm also not convinced
the use of g_autofree is worth the potential footguns associated with
it.

> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 568c9d62f5..3fa5d70662 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -12,9 +12,218 @@
> +static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
> +                                void *priv)
> +{
(snip)
> +        /* For now, no memory side cache, plausiblish numbers */
> +        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> +        if (!dslbis_nonvolatile)
> +            return -ENOMEM;

this allocation creates a table of entries, which is later freed
incorrectly

> +
> +    *cdat_table = g_malloc0(len * sizeof(*cdat_table));

this allocation needs to be checked

> +    /* Header always at start of structure */
> +    if (dsmas_nonvolatile) {
> +        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
> +    }
> +    if (dslbis_nonvolatile) {
> +        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);        

using a local reference to avoid a g_autofree footgun suggests
we should not use g_autofree here, and should possibly reconsider the
overall strategy for allocation and cleanup

> +        int j;
> +
> +        for (j = 0; j < dslbis_nonvolatile_num; j++) {
> +            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
> +        }

this fills the CDAT table with sub-references to the table allocated
above, which leads to heap corruption with the current code, or
complicated cleanup if we decide to keep it

> +    
> +    return len;
> +}
> +
> +static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
> +{
> +    int i;
> +

And here we free every entry of the table, which can/will cause heap
corruption when the sub-table entries are freed

> +    for (i = 0; i < num; i++) {
> +        g_free(cdat_table[i]);
> +    }
> +    g_free(cdat_table);
> +}
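
For comparison, a free loop of this shape is only valid when every slot of
the table is its own allocation.  A minimal sketch of that ownership model,
using plain calloc()/free() and hypothetical Ex* names rather than the QEMU
types and GLib allocators:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal entry header, standing in for CDATSubHeader. */
typedef struct ExSubHeader {
    uint8_t type;
    uint16_t length;
} ExSubHeader;

#define EX_NUM_DSLBIS 4

/* Free every slot, then the table; valid because each slot owns its block. */
static void ex_free_table(ExSubHeader **table, int num)
{
    int i;

    for (i = 0; i < num; i++) {
        free(table[i]);
    }
    free(table);
}

/*
 * Build a table in which each slot is a separate allocation.
 * Returns the number of entries, or -1 on allocation failure.
 */
static int ex_build_table(ExSubHeader ***table_out)
{
    ExSubHeader **table;
    int i;

    assert(table_out != NULL);
    table = calloc(EX_NUM_DSLBIS, sizeof(*table));
    if (!table) {
        return -1;
    }
    for (i = 0; i < EX_NUM_DSLBIS; i++) {
        table[i] = calloc(1, sizeof(*table[i]));
        if (!table[i]) {
            /* Release the i slots already filled, plus the table. */
            ex_free_table(table, i);
            return -1;
        }
        table[i]->type = 1; /* hypothetical DSLBIS type value */
        table[i]->length = sizeof(*table[i]);
    }
    *table_out = table;
    return EX_NUM_DSLBIS;
}
```

Had the four entries come from a single array allocation, only the first
slot's pointer could legally be passed to free(), which is the problem
described above.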




* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-07 15:21   ` Jonathan Cameron via
  (?)
  (?)
@ 2022-10-12 18:21   ` Gregory Price
  2022-10-12 18:21     ` [PATCH 1/5] hw/mem/cxl_type3: fix checkpatch errors Gregory Price
                       ` (5 more replies)
  -1 siblings, 6 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-12 18:21 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

Included in this response is a recommended patch set on top of this
patch that resolves a number of issues, including style and a heap
corruption bug.

The purpose of this patch set is to refactor the CDAT initialization
code to support future patch sets that will introduce multi-region
support in CXL Type3 devices.

1) Checkpatch errors in the immediately prior patch
2) Flattening of code in CDAT initialization
3) Changes in allocation and error checking for cleanliness
4) Change in the allocation/free strategy of CDAT sub-tables to simplify
   multi-region allocation in the future.  Also resolves a heap
   corruption bug
5) Refactor of CDAT initialization code into a function that initializes
   sub-tables per memory-region.

Gregory Price (5):
  hw/mem/cxl_type3: fix checkpatch errors
  hw/mem/cxl_type3: Pull validation checks ahead of functional code
  hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work
  hw/mem/cxl_type3: Change the CDAT allocation/free strategy
  hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a
    function

 hw/mem/cxl_type3.c | 240 +++++++++++++++++++++++----------------------
 1 file changed, 122 insertions(+), 118 deletions(-)

-- 
2.37.3



* [PATCH 1/5] hw/mem/cxl_type3: fix checkpatch errors
  2022-10-12 18:21   ` Gregory Price
@ 2022-10-12 18:21     ` Gregory Price
  2022-10-12 18:21     ` [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code Gregory Price
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: Gregory Price @ 2022-10-12 18:21 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

This fixes checkpatch errors in the prior commit.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/mem/cxl_type3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 3fa5d70662..94bc439d89 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -56,9 +56,11 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         len++;
 
         /* For now, no memory side cache, plausiblish numbers */
-        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
-        if (!dslbis_nonvolatile)
+        dslbis_nonvolatile =
+            g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
+        if (!dslbis_nonvolatile) {
             return -ENOMEM;
+        }
 
         dslbis_nonvolatile[0] = (CDATDslbis) {
             .header = {
@@ -85,7 +87,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
             .entry[0] = 25, /* 250ns */
         };
         len++;
-       
+
         dslbis_nonvolatile[2] = (CDATDslbis) {
             .header = {
                 .type = CDAT_TYPE_DSLBIS,
@@ -123,7 +125,8 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
                 .length = sizeof(*dsemts_nonvolatile),
             },
             .DSMAS_handle = nonvolatile_dsmad,
-            .EFI_memory_type_attr = 2, /* Reserved - the non volatile from DSMAS matters */
+            /* Reserved - the non volatile from DSMAS matters */
+            .EFI_memory_type_attr = 2,
             .DPA_offset = 0,
             .DPA_length = int128_get64(mr->size),
         };
@@ -136,7 +139,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
     }
     if (dslbis_nonvolatile) {
-        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);        
+        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
         int j;
 
         for (j = 0; j < dslbis_nonvolatile_num; j++) {
@@ -146,7 +149,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     if (dsemts_nonvolatile) {
         (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
     }
-    
+
     return len;
 }
 
-- 
2.37.3



* [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code
  2022-10-12 18:21   ` Gregory Price
  2022-10-12 18:21     ` [PATCH 1/5] hw/mem/cxl_type3: fix checkpatch errors Gregory Price
@ 2022-10-12 18:21     ` Gregory Price
  2022-10-13  9:07         ` Jonathan Cameron via
  2022-10-12 18:21     ` [PATCH 3/5] hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work Gregory Price
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-12 18:21 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

For style - pulling these validations ahead flattens the code.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/mem/cxl_type3.c | 193 ++++++++++++++++++++++-----------------------
 1 file changed, 96 insertions(+), 97 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 94bc439d89..43b2b9e041 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -32,107 +32,106 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     int dslbis_nonvolatile_num = 4;
     MemoryRegion *mr;
 
+    if (!ct3d->hostmem) {
+        return len;
+    }
+
+    mr = host_memory_backend_get_memory(ct3d->hostmem);
+    if (!mr) {
+        return -EINVAL;
+    }
+
     /* Non volatile aspects */
-    if (ct3d->hostmem) {
-        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
-        if (!dsmas_nonvolatile) {
-            return -ENOMEM;
-        }
-        nonvolatile_dsmad = next_dsmad_handle++;
-        mr = host_memory_backend_get_memory(ct3d->hostmem);
-        if (!mr) {
-            return -EINVAL;
-        }
-        *dsmas_nonvolatile = (CDATDsmas) {
-            .header = {
-                .type = CDAT_TYPE_DSMAS,
-                .length = sizeof(*dsmas_nonvolatile),
-            },
-            .DSMADhandle = nonvolatile_dsmad,
-            .flags = CDAT_DSMAS_FLAG_NV,
-            .DPA_base = 0,
-            .DPA_length = int128_get64(mr->size),
-        };
-        len++;
-
-        /* For now, no memory side cache, plausiblish numbers */
-        dslbis_nonvolatile =
-            g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
-        if (!dslbis_nonvolatile) {
-            return -ENOMEM;
-        }
+    dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
+    if (!dsmas_nonvolatile) {
+        return -ENOMEM;
+    }
+    nonvolatile_dsmad = next_dsmad_handle++;
+    *dsmas_nonvolatile = (CDATDsmas) {
+        .header = {
+            .type = CDAT_TYPE_DSMAS,
+            .length = sizeof(*dsmas_nonvolatile),
+        },
+        .DSMADhandle = nonvolatile_dsmad,
+        .flags = CDAT_DSMAS_FLAG_NV,
+        .DPA_base = 0,
+        .DPA_length = int128_get64(mr->size),
+    };
+    len++;
 
-        dslbis_nonvolatile[0] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_READ_LATENCY,
-            .entry_base_unit = 10000, /* 10ns base */
-            .entry[0] = 15, /* 150ns */
-        };
-        len++;
-
-        dslbis_nonvolatile[1] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
-            .entry_base_unit = 10000,
-            .entry[0] = 25, /* 250ns */
-        };
-        len++;
-
-        dslbis_nonvolatile[2] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
-            .entry_base_unit = 1000, /* GB/s */
-            .entry[0] = 16,
-        };
-        len++;
-
-        dslbis_nonvolatile[3] = (CDATDslbis) {
-            .header = {
-                .type = CDAT_TYPE_DSLBIS,
-                .length = sizeof(*dslbis_nonvolatile),
-            },
-            .handle = nonvolatile_dsmad,
-            .flags = HMAT_LB_MEM_MEMORY,
-            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
-            .entry_base_unit = 1000, /* GB/s */
-            .entry[0] = 16,
-        };
-        len++;
-
-        mr = host_memory_backend_get_memory(ct3d->hostmem);
-        if (!mr) {
-            return -EINVAL;
-        }
-        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
-        *dsemts_nonvolatile = (CDATDsemts) {
-            .header = {
-                .type = CDAT_TYPE_DSEMTS,
-                .length = sizeof(*dsemts_nonvolatile),
-            },
-            .DSMAS_handle = nonvolatile_dsmad,
-            /* Reserved - the non volatile from DSMAS matters */
-            .EFI_memory_type_attr = 2,
-            .DPA_offset = 0,
-            .DPA_length = int128_get64(mr->size),
-        };
-        len++;
+    /* For now, no memory side cache, plausiblish numbers */
+    dslbis_nonvolatile =
+        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
+    if (!dslbis_nonvolatile) {
+        return -ENOMEM;
     }
 
+    dslbis_nonvolatile[0] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis_nonvolatile),
+        },
+        .handle = nonvolatile_dsmad,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_READ_LATENCY,
+        .entry_base_unit = 10000, /* 10ns base */
+        .entry[0] = 15, /* 150ns */
+    };
+    len++;
+
+    dslbis_nonvolatile[1] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis_nonvolatile),
+        },
+        .handle = nonvolatile_dsmad,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
+        .entry_base_unit = 10000,
+        .entry[0] = 25, /* 250ns */
+    };
+    len++;
+
+    dslbis_nonvolatile[2] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis_nonvolatile),
+        },
+        .handle = nonvolatile_dsmad,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
+        .entry_base_unit = 1000, /* GB/s */
+        .entry[0] = 16,
+    };
+    len++;
+
+    dslbis_nonvolatile[3] = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis_nonvolatile),
+        },
+        .handle = nonvolatile_dsmad,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
+        .entry_base_unit = 1000, /* GB/s */
+        .entry[0] = 16,
+    };
+    len++;
+
+    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
+    *dsemts_nonvolatile = (CDATDsemts) {
+        .header = {
+            .type = CDAT_TYPE_DSEMTS,
+            .length = sizeof(*dsemts_nonvolatile),
+        },
+        .DSMAS_handle = nonvolatile_dsmad,
+        /* Reserved - the non volatile from DSMAS matters */
+        .EFI_memory_type_attr = 2,
+        .DPA_offset = 0,
+        .DPA_length = int128_get64(mr->size),
+    };
+    len++;
+
     *cdat_table = g_malloc0(len * sizeof(*cdat_table));
     /* Header always at start of structure */
     if (dsmas_nonvolatile) {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 3/5] hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work
  2022-10-12 18:21   ` Gregory Price
  2022-10-12 18:21     ` [PATCH 1/5] hw/mem/cxl_type3: fix checkpatch errors Gregory Price
  2022-10-12 18:21     ` [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code Gregory Price
@ 2022-10-12 18:21     ` Gregory Price
  2022-10-13 10:44         ` Jonathan Cameron via
  2022-10-12 18:21     ` [PATCH 4/5] hw/mem/cxl_type3: Change the CDAT allocation/free strategy Gregory Price
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-12 18:21 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

Make the size of the allocated CDAT table static (6 entries),
flatten the code, and reduce the number of exit conditions.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/mem/cxl_type3.c | 52 ++++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 30 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 43b2b9e041..0e0ea70387 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -17,6 +17,7 @@
 #include "hw/pci/msix.h"
 
 #define DWORD_BYTE 4
+#define CT3_CDAT_SUBTABLE_SIZE 6
 
 static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
                                 void *priv)
@@ -25,7 +26,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
     g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
     CXLType3Dev *ct3d = priv;
-    int len = 0;
     int i = 0;
     int next_dsmad_handle = 0;
     int nonvolatile_dsmad = -1;
@@ -33,7 +33,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     MemoryRegion *mr;
 
     if (!ct3d->hostmem) {
-        return len;
+        return 0;
     }
 
     mr = host_memory_backend_get_memory(ct3d->hostmem);
@@ -41,11 +41,22 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         return -EINVAL;
     }
 
+    *cdat_table = g_malloc0(CT3_CDAT_SUBTABLE_SIZE * sizeof(*cdat_table));
+    if (!*cdat_table) {
+        return -ENOMEM;
+    }
+
     /* Non volatile aspects */
     dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
-    if (!dsmas_nonvolatile) {
+    dslbis_nonvolatile =
+        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
+    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
+    if (!dsmas_nonvolatile || !dslbis_nonvolatile || !dsemts_nonvolatile) {
+        g_free(*cdat_table);
+        *cdat_table = NULL;
         return -ENOMEM;
     }
+
     nonvolatile_dsmad = next_dsmad_handle++;
     *dsmas_nonvolatile = (CDATDsmas) {
         .header = {
@@ -57,15 +68,8 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .DPA_base = 0,
         .DPA_length = int128_get64(mr->size),
     };
-    len++;
 
     /* For now, no memory side cache, plausiblish numbers */
-    dslbis_nonvolatile =
-        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
-    if (!dslbis_nonvolatile) {
-        return -ENOMEM;
-    }
-
     dslbis_nonvolatile[0] = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
@@ -77,7 +81,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .entry_base_unit = 10000, /* 10ns base */
         .entry[0] = 15, /* 150ns */
     };
-    len++;
 
     dslbis_nonvolatile[1] = (CDATDslbis) {
         .header = {
@@ -90,7 +93,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .entry_base_unit = 10000,
         .entry[0] = 25, /* 250ns */
     };
-    len++;
 
     dslbis_nonvolatile[2] = (CDATDslbis) {
         .header = {
@@ -103,7 +105,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .entry_base_unit = 1000, /* GB/s */
         .entry[0] = 16,
     };
-    len++;
 
     dslbis_nonvolatile[3] = (CDATDslbis) {
         .header = {
@@ -116,9 +117,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .entry_base_unit = 1000, /* GB/s */
         .entry[0] = 16,
     };
-    len++;
 
-    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
     *dsemts_nonvolatile = (CDATDsemts) {
         .header = {
             .type = CDAT_TYPE_DSEMTS,
@@ -130,26 +129,19 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .DPA_offset = 0,
         .DPA_length = int128_get64(mr->size),
     };
-    len++;
 
-    *cdat_table = g_malloc0(len * sizeof(*cdat_table));
     /* Header always at start of structure */
-    if (dsmas_nonvolatile) {
-        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
-    }
-    if (dslbis_nonvolatile) {
-        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
-        int j;
+    (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
 
-        for (j = 0; j < dslbis_nonvolatile_num; j++) {
-            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
-        }
-    }
-    if (dsemts_nonvolatile) {
-        (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
+    CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
+    int j;
+    for (j = 0; j < dslbis_nonvolatile_num; j++) {
+        (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
     }
 
-    return len;
+    (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
+
+    return CT3_CDAT_SUBTABLE_SIZE;
 }
 
 static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 4/5] hw/mem/cxl_type3: Change the CDAT allocation/free strategy
  2022-10-12 18:21   ` Gregory Price
                       ` (2 preceding siblings ...)
  2022-10-12 18:21     ` [PATCH 3/5] hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work Gregory Price
@ 2022-10-12 18:21     ` Gregory Price
  2022-10-13 10:45         ` Jonathan Cameron via
  2022-10-12 18:21     ` [PATCH 5/5] hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a function Gregory Price
  2022-10-13  8:57       ` Jonathan Cameron via
  5 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-12 18:21 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

The existing code allocates a sub-table for DSLBIS entries, uses a
local variable to avoid a g_autofree footgun, and the cleanup code
causes heap corruption.

Rather than allocate a table, explicitly allocate each individual entry
and make the sub-table size static.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/mem/cxl_type3.c | 49 ++++++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 0e0ea70387..220b9f09a9 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -23,13 +23,14 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
                                 void *priv)
 {
     g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
-    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
+    g_autofree CDATDslbis *dslbis_nonvolatile1 = NULL;
+    g_autofree CDATDslbis *dslbis_nonvolatile2 = NULL;
+    g_autofree CDATDslbis *dslbis_nonvolatile3 = NULL;
+    g_autofree CDATDslbis *dslbis_nonvolatile4 = NULL;
     g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
     CXLType3Dev *ct3d = priv;
-    int i = 0;
     int next_dsmad_handle = 0;
     int nonvolatile_dsmad = -1;
-    int dslbis_nonvolatile_num = 4;
     MemoryRegion *mr;
 
     if (!ct3d->hostmem) {
@@ -48,10 +49,15 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
 
     /* Non volatile aspects */
     dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
-    dslbis_nonvolatile =
-        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
+    dslbis_nonvolatile1 = g_malloc(sizeof(*dslbis_nonvolatile1));
+    dslbis_nonvolatile2 = g_malloc(sizeof(*dslbis_nonvolatile2));
+    dslbis_nonvolatile3 = g_malloc(sizeof(*dslbis_nonvolatile3));
+    dslbis_nonvolatile4 = g_malloc(sizeof(*dslbis_nonvolatile4));
     dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
-    if (!dsmas_nonvolatile || !dslbis_nonvolatile || !dsemts_nonvolatile) {
+
+    if (!dsmas_nonvolatile || !dsemts_nonvolatile ||
+        !dslbis_nonvolatile1 || !dslbis_nonvolatile2 ||
+        !dslbis_nonvolatile3 || !dslbis_nonvolatile4) {
         g_free(*cdat_table);
         *cdat_table = NULL;
         return -ENOMEM;
@@ -70,10 +76,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     };
 
     /* For now, no memory side cache, plausiblish numbers */
-    dslbis_nonvolatile[0] = (CDATDslbis) {
+    *dslbis_nonvolatile1 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile),
+            .length = sizeof(*dslbis_nonvolatile1),
         },
         .handle = nonvolatile_dsmad,
         .flags = HMAT_LB_MEM_MEMORY,
@@ -82,10 +88,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .entry[0] = 15, /* 150ns */
     };
 
-    dslbis_nonvolatile[1] = (CDATDslbis) {
+    *dslbis_nonvolatile2 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile),
+            .length = sizeof(*dslbis_nonvolatile2),
         },
         .handle = nonvolatile_dsmad,
         .flags = HMAT_LB_MEM_MEMORY,
@@ -94,10 +100,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .entry[0] = 25, /* 250ns */
     };
 
-    dslbis_nonvolatile[2] = (CDATDslbis) {
+    *dslbis_nonvolatile3 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile),
+            .length = sizeof(*dslbis_nonvolatile3),
         },
         .handle = nonvolatile_dsmad,
         .flags = HMAT_LB_MEM_MEMORY,
@@ -106,10 +112,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
         .entry[0] = 16,
     };
 
-    dslbis_nonvolatile[3] = (CDATDslbis) {
+    *dslbis_nonvolatile4 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile),
+            .length = sizeof(*dslbis_nonvolatile4),
         },
         .handle = nonvolatile_dsmad,
         .flags = HMAT_LB_MEM_MEMORY,
@@ -131,15 +137,12 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     };
 
     /* Header always at start of structure */
-    (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
-
-    CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
-    int j;
-    for (j = 0; j < dslbis_nonvolatile_num; j++) {
-        (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
-    }
-
-    (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
+    (*cdat_table)[0] = g_steal_pointer(&dsmas_nonvolatile);
+    (*cdat_table)[1] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile1);
+    (*cdat_table)[2] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile2);
+    (*cdat_table)[3] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile3);
+    (*cdat_table)[4] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile4);
+    (*cdat_table)[5] = g_steal_pointer(&dsemts_nonvolatile);
 
     return CT3_CDAT_SUBTABLE_SIZE;
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 58+ messages in thread
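
The allocation change described in patch 4/5 above can be illustrated with a
minimal sketch in plain C. It assumes the free routine releases every table
slot individually; under that assumption a single array allocation for several
entries is unsafe, because only the pointer returned by the allocator may be
passed to free(), and freeing &array[1] is undefined behaviour. Entry,
NUM_ENTRIES, build_table and free_table are hypothetical names, and
malloc/free stand in for g_malloc/g_free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one CDAT sub-table entry. */
typedef struct Entry {
    int type;
    int value;
} Entry;

#define NUM_ENTRIES 4

/*
 * One allocation per entry: every slot in the table owns its own heap
 * block, so a generic "free each slot" loop is valid.
 *
 * The unsafe alternative would be:
 *     Entry *block = malloc(NUM_ENTRIES * sizeof(*block));
 *     table[i] = &block[i];
 * after which only table[0] may legally be passed to free().
 */
static Entry **build_table(void)
{
    Entry **table = calloc(NUM_ENTRIES, sizeof(*table));

    if (!table) {
        return NULL;
    }
    for (int i = 0; i < NUM_ENTRIES; i++) {
        table[i] = malloc(sizeof(*table[i]));
        if (!table[i]) {
            /* Unwind the entries allocated so far. */
            while (i-- > 0) {
                free(table[i]);
            }
            free(table);
            return NULL;
        }
        *table[i] = (Entry) { .type = 1, .value = i };
    }
    return table;
}

/* Frees every slot, then the table; returns the number of slots freed. */
static int free_table(Entry **table)
{
    int freed = 0;

    for (int i = 0; i < NUM_ENTRIES; i++) {
        free(table[i]);
        freed++;
    }
    free(table);
    return freed;
}
```

A caller would pair build_table() with free_table(); because each slot is its
own allocation, the cleanup loop needs no knowledge of how the entries were
grouped when they were built.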

* [PATCH 5/5] hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a function
  2022-10-12 18:21   ` Gregory Price
                       ` (3 preceding siblings ...)
  2022-10-12 18:21     ` [PATCH 4/5] hw/mem/cxl_type3: Change the CDAT allocation/free strategy Gregory Price
@ 2022-10-12 18:21     ` Gregory Price
  2022-10-13 10:47         ` Jonathan Cameron via
  2022-10-13  8:57       ` Jonathan Cameron via
  5 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-12 18:21 UTC (permalink / raw)
  To: jonathan.cameron
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

The CDAT can contain multiple entries for multiple memory regions. This
refactor will allow us to re-use the initialization code when volatile
memory region support is added.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 hw/mem/cxl_type3.c | 137 ++++++++++++++++++++++++---------------------
 1 file changed, 72 insertions(+), 65 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 220b9f09a9..3c5485abd0 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -19,117 +19,93 @@
 #define DWORD_BYTE 4
 #define CT3_CDAT_SUBTABLE_SIZE 6
 
-static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
-                                void *priv)
+static int ct3_build_cdat_subtable(CDATSubHeader **cdat_table,
+        MemoryRegion *mr, int dsmad_handle)
 {
-    g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
-    g_autofree CDATDslbis *dslbis_nonvolatile1 = NULL;
-    g_autofree CDATDslbis *dslbis_nonvolatile2 = NULL;
-    g_autofree CDATDslbis *dslbis_nonvolatile3 = NULL;
-    g_autofree CDATDslbis *dslbis_nonvolatile4 = NULL;
-    g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
-    CXLType3Dev *ct3d = priv;
-    int next_dsmad_handle = 0;
-    int nonvolatile_dsmad = -1;
-    MemoryRegion *mr;
-
-    if (!ct3d->hostmem) {
-        return 0;
-    }
-
-    mr = host_memory_backend_get_memory(ct3d->hostmem);
-    if (!mr) {
-        return -EINVAL;
-    }
-
-    *cdat_table = g_malloc0(CT3_CDAT_SUBTABLE_SIZE * sizeof(*cdat_table));
-    if (!*cdat_table) {
-        return -ENOMEM;
-    }
-
-    /* Non volatile aspects */
-    dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
-    dslbis_nonvolatile1 = g_malloc(sizeof(*dslbis_nonvolatile1));
-    dslbis_nonvolatile2 = g_malloc(sizeof(*dslbis_nonvolatile2));
-    dslbis_nonvolatile3 = g_malloc(sizeof(*dslbis_nonvolatile3));
-    dslbis_nonvolatile4 = g_malloc(sizeof(*dslbis_nonvolatile4));
-    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
-
-    if (!dsmas_nonvolatile || !dsemts_nonvolatile ||
-        !dslbis_nonvolatile1 || !dslbis_nonvolatile2 ||
-        !dslbis_nonvolatile3 || !dslbis_nonvolatile4) {
-        g_free(*cdat_table);
-        *cdat_table = NULL;
+    g_autofree CDATDsmas *dsmas = NULL;
+    g_autofree CDATDslbis *dslbis1 = NULL;
+    g_autofree CDATDslbis *dslbis2 = NULL;
+    g_autofree CDATDslbis *dslbis3 = NULL;
+    g_autofree CDATDslbis *dslbis4 = NULL;
+    g_autofree CDATDsemts *dsemts = NULL;
+
+    dsmas = g_malloc(sizeof(*dsmas));
+    dslbis1 = g_malloc(sizeof(*dslbis1));
+    dslbis2 = g_malloc(sizeof(*dslbis2));
+    dslbis3 = g_malloc(sizeof(*dslbis3));
+    dslbis4 = g_malloc(sizeof(*dslbis4));
+    dsemts = g_malloc(sizeof(*dsemts));
+
+    if (!dsmas || !dslbis1 || !dslbis2 || !dslbis3 || !dslbis4 || !dsemts) {
         return -ENOMEM;
     }
 
-    nonvolatile_dsmad = next_dsmad_handle++;
-    *dsmas_nonvolatile = (CDATDsmas) {
+    *dsmas = (CDATDsmas) {
         .header = {
             .type = CDAT_TYPE_DSMAS,
-            .length = sizeof(*dsmas_nonvolatile),
+            .length = sizeof(*dsmas),
         },
-        .DSMADhandle = nonvolatile_dsmad,
+        .DSMADhandle = dsmad_handle,
         .flags = CDAT_DSMAS_FLAG_NV,
         .DPA_base = 0,
         .DPA_length = int128_get64(mr->size),
     };
 
     /* For now, no memory side cache, plausiblish numbers */
-    *dslbis_nonvolatile1 = (CDATDslbis) {
+    *dslbis1 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile1),
+            .length = sizeof(*dslbis1),
         },
-        .handle = nonvolatile_dsmad,
+        .handle = dsmad_handle,
         .flags = HMAT_LB_MEM_MEMORY,
         .data_type = HMAT_LB_DATA_READ_LATENCY,
         .entry_base_unit = 10000, /* 10ns base */
         .entry[0] = 15, /* 150ns */
     };
 
-    *dslbis_nonvolatile2 = (CDATDslbis) {
+    *dslbis2 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile2),
+            .length = sizeof(*dslbis2),
         },
-        .handle = nonvolatile_dsmad,
+        .handle = dsmad_handle,
         .flags = HMAT_LB_MEM_MEMORY,
         .data_type = HMAT_LB_DATA_WRITE_LATENCY,
         .entry_base_unit = 10000,
         .entry[0] = 25, /* 250ns */
     };
 
-    *dslbis_nonvolatile3 = (CDATDslbis) {
+    *dslbis3 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile3),
+            .length = sizeof(*dslbis3),
         },
-        .handle = nonvolatile_dsmad,
+        .handle = dsmad_handle,
         .flags = HMAT_LB_MEM_MEMORY,
         .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
         .entry_base_unit = 1000, /* GB/s */
         .entry[0] = 16,
     };
 
-    *dslbis_nonvolatile4 = (CDATDslbis) {
+    *dslbis4 = (CDATDslbis) {
         .header = {
             .type = CDAT_TYPE_DSLBIS,
-            .length = sizeof(*dslbis_nonvolatile4),
+            .length = sizeof(*dslbis4),
         },
-        .handle = nonvolatile_dsmad,
+        .handle = dsmad_handle,
         .flags = HMAT_LB_MEM_MEMORY,
         .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
         .entry_base_unit = 1000, /* GB/s */
         .entry[0] = 16,
     };
 
-    *dsemts_nonvolatile = (CDATDsemts) {
+    *dsemts = (CDATDsemts) {
         .header = {
             .type = CDAT_TYPE_DSEMTS,
-            .length = sizeof(*dsemts_nonvolatile),
+            .length = sizeof(*dsemts),
         },
-        .DSMAS_handle = nonvolatile_dsmad,
+        .DSMAS_handle = dsmad_handle,
         /* Reserved - the non volatile from DSMAS matters */
         .EFI_memory_type_attr = 2,
         .DPA_offset = 0,
@@ -137,16 +113,47 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
     };
 
     /* Header always at start of structure */
-    (*cdat_table)[0] = g_steal_pointer(&dsmas_nonvolatile);
-    (*cdat_table)[1] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile1);
-    (*cdat_table)[2] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile2);
-    (*cdat_table)[3] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile3);
-    (*cdat_table)[4] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile4);
-    (*cdat_table)[5] = g_steal_pointer(&dsemts_nonvolatile);
+    cdat_table[0] = g_steal_pointer(&dsmas);
+    cdat_table[1] = (CDATSubHeader *)g_steal_pointer(&dslbis1);
+    cdat_table[2] = (CDATSubHeader *)g_steal_pointer(&dslbis2);
+    cdat_table[3] = (CDATSubHeader *)g_steal_pointer(&dslbis3);
+    cdat_table[4] = (CDATSubHeader *)g_steal_pointer(&dslbis4);
+    cdat_table[5] = g_steal_pointer(&dsemts);
 
     return CT3_CDAT_SUBTABLE_SIZE;
 }
 
+static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
+                                void *priv)
+{
+    CXLType3Dev *ct3d = priv;
+    MemoryRegion *mr;
+    int ret = 0;
+
+    if (!ct3d->hostmem) {
+        return 0;
+    }
+
+    mr = host_memory_backend_get_memory(ct3d->hostmem);
+    if (!mr) {
+        return -EINVAL;
+    }
+
+    *cdat_table = g_malloc0(CT3_CDAT_SUBTABLE_SIZE * sizeof(*cdat_table));
+    if (!*cdat_table) {
+        return -ENOMEM;
+    }
+
+    /* Non volatile aspects */
+    ret = ct3_build_cdat_subtable(*cdat_table, mr, 0);
+    if (ret < 0) {
+        g_free(*cdat_table);
+        *cdat_table = NULL;
+    }
+
+    return ret;
+}
+
 static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
 {
     int i;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 58+ messages in thread
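
The re-use that patch 5/5 above aims for can be sketched as a per-region
builder that fills a slice of a shared table, called once per memory region
with a distinct handle. This is only an illustration under stated assumptions:
Entry, SUBTABLE_SIZE, build_subtable and build_table are hypothetical names,
the entry layout is invented, and the multi-region loop is a guess at how a
later series might use the helper, not code from this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: the patch uses 6 entries per region, 2 keeps this short. */
#define SUBTABLE_SIZE 2

/* Hypothetical stand-in for a CDAT entry tied to one memory region. */
typedef struct Entry {
    int handle;
    uint64_t length;
} Entry;

/* Fill SUBTABLE_SIZE slots for one region; nothing is stored on failure. */
static int build_subtable(Entry **slots, uint64_t region_size, int handle)
{
    Entry *e[SUBTABLE_SIZE];

    for (int i = 0; i < SUBTABLE_SIZE; i++) {
        e[i] = malloc(sizeof(*e[i]));
        if (!e[i]) {
            while (i-- > 0) {
                free(e[i]);
            }
            return -1;
        }
        *e[i] = (Entry) { .handle = handle, .length = region_size };
    }
    for (int i = 0; i < SUBTABLE_SIZE; i++) {
        slots[i] = e[i];
    }
    return SUBTABLE_SIZE;
}

/* Build one table covering all regions; returns entry count or -1. */
static int build_table(Entry ***table, const uint64_t *sizes, int nregions)
{
    Entry **t;
    int n = 0;

    *table = NULL;
    if (nregions <= 0) {
        return 0;
    }
    t = calloc((size_t)nregions * SUBTABLE_SIZE, sizeof(*t));
    if (!t) {
        return -1;
    }
    for (int r = 0; r < nregions; r++) {
        /* The region index doubles as the handle in this sketch. */
        int ret = build_subtable(t + n, sizes[r], r);

        if (ret < 0) {
            for (int i = 0; i < n; i++) {
                free(t[i]);
            }
            free(t);
            return ret;
        }
        n += ret;
    }
    *table = t;
    return n;
}
```

Keeping the outer function responsible for the table and the helper
responsible for the entries means a failure in any region unwinds in one
place, which matches the split the patch introduces.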

* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-12 18:21   ` Gregory Price
@ 2022-10-13  8:57       ` Jonathan Cameron via
  2022-10-12 18:21     ` [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code Gregory Price
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13  8:57 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Wed, 12 Oct 2022 14:21:15 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> Included in this response is a recommended patch set on top of this
> patch that resolves a number of issues, including style and a heap
> corruption bug.
> 
> The purpose of this patch set is to refactor the CDAT initialization
> code to support future patch sets that will introduce multi-region
> support in CXL Type3 devices.
> 
> 1) Checkpatch errors in the immediately prior patch
> 2) Flattening of code in CDAT initialization
> 3) Changes in allocation and error checking for cleanliness
> 4) Change in the allocation/free strategy of CDAT sub-tables to simplify
>    multi-region allocation in the future.  Also resolves a heap
>    corruption bug
> 5) Refactor of CDAT initialization code into a function that initializes
>    sub-tables per memory-region.
> 
> Gregory Price (5):
>   hw/mem/cxl_type3: fix checkpatch errors
>   hw/mem/cxl_type3: Pull validation checks ahead of functional code
>   hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work
>   hw/mem/cxl_type3: Change the CDAT allocation/free strategy
>   hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a
>     function
> 
>  hw/mem/cxl_type3.c | 240 +++++++++++++++++++++++----------------------
>  1 file changed, 122 insertions(+), 118 deletions(-)
> 

Thanks, I'm going to roll this stuff into the original patch set for v8.
Some of this I already have (like the checkpatch stuff).
Some I may disagree with, in which case I'll reply to the patches - note
I haven't looked at them in detail yet!

Jonathan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code
  2022-10-12 18:21     ` [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code Gregory Price
@ 2022-10-13  9:07         ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13  9:07 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Wed, 12 Oct 2022 14:21:17 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> For style - pulling these validations ahead flattens the code.

True, but at the cost of separating the check from where it is
obvious why we have the check.  I'd prefer to see it next to the
use.

Inverting the hostmem check is reasonable so I'll make that change.

My original thinking was that doing so would make adding non volatile
support messier, but given you plan to factor out most of this the
change won't be too bad anyway.
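
For readers following the thread, the two shapes being compared look
roughly like the sketch below. It is only an illustration: struct region,
struct backend, struct dev and both count_entries_*() functions are
hypothetical stand-ins, not the real cxl_type3.c types or helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for MemoryRegion / HostMemoryBackend / CXLType3Dev. */
struct region { unsigned long size; };
struct backend { struct region *mr; };
struct dev { struct backend *hostmem; };

/* Nested form: each check sits right next to the code that relies on it. */
static int count_entries_nested(const struct dev *d)
{
    int len = 0;

    if (d->hostmem) {
        struct region *mr = d->hostmem->mr;

        if (!mr) {
            return -EINVAL;
        }
        len += 6; /* 1 DSMAS + 4 DSLBIS + 1 DSEMTS */
    }
    return len;
}

/* Flattened form: inverted checks return early, the body loses one indent. */
static int count_entries_flat(const struct dev *d)
{
    struct region *mr;
    int len = 0;

    if (!d->hostmem) {
        return len;
    }
    mr = d->hostmem->mr;
    if (!mr) {
        return -EINVAL;
    }
    len += 6; /* 1 DSMAS + 4 DSLBIS + 1 DSEMTS */
    return len;
}
```

Both forms return the same values; the difference is only in how far the
checks end up from the code that depends on them.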


> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>
> ---
>  hw/mem/cxl_type3.c | 193 ++++++++++++++++++++++-----------------------
>  1 file changed, 96 insertions(+), 97 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 94bc439d89..43b2b9e041 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -32,107 +32,106 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      int dslbis_nonvolatile_num = 4;
>      MemoryRegion *mr;
>  
> +    if (!ct3d->hostmem) {
> +        return len;
> +    }
> +
> +    mr = host_memory_backend_get_memory(ct3d->hostmem);
> +    if (!mr) {
> +        return -EINVAL;
> +    }
> +
>      /* Non volatile aspects */
> -    if (ct3d->hostmem) {
> -        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> -        if (!dsmas_nonvolatile) {
> -            return -ENOMEM;
> -        }
> -        nonvolatile_dsmad = next_dsmad_handle++;
> -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> -        if (!mr) {
> -            return -EINVAL;
> -        }
> -        *dsmas_nonvolatile = (CDATDsmas) {
> -            .header = {
> -                .type = CDAT_TYPE_DSMAS,
> -                .length = sizeof(*dsmas_nonvolatile),
> -            },
> -            .DSMADhandle = nonvolatile_dsmad,
> -            .flags = CDAT_DSMAS_FLAG_NV,
> -            .DPA_base = 0,
> -            .DPA_length = int128_get64(mr->size),
> -        };
> -        len++;
> -
> -        /* For now, no memory side cache, plausiblish numbers */
> -        dslbis_nonvolatile =
> -            g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> -        if (!dslbis_nonvolatile) {
> -            return -ENOMEM;
> -        }
> +    dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> +    if (!dsmas_nonvolatile) {
> +        return -ENOMEM;
> +    }
> +    nonvolatile_dsmad = next_dsmad_handle++;
> +    *dsmas_nonvolatile = (CDATDsmas) {
> +        .header = {
> +            .type = CDAT_TYPE_DSMAS,
> +            .length = sizeof(*dsmas_nonvolatile),
> +        },
> +        .DSMADhandle = nonvolatile_dsmad,
> +        .flags = CDAT_DSMAS_FLAG_NV,
> +        .DPA_base = 0,
> +        .DPA_length = int128_get64(mr->size),
> +    };
> +    len++;
>  
> -        dslbis_nonvolatile[0] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_READ_LATENCY,
> -            .entry_base_unit = 10000, /* 10ns base */
> -            .entry[0] = 15, /* 150ns */
> -        };
> -        len++;
> -
> -        dslbis_nonvolatile[1] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> -            .entry_base_unit = 10000,
> -            .entry[0] = 25, /* 250ns */
> -        };
> -        len++;
> -
> -        dslbis_nonvolatile[2] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> -            .entry_base_unit = 1000, /* GB/s */
> -            .entry[0] = 16,
> -        };
> -        len++;
> -
> -        dslbis_nonvolatile[3] = (CDATDslbis) {
> -            .header = {
> -                .type = CDAT_TYPE_DSLBIS,
> -                .length = sizeof(*dslbis_nonvolatile),
> -            },
> -            .handle = nonvolatile_dsmad,
> -            .flags = HMAT_LB_MEM_MEMORY,
> -            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> -            .entry_base_unit = 1000, /* GB/s */
> -            .entry[0] = 16,
> -        };
> -        len++;
> -
> -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> -        if (!mr) {
> -            return -EINVAL;
> -        }
> -        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> -        *dsemts_nonvolatile = (CDATDsemts) {
> -            .header = {
> -                .type = CDAT_TYPE_DSEMTS,
> -                .length = sizeof(*dsemts_nonvolatile),
> -            },
> -            .DSMAS_handle = nonvolatile_dsmad,
> -            /* Reserved - the non volatile from DSMAS matters */
> -            .EFI_memory_type_attr = 2,
> -            .DPA_offset = 0,
> -            .DPA_length = int128_get64(mr->size),
> -        };
> -        len++;
> +    /* For now, no memory side cache, plausiblish numbers */
> +    dslbis_nonvolatile =
> +        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> +    if (!dslbis_nonvolatile) {
> +        return -ENOMEM;
>      }
>  
> +    dslbis_nonvolatile[0] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis_nonvolatile),
> +        },
> +        .handle = nonvolatile_dsmad,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_READ_LATENCY,
> +        .entry_base_unit = 10000, /* 10ns base */
> +        .entry[0] = 15, /* 150ns */
> +    };
> +    len++;
> +
> +    dslbis_nonvolatile[1] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis_nonvolatile),
> +        },
> +        .handle = nonvolatile_dsmad,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> +        .entry_base_unit = 10000,
> +        .entry[0] = 25, /* 250ns */
> +    };
> +    len++;
> +
> +    dslbis_nonvolatile[2] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis_nonvolatile),
> +        },
> +        .handle = nonvolatile_dsmad,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> +        .entry_base_unit = 1000, /* GB/s */
> +        .entry[0] = 16,
> +    };
> +    len++;
> +
> +    dslbis_nonvolatile[3] = (CDATDslbis) {
> +        .header = {
> +            .type = CDAT_TYPE_DSLBIS,
> +            .length = sizeof(*dslbis_nonvolatile),
> +        },
> +        .handle = nonvolatile_dsmad,
> +        .flags = HMAT_LB_MEM_MEMORY,
> +        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> +        .entry_base_unit = 1000, /* GB/s */
> +        .entry[0] = 16,
> +    };
> +    len++;
> +
> +    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> +    *dsemts_nonvolatile = (CDATDsemts) {
> +        .header = {
> +            .type = CDAT_TYPE_DSEMTS,
> +            .length = sizeof(*dsemts_nonvolatile),
> +        },
> +        .DSMAS_handle = nonvolatile_dsmad,
> +        /* Reserved - the non volatile from DSMAS matters */
> +        .EFI_memory_type_attr = 2,
> +        .DPA_offset = 0,
> +        .DPA_length = int128_get64(mr->size),
> +    };
> +    len++;
> +
>      *cdat_table = g_malloc0(len * sizeof(*cdat_table));
>      /* Header always at start of structure */
>      if (dsmas_nonvolatile) {


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-12 16:01   ` Gregory Price
@ 2022-10-13 10:40       ` Jonathan Cameron via
  2022-10-13 10:56       ` Jonathan Cameron via
  1 sibling, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 10:40 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, linuxarm, ira.weiny

On Wed, 12 Oct 2022 12:01:54 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> This code contains heap corruption on free, and I think should be
> refactored to pre-allocate all the entries we're interested in putting
> into the table.

Good point on the heap corruption. (Oops, particularly as I had noted
that I didn't like the complexity of the free in your previous version
and still failed to notice the current code was wrong...)


>  This would flatten the code and simplify the error
> handling steps.

I'm not so keen on this.  Error handling is pretty trivial because of
the autofree magic.  It will get a tiny bit harder once we have
two calls to the factored out function, but not too bad - we just
need to free the handed off pointers in reverse from wherever we
got to before the error.
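
As a rough illustration of "free the handed off pointers in reverse", the
sketch below uses plain malloc/free rather than GLib. The names
alloc_entry(), build_table() and the fail_at knob are hypothetical and
exist only so that a failure can be forced; they are not part of the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NUM_ENTRIES 6

/*
 * Hypothetical allocator hook: returns NULL for entry 'fail_at' so the
 * error path can be exercised; a negative fail_at never forces a failure.
 */
static void *alloc_entry(int index, int fail_at)
{
    return index == fail_at ? NULL : calloc(1, 16);
}

/* Fill table[0..NUM_ENTRIES-1]; on error, unwind what was handed off. */
static int build_table(void **table, int fail_at)
{
    int i;

    for (i = 0; i < NUM_ENTRIES; i++) {
        table[i] = alloc_entry(i, fail_at);
        if (!table[i]) {
            goto unwind;
        }
    }
    return NUM_ENTRIES;

unwind:
    /* Entries 0..i-1 were handed off; release them in reverse order. */
    while (--i >= 0) {
        free(table[i]);
        table[i] = NULL;
    }
    return -ENOMEM;
}
```

The caller sees either a fully populated table or an error with nothing
left to clean up.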

> 
> Also, should we consider making a union with all the possible entries to
> make entry allocation easier?  It may eat a few extra bytes of memory,
> but it would simplify the allocation/cleanup code here further.

An interesting point, though it gets trickier once we have variable numbers
of elements.  I'm not sure it's worth the effort to save a few lines
of code.
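
For reference, the union idea could look something like the sketch below.
The struct layouts are simplified and hypothetical (the real CDAT
structures differ), and alloc_entries() is an invented helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical layouts; the real CDAT structs differ. */
struct hdr { uint8_t type; uint16_t length; };
struct dsmas { struct hdr header; uint64_t dpa_base, dpa_length; };
struct dslbis { struct hdr header; uint64_t entry_base_unit; uint16_t entry[3]; };
struct dsemts { struct hdr header; uint64_t dpa_offset, dpa_length; };

/* Every slot is as large as the largest member. */
union cdat_entry {
    struct hdr header;
    struct dsmas dsmas;
    struct dslbis dslbis;
    struct dsemts dsemts;
};

/* One zeroed allocation covers n entries of any type; one free releases them. */
static union cdat_entry *alloc_entries(size_t n)
{
    return calloc(n, sizeof(union cdat_entry));
}
```

The cost is that each slot takes the size of the largest member, and the
scheme stops being a simple array once entry sizes or counts are not
fixed.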

> 
> Given that every allocation has to be checked, I'm also not convinced
> the use of g_autofree is worth the potential footguns associated with
> it.

After rolling a version with some of your suggested changes incorporated,
the autofree logic is all nice and localized so I think it's well worth
having. The only slightly messy bit is that we end up with 4 separate
pointers for the bandwidth and latency elements.
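
A minimal sketch of that pattern, assuming GCC or Clang: it uses the
compiler's cleanup attribute directly (the mechanism g_autofree is built
on) so that it compiles without GLib. AUTOFREE, free_ptr(), steal() and
build_dslbis() are hypothetical names for illustration; in QEMU the
equivalents would be g_autofree and g_steal_pointer().

```c
#include <assert.h>
#include <stdlib.h>

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void free_ptr(void *p)
{
    free(*(void **)p);
}
#define AUTOFREE __attribute__((cleanup(free_ptr)))

/* Hand ownership to the caller and disarm the automatic free. */
static void *steal(void **pp)
{
    void *p = *pp;

    *pp = NULL;
    return p;
}

/* Returns 4 with out[0..3] owned by the caller, or -1 with nothing leaked. */
static int build_dslbis(void **out, int fail)
{
    AUTOFREE void *rd_lat = calloc(1, 32);
    AUTOFREE void *wr_lat = calloc(1, 32);
    AUTOFREE void *rd_bw = calloc(1, 32);
    AUTOFREE void *wr_bw = calloc(1, 32);

    if (fail || !rd_lat || !wr_lat || !rd_bw || !wr_bw) {
        return -1; /* all four are freed automatically on return */
    }

    /* Success: transfer ownership so the cleanup sees NULL. */
    out[0] = steal(&rd_lat);
    out[1] = steal(&wr_lat);
    out[2] = steal(&rd_bw);
    out[3] = steal(&wr_bw);
    return 4;
}
```

The error path needs no explicit frees, which is what keeps the logic
local, at the price of one named pointer per element.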


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code
  2022-10-13  9:07         ` Jonathan Cameron via
@ 2022-10-13 10:42           ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 10:42 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Thu, 13 Oct 2022 10:07:40 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 12 Oct 2022 14:21:17 -0400
> Gregory Price <gourry.memverge@gmail.com> wrote:
> 
> > For style - pulling these validations ahead flattens the code.  
> 
> True, but at the cost of separating the check from where it is
> obvious why we have the check.  I'd prefer to see it next to the
> use. 
That separation made a bit more sense after factoring out the code
as then we want to pass the mr in rather than the HostMemBackend.

So in the end I did what you suggested :)

Jonathan

> 
> Inverting the hostmem check is reasonable so I'll make that change.
> 
> My original thinking was that doing so would make adding non volatile
> support messier, but given you plan to factor out most of this the
> change won't be too bad anyway.
> 
> 
> > 
> > Signed-off-by: Gregory Price <gregory.price@memverge.com>
> > ---
> >  hw/mem/cxl_type3.c | 193 ++++++++++++++++++++++-----------------------
> >  1 file changed, 96 insertions(+), 97 deletions(-)
> > 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 94bc439d89..43b2b9e041 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -32,107 +32,106 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
> >      int dslbis_nonvolatile_num = 4;
> >      MemoryRegion *mr;
> >  
> > +    if (!ct3d->hostmem) {
> > +        return len;
> > +    }
> > +
> > +    mr = host_memory_backend_get_memory(ct3d->hostmem);
> > +    if (!mr) {
> > +        return -EINVAL;
> > +    }
> > +
> >      /* Non volatile aspects */
> > -    if (ct3d->hostmem) {
> > -        dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> > -        if (!dsmas_nonvolatile) {
> > -            return -ENOMEM;
> > -        }
> > -        nonvolatile_dsmad = next_dsmad_handle++;
> > -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> > -        if (!mr) {
> > -            return -EINVAL;
> > -        }
> > -        *dsmas_nonvolatile = (CDATDsmas) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSMAS,
> > -                .length = sizeof(*dsmas_nonvolatile),
> > -            },
> > -            .DSMADhandle = nonvolatile_dsmad,
> > -            .flags = CDAT_DSMAS_FLAG_NV,
> > -            .DPA_base = 0,
> > -            .DPA_length = int128_get64(mr->size),
> > -        };
> > -        len++;
> > -
> > -        /* For now, no memory side cache, plausiblish numbers */
> > -        dslbis_nonvolatile =
> > -            g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> > -        if (!dslbis_nonvolatile) {
> > -            return -ENOMEM;
> > -        }
> > +    dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> > +    if (!dsmas_nonvolatile) {
> > +        return -ENOMEM;
> > +    }
> > +    nonvolatile_dsmad = next_dsmad_handle++;
> > +    *dsmas_nonvolatile = (CDATDsmas) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSMAS,
> > +            .length = sizeof(*dsmas_nonvolatile),
> > +        },
> > +        .DSMADhandle = nonvolatile_dsmad,
> > +        .flags = CDAT_DSMAS_FLAG_NV,
> > +        .DPA_base = 0,
> > +        .DPA_length = int128_get64(mr->size),
> > +    };
> > +    len++;
> >  
> > -        dslbis_nonvolatile[0] = (CDATDslbis) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSLBIS,
> > -                .length = sizeof(*dslbis_nonvolatile),
> > -            },
> > -            .handle = nonvolatile_dsmad,
> > -            .flags = HMAT_LB_MEM_MEMORY,
> > -            .data_type = HMAT_LB_DATA_READ_LATENCY,
> > -            .entry_base_unit = 10000, /* 10ns base */
> > -            .entry[0] = 15, /* 150ns */
> > -        };
> > -        len++;
> > -
> > -        dslbis_nonvolatile[1] = (CDATDslbis) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSLBIS,
> > -                .length = sizeof(*dslbis_nonvolatile),
> > -            },
> > -            .handle = nonvolatile_dsmad,
> > -            .flags = HMAT_LB_MEM_MEMORY,
> > -            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> > -            .entry_base_unit = 10000,
> > -            .entry[0] = 25, /* 250ns */
> > -        };
> > -        len++;
> > -
> > -        dslbis_nonvolatile[2] = (CDATDslbis) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSLBIS,
> > -                .length = sizeof(*dslbis_nonvolatile),
> > -            },
> > -            .handle = nonvolatile_dsmad,
> > -            .flags = HMAT_LB_MEM_MEMORY,
> > -            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> > -            .entry_base_unit = 1000, /* GB/s */
> > -            .entry[0] = 16,
> > -        };
> > -        len++;
> > -
> > -        dslbis_nonvolatile[3] = (CDATDslbis) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSLBIS,
> > -                .length = sizeof(*dslbis_nonvolatile),
> > -            },
> > -            .handle = nonvolatile_dsmad,
> > -            .flags = HMAT_LB_MEM_MEMORY,
> > -            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> > -            .entry_base_unit = 1000, /* GB/s */
> > -            .entry[0] = 16,
> > -        };
> > -        len++;
> > -
> > -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> > -        if (!mr) {
> > -            return -EINVAL;
> > -        }
> > -        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> > -        *dsemts_nonvolatile = (CDATDsemts) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSEMTS,
> > -                .length = sizeof(*dsemts_nonvolatile),
> > -            },
> > -            .DSMAS_handle = nonvolatile_dsmad,
> > -            /* Reserved - the non volatile from DSMAS matters */
> > -            .EFI_memory_type_attr = 2,
> > -            .DPA_offset = 0,
> > -            .DPA_length = int128_get64(mr->size),
> > -        };
> > -        len++;
> > +    /* For now, no memory side cache, plausiblish numbers */
> > +    dslbis_nonvolatile =
> > +        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> > +    if (!dslbis_nonvolatile) {
> > +        return -ENOMEM;
> >      }
> >  
> > +    dslbis_nonvolatile[0] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_READ_LATENCY,
> > +        .entry_base_unit = 10000, /* 10ns base */
> > +        .entry[0] = 15, /* 150ns */
> > +    };
> > +    len++;
> > +
> > +    dslbis_nonvolatile[1] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> > +        .entry_base_unit = 10000,
> > +        .entry[0] = 25, /* 250ns */
> > +    };
> > +    len++;
> > +
> > +    dslbis_nonvolatile[2] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> > +        .entry_base_unit = 1000, /* GB/s */
> > +        .entry[0] = 16,
> > +    };
> > +    len++;
> > +
> > +    dslbis_nonvolatile[3] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> > +        .entry_base_unit = 1000, /* GB/s */
> > +        .entry[0] = 16,
> > +    };
> > +    len++;
> > +
> > +    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> > +    *dsemts_nonvolatile = (CDATDsemts) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSEMTS,
> > +            .length = sizeof(*dsemts_nonvolatile),
> > +        },
> > +        .DSMAS_handle = nonvolatile_dsmad,
> > +        /* Reserved - the non volatile from DSMAS matters */
> > +        .EFI_memory_type_attr = 2,
> > +        .DPA_offset = 0,
> > +        .DPA_length = int128_get64(mr->size),
> > +    };
> > +    len++;
> > +
> >      *cdat_table = g_malloc0(len * sizeof(*cdat_table));
> >      /* Header always at start of structure */
> >      if (dsmas_nonvolatile) {  
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

> > -            },
> > -            .handle = nonvolatile_dsmad,
> > -            .flags = HMAT_LB_MEM_MEMORY,
> > -            .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> > -            .entry_base_unit = 10000,
> > -            .entry[0] = 25, /* 250ns */
> > -        };
> > -        len++;
> > -
> > -        dslbis_nonvolatile[2] = (CDATDslbis) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSLBIS,
> > -                .length = sizeof(*dslbis_nonvolatile),
> > -            },
> > -            .handle = nonvolatile_dsmad,
> > -            .flags = HMAT_LB_MEM_MEMORY,
> > -            .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> > -            .entry_base_unit = 1000, /* GB/s */
> > -            .entry[0] = 16,
> > -        };
> > -        len++;
> > -
> > -        dslbis_nonvolatile[3] = (CDATDslbis) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSLBIS,
> > -                .length = sizeof(*dslbis_nonvolatile),
> > -            },
> > -            .handle = nonvolatile_dsmad,
> > -            .flags = HMAT_LB_MEM_MEMORY,
> > -            .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> > -            .entry_base_unit = 1000, /* GB/s */
> > -            .entry[0] = 16,
> > -        };
> > -        len++;
> > -
> > -        mr = host_memory_backend_get_memory(ct3d->hostmem);
> > -        if (!mr) {
> > -            return -EINVAL;
> > -        }
> > -        dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> > -        *dsemts_nonvolatile = (CDATDsemts) {
> > -            .header = {
> > -                .type = CDAT_TYPE_DSEMTS,
> > -                .length = sizeof(*dsemts_nonvolatile),
> > -            },
> > -            .DSMAS_handle = nonvolatile_dsmad,
> > -            /* Reserved - the non volatile from DSMAS matters */
> > -            .EFI_memory_type_attr = 2,
> > -            .DPA_offset = 0,
> > -            .DPA_length = int128_get64(mr->size),
> > -        };
> > -        len++;
> > +    /* For now, no memory side cache, plausiblish numbers */
> > +    dslbis_nonvolatile =
> > +        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> > +    if (!dslbis_nonvolatile) {
> > +        return -ENOMEM;
> >      }
> >  
> > +    dslbis_nonvolatile[0] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_READ_LATENCY,
> > +        .entry_base_unit = 10000, /* 10ns base */
> > +        .entry[0] = 15, /* 150ns */
> > +    };
> > +    len++;
> > +
> > +    dslbis_nonvolatile[1] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
> > +        .entry_base_unit = 10000,
> > +        .entry[0] = 25, /* 250ns */
> > +    };
> > +    len++;
> > +
> > +    dslbis_nonvolatile[2] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
> > +        .entry_base_unit = 1000, /* GB/s */
> > +        .entry[0] = 16,
> > +    };
> > +    len++;
> > +
> > +    dslbis_nonvolatile[3] = (CDATDslbis) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSLBIS,
> > +            .length = sizeof(*dslbis_nonvolatile),
> > +        },
> > +        .handle = nonvolatile_dsmad,
> > +        .flags = HMAT_LB_MEM_MEMORY,
> > +        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
> > +        .entry_base_unit = 1000, /* GB/s */
> > +        .entry[0] = 16,
> > +    };
> > +    len++;
> > +
> > +    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> > +    *dsemts_nonvolatile = (CDATDsemts) {
> > +        .header = {
> > +            .type = CDAT_TYPE_DSEMTS,
> > +            .length = sizeof(*dsemts_nonvolatile),
> > +        },
> > +        .DSMAS_handle = nonvolatile_dsmad,
> > +        /* Reserved - the non volatile from DSMAS matters */
> > +        .EFI_memory_type_attr = 2,
> > +        .DPA_offset = 0,
> > +        .DPA_length = int128_get64(mr->size),
> > +    };
> > +    len++;
> > +
> >      *cdat_table = g_malloc0(len * sizeof(*cdat_table));
> >      /* Header always at start of structure */
> >      if (dsmas_nonvolatile) {  
> 
> 



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 3/5] hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work
  2022-10-12 18:21     ` [PATCH 3/5] hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work Gregory Price
@ 2022-10-13 10:44         ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 10:44 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Wed, 12 Oct 2022 14:21:18 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> Makes the size of the allocated cdat table static (6 entries),
> flattens the code, and reduces the number of exit conditions
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Hmm. I don't entirely like this as it stands because it leads to more
fragile code: we don't have a clear association between the number
of entries and the actual assignments.

So, what I've done (inspired by this) is move to a local enum
in the factored-out building function that has an element for
each of the entries (used ultimately to assign them) and
a trailing NUM_ENTRIES element we can then use in place of
the CT3_CDAT_SUBTABLE_SIZE define you have here.

I went with the two-pass approach mentioned in a later patch, so
if the cdat_table passed to the factored-out code is NULL, we just
return NUM_ENTRIES directly.
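
Roughly, and only as a sketch under assumptions: Entry stands in for
CDATSubHeader, the ENT_* names and both function names are made up for
illustration, and plain malloc()/free() replace the GLib helpers so the
snippet stands alone. The first call with NULL reports the size, the
second call fills a table the caller has allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for CDATSubHeader. */
typedef struct {
    int type;
} Entry;

/* One element per table entry; NUM_ENTRIES follows the list automatically. */
enum {
    ENT_DSMAS,
    ENT_DSLBIS0,
    ENT_DSLBIS1,
    ENT_DSLBIS2,
    ENT_DSLBIS3,
    ENT_DSEMTS,
    NUM_ENTRIES
};

/* Pass 1: table == NULL, just report the count. Pass 2: fill the table. */
static int build_entries(Entry **table)
{
    Entry *tmp[NUM_ENTRIES] = { NULL };
    int i;

    if (!table) {
        return NUM_ENTRIES;
    }
    for (i = 0; i < NUM_ENTRIES; i++) {
        tmp[i] = malloc(sizeof(*tmp[i]));
        if (!tmp[i]) {
            while (i--) {
                free(tmp[i]);
            }
            return -ENOMEM;
        }
        tmp[i]->type = i;
    }
    /* No failure paths remain, so hand the entries over. */
    for (i = 0; i < NUM_ENTRIES; i++) {
        table[i] = tmp[i];
    }
    return NUM_ENTRIES;
}

/* Caller side: count, allocate, then fill. */
static Entry **alloc_and_build(int *num)
{
    int n = build_entries(NULL);
    Entry **table;

    assert(n > 0);
    table = calloc((size_t)n, sizeof(*table));
    if (!table) {
        return NULL;
    }
    if (build_entries(table) < 0) {
        free(table);
        return NULL;
    }
    *num = n;
    return table;
}
```

Adding or removing an enum element changes the count returned by the
first pass without touching any separate define.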

> ---
>  hw/mem/cxl_type3.c | 52 ++++++++++++++++++++--------------------------
>  1 file changed, 22 insertions(+), 30 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 43b2b9e041..0e0ea70387 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -17,6 +17,7 @@
>  #include "hw/pci/msix.h"
>  
>  #define DWORD_BYTE 4
> +#define CT3_CDAT_SUBTABLE_SIZE 6

>  
>  static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>                                  void *priv)
> @@ -25,7 +26,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
>      g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
>      CXLType3Dev *ct3d = priv;
> -    int len = 0;
>      int i = 0;
>      int next_dsmad_handle = 0;
>      int nonvolatile_dsmad = -1;
> @@ -33,7 +33,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      MemoryRegion *mr;
>  
>      if (!ct3d->hostmem) {
> -        return len;
> +        return 0;
>      }
>  
>      mr = host_memory_backend_get_memory(ct3d->hostmem);
> @@ -41,11 +41,22 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          return -EINVAL;
>      }
>  
> +    *cdat_table = g_malloc0(CT3_CDAT_SUBTABLE_SIZE * sizeof(*cdat_table));
> +    if (!*cdat_table) {
> +        return -ENOMEM;
> +    }
> +
>      /* Non volatile aspects */
>      dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> -    if (!dsmas_nonvolatile) {
> +    dslbis_nonvolatile =
> +        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> +    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> +    if (!dsmas_nonvolatile || !dslbis_nonvolatile || !dsemts_nonvolatile) {

I don't like aggregated error checking. It saves lines of code, but generally
leads to less maintainable code.  I prefer to do one thing, check it and handle
any necessary errors - that provides a small localized chunk of code that is
easy to review and maintain.
1. Allocate structure
2. Fill structure.

We have to leave the assignment till later as we only want to steal the pointers
once we know there are no error paths.

> +        g_free(*cdat_table);

We have g_autofree to clean this up. So if this did make sense, use a local
g_autofree CDATSubHeader **cdat_table = NULL;
and steal the pointer when assigning *cdat_table at the end of this function
after all the failure paths.

This code all ends up in the caller of the factored out code anyway so
that comment becomes irrelevant on the version I've ended up with.
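
To illustrate the ownership pattern meant here, a plain-C analogue (a
sketch, not the QEMU code): a single cleanup label plays the role of
g_autofree, steal_table() is a hypothetical helper mimicking
g_steal_pointer(), and fail_at is a made-up knob that simulates a failure
part way through.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for CDATSubHeader. */
typedef struct {
    int type;
} Entry;

/* Analogue of g_steal_pointer(): return the pointer and clear the owner. */
static Entry **steal_table(Entry ***owner)
{
    Entry **p = *owner;

    *owner = NULL;
    return p;
}

/* Each allocation is checked on its own; *out is written only on success. */
static int build(Entry ***out, size_t n, int fail_at)
{
    Entry **local;
    int ret = -ENOMEM;
    size_t i;

    assert(n > 0);
    local = calloc(n, sizeof(*local)); /* owned locally until stolen */
    if (!local) {
        return -ENOMEM;
    }
    for (i = 0; i < n; i++) {
        if ((int)i == fail_at) {
            goto out; /* simulated allocation failure */
        }
        local[i] = malloc(sizeof(*local[i]));
        if (!local[i]) {
            goto out;
        }
        local[i]->type = (int)i;
    }
    *out = steal_table(&local); /* all failure paths are behind us */
    ret = (int)n;
out:
    if (local) { /* still owned here: free whatever was built */
        for (i = 0; i < n; i++) {
            free(local[i]);
        }
        free(local);
    }
    return ret;
}
```

On any error path the caller's pointer is left untouched, so there is
nothing for the caller to unwind.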

Jonathan



> +        *cdat_table = NULL;
>          return -ENOMEM;
>      }
> +
>      nonvolatile_dsmad = next_dsmad_handle++;
>      *dsmas_nonvolatile = (CDATDsmas) {
>          .header = {
> @@ -57,15 +68,8 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .DPA_base = 0,
>          .DPA_length = int128_get64(mr->size),
>      };
> -    len++;
>  
>      /* For now, no memory side cache, plausiblish numbers */
> -    dslbis_nonvolatile =
> -        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> -    if (!dslbis_nonvolatile) {
> -        return -ENOMEM;
> -    }
> -
>      dslbis_nonvolatile[0] = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> @@ -77,7 +81,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry_base_unit = 10000, /* 10ns base */
>          .entry[0] = 15, /* 150ns */
>      };
> -    len++;
>  
>      dslbis_nonvolatile[1] = (CDATDslbis) {
>          .header = {
> @@ -90,7 +93,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry_base_unit = 10000,
>          .entry[0] = 25, /* 250ns */
>      };
> -    len++;
>  
>      dslbis_nonvolatile[2] = (CDATDslbis) {
>          .header = {
> @@ -103,7 +105,6 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry_base_unit = 1000, /* GB/s */
>          .entry[0] = 16,
>      };
> -    len++;
>  
>      dslbis_nonvolatile[3] = (CDATDslbis) {
>          .header = {
> @@ -116,9 +117,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry_base_unit = 1000, /* GB/s */
>          .entry[0] = 16,
>      };
> -    len++;
>  
> -    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
>      *dsemts_nonvolatile = (CDATDsemts) {
>          .header = {
>              .type = CDAT_TYPE_DSEMTS,
> @@ -130,26 +129,19 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .DPA_offset = 0,
>          .DPA_length = int128_get64(mr->size),
>      };
> -    len++;
>  
> -    *cdat_table = g_malloc0(len * sizeof(*cdat_table));
>      /* Header always at start of structure */
> -    if (dsmas_nonvolatile) {
> -        (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
> -    }
> -    if (dslbis_nonvolatile) {
> -        CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
> -        int j;
> +    (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
>  
> -        for (j = 0; j < dslbis_nonvolatile_num; j++) {
> -            (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
> -        }
> -    }
> -    if (dsemts_nonvolatile) {
> -        (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
> +    CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
Removing the paranoid checking makes sense if we are going to handle
the volatile / non volatile as 'whole sets of tables'.

> +    int j;
> +    for (j = 0; j < dslbis_nonvolatile_num; j++) {
> +        (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
>      }
>  
> -    return len;
> +    (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
> +
> +    return CT3_CDAT_SUBTABLE_SIZE;
>  }
>  
>  static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/5] hw/mem/cxl_type3: Change the CDAT allocation/free strategy
  2022-10-12 18:21     ` [PATCH 4/5] hw/mem/cxl_type3: Change the CDAT allocation/free strategy Gregory Price
@ 2022-10-13 10:45         ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 10:45 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Wed, 12 Oct 2022 14:21:19 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> The existing code allocates a subtable for DSLBIS entries, uses a
> local variable to avoid a g_autofree footgun, and the cleanup code
> causes heap corruption.

Ah good point (particularly given I moaned about how you were handling
the frees and still failed to notice the current code was broken!)
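
For the record, my reading of the problem, assuming the cleanup frees
every table slot individually (that routine is not quoted here): with one
array allocation only &array[0] came from the allocator, so freeing
&array[1] onwards is undefined. A plain-C sketch of the per-entry
alternative follows; Entry, N and both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CDATDslbis. */
typedef struct {
    int type;
} Entry;

#define N 4

/* Each slot gets its own allocation, so each may be freed on its own. */
static int fill_individually(Entry **table)
{
    int i;

    assert(table != NULL);
    for (i = 0; i < N; i++) {
        table[i] = malloc(sizeof(*table[i]));
        if (!table[i]) {
            while (i--) {
                free(table[i]);
                table[i] = NULL;
            }
            return -1;
        }
        table[i]->type = i;
    }
    return N;
}

/*
 * Per-slot cleanup. Valid only because every slot holds a pointer that
 * malloc() returned; slots pointing into the middle of one shared array
 * must never be passed to free().
 */
static void free_entries(Entry **table, int num)
{
    int i;

    for (i = 0; i < num; i++) {
        free(table[i]);
        table[i] = NULL;
    }
}
```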


> 
> Rather than allocate a table, explicitly allocate each individual entry
> and make the sub-table size static.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

I'll integrate a change in the spirit of what you have here, but
without aggregating the error handling paths.

> ---
>  hw/mem/cxl_type3.c | 49 ++++++++++++++++++++++++----------------------
>  1 file changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 0e0ea70387..220b9f09a9 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -23,13 +23,14 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>                                  void *priv)
>  {
>      g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile1 = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile2 = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile3 = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile4 = NULL;
>      g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
>      CXLType3Dev *ct3d = priv;
> -    int i = 0;
>      int next_dsmad_handle = 0;
>      int nonvolatile_dsmad = -1;
> -    int dslbis_nonvolatile_num = 4;
>      MemoryRegion *mr;
>  
>      if (!ct3d->hostmem) {
> @@ -48,10 +49,15 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>  
>      /* Non volatile aspects */
>      dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> -    dslbis_nonvolatile =
> -        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> +    dslbis_nonvolatile1 = g_malloc(sizeof(*dslbis_nonvolatile1));
> +    dslbis_nonvolatile2 = g_malloc(sizeof(*dslbis_nonvolatile2));
> +    dslbis_nonvolatile3 = g_malloc(sizeof(*dslbis_nonvolatile3));
> +    dslbis_nonvolatile4 = g_malloc(sizeof(*dslbis_nonvolatile4));
>      dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> -    if (!dsmas_nonvolatile || !dslbis_nonvolatile || !dsemts_nonvolatile) {
> +
> +    if (!dsmas_nonvolatile || !dsemts_nonvolatile ||
> +        !dslbis_nonvolatile1 || !dslbis_nonvolatile2 ||
> +        !dslbis_nonvolatile3 || !dslbis_nonvolatile4) {
>          g_free(*cdat_table);
>          *cdat_table = NULL;
>          return -ENOMEM;
> @@ -70,10 +76,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      };
>  
>      /* For now, no memory side cache, plausiblish numbers */
> -    dslbis_nonvolatile[0] = (CDATDslbis) {
> +    *dslbis_nonvolatile1 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile1),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -82,10 +88,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry[0] = 15, /* 150ns */
>      };
>  
> -    dslbis_nonvolatile[1] = (CDATDslbis) {
> +    *dslbis_nonvolatile2 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile2),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -94,10 +100,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry[0] = 25, /* 250ns */
>      };
>  
> -    dslbis_nonvolatile[2] = (CDATDslbis) {
> +    *dslbis_nonvolatile3 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile3),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -106,10 +112,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry[0] = 16,
>      };
>  
> -    dslbis_nonvolatile[3] = (CDATDslbis) {
> +    *dslbis_nonvolatile4 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile4),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -131,15 +137,12 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      };
>  
>      /* Header always at start of structure */
> -    (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
> -
> -    CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
> -    int j;
> -    for (j = 0; j < dslbis_nonvolatile_num; j++) {
> -        (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
> -    }
> -
> -    (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
> +    (*cdat_table)[0] = g_steal_pointer(&dsmas_nonvolatile);
> +    (*cdat_table)[1] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile1);
> +    (*cdat_table)[2] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile2);
> +    (*cdat_table)[3] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile3);
> +    (*cdat_table)[4] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile4);
> +    (*cdat_table)[5] = g_steal_pointer(&dsemts_nonvolatile);
Moving to simple indexing makes sense now they are all in one place (making
introducing a bug much less likely!)

I've introduced an enum so that we have an automatic agreement between
number of elements and these assignments.

>  
>      return CT3_CDAT_SUBTABLE_SIZE;
>  }


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 4/5] hw/mem/cxl_type3: Change the CDAT allocation/free strategy
@ 2022-10-13 10:45         ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron via @ 2022-10-13 10:45 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Wed, 12 Oct 2022 14:21:19 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> The existing code allocates a subtable for DSLBIS entries, uses a
> local variable to avoid a g_autofree footgun, and the cleanup code
> causes heap corruption.

Ah good point (particularly given I moaned about how you were handling
the frees and still failed to notice the current code was broken!)


> 
> Rather than allocate a table, explicitly allocate each individual entry
> and make the sub-table size static.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

I'll integrate a change in the spirit of what you have here, but
without aggregating the error handling paths.

> ---
>  hw/mem/cxl_type3.c | 49 ++++++++++++++++++++++++----------------------
>  1 file changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 0e0ea70387..220b9f09a9 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -23,13 +23,14 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>                                  void *priv)
>  {
>      g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile1 = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile2 = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile3 = NULL;
> +    g_autofree CDATDslbis *dslbis_nonvolatile4 = NULL;
>      g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
>      CXLType3Dev *ct3d = priv;
> -    int i = 0;
>      int next_dsmad_handle = 0;
>      int nonvolatile_dsmad = -1;
> -    int dslbis_nonvolatile_num = 4;
>      MemoryRegion *mr;
>  
>      if (!ct3d->hostmem) {
> @@ -48,10 +49,15 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>  
>      /* Non volatile aspects */
>      dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> -    dslbis_nonvolatile =
> -        g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> +    dslbis_nonvolatile1 = g_malloc(sizeof(*dslbis_nonvolatile1));
> +    dslbis_nonvolatile2 = g_malloc(sizeof(*dslbis_nonvolatile2));
> +    dslbis_nonvolatile3 = g_malloc(sizeof(*dslbis_nonvolatile3));
> +    dslbis_nonvolatile4 = g_malloc(sizeof(*dslbis_nonvolatile4));
>      dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> -    if (!dsmas_nonvolatile || !dslbis_nonvolatile || !dsemts_nonvolatile) {
> +
> +    if (!dsmas_nonvolatile || !dsemts_nonvolatile ||
> +        !dslbis_nonvolatile1 || !dslbis_nonvolatile2 ||
> +        !dslbis_nonvolatile3 || !dslbis_nonvolatile4) {
>          g_free(*cdat_table);
>          *cdat_table = NULL;
>          return -ENOMEM;
> @@ -70,10 +76,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      };
>  
>      /* For now, no memory side cache, plausiblish numbers */
> -    dslbis_nonvolatile[0] = (CDATDslbis) {
> +    *dslbis_nonvolatile1 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile1),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -82,10 +88,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry[0] = 15, /* 150ns */
>      };
>  
> -    dslbis_nonvolatile[1] = (CDATDslbis) {
> +    *dslbis_nonvolatile2 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile2),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -94,10 +100,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry[0] = 25, /* 250ns */
>      };
>  
> -    dslbis_nonvolatile[2] = (CDATDslbis) {
> +    *dslbis_nonvolatile3 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile3),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -106,10 +112,10 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>          .entry[0] = 16,
>      };
>  
> -    dslbis_nonvolatile[3] = (CDATDslbis) {
> +    *dslbis_nonvolatile4 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile),
> +            .length = sizeof(*dslbis_nonvolatile4),
>          },
>          .handle = nonvolatile_dsmad,
>          .flags = HMAT_LB_MEM_MEMORY,
> @@ -131,15 +137,12 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      };
>  
>      /* Header always at start of structure */
> -    (*cdat_table)[i++] = g_steal_pointer(&dsmas_nonvolatile);
> -
> -    CDATDslbis *dslbis = g_steal_pointer(&dslbis_nonvolatile);
> -    int j;
> -    for (j = 0; j < dslbis_nonvolatile_num; j++) {
> -        (*cdat_table)[i++] = (CDATSubHeader *)&dslbis[j];
> -    }
> -
> -    (*cdat_table)[i++] = g_steal_pointer(&dsemts_nonvolatile);
> +    (*cdat_table)[0] = g_steal_pointer(&dsmas_nonvolatile);
> +    (*cdat_table)[1] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile1);
> +    (*cdat_table)[2] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile2);
> +    (*cdat_table)[3] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile3);
> +    (*cdat_table)[4] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile4);
> +    (*cdat_table)[5] = g_steal_pointer(&dsemts_nonvolatile);
Moving to simple indexing makes sense now that they are all in one place
(it makes introducing a bug much less likely!)

I've introduced an enum so that we have an automatic agreement between
number of elements and these assignments.
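
As a rough sketch of the enum idea (all names below are made up for
illustration and may not match what ends up in v8): the last enumerator
doubles as the element count, so adding an entry changes the array size
and the set of valid indices together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the final enumerator is the number of entries. */
enum ct3_cdat_index {
    CT3_CDAT_DSMAS,
    CT3_CDAT_DSLBIS0,
    CT3_CDAT_DSLBIS1,
    CT3_CDAT_DSLBIS2,
    CT3_CDAT_DSLBIS3,
    CT3_CDAT_DSEMTS,
    CT3_CDAT_NUM_ENTRIES
};

/* Stand-in for CDATSubHeader so the sketch compiles on its own. */
struct sub_header {
    unsigned char type;
};

/* Place each entry at its enum index and return the entry count. */
static int ct3_fill_table(struct sub_header **table,
                          struct sub_header *dsmas,
                          struct sub_header *dslbis[4],
                          struct sub_header *dsemts)
{
    assert(table != NULL);

    table[CT3_CDAT_DSMAS] = dsmas;
    table[CT3_CDAT_DSLBIS0] = dslbis[0];
    table[CT3_CDAT_DSLBIS1] = dslbis[1];
    table[CT3_CDAT_DSLBIS2] = dslbis[2];
    table[CT3_CDAT_DSLBIS3] = dslbis[3];
    table[CT3_CDAT_DSEMTS] = dsemts;

    return CT3_CDAT_NUM_ENTRIES;
}
```

The table itself would then be sized with CT3_CDAT_NUM_ENTRIES rather
than a separately maintained constant.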

>  
>      return CT3_CDAT_SUBTABLE_SIZE;
>  }



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a function
  2022-10-12 18:21     ` [PATCH 5/5] hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a function Gregory Price
@ 2022-10-13 10:47         ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 10:47 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, alison.schofield, dave, a.manzanares,
	bwidawsk, gregory.price, mst, hchkuo, cbrowy, ira.weiny

On Wed, 12 Oct 2022 14:21:20 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> The CDAT can contain multiple entries for multiple memory regions, this
> will allow us to re-use the initialization code when volatile memory
> region support is added.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

I'm in two minds about this... We could integrate it in the original series,
but at that point the change isn't yet justified.  Or we could leave it as the
first patch in your follow-on series.

Anyhow, I went with a similar refactor inspired by this.


> ---
>  hw/mem/cxl_type3.c | 137 ++++++++++++++++++++++++---------------------
>  1 file changed, 72 insertions(+), 65 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 220b9f09a9..3c5485abd0 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -19,117 +19,93 @@
>  #define DWORD_BYTE 4
>  #define CT3_CDAT_SUBTABLE_SIZE 6
>  
> -static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
> -                                void *priv)
> +static int ct3_build_cdat_subtable(CDATSubHeader **cdat_table,
> +        MemoryRegion *mr, int dsmad_handle)

subtable isn't particularly well defined.  Maybe
ct3_build_cdat_entries_for_mr()?

>  {
> -    g_autofree CDATDsmas *dsmas_nonvolatile = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile1 = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile2 = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile3 = NULL;
> -    g_autofree CDATDslbis *dslbis_nonvolatile4 = NULL;
> -    g_autofree CDATDsemts *dsemts_nonvolatile = NULL;
> -    CXLType3Dev *ct3d = priv;
> -    int next_dsmad_handle = 0;
> -    int nonvolatile_dsmad = -1;
> -    MemoryRegion *mr;
> -
> -    if (!ct3d->hostmem) {
> -        return 0;
> -    }
> -
> -    mr = host_memory_backend_get_memory(ct3d->hostmem);
> -    if (!mr) {
> -        return -EINVAL;
> -    }
> -
> -    *cdat_table = g_malloc0(CT3_CDAT_SUBTABLE_SIZE * sizeof(*cdat_table));
> -    if (!*cdat_table) {
> -        return -ENOMEM;
> -    }
> -
> -    /* Non volatile aspects */
> -    dsmas_nonvolatile = g_malloc(sizeof(*dsmas_nonvolatile));
> -    dslbis_nonvolatile1 = g_malloc(sizeof(*dslbis_nonvolatile1));
> -    dslbis_nonvolatile2 = g_malloc(sizeof(*dslbis_nonvolatile2));
> -    dslbis_nonvolatile3 = g_malloc(sizeof(*dslbis_nonvolatile3));
> -    dslbis_nonvolatile4 = g_malloc(sizeof(*dslbis_nonvolatile4));
> -    dsemts_nonvolatile = g_malloc(sizeof(*dsemts_nonvolatile));
> -
> -    if (!dsmas_nonvolatile || !dsemts_nonvolatile ||
> -        !dslbis_nonvolatile1 || !dslbis_nonvolatile2 ||
> -        !dslbis_nonvolatile3 || !dslbis_nonvolatile4) {
> -        g_free(*cdat_table);
> -        *cdat_table = NULL;
> +    g_autofree CDATDsmas *dsmas = NULL;
> +    g_autofree CDATDslbis *dslbis1 = NULL;
> +    g_autofree CDATDslbis *dslbis2 = NULL;
> +    g_autofree CDATDslbis *dslbis3 = NULL;
> +    g_autofree CDATDslbis *dslbis4 = NULL;
> +    g_autofree CDATDsemts *dsemts = NULL;
> +
> +    dsmas = g_malloc(sizeof(*dsmas));
> +    dslbis1 = g_malloc(sizeof(*dslbis1));
> +    dslbis2 = g_malloc(sizeof(*dslbis2));
> +    dslbis3 = g_malloc(sizeof(*dslbis3));
> +    dslbis4 = g_malloc(sizeof(*dslbis4));
> +    dsemts = g_malloc(sizeof(*dsemts));
> +
> +    if (!dsmas || !dslbis1 || !dslbis2 || !dslbis3 || !dslbis4 || !dsemts) {
>          return -ENOMEM;
>      }
>  
> -    nonvolatile_dsmad = next_dsmad_handle++;
> -    *dsmas_nonvolatile = (CDATDsmas) {
> +    *dsmas = (CDATDsmas) {
>          .header = {
>              .type = CDAT_TYPE_DSMAS,
> -            .length = sizeof(*dsmas_nonvolatile),
> +            .length = sizeof(*dsmas),
>          },
> -        .DSMADhandle = nonvolatile_dsmad,
> +        .DSMADhandle = dsmad_handle,
>          .flags = CDAT_DSMAS_FLAG_NV,
>          .DPA_base = 0,
>          .DPA_length = int128_get64(mr->size),
>      };
>  
>      /* For now, no memory side cache, plausiblish numbers */
> -    *dslbis_nonvolatile1 = (CDATDslbis) {
> +    *dslbis1 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile1),
> +            .length = sizeof(*dslbis1),
>          },
> -        .handle = nonvolatile_dsmad,
> +        .handle = dsmad_handle,
>          .flags = HMAT_LB_MEM_MEMORY,
>          .data_type = HMAT_LB_DATA_READ_LATENCY,
>          .entry_base_unit = 10000, /* 10ns base */
>          .entry[0] = 15, /* 150ns */

If we are going to wrap this up for volatile / non-volatile
we probably need to pass in reasonable values for these.
Whilst not technically always true, to test the Linux handling
I'd want non-volatile to report a longer latency.
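
One possible shape for that, as a sketch only. struct ct3_latency,
ct3_latency_for() and dslbis_entry() are hypothetical and not in the
patch. The 150ns / 250ns figures are the ones the patch uses today; the
volatile numbers are placeholders picked only so that they come out lower.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_UNIT_PS 10000 /* 10ns base unit, as in the patch */

/* Hypothetical per-region numbers; not part of the posted patch. */
struct ct3_latency {
    uint32_t read_ns;
    uint32_t write_ns;
};

static struct ct3_latency ct3_latency_for(bool nonvolatile)
{
    /* 150ns / 250ns are the figures the patch uses today. */
    struct ct3_latency nv = { .read_ns = 150, .write_ns = 250 };
    /* Placeholders only, picked to be lower than the non-volatile ones. */
    struct ct3_latency vol = { .read_ns = 100, .write_ns = 200 };

    return nonvolatile ? nv : vol;
}

/* Convert nanoseconds to a DSLBIS entry value in BASE_UNIT_PS units. */
static uint16_t dslbis_entry(uint32_t ns)
{
    uint32_t ps = ns * 1000;

    assert(ps % BASE_UNIT_PS == 0);
    return (uint16_t)(ps / BASE_UNIT_PS);
}
```

The builder would then take the latency struct alongside the MemoryRegion
and the DSMAD handle instead of hard-coding .entry[0].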

>      };
>  
> -    *dslbis_nonvolatile2 = (CDATDslbis) {
> +    *dslbis2 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile2),
> +            .length = sizeof(*dslbis2),
>          },
> -        .handle = nonvolatile_dsmad,
> +        .handle = dsmad_handle,
>          .flags = HMAT_LB_MEM_MEMORY,
>          .data_type = HMAT_LB_DATA_WRITE_LATENCY,
>          .entry_base_unit = 10000,
>          .entry[0] = 25, /* 250ns */
>      };
>  
> -    *dslbis_nonvolatile3 = (CDATDslbis) {
> +    *dslbis3 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile3),
> +            .length = sizeof(*dslbis3),
>          },
> -        .handle = nonvolatile_dsmad,
> +        .handle = dsmad_handle,
>          .flags = HMAT_LB_MEM_MEMORY,
>          .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
>          .entry_base_unit = 1000, /* GB/s */
>          .entry[0] = 16,
>      };
>  
> -    *dslbis_nonvolatile4 = (CDATDslbis) {
> +    *dslbis4 = (CDATDslbis) {
>          .header = {
>              .type = CDAT_TYPE_DSLBIS,
> -            .length = sizeof(*dslbis_nonvolatile4),
> +            .length = sizeof(*dslbis4),
>          },
> -        .handle = nonvolatile_dsmad,
> +        .handle = dsmad_handle,
>          .flags = HMAT_LB_MEM_MEMORY,
>          .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
>          .entry_base_unit = 1000, /* GB/s */
>          .entry[0] = 16,
>      };
>  
> -    *dsemts_nonvolatile = (CDATDsemts) {
> +    *dsemts = (CDATDsemts) {
>          .header = {
>              .type = CDAT_TYPE_DSEMTS,
> -            .length = sizeof(*dsemts_nonvolatile),
> +            .length = sizeof(*dsemts),
>          },
> -        .DSMAS_handle = nonvolatile_dsmad,
> +        .DSMAS_handle = dsmad_handle,
>          /* Reserved - the non volatile from DSMAS matters */
>          .EFI_memory_type_attr = 2,
>          .DPA_offset = 0,
> @@ -137,16 +113,47 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>      };
>  
>      /* Header always at start of structure */
> -    (*cdat_table)[0] = g_steal_pointer(&dsmas_nonvolatile);
> -    (*cdat_table)[1] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile1);
> -    (*cdat_table)[2] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile2);
> -    (*cdat_table)[3] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile3);
> -    (*cdat_table)[4] = (CDATSubHeader *)g_steal_pointer(&dslbis_nonvolatile4);
> -    (*cdat_table)[5] = g_steal_pointer(&dsemts_nonvolatile);
> +    cdat_table[0] = g_steal_pointer(&dsmas);
> +    cdat_table[1] = (CDATSubHeader *)g_steal_pointer(&dslbis1);
> +    cdat_table[2] = (CDATSubHeader *)g_steal_pointer(&dslbis2);
> +    cdat_table[3] = (CDATSubHeader *)g_steal_pointer(&dslbis3);
> +    cdat_table[4] = (CDATSubHeader *)g_steal_pointer(&dslbis4);
> +    cdat_table[5] = g_steal_pointer(&dsemts);
>  
>      return CT3_CDAT_SUBTABLE_SIZE;
>  }
>  
> +static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
> +                                void *priv)
> +{
> +    CXLType3Dev *ct3d = priv;
> +    MemoryRegion *mr;
> +    int ret = 0;
> +
> +    if (!ct3d->hostmem) {
> +        return 0;
> +    }
> +
> +    mr = host_memory_backend_get_memory(ct3d->hostmem);
> +    if (!mr) {
> +        return -EINVAL;
> +    }
> +
> +    *cdat_table = g_malloc0(CT3_CDAT_SUBTABLE_SIZE * sizeof(*cdat_table));

This bakes in assumptions at the wrong layer in the code.  Out here we should not
know how big the table is - that is a job just for the ct3_build_cdat_subtable()
part.

Various options come to mind..
1) Two pass approach. First call ct3_build_cdat_subtable() with a NULL pointer
   passed in.  In that case all it does is return the number of elements.
   Then the caller calls it again, providing suitable storage.
2) Allocate in ct3_build_cdat_subtable() then copy in the caller, or use
   directly if only one type of memory is present.

I've gone with the 2 pass approach.  Let me know what you think of it
once I send the patches out in a few mins.
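
For reference, the two pass idea looks roughly like the sketch below. It
uses plain calloc() and stand-in types so that it compiles on its own;
build_entries() and build_table() are illustrative names, not the
functions in the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for CDATSubHeader so the sketch compiles on its own. */
struct sub_header {
    unsigned char type;
};

#define ENTRIES_PER_REGION 6

/*
 * Pass 1: table == NULL, only report how many entries would be built.
 * Pass 2: table points at storage for that many entry pointers.
 */
static int build_entries(struct sub_header **table)
{
    int i;

    if (!table) {
        return ENTRIES_PER_REGION;
    }

    for (i = 0; i < ENTRIES_PER_REGION; i++) {
        table[i] = calloc(1, sizeof(*table[i]));
        if (!table[i]) {
            /* Unwind the entries allocated so far. */
            while (i--) {
                free(table[i]);
                table[i] = NULL;
            }
            return -ENOMEM;
        }
    }

    return ENTRIES_PER_REGION;
}

/* The caller never needs to know the per-region entry count up front. */
static int build_table(struct sub_header ***out)
{
    struct sub_header **table;
    int len, ret;

    assert(out != NULL);

    len = build_entries(NULL);
    table = calloc(len, sizeof(*table));
    if (!table) {
        return -ENOMEM;
    }

    ret = build_entries(table);
    if (ret < 0) {
        free(table);
        return ret;
    }

    *out = table;
    return ret;
}
```

With a second memory type the caller would sum the counts from the first
pass and hand each builder its own offset into the table.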

Thanks,

Jonathan



> +    if (!*cdat_table) {
> +        return -ENOMEM;
> +    }
> +
> +    /* Non volatile aspects */
> +    ret = ct3_build_cdat_subtable(*cdat_table, mr, 0);
> +    if (ret < 0) {
> +        g_free(*cdat_table);
> +        *cdat_table = NULL;
> +    }
> +
> +    return ret;
> +}
> +
>  static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
>  {
>      int i;


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-12 16:01   ` Gregory Price
@ 2022-10-13 10:56       ` Jonathan Cameron via
  2022-10-13 10:56       ` Jonathan Cameron via
  1 sibling, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 10:56 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, linuxarm, ira.weiny

On Wed, 12 Oct 2022 12:01:54 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> This code contains heap corruption on free, and I think should be
> refactored to pre-allocate all the entries we're interested in putting
> into the table.  This would flatten the code and simplify the error
> handling steps.
> 
> Also, should we consider making a union with all the possible entries to
> make entry allocation easier?  It may eat a few extra bytes of memory,
> but it would simplify the allocation/cleanup code here further.
> 
> Given that every allocation has to be checked, i'm also not convinced
> the use of g_autofree is worth the potential footguns associated with
> it.
> 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 568c9d62f5..3fa5d70662 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -12,9 +12,218 @@
> > +static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
> > +                                void *priv)
> > +{  
> (snip)
> > +        /* For now, no memory side cache, plausiblish numbers */
> > +        dslbis_nonvolatile = g_malloc(sizeof(*dslbis_nonvolatile) * dslbis_nonvolatile_num);
> > +        if (!dslbis_nonvolatile)
> > +            return -ENOMEM;  
> 
> this allocation creates a table of entries, which is later freed
> incorrectly
> 
> > +
> > +    *cdat_table = g_malloc0(len * sizeof(*cdat_table));  
> 
> this allocation needs to be checked
I just realized that sizeof should be sizeof(**cdat_table)

I've moved to a local autofree pointer after factoring out the
guts of the code so this gets simpler anyway (and was more obviously wrong!)
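
To spell out the sizeof point with a standalone sketch (stand-in type and
a hypothetical alloc_table() helper, plain calloc() rather than
g_malloc0()): the array elements are entry pointers, so the element size
is that of **out, not *out. Both are pointer types, which is why the
original expression happened to allocate the right amount on typical
platforms.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for CDATSubHeader so the sketch compiles on its own. */
struct sub_header {
    unsigned char type;
};

/* Allocate an array of len entry pointers and return it through out. */
static int alloc_table(struct sub_header ***out, size_t len)
{
    assert(out != NULL);

    /*
     * *out  has type 'struct sub_header **' (the array itself),
     * **out has type 'struct sub_header *'  (one array element).
     */
    *out = calloc(len, sizeof(**out));

    return *out ? 0 : -1;
}
```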

Jonathan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
  2022-10-07 15:21   ` Jonathan Cameron via
@ 2022-10-13 11:04     ` Jonathan Cameron via
  -1 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 11:04 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: ira.weiny

On Fri, 7 Oct 2022 16:21:54 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> 
> The Data Object Exchange implementation of CXL Coherent Device Attribute
> Table (CDAT). This implementation is referring to "Coherent Device
> Attribute Table Specification, Rev. 1.02, Oct. 2020" and "Compute
> Express Link Specification, Rev. 2.0, Oct. 2020"
> 
> This patch adds core support that will be shared by both
> end-points and switch port emulation.
> 
> Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> Signed-off-by: Chris Browy <cbrowy@avery-design.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Whilst doing v8 I'll uprev this to the 1.03 CDAT spec and CXL 3.0.
Changes are minor and backwards compatible, but it's hard
to get the older CXL spec now that 3.0 is out.

> 
> ---
> Changes since RFC:
> - Split out library code from specific device.
> ---
>  hw/cxl/cxl-cdat.c              | 222 +++++++++++++++++++++++++++++++++
>  hw/cxl/meson.build             |   1 +
>  include/hw/cxl/cxl_cdat.h      | 165 ++++++++++++++++++++++++
>  include/hw/cxl/cxl_component.h |   7 ++
>  include/hw/cxl/cxl_device.h    |   3 +
>  include/hw/cxl/cxl_pci.h       |   1 +
>  6 files changed, 399 insertions(+)
> 
> diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> new file mode 100644
> index 0000000000..137178632b
> --- /dev/null
> +++ b/hw/cxl/cxl-cdat.c
> @@ -0,0 +1,222 @@
> +/*
> + * CXL CDAT Structure
> + *
> + * Copyright (C) 2021 Avery Design Systems, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/pci/pci.h"
> +#include "hw/cxl/cxl.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +
> +static void cdat_len_check(CDATSubHeader *hdr, Error **errp)
> +{
> +    assert(hdr->length);
> +    assert(hdr->reserved == 0);
> +
> +    switch (hdr->type) {
> +    case CDAT_TYPE_DSMAS:
> +        assert(hdr->length == sizeof(CDATDsmas));
> +        break;
> +    case CDAT_TYPE_DSLBIS:
> +        assert(hdr->length == sizeof(CDATDslbis));
> +        break;
> +    case CDAT_TYPE_DSMSCIS:
> +        assert(hdr->length == sizeof(CDATDsmscis));
> +        break;
> +    case CDAT_TYPE_DSIS:
> +        assert(hdr->length == sizeof(CDATDsis));
> +        break;
> +    case CDAT_TYPE_DSEMTS:
> +        assert(hdr->length == sizeof(CDATDsemts));
> +        break;
> +    case CDAT_TYPE_SSLBIS:
> +        assert(hdr->length >= sizeof(CDATSslbisHeader));
> +        assert((hdr->length - sizeof(CDATSslbisHeader)) %
> +               sizeof(CDATSslbe) == 0);
> +        break;
> +    default:
> +        error_setg(errp, "Type %d is reserved", hdr->type);
> +    }
> +}
> +
> +static void ct3_build_cdat(CDATObject *cdat, Error **errp)
> +{
> +    g_autofree CDATTableHeader *cdat_header = NULL;
> +    g_autofree CDATEntry *cdat_st = NULL;
> +    uint8_t sum = 0;
> +    int ent, i;
> +
> +    /* Use default table if fopen == NULL */
> +    assert(cdat->build_cdat_table);
> +
> +    cdat_header = g_malloc0(sizeof(*cdat_header));
> +    if (!cdat_header) {
> +        error_setg(errp, "Failed to allocate CDAT header");
> +        return;
> +    }
> +
> +    cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf, cdat->private);
> +
> +    if (!cdat->built_buf_len) {
> +        /* Build later as not all data available yet */
> +        cdat->to_update = true;
> +        return;
> +    }
> +    cdat->to_update = false;
> +
> +    cdat_st = g_malloc0(sizeof(*cdat_st) * (cdat->built_buf_len + 1));
> +    if (!cdat_st) {
> +        error_setg(errp, "Failed to allocate CDAT entry array");
> +        return;
> +    }
> +
> +    /* Entry 0 for CDAT header, starts with Entry 1 */
> +    for (ent = 1; ent < cdat->built_buf_len + 1; ent++) {
> +        CDATSubHeader *hdr = cdat->built_buf[ent - 1];
> +        uint8_t *buf = (uint8_t *)cdat->built_buf[ent - 1];
> +
> +        cdat_st[ent].base = hdr;
> +        cdat_st[ent].length = hdr->length;
> +
> +        cdat_header->length += hdr->length;
> +        for (i = 0; i < hdr->length; i++) {
> +            sum += buf[i];
> +        }
> +    }
> +
> +    /* CDAT header */
> +    cdat_header->revision = CXL_CDAT_REV;
> +    /* For now, no runtime updates */
> +    cdat_header->sequence = 0;
> +    cdat_header->length += sizeof(CDATTableHeader);
> +    sum += cdat_header->revision + cdat_header->sequence +
> +        cdat_header->length;
> +    /* Sum of all bytes including checksum must be 0 */
> +    cdat_header->checksum = ~sum + 1;
> +
> +    cdat_st[0].base = g_steal_pointer(&cdat_header);
> +    cdat_st[0].length = sizeof(*cdat_header);
> +    cdat->entry_len = 1 + cdat->built_buf_len;
> +    cdat->entry = g_steal_pointer(&cdat_st);
> +}
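
For reference, the "sum of all bytes including checksum must be 0" rule
can be sketched byte-wise as below. cdat_checksum() is a hypothetical
helper, not something in this patch. Walking the header as raw bytes,
with the checksum field zeroed first, would also cover every byte of the
multi-byte fields, assuming length and sequence are wider than one byte
(cxl_cdat.h is not quoted here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the byte that makes sum(buf[0..len)) + checksum == 0 mod 256. */
static uint8_t cdat_checksum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        sum += buf[i];
    }

    /* Two's complement negation, truncated back to one byte. */
    return (uint8_t)(~sum + 1);
}
```
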
> +
> +static void ct3_load_cdat(CDATObject *cdat, Error **errp)
> +{
> +    g_autofree CDATEntry *cdat_st = NULL;
> +    uint8_t sum = 0;
> +    int num_ent;
> +    int i = 0, ent = 1, file_size = 0;
> +    CDATSubHeader *hdr;
> +    FILE *fp = NULL;
> +
> +    /* Read CDAT file and create its cache */
> +    fp = fopen(cdat->filename, "r");
> +    if (!fp) {
> +        error_setg(errp, "CDAT: Unable to open file");
> +        return;
> +    }
> +
> +    fseek(fp, 0, SEEK_END);
> +    file_size = ftell(fp);
> +    fseek(fp, 0, SEEK_SET);
> +    cdat->buf = g_malloc0(file_size);
> +
> +    if (fread(cdat->buf, file_size, 1, fp) == 0) {
> +        error_setg(errp, "CDAT: File read failed");
> +        return;
> +    }
> +
> +    fclose(fp);
> +
> +    if (file_size < sizeof(CDATTableHeader)) {
> +        error_setg(errp, "CDAT: File too short");
> +        return;
> +    }
> +    i = sizeof(CDATTableHeader);
> +    num_ent = 1;
> +    while (i < file_size) {
> +        hdr = (CDATSubHeader *)(cdat->buf + i);
> +        cdat_len_check(hdr, errp);
> +        i += hdr->length;
> +        num_ent++;
> +    }
> +    if (i != file_size) {
> +        error_setg(errp, "CDAT: File length mismatch");
> +        return;
> +    }
> +
> +    cdat_st = g_malloc0(sizeof(*cdat_st) * num_ent);
> +    if (!cdat_st) {
> +        error_setg(errp, "CDAT: Failed to allocate entry array");
> +        return;
> +    }
> +
> +    /* Set CDAT header, Entry = 0 */
> +    cdat_st[0].base = cdat->buf;
> +    cdat_st[0].length = sizeof(CDATTableHeader);
> +    i = 0;
> +
> +    while (i < cdat_st[0].length) {
> +        sum += cdat->buf[i++];
> +    }
> +
> +    /* Read CDAT structures */
> +    while (i < file_size) {
> +        hdr = (CDATSubHeader *)(cdat->buf + i);
> +        cdat_len_check(hdr, errp);
> +
> +        cdat_st[ent].base = hdr;
> +        cdat_st[ent].length = hdr->length;
> +
> +        while (cdat->buf + i <
> +               (uint8_t *)cdat_st[ent].base + cdat_st[ent].length) {
> +            assert(i < file_size);
> +            sum += cdat->buf[i++];
> +        }
> +
> +        ent++;
> +    }
> +
> +    if (sum != 0) {
> +        warn_report("CDAT: Found checksum mismatch in %s", cdat->filename);
> +    }
> +    cdat->entry_len = num_ent;
> +    cdat->entry = g_steal_pointer(&cdat_st);
> +}
> +
> +void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp)
> +{
> +    CDATObject *cdat = &cxl_cstate->cdat;
> +
> +    if (cdat->filename) {
> +        ct3_load_cdat(cdat, errp);
> +    } else {
> +        ct3_build_cdat(cdat, errp);
> +    }
> +}
> +
> +void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp)
> +{
> +    CDATObject *cdat = &cxl_cstate->cdat;
> +
> +    if (cdat->to_update) {
> +        ct3_build_cdat(cdat, errp);
> +    }
> +}
> +
> +void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
> +{
> +    CDATObject *cdat = &cxl_cstate->cdat;
> +
> +    free(cdat->entry);
> +    if (cdat->built_buf)
> +        cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
> +                              cdat->private);
> +    if (cdat->buf)
> +        free(cdat->buf);
> +}
> diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
> index f117b99949..cfa95ffd40 100644
> --- a/hw/cxl/meson.build
> +++ b/hw/cxl/meson.build
> @@ -4,6 +4,7 @@ softmmu_ss.add(when: 'CONFIG_CXL',
>                     'cxl-device-utils.c',
>                     'cxl-mailbox-utils.c',
>                     'cxl-host.c',
> +                   'cxl-cdat.c',
>                 ),
>                 if_false: files(
>                     'cxl-host-stubs.c',
> diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
> new file mode 100644
> index 0000000000..fdb1fa98f4
> --- /dev/null
> +++ b/include/hw/cxl/cxl_cdat.h
> @@ -0,0 +1,165 @@
> +/*
> + * CXL CDAT Structure
> + *
> + * Copyright (C) 2021 Avery Design Systems, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef CXL_CDAT_H
> +#define CXL_CDAT_H
> +
> +#include "hw/cxl/cxl_pci.h"
> +
> +/*
> + * Reference:
> + *   Coherent Device Attribute Table (CDAT) Specification, Rev. 1.02, Oct. 2020
> + *   Compute Express Link (CXL) Specification, Rev. 2.0, Oct. 2020
> + */
> +/* Table Access DOE - CXL 8.1.11 */
> +#define CXL_DOE_TABLE_ACCESS      2
> +#define CXL_DOE_PROTOCOL_CDAT     ((CXL_DOE_TABLE_ACCESS << 16) | CXL_VENDOR_ID)
> +
> +/* Read Entry - CXL 8.1.11.1 */
> +#define CXL_DOE_TAB_TYPE_CDAT 0
> +#define CXL_DOE_TAB_ENT_MAX 0xFFFF
> +
> +/* Read Entry Request - CXL 8.1.11.1 Table 134 */
> +#define CXL_DOE_TAB_REQ 0
> +typedef struct CDATReq {
> +    DOEHeader header;
> +    uint8_t req_code;
> +    uint8_t table_type;
> +    uint16_t entry_handle;
> +} QEMU_PACKED CDATReq;
> +
> +/* Read Entry Response - CXL 8.1.11.1 Table 135 */
> +#define CXL_DOE_TAB_RSP 0
> +typedef struct CDATRsp {
> +    DOEHeader header;
> +    uint8_t rsp_code;
> +    uint8_t table_type;
> +    uint16_t entry_handle;
> +} QEMU_PACKED CDATRsp;
> +
> +/* CDAT Table Format - CDAT Table 1 */
> +#define CXL_CDAT_REV 1
> +typedef struct CDATTableHeader {
> +    uint32_t length;
> +    uint8_t revision;
> +    uint8_t checksum;
> +    uint8_t reserved[6];
> +    uint32_t sequence;
> +} QEMU_PACKED CDATTableHeader;
> +
> +/* CDAT Structure Types - CDAT Table 2 */
> +typedef enum {
> +    CDAT_TYPE_DSMAS = 0,
> +    CDAT_TYPE_DSLBIS = 1,
> +    CDAT_TYPE_DSMSCIS = 2,
> +    CDAT_TYPE_DSIS = 3,
> +    CDAT_TYPE_DSEMTS = 4,
> +    CDAT_TYPE_SSLBIS = 5,
> +} CDATType;
> +
> +typedef struct CDATSubHeader {
> +    uint8_t type;
> +    uint8_t reserved;
> +    uint16_t length;
> +} CDATSubHeader;
> +
> +/* Device Scoped Memory Affinity Structure - CDAT Table 3 */
> +typedef struct CDATDsmas {
> +    CDATSubHeader header;
> +    uint8_t DSMADhandle;
> +    uint8_t flags;
> +#define CDAT_DSMAS_FLAG_NV              (1 << 2)
> +#define CDAT_DSMAS_FLAG_SHAREABLE       (1 << 3)
> +#define CDAT_DSMAS_FLAG_HW_COHERENT     (1 << 4)
> +#define CDAT_DSMAS_FLAG_DYNAMIC_CAP     (1 << 5)
> +    uint16_t reserved;
> +    uint64_t DPA_base;
> +    uint64_t DPA_length;
> +} QEMU_PACKED CDATDsmas;
> +
> +/* Device Scoped Latency and Bandwidth Information Structure - CDAT Table 5 */
> +typedef struct CDATDslbis {
> +    CDATSubHeader header;
> +    uint8_t handle;
> +    /* Definitions of these fields refer directly to HMAT fields */
> +    uint8_t flags;
> +    uint8_t data_type;
> +    uint8_t reserved;
> +    uint64_t entry_base_unit;
> +    uint16_t entry[3];
> +    uint16_t reserved2;
> +} QEMU_PACKED CDATDslbis;
> +
> +/* Device Scoped Memory Side Cache Information Structure - CDAT Table 6 */
> +typedef struct CDATDsmscis {
> +    CDATSubHeader header;
> +    uint8_t DSMAS_handle;
> +    uint8_t reserved[3];
> +    uint64_t memory_side_cache_size;
> +    uint32_t cache_attributes;
> +} QEMU_PACKED CDATDsmscis;
> +
> +/* Device Scoped Initiator Structure - CDAT Table 7 */
> +typedef struct CDATDsis {
> +    CDATSubHeader header;
> +    uint8_t flags;
> +    uint8_t handle;
> +    uint16_t reserved;
> +} QEMU_PACKED CDATDsis;
> +
> +/* Device Scoped EFI Memory Type Structure - CDAT Table 8 */
> +typedef struct CDATDsemts {
> +    CDATSubHeader header;
> +    uint8_t DSMAS_handle;
> +    uint8_t EFI_memory_type_attr;
> +    uint16_t reserved;
> +    uint64_t DPA_offset;
> +    uint64_t DPA_length;
> +} QEMU_PACKED CDATDsemts;
> +
> +/* Switch Scoped Latency and Bandwidth Information Structure - CDAT Table 9 */
> +typedef struct CDATSslbisHeader {
> +    CDATSubHeader header;
> +    uint8_t data_type;
> +    uint8_t reserved[3];
> +    uint64_t entry_base_unit;
> +} QEMU_PACKED CDATSslbisHeader;
> +
> +/* Switch Scoped Latency and Bandwidth Entry - CDAT Table 10 */
> +typedef struct CDATSslbe {
> +    uint16_t port_x_id;
> +    uint16_t port_y_id;
> +    uint16_t latency_bandwidth;
> +    uint16_t reserved;
> +} QEMU_PACKED CDATSslbe;
> +
> +typedef struct CDATSslbis {
> +    CDATSslbisHeader sslbis_header;
> +    CDATSslbe sslbe[];
> +} CDATSslbis;
> +
> +typedef struct CDATEntry {
> +    void *base;
> +    uint32_t length;
> +} CDATEntry;
> +
> +typedef struct CDATObject {
> +    CDATEntry *entry;
> +    int entry_len;
> +
> +    int (*build_cdat_table)(CDATSubHeader ***cdat_table, void *priv);
> +    void (*free_cdat_table)(CDATSubHeader **, int num, void *priv);
> +    bool to_update;
> +    void *private;
> +    char *filename;
> +    uint8_t *buf;
> +    struct CDATSubHeader **built_buf;
> +    int built_buf_len;
> +} CDATObject;
> +#endif /* CXL_CDAT_H */
> diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
> index 94ec2f07d7..34075cfb72 100644
> --- a/include/hw/cxl/cxl_component.h
> +++ b/include/hw/cxl/cxl_component.h
> @@ -19,6 +19,7 @@
>  #include "qemu/range.h"
>  #include "qemu/typedefs.h"
>  #include "hw/register.h"
> +#include "qapi/error.h"
>  
>  enum reg_type {
>      CXL2_DEVICE,
> @@ -184,6 +185,8 @@ typedef struct cxl_component {
>              struct PCIDevice *pdev;
>          };
>      };
> +
> +    CDATObject cdat;
>  } CXLComponentState;
>  
>  void cxl_component_register_block_init(Object *obj,
> @@ -220,4 +223,8 @@ static inline hwaddr cxl_decode_ig(int ig)
>  
>  CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
>  
> +void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp);
> +void cxl_doe_cdat_release(CXLComponentState *cxl_cstate);
> +void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp);
> +
>  #endif
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index e4d221cdb3..449b0edfe9 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -243,6 +243,9 @@ struct CXLType3Dev {
>      AddressSpace hostmem_as;
>      CXLComponentState cxl_cstate;
>      CXLDeviceState cxl_dstate;
> +
> +    /* DOE */
> +    DOECap doe_cdat;
>  };
>  
>  #define TYPE_CXL_TYPE3 "cxl-type3"
> diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
> index 01cf002096..3cb79eca1e 100644
> --- a/include/hw/cxl/cxl_pci.h
> +++ b/include/hw/cxl/cxl_pci.h
> @@ -13,6 +13,7 @@
>  #include "qemu/compiler.h"
>  #include "hw/pci/pci.h"
>  #include "hw/pci/pcie.h"
> +#include "hw/cxl/cxl_cdat.h"
>  
>  #define CXL_VENDOR_ID 0x1e98
>  



* Re: [PATCH v7 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
@ 2022-10-13 11:04     ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron via @ 2022-10-13 11:04 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy
  Cc: ira.weiny

On Fri, 7 Oct 2022 16:21:54 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> 
> The Data Object Exchange implementation of CXL Coherent Device Attribute
> Table (CDAT). This implementation is referring to "Coherent Device
> Attribute Table Specification, Rev. 1.02, Oct. 2020" and "Compute
> Express Link Specification, Rev. 2.0, Oct. 2020"
> 
> This patch adds core support that will be shared by both
> end-points and switch port emulation.
> 
> Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> Signed-off-by: Chris Browy <cbrowy@avery-design.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Whilst doing v8 I'll uprev this to the CDAT 1.03 and CXL 3.0 specs.
The changes are minor and backwards compatible, but it's hard
to get hold of the older CXL spec now that 3.0 is out.

> 
> ---
> Changes since RFC:
> - Split out library code from specific device.
> ---
>  hw/cxl/cxl-cdat.c              | 222 +++++++++++++++++++++++++++++++++
>  hw/cxl/meson.build             |   1 +
>  include/hw/cxl/cxl_cdat.h      | 165 ++++++++++++++++++++++++
>  include/hw/cxl/cxl_component.h |   7 ++
>  include/hw/cxl/cxl_device.h    |   3 +
>  include/hw/cxl/cxl_pci.h       |   1 +
>  6 files changed, 399 insertions(+)
> 
> diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> new file mode 100644
> index 0000000000..137178632b
> --- /dev/null
> +++ b/hw/cxl/cxl-cdat.c
> @@ -0,0 +1,222 @@
> +/*
> + * CXL CDAT Structure
> + *
> + * Copyright (C) 2021 Avery Design Systems, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/pci/pci.h"
> +#include "hw/cxl/cxl.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +
> +static void cdat_len_check(CDATSubHeader *hdr, Error **errp)
> +{
> +    assert(hdr->length);
> +    assert(hdr->reserved == 0);
> +
> +    switch (hdr->type) {
> +    case CDAT_TYPE_DSMAS:
> +        assert(hdr->length == sizeof(CDATDsmas));
> +        break;
> +    case CDAT_TYPE_DSLBIS:
> +        assert(hdr->length == sizeof(CDATDslbis));
> +        break;
> +    case CDAT_TYPE_DSMSCIS:
> +        assert(hdr->length == sizeof(CDATDsmscis));
> +        break;
> +    case CDAT_TYPE_DSIS:
> +        assert(hdr->length == sizeof(CDATDsis));
> +        break;
> +    case CDAT_TYPE_DSEMTS:
> +        assert(hdr->length == sizeof(CDATDsemts));
> +        break;
> +    case CDAT_TYPE_SSLBIS:
> +        assert(hdr->length >= sizeof(CDATSslbisHeader));
> +        assert((hdr->length - sizeof(CDATSslbisHeader)) %
> +               sizeof(CDATSslbe) == 0);
> +        break;
> +    default:
> +        error_setg(errp, "Type %d is reserved", hdr->type);
> +    }
> +}
> +
> +static void ct3_build_cdat(CDATObject *cdat, Error **errp)
> +{
> +    g_autofree CDATTableHeader *cdat_header = NULL;
> +    g_autofree CDATEntry *cdat_st = NULL;
> +    uint8_t sum = 0;
> +    int ent, i;
> +
> +    /* Use default table if fopen == NULL */
> +    assert(cdat->build_cdat_table);
> +
> +    cdat_header = g_malloc0(sizeof(*cdat_header));
> +    if (!cdat_header) {
> +        error_setg(errp, "Failed to allocate CDAT header");
> +        return;
> +    }
> +
> +    cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf, cdat->private);
> +
> +    if (!cdat->built_buf_len) {
> +        /* Build later as not all data available yet */
> +        cdat->to_update = true;
> +        return;
> +    }
> +    cdat->to_update = false;
> +
> +    cdat_st = g_malloc0(sizeof(*cdat_st) * (cdat->built_buf_len + 1));
> +    if (!cdat_st) {
> +        error_setg(errp, "Failed to allocate CDAT entry array");
> +        return;
> +    }
> +
> +    /* Entry 0 for CDAT header, starts with Entry 1 */
> +    for (ent = 1; ent < cdat->built_buf_len + 1; ent++) {
> +        CDATSubHeader *hdr = cdat->built_buf[ent - 1];
> +        uint8_t *buf = (uint8_t *)cdat->built_buf[ent - 1];
> +
> +        cdat_st[ent].base = hdr;
> +        cdat_st[ent].length = hdr->length;
> +
> +        cdat_header->length += hdr->length;
> +        for (i = 0; i < hdr->length; i++) {
> +            sum += buf[i];
> +        }
> +    }
> +
> +    /* CDAT header */
> +    cdat_header->revision = CXL_CDAT_REV;
> +    /* For now, no runtime updates */
> +    cdat_header->sequence = 0;
> +    cdat_header->length += sizeof(CDATTableHeader);
> +    sum += cdat_header->revision + cdat_header->sequence +
> +        cdat_header->length;
> +    /* Sum of all bytes including checksum must be 0 */
> +    cdat_header->checksum = ~sum + 1;
> +
> +    cdat_st[0].base = g_steal_pointer(&cdat_header);
> +    cdat_st[0].length = sizeof(*cdat_header);
> +    cdat->entry_len = 1 + cdat->built_buf_len;
> +    cdat->entry = g_steal_pointer(&cdat_st);
> +}
> +
> +static void ct3_load_cdat(CDATObject *cdat, Error **errp)
> +{
> +    g_autofree CDATEntry *cdat_st = NULL;
> +    uint8_t sum = 0;
> +    int num_ent;
> +    int i = 0, ent = 1, file_size = 0;
> +    CDATSubHeader *hdr;
> +    FILE *fp = NULL;
> +
> +    /* Read CDAT file and create its cache */
> +    fp = fopen(cdat->filename, "r");
> +    if (!fp) {
> +        error_setg(errp, "CDAT: Unable to open file");
> +        return;
> +    }
> +
> +    fseek(fp, 0, SEEK_END);
> +    file_size = ftell(fp);
> +    fseek(fp, 0, SEEK_SET);
> +    cdat->buf = g_malloc0(file_size);
> +
> +    if (fread(cdat->buf, file_size, 1, fp) == 0) {
> +        error_setg(errp, "CDAT: File read failed");
> +        return;
> +    }
> +
> +    fclose(fp);
> +
> +    if (file_size < sizeof(CDATTableHeader)) {
> +        error_setg(errp, "CDAT: File too short");
> +        return;
> +    }
> +    i = sizeof(CDATTableHeader);
> +    num_ent = 1;
> +    while (i < file_size) {
> +        hdr = (CDATSubHeader *)(cdat->buf + i);
> +        cdat_len_check(hdr, errp);
> +        i += hdr->length;
> +        num_ent++;
> +    }
> +    if (i != file_size) {
> +        error_setg(errp, "CDAT: File length mismatch");
> +        return;
> +    }
> +
> +    cdat_st = g_malloc0(sizeof(*cdat_st) * num_ent);
> +    if (!cdat_st) {
> +        error_setg(errp, "CDAT: Failed to allocate entry array");
> +        return;
> +    }
> +
> +    /* Set CDAT header, Entry = 0 */
> +    cdat_st[0].base = cdat->buf;
> +    cdat_st[0].length = sizeof(CDATTableHeader);
> +    i = 0;
> +
> +    while (i < cdat_st[0].length) {
> +        sum += cdat->buf[i++];
> +    }
> +
> +    /* Read CDAT structures */
> +    while (i < file_size) {
> +        hdr = (CDATSubHeader *)(cdat->buf + i);
> +        cdat_len_check(hdr, errp);
> +
> +        cdat_st[ent].base = hdr;
> +        cdat_st[ent].length = hdr->length;
> +
> +        while (cdat->buf + i <
> +               (uint8_t *)cdat_st[ent].base + cdat_st[ent].length) {
> +            assert(i < file_size);
> +            sum += cdat->buf[i++];
> +        }
> +
> +        ent++;
> +    }
> +
> +    if (sum != 0) {
> +        warn_report("CDAT: Found checksum mismatch in %s", cdat->filename);
> +    }
> +    cdat->entry_len = num_ent;
> +    cdat->entry = g_steal_pointer(&cdat_st);
> +}
> +
> +void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp)
> +{
> +    CDATObject *cdat = &cxl_cstate->cdat;
> +
> +    if (cdat->filename) {
> +        ct3_load_cdat(cdat, errp);
> +    } else {
> +        ct3_build_cdat(cdat, errp);
> +    }
> +}
> +
> +void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp)
> +{
> +    CDATObject *cdat = &cxl_cstate->cdat;
> +
> +    if (cdat->to_update) {
> +        ct3_build_cdat(cdat, errp);
> +    }
> +}
> +
> +void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
> +{
> +    CDATObject *cdat = &cxl_cstate->cdat;
> +
> +    free(cdat->entry);
> +    if (cdat->built_buf)
> +        cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
> +                              cdat->private);
> +    if (cdat->buf)
> +        free(cdat->buf);
> +}
> diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
> index f117b99949..cfa95ffd40 100644
> --- a/hw/cxl/meson.build
> +++ b/hw/cxl/meson.build
> @@ -4,6 +4,7 @@ softmmu_ss.add(when: 'CONFIG_CXL',
>                     'cxl-device-utils.c',
>                     'cxl-mailbox-utils.c',
>                     'cxl-host.c',
> +                   'cxl-cdat.c',
>                 ),
>                 if_false: files(
>                     'cxl-host-stubs.c',
> diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
> new file mode 100644
> index 0000000000..fdb1fa98f4
> --- /dev/null
> +++ b/include/hw/cxl/cxl_cdat.h
> @@ -0,0 +1,165 @@
> +/*
> + * CXL CDAT Structure
> + *
> + * Copyright (C) 2021 Avery Design Systems, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef CXL_CDAT_H
> +#define CXL_CDAT_H
> +
> +#include "hw/cxl/cxl_pci.h"
> +
> +/*
> + * Reference:
> + *   Coherent Device Attribute Table (CDAT) Specification, Rev. 1.02, Oct. 2020
> + *   Compute Express Link (CXL) Specification, Rev. 2.0, Oct. 2020
> + */
> +/* Table Access DOE - CXL 8.1.11 */
> +#define CXL_DOE_TABLE_ACCESS      2
> +#define CXL_DOE_PROTOCOL_CDAT     ((CXL_DOE_TABLE_ACCESS << 16) | CXL_VENDOR_ID)
> +
> +/* Read Entry - CXL 8.1.11.1 */
> +#define CXL_DOE_TAB_TYPE_CDAT 0
> +#define CXL_DOE_TAB_ENT_MAX 0xFFFF
> +
> +/* Read Entry Request - CXL 8.1.11.1 Table 134 */
> +#define CXL_DOE_TAB_REQ 0
> +typedef struct CDATReq {
> +    DOEHeader header;
> +    uint8_t req_code;
> +    uint8_t table_type;
> +    uint16_t entry_handle;
> +} QEMU_PACKED CDATReq;
> +
> +/* Read Entry Response - CXL 8.1.11.1 Table 135 */
> +#define CXL_DOE_TAB_RSP 0
> +typedef struct CDATRsp {
> +    DOEHeader header;
> +    uint8_t rsp_code;
> +    uint8_t table_type;
> +    uint16_t entry_handle;
> +} QEMU_PACKED CDATRsp;
> +
> +/* CDAT Table Format - CDAT Table 1 */
> +#define CXL_CDAT_REV 1
> +typedef struct CDATTableHeader {
> +    uint32_t length;
> +    uint8_t revision;
> +    uint8_t checksum;
> +    uint8_t reserved[6];
> +    uint32_t sequence;
> +} QEMU_PACKED CDATTableHeader;
> +
> +/* CDAT Structure Types - CDAT Table 2 */
> +typedef enum {
> +    CDAT_TYPE_DSMAS = 0,
> +    CDAT_TYPE_DSLBIS = 1,
> +    CDAT_TYPE_DSMSCIS = 2,
> +    CDAT_TYPE_DSIS = 3,
> +    CDAT_TYPE_DSEMTS = 4,
> +    CDAT_TYPE_SSLBIS = 5,
> +} CDATType;
> +
> +typedef struct CDATSubHeader {
> +    uint8_t type;
> +    uint8_t reserved;
> +    uint16_t length;
> +} CDATSubHeader;
> +
> +/* Device Scoped Memory Affinity Structure - CDAT Table 3 */
> +typedef struct CDATDsmas {
> +    CDATSubHeader header;
> +    uint8_t DSMADhandle;
> +    uint8_t flags;
> +#define CDAT_DSMAS_FLAG_NV              (1 << 2)
> +#define CDAT_DSMAS_FLAG_SHAREABLE       (1 << 3)
> +#define CDAT_DSMAS_FLAG_HW_COHERENT     (1 << 4)
> +#define CDAT_DSMAS_FLAG_DYNAMIC_CAP     (1 << 5)
> +    uint16_t reserved;
> +    uint64_t DPA_base;
> +    uint64_t DPA_length;
> +} QEMU_PACKED CDATDsmas;
> +
> +/* Device Scoped Latency and Bandwidth Information Structure - CDAT Table 5 */
> +typedef struct CDATDslbis {
> +    CDATSubHeader header;
> +    uint8_t handle;
> +    /* Definitions of these fields refer directly to HMAT fields */
> +    uint8_t flags;
> +    uint8_t data_type;
> +    uint8_t reserved;
> +    uint64_t entry_base_unit;
> +    uint16_t entry[3];
> +    uint16_t reserved2;
> +} QEMU_PACKED CDATDslbis;
> +
> +/* Device Scoped Memory Side Cache Information Structure - CDAT Table 6 */
> +typedef struct CDATDsmscis {
> +    CDATSubHeader header;
> +    uint8_t DSMAS_handle;
> +    uint8_t reserved[3];
> +    uint64_t memory_side_cache_size;
> +    uint32_t cache_attributes;
> +} QEMU_PACKED CDATDsmscis;
> +
> +/* Device Scoped Initiator Structure - CDAT Table 7 */
> +typedef struct CDATDsis {
> +    CDATSubHeader header;
> +    uint8_t flags;
> +    uint8_t handle;
> +    uint16_t reserved;
> +} QEMU_PACKED CDATDsis;
> +
> +/* Device Scoped EFI Memory Type Structure - CDAT Table 8 */
> +typedef struct CDATDsemts {
> +    CDATSubHeader header;
> +    uint8_t DSMAS_handle;
> +    uint8_t EFI_memory_type_attr;
> +    uint16_t reserved;
> +    uint64_t DPA_offset;
> +    uint64_t DPA_length;
> +} QEMU_PACKED CDATDsemts;
> +
> +/* Switch Scoped Latency and Bandwidth Information Structure - CDAT Table 9 */
> +typedef struct CDATSslbisHeader {
> +    CDATSubHeader header;
> +    uint8_t data_type;
> +    uint8_t reserved[3];
> +    uint64_t entry_base_unit;
> +} QEMU_PACKED CDATSslbisHeader;
> +
> +/* Switch Scoped Latency and Bandwidth Entry - CDAT Table 10 */
> +typedef struct CDATSslbe {
> +    uint16_t port_x_id;
> +    uint16_t port_y_id;
> +    uint16_t latency_bandwidth;
> +    uint16_t reserved;
> +} QEMU_PACKED CDATSslbe;
> +
> +typedef struct CDATSslbis {
> +    CDATSslbisHeader sslbis_header;
> +    CDATSslbe sslbe[];
> +} CDATSslbis;
> +
> +typedef struct CDATEntry {
> +    void *base;
> +    uint32_t length;
> +} CDATEntry;
> +
> +typedef struct CDATObject {
> +    CDATEntry *entry;
> +    int entry_len;
> +
> +    int (*build_cdat_table)(CDATSubHeader ***cdat_table, void *priv);
> +    void (*free_cdat_table)(CDATSubHeader **, int num, void *priv);
> +    bool to_update;
> +    void *private;
> +    char *filename;
> +    uint8_t *buf;
> +    struct CDATSubHeader **built_buf;
> +    int built_buf_len;
> +} CDATObject;
> +#endif /* CXL_CDAT_H */
> diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
> index 94ec2f07d7..34075cfb72 100644
> --- a/include/hw/cxl/cxl_component.h
> +++ b/include/hw/cxl/cxl_component.h
> @@ -19,6 +19,7 @@
>  #include "qemu/range.h"
>  #include "qemu/typedefs.h"
>  #include "hw/register.h"
> +#include "qapi/error.h"
>  
>  enum reg_type {
>      CXL2_DEVICE,
> @@ -184,6 +185,8 @@ typedef struct cxl_component {
>              struct PCIDevice *pdev;
>          };
>      };
> +
> +    CDATObject cdat;
>  } CXLComponentState;
>  
>  void cxl_component_register_block_init(Object *obj,
> @@ -220,4 +223,8 @@ static inline hwaddr cxl_decode_ig(int ig)
>  
>  CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
>  
> +void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp);
> +void cxl_doe_cdat_release(CXLComponentState *cxl_cstate);
> +void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp);
> +
>  #endif
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index e4d221cdb3..449b0edfe9 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -243,6 +243,9 @@ struct CXLType3Dev {
>      AddressSpace hostmem_as;
>      CXLComponentState cxl_cstate;
>      CXLDeviceState cxl_dstate;
> +
> +    /* DOE */
> +    DOECap doe_cdat;
>  };
>  
>  #define TYPE_CXL_TYPE3 "cxl-type3"
> diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
> index 01cf002096..3cb79eca1e 100644
> --- a/include/hw/cxl/cxl_pci.h
> +++ b/include/hw/cxl/cxl_pci.h
> @@ -13,6 +13,7 @@
>  #include "qemu/compiler.h"
>  #include "hw/pci/pci.h"
>  #include "hw/pci/pcie.h"
> +#include "hw/cxl/cxl_cdat.h"
>  
>  #define CXL_VENDOR_ID 0x1e98
>  
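
The checksum rule noted in the quoted patch ("Sum of all bytes including
checksum must be 0") can be illustrated with a small standalone sketch.
The helper names cdat_byte_sum() and cdat_checksum() are hypothetical and
are not part of the patch. The sketch sums every byte of the buffer, which
is how the verification loop in ct3_load_cdat() treats a loaded file. Note
that a byte-wise sum differs from a sum of field values once a multi-byte
field such as the table length exceeds 255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of all bytes in buf, modulo 256. */
static uint8_t cdat_byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/*
 * Hypothetical helper: value to store at csum_off so that the byte-wise
 * sum of the whole buffer becomes zero. Whatever is currently stored in
 * the checksum byte is excluded from the sum.
 */
static uint8_t cdat_checksum(const uint8_t *buf, size_t len, size_t csum_off)
{
    uint8_t sum;

    assert(csum_off < len);
    sum = cdat_byte_sum(buf, len) - buf[csum_off];

    /* Two's complement negation, modulo 256 */
    return (uint8_t)(~sum + 1);
}
```

With the CDATTableHeader layout quoted above, the checksum byte sits at
offset 5 (after the 4-byte length and the 1-byte revision), so a caller
would pass 5 as csum_off for a buffer that starts with the header.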




* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-13  8:57       ` Jonathan Cameron via
  (?)
@ 2022-10-13 11:36       ` Gregory Price
  2022-10-13 11:53           ` Jonathan Cameron via
  -1 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-13 11:36 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-devel, linux-cxl, Alison Schofield, Davidlohr Bueso,
	a.manzanares, Ben Widawsky, Gregory Price, mst, hchkuo, cbrowy,
	ira.weiny


Reading through your notes, everything seems reasonable, though I'm not
sure I agree with the two pass notion, though I'll wait to see the patch
set.

The enum is a good idea, *forehead slap*, I should have done it.  If we
have a local enum, why not just make it global (within the file) and
allocate the table as I have once we know how many MRs are present?

6 eggs/half dozen though, I'm ultimately fine with either.

On Thu, Oct 13, 2022, 4:58 AM Jonathan Cameron <Jonathan.Cameron@huawei.com>
wrote:

> On Wed, 12 Oct 2022 14:21:15 -0400
> Gregory Price <gourry.memverge@gmail.com> wrote:
>
> > Included in this response is a recommended patch set on top of this
> > patch that resolves a number of issues, including style and a heap
> > corruption bug.
> >
> > The purpose of this patch set is to refactor the CDAT initialization
> > code to support future patch sets that will introduce multi-region
> > support in CXL Type3 devices.
> >
> > 1) Checkpatch errors in the immediately prior patch
> > 2) Flatting of code in cdat initialization
> > 3) Changes in allocation and error checking for cleanliness
> > 4) Change in the allocation/free strategy of CDAT sub-tables to simplify
> >    multi-region allocation in the future.  Also resolves a heap
> >    corruption bug
> > 5) Refactor of CDAT initialization code into a function that initializes
> >    sub-tables per memory-region.
> >
> > Gregory Price (5):
> >   hw/mem/cxl_type3: fix checkpatch errors
> >   hw/mem/cxl_type3: Pull validation checks ahead of functional code
> >   hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work
> >   hw/mem/cxl_type3: Change the CDAT allocation/free strategy
> >   hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a
> >     function
> >
> >  hw/mem/cxl_type3.c | 240 +++++++++++++++++++++++----------------------
> >  1 file changed, 122 insertions(+), 118 deletions(-)
> >
>
> Thanks, I'm going to roll this stuff into the original patch set for v8.
> Some of this I already have (like the check patch stuff).
> Some I may disagree with in which case  I'll reply to the patches - note
> I haven't looked at them in detail yet!
>
> Jonathan
>



* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-13 11:36       ` Gregory Price
@ 2022-10-13 11:53           ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 11:53 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, Alison Schofield, Davidlohr Bueso,
	a.manzanares, Ben Widawsky, Gregory Price, mst, hchkuo, cbrowy,
	ira.weiny

On Thu, 13 Oct 2022 07:36:28 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> Reading through your notes, everything seems reasonable, though I'm not
> sure I agree with the two pass notion, though I'll wait to see the patch
> set.
> 
> The enum is a good idea, *forehead slap*, I should have done it.  If we
> have a local enum, why not just make it global (within the file) and
> allocate the table as I have once we know how many MRs are present?

It's not global because we need the entries to be packed.  If there is just
one MR (whichever one), its entries need to be at the beginning of
cdat_table.  I also don't want to bake into the outer caller that the
entries will always be the same size for different MRs.

For the two-pass case...

I'll send code in a few minutes, but in the meantime my thought is that
the extended code for volatile + non-volatile will look something like:
(variable names made up)

	if (ct3d->volatile_mem) {
		volatile_mr = host_memory_backend_get_memory(ct3d->volatile_mem....);
		if (!volatile_mr) {
			return -EINVAL;
		}
		rc = ct3_build_cdat_entries_for_mr(NULL, dsmad++, volatile_mr);
		if (rc < 0) {
			return rc;
		}
		volatile_len = rc;
	}

	if (ct3d->nonvolatile_mem) {
		nonvolatile_mr = host_memory_backend_get_memory(ct3d->nonvolatile_mem);
		if (!nonvolatile_mr) {
			return -EINVAL;
		}
		rc = ct3_build_cdat_entries_for_mr(NULL, dsmad++, nonvolatile_mr....);
		if (rc < 0) {
			return rc;
		}
		nonvolatile_len = rc;
	}

	dsmad = 0;

	table = g_malloc0((volatile_len + nonvolatile_len) * sizeof(*table));
	if (!table) {
		return -ENOMEM;
	}
	
	if (volatile_len) {
		rc = ct3_build_cdat_entries_for_mr(&table[0], dsmad++, volatile_mr....);
		if (rc < 0) {
			return rc;
		}
	}	
	if (nonvolatile_len) {
		rc = ct3_build_cdat_entries_for_mr(&table[volatile_len], dsmad++, nonvolatile_mr...);
		if (rc < 0) {
			/* Only place we need error handling.  Could make it more generic of course */
			for (i = 0; i < volatile_len; i++) {
				g_free(table[i]);
			}
			return rc;
		}
	}

	*cdat_table = g_steal_pointer(&table);
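
The count-then-fill idea above can be tried out in isolation with a
minimal standalone sketch. Everything in it is an assumption made for
illustration: Entry, ENTRIES_PER_REGION, build_entries_for_region() and
build_table() are hypothetical names, plain calloc()/free() stand in for
the GLib helpers, and two entries per region is an arbitrary choice. The
first pass passes NULL to learn the entry count per region, and the second
pass fills a single packed array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CDAT sub-table; not a QEMU type. */
typedef struct Entry {
    int dsmad_handle;
    int is_nonvolatile;
} Entry;

enum { ENTRIES_PER_REGION = 2 }; /* arbitrary value for this sketch */

/*
 * Pass 1 (table == NULL): only report how many entries the region needs.
 * Pass 2 (table != NULL): allocate and fill that many entries.
 * Returns the entry count, or -1 on allocation failure.
 */
static int build_entries_for_region(Entry **table, int dsmad_handle,
                                    int is_nonvolatile)
{
    int i;

    if (!table) {
        return ENTRIES_PER_REGION;
    }
    for (i = 0; i < ENTRIES_PER_REGION; i++) {
        table[i] = calloc(1, sizeof(*table[i]));
        if (!table[i]) {
            /* Undo the entries allocated so far for this region */
            while (i-- > 0) {
                free(table[i]);
                table[i] = NULL;
            }
            return -1;
        }
        table[i]->dsmad_handle = dsmad_handle;
        table[i]->is_nonvolatile = is_nonvolatile;
    }
    return ENTRIES_PER_REGION;
}

/* Builds one packed table for whichever regions are present. */
static int build_table(Entry ***out, int have_volatile, int have_nonvolatile)
{
    int volatile_len = 0, nonvolatile_len = 0, dsmad = 0, rc, i;
    Entry **table;

    assert(out);
    *out = NULL;

    /* First pass: sizes only */
    if (have_volatile) {
        volatile_len = build_entries_for_region(NULL, 0, 0);
    }
    if (have_nonvolatile) {
        nonvolatile_len = build_entries_for_region(NULL, 0, 1);
    }
    if (volatile_len + nonvolatile_len == 0) {
        return 0;
    }

    table = calloc(volatile_len + nonvolatile_len, sizeof(*table));
    if (!table) {
        return -1;
    }

    /* Second pass: fill, with the first present region at index 0 */
    if (volatile_len) {
        rc = build_entries_for_region(&table[0], dsmad++, 0);
        if (rc < 0) {
            free(table);
            return rc;
        }
    }
    if (nonvolatile_len) {
        rc = build_entries_for_region(&table[volatile_len], dsmad++, 1);
        if (rc < 0) {
            for (i = 0; i < volatile_len; i++) {
                free(table[i]);
            }
            free(table);
            return rc;
        }
    }

    *out = table;
    return volatile_len + nonvolatile_len;
}
```

Because the offsets come from the first pass, a device with only one
region gets its entries at index 0 with handle 0, and the caller never
needs to know how many entries each region produces.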


Jonathan

> 
> 6 eggs/half dozen though, I'm ultimately fine with either.
> 
> On Thu, Oct 13, 2022, 4:58 AM Jonathan Cameron <Jonathan.Cameron@huawei.com>
> wrote:
> 
> > On Wed, 12 Oct 2022 14:21:15 -0400
> > Gregory Price <gourry.memverge@gmail.com> wrote:
> >  
> > > Included in this response is a recommended patch set on top of this
> > > patch that resolves a number of issues, including style and a heap
> > > corruption bug.
> > >
> > > The purpose of this patch set is to refactor the CDAT initialization
> > > code to support future patch sets that will introduce multi-region
> > > support in CXL Type3 devices.
> > >
> > > 1) Checkpatch errors in the immediately prior patch
> > > 2) Flatting of code in cdat initialization
> > > 3) Changes in allocation and error checking for cleanliness
> > > 4) Change in the allocation/free strategy of CDAT sub-tables to simplify
> > >    multi-region allocation in the future.  Also resolves a heap
> > >    corruption bug
> > > 5) Refactor of CDAT initialization code into a function that initializes
> > >    sub-tables per memory-region.
> > >
> > > Gregory Price (5):
> > >   hw/mem/cxl_type3: fix checkpatch errors
> > >   hw/mem/cxl_type3: Pull validation checks ahead of functional code
> > >   hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work
> > >   hw/mem/cxl_type3: Change the CDAT allocation/free strategy
> > >   hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a
> > >     function
> > >
> > >  hw/mem/cxl_type3.c | 240 +++++++++++++++++++++++----------------------
> > >  1 file changed, 122 insertions(+), 118 deletions(-)
> > >  
> >
> > Thanks, I'm going to roll this stuff into the original patch set for v8.
> > Some of this I already have (like the check patch stuff).
> > Some I may disagree with in which case  I'll reply to the patches - note
> > I haven't looked at them in detail yet!
> >
> > Jonathan
> >  
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-13 11:53           ` Jonathan Cameron via
  (?)
@ 2022-10-13 12:35           ` Gregory Price
  2022-10-13 14:40               ` Jonathan Cameron via
  -1 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-13 12:35 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-devel, linux-cxl, Alison Schofield, Davidlohr Bueso,
	a.manzanares, Ben Widawsky, Gregory Price, mst, hchkuo, cbrowy,
	ira.weiny


FWIW, this is what my function looked like after the prior changes.  It is
very similar to the one you proposed below.

static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
                                void *priv)
{
    CXLType3Dev *ct3d = priv;
    MemoryRegion *vmr = NULL, *pmr = NULL;
    uint64_t dpa_base = 0;
    int dsmad_handle = 0;
    int num_ents = 0;
    int cur_ent = 0;
    int ret = 0;
    int i;

    if (ct3d->hostvmem) {
        vmr = host_memory_backend_get_memory(ct3d->hostvmem);
        if (!vmr) {
            return -EINVAL;
        }
        num_ents += CT3_CDAT_SUBTABLE_SIZE;
    }
    if (ct3d->hostpmem) {
        pmr = host_memory_backend_get_memory(ct3d->hostpmem);
        if (!pmr) {
            return -EINVAL;
        }
        num_ents += CT3_CDAT_SUBTABLE_SIZE;
    }
    if (!num_ents) {
        return 0;
    }

    /* Array of num_ents entry pointers, zeroed so cleanup can free them all */
    *cdat_table = g_malloc0(num_ents * sizeof(**cdat_table));
    if (!*cdat_table) {
        return -ENOMEM;
    }

    /* Volatile aspects are mapped first */
    if (vmr) {
        ret = ct3_build_cdat_subtable(*cdat_table, vmr, dsmad_handle++,
                                      false, dpa_base);
        if (ret < 0) {
            goto error_cleanup;
        }
        dpa_base = memory_region_size(vmr);
        cur_ent += ret;
    }
    /* Non volatile aspects */
    if (pmr) {
        /* non-volatile entries follow the volatile entries */
        ret = ct3_build_cdat_subtable(&(*cdat_table)[cur_ent], pmr,
                                      dsmad_handle, true, dpa_base);
        if (ret < 0) {
            goto error_cleanup;
        }
        cur_ent += ret;
    }
    assert(cur_ent == num_ents);

    /* Report the total number of entries, not just the last sub-table */
    return cur_ent;

error_cleanup:
    for (i = 0; i < num_ents; i++) {
        /* Index the table first, then the entry: (*cdat_table)[i] */
        g_free((*cdat_table)[i]);
    }
    g_free(*cdat_table);
    *cdat_table = NULL;
    return ret;
}
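
A side note on the cleanup loop, since it is easy to trip over with a
triple-pointer out-parameter: in C, `*p[i]` parses as `*(p[i])`, so the
entries of the table behind `p` have to be reached as `(*p)[i]`.  Below is
a minimal standalone sketch of that pattern.  Item, make_items() and
free_items() are hypothetical names that do not exist in the patch, and
plain calloc()/free() are used instead of GLib.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type, standing in for a sub-table header. */
typedef struct Item {
    int id;
} Item;

/* Free the first n items, then the table itself, and clear the caller's pointer. */
static void free_items(Item ***out, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        free((*out)[i]); /* not *out[i], which would be *(out[i]) */
    }
    free(*out);
    *out = NULL;
}

/* Allocate a table of n items behind the out-parameter; returns n or -1. */
static int make_items(Item ***out, int n)
{
    int i;

    *out = calloc(n, sizeof(**out)); /* n pointers to Item */
    if (!*out) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        (*out)[i] = calloc(1, sizeof(*(*out)[i]));
        if (!(*out)[i]) {
            free_items(out, i); /* only the first i items exist */
            return -1;
        }
        (*out)[i]->id = i;
    }
    return n;
}
```

Writing `*out[i]` only happens to work for i == 0; for any other index it
reads past the caller's single pointer variable.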


On Thu, Oct 13, 2022 at 12:53:13PM +0100, Jonathan Cameron wrote:
> On Thu, 13 Oct 2022 07:36:28 -0400
> Gregory Price <gourry.memverge@gmail.com> wrote:
> 
> > Reading through your notes, everything seems reasonable, though I'm not
> > sure I agree with the two pass notion, though I'll wait to see the patch
> > set.
> > 
> > The enum is a good idea, *forehead slap*, I should have done it.  If we
> > have a local enum, why not just make it global (within the file) and
> > allocate the table as I have once we know how many MRs are present?
> 
> It's not global as we need the entries to be packed.  So if just one mr
> (which ever one) the entries for that need to be at the beginning of
> cdat_table.  I also don't want to bake into the outer caller that the
> entries will always be the same size for different MRs.
> 
> For the two pass case...
> 
> I'll send code in a few mins, but in the meantime my thought is that
> the extended code for volatile + non volatile will look something like:
> (variable names made up)
> 
> 	if (ct3d->volatile_mem) {
> 		volatile_mr = host_memory_backend_get_memory(ct3d->volatile_mem....);
> 		if (!volatile_mr) {
> 			return -EINVAL;
> 		}
> 		rc = ct3_build_cdat_entries_for_mr(NULL, dsmad++, volatile_mr);
> 		if (rc < 0) {
> 			return rc;
> 		}
> 		volatile_len = rc;
> 	}
> 
> 	if (ct3d->nonvolatile_mem) {
> 		nonvolatile_mr = host_memory_backend_get_memory(ct3d->nonvolatile_mem);
> 		if (!nonvolatile_mr) {
> 			return -EINVAL;
> 		}
> 		rc = ct3_build_cdat_entries_for_mr(NULL, dsmad++, nonvolatile_mr....);
> 		if (rc < 0) {
> 			return rc;
> 		}
> 		nonvolatile_len = rc;
> 	}
> 
> 	dsmad = 0;
> 
> 	table = g_malloc0((volatile_len + nonvolatile_len) * sizeof(*table));
> 	if (!table) {
> 		return -ENOMEM;
> 	}
> 
> 	if (volatile_len) {
> 		rc = ct3_build_cdat_entries_for_mr(&table[0], dsmad++, volatile_mr....);
> 		if (rc < 0) {
> 			return rc;
> 		}
> 	}
> 	if (nonvolatile_len) {
> 		rc = ct3_build_cdat_entries_for_mr(&table[volatile_len], dsmad++, nonvolatile_mr...);
> 		if (rc < 0) {
> 			/* Only place we need error handling.  Could make it more generic of course */
> 			for (i = 0; i < volatile_len; i++) {
> 				g_free(table[i]);
> 			}
> 			return rc;
> 		}
> 	}
> 
> 	*cdat_table = g_steal_pointer(&table);
> 
> 
> Jonathan
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-13 12:35           ` Gregory Price
@ 2022-10-13 14:40               ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-13 14:40 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, linux-cxl, Alison Schofield, Davidlohr Bueso,
	a.manzanares, Ben Widawsky, Gregory Price, mst, hchkuo, cbrowy,
	ira.weiny

On Thu, 13 Oct 2022 08:35:13 -0400
Gregory Price <gourry.memverge@gmail.com> wrote:

> fwiw this is what my function looked like after the prior changes, very
> similar to yours proposed below

Makes sense, given the only real change is exactly where the size comes from ;)

FYI, I've pushed out the latest version on top of qemu/master
at gitlab.com/jic23/ as tag doe-v8.

Just as soon as I finish bouncing patches to a machine I can push from
I'll push out the rest of my queue.

My current thought is to slide your series under the rest of that queue
(so directly on top of the DOE set - v8+ depending on reviews).

The other series coming through is Ira's event injection but my guess
is that will take a bit more time to stabilize.

Jonathan

> 
> static int ct3_build_cdat_table(CDATSubHeader ***cdat_table,
>                                 void *priv)
> {
>     CXLType3Dev *ct3d = priv;
>     MemoryRegion *vmr = NULL, *pmr = NULL;
>     uint64_t dpa_base = 0;
>     int dsmad_handle = 0;
>     int num_ents = 0;
>     int cur_ent = 0;
>     int ret = 0;
> 
>     if (ct3d->hostvmem) {
>         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
>         if (!vmr)
>             return -EINVAL;
>         num_ents += CT3_CDAT_SUBTABLE_SIZE;
>     }
>     if (ct3d->hostpmem) {
>         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
>         if (!pmr)
>             return -EINVAL;
>         num_ents += CT3_CDAT_SUBTABLE_SIZE;
>     }
>     if (!num_ents) {
>         return 0;
>     }
> 
>     *cdat_table = g_malloc0(num_ents * sizeof(*cdat_table));
>     if (!*cdat_table) {
>         return -ENOMEM;
>     }
> 
>     /* Volatile aspects are mapped first */
>     if (vmr) {
>         ret = ct3_build_cdat_subtable(*cdat_table, vmr, dsmad_handle++,
>                                       false, dpa_base);
>         if (ret < 0) {
>             goto error_cleanup;
>         }
>         dpa_base = vmr->size;
>         cur_ent += ret;
>     }
>     /* Non volatile aspects */
>     if (pmr) {
>         /* non-volatile entries follow the volatile entries */
>         ret = ct3_build_cdat_subtable(&(*cdat_table)[cur_ent], pmr,
>                                       dsmad_handle, true, dpa_base);
>         if (ret < 0) {
>             goto error_cleanup;
>         }
>         cur_ent += ret;
>     }
>     assert(cur_ent == num_ents);
> 
>     return ret;
> error_cleanup:
>     int i;
>     for (i = 0; i < num_ents; i++) {

Might as well loop only to cur_ent as the rest will be NULL.


>         g_free((*cdat_table)[i]);
>     }
>     g_free(*cdat_table);
>     return ret;
> }
> 
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a function
  2022-10-13 10:47         ` Jonathan Cameron via
  (?)
@ 2022-10-13 19:40         ` Gregory Price
  2022-10-14 15:29             ` Jonathan Cameron via
  -1 siblings, 1 reply; 58+ messages in thread
From: Gregory Price @ 2022-10-13 19:40 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Gregory Price, qemu-devel, linux-cxl, alison.schofield, dave,
	a.manzanares, bwidawsk, mst, hchkuo, cbrowy, ira.weiny

> >      /* For now, no memory side cache, plausiblish numbers */
> > -    *dslbis_nonvolatile1 = (CDATDslbis) {
> > +    *dslbis1 = (CDATDslbis) {
> >          .header = {
> >              .type = CDAT_TYPE_DSLBIS,
> > -            .length = sizeof(*dslbis_nonvolatile1),
> > +            .length = sizeof(*dslbis1),
> >          },
> > -        .handle = nonvolatile_dsmad,
> > +        .handle = dsmad_handle,
> >          .flags = HMAT_LB_MEM_MEMORY,
> >          .data_type = HMAT_LB_DATA_READ_LATENCY,
> >          .entry_base_unit = 10000, /* 10ns base */
> >          .entry[0] = 15, /* 150ns */
> 
> If we are going to wrap this up for volatile / non-volatile 
> we probably need to pass in a reasonable value for these.
> Whilst not technically always true, to test the Linux handling
> I'd want non-volatile to report as longer latency.
> 

Here's a good question

Do we want the base unit and entry to be adjustable for volatile and
nonvolatile regions for the purpose of testing?  Or should this simply
be a static value for each?

Since we need to pass in (is_pmem/is_nonvolatile) or whatever into the
cdat function, we could just use that to do one of a few options:
    1) Select from a static value
    2) Select a static value and apply a multiplier for nvmem
    3) Use a base/value provided by the user and apply a multiplier
    4) Make vmem and pmem have separately configurable latencies
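
To make option 1 concrete, here is a rough standalone sketch.  Going by
the "10ns base" comment in the patch, the base unit is taken to be in
picoseconds, so the reported latency is base unit times entry.  The entry
of 15 (150ns) is the value already in the patch; the entry of 5 (50ns)
for volatile memory is purely an assumed placeholder, and
read_latency_entry() and latency_ps() are hypothetical helpers, not
existing QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 10000 ps = 10 ns, matching .entry_base_unit in the quoted patch. */
#define LATENCY_BASE_UNIT_PS 10000u

/* Option 1: select a fixed read-latency entry per memory type. */
static uint16_t read_latency_entry(bool is_pmem)
{
    /*
     * 15 (150 ns) is the patch's existing value.
     * 5 (50 ns) is an assumed placeholder for volatile memory.
     */
    return is_pmem ? 15 : 5;
}

/* Latency the table reports, in picoseconds: base unit * entry. */
static uint64_t latency_ps(uint64_t base_unit_ps, uint16_t entry)
{
    return base_unit_ps * entry;
}
```

Option 2 would be the same shape with the non-volatile entry derived from
the volatile one by a multiplier instead of a second constant.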

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 5/5] hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a function
  2022-10-13 19:40         ` Gregory Price
@ 2022-10-14 15:29             ` Jonathan Cameron via
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Cameron @ 2022-10-14 15:29 UTC (permalink / raw)
  To: Gregory Price
  Cc: Gregory Price, qemu-devel, linux-cxl, alison.schofield, dave,
	a.manzanares, bwidawsk, mst, hchkuo, cbrowy, ira.weiny

On Thu, 13 Oct 2022 15:40:47 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> > >      /* For now, no memory side cache, plausiblish numbers */
> > > -    *dslbis_nonvolatile1 = (CDATDslbis) {
> > > +    *dslbis1 = (CDATDslbis) {
> > >          .header = {
> > >              .type = CDAT_TYPE_DSLBIS,
> > > -            .length = sizeof(*dslbis_nonvolatile1),
> > > +            .length = sizeof(*dslbis1),
> > >          },
> > > -        .handle = nonvolatile_dsmad,
> > > +        .handle = dsmad_handle,
> > >          .flags = HMAT_LB_MEM_MEMORY,
> > >          .data_type = HMAT_LB_DATA_READ_LATENCY,
> > >          .entry_base_unit = 10000, /* 10ns base */
> > >          .entry[0] = 15, /* 150ns */  
> > 
> > If we are going to wrap this up for volatile / non-volatile 
> > we probably need to pass in a reasonable value for these.
> > Whilst not technically always true, to test the Linux handling
> > I'd want non-volatile to report as longer latency.
> >   
> 
> Here's a good question
> 
> Do we want the base unit and entry to be adjustable for volatile and
> nonvolatile regions for the purpose of testing?  Or should this simply
> be a static value for each?

We definitely want a 'default' value if nothing is provided.
It might be useful to allow it to be adjusted, but let's add that when
we have a use for it (perhaps testing some stuff in the kernel where the
values matter enough to make them controllable).

> 
> Since we need to pass in (is_pmem/is_nonvolatile) or whatever into the
> cdat function, we could just use that to do one of a few options:
>     1) Select from a static value
>     2) Select a static value and apply a multiplier for nvmem
>     3) Use a base/value provided by the user and apply a multiplier
>     4) Make vmem and pmem have separately configurable latencies

For now 1 is fine I think.

I've just pushed out a doe-v9 tag and a cxl-2022-10-14 branch to
gitlab.com/jic23/qemu.  I've also advanced the base tree to current QEMU
mainline.

Note that if anyone is playing with the switch CCI device and a mainline
kernel, you'll currently need to revert
https://lore.kernel.org/linux-pci/20220905080232.36087-5-mika.westerberg@linux.intel.com/

Jonathan


^ permalink raw reply	[flat|nested] 58+ messages in thread


end of thread, other threads:[~2022-10-14 15:33 UTC | newest]

Thread overview: 58+ messages
2022-10-07 15:21 [PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0 Jonathan Cameron
2022-10-07 15:21 ` Jonathan Cameron via
2022-10-07 15:21 ` [PATCH v7 1/5] hw/pci: PCIe Data Object Exchange emulation Jonathan Cameron
2022-10-07 15:21   ` Jonathan Cameron via
2022-10-07 15:21 ` [PATCH v7 2/5] hw/mem/cxl-type3: Add MSIX support Jonathan Cameron
2022-10-07 15:21   ` Jonathan Cameron via
2022-10-07 15:21 ` [PATCH v7 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation Jonathan Cameron
2022-10-07 15:21   ` Jonathan Cameron via
2022-10-13 11:04   ` Jonathan Cameron
2022-10-13 11:04     ` Jonathan Cameron via
2022-10-07 15:21 ` [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron
2022-10-07 15:21   ` Jonathan Cameron via
2022-10-12 16:01   ` Gregory Price
2022-10-13 10:40     ` Jonathan Cameron
2022-10-13 10:40       ` Jonathan Cameron via
2022-10-13 10:56     ` Jonathan Cameron
2022-10-13 10:56       ` Jonathan Cameron via
2022-10-12 18:21   ` Gregory Price
2022-10-12 18:21     ` [PATCH 1/5] hw/mem/cxl_type3: fix checkpatch errors Gregory Price
2022-10-12 18:21     ` [PATCH 2/5] hw/mem/cxl_type3: Pull validation checks ahead of functional code Gregory Price
2022-10-13  9:07       ` Jonathan Cameron
2022-10-13  9:07         ` Jonathan Cameron via
2022-10-13 10:42         ` Jonathan Cameron
2022-10-13 10:42           ` Jonathan Cameron via
2022-10-12 18:21     ` [PATCH 3/5] hw/mem/cxl_type3: CDAT pre-allocate and check resources prior to work Gregory Price
2022-10-13 10:44       ` Jonathan Cameron
2022-10-13 10:44         ` Jonathan Cameron via
2022-10-12 18:21     ` [PATCH 4/5] hw/mem/cxl_type3: Change the CDAT allocation/free strategy Gregory Price
2022-10-13 10:45       ` Jonathan Cameron
2022-10-13 10:45         ` Jonathan Cameron via
2022-10-12 18:21     ` [PATCH 5/5] hw/mem/cxl_type3: Refactor CDAT sub-table entry initialization into a function Gregory Price
2022-10-13 10:47       ` Jonathan Cameron
2022-10-13 10:47         ` Jonathan Cameron via
2022-10-13 19:40         ` Gregory Price
2022-10-14 15:29           ` Jonathan Cameron
2022-10-14 15:29             ` Jonathan Cameron via
2022-10-13  8:57     ` [PATCH v7 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron
2022-10-13  8:57       ` Jonathan Cameron via
2022-10-13 11:36       ` Gregory Price
2022-10-13 11:53         ` Jonathan Cameron
2022-10-13 11:53           ` Jonathan Cameron via
2022-10-13 12:35           ` Gregory Price
2022-10-13 14:40             ` Jonathan Cameron
2022-10-13 14:40               ` Jonathan Cameron via
2022-10-07 15:21 ` [PATCH v7 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE Jonathan Cameron
2022-10-07 15:21   ` Jonathan Cameron via
2022-10-10 10:30 ` [PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0 Jonathan Cameron
2022-10-10 10:30   ` Jonathan Cameron via
2022-10-11  9:45   ` Huai-Cheng
2022-10-11 21:19 ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
2022-10-11 21:19   ` [PATCH 1/5] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL Gregory Price
2022-10-11 21:19   ` [PATCH 2/5] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
2022-10-11 21:19   ` [PATCH 3/5] hw/mem/cxl_type: Generalize CDATDsmas initialization for Memory Regions Gregory Price
2022-10-12 14:10     ` Jonathan Cameron
2022-10-12 14:10       ` Jonathan Cameron via
2022-10-11 21:19   ` [PATCH 4/5] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
2022-10-11 21:19   ` [PATCH 5/5] cxl: update tests and documentation for new cxl properties Gregory Price
2022-10-11 22:20   ` [PATCH 0/5] Multi-Region and Volatile Memory support for CXL Type-3 Devices Michael S. Tsirkin
