linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.
@ 2022-10-14 15:10 Jonathan Cameron
  2022-10-14 15:10 ` [PATCH v9 1/5] hw/pci: PCIe Data Object Exchange emulation Jonathan Cameron
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-14 15:10 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, Gregory Price, ira.weiny
  Cc: linuxarm, Jonathan Cameron

Changes since v8:
 - Take the entry enums out of the functions and prefix them
   appropriately.
 - Use the visibility of *_NUM_ENTRIES to allocate the cdat_table
 - Fix volatile_mr -> nonvolatile_mr

V7 Cover letter - lightly edited.

Whilst I have carried on Huai-Cheng Kuo's series version numbering and
naming, there have been very substantial changes since v6 so I would
suggest fresh review makes sense for anyone who has looked at this before.
In particularly if the Avery design folks could check I haven't broken
anything that would be great.

For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbrowy@avery-design.com/

Summary of changes:
1) Linux headers definitions for DOE are now upstream so drop that patch.
2) Add CDAT for switch upstream port.
3) Generate 'plausible' default CDAT tables when a file is not provided.
4) General refactoring to calculate the correct table sizes and allocate
   based on that rather than copying from a local static array.
5) Changes from earlier reviews such as matching QEMU type naming style.
6) Moved compliance and SPDM usecases to future patch sets.

Sign-offs on these are complex because the patches were originally developed
by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan
Cameron who made substantial changes.

Huai-Cheng Kuo confirmed they are happy to maintain this updated code.

What's here?

This series brings generic PCI Express Data Object Exchange support (DOE)
DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI
config space via a PCIe Extended Capability Structure.
The PCIe spec defines several protocols (including one to discover what
protocols a given DOE instance supports) and other specification such as
CXL define additional protocols using their own vendor IDs.

In this series we make use of the DOE to support the CXL spec defined
Table Access Protocol, specifically to provide access to CDAT - a
table specified in a specification that is hosted by the UEFI forum
and is used to provide runtime discoverability of the sort of information
that would otherwise be available in firmware tables (memory types,
latency and bandwidth information etc).

The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
The version merged did not support interrupts (earlier versions did
so that support in the emulation was tested a while back).

This series provides CDAT emulation for CXL switch upstream ports
and CXL type 3 memory devices. Note that to exercise the switch support
additional Linux kernel patches are needed.
https://lore.kernel.org/linux-cxl/20220503153449.4088-1-Jonathan.Cameron@huawei.com/
(I'll post a new version of that support shortly)

Additional protocols will be supported by follow on patch sets:
* CXL compliance protocol.
* CMA / SPDM device attestation.
(Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will refresh
that tree next week)
Huai-Cheng Kuo (3):
  hw/pci: PCIe Data Object Exchange emulation
  hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
  hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange

Jonathan Cameron (2):
  hw/mem/cxl-type3: Add MSIX support
  hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE

 MAINTAINERS                    |   7 +
 hw/cxl/cxl-cdat.c              | 224 ++++++++++++++++++++
 hw/cxl/meson.build             |   1 +
 hw/mem/cxl_type3.c             | 264 ++++++++++++++++++++++++
 hw/pci-bridge/cxl_upstream.c   | 195 +++++++++++++++++-
 hw/pci/meson.build             |   1 +
 hw/pci/pcie_doe.c              | 367 +++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_cdat.h      | 166 +++++++++++++++
 include/hw/cxl/cxl_component.h |   7 +
 include/hw/cxl/cxl_device.h    |   3 +
 include/hw/cxl/cxl_pci.h       |   1 +
 include/hw/pci/pci_ids.h       |   3 +
 include/hw/pci/pcie.h          |   1 +
 include/hw/pci/pcie_doe.h      | 123 +++++++++++
 include/hw/pci/pcie_regs.h     |   4 +
 15 files changed, 1366 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/cxl-cdat.c
 create mode 100644 hw/pci/pcie_doe.c
 create mode 100644 include/hw/cxl/cxl_cdat.h
 create mode 100644 include/hw/pci/pcie_doe.h

-- 
2.37.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v9 1/5] hw/pci: PCIe Data Object Exchange emulation
  2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
@ 2022-10-14 15:10 ` Jonathan Cameron
  2022-10-14 15:10 ` [PATCH v9 2/5] hw/mem/cxl-type3: Add MSIX support Jonathan Cameron
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-14 15:10 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, Gregory Price, ira.weiny
  Cc: linuxarm, Jonathan Cameron

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

Emulation of PCIe Data Object Exchange (DOE)
PCIE Base Specification r6.0 6.3 Data Object Exchange

Supports multiple DOE PCIe Extended Capabilities for a single PCIe
device. For each capability, a static array of DOEProtocol should be passed
to pcie_doe_init(). The protocols in that array will be registered under
the DOE capability structure. For each protocol, vendor ID, type, and
corresponding callback function (handle_request()) should be implemented.
This callback function represents how the DOE request for corresponding
protocol will be handled.

pcie_doe_{read/write}_config() must be appended to corresponding PCI
device's config_read/write() handler to enable DOE access. In
pcie_doe_read_config(), false will be returned if pci_config_read()
offset is not within DOE capability range. In pcie_doe_write_config(),
the function will have no affect if the address is not within the related
DOE PCIE extended capability.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 MAINTAINERS                |   7 +
 hw/pci/meson.build         |   1 +
 hw/pci/pcie_doe.c          | 367 +++++++++++++++++++++++++++++++++++++
 include/hw/pci/pci_ids.h   |   3 +
 include/hw/pci/pcie.h      |   1 +
 include/hw/pci/pcie_doe.h  | 123 +++++++++++++
 include/hw/pci/pcie_regs.h |   4 +
 7 files changed, 506 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8ae2e43c83..562e1d02a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1832,6 +1832,13 @@ F: qapi/pci.json
 F: docs/pci*
 F: docs/specs/*pci*
 
+PCIE DOE
+M: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
+M: Chris Browy <cbrowy@avery-design.com>
+S: Supported
+F: include/hw/pci/pcie_doe.h
+F: hw/pci/pcie_doe.c
+
 ACPI/SMBIOS
 M: Michael S. Tsirkin <mst@redhat.com>
 M: Igor Mammedov <imammedo@redhat.com>
diff --git a/hw/pci/meson.build b/hw/pci/meson.build
index bcc9c75919..5aff7ed1c6 100644
--- a/hw/pci/meson.build
+++ b/hw/pci/meson.build
@@ -13,6 +13,7 @@ pci_ss.add(files(
 # allow plugging PCIe devices into PCI buses, include them even if
 # CONFIG_PCI_EXPRESS=n.
 pci_ss.add(files('pcie.c', 'pcie_aer.c'))
+pci_ss.add(files('pcie_doe.c'))
 softmmu_ss.add(when: 'CONFIG_PCI_EXPRESS', if_true: files('pcie_port.c', 'pcie_host.c'))
 softmmu_ss.add_all(when: 'CONFIG_PCI', if_true: pci_ss)
 
diff --git a/hw/pci/pcie_doe.c b/hw/pci/pcie_doe.c
new file mode 100644
index 0000000000..2210f86968
--- /dev/null
+++ b/hw/pci/pcie_doe.c
@@ -0,0 +1,367 @@
+/*
+ * PCIe Data Object Exchange
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/range.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_doe.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+
+#define DWORD_BYTE 4
+
+typedef struct DoeDiscoveryReq {
+    DOEHeader header;
+    uint8_t index;
+    uint8_t reserved[3];
+} QEMU_PACKED DoeDiscoveryReq;
+
+typedef struct DoeDiscoveryRsp {
+    DOEHeader header;
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    uint8_t next_index;
+} QEMU_PACKED DoeDiscoveryRsp;
+
+static bool pcie_doe_discovery(DOECap *doe_cap)
+{
+    DoeDiscoveryReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    DoeDiscoveryRsp rsp;
+    uint8_t index = req->index;
+    DOEProtocol *prot;
+
+    /* Discard request if length does not match DoeDiscoveryReq */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(DoeDiscoveryReq), DWORD_BYTE)) {
+        return false;
+    }
+
+    rsp.header = (DOEHeader) {
+        .vendor_id = PCI_VENDOR_ID_PCI_SIG,
+        .data_obj_type = PCI_SIG_DOE_DISCOVERY,
+        .length = DIV_ROUND_UP(sizeof(DoeDiscoveryRsp), DWORD_BYTE),
+    };
+
+    /* Point to the requested protocol, index 0 must be Discovery */
+    if (index == 0) {
+        rsp.vendor_id = PCI_VENDOR_ID_PCI_SIG;
+        rsp.data_obj_type = PCI_SIG_DOE_DISCOVERY;
+    } else {
+        if (index < doe_cap->protocol_num) {
+            prot = &doe_cap->protocols[index - 1];
+            rsp.vendor_id = prot->vendor_id;
+            rsp.data_obj_type = prot->data_obj_type;
+        } else {
+            rsp.vendor_id = 0xFFFF;
+            rsp.data_obj_type = 0xFF;
+        }
+    }
+
+    if (index + 1 == doe_cap->protocol_num) {
+        rsp.next_index = 0;
+    } else {
+        rsp.next_index = index + 1;
+    }
+
+    pcie_doe_set_rsp(doe_cap, &rsp);
+
+    return true;
+}
+
+static void pcie_doe_reset_mbox(DOECap *st)
+{
+    st->read_mbox_idx = 0;
+    st->read_mbox_len = 0;
+    st->write_mbox_len = 0;
+
+    memset(st->read_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+    memset(st->write_mbox, 0, PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+}
+
+void pcie_doe_init(PCIDevice *dev, DOECap *doe_cap, uint16_t offset,
+                   DOEProtocol *protocols, bool intr, uint16_t vec)
+{
+    pcie_add_capability(dev, PCI_EXT_CAP_ID_DOE, 0x1, offset,
+                        PCI_DOE_SIZEOF);
+
+    doe_cap->pdev = dev;
+    doe_cap->offset = offset;
+
+    if (intr && (msi_present(dev) || msix_present(dev))) {
+        doe_cap->cap.intr = intr;
+        doe_cap->cap.vec = vec;
+    }
+
+    doe_cap->write_mbox = g_malloc0(PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+    doe_cap->read_mbox = g_malloc0(PCI_DOE_DW_SIZE_MAX * DWORD_BYTE);
+
+    pcie_doe_reset_mbox(doe_cap);
+
+    doe_cap->protocols = protocols;
+    for (; protocols->vendor_id; protocols++) {
+        doe_cap->protocol_num++;
+    }
+    assert(doe_cap->protocol_num < PCI_DOE_PROTOCOL_NUM_MAX);
+
+    /* Increment to allow for the discovery protocol */
+    doe_cap->protocol_num++;
+}
+
+void pcie_doe_fini(DOECap *doe_cap)
+{
+    g_free(doe_cap->read_mbox);
+    g_free(doe_cap->write_mbox);
+    g_free(doe_cap);
+}
+
+uint32_t pcie_doe_build_protocol(DOEProtocol *p)
+{
+    return DATA_OBJ_BUILD_HEADER1(p->vendor_id, p->data_obj_type);
+}
+
+void *pcie_doe_get_write_mbox_ptr(DOECap *doe_cap)
+{
+    return doe_cap->write_mbox;
+}
+
+/*
+ * Copy the response to read mailbox buffer
+ * This might be called in self-defined handle_request() if a DOE response is
+ * required in the corresponding protocol
+ */
+void pcie_doe_set_rsp(DOECap *doe_cap, void *rsp)
+{
+    uint32_t len = pcie_doe_get_obj_len(rsp);
+
+    memcpy(doe_cap->read_mbox + doe_cap->read_mbox_len, rsp, len * DWORD_BYTE);
+    doe_cap->read_mbox_len += len;
+}
+
+uint32_t pcie_doe_get_obj_len(void *obj)
+{
+    uint32_t len;
+
+    if (!obj) {
+        return 0;
+    }
+
+    /* Only lower 18 bits are valid */
+    len = DATA_OBJ_LEN_MASK(((DOEHeader *)obj)->length);
+
+    /* PCIe r6.0 Table 6.29: a value of 00000h indicates 2^18 DW */
+    return (len) ? len : PCI_DOE_DW_SIZE_MAX;
+}
+
+static void pcie_doe_irq_assert(DOECap *doe_cap)
+{
+    PCIDevice *dev = doe_cap->pdev;
+
+    if (doe_cap->cap.intr && doe_cap->ctrl.intr) {
+        if (doe_cap->status.intr) {
+            return;
+        }
+        doe_cap->status.intr = 1;
+
+        if (msix_enabled(dev)) {
+            msix_notify(dev, doe_cap->cap.vec);
+        } else if (msi_enabled(dev)) {
+            msi_notify(dev, doe_cap->cap.vec);
+        }
+    }
+}
+
+static void pcie_doe_set_ready(DOECap *doe_cap, bool rdy)
+{
+    doe_cap->status.ready = rdy;
+
+    if (rdy) {
+        pcie_doe_irq_assert(doe_cap);
+    }
+}
+
+static void pcie_doe_set_error(DOECap *doe_cap, bool err)
+{
+    doe_cap->status.error = err;
+
+    if (err) {
+        pcie_doe_irq_assert(doe_cap);
+    }
+}
+
+/*
+ * Check incoming request in write_mbox for protocol format
+ */
+static void pcie_doe_prepare_rsp(DOECap *doe_cap)
+{
+    bool success = false;
+    int p;
+    bool (*handle_request)(DOECap *) = NULL;
+
+    if (doe_cap->status.error) {
+        return;
+    }
+
+    if (doe_cap->write_mbox[0] ==
+        DATA_OBJ_BUILD_HEADER1(PCI_VENDOR_ID_PCI_SIG, PCI_SIG_DOE_DISCOVERY)) {
+        handle_request = pcie_doe_discovery;
+    } else {
+        for (p = 0; p < doe_cap->protocol_num - 1; p++) {
+            if (doe_cap->write_mbox[0] ==
+                pcie_doe_build_protocol(&doe_cap->protocols[p])) {
+                handle_request = doe_cap->protocols[p].handle_request;
+                break;
+            }
+        }
+    }
+
+    /*
+     * PCIe r6 DOE 6.30.1:
+     * If the number of DW transferred does not match the
+     * indicated Length for a data object, then the
+     * data object must be silently discarded.
+     */
+    if (handle_request && (doe_cap->write_mbox_len ==
+        pcie_doe_get_obj_len(pcie_doe_get_write_mbox_ptr(doe_cap)))) {
+        success = handle_request(doe_cap);
+    }
+
+    if (success) {
+        pcie_doe_set_ready(doe_cap, 1);
+    } else {
+        pcie_doe_reset_mbox(doe_cap);
+    }
+}
+
+/*
+ * Read from DOE config space.
+ * Return false if the address not within DOE_CAP range.
+ */
+bool pcie_doe_read_config(DOECap *doe_cap, uint32_t addr, int size,
+                          uint32_t *buf)
+{
+    uint32_t shift;
+    uint16_t doe_offset = doe_cap->offset;
+
+    if (!range_covers_byte(doe_offset + PCI_EXP_DOE_CAP,
+                           PCI_DOE_SIZEOF - 4, addr)) {
+        return false;
+    }
+
+    addr -= doe_offset;
+    *buf = 0;
+
+    if (range_covers_byte(PCI_EXP_DOE_CAP, DWORD_BYTE, addr)) {
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_REG, INTR_SUPP,
+                          doe_cap->cap.intr);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_REG, DOE_INTR_MSG_NUM,
+                          doe_cap->cap.vec);
+    } else if (range_covers_byte(PCI_EXP_DOE_CTRL, DWORD_BYTE, addr)) {
+        /* Must return ABORT=0 and GO=0 */
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_CONTROL, DOE_INTR_EN,
+                          doe_cap->ctrl.intr);
+    } else if (range_covers_byte(PCI_EXP_DOE_STATUS, DWORD_BYTE, addr)) {
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_BUSY,
+                          doe_cap->status.busy);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_INTR_STATUS,
+                          doe_cap->status.intr);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DOE_ERROR,
+                          doe_cap->status.error);
+        *buf = FIELD_DP32(*buf, PCI_DOE_CAP_STATUS, DATA_OBJ_RDY,
+                          doe_cap->status.ready);
+    /* Mailbox should be DW accessed */
+    } else if (addr == PCI_EXP_DOE_RD_DATA_MBOX && size == DWORD_BYTE) {
+        if (doe_cap->status.ready && !doe_cap->status.error) {
+            *buf = doe_cap->read_mbox[doe_cap->read_mbox_idx];
+        }
+    }
+
+    /* Process Alignment */
+    shift = addr % DWORD_BYTE;
+    *buf = extract32(*buf, shift * 8, size * 8);
+
+    return true;
+}
+
+/*
+ * Write to DOE config space.
+ * Return if the address not within DOE_CAP range or receives an abort
+ */
+void pcie_doe_write_config(DOECap *doe_cap,
+                           uint32_t addr, uint32_t val, int size)
+{
+    uint16_t doe_offset = doe_cap->offset;
+    uint32_t shift;
+
+    if (!range_covers_byte(doe_offset + PCI_EXP_DOE_CAP,
+                           PCI_DOE_SIZEOF - 4, addr)) {
+        return;
+    }
+
+    /* Process Alignment */
+    shift = addr % DWORD_BYTE;
+    addr -= (doe_offset + shift);
+    val = deposit32(val, shift * 8, size * 8, val);
+
+    switch (addr) {
+    case PCI_EXP_DOE_CTRL:
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_ABORT)) {
+            pcie_doe_set_ready(doe_cap, 0);
+            pcie_doe_set_error(doe_cap, 0);
+            pcie_doe_reset_mbox(doe_cap);
+            return;
+        }
+
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_GO)) {
+            pcie_doe_prepare_rsp(doe_cap);
+        }
+
+        if (FIELD_EX32(val, PCI_DOE_CAP_CONTROL, DOE_INTR_EN)) {
+            doe_cap->ctrl.intr = 1;
+        /* Clear interrupt bit located within the first byte */
+        } else if (shift == 0) {
+            doe_cap->ctrl.intr = 0;
+        }
+        break;
+    case PCI_EXP_DOE_STATUS:
+        if (FIELD_EX32(val, PCI_DOE_CAP_STATUS, DOE_INTR_STATUS)) {
+            doe_cap->status.intr = 0;
+        }
+        break;
+    case PCI_EXP_DOE_RD_DATA_MBOX:
+        /* Mailbox should be DW accessed */
+        if (size != DWORD_BYTE) {
+            return;
+        }
+        doe_cap->read_mbox_idx++;
+        if (doe_cap->read_mbox_idx == doe_cap->read_mbox_len) {
+            pcie_doe_reset_mbox(doe_cap);
+            pcie_doe_set_ready(doe_cap, 0);
+        } else if (doe_cap->read_mbox_idx > doe_cap->read_mbox_len) {
+            /* Underflow */
+            pcie_doe_set_error(doe_cap, 1);
+        }
+        break;
+    case PCI_EXP_DOE_WR_DATA_MBOX:
+        /* Mailbox should be DW accessed */
+        if (size != DWORD_BYTE) {
+            return;
+        }
+        doe_cap->write_mbox[doe_cap->write_mbox_len] = val;
+        doe_cap->write_mbox_len++;
+        break;
+    case PCI_EXP_DOE_CAP:
+        /* fallthrough */
+    default:
+        break;
+    }
+}
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index d5ddea558b..bc9f834fd1 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -157,6 +157,9 @@
 
 /* Vendors and devices.  Sort key: vendor first, device next. */
 
+/* Ref: PCIe r6.0 Table 6-32 */
+#define PCI_VENDOR_ID_PCI_SIG            0x0001
+
 #define PCI_VENDOR_ID_LSI_LOGIC          0x1000
 #define PCI_DEVICE_ID_LSI_53C810         0x0001
 #define PCI_DEVICE_ID_LSI_53C895A        0x0012
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 798a262a0a..698d3de851 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -26,6 +26,7 @@
 #include "hw/pci/pcie_aer.h"
 #include "hw/pci/pcie_sriov.h"
 #include "hw/hotplug.h"
+#include "hw/pci/pcie_doe.h"
 
 typedef enum {
     /* for attention and power indicator */
diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
new file mode 100644
index 0000000000..ba4d8b03bd
--- /dev/null
+++ b/include/hw/pci/pcie_doe.h
@@ -0,0 +1,123 @@
+/*
+ * PCIe Data Object Exchange
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef PCIE_DOE_H
+#define PCIE_DOE_H
+
+#include "qemu/range.h"
+#include "qemu/typedefs.h"
+#include "hw/register.h"
+
+/*
+ * Reference:
+ * PCIe r6.0 - 7.9.24 Data Object Exchange Extended Capability
+ */
+/* Capabilities Register - r6.0 7.9.24.2 */
+#define PCI_EXP_DOE_CAP             0x04
+REG32(PCI_DOE_CAP_REG, 0)
+    FIELD(PCI_DOE_CAP_REG, INTR_SUPP, 0, 1)
+    FIELD(PCI_DOE_CAP_REG, DOE_INTR_MSG_NUM, 1, 11)
+
+/* Control Register - r6.0 7.9.24.3 */
+#define PCI_EXP_DOE_CTRL            0x08
+REG32(PCI_DOE_CAP_CONTROL, 0)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_ABORT, 0, 1)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_INTR_EN, 1, 1)
+    FIELD(PCI_DOE_CAP_CONTROL, DOE_GO, 31, 1)
+
+/* Status Register - r6.0 7.9.24.4 */
+#define PCI_EXP_DOE_STATUS          0x0c
+REG32(PCI_DOE_CAP_STATUS, 0)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_BUSY, 0, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_INTR_STATUS, 1, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DOE_ERROR, 2, 1)
+    FIELD(PCI_DOE_CAP_STATUS, DATA_OBJ_RDY, 31, 1)
+
+/* Write Data Mailbox Register - r6.0 7.9.24.5 */
+#define PCI_EXP_DOE_WR_DATA_MBOX    0x10
+
+/* Read Data Mailbox Register - 7.9.xx.6 */
+#define PCI_EXP_DOE_RD_DATA_MBOX    0x14
+
+/* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
+#define PCI_SIG_DOE_DISCOVERY       0x00
+
+#define PCI_DOE_DW_SIZE_MAX         (1 << 18)
+#define PCI_DOE_PROTOCOL_NUM_MAX    256
+
+#define DATA_OBJ_BUILD_HEADER1(v, p)    (((p) << 16) | (v))
+#define DATA_OBJ_LEN_MASK(len)          ((len) & (PCI_DOE_DW_SIZE_MAX - 1))
+
+typedef struct DOEHeader DOEHeader;
+typedef struct DOEProtocol DOEProtocol;
+typedef struct DOECap DOECap;
+
+struct DOEHeader {
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    uint8_t reserved;
+    uint32_t length;
+} QEMU_PACKED;
+
+/* Protocol infos and rsp function callback */
+struct DOEProtocol {
+    uint16_t vendor_id;
+    uint8_t data_obj_type;
+    bool (*handle_request)(DOECap *);
+};
+
+struct DOECap {
+    /* Owner */
+    PCIDevice *pdev;
+
+    uint16_t offset;
+
+    struct {
+        bool intr;
+        uint16_t vec;
+    } cap;
+
+    struct {
+        bool abort;
+        bool intr;
+        bool go;
+    } ctrl;
+
+    struct {
+        bool busy;
+        bool intr;
+        bool error;
+        bool ready;
+    } status;
+
+    uint32_t *write_mbox;
+    uint32_t *read_mbox;
+
+    /* Mailbox position indicator */
+    uint32_t read_mbox_idx;
+    uint32_t read_mbox_len;
+    uint32_t write_mbox_len;
+
+    /* Protocols and its callback response */
+    DOEProtocol *protocols;
+    uint16_t protocol_num;
+};
+
+void pcie_doe_init(PCIDevice *pdev, DOECap *doe_cap, uint16_t offset,
+                   DOEProtocol *protocols, bool intr, uint16_t vec);
+void pcie_doe_fini(DOECap *doe_cap);
+bool pcie_doe_read_config(DOECap *doe_cap, uint32_t addr, int size,
+                          uint32_t *buf);
+void pcie_doe_write_config(DOECap *doe_cap, uint32_t addr,
+                           uint32_t val, int size);
+uint32_t pcie_doe_build_protocol(DOEProtocol *p);
+void *pcie_doe_get_write_mbox_ptr(DOECap *doe_cap);
+void pcie_doe_set_rsp(DOECap *doe_cap, void *rsp);
+uint32_t pcie_doe_get_obj_len(void *obj);
+#endif /* PCIE_DOE_H */
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 1db86b0ec4..963dc2e170 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -179,4 +179,8 @@ typedef enum PCIExpLinkWidth {
 #define PCI_ACS_VER                     0x1
 #define PCI_ACS_SIZEOF                  8
 
+/* DOE Capability Register Fields */
+#define PCI_DOE_VER                     0x1
+#define PCI_DOE_SIZEOF                  24
+
 #endif /* QEMU_PCIE_REGS_H */
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v9 2/5] hw/mem/cxl-type3: Add MSIX support
  2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
  2022-10-14 15:10 ` [PATCH v9 1/5] hw/pci: PCIe Data Object Exchange emulation Jonathan Cameron
@ 2022-10-14 15:10 ` Jonathan Cameron
  2022-10-14 15:10 ` [PATCH v9 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation Jonathan Cameron
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-14 15:10 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, Gregory Price, ira.weiny
  Cc: linuxarm, Jonathan Cameron

This will be used by several upcoming patch sets so break it out
such that it doesn't matter which one lands first.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/mem/cxl_type3.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a71bf1afeb..568c9d62f5 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -13,6 +13,7 @@
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
 #include "hw/cxl/cxl.h"
+#include "hw/pci/msix.h"
 
 /*
  * Null value of all Fs suggested by IEEE RA guidelines for use of
@@ -146,6 +147,8 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     ComponentRegisters *regs = &cxl_cstate->crb;
     MemoryRegion *mr = &regs->component_registers;
     uint8_t *pci_conf = pci_dev->config;
+    unsigned short msix_num = 1;
+    int i;
 
     if (!cxl_setup_memory(ct3d, errp)) {
         return;
@@ -180,6 +183,12 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
                      PCI_BASE_ADDRESS_SPACE_MEMORY |
                          PCI_BASE_ADDRESS_MEM_TYPE_64,
                      &ct3d->cxl_dstate.device_registers);
+
+    /* MSI(-X) Initailization */
+    msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
+    for (i = 0; i < msix_num; i++) {
+        msix_vector_use(pci_dev, i);
+    }
 }
 
 static void ct3_exit(PCIDevice *pci_dev)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v9 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
  2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
  2022-10-14 15:10 ` [PATCH v9 1/5] hw/pci: PCIe Data Object Exchange emulation Jonathan Cameron
  2022-10-14 15:10 ` [PATCH v9 2/5] hw/mem/cxl-type3: Add MSIX support Jonathan Cameron
@ 2022-10-14 15:10 ` Jonathan Cameron
  2022-10-14 15:10 ` [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-14 15:10 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, Gregory Price, ira.weiny
  Cc: linuxarm, Jonathan Cameron

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

The Data Object Exchange implementation of CXL Coherent Device Attribute
Table (CDAT). This implementation is referring to "Coherent Device
Attribute Table Specification, Rev. 1.03, July. 2022" and "Compute
Express Link Specification, Rev. 3.0, July. 2022"

This patch adds core support that will be shared by both
end-points and switch port emulation.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/cxl/cxl-cdat.c              | 224 +++++++++++++++++++++++++++++++++
 hw/cxl/meson.build             |   1 +
 include/hw/cxl/cxl_cdat.h      | 165 ++++++++++++++++++++++++
 include/hw/cxl/cxl_component.h |   7 ++
 include/hw/cxl/cxl_device.h    |   3 +
 include/hw/cxl/cxl_pci.h       |   1 +
 6 files changed, 401 insertions(+)

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
new file mode 100644
index 0000000000..3653aa56f0
--- /dev/null
+++ b/hw/cxl/cxl-cdat.c
@@ -0,0 +1,224 @@
+/*
+ * CXL CDAT Structure
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+
+static void cdat_len_check(CDATSubHeader *hdr, Error **errp)
+{
+    assert(hdr->length);
+    assert(hdr->reserved == 0);
+
+    switch (hdr->type) {
+    case CDAT_TYPE_DSMAS:
+        assert(hdr->length == sizeof(CDATDsmas));
+        break;
+    case CDAT_TYPE_DSLBIS:
+        assert(hdr->length == sizeof(CDATDslbis));
+        break;
+    case CDAT_TYPE_DSMSCIS:
+        assert(hdr->length == sizeof(CDATDsmscis));
+        break;
+    case CDAT_TYPE_DSIS:
+        assert(hdr->length == sizeof(CDATDsis));
+        break;
+    case CDAT_TYPE_DSEMTS:
+        assert(hdr->length == sizeof(CDATDsemts));
+        break;
+    case CDAT_TYPE_SSLBIS:
+        assert(hdr->length >= sizeof(CDATSslbisHeader));
+        assert((hdr->length - sizeof(CDATSslbisHeader)) %
+               sizeof(CDATSslbe) == 0);
+        break;
+    default:
+        error_setg(errp, "Type %d is reserved", hdr->type);
+    }
+}
+
+static void ct3_build_cdat(CDATObject *cdat, Error **errp)
+{
+    g_autofree CDATTableHeader *cdat_header = NULL;
+    g_autofree CDATEntry *cdat_st = NULL;
+    uint8_t sum = 0;
+    int ent, i;
+
+    /* Use default table if fopen == NULL */
+    assert(cdat->build_cdat_table);
+
+    cdat_header = g_malloc0(sizeof(*cdat_header));
+    if (!cdat_header) {
+        error_setg(errp, "Failed to allocate CDAT header");
+        return;
+    }
+
+    cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf, cdat->private);
+
+    if (!cdat->built_buf_len) {
+        /* Build later as not all data available yet */
+        cdat->to_update = true;
+        return;
+    }
+    cdat->to_update = false;
+
+    cdat_st = g_malloc0(sizeof(*cdat_st) * (cdat->built_buf_len + 1));
+    if (!cdat_st) {
+        error_setg(errp, "Failed to allocate CDAT entry array");
+        return;
+    }
+
+    /* Entry 0 for CDAT header, starts with Entry 1 */
+    for (ent = 1; ent < cdat->built_buf_len + 1; ent++) {
+        CDATSubHeader *hdr = cdat->built_buf[ent - 1];
+        uint8_t *buf = (uint8_t *)cdat->built_buf[ent - 1];
+
+        cdat_st[ent].base = hdr;
+        cdat_st[ent].length = hdr->length;
+
+        cdat_header->length += hdr->length;
+        for (i = 0; i < hdr->length; i++) {
+            sum += buf[i];
+        }
+    }
+
+    /* CDAT header */
+    cdat_header->revision = CXL_CDAT_REV;
+    /* For now, no runtime updates */
+    cdat_header->sequence = 0;
+    cdat_header->length += sizeof(CDATTableHeader);
+    sum += cdat_header->revision + cdat_header->sequence +
+        cdat_header->length;
+    /* Sum of all bytes including checksum must be 0 */
+    cdat_header->checksum = ~sum + 1;
+
+    cdat_st[0].base = g_steal_pointer(&cdat_header);
+    cdat_st[0].length = sizeof(*cdat_header);
+    cdat->entry_len = 1 + cdat->built_buf_len;
+    cdat->entry = g_steal_pointer(&cdat_st);
+}
+
+static void ct3_load_cdat(CDATObject *cdat, Error **errp)
+{
+    g_autofree CDATEntry *cdat_st = NULL;
+    uint8_t sum = 0;
+    int num_ent;
+    int i = 0, ent = 1, file_size = 0;
+    CDATSubHeader *hdr;
+    FILE *fp = NULL;
+
+    /* Read CDAT file and create its cache */
+    fp = fopen(cdat->filename, "r");
+    if (!fp) {
+        error_setg(errp, "CDAT: Unable to open file");
+        return;
+    }
+
+    fseek(fp, 0, SEEK_END);
+    file_size = ftell(fp);
+    fseek(fp, 0, SEEK_SET);
+    cdat->buf = g_malloc0(file_size);
+
+    if (fread(cdat->buf, file_size, 1, fp) == 0) {
+        error_setg(errp, "CDAT: File read failed");
+        return;
+    }
+
+    fclose(fp);
+
+    if (file_size < sizeof(CDATTableHeader)) {
+        error_setg(errp, "CDAT: File too short");
+        return;
+    }
+    i = sizeof(CDATTableHeader);
+    num_ent = 1;
+    while (i < file_size) {
+        hdr = (CDATSubHeader *)(cdat->buf + i);
+        cdat_len_check(hdr, errp);
+        i += hdr->length;
+        num_ent++;
+    }
+    if (i != file_size) {
+        error_setg(errp, "CDAT: File length missmatch");
+        return;
+    }
+
+    cdat_st = g_malloc0(sizeof(*cdat_st) * num_ent);
+    if (!cdat_st) {
+        error_setg(errp, "CDAT: Failed to allocate entry array");
+        return;
+    }
+
+    /* Set CDAT header, Entry = 0 */
+    cdat_st[0].base = cdat->buf;
+    cdat_st[0].length = sizeof(CDATTableHeader);
+    i = 0;
+
+    while (i < cdat_st[0].length) {
+        sum += cdat->buf[i++];
+    }
+
+    /* Read CDAT structures */
+    while (i < file_size) {
+        hdr = (CDATSubHeader *)(cdat->buf + i);
+        cdat_len_check(hdr, errp);
+
+        cdat_st[ent].base = hdr;
+        cdat_st[ent].length = hdr->length;
+
+        while (cdat->buf + i <
+               (uint8_t *)cdat_st[ent].base + cdat_st[ent].length) {
+            assert(i < file_size);
+            sum += cdat->buf[i++];
+        }
+
+        ent++;
+    }
+
+    if (sum != 0) {
+        warn_report("CDAT: Found checksum mismatch in %s", cdat->filename);
+    }
+    cdat->entry_len = num_ent;
+    cdat->entry = g_steal_pointer(&cdat_st);
+}
+
+void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    if (cdat->filename) {
+        ct3_load_cdat(cdat, errp);
+    } else {
+        ct3_build_cdat(cdat, errp);
+    }
+}
+
+void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    if (cdat->to_update) {
+        ct3_build_cdat(cdat, errp);
+    }
+}
+
+void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
+{
+    CDATObject *cdat = &cxl_cstate->cdat;
+
+    free(cdat->entry);
+    if (cdat->built_buf) {
+        cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
+                              cdat->private);
+    }
+    if (cdat->buf) {
+        free(cdat->buf);
+    }
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index f117b99949..cfa95ffd40 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -4,6 +4,7 @@ softmmu_ss.add(when: 'CONFIG_CXL',
                    'cxl-device-utils.c',
                    'cxl-mailbox-utils.c',
                    'cxl-host.c',
+                   'cxl-cdat.c',
                ),
                if_false: files(
                    'cxl-host-stubs.c',
diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
new file mode 100644
index 0000000000..52c232e912
--- /dev/null
+++ b/include/hw/cxl/cxl_cdat.h
@@ -0,0 +1,165 @@
+/*
+ * CXL CDAT Structure
+ *
+ * Copyright (C) 2021 Avery Design Systems, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_CDAT_H
+#define CXL_CDAT_H
+
+#include "hw/cxl/cxl_pci.h"
+
+/*
+ * Reference:
+ *   Coherent Device Attribute Table (CDAT) Specification, Rev. 1.03, July. 2022
+ *   Compute Express Link (CXL) Specification, Rev. 3.0, Aug. 2022
+ */
+/* Table Access DOE - CXL r3.0 8.1.11 */
+#define CXL_DOE_TABLE_ACCESS      2
+#define CXL_DOE_PROTOCOL_CDAT     ((CXL_DOE_TABLE_ACCESS << 16) | CXL_VENDOR_ID)
+
+/* Read Entry - CXL r3.0 8.1.11.1 */
+#define CXL_DOE_TAB_TYPE_CDAT 0
+#define CXL_DOE_TAB_ENT_MAX 0xFFFF
+
+/* Read Entry Request - CXL r3.0 8.1.11.1 Table 8-13 */
+#define CXL_DOE_TAB_REQ 0
+typedef struct CDATReq {
+    DOEHeader header;
+    uint8_t req_code;
+    uint8_t table_type;
+    uint16_t entry_handle;
+} QEMU_PACKED CDATReq;
+
+/* Read Entry Response - CXL r3.0 8.1.11.1 Table 8-14 */
+#define CXL_DOE_TAB_RSP 0
+typedef struct CDATRsp {
+    DOEHeader header;
+    uint8_t rsp_code;
+    uint8_t table_type;
+    uint16_t entry_handle;
+} QEMU_PACKED CDATRsp;
+
+/* CDAT Table Format - CDAT Table 1 */
+#define CXL_CDAT_REV 2
+typedef struct CDATTableHeader {
+    uint32_t length;
+    uint8_t revision;
+    uint8_t checksum;
+    uint8_t reserved[6];
+    uint32_t sequence;
+} QEMU_PACKED CDATTableHeader;
+
+/* CDAT Structure Types - CDAT Table 2 */
+typedef enum {
+    CDAT_TYPE_DSMAS = 0,
+    CDAT_TYPE_DSLBIS = 1,
+    CDAT_TYPE_DSMSCIS = 2,
+    CDAT_TYPE_DSIS = 3,
+    CDAT_TYPE_DSEMTS = 4,
+    CDAT_TYPE_SSLBIS = 5,
+} CDATType;
+
+typedef struct CDATSubHeader {
+    uint8_t type;
+    uint8_t reserved;
+    uint16_t length;
+} CDATSubHeader;
+
+/* Device Scoped Memory Affinity Structure - CDAT Table 3 */
+typedef struct CDATDsmas {
+    CDATSubHeader header;
+    uint8_t DSMADhandle;
+    uint8_t flags;
+#define CDAT_DSMAS_FLAG_NV              (1 << 2)
+#define CDAT_DSMAS_FLAG_SHAREABLE       (1 << 3)
+#define CDAT_DSMAS_FLAG_HW_COHERENT     (1 << 4)
+#define CDAT_DSMAS_FLAG_DYNAMIC_CAP     (1 << 5)
+    uint16_t reserved;
+    uint64_t DPA_base;
+    uint64_t DPA_length;
+} QEMU_PACKED CDATDsmas;
+
+/* Device Scoped Latency and Bandwidth Information Structure - CDAT Table 5 */
+typedef struct CDATDslbis {
+    CDATSubHeader header;
+    uint8_t handle;
+    /* Definitions of these fields refer directly to HMAT fields */
+    uint8_t flags;
+    uint8_t data_type;
+    uint8_t reserved;
+    uint64_t entry_base_unit;
+    uint16_t entry[3];
+    uint16_t reserved2;
+} QEMU_PACKED CDATDslbis;
+
+/* Device Scoped Memory Side Cache Information Structure - CDAT Table 6 */
+typedef struct CDATDsmscis {
+    CDATSubHeader header;
+    uint8_t DSMAS_handle;
+    uint8_t reserved[3];
+    uint64_t memory_side_cache_size;
+    uint32_t cache_attributes;
+} QEMU_PACKED CDATDsmscis;
+
+/* Device Scoped Initiator Structure - CDAT Table 7 */
+typedef struct CDATDsis {
+    CDATSubHeader header;
+    uint8_t flags;
+    uint8_t handle;
+    uint16_t reserved;
+} QEMU_PACKED CDATDsis;
+
+/* Device Scoped EFI Memory Type Structure - CDAT Table 8 */
+typedef struct CDATDsemts {
+    CDATSubHeader header;
+    uint8_t DSMAS_handle;
+    uint8_t EFI_memory_type_attr;
+    uint16_t reserved;
+    uint64_t DPA_offset;
+    uint64_t DPA_length;
+} QEMU_PACKED CDATDsemts;
+
+/* Switch Scoped Latency and Bandwidth Information Structure - CDAT Table 9 */
+typedef struct CDATSslbisHeader {
+    CDATSubHeader header;
+    uint8_t data_type;
+    uint8_t reserved[3];
+    uint64_t entry_base_unit;
+} QEMU_PACKED CDATSslbisHeader;
+
+/* Switch Scoped Latency and Bandwidth Entry - CDAT Table 10 */
+typedef struct CDATSslbe {
+    uint16_t port_x_id;
+    uint16_t port_y_id;
+    uint16_t latency_bandwidth;
+    uint16_t reserved;
+} QEMU_PACKED CDATSslbe;
+
+typedef struct CDATSslbis {
+    CDATSslbisHeader sslbis_header;
+    CDATSslbe sslbe[];
+} QEMU_PACKED CDATSslbis;
+
+typedef struct CDATEntry {
+    void *base;
+    uint32_t length;
+} CDATEntry;
+
+typedef struct CDATObject {
+    CDATEntry *entry;
+    int entry_len;
+
+    int (*build_cdat_table)(CDATSubHeader ***cdat_table, void *priv);
+    void (*free_cdat_table)(CDATSubHeader **cdat_table, int num, void *priv);
+    bool to_update;
+    void *private;
+    char *filename;
+    uint8_t *buf;
+    struct CDATSubHeader **built_buf;
+    int built_buf_len;
+} CDATObject;
+#endif /* CXL_CDAT_H */
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 94ec2f07d7..34075cfb72 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -19,6 +19,7 @@
 #include "qemu/range.h"
 #include "qemu/typedefs.h"
 #include "hw/register.h"
+#include "qapi/error.h"
 
 enum reg_type {
     CXL2_DEVICE,
@@ -184,6 +185,8 @@ typedef struct cxl_component {
             struct PCIDevice *pdev;
         };
     };
+
+    CDATObject cdat;
 } CXLComponentState;
 
 void cxl_component_register_block_init(Object *obj,
@@ -220,4 +223,8 @@ static inline hwaddr cxl_decode_ig(int ig)
 
 CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
 
+void cxl_doe_cdat_init(CXLComponentState *cxl_cstate, Error **errp);
+void cxl_doe_cdat_release(CXLComponentState *cxl_cstate);
+void cxl_doe_cdat_update(CXLComponentState *cxl_cstate, Error **errp);
+
 #endif
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e4d221cdb3..449b0edfe9 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -243,6 +243,9 @@ struct CXLType3Dev {
     AddressSpace hostmem_as;
     CXLComponentState cxl_cstate;
     CXLDeviceState cxl_dstate;
+
+    /* DOE */
+    DOECap doe_cdat;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
index 01cf002096..3cb79eca1e 100644
--- a/include/hw/cxl/cxl_pci.h
+++ b/include/hw/cxl/cxl_pci.h
@@ -13,6 +13,7 @@
 #include "qemu/compiler.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pcie.h"
+#include "hw/cxl/cxl_cdat.h"
 
 #define CXL_VENDOR_ID 0x1e98
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
                   ` (2 preceding siblings ...)
  2022-10-14 15:10 ` [PATCH v9 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation Jonathan Cameron
@ 2022-10-14 15:10 ` Jonathan Cameron
  2022-10-24 17:42   ` Gregory Price
  2022-10-14 15:10 ` [PATCH v9 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE Jonathan Cameron
  2022-10-25 22:06 ` [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
  5 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-14 15:10 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, Gregory Price, ira.weiny
  Cc: linuxarm, Jonathan Cameron

From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>

The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
in "-device cxl-type3"'s command option. The file is required to provide
the whole CDAT table in binary mode. The other is to use the default
that provides some 'reasonable' numbers based on type of memory and
size.

The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
capability offset 0x190. The config read/write to this capability range
can be generated in the OS to request the CDAT data.

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
Changes since v8: Thanks to Gregory Price and Michael Tsirkin.
- Move enum out of function and prefix entries with CT3_CDAT
  Related refactoring now the size is visible.
- Fix wrong naming of volatile which should be nonvolatile.
---
 hw/mem/cxl_type3.c | 255 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 255 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 568c9d62f5..255590201a 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -12,9 +12,246 @@
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
+#include "sysemu/numa.h"
 #include "hw/cxl/cxl.h"
 #include "hw/pci/msix.h"
 
+#define DWORD_BYTE 4
+
+/* Default CDAT entries for a memory region */
+enum {
+    CT3_CDAT_DSMAS,
+    CT3_CDAT_DSLBIS0,
+    CT3_CDAT_DSLBIS1,
+    CT3_CDAT_DSLBIS2,
+    CT3_CDAT_DSLBIS3,
+    CT3_CDAT_DSEMTS,
+    CT3_CDAT_NUM_ENTRIES
+};
+
+static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
+                                         int dsmad_handle, MemoryRegion *mr)
+{
+    g_autofree CDATDsmas *dsmas = NULL;
+    g_autofree CDATDslbis *dslbis0 = NULL;
+    g_autofree CDATDslbis *dslbis1 = NULL;
+    g_autofree CDATDslbis *dslbis2 = NULL;
+    g_autofree CDATDslbis *dslbis3 = NULL;
+    g_autofree CDATDsemts *dsemts = NULL;
+
+    dsmas = g_malloc(sizeof(*dsmas));
+    if (!dsmas) {
+        return -ENOMEM;
+    }
+    *dsmas = (CDATDsmas) {
+        .header = {
+            .type = CDAT_TYPE_DSMAS,
+            .length = sizeof(*dsmas),
+        },
+        .DSMADhandle = dsmad_handle,
+        .flags = CDAT_DSMAS_FLAG_NV,
+        .DPA_base = 0,
+        .DPA_length = int128_get64(mr->size),
+    };
+
+    /* For now, no memory side cache, plausiblish numbers */
+    dslbis0 = g_malloc(sizeof(*dslbis0));
+    if (!dslbis0) {
+        return -ENOMEM;
+    }
+    *dslbis0 = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis0),
+        },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_READ_LATENCY,
+        .entry_base_unit = 10000, /* 10ns base */
+        .entry[0] = 15, /* 150ns */
+    };
+
+    dslbis1 = g_malloc(sizeof(*dslbis1));
+    if (!dslbis1) {
+        return -ENOMEM;
+    }
+    *dslbis1 = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis1),
+        },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_WRITE_LATENCY,
+        .entry_base_unit = 10000,
+        .entry[0] = 25, /* 250ns */
+    };
+
+    dslbis2 = g_malloc(sizeof(*dslbis2));
+    if (!dslbis2) {
+        return -ENOMEM;
+    }
+    *dslbis2 = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis2),
+        },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_READ_BANDWIDTH,
+        .entry_base_unit = 1000, /* GB/s */
+        .entry[0] = 16,
+    };
+
+    dslbis3 = g_malloc(sizeof(*dslbis3));
+    if (!dslbis3) {
+        return -ENOMEM;
+    }
+    *dslbis3 = (CDATDslbis) {
+        .header = {
+            .type = CDAT_TYPE_DSLBIS,
+            .length = sizeof(*dslbis3),
+        },
+        .handle = dsmad_handle,
+        .flags = HMAT_LB_MEM_MEMORY,
+        .data_type = HMAT_LB_DATA_WRITE_BANDWIDTH,
+        .entry_base_unit = 1000, /* GB/s */
+        .entry[0] = 16,
+    };
+
+    dsemts = g_malloc(sizeof(*dsemts));
+    if (!dsemts) {
+        return -ENOMEM;
+    }
+    *dsemts = (CDATDsemts) {
+        .header = {
+            .type = CDAT_TYPE_DSEMTS,
+            .length = sizeof(*dsemts),
+        },
+        .DSMAS_handle = dsmad_handle,
+        /* Reserved - the non volatile from DSMAS matters */
+        .EFI_memory_type_attr = 2,
+        .DPA_offset = 0,
+        .DPA_length = int128_get64(mr->size),
+    };
+
+    /* Header always at start of structure */
+    cdat_table[CT3_CDAT_DSMAS] = g_steal_pointer(&dsmas);
+    cdat_table[CT3_CDAT_DSLBIS0] = g_steal_pointer(&dslbis0);
+    cdat_table[CT3_CDAT_DSLBIS1] = g_steal_pointer(&dslbis1);
+    cdat_table[CT3_CDAT_DSLBIS2] = g_steal_pointer(&dslbis2);
+    cdat_table[CT3_CDAT_DSLBIS3] = g_steal_pointer(&dslbis3);
+    cdat_table[CT3_CDAT_DSEMTS] = g_steal_pointer(&dsemts);
+
+    return 0;
+}
+
+static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
+{
+    g_autofree CDATSubHeader **table = NULL;
+    MemoryRegion *nonvolatile_mr;
+    CXLType3Dev *ct3d = priv;
+    int dsmad_handle = 0;
+    int rc;
+
+    if (!ct3d->hostmem) {
+        return 0;
+    }
+
+    nonvolatile_mr = host_memory_backend_get_memory(ct3d->hostmem);
+    if (!nonvolatile_mr) {
+        return -EINVAL;
+    }
+
+    table = g_malloc0(CT3_CDAT_NUM_ENTRIES * sizeof(*table));
+    if (!table) {
+        return -ENOMEM;
+    }
+
+    rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, nonvolatile_mr);
+    if (rc < 0) {
+        return rc;
+    }
+
+    *cdat_table = g_steal_pointer(&table);
+
+    return CT3_CDAT_NUM_ENTRIES;
+}
+
+static void ct3_free_cdat_table(CDATSubHeader **cdat_table, int num, void *priv)
+{
+    int i;
+
+    for (i = 0; i < num; i++) {
+        g_free(cdat_table[i]);
+    }
+    g_free(cdat_table);
+}
+
+static bool cxl_doe_cdat_rsp(DOECap *doe_cap)
+{
+    CDATObject *cdat = &CXL_TYPE3(doe_cap->pdev)->cxl_cstate.cdat;
+    uint16_t ent;
+    void *base;
+    uint32_t len;
+    CDATReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    CDATRsp rsp;
+
+    assert(cdat->entry_len);
+
+    /* Discard if request length mismatched */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(CDATReq), DWORD_BYTE)) {
+        return false;
+    }
+
+    ent = req->entry_handle;
+    base = cdat->entry[ent].base;
+    len = cdat->entry[ent].length;
+
+    rsp = (CDATRsp) {
+        .header = {
+            .vendor_id = CXL_VENDOR_ID,
+            .data_obj_type = CXL_DOE_TABLE_ACCESS,
+            .reserved = 0x0,
+            .length = DIV_ROUND_UP((sizeof(rsp) + len), DWORD_BYTE),
+        },
+        .rsp_code = CXL_DOE_TAB_RSP,
+        .table_type = CXL_DOE_TAB_TYPE_CDAT,
+        .entry_handle = (ent < cdat->entry_len - 1) ?
+                        ent + 1 : CXL_DOE_TAB_ENT_MAX,
+    };
+
+    memcpy(doe_cap->read_mbox, &rsp, sizeof(rsp));
+    memcpy(doe_cap->read_mbox + DIV_ROUND_UP(sizeof(rsp), DWORD_BYTE),
+           base, len);
+
+    doe_cap->read_mbox_len += rsp.header.length;
+
+    return true;
+}
+
+static uint32_t ct3d_config_read(PCIDevice *pci_dev, uint32_t addr, int size)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+    uint32_t val;
+
+    if (pcie_doe_read_config(&ct3d->doe_cdat, addr, size, &val)) {
+        return val;
+    }
+
+    return pci_default_read_config(pci_dev, addr, size);
+}
+
+static void ct3d_config_write(PCIDevice *pci_dev, uint32_t addr, uint32_t val,
+                              int size)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+
+    pcie_doe_write_config(&ct3d->doe_cdat, addr, val, size);
+    pci_default_write_config(pci_dev, addr, val, size);
+}
+
 /*
  * Null value of all Fs suggested by IEEE RA guidelines for use of
  * EU, OUI and CID
@@ -140,6 +377,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
     return true;
 }
 
+static DOEProtocol doe_cdat_prot[] = {
+    { CXL_VENDOR_ID, CXL_DOE_TABLE_ACCESS, cxl_doe_cdat_rsp },
+    { }
+};
+
 static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 {
     CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
@@ -189,6 +431,14 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     for (i = 0; i < msix_num; i++) {
         msix_vector_use(pci_dev, i);
     }
+
+    /* DOE Initailization */
+    pcie_doe_init(pci_dev, &ct3d->doe_cdat, 0x190, doe_cdat_prot, true, 0);
+
+    cxl_cstate->cdat.build_cdat_table = ct3_build_cdat_table;
+    cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
+    cxl_cstate->cdat.private = ct3d;
+    cxl_doe_cdat_init(cxl_cstate, errp);
 }
 
 static void ct3_exit(PCIDevice *pci_dev)
@@ -197,6 +447,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
     ComponentRegisters *regs = &cxl_cstate->crb;
 
+    cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
     address_space_destroy(&ct3d->hostmem_as);
 }
@@ -296,6 +547,7 @@ static Property ct3_props[] = {
     DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
+    DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -361,6 +613,9 @@ static void ct3_class_init(ObjectClass *oc, void *data)
     pc->device_id = 0xd93; /* LVF for now */
     pc->revision = 1;
 
+    pc->config_write = ct3d_config_write;
+    pc->config_read = ct3d_config_read;
+
     set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
     dc->desc = "CXL PMEM Device (Type 3)";
     dc->reset = ct3d_reset;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v9 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE
  2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
                   ` (3 preceding siblings ...)
  2022-10-14 15:10 ` [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron
@ 2022-10-14 15:10 ` Jonathan Cameron
  2022-10-25 22:06 ` [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
  5 siblings, 0 replies; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-14 15:10 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, Gregory Price, ira.weiny
  Cc: linuxarm, Jonathan Cameron

This Data Object Exchange Mailbox allows software to query the
latency and bandwidth between ports on the switch. For now
only provide information on routes between the upstream port and
each downstream port (not p2p).

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

--
Changes since v8: Mostly to match the type 3 equivalent
 - Move enum out of function and give it a more descriptive namespace.
---
 hw/pci-bridge/cxl_upstream.c | 195 ++++++++++++++++++++++++++++++++++-
 include/hw/cxl/cxl_cdat.h    |   1 +
 2 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
index a83a3e81e4..9b8b57df9d 100644
--- a/hw/pci-bridge/cxl_upstream.c
+++ b/hw/pci-bridge/cxl_upstream.c
@@ -10,11 +10,12 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "hw/qdev-properties.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/pcie.h"
 #include "hw/pci/pcie_port.h"
 
-#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 1
+#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 2
 
 #define CXL_UPSTREAM_PORT_MSI_OFFSET 0x70
 #define CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET 0x90
@@ -28,6 +29,7 @@ typedef struct CXLUpstreamPort {
 
     /*< public >*/
     CXLComponentState cxl_cstate;
+    DOECap doe_cdat;
 } CXLUpstreamPort;
 
 CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp)
@@ -60,6 +62,9 @@ static void cxl_usp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
 static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
                                  uint32_t val, int len)
 {
+    CXLUpstreamPort *usp = CXL_USP(d);
+
+    pcie_doe_write_config(&usp->doe_cdat, address, val, len);
     pci_bridge_write_config(d, address, val, len);
     pcie_cap_flr_write_config(d, address, val, len);
     pcie_aer_write_config(d, address, val, len);
@@ -67,6 +72,18 @@ static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
     cxl_usp_dvsec_write_config(d, address, val, len);
 }
 
+static uint32_t cxl_usp_read_config(PCIDevice *d, uint32_t address, int len)
+{
+    CXLUpstreamPort *usp = CXL_USP(d);
+    uint32_t val;
+
+    if (pcie_doe_read_config(&usp->doe_cdat, address, len, &val)) {
+        return val;
+    }
+
+    return pci_default_read_config(d, address, len);
+}
+
 static void latch_registers(CXLUpstreamPort *usp)
 {
     uint32_t *reg_state = usp->cxl_cstate.crb.cache_mem_registers;
@@ -119,6 +136,167 @@ static void build_dvsecs(CXLComponentState *cxl)
                                REG_LOC_DVSEC_REVID, dvsec);
 }
 
+static bool cxl_doe_cdat_rsp(DOECap *doe_cap)
+{
+    CDATObject *cdat = &CXL_USP(doe_cap->pdev)->cxl_cstate.cdat;
+    uint16_t ent;
+    void *base;
+    uint32_t len;
+    CDATReq *req = pcie_doe_get_write_mbox_ptr(doe_cap);
+    CDATRsp rsp;
+
+    cxl_doe_cdat_update(&CXL_USP(doe_cap->pdev)->cxl_cstate, &error_fatal);
+    assert(cdat->entry_len);
+
+    /* Discard if request length mismatched */
+    if (pcie_doe_get_obj_len(req) <
+        DIV_ROUND_UP(sizeof(CDATReq), sizeof(uint32_t))) {
+        return false;
+    }
+
+    ent = req->entry_handle;
+    base = cdat->entry[ent].base;
+    len = cdat->entry[ent].length;
+
+    rsp = (CDATRsp) {
+        .header = {
+            .vendor_id = CXL_VENDOR_ID,
+            .data_obj_type = CXL_DOE_TABLE_ACCESS,
+            .reserved = 0x0,
+            .length = DIV_ROUND_UP((sizeof(rsp) + len), sizeof(uint32_t)),
+        },
+        .rsp_code = CXL_DOE_TAB_RSP,
+        .table_type = CXL_DOE_TAB_TYPE_CDAT,
+        .entry_handle = (ent < cdat->entry_len - 1) ?
+                        ent + 1 : CXL_DOE_TAB_ENT_MAX,
+    };
+
+    memcpy(doe_cap->read_mbox, &rsp, sizeof(rsp));
+        memcpy(doe_cap->read_mbox + DIV_ROUND_UP(sizeof(rsp), sizeof(uint32_t)),
+           base, len);
+
+    doe_cap->read_mbox_len += rsp.header.length;
+
+    return true;
+}
+
+static DOEProtocol doe_cdat_prot[] = {
+    { CXL_VENDOR_ID, CXL_DOE_TABLE_ACCESS, cxl_doe_cdat_rsp },
+    { }
+};
+
+enum {
+    CXL_USP_CDAT_SSLBIS_LAT,
+    CXL_USP_CDAT_SSLBIS_BW,
+    CXL_USP_CDAT_NUM_ENTRIES
+};
+
+static int build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
+{
+    g_autofree CDATSslbis *sslbis_latency = NULL;
+    g_autofree CDATSslbis *sslbis_bandwidth = NULL;
+    CXLUpstreamPort *us = CXL_USP(priv);
+    PCIBus *bus = &PCI_BRIDGE(us)->sec_bus;
+    int devfn, sslbis_size, i;
+    int count = 0;
+    uint16_t port_ids[256];
+
+    for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
+        PCIDevice *d = bus->devices[devfn];
+        PCIEPort *port;
+
+        if (!d || !pci_is_express(d) || !d->exp.exp_cap) {
+            continue;
+        }
+
+        /*
+         * Whilst the PCI express spec doesn't allow anything other than
+         * downstream ports on this bus, let us be a little paranoid
+         */
+        if (!object_dynamic_cast(OBJECT(d), TYPE_PCIE_PORT)) {
+            continue;
+        }
+
+        port = PCIE_PORT(d);
+        port_ids[count] = port->port;
+        count++;
+    }
+
+    /* May not yet have any ports - try again later */
+    if (count == 0) {
+        return 0;
+    }
+
+    sslbis_size = sizeof(CDATSslbis) + sizeof(*sslbis_latency->sslbe) * count;
+    sslbis_latency = g_malloc(sslbis_size);
+    if (!sslbis_latency) {
+        return -ENOMEM;
+    }
+    *sslbis_latency = (CDATSslbis) {
+        .sslbis_header = {
+            .header = {
+                .type = CDAT_TYPE_SSLBIS,
+                .length = sslbis_size,
+            },
+            .data_type = HMATLB_DATA_TYPE_ACCESS_LATENCY,
+            .entry_base_unit = 10000,
+        },
+    };
+
+    for (i = 0; i < count; i++) {
+        sslbis_latency->sslbe[i] = (CDATSslbe) {
+            .port_x_id = CDAT_PORT_ID_USP,
+            .port_y_id = port_ids[i],
+            .latency_bandwidth = 15, /* 150ns */
+        };
+    }
+
+    sslbis_bandwidth = g_malloc(sslbis_size);
+    if (!sslbis_bandwidth) {
+        return 0;
+    }
+    *sslbis_bandwidth = (CDATSslbis) {
+        .sslbis_header = {
+            .header = {
+                .type = CDAT_TYPE_SSLBIS,
+                .length = sslbis_size,
+            },
+            .data_type = HMATLB_DATA_TYPE_ACCESS_BANDWIDTH,
+            .entry_base_unit = 1000,
+        },
+    };
+
+    for (i = 0; i < count; i++) {
+        sslbis_bandwidth->sslbe[i] = (CDATSslbe) {
+            .port_x_id = CDAT_PORT_ID_USP,
+            .port_y_id = port_ids[i],
+            .latency_bandwidth = 16, /* 16 GB/s */
+        };
+    }
+
+    *cdat_table = g_malloc0(sizeof(*cdat_table) * CXL_USP_CDAT_NUM_ENTRIES);
+    if (!*cdat_table) {
+        return -ENOMEM;
+    }
+
+    /* Header always at start of structure */
+    (*cdat_table)[CXL_USP_CDAT_SSLBIS_LAT] = g_steal_pointer(&sslbis_latency);
+    (*cdat_table)[CXL_USP_CDAT_SSLBIS_BW] = g_steal_pointer(&sslbis_bandwidth);
+
+    return CXL_USP_CDAT_NUM_ENTRIES;
+}
+
+static void free_default_cdat_table(CDATSubHeader **cdat_table, int num,
+                                    void *priv)
+{
+    int i;
+
+    for (i = 0; i < num; i++) {
+        g_free(cdat_table[i]);
+    }
+    g_free(cdat_table);
+}
+
 static void cxl_usp_realize(PCIDevice *d, Error **errp)
 {
     PCIEPort *p = PCIE_PORT(d);
@@ -161,6 +339,14 @@ static void cxl_usp_realize(PCIDevice *d, Error **errp)
                      PCI_BASE_ADDRESS_MEM_TYPE_64,
                      component_bar);
 
+    pcie_doe_init(d, &usp->doe_cdat, cxl_cstate->dvsec_offset, doe_cdat_prot,
+                  true, 1);
+
+    cxl_cstate->cdat.build_cdat_table = build_cdat_table;
+    cxl_cstate->cdat.free_cdat_table = free_default_cdat_table;
+    cxl_cstate->cdat.private = d;
+    cxl_doe_cdat_init(cxl_cstate, errp);
+
     return;
 
 err_cap:
@@ -179,6 +365,11 @@ static void cxl_usp_exitfn(PCIDevice *d)
     pci_bridge_exitfn(d);
 }
 
+static Property cxl_upstream_props[] = {
+    DEFINE_PROP_STRING("cdat", CXLUpstreamPort, cxl_cstate.cdat.filename),
+    DEFINE_PROP_END_OF_LIST()
+};
+
 static void cxl_upstream_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
@@ -186,6 +377,7 @@ static void cxl_upstream_class_init(ObjectClass *oc, void *data)
 
     k->is_bridge = true;
     k->config_write = cxl_usp_write_config;
+    k->config_read = cxl_usp_read_config;
     k->realize = cxl_usp_realize;
     k->exit = cxl_usp_exitfn;
     k->vendor_id = 0x19e5; /* Huawei */
@@ -194,6 +386,7 @@ static void cxl_upstream_class_init(ObjectClass *oc, void *data)
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->desc = "CXL Switch Upstream Port";
     dc->reset = cxl_usp_reset;
+    device_class_set_props(dc, cxl_upstream_props);
 }
 
 static const TypeInfo cxl_usp_info = {
diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
index 52c232e912..e9eda00142 100644
--- a/include/hw/cxl/cxl_cdat.h
+++ b/include/hw/cxl/cxl_cdat.h
@@ -131,6 +131,7 @@ typedef struct CDATSslbisHeader {
     uint64_t entry_base_unit;
 } QEMU_PACKED CDATSslbisHeader;
 
+#define CDAT_PORT_ID_USP 0x100
 /* Switch Scoped Latency and Bandwidth Entry - CDAT Table 10 */
 typedef struct CDATSslbe {
     uint16_t port_x_id;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-14 15:10 ` [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron
@ 2022-10-24 17:42   ` Gregory Price
  2022-10-25 10:56     ` Jonathan Cameron
  0 siblings, 1 reply; 11+ messages in thread
From: Gregory Price @ 2022-10-24 17:42 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, ira.weiny, linuxarm

On Fri, Oct 14, 2022 at 04:10:44PM +0100, Jonathan Cameron wrote:
> From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> 
> The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
> in "-device cxl-type3"'s command option. The file is required to provide
> the whole CDAT table in binary mode. The other is to use the default
> that provides some 'reasonable' numbers based on type of memory and
> size.
> 
> The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
> capability offset 0x190. The config read/write to this capability range
> can be generated in the OS to request the CDAT data.
> 
> Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> Signed-off-by: Chris Browy <cbrowy@avery-design.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 

In reviewing this patch under a debug kernel, I'm discovering some
warnings / tracebacks that I want to get a sanity check on.  They appear
to be warnings, but i'm not 100% sure what to make of them.

I get the following stack traces during boot (on the x86 machine).

Removing the type-3 device from the configuration prevents the traceback
from occurring, suggesting it's something to do with that code in
particular (which tracks with the patch here)

configuration:

qemu-system-x86_64 -drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 2G,slots=4,maxmem=4G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
-object memory-backend-file,pmem=true,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=1G \
-object memory-backend-file,pmem=true,id=lsa0,mem-path=/tmp/cxl-lsa0,size=1G \
-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G

==== trace 1 ====

[   30.607343] WARNING: CPU: 0 PID: 23 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
[   30.688367] Modules linked in:
[   30.703848] CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
[   30.794232] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   30.862301] Workqueue: events_unbound async_run_entry_fn
[   30.890460] RIP: 0010:__debug_object_init.cold+0x18/0x183
[   30.917120] Code: 0b 48 89 fe 48 c7 c7 e0 a7 9a b6 e8 50 74 fd ff 0f 0b 83 c0 01 48 89 ee 48 c7 c7 d8 a9 9a b6 89 05 f9 d7 bd 03 e8 36 74 fd 0
[   31.019141] RSP: 0018:ffffbffb800c3a90 EFLAGS: 00010246
[   31.043604] RAX: 0000000000000050 RBX: ffff9bbec8f7c1b8 RCX: 0000000000000000
[   31.103764] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff
[   31.134239] RBP: ffffbffb800c3bb0 R08: 0000000000000000 R09: ffffbffb800c3938
[   31.188841] R10: 0000000000000003 R11: ffffffffb7366368 R12: ffffffffb6440940
[   31.218418] R13: 0000000000130b50 R14: ffffffffb9c33b58 R15: ffffffffb9c33b50
[   31.273389] FS:  0000000000000000(0000) GS:ffff9bbf3da00000(0000) knlGS:0000000000000000
[   31.315480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   31.352369] CR2: 00007fcd1cd57c08 CR3: 0000000049228000 CR4: 00000000000006f0
[   31.397279] Call Trace:
[   31.412605]  <TASK>
[   31.424189]  pci_doe_submit_task+0x5d/0xd0
[   31.459591]  pci_doe_discovery+0xb4/0x100
[   31.478560]  ? pci_doe_xa_destroy+0x10/0x10
[   31.499449]  pcim_doe_create_mb+0x219/0x290
[   31.516835]  cxl_pci_probe+0x192/0x430
[   31.532501]  local_pci_probe+0x41/0x80
[   31.549181]  pci_device_probe+0xb3/0x220
[   31.587266]  really_probe+0xde/0x380
[   31.601778]  ? pm_runtime_barrier+0x50/0x90
[   31.618854]  __driver_probe_device+0x78/0x170
[   31.656184]  driver_probe_device+0x1f/0x90
[   31.675865]  __driver_attach_async_helper+0x5c/0xe0
[   31.692896]  async_run_entry_fn+0x30/0x130
[   31.707297]  process_one_work+0x294/0x5b0
[   31.722375]  worker_thread+0x4f/0x3a0
[   31.741028]  ? process_one_work+0x5b0/0x5b0
[   31.757448]  kthread+0xf5/0x120
[   31.770179]  ? kthread_complete_and_exit+0x20/0x20
[   31.806496]  ret_from_fork+0x22/0x30
[   31.831317]  </TASK>
[   31.840206] irq event stamp: 1597
[   31.853893] hardirqs last  enabled at (1611): [<ffffffffb518e8de>] __up_console_sem+0x5e/0x70
[   31.908805] hardirqs last disabled at (1624): [<ffffffffb518e8c3>] __up_console_sem+0x43/0x70
[   31.965293] softirqs last  enabled at (354): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
[   32.000551] softirqs last disabled at (345): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
[   32.121265] ---[ end trace 0000000000000000 ]---


==== trace 2 ====

[   99.902268] WARNING: CPU: 0 PID: 530 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
[   99.921567] Modules linked in: i2c_i801(+) cxl_mem(+) bochs(+) pcspkr(+) drm_vram_helper i2c_smbus drm_ttm_helper ttm parport_pc(+) lpc_ich pg
[   99.956391] CPU: 0 PID: 530 Comm: systemd-udevd Tainted: G        W         -------  ---  6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 1
[   99.978561] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  100.000618] RIP: 0010:__debug_object_init.cold+0x18/0x183
[  100.011232] Code: 0b 48 89 fe 48 c7 c7 e0 a7 9a b6 e8 50 74 fd ff 0f 0b 83 c0 01 48 89 ee 48 c7 c7 d8 a9 9a b6 89 05 f9 d7 bd 03 e8 36 74 fd 0
[  100.047919] RSP: 0018:ffffbffb80c6b770 EFLAGS: 00010246
[  100.060167] RAX: 0000000000000050 RBX: ffff9bbec9448b40 RCX: 0000000000000000
[  100.080397] RDX: 0000000000000001 RSI: ffffffffb698ef83 RDI: 00000000ffffffff
[  100.099485] RBP: ffffbffb80c6b918 R08: 0000000000000000 R09: ffffbffb80c6b618
[  100.116927] R10: 0000000000000003 R11: ffffffffb7366368 R12: ffffffffb6440940
[  100.134762] R13: 00000000000ec680 R14: ffffffffb9bef688 R15: ffffffffb9bef680
[  100.151144] FS:  00007f485962c580(0000) GS:ffff9bbf3da00000(0000) knlGS:0000000000000000
[  100.168115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.181757] CR2: 000055f120bfa028 CR3: 0000000009598000 CR4: 00000000000006f0
[  100.196873] Call Trace:
[  100.205142]  <TASK>
[  100.211998]  pci_doe_submit_task+0x5d/0xd0
[  100.225314]  cxl_cdat_get_length+0xb8/0x110
[  100.237607]  ? dvsec_range_allowed+0x60/0x60
[  100.251315]  read_cdat_data+0xaf/0x1a0
[  100.259359]  cxl_port_probe+0x80/0x120
[  100.270131]  cxl_bus_probe+0x17/0x50
[  100.280874]  really_probe+0xde/0x380
[  100.290308]  ? pm_runtime_barrier+0x50/0x90
[  100.301589]  __driver_probe_device+0x78/0x170
[  100.313626]  driver_probe_device+0x1f/0x90
[  100.338204]  __device_attach_driver+0x85/0x110
[  100.357875]  ? driver_allows_async_probing+0x70/0x70
[  100.367909]  bus_for_each_drv+0x7a/0xb0
[  100.376261]  __device_attach+0xb3/0x1d0
[  100.384565]  bus_probe_device+0x9f/0xc0
[  100.403420]  device_add+0x41e/0x9b0
[  100.424240]  ? kobject_set_name_vargs+0x6d/0x90
[  100.437082]  ? dev_set_name+0x4b/0x60
[  100.452290]  devm_cxl_add_port+0x27b/0x3b0
[  100.477464]  devm_cxl_add_endpoint+0x82/0x130
[  100.506916]  cxl_mem_probe+0xc4/0x11d [cxl_mem]
[  100.525895]  cxl_bus_probe+0x17/0x50
[  100.556013]  really_probe+0xde/0x380
[  100.567388]  ? pm_runtime_barrier+0x50/0x90
[  100.604187]  __driver_probe_device+0x78/0x170
[  100.623848]  driver_probe_device+0x1f/0x90
[  100.643793]  __driver_attach+0xd5/0x1d0
[  100.655270]  ? __device_attach_driver+0x110/0x110
[  100.668906]  bus_for_each_dev+0x76/0xa0
[  100.684247]  bus_add_driver+0x1b1/0x200
[  100.694768]  driver_register+0x89/0xe0
[  100.706379]  ? 0xffffffffc03e7000
[  100.715846]  do_one_initcall+0x6e/0x320
[  100.726612]  do_init_module+0x4a/0x200
[  100.738882]  __do_sys_init_module+0x16a/0x1a0
[  100.752941]  do_syscall_64+0x5b/0x80
[  100.766652]  ? do_syscall_64+0x67/0x80
[  100.779569]  ? asm_exc_page_fault+0x22/0x30
[  100.794089]  ? lockdep_hardirqs_on+0x7d/0x100
[  100.813769]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  100.829029] RIP: 0033:0x7f485991253e
[  100.840991] Code: 48 8b 0d e5 58 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 8
[  100.904842] RSP: 002b:00007ffd96d322b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[  100.936376] RAX: ffffffffffffffda RBX: 000055f120b724b0 RCX: 00007f485991253e
[  100.971138] RDX: 00007f4859f9643c RSI: 0000000000002dbe RDI: 000055f120bdd000
[  101.016019] RBP: 00007f4859f9643c R08: 000055f120bd47f0 R09: 00007ffd96d2ef1e
[  101.042435] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
[  101.073437] R13: 000055f120bd4530 R14: 0000000000000000 R15: 000055f120bd4860
[  101.120278]  </TASK>
[  101.132024] irq event stamp: 102985
[  101.153401] hardirqs last  enabled at (102999): [<ffffffffb518e8de>] __up_console_sem+0x5e/0x70
[  101.184098] hardirqs last disabled at (103014): [<ffffffffb518e8c3>] __up_console_sem+0x43/0x70
[  101.236473] softirqs last  enabled at (101888): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
[  101.294275] softirqs last disabled at (101757): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160


==== trace 3 ====

[  111.356606] WARNING: CPU: 0 PID: 530 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
[  111.366842] Modules linked in: iTCO_wdt ppdev intel_pmc_bxt iTCO_vendor_support cxl_pmem libnvdimm snd_pcm snd_timer snd e1000e soundcore joyg
[  111.396404] CPU: 0 PID: 530 Comm: systemd-udevd Tainted: G        W         -------  ---  6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 1
[  111.410199] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  111.422370] RIP: 0010:__debug_object_init.cold+0x18/0x183
[  111.429086] Code: 0b 48 89 fe 48 c7 c7 e0 a7 9a b6 e8 50 74 fd ff 0f 0b 83 c0 01 48 89 ee 48 c7 c7 d8 a9 9a b6 89 05 f9 d7 bd 03 e8 36 74 fd 0
[  111.453462] RSP: 0000:ffffbffb80c6b748 EFLAGS: 00010246
[  111.463194] RAX: 0000000000000050 RBX: ffff9bbec84818e8 RCX: 0000000000000000
[  111.472081] RDX: 0000000000000001 RSI: ffffffffb698ef83 RDI: 00000000ffffffff
[  111.481240] RBP: ffffbffb80c6b908 R08: 0000000000000000 R09: ffffbffb80c6b5f0
[  111.490230] R10: 0000000000000003 R11: ffffffffb7366368 R12: ffffffffb6440940
[  111.499437] R13: 00000000000ec680 R14: ffffffffb9bef688 R15: ffffffffb9bef680
[  111.507544] FS:  00007f485962c580(0000) GS:ffff9bbf3da00000(0000) knlGS:0000000000000000
[  111.518473] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  111.529110] CR2: 000055ea819dc228 CR3: 0000000009598000 CR4: 00000000000006f0
[  111.539622] Call Trace:
[  111.543604]  <TASK>
[  111.547146]  pci_doe_submit_task+0x5d/0xd0
[  111.553235]  cxl_cdat_read_table.isra.0+0x141/0x190
[  111.559888]  ? dvsec_range_allowed+0x60/0x60
[  111.566950]  read_cdat_data+0xfc/0x1a0
[  111.572438]  cxl_port_probe+0x80/0x120
[  111.579219]  cxl_bus_probe+0x17/0x50
[  111.586051]  really_probe+0xde/0x380
[  111.592435]  ? pm_runtime_barrier+0x50/0x90
[  111.598580]  __driver_probe_device+0x78/0x170
[  111.604947]  driver_probe_device+0x1f/0x90
[  111.610825]  __device_attach_driver+0x85/0x110
[  111.616997]  ? driver_allows_async_probing+0x70/0x70
[  111.623939]  bus_for_each_drv+0x7a/0xb0
[  111.629126]  __device_attach+0xb3/0x1d0
[  111.634907]  bus_probe_device+0x9f/0xc0
[  111.641211]  device_add+0x41e/0x9b0
[  111.650591]  ? kobject_set_name_vargs+0x6d/0x90
[  111.659416]  ? dev_set_name+0x4b/0x60
[  111.665160]  devm_cxl_add_port+0x27b/0x3b0
[  111.671486]  devm_cxl_add_endpoint+0x82/0x130
[  111.677424]  cxl_mem_probe+0xc4/0x11d [cxl_mem]
[  111.684237]  cxl_bus_probe+0x17/0x50
[  111.690079]  really_probe+0xde/0x380
[  111.695487]  ? pm_runtime_barrier+0x50/0x90
[  111.702526]  __driver_probe_device+0x78/0x170
[  111.710317]  driver_probe_device+0x1f/0x90
[  111.716468]  __driver_attach+0xd5/0x1d0
[  111.721786]  ? __device_attach_driver+0x110/0x110
[  111.728178]  bus_for_each_dev+0x76/0xa0
[  111.736175]  bus_add_driver+0x1b1/0x200
[  111.741584]  driver_register+0x89/0xe0
[  111.747214]  ? 0xffffffffc03e7000
[  111.753161]  do_one_initcall+0x6e/0x320
[  111.759145]  do_init_module+0x4a/0x200
[  111.773055]  __do_sys_init_module+0x16a/0x1a0
[  111.782164]  do_syscall_64+0x5b/0x80
[  111.788480]  ? do_syscall_64+0x67/0x80
[  111.795108]  ? asm_exc_page_fault+0x22/0x30
[  111.803255]  ? lockdep_hardirqs_on+0x7d/0x100
[  111.811142]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  111.818426] RIP: 0033:0x7f485991253e
[  111.824120] Code: 48 8b 0d e5 58 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 8
[  111.873034] RSP: 002b:00007ffd96d322b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[  111.888776] RAX: ffffffffffffffda RBX: 000055f120b724b0 RCX: 00007f485991253e
[  111.913156] RDX: 00007f4859f9643c RSI: 0000000000002dbe RDI: 000055f120bdd000
[  111.931096] RBP: 00007f4859f9643c R08: 000055f120bd47f0 R09: 00007ffd96d2ef1e
[  111.945030] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
[  111.959393] R13: 000055f120bd4530 R14: 0000000000000000 R15: 000055f120bd4860
[  111.987474]  </TASK>
[  112.002511] irq event stamp: 104291
[  112.011044] hardirqs last  enabled at (104307): [<ffffffffb518e8de>] __up_console_sem+0x5e/0x70
[  112.046936] hardirqs last disabled at (104320): [<ffffffffb518e8c3>] __up_console_sem+0x43/0x70
[  112.066433] softirqs last  enabled at (103208): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
[  112.086461] softirqs last disabled at (103183): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
[  112.107575] ---[ end trace 0000000000000000 ]---

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-24 17:42   ` Gregory Price
@ 2022-10-25 10:56     ` Jonathan Cameron
  2022-10-31 19:51       ` Ira Weiny
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-25 10:56 UTC (permalink / raw)
  To: Gregory Price
  Cc: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, ira.weiny, linuxarm

On Mon, 24 Oct 2022 13:42:54 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Fri, Oct 14, 2022 at 04:10:44PM +0100, Jonathan Cameron wrote:
> > From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > 
> > The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
> > in "-device cxl-type3"'s command option. The file is required to provide
> > the whole CDAT table in binary mode. The other is to use the default
> > that provides some 'reasonable' numbers based on type of memory and
> > size.
> > 
> > The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
> > capability offset 0x190. The config read/write to this capability range
> > can be generated in the OS to request the CDAT data.
> > 
> > Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > Signed-off-by: Chris Browy <cbrowy@avery-design.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >   
> 
> In reviewing this patch under a debug kernel, I'm discovering some
> warnings / tracebacks that I want to get a sanity check on.  They appear
> to be warnings, but i'm not 100% sure what to make of them.
> 
> I get the following stack traces during boot (on the x86 machine).
> 
> Removing the type-3 device from the configuration prevents the traceback
> from occurring, suggesting it's something to do with that code in
> particular (which tracks with the patch here)

Thanks Gregory,

We have an INIT_WORK() in pci_doe_submit_task()
that in the pci_doe_discovery() call is based a work structure that is
on the stack.  As such should be INIT_WORK_ONSTACK()

This is a little tricky becaues there is no particular reason to assume
all struct pci_doe_task instances will be on the stack though they are
today.  We don't really want to break the layering as would be necessary
to have this init at the locations where we otherwise initialize the
containing structure.

My first thought is add an 'onstack' bool to either the pci_doe_submit_task()
or perhaps better would be to add it to the pci_doe_task() and have
macros like DECLARE_CDAT_DOE_TASK_ONSTACK() set it appropriately so we
can call the right INIT_WORK_*() variant later.

Ira / others, which way to go to fix this?

I'll also reply to the last version of that series to make sure people see
this discussion.

Jonathan



> 
> configuration:
> 
> qemu-system-x86_64 -drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
> -m 2G,slots=4,maxmem=4G \
> -smp 4 \
> -machine type=q35,accel=kvm,cxl=on \
> -enable-kvm \
> -nographic \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
> -object memory-backend-file,pmem=true,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=1G \
> -object memory-backend-file,pmem=true,id=lsa0,mem-path=/tmp/cxl-lsa0,size=1G \
> -device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 \
> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> 
> ==== trace 1 ====
> 
> [   30.607343] WARNING: CPU: 0 PID: 23 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
> [   30.688367] Modules linked in:
> [   30.703848] CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
> [   30.794232] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [   30.862301] Workqueue: events_unbound async_run_entry_fn
> [   30.890460] RIP: 0010:__debug_object_init.cold+0x18/0x183
> [   30.917120] Code: 0b 48 89 fe 48 c7 c7 e0 a7 9a b6 e8 50 74 fd ff 0f 0b 83 c0 01 48 89 ee 48 c7 c7 d8 a9 9a b6 89 05 f9 d7 bd 03 e8 36 74 fd 0
> [   31.019141] RSP: 0018:ffffbffb800c3a90 EFLAGS: 00010246
> [   31.043604] RAX: 0000000000000050 RBX: ffff9bbec8f7c1b8 RCX: 0000000000000000
> [   31.103764] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff
> [   31.134239] RBP: ffffbffb800c3bb0 R08: 0000000000000000 R09: ffffbffb800c3938
> [   31.188841] R10: 0000000000000003 R11: ffffffffb7366368 R12: ffffffffb6440940
> [   31.218418] R13: 0000000000130b50 R14: ffffffffb9c33b58 R15: ffffffffb9c33b50
> [   31.273389] FS:  0000000000000000(0000) GS:ffff9bbf3da00000(0000) knlGS:0000000000000000
> [   31.315480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   31.352369] CR2: 00007fcd1cd57c08 CR3: 0000000049228000 CR4: 00000000000006f0
> [   31.397279] Call Trace:
> [   31.412605]  <TASK>
> [   31.424189]  pci_doe_submit_task+0x5d/0xd0
> [   31.459591]  pci_doe_discovery+0xb4/0x100
> [   31.478560]  ? pci_doe_xa_destroy+0x10/0x10
> [   31.499449]  pcim_doe_create_mb+0x219/0x290
> [   31.516835]  cxl_pci_probe+0x192/0x430
> [   31.532501]  local_pci_probe+0x41/0x80
> [   31.549181]  pci_device_probe+0xb3/0x220
> [   31.587266]  really_probe+0xde/0x380
> [   31.601778]  ? pm_runtime_barrier+0x50/0x90
> [   31.618854]  __driver_probe_device+0x78/0x170
> [   31.656184]  driver_probe_device+0x1f/0x90
> [   31.675865]  __driver_attach_async_helper+0x5c/0xe0
> [   31.692896]  async_run_entry_fn+0x30/0x130
> [   31.707297]  process_one_work+0x294/0x5b0
> [   31.722375]  worker_thread+0x4f/0x3a0
> [   31.741028]  ? process_one_work+0x5b0/0x5b0
> [   31.757448]  kthread+0xf5/0x120
> [   31.770179]  ? kthread_complete_and_exit+0x20/0x20
> [   31.806496]  ret_from_fork+0x22/0x30
> [   31.831317]  </TASK>
> [   31.840206] irq event stamp: 1597
> [   31.853893] hardirqs last  enabled at (1611): [<ffffffffb518e8de>] __up_console_sem+0x5e/0x70
> [   31.908805] hardirqs last disabled at (1624): [<ffffffffb518e8c3>] __up_console_sem+0x43/0x70
> [   31.965293] softirqs last  enabled at (354): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
> [   32.000551] softirqs last disabled at (345): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
> [   32.121265] ---[ end trace 0000000000000000 ]---
> 
> 
> ==== trace 2 ====
> 
> [   99.902268] WARNING: CPU: 0 PID: 530 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
> [   99.921567] Modules linked in: i2c_i801(+) cxl_mem(+) bochs(+) pcspkr(+) drm_vram_helper i2c_smbus drm_ttm_helper ttm parport_pc(+) lpc_ich pg
> [   99.956391] CPU: 0 PID: 530 Comm: systemd-udevd Tainted: G        W         -------  ---  6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 1
> [   99.978561] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [  100.000618] RIP: 0010:__debug_object_init.cold+0x18/0x183
> [  100.011232] Code: 0b 48 89 fe 48 c7 c7 e0 a7 9a b6 e8 50 74 fd ff 0f 0b 83 c0 01 48 89 ee 48 c7 c7 d8 a9 9a b6 89 05 f9 d7 bd 03 e8 36 74 fd 0
> [  100.047919] RSP: 0018:ffffbffb80c6b770 EFLAGS: 00010246
> [  100.060167] RAX: 0000000000000050 RBX: ffff9bbec9448b40 RCX: 0000000000000000
> [  100.080397] RDX: 0000000000000001 RSI: ffffffffb698ef83 RDI: 00000000ffffffff
> [  100.099485] RBP: ffffbffb80c6b918 R08: 0000000000000000 R09: ffffbffb80c6b618
> [  100.116927] R10: 0000000000000003 R11: ffffffffb7366368 R12: ffffffffb6440940
> [  100.134762] R13: 00000000000ec680 R14: ffffffffb9bef688 R15: ffffffffb9bef680
> [  100.151144] FS:  00007f485962c580(0000) GS:ffff9bbf3da00000(0000) knlGS:0000000000000000
> [  100.168115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  100.181757] CR2: 000055f120bfa028 CR3: 0000000009598000 CR4: 00000000000006f0
> [  100.196873] Call Trace:
> [  100.205142]  <TASK>
> [  100.211998]  pci_doe_submit_task+0x5d/0xd0
> [  100.225314]  cxl_cdat_get_length+0xb8/0x110
> [  100.237607]  ? dvsec_range_allowed+0x60/0x60
> [  100.251315]  read_cdat_data+0xaf/0x1a0
> [  100.259359]  cxl_port_probe+0x80/0x120
> [  100.270131]  cxl_bus_probe+0x17/0x50
> [  100.280874]  really_probe+0xde/0x380
> [  100.290308]  ? pm_runtime_barrier+0x50/0x90
> [  100.301589]  __driver_probe_device+0x78/0x170
> [  100.313626]  driver_probe_device+0x1f/0x90
> [  100.338204]  __device_attach_driver+0x85/0x110
> [  100.357875]  ? driver_allows_async_probing+0x70/0x70
> [  100.367909]  bus_for_each_drv+0x7a/0xb0
> [  100.376261]  __device_attach+0xb3/0x1d0
> [  100.384565]  bus_probe_device+0x9f/0xc0
> [  100.403420]  device_add+0x41e/0x9b0
> [  100.424240]  ? kobject_set_name_vargs+0x6d/0x90
> [  100.437082]  ? dev_set_name+0x4b/0x60
> [  100.452290]  devm_cxl_add_port+0x27b/0x3b0
> [  100.477464]  devm_cxl_add_endpoint+0x82/0x130
> [  100.506916]  cxl_mem_probe+0xc4/0x11d [cxl_mem]
> [  100.525895]  cxl_bus_probe+0x17/0x50
> [  100.556013]  really_probe+0xde/0x380
> [  100.567388]  ? pm_runtime_barrier+0x50/0x90
> [  100.604187]  __driver_probe_device+0x78/0x170
> [  100.623848]  driver_probe_device+0x1f/0x90
> [  100.643793]  __driver_attach+0xd5/0x1d0
> [  100.655270]  ? __device_attach_driver+0x110/0x110
> [  100.668906]  bus_for_each_dev+0x76/0xa0
> [  100.684247]  bus_add_driver+0x1b1/0x200
> [  100.694768]  driver_register+0x89/0xe0
> [  100.706379]  ? 0xffffffffc03e7000
> [  100.715846]  do_one_initcall+0x6e/0x320
> [  100.726612]  do_init_module+0x4a/0x200
> [  100.738882]  __do_sys_init_module+0x16a/0x1a0
> [  100.752941]  do_syscall_64+0x5b/0x80
> [  100.766652]  ? do_syscall_64+0x67/0x80
> [  100.779569]  ? asm_exc_page_fault+0x22/0x30
> [  100.794089]  ? lockdep_hardirqs_on+0x7d/0x100
> [  100.813769]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  100.829029] RIP: 0033:0x7f485991253e
> [  100.840991] Code: 48 8b 0d e5 58 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 8
> [  100.904842] RSP: 002b:00007ffd96d322b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> [  100.936376] RAX: ffffffffffffffda RBX: 000055f120b724b0 RCX: 00007f485991253e
> [  100.971138] RDX: 00007f4859f9643c RSI: 0000000000002dbe RDI: 000055f120bdd000
> [  101.016019] RBP: 00007f4859f9643c R08: 000055f120bd47f0 R09: 00007ffd96d2ef1e
> [  101.042435] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
> [  101.073437] R13: 000055f120bd4530 R14: 0000000000000000 R15: 000055f120bd4860
> [  101.120278]  </TASK>
> [  101.132024] irq event stamp: 102985
> [  101.153401] hardirqs last  enabled at (102999): [<ffffffffb518e8de>] __up_console_sem+0x5e/0x70
> [  101.184098] hardirqs last disabled at (103014): [<ffffffffb518e8c3>] __up_console_sem+0x43/0x70
> [  101.236473] softirqs last  enabled at (101888): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
> [  101.294275] softirqs last disabled at (101757): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
> 
> 
> ==== trace 3 ====
> 
> [  111.356606] WARNING: CPU: 0 PID: 530 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
> [  111.366842] Modules linked in: iTCO_wdt ppdev intel_pmc_bxt iTCO_vendor_support cxl_pmem libnvdimm snd_pcm snd_timer snd e1000e soundcore joyg
> [  111.396404] CPU: 0 PID: 530 Comm: systemd-udevd Tainted: G        W         -------  ---  6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 1
> [  111.410199] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [  111.422370] RIP: 0010:__debug_object_init.cold+0x18/0x183
> [  111.429086] Code: 0b 48 89 fe 48 c7 c7 e0 a7 9a b6 e8 50 74 fd ff 0f 0b 83 c0 01 48 89 ee 48 c7 c7 d8 a9 9a b6 89 05 f9 d7 bd 03 e8 36 74 fd 0
> [  111.453462] RSP: 0000:ffffbffb80c6b748 EFLAGS: 00010246
> [  111.463194] RAX: 0000000000000050 RBX: ffff9bbec84818e8 RCX: 0000000000000000
> [  111.472081] RDX: 0000000000000001 RSI: ffffffffb698ef83 RDI: 00000000ffffffff
> [  111.481240] RBP: ffffbffb80c6b908 R08: 0000000000000000 R09: ffffbffb80c6b5f0
> [  111.490230] R10: 0000000000000003 R11: ffffffffb7366368 R12: ffffffffb6440940
> [  111.499437] R13: 00000000000ec680 R14: ffffffffb9bef688 R15: ffffffffb9bef680
> [  111.507544] FS:  00007f485962c580(0000) GS:ffff9bbf3da00000(0000) knlGS:0000000000000000
> [  111.518473] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  111.529110] CR2: 000055ea819dc228 CR3: 0000000009598000 CR4: 00000000000006f0
> [  111.539622] Call Trace:
> [  111.543604]  <TASK>
> [  111.547146]  pci_doe_submit_task+0x5d/0xd0
> [  111.553235]  cxl_cdat_read_table.isra.0+0x141/0x190
> [  111.559888]  ? dvsec_range_allowed+0x60/0x60
> [  111.566950]  read_cdat_data+0xfc/0x1a0
> [  111.572438]  cxl_port_probe+0x80/0x120
> [  111.579219]  cxl_bus_probe+0x17/0x50
> [  111.586051]  really_probe+0xde/0x380
> [  111.592435]  ? pm_runtime_barrier+0x50/0x90
> [  111.598580]  __driver_probe_device+0x78/0x170
> [  111.604947]  driver_probe_device+0x1f/0x90
> [  111.610825]  __device_attach_driver+0x85/0x110
> [  111.616997]  ? driver_allows_async_probing+0x70/0x70
> [  111.623939]  bus_for_each_drv+0x7a/0xb0
> [  111.629126]  __device_attach+0xb3/0x1d0
> [  111.634907]  bus_probe_device+0x9f/0xc0
> [  111.641211]  device_add+0x41e/0x9b0
> [  111.650591]  ? kobject_set_name_vargs+0x6d/0x90
> [  111.659416]  ? dev_set_name+0x4b/0x60
> [  111.665160]  devm_cxl_add_port+0x27b/0x3b0
> [  111.671486]  devm_cxl_add_endpoint+0x82/0x130
> [  111.677424]  cxl_mem_probe+0xc4/0x11d [cxl_mem]
> [  111.684237]  cxl_bus_probe+0x17/0x50
> [  111.690079]  really_probe+0xde/0x380
> [  111.695487]  ? pm_runtime_barrier+0x50/0x90
> [  111.702526]  __driver_probe_device+0x78/0x170
> [  111.710317]  driver_probe_device+0x1f/0x90
> [  111.716468]  __driver_attach+0xd5/0x1d0
> [  111.721786]  ? __device_attach_driver+0x110/0x110
> [  111.728178]  bus_for_each_dev+0x76/0xa0
> [  111.736175]  bus_add_driver+0x1b1/0x200
> [  111.741584]  driver_register+0x89/0xe0
> [  111.747214]  ? 0xffffffffc03e7000
> [  111.753161]  do_one_initcall+0x6e/0x320
> [  111.759145]  do_init_module+0x4a/0x200
> [  111.773055]  __do_sys_init_module+0x16a/0x1a0
> [  111.782164]  do_syscall_64+0x5b/0x80
> [  111.788480]  ? do_syscall_64+0x67/0x80
> [  111.795108]  ? asm_exc_page_fault+0x22/0x30
> [  111.803255]  ? lockdep_hardirqs_on+0x7d/0x100
> [  111.811142]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  111.818426] RIP: 0033:0x7f485991253e
> [  111.824120] Code: 48 8b 0d e5 58 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 8
> [  111.873034] RSP: 002b:00007ffd96d322b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> [  111.888776] RAX: ffffffffffffffda RBX: 000055f120b724b0 RCX: 00007f485991253e
> [  111.913156] RDX: 00007f4859f9643c RSI: 0000000000002dbe RDI: 000055f120bdd000
> [  111.931096] RBP: 00007f4859f9643c R08: 000055f120bd47f0 R09: 00007ffd96d2ef1e
> [  111.945030] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
> [  111.959393] R13: 000055f120bd4530 R14: 0000000000000000 R15: 000055f120bd4860
> [  111.987474]  </TASK>
> [  112.002511] irq event stamp: 104291
> [  112.011044] hardirqs last  enabled at (104307): [<ffffffffb518e8de>] __up_console_sem+0x5e/0x70
> [  112.046936] hardirqs last disabled at (104320): [<ffffffffb518e8c3>] __up_console_sem+0x43/0x70
> [  112.066433] softirqs last  enabled at (103208): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
> [  112.086461] softirqs last disabled at (103183): [<ffffffffb51012ed>] __irq_exit_rcu+0xed/0x160
> [  112.107575] ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.
  2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
                   ` (4 preceding siblings ...)
  2022-10-14 15:10 ` [PATCH v9 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE Jonathan Cameron
@ 2022-10-25 22:06 ` Jonathan Cameron
  2022-10-25 22:30   ` Michael S. Tsirkin
  5 siblings, 1 reply; 11+ messages in thread
From: Jonathan Cameron @ 2022-10-25 22:06 UTC (permalink / raw)
  To: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, Gregory Price, ira.weiny
  Cc: Jonathan Cameron


Hi Michael,

Any chance of this making 7.2?  Gregory has identified a bug
in the Linux kernel support that we've reported (debug marking
related). I don't think there are any other queries outstanding on this.

I've been holding off on sending other features to focus on
getting this in, but given timing those will probably have to wait
for next cycle now along with Gregory's series for volatile support
(which we don't have OS support for yet anyway).

Thanks,

Jonathan

On Fri, 14 Oct 2022 16:10:40 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> Changes since v8:
>  - Take the entry enums out of the functions and prefix them
>    appropriately.
>  - Use the visibility of *_NUM_ENTRIES to allocate the cdat_table
>  - Fix volatile_mr -> nonvolatile_mr
> 
> V7 Cover letter - lightly edited.
> 
> Whilst I have carried on Huai-Cheng Kuo's series version numbering and
> naming, there have been very substantial changes since v6 so I would
> suggest fresh review makes sense for anyone who has looked at this before.
> In particularly if the Avery design folks could check I haven't broken
> anything that would be great.
> 
> For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
> https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbrowy@avery-design.com/
> 
> Summary of changes:
> 1) Linux headers definitions for DOE are now upstream so drop that patch.
> 2) Add CDAT for switch upstream port.
> 3) Generate 'plausible' default CDAT tables when a file is not provided.
> 4) General refactoring to calculate the correct table sizes and allocate
>    based on that rather than copying from a local static array.
> 5) Changes from earlier reviews such as matching QEMU type naming style.
> 6) Moved compliance and SPDM usecases to future patch sets.
> 
> Sign-offs on these are complex because the patches were originally developed
> by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan
> Cameron who made substantial changes.
> 
> Huai-Cheng Kuo confirmed they are happy to maintain this updated code.
> 
> What's here?
> 
> This series brings generic PCI Express Data Object Exchange support (DOE)
> DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI
> config space via a PCIe Extended Capability Structure.
> The PCIe spec defines several protocols (including one to discover what
> protocols a given DOE instance supports) and other specification such as
> CXL define additional protocols using their own vendor IDs.
> 
> In this series we make use of the DOE to support the CXL spec defined
> Table Access Protocol, specifically to provide access to CDAT - a
> table specified in a specification that is hosted by the UEFI forum
> and is used to provide runtime discoverability of the sort of information
> that would otherwise be available in firmware tables (memory types,
> latency and bandwidth information etc).
> 
> The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
> The version merged did not support interrupts (earlier versions did
> so that support in the emulation was tested a while back).
> 
> This series provides CDAT emulation for CXL switch upstream ports
> and CXL type 3 memory devices. Note that to exercise the switch support
> additional Linux kernel patches are needed.
> https://lore.kernel.org/linux-cxl/20220503153449.4088-1-Jonathan.Cameron@huawei.com/
> (I'll post a new version of that support shortly)
> 
> Additional protocols will be supported by follow on patch sets:
> * CXL compliance protocol.
> * CMA / SPDM device attestation.
> (Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will refresh
> that tree next week)
> Huai-Cheng Kuo (3):
>   hw/pci: PCIe Data Object Exchange emulation
>   hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
>   hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
> 
> Jonathan Cameron (2):
>   hw/mem/cxl-type3: Add MSIX support
>   hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE
> 
>  MAINTAINERS                    |   7 +
>  hw/cxl/cxl-cdat.c              | 224 ++++++++++++++++++++
>  hw/cxl/meson.build             |   1 +
>  hw/mem/cxl_type3.c             | 264 ++++++++++++++++++++++++
>  hw/pci-bridge/cxl_upstream.c   | 195 +++++++++++++++++-
>  hw/pci/meson.build             |   1 +
>  hw/pci/pcie_doe.c              | 367 +++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_cdat.h      | 166 +++++++++++++++
>  include/hw/cxl/cxl_component.h |   7 +
>  include/hw/cxl/cxl_device.h    |   3 +
>  include/hw/cxl/cxl_pci.h       |   1 +
>  include/hw/pci/pci_ids.h       |   3 +
>  include/hw/pci/pcie.h          |   1 +
>  include/hw/pci/pcie_doe.h      | 123 +++++++++++
>  include/hw/pci/pcie_regs.h     |   4 +
>  15 files changed, 1366 insertions(+), 1 deletion(-)
>  create mode 100644 hw/cxl/cxl-cdat.c
>  create mode 100644 hw/pci/pcie_doe.c
>  create mode 100644 include/hw/cxl/cxl_cdat.h
>  create mode 100644 include/hw/pci/pcie_doe.h
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.
  2022-10-25 22:06 ` [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
@ 2022-10-25 22:30   ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2022-10-25 22:30 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-devel, Ben Widawsky, linux-cxl, Huai-Cheng Kuo, Chris Browy,
	Gregory Price, ira.weiny

Queued for next pull, thanks!

On Tue, Oct 25, 2022 at 11:06:39PM +0100, Jonathan Cameron wrote:
> 
> Hi Michael,
> 
> Any chance of this making 7.2?  Gregory has identified a bug
> in the Linux kernel support that we've reported (debug marking
> related). I don't think there are any other queries outstanding on this.
> 
> I've been holding off on sending other features to focus on
> getting this in, but given timing those will probably have to wait
> for next cycle now along with Gregory's series for volatile support
> (which we don't have OS support for yet anyway).
> 
> Thanks,
> 
> Jonathan
> 
> On Fri, 14 Oct 2022 16:10:40 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > Changes since v8:
> >  - Take the entry enums out of the functions and prefix them
> >    appropriately.
> >  - Use the visibility of *_NUM_ENTRIES to allocate the cdat_table
> >  - Fix volatile_mr -> nonvolatile_mr
> > 
> > V7 Cover letter - lightly edited.
> > 
> > Whilst I have carried on Huai-Cheng Kuo's series version numbering and
> > naming, there have been very substantial changes since v6 so I would
> > suggest fresh review makes sense for anyone who has looked at this before.
> > In particularly if the Avery design folks could check I haven't broken
> > anything that would be great.
> > 
> > For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
> > https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbrowy@avery-design.com/
> > 
> > Summary of changes:
> > 1) Linux headers definitions for DOE are now upstream so drop that patch.
> > 2) Add CDAT for switch upstream port.
> > 3) Generate 'plausible' default CDAT tables when a file is not provided.
> > 4) General refactoring to calculate the correct table sizes and allocate
> >    based on that rather than copying from a local static array.
> > 5) Changes from earlier reviews such as matching QEMU type naming style.
> > 6) Moved compliance and SPDM usecases to future patch sets.
> > 
> > Sign-offs on these are complex because the patches were originally developed
> > by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan
> > Cameron who made substantial changes.
> > 
> > Huai-Cheng Kuo confirmed they are happy to maintain this updated code.
> > 
> > What's here?
> > 
> > This series brings generic PCI Express Data Object Exchange support (DOE)
> > DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI
> > config space via a PCIe Extended Capability Structure.
> > The PCIe spec defines several protocols (including one to discover what
> > protocols a given DOE instance supports) and other specification such as
> > CXL define additional protocols using their own vendor IDs.
> > 
> > In this series we make use of the DOE to support the CXL spec defined
> > Table Access Protocol, specifically to provide access to CDAT - a
> > table specified in a specification that is hosted by the UEFI forum
> > and is used to provide runtime discoverability of the sort of information
> > that would otherwise be available in firmware tables (memory types,
> > latency and bandwidth information etc).
> > 
> > The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
> > The version merged did not support interrupts (earlier versions did
> > so that support in the emulation was tested a while back).
> > 
> > This series provides CDAT emulation for CXL switch upstream ports
> > and CXL type 3 memory devices. Note that to exercise the switch support
> > additional Linux kernel patches are needed.
> > https://lore.kernel.org/linux-cxl/20220503153449.4088-1-Jonathan.Cameron@huawei.com/
> > (I'll post a new version of that support shortly)
> > 
> > Additional protocols will be supported by follow on patch sets:
> > * CXL compliance protocol.
> > * CMA / SPDM device attestation.
> > (Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will refresh
> > that tree next week)
> > Huai-Cheng Kuo (3):
> >   hw/pci: PCIe Data Object Exchange emulation
> >   hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
> >   hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
> > 
> > Jonathan Cameron (2):
> >   hw/mem/cxl-type3: Add MSIX support
> >   hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE
> > 
> >  MAINTAINERS                    |   7 +
> >  hw/cxl/cxl-cdat.c              | 224 ++++++++++++++++++++
> >  hw/cxl/meson.build             |   1 +
> >  hw/mem/cxl_type3.c             | 264 ++++++++++++++++++++++++
> >  hw/pci-bridge/cxl_upstream.c   | 195 +++++++++++++++++-
> >  hw/pci/meson.build             |   1 +
> >  hw/pci/pcie_doe.c              | 367 +++++++++++++++++++++++++++++++++
> >  include/hw/cxl/cxl_cdat.h      | 166 +++++++++++++++
> >  include/hw/cxl/cxl_component.h |   7 +
> >  include/hw/cxl/cxl_device.h    |   3 +
> >  include/hw/cxl/cxl_pci.h       |   1 +
> >  include/hw/pci/pci_ids.h       |   3 +
> >  include/hw/pci/pcie.h          |   1 +
> >  include/hw/pci/pcie_doe.h      | 123 +++++++++++
> >  include/hw/pci/pcie_regs.h     |   4 +
> >  15 files changed, 1366 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/cxl/cxl-cdat.c
> >  create mode 100644 hw/pci/pcie_doe.c
> >  create mode 100644 include/hw/cxl/cxl_cdat.h
> >  create mode 100644 include/hw/pci/pcie_doe.h
> > 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
  2022-10-25 10:56     ` Jonathan Cameron
@ 2022-10-31 19:51       ` Ira Weiny
  0 siblings, 0 replies; 11+ messages in thread
From: Ira Weiny @ 2022-10-31 19:51 UTC (permalink / raw)
  To: Jonathan Cameron, Gregory Price
  Cc: qemu-devel, Michael Tsirkin, Ben Widawsky, linux-cxl,
	Huai-Cheng Kuo, Chris Browy, linuxarm

On Tue, Oct 25, 2022 at 11:56:17AM +0100, Jonathan Cameron wrote:
> On Mon, 24 Oct 2022 13:42:54 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Fri, Oct 14, 2022 at 04:10:44PM +0100, Jonathan Cameron wrote:
> > > From: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > > 
> > > The CDAT can be specified in two ways. One is to add ",cdat=<filename>"
> > > in "-device cxl-type3"'s command option. The file is required to provide
> > > the whole CDAT table in binary mode. The other is to use the default
> > > that provides some 'reasonable' numbers based on type of memory and
> > > size.
> > > 
> > > The DOE capability supporting CDAT is added to hw/mem/cxl_type3.c with
> > > capability offset 0x190. The config read/write to this capability range
> > > can be generated in the OS to request the CDAT data.
> > > 
> > > Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> > > Signed-off-by: Chris Browy <cbrowy@avery-design.com>
> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > >   
> > 
> > In reviewing this patch under a debug kernel, I'm discovering some
> > warnings / tracebacks that I want to get a sanity check on.  They appear
> > to be warnings, but i'm not 100% sure what to make of them.
> > 
> > I get the following stack traces during boot (on the x86 machine).
> > 
> > Removing the type-3 device from the configuration prevents the traceback
> > from occurring, suggesting it's something to do with that code in
> > particular (which tracks with the patch here)
> 
> Thanks Gregory,
> 
> We have an INIT_WORK() in pci_doe_submit_task()
> that in the pci_doe_discovery() call is based a work structure that is
> on the stack.  As such should be INIT_WORK_ONSTACK()
> 
> This is a little tricky becaues there is no particular reason to assume
> all struct pci_doe_task instances will be on the stack though they are
> today.  We don't really want to break the layering as would be necessary
> to have this init at the locations where we otherwise initialize the
> containing structure.
> 
> My first thought is add an 'onstack' bool to either the pci_doe_submit_task()
> or perhaps better would be to add it to the pci_doe_task() and have
> macros like DECLARE_CDAT_DOE_TASK_ONSTACK() set it appropriately so we
> can call the right INIT_WORK_*() variant later.
> 
> Ira / others, which way to go to fix this?

Yes Jonathan is on the right track here.  Though I was confused why no one had
ever seen this till now.  I see it was because I never ran with the
CONFIG_DEBUG_OBJECTS_WORK option.

While we could have a declaration macro.  I think the best solution is to
separate the task from the internal implementation; see patch below.  I was
never fully happy with the idea of having the work struct in the user visible
task object anyway.

> 
> I'll also reply to the last version of that series to make sure people see
> this discussion.

Thanks I've been looking for time to look at this series and have missed it.
So the ping over there helped!  :-D

Ira

From 9e16d5e399723412acbaec1bb9be807d5e5bf7fc Mon Sep 17 00:00:00 2001
From: Ira Weiny <ira.weiny@intel.com>
Date: Mon, 31 Oct 2022 11:31:30 -0700
Subject: [RFC PATCH] PCI/doe: Fix work struct declaration

The callers of pci_doe_submit_task() allocate the pci_doe_task on the
stack.  This causes the work structure to be allocated on the stack
without pci_doe_submit_task() knowing.  Work item initialization needs
to be done with either INIT_WORK_ONSTACK() or INIT_WORK() depending on
how the work item is allocated.

Jonathan suggested creating doe task allocation macros such as
DECLARE_CDAT_DOE_TASK_ONSTACK().  This would work, however, having the
work structure be part of pci_doe_task seems like a layering violation.

Create an internal struct pci_doe_work which represents the work and use
this for the work queue.

RFC because I'm wondering if a gfp_t flags parameter should be added to
pci_doe_submit_task() or not.

[1] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m88a7f50dcce52f30c8bf5c3dcc06fa9843b54a2d

Reported-by: Gregory Price <gregory.price@memverge.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/pci/doe.c       | 56 +++++++++++++++++++++++++++--------------
 include/linux/pci-doe.h |  4 ---
 2 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index e402f05068a5..6a9fa57e2cac 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -29,6 +29,12 @@
 #define PCI_DOE_FLAG_CANCEL	0
 #define PCI_DOE_FLAG_DEAD	1
 
+struct pci_doe_work {
+	struct pci_doe_task *task;
+	struct work_struct work;
+	struct pci_doe_mb *doe_mb;
+};
+
 /**
  * struct pci_doe_mb - State for a single DOE mailbox
  *
@@ -214,9 +220,9 @@ static void signal_task_complete(struct pci_doe_task *task, int rv)
 	task->complete(task);
 }
 
-static void signal_task_abort(struct pci_doe_task *task, int rv)
+static void signal_task_abort(struct pci_doe_mb *doe_mb, struct pci_doe_task *task,
+			      int rv)
 {
-	struct pci_doe_mb *doe_mb = task->doe_mb;
 	struct pci_dev *pdev = doe_mb->pdev;
 
 	if (pci_doe_abort(doe_mb)) {
@@ -233,9 +239,10 @@ static void signal_task_abort(struct pci_doe_task *task, int rv)
 
 static void doe_statemachine_work(struct work_struct *work)
 {
-	struct pci_doe_task *task = container_of(work, struct pci_doe_task,
-						 work);
-	struct pci_doe_mb *doe_mb = task->doe_mb;
+	struct pci_doe_work *doe_work = container_of(work, struct pci_doe_work,
+						     work);
+	struct pci_doe_task *task = doe_work->task;
+	struct pci_doe_mb *doe_mb = doe_work->doe_mb;
 	struct pci_dev *pdev = doe_mb->pdev;
 	int offset = doe_mb->cap_offset;
 	unsigned long timeout_jiffies;
@@ -244,7 +251,7 @@ static void doe_statemachine_work(struct work_struct *work)
 
 	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags)) {
 		signal_task_complete(task, -EIO);
-		return;
+		goto free_work;
 	}
 
 	/* Send request */
@@ -260,8 +267,8 @@ static void doe_statemachine_work(struct work_struct *work)
 		if (rc == -EBUSY)
 			dev_err_ratelimited(&pdev->dev, "[%x] busy detected; another entity is sending conflicting requests\n",
 					    offset);
-		signal_task_abort(task, rc);
-		return;
+		signal_task_abort(doe_mb, task, rc);
+		goto free_work;
 	}
 
 	timeout_jiffies = jiffies + PCI_DOE_TIMEOUT;
@@ -269,30 +276,32 @@ static void doe_statemachine_work(struct work_struct *work)
 retry_resp:
 	pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
 	if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
-		signal_task_abort(task, -EIO);
-		return;
+		signal_task_abort(doe_mb, task, -EIO);
+		goto free_work;
 	}
 
 	if (!FIELD_GET(PCI_DOE_STATUS_DATA_OBJECT_READY, val)) {
 		if (time_after(jiffies, timeout_jiffies)) {
-			signal_task_abort(task, -EIO);
-			return;
+			signal_task_abort(doe_mb, task, -EIO);
+			goto free_work;
 		}
 		rc = pci_doe_wait(doe_mb, PCI_DOE_POLL_INTERVAL);
 		if (rc) {
-			signal_task_abort(task, rc);
-			return;
+			signal_task_abort(doe_mb, task, rc);
+			goto free_work;
 		}
 		goto retry_resp;
 	}
 
 	rc  = pci_doe_recv_resp(doe_mb, task);
 	if (rc < 0) {
-		signal_task_abort(task, rc);
-		return;
+		signal_task_abort(doe_mb, task, rc);
+		goto free_work;
 	}
 
 	signal_task_complete(task, rc);
+free_work:
+	kfree(doe_work);
 }
 
 static void pci_doe_task_complete(struct pci_doe_task *task)
@@ -510,10 +519,14 @@ EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
  *
  * Excess data will be discarded.
  *
+ * Context: non-atomic; allocates with GFP_KERNEL
+ *
  * RETURNS: 0 when task has been successfully queued, -ERRNO on error
  */
 int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 {
+	struct pci_doe_work *work;
+
 	if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type))
 		return -EINVAL;
 
@@ -528,9 +541,14 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
 		return -EIO;
 
-	task->doe_mb = doe_mb;
-	INIT_WORK(&task->work, doe_statemachine_work);
-	queue_work(doe_mb->work_queue, &task->work);
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	work->task = task;
+	work->doe_mb = doe_mb;
+	INIT_WORK(&work->work, doe_statemachine_work);
+	queue_work(doe_mb->work_queue, &work->work);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_doe_submit_task);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index ed9b4df792b8..7c573cdbf17b 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -52,10 +52,6 @@ struct pci_doe_task {
 	int rv;
 	void (*complete)(struct pci_doe_task *task);
 	void *private;
-
-	/* No need for the user to initialize these fields */
-	struct work_struct work;
-	struct pci_doe_mb *doe_mb;
 };
 
 /**
-- 
2.37.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-10-31 19:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-14 15:10 [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
2022-10-14 15:10 ` [PATCH v9 1/5] hw/pci: PCIe Data Object Exchange emulation Jonathan Cameron
2022-10-14 15:10 ` [PATCH v9 2/5] hw/mem/cxl-type3: Add MSIX support Jonathan Cameron
2022-10-14 15:10 ` [PATCH v9 3/5] hw/cxl/cdat: CXL CDAT Data Object Exchange implementation Jonathan Cameron
2022-10-14 15:10 ` [PATCH v9 4/5] hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange Jonathan Cameron
2022-10-24 17:42   ` Gregory Price
2022-10-25 10:56     ` Jonathan Cameron
2022-10-31 19:51       ` Ira Weiny
2022-10-14 15:10 ` [PATCH v9 5/5] hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE Jonathan Cameron
2022-10-25 22:06 ` [PATCH v9 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2 Jonathan Cameron
2022-10-25 22:30   ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).