linux-cxl.vger.kernel.org archive mirror
* [PATCH v3 00/16] Collection of DOE material
@ 2023-02-10 20:25 Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian Lukas Wunner
                   ` (15 more replies)
  0 siblings, 16 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Collection of DOE material, v3:

Migrate to synchronous API, create mailboxes in PCI core instead of
in drivers, relax restrictions on request/response size.


One major change since v2 is the behavior on device removal:
Previously, ongoing DOE exchanges were canceled before driver unbinding
so that a driver wouldn't wait for an ongoing DOE exchange to time out.
That's the right thing to do for surprise removal, but not for orderly
removal via sysfs or the Attention Button, so canceling is now confined
to the former.

Another major change is a new patch at the end of the series
to rightsize CXL CDAT response allocation (per Jonathan's request).

A number of issues in CXL CDAT retrieval caught my eye.
Some of them look like potential vulnerabilities.
I'm adding 4 patches at the beginning of the series to fix them.

Ira kindly tested the series again and verified that the CDAT exposed
in sysfs remains identical.


Changes v2 -> v3:

* [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header
  [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries
  [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length
  * Newly added patch in v3

* [PATCH v3 05/16] PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
  * Explain in commit message what the long-term fix implemented by
    a subsequent commit is (Jonathan)

* [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing
  * Newly added patch in v3

* [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management
  * Call pci_doe_flush_mb() from pci_doe_destroy_mb() to simplify
    error paths (Jonathan)
  * Explain in commit message why xa_destroy() is reordered in
    error path of the new pci_doe_create_mb() (Jonathan)

* [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  * Don't cancel ongoing DOE exchanges in pci_stop_dev() so that
    drivers may perform DOE in their ->remove() hooks
  * Instead cancel ongoing DOE exchanges on surprise removal in
    pci_dev_set_disconnected()
  * Emit error message in pci_doe_init() if mailbox creation fails (Ira)
  * Explain in commit message that pci_find_doe_mailbox() can later
    be amended with pci_find_next_doe_mailbox() (Jonathan)

* [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size
  * Fix byte order of last dword on big endian arches (Ira)
  * Explain in commit message and kerneldoc that arbitrary-sized
    payloads are not stipulated by the spec, but are merely a
    convenience for the caller (Bjorn, Jonathan)
  * Add code comment that "remainder" in pci_doe_recv_resp() signifies
    the number of data bytes in the last payload dword (Ira)

* [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation
  * Newly added patch in v3 on popular request (Jonathan)


Link to v2:

https://lore.kernel.org/all/cover.1674468099.git.lukas@wunner.de/


Lukas Wunner (16):
  cxl/pci: Fix CDAT retrieval on big endian
  cxl/pci: Handle truncated CDAT header
  cxl/pci: Handle truncated CDAT entries
  cxl/pci: Handle excessive CDAT length
  PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
  PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
  PCI/DOE: Provide synchronous API and use it internally
  cxl/pci: Use synchronous API for DOE
  PCI/DOE: Make asynchronous API private
  PCI/DOE: Deduplicate mailbox flushing
  PCI/DOE: Allow mailbox creation without devres management
  PCI/DOE: Create mailboxes on device enumeration
  cxl/pci: Use CDAT DOE mailbox created by PCI core
  PCI/DOE: Make mailbox creation API private
  PCI/DOE: Relax restrictions on request and response size
  cxl/pci: Rightsize CDAT response allocation

 .clang-format           |   1 -
 drivers/cxl/core/pci.c  | 125 ++++++---------
 drivers/cxl/cxl.h       |   3 +-
 drivers/cxl/cxlmem.h    |   3 -
 drivers/cxl/cxlpci.h    |  14 ++
 drivers/cxl/pci.c       |  49 ------
 drivers/cxl/port.c      |   2 +-
 drivers/pci/doe.c       | 340 ++++++++++++++++++++++++++++++----------
 drivers/pci/pci.h       |  12 ++
 drivers/pci/probe.c     |   1 +
 drivers/pci/remove.c    |   1 +
 include/linux/pci-doe.h |  62 +-------
 include/linux/pci.h     |   3 +
 13 files changed, 345 insertions(+), 271 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-11  0:22   ` Dan Williams
                     ` (2 more replies)
  2023-02-10 20:25 ` [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header Lukas Wunner
                   ` (14 subsequent siblings)
  15 siblings, 3 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

The CDAT exposed in sysfs differs between little endian and big endian
arches:  On big endian, every 4 bytes are byte-swapped.

PCI Configuration Space is little endian (PCI r3.0 sec 6.1).  Accessors
such as pci_read_config_dword() implicitly swap bytes on big endian.
That way, the macros in include/uapi/linux/pci_regs.h work regardless of
the arch's endianness.  For an example of implicit byte-swapping, see
ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
(Load Word Byte-Reverse Indexed).

DOE Read/Write Data Mailbox Registers are unlike other registers in
Configuration Space in that they contain or receive a 4 byte portion of
an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
They need to be copied to or from the request/response buffer verbatim.
So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
byte-swapping.

The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
implicit byte-swapping.  Byte-swap requests after constructing them with
those macros and byte-swap responses before parsing them.

Change the request and response type to __le32 to avoid sparse warnings.

Fixes: c97006046c79 ("cxl/port: Read CDAT table")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.0+
---
 Changes v2 -> v3:
 * Newly added patch in v3

 drivers/cxl/core/pci.c  | 12 ++++++------
 drivers/pci/doe.c       | 13 ++++++++-----
 include/linux/pci-doe.h |  8 ++++++--
 3 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 57764e9cd19d..d3cf1d9d67d4 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -480,7 +480,7 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport)
 	return NULL;
 }
 
-#define CDAT_DOE_REQ(entry_handle)					\
+#define CDAT_DOE_REQ(entry_handle) cpu_to_le32				\
 	(FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE,			\
 		    CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) |		\
 	 FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE,			\
@@ -493,8 +493,8 @@ static void cxl_doe_task_complete(struct pci_doe_task *task)
 }
 
 struct cdat_doe_task {
-	u32 request_pl;
-	u32 response_pl[32];
+	__le32 request_pl;
+	__le32 response_pl[32];
 	struct completion c;
 	struct pci_doe_task task;
 };
@@ -531,7 +531,7 @@ static int cxl_cdat_get_length(struct device *dev,
 	if (t.task.rv < sizeof(u32))
 		return -EIO;
 
-	*length = t.response_pl[1];
+	*length = le32_to_cpu(t.response_pl[1]);
 	dev_dbg(dev, "CDAT length %zu\n", *length);
 
 	return 0;
@@ -548,7 +548,7 @@ static int cxl_cdat_read_table(struct device *dev,
 	do {
 		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
 		size_t entry_dw;
-		u32 *entry;
+		__le32 *entry;
 		int rc;
 
 		rc = pci_doe_submit_task(cdat_doe, &t.task);
@@ -563,7 +563,7 @@ static int cxl_cdat_read_table(struct device *dev,
 
 		/* Get the CXL table access header entry handle */
 		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
-					 t.response_pl[0]);
+					 le32_to_cpu(t.response_pl[0]));
 		entry = t.response_pl + 1;
 		entry_dw = t.task.rv / sizeof(u32);
 		/* Skip Header */
diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 66d9ab288646..69efa9a250b9 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
 					  length));
 	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)
 		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
-				       task->request_pl[i]);
+				       le32_to_cpu(task->request_pl[i]));
 
 	pci_doe_write_ctrl(doe_mb, PCI_DOE_CTRL_GO);
 
@@ -198,8 +198,8 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas
 	payload_length = min(length, task->response_pl_sz / sizeof(u32));
 	/* Read the rest of the response payload */
 	for (i = 0; i < payload_length; i++) {
-		pci_read_config_dword(pdev, offset + PCI_DOE_READ,
-				      &task->response_pl[i]);
+		pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
+		task->response_pl[i] = cpu_to_le32(val);
 		/* Prior to the last ack, ensure Data Object Ready */
 		if (i == (payload_length - 1) && !pci_doe_data_obj_ready(doe_mb))
 			return -EIO;
@@ -322,15 +322,17 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
 	struct pci_doe_task task = {
 		.prot.vid = PCI_VENDOR_ID_PCI_SIG,
 		.prot.type = PCI_DOE_PROTOCOL_DISCOVERY,
-		.request_pl = &request_pl,
+		.request_pl = (__le32 *)&request_pl,
 		.request_pl_sz = sizeof(request_pl),
-		.response_pl = &response_pl,
+		.response_pl = (__le32 *)&response_pl,
 		.response_pl_sz = sizeof(response_pl),
 		.complete = pci_doe_task_complete,
 		.private = &c,
 	};
 	int rc;
 
+	cpu_to_le32s(&request_pl);
+
 	rc = pci_doe_submit_task(doe_mb, &task);
 	if (rc < 0)
 		return rc;
@@ -340,6 +342,7 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
 	if (task.rv != sizeof(response_pl))
 		return -EIO;
 
+	le32_to_cpus(&response_pl);
 	*vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl);
 	*protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL,
 			      response_pl);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index ed9b4df792b8..43765eaf2342 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -34,6 +34,10 @@ struct pci_doe_mb;
  * @work: Used internally by the mailbox
  * @doe_mb: Used internally by the mailbox
  *
+ * Payloads are treated as opaque byte streams which are transmitted verbatim,
+ * without byte-swapping.  If payloads contain little-endian register values,
+ * the caller is responsible for conversion with cpu_to_le32() / le32_to_cpu().
+ *
  * The payload sizes and rv are specified in bytes with the following
  * restrictions concerning the protocol.
  *
@@ -45,9 +49,9 @@ struct pci_doe_mb;
  */
 struct pci_doe_task {
 	struct pci_doe_protocol prot;
-	u32 *request_pl;
+	__le32 *request_pl;
 	size_t request_pl_sz;
-	u32 *response_pl;
+	__le32 *response_pl;
 	size_t response_pl_sz;
 	int rv;
 	void (*complete)(struct pci_doe_task *task);
-- 
2.39.1



* [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-11  0:40   ` Dan Williams
                     ` (2 more replies)
  2023-02-10 20:25 ` [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries Lukas Wunner
                   ` (13 subsequent siblings)
  15 siblings, 3 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

cxl_cdat_get_length() only checks whether the DOE response size is
sufficient for the Table Access response header (1 dword), but not the
succeeding CDAT header (1 dword length plus other fields).

It thus returns whatever uninitialized memory happens to be on the stack
if a truncated DOE response with only 1 dword was received.  Fix it.

Fixes: c97006046c79 ("cxl/port: Read CDAT table")
Reported-by: Ming Li <ming4.li@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.0+
---
 Changes v2 -> v3:
 * Newly added patch in v3

 drivers/cxl/core/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index d3cf1d9d67d4..11a85b3a9a0b 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -528,7 +528,7 @@ static int cxl_cdat_get_length(struct device *dev,
 		return rc;
 	}
 	wait_for_completion(&t.c);
-	if (t.task.rv < sizeof(u32))
+	if (t.task.rv < 2 * sizeof(u32))
 		return -EIO;
 
 	*length = le32_to_cpu(t.response_pl[1]);
-- 
2.39.1



* [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-11  0:50   ` Dan Williams
  2023-02-14 11:30   ` Jonathan Cameron
  2023-02-10 20:25 ` [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length Lukas Wunner
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

If truncated CDAT entries are received from a device, the concatenation
of those entries constitutes a corrupt CDAT, yet is happily exposed to
user space.

Avoid by verifying response lengths and erroring out if truncation is
detected.

The last CDAT entry may still be truncated despite the checks introduced
herein if the length in the CDAT header is too small.  However, that is
easily detectable by user space because it reaches EOF prematurely.
A subsequent commit which rightsizes the CDAT response allocation closes
that remaining loophole.

The two lines introduced here that exceed 80 chars are shortened to
less than 80 chars by a subsequent commit, which migrates to a
synchronous DOE API and replaces "t.task.rv" with "rc".

The existing acpi_cdat_header and acpi_table_cdat struct definitions
provided by ACPICA cannot be used because they do not employ __le16 or
__le32 types.  I believe that cannot be changed because those types are
Linux-specific and ACPI is specified for little endian platforms only,
hence doesn't care about endianness.  So duplicate the structs.

Fixes: c97006046c79 ("cxl/port: Read CDAT table")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.0+
---
 Changes v2 -> v3:
 * Newly added patch in v3

 drivers/cxl/core/pci.c | 13 +++++++++----
 drivers/cxl/cxlpci.h   | 14 ++++++++++++++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 11a85b3a9a0b..a3fb6bd68d17 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -547,8 +547,8 @@ static int cxl_cdat_read_table(struct device *dev,
 
 	do {
 		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
+		struct cdat_entry_header *entry;
 		size_t entry_dw;
-		__le32 *entry;
 		int rc;
 
 		rc = pci_doe_submit_task(cdat_doe, &t.task);
@@ -557,14 +557,19 @@ static int cxl_cdat_read_table(struct device *dev,
 			return rc;
 		}
 		wait_for_completion(&t.c);
-		/* 1 DW header + 1 DW data min */
-		if (t.task.rv < (2 * sizeof(u32)))
+
+		/* 1 DW Table Access Response Header + CDAT entry */
+		entry = (struct cdat_entry_header *)(t.response_pl + 1);
+		if ((entry_handle == 0 &&
+		     t.task.rv != sizeof(u32) + sizeof(struct cdat_header)) ||
+		    (entry_handle > 0 &&
+		     (t.task.rv < sizeof(u32) + sizeof(struct cdat_entry_header) ||
+		      t.task.rv != sizeof(u32) + le16_to_cpu(entry->length))))
 			return -EIO;
 
 		/* Get the CXL table access header entry handle */
 		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
 					 le32_to_cpu(t.response_pl[0]));
-		entry = t.response_pl + 1;
 		entry_dw = t.task.rv / sizeof(u32);
 		/* Skip Header */
 		entry_dw -= 1;
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 920909791bb9..104ad2b72516 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -62,6 +62,20 @@ enum cxl_regloc_type {
 	CXL_REGLOC_RBI_TYPES
 };
 
+struct cdat_header {
+	__le32 length;
+	u8 revision;
+	u8 checksum;
+	u8 reserved[6];
+	__le32 sequence;
+} __packed;
+
+struct cdat_entry_header {
+	u8 type;
+	u8 reserved;
+	__le16 length;
+} __packed;
+
 int devm_cxl_port_enumerate_dports(struct cxl_port *port);
 struct cxl_dev_state;
 int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
-- 
2.39.1



* [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (2 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-11  1:04   ` Dan Williams
  2023-02-14 11:33   ` Jonathan Cameron
  2023-02-10 20:25 ` [PATCH v3 05/16] PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y Lukas Wunner
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

If the length in the CDAT header is larger than the combined length of
the header and all table entries, then the CDAT exposed to user space
contains trailing null bytes.

Not every consumer may be able to handle that.  Per Postel's robustness
principle, "be liberal in what you accept" and silently reduce the
cached length to avoid exposing those null bytes.

Fixes: c97006046c79 ("cxl/port: Read CDAT table")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v6.0+
---
 Changes v2 -> v3:
 * Newly added patch in v3

 drivers/cxl/core/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index a3fb6bd68d17..c37c41d7acb6 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -582,6 +582,9 @@ static int cxl_cdat_read_table(struct device *dev,
 		}
 	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
 
+	/* Length in CDAT header may exceed concatenation of CDAT entries */
+	cdat->length -= length;
+
 	return 0;
 }
 
-- 
2.39.1



* [PATCH v3 05/16] PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (3 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 06/16] PCI/DOE: Fix memory leak " Lukas Wunner
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Gregory Price reports a WARN splat with CONFIG_DEBUG_OBJECTS=y upon CXL
probing because pci_doe_submit_task() invokes INIT_WORK() instead of
INIT_WORK_ONSTACK() for a work_struct that was allocated on the stack.

All callers of pci_doe_submit_task() allocate the work_struct on the
stack, so replace INIT_WORK() with INIT_WORK_ONSTACK() as a backportable
short-term fix.

The long-term fix implemented by a subsequent commit is to move to a
synchronous API which allocates the work_struct internally in the DOE
library.

Stacktrace for posterity:

WARNING: CPU: 0 PID: 23 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 pci_doe_submit_task+0x5d/0xd0
 pci_doe_discovery+0xb4/0x100
 pcim_doe_create_mb+0x219/0x290
 cxl_pci_probe+0x192/0x430
 local_pci_probe+0x41/0x80
 pci_device_probe+0xb3/0x220
 really_probe+0xde/0x380
 __driver_probe_device+0x78/0x170
 driver_probe_device+0x1f/0x90
 __driver_attach_async_helper+0x5c/0xe0
 async_run_entry_fn+0x30/0x130
 process_one_work+0x294/0x5b0

Fixes: 9d24322e887b ("PCI/DOE: Add DOE mailbox support functions")
Link: https://lore.kernel.org/linux-cxl/Y1bOniJliOFszvIK@memverge.com/
Reported-by: Gregory Price <gregory.price@memverge.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: stable@vger.kernel.org # v6.0+
---
 Changes v2 -> v3:
 * Explain in commit message what the long-term fix implemented by
   a subsequent commit is (Jonathan)

 drivers/pci/doe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 69efa9a250b9..c20ca62a8c9d 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -523,6 +523,8 @@ EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
  * task->complete will be called when the state machine is done processing this
  * task.
  *
+ * @task must be allocated on the stack.
+ *
  * Excess data will be discarded.
  *
  * RETURNS: 0 when task has been successfully queued, -ERRNO on error
@@ -544,7 +546,7 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 		return -EIO;
 
 	task->doe_mb = doe_mb;
-	INIT_WORK(&task->work, doe_statemachine_work);
+	INIT_WORK_ONSTACK(&task->work, doe_statemachine_work);
 	queue_work(doe_mb->work_queue, &task->work);
 	return 0;
 }
-- 
2.39.1



* [PATCH v3 06/16] PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (4 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 05/16] PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-11  1:06   ` Dan Williams
  2023-03-01  1:51   ` Davidlohr Bueso
  2023-02-10 20:25 ` [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally Lukas Wunner
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

After a pci_doe_task completes, its work_struct needs to be destroyed
to avoid a memory leak with CONFIG_DEBUG_OBJECTS=y.

Fixes: 9d24322e887b ("PCI/DOE: Add DOE mailbox support functions")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: stable@vger.kernel.org # v6.0+
---
 drivers/pci/doe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index c20ca62a8c9d..6cf0600a38aa 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -224,6 +224,7 @@ static void signal_task_complete(struct pci_doe_task *task, int rv)
 {
 	task->rv = rv;
 	task->complete(task);
+	destroy_work_on_stack(&task->work);
 }
 
 static void signal_task_abort(struct pci_doe_task *task, int rv)
-- 
2.39.1



* [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (5 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 06/16] PCI/DOE: Fix memory leak " Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-15  1:45   ` Li, Ming
  2023-02-28 18:58   ` Davidlohr Bueso
  2023-02-10 20:25 ` [PATCH v3 08/16] cxl/pci: Use synchronous API for DOE Lukas Wunner
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

The DOE API only allows asynchronous exchanges and forces callers to
provide a completion callback.  Yet all existing callers only perform
synchronous exchanges.  Upcoming commits for CMA (Component Measurement
and Authentication, PCIe r6.0 sec 6.31) likewise require only
synchronous DOE exchanges.

Provide a synchronous pci_doe() API call which builds on the internal
asynchronous machinery.

Convert the internal pci_doe_discovery() to the new call.

The new API allows submission of const-declared requests, necessitating
the addition of a const qualifier in struct pci_doe_task.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
 drivers/pci/doe.c       | 69 ++++++++++++++++++++++++++++++++---------
 include/linux/pci-doe.h |  6 +++-
 2 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 6cf0600a38aa..d2edae8a32ac 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -319,28 +319,17 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
 	u32 request_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX,
 				    *index);
 	u32 response_pl;
-	DECLARE_COMPLETION_ONSTACK(c);
-	struct pci_doe_task task = {
-		.prot.vid = PCI_VENDOR_ID_PCI_SIG,
-		.prot.type = PCI_DOE_PROTOCOL_DISCOVERY,
-		.request_pl = (__le32 *)&request_pl,
-		.request_pl_sz = sizeof(request_pl),
-		.response_pl = (__le32 *)&response_pl,
-		.response_pl_sz = sizeof(response_pl),
-		.complete = pci_doe_task_complete,
-		.private = &c,
-	};
 	int rc;
 
 	cpu_to_le32s(&request_pl);
 
-	rc = pci_doe_submit_task(doe_mb, &task);
+	rc = pci_doe(doe_mb, PCI_VENDOR_ID_PCI_SIG, PCI_DOE_PROTOCOL_DISCOVERY,
+		     &request_pl, sizeof(request_pl),
+		     &response_pl, sizeof(response_pl));
 	if (rc < 0)
 		return rc;
 
-	wait_for_completion(&c);
-
-	if (task.rv != sizeof(response_pl))
+	if (rc != sizeof(response_pl))
 		return -EIO;
 
 	le32_to_cpus(&response_pl);
@@ -552,3 +541,53 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pci_doe_submit_task);
+
+/**
+ * pci_doe() - Perform Data Object Exchange
+ *
+ * @doe_mb: DOE Mailbox
+ * @vendor: Vendor ID
+ * @type: Data Object Type
+ * @request: Request payload
+ * @request_sz: Size of request payload (bytes)
+ * @response: Response payload
+ * @response_sz: Size of response payload (bytes)
+ *
+ * Submit @request to @doe_mb and store the @response.
+ * The DOE exchange is performed synchronously and may therefore sleep.
+ *
+ * Payloads are treated as opaque byte streams which are transmitted verbatim,
+ * without byte-swapping.  If payloads contain little-endian register values,
+ * the caller is responsible for conversion with cpu_to_le32() / le32_to_cpu().
+ *
+ * RETURNS: Length of received response or negative errno.
+ * Received data in excess of @response_sz is discarded.
+ * The length may be smaller than @response_sz and the caller
+ * is responsible for checking that.
+ */
+int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
+	    const void *request, size_t request_sz,
+	    void *response, size_t response_sz)
+{
+	DECLARE_COMPLETION_ONSTACK(c);
+	struct pci_doe_task task = {
+		.prot.vid = vendor,
+		.prot.type = type,
+		.request_pl = request,
+		.request_pl_sz = request_sz,
+		.response_pl = response,
+		.response_pl_sz = response_sz,
+		.complete = pci_doe_task_complete,
+		.private = &c,
+	};
+	int rc;
+
+	rc = pci_doe_submit_task(doe_mb, &task);
+	if (rc)
+		return rc;
+
+	wait_for_completion(&c);
+
+	return task.rv;
+}
+EXPORT_SYMBOL_GPL(pci_doe);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index 43765eaf2342..5dcd54f892e5 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -49,7 +49,7 @@ struct pci_doe_mb;
  */
 struct pci_doe_task {
 	struct pci_doe_protocol prot;
-	__le32 *request_pl;
+	const __le32 *request_pl;
 	size_t request_pl_sz;
 	__le32 *response_pl;
 	size_t response_pl_sz;
@@ -78,4 +78,8 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset);
 bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type);
 int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task);
 
+int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
+	    const void *request, size_t request_sz,
+	    void *response, size_t response_sz);
+
 #endif
-- 
2.39.1



* [PATCH v3 08/16] cxl/pci: Use synchronous API for DOE
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (6 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 09/16] PCI/DOE: Make asynchronous API private Lukas Wunner
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

A synchronous API for DOE has just been introduced.  Convert CXL CDAT
retrieval over to it.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/core/pci.c | 66 ++++++++++++++----------------------------
 1 file changed, 22 insertions(+), 44 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c37c41d7acb6..09a9fa67d12a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -487,51 +487,26 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport)
 		    CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) |		\
 	 FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle)))
 
-static void cxl_doe_task_complete(struct pci_doe_task *task)
-{
-	complete(task->private);
-}
-
-struct cdat_doe_task {
-	__le32 request_pl;
-	__le32 response_pl[32];
-	struct completion c;
-	struct pci_doe_task task;
-};
-
-#define DECLARE_CDAT_DOE_TASK(req, cdt)                       \
-struct cdat_doe_task cdt = {                                  \
-	.c = COMPLETION_INITIALIZER_ONSTACK(cdt.c),           \
-	.request_pl = req,				      \
-	.task = {                                             \
-		.prot.vid = PCI_DVSEC_VENDOR_ID_CXL,        \
-		.prot.type = CXL_DOE_PROTOCOL_TABLE_ACCESS, \
-		.request_pl = &cdt.request_pl,                \
-		.request_pl_sz = sizeof(cdt.request_pl),      \
-		.response_pl = cdt.response_pl,               \
-		.response_pl_sz = sizeof(cdt.response_pl),    \
-		.complete = cxl_doe_task_complete,            \
-		.private = &cdt.c,                            \
-	}                                                     \
-}
-
 static int cxl_cdat_get_length(struct device *dev,
 			       struct pci_doe_mb *cdat_doe,
 			       size_t *length)
 {
-	DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(0), t);
+	__le32 request = CDAT_DOE_REQ(0);
+	__le32 response[32];
 	int rc;
 
-	rc = pci_doe_submit_task(cdat_doe, &t.task);
+	rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
+		     CXL_DOE_PROTOCOL_TABLE_ACCESS,
+		     &request, sizeof(request),
+		     &response, sizeof(response));
 	if (rc < 0) {
-		dev_err(dev, "DOE submit failed: %d", rc);
+		dev_err(dev, "DOE failed: %d", rc);
 		return rc;
 	}
-	wait_for_completion(&t.c);
-	if (t.task.rv < 2 * sizeof(u32))
+	if (rc < 2 * sizeof(u32))
 		return -EIO;
 
-	*length = le32_to_cpu(t.response_pl[1]);
+	*length = le32_to_cpu(response[1]);
 	dev_dbg(dev, "CDAT length %zu\n", *length);
 
 	return 0;
@@ -546,31 +521,34 @@ static int cxl_cdat_read_table(struct device *dev,
 	int entry_handle = 0;
 
 	do {
-		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
+		__le32 request = CDAT_DOE_REQ(entry_handle);
 		struct cdat_entry_header *entry;
+		__le32 response[32];
 		size_t entry_dw;
 		int rc;
 
-		rc = pci_doe_submit_task(cdat_doe, &t.task);
+		rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
+			     CXL_DOE_PROTOCOL_TABLE_ACCESS,
+			     &request, sizeof(request),
+			     &response, sizeof(response));
 		if (rc < 0) {
-			dev_err(dev, "DOE submit failed: %d", rc);
+			dev_err(dev, "DOE failed: %d", rc);
 			return rc;
 		}
-		wait_for_completion(&t.c);
 
 		/* 1 DW Table Access Response Header + CDAT entry */
-		entry = (struct cdat_entry_header *)(t.response_pl + 1);
+		entry = (struct cdat_entry_header *)(response + 1);
 		if ((entry_handle == 0 &&
-		     t.task.rv != sizeof(u32) + sizeof(struct cdat_header)) ||
+		     rc != sizeof(u32) + sizeof(struct cdat_header)) ||
 		    (entry_handle > 0 &&
-		     (t.task.rv < sizeof(u32) + sizeof(struct cdat_entry_header) ||
-		      t.task.rv != sizeof(u32) + le16_to_cpu(entry->length))))
+		     (rc < sizeof(u32) + sizeof(struct cdat_entry_header) ||
+		      rc != sizeof(u32) + le16_to_cpu(entry->length))))
 			return -EIO;
 
 		/* Get the CXL table access header entry handle */
 		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
-					 le32_to_cpu(t.response_pl[0]));
-		entry_dw = t.task.rv / sizeof(u32);
+					 le32_to_cpu(response[0]));
+		entry_dw = rc / sizeof(u32);
 		/* Skip Header */
 		entry_dw -= 1;
 		entry_dw = min(length / sizeof(u32), entry_dw);
-- 
2.39.1




* [PATCH v3 09/16] PCI/DOE: Make asynchronous API private
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (7 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 08/16] cxl/pci: Use synchronous API for DOE Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-15  1:48   ` Li, Ming
  2023-02-10 20:25 ` [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing Lukas Wunner
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

A synchronous API for DOE has just been introduced.  CXL (the only
in-tree DOE user so far) was converted to use it instead of the
asynchronous API.

Consequently, pci_doe_submit_task() as well as the pci_doe_task struct
are only used internally, so make them private.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/doe.c       | 45 ++++++++++++++++++++++++++++++++++++--
 include/linux/pci-doe.h | 48 -----------------------------------------
 2 files changed, 43 insertions(+), 50 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index d2edae8a32ac..afb53bc1b4aa 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -56,6 +56,47 @@ struct pci_doe_mb {
 	unsigned long flags;
 };
 
+struct pci_doe_protocol {
+	u16 vid;
+	u8 type;
+};
+
+/**
+ * struct pci_doe_task - represents a single query/response
+ *
+ * @prot: DOE Protocol
+ * @request_pl: The request payload
+ * @request_pl_sz: Size of the request payload (bytes)
+ * @response_pl: The response payload
+ * @response_pl_sz: Size of the response payload (bytes)
+ * @rv: Return value.  Length of received response or error (bytes)
+ * @complete: Called when task is complete
+ * @private: Private data for the consumer
+ * @work: Used internally by the mailbox
+ * @doe_mb: Used internally by the mailbox
+ *
+ * The payload sizes and rv are specified in bytes with the following
+ * restrictions concerning the protocol.
+ *
+ *	1) The request_pl_sz must be a multiple of double words (4 bytes)
+ *	2) The response_pl_sz must be >= a single double word (4 bytes)
+ *	3) rv is returned as bytes but it will be a multiple of double words
+ */
+struct pci_doe_task {
+	struct pci_doe_protocol prot;
+	const __le32 *request_pl;
+	size_t request_pl_sz;
+	__le32 *response_pl;
+	size_t response_pl_sz;
+	int rv;
+	void (*complete)(struct pci_doe_task *task);
+	void *private;
+
+	/* initialized by pci_doe_submit_task() */
+	struct work_struct work;
+	struct pci_doe_mb *doe_mb;
+};
+
 static int pci_doe_wait(struct pci_doe_mb *doe_mb, unsigned long timeout)
 {
 	if (wait_event_timeout(doe_mb->wq,
@@ -519,7 +560,8 @@ EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
  *
  * RETURNS: 0 when task has been successfully queued, -ERRNO on error
  */
-int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
+static int pci_doe_submit_task(struct pci_doe_mb *doe_mb,
+			       struct pci_doe_task *task)
 {
 	if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type))
 		return -EINVAL;
@@ -540,7 +582,6 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 	queue_work(doe_mb->work_queue, &task->work);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(pci_doe_submit_task);
 
 /**
  * pci_doe() - Perform Data Object Exchange
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index 5dcd54f892e5..7f16749c6aa3 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -13,55 +13,8 @@
 #ifndef LINUX_PCI_DOE_H
 #define LINUX_PCI_DOE_H
 
-struct pci_doe_protocol {
-	u16 vid;
-	u8 type;
-};
-
 struct pci_doe_mb;
 
-/**
- * struct pci_doe_task - represents a single query/response
- *
- * @prot: DOE Protocol
- * @request_pl: The request payload
- * @request_pl_sz: Size of the request payload (bytes)
- * @response_pl: The response payload
- * @response_pl_sz: Size of the response payload (bytes)
- * @rv: Return value.  Length of received response or error (bytes)
- * @complete: Called when task is complete
- * @private: Private data for the consumer
- * @work: Used internally by the mailbox
- * @doe_mb: Used internally by the mailbox
- *
- * Payloads are treated as opaque byte streams which are transmitted verbatim,
- * without byte-swapping.  If payloads contain little-endian register values,
- * the caller is responsible for conversion with cpu_to_le32() / le32_to_cpu().
- *
- * The payload sizes and rv are specified in bytes with the following
- * restrictions concerning the protocol.
- *
- *	1) The request_pl_sz must be a multiple of double words (4 bytes)
- *	2) The response_pl_sz must be >= a single double word (4 bytes)
- *	3) rv is returned as bytes but it will be a multiple of double words
- *
- * NOTE there is no need for the caller to initialize work or doe_mb.
- */
-struct pci_doe_task {
-	struct pci_doe_protocol prot;
-	const __le32 *request_pl;
-	size_t request_pl_sz;
-	__le32 *response_pl;
-	size_t response_pl_sz;
-	int rv;
-	void (*complete)(struct pci_doe_task *task);
-	void *private;
-
-	/* No need for the user to initialize these fields */
-	struct work_struct work;
-	struct pci_doe_mb *doe_mb;
-};
-
 /**
  * pci_doe_for_each_off - Iterate each DOE capability
  * @pdev: struct pci_dev to iterate
@@ -76,7 +29,6 @@ struct pci_doe_task {
 
 struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset);
 bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type);
-int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task);
 
 int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
 	    const void *request, size_t request_sz,
-- 
2.39.1



* [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (8 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 09/16] PCI/DOE: Make asynchronous API private Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-14 11:36   ` Jonathan Cameron
  2023-02-15  5:07   ` Li, Ming
  2023-02-10 20:25 ` [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management Lukas Wunner
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

When a DOE mailbox is torn down, its workqueue is flushed once in
pci_doe_flush_mb() through a call to flush_workqueue() and subsequently
flushed once more in pci_doe_destroy_workqueue() through a call to
destroy_workqueue().

Deduplicate by dropping flush_workqueue() from pci_doe_flush_mb().

Rename pci_doe_flush_mb() to pci_doe_cancel_tasks() to more aptly
describe what it now does.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 Changes v2 -> v3:
 * Newly added patch in v3

 drivers/pci/doe.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index afb53bc1b4aa..291cd7a46a39 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -429,7 +429,7 @@ static void pci_doe_destroy_workqueue(void *mb)
 	destroy_workqueue(doe_mb->work_queue);
 }
 
-static void pci_doe_flush_mb(void *mb)
+static void pci_doe_cancel_tasks(void *mb)
 {
 	struct pci_doe_mb *doe_mb = mb;
 
@@ -439,9 +439,6 @@ static void pci_doe_flush_mb(void *mb)
 	/* Cancel an in progress work item, if necessary */
 	set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags);
 	wake_up(&doe_mb->wq);
-
-	/* Flush all work items */
-	flush_workqueue(doe_mb->work_queue);
 }
 
 /**
@@ -498,9 +495,9 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset)
 
 	/*
 	 * The state machine and the mailbox should be in sync now;
-	 * Set up mailbox flush prior to using the mailbox to query protocols.
+	 * Set up cancel tasks prior to using the mailbox to query protocols.
 	 */
-	rc = devm_add_action_or_reset(dev, pci_doe_flush_mb, doe_mb);
+	rc = devm_add_action_or_reset(dev, pci_doe_cancel_tasks, doe_mb);
 	if (rc)
 		return ERR_PTR(rc);
 
-- 
2.39.1



* [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (9 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-14 11:51   ` Jonathan Cameron
  2023-02-15  5:17   ` Li, Ming
  2023-02-10 20:25 ` [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration Lukas Wunner
                   ` (4 subsequent siblings)
  15 siblings, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

DOE mailbox creation is currently only possible through a devres-managed
API.  The lifetime of mailboxes thus ends with driver unbinding.

An upcoming commit will create DOE mailboxes upon device enumeration by
the PCI core.  Their lifetime shall not be limited by a driver.

Therefore rework pcim_doe_create_mb() into the non-devres-managed
pci_doe_create_mb().  Add pci_doe_destroy_mb() for mailbox destruction
on device removal.

Provide a devres-managed wrapper under the existing pcim_doe_create_mb()
name.

The error path of pcim_doe_create_mb() previously called xa_destroy() if
alloc_ordered_workqueue() failed.  That's unnecessary because the xarray
is still empty at that point.  It doesn't need to be destroyed until
it's been populated by pci_doe_cache_protocols().  Arrange the error
path of the new pci_doe_create_mb() accordingly.

pci_doe_cancel_tasks() is no longer used as a callback for
devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
instead of a generic void pointer.
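
The split performed here follows a common kernel pattern: an unmanaged create/destroy pair plus a thin managed wrapper that registers destroy as a devres action. A toy model (all toy_* names and the one-slot "devres" list are invented for illustration; the real wrapper uses devm_add_action_or_reset()):

```c
#include <assert.h>
#include <stdlib.h>

struct toy_mb { int usable; };

/* Unmanaged pair, analogous to pci_doe_create_mb()/pci_doe_destroy_mb() */
static struct toy_mb *toy_create_mb(void)
{
	struct toy_mb *mb = malloc(sizeof(*mb));

	if (mb)
		mb->usable = 1;
	return mb;
}

static void toy_destroy_mb(void *ptr)
{
	free(ptr);
}

/* Minimal stand-in for devres: remember one (action, data) pair to be
 * run at "driver unbind" time. */
struct toy_devres {
	void (*action)(void *);
	void *data;
};

static int toy_devm_add_action(struct toy_devres *dr,
			       void (*action)(void *), void *data)
{
	dr->action = action;
	dr->data = data;
	return 0;
}

/* Managed wrapper, analogous to the reworked pcim_doe_create_mb():
 * create unmanaged, then tie destruction to the device's lifetime. */
static struct toy_mb *toy_managed_create_mb(struct toy_devres *dr)
{
	struct toy_mb *mb = toy_create_mb();

	if (!mb)
		return NULL;
	if (toy_devm_add_action(dr, toy_destroy_mb, mb)) {
		toy_destroy_mb(mb);
		return NULL;
	}
	return mb;
}
```

The PCI core can now call the unmanaged pair directly, while drivers keep the managed entry point with unchanged semantics.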

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 Changes v2 -> v3:
 * Call pci_doe_flush_mb() from pci_doe_destroy_mb() to simplify
   error paths (Jonathan)
 * Explain in commit message why xa_destroy() is reordered in
   error path of the new pci_doe_create_mb() (Jonathan)

 drivers/pci/doe.c | 103 +++++++++++++++++++++++++++++-----------------
 1 file changed, 66 insertions(+), 37 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 291cd7a46a39..2bc202b64b6a 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -37,7 +37,7 @@
  *
  * This state is used to manage a single DOE mailbox capability.  All fields
  * should be considered opaque to the consumers and the structure passed into
- * the helpers below after being created by devm_pci_doe_create()
+ * the helpers below after being created by pci_doe_create_mb().
  *
  * @pdev: PCI device this mailbox belongs to
  * @cap_offset: Capability offset
@@ -415,24 +415,8 @@ static int pci_doe_cache_protocols(struct pci_doe_mb *doe_mb)
 	return 0;
 }
 
-static void pci_doe_xa_destroy(void *mb)
+static void pci_doe_cancel_tasks(struct pci_doe_mb *doe_mb)
 {
-	struct pci_doe_mb *doe_mb = mb;
-
-	xa_destroy(&doe_mb->prots);
-}
-
-static void pci_doe_destroy_workqueue(void *mb)
-{
-	struct pci_doe_mb *doe_mb = mb;
-
-	destroy_workqueue(doe_mb->work_queue);
-}
-
-static void pci_doe_cancel_tasks(void *mb)
-{
-	struct pci_doe_mb *doe_mb = mb;
-
 	/* Stop all pending work items from starting */
 	set_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags);
 
@@ -442,7 +426,7 @@ static void pci_doe_cancel_tasks(void *mb)
 }
 
 /**
- * pcim_doe_create_mb() - Create a DOE mailbox object
+ * pci_doe_create_mb() - Create a DOE mailbox object
  *
  * @pdev: PCI device to create the DOE mailbox for
  * @cap_offset: Offset of the DOE mailbox
@@ -453,24 +437,20 @@ static void pci_doe_cancel_tasks(void *mb)
  * RETURNS: created mailbox object on success
  *	    ERR_PTR(-errno) on failure
  */
-struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset)
+static struct pci_doe_mb *pci_doe_create_mb(struct pci_dev *pdev,
+					    u16 cap_offset)
 {
 	struct pci_doe_mb *doe_mb;
-	struct device *dev = &pdev->dev;
 	int rc;
 
-	doe_mb = devm_kzalloc(dev, sizeof(*doe_mb), GFP_KERNEL);
+	doe_mb = kzalloc(sizeof(*doe_mb), GFP_KERNEL);
 	if (!doe_mb)
 		return ERR_PTR(-ENOMEM);
 
 	doe_mb->pdev = pdev;
 	doe_mb->cap_offset = cap_offset;
 	init_waitqueue_head(&doe_mb->wq);
-
 	xa_init(&doe_mb->prots);
-	rc = devm_add_action(dev, pci_doe_xa_destroy, doe_mb);
-	if (rc)
-		return ERR_PTR(rc);
 
 	doe_mb->work_queue = alloc_ordered_workqueue("%s %s DOE [%x]", 0,
 						dev_driver_string(&pdev->dev),
@@ -479,36 +459,85 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset)
 	if (!doe_mb->work_queue) {
 		pci_err(pdev, "[%x] failed to allocate work queue\n",
 			doe_mb->cap_offset);
-		return ERR_PTR(-ENOMEM);
+		rc = -ENOMEM;
+		goto err_free;
 	}
-	rc = devm_add_action_or_reset(dev, pci_doe_destroy_workqueue, doe_mb);
-	if (rc)
-		return ERR_PTR(rc);
 
 	/* Reset the mailbox by issuing an abort */
 	rc = pci_doe_abort(doe_mb);
 	if (rc) {
 		pci_err(pdev, "[%x] failed to reset mailbox with abort command : %d\n",
 			doe_mb->cap_offset, rc);
-		return ERR_PTR(rc);
+		goto err_destroy_wq;
 	}
 
 	/*
 	 * The state machine and the mailbox should be in sync now;
-	 * Set up cancel tasks prior to using the mailbox to query protocols.
+	 * Use the mailbox to query protocols.
 	 */
-	rc = devm_add_action_or_reset(dev, pci_doe_cancel_tasks, doe_mb);
-	if (rc)
-		return ERR_PTR(rc);
-
 	rc = pci_doe_cache_protocols(doe_mb);
 	if (rc) {
 		pci_err(pdev, "[%x] failed to cache protocols : %d\n",
 			doe_mb->cap_offset, rc);
-		return ERR_PTR(rc);
+		goto err_cancel;
 	}
 
 	return doe_mb;
+
+err_cancel:
+	pci_doe_cancel_tasks(doe_mb);
+	xa_destroy(&doe_mb->prots);
+err_destroy_wq:
+	destroy_workqueue(doe_mb->work_queue);
+err_free:
+	kfree(doe_mb);
+	return ERR_PTR(rc);
+}
+
+/**
+ * pci_doe_destroy_mb() - Destroy a DOE mailbox object
+ *
+ * @ptr: Pointer to DOE mailbox
+ *
+ * Destroy all internal data structures created for the DOE mailbox.
+ */
+static void pci_doe_destroy_mb(void *ptr)
+{
+	struct pci_doe_mb *doe_mb = ptr;
+
+	pci_doe_cancel_tasks(doe_mb);
+	xa_destroy(&doe_mb->prots);
+	destroy_workqueue(doe_mb->work_queue);
+	kfree(doe_mb);
+}
+
+/**
+ * pcim_doe_create_mb() - Create a DOE mailbox object
+ *
+ * @pdev: PCI device to create the DOE mailbox for
+ * @cap_offset: Offset of the DOE mailbox
+ *
+ * Create a single mailbox object to manage the mailbox protocol at the
+ * cap_offset specified.  The mailbox will automatically be destroyed on
+ * driver unbinding from @pdev.
+ *
+ * RETURNS: created mailbox object on success
+ *	    ERR_PTR(-errno) on failure
+ */
+struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset)
+{
+	struct pci_doe_mb *doe_mb;
+	int rc;
+
+	doe_mb = pci_doe_create_mb(pdev, cap_offset);
+	if (IS_ERR(doe_mb))
+		return doe_mb;
+
+	rc = devm_add_action_or_reset(&pdev->dev, pci_doe_destroy_mb, doe_mb);
+	if (rc)
+		return ERR_PTR(rc);
+
+	return doe_mb;
 }
 EXPORT_SYMBOL_GPL(pcim_doe_create_mb);
 
-- 
2.39.1



* [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (10 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-15  2:07   ` Li, Ming
  2023-02-28  1:18   ` Alexey Kardashevskiy
  2023-02-10 20:25 ` [PATCH v3 13/16] cxl/pci: Use CDAT DOE mailbox created by PCI core Lukas Wunner
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Currently a DOE instance cannot be shared by multiple drivers because
each driver creates its own pci_doe_mb struct for a given DOE instance.
For the same reason a DOE instance cannot be shared between the PCI core
and a driver.

Overcome this limitation by creating mailboxes in the PCI core on device
enumeration.

Provide a pci_find_doe_mailbox() API call to allow drivers to get a
pci_doe_mb for a given (pci_dev, vendor, protocol) triple.  This API is
modeled after pci_find_capability() and can later be amended with a
pci_find_next_doe_mailbox() call to iterate over all mailboxes of a
given pci_dev which support a specific protocol.

On removal, destroy the mailboxes in pci_destroy_dev(), after the driver
is unbound.  This allows drivers to use DOE in their ->remove() hook.

On surprise removal, cancel ongoing DOE exchanges and prevent new ones
from being scheduled.  Thereby ensure that a hot-removed device doesn't
needlessly wait for a running exchange to time out.
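
The lookup a driver performs through the new API boils down to iterating the device's mailboxes and matching on (vendor, type). A toy model (the toy_* structs and the flat array replacing the xarray are invented for illustration; 0x1e98 is the CXL DVSEC vendor ID and 2 the Table Access protocol used later in this series):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: each mailbox advertises a single (vendor, type) pair
 * instead of the protocol cache kept in drivers/pci/doe.c. */
struct toy_doe_mb {
	uint16_t vid;
	uint8_t type;
};

struct toy_pci_dev {
	struct toy_doe_mb *mbs;
	size_t nr_mbs;
};

/* Mirrors pci_find_doe_mailbox(): return the first mailbox of the
 * device supporting the given protocol, or NULL if none does. */
static struct toy_doe_mb *toy_find_doe_mailbox(struct toy_pci_dev *pdev,
					       uint16_t vendor, uint8_t type)
{
	size_t i;

	for (i = 0; i < pdev->nr_mbs; i++)
		if (pdev->mbs[i].vid == vendor && pdev->mbs[i].type == type)
			return &pdev->mbs[i];
	return NULL;
}
```

A future pci_find_next_doe_mailbox() would simply resume this scan from the previously returned entry.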

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 Changes v2 -> v3:
 * Don't cancel ongoing DOE exchanges in pci_stop_dev() so that
   drivers may perform DOE in their ->remove() hooks
 * Instead cancel ongoing DOE exchanges on surprise removal in
   pci_dev_set_disconnected()
 * Emit error message in pci_doe_init() if mailbox creation fails (Ira)
 * Explain in commit message that pci_find_doe_mailbox() can later
   be amended with pci_find_next_doe_mailbox() (Jonathan)

 drivers/pci/doe.c       | 73 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h       | 12 +++++++
 drivers/pci/probe.c     |  1 +
 drivers/pci/remove.c    |  1 +
 include/linux/pci-doe.h |  2 ++
 include/linux/pci.h     |  3 ++
 6 files changed, 92 insertions(+)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 2bc202b64b6a..bf32875d27da 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -20,6 +20,8 @@
 #include <linux/pci-doe.h>
 #include <linux/workqueue.h>
 
+#include "pci.h"
+
 #define PCI_DOE_PROTOCOL_DISCOVERY 0
 
 /* Timeout of 1 second from 6.30.2 Operation, PCI Spec r6.0 */
@@ -658,3 +660,74 @@ int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
 	return task.rv;
 }
 EXPORT_SYMBOL_GPL(pci_doe);
+
+/**
+ * pci_find_doe_mailbox() - Find Data Object Exchange mailbox
+ *
+ * @pdev: PCI device
+ * @vendor: Vendor ID
+ * @type: Data Object Type
+ *
+ * Find first DOE mailbox of a PCI device which supports the given protocol.
+ *
+ * RETURNS: Pointer to the DOE mailbox or NULL if none was found.
+ */
+struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
+					u8 type)
+{
+	struct pci_doe_mb *doe_mb;
+	unsigned long index;
+
+	xa_for_each(&pdev->doe_mbs, index, doe_mb)
+		if (pci_doe_supports_prot(doe_mb, vendor, type))
+			return doe_mb;
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_find_doe_mailbox);
+
+void pci_doe_init(struct pci_dev *pdev)
+{
+	struct pci_doe_mb *doe_mb;
+	u16 offset = 0;
+	int rc;
+
+	xa_init(&pdev->doe_mbs);
+
+	while ((offset = pci_find_next_ext_capability(pdev, offset,
+						      PCI_EXT_CAP_ID_DOE))) {
+		doe_mb = pci_doe_create_mb(pdev, offset);
+		if (IS_ERR(doe_mb)) {
+			pci_err(pdev, "[%x] failed to create mailbox: %ld\n",
+				offset, PTR_ERR(doe_mb));
+			continue;
+		}
+
+		rc = xa_insert(&pdev->doe_mbs, offset, doe_mb, GFP_KERNEL);
+		if (rc) {
+			pci_err(pdev, "[%x] failed to insert mailbox: %d\n",
+				offset, rc);
+			pci_doe_destroy_mb(doe_mb);
+		}
+	}
+}
+
+void pci_doe_destroy(struct pci_dev *pdev)
+{
+	struct pci_doe_mb *doe_mb;
+	unsigned long index;
+
+	xa_for_each(&pdev->doe_mbs, index, doe_mb)
+		pci_doe_destroy_mb(doe_mb);
+
+	xa_destroy(&pdev->doe_mbs);
+}
+
+void pci_doe_disconnected(struct pci_dev *pdev)
+{
+	struct pci_doe_mb *doe_mb;
+	unsigned long index;
+
+	xa_for_each(&pdev->doe_mbs, index, doe_mb)
+		pci_doe_cancel_tasks(doe_mb);
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 8f5d4bd5b410..065ca9743ec1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -318,6 +318,16 @@ struct pci_sriov {
 	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
 };
 
+#ifdef CONFIG_PCI_DOE
+void pci_doe_init(struct pci_dev *pdev);
+void pci_doe_destroy(struct pci_dev *pdev);
+void pci_doe_disconnected(struct pci_dev *pdev);
+#else
+static inline void pci_doe_init(struct pci_dev *pdev) { }
+static inline void pci_doe_destroy(struct pci_dev *pdev) { }
+static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
+#endif
+
 /**
  * pci_dev_set_io_state - Set the new error state if possible.
  *
@@ -372,6 +382,8 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 	pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
 	device_unlock(&dev->dev);
 
+	pci_doe_disconnected(dev);
+
 	return 0;
 }
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1779582fb500..65e60ee50489 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2476,6 +2476,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_aer_init(dev);		/* Advanced Error Reporting */
 	pci_dpc_init(dev);		/* Downstream Port Containment */
 	pci_rcec_init(dev);		/* Root Complex Event Collector */
+	pci_doe_init(dev);		/* Data Object Exchange */
 
 	pcie_report_downtraining(dev);
 	pci_init_reset_methods(dev);
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 0145aef1b930..f25acf50879f 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -39,6 +39,7 @@ static void pci_destroy_dev(struct pci_dev *dev)
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);
 
+	pci_doe_destroy(dev);
 	pcie_aspm_exit_link_state(dev);
 	pci_bridge_d3_update(dev);
 	pci_free_resources(dev);
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index 7f16749c6aa3..d6192ee0ac07 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -29,6 +29,8 @@ struct pci_doe_mb;
 
 struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset);
 bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type);
+struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
+					u8 type);
 
 int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
 	    const void *request, size_t request_sz,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3c51cac3890b..b19c2965e384 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -511,6 +511,9 @@ struct pci_dev {
 #endif
 #ifdef CONFIG_PCI_P2PDMA
 	struct pci_p2pdma __rcu *p2pdma;
+#endif
+#ifdef CONFIG_PCI_DOE
+	struct xarray	doe_mbs;	/* Data Object Exchange mailboxes */
 #endif
 	u16		acs_cap;	/* ACS Capability offset */
 	phys_addr_t	rom;		/* Physical address if not from BAR */
-- 
2.39.1



* [PATCH v3 13/16] cxl/pci: Use CDAT DOE mailbox created by PCI core
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (11 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 14/16] PCI/DOE: Make mailbox creation API private Lukas Wunner
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration.

Drop creation of a (duplicate) CDAT DOE mailbox on cxl probing in favor
of the one already created by the PCI core.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/cxl/core/pci.c | 27 +++++------------------
 drivers/cxl/cxlmem.h   |  3 ---
 drivers/cxl/pci.c      | 49 ------------------------------------------
 3 files changed, 5 insertions(+), 74 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 09a9fa67d12a..1b954783b516 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -459,27 +459,6 @@ EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, CXL);
 #define CXL_DOE_TABLE_ACCESS_LAST_ENTRY		0xffff
 #define CXL_DOE_PROTOCOL_TABLE_ACCESS 2
 
-static struct pci_doe_mb *find_cdat_doe(struct device *uport)
-{
-	struct cxl_memdev *cxlmd;
-	struct cxl_dev_state *cxlds;
-	unsigned long index;
-	void *entry;
-
-	cxlmd = to_cxl_memdev(uport);
-	cxlds = cxlmd->cxlds;
-
-	xa_for_each(&cxlds->doe_mbs, index, entry) {
-		struct pci_doe_mb *cur = entry;
-
-		if (pci_doe_supports_prot(cur, PCI_DVSEC_VENDOR_ID_CXL,
-					  CXL_DOE_PROTOCOL_TABLE_ACCESS))
-			return cur;
-	}
-
-	return NULL;
-}
-
 #define CDAT_DOE_REQ(entry_handle) cpu_to_le32				\
 	(FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE,			\
 		    CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) |		\
@@ -577,10 +556,14 @@ void read_cdat_data(struct cxl_port *port)
 	struct pci_doe_mb *cdat_doe;
 	struct device *dev = &port->dev;
 	struct device *uport = port->uport;
+	struct cxl_memdev *cxlmd = to_cxl_memdev(uport);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
 	size_t cdat_length;
 	int rc;
 
-	cdat_doe = find_cdat_doe(uport);
+	cdat_doe = pci_find_doe_mailbox(pdev, PCI_DVSEC_VENDOR_ID_CXL,
+					CXL_DOE_PROTOCOL_TABLE_ACCESS);
 	if (!cdat_doe) {
 		dev_dbg(dev, "No CDAT mailbox\n");
 		return;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index ab138004f644..e1a1b23cf56c 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -227,7 +227,6 @@ struct cxl_endpoint_dvsec_info {
  * @component_reg_phys: register base of component registers
  * @info: Cached DVSEC information about the device.
  * @serial: PCIe Device Serial Number
- * @doe_mbs: PCI DOE mailbox array
  * @mbox_send: @dev specific transport for transmitting mailbox commands
  *
  * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
@@ -264,8 +263,6 @@ struct cxl_dev_state {
 	resource_size_t component_reg_phys;
 	u64 serial;
 
-	struct xarray doe_mbs;
-
 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
 };
 
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 33083a522fd1..f8b8e514a3c6 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -8,7 +8,6 @@
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/pci.h>
-#include <linux/pci-doe.h>
 #include <linux/aer.h>
 #include <linux/io.h>
 #include "cxlmem.h"
@@ -359,52 +358,6 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
 	return rc;
 }
 
-static void cxl_pci_destroy_doe(void *mbs)
-{
-	xa_destroy(mbs);
-}
-
-static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
-{
-	struct device *dev = cxlds->dev;
-	struct pci_dev *pdev = to_pci_dev(dev);
-	u16 off = 0;
-
-	xa_init(&cxlds->doe_mbs);
-	if (devm_add_action(&pdev->dev, cxl_pci_destroy_doe, &cxlds->doe_mbs)) {
-		dev_err(dev, "Failed to create XArray for DOE's\n");
-		return;
-	}
-
-	/*
-	 * Mailbox creation is best effort.  Higher layers must determine if
-	 * the lack of a mailbox for their protocol is a device failure or not.
-	 */
-	pci_doe_for_each_off(pdev, off) {
-		struct pci_doe_mb *doe_mb;
-
-		doe_mb = pcim_doe_create_mb(pdev, off);
-		if (IS_ERR(doe_mb)) {
-			dev_err(dev, "Failed to create MB object for MB @ %x\n",
-				off);
-			continue;
-		}
-
-		if (!pci_request_config_region_exclusive(pdev, off,
-							 PCI_DOE_CAP_SIZEOF,
-							 dev_name(dev)))
-			pci_err(pdev, "Failed to exclude DOE registers\n");
-
-		if (xa_insert(&cxlds->doe_mbs, off, doe_mb, GFP_KERNEL)) {
-			dev_err(dev, "xa_insert failed to insert MB @ %x\n",
-				off);
-			continue;
-		}
-
-		dev_dbg(dev, "Created DOE mailbox @%x\n", off);
-	}
-}
-
 /*
  * Assume that any RCIEP that emits the CXL memory expander class code
  * is an RCD
@@ -469,8 +422,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	cxlds->component_reg_phys = map.resource;
 
-	devm_cxl_pci_create_doe(cxlds);
-
 	rc = cxl_map_component_regs(&pdev->dev, &cxlds->regs.component,
 				    &map, BIT(CXL_CM_CAP_CAP_ID_RAS));
 	if (rc)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 14/16] PCI/DOE: Make mailbox creation API private
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (12 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 13/16] cxl/pci: Use CDAT DOE mailbox created by PCI core Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-15  2:13   ` Li, Ming
  2023-02-10 20:25 ` [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size Lukas Wunner
  2023-02-10 20:25 ` [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation Lukas Wunner
  15 siblings, 1 reply; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration.  CXL (the only in-tree DOE
user so far) has been migrated to use those mailboxes instead of
creating its own.

That leaves pcim_doe_create_mb() and pci_doe_for_each_off() without any
callers, so drop them.

pci_doe_supports_prot() is now only used internally, so declare it
static.

pci_doe_destroy_mb() is no longer used as callback for
devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
instead of a generic void pointer.

Because pci_doe_create_mb() is only called on device enumeration, i.e.
before driver binding, the workqueue name never contains a driver name.
So replace dev_driver_string() with dev_bus_name() when generating the
workqueue name.
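The effect on the generated name can be modeled in user space (the device name and capability offset below are illustrative, and doe_wq_name() is an invented stand-in for the format string passed to alloc_ordered_workqueue()):

```c
#include <stdio.h>

/* Models the "%s %s DOE [%x]" workqueue name after this patch:
 * bus name ("pci") instead of the driver string, which would be
 * empty at enumeration time because no driver is bound yet. */
static void doe_wq_name(char *buf, size_t len, const char *bus_name,
			const char *pci_name, unsigned int cap_offset)
{
	snprintf(buf, len, "%s %s DOE [%x]", bus_name, pci_name, cap_offset);
}
```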

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 .clang-format           |  1 -
 drivers/pci/doe.c       | 41 ++++-------------------------------------
 include/linux/pci-doe.h | 14 --------------
 3 files changed, 4 insertions(+), 52 deletions(-)

diff --git a/.clang-format b/.clang-format
index b62836419ea3..cb1c17c7fcc9 100644
--- a/.clang-format
+++ b/.clang-format
@@ -520,7 +520,6 @@ ForEachMacros:
   - 'of_property_for_each_string'
   - 'of_property_for_each_u32'
   - 'pci_bus_for_each_resource'
-  - 'pci_doe_for_each_off'
   - 'pcl_for_each_chunk'
   - 'pcl_for_each_segment'
   - 'pcm_for_each_format'
diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index bf32875d27da..15e5f9df1bba 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -455,7 +455,7 @@ static struct pci_doe_mb *pci_doe_create_mb(struct pci_dev *pdev,
 	xa_init(&doe_mb->prots);
 
 	doe_mb->work_queue = alloc_ordered_workqueue("%s %s DOE [%x]", 0,
-						dev_driver_string(&pdev->dev),
+						dev_bus_name(&pdev->dev),
 						pci_name(pdev),
 						doe_mb->cap_offset);
 	if (!doe_mb->work_queue) {
@@ -499,50 +499,18 @@ static struct pci_doe_mb *pci_doe_create_mb(struct pci_dev *pdev,
 /**
  * pci_doe_destroy_mb() - Destroy a DOE mailbox object
  *
- * @ptr: Pointer to DOE mailbox
+ * @doe_mb: DOE mailbox
  *
  * Destroy all internal data structures created for the DOE mailbox.
  */
-static void pci_doe_destroy_mb(void *ptr)
+static void pci_doe_destroy_mb(struct pci_doe_mb *doe_mb)
 {
-	struct pci_doe_mb *doe_mb = ptr;
-
 	pci_doe_cancel_tasks(doe_mb);
 	xa_destroy(&doe_mb->prots);
 	destroy_workqueue(doe_mb->work_queue);
 	kfree(doe_mb);
 }
 
-/**
- * pcim_doe_create_mb() - Create a DOE mailbox object
- *
- * @pdev: PCI device to create the DOE mailbox for
- * @cap_offset: Offset of the DOE mailbox
- *
- * Create a single mailbox object to manage the mailbox protocol at the
- * cap_offset specified.  The mailbox will automatically be destroyed on
- * driver unbinding from @pdev.
- *
- * RETURNS: created mailbox object on success
- *	    ERR_PTR(-errno) on failure
- */
-struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset)
-{
-	struct pci_doe_mb *doe_mb;
-	int rc;
-
-	doe_mb = pci_doe_create_mb(pdev, cap_offset);
-	if (IS_ERR(doe_mb))
-		return doe_mb;
-
-	rc = devm_add_action_or_reset(&pdev->dev, pci_doe_destroy_mb, doe_mb);
-	if (rc)
-		return ERR_PTR(rc);
-
-	return doe_mb;
-}
-EXPORT_SYMBOL_GPL(pcim_doe_create_mb);
-
 /**
  * pci_doe_supports_prot() - Return if the DOE instance supports the given
  *			     protocol
@@ -552,7 +520,7 @@ EXPORT_SYMBOL_GPL(pcim_doe_create_mb);
  *
  * RETURNS: True if the DOE mailbox supports the protocol specified
  */
-bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type)
+static bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type)
 {
 	unsigned long index;
 	void *entry;
@@ -567,7 +535,6 @@ bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type)
 
 	return false;
 }
-EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
 
 /**
  * pci_doe_submit_task() - Submit a task to be processed by the state machine
diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
index d6192ee0ac07..1f14aed4354b 100644
--- a/include/linux/pci-doe.h
+++ b/include/linux/pci-doe.h
@@ -15,20 +15,6 @@
 
 struct pci_doe_mb;
 
-/**
- * pci_doe_for_each_off - Iterate each DOE capability
- * @pdev: struct pci_dev to iterate
- * @off: u16 of config space offset of each mailbox capability found
- */
-#define pci_doe_for_each_off(pdev, off) \
-	for (off = pci_find_next_ext_capability(pdev, off, \
-					PCI_EXT_CAP_ID_DOE); \
-		off > 0; \
-		off = pci_find_next_ext_capability(pdev, off, \
-					PCI_EXT_CAP_ID_DOE))
-
-struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset);
-bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type);
 struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
 					u8 type);
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (13 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 14/16] PCI/DOE: Make mailbox creation API private Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-15  5:05   ` Li, Ming
  2023-02-10 20:25 ` [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation Lukas Wunner
  15 siblings, 1 reply; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

An upcoming user of DOE is CMA (Component Measurement and Authentication,
PCIe r6.0 sec 6.31).

It builds on SPDM (Security Protocol and Data Model):
https://www.dmtf.org/dsp/DSP0274

SPDM message sizes are not always a multiple of dwords.  To transport
them over DOE without using bounce buffers, allow sending requests and
receiving responses whose final dword is only partially populated.

To be clear, PCIe r6.0 sec 6.30.1 specifies the Data Object Header 2
"Length" in dwords and pci_doe_send_req() and pci_doe_recv_resp()
read/write dwords.  So from a spec point of view, DOE is still specified
in dwords and allowing non-dword request/response buffers is merely for
the convenience of callers.
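The dword rounding and zero padding described above can be sketched in isolation (a user-space model, not kernel code; the helper names are invented):

```c
#include <stdint.h>
#include <string.h>

/* Dwords needed to transport nbytes of payload: a trailing partial
 * dword still occupies a full dword on the wire. */
static size_t doe_payload_dwords(size_t nbytes)
{
	return (nbytes + sizeof(uint32_t) - 1) / sizeof(uint32_t);
}

/* Build the final payload dword the way pci_doe_send_req() does after
 * this patch: copy the remaining 1-4 bytes into a zeroed dword.
 * Returns the number of data bytes in that last dword. */
static size_t doe_last_dword(const uint8_t *payload, size_t nbytes,
			     uint32_t *out)
{
	size_t remainder = nbytes % sizeof(uint32_t);

	if (!remainder)
		remainder = sizeof(uint32_t);
	*out = 0;
	memcpy(out, payload + nbytes - remainder, remainder);
	return remainder;
}
```

A 5-byte request thus occupies 2 dwords, with the 3 pad bytes of the second dword zeroed, so no bounce buffer is needed on the caller's side.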

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 Changes v2 -> v3:
 * Fix byte order of last dword on big endian arches (Ira)
 * Explain in commit message and kerneldoc that arbitrary-sized
   payloads are not stipulated by the spec, but merely for
   convenience of the caller (Bjorn, Jonathan)
 * Add code comment that "remainder" in pci_doe_recv_resp() signifies
   the number of data bytes in the last payload dword (Ira)

 drivers/pci/doe.c | 74 +++++++++++++++++++++++++++++++----------------
 1 file changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 15e5f9df1bba..14252f2fa955 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -76,13 +76,6 @@ struct pci_doe_protocol {
  * @private: Private data for the consumer
  * @work: Used internally by the mailbox
  * @doe_mb: Used internally by the mailbox
- *
- * The payload sizes and rv are specified in bytes with the following
- * restrictions concerning the protocol.
- *
- *	1) The request_pl_sz must be a multiple of double words (4 bytes)
- *	2) The response_pl_sz must be >= a single double word (4 bytes)
- *	3) rv is returned as bytes but it will be a multiple of double words
  */
 struct pci_doe_task {
 	struct pci_doe_protocol prot;
@@ -153,7 +146,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
 {
 	struct pci_dev *pdev = doe_mb->pdev;
 	int offset = doe_mb->cap_offset;
-	size_t length;
+	size_t length, remainder;
 	u32 val;
 	int i;
 
@@ -171,7 +164,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
 		return -EIO;
 
 	/* Length is 2 DW of header + length of payload in DW */
-	length = 2 + task->request_pl_sz / sizeof(u32);
+	length = 2 + DIV_ROUND_UP(task->request_pl_sz, sizeof(u32));
 	if (length > PCI_DOE_MAX_LENGTH)
 		return -EIO;
 	if (length == PCI_DOE_MAX_LENGTH)
@@ -184,10 +177,21 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
 	pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
 			       FIELD_PREP(PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH,
 					  length));
+
+	/* Write payload */
 	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)
 		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
 				       le32_to_cpu(task->request_pl[i]));
 
+	/* Write last payload dword */
+	remainder = task->request_pl_sz % sizeof(u32);
+	if (remainder) {
+		val = 0;
+		memcpy(&val, &task->request_pl[i], remainder);
+		le32_to_cpus(&val);
+		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE, val);
+	}
+
 	pci_doe_write_ctrl(doe_mb, PCI_DOE_CTRL_GO);
 
 	return 0;
@@ -207,11 +211,11 @@ static bool pci_doe_data_obj_ready(struct pci_doe_mb *doe_mb)
 
 static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
 {
+	size_t length, payload_length, remainder, received;
 	struct pci_dev *pdev = doe_mb->pdev;
 	int offset = doe_mb->cap_offset;
-	size_t length, payload_length;
+	int i = 0;
 	u32 val;
-	int i;
 
 	/* Read the first dword to get the protocol */
 	pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
@@ -238,15 +242,38 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas
 
 	/* First 2 dwords have already been read */
 	length -= 2;
-	payload_length = min(length, task->response_pl_sz / sizeof(u32));
-	/* Read the rest of the response payload */
-	for (i = 0; i < payload_length; i++) {
+	received = task->response_pl_sz;
+	payload_length = DIV_ROUND_UP(task->response_pl_sz, sizeof(u32));
+	remainder = task->response_pl_sz % sizeof(u32);
+
+	/* remainder signifies number of data bytes in last payload dword */
+	if (!remainder)
+		remainder = sizeof(u32);
+
+	if (length < payload_length) {
+		received = length * sizeof(u32);
+		payload_length = length;
+		remainder = sizeof(u32);
+	}
+
+	if (payload_length) {
+		/* Read all payload dwords except the last */
+		for (; i < payload_length - 1; i++) {
+			pci_read_config_dword(pdev, offset + PCI_DOE_READ,
+					      &val);
+			task->response_pl[i] = cpu_to_le32(val);
+			pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
+		}
+
+		/* Read last payload dword */
 		pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
-		task->response_pl[i] = cpu_to_le32(val);
+		cpu_to_le32s(&val);
+		memcpy(&task->response_pl[i], &val, remainder);
 		/* Prior to the last ack, ensure Data Object Ready */
-		if (i == (payload_length - 1) && !pci_doe_data_obj_ready(doe_mb))
+		if (!pci_doe_data_obj_ready(doe_mb))
 			return -EIO;
 		pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
+		i++;
 	}
 
 	/* Flush excess length */
@@ -260,7 +287,7 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas
 	if (FIELD_GET(PCI_DOE_STATUS_ERROR, val))
 		return -EIO;
 
-	return min(length, task->response_pl_sz / sizeof(u32)) * sizeof(u32);
+	return received;
 }
 
 static void signal_task_complete(struct pci_doe_task *task, int rv)
@@ -561,14 +588,6 @@ static int pci_doe_submit_task(struct pci_doe_mb *doe_mb,
 	if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type))
 		return -EINVAL;
 
-	/*
-	 * DOE requests must be a whole number of DW and the response needs to
-	 * be big enough for at least 1 DW
-	 */
-	if (task->request_pl_sz % sizeof(u32) ||
-	    task->response_pl_sz < sizeof(u32))
-		return -EINVAL;
-
 	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
 		return -EIO;
 
@@ -596,6 +615,11 @@ static int pci_doe_submit_task(struct pci_doe_mb *doe_mb,
  * without byte-swapping.  If payloads contain little-endian register values,
  * the caller is responsible for conversion with cpu_to_le32() / le32_to_cpu().
  *
+ * For convenience, arbitrary payload sizes are allowed even though PCIe r6.0
+ * sec 6.30.1 specifies the Data Object Header 2 "Length" in dwords.  The last
+ * (partial) dword is copied with byte granularity and padded with zeroes if
+ * necessary.  Callers are thus relieved from using dword-sized bounce buffers.
+ *
  * RETURNS: Length of received response or negative errno.
  * Received data in excess of @response_sz is discarded.
  * The length may be smaller than @response_sz and the caller
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation
  2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
                   ` (14 preceding siblings ...)
  2023-02-10 20:25 ` [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size Lukas Wunner
@ 2023-02-10 20:25 ` Lukas Wunner
  2023-02-14 13:05   ` Jonathan Cameron
                     ` (2 more replies)
  15 siblings, 3 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-10 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Jonathan notes that cxl_cdat_get_length() and cxl_cdat_read_table()
allocate 32 dwords for the DOE response even though it may be smaller.

In the case of cxl_cdat_get_length(), only the second dword of the
response is of interest (it contains the length).  So reduce the
allocation to 2 dwords and let DOE discard the remainder.

In the case of cxl_cdat_read_table(), a correctly sized allocation for
the full CDAT already exists.  Let DOE write each table entry directly
into that allocation.  There's a snag in that the table entry is
preceded by a Table Access Response Header (1 dword).  Save the last
dword of the previous table entry, let DOE overwrite it with the
header of the next entry and restore it afterwards.

The resulting CDAT is preceded by 4 unavoidable useless bytes.  Increase
the allocation size accordingly and skip these bytes when exposing CDAT
in sysfs.

The buffer overflow check in cxl_cdat_read_table() becomes unnecessary
because the remaining bytes in the allocation are tracked in "length",
which is passed to DOE and limits how many bytes it writes to the
allocation.  Additionally, cxl_cdat_read_table() bails out if the DOE
response is truncated due to insufficient space.
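The save/overwrite/restore scheme described above can be illustrated with a small user-space model (the DOE exchange is stubbed out and all names here are invented, but the pointer arithmetic mirrors the patch):

```c
#include <stdint.h>
#include <string.h>

/* Stub for a DOE Table Access exchange: writes a 1-dword response
 * header followed by the entry payload at dst, returning the number
 * of dwords written. */
static size_t fake_doe_read(uint32_t *dst, const uint32_t *entry,
			    size_t entry_dw)
{
	dst[0] = 0xf00d;		/* Table Access Response Header */
	memcpy(dst + 1, entry, entry_dw * sizeof(uint32_t));
	return entry_dw + 1;
}

/* Read two 2-dword entries back to back: each response header lands
 * on the last dword of the previous entry, which is saved beforehand
 * and restored afterwards.  The buffer keeps one extra leading dword,
 * mirroring the enlarged devm_kzalloc() in read_cdat_data(). */
static void read_entries(uint32_t *table)
{
	const uint32_t entries[2][2] = { { 0x11, 0x22 }, { 0x33, 0x44 } };
	uint32_t *data = table;
	uint32_t saved_dw = 0;

	for (int i = 0; i < 2; i++) {
		size_t n = fake_doe_read(data, entries[i], 2);

		*data = saved_dw;	/* undo header clobbering */
		data += n - 1;		/* keep entry, skip header slot */
		saved_dw = *data;	/* next header will land here */
	}
}
```

After the loop the concatenated entries sit contiguously at table + 1, with only the 4 useless bytes of the very first header slot remaining at table[0].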

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
---
 Changes v2 -> v3:
 * Newly added patch in v3 on popular request (Jonathan)

 drivers/cxl/core/pci.c | 34 ++++++++++++++++++----------------
 drivers/cxl/cxl.h      |  3 ++-
 drivers/cxl/port.c     |  2 +-
 3 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 1b954783b516..70097cc75302 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -471,7 +471,7 @@ static int cxl_cdat_get_length(struct device *dev,
 			       size_t *length)
 {
 	__le32 request = CDAT_DOE_REQ(0);
-	__le32 response[32];
+	__le32 response[2];
 	int rc;
 
 	rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
@@ -495,28 +495,28 @@ static int cxl_cdat_read_table(struct device *dev,
 			       struct pci_doe_mb *cdat_doe,
 			       struct cxl_cdat *cdat)
 {
-	size_t length = cdat->length;
-	u32 *data = cdat->table;
+	size_t length = cdat->length + sizeof(u32);
+	__le32 *data = cdat->table;
 	int entry_handle = 0;
+	__le32 saved_dw = 0;
 
 	do {
 		__le32 request = CDAT_DOE_REQ(entry_handle);
 		struct cdat_entry_header *entry;
-		__le32 response[32];
 		size_t entry_dw;
 		int rc;
 
 		rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
 			     CXL_DOE_PROTOCOL_TABLE_ACCESS,
 			     &request, sizeof(request),
-			     &response, sizeof(response));
+			     data, length);
 		if (rc < 0) {
 			dev_err(dev, "DOE failed: %d", rc);
 			return rc;
 		}
 
 		/* 1 DW Table Access Response Header + CDAT entry */
-		entry = (struct cdat_entry_header *)(response + 1);
+		entry = (struct cdat_entry_header *)(data + 1);
 		if ((entry_handle == 0 &&
 		     rc != sizeof(u32) + sizeof(struct cdat_header)) ||
 		    (entry_handle > 0 &&
@@ -526,21 +526,22 @@ static int cxl_cdat_read_table(struct device *dev,
 
 		/* Get the CXL table access header entry handle */
 		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
-					 le32_to_cpu(response[0]));
+					 le32_to_cpu(data[0]));
 		entry_dw = rc / sizeof(u32);
 		/* Skip Header */
 		entry_dw -= 1;
-		entry_dw = min(length / sizeof(u32), entry_dw);
-		/* Prevent length < 1 DW from causing a buffer overflow */
-		if (entry_dw) {
-			memcpy(data, entry, entry_dw * sizeof(u32));
-			length -= entry_dw * sizeof(u32);
-			data += entry_dw;
-		}
+		/*
+		 * Table Access Response Header overwrote the last DW of
+		 * previous entry, so restore that DW
+		 */
+		*data = saved_dw;
+		length -= entry_dw * sizeof(u32);
+		data += entry_dw;
+		saved_dw = *data;
 	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
 
 	/* Length in CDAT header may exceed concatenation of CDAT entries */
-	cdat->length -= length;
+	cdat->length -= length - sizeof(u32);
 
 	return 0;
 }
@@ -576,7 +577,8 @@ void read_cdat_data(struct cxl_port *port)
 		return;
 	}
 
-	port->cdat.table = devm_kzalloc(dev, cdat_length, GFP_KERNEL);
+	port->cdat.table = devm_kzalloc(dev, cdat_length + sizeof(u32),
+					GFP_KERNEL);
 	if (!port->cdat.table)
 		return;
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1b1cf459ac77..78f5cae5134c 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -494,7 +494,8 @@ struct cxl_pmem_region {
  * @component_reg_phys: component register capability base address (optional)
  * @dead: last ep has been removed, force port re-creation
  * @depth: How deep this port is relative to the root. depth 0 is the root.
- * @cdat: Cached CDAT data
+ * @cdat: Cached CDAT data (@table is preceded by 4 null bytes, these are not
+ *	  included in @length)
  * @cdat_available: Should a CDAT attribute be available in sysfs
  */
 struct cxl_port {
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 5453771bf330..0705343ac5ca 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -95,7 +95,7 @@ static ssize_t CDAT_read(struct file *filp, struct kobject *kobj,
 		return 0;
 
 	return memory_read_from_buffer(buf, count, &offset,
-				       port->cdat.table,
+				       port->cdat.table + sizeof(u32),
 				       port->cdat.length);
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* RE: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-10 20:25 ` [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian Lukas Wunner
@ 2023-02-11  0:22   ` Dan Williams
  2023-02-19 13:03     ` Lukas Wunner
  2023-02-14 11:15   ` Jonathan Cameron
  2023-02-28  2:53   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 62+ messages in thread
From: Dan Williams @ 2023-02-11  0:22 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Lukas Wunner wrote:
> The CDAT exposed in sysfs differs between little endian and big endian
> arches:  On big endian, every 4 bytes are byte-swapped.
> 
> PCI Configuration Space is little endian (PCI r3.0 sec 6.1).  Accessors
> such as pci_read_config_dword() implicitly swap bytes on big endian.
> That way, the macros in include/uapi/linux/pci_regs.h work regardless of
> the arch's endianness.  For an example of implicit byte-swapping, see
> ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
> (Load Word Byte-Reverse Indexed).
> 
> DOE Read/Write Data Mailbox Registers are unlike other registers in
> Configuration Space in that they contain or receive a 4 byte portion of
> an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
> They need to be copied to or from the request/response buffer verbatim.
> So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
> byte-swapping.
> 
> The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
> implicit byte-swapping.  Byte-swap requests after constructing them with
> those macros and byte-swap responses before parsing them.
> 
> Change the request and response type to __le32 to avoid sparse warnings.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+

Good catch, a question below, but either way:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c  | 12 ++++++------
>  drivers/pci/doe.c       | 13 ++++++++-----
>  include/linux/pci-doe.h |  8 ++++++--
>  3 files changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 57764e9cd19d..d3cf1d9d67d4 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -480,7 +480,7 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport)
>  	return NULL;
>  }
>  
> -#define CDAT_DOE_REQ(entry_handle)					\
> +#define CDAT_DOE_REQ(entry_handle) cpu_to_le32				\
>  	(FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE,			\
>  		    CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) |		\
>  	 FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE,			\
> @@ -493,8 +493,8 @@ static void cxl_doe_task_complete(struct pci_doe_task *task)
>  }
>  
>  struct cdat_doe_task {
> -	u32 request_pl;
> -	u32 response_pl[32];
> +	__le32 request_pl;
> +	__le32 response_pl[32];
>  	struct completion c;
>  	struct pci_doe_task task;
>  };
> @@ -531,7 +531,7 @@ static int cxl_cdat_get_length(struct device *dev,
>  	if (t.task.rv < sizeof(u32))
>  		return -EIO;
>  
> -	*length = t.response_pl[1];
> +	*length = le32_to_cpu(t.response_pl[1]);
>  	dev_dbg(dev, "CDAT length %zu\n", *length);
>  
>  	return 0;
> @@ -548,7 +548,7 @@ static int cxl_cdat_read_table(struct device *dev,
>  	do {
>  		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
>  		size_t entry_dw;
> -		u32 *entry;
> +		__le32 *entry;
>  		int rc;
>  
>  		rc = pci_doe_submit_task(cdat_doe, &t.task);
> @@ -563,7 +563,7 @@ static int cxl_cdat_read_table(struct device *dev,
>  
>  		/* Get the CXL table access header entry handle */
>  		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
> -					 t.response_pl[0]);
> +					 le32_to_cpu(t.response_pl[0]));
>  		entry = t.response_pl + 1;
>  		entry_dw = t.task.rv / sizeof(u32);
>  		/* Skip Header */
> diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> index 66d9ab288646..69efa9a250b9 100644
> --- a/drivers/pci/doe.c
> +++ b/drivers/pci/doe.c
> @@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
>  					  length));
>  	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)
>  		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
> -				       task->request_pl[i]);
> +				       le32_to_cpu(task->request_pl[i]));

What do you think about defining:

int pci_doe_write(const struct pci_dev *dev, int where, __le32 val);
int pci_doe_read(const struct pci_dev *dev, int where, __le32 *val);

...local to this file to make it extra clear that DOE transfers are
passing raw byte-streams and not register values as
pci_{write,read}_config_dword() expect.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header
  2023-02-10 20:25 ` [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header Lukas Wunner
@ 2023-02-11  0:40   ` Dan Williams
  2023-02-11  9:34     ` Lukas Wunner
  2023-02-14 11:16   ` Jonathan Cameron
  2023-02-15  1:41   ` Li, Ming
  2 siblings, 1 reply; 62+ messages in thread
From: Dan Williams @ 2023-02-11  0:40 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Lukas Wunner wrote:
> cxl_cdat_get_length() only checks whether the DOE response size is
> sufficient for the Table Access response header (1 dword), but not the
> succeeding CDAT header (1 dword length plus other fields).
> 
> It thus returns whatever uninitialized memory happens to be on the stack
> if a truncated DOE response with only 1 dword was received.  Fix it.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Reported-by: Ming Li <ming4.li@intel.com>
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index d3cf1d9d67d4..11a85b3a9a0b 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -528,7 +528,7 @@ static int cxl_cdat_get_length(struct device *dev,
>  		return rc;
>  	}
>  	wait_for_completion(&t.c);
> -	if (t.task.rv < sizeof(u32))
> +	if (t.task.rv < 2 * sizeof(u32))
>  		return -EIO;

Looks good. I wonder, since this is standard for all data objects, whether
the check should be pushed into the core?

For now this is easier to backport, but a follow-on could push it down a
level.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries
  2023-02-10 20:25 ` [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries Lukas Wunner
@ 2023-02-11  0:50   ` Dan Williams
  2023-02-11 10:56     ` Lukas Wunner
  2023-02-14 11:30   ` Jonathan Cameron
  1 sibling, 1 reply; 62+ messages in thread
From: Dan Williams @ 2023-02-11  0:50 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Lukas Wunner wrote:
> If truncated CDAT entries are received from a device, the concatenation
> of those entries constitutes a corrupt CDAT, yet is happily exposed to
> user space.
> 
> Avoid by verifying response lengths and erroring out if truncation is
> detected.
> 
> The last CDAT entry may still be truncated despite the checks introduced
> herein if the length in the CDAT header is too small.  However, that is
> easily detectable by user space because it reaches EOF prematurely.
> A subsequent commit which rightsizes the CDAT response allocation closes
> that remaining loophole.
> 
> The two lines introduced here which exceed 80 chars are shortened to
> less than 80 chars by a subsequent commit which migrates to a
> synchronous DOE API and replaces "t.task.rv" by "rc".
> 
> The existing acpi_cdat_header and acpi_table_cdat struct definitions
> provided by ACPICA cannot be used because they do not employ __le16 or
> __le32 types.  I believe that cannot be changed because those types are
> Linux-specific and ACPI is specified for little endian platforms only,
> hence doesn't care about endianness.  So duplicate the structs.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c | 13 +++++++++----
>  drivers/cxl/cxlpci.h   | 14 ++++++++++++++
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 11a85b3a9a0b..a3fb6bd68d17 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -547,8 +547,8 @@ static int cxl_cdat_read_table(struct device *dev,
>  
>  	do {
>  		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
> +		struct cdat_entry_header *entry;
>  		size_t entry_dw;
> -		__le32 *entry;
>  		int rc;
>  
>  		rc = pci_doe_submit_task(cdat_doe, &t.task);
> @@ -557,14 +557,19 @@ static int cxl_cdat_read_table(struct device *dev,
>  			return rc;
>  		}
>  		wait_for_completion(&t.c);
> -		/* 1 DW header + 1 DW data min */
> -		if (t.task.rv < (2 * sizeof(u32)))

Ah, I guess that's why the previous check can not be pushed down
further; it's obviated by this more comprehensive check.

> +
> +		/* 1 DW Table Access Response Header + CDAT entry */
> +		entry = (struct cdat_entry_header *)(t.response_pl + 1);
> +		if ((entry_handle == 0 &&
> +		     t.task.rv != sizeof(u32) + sizeof(struct cdat_header)) ||
> +		    (entry_handle > 0 &&
> +		     (t.task.rv < sizeof(u32) + sizeof(struct cdat_entry_header) ||
> +		      t.task.rv != sizeof(u32) + le16_to_cpu(entry->length))))
>  			return -EIO;

Looks correct for catching that the response is large enough to
communicate the length and that the expected length was retrieved.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length
  2023-02-10 20:25 ` [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length Lukas Wunner
@ 2023-02-11  1:04   ` Dan Williams
  2023-02-14 11:33   ` Jonathan Cameron
  1 sibling, 0 replies; 62+ messages in thread
From: Dan Williams @ 2023-02-11  1:04 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Lukas Wunner wrote:
> If the length in the CDAT header is larger than the concatenation of the
> header and all table entries, then the CDAT exposed to user space
> contains trailing null bytes.
> 
> Not every consumer may be able to handle that.  Per Postel's robustness
> principle, "be liberal in what you accept" and silently reduce the
> cached length to avoid exposing those null bytes.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index a3fb6bd68d17..c37c41d7acb6 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -582,6 +582,9 @@ static int cxl_cdat_read_table(struct device *dev,
>  		}
>  	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
>  
> +	/* Length in CDAT header may exceed concatenation of CDAT entries */
> +	cdat->length -= length;
> +

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>


* RE: [PATCH v3 06/16] PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
  2023-02-10 20:25 ` [PATCH v3 06/16] PCI/DOE: Fix memory leak " Lukas Wunner
@ 2023-02-11  1:06   ` Dan Williams
  2023-03-01  1:51   ` Davidlohr Bueso
  1 sibling, 0 replies; 62+ messages in thread
From: Dan Williams @ 2023-02-11  1:06 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Lukas Wunner wrote:
> After a pci_doe_task completes, its work_struct needs to be destroyed
> to avoid a memory leak with CONFIG_DEBUG_OBJECTS=y.
> 
> Fixes: 9d24322e887b ("PCI/DOE: Add DOE mailbox support functions")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: stable@vger.kernel.org # v6.0+
> ---
>  drivers/pci/doe.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> index c20ca62a8c9d..6cf0600a38aa 100644
> --- a/drivers/pci/doe.c
> +++ b/drivers/pci/doe.c
> @@ -224,6 +224,7 @@ static void signal_task_complete(struct pci_doe_task *task, int rv)
>  {
>  	task->rv = rv;
>  	task->complete(task);
> +	destroy_work_on_stack(&task->work);
>  }
>  
>  static void signal_task_abort(struct pci_doe_task *task, int rv)
> -- 
> 2.39.1
> 
> 

For 5 and 6, carried over from the v2:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

1 through 6 seem suitable to go in as fixes at any time.


* Re: [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header
  2023-02-11  0:40   ` Dan Williams
@ 2023-02-11  9:34     ` Lukas Wunner
  0 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-11  9:34 UTC (permalink / raw)
  To: Dan Williams
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Alison Schofield, Vishal Verma, Dave Jiang, Li,
	Ming, Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, Feb 10, 2023 at 04:40:15PM -0800, Dan Williams wrote:
> Lukas Wunner wrote:
> > cxl_cdat_get_length() only checks whether the DOE response size is
> > sufficient for the Table Access response header (1 dword), but not the
> > succeeding CDAT header (1 dword length plus other fields).
> > 
> > It thus returns whatever uninitialized memory happens to be on the stack
> > if a truncated DOE response with only 1 dword was received.  Fix it.
[...]
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -528,7 +528,7 @@ static int cxl_cdat_get_length(struct device *dev,
> >  		return rc;
> >  	}
> >  	wait_for_completion(&t.c);
> > -	if (t.task.rv < sizeof(u32))
> > +	if (t.task.rv < 2 * sizeof(u32))
> >  		return -EIO;
> 
> Looks good.  I wonder, since this is standard for all data objects,
> whether the check should be pushed into the core?

No, "t.task.rv" contains the payload length in bytes of the response
received via DOE.  It doesn't include the DOE header.

I think it is legal to receive an empty response via DOE, so I cannot
push a length check down into the core.

In this case, the payload contains one dword for the Table Access Response
header (CXL r3.0 sec 8.1.11.1), followed by 3 dwords for the CDAT header:

https://uefi.org/sites/default/files/resources/Coherent%20Device%20Attribute%20Table_1.01.pdf

The CDAT header contains the length of the entire CDAT in the first
dword, hence the above-quoted check only verifies that at least two
dwords were received.  It's harmless if the remainder of the CDAT
header is truncated.

Thanks,

Lukas


* Re: [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries
  2023-02-11  0:50   ` Dan Williams
@ 2023-02-11 10:56     ` Lukas Wunner
  0 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-11 10:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Alison Schofield, Vishal Verma, Dave Jiang, Li,
	Ming, Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, Feb 10, 2023 at 04:50:38PM -0800, Dan Williams wrote:
> Lukas Wunner wrote:
> > If truncated CDAT entries are received from a device, the concatenation
> > of those entries constitutes a corrupt CDAT, yet is happily exposed to
> > user space.
> > 
> > Avoid by verifying response lengths and erroring out if truncation is
> > detected.
[...]
> > @@ -557,14 +557,19 @@ static int cxl_cdat_read_table(struct device *dev,
> >  			return rc;
> >  		}
> >  		wait_for_completion(&t.c);
> > -		/* 1 DW header + 1 DW data min */
> > -		if (t.task.rv < (2 * sizeof(u32)))
> 
> Ah, I guess that's why the previous check cannot be pushed down
> further; it's obviated by this more comprehensive check.

The check in the previous patch ("cxl/pci: Handle truncated CDAT header")
fixes response size verification for cxl_cdat_get_length().

The check here however is in a different function, cxl_cdat_read_table().
Previously the function only checked whether the DOE response contained
two dwords.  That's one dword for the Table Access Response header
(CXL r3.0 sec 8.1.11.1) plus one dword for the common header shared
by all CDAT structures.

What can happen is we receive a truncated entry with, say, just
two dwords, and we copy that to the cached CDAT exposed in sysfs.
Obviously that's a corrupt CDAT.  The consumer of the CDAT doesn't
know that the data was truncated and will try to parse the entry
even though the data may actually be a portion of the next entry.

The improved verification here ensures that the received amount of
data corresponds to the length in the entry header.

It is up to the parsing function of each entry to verify whether
that length is correct for that specific entry.  E.g. a DSMAS entry
should contain 24 bytes, so the parsing function needs to verify
that the length in the entry header is actually 24.  (See the e-mail
I sent to Dave a few minutes ago.)

Thanks,

Lukas

> > +		/* 1 DW Table Access Response Header + CDAT entry */
> > +		entry = (struct cdat_entry_header *)(t.response_pl + 1);
> > +		if ((entry_handle == 0 &&
> > +		     t.task.rv != sizeof(u32) + sizeof(struct cdat_header)) ||
> > +		    (entry_handle > 0 &&
> > +		     (t.task.rv < sizeof(u32) + sizeof(struct cdat_entry_header) ||
> > +		      t.task.rv != sizeof(u32) + le16_to_cpu(entry->length))))
> >  			return -EIO;


* Re: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-10 20:25 ` [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian Lukas Wunner
  2023-02-11  0:22   ` Dan Williams
@ 2023-02-14 11:15   ` Jonathan Cameron
  2023-02-14 13:51     ` Lukas Wunner
  2023-02-28  2:53   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 11:15 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, 10 Feb 2023 21:25:01 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> The CDAT exposed in sysfs differs between little endian and big endian
> arches:  On big endian, every 4 bytes are byte-swapped.
> 
> PCI Configuration Space is little endian (PCI r3.0 sec 6.1).  Accessors
> such as pci_read_config_dword() implicitly swap bytes on big endian.
> That way, the macros in include/uapi/linux/pci_regs.h work regardless of
> the arch's endianness.  For an example of implicit byte-swapping, see
> ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
> (Load Word Byte-Reverse Indexed).
> 
> DOE Read/Write Data Mailbox Registers are unlike other registers in
> Configuration Space in that they contain or receive a 4 byte portion of
> an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
> They need to be copied to or from the request/response buffer verbatim.
> So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
> byte-swapping.
> 
> The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
> implicit byte-swapping.  Byte-swap requests after constructing them with
> those macros and byte-swap responses before parsing them.
> 
> Change the request and response type to __le32 to avoid sparse warnings.

I'll start by saying I hate endian issues. Endless headaches.

The type swaps in here are confusing, but after arguing it both ways
with myself this morning, I think you have it right.

In brief, this patch set ensures that internal handling of the payloads
is in le32 chunks by unwinding the implicit conversions that the PCI
config accessors perform on BE platforms.  Thus the data in the
payloads is always in the same order.  Good start.  Hence the binary
read-back of CDAT is little endian (including all fields within it).

Should we reflect that in the type of cdat->table, or at least the
u32 *data pointer in cxl_cdat_read_table()?
A few other trivial simplifications inline.

Jonathan


> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c  | 12 ++++++------
>  drivers/pci/doe.c       | 13 ++++++++-----
>  include/linux/pci-doe.h |  8 ++++++--
>  3 files changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 57764e9cd19d..d3cf1d9d67d4 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -480,7 +480,7 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport)
>  	return NULL;
>  }
>  
> -#define CDAT_DOE_REQ(entry_handle)					\
> +#define CDAT_DOE_REQ(entry_handle) cpu_to_le32				\
>  	(FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE,			\
>  		    CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) |		\
>  	 FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE,			\
> @@ -493,8 +493,8 @@ static void cxl_doe_task_complete(struct pci_doe_task *task)
>  }
>  
>  struct cdat_doe_task {
> -	u32 request_pl;
> -	u32 response_pl[32];
> +	__le32 request_pl;
> +	__le32 response_pl[32];
>  	struct completion c;
>  	struct pci_doe_task task;
>  };
> @@ -531,7 +531,7 @@ static int cxl_cdat_get_length(struct device *dev,
>  	if (t.task.rv < sizeof(u32))
>  		return -EIO;
>  
> -	*length = t.response_pl[1];
> +	*length = le32_to_cpu(t.response_pl[1]);
>  	dev_dbg(dev, "CDAT length %zu\n", *length);
>  
>  	return 0;
> @@ -548,7 +548,7 @@ static int cxl_cdat_read_table(struct device *dev,
>  	do {
>  		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
>  		size_t entry_dw;
> -		u32 *entry;
> +		__le32 *entry;
>  		int rc;
>  
>  		rc = pci_doe_submit_task(cdat_doe, &t.task);
> @@ -563,7 +563,7 @@ static int cxl_cdat_read_table(struct device *dev,
>  
>  		/* Get the CXL table access header entry handle */
>  		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
> -					 t.response_pl[0]);
> +					 le32_to_cpu(t.response_pl[0]));
>  		entry = t.response_pl + 1;
>  		entry_dw = t.task.rv / sizeof(u32);

Given we are manipulating it as __le32, I'd prefer to see that reflected
in the sizeof() calls.

>  		/* Skip Header */
> diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> index 66d9ab288646..69efa9a250b9 100644
> --- a/drivers/pci/doe.c
> +++ b/drivers/pci/doe.c
> @@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
>  					  length));
>  	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)

sizeof(__le32) or sizeof(task->request_pl[0])

>  		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
> -				       task->request_pl[i]);
> +				       le32_to_cpu(task->request_pl[i]));
>  
>  	pci_doe_write_ctrl(doe_mb, PCI_DOE_CTRL_GO);
>  
> @@ -198,8 +198,8 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas
>  	payload_length = min(length, task->response_pl_sz / sizeof(u32));
>  	/* Read the rest of the response payload */
>  	for (i = 0; i < payload_length; i++) {
> -		pci_read_config_dword(pdev, offset + PCI_DOE_READ,
> -				      &task->response_pl[i]);
> +		pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
> +		task->response_pl[i] = cpu_to_le32(val);
>  		/* Prior to the last ack, ensure Data Object Ready */
>  		if (i == (payload_length - 1) && !pci_doe_data_obj_ready(doe_mb))
>  			return -EIO;
> @@ -322,15 +322,17 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
>  	struct pci_doe_task task = {
>  		.prot.vid = PCI_VENDOR_ID_PCI_SIG,
>  		.prot.type = PCI_DOE_PROTOCOL_DISCOVERY,
> -		.request_pl = &request_pl,
> +		.request_pl = (__le32 *)&request_pl,

I don't like this type cast.  request_pl is local anyway, so why not
change its type and set it appropriately in the first place?

>  		.request_pl_sz = sizeof(request_pl),
> -		.response_pl = &response_pl,
> +		.response_pl = (__le32 *)&response_pl,

Likewise here.

>  		.response_pl_sz = sizeof(response_pl),
>  		.complete = pci_doe_task_complete,
>  		.private = &c,
>  	};
>  	int rc;
>  
> +	cpu_to_le32s(&request_pl);
> +
>  	rc = pci_doe_submit_task(doe_mb, &task);
>  	if (rc < 0)
>  		return rc;
> @@ -340,6 +342,7 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
>  	if (task.rv != sizeof(response_pl))
>  		return -EIO;
>  
> +	le32_to_cpus(&response_pl);
>  	*vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl);
>  	*protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL,
>  			      response_pl);
> diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
> index ed9b4df792b8..43765eaf2342 100644
> --- a/include/linux/pci-doe.h
> +++ b/include/linux/pci-doe.h
> @@ -34,6 +34,10 @@ struct pci_doe_mb;
>   * @work: Used internally by the mailbox
>   * @doe_mb: Used internally by the mailbox
>   *
> + * Payloads are treated as opaque byte streams which are transmitted verbatim,
> + * without byte-swapping.  If payloads contain little-endian register values,
> + * the caller is responsible for conversion with cpu_to_le32() / le32_to_cpu().
> + *
>   * The payload sizes and rv are specified in bytes with the following
>   * restrictions concerning the protocol.
>   *
> @@ -45,9 +49,9 @@ struct pci_doe_mb;
>   */
>  struct pci_doe_task {
>  	struct pci_doe_protocol prot;
> -	u32 *request_pl;
> +	__le32 *request_pl;
>  	size_t request_pl_sz;
> -	u32 *response_pl;
> +	__le32 *response_pl;
>  	size_t response_pl_sz;
>  	int rv;
>  	void (*complete)(struct pci_doe_task *task);



* Re: [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header
  2023-02-10 20:25 ` [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header Lukas Wunner
  2023-02-11  0:40   ` Dan Williams
@ 2023-02-14 11:16   ` Jonathan Cameron
  2023-02-15  1:41   ` Li, Ming
  2 siblings, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 11:16 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, 10 Feb 2023 21:25:02 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> cxl_cdat_get_length() only checks whether the DOE response size is
> sufficient for the Table Access response header (1 dword), but not the
> succeeding CDAT header (1 dword length plus other fields).
> 
> It thus returns whatever uninitialized memory happens to be on the stack
> if a truncated DOE response with only 1 dword was received.  Fix it.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Reported-by: Ming Li <ming4.li@intel.com>
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
Oops + thanks for fixing.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index d3cf1d9d67d4..11a85b3a9a0b 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -528,7 +528,7 @@ static int cxl_cdat_get_length(struct device *dev,
>  		return rc;
>  	}
>  	wait_for_completion(&t.c);
> -	if (t.task.rv < sizeof(u32))
> +	if (t.task.rv < 2 * sizeof(u32))
>  		return -EIO;
>  
>  	*length = le32_to_cpu(t.response_pl[1]);



* Re: [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries
  2023-02-10 20:25 ` [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries Lukas Wunner
  2023-02-11  0:50   ` Dan Williams
@ 2023-02-14 11:30   ` Jonathan Cameron
  1 sibling, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 11:30 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, 10 Feb 2023 21:25:03 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> If truncated CDAT entries are received from a device, the concatenation
> of those entries constitutes a corrupt CDAT, yet is happily exposed to
> user space.
> 
> Avoid by verifying response lengths and erroring out if truncation is
> detected.
> 
> The last CDAT entry may still be truncated despite the checks introduced
> herein if the length in the CDAT header is too small.  However, that is
> easily detectable by user space because it reaches EOF prematurely.
> A subsequent commit which rightsizes the CDAT response allocation closes
> that remaining loophole.
> 
> The two lines introduced here which exceed 80 chars are shortened to
> less than 80 chars by a subsequent commit which migrates to a
> synchronous DOE API and replaces "t.task.rv" by "rc".
> 
> The existing acpi_cdat_header and acpi_table_cdat struct definitions
> provided by ACPICA cannot be used because they do not employ __le16 or
> __le32 types.  I believe that cannot be changed because those types are
> Linux-specific and ACPI is specified for little endian platforms only,
> hence doesn't care about endianness.  So duplicate the structs.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
One trivial suggestion.

Glad to see this tightened up.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c | 13 +++++++++----
>  drivers/cxl/cxlpci.h   | 14 ++++++++++++++
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 11a85b3a9a0b..a3fb6bd68d17 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -547,8 +547,8 @@ static int cxl_cdat_read_table(struct device *dev,
>  
>  	do {
>  		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
> +		struct cdat_entry_header *entry;
>  		size_t entry_dw;
> -		__le32 *entry;
>  		int rc;
>  
>  		rc = pci_doe_submit_task(cdat_doe, &t.task);
> @@ -557,14 +557,19 @@ static int cxl_cdat_read_table(struct device *dev,
>  			return rc;
>  		}
>  		wait_for_completion(&t.c);
> -		/* 1 DW header + 1 DW data min */
> -		if (t.task.rv < (2 * sizeof(u32)))
> +
> +		/* 1 DW Table Access Response Header + CDAT entry */
> +		entry = (struct cdat_entry_header *)(t.response_pl + 1);
> +		if ((entry_handle == 0 &&
> +		     t.task.rv != sizeof(u32) + sizeof(struct cdat_header)) ||
> +		    (entry_handle > 0 &&
> +		     (t.task.rv < sizeof(u32) + sizeof(struct cdat_entry_header) ||

Slight preference for sizeof(*entry), so that it is clear we are
checking there is enough data to perform the next comparison without
reading garbage.

> +		      t.task.rv != sizeof(u32) + le16_to_cpu(entry->length))))
>  			return -EIO;
>  
>  		/* Get the CXL table access header entry handle */
>  		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
>  					 le32_to_cpu(t.response_pl[0]));
> -		entry = t.response_pl + 1;
>  		entry_dw = t.task.rv / sizeof(u32);
>  		/* Skip Header */
>  		entry_dw -= 1;
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 920909791bb9..104ad2b72516 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -62,6 +62,20 @@ enum cxl_regloc_type {
>  	CXL_REGLOC_RBI_TYPES
>  };
>  
> +struct cdat_header {
> +	__le32 length;
> +	u8 revision;
> +	u8 checksum;
> +	u8 reserved[6];
> +	__le32 sequence;
> +} __packed;
> +
> +struct cdat_entry_header {
> +	u8 type;
> +	u8 reserved;
> +	__le16 length;
> +} __packed;
> +
>  int devm_cxl_port_enumerate_dports(struct cxl_port *port);
>  struct cxl_dev_state;
>  int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);



* Re: [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length
  2023-02-10 20:25 ` [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length Lukas Wunner
  2023-02-11  1:04   ` Dan Williams
@ 2023-02-14 11:33   ` Jonathan Cameron
  2023-02-16 10:26     ` Lukas Wunner
  1 sibling, 1 reply; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 11:33 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, 10 Feb 2023 21:25:04 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> If the length in the CDAT header is larger than the concatenation of the
> header and all table entries, then the CDAT exposed to user space
> contains trailing null bytes.
> 
> Not every consumer may be able to handle that.  Per Postel's robustness
> principle, "be liberal in what you accept" and silently reduce the
> cached length to avoid exposing those null bytes.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+

Fair enough.  I'd argue that we are papering over broken hardware if
we hit these conditions, so given we aren't aware of any (I hope),
I'm not sure this is stable material.  The argument in favor of
stable is that if we do get broken hardware, we don't want an ABI
change when we paper over the garbage... hmm.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/cxl/core/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index a3fb6bd68d17..c37c41d7acb6 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -582,6 +582,9 @@ static int cxl_cdat_read_table(struct device *dev,
>  		}
>  	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
>  
> +	/* Length in CDAT header may exceed concatenation of CDAT entries */
> +	cdat->length -= length;
> +
>  	return 0;
>  }
>  



* Re: [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing
  2023-02-10 20:25 ` [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing Lukas Wunner
@ 2023-02-14 11:36   ` Jonathan Cameron
  2023-02-15  5:07   ` Li, Ming
  1 sibling, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 11:36 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, 10 Feb 2023 21:25:10 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> When a DOE mailbox is torn down, its workqueue is flushed once in
> pci_doe_flush_mb() through a call to flush_workqueue() and subsequently
> flushed once more in pci_doe_destroy_workqueue() through a call to
> destroy_workqueue().
> 
> Deduplicate by dropping flush_workqueue() from pci_doe_flush_mb().
> 
> Rename pci_doe_flush_mb() to pci_doe_cancel_tasks() to more aptly
> describe what it now does.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3
> 
>  drivers/pci/doe.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> index afb53bc1b4aa..291cd7a46a39 100644
> --- a/drivers/pci/doe.c
> +++ b/drivers/pci/doe.c
> @@ -429,7 +429,7 @@ static void pci_doe_destroy_workqueue(void *mb)
>  	destroy_workqueue(doe_mb->work_queue);
>  }
>  
> -static void pci_doe_flush_mb(void *mb)
> +static void pci_doe_cancel_tasks(void *mb)
>  {
>  	struct pci_doe_mb *doe_mb = mb;
>  
> @@ -439,9 +439,6 @@ static void pci_doe_flush_mb(void *mb)
>  	/* Cancel an in progress work item, if necessary */
>  	set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags);
>  	wake_up(&doe_mb->wq);
> -
> -	/* Flush all work items */
> -	flush_workqueue(doe_mb->work_queue);
>  }
>  
>  /**
> @@ -498,9 +495,9 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset)
>  
>  	/*
>  	 * The state machine and the mailbox should be in sync now;
> -	 * Set up mailbox flush prior to using the mailbox to query protocols.
> +	 * Set up cancel tasks prior to using the mailbox to query protocols.
>  	 */
> -	rc = devm_add_action_or_reset(dev, pci_doe_flush_mb, doe_mb);
> +	rc = devm_add_action_or_reset(dev, pci_doe_cancel_tasks, doe_mb);
>  	if (rc)
>  		return ERR_PTR(rc);
>  



* Re: [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management
  2023-02-10 20:25 ` [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management Lukas Wunner
@ 2023-02-14 11:51   ` Jonathan Cameron
  2023-02-15  5:17   ` Li, Ming
  1 sibling, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 11:51 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, 10 Feb 2023 21:25:11 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> DOE mailbox creation is currently only possible through a devres-managed
> API.  The lifetime of mailboxes thus ends with driver unbinding.
> 
> An upcoming commit will create DOE mailboxes upon device enumeration by
> the PCI core.  Their lifetime shall not be limited by a driver.
> 
> Therefore rework pcim_doe_create_mb() into the non-devres-managed
> pci_doe_create_mb().  Add pci_doe_destroy_mb() for mailbox destruction
> on device removal.
> 
> Provide a devres-managed wrapper under the existing pcim_doe_create_mb()
> name.
> 
> The error path of pcim_doe_create_mb() previously called xa_destroy() if
> alloc_ordered_workqueue() failed.  That's unnecessary because the xarray
> is still empty at that point.  It doesn't need to be destroyed until
> it's been populated by pci_doe_cache_protocols().  Arrange the error
> path of the new pci_doe_create_mb() accordingly.
> 
> pci_doe_cancel_tasks() is no longer used as callback for
> devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
> instead of a generic void pointer.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



* Re: [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation
  2023-02-10 20:25 ` [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation Lukas Wunner
@ 2023-02-14 13:05   ` Jonathan Cameron
  2023-02-16  0:56   ` Ira Weiny
  2023-02-28  1:45   ` Alexey Kardashevskiy
  2 siblings, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 13:05 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, 10 Feb 2023 21:25:16 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> Jonathan notes that cxl_cdat_get_length() and cxl_cdat_read_table()
> allocate 32 dwords for the DOE response even though it may be smaller.
> 
> In the case of cxl_cdat_get_length(), only the second dword of the
> response is of interest (it contains the length).  So reduce the
> allocation to 2 dwords and let DOE discard the remainder.
> 
> In the case of cxl_cdat_read_table(), a correctly sized allocation for
> the full CDAT already exists.  Let DOE write each table entry directly
> into that allocation.  There's a snag in that the table entry is
> preceded by a Table Access Response Header (1 dword).  Save the last
> dword of the previous table entry, let DOE overwrite it with the
> header of the next entry and restore it afterwards.

Marginally nasty, but it looks like it works to me and avoids excessive allocations.

> 
> The resulting CDAT is preceded by 4 unavoidable useless bytes.  Increase
> the allocation size accordingly and skip these bytes when exposing CDAT
> in sysfs.
> 
> The buffer overflow check in cxl_cdat_read_table() becomes unnecessary
> because the remaining bytes in the allocation are tracked in "length",
> which is passed to DOE and limits how many bytes it writes to the
> allocation.  Additionally, cxl_cdat_read_table() bails out if the DOE
> response is truncated due to insufficient space.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Follow up comment on earlier one inline.

> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3 on popular request (Jonathan)
> 
>  drivers/cxl/core/pci.c | 34 ++++++++++++++++++----------------
>  drivers/cxl/cxl.h      |  3 ++-
>  drivers/cxl/port.c     |  2 +-
>  3 files changed, 21 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 1b954783b516..70097cc75302 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -471,7 +471,7 @@ static int cxl_cdat_get_length(struct device *dev,
>  			       size_t *length)
>  {
>  	__le32 request = CDAT_DOE_REQ(0);
> -	__le32 response[32];
> +	__le32 response[2];
>  	int rc;
>  
>  	rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
> @@ -495,28 +495,28 @@ static int cxl_cdat_read_table(struct device *dev,
>  			       struct pci_doe_mb *cdat_doe,
>  			       struct cxl_cdat *cdat)
>  {
> -	size_t length = cdat->length;
> -	u32 *data = cdat->table;
> +	size_t length = cdat->length + sizeof(u32);
> +	__le32 *data = cdat->table;

Ah. Makes my earlier comment on this type irrelevant.

>  	int entry_handle = 0;
> +	__le32 saved_dw = 0;
>  
>  	do {
>  		__le32 request = CDAT_DOE_REQ(entry_handle);
>  		struct cdat_entry_header *entry;
> -		__le32 response[32];
>  		size_t entry_dw;
>  		int rc;
>  
>  		rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
>  			     CXL_DOE_PROTOCOL_TABLE_ACCESS,
>  			     &request, sizeof(request),
> -			     &response, sizeof(response));
> +			     data, length);
>  		if (rc < 0) {
>  			dev_err(dev, "DOE failed: %d", rc);
>  			return rc;
>  		}
>  
>  		/* 1 DW Table Access Response Header + CDAT entry */
> -		entry = (struct cdat_entry_header *)(response + 1);
> +		entry = (struct cdat_entry_header *)(data + 1);
>  		if ((entry_handle == 0 &&
>  		     rc != sizeof(u32) + sizeof(struct cdat_header)) ||
>  		    (entry_handle > 0 &&
> @@ -526,21 +526,22 @@ static int cxl_cdat_read_table(struct device *dev,
>  
>  		/* Get the CXL table access header entry handle */
>  		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
> -					 le32_to_cpu(response[0]));
> +					 le32_to_cpu(data[0]));
>  		entry_dw = rc / sizeof(u32);
>  		/* Skip Header */
>  		entry_dw -= 1;
> -		entry_dw = min(length / sizeof(u32), entry_dw);
> -		/* Prevent length < 1 DW from causing a buffer overflow */
> -		if (entry_dw) {
> -			memcpy(data, entry, entry_dw * sizeof(u32));
> -			length -= entry_dw * sizeof(u32);
> -			data += entry_dw;
> -		}
> +		/*
> +		 * Table Access Response Header overwrote the last DW of
> +		 * previous entry, so restore that DW
> +		 */
> +		*data = saved_dw;
> +		length -= entry_dw * sizeof(u32);
> +		data += entry_dw;
> +		saved_dw = *data;
>  	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
>  
>  	/* Length in CDAT header may exceed concatenation of CDAT entries */
> -	cdat->length -= length;
> +	cdat->length -= length - sizeof(u32);
>  
>  	return 0;
>  }
> @@ -576,7 +577,8 @@ void read_cdat_data(struct cxl_port *port)
>  		return;
>  	}
>  
> -	port->cdat.table = devm_kzalloc(dev, cdat_length, GFP_KERNEL);
> +	port->cdat.table = devm_kzalloc(dev, cdat_length + sizeof(u32),
> +					GFP_KERNEL);
>  	if (!port->cdat.table)
>  		return;
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1b1cf459ac77..78f5cae5134c 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -494,7 +494,8 @@ struct cxl_pmem_region {
>   * @component_reg_phys: component register capability base address (optional)
>   * @dead: last ep has been removed, force port re-creation
>   * @depth: How deep this port is relative to the root. depth 0 is the root.
> - * @cdat: Cached CDAT data
> + * @cdat: Cached CDAT data (@table is preceded by 4 null bytes, these are not
> + *	  included in @length)
>   * @cdat_available: Should a CDAT attribute be available in sysfs
>   */
>  struct cxl_port {
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 5453771bf330..0705343ac5ca 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -95,7 +95,7 @@ static ssize_t CDAT_read(struct file *filp, struct kobject *kobj,
>  		return 0;
>  
>  	return memory_read_from_buffer(buf, count, &offset,
> -				       port->cdat.table,
> +				       port->cdat.table + sizeof(u32),
>  				       port->cdat.length);
>  }
>  


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-14 11:15   ` Jonathan Cameron
@ 2023-02-14 13:51     ` Lukas Wunner
  2023-02-14 15:45       ` Jonathan Cameron
  0 siblings, 1 reply; 62+ messages in thread
From: Lukas Wunner @ 2023-02-14 13:51 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Tue, Feb 14, 2023 at 11:15:04AM +0000, Jonathan Cameron wrote:
> On Fri, 10 Feb 2023 21:25:01 +0100 Lukas Wunner <lukas@wunner.de> wrote:
> > The CDAT exposed in sysfs differs between little endian and big endian
> > arches:  On big endian, every 4 bytes are byte-swapped.
> > 
> > PCI Configuration Space is little endian (PCI r3.0 sec 6.1).  Accessors
> > such as pci_read_config_dword() implicitly swap bytes on big endian.
> > That way, the macros in include/uapi/linux/pci_regs.h work regardless of
> > the arch's endianness.  For an example of implicit byte-swapping, see
> > ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
> > (Load Word Byte-Reverse Indexed).
> > 
> > DOE Read/Write Data Mailbox Registers are unlike other registers in
> > Configuration Space in that they contain or receive a 4 byte portion of
> > an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
> > They need to be copied to or from the request/response buffer verbatim.
> > So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
> > byte-swapping.
> > 
> > The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
> > implicit byte-swapping.  Byte-swap requests after constructing them with
> > those macros and byte-swap responses before parsing them.
> > 
> > Change the request and response type to __le32 to avoid sparse warnings.
> 
> In brief this patch set ensures that internal handling of the payloads
> is in le32 chunks by unwinding the implicit conversions in the pci config
> accessors for BE platforms.  Thus the data in the payloads is always
> in the same order.  Good start.  Hence the binary read back of CDAT is
> little endian (including all fields within it).

Correct.  On big endian this means writing request dwords and
reading response dwords involves 2 endianness conversions.

If the request is constructed with macros such as CXL_DOE_TABLE_ACCESS_*
and PCI_DOE_DATA_OBJECT_DISC_*, then an additional (3rd) endianness
conversion is necessary:  The request is constructed in native big endian
order, then converted to a little endian DOE payload.  That payload is
then converted back to big endian so that it can be written with the
config space accessor, which converts to little endian.

I recognize this is crazy but I do not see a better solution.
Dan suggested amending struct pci_ops with ->read_raw and ->write_raw
callbacks which omit the endianness conversion.  However there are
175 struct pci_ops in the kernel and each one would have to be looked at.
I also do not know the Assembler instructions for every architecture to
perform a non-byte-swapped write or read.  It's a big task and lots of
code to add.  Given that DOE is likely the only user and big endian
is seeing decreasing use, it's probably not worth the effort and LoC.
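For what it's worth, the conversion pairing can be illustrated standalone. A generic byte swap stands in for the conversion that the kernel helpers and config accessors perform on big endian; on BE, the three conversions described above leave exactly one effective swap, i.e. the little endian wire format.

```c
#include <assert.h>
#include <stdint.h>

/* Generic 32-bit byte swap, standing in for the BE<->LE conversion
 * that pci_write_config_dword() performs implicitly on big endian. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0xff00) |
	       ((v << 8) & 0xff0000) | (v << 24);
}

/* A request built with FIELD_PREP-style macros is native endian.
 * On a big endian CPU it is converted to the little endian DOE
 * payload, converted back to native order for the config accessor,
 * and swapped once more by the accessor itself. */
static uint32_t on_the_wire_be(uint32_t native_req)
{
	uint32_t payload = swap32(native_req);	/* cpu_to_le32() on BE */
	uint32_t written = swap32(payload);	/* back to native for accessor */
	return swap32(written);			/* accessor swaps on write */
}
```

Three swaps reduce to one, so the wire value is always the little endian representation of the native request.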


> > @@ -322,15 +322,17 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
> >  	struct pci_doe_task task = {
> >  		.prot.vid = PCI_VENDOR_ID_PCI_SIG,
> >  		.prot.type = PCI_DOE_PROTOCOL_DISCOVERY,
> > -		.request_pl = &request_pl,
> > +		.request_pl = (__le32 *)&request_pl,
> 
> I don't like this type cast. request_pl is local
> anyway, so why not change its type and set it appropriately in
> the first place.

That doesn't work I'm afraid.  The request_pl is constructed with
FIELD_PREP() here and the response_pl is parsed with FIELD_GET().

Those macros operate on an unsigned integer of native endianness.
If I declare them __le32, I get sparse warnings and rightfully so.

The casts are the simplest, least intrusive solution.  The only
alternative would be to use separate variables but then I'd have
to change a lot more lines and it would be more difficult to review
and probably not all that more readable.
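To illustrate the point about FIELD_GET(): the mask/shift arithmetic only makes sense on an integer of native endianness. A simplified stand-in for the kernel's bitfield helpers, with a hypothetical field layout:

```c
#include <assert.h>
#include <stdint.h>

#define DISC_VID_MASK	0xffff0000u	/* hypothetical field layout */
#define DISC_PROTO_MASK	0x0000ff00u

/* FIELD_GET-style extraction: mask out the field and shift it down
 * by the mask's trailing zero count.  This assumes val is in native
 * byte order; applying it to a raw __le32 on BE would extract the
 * wrong bits, hence the le32_to_cpus() before parsing. */
static uint32_t field_get(uint32_t mask, uint32_t val)
{
	return (val & mask) >> __builtin_ctz(mask);
}
```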

Thanks,

Lukas

> >  		.request_pl_sz = sizeof(request_pl),
> > -		.response_pl = &response_pl,
> > +		.response_pl = (__le32 *)&response_pl,
> 
> Likewise here.
> 
> >  		.response_pl_sz = sizeof(response_pl),
> >  		.complete = pci_doe_task_complete,
> >  		.private = &c,
> >  	};
> >  	int rc;
> >  
> > +	cpu_to_le32s(&request_pl);
> > +
> >  	rc = pci_doe_submit_task(doe_mb, &task);
> >  	if (rc < 0)
> >  		return rc;
> > @@ -340,6 +342,7 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
> >  	if (task.rv != sizeof(response_pl))
> >  		return -EIO;
> >  
> > +	le32_to_cpus(&response_pl);
> >  	*vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl);
> >  	*protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL,
> >  			      response_pl);


* Re: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-14 13:51     ` Lukas Wunner
@ 2023-02-14 15:45       ` Jonathan Cameron
  0 siblings, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-14 15:45 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Tue, 14 Feb 2023 14:51:10 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> On Tue, Feb 14, 2023 at 11:15:04AM +0000, Jonathan Cameron wrote:
> > On Fri, 10 Feb 2023 21:25:01 +0100 Lukas Wunner <lukas@wunner.de> wrote:  
> > > The CDAT exposed in sysfs differs between little endian and big endian
> > > arches:  On big endian, every 4 bytes are byte-swapped.
> > > 
> > > PCI Configuration Space is little endian (PCI r3.0 sec 6.1).  Accessors
> > > such as pci_read_config_dword() implicitly swap bytes on big endian.
> > > That way, the macros in include/uapi/linux/pci_regs.h work regardless of
> > > the arch's endianness.  For an example of implicit byte-swapping, see
> > > ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
> > > (Load Word Byte-Reverse Indexed).
> > > 
> > > DOE Read/Write Data Mailbox Registers are unlike other registers in
> > > Configuration Space in that they contain or receive a 4 byte portion of
> > > an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
> > > They need to be copied to or from the request/response buffer verbatim.
> > > So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
> > > byte-swapping.
> > > 
> > > The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
> > > implicit byte-swapping.  Byte-swap requests after constructing them with
> > > those macros and byte-swap responses before parsing them.
> > > 
> > > Change the request and response type to __le32 to avoid sparse warnings.  
> > 
> > In brief this patch set ensures that internal handling of the payloads
> > is in le32 chunks by unwinding the implicit conversions in the pci config
> > accessors for BE platforms.  Thus the data in the payloads is always
> > in the same order.  Good start.  Hence the binary read back of CDAT is
> > little endian (including all fields within it).  
> 
> Correct.  On big endian this means writing request dwords and
> reading response dwords involves 2 endianness conversions.
> 
> If the request is constructed with macros such as CXL_DOE_TABLE_ACCESS_*
> and PCI_DOE_DATA_OBJECT_DISC_*, then an additional (3rd) endianness
> conversion is necessary:  The request is constructed in native big endian
> order, then converted to a little endian DOE payload.  That payload is
> then converted back to big endian so that it can be written with the
> config space accessor, which converts to little endian.
> 
> I recognize this is crazy but I do not see a better solution.
> Dan suggested amending struct pci_ops with ->read_raw and ->write_raw
> callbacks which omit the endianness conversion.  However there are
> 175 struct pci_ops in the kernel and each one would have to be looked at.
> I also do not know the Assembler instructions for every architecture to
> perform a non-byte-swapped write or read.  It's a big task and lots of
> code to add.  Given that DOE is likely the only user and big endian
> is seeing decreasing use, it's probably not worth the effort and LoC.
> 
> 
> > > @@ -322,15 +322,17 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
> > >  	struct pci_doe_task task = {
> > >  		.prot.vid = PCI_VENDOR_ID_PCI_SIG,
> > >  		.prot.type = PCI_DOE_PROTOCOL_DISCOVERY,
> > > -		.request_pl = &request_pl,
> > > +		.request_pl = (__le32 *)&request_pl,  
> > 
> > I don't like this type cast. request_pl is local
> > anyway, so why not change its type and set it appropriately in
> > the first place.  
> 
> That doesn't work I'm afraid.  The request_pl is constructed with
> FIELD_PREP() here and the response_pl is parsed with FIELD_GET().
> 
> Those macros operate on an unsigned integer of native endianness.
> If I declare them __le32, I get sparse warnings and rightfully so.
> 
> The casts are the simplest, least intrusive solution.  The only
> alternative would be to use separate variables but then I'd have
> to change a lot more lines and it would be more difficult to review
> and probably not all that more readable.

Separate variables were what I meant (I was less than clear).

Add a __le32 le32_req_pl, then do the endian conversion into that
rather than converting in place.  Do the equivalent in the other
direction for the response.

That way the endian markings end up correct on all the local variables
and it shouldn't require many more lines to change.
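Something like the following, sketched in userspace with stand-ins for the kernel's __le32 helpers (names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t le32;	/* stand-in for the kernel's __le32 */

static le32 cpu_to_le32(uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return __builtin_bswap32(v);
#else
	return v;
#endif
}

static uint32_t le32_to_cpu(le32 v)
{
	return cpu_to_le32(v);	/* same operation in both directions */
}

/* Build the request in native endianness, then convert into a
 * distinctly typed variable instead of converting in place, so the
 * endian marking on each variable stays correct. */
static le32 build_request(uint32_t index)
{
	uint32_t request_pl = index & 0xff;	/* FIELD_PREP stand-in */
	le32 le32_req_pl = cpu_to_le32(request_pl);

	return le32_req_pl;
}
```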


> 
> Thanks,
> 
> Lukas
> 
> > >  		.request_pl_sz = sizeof(request_pl),
> > > -		.response_pl = &response_pl,
> > > +		.response_pl = (__le32 *)&response_pl,  
> > 
> > Likewise here.
> >   
> > >  		.response_pl_sz = sizeof(response_pl),
> > >  		.complete = pci_doe_task_complete,
> > >  		.private = &c,
> > >  	};
> > >  	int rc;
> > >  
> > > +	cpu_to_le32s(&request_pl);
> > > +
> > >  	rc = pci_doe_submit_task(doe_mb, &task);
> > >  	if (rc < 0)
> > >  		return rc;
> > > @@ -340,6 +342,7 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
> > >  	if (task.rv != sizeof(response_pl))
> > >  		return -EIO;
> > >  
> > > +	le32_to_cpus(&response_pl);
> > >  	*vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl);
> > >  	*protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL,
> > >  			      response_pl);  
> 



* Re: [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header
  2023-02-10 20:25 ` [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header Lukas Wunner
  2023-02-11  0:40   ` Dan Williams
  2023-02-14 11:16   ` Jonathan Cameron
@ 2023-02-15  1:41   ` Li, Ming
  2 siblings, 0 replies; 62+ messages in thread
From: Li, Ming @ 2023-02-15  1:41 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, Bjorn Helgaas, linux-pci

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> cxl_cdat_get_length() only checks whether the DOE response size is
> sufficient for the Table Access response header (1 dword), but not the
> succeeding CDAT header (1 dword length plus other fields).
> 
> It thus returns whatever uninitialized memory happens to be on the stack
> if a truncated DOE response with only 1 dword was received.  Fix it.
> 
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Reported-by: Ming Li <ming4.li@intel.com>
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Ming Li <ming4.li@intel.com>
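
The check the patch adds can be sketched generically (hypothetical header sizes, not the actual kernel constants):

```c
#include <assert.h>
#include <stddef.h>

#define ACCESS_HEADER_DW 1	/* Table Access Response Header */
#define CDAT_HEADER_DW   4	/* hypothetical CDAT header size in dwords */

/* Reject responses too short to contain both headers, so the CDAT
 * length field is never read from uninitialized stack memory. */
static int check_response_len(size_t dwords_received)
{
	if (dwords_received < ACCESS_HEADER_DW + CDAT_HEADER_DW)
		return -1;	/* truncated response */
	return 0;
}
```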


* Re: [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally
  2023-02-10 20:25 ` [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally Lukas Wunner
@ 2023-02-15  1:45   ` Li, Ming
  2023-02-28 18:58   ` Davidlohr Bueso
  1 sibling, 0 replies; 62+ messages in thread
From: Li, Ming @ 2023-02-15  1:45 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, linux-pci, Bjorn Helgaas

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> The DOE API only allows asynchronous exchanges and forces callers to
> provide a completion callback.  Yet all existing callers only perform
> synchronous exchanges.  Upcoming commits for CMA (Component Measurement
> and Authentication, PCIe r6.0 sec 6.31) likewise require only
> synchronous DOE exchanges.
> 
> Provide a synchronous pci_doe() API call which builds on the internal
> asynchronous machinery.
> 
> Convert the internal pci_doe_discovery() to the new call.
> 
> The new API allows submission of const-declared requests, necessitating
> the addition of a const qualifier in struct pci_doe_task.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Ming Li <ming4.li@intel.com>
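
The sync-over-async pattern from the commit message, sketched in userspace (the busy-wait stands in for wait_for_completion(); names and the inline "execution" are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

struct doe_task {
	int rv;				/* result, set on completion */
	void (*complete)(struct doe_task *);
	int done;			/* stand-in for struct completion */
};

static void task_complete(struct doe_task *t)
{
	t->done = 1;
}

/* Asynchronous submission stand-in: here the task "executes" inline
 * and invokes the completion callback. */
static void submit_task(struct doe_task *t)
{
	t->rv = 42;
	t->complete(t);
}

/* Synchronous wrapper built on the asynchronous machinery, as
 * pci_doe() builds on pci_doe_submit_task(). */
static int doe_sync(void)
{
	struct doe_task t = { .complete = task_complete };

	submit_task(&t);
	while (!t.done)
		;		/* wait_for_completion() stand-in */
	return t.rv;
}
```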



* Re: [PATCH v3 09/16] PCI/DOE: Make asynchronous API private
  2023-02-10 20:25 ` [PATCH v3 09/16] PCI/DOE: Make asynchronous API private Lukas Wunner
@ 2023-02-15  1:48   ` Li, Ming
  0 siblings, 0 replies; 62+ messages in thread
From: Li, Ming @ 2023-02-15  1:48 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, linux-pci, Bjorn Helgaas

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> A synchronous API for DOE has just been introduced.  CXL (the only
> in-tree DOE user so far) was converted to use it instead of the
> asynchronous API.
> 
> Consequently, pci_doe_submit_task() as well as the pci_doe_task struct
> are only used internally, so make them private.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Ming Li <ming4.li@intel.com>


* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-10 20:25 ` [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration Lukas Wunner
@ 2023-02-15  2:07   ` Li, Ming
  2023-02-28  1:18   ` Alexey Kardashevskiy
  1 sibling, 0 replies; 62+ messages in thread
From: Li, Ming @ 2023-02-15  2:07 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, linux-pci, Bjorn Helgaas

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> Currently a DOE instance cannot be shared by multiple drivers because
> each driver creates its own pci_doe_mb struct for a given DOE instance.
> For the same reason a DOE instance cannot be shared between the PCI core
> and a driver.
> 
> Overcome this limitation by creating mailboxes in the PCI core on device
> enumeration.
> 
> Provide a pci_find_doe_mailbox() API call to allow drivers to get a
> pci_doe_mb for a given (pci_dev, vendor, protocol) triple.  This API is
> modeled after pci_find_capability() and can later be amended with a
> pci_find_next_doe_mailbox() call to iterate over all mailboxes of a
> given pci_dev which support a specific protocol.
> 
> On removal, destroy the mailboxes in pci_destroy_dev(), after the driver
> is unbound.  This allows drivers to use DOE in their ->remove() hook.
> 
> On surprise removal, cancel ongoing DOE exchanges and prevent new ones
> from being scheduled.  Thereby ensure that a hot-removed device doesn't
> needlessly wait for a running exchange to time out.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Ming Li <ming4.li@intel.com>
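
The lookup pattern the commit message describes can be sketched with a toy structure (not the actual kernel API; a fixed array stands in for the per-device collection of mailboxes created at enumeration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct doe_mb {
	uint16_t vid;	/* vendor of the supported protocol */
	uint8_t  type;	/* protocol type */
};

struct fake_dev {
	struct doe_mb mbs[4];
	size_t nr_mbs;
};

/* Walk the mailboxes created at enumeration time and return the
 * first one supporting the given (vendor, protocol) pair, the way
 * pci_find_capability() walks the capability list. */
static struct doe_mb *find_doe_mb(struct fake_dev *dev,
				  uint16_t vid, uint8_t type)
{
	for (size_t i = 0; i < dev->nr_mbs; i++)
		if (dev->mbs[i].vid == vid && dev->mbs[i].type == type)
			return &dev->mbs[i];
	return NULL;
}
```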



* Re: [PATCH v3 14/16] PCI/DOE: Make mailbox creation API private
  2023-02-10 20:25 ` [PATCH v3 14/16] PCI/DOE: Make mailbox creation API private Lukas Wunner
@ 2023-02-15  2:13   ` Li, Ming
  0 siblings, 0 replies; 62+ messages in thread
From: Li, Ming @ 2023-02-15  2:13 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, linux-pci, Bjorn Helgaas

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> The PCI core has just been amended to create a pci_doe_mb struct for
> every DOE instance on device enumeration.  CXL (the only in-tree DOE
> user so far) has been migrated to use those mailboxes instead of
> creating its own.
> 
> That leaves pcim_doe_create_mb() and pci_doe_for_each_off() without any
> callers, so drop them.
> 
> pci_doe_supports_prot() is now only used internally, so declare it
> static.
> 
> pci_doe_destroy_mb() is no longer used as callback for
> devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
> instead of a generic void pointer.
> 
> Because pci_doe_create_mb() is only called on device enumeration, i.e.
> before driver binding, the workqueue name never contains a driver name.
> So replace dev_driver_string() with dev_bus_name() when generating the
> workqueue name.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Ming Li <ming4.li@intel.com>



* Re: [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size
  2023-02-10 20:25 ` [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size Lukas Wunner
@ 2023-02-15  5:05   ` Li, Ming
  2023-02-15 11:49     ` Lukas Wunner
  0 siblings, 1 reply; 62+ messages in thread
From: Li, Ming @ 2023-02-15  5:05 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, linux-pci, Bjorn Helgaas

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> An upcoming user of DOE is CMA (Component Measurement and Authentication,
> PCIe r6.0 sec 6.31).
> 
> It builds on SPDM (Security Protocol and Data Model):
> https://www.dmtf.org/dsp/DSP0274
> 
> SPDM message sizes are not always a multiple of dwords.  To transport
> them over DOE without using bounce buffers, allow sending requests and
> receiving responses whose final dword is only partially populated.
> 
> To be clear, PCIe r6.0 sec 6.30.1 specifies the Data Object Header 2
> "Length" in dwords and pci_doe_send_req() and pci_doe_recv_resp()
> read/write dwords.  So from a spec point of view, DOE is still specified
> in dwords and allowing non-dword request/response buffers is merely for
> the convenience of callers.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  Changes v2 -> v3:
>  * Fix byte order of last dword on big endian arches (Ira)
>  * Explain in commit message and kerneldoc that arbitrary-sized
>    payloads are not stipulated by the spec, but merely for
>    convenience of the caller (Bjorn, Jonathan)
>  * Add code comment that "remainder" in pci_doe_recv_resp() signifies
>    the number of data bytes in the last payload dword (Ira)
> 
>  drivers/pci/doe.c | 74 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 49 insertions(+), 25 deletions(-)
>

......

> +
> +	if (payload_length) {
> +		/* Read all payload dwords except the last */
> +		for (; i < payload_length - 1; i++) {
> +			pci_read_config_dword(pdev, offset + PCI_DOE_READ,
> +					      &val);
> +			task->response_pl[i] = cpu_to_le32(val);
> +			pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
> +		}
> +
> +		/* Read last payload dword */
>  		pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
> -		task->response_pl[i] = cpu_to_le32(val);
> +		cpu_to_le32s(&val);
> +		memcpy(&task->response_pl[i], &val, remainder);

This "Read last payload dword" seems to make sense only when remainder != sizeof(u32).
If remainder == sizeof(u32), the last dword could be read in the loop above.
But this implementation also looks good to me.

Reviewed-by: Ming Li <ming4.li@intel.com>

>  		/* Prior to the last ack, ensure Data Object Ready */
> -		if (i == (payload_length - 1) && !pci_doe_data_obj_ready(doe_mb))
> +		if (!pci_doe_data_obj_ready(doe_mb))
>  			return -EIO;
>  		pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
> +		i++;
>  	}
>  
>  	/* Flush excess length */




* Re: [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing
  2023-02-10 20:25 ` [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing Lukas Wunner
  2023-02-14 11:36   ` Jonathan Cameron
@ 2023-02-15  5:07   ` Li, Ming
  1 sibling, 0 replies; 62+ messages in thread
From: Li, Ming @ 2023-02-15  5:07 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> When a DOE mailbox is torn down, its workqueue is flushed once in
> pci_doe_flush_mb() through a call to flush_workqueue() and subsequently
> flushed once more in pci_doe_destroy_workqueue() through a call to
> destroy_workqueue().
> 
> Deduplicate by dropping flush_workqueue() from pci_doe_flush_mb().
> 
> Rename pci_doe_flush_mb() to pci_doe_cancel_tasks() to more aptly
> describe what it now does.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Ming Li <ming4.li@intel.com>



* Re: [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management
  2023-02-10 20:25 ` [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management Lukas Wunner
  2023-02-14 11:51   ` Jonathan Cameron
@ 2023-02-15  5:17   ` Li, Ming
  1 sibling, 0 replies; 62+ messages in thread
From: Li, Ming @ 2023-02-15  5:17 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, linux-pci, Bjorn Helgaas

On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> DOE mailbox creation is currently only possible through a devres-managed
> API.  The lifetime of mailboxes thus ends with driver unbinding.
> 
> An upcoming commit will create DOE mailboxes upon device enumeration by
> the PCI core.  Their lifetime shall not be limited by a driver.
> 
> Therefore rework pcim_doe_create_mb() into the non-devres-managed
> pci_doe_create_mb().  Add pci_doe_destroy_mb() for mailbox destruction
> on device removal.
> 
> Provide a devres-managed wrapper under the existing pcim_doe_create_mb()
> name.
> 
> The error path of pcim_doe_create_mb() previously called xa_destroy() if
> alloc_ordered_workqueue() failed.  That's unnecessary because the xarray
> is still empty at that point.  It doesn't need to be destroyed until
> it's been populated by pci_doe_cache_protocols().  Arrange the error
> path of the new pci_doe_create_mb() accordingly.
> 
> pci_doe_cancel_tasks() is no longer used as callback for
> devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
> instead of a generic void pointer.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Ming Li <ming4.li@intel.com>


* Re: [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size
  2023-02-15  5:05   ` Li, Ming
@ 2023-02-15 11:49     ` Lukas Wunner
  0 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-15 11:49 UTC (permalink / raw)
  To: Li, Ming
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Hillf Danton,
	Ben Widawsky, linuxarm, linux-cxl, linux-pci, Bjorn Helgaas

On Wed, Feb 15, 2023 at 01:05:20PM +0800, Li, Ming wrote:
> On 2/11/2023 4:25 AM, Lukas Wunner wrote:
> > An upcoming user of DOE is CMA (Component Measurement and Authentication,
> > PCIe r6.0 sec 6.31).
> > 
> > It builds on SPDM (Security Protocol and Data Model):
> > https://www.dmtf.org/dsp/DSP0274
> > 
> > SPDM message sizes are not always a multiple of dwords.  To transport
> > them over DOE without using bounce buffers, allow sending requests and
> > receiving responses whose final dword is only partially populated.
[...]
> > +	if (payload_length) {
> > +		/* Read all payload dwords except the last */
> > +		for (; i < payload_length - 1; i++) {
> > +			pci_read_config_dword(pdev, offset + PCI_DOE_READ,
> > +					      &val);
> > +			task->response_pl[i] = cpu_to_le32(val);
> > +			pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
> > +		}
> > +
> > +		/* Read last payload dword */
> >  		pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
> > -		task->response_pl[i] = cpu_to_le32(val);
> > +		cpu_to_le32s(&val);
> > +		memcpy(&task->response_pl[i], &val, remainder);
> 
> This "Read last payload dword" seems to make sense only when remainder != sizeof(u32).
> If remainder == sizeof(u32), the last dword could be read in the loop above.
> But this implementation also looks good to me.

The last payload dword requires special handling anyway because of
the Data Object Ready check (see quoted remainder of the function
below).

Up until now that special handling is done with an if-clause that
checks if the last loop iteration has been reached.

The support for non-dword responses adds more special handling
of the last payload dword, which is why I decided to go the full
distance and move handling of the last dword out of the loop.
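
The partial final-dword handling being discussed can be sketched standalone (simplified; length is the byte count of the response and remainder the number of bytes populated in its last dword):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy length bytes of a dword stream into buf, handling a final
 * dword that is only partially populated. */
static void copy_payload(uint8_t *buf, const uint32_t *dwords,
			 size_t length)
{
	size_t payload_length = (length + 3) / 4;	/* DIV_ROUND_UP */
	size_t remainder = length % 4 ? length % 4 : 4;
	size_t i;

	/* Copy all payload dwords except the last. */
	for (i = 0; i + 1 < payload_length; i++)
		memcpy(buf + i * 4, &dwords[i], 4);

	/* Copy only the populated bytes of the last dword. */
	if (payload_length)
		memcpy(buf + i * 4, &dwords[i], remainder);
}
```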

Thanks,

Lukas

> >  		/* Prior to the last ack, ensure Data Object Ready */
> > -		if (i == (payload_length - 1) && !pci_doe_data_obj_ready(doe_mb))
> > +		if (!pci_doe_data_obj_ready(doe_mb))
> >  			return -EIO;
> >  		pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
> > +		i++;
> >  	}
> >  
> >  	/* Flush excess length */


* Re: [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation
  2023-02-10 20:25 ` [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation Lukas Wunner
  2023-02-14 13:05   ` Jonathan Cameron
@ 2023-02-16  0:56   ` Ira Weiny
  2023-02-16  8:03     ` Lukas Wunner
  2023-02-28  1:45   ` Alexey Kardashevskiy
  2 siblings, 1 reply; 62+ messages in thread
From: Ira Weiny @ 2023-02-16  0:56 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Lukas Wunner wrote:
> Jonathan notes that cxl_cdat_get_length() and cxl_cdat_read_table()
> allocate 32 dwords for the DOE response even though it may be smaller.
> 
> In the case of cxl_cdat_get_length(), only the second dword of the
> response is of interest (it contains the length).  So reduce the
> allocation to 2 dwords and let DOE discard the remainder.
> 
> In the case of cxl_cdat_read_table(), a correctly sized allocation for
> the full CDAT already exists.  Let DOE write each table entry directly
> into that allocation.  There's a snag in that the table entry is
> preceded by a Table Access Response Header (1 dword).

Where is this 'Table Access Response Header' defined?

Ira

> Save the last
> dword of the previous table entry, let DOE overwrite it with the
> header of the next entry and restore it afterwards.
> 
> The resulting CDAT is preceded by 4 unavoidable useless bytes.  Increase
> the allocation size accordingly and skip these bytes when exposing CDAT
> in sysfs.
> 
> The buffer overflow check in cxl_cdat_read_table() becomes unnecessary
> because the remaining bytes in the allocation are tracked in "length",
> which is passed to DOE and limits how many bytes it writes to the
> allocation.  Additionally, cxl_cdat_read_table() bails out if the DOE
> response is truncated due to insufficient space.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> ---
>  Changes v2 -> v3:
>  * Newly added patch in v3 on popular request (Jonathan)
> 
>  drivers/cxl/core/pci.c | 34 ++++++++++++++++++----------------
>  drivers/cxl/cxl.h      |  3 ++-
>  drivers/cxl/port.c     |  2 +-
>  3 files changed, 21 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 1b954783b516..70097cc75302 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -471,7 +471,7 @@ static int cxl_cdat_get_length(struct device *dev,
>  			       size_t *length)
>  {
>  	__le32 request = CDAT_DOE_REQ(0);
> -	__le32 response[32];
> +	__le32 response[2];
>  	int rc;
>  
>  	rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
> @@ -495,28 +495,28 @@ static int cxl_cdat_read_table(struct device *dev,
>  			       struct pci_doe_mb *cdat_doe,
>  			       struct cxl_cdat *cdat)
>  {
> -	size_t length = cdat->length;
> -	u32 *data = cdat->table;
> +	size_t length = cdat->length + sizeof(u32);
> +	__le32 *data = cdat->table;
>  	int entry_handle = 0;
> +	__le32 saved_dw = 0;
>  
>  	do {
>  		__le32 request = CDAT_DOE_REQ(entry_handle);
>  		struct cdat_entry_header *entry;
> -		__le32 response[32];
>  		size_t entry_dw;
>  		int rc;
>  
>  		rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL,
>  			     CXL_DOE_PROTOCOL_TABLE_ACCESS,
>  			     &request, sizeof(request),
> -			     &response, sizeof(response));
> +			     data, length);
>  		if (rc < 0) {
>  			dev_err(dev, "DOE failed: %d", rc);
>  			return rc;
>  		}
>  
>  		/* 1 DW Table Access Response Header + CDAT entry */
> -		entry = (struct cdat_entry_header *)(response + 1);
> +		entry = (struct cdat_entry_header *)(data + 1);
>  		if ((entry_handle == 0 &&
>  		     rc != sizeof(u32) + sizeof(struct cdat_header)) ||
>  		    (entry_handle > 0 &&
> @@ -526,21 +526,22 @@ static int cxl_cdat_read_table(struct device *dev,
>  
>  		/* Get the CXL table access header entry handle */
>  		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
> -					 le32_to_cpu(response[0]));
> +					 le32_to_cpu(data[0]));
>  		entry_dw = rc / sizeof(u32);
>  		/* Skip Header */
>  		entry_dw -= 1;
> -		entry_dw = min(length / sizeof(u32), entry_dw);
> -		/* Prevent length < 1 DW from causing a buffer overflow */
> -		if (entry_dw) {
> -			memcpy(data, entry, entry_dw * sizeof(u32));
> -			length -= entry_dw * sizeof(u32);
> -			data += entry_dw;
> -		}
> +		/*
> +		 * Table Access Response Header overwrote the last DW of
> +		 * previous entry, so restore that DW
> +		 */
> +		*data = saved_dw;
> +		length -= entry_dw * sizeof(u32);
> +		data += entry_dw;
> +		saved_dw = *data;
>  	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
>  
>  	/* Length in CDAT header may exceed concatenation of CDAT entries */
> -	cdat->length -= length;
> +	cdat->length -= length - sizeof(u32);
>  
>  	return 0;
>  }
> @@ -576,7 +577,8 @@ void read_cdat_data(struct cxl_port *port)
>  		return;
>  	}
>  
> -	port->cdat.table = devm_kzalloc(dev, cdat_length, GFP_KERNEL);
> +	port->cdat.table = devm_kzalloc(dev, cdat_length + sizeof(u32),
> +					GFP_KERNEL);
>  	if (!port->cdat.table)
>  		return;
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1b1cf459ac77..78f5cae5134c 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -494,7 +494,8 @@ struct cxl_pmem_region {
>   * @component_reg_phys: component register capability base address (optional)
>   * @dead: last ep has been removed, force port re-creation
>   * @depth: How deep this port is relative to the root. depth 0 is the root.
> - * @cdat: Cached CDAT data
> + * @cdat: Cached CDAT data (@table is preceded by 4 null bytes, these are not
> + *	  included in @length)
>   * @cdat_available: Should a CDAT attribute be available in sysfs
>   */
>  struct cxl_port {
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 5453771bf330..0705343ac5ca 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -95,7 +95,7 @@ static ssize_t CDAT_read(struct file *filp, struct kobject *kobj,
>  		return 0;
>  
>  	return memory_read_from_buffer(buf, count, &offset,
> -				       port->cdat.table,
> +				       port->cdat.table + sizeof(u32),
>  				       port->cdat.length);
>  }
>  
> -- 
> 2.39.1
> 




* Re: [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation
  2023-02-16  0:56   ` Ira Weiny
@ 2023-02-16  8:03     ` Lukas Wunner
  0 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-16  8:03 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Jonathan Cameron,
	Dan Williams, Alison Schofield, Vishal Verma, Dave Jiang, Li,
	Ming, Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Wed, Feb 15, 2023 at 04:56:48PM -0800, Ira Weiny wrote:
> Lukas Wunner wrote:
> > Jonathan notes that cxl_cdat_get_length() and cxl_cdat_read_table()
> > allocate 32 dwords for the DOE response even though it may be smaller.
> > 
> > In the case of cxl_cdat_get_length(), only the second dword of the
> > response is of interest (it contains the length).  So reduce the
> > allocation to 2 dwords and let DOE discard the remainder.
> > 
> > In the case of cxl_cdat_read_table(), a correctly sized allocation for
> > the full CDAT already exists.  Let DOE write each table entry directly
> > into that allocation.  There's a snag in that the table entry is
> > preceded by a Table Access Response Header (1 dword).
> 
> Where is this 'Table Access Response Header' defined?

CXL r3.0 table 8-14 (sec 8.1.11.1, page 399).

I'll amend the commit message with a reference to the spec.

Thanks,

Lukas


* Re: [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length
  2023-02-14 11:33   ` Jonathan Cameron
@ 2023-02-16 10:26     ` Lukas Wunner
  2023-02-17 10:01       ` Jonathan Cameron
  0 siblings, 1 reply; 62+ messages in thread
From: Lukas Wunner @ 2023-02-16 10:26 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Tue, Feb 14, 2023 at 11:33:11AM +0000, Jonathan Cameron wrote:
> On Fri, 10 Feb 2023 21:25:04 +0100 Lukas Wunner <lukas@wunner.de> wrote:
> > If the length in the CDAT header is larger than the concatenation of the
> > header and all table entries, then the CDAT exposed to user space
> > contains trailing null bytes.
> > 
> > Not every consumer may be able to handle that.  Per Postel's robustness
> > principle, "be liberal in what you accept" and silently reduce the
> > cached length to avoid exposing those null bytes.
[...]
> Fair enough. I'd argue that we are papering over broken hardware if
> we hit these conditions, so given we aren't aware of any (I hope)
> not sure this is stable material.  Argument in favor of stable being
> that if we do get broken hardware we don't want an ABI change when
> we paper over the garbage... hmm.

Type 0 is assigned to DSMAS structures.  So user space might believe
there's an additional DSMAS in the CDAT.  It *could* detect that the
length is bogus (it is 0 but should be 24), but what if it doesn't
check that?  It seems way too dangerous to leave this loophole open,
hence the stable designation.

Thanks,

Lukas

> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
> > @@ -582,6 +582,9 @@ static int cxl_cdat_read_table(struct device *dev,
> >  		}
> >  	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
> >  
> > +	/* Length in CDAT header may exceed concatenation of CDAT entries */
> > +	cdat->length -= length;
> > +
> >  	return 0;
> >  }
> >  


* Re: [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length
  2023-02-16 10:26     ` Lukas Wunner
@ 2023-02-17 10:01       ` Jonathan Cameron
  0 siblings, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-17 10:01 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Thu, 16 Feb 2023 11:26:16 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> On Tue, Feb 14, 2023 at 11:33:11AM +0000, Jonathan Cameron wrote:
> > On Fri, 10 Feb 2023 21:25:04 +0100 Lukas Wunner <lukas@wunner.de> wrote:  
> > > If the length in the CDAT header is larger than the concatenation of the
> > > header and all table entries, then the CDAT exposed to user space
> > > contains trailing null bytes.
> > > 
> > > Not every consumer may be able to handle that.  Per Postel's robustness
> > > principle, "be liberal in what you accept" and silently reduce the
> > > cached length to avoid exposing those null bytes.  
> [...]
> > Fair enough. I'd argue that we are papering over broken hardware if
> > we hit these conditions, so given we aren't aware of any (I hope)
> > not sure this is stable material.  Argument in favor of stable being
> > that if we do get broken hardware we don't want an ABI change when
> > we paper over the garbage... hmm.  
> 
> Type 0 is assigned for DSMAS structures.  So user space might believe
> there's an additional DSMAS in the CDAT.  It *could* detect that the
> length is bogus (it is 0 but should be 24), but what if it doesn't
> check that?  It seems way too dangerous to leave this loophole open,
> hence the stable designation.
Ok

> 
> Thanks,
> 
> Lukas
> 
> > > --- a/drivers/cxl/core/pci.c
> > > +++ b/drivers/cxl/core/pci.c
> > > @@ -582,6 +582,9 @@ static int cxl_cdat_read_table(struct device *dev,
> > >  		}
> > >  	} while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY);
> > >  
> > > +	/* Length in CDAT header may exceed concatenation of CDAT entries */
> > > +	cdat->length -= length;
> > > +
> > >  	return 0;
> > >  }
> > >    



* Re: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-11  0:22   ` Dan Williams
@ 2023-02-19 13:03     ` Lukas Wunner
  0 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-19 13:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Alison Schofield, Vishal Verma, Dave Jiang, Li,
	Ming, Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Fri, Feb 10, 2023 at 04:22:47PM -0800, Dan Williams wrote:
> Lukas Wunner wrote:
> > PCI Configuration Space is little endian (PCI r3.0 sec 6.1).  Accessors
> > such as pci_read_config_dword() implicitly swap bytes on big endian.
> > That way, the macros in include/uapi/linux/pci_regs.h work regardless of
> > the arch's endianness.  For an example of implicit byte-swapping, see
> > ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
> > (Load Word Byte-Reverse Indexed).
> > 
> > DOE Read/Write Data Mailbox Registers are unlike other registers in
> > Configuration Space in that they contain or receive a 4 byte portion of
> > an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
> > They need to be copied to or from the request/response buffer verbatim.
> > So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
> > byte-swapping.
[...]
> > --- a/drivers/pci/doe.c
> > +++ b/drivers/pci/doe.c
> > @@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
> >  					  length));
> >  	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)
> >  		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
> > -				       task->request_pl[i]);
> > +				       le32_to_cpu(task->request_pl[i]));
> 
> What do you think about defining:
> 
> int pci_doe_write(const struct pci_dev *dev, int where, __le32 val);
> int pci_doe_read(const struct pci_dev *dev, int where, __le32 *val);
> 
> ...local to this file to make it extra clear that DOE transfers are
> passing raw byte-streams and not register values as
> pci_{write,read}_config_dword() expect.

I've done a prototype (see below), but it doesn't strike me as an
improvement:

There are only two occurrences for each of those read and write accessors,
so the diffstat isn't pretty (27 insertions, 12 deletions).

Moreover, the request header constructed in pci_doe_send_req() is
constructed in native endianness and written using the standard
pci_write_config_dword() accessor.  Same for the response header
parsed in pci_doe_recv_resp().  Thus, the functions do not
consistently use the new custom accessors, but rather use a mix
of the standard accessors and custom accessors.  So clarity may
not improve all that much.

Let me know if you feel otherwise!

The patch below goes on top of the series.  I'm using a variation
of your suggested approach in that I've named the custom accessors
pci_doe_{write,read}_data() (for consistency with the existing
pci_doe_write_ctrl()).

Thanks,

Lukas

-- >8 --

diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
index 8499cf9..6b0148e 100644
--- a/drivers/pci/doe.c
+++ b/drivers/pci/doe.c
@@ -106,6 +106,24 @@ static void pci_doe_write_ctrl(struct pci_doe_mb *doe_mb, u32 val)
 	pci_write_config_dword(pdev, offset + PCI_DOE_CTRL, val);
 }
 
+static void pci_doe_write_data(struct pci_doe_mb *doe_mb, __le32 lav)
+{
+	struct pci_dev *pdev = doe_mb->pdev;
+	int offset = doe_mb->cap_offset;
+
+	pci_write_config_dword(pdev, offset + PCI_DOE_WRITE, le32_to_cpu(lav));
+}
+
+static void pci_doe_read_data(struct pci_doe_mb *doe_mb, __le32 *lav)
+{
+	struct pci_dev *pdev = doe_mb->pdev;
+	int offset = doe_mb->cap_offset;
+	u32 val;
+
+	pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
+	*lav = cpu_to_le32(val);
+}
+
 static int pci_doe_abort(struct pci_doe_mb *doe_mb)
 {
 	struct pci_dev *pdev = doe_mb->pdev;
@@ -144,6 +162,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
 	struct pci_dev *pdev = doe_mb->pdev;
 	int offset = doe_mb->cap_offset;
 	size_t length, remainder;
+	__le32 lav;
 	u32 val;
 	int i;
 
@@ -177,16 +196,14 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
 
 	/* Write payload */
 	for (i = 0; i < task->request_pl_sz / sizeof(__le32); i++)
-		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
-				       le32_to_cpu(task->request_pl[i]));
+		pci_doe_write_data(doe_mb, task->request_pl[i]);
 
 	/* Write last payload dword */
 	remainder = task->request_pl_sz % sizeof(__le32);
 	if (remainder) {
-		val = 0;
-		memcpy(&val, &task->request_pl[i], remainder);
-		le32_to_cpus(&val);
-		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE, val);
+		lav = 0;
+		memcpy(&lav, &task->request_pl[i], remainder);
+		pci_doe_write_data(doe_mb, lav);
 	}
 
 	pci_doe_write_ctrl(doe_mb, PCI_DOE_CTRL_GO);
@@ -211,6 +228,7 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas
 	size_t length, payload_length, remainder, received;
 	struct pci_dev *pdev = doe_mb->pdev;
 	int offset = doe_mb->cap_offset;
+	__le32 lav;
 	int i = 0;
 	u32 val;
 
@@ -256,16 +274,13 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas
 	if (payload_length) {
 		/* Read all payload dwords except the last */
 		for (; i < payload_length - 1; i++) {
-			pci_read_config_dword(pdev, offset + PCI_DOE_READ,
-					      &val);
-			task->response_pl[i] = cpu_to_le32(val);
+			pci_doe_read_data(doe_mb, &task->response_pl[i]);
 			pci_write_config_dword(pdev, offset + PCI_DOE_READ, 0);
 		}
 
 		/* Read last payload dword */
-		pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
-		cpu_to_le32s(&val);
-		memcpy(&task->response_pl[i], &val, remainder);
+		pci_doe_read_data(doe_mb, &lav);
+		memcpy(&task->response_pl[i], &lav, remainder);
 		/* Prior to the last ack, ensure Data Object Ready */
 		if (!pci_doe_data_obj_ready(doe_mb))
 			return -EIO;


* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-10 20:25 ` [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration Lukas Wunner
  2023-02-15  2:07   ` Li, Ming
@ 2023-02-28  1:18   ` Alexey Kardashevskiy
  2023-02-28  1:39     ` Dan Williams
  2023-02-28  5:43     ` Lukas Wunner
  1 sibling, 2 replies; 62+ messages in thread
From: Alexey Kardashevskiy @ 2023-02-28  1:18 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On 11/2/23 07:25, Lukas Wunner wrote:
> Currently a DOE instance cannot be shared by multiple drivers because
> each driver creates its own pci_doe_mb struct for a given DOE instance.

Sorry for my ignorance, but why/how/when would a device have multiple 
drivers bound? Or is it only one driver at a time, but we also want DOE 
MBs to survive switching to another (different or newer) driver?


> For the same reason a DOE instance cannot be shared between the PCI core
> and a driver.

And we want this sharing why? Any example will do. Thanks,

> Overcome this limitation by creating mailboxes in the PCI core on device
> enumeration.
> 
> Provide a pci_find_doe_mailbox() API call to allow drivers to get a
> pci_doe_mb for a given (pci_dev, vendor, protocol) triple.  This API is
> modeled after pci_find_capability() and can later be amended with a
> pci_find_next_doe_mailbox() call to iterate over all mailboxes of a
> given pci_dev which support a specific protocol.
> 
> On removal, destroy the mailboxes in pci_destroy_dev(), after the driver
> is unbound.  This allows drivers to use DOE in their ->remove() hook.
> 
> On surprise removal, cancel ongoing DOE exchanges and prevent new ones
> from being scheduled.  Thereby ensure that a hot-removed device doesn't
> needlessly wait for a running exchange to time out.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>   Changes v2 -> v3:
>   * Don't cancel ongoing DOE exchanges in pci_stop_dev() so that
>     drivers may perform DOE in their ->remove() hooks
>   * Instead cancel ongoing DOE exchanges on surprise removal in
>     pci_dev_set_disconnected()
>   * Emit error message in pci_doe_init() if mailbox creation fails (Ira)
>   * Explain in commit message that pci_find_doe_mailbox() can later
>     be amended with pci_find_next_doe_mailbox() (Jonathan)
> 
>   drivers/pci/doe.c       | 73 +++++++++++++++++++++++++++++++++++++++++
>   drivers/pci/pci.h       | 12 +++++++
>   drivers/pci/probe.c     |  1 +
>   drivers/pci/remove.c    |  1 +
>   include/linux/pci-doe.h |  2 ++
>   include/linux/pci.h     |  3 ++
>   6 files changed, 92 insertions(+)
> 
> diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> index 2bc202b64b6a..bf32875d27da 100644
> --- a/drivers/pci/doe.c
> +++ b/drivers/pci/doe.c
> @@ -20,6 +20,8 @@
>   #include <linux/pci-doe.h>
>   #include <linux/workqueue.h>
>   
> +#include "pci.h"
> +
>   #define PCI_DOE_PROTOCOL_DISCOVERY 0
>   
>   /* Timeout of 1 second from 6.30.2 Operation, PCI Spec r6.0 */
> @@ -658,3 +660,74 @@ int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
>   	return task.rv;
>   }
>   EXPORT_SYMBOL_GPL(pci_doe);
> +
> +/**
> + * pci_find_doe_mailbox() - Find Data Object Exchange mailbox
> + *
> + * @pdev: PCI device
> + * @vendor: Vendor ID
> + * @type: Data Object Type
> + *
> + * Find first DOE mailbox of a PCI device which supports the given protocol.
> + *
> + * RETURNS: Pointer to the DOE mailbox or NULL if none was found.
> + */
> +struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
> +					u8 type)
> +{
> +	struct pci_doe_mb *doe_mb;
> +	unsigned long index;
> +
> +	xa_for_each(&pdev->doe_mbs, index, doe_mb)
> +		if (pci_doe_supports_prot(doe_mb, vendor, type))
> +			return doe_mb;
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pci_find_doe_mailbox);
> +
> +void pci_doe_init(struct pci_dev *pdev)
> +{
> +	struct pci_doe_mb *doe_mb;
> +	u16 offset = 0;
> +	int rc;
> +
> +	xa_init(&pdev->doe_mbs);
> +
> +	while ((offset = pci_find_next_ext_capability(pdev, offset,
> +						      PCI_EXT_CAP_ID_DOE))) {
> +		doe_mb = pci_doe_create_mb(pdev, offset);
> +		if (IS_ERR(doe_mb)) {
> +			pci_err(pdev, "[%x] failed to create mailbox: %ld\n",
> +				offset, PTR_ERR(doe_mb));
> +			continue;
> +		}
> +
> +		rc = xa_insert(&pdev->doe_mbs, offset, doe_mb, GFP_KERNEL);
> +		if (rc) {
> +			pci_err(pdev, "[%x] failed to insert mailbox: %d\n",
> +				offset, rc);
> +			pci_doe_destroy_mb(doe_mb);
> +		}
> +	}
> +}
> +
> +void pci_doe_destroy(struct pci_dev *pdev)
> +{
> +	struct pci_doe_mb *doe_mb;
> +	unsigned long index;
> +
> +	xa_for_each(&pdev->doe_mbs, index, doe_mb)
> +		pci_doe_destroy_mb(doe_mb);
> +
> +	xa_destroy(&pdev->doe_mbs);
> +}
> +
> +void pci_doe_disconnected(struct pci_dev *pdev)
> +{
> +	struct pci_doe_mb *doe_mb;
> +	unsigned long index;
> +
> +	xa_for_each(&pdev->doe_mbs, index, doe_mb)
> +		pci_doe_cancel_tasks(doe_mb);
> +}
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 8f5d4bd5b410..065ca9743ec1 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -318,6 +318,16 @@ struct pci_sriov {
>   	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
>   };
>   
> +#ifdef CONFIG_PCI_DOE
> +void pci_doe_init(struct pci_dev *pdev);
> +void pci_doe_destroy(struct pci_dev *pdev);
> +void pci_doe_disconnected(struct pci_dev *pdev);
> +#else
> +static inline void pci_doe_init(struct pci_dev *pdev) { }
> +static inline void pci_doe_destroy(struct pci_dev *pdev) { }
> +static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
> +#endif
> +
>   /**
>    * pci_dev_set_io_state - Set the new error state if possible.
>    *
> @@ -372,6 +382,8 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
>   	pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
>   	device_unlock(&dev->dev);
>   
> +	pci_doe_disconnected(dev);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 1779582fb500..65e60ee50489 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2476,6 +2476,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
>   	pci_aer_init(dev);		/* Advanced Error Reporting */
>   	pci_dpc_init(dev);		/* Downstream Port Containment */
>   	pci_rcec_init(dev);		/* Root Complex Event Collector */
> +	pci_doe_init(dev);		/* Data Object Exchange */
>   
>   	pcie_report_downtraining(dev);
>   	pci_init_reset_methods(dev);
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index 0145aef1b930..f25acf50879f 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -39,6 +39,7 @@ static void pci_destroy_dev(struct pci_dev *dev)
>   	list_del(&dev->bus_list);
>   	up_write(&pci_bus_sem);
>   
> +	pci_doe_destroy(dev);
>   	pcie_aspm_exit_link_state(dev);
>   	pci_bridge_d3_update(dev);
>   	pci_free_resources(dev);
> diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
> index 7f16749c6aa3..d6192ee0ac07 100644
> --- a/include/linux/pci-doe.h
> +++ b/include/linux/pci-doe.h
> @@ -29,6 +29,8 @@ struct pci_doe_mb;
>   
>   struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset);
>   bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type);
> +struct pci_doe_mb *pci_find_doe_mailbox(struct pci_dev *pdev, u16 vendor,
> +					u8 type);
>   
>   int pci_doe(struct pci_doe_mb *doe_mb, u16 vendor, u8 type,
>   	    const void *request, size_t request_sz,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3c51cac3890b..b19c2965e384 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -511,6 +511,9 @@ struct pci_dev {
>   #endif
>   #ifdef CONFIG_PCI_P2PDMA
>   	struct pci_p2pdma __rcu *p2pdma;
> +#endif
> +#ifdef CONFIG_PCI_DOE
> +	struct xarray	doe_mbs;	/* Data Object Exchange mailboxes */
>   #endif
>   	u16		acs_cap;	/* ACS Capability offset */
>   	phys_addr_t	rom;		/* Physical address if not from BAR */

-- 
Alexey



* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-28  1:18   ` Alexey Kardashevskiy
@ 2023-02-28  1:39     ` Dan Williams
  2023-02-28  5:43     ` Lukas Wunner
  1 sibling, 0 replies; 62+ messages in thread
From: Dan Williams @ 2023-02-28  1:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

Alexey Kardashevskiy wrote:
> On 11/2/23 07:25, Lukas Wunner wrote:
> > Currently a DOE instance cannot be shared by multiple drivers because
> > each driver creates its own pci_doe_mb struct for a given DOE instance.
> 
> Sorry for my ignorance, but why/how/when would a device have multiple 
> drivers bound? Or is it only one driver at a time, but we also want DOE 
> MBs to survive switching to another (different or newer) driver?
> 
> 
> > For the same reason a DOE instance cannot be shared between the PCI core
> > and a driver.
> 
> And we want this sharing why? Any example will do. Thanks,

I understood "sharing" in this context to mean that the PCI core can use
the DOE instance and the device can reuse the same *through* the PCI
core. So for example an SPDM session can be established by the core and
used later by the driver.


* Re: [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation
  2023-02-10 20:25 ` [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation Lukas Wunner
  2023-02-14 13:05   ` Jonathan Cameron
  2023-02-16  0:56   ` Ira Weiny
@ 2023-02-28  1:45   ` Alexey Kardashevskiy
  2023-02-28  5:55     ` Lukas Wunner
  2 siblings, 1 reply; 62+ messages in thread
From: Alexey Kardashevskiy @ 2023-02-28  1:45 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On 11/2/23 07:25, Lukas Wunner wrote:
> Jonathan notes that cxl_cdat_get_length() and cxl_cdat_read_table()
> allocate 32 dwords for the DOE response even though it may be smaller.
> 
> In the case of cxl_cdat_get_length(), only the second dword of the
> response is of interest (it contains the length).  So reduce the
> allocation to 2 dwords and let DOE discard the remainder.
> 
> In the case of cxl_cdat_read_table(), a correctly sized allocation for
> the full CDAT already exists.  Let DOE write each table entry directly
> into that allocation.  There's a snag in that the table entry is
> preceded by a Table Access Response Header (1 dword).  Save the last
> dword of the previous table entry, let DOE overwrite it with the
> header of the next entry and restore it afterwards.
> 
> The resulting CDAT is preceded by 4 unavoidable useless bytes.  Increase
> the allocation size accordingly and skip these bytes when exposing CDAT
> in sysfs.
> 
> The buffer overflow check in cxl_cdat_read_table() becomes unnecessary
> because the remaining bytes in the allocation are tracked in "length",
> which is passed to DOE and limits how many bytes it writes to the
> allocation.  Additionally, cxl_cdat_read_table() bails out if the DOE
> response is truncated due to insufficient space.
> 
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> ---
>   Changes v2 -> v3:
>   * Newly added patch in v3 on popular request (Jonathan)
> 
>   drivers/cxl/core/pci.c | 34 ++++++++++++++++++----------------


Almost none of the changes from this patchset applied cleanly to
drivers/cxl/core/pci.c; what is the patchset based on? Thanks,

-- 
Alexey



* Re: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-10 20:25 ` [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian Lukas Wunner
  2023-02-11  0:22   ` Dan Williams
  2023-02-14 11:15   ` Jonathan Cameron
@ 2023-02-28  2:53   ` Alexey Kardashevskiy
  2023-02-28  8:24     ` Lukas Wunner
  2 siblings, 1 reply; 62+ messages in thread
From: Alexey Kardashevskiy @ 2023-02-28  2:53 UTC (permalink / raw)
  To: Lukas Wunner, Bjorn Helgaas, linux-pci
  Cc: Gregory Price, Ira Weiny, Jonathan Cameron, Dan Williams,
	Alison Schofield, Vishal Verma, Dave Jiang, Li, Ming,
	Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On 11/2/23 07:25, Lukas Wunner wrote:
> The CDAT exposed in sysfs differs between little endian and big endian
> arches:  On big endian, every 4 bytes are byte-swapped.


hexdump prints different byte streams on LE and BE? Does not seem right.


> PCI Configuration Space is little endian (PCI r3.0 sec 6.1).  Accessors
> such as pci_read_config_dword() implicitly swap bytes on big endian.
> That way, the macros in include/uapi/linux/pci_regs.h work regardless of
> the arch's endianness.  For an example of implicit byte-swapping, see
> ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
> (Load Word Byte-Reverse Indexed).
> 
> DOE Read/Write Data Mailbox Registers are unlike other registers in
> Configuration Space in that they contain or receive a 4 byte portion of
> an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
> They need to be copied to or from the request/response buffer verbatim.
> So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
> byte-swapping.
> 
> The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
> implicit byte-swapping.  Byte-swap requests after constructing them with
> those macros and byte-swap responses before parsing them.
> 
> Change the request and response type to __le32 to avoid sparse warnings.
>
> Fixes: c97006046c79 ("cxl/port: Read CDAT table")
> Tested-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v6.0+
> ---
>   Changes v2 -> v3:
>   * Newly added patch in v3
> 
>   drivers/cxl/core/pci.c  | 12 ++++++------
>   drivers/pci/doe.c       | 13 ++++++++-----
>   include/linux/pci-doe.h |  8 ++++++--
>   3 files changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 57764e9cd19d..d3cf1d9d67d4 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -480,7 +480,7 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport)
>   	return NULL;
>   }
>   
> -#define CDAT_DOE_REQ(entry_handle)					\
> +#define CDAT_DOE_REQ(entry_handle) cpu_to_le32				\
>   	(FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE,			\
>   		    CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) |		\
>   	 FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE,			\
> @@ -493,8 +493,8 @@ static void cxl_doe_task_complete(struct pci_doe_task *task)
>   }
>   
>   struct cdat_doe_task {
> -	u32 request_pl;
> -	u32 response_pl[32];
> +	__le32 request_pl;
> +	__le32 response_pl[32];

This is ok as it is a binary format of DOE message (is it?)...

>   	struct completion c;
>   	struct pci_doe_task task;
>   };
> @@ -531,7 +531,7 @@ static int cxl_cdat_get_length(struct device *dev,
>   	if (t.task.rv < sizeof(u32))
>   		return -EIO;
>   
> -	*length = t.response_pl[1];
> +	*length = le32_to_cpu(t.response_pl[1]);
>   	dev_dbg(dev, "CDAT length %zu\n", *length);
>   
>   	return 0;
> @@ -548,7 +548,7 @@ static int cxl_cdat_read_table(struct device *dev,
>   	do {
>   		DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(entry_handle), t);
>   		size_t entry_dw;
> -		u32 *entry;
> +		__le32 *entry;
>   		int rc;
>   
>   		rc = pci_doe_submit_task(cdat_doe, &t.task);
> @@ -563,7 +563,7 @@ static int cxl_cdat_read_table(struct device *dev,
>   
>   		/* Get the CXL table access header entry handle */
>   		entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE,
> -					 t.response_pl[0]);
> +					 le32_to_cpu(t.response_pl[0]));
>   		entry = t.response_pl + 1;
>   		entry_dw = t.task.rv / sizeof(u32);
>   		/* Skip Header */
> diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c
> index 66d9ab288646..69efa9a250b9 100644
> --- a/drivers/pci/doe.c
> +++ b/drivers/pci/doe.c
> @@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
>   					  length));
>   	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)
>   		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
> -				       task->request_pl[i]);
> +				       le32_to_cpu(task->request_pl[i]));

Does it really work on BE? My little brain explodes on all these 
conversions :)

char buf[] = { 1, 2, 3, 4 };
u32 *request_pl = (u32 *)buf;

request_pl[0] will be 0x01020304.
le32_to_cpu(request_pl[0]) will be 0x04030201
And then pci_write_config_dword() will do another swap.

Did I miss something? (/me is gone bringing up a BE system).

>   
>   	pci_doe_write_ctrl(doe_mb, PCI_DOE_CTRL_GO);
>   
> @@ -198,8 +198,8 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas
>   	payload_length = min(length, task->response_pl_sz / sizeof(u32));
>   	/* Read the rest of the response payload */
>   	for (i = 0; i < payload_length; i++) {
> -		pci_read_config_dword(pdev, offset + PCI_DOE_READ,
> -				      &task->response_pl[i]);
> +		pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
> +		task->response_pl[i] = cpu_to_le32(val);
>   		/* Prior to the last ack, ensure Data Object Ready */
>   		if (i == (payload_length - 1) && !pci_doe_data_obj_ready(doe_mb))
>   			return -EIO;
> @@ -322,15 +322,17 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
>   	struct pci_doe_task task = {
>   		.prot.vid = PCI_VENDOR_ID_PCI_SIG,
>   		.prot.type = PCI_DOE_PROTOCOL_DISCOVERY,
> -		.request_pl = &request_pl,
> +		.request_pl = (__le32 *)&request_pl,
>   		.request_pl_sz = sizeof(request_pl),
> -		.response_pl = &response_pl,
> +		.response_pl = (__le32 *)&response_pl,
>   		.response_pl_sz = sizeof(response_pl),
>   		.complete = pci_doe_task_complete,
>   		.private = &c,
>   	};
>   	int rc;
>   
> +	cpu_to_le32s(&request_pl);
> +
>   	rc = pci_doe_submit_task(doe_mb, &task);
>   	if (rc < 0)
>   		return rc;
> @@ -340,6 +342,7 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid,
>   	if (task.rv != sizeof(response_pl))
>   		return -EIO;
>   
> +	le32_to_cpus(&response_pl);
>   	*vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl);
>   	*protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL,
>   			      response_pl);
> diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h
> index ed9b4df792b8..43765eaf2342 100644
> --- a/include/linux/pci-doe.h
> +++ b/include/linux/pci-doe.h
> @@ -34,6 +34,10 @@ struct pci_doe_mb;
>    * @work: Used internally by the mailbox
>    * @doe_mb: Used internally by the mailbox
>    *
> + * Payloads are treated as opaque byte streams which are transmitted verbatim,
> + * without byte-swapping.  If payloads contain little-endian register values,
> + * the caller is responsible for conversion with cpu_to_le32() / le32_to_cpu().
> + *
>    * The payload sizes and rv are specified in bytes with the following
>    * restrictions concerning the protocol.
>    *
> @@ -45,9 +49,9 @@ struct pci_doe_mb;
>    */
>   struct pci_doe_task {
>   	struct pci_doe_protocol prot;
> -	u32 *request_pl;
> +	__le32 *request_pl;
>   	size_t request_pl_sz;
> -	u32 *response_pl;
> +	__le32 *response_pl;


This does not look right. Either:
- pci_doe() should also take __le32* or
- pci_doe() should do cpu_to_le32() for the request, and 
pci_doe_task_complete() - for the response.


Thanks,


>   	size_t response_pl_sz;
>   	int rv;
>   	void (*complete)(struct pci_doe_task *task);

-- 
Alexey


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-28  1:18   ` Alexey Kardashevskiy
  2023-02-28  1:39     ` Dan Williams
@ 2023-02-28  5:43     ` Lukas Wunner
  2023-02-28  7:24       ` Alexey Kardashevskiy
  1 sibling, 1 reply; 62+ messages in thread
From: Lukas Wunner @ 2023-02-28  5:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On Tue, Feb 28, 2023 at 12:18:07PM +1100, Alexey Kardashevskiy wrote:
> On 11/2/23 07:25, Lukas Wunner wrote:
> > For the same reason a DOE instance cannot be shared between the PCI core
> > and a driver.
> 
> And we want this sharing why? Any example will do. Thanks,

The PCI core is going to perform CMA/SPDM authentication when a device
gets enumerated (PCIe r6.0 sec 6.31).  That's the main motivation
to lift DOE mailbox creation into the PCI core.  It's not mentioned
here explicitly because I want the patch to stand on its own.
CMA/SPDM support will be submitted separately.

A driver that later on gets bound to the device should be allowed
to talk to it via DOE as well, possibly even sharing the same DOE
mailboxes used by the PCI core.

Patches for CMA/SPDM are under development on this branch:

https://github.com/l1k/linux/commits/doe


> > Currently a DOE instance cannot be shared by multiple drivers because
> > each driver creates its own pci_doe_mb struct for a given DOE instance.
> 
> Sorry for my ignorance but why/how/when would a device have multiple drivers
> bound? Or is it only one driver at a time but we also want DOE MBs to survive
> switching to another (different or newer) driver?

Conceivably, a driver may have the need to talk to multiple devices
via DOE, even ones it's not bound to.  (E.g. devices in its ancestry
or children.)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation
  2023-02-28  1:45   ` Alexey Kardashevskiy
@ 2023-02-28  5:55     ` Lukas Wunner
  0 siblings, 0 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-02-28  5:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On Tue, Feb 28, 2023 at 12:45:33PM +1100, Alexey Kardashevskiy wrote:
> Almost no change from this patchset applied cleanly to
> drivers/cxl/core/pci.c - what is the patchset based on? Thanks,

This series is based on pci/next, i.e. v6.2-rc1 plus PCI changes for v6.3:

https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=next

It should apply cleanly to cxl/next as well, save for a trivial merge
conflict in patch [13/16] due to a context change.  Here's a branch
containing v3 of this series based on cxl/next:

https://github.com/l1k/linux/commits/doe_rework_v3_cxl_next

That said, you're probably better off just using the development branch
for CMA/SPDM, it incorporates all the changes requested during review
of v3 and will form the basis for v4 (slated for submission in a few
days):

https://github.com/l1k/linux/commits/doe

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-28  5:43     ` Lukas Wunner
@ 2023-02-28  7:24       ` Alexey Kardashevskiy
  2023-02-28 10:42         ` Jonathan Cameron
  2023-03-02 20:22         ` Lukas Wunner
  0 siblings, 2 replies; 62+ messages in thread
From: Alexey Kardashevskiy @ 2023-02-28  7:24 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On 28/2/23 16:43, Lukas Wunner wrote:
> On Tue, Feb 28, 2023 at 12:18:07PM +1100, Alexey Kardashevskiy wrote:
>> On 11/2/23 07:25, Lukas Wunner wrote:
>>> For the same reason a DOE instance cannot be shared between the PCI core
>>> and a driver.
>>
>> And we want this sharing why? Any example will do. Thanks,
> 
> The PCI core is going to perform CMA/SPDM authentication when a device
> gets enumerated (PCIe r6.0 sec 6.31).  That's the main motivation
> to lift DOE mailbox creation into the PCI core.  It's not mentioned
> here explicitly because I want the patch to stand on its own.
> CMA/SPDM support will be submitted separately.

I was going the opposite direction with avoiding adding this into the 
PCI core as 1) the pci_dev struct is already 2K  and 2) it is a niche 
feature and  3) I wanted this CMA/SPDM session setup to be platform 
specific as on our platform the SPDM support requires some devices to be 
probed before we can do any SPDM.


> A driver that later on gets bound to the device should be allowed
> to talk to it via DOE as well, possibly even sharing the same DOE
> mailboxes used by the PCI core.
> 
> Patches for CMA/SPDM are under development on this branch:
> 
> https://github.com/l1k/linux/commits/doe

yes, thanks! Lots of reading :)


>>> Currently a DOE instance cannot be shared by multiple drivers because
>>> each driver creates its own pci_doe_mb struct for a given DOE instance.
>>
>> Sorry for my ignorance but why/how/when would a device have multiple drivers
> >> bound? Or is it only one driver at a time but we also want DOE MBs to survive
>> switching to another (different or newer) driver?
> 
> Conceivably, a driver may have the need to talk to multiple devices
> via DOE, even ones it's not bound to.  (E.g. devices in its ancestry
> or children.)

Ah ok. Well, a parent device could look for the DOE MB in a child using 
devres_find(). This requirement alone does not require moving things to 
the PCI core, and it potentially allows the code to be a module, which 
could be a better way: distros could have it always enabled, but it 
would not waste any memory on my laptop when not loaded. Thanks,


-- 
Alexey


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-28  2:53   ` Alexey Kardashevskiy
@ 2023-02-28  8:24     ` Lukas Wunner
  2023-02-28 12:08       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 62+ messages in thread
From: Lukas Wunner @ 2023-02-28  8:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On Tue, Feb 28, 2023 at 01:53:46PM +1100, Alexey Kardashevskiy wrote:
> On 11/2/23 07:25, Lukas Wunner wrote:
> > @@ -493,8 +493,8 @@ static void cxl_doe_task_complete(struct pci_doe_task *task)
> >   }
> >   struct cdat_doe_task {
> > -	u32 request_pl;
> > -	u32 response_pl[32];
> > +	__le32 request_pl;
> > +	__le32 response_pl[32];
> 
> This is ok as it is a binary format of DOE message (is it?)...

Yes, the DOE request and response is an opaque byte stream.

The above-quoted type changes are necessary to avoid sparse warnings
(as noted in the commit message).


> > --- a/drivers/pci/doe.c
> > +++ b/drivers/pci/doe.c
> > @@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
> >   					  length));
> >   	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)
> >   		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
> > -				       task->request_pl[i]);
> > +				       le32_to_cpu(task->request_pl[i]));
> 
> Does it really work on BE? My little brain explodes on all these conversions
> 
> char buf[] = { 1, 2, 3, 4 };
> u32 *request_pl = (u32 *)buf;
> 
> request_pl[0] will be 0x01020304.
> le32_to_cpu(request_pl[0]) will be 0x04030201
> And then pci_write_config_dword() will do another swap.
> 
> Did I miss something? (/me is gone bringing up a BE system).

Correct.


> > @@ -45,9 +49,9 @@ struct pci_doe_mb;
> >    */
> >   struct pci_doe_task {
> >   	struct pci_doe_protocol prot;
> > -	u32 *request_pl;
> > +	__le32 *request_pl;
> >   	size_t request_pl_sz;
> > -	u32 *response_pl;
> > +	__le32 *response_pl;
> 
> 
> This does not look right. Either:
> - pci_doe() should also take __le32* or
> - pci_doe() should do cpu_to_le32() for the request, and
> pci_doe_task_complete() - for the response.

Again, the type changes are necessary to avoid sparse warnings.

pci_doe() takes a void * because patch [15/16] eliminates the need
for the request and response size to be a multiple of dwords.
Passing __le32 * to pci_doe() would then no longer be correct.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-28  7:24       ` Alexey Kardashevskiy
@ 2023-02-28 10:42         ` Jonathan Cameron
  2023-03-02 20:22         ` Lukas Wunner
  1 sibling, 0 replies; 62+ messages in thread
From: Jonathan Cameron @ 2023-02-28 10:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Lukas Wunner, Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Dan Williams, Alison Schofield, Vishal Verma, Dave Jiang, Li,
	Ming, Hillf Danton, Ben Widawsky, linuxarm, linux-cxl

On Tue, 28 Feb 2023 18:24:41 +1100
Alexey Kardashevskiy <aik@amd.com> wrote:

> On 28/2/23 16:43, Lukas Wunner wrote:
> > On Tue, Feb 28, 2023 at 12:18:07PM +1100, Alexey Kardashevskiy wrote:  
> >> On 11/2/23 07:25, Lukas Wunner wrote:  
> >>> For the same reason a DOE instance cannot be shared between the PCI core
> >>> and a driver.  
> >>
> >> And we want this sharing why? Any example will do. Thanks,  
> > 
> > The PCI core is going to perform CMA/SPDM authentication when a device
> > gets enumerated (PCIe r6.0 sec 6.31).  That's the main motivation
> > to lift DOE mailbox creation into the PCI core.  It's not mentioned
> > here explicitly because I want the patch to stand on its own.
> > CMA/SPDM support will be submitted separately.  
> 
> I was going the opposite direction with avoiding adding this into the 
> PCI core as 1) the pci_dev struct is already 2K  and 2) it is a niche 
> feature and  3) I wanted this CMA/SPDM session setup to be platform 
> specific as on our platform the SPDM support requires some devices to be 
> probed before we can do any SPDM.

Is that happening over a DOE mailbox, or a different transport?
If it's a different transport then that should be fine, though we'll need
to have a slightly different security model and any part of early
driver load will need to be carefully hardened against a "malicious" device
if it is doing anything non trivial. If it's just "turning on the lights"
then shouldn't be a problem.

Will be interesting to see how niche DOE ends up.  My guess is it's worth 
a struct xarray, but we could take it out of line if that saves enough
to bother.

> 
> 
> > A driver that later on gets bound to the device should be allowed
> > to talk to it via DOE as well, possibly even sharing the same DOE
> > mailboxes used by the PCI core.
> > 
> > Patches for CMA/SPDM are under development on this branch:
> > 
> > https://github.com/l1k/linux/commits/doe  
> 
> yes, thanks! Lots of reading :)
> 
> 
> >>> Currently a DOE instance cannot be shared by multiple drivers because
> >>> each driver creates its own pci_doe_mb struct for a given DOE instance.  
> >>
> >> Sorry for my ignorance but why/how/when would a device have multiple drivers
> >> bound? Or is it only one driver at a time but we also want DOE MBs to survive
> >> switching to another (different or newer) driver?  
> > 
> > Conceivably, a driver may have the need to talk to multiple devices
> > via DOE, even ones it's not bound to.  (E.g. devices in its ancestry
> > or children.)  
> 
> Ah ok. Well, a parent device could look for the DOE MB in a child using 
> devres_find(). This requirement alone does not require moving things to 
> the PCI core, and it potentially allows the code to be a module, which 
> could be a better way: distros could have it always enabled, but it 
> would not waste any memory on my laptop when not loaded. Thanks,

The DOE mailboxes have an "exciting" level of flexibility and discovering
supported protocols requires use of the DOE itself. So we need a single
entity to take control over concurrent access to each DOE instance.

Given the mix of protocols, I'd expect some of them to be potentially accessed
by a parent, and others to be accessed by driver attached to the child.

Whether it needs to chat to its parent isn't totally clear to me yet, as
depends a bit on what entities end up getting created for management of
encryption etc + what other usecases we see in medium term.

Jonathan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian
  2023-02-28  8:24     ` Lukas Wunner
@ 2023-02-28 12:08       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 62+ messages in thread
From: Alexey Kardashevskiy @ 2023-02-28 12:08 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On 28/2/23 19:24, Lukas Wunner wrote:
> On Tue, Feb 28, 2023 at 01:53:46PM +1100, Alexey Kardashevskiy wrote:
>> On 11/2/23 07:25, Lukas Wunner wrote:
>>> @@ -493,8 +493,8 @@ static void cxl_doe_task_complete(struct pci_doe_task *task)
>>>    }
>>>    struct cdat_doe_task {
>>> -	u32 request_pl;
>>> -	u32 response_pl[32];
>>> +	__le32 request_pl;
>>> +	__le32 response_pl[32];
>>
>> This is ok as it is a binary format of DOE message (is it?)...
> 
> Yes, the DOE request and response is an opaque byte stream.
> 
> The above-quoted type changes are necessary to avoid sparse warnings
> (as noted in the commit message).
> 
> 
>>> --- a/drivers/pci/doe.c
>>> +++ b/drivers/pci/doe.c
>>> @@ -143,7 +143,7 @@ static int pci_doe_send_req(struct pci_doe_mb *doe_mb,
>>>    					  length));
>>>    	for (i = 0; i < task->request_pl_sz / sizeof(u32); i++)
>>>    		pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
>>> -				       task->request_pl[i]);
>>> +				       le32_to_cpu(task->request_pl[i]));
>>
>> Does it really work on BE? My little brain explodes on all these conversions
>>
>> char buf[] = { 1, 2, 3, 4 };
>> u32 *request_pl = (u32 *)buf;
>>
>> request_pl[0] will be 0x01020304.
>> le32_to_cpu(request_pl[0]) will be 0x04030201
>> And then pci_write_config_dword() will do another swap.
>>
>> Did I miss something? (/me is gone bringing up a BE system).
> 
> Correct.

Uff, ok, your code works correct, sorry for the noise.


> 
>>> @@ -45,9 +49,9 @@ struct pci_doe_mb;
>>>     */
>>>    struct pci_doe_task {
>>>    	struct pci_doe_protocol prot;
>>> -	u32 *request_pl;
>>> +	__le32 *request_pl;
>>>    	size_t request_pl_sz;
>>> -	u32 *response_pl;
>>> +	__le32 *response_pl;
>>
>>
>> This does not look right. Either:
>> - pci_doe() should also take __le32* or
>> - pci_doe() should do cpu_to_le32() for the request, and
>> pci_doe_task_complete() - for the response.
> 
> Again, the type changes are necessary to avoid sparse warnings.

I'd shut it up like this:

===
          for (i = 0; i < task->request_pl_sz / sizeof(__le32); i++)
                 pci_write_config_dword(pdev, offset + PCI_DOE_WRITE,
-                                      le32_to_cpu(task->request_pl[i]));
+                                      le32_to_cpu(*(__le32 *)&task->request_pl[i]));
===

Plus maybe a comment that LE is the natural order here. Dunno.

Changing types to __le32 without explicit conversions or comments just 
confuses. Thanks,


> pci_doe() takes a void * because patch [15/16] eliminates the need
> for the request and response size to be a multiple of dwords.
> Passing __le32 * to pci_doe() would then no longer be correct.


-- 
Alexey


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally
  2023-02-10 20:25 ` [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally Lukas Wunner
  2023-02-15  1:45   ` Li, Ming
@ 2023-02-28 18:58   ` Davidlohr Bueso
  1 sibling, 0 replies; 62+ messages in thread
From: Davidlohr Bueso @ 2023-02-28 18:58 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On Fri, 10 Feb 2023, Lukas Wunner wrote:

>The DOE API only allows asynchronous exchanges and forces callers to
>provide a completion callback.  Yet all existing callers only perform
>synchronous exchanges.  Upcoming commits for CMA (Component Measurement
>and Authentication, PCIe r6.0 sec 6.31) likewise require only
>synchronous DOE exchanges.
>
>Provide a synchronous pci_doe() API call which builds on the internal
>asynchronous machinery.
>
>Convert the internal pci_doe_discovery() to the new call.
>
>The new API allows submission of const-declared requests, necessitating
>the addition of a const qualifier in struct pci_doe_task.
>
>Tested-by: Ira Weiny <ira.weiny@intel.com>
>Signed-off-by: Lukas Wunner <lukas@wunner.de>
>Reviewed-by: Ira Weiny <ira.weiny@intel.com>
>Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Cc: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 06/16] PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
  2023-02-10 20:25 ` [PATCH v3 06/16] PCI/DOE: Fix memory leak " Lukas Wunner
  2023-02-11  1:06   ` Dan Williams
@ 2023-03-01  1:51   ` Davidlohr Bueso
  1 sibling, 0 replies; 62+ messages in thread
From: Davidlohr Bueso @ 2023-03-01  1:51 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On Fri, 10 Feb 2023, Lukas Wunner wrote:

>After a pci_doe_task completes, its work_struct needs to be destroyed
>to avoid a memory leak with CONFIG_DEBUG_OBJECTS=y.
>
>Fixes: 9d24322e887b ("PCI/DOE: Add DOE mailbox support functions")
>Tested-by: Ira Weiny <ira.weiny@intel.com>
>Signed-off-by: Lukas Wunner <lukas@wunner.de>
>Reviewed-by: Ira Weiny <ira.weiny@intel.com>
>Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Cc: stable@vger.kernel.org # v6.0+

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-02-28  7:24       ` Alexey Kardashevskiy
  2023-02-28 10:42         ` Jonathan Cameron
@ 2023-03-02 20:22         ` Lukas Wunner
  2023-03-07  1:55           ` Alexey Kardashevskiy
  2023-04-03  0:55           ` Alexey Kardashevskiy
  1 sibling, 2 replies; 62+ messages in thread
From: Lukas Wunner @ 2023-03-02 20:22 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On Tue, Feb 28, 2023 at 06:24:41PM +1100, Alexey Kardashevskiy wrote:
> On 28/2/23 16:43, Lukas Wunner wrote:
> > On Tue, Feb 28, 2023 at 12:18:07PM +1100, Alexey Kardashevskiy wrote:
> > > On 11/2/23 07:25, Lukas Wunner wrote:
> > > > For the same reason a DOE instance cannot be shared between the
> > > > PCI core and a driver.
> > > 
> > > And we want this sharing why? Any example will do. Thanks,
> > 
> > The PCI core is going to perform CMA/SPDM authentication when a device
> > gets enumerated (PCIe r6.0 sec 6.31).  That's the main motivation
> > to lift DOE mailbox creation into the PCI core.  It's not mentioned
> > here explicitly because I want the patch to stand on its own.
> > CMA/SPDM support will be submitted separately.
> 
> I was going the opposite direction with avoiding adding this into the PCI
> core as 1) the pci_dev struct is already 2K  and 2) it is a niche feature
> and  3) I wanted this CMA/SPDM session setup to be platform specific as on
> our platform the SPDM support requires some devices to be probed before we
> can do any SPDM.

We had an open discussion at Plumbers with stakeholders from various
companies and the consensus was that initially we'll upstream a CMA/SPDM
implementation which authenticates devices on enumeration and exposes
the result to user space via sysfs.  Nothing more, nothing less:

https://lpc.events/event/16/contributions/1304/
https://lpc.events/event/16/contributions/1304/attachments/1029/1974/LPC2022-SPDM-BoF-v4.pdf

Thereby we seek to minimize friction in the upstreaming process
because we avoid depending on PCIe features layered on top of
CMA/SPDM (such as IDE) and we can postpone controversial discussions
about the consequences of failed authentication (such as forbidding
driver binding or firewalling the device off via ACS).

The patches under development follow the approach we agreed on back then.

I honestly think that the size of struct pci_dev is a non-issue because
even the lowest-end IoT devices come with gigabytes of memory these days
and as long as we do not exceed PAGE_SIZE (which is the limit when we'd
have to switch from kmalloc() to vmalloc() I believe), there's no
problem.

The claim that DOE and CMA/SPDM is going to be a niche feature is at
least debatable.  It seems rather likely to me that we'll see adoption
particularly in the mobile segment as iOS/Android/Chrome promulgators
will see an opportunity to harden their products further.

It's intriguing that you need some devices to be probed before you
can perform authentication.  What do you need this for?

Do you perhaps need to load firmware onto the device before you can
authenticate it?  Doesn't that defeat the whole point of authentication?

Or do you mean that certain *other* devices need to probe before a device
can be authenticated?  What are these other devices?

What happens if the device is suspended to D3cold or experiences a
Conventional Reset?  Do you need to do anything special before you
can re-authenticate the device?

On the one hand, performing authentication in the PCI core could be
considered a case of midlayer fallacy.  On the other hand, I think we
do want to afford CMA/SPDM and the features layered on top of it
(IDE, TDISP) regardless whether a driver is bound:  That way we can
protect the PCI core's communication with the device and with ACS
we can protect other PCI devices in the system from peer-to-peer
communication originating from a malicious device which failed to
authenticate.

I would like to make the implementation under development work for
your use case as well, but to do that I need a better understanding
what your requirements are.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-03-02 20:22         ` Lukas Wunner
@ 2023-03-07  1:55           ` Alexey Kardashevskiy
  2023-04-03  0:55           ` Alexey Kardashevskiy
  1 sibling, 0 replies; 62+ messages in thread
From: Alexey Kardashevskiy @ 2023-03-07  1:55 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On 3/3/23 07:22, Lukas Wunner wrote:
> On Tue, Feb 28, 2023 at 06:24:41PM +1100, Alexey Kardashevskiy wrote:
>> On 28/2/23 16:43, Lukas Wunner wrote:
>>> On Tue, Feb 28, 2023 at 12:18:07PM +1100, Alexey Kardashevskiy wrote:
>>>> On 11/2/23 07:25, Lukas Wunner wrote:
>>>>> For the same reason a DOE instance cannot be shared between the
>>>>> PCI core and a driver.
>>>>
>>>> And we want this sharing why? Any example will do. Thanks,
>>>
>>> The PCI core is going to perform CMA/SPDM authentication when a device
>>> gets enumerated (PCIe r6.0 sec 6.31).  That's the main motivation
>>> to lift DOE mailbox creation into the PCI core.  It's not mentioned
>>> here explicitly because I want the patch to stand on its own.
>>> CMA/SPDM support will be submitted separately.
>>
>> I was going the opposite direction with avoiding adding this into the PCI
>> core as 1) the pci_dev struct is already 2K  and 2) it is a niche feature
>> and  3) I wanted this CMA/SPDM session setup to be platform specific as on
>> our platform the SPDM support requires some devices to be probed before we
>> can do any SPDM.
> 
> We had an open discussion at Plumbers with stakeholders from various
> companies and the consensus was that initially we'll upstream a CMA/SPDM
> implementation which authenticates devices on enumeration and exposes
> the result to user space via sysfs.  Nothing more, nothing less:
> 
> https://lpc.events/event/16/contributions/1304/
> https://lpc.events/event/16/contributions/1304/attachments/1029/1974/LPC2022-SPDM-BoF-v4.pdf
> 
> Thereby we seek to minimize friction in the upstreaming process
> because we avoid depending on PCIe features layered on top of
> CMA/SPDM (such as IDE) and we can postpone controversial discussions
> about the consequences of failed authentication (such as forbidding
> driver binding or firewalling the device off via ACS).
> 
> The patches under development follow the approach we agreed on back then.
> 
> I honestly think that the size of struct pci_dev is a non-issue because
> even the lowest-end IoT devices come with gigabytes of memory these days
> and as long as we do not exceed PAGE_SIZE (which is the limit when we'd
> have to switch from kmalloc() to vmalloc() I believe), there's no
> problem.

Well, not sizeof(pci_dev) as much, it is more about DOE/SPDM code being 
compiled into vmlinux - it is hard for me to imagine this running on my 
laptop soon but distros dislike multiple kernel configs. Not a huge deal 
but still.

> The claim that DOE and CMA/SPDM is going to be a niche feature is at
> least debatable.  It seems rather likely to me that we'll see adoption
> particularly in the mobile segment as iOS/Android/Chrome promulgators
> will see an opportunity to harden their products further.

These guys all use custom configs (as was also suggested in that threat 
model thread - "Linux guest kernel threat model for Confidential Computing").

> It's intriguing that you need some devices to be probed before you
> can perform authentication.  What do you need this for?
> 
> Do you perhaps need to load firmware onto the device before you can
> authenticate it?  Doesn't that defeat the whole point of authentication?
> 
> Or do you mean that certain *other* devices need to probe before a device
> can be authenticated?  What are these other devices?

The AMD CCP device (which represents the PSP == a secure processor == the
TCB for SEV VMs) is a PCI device, and when we add TDISP etc. to the TCB,
we will need the host to speak to the PSP to set this up. It does not
matter for host-only IDE support though.


> What happens if the device is suspended to D3cold or experiences a
> Conventional Reset?  Do you need to do anything special before you
> can re-authenticate the device?
> 
> On the one hand, performing authentication in the PCI core could be
> considered a case of midlayer fallacy.  On the other hand, I think we
> do want to afford CMA/SPDM and the features layered on top of it
> (IDE, TDISP) regardless whether a driver is bound:


https://cdrdv2-public.intel.com/742542/software-enabling-for-tdx-tee-io-fixed.pdf
suggests that TDX is going to have to reimplement SPDM/IDE/TDISP again,
so the common base here is DOE only, isn't it?

And what is the value of TDISP without a device driver? How can the PCI
core decide whether the device is alright to use? I am sure there is
value, but my imagination fails me.


>  That way we can
> protect the PCI core's communication with the device and with ACS
> we can protect other PCI devices in the system from peer-to-peer
> communication originating from a malicious device which failed to
> authenticate.
> 
> I would like to make the implementation under development work for
> your use case as well, but to do that I need a better understanding
> what your requirements are.

While I am figuring this out, is it a bug? :)
https://github.com/l1k/linux/commit/c1b242d74d55d361934bdcff1bb4f881922b10ae#r102531215

Thanks,

> Thanks,
> 
> Lukas

-- 
Alexey


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration
  2023-03-02 20:22         ` Lukas Wunner
  2023-03-07  1:55           ` Alexey Kardashevskiy
@ 2023-04-03  0:55           ` Alexey Kardashevskiy
  1 sibling, 0 replies; 62+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-03  0:55 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, Gregory Price, Ira Weiny,
	Jonathan Cameron, Dan Williams, Alison Schofield, Vishal Verma,
	Dave Jiang, Li, Ming, Hillf Danton, Ben Widawsky, linuxarm,
	linux-cxl

On 3/3/23 07:22, Lukas Wunner wrote:

> I would like to make the implementation under development work for
> your use case as well, but to do that I need a better understanding
> what your requirements are.

Soooo here is what we are up to:

https://www.amd.com/content/dam/amd/en/documents/developer/sev-tio-whitepaper.pdf

tl;dr: it is secure VFIO == TDISP on AMD SEV-SNP, so it is going to use
the ASP (== PSP == the secure processor) for SPDM/IDE.


-- 
Alexey


^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2023-04-03  0:55 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-10 20:25 [PATCH v3 00/16] Collection of DOE material Lukas Wunner
2023-02-10 20:25 ` [PATCH v3 01/16] cxl/pci: Fix CDAT retrieval on big endian Lukas Wunner
2023-02-11  0:22   ` Dan Williams
2023-02-19 13:03     ` Lukas Wunner
2023-02-14 11:15   ` Jonathan Cameron
2023-02-14 13:51     ` Lukas Wunner
2023-02-14 15:45       ` Jonathan Cameron
2023-02-28  2:53   ` Alexey Kardashevskiy
2023-02-28  8:24     ` Lukas Wunner
2023-02-28 12:08       ` Alexey Kardashevskiy
2023-02-10 20:25 ` [PATCH v3 02/16] cxl/pci: Handle truncated CDAT header Lukas Wunner
2023-02-11  0:40   ` Dan Williams
2023-02-11  9:34     ` Lukas Wunner
2023-02-14 11:16   ` Jonathan Cameron
2023-02-15  1:41   ` Li, Ming
2023-02-10 20:25 ` [PATCH v3 03/16] cxl/pci: Handle truncated CDAT entries Lukas Wunner
2023-02-11  0:50   ` Dan Williams
2023-02-11 10:56     ` Lukas Wunner
2023-02-14 11:30   ` Jonathan Cameron
2023-02-10 20:25 ` [PATCH v3 04/16] cxl/pci: Handle excessive CDAT length Lukas Wunner
2023-02-11  1:04   ` Dan Williams
2023-02-14 11:33   ` Jonathan Cameron
2023-02-16 10:26     ` Lukas Wunner
2023-02-17 10:01       ` Jonathan Cameron
2023-02-10 20:25 ` [PATCH v3 05/16] PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y Lukas Wunner
2023-02-10 20:25 ` [PATCH v3 06/16] PCI/DOE: Fix memory leak " Lukas Wunner
2023-02-11  1:06   ` Dan Williams
2023-03-01  1:51   ` Davidlohr Bueso
2023-02-10 20:25 ` [PATCH v3 07/16] PCI/DOE: Provide synchronous API and use it internally Lukas Wunner
2023-02-15  1:45   ` Li, Ming
2023-02-28 18:58   ` Davidlohr Bueso
2023-02-10 20:25 ` [PATCH v3 08/16] cxl/pci: Use synchronous API for DOE Lukas Wunner
2023-02-10 20:25 ` [PATCH v3 09/16] PCI/DOE: Make asynchronous API private Lukas Wunner
2023-02-15  1:48   ` Li, Ming
2023-02-10 20:25 ` [PATCH v3 10/16] PCI/DOE: Deduplicate mailbox flushing Lukas Wunner
2023-02-14 11:36   ` Jonathan Cameron
2023-02-15  5:07   ` Li, Ming
2023-02-10 20:25 ` [PATCH v3 11/16] PCI/DOE: Allow mailbox creation without devres management Lukas Wunner
2023-02-14 11:51   ` Jonathan Cameron
2023-02-15  5:17   ` Li, Ming
2023-02-10 20:25 ` [PATCH v3 12/16] PCI/DOE: Create mailboxes on device enumeration Lukas Wunner
2023-02-15  2:07   ` Li, Ming
2023-02-28  1:18   ` Alexey Kardashevskiy
2023-02-28  1:39     ` Dan Williams
2023-02-28  5:43     ` Lukas Wunner
2023-02-28  7:24       ` Alexey Kardashevskiy
2023-02-28 10:42         ` Jonathan Cameron
2023-03-02 20:22         ` Lukas Wunner
2023-03-07  1:55           ` Alexey Kardashevskiy
2023-04-03  0:55           ` Alexey Kardashevskiy
2023-02-10 20:25 ` [PATCH v3 13/16] cxl/pci: Use CDAT DOE mailbox created by PCI core Lukas Wunner
2023-02-10 20:25 ` [PATCH v3 14/16] PCI/DOE: Make mailbox creation API private Lukas Wunner
2023-02-15  2:13   ` Li, Ming
2023-02-10 20:25 ` [PATCH v3 15/16] PCI/DOE: Relax restrictions on request and response size Lukas Wunner
2023-02-15  5:05   ` Li, Ming
2023-02-15 11:49     ` Lukas Wunner
2023-02-10 20:25 ` [PATCH v3 16/16] cxl/pci: Rightsize CDAT response allocation Lukas Wunner
2023-02-14 13:05   ` Jonathan Cameron
2023-02-16  0:56   ` Ira Weiny
2023-02-16  8:03     ` Lukas Wunner
2023-02-28  1:45   ` Alexey Kardashevskiy
2023-02-28  5:55     ` Lukas Wunner
