linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/1] DOE usage with pcie/portdrv
@ 2022-05-03 15:34 Jonathan Cameron
  2022-05-03 15:34 ` [RFC PATCH 1/1] pcie/portdrv: Hack in DOE and CDAT support Jonathan Cameron
  2022-05-06 22:40 ` [RFC PATCH 0/1] DOE usage with pcie/portdrv Dan Williams
  0 siblings, 2 replies; 22+ messages in thread
From: Jonathan Cameron @ 2022-05-03 15:34 UTC (permalink / raw)
  To: linuxarm, ira.weiny, Linux PCI, linux-cxl, Dan Williams

So far, we have been considering Data Object Exchange (DOE) mailboxes only
on EPs (CXL type 3 devices).
CXL CDAT (technically CXL Table Query Protocol but lets just call it CDAT)
  https://lore.kernel.org/linux-cxl/20220414203237.2198665-1-ira.weiny@intel.com
CMA/SPDM support
  https://lore.kernel.org/linux-cxl/20220303135905.10420-1-Jonathan.Cameron@huawei.com/

However, a number of DOE protocols apply to switch (and root) ports.
DOE instances supporting CDAT occur on switch upstream ports as well as EPs.

DOE instances supporting CMA may occur in root ports, upstream switch ports,
downstream switch ports and EPs (including multiple functions where relevant).

The intent of this RFC is to discuss how to actually implement such support.
The attached patch is a really rough PoC for CDAT on upstream switch ports
done by adding a new pcie_port_service_driver.  This is different from the
proposed auxiliary device used for CXL type 3 devices (for now).

So open questions:

1. Granularity. Should we do a driver per group of protocols that may
   be collocated, or one per DOE instance. For now, we might be looking
   at CDAT as done for this PoC, and CMA/IDE.
2. Use of a pcie_port_service_driver a reasonable way to do this?
3. Service provision. It is likely that all of the protocols defined
   above will be used as part of activities that span multiple devices.
   a) CDAT used to establish latencies and bandwidth between host CPU
      and memory on a CXL type 3 device beyond one or more CXL switches.
   b) CMA.  Might just be used to provide simple device attestation
      and potentially lock out the upstream port above a switch if the
      switch does not pass attestation.  Many many other uses possible...
   c) Secure CMA / IDE. Likely to be used to set up link IDE.  What
      this will look like is a question I've not really started
      thinking about yet.

   So how do we support this?  If nothing else we need to make sure
   the drivers for the port don't go away whilst in use.

The patch is a very early PoC just to show it would 'work'...

Note I am keen to not have the discussion around this support delay
Ira's series.

Jonathan Cameron (1):
  pcie/portdrv: Hack in DOE and CDAT support

 drivers/pci/pcie/Kconfig        |   5 ++
 drivers/pci/pcie/Makefile       |   1 +
 drivers/pci/pcie/cdat.c         | 127 ++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h      |   9 ++-
 drivers/pci/pcie/portdrv_core.c |  43 ++++++++++-
 drivers/pci/pcie/portdrv_pci.c  |   2 +
 6 files changed, 183 insertions(+), 4 deletions(-)
 create mode 100644 drivers/pci/pcie/cdat.c

-- 
2.32.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH 1/1] pcie/portdrv: Hack in DOE and CDAT support
  2022-05-03 15:34 [RFC PATCH 0/1] DOE usage with pcie/portdrv Jonathan Cameron
@ 2022-05-03 15:34 ` Jonathan Cameron
  2022-05-06 22:40 ` [RFC PATCH 0/1] DOE usage with pcie/portdrv Dan Williams
  1 sibling, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2022-05-03 15:34 UTC (permalink / raw)
  To: linuxarm, ira.weiny, Linux PCI, linux-cxl, Dan Williams

This is very much a PoC for one potential way to support
CDAT on Switch Upstream Ports.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/Kconfig        |   5 ++
 drivers/pci/pcie/Makefile       |   1 +
 drivers/pci/pcie/cdat.c         | 127 ++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h      |   9 ++-
 drivers/pci/pcie/portdrv_core.c |  43 ++++++++++-
 drivers/pci/pcie/portdrv_pci.c  |   2 +
 6 files changed, 183 insertions(+), 4 deletions(-)
 create mode 100644 drivers/pci/pcie/cdat.c

diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 788ac8df3f9d..7d90c05bf0ac 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -142,3 +142,8 @@ config PCIE_EDR
 	  the PCI Firmware Specification r3.2.  Enable this if you want to
 	  support hybrid DPC model which uses both firmware and OS to
 	  implement DPC.
+
+config PCIE_CDAT
+       bool "CDAT hacks"
+       help
+	It's a dirty hack.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 5783a2f79e6a..4925394e8d4b 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME)		+= pme.o
 obj-$(CONFIG_PCIE_DPC)		+= dpc.o
 obj-$(CONFIG_PCIE_PTM)		+= ptm.o
 obj-$(CONFIG_PCIE_EDR)		+= edr.o
+obj-$(CONFIG_PCIE_CDAT)		+= cdat.o
diff --git a/drivers/pci/pcie/cdat.c b/drivers/pci/pcie/cdat.c
new file mode 100644
index 000000000000..04165c1b8174
--- /dev/null
+++ b/drivers/pci/pcie/cdat.c
@@ -0,0 +1,127 @@
+//HACK
+#include <linux/pci.h>
+#include <linux/pci-doe.h>
+#include "../../cxl/cxlpci.h"
+#include "../../cxl/cdat.h"
+#include "portdrv.h"
+
+struct cdat {
+	struct pci_doe_mb *doe_mb;
+	struct device *dev;
+};
+
+#define CDAT_DOE_REQ(entry_handle)					\
+	(FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE,			\
+		    CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) |		\
+	 FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE,			\
+		    CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) |		\
+	 FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle)))
+
+static void cdat_doe_task_complete(struct pci_doe_task *task)
+{
+	complete(task->private);
+}
+
+static int cdat_get_length(struct cdat *cdat, size_t *length)
+{
+	u32 cdat_request_pl = CDAT_DOE_REQ(0);
+	u32 cdat_response_pl[32];
+	DECLARE_COMPLETION_ONSTACK(c);
+	struct pci_doe_task task = {
+		.prot.vid = PCI_DVSEC_VENDOR_ID_CXL,
+		.prot.type = CXL_DOE_PROTOCOL_TABLE_ACCESS,
+		.request_pl = &cdat_request_pl,
+		.request_pl_sz = sizeof(cdat_request_pl),
+		.response_pl = cdat_response_pl,
+		.response_pl_sz = sizeof(cdat_response_pl),
+		.complete = cdat_doe_task_complete,
+		.private = &c,
+	};
+	int rc;
+
+	rc = pci_doe_submit_task(cdat->doe_mb, &task);
+	if (rc < 0) {
+		dev_err(cdat->dev, "DOE submit failed: %d", rc);
+		return rc;
+	}
+	wait_for_completion(&c);
+
+	if (task.rv < 1)
+		return -EIO;
+
+	*length = cdat_response_pl[1];
+	dev_info(cdat->dev, "CDAT length %lu\n", *length);
+
+	return 0;
+}
+
+static int cdat_probe(struct pcie_device *dev)
+{
+	struct device *device = &dev->device;
+	struct cdat *cdat;
+	u16 off;
+        size_t length;
+	int rc;
+
+	/* probably not necessary */
+	if ((pci_pcie_type(dev->port) != PCI_EXP_TYPE_UPSTREAM))
+		return -ENODEV;
+
+	cdat = devm_kzalloc(device, sizeof(*cdat), GFP_KERNEL);
+	if (!cdat)
+		return -ENOMEM;
+
+	cdat->dev = device;
+	/* Terrible though it is lets go find the cap again */
+	pci_doe_for_each_off(dev->port, off) {
+		bool supports_cdat;
+		struct pci_doe_mb *doe_mb;
+
+		doe_mb = pci_doe_create_mb(dev->port, off, true);
+		if (IS_ERR(doe_mb)) {
+			dev_err(device,
+				"Failed to create the DOE mailbox state machine\n");
+			continue;
+		}
+		supports_cdat = pci_doe_supports_prot(doe_mb, PCI_DVSEC_VENDOR_ID_CXL,
+						CXL_DOE_PROTOCOL_TABLE_ACCESS);
+		if (supports_cdat) {
+			cdat->doe_mb = doe_mb;
+			break;
+		}
+		pci_doe_destroy_mb(doe_mb);
+	}
+	if (!cdat->doe_mb)
+		return -ENODEV;
+
+	rc = cdat_get_length(cdat, &length);
+	if (rc) {
+		dev_err(device, "Failed to get cdat length\n");
+		return rc;
+	}
+	printk("CDAT length is %lu\n", length);
+	/* this is a PoC so that'll do for now */
+	dev_set_drvdata(device, cdat);
+	return 0;
+}
+
+static void cdat_remove(struct pcie_device *dev)
+{
+	/* Mess with driver remove later */
+	struct cdat *cdat = dev_get_drvdata(&dev->device);
+
+	pci_doe_destroy_mb(cdat->doe_mb);
+}
+
+static struct pcie_port_service_driver cdat_driver = {
+	.name = "cdat",
+	.port_type = PCIE_ANY_PORT, //fixme -upstream only
+	.service = PCIE_PORT_SERVICE_CDAT,
+	.probe = cdat_probe,
+	.remove = cdat_remove,
+};
+
+int __init pcie_cdat_init(void)
+{
+	return pcie_port_service_register(&cdat_driver);
+}
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 0ef4bf5f811d..d35b973f5b67 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -22,8 +22,10 @@
 #define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
 #define PCIE_PORT_SERVICE_BWNOTIF_SHIFT	4	/* Bandwidth notification */
 #define PCIE_PORT_SERVICE_BWNOTIF	(1 << PCIE_PORT_SERVICE_BWNOTIF_SHIFT)
+#define PCIE_PORT_SERVICE_CDAT_SHIFT	5	/* CXL CDAT Table DOE */
+#define PCIE_PORT_SERVICE_CDAT		(1 << PCIE_PORT_SERVICE_CDAT_SHIFT)
 
-#define PCIE_PORT_DEVICE_MAXSERVICES   5
+#define PCIE_PORT_DEVICE_MAXSERVICES   6
 
 extern bool pcie_ports_dpc_native;
 
@@ -53,6 +55,11 @@ int pcie_dpc_init(void);
 static inline int pcie_dpc_init(void) { return 0; }
 #endif
 
+#ifdef CONFIG_PCIE_CDAT
+int pcie_cdat_init(void);
+#else
+static inline int pcie_cdat_init(void) { return 0; }
+#endif
 /* Port Type */
 #define PCIE_ANY_PORT			(~0)
 
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 604feeb84ee4..161545b30014 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -8,6 +8,7 @@
 
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/pci-doe.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/pm.h>
@@ -17,6 +18,7 @@
 #include <linux/aer.h>
 
 #include "../pci.h"
+#include "../../cxl/cxlpci.h"
 #include "portdrv.h"
 
 struct portdrv_service_data {
@@ -43,10 +45,12 @@ static void release_pcie_device(struct device *dev)
  * required to accommodate the largest Message Number.
  */
 static int pcie_message_numbers(struct pci_dev *dev, int mask,
-				u32 *pme, u32 *aer, u32 *dpc)
+				u32 *pme, u32 *aer, u32 *dpc, u32 *cdat)
 {
 	u32 nvec = 0, pos;
+	u16 off = 0;
 	u16 reg16;
+	u32 reg32;
 
 	/*
 	 * The Interrupt Message Number indicates which vector is used, i.e.,
@@ -86,6 +90,18 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 		}
 	}
 
+#ifdef CONFIG_PCIE_CDAT
+	pci_doe_for_each_off(dev, off) {
+		pci_read_config_dword(dev, off + PCI_DOE_CAP, &reg32);
+
+		if (!FIELD_GET(PCI_DOE_CAP_INT, reg32))
+			return -EOPNOTSUPP;
+
+		*cdat = FIELD_GET(PCI_DOE_CAP_IRQ, reg32);
+		nvec = max(nvec, *cdat + 1);
+	}
+#endif
+
 	return nvec;
 }
 
@@ -101,7 +117,7 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 {
 	int nr_entries, nvec, pcie_irq;
-	u32 pme = 0, aer = 0, dpc = 0;
+	u32 pme = 0, aer = 0, dpc = 0, cdat = 0;
 
 	/* Allocate the maximum possible number of MSI/MSI-X vectors */
 	nr_entries = pci_alloc_irq_vectors(dev, 1, PCIE_PORT_MAX_MSI_ENTRIES,
@@ -110,7 +126,7 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 		return nr_entries;
 
 	/* See how many and which Interrupt Message Numbers we actually use */
-	nvec = pcie_message_numbers(dev, mask, &pme, &aer, &dpc);
+	nvec = pcie_message_numbers(dev, mask, &pme, &aer, &dpc, &cdat);
 	if (nvec > nr_entries) {
 		pci_free_irq_vectors(dev);
 		return -EIO;
@@ -151,6 +167,9 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 	if (mask & PCIE_PORT_SERVICE_DPC)
 		irqs[PCIE_PORT_SERVICE_DPC_SHIFT] = pci_irq_vector(dev, dpc);
 
+	if (mask & PCIE_PORT_SERVICE_CDAT)
+		irqs[PCIE_PORT_SERVICE_CDAT_SHIFT] = pci_irq_vector(dev, cdat);
+
 	return 0;
 }
 
@@ -207,6 +226,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 {
 	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
 	int services = 0;
+	u16 off = 0;
 
 	if (dev->is_hotplug_bridge &&
 	    (pcie_ports_native || host->native_pcie_hotplug)) {
@@ -264,6 +284,23 @@ static int get_port_device_capability(struct pci_dev *dev)
 		if (linkcap & PCI_EXP_LNKCAP_LBNC)
 			services |= PCIE_PORT_SERVICE_BWNOTIF;
 	}
+	/* Run the DOE in poll mode to query protocol */
+	pci_doe_for_each_off(dev, off) {
+		bool supports_cdat;
+		struct pci_doe_mb *doe_mb;
+
+		doe_mb = pci_doe_create_mb(dev, off, false);
+		if (IS_ERR(doe_mb)) {
+			pci_err(dev, "Failed to create the DOE mailbox state machine\n");
+			continue;
+		}
+		supports_cdat = pci_doe_supports_prot(doe_mb, PCI_DVSEC_VENDOR_ID_CXL,
+						      CXL_DOE_PROTOCOL_TABLE_ACCESS);
+		pci_doe_destroy_mb(doe_mb);
+		if (supports_cdat) {
+			services |= PCIE_PORT_SERVICE_CDAT;
+		}
+	}
 
 	return services;
 }
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 4b8801656ffb..b4ec0cab7eb9 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/pci.h>
+#include <linux/pci-doe.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/pm.h>
@@ -235,6 +236,7 @@ static void __init pcie_init_services(void)
 	pcie_pme_init();
 	pcie_dpc_init();
 	pcie_hp_init();
+	pcie_cdat_init();
 }
 
 static int __init pcie_portdrv_init(void)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-03 15:34 [RFC PATCH 0/1] DOE usage with pcie/portdrv Jonathan Cameron
  2022-05-03 15:34 ` [RFC PATCH 1/1] pcie/portdrv: Hack in DOE and CDAT support Jonathan Cameron
@ 2022-05-06 22:40 ` Dan Williams
  2022-05-07 10:18   ` Lukas Wunner
  2022-05-09  9:33   ` Jonathan Cameron
  1 sibling, 2 replies; 22+ messages in thread
From: Dan Williams @ 2022-05-06 22:40 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Linuxarm, Weiny, Ira, Linux PCI, linux-cxl, Wunner, Lukas, CHUCK_LEVER

[ add Lukas and Chuck ]

On Tue, May 3, 2022 at 8:35 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> So far, we have been considering Data Object Exchange (DOE) mailboxes only
> on EPs (CXL type 3 devices).
> CXL CDAT (technically CXL Table Query Protocol but lets just call it CDAT)
>   https://lore.kernel.org/linux-cxl/20220414203237.2198665-1-ira.weiny@intel.com
> CMA/SPDM support
>   https://lore.kernel.org/linux-cxl/20220303135905.10420-1-Jonathan.Cameron@huawei.com/
>
> However, a number of DOE protocols apply to switch (and root) ports.
> DOE instances supporting CDAT occur on switch upstream ports as well as EPs.
>
> DOE instances supporting CMA may occur in root ports, upstream switch ports,
> downstream switch ports and EPs (including multiple functions where relevant).
>

So, like you, I was envisioning all the CMA and SPDM code landing in
the kernel until I read this:

"Extending in-kernel TLS support"
https://lwn.net/Articles/892216/

...and questioned why this new CMA/SPDM session establishment, which
is similar to TLS, be done inside the kernel while TLS session
establishment is done in userspace? I had a chance to chat with Chuck
at LSF/MM and confirmed there is little appetite to change this
up-call requirement for session establishment and expect CMA to be the
same. The rough idea of how this works with CMA/SPDM is providing an
ABI to retrieve session setup data with the end result of userspace
instantiating a keyid via keyctl the to be used for future SPDM
messages.

> The intent of this RFC is to discuss how to actually implement such support.
> The attached patch is a really rough PoC for CDAT on upstream switch ports
> done by adding a new pcie_port_service_driver.  This is different from the
> proposed auxiliary device used for CXL type 3 devices (for now).

CDAT to me is on the "CXL" side of a given PCI device. Given that
endpoints and switches are each represented by cxl_port objects it
seems those should generically carry the CDAT binary attribute, not
the PCI device, don't you think?

>
> So open questions:
>
> 1. Granularity. Should we do a driver per group of protocols that may
>    be collocated, or one per DOE instance. For now, we might be looking
>    at CDAT as done for this PoC, and CMA/IDE.

The more time goes by the more I am coming around to Bjorn's initial
reaction to all this that DOE is closer to a VPD model than an
auxiliary_device model or pcie_port_device model.

I.e. have some common discovery in the PCI core for enumerating DOE
instances and advertisting protocols, but otherwise leave it up to
individual leaf drivers like cxl_pci or cxl_port to use that core to
run a given protocol.

> 2. Use of a pcie_port_service_driver a reasonable way to do this?
> 3. Service provision. It is likely that all of the protocols defined
>    above will be used as part of activities that span multiple devices.
>    a) CDAT used to establish latencies and bandwidth between host CPU
>       and memory on a CXL type 3 device beyond one or more CXL switches.
>    b) CMA.  Might just be used to provide simple device attestation
>       and potentially lock out the upstream port above a switch if the
>       switch does not pass attestation.  Many many other uses possible...

Per above once userspace has installed an SPDM session keyid for a
given PCI device it can also optionally set an 'authorized' attribute
(similiar to what USB and Thunderbolt have) to indicate whether a
device has passed attestation. As for the actual protocols that are
going to run over the SPDM session those would need their own drivers
that reference the established keyid.

>    c) Secure CMA / IDE. Likely to be used to set up link IDE.  What
>       this will look like is a question I've not really started
>       thinking about yet.
>
>    So how do we support this?  If nothing else we need to make sure
>    the drivers for the port don't go away whilst in use.

Another reason to make it a core aspect of the PCI device like VPD so
there are no entanglements beyond "PCI device exists".

> The patch is a very early PoC just to show it would 'work'...
>
> Note I am keen to not have the discussion around this support delay
> Ira's series.

Is there a nearer term forcing function for this? I.e. v5.20 seems to
be where the current DOE series is going to intercept. I think abandon
the "aux" organization for now and make DOE like VPD.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-06 22:40 ` [RFC PATCH 0/1] DOE usage with pcie/portdrv Dan Williams
@ 2022-05-07 10:18   ` Lukas Wunner
  2022-05-09  9:48     ` Jonathan Cameron
  2022-05-09  9:33   ` Jonathan Cameron
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2022-05-07 10:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Cameron, Linuxarm, Weiny, Ira, Linux PCI, linux-cxl,
	CHUCK_LEVER

On Fri, May 06, 2022 at 03:40:25PM -0700, Dan Williams wrote:
> On Tue, May 3, 2022 at 8:35 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> So, like you, I was envisioning all the CMA and SPDM code landing in
> the kernel until I read this:
> 
> "Extending in-kernel TLS support"
> https://lwn.net/Articles/892216/
> 
> ...and questioned why this new CMA/SPDM session establishment, which
> is similar to TLS, be done inside the kernel while TLS session
> establishment is done in userspace? I had a chance to chat with Chuck
> at LSF/MM and confirmed there is little appetite to change this
> up-call requirement for session establishment and expect CMA to be the
> same. The rough idea of how this works with CMA/SPDM is providing an
> ABI to retrieve session setup data with the end result of userspace
> instantiating a keyid via keyctl the to be used for future SPDM
> messages.

I'm still somewhat undecided on the kernel vs. user space question.

The simplest approach would be to expose DOE to user space and just
let it send the requests necessary for setting up an SPDM session
as well as IDE.

Jonathan posted patches a while ago to expose a DOE chrdev:
https://lore.kernel.org/all/20210524133938.2815206-5-Jonathan.Cameron@huawei.com/

On the other hand, I'm poring over Jonathan's latest kernel SPDM/CMA
patches and they're not very large, hence seem upstreamable.

It would be nice to have the kernel auto-detect that a PCI device
supports IDE and create a directory in sysfs as a result.  The user
would just drop certificates there which are added to a keyring
and the kernel would automatically try to set up an SPDM session
with one of the certificates and bring up IDE encryption.  Binding of
drivers to devices could be delayed until IDE is set up, as could be
enumeration of devices below a hot-plugged, IDE-capable PCIe switch.

(I can think of plenty of use cases, some good, some evil.
Such as preventing malicious / forged devices from being used,
but also vendors and employers controlling what people plug into
their machines.  Brave new world.)

Of course, it would probably be possible to implement all of this
in user space but it might not be as seamless.

I just fear that we may become a laughing stock for security researchers
with a kernel-only implementation.  Kind of like "ASN.1 parsing at ring 0":
https://x41-dsec.de/lab/blog/kernel_userspace/

I have some feedback on Jonathan's SPDM/CMA patches and will send it
out in a bit...

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-06 22:40 ` [RFC PATCH 0/1] DOE usage with pcie/portdrv Dan Williams
  2022-05-07 10:18   ` Lukas Wunner
@ 2022-05-09  9:33   ` Jonathan Cameron
  1 sibling, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2022-05-09  9:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linuxarm, Weiny, Ira, Linux PCI, linux-cxl, Wunner, Lukas, CHUCK_LEVER

On Fri, 6 May 2022 15:40:25 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> [ add Lukas and Chuck ]
> 
> On Tue, May 3, 2022 at 8:35 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > So far, we have been considering Data Object Exchange (DOE) mailboxes only
> > on EPs (CXL type 3 devices).
> > CXL CDAT (technically CXL Table Query Protocol but lets just call it CDAT)
> >   https://lore.kernel.org/linux-cxl/20220414203237.2198665-1-ira.weiny@intel.com
> > CMA/SPDM support
> >   https://lore.kernel.org/linux-cxl/20220303135905.10420-1-Jonathan.Cameron@huawei.com/
> >
> > However, a number of DOE protocols apply to switch (and root) ports.
> > DOE instances supporting CDAT occur on switch upstream ports as well as EPs.
> >
> > DOE instances supporting CMA may occur in root ports, upstream switch ports,
> > downstream switch ports and EPs (including multiple functions where relevant).
> >  
> 
> So, like you, I was envisioning all the CMA and SPDM code landing in
> the kernel until I read this:
> 
> "Extending in-kernel TLS support"
> https://lwn.net/Articles/892216/
> 
> ...and questioned why this new CMA/SPDM session establishment, which
> is similar to TLS, be done inside the kernel while TLS session
> establishment is done in userspace? I had a chance to chat with Chuck
> at LSF/MM and confirmed there is little appetite to change this
> up-call requirement for session establishment and expect CMA to be the
> same. The rough idea of how this works with CMA/SPDM is providing an
> ABI to retrieve session setup data with the end result of userspace
> instantiating a keyid via keyctl the to be used for future SPDM
> messages.

I've also begun messing around with MCTP based transport and current
MCTP subsystem exposes everything via sockets.  Mainly I wanted to
mess around with emulating the FM-API stuff but SPDM (attestation anyway)
is also somewhat plausible.

In that particular case we 'could' do it in kernel I think but it seems
a little crazy and a userspace only (or mostly) solution is probably better.
My current thinking is that if we just had DOE then a kernel only solution
is fairly clean and simple, but chances are the same infrastructure
will be used with other transports where that's not true....

One side point here is I'm not sure we can build on libspdm without substantial
surgery - so we may well end up spinning something new in userspace.
libspdm does it's job as a reference implementation but it's not currently
friendly to being integrated in anything else.

I'd be nervous about drawing any firm conclusions on moving
the sessions setup to userspace until a prototype is up and running.
It feels like it would work fine, but I never trust my feelings ;)

I'll carry on with the MCTP emulation over next week or so and see
where I end up with that.

> 
> > The intent of this RFC is to discuss how to actually implement such support.
> > The attached patch is a really rough PoC for CDAT on upstream switch ports
> > done by adding a new pcie_port_service_driver.  This is different from the
> > proposed auxiliary device used for CXL type 3 devices (for now).  
> 
> CDAT to me is on the "CXL" side of a given PCI device. Given that
> endpoints and switches are each represented by cxl_port objects it
> seems those should generically carry the CDAT binary attribute, not
> the PCI device, don't you think?

Sure, the current approach is definitely a hack.  I think
the DOE itself needs to be managed by the PCIe side of things, but the
individual protocols then instantiated by various drivers.  Some
of those may become 'standard'.

What I definitely didn't want was to end up with aux drivers on aux
drivers.  If we just add this to the cxl_port that's fine for CDAT.
Some of the other protocols may end up looking like
this PoC though (CDAT is just the easiest to hack in so I used that
as an example :) 

> 
> >
> > So open questions:
> >
> > 1. Granularity. Should we do a driver per group of protocols that may
> >    be collocated, or one per DOE instance. For now, we might be looking
> >    at CDAT as done for this PoC, and CMA/IDE.  
> 
> The more time goes by the more I am coming around to Bjorn's initial
> reaction to all this that DOE is closer to a VPD model than an
> auxiliary_device model or pcie_port_device model.
> 
> I.e. have some common discovery in the PCI core for enumerating DOE
> instances and advertisting protocols, but otherwise leave it up to
> individual leaf drivers like cxl_pci or cxl_port to use that core to
> run a given protocol.

Sounds pretty close to where we started with earlier versions (v3 for example)
https://lore.kernel.org/linux-cxl/20210419165451.2176200-4-Jonathan.Cameron@huawei.com/
of the DOE set though we kicked off enumeration by PCI core code via an explicit call
to pci_doe_register_all().  The reasoning behind that was the need for interrupts.

We could play a similar game to I did in this series and do discovery
via polling and then bring up interrupts once available.
It's a bit clunky. Either:
a) When first protocol is requested by a driver, check if the device happens to
   have appropriate interrupts enabled then use them.
b) Have the request for a protocol explicitly request interrupts on basis
   the driver has figured out what vectors are needed and enabled them.

Either is a pretty horrible layer crossing given you'd typically not want the
CXL or similar driver to be that aware of interrupt vector info from the DOE
capability, but the only way it can find out what vectors to enable is to do
what the pci port driver does and walk all the capabilities that might need
a vector and find the max value that is specified anywhere.

I suppose we 'could' add a pci_vectors_get_std_max()
or something like that to paper over this.  Idea being the driver would
know about any other vectors that aren't specified in standards defined
capabilities.

I suspect we'll still end up with some DOE protocol drivers as pci port
services drivers (or a newer version of something similar) as they will
occur on generic PCI ports so there is nothing else to hang them off.

> 
> > 2. Use of a pcie_port_service_driver a reasonable way to do this?
> > 3. Service provision. It is likely that all of the protocols defined
> >    above will be used as part of activities that span multiple devices.
> >    a) CDAT used to establish latencies and bandwidth between host CPU
> >       and memory on a CXL type 3 device beyond one or more CXL switches.
> >    b) CMA.  Might just be used to provide simple device attestation
> >       and potentially lock out the upstream port above a switch if the
> >       switch does not pass attestation.  Many many other uses possible...  
> 
> Per above once userspace has installed an SPDM session keyid for a
> given PCI device it can also optionally set an 'authorized' attribute
> (similiar to what USB and Thunderbolt have) to indicate whether a
> device has passed attestation. As for the actual protocols that are
> going to run over the SPDM session those would need their own drivers
> that reference the established keyid.

Ok. That flow seems to make sense (at first glance).

> 
> >    c) Secure CMA / IDE. Likely to be used to set up link IDE.  What
> >       this will look like is a question I've not really started
> >       thinking about yet.
> >
> >    So how do we support this?  If nothing else we need to make sure
> >    the drivers for the port don't go away whilst in use.  
> 
> Another reason to make it a core aspect of the PCI device like VPD so
> there are no entanglements beyond "PCI device exists".

Hmm. I'm not totally sure that will be enough but 'maybe'.  Guess we'll
see as things come together.

> 
> > The patch is a very early PoC just to show it would 'work'...
> >
> > Note I am keen to not have the discussion around this support delay
> > Ira's series.  
> 
> Is there a nearer term forcing function for this? I.e. v5.20 seems to
> be where the current DOE series is going to intercept. I think abandon
> the "aux" organization for now and make DOE like VPD.

Main forcing function is the nebulous fact that we all keep forgetting
how this works and having to spend non trivial time getting back into
it. 5.20 cycle is probably fine given timing of anything
else landing.

Jonathan




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-07 10:18   ` Lukas Wunner
@ 2022-05-09  9:48     ` Jonathan Cameron
  2022-05-11 19:13       ` Lukas Wunner
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Cameron @ 2022-05-09  9:48 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Dan Williams, Linuxarm, Weiny, Ira, Linux PCI, linux-cxl, CHUCK_LEVER

On Sat, 7 May 2022 12:18:48 +0200
Lukas Wunner <lukas@wunner.de> wrote:

> On Fri, May 06, 2022 at 03:40:25PM -0700, Dan Williams wrote:
> > On Tue, May 3, 2022 at 8:35 AM Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > So, like you, I was envisioning all the CMA and SPDM code landing in
> > the kernel until I read this:
> > 
> > "Extending in-kernel TLS support"
> > https://lwn.net/Articles/892216/
> > 
> > ...and questioned why this new CMA/SPDM session establishment, which
> > is similar to TLS, be done inside the kernel while TLS session
> > establishment is done in userspace? I had a chance to chat with Chuck
> > at LSF/MM and confirmed there is little appetite to change this
> > up-call requirement for session establishment and expect CMA to be the
> > same. The rough idea of how this works with CMA/SPDM is providing an
> > ABI to retrieve session setup data with the end result of userspace
> > instantiating a keyid via keyctl the to be used for future SPDM
> > messages.  
> 
> I'm still somewhat undecided on the kernel vs. user space question.

Likewise.  I feel a few more prototypes are needed to come to clear
conclusion.

> 
> The simplest approach would be to expose DOE to user space and just
> let it send the requests necessary for setting up an SPDM session
> as well as IDE.
> 
> Jonathan posted patches a while ago to expose a DOE chrdev:
> https://lore.kernel.org/all/20210524133938.2815206-5-Jonathan.Cameron@huawei.com/

Discussions at Plumbers last year (and on list) concluded that generic
interface was probably a non starter, but something SPDM specific should
be fine. (even if it's just that generic interface with a check that it
is the right DOE protocol ;)
> 
> On the other hand, I'm poring over Jonathan's latest kernel SPDM/CMA
> patches and they're not very large, hence seem upstreamable.

That ended up rather cleaner than I expected. Though note that series doesn't get
quite as far as secure channel setup which is the point where a
userspace stack might hand over to kernel.  It should be a fairly trivial
addition though.

> 
> It would be nice to have the kernel auto-detect that a PCI device
> supports IDE and create a directory in sysfs as a result.  The user
> would just drop certificates there which are added to a keyring
> and the kernel would automatically try to set up an SPDM session
> with one of the certificates and bring up IDE encryption.  Binding of
> drivers to devices could be delayed until IDE is set up, as could be
> enumeration of devices below a hot-plugged, IDE-capable PCIe switch.

Not sure interface would look exactly like that, but the point in general
is good.

> 
> (I can think of plenty of use cases, some good, some evil.
> Such as preventing malicious / forged devices from being used,
> but also vendors and employers controlling what people plug into
> their machines.  Brave new world.)

We could do driver specific 'blocking' until the 'authorized' bit Dan suggested
is set by userspace (after attestation) and end up with somewhat
similar security.  The only real difference being that a malicious device
might have more driver code to attack than it would if we blocked
the driver starting at all.  That would depend on how the driver was
implemented.

> 
> Of course, it would probably be possible to implement all of this
> in user space but it might not be as seamless.
> 
> I just fear that we may become a laughing stock for security researchers
> with a kernel-only implementation.  Kind of like "ASN.1 parsing at ring 0":
> https://x41-dsec.de/lab/blog/kernel_userspace/
> 
> I have some feedback on Jonathan's SPDM/CMA patches and will send it
> out in a bit...

Cool. Though I suspect those will be parked for now as 'sufficient' for
discussion.  Next step is to mock up what the alternative looks like...

Good to capture the feedback while it's fresh on basis we might end up back there!

Jonathan


> 
> Thanks,
> 
> Lukas
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-09  9:48     ` Jonathan Cameron
@ 2022-05-11 19:13       ` Lukas Wunner
  2022-05-11 19:19         ` Lukas Wunner
  2022-05-11 19:42         ` Dan Williams
  0 siblings, 2 replies; 22+ messages in thread
From: Lukas Wunner @ 2022-05-11 19:13 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Gavin Hindman, Dan Williams, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Mon, May 09, 2022 at 10:48:06AM +0100, Jonathan Cameron wrote:
> On Sat, 7 May 2022 12:18:48 +0200 Lukas Wunner <lukas@wunner.de> wrote:
> > I'm still somewhat undecided on the kernel vs. user space question.
> 
> Likewise.  I feel a few more prototypes are needed to come to clear
> conclusion.

Gavin Hindman (+cc) raised an important point off-list:

When an IDE-capable device is runtime suspended to D3hot and later
runtime resumed to D0, it may not preserve its internal state.
(The No_Soft_Reset bit in the Power Management Control/Status Register
tells us whether the device is capable of preserving internal state
over a transition to D3hot, see PCIe r6.0, sec. 7.5.2.2.)

Likewise, when an IDE-capable device is reset (e.g. due to Downstream
Port Containment, AER or a bus reset initiated by user space),
internal state is lost and must be reconstructed by pci_restore_state().
That state includes the SPDM session or IDE encryption.

If setting up an SPDM session is dependent on user space, the kernel
would have to leave a device in an inoperable state after runtime resume
or reset, until user space gets around to initiate SPDM.

I think that would be a terrible user experience.  We've gone to great
lengths to make reset recovery as seamless and quick as possible.
(E.g. hot-plugged NVMe drives survive a reset without the driver being
unbound, those would be prime candidates for IDE encryption.)
It won't help the acceptance of IDE if it breaks that seamlessness.

So that's a strong argument for an in-kernel SPDM implementation.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-11 19:13       ` Lukas Wunner
@ 2022-05-11 19:19         ` Lukas Wunner
  2022-05-11 19:43           ` Dan Williams
  2022-05-11 19:42         ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2022-05-11 19:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Gavin Hindman, Dan Williams, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Wed, May 11, 2022 at 09:13:45PM +0200, Lukas Wunner wrote:
> When an IDE-capable device is runtime suspended to D3hot and later
> runtime resumed to D0, it may not preserve its internal state.
> (The No_Soft_Reset bit in the Power Management Control/Status Register
> tells us whether the device is capable of preserving internal state
> over a transition to D3hot, see PCIe r6.0, sec. 7.5.2.2.)
> 
> Likewise, when an IDE-capable device is reset (e.g. due to Downstream
> Port Containment, AER or a bus reset initiated by user space),
> internal state is lost and must be reconstructed by pci_restore_state().
> That state includes the SPDM session or IDE encryption.

Digging a little further, sec. 6.33.8 says that "The No_Soft_Reset bit
must be Set", so at least IDE will always survive a D3hot transition.

But the reset argument still stands:  That same section says that all
IDE streams transition to Insecure and all keys are invalidated upon
reset.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-11 19:13       ` Lukas Wunner
  2022-05-11 19:19         ` Lukas Wunner
@ 2022-05-11 19:42         ` Dan Williams
  2022-05-11 20:22           ` Hindman, Gavin
  2022-05-14 13:31           ` Lukas Wunner
  1 sibling, 2 replies; 22+ messages in thread
From: Dan Williams @ 2022-05-11 19:42 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Wed, May 11, 2022 at 12:14 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Mon, May 09, 2022 at 10:48:06AM +0100, Jonathan Cameron wrote:
> > On Sat, 7 May 2022 12:18:48 +0200 Lukas Wunner <lukas@wunner.de> wrote:
> > > I'm still somewhat undecided on the kernel vs. user space question.
> >
> > Likewise.  I feel a few more prototypes are needed to come to clear
> > conclusion.
>
> Gavin Hindman (+cc) raised an important point off-list:
>
> When an IDE-capable device is runtime suspended to D3hot and later
> runtime resumed to D0, it may not preserve its internal state.
> (The No_Soft_Reset bit in the Power Management Control/Status Register
> tells us whether the device is capable of preserving internal state
> over a transition to D3hot, see PCIe r6.0, sec. 7.5.2.2.)

I think power-management effects relative to IDE is a soft spot of the
specification. If the link goes down then yes, IDE needs to be
re-established, but as far as I can see that's a policy tradeoff to
support runtime reset or support link encryption.

> Likewise, when an IDE-capable device is reset (e.g. due to Downstream
> Port Containment, AER or a bus reset initiated by user space),
> internal state is lost and must be reconstructed by pci_restore_state().
> That state includes the SPDM session or IDE encryption.
>
> If setting up an SPDM session is dependent on user space, the kernel
> would have to leave a device in an inoperable state after runtime resume
> or reset, until user space gets around to initiate SPDM.

Yes, this seems acceptable from the perspective of server platforms
that can make the power management vs security tradeoff.

>
> I think that would be a terrible user experience.  We've gone to great
> lengths to make reset recovery as seamless and quick as possible.
> (E.g. hot-plugged NVMe drives survive a reset without the driver being
> unbound, those would be prime candidates for IDE encryption.)
> It won't help the acceptance of IDE if it breaks that seamlessness.
>
> So that's a strong argument for an in-kernel SPDM implementation.

The SPDM message passing will always need to be supported in-kernel.
It's the certificate parsing and attestation flow that is proposed to
be in userspace. So perform CMA with userspace up-calls, and then
insert a key-id into the kernel for ongoing SPDM message passing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-11 19:19         ` Lukas Wunner
@ 2022-05-11 19:43           ` Dan Williams
  2022-05-14 13:55             ` Lukas Wunner
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2022-05-11 19:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Wed, May 11, 2022 at 12:20 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Wed, May 11, 2022 at 09:13:45PM +0200, Lukas Wunner wrote:
> > When an IDE-capable device is runtime suspended to D3hot and later
> > runtime resumed to D0, it may not preserve its internal state.
> > (The No_Soft_Reset bit in the Power Management Control/Status Register
> > tells us whether the device is capable of preserving internal state
> > over a transition to D3hot, see PCIe r6.0, sec. 7.5.2.2.)
> >
> > Likewise, when an IDE-capable device is reset (e.g. due to Downstream
> > Port Containment, AER or a bus reset initiated by user space),
> > internal state is lost and must be reconstructed by pci_restore_state().
> > That state includes the SPDM session or IDE encryption.
>
> Digging a little further, sec. 6.33.8 says that "The No_Soft_Reset bit
> must be Set", so at least IDE will always survive a D3hot transition.
>
> But the reset argument still stands:  That same section says that all
> IDE streams transition to Insecure and all keys are invalidated upon
> reset.

Right, this isn't the only problem with reset vs ongoing CXL operations...

https://lore.kernel.org/linux-cxl/164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-11 19:42         ` Dan Williams
@ 2022-05-11 20:22           ` Hindman, Gavin
  2022-05-11 21:04             ` Dan Williams
  2022-05-14 13:31           ` Lukas Wunner
  1 sibling, 1 reply; 22+ messages in thread
From: Hindman, Gavin @ 2022-05-11 20:22 UTC (permalink / raw)
  To: Williams, Dan J, Lukas Wunner
  Cc: Jonathan Cameron, Linuxarm, Weiny, Ira, Linux PCI, linux-cxl,
	CHUCK_LEVER



>-----Original Message-----
>From: Dan Williams <dan.j.williams@intel.com>
>Sent: Wednesday, May 11, 2022 12:42 PM
>To: Lukas Wunner <lukas@wunner.de>
>Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>; Hindman, Gavin
><gavin.hindman@intel.com>; Linuxarm <linuxarm@huawei.com>; Weiny, Ira
><ira.weiny@intel.com>; Linux PCI <linux-pci@vger.kernel.org>; linux-
>cxl@vger.kernel.org; CHUCK_LEVER <chuck.lever@oracle.com>
>Subject: Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
>
>On Wed, May 11, 2022 at 12:14 PM Lukas Wunner <lukas@wunner.de> wrote:
>>
>> On Mon, May 09, 2022 at 10:48:06AM +0100, Jonathan Cameron wrote:
>> > On Sat, 7 May 2022 12:18:48 +0200 Lukas Wunner <lukas@wunner.de>
>wrote:
>> > > I'm still somewhat undecided on the kernel vs. user space question.
>> >
>> > Likewise.  I feel a few more prototypes are needed to come to clear
>> > conclusion.
>>
>> Gavin Hindman (+cc) raised an important point off-list:
>>
>> When an IDE-capable device is runtime suspended to D3hot and later
>> runtime resumed to D0, it may not preserve its internal state.
>> (The No_Soft_Reset bit in the Power Management Control/Status Register
>> tells us whether the device is capable of preserving internal state
>> over a transition to D3hot, see PCIe r6.0, sec. 7.5.2.2.)
>
>I think power-management effects relative to IDE is a soft spot of the
>specification. If the link goes down then yes, IDE needs to be re-established,
>but as far as I can see that's a policy tradeoff to support runtime reset or
>support link encryption.
>
>> Likewise, when an IDE-capable device is reset (e.g. due to Downstream
>> Port Containment, AER or a bus reset initiated by user space),
>> internal state is lost and must be reconstructed by pci_restore_state().
>> That state includes the SPDM session or IDE encryption.
>>
>> If setting up an SPDM session is dependent on user space, the kernel
>> would have to leave a device in an inoperable state after runtime
>> resume or reset, until user space gets around to initiate SPDM.
>
>Yes, this seems acceptable from the perspective of server platforms that can
>make the power management vs security tradeoff.
>

Agree, though more and more we need to be thinking about sustainability and cost-of-ownership and having to keep devices awake in order to meet security goals is somewhat contrary to that objective.  I fully realize those are not technical constraints, but IMO should still be considered.  Latency for deadline-driven tasks was my original consideration, not just security - power-management features commonly get turned off due to resume latency, and this would appear to have the potential to extend resume latencies even in kernel, let alone waiting for user-space response. Again, obviously not a hard design constraint, but seems worthy of consideration 

>>
>> I think that would be a terrible user experience.  We've gone to great
>> lengths to make reset recovery as seamless and quick as possible.
>> (E.g. hot-plugged NVMe drives survive a reset without the driver being
>> unbound, those would be prime candidates for IDE encryption.) It won't
>> help the acceptance of IDE if it breaks that seamlessness.
>>
>> So that's a strong argument for an in-kernel SPDM implementation.
>
>The SPDM message passing will always need to be supported in-kernel.
>It's the certificate parsing and attestation flow that is proposed to be in
>userspace. So perform CMA with userspace up-calls, and then insert a key-id
>into the kernel for ongoing SPDM message passing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-11 20:22           ` Hindman, Gavin
@ 2022-05-11 21:04             ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2022-05-11 21:04 UTC (permalink / raw)
  To: Hindman, Gavin
  Cc: Lukas Wunner, Jonathan Cameron, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Wed, May 11, 2022 at 1:22 PM Hindman, Gavin <gavin.hindman@intel.com> wrote:
>
>
>
> >-----Original Message-----
> >From: Dan Williams <dan.j.williams@intel.com>
> >Sent: Wednesday, May 11, 2022 12:42 PM
> >To: Lukas Wunner <lukas@wunner.de>
> >Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>; Hindman, Gavin
> ><gavin.hindman@intel.com>; Linuxarm <linuxarm@huawei.com>; Weiny, Ira
> ><ira.weiny@intel.com>; Linux PCI <linux-pci@vger.kernel.org>; linux-
> >cxl@vger.kernel.org; CHUCK_LEVER <chuck.lever@oracle.com>
> >Subject: Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
> >
> >On Wed, May 11, 2022 at 12:14 PM Lukas Wunner <lukas@wunner.de> wrote:
> >>
> >> On Mon, May 09, 2022 at 10:48:06AM +0100, Jonathan Cameron wrote:
> >> > On Sat, 7 May 2022 12:18:48 +0200 Lukas Wunner <lukas@wunner.de>
> >wrote:
> >> > > I'm still somewhat undecided on the kernel vs. user space question.
> >> >
> >> > Likewise.  I feel a few more prototypes are needed to come to clear
> >> > conclusion.
> >>
> >> Gavin Hindman (+cc) raised an important point off-list:
> >>
> >> When an IDE-capable device is runtime suspended to D3hot and later
> >> runtime resumed to D0, it may not preserve its internal state.
> >> (The No_Soft_Reset bit in the Power Management Control/Status Register
> >> tells us whether the device is capable of preserving internal state
> >> over a transition to D3hot, see PCIe r6.0, sec. 7.5.2.2.)
> >
> >I think power-management effects relative to IDE is a soft spot of the
> >specification. If the link goes down then yes, IDE needs to be re-established,
> >but as far as I can see that's a policy tradeoff to support runtime reset or
> >support link encryption.
> >
> >> Likewise, when an IDE-capable device is reset (e.g. due to Downstream
> >> Port Containment, AER or a bus reset initiated by user space),
> >> internal state is lost and must be reconstructed by pci_restore_state().
> >> That state includes the SPDM session or IDE encryption.
> >>
> >> If setting up an SPDM session is dependent on user space, the kernel
> >> would have to leave a device in an inoperable state after runtime
> >> resume or reset, until user space gets around to initiate SPDM.
> >
> >Yes, this seems acceptable from the perspective of server platforms that can
> >make the power management vs security tradeoff.
> >
>
> Agree, though more and more we need to be thinking about sustainability and cost-of-ownership and having to keep devices awake in order to meet security goals is somewhat contrary to that objective.  I fully realize those are not technical constraints, but IMO should still be considered.  Latency for deadline-driven tasks was my original consideration, not just security - power-management features commonly get turned off due to resume latency, and this would appear to have the potential to extend resume latencies even in kernel, let alone waiting for user-space response. Again, obviously not a hard design constraint, but seems worthy of consideration

Keep in mind that kernel managed IDE is not much more than a stop-gap
to fully attesting that devices are within a goven trusted compute
boundary. In that model the kernel is not trusted to establish that
validation. Instead that role is reserved for a trusted platform
entity. So yes, those are important considerations, but they do not
read on the kernel implementation in the near term.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-11 19:42         ` Dan Williams
  2022-05-11 20:22           ` Hindman, Gavin
@ 2022-05-14 13:31           ` Lukas Wunner
  2022-05-16 16:53             ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2022-05-14 13:31 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Wed, May 11, 2022 at 12:42:24PM -0700, Dan Williams wrote:
> I think power-management effects relative to IDE is a soft spot of the
> specification.

When resuming from system sleep, the kernel restores a device's
config space in pci_pm_resume_noirq(), then calls the driver's
->resume_noirq() callback.  The driver is free to assume that
the device is accessible und usable at that point.

IDE breaks that contract if establishment of an SPDM session
depends on user space.  We can't call out to user space for
authentication during the resume_noirq phase because interrupts
are still disabled.

Drivers would have to be aware that IDE has not yet been
re-established and refrain from accessing the device.
Any child devices of the PCI device cannot be resumed
until then.

Ideally we'd want IDE to be transparent to drivers.
That's impossible if their access to devices is forbidden
after system sleep for an indefinite amount of time.

Runtime PM has similar issues as system sleep if the device
was in D3cold.

Reliance on user space also entails a risk of deadlocks:
Let's say user space process A accesses a PCI device,
the kernel runtime resumes the device and calls out to
user space process B to authenticate it.  If A is holding
a resource that B requires, the two tasks deadlock and
the device never becomes accessible.

The more I think about it, the more attractive does Jonathan's
in-kernel SPDM approach look.  Performing SPDM authentication and
IDE setup in the kernel would allow us to retain all existing
assumptions and behavior around power management and reset recovery,
avoid driver awareness of IDE and avoid deadlocks.


> > If setting up an SPDM session is dependent on user space, the kernel
> > would have to leave a device in an inoperable state after runtime resume
> > or reset, until user space gets around to initiate SPDM.
> 
> Yes, this seems acceptable from the perspective of server platforms
> that can make the power management vs security tradeoff.

It seems likely that IDE will not only be used on server platforms.

I'll see to to it that I provide more review feedback to Jonathan's RFC
series so that we can move forward with this.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-11 19:43           ` Dan Williams
@ 2022-05-14 13:55             ` Lukas Wunner
  2022-05-16 17:01               ` Dan Williams
  2022-05-18 13:43               ` Christoph Hellwig
  0 siblings, 2 replies; 22+ messages in thread
From: Lukas Wunner @ 2022-05-14 13:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Wed, May 11, 2022 at 12:43:34PM -0700, Dan Williams wrote:
> On Wed, May 11, 2022 at 12:20 PM Lukas Wunner <lukas@wunner.de> wrote:
> > But the reset argument still stands:  That same section says that all
> > IDE streams transition to Insecure and all keys are invalidated upon
> > reset.
> 
> Right, this isn't the only problem with reset vs ongoing CXL operations...
> 
> https://lore.kernel.org/linux-cxl/164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com/

The above-linked cover letter refers to AER.

I believe with AER, the kernel is notified of an error via an interrupt
and asynchronously attempts recovery through a reset.
Obviously, an eternity may pass until the kernel gets around to do that
and whether accesses performed between the initial error and the reset
succeed is sort of undefined.  So it's kind of a "best effort" error
recovery.

With the advent of DPC, the situation has improved considerably as the
hardware (not the kernel) automatically disables the link upon occurrence
of the initial error.  Any subsequent accesses will fail and the kernel
does not perform a reset itself (the hardware already did that) but merely
attempts to bring the link back up.  That has made error recovery pretty
solid and NVMe drives now seamlessly recover from errors without the need
to unbind/rebind the driver.  Data centers heavily depend on that feature.

Perhaps if CXL.mem used DPC, it would be able to recover more reliably?

Circling back to the SPDM/IDE topic, while NVMe is now capable of
reliably recovering from errors, it does expect the kernel to handle
recovery within a few seconds.  I'm not sure we can continue to
guarantee that if the kernel depends on user space to perform
re-authentication with SPDM after reset.  That's another headache
that we could avoid with in-kernel SPDM authentication.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-14 13:31           ` Lukas Wunner
@ 2022-05-16 16:53             ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2022-05-16 16:53 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Sat, May 14, 2022 at 6:31 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Wed, May 11, 2022 at 12:42:24PM -0700, Dan Williams wrote:
> > I think power-management effects relative to IDE is a soft spot of the
> > specification.
>
> When resuming from system sleep, the kernel restores a device's
> config space in pci_pm_resume_noirq(), then calls the driver's
> ->resume_noirq() callback.  The driver is free to assume that
> the device is accessible und usable at that point.
>
> IDE breaks that contract if establishment of an SPDM session
> depends on user space.  We can't call out to user space for
> authentication during the resume_noirq phase because interrupts
> are still disabled.
>
> Drivers would have to be aware that IDE has not yet been
> re-established and refrain from accessing the device.
> Any child devices of the PCI device cannot be resumed
> until then.

Suspend has larger issues with CXL:

https://lore.kernel.org/linux-cxl/165066828317.3907920.5690432272182042556.stgit@dwillia2-desk3.amr.corp.intel.com/

...so IDE is just one more problem on top that requires disabling
suspend. Unless / until firmware takes responsibility for setting up
IDE I am not seeing a clean option for allowing the link to go down.

> Ideally we'd want IDE to be transparent to drivers.
> That's impossible if their access to devices is forbidden
> after system sleep for an indefinite amount of time.
>
> Runtime PM has similar issues as system sleep if the device
> was in D3cold.
>
> Reliance on user space also entails a risk of deadlocks:
> Let's say user space process A accesses a PCI device,
> the kernel runtime resumes the device and calls out to
> user space process B to authenticate it.  If A is holding
> a resource that B requires, the two tasks deadlock and
> the device never becomes accessible.
>
> The more I think about it, the more attractive does Jonathan's
> in-kernel SPDM approach look.  Performing SPDM authentication and
> IDE setup in the kernel would allow us to retain all existing
> assumptions and behavior around power management and reset recovery,
> avoid driver awareness of IDE and avoid deadlocks.

I agree with you that userspace coordination has these problems, but
they are secondary to the larger problem that hosting memory behind
PCI devices causes.

>
>
> > > If setting up an SPDM session is dependent on user space, the kernel
> > > would have to leave a device in an inoperable state after runtime resume
> > > or reset, until user space gets around to initiate SPDM.
> >
> > Yes, this seems acceptable from the perspective of server platforms
> > that can make the power management vs security tradeoff.
>
> It seems likely that IDE will not only be used on server platforms.

I expect IDE outside of the server space will need to be platform
firmware managed. OS managed IDE seems a stopgap to platform firmware
validating devices to be within the trusted compute boundary.

> I'll see to to it that I provide more review feedback to Jonathan's RFC
> series so that we can move forward with this.
>
> Thanks,
>
> Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-14 13:55             ` Lukas Wunner
@ 2022-05-16 17:01               ` Dan Williams
  2022-05-27  9:39                 ` Lukas Wunner
  2022-05-18 13:43               ` Christoph Hellwig
  1 sibling, 1 reply; 22+ messages in thread
From: Dan Williams @ 2022-05-16 17:01 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Sat, May 14, 2022 at 6:55 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Wed, May 11, 2022 at 12:43:34PM -0700, Dan Williams wrote:
> > On Wed, May 11, 2022 at 12:20 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > But the reset argument still stands:  That same section says that all
> > > IDE streams transition to Insecure and all keys are invalidated upon
> > > reset.
> >
> > Right, this isn't the only problem with reset vs ongoing CXL operations...
> >
> > https://lore.kernel.org/linux-cxl/164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com/
>
> The above-linked cover letter refers to AER.
>
> I believe with AER, the kernel is notified of an error via an interrupt
> and asynchronously attempts recovery through a reset.
> Obviously, an eternity may pass until the kernel gets around to do that
> and whether accesses performed between the initial error and the reset
> succeed is sort of undefined.  So it's kind of a "best effort" error
> recovery.
>
> With the advent of DPC, the situation has improved considerably as the
> hardware (not the kernel) automatically disables the link upon occurrence
> of the initial error.

DPC, as far is I can see, is broken for CXL, any link going down
causes the entire active interleaved memory range to be lost. Hence
the "hopeful" designation in that patch set, if the link is going down
to DPC the chance that the kernel runs long enough to even report the
error is at risk.

> Any subsequent accesses will fail and the kernel
> does not perform a reset itself (the hardware already did that) but merely
> attempts to bring the link back up.  That has made error recovery pretty
> solid and NVMe drives now seamlessly recover from errors without the need
> to unbind/rebind the driver.  Data centers heavily depend on that feature.

Works great for NVME.

>
> Perhaps if CXL.mem used DPC, it would be able to recover more reliably?

So far all I can see are attempts to fail a bit more gracefully, but I
would not consider this a reliable architecture for recovery:

https://www.computeexpresslink.org/_files/ugd/0c1418_f63f7f1a9f474ba2b00f5e77429867cb.pdf

> Circling back to the SPDM/IDE topic, while NVMe is now capable of
> reliably recovering from errors, it does expect the kernel to handle
> recovery within a few seconds.  I'm not sure we can continue to
> guarantee that if the kernel depends on user space to perform
> re-authentication with SPDM after reset.  That's another headache
> that we could avoid with in-kernel SPDM authentication.

What is missing from this conversation is what constitutes a device
leaving the trusted compute boundary and is the existing attestation
invalidated by a reset. I.e. perhaps the kernel can just do a
keep-alive heartbeat after the reset with the already negotiated key
to confirm the session is still valid.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-14 13:55             ` Lukas Wunner
  2022-05-16 17:01               ` Dan Williams
@ 2022-05-18 13:43               ` Christoph Hellwig
  2022-05-18 15:08                 ` Dan Williams
  2022-05-20  5:42                 ` Lukas Wunner
  1 sibling, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2022-05-18 13:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Dan Williams, Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny,
	Ira, Linux PCI, linux-cxl, CHUCK_LEVER

On Sat, May 14, 2022 at 03:55:21PM +0200, Lukas Wunner wrote:
> Circling back to the SPDM/IDE topic, while NVMe is now capable of
> reliably recovering from errors, it does expect the kernel to handle
> recovery within a few seconds.  I'm not sure we can continue to
> guarantee that if the kernel depends on user space to perform
> re-authentication with SPDM after reset.  That's another headache
> that we could avoid with in-kernel SPDM authentication.

I wonder if we need kernel bundled and tightly controlled userspace
code for these kinds of things (also for NVMe/NFS TLS).  That is,
bundle a userspace ELF file or files with a module which is unpacked
to or accessible by a ramfs-style file systems.  Then allow executing
it without any interaction with the normal userspace, and non-pagable
memory.  That way we can reuse existing userspace code, have really
nice address space isolation but avoid all the deadlock potential
of normal userspace code.  And I don't think it would be too hard to
implement either.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-18 13:43               ` Christoph Hellwig
@ 2022-05-18 15:08                 ` Dan Williams
  2022-05-20  5:42                 ` Lukas Wunner
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Williams @ 2022-05-18 15:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Lukas Wunner, Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny,
	Ira, Linux PCI, linux-cxl, CHUCK_LEVER

On Wed, May 18, 2022 at 6:44 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sat, May 14, 2022 at 03:55:21PM +0200, Lukas Wunner wrote:
> > Circling back to the SPDM/IDE topic, while NVMe is now capable of
> > reliably recovering from errors, it does expect the kernel to handle
> > recovery within a few seconds.  I'm not sure we can continue to
> > guarantee that if the kernel depends on user space to perform
> > re-authentication with SPDM after reset.  That's another headache
> > that we could avoid with in-kernel SPDM authentication.
>
> I wonder if we need kernel bundled and tightly controlled userspace
> code for these kinds of things (also for NVMe/NFS TLS).  That is,
> bundle a userspace ELF file or files with a module which is unpacked
> to or accessible by a ramfs-style file systems.  Then allow executing
> it without any interaction with the normal userspace, and non-pagable
> memory.  That way we can reuse existing userspace code, have really
> nice address space isolation but avoid all the deadlock potential
> of normal userspace code.  And I don't think it would be too hard to
> implement either.

Yes, I also want something like this for mitigating the vulnerability
surface of things like PRM [1], where platform vendors are looking to
move more runtime helpers out of SMM mode and into ring0. I would
rather see those routines move all the way into ring3.

[1]: https://uefi.org/sites/default/files/resources/Platform%20Runtime%20Mechanism%20-%20with%20legal%20notice.pdf

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-18 13:43               ` Christoph Hellwig
  2022-05-18 15:08                 ` Dan Williams
@ 2022-05-20  5:42                 ` Lukas Wunner
  2022-05-20 15:37                   ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2022-05-20  5:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny,
	Ira, Linux PCI, linux-cxl, CHUCK_LEVER

On Wed, May 18, 2022 at 06:43:39AM -0700, Christoph Hellwig wrote:
> On Sat, May 14, 2022 at 03:55:21PM +0200, Lukas Wunner wrote:
> > Circling back to the SPDM/IDE topic, while NVMe is now capable of
> > reliably recovering from errors, it does expect the kernel to handle
> > recovery within a few seconds.  I'm not sure we can continue to
> > guarantee that if the kernel depends on user space to perform
> > re-authentication with SPDM after reset.  That's another headache
> > that we could avoid with in-kernel SPDM authentication.
> 
> I wonder if we need kernel bundled and tightly controlled userspace
> code for these kinds of things (also for NVMe/NFS TLS).  That is,
> bundle a userspace ELF file or files with a module which is unpacked
> to or accessible by a ramfs-style file systems.  Then allow executing
> it without any interaction with the normal userspace, and non-pagable
> memory.  That way we can reuse existing userspace code, have really
> nice address space isolation but avoid all the deadlock potential
> of normal userspace code.  And I don't think it would be too hard to
> implement either.

Just as a reminder, on resume from system sleep, IDE needs to be
set up by pci_pm_resume_noirq() to comply with the existing assumption
that a PCI driver's ->resume_noirq callback may access the device.

At that point (device) interrupts are disabled, so it's not possible
to e.g. read certificates from disk or perform an OCSP request.
So the bundled userspace code would have to conform to a number of
severe restrictions to avoid resume issues.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-20  5:42                 ` Lukas Wunner
@ 2022-05-20 15:37                   ` Dan Williams
  2022-05-20 15:42                     ` Chuck Lever III
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2022-05-20 15:37 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Christoph Hellwig, Jonathan Cameron, Gavin Hindman, Linuxarm,
	Weiny, Ira, Linux PCI, linux-cxl, CHUCK_LEVER

On Thu, May 19, 2022 at 10:42 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Wed, May 18, 2022 at 06:43:39AM -0700, Christoph Hellwig wrote:
> > On Sat, May 14, 2022 at 03:55:21PM +0200, Lukas Wunner wrote:
> > > Circling back to the SPDM/IDE topic, while NVMe is now capable of
> > > reliably recovering from errors, it does expect the kernel to handle
> > > recovery within a few seconds.  I'm not sure we can continue to
> > > guarantee that if the kernel depends on user space to perform
> > > re-authentication with SPDM after reset.  That's another headache
> > > that we could avoid with in-kernel SPDM authentication.
> >
> > I wonder if we need kernel bundled and tightly controlled userspace
> > code for these kinds of things (also for NVMe/NFS TLS).  That is,
> > bundle a userspace ELF file or files with a module which is unpacked
> > to or accessible by a ramfs-style file systems.  Then allow executing
> > it without any interaction with the normal userspace, and non-pagable
> > memory.  That way we can reuse existing userspace code, have really
> > nice address space isolation but avoid all the deadlock potential
> > of normal userspace code.  And I don't think it would be too hard to
> > implement either.
>
> Just as a reminder, on resume from system sleep, IDE needs to be
> set up by pci_pm_resume_noirq() to comply with the existing assumption
> that a PCI driver's ->resume_noirq callback may access the device.
>
> At that point (device) interrupts are disabled, so it's not possible
> to e.g. read certificates from disk or perform an OCSP request.
> So the bundled userspace code would have to conform to a number of
> severe restrictions to avoid resume issues.

Recall that OS managed IDE is somewhat of a stop-gap / special case as
typically the OS kernel is outside of the platform trust boundary for
things like TDX. I imagine suspend in the presence of IDE would be
platform firmware managed, not OS managed. Certificate validation can
always move internal to the kernel later if a concrete need arises,
but it is difficult to go the other way, to kick out certificate
validation from the kernel if it proves not to be needed. Otherwise, a
ring3/ userspace helper that can live in non-pageable memory to avoid
scenarios like this sounds like a capability that would be worth
having regardless.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-20 15:37                   ` Dan Williams
@ 2022-05-20 15:42                     ` Chuck Lever III
  0 siblings, 0 replies; 22+ messages in thread
From: Chuck Lever III @ 2022-05-20 15:42 UTC (permalink / raw)
  To: Dan Williams
  Cc: Lukas Wunner, Christoph Hellwig, Jonathan Cameron, Gavin Hindman,
	Linuxarm, Weiny, Ira, Linux PCI, linux-cxl



> On May 20, 2022, at 11:37 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> 
> Otherwise, a
> ring3/ userspace helper that can live in non-pageable memory to avoid
> scenarios like this sounds like a capability that would be worth
> having regardless.

TLS has a similar issue: We would like to support TLS-protected network
storage for the root filesystem. The user space agent that handles TLS
handshakes would therefore need to reside in non-pageable memory to
prevent a deadlock if a new handshake is needed to access the root's
backing store. But then so would the certificate trust chain and any
handshake configuration information.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
  2022-05-16 17:01               ` Dan Williams
@ 2022-05-27  9:39                 ` Lukas Wunner
  0 siblings, 0 replies; 22+ messages in thread
From: Lukas Wunner @ 2022-05-27  9:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Cameron, Gavin Hindman, Linuxarm, Weiny, Ira, Linux PCI,
	linux-cxl, CHUCK_LEVER

On Mon, May 16, 2022 at 10:01:26AM -0700, Dan Williams wrote:
> On Sat, May 14, 2022 at 6:55 AM Lukas Wunner <lukas@wunner.de> wrote:
> > Circling back to the SPDM/IDE topic, while NVMe is now capable of
> > reliably recovering from errors, it does expect the kernel to handle
> > recovery within a few seconds.  I'm not sure we can continue to
> > guarantee that if the kernel depends on user space to perform
> > re-authentication with SPDM after reset.  That's another headache
> > that we could avoid with in-kernel SPDM authentication.
> 
> What is missing from this conversation is what constitutes a device
> leaving the trusted compute boundary and is the existing attestation
> invalidated by a reset. I.e. perhaps the kernel can just do a
> keep-alive heartbeat after the reset with the already negotiated key
> to confirm the session is still valid.

After a bit of digging in the spec I think I can answer your question.

Any type of reset (both Conventional Reset and FLR) "must result in all
IDE Streams associated with that Function transitioning to the Insecure
state, and all keys must be invalidated and rendered unreadable."
(PCIe r6.0, sec 6.33.8).

So IDE is always gone after reset and needs to be re-established.
An SPDM session is necessary for that.  Does a reset affect SPDM?

It depends on the type of reset:  A FLR "must not change the state
of the secure session" (PCIe r6.0, sec 6.31.4).

But for a Conventional Reset, the situation seems to be different:
The CMA/SPDM section of the spec doesn't say what happens to an
SPDM session upon a Conventional Reset, but sec 6.6.1 clearly states
that "all Port registers and state machines must be set to their
initialization values as specified in this document, except for
sticky registers".  So I think we must assume that the SPDM session
is gone after a Conventional Reset.

DPC results in a Conventional Reset at the port where it occurs and
also propagates a Hot Reset down the hierarchy (see ea401499e943).

The bottom line is that after DPC, both the SPDM session as well as IDE
need to be re-established, requiring the certificate validation
which is controversial here when performed at the kernel level.

(As an aside, sec 6.6.2 lists the registers not affected by a FLR
but neglects to mention the SPDM/CMA session state.  I think that's
an erratum, I've requested access to the protocol workgroup to
report that.)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-05-27  9:39 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-03 15:34 [RFC PATCH 0/1] DOE usage with pcie/portdrv Jonathan Cameron
2022-05-03 15:34 ` [RFC PATCH 1/1] pcie/portdrv: Hack in DOE and CDAT support Jonathan Cameron
2022-05-06 22:40 ` [RFC PATCH 0/1] DOE usage with pcie/portdrv Dan Williams
2022-05-07 10:18   ` Lukas Wunner
2022-05-09  9:48     ` Jonathan Cameron
2022-05-11 19:13       ` Lukas Wunner
2022-05-11 19:19         ` Lukas Wunner
2022-05-11 19:43           ` Dan Williams
2022-05-14 13:55             ` Lukas Wunner
2022-05-16 17:01               ` Dan Williams
2022-05-27  9:39                 ` Lukas Wunner
2022-05-18 13:43               ` Christoph Hellwig
2022-05-18 15:08                 ` Dan Williams
2022-05-20  5:42                 ` Lukas Wunner
2022-05-20 15:37                   ` Dan Williams
2022-05-20 15:42                     ` Chuck Lever III
2022-05-11 19:42         ` Dan Williams
2022-05-11 20:22           ` Hindman, Gavin
2022-05-11 21:04             ` Dan Williams
2022-05-14 13:31           ` Lukas Wunner
2022-05-16 16:53             ` Dan Williams
2022-05-09  9:33   ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).