linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] CXL: Taint user access to DOE mailbox config space
@ 2022-08-22  0:52 ira.weiny
  2022-08-22  0:52 ` [PATCH 1/2] PCI: Allow drivers to request exclusive config regions ira.weiny
  2022-08-22  0:52 ` [PATCH 2/2] cxl/doe: Request exclusive DOE access ira.weiny
  0 siblings, 2 replies; 9+ messages in thread
From: ira.weiny @ 2022-08-22  0:52 UTC (permalink / raw)
  To: Dan Williams, Bjorn Helgaas, Greg Kroah-Hartman
  Cc: Ira Weiny, Jonathan Cameron, Alison Schofield, Vishal Verma,
	Ben Widawsky, linux-cxl, linux-kernel, linux-pci

From: Ira Weiny <ira.weiny@intel.com>

PCI config space access from user space has traditionally been unrestricted
with writes being an understood risk for device operation.

Unfortunately, device breakage or odd behavior from config writes lacks
indicators that can leave driver writers confused when evaluating failures.
This is especially true with the new PCIe Data Object Exchange (DOE) mailbox
protocol where backdoor shenanigans from user space through things such as
vendor defined protocols may affect device operation without complete breakage.

Even though access should not be restricted it would be nice for driver writers
to be able to flag critical parts of the config space such that interference
from user space can be detected.

Introduce pci_request_config_region_exclusive() and use it in the CXL driver
for DOE config space.

Ira Weiny (2):
  PCI: Allow drivers to request exclusive config regions
  cxl/doe: Request exclusive DOE access

 drivers/cxl/pci.c             |  5 +++++
 drivers/pci/pci-sysfs.c       |  6 ++++++
 drivers/pci/probe.c           |  6 ++++++
 include/linux/ioport.h        |  2 ++
 include/linux/pci.h           | 16 ++++++++++++++++
 include/uapi/linux/pci_regs.h |  1 +
 kernel/resource.c             | 13 ++++++++-----
 7 files changed, 44 insertions(+), 5 deletions(-)


base-commit: 1cd8a2537eb07751d405ab7e2223f20338a90506
-- 
2.37.2


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] PCI: Allow drivers to request exclusive config regions
  2022-08-22  0:52 [PATCH 0/2] CXL: Taint user access to DOE mailbox config space ira.weiny
@ 2022-08-22  0:52 ` ira.weiny
  2022-08-22  6:39   ` Greg Kroah-Hartman
  2022-08-22  0:52 ` [PATCH 2/2] cxl/doe: Request exclusive DOE access ira.weiny
  1 sibling, 1 reply; 9+ messages in thread
From: ira.weiny @ 2022-08-22  0:52 UTC (permalink / raw)
  To: Dan Williams, Bjorn Helgaas, Greg Kroah-Hartman
  Cc: Ira Weiny, Jonathan Cameron, Alison Schofield, Vishal Verma,
	Ben Widawsky, linux-cxl, linux-kernel, linux-pci

From: Ira Weiny <ira.weiny@intel.com>

PCI config space access from user space has traditionally been
unrestricted with writes being an understood risk for device operation.

Unfortunately, device breakage or odd behavior from config writes lacks
indicators that can leave driver writers confused when evaluating
failures.  This is especially true with the new PCIe Data Object
Exchange (DOE) mailbox protocol where backdoor shenanigans from user
space through things such as vendor defined protocols may affect device
operation without complete breakage.

A prior proposal restricted read and writes completely.[1]  Greg and
Bjorn pointed out that proposal is flawed for a couple of reasons.
First, lspci should always be allowed and should not interfere with any
device operation.  Second, setpci is a valuable tool that is sometimes
necessary and it should not be completely restricted.[2]  Finally
methods exist for full lock of device access if required.

Even though access should not be restricted it would be nice for driver
writers to be able to flag critical parts of the config space such that
interference from user space can be detected.

Introduce pci_request_config_region_exclusive() to mark exclusive config
regions.  Such regions trigger a warning and kernel taint if accessed
via user space.

[1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from[1]:
	Change name to pci_request_config_region_exclusive()
	Don't flag reads at all.
	Allow writes with a warn and taint of the kernel.
	Update commit message
	Forward port to latest tree.
---
 drivers/pci/pci-sysfs.c |  6 ++++++
 drivers/pci/probe.c     |  6 ++++++
 include/linux/ioport.h  |  2 ++
 include/linux/pci.h     | 16 ++++++++++++++++
 kernel/resource.c       | 13 ++++++++-----
 5 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index fc804e08e3cb..de41d761bdf5 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -755,6 +755,12 @@ static ssize_t pci_write_config(struct file *filp, struct kobject *kobj,
 	if (ret)
 		return ret;
 
+	if (resource_is_exclusive(&dev->config_resource, off,
+				  count)) {
+		pci_warn(dev, "Write to restricted range %llx detected", off);
+		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+	}
+
 	if (off > dev->cfg_size)
 		return 0;
 	if (off + count > dev->cfg_size) {
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6280e780a48c..d81d7457058b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2303,6 +2303,12 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
 	INIT_LIST_HEAD(&dev->bus_list);
 	dev->dev.type = &pci_dev_type;
 	dev->bus = pci_bus_get(bus);
+	dev->config_resource = (struct resource) {
+		.name = "PCI Config",
+		.start = 0,
+		.end = -1,
+	};
+
 #ifdef CONFIG_PCI_MSI
 	raw_spin_lock_init(&dev->msi_lock);
 #endif
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 616b683563a9..cf1de55d14da 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -312,6 +312,8 @@ extern void __devm_release_region(struct device *dev, struct resource *parent,
 				  resource_size_t start, resource_size_t n);
 extern int iomem_map_sanity_check(resource_size_t addr, unsigned long size);
 extern bool iomem_is_exclusive(u64 addr);
+extern bool resource_is_exclusive(struct resource *resource, u64 addr,
+				  resource_size_t size);
 
 extern int
 walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 81a57b498f22..dde37bfa0ca5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -409,6 +409,7 @@ struct pci_dev {
 	 */
 	unsigned int	irq;
 	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
+	struct resource config_resource;		 /* driver exclusive config ranges */
 
 	bool		match_driver;		/* Skip attaching driver */
 
@@ -1406,6 +1407,21 @@ int pci_request_selected_regions(struct pci_dev *, int, const char *);
 int pci_request_selected_regions_exclusive(struct pci_dev *, int, const char *);
 void pci_release_selected_regions(struct pci_dev *, int);
 
+static inline __must_check struct resource *
+pci_request_config_region_exclusive(struct pci_dev *pdev, unsigned int offset,
+				    unsigned int len, const char *name)
+{
+	return __request_region(&pdev->config_resource, offset, len, name,
+				IORESOURCE_EXCLUSIVE);
+}
+
+static inline void pci_release_config_region(struct pci_dev *pdev,
+					     unsigned int offset,
+					     unsigned int len)
+{
+	__release_region(&pdev->config_resource, offset, len);
+}
+
 /* drivers/pci/bus.c */
 void pci_add_resource(struct list_head *resources, struct resource *res);
 void pci_add_resource_offset(struct list_head *resources, struct resource *res,
diff --git a/kernel/resource.c b/kernel/resource.c
index 4c5e80b92f2f..82ed54cd1f0d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1707,18 +1707,15 @@ static int strict_iomem_checks;
  *
  * Returns true if exclusive to the kernel, otherwise returns false.
  */
-bool iomem_is_exclusive(u64 addr)
+bool resource_is_exclusive(struct resource *root, u64 addr, resource_size_t size)
 {
 	const unsigned int exclusive_system_ram = IORESOURCE_SYSTEM_RAM |
 						  IORESOURCE_EXCLUSIVE;
 	bool skip_children = false, err = false;
-	int size = PAGE_SIZE;
 	struct resource *p;
 
-	addr = addr & PAGE_MASK;
-
 	read_lock(&resource_lock);
-	for_each_resource(&iomem_resource, p, skip_children) {
+	for_each_resource(root, p, skip_children) {
 		if (p->start >= addr + size)
 			break;
 		if (p->end < addr) {
@@ -1757,6 +1754,12 @@ bool iomem_is_exclusive(u64 addr)
 	return err;
 }
 
+bool iomem_is_exclusive(u64 addr)
+{
+	return resource_is_exclusive(&iomem_resource, addr & PAGE_MASK,
+				     PAGE_SIZE);
+}
+
 struct resource_entry *resource_list_create_entry(struct resource *res,
 						  size_t extra_size)
 {
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] cxl/doe: Request exclusive DOE access
  2022-08-22  0:52 [PATCH 0/2] CXL: Taint user access to DOE mailbox config space ira.weiny
  2022-08-22  0:52 ` [PATCH 1/2] PCI: Allow drivers to request exclusive config regions ira.weiny
@ 2022-08-22  0:52 ` ira.weiny
  1 sibling, 0 replies; 9+ messages in thread
From: ira.weiny @ 2022-08-22  0:52 UTC (permalink / raw)
  To: Dan Williams, Bjorn Helgaas, Greg Kroah-Hartman
  Cc: Ira Weiny, Jonathan Cameron, Alison Schofield, Vishal Verma,
	Ben Widawsky, linux-cxl, linux-kernel, linux-pci

From: Ira Weiny <ira.weiny@intel.com>

The PCIE Data Object Exchange (DOE) mailbox is a protocol run over
configuration cycles.  It assumes one initiator at a time.  While the
kernel has control of the mailbox user space writes could interfere with
the kernel access.

Mark DOE mailbox config space exclusive when iterated by the CXL driver.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/pci.c             | 5 +++++
 include/uapi/linux/pci_regs.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..5b833eb91543 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -418,6 +418,11 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
 			continue;
 		}
 
+		if (!pci_request_config_region_exclusive(pdev, off,
+							 PCI_DOE_CAP_SIZE,
+							 dev_name(dev)))
+			pci_err(pdev, "Failed to exclude DOE registers\n");
+
 		if (xa_insert(&cxlds->doe_mbs, off, doe_mb, GFP_KERNEL)) {
 			dev_err(dev, "xa_insert failed to insert MB @ %x\n",
 				off);
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 57b8e2ffb1dd..f2396bcd09cc 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -1119,6 +1119,7 @@
 #define  PCI_DOE_STATUS_DATA_OBJECT_READY	0x80000000  /* Data Object Ready */
 #define PCI_DOE_WRITE		0x10    /* DOE Write Data Mailbox Register */
 #define PCI_DOE_READ		0x14    /* DOE Read Data Mailbox Register */
+#define PCI_DOE_CAP_SIZE	(0x14 + 4)	/* Size of this register block */
 
 /* DOE Data Object - note not actually registers */
 #define PCI_DOE_DATA_OBJECT_HEADER_1_VID		0x0000ffff
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] PCI: Allow drivers to request exclusive config regions
  2022-08-22  0:52 ` [PATCH 1/2] PCI: Allow drivers to request exclusive config regions ira.weiny
@ 2022-08-22  6:39   ` Greg Kroah-Hartman
  2022-08-22 19:44     ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2022-08-22  6:39 UTC (permalink / raw)
  To: ira.weiny
  Cc: Dan Williams, Bjorn Helgaas, Jonathan Cameron, Alison Schofield,
	Vishal Verma, Ben Widawsky, linux-cxl, linux-kernel, linux-pci

On Sun, Aug 21, 2022 at 08:52:36PM -0400, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> PCI config space access from user space has traditionally been
> unrestricted with writes being an understood risk for device operation.
> 
> Unfortunately, device breakage or odd behavior from config writes lacks
> indicators that can leave driver writers confused when evaluating
> failures.  This is especially true with the new PCIe Data Object
> Exchange (DOE) mailbox protocol where backdoor shenanigans from user
> space through things such as vendor defined protocols may affect device
> operation without complete breakage.

What userspace tools are out there messing with PCI config space through
userspace on these devices today?  How is this the kernel's fault if
someone runs such a thing?

> A prior proposal restricted read and writes completely.[1]  Greg and
> Bjorn pointed out that proposal is flawed for a couple of reasons.
> First, lspci should always be allowed and should not interfere with any
> device operation.  Second, setpci is a valuable tool that is sometimes
> necessary and it should not be completely restricted.[2]  Finally
> methods exist for full lock of device access if required.
> 
> Even though access should not be restricted it would be nice for driver
> writers to be able to flag critical parts of the config space such that
> interference from user space can be detected.
> 
> Introduce pci_request_config_region_exclusive() to mark exclusive config
> regions.  Such regions trigger a warning and kernel taint if accessed
> via user space.
> 
> [1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
> [2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/
> 
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from[1]:
> 	Change name to pci_request_config_region_exclusive()
> 	Don't flag reads at all.
> 	Allow writes with a warn and taint of the kernel.
> 	Update commit message
> 	Forward port to latest tree.
> ---
>  drivers/pci/pci-sysfs.c |  6 ++++++
>  drivers/pci/probe.c     |  6 ++++++
>  include/linux/ioport.h  |  2 ++
>  include/linux/pci.h     | 16 ++++++++++++++++
>  kernel/resource.c       | 13 ++++++++-----
>  5 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index fc804e08e3cb..de41d761bdf5 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -755,6 +755,12 @@ static ssize_t pci_write_config(struct file *filp, struct kobject *kobj,
>  	if (ret)
>  		return ret;
>  
> +	if (resource_is_exclusive(&dev->config_resource, off,
> +				  count)) {
> +		pci_warn(dev, "Write to restricted range %llx detected", off);

Will this allow any user to spam the kernel log from userspace?  You
might want to rate-limit it, right?

> +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +	}
> +
>  	if (off > dev->cfg_size)
>  		return 0;
>  	if (off + count > dev->cfg_size) {
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6280e780a48c..d81d7457058b 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2303,6 +2303,12 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
>  	INIT_LIST_HEAD(&dev->bus_list);
>  	dev->dev.type = &pci_dev_type;
>  	dev->bus = pci_bus_get(bus);
> +	dev->config_resource = (struct resource) {
> +		.name = "PCI Config",
> +		.start = 0,
> +		.end = -1,
> +	};
> +
>  #ifdef CONFIG_PCI_MSI
>  	raw_spin_lock_init(&dev->msi_lock);
>  #endif
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 616b683563a9..cf1de55d14da 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -312,6 +312,8 @@ extern void __devm_release_region(struct device *dev, struct resource *parent,
>  				  resource_size_t start, resource_size_t n);
>  extern int iomem_map_sanity_check(resource_size_t addr, unsigned long size);
>  extern bool iomem_is_exclusive(u64 addr);
> +extern bool resource_is_exclusive(struct resource *resource, u64 addr,
> +				  resource_size_t size);
>  
>  extern int
>  walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 81a57b498f22..dde37bfa0ca5 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -409,6 +409,7 @@ struct pci_dev {
>  	 */
>  	unsigned int	irq;
>  	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
> +	struct resource config_resource;		 /* driver exclusive config ranges */

So the driver only gets 1 range to mark this way?

What are the ranges for typical devices that have this problem?

This still feels very odd to me, how far do we have to go to protect
userspace from doing bad things on systems when they have the
permissions and access to do those bad things?

What are you trying to protect yourself from, bogus bug reports by
people doing bad things and then blaming you?  That's easy to handle,
just ignore them :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] PCI: Allow drivers to request exclusive config regions
  2022-08-22  6:39   ` Greg Kroah-Hartman
@ 2022-08-22 19:44     ` Dan Williams
  2022-08-22 20:38       ` Ira Weiny
  2022-08-24  9:52       ` Jonathan Cameron
  0 siblings, 2 replies; 9+ messages in thread
From: Dan Williams @ 2022-08-22 19:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman, ira.weiny
  Cc: Dan Williams, Bjorn Helgaas, Jonathan Cameron, Alison Schofield,
	Vishal Verma, Ben Widawsky, linux-cxl, linux-kernel, linux-pci

Greg Kroah-Hartman wrote:
> On Sun, Aug 21, 2022 at 08:52:36PM -0400, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > PCI config space access from user space has traditionally been
> > unrestricted with writes being an understood risk for device operation.
> > 
> > Unfortunately, device breakage or odd behavior from config writes lacks
> > indicators that can leave driver writers confused when evaluating
> > failures.  This is especially true with the new PCIe Data Object
> > Exchange (DOE) mailbox protocol where backdoor shenanigans from user
> > space through things such as vendor defined protocols may affect device
> > operation without complete breakage.
> 
> What userspace tools are out there messing with PCI config space through
> userspace on these devices today?  How is this the kernel's fault if
> someone runs such a thing?
> 
> > A prior proposal restricted read and writes completely.[1]  Greg and
> > Bjorn pointed out that proposal is flawed for a couple of reasons.
> > First, lspci should always be allowed and should not interfere with any
> > device operation.  Second, setpci is a valuable tool that is sometimes
> > necessary and it should not be completely restricted.[2]  Finally
> > methods exist for full lock of device access if required.
> > 
> > Even though access should not be restricted it would be nice for driver
> > writers to be able to flag critical parts of the config space such that
> > interference from user space can be detected.
> > 
> > Introduce pci_request_config_region_exclusive() to mark exclusive config
> > regions.  Such regions trigger a warning and kernel taint if accessed
> > via user space.
> > 
> > [1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
> > [2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/
> > 
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from[1]:
> > 	Change name to pci_request_config_region_exclusive()
> > 	Don't flag reads at all.
> > 	Allow writes with a warn and taint of the kernel.
> > 	Update commit message
> > 	Forward port to latest tree.
> > ---
> >  drivers/pci/pci-sysfs.c |  6 ++++++
> >  drivers/pci/probe.c     |  6 ++++++
> >  include/linux/ioport.h  |  2 ++
> >  include/linux/pci.h     | 16 ++++++++++++++++
> >  kernel/resource.c       | 13 ++++++++-----
> >  5 files changed, 38 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index fc804e08e3cb..de41d761bdf5 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -755,6 +755,12 @@ static ssize_t pci_write_config(struct file *filp, struct kobject *kobj,
> >  	if (ret)
> >  		return ret;
> >  
> > +	if (resource_is_exclusive(&dev->config_resource, off,
> > +				  count)) {
> > +		pci_warn(dev, "Write to restricted range %llx detected", off);

Note to Ira, I would expect a message like:

"Unexpected user write to kernel-exclusive config offset %#llx\n"

...this probaly also wants current->comm similar to the
lockdown_is_locked_down() warning.

> Will this allow any user to spam the kernel log from userspace?  You
> might want to rate-limit it, right?

It should be a once-only message. You only get one chance to trample on
a configuration address range that the kernel cares about and then
tainted.

> > +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > +	}
> > +
> >  	if (off > dev->cfg_size)
> >  		return 0;
> >  	if (off + count > dev->cfg_size) {
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 6280e780a48c..d81d7457058b 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -2303,6 +2303,12 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
> >  	INIT_LIST_HEAD(&dev->bus_list);
> >  	dev->dev.type = &pci_dev_type;
> >  	dev->bus = pci_bus_get(bus);
> > +	dev->config_resource = (struct resource) {
> > +		.name = "PCI Config",
> > +		.start = 0,
> > +		.end = -1,
> > +	};
> > +
> >  #ifdef CONFIG_PCI_MSI
> >  	raw_spin_lock_init(&dev->msi_lock);
> >  #endif
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 616b683563a9..cf1de55d14da 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -312,6 +312,8 @@ extern void __devm_release_region(struct device *dev, struct resource *parent,
> >  				  resource_size_t start, resource_size_t n);
> >  extern int iomem_map_sanity_check(resource_size_t addr, unsigned long size);
> >  extern bool iomem_is_exclusive(u64 addr);
> > +extern bool resource_is_exclusive(struct resource *resource, u64 addr,
> > +				  resource_size_t size);
> >  
> >  extern int
> >  walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 81a57b498f22..dde37bfa0ca5 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -409,6 +409,7 @@ struct pci_dev {
> >  	 */
> >  	unsigned int	irq;
> >  	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
> > +	struct resource config_resource;		 /* driver exclusive config ranges */
> 
> So the driver only gets 1 range to mark this way?

No, config_resource is just the base of the resource tree to walk
similar to how iomem_resource is consumed in the current
iomem_is_exclusive() implementation.

> What are the ranges for typical devices that have this problem?

Unfortunately DOE is an extended capability that can pop any number of
instances in that per-device list.

Although I also think there is potential to use this in something like
pci_iomap() to trigger a future warning if userspace mucks with the BARs
that the driver is using.

> This still feels very odd to me, how far do we have to go to protect
> userspace from doing bad things on systems when they have the
> permissions and access to do those bad things?

Right, this mechanism isn't about protection as much as it is reserving
the space for kernel implementations of DOE protocols. Outright
'protection' is already there today in the form CONFIG_LOCK_DOWN_KERNEL
that prevents userspace config writes. There just are not many distro
kernels that turn that protection on.

> What are you trying to protect yourself from, bogus bug reports by
> people doing bad things and then blaming you?  That's easy to handle,
> just ignore them :)

I asked Ira to push on this to protect the kernel from people like me,
:). So, there is this massively complicated specification for device
attestation and link integrity / encryption protection (SPDM and IDE)
that has applications to both PCIe and CXL. I do not see a path in the
near term to land that support in the kernel.

DOE being user accessible though, lends itself to pure userspace
implementations of SPDM and IDE infrastructure. I want to develop that
infrastructure, but also have the kernel reserve the space / right to
obviate that implementation with kernel control of the DOE mailbox, SPDM
sessions, and IDE keys in the future.

The warning helps keeps proof-of-concept and/or proprietary DOE code out
of production release streams on the observation that end users tend to
demand warn-free and taint-free kernel deployments.

Similar to CONFIG_IO_STRICT_DEVMEM, just turn this warning off if you
are a distribution kernel vendor who does not care about DOE vendor
tools affecting system state, and turn on CONFIG_LOCK_DOWN_KERNEL if you
are on the other end of the spectrum and want protection from such
things. The warn is middle-road option that I expect most distros would
use.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] PCI: Allow drivers to request exclusive config regions
  2022-08-22 19:44     ` Dan Williams
@ 2022-08-22 20:38       ` Ira Weiny
  2022-08-22 21:18         ` Dan Williams
  2022-08-24  9:52       ` Jonathan Cameron
  1 sibling, 1 reply; 9+ messages in thread
From: Ira Weiny @ 2022-08-22 20:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: Greg Kroah-Hartman, Bjorn Helgaas, Jonathan Cameron,
	Alison Schofield, Vishal Verma, Ben Widawsky, linux-cxl,
	linux-kernel, linux-pci

On Mon, Aug 22, 2022 at 12:44:54PM -0700, Dan Williams wrote:
> Greg Kroah-Hartman wrote:
> > On Sun, Aug 21, 2022 at 08:52:36PM -0400, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > PCI config space access from user space has traditionally been
> > > unrestricted with writes being an understood risk for device operation.
> > > 
> > > Unfortunately, device breakage or odd behavior from config writes lacks
> > > indicators that can leave driver writers confused when evaluating
> > > failures.  This is especially true with the new PCIe Data Object
> > > Exchange (DOE) mailbox protocol where backdoor shenanigans from user
> > > space through things such as vendor defined protocols may affect device
> > > operation without complete breakage.
> > 

I was about to respond when I saw Dan's message so I will respond to both of
you.

> > What userspace tools are out there messing with PCI config space through
> > userspace on these devices today?  How is this the kernel's fault if
> > someone runs such a thing?

It is not the kernels fault but multi-party actors on a DOE mailbox can result
in the kernel getting incorrect responses for it's valid query.  After my
conversation with Jonathan during DOE development[1] I'm more convinced that
this tainting of the kernel will be valuable.

[1] https://lore.kernel.org/linux-cxl/20220704074508.00000cac@Huawei.com/

> > 
> > > A prior proposal restricted read and writes completely.[1]  Greg and
> > > Bjorn pointed out that proposal is flawed for a couple of reasons.
> > > First, lspci should always be allowed and should not interfere with any
> > > device operation.  Second, setpci is a valuable tool that is sometimes
> > > necessary and it should not be completely restricted.[2]  Finally
> > > methods exist for full lock of device access if required.
> > > 
> > > Even though access should not be restricted it would be nice for driver
> > > writers to be able to flag critical parts of the config space such that
> > > interference from user space can be detected.
> > > 
> > > Introduce pci_request_config_region_exclusive() to mark exclusive config
> > > regions.  Such regions trigger a warning and kernel taint if accessed
> > > via user space.
> > > 
> > > [1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > [2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/
> > > 
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > ---
> > > Changes from[1]:
> > > 	Change name to pci_request_config_region_exclusive()
> > > 	Don't flag reads at all.
> > > 	Allow writes with a warn and taint of the kernel.
> > > 	Update commit message
> > > 	Forward port to latest tree.
> > > ---
> > >  drivers/pci/pci-sysfs.c |  6 ++++++
> > >  drivers/pci/probe.c     |  6 ++++++
> > >  include/linux/ioport.h  |  2 ++
> > >  include/linux/pci.h     | 16 ++++++++++++++++
> > >  kernel/resource.c       | 13 ++++++++-----
> > >  5 files changed, 38 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > index fc804e08e3cb..de41d761bdf5 100644
> > > --- a/drivers/pci/pci-sysfs.c
> > > +++ b/drivers/pci/pci-sysfs.c
> > > @@ -755,6 +755,12 @@ static ssize_t pci_write_config(struct file *filp, struct kobject *kobj,
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > +	if (resource_is_exclusive(&dev->config_resource, off,
> > > +				  count)) {
> > > +		pci_warn(dev, "Write to restricted range %llx detected", off);
> 
> Note to Ira, I would expect a message like:
> 
> "Unexpected user write to kernel-exclusive config offset %#llx\n"

Done.

> 
> ...this probaly also wants current->comm similar to the
> lockdown_is_locked_down() warning.

Done.

> 
> > Will this allow any user to spam the kernel log from userspace?  You
> > might want to rate-limit it, right?
> 
> It should be a once-only message. You only get one chance to trample on
> a configuration address range that the kernel cares about and then
> tainted.

Yes my bad.  I should have thought of this.  Changed to pci_WARN_ONCE().

> 
> > > +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > +	}
> > > +
> > >  	if (off > dev->cfg_size)
> > >  		return 0;
> > >  	if (off + count > dev->cfg_size) {
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index 6280e780a48c..d81d7457058b 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -2303,6 +2303,12 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
> > >  	INIT_LIST_HEAD(&dev->bus_list);
> > >  	dev->dev.type = &pci_dev_type;
> > >  	dev->bus = pci_bus_get(bus);
> > > +	dev->config_resource = (struct resource) {
> > > +		.name = "PCI Config",
> > > +		.start = 0,
> > > +		.end = -1,
> > > +	};
> > > +
> > >  #ifdef CONFIG_PCI_MSI
> > >  	raw_spin_lock_init(&dev->msi_lock);
> > >  #endif
> > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > > index 616b683563a9..cf1de55d14da 100644
> > > --- a/include/linux/ioport.h
> > > +++ b/include/linux/ioport.h
> > > @@ -312,6 +312,8 @@ extern void __devm_release_region(struct device *dev, struct resource *parent,
> > >  				  resource_size_t start, resource_size_t n);
> > >  extern int iomem_map_sanity_check(resource_size_t addr, unsigned long size);
> > >  extern bool iomem_is_exclusive(u64 addr);
> > > +extern bool resource_is_exclusive(struct resource *resource, u64 addr,
> > > +				  resource_size_t size);
> > >  
> > >  extern int
> > >  walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 81a57b498f22..dde37bfa0ca5 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -409,6 +409,7 @@ struct pci_dev {
> > >  	 */
> > >  	unsigned int	irq;
> > >  	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
> > > +	struct resource config_resource;		 /* driver exclusive config ranges */
> > 
> > So the driver only gets 1 range to mark this way?
> 
> No, config_resource is just the base of the resource tree to walk
> similar to how iomem_resource is consumed in the current
> iomem_is_exclusive() implementation.

I had to double check but yea, Dan is correct here.

> 
> > What are the ranges for typical devices that have this problem?
> 
> Unfortunately DOE is an extended capability that can pop any number of
> instances in that per-device list.
> 
> Although I also think there is potential to use this in something like
> pci_iomap() to trigger a future warning if userspace mucks with the BARs
> that the driver is using.
> 
> > This still feels very odd to me, how far do we have to go to protect
> > userspace from doing bad things on systems when they have the
> > permissions and access to do those bad things?
> 
> Right, this mechanism isn't about protection as much as it is reserving
> the space for kernel implementations of DOE protocols. Outright
> 'protection' is already there today in the form CONFIG_LOCK_DOWN_KERNEL
> that prevents userspace config writes. There just are not many distro
> kernels that turn that protection on.

I agree that there is no real protection.  This is just a way of us knowing
that bug reports have outside influence which can have real bad consequences.

> 
> > What are you trying to protect yourself from, bogus bug reports by
> > people doing bad things and then blaming you?  That's easy to handle,
> > just ignore them :)

I think the problem is that the bug report could be very difficult to know that
something bad was done.  I'm ok with ignoring them.  But this makes it clear
that the bug report was potentially user doing bad things.

I see bug reports being something like:

"Kernel issued query for CDAT.  Device returned garbage."

Vs

"Tainted kernel (error message seen) issued query for CDAT.  Device returned
garbage."

The first could be a malfunctioning device.  Where the second can clearly be
pushed back to the user to stop doing bad things.  I can't in good conscience
ignore the first report.

> 
> I asked Ira to push on this to protect the kernel from people like me,
> :). So, there is this massively complicated specification for device
> attestation and link integrity / encryption protection (SPDM and IDE)
> that has applications to both PCIe and CXL. I do not see a path in the
> near term to land that support in the kernel.
> 
> DOE being user accessible though, lends itself to pure userspace
> implementations of SPDM and IDE infrastructure. I want to develop that
> infrastructure, but also have the kernel reserve the space / right to
> obviate that implementation with kernel control of the DOE mailbox, SPDM
> sessions, and IDE keys in the future.
> 
> The warning helps keeps proof-of-concept and/or proprietary DOE code out
> of production release streams on the observation that end users tend to
> demand warn-free and taint-free kernel deployments.
> 
> Similar to CONFIG_IO_STRICT_DEVMEM, just turn this warning off if you
> are a distribution kernel vendor who does not care about DOE vendor
> tools affecting system state, and turn on CONFIG_LOCK_DOWN_KERNEL if you
> are on the other end of the spectrum and want protection from such
> things. The warn is middle-road option that I expect most distros would
> use.

FWIW I did not add any Kconfig for the feature.  Should I?

Ira

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] PCI: Allow drivers to request exclusive config regions
  2022-08-22 20:38       ` Ira Weiny
@ 2022-08-22 21:18         ` Dan Williams
  2022-08-23  7:19           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2022-08-22 21:18 UTC (permalink / raw)
  To: Ira Weiny, Dan Williams
  Cc: Greg Kroah-Hartman, Bjorn Helgaas, Jonathan Cameron,
	Alison Schofield, Vishal Verma, Ben Widawsky, linux-cxl,
	linux-kernel, linux-pci

Ira Weiny wrote:
> > Similar to CONFIG_IO_STRICT_DEVMEM, just turn this warning off if you
> > are a distribution kernel vendor who does not care about DOE vendor
> > tools affecting system state, and turn on CONFIG_LOCK_DOWN_KERNEL if you
> > are on the other end of the spectrum and want protection from such
> > things. The warn is middle-road option that I expect most distros would
> > use.
> 
> FWIW I did not add any Kconfig for the feature.  Should I?

I think so, especially as it is a distro configuration policy about what
user-mode PCI tooling it wants to support / warn about.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] PCI: Allow drivers to request exclusive config regions
  2022-08-22 21:18         ` Dan Williams
@ 2022-08-23  7:19           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2022-08-23  7:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Bjorn Helgaas, Jonathan Cameron, Alison Schofield,
	Vishal Verma, Ben Widawsky, linux-cxl, linux-kernel, linux-pci

On Mon, Aug 22, 2022 at 02:18:17PM -0700, Dan Williams wrote:
> Ira Weiny wrote:
> > > Similar to CONFIG_IO_STRICT_DEVMEM, just turn this warning off if you
> > > are a distribution kernel vendor who does not care about DOE vendor
> > > tools affecting system state, and turn on CONFIG_LOCK_DOWN_KERNEL if you
> > > are on the other end of the spectrum and want protection from such
> > > things. The warn is middle-road option that I expect most distros would
> > > use.
> > 
> > FWIW I did not add any Kconfig for the feature.  Should I?
> 
> I think so, especially as it is a distro configuration policy about what
> user-mode PCI tooling it wants to support / warn about.

No, no need for that, distros are forced to just "enable everything" as
they don't want to make any policy decision like that.  So just make it
always present, no one will turn it off.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] PCI: Allow drivers to request exclusive config regions
  2022-08-22 19:44     ` Dan Williams
  2022-08-22 20:38       ` Ira Weiny
@ 2022-08-24  9:52       ` Jonathan Cameron
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2022-08-24  9:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Greg Kroah-Hartman, ira.weiny, Bjorn Helgaas, Alison Schofield,
	Vishal Verma, Ben Widawsky, linux-cxl, linux-kernel, linux-pci,
	Lukas Wunner

> 
> > What are you trying to protect yourself from, bogus bug reports by
> > people doing bad things and then blaming you?  That's easy to handle,
> > just ignore them :)  
> 
> I asked Ira to push on this to protect the kernel from people like me,
> :). So, there is this massively complicated specification for device
> attestation and link integrity / encryption protection (SPDM and IDE)
> that has applications to both PCIe and CXL. I do not see a path in the
> near term to land that support in the kernel.
> 
> DOE being user accessible though, lends itself to pure userspace
> implementations of SPDM and IDE infrastructure. I want to develop that
> infrastructure, but also have the kernel reserve the space / right to
> obviate that implementation with kernel control of the DOE mailbox, SPDM
> sessions, and IDE keys in the future.
Can't resist...

If anyone is at Plumbers (in person or virtually) the will be a BoF on
SPDM etc. Not scheduled yet...

https://lpc.events/event/16/contributions/1304/

Come join the Kernel vs Partly Kernel vs fully Userspace discussions.

Thanks,

Jonathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-24  9:54 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-22  0:52 [PATCH 0/2] CXL: Taint user access to DOE mailbox config space ira.weiny
2022-08-22  0:52 ` [PATCH 1/2] PCI: Allow drivers to request exclusive config regions ira.weiny
2022-08-22  6:39   ` Greg Kroah-Hartman
2022-08-22 19:44     ` Dan Williams
2022-08-22 20:38       ` Ira Weiny
2022-08-22 21:18         ` Dan Williams
2022-08-23  7:19           ` Greg Kroah-Hartman
2022-08-24  9:52       ` Jonathan Cameron
2022-08-22  0:52 ` [PATCH 2/2] cxl/doe: Request exclusive DOE access ira.weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).