linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Add parameter for disabling ACS redirection for P2P
@ 2018-06-18 19:36 Logan Gunthorpe
  2018-06-18 19:36 ` [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable Logan Gunthorpe
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-18 19:36 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe

Hi,

As discussed in our PCI P2PDMA series, we'd like to add a kernel
parameter for selectively disabling ACS redirection for select
bridges. Seeing this turned out to be a small series in itself, we've
decided to send this separately from the P2P work.

This series generalizes the code already done for the resource_alignment
option that already exists. The first patch creates a helper function
to match PCI devices against strings based on the code that already
existed in pci_specified_resource_alignment().

The second patch expands the new helper to optionally take a path of
PCI devfns. This is to address Alex's renumbering concern when using
simple bus-devfns. The implementation is essentially how he described it and
similar to the Intel VT-d spec (Section 8.3.1).

The final patch adds the disable_acs_redir kernel parameter which takes
a list of PCI devices and will disable the ACS P2P Request Redirect,
ACS P2P Completion Redirect and ACS P2P Egress Control bits for the
selected devices. This allows P2P traffic between selected bridges and
seeing it's done at boot, before IOMMU group creating the IOMMU groups
will be created correctly based on the bits.

Thanks,

Logan

--

Changes since v2:
* Rebased onto v4.18-rc1 (no conflicts)
* Minor tweaks to the documentation per Andy
* Removed the "path:" prefix and use the path parsing code
  for simple devices (as it works the same). Per a suggestion from Alex

Changes since v1:

* Reworked pci_dev_str_match_path using strrchr as suggested by Alex
* Collected Christian's Acks

--

Logan Gunthorpe (3):
  PCI: Make specifying PCI devices in kernel parameters reusable
  PCI: Allow specifying devices using a base bus and path of devfns
  PCI: Introduce the disable_acs_redir parameter

 Documentation/admin-guide/kernel-parameters.txt |  39 ++-
 drivers/pci/pci.c                               | 336 ++++++++++++++++++++----
 2 files changed, 314 insertions(+), 61 deletions(-)

--
2.11.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable
  2018-06-18 19:36 [PATCH v3 0/3] Add parameter for disabling ACS redirection for P2P Logan Gunthorpe
@ 2018-06-18 19:36 ` Logan Gunthorpe
  2018-06-18 21:44   ` Alex Williamson
  2018-06-18 19:36 ` [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns Logan Gunthorpe
  2018-06-18 19:36 ` [PATCH v3 3/3] PCI: Introduce the disable_acs_redir parameter Logan Gunthorpe
  2 siblings, 1 reply; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-18 19:36 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe

Separate out the code to match a PCI device with a string (typically
originating from a kernel parameter) from the
pci_specified_resource_alignment() function into its own helper
function.

While we are at it, this change fixes the kernel style of the function
(fixing a number of long lines and extra parentheses).

Additionally, make the analogous change to the kernel parameter
documentation: Separating the description of how to specify a PCI device
into it's own section at the head of the pci= parameter.

This patch should have no functional alterations.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  26 +++-
 drivers/pci/pci.c                               | 153 +++++++++++++++---------
 2 files changed, 120 insertions(+), 59 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index efc7aa7a0670..760fb2b0b349 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2994,7 +2994,24 @@
 			See header of drivers/block/paride/pcd.c.
 			See also Documentation/blockdev/paride.txt.
 
-	pci=option[,option...]	[PCI] various PCI subsystem options:
+	pci=option[,option...]	[PCI] various PCI subsystem options.
+
+				Some options herein operate on a specific device
+				or a set of devices (<pci_dev>). These are
+				specified in one of the following formats:
+
+				[<domain>:]<bus>:<slot>.<func>
+				pci:<vendor>:<device>[:<subvendor>:<subdevice>]
+
+				Note: the first format specifies a PCI
+				bus/slot/function address which may change
+				if new hardware is inserted, if motherboard
+				firmware changes, or due to changes caused
+				by other kernel parameters. The second format
+				selects devices using IDs from the
+				configuration space which may match multiple
+				devices in the system.
+
 		earlydump	[X86] dump PCI config space before the kernel
 				changes anything
 		off		[X86] don't probe for the PCI bus
@@ -3123,11 +3140,10 @@
 				window. The default value is 64 megabytes.
 		resource_alignment=
 				Format:
-				[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
-				[<order of align>@]pci:<vendor>:<device>\
-						[:<subvendor>:<subdevice>][; ...]
+				[<order of align>@]<pci_dev>[; ...]
 				Specifies alignment and device to reassign
-				aligned memory resources.
+				aligned memory resources. How to
+				specify the device is described above.
 				If <order of align> is not specified,
 				PAGE_SIZE is used as alignment.
 				PCI-PCI bridge can be specified, if resource
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 97acba712e4e..bec1bef6f326 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -191,6 +191,88 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
 EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
 #endif
 
+/**
+ * pci_dev_str_match - test if a string matches a device
+ * @dev:    the PCI device to test
+ * @p:      string to match the device against
+ * @endptr: pointer to the string after the match
+ *
+ * Test if a string (typically from a kernel parameter) matches a
+ * specified. The string may be of one of two forms formats:
+ *
+ *   [<domain>:]<bus>:<slot>.<func>
+ *   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
+ *
+ * The first format specifies a PCI bus/slot/function address which
+ * may change if new hardware is inserted, if motherboard firmware changes,
+ * or due to changes caused in kernel parameters.
+ *
+ * The second format matches devices using IDs in the configuration
+ * space which may match multiple devices in the system. A value of 0
+ * for any field will match all devices.
+ *
+ * Returns 1 if the string matches the device, 0 if it does not and
+ * a negative error code if the string cannot be parsed.
+ */
+static int pci_dev_str_match(struct pci_dev *dev, const char *p,
+			     const char **endptr)
+{
+	int ret;
+	int seg, bus, slot, func, count;
+	unsigned short vendor, device, subsystem_vendor, subsystem_device;
+
+	if (strncmp(p, "pci:", 4) == 0) {
+		/* PCI vendor/device (subvendor/subdevice) ids are specified */
+		p += 4;
+		ret = sscanf(p, "%hx:%hx:%hx:%hx%n", &vendor, &device,
+			     &subsystem_vendor, &subsystem_device, &count);
+		if (ret != 4) {
+			ret = sscanf(p, "%hx:%hx%n", &vendor, &device, &count);
+			if (ret != 2)
+				return -EINVAL;
+
+			subsystem_vendor = 0;
+			subsystem_device = 0;
+		}
+
+		p += count;
+
+		if ((!vendor || vendor == dev->vendor) &&
+		    (!device || device == dev->device) &&
+		    (!subsystem_vendor ||
+			    subsystem_vendor == dev->subsystem_vendor) &&
+		    (!subsystem_device ||
+			    subsystem_device == dev->subsystem_device))
+			goto found;
+
+	} else {
+		/* PCI Bus,Slot,Function ids are specified */
+		ret = sscanf(p, "%x:%x:%x.%x%n", &seg, &bus, &slot,
+			     &func, &count);
+		if (ret != 4) {
+			seg = 0;
+			ret = sscanf(p, "%x:%x.%x%n", &bus, &slot,
+				     &func, &count);
+			if (ret != 3)
+				return -EINVAL;
+		}
+
+		p += count;
+
+		if (seg == pci_domain_nr(dev->bus) &&
+		    bus == dev->bus->number &&
+		    slot == PCI_SLOT(dev->devfn) &&
+		    func == PCI_FUNC(dev->devfn))
+			goto found;
+	}
+
+	*endptr = p;
+	return 0;
+
+found:
+	*endptr = p;
+	return 1;
+}
 
 static int __pci_find_next_cap_ttl(struct pci_bus *bus, unsigned int devfn,
 				   u8 pos, int cap, int *ttl)
@@ -5454,10 +5536,10 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
 static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 							bool *resize)
 {
-	int seg, bus, slot, func, align_order, count;
-	unsigned short vendor, device, subsystem_vendor, subsystem_device;
+	int align_order, count;
 	resource_size_t align = pcibios_default_alignment();
-	char *p;
+	const char *p;
+	int ret;
 
 	spin_lock(&resource_alignment_lock);
 	p = resource_alignment_param;
@@ -5477,58 +5559,21 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 		} else {
 			align_order = -1;
 		}
-		if (strncmp(p, "pci:", 4) == 0) {
-			/* PCI vendor/device (subvendor/subdevice) ids are specified */
-			p += 4;
-			if (sscanf(p, "%hx:%hx:%hx:%hx%n",
-				&vendor, &device, &subsystem_vendor, &subsystem_device, &count) != 4) {
-				if (sscanf(p, "%hx:%hx%n", &vendor, &device, &count) != 2) {
-					printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: pci:%s\n",
-						p);
-					break;
-				}
-				subsystem_vendor = subsystem_device = 0;
-			}
-			p += count;
-			if ((!vendor || (vendor == dev->vendor)) &&
-				(!device || (device == dev->device)) &&
-				(!subsystem_vendor || (subsystem_vendor == dev->subsystem_vendor)) &&
-				(!subsystem_device || (subsystem_device == dev->subsystem_device))) {
-				*resize = true;
-				if (align_order == -1)
-					align = PAGE_SIZE;
-				else
-					align = 1 << align_order;
-				/* Found */
-				break;
-			}
-		}
-		else {
-			if (sscanf(p, "%x:%x:%x.%x%n",
-				&seg, &bus, &slot, &func, &count) != 4) {
-				seg = 0;
-				if (sscanf(p, "%x:%x.%x%n",
-						&bus, &slot, &func, &count) != 3) {
-					/* Invalid format */
-					printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: %s\n",
-						p);
-					break;
-				}
-			}
-			p += count;
-			if (seg == pci_domain_nr(dev->bus) &&
-				bus == dev->bus->number &&
-				slot == PCI_SLOT(dev->devfn) &&
-				func == PCI_FUNC(dev->devfn)) {
-				*resize = true;
-				if (align_order == -1)
-					align = PAGE_SIZE;
-				else
-					align = 1 << align_order;
-				/* Found */
-				break;
-			}
+
+		ret = pci_dev_str_match(dev, p, &p);
+		if (ret == 1) {
+			*resize = true;
+			if (align_order == -1)
+				align = PAGE_SIZE;
+			else
+				align = 1 << align_order;
+			break;
+		} else if (ret < 0) {
+			pr_info("PCI: Can't parse resource_alignment parameter: pci:%s\n",
+				p);
+			break;
 		}
+
 		if (*p != ';' && *p != ',') {
 			/* End of param or invalid format */
 			break;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns
  2018-06-18 19:36 [PATCH v3 0/3] Add parameter for disabling ACS redirection for P2P Logan Gunthorpe
  2018-06-18 19:36 ` [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable Logan Gunthorpe
@ 2018-06-18 19:36 ` Logan Gunthorpe
  2018-06-18 21:55   ` Alex Williamson
  2018-06-21 19:22   ` Matthew Wilcox
  2018-06-18 19:36 ` [PATCH v3 3/3] PCI: Introduce the disable_acs_redir parameter Logan Gunthorpe
  2 siblings, 2 replies; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-18 19:36 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe

When specifying PCI devices on the kernel command line using a
BDF, the bus numbers can change when adding or replacing a device,
changing motherboard firmware, or applying kernel parameters like
pci=assign-buses. When this happens, it is usually undesirable to
apply whatever command line tweak to the wrong device.

Therefore, it is useful to be able to specify devices with a base
bus number and the path of devfns needed to get to it. (Similar to
the "device scope" structure in the Intel VT-d spec, Section 8.3.1.)

Thus, we add an option to specify devices in the following format:

[<domain>:]<bus>:<slot>.<func>[/<slot>.<func>][/ ...]

The path can be any segment within the PCI hierarchy of any length and
determined through the use of 'lspci -t'. When specified this way, it is
less likely that a renumbered bus will result in a valid device specification
and the tweak won't be applied to the wrong device.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   8 +-
 drivers/pci/pci.c                               | 120 ++++++++++++++++++++----
 2 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 760fb2b0b349..d45285e1ab6a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3000,14 +3000,18 @@
 				or a set of devices (<pci_dev>). These are
 				specified in one of the following formats:
 
-				[<domain>:]<bus>:<slot>.<func>
+				[<domain>:]<bus>:<slot>.<func>[/<slot>.<func>][/ ...]
 				pci:<vendor>:<device>[:<subvendor>:<subdevice>]
 
 				Note: the first format specifies a PCI
 				bus/slot/function address which may change
 				if new hardware is inserted, if motherboard
 				firmware changes, or due to changes caused
-				by other kernel parameters. The second format
+				by other kernel parameters. Optionally
+				a path from a device through multiple
+				slot/function addresses can be specified
+				after the base address (this is more robust
+				against renumbering issues). The second format
 				selects devices using IDs from the
 				configuration space which may match multiple
 				devices in the system.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index bec1bef6f326..6fbad0492461 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -192,22 +192,111 @@ EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
 #endif
 
 /**
+ * pci_dev_str_match_path - test if a path string matches a device
+ * @dev:    the PCI device to test
+ * @p:      string to match the device against
+ * @endptr: pointer to the string after the match
+ *
+ * Test if a string (typically from a kernel parameter) formated as a
+ * path of slot/function addresses matches a PCI device. The string must
+ * be of the form:
+ *
+ *   [<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
+ *
+ * A path for a device can be obtained using 'lspci -t'. Using a path
+ * is more robust against renumbering of devices than using only
+ * a single bus, slot and function address.
+ *
+ * Returns 1 if the string matches the device, 0 if it does not and
+ * a negative error code if it fails to parse the string.
+ */
+static int pci_dev_str_match_path(struct pci_dev *dev, const char *path,
+				  const char **endptr)
+{
+	int ret;
+	int seg, bus, slot, func;
+	char *wpath, *p;
+	char end;
+
+	*endptr = strchrnul(path, ';');
+
+	wpath = kmemdup_nul(path, *endptr - path, GFP_KERNEL);
+	if (!wpath)
+		return -ENOMEM;
+
+	while (1) {
+		p = strrchr(wpath, '/');
+		if (!p)
+			break;
+		ret = sscanf(p, "/%x.%x%c", &slot, &func, &end);
+		if (ret != 2) {
+			ret = -EINVAL;
+			goto free_and_exit;
+		}
+
+		if (dev->devfn != PCI_DEVFN(slot, func)) {
+			ret = 0;
+			goto free_and_exit;
+		}
+
+		/*
+		 * Note: we don't need to get a reference to the upstream
+		 * bridge because we hold a reference to the top level
+		 * device which should hold a reference to the bridge,
+		 * and so on.
+		 */
+		dev = pci_upstream_bridge(dev);
+		if (!dev) {
+			ret = 0;
+			goto free_and_exit;
+		}
+
+		*p = 0;
+	}
+
+	ret = sscanf(wpath, "%x:%x:%x.%x%c", &seg, &bus, &slot,
+		     &func, &end);
+	if (ret != 4) {
+		seg = 0;
+		ret = sscanf(wpath, "%x:%x.%x%c", &bus, &slot, &func, &end);
+		if (ret != 3) {
+			ret = -EINVAL;
+			goto free_and_exit;
+		}
+	}
+
+	ret = (seg == pci_domain_nr(dev->bus) &&
+	       bus == dev->bus->number &&
+	       dev->devfn == PCI_DEVFN(slot, func));
+
+free_and_exit:
+	kfree(wpath);
+	return ret;
+}
+
+/**
  * pci_dev_str_match - test if a string matches a device
  * @dev:    the PCI device to test
  * @p:      string to match the device against
  * @endptr: pointer to the string after the match
  *
  * Test if a string (typically from a kernel parameter) matches a
- * specified. The string may be of one of two forms formats:
+ * specified. The string may be of one of three formats:
  *
  *   [<domain>:]<bus>:<slot>.<func>
+ *   path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
  *   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
  *
  * The first format specifies a PCI bus/slot/function address which
  * may change if new hardware is inserted, if motherboard firmware changes,
  * or due to changes caused in kernel parameters.
  *
- * The second format matches devices using IDs in the configuration
+ * The second format specifies a PCI bus/slot/function root address and
+ * a path of slot/function addresses to the specific device from the root.
+ * The path for a device can be determined through the use of 'lspci -t'.
+ * This format is more robust against renumbering issues than the first format.
+
+ * The third format matches devices using IDs in the configuration
  * space which may match multiple devices in the system. A value of 0
  * for any field will match all devices.
  *
@@ -218,7 +307,7 @@ static int pci_dev_str_match(struct pci_dev *dev, const char *p,
 			     const char **endptr)
 {
 	int ret;
-	int seg, bus, slot, func, count;
+	int count;
 	unsigned short vendor, device, subsystem_vendor, subsystem_device;
 
 	if (strncmp(p, "pci:", 4) == 0) {
@@ -244,25 +333,16 @@ static int pci_dev_str_match(struct pci_dev *dev, const char *p,
 		    (!subsystem_device ||
 			    subsystem_device == dev->subsystem_device))
 			goto found;
-
 	} else {
-		/* PCI Bus,Slot,Function ids are specified */
-		ret = sscanf(p, "%x:%x:%x.%x%n", &seg, &bus, &slot,
-			     &func, &count);
-		if (ret != 4) {
-			seg = 0;
-			ret = sscanf(p, "%x:%x.%x%n", &bus, &slot,
-				     &func, &count);
-			if (ret != 3)
-				return -EINVAL;
-		}
-
-		p += count;
+		/*
+		 * PCI Bus,Slot,Function ids are specified
+		 *  (optionally, may include a path of devfns following it)
+		 */
 
-		if (seg == pci_domain_nr(dev->bus) &&
-		    bus == dev->bus->number &&
-		    slot == PCI_SLOT(dev->devfn) &&
-		    func == PCI_FUNC(dev->devfn))
+		ret = pci_dev_str_match_path(dev, p, &p);
+		if (ret < 0)
+			return ret;
+		else if (ret)
 			goto found;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/3] PCI: Introduce the disable_acs_redir parameter
  2018-06-18 19:36 [PATCH v3 0/3] Add parameter for disabling ACS redirection for P2P Logan Gunthorpe
  2018-06-18 19:36 ` [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable Logan Gunthorpe
  2018-06-18 19:36 ` [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns Logan Gunthorpe
@ 2018-06-18 19:36 ` Logan Gunthorpe
  2018-06-18 22:21   ` Alex Williamson
  2 siblings, 1 reply; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-18 19:36 UTC (permalink / raw)
  To: linux-kernel, linux-pci, linux-doc
  Cc: Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt, Alex Williamson,
	Christian König, Logan Gunthorpe

In order to support P2P traffic on a segment of the PCI hierarchy,
we must be able to disable the ACS redirect bits for select
PCI bridges. The bridges must be selected before the devices are
discovered by the kernel and the IOMMU groups created. Therefore,
a kernel command line parameter is created to specify devices
which must have their ACS bits disabled.

The new parameter takes a list of devices separated by a semicolon.
Each device specified will have it's ACS redirect bits disabled.
This is similar to the existing 'resource_alignment' parameter and just
like it we also create a sysfs bus attribute which can be used to
read the parameter. Writing the parameter is not supported
as it would require forcibly hot plugging the affected device as
well as all devices whose IOMMU groups might change.

The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
Egress Control bits are disabled which is sufficient to always allow
passing P2P traffic uninterrupted. The bits are set after the kernel
(optionally) enables the ACS bits itself. It is also done regardless of
whether the kernel sets the bits or not seeing some BIOS firmware is known
to set the bits on boot.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   9 +++
 drivers/pci/pci.c                               | 103 +++++++++++++++++++++++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d45285e1ab6a..2ec36e258bb0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3190,6 +3190,15 @@
 				Adding the window is slightly risky (it may
 				conflict with unreported devices), so this
 				taints the kernel.
+		disable_acs_redir=<pci_dev>[; ...]
+				Specify one or more PCI devices (in the format
+				specified above) separated by semicolons.
+				Each device specified will have the PCI ACS
+				redirect capabilities forced off which will
+				allow P2P traffic between devices through
+				bridges without forcing it upstream. Note:
+				this removes isolation between devices and
+				will make the IOMMU groups less granular.
 
 	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
 			Management.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6fbad0492461..eb85bf507398 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2982,6 +2982,92 @@ void pci_request_acs(void)
 	pci_acs_enable = 1;
 }
 
+#define DISABLE_ACS_REDIR_PARAM_SIZE COMMAND_LINE_SIZE
+static char disable_acs_redir_param[DISABLE_ACS_REDIR_PARAM_SIZE] = {0};
+static DEFINE_SPINLOCK(disable_acs_redir_lock);
+
+static ssize_t pci_set_disable_acs_redir_param(const char *buf, size_t count)
+{
+	if (count > DISABLE_ACS_REDIR_PARAM_SIZE - 1)
+		count = DISABLE_ACS_REDIR_PARAM_SIZE - 1;
+	spin_lock(&disable_acs_redir_lock);
+	strncpy(disable_acs_redir_param, buf, count);
+	disable_acs_redir_param[count] = '\0';
+	spin_unlock(&disable_acs_redir_lock);
+	return count;
+}
+
+static ssize_t pci_disable_acs_redir_show(struct bus_type *bus, char *buf)
+{
+	size_t count;
+
+	spin_lock(&disable_acs_redir_lock);
+	count = snprintf(buf, PAGE_SIZE, "%s\n", disable_acs_redir_param);
+	spin_unlock(&disable_acs_redir_lock);
+	return count;
+}
+
+static BUS_ATTR(disable_acs_redir, 0444, pci_disable_acs_redir_show, NULL);
+
+static int __init pci_disable_acs_redir_sysfs_init(void)
+{
+	return bus_create_file(&pci_bus_type, &bus_attr_disable_acs_redir);
+}
+late_initcall(pci_disable_acs_redir_sysfs_init);
+
+/**
+ * pci_disable_acs_redir - disable ACS redirect capabilities
+ * @dev: the PCI device
+ *
+ * For only devices specified in the disable_acs_redir parameter.
+ */
+static void pci_disable_acs_redir(struct pci_dev *dev)
+{
+	int ret = 0;
+	const char *p;
+	int pos;
+	u16 ctrl;
+
+	spin_lock(&disable_acs_redir_lock);
+
+	p = disable_acs_redir_param;
+	while (*p) {
+		ret = pci_dev_str_match(dev, p, &p);
+		if (ret < 0) {
+			pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
+				     disable_acs_redir_param);
+
+			break;
+		} else if (ret == 1) {
+			/* Found a match */
+			break;
+		}
+
+		if (*p != ';' && *p != ',') {
+			/* End of param or invalid format */
+			break;
+		}
+		p++;
+	}
+	spin_unlock(&disable_acs_redir_lock);
+
+	if (ret != 1)
+		return;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	/* P2P Request & Completion Redirect */
+	ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
+
+	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+
+	pci_info(dev, "disabled ACS redirect\n");
+}
+
 /**
  * pci_std_enable_acs - enable ACS on devices using standard ACS capabilites
  * @dev: the PCI device
@@ -3021,12 +3107,22 @@ static void pci_std_enable_acs(struct pci_dev *dev)
 void pci_enable_acs(struct pci_dev *dev)
 {
 	if (!pci_acs_enable)
-		return;
+		goto disable_acs_redir;
 
 	if (!pci_dev_specific_enable_acs(dev))
-		return;
+		goto disable_acs_redir;
 
 	pci_std_enable_acs(dev);
+
+disable_acs_redir:
+	/*
+	 * Note: pci_disable_acs_redir() must be called even if
+	 * ACS is not enabled by the kernel because the firmware
+	 * may have unexpectedly set the flags. So if we are told
+	 * to disable it, we should always disable it after setting
+	 * the kernel's default preferences.
+	 */
+	pci_disable_acs_redir(dev);
 }
 
 static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
@@ -5966,6 +6062,9 @@ static int __init pci_setup(char *str)
 				pcie_bus_config = PCIE_BUS_PEER2PEER;
 			} else if (!strncmp(str, "pcie_scan_all", 13)) {
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
+			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
+				pci_set_disable_acs_redir_param(str + 18,
+					strlen(str + 18));
 			} else {
 				printk(KERN_ERR "PCI: Unknown option `%s'\n",
 						str);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable
  2018-06-18 19:36 ` [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable Logan Gunthorpe
@ 2018-06-18 21:44   ` Alex Williamson
  2018-06-18 21:49     ` Logan Gunthorpe
  2018-06-18 23:06     ` Andy Shevchenko
  0 siblings, 2 replies; 13+ messages in thread
From: Alex Williamson @ 2018-06-18 21:44 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Christian König

On Mon, 18 Jun 2018 13:36:34 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> Separate out the code to match a PCI device with a string (typically
> originating from a kernel parameter) from the
> pci_specified_resource_alignment() function into its own helper
> function.
> 
> While we are at it, this change fixes the kernel style of the function
> (fixing a number of long lines and extra parentheses).
> 
> Additionally, make the analogous change to the kernel parameter
> documentation: Separating the description of how to specify a PCI device
> into it's own section at the head of the pci= parameter.
> 
> This patch should have no functional alterations.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Stephen Bates <sbates@raithlin.com>
> Acked-by: Christian König <christian.koenig@amd.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  26 +++-
>  drivers/pci/pci.c                               | 153 +++++++++++++++---------
>  2 files changed, 120 insertions(+), 59 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index efc7aa7a0670..760fb2b0b349 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2994,7 +2994,24 @@
>  			See header of drivers/block/paride/pcd.c.
>  			See also Documentation/blockdev/paride.txt.
>  
> -	pci=option[,option...]	[PCI] various PCI subsystem options:
> +	pci=option[,option...]	[PCI] various PCI subsystem options.
> +
> +				Some options herein operate on a specific device
> +				or a set of devices (<pci_dev>). These are
> +				specified in one of the following formats:
> +
> +				[<domain>:]<bus>:<slot>.<func>
> +				pci:<vendor>:<device>[:<subvendor>:<subdevice>]
> +
> +				Note: the first format specifies a PCI
> +				bus/slot/function address which may change
> +				if new hardware is inserted, if motherboard
> +				firmware changes, or due to changes caused
> +				by other kernel parameters. The second format

Nit, is it worth noting that when unspecified, domain is zero?

> +				selects devices using IDs from the
> +				configuration space which may match multiple
> +				devices in the system.
> +
>  		earlydump	[X86] dump PCI config space before the kernel
>  				changes anything
>  		off		[X86] don't probe for the PCI bus
> @@ -3123,11 +3140,10 @@
>  				window. The default value is 64 megabytes.
>  		resource_alignment=
>  				Format:
> -				[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
> -				[<order of align>@]pci:<vendor>:<device>\
> -						[:<subvendor>:<subdevice>][; ...]
> +				[<order of align>@]<pci_dev>[; ...]
>  				Specifies alignment and device to reassign
> -				aligned memory resources.
> +				aligned memory resources. How to
> +				specify the device is described above.
>  				If <order of align> is not specified,
>  				PAGE_SIZE is used as alignment.
>  				PCI-PCI bridge can be specified, if resource
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 97acba712e4e..bec1bef6f326 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -191,6 +191,88 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
>  EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
>  #endif
>  
> +/**
> + * pci_dev_str_match - test if a string matches a device
> + * @dev:    the PCI device to test
> + * @p:      string to match the device against
> + * @endptr: pointer to the string after the match
> + *
> + * Test if a string (typically from a kernel parameter) matches a
> + * specified. The string may be of one of two forms formats:
> + *
> + *   [<domain>:]<bus>:<slot>.<func>
> + *   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
> + *
> + * The first format specifies a PCI bus/slot/function address which
> + * may change if new hardware is inserted, if motherboard firmware changes,
> + * or due to changes caused in kernel parameters.
> + *
> + * The second format matches devices using IDs in the configuration
> + * space which may match multiple devices in the system. A value of 0
> + * for any field will match all devices.

I realize this is not a change in behavior, but since we're spelling it
out in a proper comment rather than burying it in the implementation,
using 0 as a wildcard is rather questionable behavior.  It always
surprises me when I read this because pci_match_one_device() uses
PCI_ANY_ID (~0) as a wildcard and as a result of struct pci_device_id
using __u32 for these fields, we actually need to specify ffffffff on
the commandline to get a wildcard match for dynamic ids.  The latter is
tedious to use, but I think it's more correct, and the use of a __u32 is
probably attributed to the fact that 0xffff is only reserved for vendor
ID, the spec doesn't seem to reserve any entries from the vendor's
device ID range.

There's probably really no path to resolve these, but acknowledging the
difference in this comment block might be helpful in the future.

> + *
> + * Returns 1 if the string matches the device, 0 if it does not and
> + * a negative error code if the string cannot be parsed.
> + */
> +static int pci_dev_str_match(struct pci_dev *dev, const char *p,
> +			     const char **endptr)
> +{
> +	int ret;
> +	int seg, bus, slot, func, count;
> +	unsigned short vendor, device, subsystem_vendor, subsystem_device;
> +
> +	if (strncmp(p, "pci:", 4) == 0) {
> +		/* PCI vendor/device (subvendor/subdevice) ids are specified */
> +		p += 4;
> +		ret = sscanf(p, "%hx:%hx:%hx:%hx%n", &vendor, &device,
> +			     &subsystem_vendor, &subsystem_device, &count);
> +		if (ret != 4) {
> +			ret = sscanf(p, "%hx:%hx%n", &vendor, &device, &count);
> +			if (ret != 2)
> +				return -EINVAL;
> +
> +			subsystem_vendor = 0;
> +			subsystem_device = 0;
> +		}
> +
> +		p += count;
> +
> +		if ((!vendor || vendor == dev->vendor) &&
> +		    (!device || device == dev->device) &&
> +		    (!subsystem_vendor ||
> +			    subsystem_vendor == dev->subsystem_vendor) &&
> +		    (!subsystem_device ||
> +			    subsystem_device == dev->subsystem_device))
> +			goto found;
> +
> +	} else {
> +		/* PCI Bus,Slot,Function ids are specified */
> +		ret = sscanf(p, "%x:%x:%x.%x%n", &seg, &bus, &slot,
> +			     &func, &count);
> +		if (ret != 4) {
> +			seg = 0;
> +			ret = sscanf(p, "%x:%x.%x%n", &bus, &slot,
> +				     &func, &count);
> +			if (ret != 3)
> +				return -EINVAL;
> +		}
> +
> +		p += count;
> +
> +		if (seg == pci_domain_nr(dev->bus) &&
> +		    bus == dev->bus->number &&
> +		    slot == PCI_SLOT(dev->devfn) &&
> +		    func == PCI_FUNC(dev->devfn))
> +			goto found;
> +	}
> +
> +	*endptr = p;
> +	return 0;
> +
> +found:
> +	*endptr = p;
> +	return 1;
> +}
>  
>  static int __pci_find_next_cap_ttl(struct pci_bus *bus, unsigned int devfn,
>  				   u8 pos, int cap, int *ttl)
> @@ -5454,10 +5536,10 @@ static DEFINE_SPINLOCK(resource_alignment_lock);
>  static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>  							bool *resize)
>  {
> -	int seg, bus, slot, func, align_order, count;
> -	unsigned short vendor, device, subsystem_vendor, subsystem_device;
> +	int align_order, count;
>  	resource_size_t align = pcibios_default_alignment();
> -	char *p;
> +	const char *p;
> +	int ret;
>  
>  	spin_lock(&resource_alignment_lock);
>  	p = resource_alignment_param;
> @@ -5477,58 +5559,21 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
>  		} else {
>  			align_order = -1;
>  		}
> -		if (strncmp(p, "pci:", 4) == 0) {
> -			/* PCI vendor/device (subvendor/subdevice) ids are specified */
> -			p += 4;
> -			if (sscanf(p, "%hx:%hx:%hx:%hx%n",
> -				&vendor, &device, &subsystem_vendor, &subsystem_device, &count) != 4) {
> -				if (sscanf(p, "%hx:%hx%n", &vendor, &device, &count) != 2) {
> -					printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: pci:%s\n",
> -						p);
> -					break;
> -				}
> -				subsystem_vendor = subsystem_device = 0;
> -			}
> -			p += count;
> -			if ((!vendor || (vendor == dev->vendor)) &&
> -				(!device || (device == dev->device)) &&
> -				(!subsystem_vendor || (subsystem_vendor == dev->subsystem_vendor)) &&
> -				(!subsystem_device || (subsystem_device == dev->subsystem_device))) {
> -				*resize = true;
> -				if (align_order == -1)
> -					align = PAGE_SIZE;
> -				else
> -					align = 1 << align_order;
> -				/* Found */
> -				break;
> -			}
> -		}
> -		else {
> -			if (sscanf(p, "%x:%x:%x.%x%n",
> -				&seg, &bus, &slot, &func, &count) != 4) {
> -				seg = 0;
> -				if (sscanf(p, "%x:%x.%x%n",
> -						&bus, &slot, &func, &count) != 3) {
> -					/* Invalid format */
> -					printk(KERN_ERR "PCI: Can't parse resource_alignment parameter: %s\n",
> -						p);
> -					break;
> -				}
> -			}
> -			p += count;
> -			if (seg == pci_domain_nr(dev->bus) &&
> -				bus == dev->bus->number &&
> -				slot == PCI_SLOT(dev->devfn) &&
> -				func == PCI_FUNC(dev->devfn)) {
> -				*resize = true;
> -				if (align_order == -1)
> -					align = PAGE_SIZE;
> -				else
> -					align = 1 << align_order;
> -				/* Found */
> -				break;
> -			}
> +
> +		ret = pci_dev_str_match(dev, p, &p);
> +		if (ret == 1) {
> +			*resize = true;
> +			if (align_order == -1)
> +				align = PAGE_SIZE;
> +			else
> +				align = 1 << align_order;
> +			break;
> +		} else if (ret < 0) {
> +			pr_info("PCI: Can't parse resource_alignment parameter: pci:%s\n",


The "pci:" prefix on %s doesn't make sense now, it was used above when
the pointer was already advanced past this token, now I believe it would
lead to "pci:pci:xxxx:yyyy" or "pci:xx:yy.z".  Thanks,

Alex

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable
  2018-06-18 21:44   ` Alex Williamson
@ 2018-06-18 21:49     ` Logan Gunthorpe
  2018-06-18 23:06     ` Andy Shevchenko
  1 sibling, 0 replies; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-18 21:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Christian König

On 6/18/2018 3:44 PM, Alex Williamson wrote:
>> + *
>> + * The second format matches devices using IDs in the configuration
>> + * space which may match multiple devices in the system. A value of 0
>> + * for any field will match all devices.
> 
> I realize this is not a change in behavior, but since we're spelling it
> out in a proper comment rather than burying it in the implementation,
> using 0 as a wildcard is rather questionable behavior.  It always
> surprises me when I read this because pci_match_one_device() uses
> PCI_ANY_ID (~0) as a wildcard and as a result of struct pci_device_id
> using __u32 for these fields, we actually need to specify ffffffff on
> the commandline to get a wildcard match for dynamic ids.  The latter is
> tedious to use, but I think it's more correct, and the use of a __u32 is
> probably attributed to the fact that 0xffff is only reserved for vendor
> ID, the spec doesn't seem to reserve any entries from the vendor's
> device ID range.
> 
> There's probably really no path to resolve these, but acknowledging the
> difference in this comment block might be helpful in the future.

Ok, I'll add a note in the comment.
>> +		ret = pci_dev_str_match(dev, p, &p);
>> +		if (ret == 1) {
>> +			*resize = true;
>> +			if (align_order == -1)
>> +				align = PAGE_SIZE;
>> +			else
>> +				align = 1 << align_order;
>> +			break;
>> +		} else if (ret < 0) {
>> +			pr_info("PCI: Can't parse resource_alignment parameter: pci:%s\n",
> 
> 
> The "pci:" prefix on %s doesn't make sense now, it was used above when
> the pointer was already advanced past this token, now I believe it would
> lead to "pci:pci:xxxx:yyyy" or "pci:xx:yy.z".  Thanks,

Yup, nice catch. I'll fix it for v4.

Logan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns
  2018-06-18 19:36 ` [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns Logan Gunthorpe
@ 2018-06-18 21:55   ` Alex Williamson
  2018-06-21 19:22   ` Matthew Wilcox
  1 sibling, 0 replies; 13+ messages in thread
From: Alex Williamson @ 2018-06-18 21:55 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Christian König

On Mon, 18 Jun 2018 13:36:35 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> When specifying PCI devices on the kernel command line using a
> BDF, the bus numbers can change when adding or replacing a device,
> changing motherboard firmware, or applying kernel parameters like
> pci=assign-buses. When this happens, it is usually undesirable to
> apply whatever command line tweak to the wrong device.
> 
> Therefore, it is useful to be able to specify devices with a base
> bus number and the path of devfns needed to get to it. (Similar to
> the "device scope" structure in the Intel VT-d spec, Section 8.3.1.)
> 
> Thus, we add an option to specify devices in the following format:
> 
> [<domain>:]<bus>:<slot>.<func>[/<slot>.<func>][/ ...]
> 
> The path can be any segment within the PCI hierarchy of any length and
> determined through the use of 'lspci -t'. When specified this way, it is
> less likely that a renumbered bus will result in a valid device specification
> and the tweak won't be applied to the wrong device.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Stephen Bates <sbates@raithlin.com>
> Acked-by: Christian König <christian.koenig@amd.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |   8 +-
>  drivers/pci/pci.c                               | 120 ++++++++++++++++++++----
>  2 files changed, 106 insertions(+), 22 deletions(-)

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] PCI: Introduce the disable_acs_redir parameter
  2018-06-18 19:36 ` [PATCH v3 3/3] PCI: Introduce the disable_acs_redir parameter Logan Gunthorpe
@ 2018-06-18 22:21   ` Alex Williamson
  2018-06-18 22:25     ` Logan Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2018-06-18 22:21 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Christian König

On Mon, 18 Jun 2018 13:36:36 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> In order to support P2P traffic on a segment of the PCI hierarchy,
> we must be able to disable the ACS redirect bits for select
> PCI bridges. The bridges must be selected before the devices are
> discovered by the kernel and the IOMMU groups created. Therefore,
> a kernel command line parameter is created to specify devices
> which must have their ACS bits disabled.
> 
> The new parameter takes a list of devices separated by a semicolon.
> Each device specified will have it's ACS redirect bits disabled.
> This is similar to the existing 'resource_alignment' parameter and just
> like it we also create a sysfs bus attribute which can be used to
> read the parameter. Writing the parameter is not supported
> as it would require forcibly hot plugging the affected device as
> well as all devices whose IOMMU groups might change.

Why do we need a sysfs attribute for this if it's static, can't we just
see it in /proc/cmdline?  Seems to be a fair bit of overhead to support
for as little as we can do with it.

> The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P
> Egress Control bits are disabled which is sufficient to always allow
> passing P2P traffic uninterrupted. The bits are set after the kernel
> (optionally) enables the ACS bits itself. It is also done regardless of
> whether the kernel sets the bits or not seeing some BIOS firmware is known
> to set the bits on boot.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Stephen Bates <sbates@raithlin.com>
> Acked-by: Christian König <christian.koenig@amd.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |   9 +++
>  drivers/pci/pci.c                               | 103 +++++++++++++++++++++++-
>  2 files changed, 110 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index d45285e1ab6a..2ec36e258bb0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3190,6 +3190,15 @@
>  				Adding the window is slightly risky (it may
>  				conflict with unreported devices), so this
>  				taints the kernel.
> +		disable_acs_redir=<pci_dev>[; ...]
> +				Specify one or more PCI devices (in the format
> +				specified above) separated by semicolons.
> +				Each device specified will have the PCI ACS
> +				redirect capabilities forced off which will
> +				allow P2P traffic between devices through
> +				bridges without forcing it upstream. Note:
> +				this removes isolation between devices and
> +				will make the IOMMU groups less granular.
>  
>  	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
>  			Management.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 6fbad0492461..eb85bf507398 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2982,6 +2982,92 @@ void pci_request_acs(void)
>  	pci_acs_enable = 1;
>  }
>  
> +#define DISABLE_ACS_REDIR_PARAM_SIZE COMMAND_LINE_SIZE
> +static char disable_acs_redir_param[DISABLE_ACS_REDIR_PARAM_SIZE] = {0};

Hmm, wouldn't this be initialized to zero anyway?  I'm surprised
resource alignment is already wasting this sort of space vs dynamically
allocating, I'm not sure it's a good example to follow.

> +static DEFINE_SPINLOCK(disable_acs_redir_lock);
> +
> +static ssize_t pci_set_disable_acs_redir_param(const char *buf, size_t count)
> +{
> +	if (count > DISABLE_ACS_REDIR_PARAM_SIZE - 1)
> +		count = DISABLE_ACS_REDIR_PARAM_SIZE - 1;
> +	spin_lock(&disable_acs_redir_lock);
> +	strncpy(disable_acs_redir_param, buf, count);
> +	disable_acs_redir_param[count] = '\0';
> +	spin_unlock(&disable_acs_redir_lock);
> +	return count;
> +}
> +
> +static ssize_t pci_disable_acs_redir_show(struct bus_type *bus, char *buf)
> +{
> +	size_t count;
> +
> +	spin_lock(&disable_acs_redir_lock);
> +	count = snprintf(buf, PAGE_SIZE, "%s\n", disable_acs_redir_param);
> +	spin_unlock(&disable_acs_redir_lock);
> +	return count;
> +}
> +
> +static BUS_ATTR(disable_acs_redir, 0444, pci_disable_acs_redir_show, NULL);
> +
> +static int __init pci_disable_acs_redir_sysfs_init(void)
> +{
> +	return bus_create_file(&pci_bus_type, &bus_attr_disable_acs_redir);
> +}
> +late_initcall(pci_disable_acs_redir_sysfs_init);
> +
> +/**
> + * pci_disable_acs_redir - disable ACS redirect capabilities
> + * @dev: the PCI device
> + *
> + * For only devices specified in the disable_acs_redir parameter.
> + */
> +static void pci_disable_acs_redir(struct pci_dev *dev)
> +{
> +	int ret = 0;
> +	const char *p;
> +	int pos;
> +	u16 ctrl;
> +
> +	spin_lock(&disable_acs_redir_lock);
> +
> +	p = disable_acs_redir_param;
> +	while (*p) {
> +		ret = pci_dev_str_match(dev, p, &p);
> +		if (ret < 0) {
> +			pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
> +				     disable_acs_redir_param);
> +
> +			break;
> +		} else if (ret == 1) {
> +			/* Found a match */
> +			break;
> +		}
> +
> +		if (*p != ';' && *p != ',') {
> +			/* End of param or invalid format */
> +			break;
> +		}
> +		p++;
> +	}
> +	spin_unlock(&disable_acs_redir_lock);
> +
> +	if (ret != 1)
> +		return;
> +
> +	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> +	if (!pos)
> +		return;
> +
> +	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> +	/* P2P Request & Completion Redirect */
> +	ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
> +
> +	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> +
> +	pci_info(dev, "disabled ACS redirect\n");
> +}

Seems that too much is taken from the dynamic resource alignment that
doesn't necessarily apply to a read-only, commandline-only, device
discovery-only option.  I don't think we need locking, I don't think we
need a massive static buffer, ideally perhaps even no buffer, just a
pointer to commandline.  Thanks,

Alex

> +
>  /**
>   * pci_std_enable_acs - enable ACS on devices using standard ACS capabilites
>   * @dev: the PCI device
> @@ -3021,12 +3107,22 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>  void pci_enable_acs(struct pci_dev *dev)
>  {
>  	if (!pci_acs_enable)
> -		return;
> +		goto disable_acs_redir;
>  
>  	if (!pci_dev_specific_enable_acs(dev))
> -		return;
> +		goto disable_acs_redir;
>  
>  	pci_std_enable_acs(dev);
> +
> +disable_acs_redir:
> +	/*
> +	 * Note: pci_disable_acs_redir() must be called even if
> +	 * ACS is not enabled by the kernel because the firmware
> +	 * may have unexpectedly set the flags. So if we are told
> +	 * to disable it, we should always disable it after setting
> +	 * the kernel's default preferences.
> +	 */
> +	pci_disable_acs_redir(dev);
>  }
>  
>  static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
> @@ -5966,6 +6062,9 @@ static int __init pci_setup(char *str)
>  				pcie_bus_config = PCIE_BUS_PEER2PEER;
>  			} else if (!strncmp(str, "pcie_scan_all", 13)) {
>  				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
> +			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
> +				pci_set_disable_acs_redir_param(str + 18,
> +					strlen(str + 18));
>  			} else {
>  				printk(KERN_ERR "PCI: Unknown option `%s'\n",
>  						str);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] PCI: Introduce the disable_acs_redir parameter
  2018-06-18 22:21   ` Alex Williamson
@ 2018-06-18 22:25     ` Logan Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-18 22:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Christian König



On 18/06/18 04:21 PM, Alex Williamson wrote:
> Seems that too much is taken from the dynamic resource alignment that
> doesn't necessarily apply to a read-only, commandline-only, device
> discovery-only option.  I don't think we need locking, I don't think we
> need a massive static buffer, ideally perhaps even no buffer, just a
> pointer to commandline.  Thanks,

Yeah, fair points. I'll take another look and see what I can strip out.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable
  2018-06-18 21:44   ` Alex Williamson
  2018-06-18 21:49     ` Logan Gunthorpe
@ 2018-06-18 23:06     ` Andy Shevchenko
  2018-06-18 23:11       ` Logan Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: Andy Shevchenko @ 2018-06-18 23:06 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Logan Gunthorpe, Linux Kernel Mailing List, linux-pci,
	Linux Documentation List, Stephen Bates, Christoph Hellwig,
	Bjorn Helgaas, Jonathan Corbet, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Christian König

On Tue, Jun 19, 2018 at 12:44 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 18 Jun 2018 13:36:34 -0600
> Logan Gunthorpe <logang@deltatee.com> wrote:

> I realize this is not a change in behavior, but since we're spelling it
> out in a proper comment rather than burying it in the implementation,
> using 0 as a wildcard is rather questionable behavior.  It always
> surprises me when I read this because pci_match_one_device() uses
> PCI_ANY_ID (~0) as a wildcard and as a result of struct pci_device_id
> using __u32 for these fields, we actually need to specify ffffffff on
> the commandline to get a wildcard match for dynamic ids.  The latter is
> tedious to use, but I think it's more correct, and the use of a __u32 is
> probably attributed to the fact that 0xffff is only reserved for vendor
> ID, the spec doesn't seem to reserve any entries from the vendor's
> device ID range.
>
> There's probably really no path to resolve these, but acknowledging the
> difference in this comment block might be helpful in the future.

...or introduce a parser part to allow user supply "any" instead of
numeric value.

>> +                     pr_info("PCI: Can't parse resource_alignment parameter: pci:%s\n",

> The "pci:" prefix on %s doesn't make sense now, it was used above when
> the pointer was already advanced past this token, now I believe it would
> lead to "pci:pci:xxxx:yyyy" or "pci:xx:yy.z".  Thanks,

I'm just wondering if we can use pci_info() here, Or it makes no sense?
Also, the original loglevel was an "error".

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable
  2018-06-18 23:06     ` Andy Shevchenko
@ 2018-06-18 23:11       ` Logan Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-18 23:11 UTC (permalink / raw)
  To: Andy Shevchenko, Alex Williamson
  Cc: Linux Kernel Mailing List, linux-pci, Linux Documentation List,
	Stephen Bates, Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Marc Zyngier,
	Kai-Heng Feng, Frederic Weisbecker, Dan Williams,
	Jérôme Glisse, Benjamin Herrenschmidt,
	Christian König



On 18/06/18 05:06 PM, Andy Shevchenko wrote:
> On Tue, Jun 19, 2018 at 12:44 AM, Alex Williamson
>> There's probably really no path to resolve these, but acknowledging the
>> difference in this comment block might be helpful in the future.
> 
> ...or introduce a parser part to allow user supply "any" instead of
> numeric value.

I think the main difficulty is maintaining backwards compatibility. If
anyone is already using zero as a parameter then we will break their system.

>>> +                     pr_info("PCI: Can't parse resource_alignment parameter: pci:%s\n",
> 
>> The "pci:" prefix on %s doesn't make sense now, it was used above when
>> the pointer was already advanced past this token, now I believe it would
>> lead to "pci:pci:xxxx:yyyy" or "pci:xx:yy.z".  Thanks,
> 
> I'm just wondering if we can use pci_info() here, Or it makes no sense?
> Also, the original loglevel was an "error".

Yeah, I don't think pci_info() makes sense as it's not attached to a
specific device.

Not sure how I messed up the log level, but I'll fix it for v4.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns
  2018-06-18 19:36 ` [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns Logan Gunthorpe
  2018-06-18 21:55   ` Alex Williamson
@ 2018-06-21 19:22   ` Matthew Wilcox
  2018-06-21 21:00     ` Logan Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2018-06-21 19:22 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Alex Williamson, Christian König

On Mon, Jun 18, 2018 at 01:36:35PM -0600, Logan Gunthorpe wrote:
> @@ -3000,14 +3000,18 @@
>  				or a set of devices (<pci_dev>). These are
>  				specified in one of the following formats:
>  
> -				[<domain>:]<bus>:<slot>.<func>
> +				[<domain>:]<bus>:<slot>.<func>[/<slot>.<func>][/ ...]

How about:
+				[<domain>:]<bus>:<slot>.<func>[/<slot>.<func>]*

> -				by other kernel parameters. The second format
> +				by other kernel parameters. Optionally
> +				a path from a device through multiple

I think that's "a path to a device", because you'd start by specifying the
root port, then continuing down the hierarchy, right?

> + * Test if a string (typically from a kernel parameter) formated as a

formatted

> + * path of slot/function addresses matches a PCI device. The string must
> + * be of the form:
> + *
> + *   [<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
> + *
> + * A path for a device can be obtained using 'lspci -t'. Using a path
> + * is more robust against renumbering of devices than using only

I'd call it bus renumbering rather than device renumbering.  After all,
if the device got renumbered, this would fail ;-)

>   * pci_dev_str_match - test if a string matches a device
>   * @dev:    the PCI device to test
>   * @p:      string to match the device against
>   * @endptr: pointer to the string after the match
>   *
>   * Test if a string (typically from a kernel parameter) matches a
> - * specified. The string may be of one of two forms formats:
> + * specified. The string may be of one of three formats:

Surely just "The string may be in one of three formats"

>   *
>   *   [<domain>:]<bus>:<slot>.<func>
> + *   path:[<domain>:]<bus>:<slot>.<func>/<slot>.<func>[/ ...]
>   *   pci:<vendor>:<device>[:<subvendor>:<subdevice>]

I think you're dropped the "path:" prefix from your parser?

>   * The first format specifies a PCI bus/slot/function address which
>   * may change if new hardware is inserted, if motherboard firmware changes,
>   * or due to changes caused in kernel parameters.
>   *
> - * The second format matches devices using IDs in the configuration
> + * The second format specifies a PCI bus/slot/function root address and
> + * a path of slot/function addresses to the specific device from the root.
> + * The path for a device can be determined through the use of 'lspci -t'.
> + * This format is more robust against renumbering issues than the first format.
> +
> + * The third format matches devices using IDs in the configuration
>   * space which may match multiple devices in the system. A value of 0
>   * for any field will match all devices.
>   *

So you probably want to reword this too.  Two formats, one with optional
trailing path elements?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns
  2018-06-21 19:22   ` Matthew Wilcox
@ 2018-06-21 21:00     ` Logan Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Logan Gunthorpe @ 2018-06-21 21:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, linux-pci, linux-doc, Stephen Bates,
	Christoph Hellwig, Bjorn Helgaas, Jonathan Corbet, Ingo Molnar,
	Thomas Gleixner, Paul E. McKenney, Marc Zyngier, Kai-Heng Feng,
	Frederic Weisbecker, Dan Williams, Jérôme Glisse,
	Benjamin Herrenschmidt, Alex Williamson, Christian König


On 21/06/18 01:22 PM, Matthew Wilcox wrote:
> On Mon, Jun 18, 2018 at 01:36:35PM -0600, Logan Gunthorpe wrote:
> So you probably want to reword this too.  Two formats, one with optional
> trailing path elements?

Thanks for the review Matthew! I've addressed all your feedback for a v4
which I'll post shortly.

Logan


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-06-21 21:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-18 19:36 [PATCH v3 0/3] Add parameter for disabling ACS redirection for P2P Logan Gunthorpe
2018-06-18 19:36 ` [PATCH v3 1/3] PCI: Make specifying PCI devices in kernel parameters reusable Logan Gunthorpe
2018-06-18 21:44   ` Alex Williamson
2018-06-18 21:49     ` Logan Gunthorpe
2018-06-18 23:06     ` Andy Shevchenko
2018-06-18 23:11       ` Logan Gunthorpe
2018-06-18 19:36 ` [PATCH v3 2/3] PCI: Allow specifying devices using a base bus and path of devfns Logan Gunthorpe
2018-06-18 21:55   ` Alex Williamson
2018-06-21 19:22   ` Matthew Wilcox
2018-06-21 21:00     ` Logan Gunthorpe
2018-06-18 19:36 ` [PATCH v3 3/3] PCI: Introduce the disable_acs_redir parameter Logan Gunthorpe
2018-06-18 22:21   ` Alex Williamson
2018-06-18 22:25     ` Logan Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).