linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] VMD MSI Remapping Bypass
@ 2020-07-28 19:49 Jon Derrick
  2020-07-28 19:49 ` [PATCH 1/6] PCI: vmd: Create physical offset helper Jon Derrick
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Jon Derrick @ 2020-07-28 19:49 UTC (permalink / raw)
  To: linux-pci, Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Jon Derrick

The Intel Volume Management Device acts similar to a PCI-to-PCI bridge in that
it changes downstream devices' requester-ids to its own. As VMD supports PCIe
devices, it has its own MSI/X table and transmits child device MSI/X by
remapping child device MSI/X and handling like a demultiplexer.

Some newer VMD devices (Icelake Server and client) have an option to bypass the
VMD MSI/X remapping table. This allows for better performance scaling as the
child device MSI/X won't be limited by VMD's MSI/X count and IRQ handler.

It's expected that most users don't want MSI/X remapping when they can get
better performance without this limitation. This set includes some long overdue
cleanup of overgrown VMD code and introduces the MSI/X remapping disable.

Applies on top of e3beca48a45b ("irqdomain/treewide: Keep firmware node
unconditionally allocated") and ec0160891e38 ("irqdomain/treewide: Free
firmware node after domain removal") in tip/urgent


Jon Derrick (6):
  PCI: vmd: Create physical offset helper
  PCI: vmd: Create bus offset configuration helper
  PCI: vmd: Create IRQ Domain configuration helper
  PCI: vmd: Create IRQ allocation helper
  x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  PCI: vmd: Disable MSI/X remapping when possible

 arch/x86/kernel/apic/msi.c   |   2 +-
 drivers/pci/controller/vmd.c | 344 +++++++++++++++++++++++------------
 2 files changed, 224 insertions(+), 122 deletions(-)


base-commit: ec0160891e387f4771f953b888b1fe951398e5d9
-- 
2.27.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] PCI: vmd: Create physical offset helper
  2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
@ 2020-07-28 19:49 ` Jon Derrick
  2020-07-28 19:49 ` [PATCH 2/6] PCI: vmd: Create bus offset configuration helper Jon Derrick
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Jon Derrick @ 2020-07-28 19:49 UTC (permalink / raw)
  To: linux-pci, Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Jon Derrick, Andy Shevchenko

Moves the guest-passthrough physical offset discovery code to a new
helper. No functional changes.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/pci/controller/vmd.c | 105 +++++++++++++++++++++--------------
 1 file changed, 62 insertions(+), 43 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index f69ef8c89f72..44b2db789eff 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -417,6 +417,60 @@ static int vmd_find_free_domain(void)
 	return domain + 1;
 }
 
+static int vmd_get_phys_offsets(struct vmd_dev *vmd, bool native_hint,
+				resource_size_t *offset1,
+				resource_size_t *offset2)
+{
+	struct pci_dev *dev = vmd->dev;
+	u64 phys1, phys2;
+
+	if (native_hint) {
+		u32 vmlock;
+		int ret;
+
+		ret = pci_read_config_dword(dev, PCI_REG_VMLOCK, &vmlock);
+		if (ret || vmlock == ~0)
+			return -ENODEV;
+
+		if (MB2_SHADOW_EN(vmlock)) {
+			void __iomem *membar2;
+
+			membar2 = pci_iomap(dev, VMD_MEMBAR2, 0);
+			if (!membar2)
+				return -ENOMEM;
+			phys1 = readq(membar2 + MB2_SHADOW_OFFSET);
+			phys2 = readq(membar2 + MB2_SHADOW_OFFSET + 8);
+			pci_iounmap(dev, membar2);
+		} else
+			return 0;
+	} else {
+		/* Hypervisor-Emulated Vendor-Specific Capability */
+		int pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+		u32 reg, regu;
+
+		pci_read_config_dword(dev, pos + 4, &reg);
+
+		/* "SHDW" */
+		if (pos && reg == 0x53484457) {
+			pci_read_config_dword(dev, pos + 8, &reg);
+			pci_read_config_dword(dev, pos + 12, &regu);
+			phys1 = (u64) regu << 32 | reg;
+
+			pci_read_config_dword(dev, pos + 16, &reg);
+			pci_read_config_dword(dev, pos + 20, &regu);
+			phys2 = (u64) regu << 32 | reg;
+		} else
+			return 0;
+	}
+
+	*offset1 = dev->resource[VMD_MEMBAR1].start -
+			(phys1 & PCI_BASE_ADDRESS_MEM_MASK);
+	*offset2 = dev->resource[VMD_MEMBAR2].start -
+			(phys2 & PCI_BASE_ADDRESS_MEM_MASK);
+
+	return 0;
+}
+
 static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 {
 	struct pci_sysdata *sd = &vmd->sysdata;
@@ -428,6 +482,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	resource_size_t offset[2] = {0};
 	resource_size_t membar2_offset = 0x2000;
 	struct pci_bus *child;
+	int ret;
 
 	/*
 	 * Shadow registers may exist in certain VMD device ids which allow
@@ -436,50 +491,14 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	 * or 0, depending on an enable bit in the VMD device.
 	 */
 	if (features & VMD_FEAT_HAS_MEMBAR_SHADOW) {
-		u32 vmlock;
-		int ret;
-
 		membar2_offset = MB2_SHADOW_OFFSET + MB2_SHADOW_SIZE;
-		ret = pci_read_config_dword(vmd->dev, PCI_REG_VMLOCK, &vmlock);
-		if (ret || vmlock == ~0)
-			return -ENODEV;
-
-		if (MB2_SHADOW_EN(vmlock)) {
-			void __iomem *membar2;
-
-			membar2 = pci_iomap(vmd->dev, VMD_MEMBAR2, 0);
-			if (!membar2)
-				return -ENOMEM;
-			offset[0] = vmd->dev->resource[VMD_MEMBAR1].start -
-					(readq(membar2 + MB2_SHADOW_OFFSET) &
-					 PCI_BASE_ADDRESS_MEM_MASK);
-			offset[1] = vmd->dev->resource[VMD_MEMBAR2].start -
-					(readq(membar2 + MB2_SHADOW_OFFSET + 8) &
-					 PCI_BASE_ADDRESS_MEM_MASK);
-			pci_iounmap(vmd->dev, membar2);
-		}
-	}
-
-	if (features & VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP) {
-		int pos = pci_find_capability(vmd->dev, PCI_CAP_ID_VNDR);
-		u32 reg, regu;
-
-		pci_read_config_dword(vmd->dev, pos + 4, &reg);
-
-		/* "SHDW" */
-		if (pos && reg == 0x53484457) {
-			pci_read_config_dword(vmd->dev, pos + 8, &reg);
-			pci_read_config_dword(vmd->dev, pos + 12, &regu);
-			offset[0] = vmd->dev->resource[VMD_MEMBAR1].start -
-					(((u64) regu << 32 | reg) &
-					 PCI_BASE_ADDRESS_MEM_MASK);
-
-			pci_read_config_dword(vmd->dev, pos + 16, &reg);
-			pci_read_config_dword(vmd->dev, pos + 20, &regu);
-			offset[1] = vmd->dev->resource[VMD_MEMBAR2].start -
-					(((u64) regu << 32 | reg) &
-					 PCI_BASE_ADDRESS_MEM_MASK);
-		}
+		ret = vmd_get_phys_offsets(vmd, true, &offset[0], &offset[1]);
+		if (ret)
+			return ret;
+	} else if (features & VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP) {
+		ret = vmd_get_phys_offsets(vmd, false, &offset[0], &offset[1]);
+		if (ret)
+			return ret;
 	}
 
 	/*
-- 
2.27.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/6] PCI: vmd: Create bus offset configuration helper
  2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
  2020-07-28 19:49 ` [PATCH 1/6] PCI: vmd: Create physical offset helper Jon Derrick
@ 2020-07-28 19:49 ` Jon Derrick
  2020-07-28 19:49 ` [PATCH 3/6] PCI: vmd: Create IRQ Domain " Jon Derrick
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Jon Derrick @ 2020-07-28 19:49 UTC (permalink / raw)
  To: linux-pci, Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Jon Derrick, Andy Shevchenko

Moves the bus offset configuration discovery code to a new helper.
Modifies the bus offset 2-bit decode switch to have a 0 case and a
default error case, just in case the field is expanded in future
hardware.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/pci/controller/vmd.c | 53 ++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 44b2db789eff..a462719af12a 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -471,6 +471,35 @@ static int vmd_get_phys_offsets(struct vmd_dev *vmd, bool native_hint,
 	return 0;
 }
 
+static int vmd_get_bus_number_start(struct vmd_dev *vmd)
+{
+	struct pci_dev *dev = vmd->dev;
+	u16 reg;
+
+	pci_read_config_word(dev, PCI_REG_VMCAP, &reg);
+	if (BUS_RESTRICT_CAP(reg)) {
+		pci_read_config_word(dev, PCI_REG_VMCONFIG, &reg);
+
+		switch (BUS_RESTRICT_CFG(reg)) {
+		case 0:
+			vmd->busn_start = 0;
+			break;
+		case 1:
+			vmd->busn_start = 128;
+			break;
+		case 2:
+			vmd->busn_start = 224;
+			break;
+		default:
+			pci_err(dev, "Unknown Bus Offset Setting (%d)\n",
+				BUS_RESTRICT_CFG(reg));
+			return -ENODEV;
+		}
+	}
+
+	return 0;
+}
+
 static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 {
 	struct pci_sysdata *sd = &vmd->sysdata;
@@ -506,27 +535,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	 * limits the bus range to between 0-127, 128-255, or 224-255
 	 */
 	if (features & VMD_FEAT_HAS_BUS_RESTRICTIONS) {
-		u16 reg16;
-
-		pci_read_config_word(vmd->dev, PCI_REG_VMCAP, &reg16);
-		if (BUS_RESTRICT_CAP(reg16)) {
-			pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG,
-					     &reg16);
-
-			switch (BUS_RESTRICT_CFG(reg16)) {
-			case 1:
-				vmd->busn_start = 128;
-				break;
-			case 2:
-				vmd->busn_start = 224;
-				break;
-			case 3:
-				pci_err(vmd->dev, "Unknown Bus Offset Setting\n");
-				return -ENODEV;
-			default:
-				break;
-			}
-		}
+		ret = vmd_get_bus_number_start(vmd);
+		if (ret)
+			return ret;
 	}
 
 	res = &vmd->dev->resource[VMD_CFGBAR];
-- 
2.27.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/6] PCI: vmd: Create IRQ Domain configuration helper
  2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
  2020-07-28 19:49 ` [PATCH 1/6] PCI: vmd: Create physical offset helper Jon Derrick
  2020-07-28 19:49 ` [PATCH 2/6] PCI: vmd: Create bus offset configuration helper Jon Derrick
@ 2020-07-28 19:49 ` Jon Derrick
  2020-07-28 19:49 ` [PATCH 4/6] PCI: vmd: Create IRQ allocation helper Jon Derrick
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Jon Derrick @ 2020-07-28 19:49 UTC (permalink / raw)
  To: linux-pci, Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Jon Derrick, Andy Shevchenko

Moves the IRQ and MSI Domain configuration code to new helpers. No
functional changes.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/pci/controller/vmd.c | 52 ++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index a462719af12a..703c48171993 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -298,6 +298,34 @@ static struct msi_domain_info vmd_msi_domain_info = {
 	.chip		= &vmd_msi_controller,
 };
 
+static int vmd_create_irq_domain(struct vmd_dev *vmd)
+{
+	struct fwnode_handle *fn;
+
+	fn = irq_domain_alloc_named_id_fwnode("VMD-MSI", vmd->sysdata.domain);
+	if (!fn)
+		return -ENODEV;
+
+	vmd->irq_domain = pci_msi_create_irq_domain(fn, &vmd_msi_domain_info,
+						    x86_vector_domain);
+	if (!vmd->irq_domain) {
+		irq_domain_free_fwnode(fn);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static void vmd_remove_irq_domain(struct vmd_dev *vmd)
+{
+	if (vmd->irq_domain) {
+		struct fwnode_handle *fn = vmd->irq_domain->fwnode;
+
+		irq_domain_remove(vmd->irq_domain);
+		irq_domain_free_fwnode(fn);
+	}
+}
+
 static char __iomem *vmd_cfg_addr(struct vmd_dev *vmd, struct pci_bus *bus,
 				  unsigned int devfn, int reg, int len)
 {
@@ -503,7 +531,6 @@ static int vmd_get_bus_number_start(struct vmd_dev *vmd)
 static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 {
 	struct pci_sysdata *sd = &vmd->sysdata;
-	struct fwnode_handle *fn;
 	struct resource *res;
 	u32 upper_bits;
 	unsigned long flags;
@@ -598,16 +625,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 
 	sd->node = pcibus_to_node(vmd->dev->bus);
 
-	fn = irq_domain_alloc_named_id_fwnode("VMD-MSI", vmd->sysdata.domain);
-	if (!fn)
-		return -ENODEV;
-
-	vmd->irq_domain = pci_msi_create_irq_domain(fn, &vmd_msi_domain_info,
-						    x86_vector_domain);
-	if (!vmd->irq_domain) {
-		irq_domain_free_fwnode(fn);
-		return -ENODEV;
-	}
+	ret = vmd_create_irq_domain(vmd);
+	if (ret)
+		return ret;
 
 	pci_add_resource(&resources, &vmd->resources[0]);
 	pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
@@ -617,13 +637,13 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 				       &vmd_ops, sd, &resources);
 	if (!vmd->bus) {
 		pci_free_resource_list(&resources);
-		irq_domain_remove(vmd->irq_domain);
-		irq_domain_free_fwnode(fn);
+		vmd_remove_irq_domain(vmd);
 		return -ENODEV;
 	}
 
 	vmd_attach_resources(vmd);
-	dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
+	if (vmd->irq_domain)
+		dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
 
 	pci_scan_child_bus(vmd->bus);
 	pci_assign_unassigned_bus_resources(vmd->bus);
@@ -732,15 +752,13 @@ static void vmd_cleanup_srcu(struct vmd_dev *vmd)
 static void vmd_remove(struct pci_dev *dev)
 {
 	struct vmd_dev *vmd = pci_get_drvdata(dev);
-	struct fwnode_handle *fn = vmd->irq_domain->fwnode;
 
 	sysfs_remove_link(&vmd->dev->dev.kobj, "domain");
 	pci_stop_root_bus(vmd->bus);
 	pci_remove_root_bus(vmd->bus);
 	vmd_cleanup_srcu(vmd);
 	vmd_detach_resources(vmd);
-	irq_domain_remove(vmd->irq_domain);
-	irq_domain_free_fwnode(fn);
+	vmd_remove_irq_domain(vmd);
 }
 
 #ifdef CONFIG_PM_SLEEP
-- 
2.27.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 4/6] PCI: vmd: Create IRQ allocation helper
  2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
                   ` (2 preceding siblings ...)
  2020-07-28 19:49 ` [PATCH 3/6] PCI: vmd: Create IRQ Domain " Jon Derrick
@ 2020-07-28 19:49 ` Jon Derrick
  2020-07-28 19:49 ` [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE Jon Derrick
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Jon Derrick @ 2020-07-28 19:49 UTC (permalink / raw)
  To: linux-pci, Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Jon Derrick, Andy Shevchenko

Moves the IRQ allocation and SRCU initialization code to a new helper.
No functional changes.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/pci/controller/vmd.c | 94 ++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 41 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 703c48171993..3214d785fa5d 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -528,6 +528,55 @@ static int vmd_get_bus_number_start(struct vmd_dev *vmd)
 	return 0;
 }
 
+static irqreturn_t vmd_irq(int irq, void *data)
+{
+	struct vmd_irq_list *irqs = data;
+	struct vmd_irq *vmdirq;
+	int idx;
+
+	idx = srcu_read_lock(&irqs->srcu);
+	list_for_each_entry_rcu(vmdirq, &irqs->irq_list, node)
+		generic_handle_irq(vmdirq->virq);
+	srcu_read_unlock(&irqs->srcu, idx);
+
+	return IRQ_HANDLED;
+}
+
+static int vmd_alloc_irqs(struct vmd_dev *vmd)
+{
+	struct pci_dev *dev = vmd->dev;
+	int i, err;
+
+	vmd->msix_count = pci_msix_vec_count(dev);
+	if (vmd->msix_count < 0)
+		return -ENODEV;
+
+	vmd->msix_count = pci_alloc_irq_vectors(dev, 1, vmd->msix_count,
+						PCI_IRQ_MSIX);
+	if (vmd->msix_count < 0)
+		return vmd->msix_count;
+
+	vmd->irqs = devm_kcalloc(&dev->dev, vmd->msix_count, sizeof(*vmd->irqs),
+				 GFP_KERNEL);
+	if (!vmd->irqs)
+		return -ENOMEM;
+
+	for (i = 0; i < vmd->msix_count; i++) {
+		err = init_srcu_struct(&vmd->irqs[i].srcu);
+		if (err)
+			return err;
+
+		INIT_LIST_HEAD(&vmd->irqs[i].irq_list);
+		err = devm_request_irq(&dev->dev, pci_irq_vector(dev, i),
+				       vmd_irq, IRQF_NO_THREAD,
+				       "vmd", &vmd->irqs[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 {
 	struct pci_sysdata *sd = &vmd->sysdata;
@@ -663,24 +712,10 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	return 0;
 }
 
-static irqreturn_t vmd_irq(int irq, void *data)
-{
-	struct vmd_irq_list *irqs = data;
-	struct vmd_irq *vmdirq;
-	int idx;
-
-	idx = srcu_read_lock(&irqs->srcu);
-	list_for_each_entry_rcu(vmdirq, &irqs->irq_list, node)
-		generic_handle_irq(vmdirq->virq);
-	srcu_read_unlock(&irqs->srcu, idx);
-
-	return IRQ_HANDLED;
-}
-
 static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
 	struct vmd_dev *vmd;
-	int i, err;
+	int err;
 
 	if (resource_size(&dev->resource[VMD_CFGBAR]) < (1 << 20))
 		return -ENOMEM;
@@ -703,32 +738,9 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	    dma_set_mask_and_coherent(&dev->dev, DMA_BIT_MASK(32)))
 		return -ENODEV;
 
-	vmd->msix_count = pci_msix_vec_count(dev);
-	if (vmd->msix_count < 0)
-		return -ENODEV;
-
-	vmd->msix_count = pci_alloc_irq_vectors(dev, 1, vmd->msix_count,
-					PCI_IRQ_MSIX);
-	if (vmd->msix_count < 0)
-		return vmd->msix_count;
-
-	vmd->irqs = devm_kcalloc(&dev->dev, vmd->msix_count, sizeof(*vmd->irqs),
-				 GFP_KERNEL);
-	if (!vmd->irqs)
-		return -ENOMEM;
-
-	for (i = 0; i < vmd->msix_count; i++) {
-		err = init_srcu_struct(&vmd->irqs[i].srcu);
-		if (err)
-			return err;
-
-		INIT_LIST_HEAD(&vmd->irqs[i].irq_list);
-		err = devm_request_irq(&dev->dev, pci_irq_vector(dev, i),
-				       vmd_irq, IRQF_NO_THREAD,
-				       "vmd", &vmd->irqs[i]);
-		if (err)
-			return err;
-	}
+	err = vmd_alloc_irqs(vmd);
+	if (err)
+		return err;
 
 	spin_lock_init(&vmd->cfg_lock);
 	pci_set_drvdata(dev, vmd);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
                   ` (3 preceding siblings ...)
  2020-07-28 19:49 ` [PATCH 4/6] PCI: vmd: Create IRQ allocation helper Jon Derrick
@ 2020-07-28 19:49 ` Jon Derrick
  2020-09-07 14:32   ` Lorenzo Pieralisi
  2020-10-20 20:26   ` Bjorn Helgaas
  2020-07-28 19:49 ` [PATCH 6/6] PCI: vmd: Disable MSI/X remapping when possible Jon Derrick
  2020-08-17 16:14 ` [PATCH 0/6] VMD MSI Remapping Bypass Derrick, Jonathan
  6 siblings, 2 replies; 15+ messages in thread
From: Jon Derrick @ 2020-07-28 19:49 UTC (permalink / raw)
  To: linux-pci, Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Jon Derrick, Andy Shevchenko

VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
In order to support direct interrupt remapping of VMD child devices,
ensure that the IRTE is programmed with the VMD endpoint's requester-id
using pci_real_dma_dev().

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 arch/x86/kernel/apic/msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index c2b2911feeef..7ca271b8d891 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -189,7 +189,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 
 	init_irq_alloc_info(&info, NULL);
 	info.type = X86_IRQ_ALLOC_TYPE_MSI;
-	info.msi_dev = dev;
+	info.msi_dev = pci_real_dma_dev(dev);
 
 	domain = irq_remapping_get_irq_domain(&info);
 	if (domain == NULL)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 6/6] PCI: vmd: Disable MSI/X remapping when possible
  2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
                   ` (4 preceding siblings ...)
  2020-07-28 19:49 ` [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE Jon Derrick
@ 2020-07-28 19:49 ` Jon Derrick
  2020-10-20 20:23   ` Bjorn Helgaas
  2020-08-17 16:14 ` [PATCH 0/6] VMD MSI Remapping Bypass Derrick, Jonathan
  6 siblings, 1 reply; 15+ messages in thread
From: Jon Derrick @ 2020-07-28 19:49 UTC (permalink / raw)
  To: linux-pci, Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Jon Derrick, Andy Shevchenko

VMD will retransmit child device MSI/X using its own MSI/X table and
requester-id. This limits the number of MSI/X available to the whole
child device domain to the number of VMD MSI/X interrupts. Some VMD
devices have a mode where this remapping can be disabled, allowing child
device interrupts to bypass processing with the VMD MSI/X domain
interrupt handler and going straight the child device interrupt handler,
allowing for better performance and scaling. The requester-id still gets
changed to the VMD endpoint's requester-id, and the interrupt remapping
handlers have been updated to properly set IRTE for child device
interrupts to the VMD endpoint's context.

Some VMD platforms have existing production BIOS which rely on MSI/X
remapping and won't explicitly program the MSI/X remapping bit. This
re-enables MSI/X remapping on unload.

Disabling MSI/X remapping is only available for Icelake Server and
client VMD products.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 drivers/pci/controller/vmd.c | 58 +++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 3214d785fa5d..e8cde2c390b9 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -53,6 +53,12 @@ enum vmd_features {
 	 * vendor-specific capability space
 	 */
 	VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP	= (1 << 2),
+
+	/*
+	 * Device remaps MSI/X transactions into its MSI/X table and requires
+	 * VMD MSI domain for child device interrupt handling
+	 */
+	VMD_FEAT_REMAPS_MSI			= (1 << 3),
 };
 
 /*
@@ -298,6 +304,15 @@ static struct msi_domain_info vmd_msi_domain_info = {
 	.chip		= &vmd_msi_controller,
 };
 
+static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable)
+{
+	u16 reg;
+
+	pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, &reg);
+	reg = enable ? (reg & ~0x2) : (reg | 0x2);
+	pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
+}
+
 static int vmd_create_irq_domain(struct vmd_dev *vmd)
 {
 	struct fwnode_handle *fn;
@@ -318,6 +333,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd)
 
 static void vmd_remove_irq_domain(struct vmd_dev *vmd)
 {
+	/*
+	 * Some production BIOS won't enable remapping between soft reboots.
+	 * Ensure remapping is restored before unloading the driver
+	 */
+	if (!vmd->msix_count)
+		vmd_enable_msi_remapping(vmd, true);
+
 	if (vmd->irq_domain) {
 		struct fwnode_handle *fn = vmd->irq_domain->fwnode;
 
@@ -606,6 +628,27 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 			return ret;
 	}
 
+	/*
+	 * Currently MSI remapping must be enabled in guest passthrough mode
+	 * due to some missing interrupt remapping plumbing. This is probably
+	 * acceptable because the guest is usually CPU-limited and MSI
+	 * remapping doesn't become a performance bottleneck.
+	 */
+	if (features & VMD_FEAT_REMAPS_MSI || offset[0] || offset[1]) {
+		ret = vmd_alloc_irqs(vmd);
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * Disable remapping for performance if possible based on if VMD IRQs
+	 * had been allocated.
+	 */
+	if (vmd->msix_count)
+		vmd_enable_msi_remapping(vmd, true);
+	else
+		vmd_enable_msi_remapping(vmd, false);
+
 	/*
 	 * Certain VMD devices may have a root port configuration option which
 	 * limits the bus range to between 0-127, 128-255, or 224-255
@@ -674,9 +717,11 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 
 	sd->node = pcibus_to_node(vmd->dev->bus);
 
-	ret = vmd_create_irq_domain(vmd);
-	if (ret)
-		return ret;
+	if (vmd->msix_count) {
+		ret = vmd_create_irq_domain(vmd);
+		if (ret)
+			return ret;
+	}
 
 	pci_add_resource(&resources, &vmd->resources[0]);
 	pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
@@ -738,10 +783,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	    dma_set_mask_and_coherent(&dev->dev, DMA_BIT_MASK(32)))
 		return -ENODEV;
 
-	err = vmd_alloc_irqs(vmd);
-	if (err)
-		return err;
-
 	spin_lock_init(&vmd->cfg_lock);
 	pci_set_drvdata(dev, vmd);
 	err = vmd_enable_domain(vmd, (unsigned long) id->driver_data);
@@ -809,7 +850,8 @@ static SIMPLE_DEV_PM_OPS(vmd_dev_pm_ops, vmd_suspend, vmd_resume);
 
 static const struct pci_device_id vmd_ids[] = {
 	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_201D),
-		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP,},
+		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
+				VMD_FEAT_REMAPS_MSI,},
 	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_28C0),
 		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
 				VMD_FEAT_HAS_BUS_RESTRICTIONS,},
-- 
2.27.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/6] VMD MSI Remapping Bypass
  2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
                   ` (5 preceding siblings ...)
  2020-07-28 19:49 ` [PATCH 6/6] PCI: vmd: Disable MSI/X remapping when possible Jon Derrick
@ 2020-08-17 16:14 ` Derrick, Jonathan
  6 siblings, 0 replies; 15+ messages in thread
From: Derrick, Jonathan @ 2020-08-17 16:14 UTC (permalink / raw)
  To: linux-pci, lorenzo.pieralisi
  Cc: hch, andrzej.jakowski, helgaas, linux-kernel, Kalakota, SushmaX, x86

Hi Lorenzo,

On Tue, 2020-07-28 at 13:49 -0600, Jon Derrick wrote:
> The Intel Volume Management Device acts similar to a PCI-to-PCI bridge in that
> it changes downstream devices' requester-ids to its own. As VMD supports PCIe
> devices, it has its own MSI/X table and transmits child device MSI/X by
> remapping child device MSI/X and handling like a demultiplexer.
> 
> Some newer VMD devices (Icelake Server and client) have an option to bypass the
> VMD MSI/X remapping table. This allows for better performance scaling as the
> child device MSI/X won't be limited by VMD's MSI/X count and IRQ handler.
> 
> It's expected that most users don't want MSI/X remapping when they can get
> better performance without this limitation. This set includes some long overdue
> cleanup of overgrown VMD code and introduces the MSI/X remapping disable.
> 
> Applies on top of e3beca48a45b ("irqdomain/treewide: Keep firmware node
> unconditionally allocated") and ec0160891e38 ("irqdomain/treewide: Free
> firmware node after domain removal") in tip/urgent
> 
> 
> Jon Derrick (6):
>   PCI: vmd: Create physical offset helper
>   PCI: vmd: Create bus offset configuration helper
>   PCI: vmd: Create IRQ Domain configuration helper
>   PCI: vmd: Create IRQ allocation helper
>   x86/apic/msi: Use Real PCI DMA device when configuring IRTE
>   PCI: vmd: Disable MSI/X remapping when possible
> 
>  arch/x86/kernel/apic/msi.c   |   2 +-
>  drivers/pci/controller/vmd.c | 344 +++++++++++++++++++++++------------
>  2 files changed, 224 insertions(+), 122 deletions(-)
> 
> 
> base-commit: ec0160891e387f4771f953b888b1fe951398e5d9

Gentle reminder. Please don't forget about this.
We have a few more patches coming soon that I'd prefer to stage upon
this set.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  2020-07-28 19:49 ` [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE Jon Derrick
@ 2020-09-07 14:32   ` Lorenzo Pieralisi
  2020-09-28 11:32     ` Thomas Gleixner
  2020-10-20 20:26   ` Bjorn Helgaas
  1 sibling, 1 reply; 15+ messages in thread
From: Lorenzo Pieralisi @ 2020-09-07 14:32 UTC (permalink / raw)
  To: Jon Derrick
  Cc: linux-pci, Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Andy Shevchenko

On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
> VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
> In order to support direct interrupt remapping of VMD child devices,
> ensure that the IRTE is programmed with the VMD endpoint's requester-id
> using pci_real_dma_dev().
> 
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> ---
>  arch/x86/kernel/apic/msi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I'd need an x86 maintainer ACK on this patch.

Thanks,
Lorenzo

> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index c2b2911feeef..7ca271b8d891 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -189,7 +189,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  
>  	init_irq_alloc_info(&info, NULL);
>  	info.type = X86_IRQ_ALLOC_TYPE_MSI;
> -	info.msi_dev = dev;
> +	info.msi_dev = pci_real_dma_dev(dev);
>  
>  	domain = irq_remapping_get_irq_domain(&info);
>  	if (domain == NULL)
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  2020-09-07 14:32   ` Lorenzo Pieralisi
@ 2020-09-28 11:32     ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2020-09-28 11:32 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Jon Derrick
  Cc: linux-pci, Bjorn Helgaas, Christoph Hellwig, Andrzej Jakowski,
	Sushma Kalakota, linux-kernel, x86, Andy Shevchenko

On Mon, Sep 07 2020 at 15:32, Lorenzo Pieralisi wrote:

> On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
>> VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
>> In order to support direct interrupt remapping of VMD child devices,
>> ensure that the IRTE is programmed with the VMD endpoint's requester-id
>> using pci_real_dma_dev().
>> 
>> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
>> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
>> ---
>>  arch/x86/kernel/apic/msi.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> I'd need an x86 maintainer ACK on this patch.

That conflicts with the big PCI/MSI overhaul which is pending in

  git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/irq

native_setup_msi_irqs() does not exist anymore.

patch 3 has conflicts as well.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 6/6] PCI: vmd: Disable MSI/X remapping when possible
  2020-07-28 19:49 ` [PATCH 6/6] PCI: vmd: Disable MSI/X remapping when possible Jon Derrick
@ 2020-10-20 20:23   ` Bjorn Helgaas
  0 siblings, 0 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2020-10-20 20:23 UTC (permalink / raw)
  To: Jon Derrick
  Cc: linux-pci, Lorenzo Pieralisi, Christoph Hellwig,
	Andrzej Jakowski, Sushma Kalakota, linux-kernel, x86,
	Andy Shevchenko

On Tue, Jul 28, 2020 at 01:49:45PM -0600, Jon Derrick wrote:
> VMD will retransmit child device MSI/X using its own MSI/X table and
> requester-id. This limits the number of MSI/X available to the whole
> child device domain to the number of VMD MSI/X interrupts. Some VMD
> devices have a mode where this remapping can be disabled, allowing child
> device interrupts to bypass processing with the VMD MSI/X domain
> interrupt handler and going straight the child device interrupt handler,
> allowing for better performance and scaling. The requester-id still gets
> changed to the VMD endpoint's requester-id, and the interrupt remapping
> handlers have been updated to properly set IRTE for child device
> interrupts to the VMD endpoint's context.
> 
> Some VMD platforms have existing production BIOS which rely on MSI/X
> remapping and won't explicitly program the MSI/X remapping bit. This
> re-enables MSI/X remapping on unload.
> 
> Disabling MSI/X remapping is only available for Icelake Server and
> client VMD products.

I'm trying to button up the PCI pull request.  I'm updating Lorenzo's
pci/vmd branch for other reasons, and I notice it still contains this
commit: 67b219dc3a6d ("PCI: vmd: Disable MSI/X remapping when
possible").

I want to remove "MSI/X" from the commit log and comments of that
commit because I don't know what "MSI/X" means.

Should I replace it with "MSI" or "MSI-X" or "MSI/MSI-X"?  Something
else?  When possible, I want to use exactly the same terminology as
the PCIe spec to avoid ambiguity.

It talks about "MSI/X table", which makes me think it should be
"MSI-X", since plain MSI does not have a table.

> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> ---
>  drivers/pci/controller/vmd.c | 58 +++++++++++++++++++++++++++++++-----
>  1 file changed, 50 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 3214d785fa5d..e8cde2c390b9 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -53,6 +53,12 @@ enum vmd_features {
>  	 * vendor-specific capability space
>  	 */
>  	VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP	= (1 << 2),
> +
> +	/*
> +	 * Device remaps MSI/X transactions into its MSI/X table and requires
> +	 * VMD MSI domain for child device interrupt handling
> +	 */
> +	VMD_FEAT_REMAPS_MSI			= (1 << 3),
>  };
>  
>  /*
> @@ -298,6 +304,15 @@ static struct msi_domain_info vmd_msi_domain_info = {
>  	.chip		= &vmd_msi_controller,
>  };
>  
> +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable)
> +{
> +	u16 reg;
> +
> +	pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, &reg);
> +	reg = enable ? (reg & ~0x2) : (reg | 0x2);
> +	pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
> +}
> +
>  static int vmd_create_irq_domain(struct vmd_dev *vmd)
>  {
>  	struct fwnode_handle *fn;
> @@ -318,6 +333,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd)
>  
>  static void vmd_remove_irq_domain(struct vmd_dev *vmd)
>  {
> +	/*
> +	 * Some production BIOS won't enable remapping between soft reboots.
> +	 * Ensure remapping is restored before unloading the driver
> +	 */
> +	if (!vmd->msix_count)
> +		vmd_enable_msi_remapping(vmd, true);
> +
>  	if (vmd->irq_domain) {
>  		struct fwnode_handle *fn = vmd->irq_domain->fwnode;
>  
> @@ -606,6 +628,27 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  			return ret;
>  	}
>  
> +	/*
> +	 * Currently MSI remapping must be enabled in guest passthrough mode
> +	 * due to some missing interrupt remapping plumbing. This is probably
> +	 * acceptable because the guest is usually CPU-limited and MSI
> +	 * remapping doesn't become a performance bottleneck.
> +	 */
> +	if (features & VMD_FEAT_REMAPS_MSI || offset[0] || offset[1]) {
> +		ret = vmd_alloc_irqs(vmd);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/*
> +	 * Disable remapping for performance if possible based on if VMD IRQs
> +	 * had been allocated.
> +	 */
> +	if (vmd->msix_count)
> +		vmd_enable_msi_remapping(vmd, true);
> +	else
> +		vmd_enable_msi_remapping(vmd, false);
> +
>  	/*
>  	 * Certain VMD devices may have a root port configuration option which
>  	 * limits the bus range to between 0-127, 128-255, or 224-255
> @@ -674,9 +717,11 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  
>  	sd->node = pcibus_to_node(vmd->dev->bus);
>  
> -	ret = vmd_create_irq_domain(vmd);
> -	if (ret)
> -		return ret;
> +	if (vmd->msix_count) {
> +		ret = vmd_create_irq_domain(vmd);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	pci_add_resource(&resources, &vmd->resources[0]);
>  	pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
> @@ -738,10 +783,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  	    dma_set_mask_and_coherent(&dev->dev, DMA_BIT_MASK(32)))
>  		return -ENODEV;
>  
> -	err = vmd_alloc_irqs(vmd);
> -	if (err)
> -		return err;
> -
>  	spin_lock_init(&vmd->cfg_lock);
>  	pci_set_drvdata(dev, vmd);
>  	err = vmd_enable_domain(vmd, (unsigned long) id->driver_data);
> @@ -809,7 +850,8 @@ static SIMPLE_DEV_PM_OPS(vmd_dev_pm_ops, vmd_suspend, vmd_resume);
>  
>  static const struct pci_device_id vmd_ids[] = {
>  	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_201D),
> -		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP,},
> +		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
> +				VMD_FEAT_REMAPS_MSI,},
>  	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_28C0),
>  		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW |
>  				VMD_FEAT_HAS_BUS_RESTRICTIONS,},
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  2020-07-28 19:49 ` [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE Jon Derrick
  2020-09-07 14:32   ` Lorenzo Pieralisi
@ 2020-10-20 20:26   ` Bjorn Helgaas
  2020-10-21  1:20     ` Derrick, Jonathan
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2020-10-20 20:26 UTC (permalink / raw)
  To: Jon Derrick
  Cc: linux-pci, Lorenzo Pieralisi, Christoph Hellwig,
	Andrzej Jakowski, Sushma Kalakota, linux-kernel, x86,
	Andy Shevchenko

On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
> VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
> In order to support direct interrupt remapping of VMD child devices,
> ensure that the IRTE is programmed with the VMD endpoint's requester-id
> using pci_real_dma_dev().
> 
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>

As Thomas (and Stephen) pointed out, this conflicts with 7ca435cf857d
("x86/irq: Cleanup the arch_*_msi_irqs() leftovers"), which removes
native_setup_msi_irqs().

Stephen resolved the conflict by dropping this hunk.  I would rather
just drop this patch completely from the PCI tree.  If I keep the
patch, (1) Linus will have to resolve the conflict, and worse, (2)
it's not clear what happened to the use of pci_real_dma_dev() here.
It will just vanish into the ether with no explanation other than
"this function was removed."

Is dropping this patch the correct thing to do?  Or do you need to add
pci_real_dma_dev() elsewhere to compensate?

> ---
>  arch/x86/kernel/apic/msi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> index c2b2911feeef..7ca271b8d891 100644
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -189,7 +189,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  
>  	init_irq_alloc_info(&info, NULL);
>  	info.type = X86_IRQ_ALLOC_TYPE_MSI;
> -	info.msi_dev = dev;
> +	info.msi_dev = pci_real_dma_dev(dev);
>  
>  	domain = irq_remapping_get_irq_domain(&info);
>  	if (domain == NULL)
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  2020-10-20 20:26   ` Bjorn Helgaas
@ 2020-10-21  1:20     ` Derrick, Jonathan
  2020-10-21  2:21       ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick, Jonathan @ 2020-10-21  1:20 UTC (permalink / raw)
  To: helgaas
  Cc: hch, x86, Shevchenko, Andriy, lorenzo.pieralisi,
	andrzej.jakowski, linux-kernel, linux-pci, Kalakota, SushmaX

On Tue, 2020-10-20 at 15:26 -0500, Bjorn Helgaas wrote:
> On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
> > VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
> > In order to support direct interrupt remapping of VMD child devices,
> > ensure that the IRTE is programmed with the VMD endpoint's requester-id
> > using pci_real_dma_dev().
> > 
> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
> > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> 
> As Thomas (and Stephen) pointed out, this conflicts with 7ca435cf857d
> ("x86/irq: Cleanup the arch_*_msi_irqs() leftovers"), which removes
> native_setup_msi_irqs().
> 
> Stephen resolved the conflict by dropping this hunk.  I would rather
> just drop this patch completely from the PCI tree.  If I keep the
> patch, (1) Linus will have to resolve the conflict, and worse, (2)
> it's not clear what happened to the use of pci_real_dma_dev() here.
> It will just vanish into the ether with no explanation other than
> "this function was removed."
> 
> Is dropping this patch the correct thing to do?  Or do you need to add
> pci_real_dma_dev() elsewhere to compensate?
It would still need the pci_real_dma_dev() for IRTE programming.

I think at this point I would rather see 5+6 dropped and this included
for TGL enablement:
https://patchwork.kernel.org/project/linux-pci/patch/20200914190128.5114-1-jonathan.derrick@intel.com/

> 
> > ---
> >  arch/x86/kernel/apic/msi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
> > index c2b2911feeef..7ca271b8d891 100644
> > --- a/arch/x86/kernel/apic/msi.c
> > +++ b/arch/x86/kernel/apic/msi.c
> > @@ -189,7 +189,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> >  
> >  	init_irq_alloc_info(&info, NULL);
> >  	info.type = X86_IRQ_ALLOC_TYPE_MSI;
> > -	info.msi_dev = dev;
> > +	info.msi_dev = pci_real_dma_dev(dev);
> >  
> >  	domain = irq_remapping_get_irq_domain(&info);
> >  	if (domain == NULL)
> > -- 
> > 2.27.0
> > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  2020-10-21  1:20     ` Derrick, Jonathan
@ 2020-10-21  2:21       ` Bjorn Helgaas
  2020-10-21 19:55         ` Derrick, Jonathan
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2020-10-21  2:21 UTC (permalink / raw)
  To: Derrick, Jonathan
  Cc: hch, x86, Shevchenko, Andriy, lorenzo.pieralisi,
	andrzej.jakowski, linux-kernel, linux-pci, Kalakota, SushmaX

On Wed, Oct 21, 2020 at 01:20:24AM +0000, Derrick, Jonathan wrote:
> On Tue, 2020-10-20 at 15:26 -0500, Bjorn Helgaas wrote:
> > On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
> > > VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
> > > In order to support direct interrupt remapping of VMD child devices,
> > > ensure that the IRTE is programmed with the VMD endpoint's requester-id
> > > using pci_real_dma_dev().
> > > 
> > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
> > > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> > 
> > As Thomas (and Stephen) pointed out, this conflicts with 7ca435cf857d
> > ("x86/irq: Cleanup the arch_*_msi_irqs() leftovers"), which removes
> > native_setup_msi_irqs().
> > 
> > Stephen resolved the conflict by dropping this hunk.  I would rather
> > just drop this patch completely from the PCI tree.  If I keep the
> > patch, (1) Linus will have to resolve the conflict, and worse, (2)
> > it's not clear what happened to the use of pci_real_dma_dev() here.
> > It will just vanish into the ether with no explanation other than
> > "this function was removed."
> > 
> > Is dropping this patch the correct thing to do?  Or do you need to add
> > pci_real_dma_dev() elsewhere to compensate?
>
> It would still need the pci_real_dma_dev() for IRTE programming.
> 
> I think at this point I would rather see 5+6 dropped and this included
> for TGL enablement:
> https://patchwork.kernel.org/project/linux-pci/patch/20200914190128.5114-1-jonathan.derrick@intel.com/

It's too late to add new things for v5.10.  I'll drop 5 and I'll be
happy to drop 6, too, if you want.  I have several comments/questions
on 6 anyway that I haven't finished writing up.

But if you'd rather have 1-4 + 6 in v5.10 instead of just 1-4, let me
know.

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE
  2020-10-21  2:21       ` Bjorn Helgaas
@ 2020-10-21 19:55         ` Derrick, Jonathan
  0 siblings, 0 replies; 15+ messages in thread
From: Derrick, Jonathan @ 2020-10-21 19:55 UTC (permalink / raw)
  To: helgaas
  Cc: hch, x86, Shevchenko, Andriy, lorenzo.pieralisi,
	andrzej.jakowski, linux-kernel, linux-pci, Kalakota, SushmaX

On Tue, 2020-10-20 at 21:21 -0500, Bjorn Helgaas wrote:
> On Wed, Oct 21, 2020 at 01:20:24AM +0000, Derrick, Jonathan wrote:
> > On Tue, 2020-10-20 at 15:26 -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 28, 2020 at 01:49:44PM -0600, Jon Derrick wrote:
> > > > VMD retransmits child device MSI/X with the VMD endpoint's requester-id.
> > > > In order to support direct interrupt remapping of VMD child devices,
> > > > ensure that the IRTE is programmed with the VMD endpoint's requester-id
> > > > using pci_real_dma_dev().
> > > > 
> > > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
> > > > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> > > 
> > > As Thomas (and Stephen) pointed out, this conflicts with 7ca435cf857d
> > > ("x86/irq: Cleanup the arch_*_msi_irqs() leftovers"), which removes
> > > native_setup_msi_irqs().
> > > 
> > > Stephen resolved the conflict by dropping this hunk.  I would rather
> > > just drop this patch completely from the PCI tree.  If I keep the
> > > patch, (1) Linus will have to resolve the conflict, and worse, (2)
> > > it's not clear what happened to the use of pci_real_dma_dev() here.
> > > It will just vanish into the ether with no explanation other than
> > > "this function was removed."
> > > 
> > > Is dropping this patch the correct thing to do?  Or do you need to add
> > > pci_real_dma_dev() elsewhere to compensate?
> > 
> > It would still need the pci_real_dma_dev() for IRTE programming.
> > 
> > I think at this point I would rather see 5+6 dropped and this included
> > for TGL enablement:
> > https://patchwork.kernel.org/project/linux-pci/patch/20200914190128.5114-1-jonathan.derrick@intel.com/
> 
> It's too late to add new things for v5.10.  I'll drop 5 and I'll be
> happy to drop 6, too, if you want.  I have several comments/questions
> on 6 anyway that I haven't finished writing up.
> 
> But if you'd rather have 1-4 + 6 in v5.10 instead of just 1-4, let me
> know.
> 
> Bjorn

Here's the proposed new location for patch 5 for pci_real_dma_dev(),
but I can't test this at the moment:

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6313f0a05db7..707968b234e9 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -194,6 +194,7 @@ int pci_msi_prepare(struct irq_domain *domain,
struct device *dev, int nvec,
                arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
                arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
        }
+       arg->devid = pci_real_dma_dev(pdev);
 
        return 0;
 }
-- 
2.18.1


Otherwise I would want to drop 5 & 6 because 6 will likely break VMD
without patch 5 when IO APIC is in use

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-10-21 19:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-28 19:49 [PATCH 0/6] VMD MSI Remapping Bypass Jon Derrick
2020-07-28 19:49 ` [PATCH 1/6] PCI: vmd: Create physical offset helper Jon Derrick
2020-07-28 19:49 ` [PATCH 2/6] PCI: vmd: Create bus offset configuration helper Jon Derrick
2020-07-28 19:49 ` [PATCH 3/6] PCI: vmd: Create IRQ Domain " Jon Derrick
2020-07-28 19:49 ` [PATCH 4/6] PCI: vmd: Create IRQ allocation helper Jon Derrick
2020-07-28 19:49 ` [PATCH 5/6] x86/apic/msi: Use Real PCI DMA device when configuring IRTE Jon Derrick
2020-09-07 14:32   ` Lorenzo Pieralisi
2020-09-28 11:32     ` Thomas Gleixner
2020-10-20 20:26   ` Bjorn Helgaas
2020-10-21  1:20     ` Derrick, Jonathan
2020-10-21  2:21       ` Bjorn Helgaas
2020-10-21 19:55         ` Derrick, Jonathan
2020-07-28 19:49 ` [PATCH 6/6] PCI: vmd: Disable MSI/X remapping when possible Jon Derrick
2020-10-20 20:23   ` Bjorn Helgaas
2020-08-17 16:14 ` [PATCH 0/6] VMD MSI Remapping Bypass Derrick, Jonathan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).