* [PATCH 00/31] powerpc: Modernize the PCI/MSI support
@ 2021-04-30  8:03 Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 01/31] powerpc/pseries/pci: Introduce __find_pe_total_msi() Cédric Le Goater
                   ` (30 more replies)
  0 siblings, 31 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

Hello,

This series adds support for MSI IRQ domains on top of the XICS (P8)
and XIVE (P9/P10) IRQ domains for the PowerNV (baremetal) and pSeries
(VM) platforms. It should greatly improve the IRQ affinity of PCI MSIs
on these PowerPC platforms. Data locality could be improved further
with a machine IRQ domain per chip, but this requires FW changes.

The patchset has a large impact but it is well contained within the MSI
support. Initial tests were done on the P8, P9 and P10 PowerNV and
pSeries platforms, under the KVM and PowerVM hypervisors. PCI passthrough
was tested on P8/KVM, P9/KVM and P9/pVM.

P8 passthrough adds an optimization to EOI MSIs when in real mode,
but I didn't see any performance improvement with a passthrough 10G
Ethernet adapter. If someone has faster adapters, I would be interested
in the results.

The P8/CAPI driver is also impacted. Tests were done on a Firestone
system with a memory AFU.

Thanks,

C.

Cédric Le Goater (31):
  powerpc/pseries/pci: Introduce __find_pe_total_msi()
  powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs()
  powerpc/xive: Add support for IRQ domain hierarchy
  powerpc/xive: Ease debugging of xive_irq_set_affinity()
  powerpc/pseries/pci: Add MSI domains
  powerpc/xive: Drop unmask of MSIs at startup
  powerpc/xive: Fix xive_irq_set_affinity for MSI
  powerpc/pseries/pci: Add a domain_free_irqs handler
  powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data
  powerpc/pseries/pci: Add support of MSI domains to PHB hotplug
  powerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup()
  powerpc/powernv/pci: Add MSI domains
  KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough
    interrupts
  KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt
    routines
  KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts
  powerpc/xics: Remove ICS list
  powerpc/xics: Rename the map handler in a check handler
  powerpc/xics: Give a name to the default XICS IRQ domain
  powerpc/xics: Add debug logging to the set_irq_affinity handlers
  powerpc/xics: Add support for IRQ domain hierarchy
  powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3
  powerpc/pci: Drop XIVE restriction on MSI domains
  powerpc/xics: Drop unmask of MSIs at startup
  powerpc/pseries/pci: Drop unused MSI code
  powerpc/powernv/pci: Drop unused MSI code
  powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough
    interrupt
  powerpc/xics: Fix IRQ migration
  powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices
  powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()
  KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts
  genirq: Improve "hwirq" output in /proc and /sys/

 arch/powerpc/include/asm/kvm_ppc.h         |   4 +-
 arch/powerpc/include/asm/pci-bridge.h      |   5 +
 arch/powerpc/include/asm/pnv-pci.h         |   2 +-
 arch/powerpc/include/asm/xics.h            |   3 +-
 arch/powerpc/include/asm/xive.h            |   1 +
 arch/powerpc/platforms/powernv/pci.h       |   6 -
 arch/powerpc/platforms/pseries/pseries.h   |   2 +
 arch/powerpc/kernel/pci-common.c           |   6 +
 arch/powerpc/kvm/book3s_hv.c               |  18 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c       |   8 +-
 arch/powerpc/kvm/book3s_xive.c             |  18 +-
 arch/powerpc/platforms/powernv/pci-ioda.c  | 258 ++++++++++++++++--
 arch/powerpc/platforms/powernv/pci.c       |  67 -----
 arch/powerpc/platforms/pseries/msi.c       | 296 ++++++++++++++++-----
 arch/powerpc/platforms/pseries/pci_dlpar.c |   4 +
 arch/powerpc/platforms/pseries/setup.c     |   2 +
 arch/powerpc/sysdev/xics/ics-opal.c        |  40 +--
 arch/powerpc/sysdev/xics/ics-rtas.c        |  40 +--
 arch/powerpc/sysdev/xics/xics-common.c     | 125 ++++++---
 arch/powerpc/sysdev/xive/common.c          |  81 +++++-
 kernel/irq/irqdesc.c                       |   2 +-
 kernel/irq/irqdomain.c                     |   1 +
 kernel/irq/proc.c                          |   2 +-
 23 files changed, 693 insertions(+), 298 deletions(-)

-- 
2.26.3



* [PATCH 01/31] powerpc/pseries/pci: Introduce __find_pe_total_msi()
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 02/31] powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs() Cédric Le Goater
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

Introduce a variant of find_pe_total_msi() that takes a device node
instead of a PCI device. It will help to size the PCI MSI domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
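As an illustration only: the lookup walks from the given node towards
the root and stops at the first node carrying "ibm,pe-total-#msi". Below
is a minimal userspace sketch of that walk. struct node, its fields and
find_total_msi() are hypothetical stand-ins, not the kernel's
struct device_node or the OF helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree node. */
struct node {
	struct node *parent;
	bool has_total;		/* models presence of "ibm,pe-total-#msi" */
	int total;		/* models the property value */
};

/*
 * Walk from 'n' towards the root and return the first node carrying
 * the property, storing its value in *total. Returns NULL, leaving
 * *total untouched, when no node on the path has it.
 */
static struct node *find_total_msi(struct node *n, int *total)
{
	assert(total != NULL);

	for (; n; n = n->parent) {
		if (n->has_total) {
			*total = n->total;
			return n;
		}
	}
	return NULL;
}
```

Starting from a node rather than from a PCI device is what lets the same
walk be used for a bridge, where no pci_dev is at hand.
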
 arch/powerpc/platforms/pseries/msi.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 637300330507..d2d090e04745 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -164,12 +164,12 @@ static int check_req_msix(struct pci_dev *pdev, int nvec)
 
 /* Quota calculation */
 
-static struct device_node *find_pe_total_msi(struct pci_dev *dev, int *total)
+static struct device_node *__find_pe_total_msi(struct device_node *node, int *total)
 {
 	struct device_node *dn;
 	const __be32 *p;
 
-	dn = of_node_get(pci_device_to_OF_node(dev));
+	dn = of_node_get(node);
 	while (dn) {
 		p = of_get_property(dn, "ibm,pe-total-#msi", NULL);
 		if (p) {
@@ -185,6 +185,11 @@ static struct device_node *find_pe_total_msi(struct pci_dev *dev, int *total)
 	return NULL;
 }
 
+static struct device_node *find_pe_total_msi(struct pci_dev *dev, int *total)
+{
+	return __find_pe_total_msi(pci_device_to_OF_node(dev), total);
+}
+
 static struct device_node *find_pe_dn(struct pci_dev *dev, int *total)
 {
 	struct device_node *dn;
-- 
2.26.3



* [PATCH 02/31] powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs()
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 01/31] powerpc/pseries/pci: Introduce __find_pe_total_msi() Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 03/31] powerpc/xive: Add support for IRQ domain hierarchy Cédric Le Goater
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

This splits the routine setting the MSIs in two parts: allocation of
MSIs for the PCI device at the FW level (RTAS) and the actual mapping
and activation of the IRQs.

rtas_prepare_msi_irqs() will serve as a handler for the MSI domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/pseries/msi.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index d2d090e04745..4bf14f27e1aa 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -373,12 +373,11 @@ static void rtas_hack_32bit_msi_gen2(struct pci_dev *pdev)
 	pci_write_config_dword(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI, 0);
 }
 
-static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
+static int rtas_prepare_msi_irqs(struct pci_dev *pdev, int nvec_in, int type,
+				 msi_alloc_info_t *arg)
 {
 	struct pci_dn *pdn;
-	int hwirq, virq, i, quota, rc;
-	struct msi_desc *entry;
-	struct msi_msg msg;
+	int quota, rc;
 	int nvec = nvec_in;
 	int use_32bit_msi_hack = 0;
 
@@ -456,6 +455,22 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 		return rc;
 	}
 
+	return 0;
+}
+
+static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
+{
+	struct pci_dn *pdn;
+	int hwirq, virq, i;
+	int rc;
+	struct msi_desc *entry;
+	struct msi_msg msg;
+
+	rc = rtas_prepare_msi_irqs(pdev, nvec_in, type, NULL);
+	if (rc)
+		return rc;
+
+	pdn = pci_get_pdn(pdev);
 	i = 0;
 	for_each_pci_msi_entry(entry, pdev) {
 		hwirq = rtas_query_irq_number(pdn, i++);
-- 
2.26.3



* [PATCH 03/31] powerpc/xive: Add support for IRQ domain hierarchy
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 01/31] powerpc/pseries/pci: Introduce __find_pe_total_msi() Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 02/31] powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs() Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 04/31] powerpc/xive: Ease debugging of xive_irq_set_affinity() Cédric Le Goater
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

This adds handlers to allocate/free IRQs in a domain hierarchy. We
could try to use xive_irq_domain_map() in xive_irq_domain_alloc() but
we rely on xive_irq_alloc_data() to set the IRQ handler data and
duplicating the code is simpler.

xive_irq_free_data() needs to be called when IRQs are freed to clear
the MMIO mappings and free the XIVE handler data (the xive_irq_data
structure). This is going to be a problem with MSI domains, which we
will address later.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xive/common.c | 60 +++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 5acd76403ee7..6ad26243bc33 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1372,7 +1372,67 @@ static void xive_irq_domain_debug_show(struct seq_file *m, struct irq_domain *d,
 }
 #endif
 
+static int xive_irq_domain_translate(struct irq_domain *d,
+				     struct irq_fwspec *fwspec,
+				     unsigned long *hwirq,
+				     unsigned int *type)
+{
+	return xive_irq_domain_xlate(d, to_of_node(fwspec->fwnode),
+				     fwspec->param, fwspec->param_count,
+				     hwirq, type);
+}
+
+static int xive_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+				 unsigned int nr_irqs, void *arg)
+{
+	struct irq_fwspec *fwspec = arg;
+	irq_hw_number_t hwirq;
+	unsigned int type = IRQ_TYPE_NONE;
+	int i, rc;
+
+	rc = xive_irq_domain_translate(domain, fwspec, &hwirq, &type);
+	if (rc)
+		return rc;
+
+	pr_debug("%s %d/%lx #%d\n", __func__, virq, hwirq, nr_irqs);
+
+	for (i = 0; i < nr_irqs; i++) {
+		/* TODO: call xive_irq_domain_map() */
+
+		/*
+		 * Mark interrupts as edge sensitive by default so that resend
+		 * actually works. Will fix that up below if needed.
+		 */
+		irq_clear_status_flags(virq + i, IRQ_LEVEL);
+
+		/* allocates and sets handler data */
+		rc = xive_irq_alloc_data(virq + i, hwirq + i);
+		if (rc)
+			return rc;
+
+		irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
+					      &xive_irq_chip, domain->host_data);
+		irq_set_handler(virq + i, handle_fasteoi_irq);
+	}
+
+	return 0;
+}
+
+static void xive_irq_domain_free(struct irq_domain *domain,
+				 unsigned int virq, unsigned int nr_irqs)
+{
+	int i;
+
+	pr_debug("%s %d #%d\n", __func__, virq, nr_irqs);
+
+	for (i = 0; i < nr_irqs; i++)
+		xive_irq_free_data(virq + i);
+}
+
 static const struct irq_domain_ops xive_irq_domain_ops = {
+	.alloc	= xive_irq_domain_alloc,
+	.free	= xive_irq_domain_free,
+	.translate = xive_irq_domain_translate,
 	.match = xive_irq_domain_match,
 	.map = xive_irq_domain_map,
 	.unmap = xive_irq_domain_unmap,
-- 
2.26.3



* [PATCH 04/31] powerpc/xive: Ease debugging of xive_irq_set_affinity()
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (2 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 03/31] powerpc/xive: Add support for IRQ domain hierarchy Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 05/31] powerpc/pseries/pci: Add MSI domains Cédric Le Goater
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

pr_debug() is easier to activate than pr_devel() and it helps to know
how the HW is configured when tweaking the IRQ subsystem.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xive/common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 6ad26243bc33..9cb7ae728b46 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -713,7 +713,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
 	u32 target, old_target;
 	int rc = 0;
 
-	pr_devel("xive_irq_set_affinity: irq %d\n", d->irq);
+	pr_debug("%s: irq %d/%x\n", __func__, d->irq, hw_irq);
 
 	/* Is this valid ? */
 	if (cpumask_any_and(cpumask, cpu_online_mask) >= nr_cpu_ids)
@@ -758,7 +758,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
 		return rc;
 	}
 
-	pr_devel("  target: 0x%x\n", target);
+	pr_debug("  target: 0x%x\n", target);
 	xd->target = target;
 
 	/* Give up previous target */
-- 
2.26.3



* [PATCH 05/31] powerpc/pseries/pci: Add MSI domains
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (3 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 04/31] powerpc/xive: Ease debugging of xive_irq_set_affinity() Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 06/31] powerpc/xive: Drop unmask of MSIs at startup Cédric Le Goater
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Cédric Le Goater

Two IRQ domains are added on top of the default machine IRQ domain.

First, the top level "PCI-MSI" domain deals with the MSI specifics.
In this domain, the HW IRQ numbers are generated by the PCI MSI layer.
They compose a unique ID for an MSI source from the PCI device
identifier and the MSI vector number.

These numbers can be quite large on a pSeries machine running under
the IBM Hypervisor and /sys/kernel/irq/ and /proc/interrupts will
require small fixes to show them correctly.

Then, the in-the-middle "Pseries-MSI" domain acts as a proxy between
the PCI MSI subsystem and the machine IRQ subsystem. Such a domain
usually handles the MSI allocator, but on pSeries machines this is done
by the RTAS FW. RTAS returns IRQ numbers in the IRQ number space of the
machine. This is why the in-the-middle domain has the same HW IRQ
numbers as its parent domain.

Only the XIVE (P9/P10) parent domain is supported for now. We still
need to add support for IRQ domain hierarchy under XICS.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
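As an illustration only: a sketch of how a "PCI-MSI" HW IRQ number can
grow past 32 bits once the PCI domain number is folded into it. The bit
layout (vector index in the low 11 bits, device ID above it, PCI domain
number from bit 27) is an assumption modelled on the generic PCI/MSI
layer, and msi_hwirq() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack the PCI domain number, the device ID
 * (bus/devfn) and the MSI vector index into a single identifier.
 * The vector index is assumed to fit in 11 bits.
 */
static uint64_t msi_hwirq(uint32_t pci_domain, uint16_t dev_id,
			  uint16_t vector)
{
	return (uint64_t)vector |
	       ((uint64_t)dev_id << 11) |
	       ((uint64_t)pci_domain << 27);
}
```

With a small PCI domain number the result stays within 32 bits. A large
domain number pushes it beyond, which is why the "hwirq" output in
/proc and /sys needs attention.
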
 arch/powerpc/include/asm/pci-bridge.h    |   5 +
 arch/powerpc/platforms/pseries/pseries.h |   1 +
 arch/powerpc/kernel/pci-common.c         |   6 +
 arch/powerpc/platforms/pseries/msi.c     | 185 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/setup.c   |   2 +
 5 files changed, 199 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index d2a2a14e56f9..fb35d340a739 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -127,6 +127,11 @@ struct pci_controller {
 
 	void *private_data;
 	struct npu *npu;
+
+	/* IRQ domain hierarchy */
+	struct irq_domain	*dev_domain;
+	struct irq_domain	*msi_domain;
+	struct fwnode_handle	*fwnode;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 4fe48c04c6c2..91cf2afcf423 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -87,6 +87,7 @@ struct pci_host_bridge;
 int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
 
 extern struct pci_controller_ops pseries_pci_controller_ops;
+int pseries_msi_allocate_domains(struct pci_controller *phb);
 
 unsigned long pseries_memory_block_size(void);
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 001e90cd8948..c3573430919d 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -29,6 +29,7 @@
 #include <linux/slab.h>
 #include <linux/vgaarb.h>
 #include <linux/numa.h>
+#include <linux/msi.h>
 
 #include <asm/processor.h>
 #include <asm/io.h>
@@ -1060,11 +1061,16 @@ void pcibios_bus_add_device(struct pci_dev *dev)
 
 int pcibios_add_device(struct pci_dev *dev)
 {
+	struct irq_domain *d;
+
 #ifdef CONFIG_PCI_IOV
 	if (ppc_md.pcibios_fixup_sriov)
 		ppc_md.pcibios_fixup_sriov(dev);
 #endif /* CONFIG_PCI_IOV */
 
+	d = dev_get_msi_domain(&dev->bus->dev);
+	if (d)
+		dev_set_msi_domain(&dev->dev, d);
 	return 0;
 }
 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 4bf14f27e1aa..a9bd1e991df5 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -13,6 +13,7 @@
 #include <asm/hw_irq.h>
 #include <asm/ppc-pci.h>
 #include <asm/machdep.h>
+#include <asm/xive.h>
 
 #include "pseries.h"
 
@@ -518,6 +519,190 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 	return 0;
 }
 
+static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev,
+				   int nvec, msi_alloc_info_t *arg)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct msi_desc *desc = first_pci_msi_entry(pdev);
+	int type = desc->msi_attrib.is_msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
+
+	return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
+}
+
+static struct msi_domain_ops pseries_pci_msi_domain_ops = {
+	.msi_prepare	= pseries_msi_ops_prepare,
+};
+
+static void pseries_msi_shutdown(struct irq_data *d)
+{
+	d = d->parent_data;
+	if (d->chip->irq_shutdown)
+		d->chip->irq_shutdown(d);
+}
+
+static void pseries_msi_mask(struct irq_data *d)
+{
+	pci_msi_mask_irq(d);
+	irq_chip_mask_parent(d);
+}
+
+static void pseries_msi_unmask(struct irq_data *d)
+{
+	pci_msi_unmask_irq(d);
+	irq_chip_unmask_parent(d);
+}
+
+static struct irq_chip pseries_pci_msi_irq_chip = {
+	.name		= "Pseries-PCI-MSI",
+	.irq_shutdown	= pseries_msi_shutdown,
+	.irq_mask	= pseries_msi_mask,
+	.irq_unmask	= pseries_msi_unmask,
+	.irq_eoi	= irq_chip_eoi_parent,
+};
+
+static struct msi_domain_info pseries_msi_domain_info = {
+	.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+		  MSI_FLAG_MULTI_PCI_MSI  | MSI_FLAG_PCI_MSIX),
+	.ops   = &pseries_pci_msi_domain_ops,
+	.chip  = &pseries_pci_msi_irq_chip,
+};
+
+static void pseries_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	__pci_read_msi_msg(irq_data_get_msi_desc(data), msg);
+}
+
+static struct irq_chip pseries_msi_irq_chip = {
+	.name			= "Pseries-MSI",
+	.irq_shutdown		= pseries_msi_shutdown,
+	.irq_mask		= irq_chip_mask_parent,
+	.irq_unmask		= irq_chip_unmask_parent,
+	.irq_eoi		= irq_chip_eoi_parent,
+	.irq_set_affinity	= irq_chip_set_affinity_parent,
+	.irq_compose_msi_msg	= pseries_msi_compose_msg,
+};
+
+static int pseries_irq_parent_domain_alloc(struct irq_domain *domain, unsigned int virq,
+					   irq_hw_number_t hwirq)
+{
+	struct irq_fwspec parent_fwspec;
+	int ret;
+
+	parent_fwspec.fwnode = domain->parent->fwnode;
+	parent_fwspec.param_count = 2;
+	parent_fwspec.param[0] = hwirq;
+	parent_fwspec.param[1] = IRQ_TYPE_EDGE_RISING;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, 1, &parent_fwspec);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int pseries_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs, void *arg)
+{
+	struct pci_controller *phb = domain->host_data;
+	msi_alloc_info_t *info = arg;
+	struct msi_desc *desc = info->desc;
+	struct pci_dev *pdev = msi_desc_to_pci_dev(desc);
+	int hwirq;
+	int i, ret;
+
+	hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->msi_attrib.entry_nr);
+	if (hwirq < 0) {
+		dev_err(&pdev->dev, "Failed to query HW IRQ: %d\n", hwirq);
+		return hwirq;
+	}
+
+	dev_dbg(&pdev->dev, "%s bridge %pOF %d/%x #%d\n", __func__,
+		phb->dn, virq, hwirq, nr_irqs);
+
+	for (i = 0; i < nr_irqs; i++) {
+		ret = pseries_irq_parent_domain_alloc(domain, virq + i, hwirq + i);
+		if (ret)
+			goto out;
+
+		irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
+					      &pseries_msi_irq_chip, domain->host_data);
+	}
+
+	return 0;
+
+out:
+	/* TODO: handle RTAS cleanup in ->msi_finish() ? */
+	irq_domain_free_irqs_parent(domain, virq, i);
+	return ret;
+}
+
+static void pseries_irq_domain_free(struct irq_domain *domain, unsigned int virq,
+				    unsigned int nr_irqs)
+{
+	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
+	struct pci_controller *phb = irq_data_get_irq_chip_data(d);
+
+	pr_debug("%s bridge %pOF %d #%d\n", __func__, phb->dn, virq, nr_irqs);
+
+	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+}
+
+static const struct irq_domain_ops pseries_irq_domain_ops = {
+	.alloc  = pseries_irq_domain_alloc,
+	.free   = pseries_irq_domain_free,
+};
+
+static int __pseries_msi_allocate_domains(struct pci_controller *phb,
+					  unsigned int count)
+{
+	struct irq_domain *parent = irq_get_default_host();
+
+	phb->fwnode = irq_domain_alloc_named_id_fwnode("Pseries-MSI",
+						       phb->global_number);
+	if (!phb->fwnode)
+		return -ENOMEM;
+
+	phb->dev_domain = irq_domain_create_hierarchy(parent, 0, count,
+						      phb->fwnode,
+						      &pseries_irq_domain_ops, phb);
+	if (!phb->dev_domain) {
+		pr_err("PCI: failed to create IRQ domain bridge %pOF (domain %d)\n",
+		       phb->dn, phb->global_number);
+		irq_domain_free_fwnode(phb->fwnode);
+		return -ENOMEM;
+	}
+
+	phb->msi_domain = pci_msi_create_irq_domain(of_node_to_fwnode(phb->dn),
+						    &pseries_msi_domain_info,
+						    phb->dev_domain);
+	if (!phb->msi_domain) {
+		pr_err("PCI: failed to create MSI IRQ domain bridge %pOF (domain %d)\n",
+		       phb->dn, phb->global_number);
+		irq_domain_free_fwnode(phb->fwnode);
+		irq_domain_remove(phb->dev_domain);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+int pseries_msi_allocate_domains(struct pci_controller *phb)
+{
+	int count;
+
+	/* Only supported by the XIVE driver */
+	if (!xive_enabled())
+		return -ENODEV;
+
+	if (!__find_pe_total_msi(phb->dn, &count)) {
+		pr_err("PCI: failed to find MSIs for bridge %pOF (domain %d)\n",
+		       phb->dn, phb->global_number);
+		return -ENOSPC;
+	}
+
+	return __pseries_msi_allocate_domains(phb, count);
+}
+
 static void rtas_msi_pci_irq_fixup(struct pci_dev *pdev)
 {
 	/* No LSI -> leave MSIs (if any) configured */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 46e1540abc22..178a50787a26 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -484,6 +484,8 @@ static void __init pSeries_discover_phbs(void)
 
 		/* create pci_dn's for DT nodes under this PHB */
 		pci_devs_phb_init_dynamic(phb);
+
+		pseries_msi_allocate_domains(phb);
 	}
 
 	of_node_put(root);
-- 
2.26.3



* [PATCH 06/31] powerpc/xive: Drop unmask of MSIs at startup
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (4 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 05/31] powerpc/pseries/pci: Add MSI domains Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI Cédric Le Goater
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

The unmask was a workaround in the XIVE domain for the lack of an MSI
domain. It is now handled by the MSI domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xive/common.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 9cb7ae728b46..96737938e8e3 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -616,16 +616,6 @@ static unsigned int xive_irq_startup(struct irq_data *d)
 	pr_devel("xive_irq_startup: irq %d [0x%x] data @%p\n",
 		 d->irq, hw_irq, d);
 
-#ifdef CONFIG_PCI_MSI
-	/*
-	 * The generic MSI code returns with the interrupt disabled on the
-	 * card, using the MSI mask bits. Firmware doesn't appear to unmask
-	 * at that level, so we do it here by hand.
-	 */
-	if (irq_data_get_msi_desc(d))
-		pci_msi_unmask_irq(d);
-#endif
-
 	/* Pick a target */
 	target = xive_pick_irq_target(d, irq_data_get_affinity_mask(d));
 	if (target == XIVE_INVALID_TARGET) {
-- 
2.26.3



* [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (5 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 06/31] powerpc/xive: Drop unmask of MSIs at startup Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-05-14 20:48   ` Thomas Gleixner
  2021-04-30  8:03 ` [PATCH 08/31] powerpc/pseries/pci: Add a domain_free_irqs handler Cédric Le Goater
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Cédric Le Goater

The MSI affinity is managed automatically and can be set before the
associated IRQ is started.

( Should we simply remove the irqd_is_started() test ? )

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
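As an illustration only: the new early-return test reduces to the
predicate below. should_program_affinity() is a hypothetical name used
for this sketch; the two flags stand for irqd_is_started() and
irqd_affinity_is_managed().

```c
#include <stdbool.h>

/*
 * Returns true when the affinity request should reach the HW.
 * Previously only started interrupts qualified; with this change,
 * managed interrupts qualify even before they are started.
 */
static bool should_program_affinity(bool started, bool managed)
{
	return started || managed;
}
```

Dropping the irqd_is_started() test altogether would make this
predicate always true.
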
 arch/powerpc/sysdev/xive/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 96737938e8e3..3485baf9ec8c 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -710,7 +710,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
 		return -EINVAL;
 
 	/* Don't do anything if the interrupt isn't started */
-	if (!irqd_is_started(d))
+	if (!irqd_is_started(d) && !irqd_affinity_is_managed(d))
 		return IRQ_SET_MASK_OK;
 
 	/*
-- 
2.26.3



* [PATCH 08/31] powerpc/pseries/pci: Add a domain_free_irqs handler
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (6 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 09/31] powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data Cédric Le Goater
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Cédric Le Goater

The RTAS firmware cannot disable one MSI at a time. It's all or
nothing. We need a custom free IRQ handler for that.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
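As an illustration only: a userspace model of the ordering used by the
handler. Every Linux-side IRQ is released first, then the firmware is
asked once to disable all MSIs of the device. struct fake_dev and the
helpers are hypothetical and only count what happens.

```c
/* Hypothetical model of a device with MSIs allocated. */
struct fake_dev {
	int nvec;		/* vectors still allocated on the Linux side */
	int fw_disable_calls;	/* number of "disable all" firmware calls */
};

/* Models freeing a single IRQ on the Linux side. */
static void free_one_irq(struct fake_dev *dev)
{
	if (dev->nvec > 0)
		dev->nvec--;
}

/* Models the all-or-nothing firmware call. */
static void fw_disable_all(struct fake_dev *dev)
{
	dev->fw_disable_calls++;
}

/* Free every IRQ first, then issue the firmware call exactly once. */
static void domain_free_irqs(struct fake_dev *dev)
{
	while (dev->nvec > 0)
		free_one_irq(dev);

	fw_disable_all(dev);
}
```
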
 arch/powerpc/platforms/pseries/msi.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index a9bd1e991df5..a41c448520d4 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -529,8 +529,24 @@ static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev
 	return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
 }
 
+/*
+ * RTAS can not disable one MSI at a time. It's all or nothing. Do it
+ * at the end after all IRQs have been freed.
+ */
+static void pseries_msi_domain_free_irqs(struct irq_domain *domain,
+					 struct device *dev)
+{
+	if (WARN_ON_ONCE(!dev_is_pci(dev)))
+		return;
+
+	__msi_domain_free_irqs(domain, dev);
+
+	rtas_disable_msi(to_pci_dev(dev));
+}
+
 static struct msi_domain_ops pseries_pci_msi_domain_ops = {
 	.msi_prepare	= pseries_msi_ops_prepare,
+	.domain_free_irqs = pseries_msi_domain_free_irqs,
 };
 
 static void pseries_msi_shutdown(struct irq_data *d)
-- 
2.26.3



* [PATCH 09/31] powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (7 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 08/31] powerpc/pseries/pci: Add a domain_free_irqs handler Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-05-20 12:33   ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 10/31] powerpc/pseries/pci: Add support of MSI domains to PHB hotplug Cédric Le Goater
                   ` (21 subsequent siblings)
  30 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Cédric Le Goater

The MSI domain clears the IRQ with msi_domain_free(), which calls
irq_domain_free_irqs_top(), which clears the handler data. This is a
problem for the XIVE controller since we need to unmap MMIO pages and
free a specific XIVE structure.

The 'msi_free()' handler is called before irq_domain_free_irqs_top()
when the handler data is still available. Use that to clear the XIVE
controller data.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
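As an illustration only: a userspace model of why the ordering matters.
The hook must run while the handler data pointer is still reachable,
because the generic teardown only forgets the pointer. struct fake_irq
and the functions below are hypothetical stand-ins for the kernel
structures and callbacks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an IRQ descriptor. */
struct fake_irq {
	void *handler_data;
};

static int freed_count;

/* Models ->msi_free(): runs first, while handler_data is still set. */
static void msi_free_hook(struct fake_irq *irq)
{
	if (irq->handler_data) {
		free(irq->handler_data);
		freed_count++;
	}
}

/* Models the generic teardown, which only clears the pointer. */
static void generic_clear(struct fake_irq *irq)
{
	irq->handler_data = NULL;
}

/* Hook first, generic clear second: nothing is leaked. */
static void teardown(struct fake_irq *irq)
{
	msi_free_hook(irq);
	generic_clear(irq);
}
```

Swapping the two calls in teardown() would leave the hook with a NULL
pointer and the allocation would never be released.
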
 arch/powerpc/include/asm/xive.h      |  1 +
 arch/powerpc/platforms/pseries/msi.c | 16 +++++++++++++++-
 arch/powerpc/sysdev/xive/common.c    |  5 ++++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index aa094a8655b0..20ae50ab083c 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -111,6 +111,7 @@ void xive_native_free_vp_block(u32 vp_base);
 int xive_native_populate_irq_data(u32 hw_irq,
 				  struct xive_irq_data *data);
 void xive_cleanup_irq_data(struct xive_irq_data *xd);
+void xive_irq_free_data(unsigned int virq);
 void xive_native_free_irq(u32 irq);
 int xive_native_configure_irq(u32 hw_irq, u32 target, u8 prio, u32 sw_irq);
 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index a41c448520d4..da9d63a088bb 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -529,6 +529,19 @@ static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev
 	return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
 }
 
+/*
+ * ->msi_free() is called before irq_domain_free_irqs_top() when the
+ * handler data is still available. Use that to clear the XIVE
+ * controller data.
+ */
+static void pseries_msi_ops_msi_free(struct irq_domain *domain,
+				     struct msi_domain_info *info,
+				     unsigned int irq)
+{
+	if (xive_enabled())
+		xive_irq_free_data(irq);
+}
+
 /*
  * RTAS can not disable one MSI at a time. It's all or nothing. Do it
  * at the end after all IRQs have been freed.
@@ -546,6 +559,7 @@ static void pseries_msi_domain_free_irqs(struct irq_domain *domain,
 
 static struct msi_domain_ops pseries_pci_msi_domain_ops = {
 	.msi_prepare	= pseries_msi_ops_prepare,
+	.msi_free	= pseries_msi_ops_msi_free,
 	.domain_free_irqs = pseries_msi_domain_free_irqs,
 };
 
@@ -660,7 +674,7 @@ static void pseries_irq_domain_free(struct irq_domain *domain, unsigned int virq
 
 	pr_debug("%s bridge %pOF %d #%d\n", __func__, phb->dn, virq, nr_irqs);
 
-	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+	/* XIVE domain data is cleared through ->msi_free() */
 }
 
 static const struct irq_domain_ops pseries_irq_domain_ops = {
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 3485baf9ec8c..191cd80ec534 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -980,6 +980,8 @@ EXPORT_SYMBOL_GPL(is_xive_irq);
 
 void xive_cleanup_irq_data(struct xive_irq_data *xd)
 {
+	pr_debug("%s for HW %x\n", __func__, xd->hw_irq);
+
 	if (xd->eoi_mmio) {
 		unmap_kernel_range((unsigned long)xd->eoi_mmio,
 				   1u << xd->esb_shift);
@@ -1025,7 +1027,7 @@ static int xive_irq_alloc_data(unsigned int virq, irq_hw_number_t hw)
 	return 0;
 }
 
-static void xive_irq_free_data(unsigned int virq)
+void xive_irq_free_data(unsigned int virq)
 {
 	struct xive_irq_data *xd = irq_get_handler_data(virq);
 
@@ -1035,6 +1037,7 @@ static void xive_irq_free_data(unsigned int virq)
 	xive_cleanup_irq_data(xd);
 	kfree(xd);
 }
+EXPORT_SYMBOL_GPL(xive_irq_free_data);
 
 #ifdef CONFIG_SMP
 
-- 
2.26.3



* [PATCH 10/31] powerpc/pseries/pci: Add support of MSI domains to PHB hotplug
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (8 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 09/31] powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 11/31] powerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup() Cédric Le Goater
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

Simply allocate or release the MSI domains when a PHB is inserted in
or removed from the machine.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/pseries/pseries.h   |  1 +
 arch/powerpc/platforms/pseries/msi.c       | 10 ++++++++++
 arch/powerpc/platforms/pseries/pci_dlpar.c |  4 ++++
 3 files changed, 15 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 91cf2afcf423..57bf4c2091e1 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -88,6 +88,7 @@ int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
 
 extern struct pci_controller_ops pseries_pci_controller_ops;
 int pseries_msi_allocate_domains(struct pci_controller *phb);
+void pseries_msi_free_domains(struct pci_controller *phb);
 
 unsigned long pseries_memory_block_size(void);
 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index da9d63a088bb..d1470941cadf 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -733,6 +733,16 @@ int pseries_msi_allocate_domains(struct pci_controller *phb)
 	return __pseries_msi_allocate_domains(phb, count);
 }
 
+void pseries_msi_free_domains(struct pci_controller *phb)
+{
+	if (phb->msi_domain)
+		irq_domain_remove(phb->msi_domain);
+	if (phb->dev_domain)
+		irq_domain_remove(phb->dev_domain);
+	if (phb->fwnode)
+		irq_domain_free_fwnode(phb->fwnode);
+}
+
 static void rtas_msi_pci_irq_fixup(struct pci_dev *pdev)
 {
 	/* No LSI -> leave MSIs (if any) configured */
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index f9ae17e8a0f4..cf8a2e7a0f2c 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -33,6 +33,8 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
 
 	pci_devs_phb_init_dynamic(phb);
 
+	pseries_msi_allocate_domains(phb);
+
 	/* Create EEH devices for the PHB */
 	eeh_phb_pe_create(phb);
 
@@ -73,6 +75,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
 		}
 	}
 
+	pseries_msi_free_domains(phb);
+
 	/* Remove the PCI bus and unregister the bridge device from sysfs */
 	phb->bus = NULL;
 	pci_remove_bus(b);
-- 
2.26.3



* [PATCH 11/31] powerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup()
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (9 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 10/31] powerpc/pseries/pci: Add support of MSI domains to PHB hotplug Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 12/31] powerpc/powernv/pci: Add MSI domains Cédric Le Goater
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

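Factor the MSI message setup out of pnv_pci_ioda_msi_setup() into a
new helper, __pnv_pci_ioda_msi_setup(). The helper will be called from
the 'compose_msg' handler of the MSI domain introduced in a later
patch.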

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
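Illustration only, not part of the commit: the shape of the split can
be sketched in user space. The inner helper takes the PHB-relative
vector number, and the legacy entry point converts a global hwirq
first. All names below (model_phb, model_setup_vector(),
model_setup_hwirq()) are hypothetical, and the range check against
msi_count is an assumption added to make the sketch testable; the
patch itself performs no such check.

```c
/* Hypothetical model of a PHB; only msi_base mirrors the patch. */
struct model_phb {
	unsigned int msi_base;	/* first global hwirq owned by this PHB */
	unsigned int msi_count;	/* assumption: number of vectors */
};

/* Inner helper: works on the PHB-relative vector number. */
static int model_setup_vector(const struct model_phb *phb,
			      unsigned int xive_num)
{
	return xive_num < phb->msi_count ? 0 : -1;
}

/* Legacy entry point: converts the global hwirq, then delegates. */
static int model_setup_hwirq(const struct model_phb *phb,
			     unsigned int hwirq)
{
	/* Unsigned wrap-around makes hwirq < msi_base fail the check. */
	return model_setup_vector(phb, hwirq - phb->msi_base);
}
```

A caller that already holds a relative number, as an MSI domain would,
can call the inner helper without going through the conversion.
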
 arch/powerpc/platforms/powernv/pci-ioda.c | 28 +++++++++++++++++++----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f0f901683a2f..b2a8da6114b5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2160,15 +2160,17 @@ bool is_pnv_opal_msi(struct irq_chip *chip)
 }
 EXPORT_SYMBOL_GPL(is_pnv_opal_msi);
 
-static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
-				  unsigned int hwirq, unsigned int virq,
-				  unsigned int is_64, struct msi_msg *msg)
+static int __pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
+				    unsigned int xive_num,
+				    unsigned int is_64, struct msi_msg *msg)
 {
 	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
-	unsigned int xive_num = hwirq - phb->msi_base;
 	__be32 data;
 	int rc;
 
+	dev_dbg(&dev->dev, "%s: setup %s-bit MSI for vector #%d\n", __func__,
+		is_64 ? "64" : "32", xive_num);
+
 	/* No PE assigned ? bail out ... no MSI for you ! */
 	if (pe == NULL)
 		return -ENXIO;
@@ -2216,12 +2218,28 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	}
 	msg->data = be32_to_cpu(data);
 
+	return 0;
+}
+
+static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
+				  unsigned int hwirq, unsigned int virq,
+				  unsigned int is_64, struct msi_msg *msg)
+{
+	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
+	unsigned int xive_num = hwirq - phb->msi_base;
+	int rc;
+
+	rc = __pnv_pci_ioda_msi_setup(phb, dev, xive_num, is_64, msg);
+	if (rc)
+		return rc;
+
+	/* P8 only */
 	pnv_set_msi_irq_chip(phb, virq);
 
 	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
 		 " address=%x_%08x data=%x PE# %x\n",
 		 pci_name(dev), is_64 ? "64" : "32", hwirq, xive_num,
-		 msg->address_hi, msg->address_lo, data, pe->pe_number);
+		 msg->address_hi, msg->address_lo, msg->data, pe->pe_number);
 
 	return 0;
 }
-- 
2.26.3



* [PATCH 12/31] powerpc/powernv/pci: Add MSI domains
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (10 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 11/31] powerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup() Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 13/31] KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts Cédric Le Goater
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

This is very similar to the MSI domains of the pSeries platform. The
MSI allocator is handled directly under the Linux PHB, in the
intermediate "MSI" domain.

Only the XIVE (P9/P10) parent domain is supported for now. We still
need to add support for IRQ domain hierarchy under XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
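Illustration only, not part of the commit: the allocate-then-unwind
pattern used by a domain 'alloc' handler can be modeled in user space.
This is a sketch under stated assumptions. The arrays, the failure
injection knob model_fail_at and all model_* names are hypothetical,
and the unwinding policy shown (release every parent entry that
succeeded, then the whole bitmap reservation) is the sketch's own
choice, not a statement about the kernel code.

```c
#include <stdbool.h>

#define MODEL_NR_MSIS 16

static bool model_bitmap[MODEL_NR_MSIS];	/* true = hwirq reserved */
static bool model_parent[MODEL_NR_MSIS];	/* true = parent allocated */
static int model_fail_at = -1;	/* inject a parent failure at this hwirq */

/* Reserve nr contiguous free entries; return the first index or -1. */
static int model_bitmap_alloc(int nr)
{
	for (int start = 0; start + nr <= MODEL_NR_MSIS; start++) {
		int i;

		for (i = 0; i < nr && !model_bitmap[start + i]; i++)
			;
		if (i < nr)
			continue;	/* range not entirely free */
		for (i = 0; i < nr; i++)
			model_bitmap[start + i] = true;
		return start;
	}
	return -1;
}

static int model_parent_alloc(int hwirq)
{
	if (hwirq == model_fail_at)
		return -1;
	model_parent[hwirq] = true;
	return 0;
}

/* Allocate nr vectors; on failure undo exactly what was done. */
static int model_domain_alloc(int nr)
{
	int hwirq = model_bitmap_alloc(nr);
	int i;

	if (hwirq < 0)
		return -1;

	for (i = 0; i < nr; i++)
		if (model_parent_alloc(hwirq + i))
			goto out;
	return hwirq;

out:
	while (i-- > 0)			/* the i entries that succeeded */
		model_parent[hwirq + i] = false;
	for (i = 0; i < nr; i++)	/* the whole bitmap reservation */
		model_bitmap[hwirq + i] = false;
	return -1;
}

/* Count reserved bitmap entries plus allocated parent entries. */
static int model_in_use(void)
{
	int n = 0;

	for (int i = 0; i < MODEL_NR_MSIS; i++)
		n += model_bitmap[i] + model_parent[i];
	return n;
}
```

After a failed model_domain_alloc(), model_in_use() returns the same
value as before the call, which is the property the error path has to
preserve.
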
 arch/powerpc/platforms/powernv/pci-ioda.c | 188 ++++++++++++++++++++++
 1 file changed, 188 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index b2a8da6114b5..3886ca6e2ed3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -36,6 +36,7 @@
 #include <asm/firmware.h>
 #include <asm/pnv-pci.h>
 #include <asm/mmzone.h>
+#include <asm/xive.h>
 
 #include <misc/cxl-base.h>
 
@@ -2244,6 +2245,189 @@ static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	return 0;
 }
 
+/*
+ * The msi_free() op is called before irq_domain_free_irqs_top() when
+ * the handler data is still available. Use that to clear the XIVE
+ * controller.
+ */
+static void pnv_msi_ops_msi_free(struct irq_domain *domain,
+				 struct msi_domain_info *info,
+				 unsigned int irq)
+{
+	if (xive_enabled())
+		xive_irq_free_data(irq);
+}
+
+static struct msi_domain_ops pnv_pci_msi_domain_ops = {
+	.msi_free	= pnv_msi_ops_msi_free,
+};
+
+static void pnv_msi_shutdown(struct irq_data *d)
+{
+	d = d->parent_data;
+	if (d->chip->irq_shutdown)
+		d->chip->irq_shutdown(d);
+}
+
+static void pnv_msi_mask(struct irq_data *d)
+{
+	pci_msi_mask_irq(d);
+	irq_chip_mask_parent(d);
+}
+
+static void pnv_msi_unmask(struct irq_data *d)
+{
+	pci_msi_unmask_irq(d);
+	irq_chip_unmask_parent(d);
+}
+
+static struct irq_chip pnv_pci_msi_irq_chip = {
+	.name		= "PNV-PCI-MSI",
+	.irq_shutdown	= pnv_msi_shutdown,
+	.irq_mask	= pnv_msi_mask,
+	.irq_unmask	= pnv_msi_unmask,
+	.irq_eoi	= irq_chip_eoi_parent,
+};
+
+static struct msi_domain_info pnv_msi_domain_info = {
+	.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
+		  MSI_FLAG_MULTI_PCI_MSI  | MSI_FLAG_PCI_MSIX),
+	.ops   = &pnv_pci_msi_domain_ops,
+	.chip  = &pnv_pci_msi_irq_chip,
+};
+
+static void pnv_msi_compose_msg(struct irq_data *d, struct msi_msg *msg)
+{
+	struct msi_desc *entry = irq_data_get_msi_desc(d);
+	struct pci_dev *pdev = msi_desc_to_pci_dev(entry);
+	struct pci_controller *hose = irq_data_get_irq_chip_data(d);
+	struct pnv_phb *phb = hose->private_data;
+	int rc;
+
+	rc = __pnv_pci_ioda_msi_setup(phb, pdev, d->hwirq,
+				      entry->msi_attrib.is_64, msg);
+	if (rc)
+		dev_err(&pdev->dev, "Failed to setup %s-bit MSI #%ld : %d\n",
+			entry->msi_attrib.is_64 ? "64" : "32", d->hwirq, rc);
+}
+
+static struct irq_chip pnv_msi_irq_chip = {
+	.name			= "PNV-MSI",
+	.irq_shutdown		= pnv_msi_shutdown,
+	.irq_mask		= irq_chip_mask_parent,
+	.irq_unmask		= irq_chip_unmask_parent,
+	.irq_eoi		= irq_chip_eoi_parent,
+	.irq_set_affinity	= irq_chip_set_affinity_parent,
+	.irq_compose_msi_msg	= pnv_msi_compose_msg,
+};
+
+static int pnv_irq_parent_domain_alloc(struct irq_domain *domain,
+				       unsigned int virq, int hwirq)
+{
+	struct irq_fwspec parent_fwspec;
+	int ret;
+
+	parent_fwspec.fwnode = domain->parent->fwnode;
+	parent_fwspec.param_count = 2;
+	parent_fwspec.param[0] = hwirq;
+	parent_fwspec.param[1] = IRQ_TYPE_EDGE_RISING;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, 1, &parent_fwspec);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int pnv_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+				unsigned int nr_irqs, void *arg)
+{
+	struct pci_controller *hose = domain->host_data;
+	struct pnv_phb *phb = hose->private_data;
+	msi_alloc_info_t *info = arg;
+	struct pci_dev *pdev = msi_desc_to_pci_dev(info->desc);
+	int hwirq;
+	int i, ret;
+
+	hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, nr_irqs);
+	if (hwirq < 0) {
+		dev_warn(&pdev->dev, "failed to find a free MSI\n");
+		return -ENOSPC;
+	}
+
+	dev_dbg(&pdev->dev, "%s bridge %pOF %d/%x #%d\n", __func__,
+		hose->dn, virq, hwirq, nr_irqs);
+
+	for (i = 0; i < nr_irqs; i++) {
+		ret = pnv_irq_parent_domain_alloc(domain, virq + i,
+						  phb->msi_base + hwirq + i);
+		if (ret)
+			goto out;
+
+		irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
+					      &pnv_msi_irq_chip, hose);
+	}
+
+	return 0;
+
+out:
+	irq_domain_free_irqs_parent(domain, virq, i - 1);
+	msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq, nr_irqs);
+	return ret;
+}
+
+static void pnv_irq_domain_free(struct irq_domain *domain, unsigned int virq,
+				unsigned int nr_irqs)
+{
+	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
+	struct pci_controller *hose = irq_data_get_irq_chip_data(d);
+	struct pnv_phb *phb = hose->private_data;
+
+	pr_debug("%s bridge %pOF %d/%lx #%d\n", __func__, hose->dn,
+		 virq, d->hwirq, nr_irqs);
+
+	msi_bitmap_free_hwirqs(&phb->msi_bmp, d->hwirq, nr_irqs);
+	/* XIVE domain is cleared through ->msi_free() */
+}
+
+static const struct irq_domain_ops pnv_irq_domain_ops = {
+	.alloc  = pnv_irq_domain_alloc,
+	.free   = pnv_irq_domain_free,
+};
+
+static int pnv_msi_allocate_domains(struct pci_controller *hose, unsigned int count)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct irq_domain *parent = irq_get_default_host();
+
+	hose->fwnode = irq_domain_alloc_named_id_fwnode("PNV-MSI", phb->opal_id);
+	if (!hose->fwnode)
+		return -ENOMEM;
+
+	hose->dev_domain = irq_domain_create_hierarchy(parent, 0, count,
+						       hose->fwnode,
+						       &pnv_irq_domain_ops, hose);
+	if (!hose->dev_domain) {
+		pr_err("PCI: failed to create IRQ domain bridge %pOF (domain %d)\n",
+		       hose->dn, hose->global_number);
+		irq_domain_free_fwnode(hose->fwnode);
+		return -ENOMEM;
+	}
+
+	hose->msi_domain = pci_msi_create_irq_domain(of_node_to_fwnode(hose->dn),
+						     &pnv_msi_domain_info,
+						     hose->dev_domain);
+	if (!hose->msi_domain) {
+		pr_err("PCI: failed to create MSI IRQ domain bridge %pOF (domain %d)\n",
+		       hose->dn, hose->global_number);
+		irq_domain_free_fwnode(hose->fwnode);
+		irq_domain_remove(hose->dev_domain);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 {
 	unsigned int count;
@@ -2268,6 +2452,10 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 	phb->msi32_support = 1;
 	pr_info("  Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
 		count, phb->msi_base);
+
+	/* Only supported by the XIVE driver */
+	if (xive_enabled())
+		pnv_msi_allocate_domains(phb->hose, count);
 }
 
 static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
-- 
2.26.3



* [PATCH 13/31] KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (11 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 12/31] powerpc/powernv/pci: Add MSI domains Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 14/31] KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines Cédric Le Goater
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

Passthrough PCI MSI interrupts are detected in KVM with a check on a
specific EOI handler (P8) or on XIVE (P9). We can now check the
PCI-MSI IRQ chip instead, which is cleaner.

Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_hv.c              | 2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index deb450e4289e..86a0f8b0e6da 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5153,7 +5153,7 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
 	 * what our real-mode EOI code does, or a XIVE interrupt
 	 */
 	chip = irq_data_get_irq_chip(&desc->irq_data);
-	if (!chip || !(is_pnv_opal_msi(chip) || is_xive_irq(chip))) {
+	if (!chip || !is_pnv_opal_msi(chip)) {
 		pr_warn("kvmppc_set_passthru_irq_hv: Could not assign IRQ map for (%d,%d)\n",
 			host_irq, guest_gsi);
 		mutex_unlock(&kvm->lock);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3886ca6e2ed3..7b75af17dc59 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2151,13 +2151,15 @@ void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 	irq_set_chip(virq, &phb->ioda.irq_chip);
 }
 
+static struct irq_chip pnv_pci_msi_irq_chip;
+
 /*
  * Returns true iff chip is something that we could call
  * pnv_opal_pci_msi_eoi for.
  */
 bool is_pnv_opal_msi(struct irq_chip *chip)
 {
-	return chip->irq_eoi == pnv_ioda2_msi_eoi;
+	return chip->irq_eoi == pnv_ioda2_msi_eoi || chip == &pnv_pci_msi_irq_chip;
 }
 EXPORT_SYMBOL_GPL(is_pnv_opal_msi);
 
-- 
2.26.3



* [PATCH 14/31] KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (12 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 13/31] KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts Cédric Le Goater
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

The routine kvmppc_set_passthru_irq() calls kvmppc_xive_set_mapped()
and kvmppc_xive_clr_mapped() with an IRQ descriptor. Pass the host IRQ
number directly to remove a useless conversion.

Add some debug output.

Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/kvm_ppc.h |  4 ++--
 arch/powerpc/kvm/book3s_hv.c       |  4 ++--
 arch/powerpc/kvm/book3s_xive.c     | 17 ++++++++---------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 8aacd76bb702..d6c52a0ec687 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -663,9 +663,9 @@ extern int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 				    struct kvm_vcpu *vcpu, u32 cpu);
 extern void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu);
 extern int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
-				  struct irq_desc *host_desc);
+				  unsigned long host_irq);
 extern int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
-				  struct irq_desc *host_desc);
+				  unsigned long host_irq);
 extern u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 86a0f8b0e6da..9f4eb74a11cc 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5196,7 +5196,7 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
 		pimap->n_mapped++;
 
 	if (xics_on_xive())
-		rc = kvmppc_xive_set_mapped(kvm, guest_gsi, desc);
+		rc = kvmppc_xive_set_mapped(kvm, guest_gsi, host_irq);
 	else
 		kvmppc_xics_set_mapped(kvm, guest_gsi, desc->irq_data.hwirq);
 	if (rc)
@@ -5237,7 +5237,7 @@ static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
 	}
 
 	if (xics_on_xive())
-		rc = kvmppc_xive_clr_mapped(kvm, guest_gsi, pimap->mapped[i].desc);
+		rc = kvmppc_xive_clr_mapped(kvm, guest_gsi, host_irq);
 	else
 		kvmppc_xics_clr_mapped(kvm, guest_gsi, pimap->mapped[i].r_hwirq);
 
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index e7219b6f5f9a..3a7da42bed57 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -856,13 +856,12 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
 }
 
 int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
-			   struct irq_desc *host_desc)
+			   unsigned long host_irq)
 {
 	struct kvmppc_xive *xive = kvm->arch.xive;
 	struct kvmppc_xive_src_block *sb;
 	struct kvmppc_xive_irq_state *state;
-	struct irq_data *host_data = irq_desc_get_irq_data(host_desc);
-	unsigned int host_irq = irq_desc_get_irq(host_desc);
+	struct irq_data *host_data = irq_get_irq_data(host_irq);
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data);
 	u16 idx;
 	u8 prio;
@@ -871,7 +870,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 	if (!xive)
 		return -ENODEV;
 
-	pr_devel("set_mapped girq 0x%lx host HW irq 0x%x...\n",guest_irq, hw_irq);
+	pr_debug("%s: GIRQ 0x%lx host IRQ %ld XIVE HW IRQ 0x%x\n",
+		 __func__, guest_irq, host_irq, hw_irq);
 
 	sb = kvmppc_xive_find_source(xive, guest_irq, &idx);
 	if (!sb)
@@ -893,7 +893,7 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 	 */
 	rc = irq_set_vcpu_affinity(host_irq, state);
 	if (rc) {
-		pr_err("Failed to set VCPU affinity for irq %d\n", host_irq);
+		pr_err("Failed to set VCPU affinity for host IRQ %ld\n", host_irq);
 		return rc;
 	}
 
@@ -953,12 +953,11 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 EXPORT_SYMBOL_GPL(kvmppc_xive_set_mapped);
 
 int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
-			   struct irq_desc *host_desc)
+			   unsigned long host_irq)
 {
 	struct kvmppc_xive *xive = kvm->arch.xive;
 	struct kvmppc_xive_src_block *sb;
 	struct kvmppc_xive_irq_state *state;
-	unsigned int host_irq = irq_desc_get_irq(host_desc);
 	u16 idx;
 	u8 prio;
 	int rc;
@@ -966,7 +965,7 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 	if (!xive)
 		return -ENODEV;
 
-	pr_devel("clr_mapped girq 0x%lx...\n", guest_irq);
+	pr_debug("%s: GIRQ 0x%lx host IRQ %ld\n", __func__, guest_irq, host_irq);
 
 	sb = kvmppc_xive_find_source(xive, guest_irq, &idx);
 	if (!sb)
@@ -993,7 +992,7 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 	/* Release the passed-through interrupt to the host */
 	rc = irq_set_vcpu_affinity(host_irq, NULL);
 	if (rc) {
-		pr_err("Failed to clr VCPU affinity for irq %d\n", host_irq);
+		pr_err("Failed to clr VCPU affinity for host IRQ %ld\n", host_irq);
 		return rc;
 	}
 
-- 
2.26.3



* [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (13 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 14/31] KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-05-14 20:51   ` Thomas Gleixner
  2021-04-30  8:03 ` [PATCH 16/31] powerpc/xics: Remove ICS list Cédric Le Goater
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Cédric Le Goater

PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
underlying calls handling the passthrough of the interrupt in the
guest need a number in the XIVE IRQ domain.

Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
PCI-MSI domain.

Exporting irq_get_default_host() might not be the best solution.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
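Illustration only, not part of the commit: in a hierarchy, one Linux
IRQ number has one data record per domain level, linked through a
parent pointer, and each level can carry a different hardware number.
The lookup below walks that chain until the requested domain matches.
This is a user-space sketch; model_domain, model_irq_data and
model_get_irq_data() are hypothetical names that only mirror the idea.

```c
#include <stddef.h>

struct model_domain {
	const char *name;
};

/* One record per domain level for the same Linux IRQ number. */
struct model_irq_data {
	unsigned long hwirq;			/* number in this domain */
	const struct model_domain *domain;
	struct model_irq_data *parent_data;	/* next level toward the root */
};

/* Walk from the outermost record toward the root until the domain matches. */
static struct model_irq_data *
model_get_irq_data(struct model_irq_data *top,
		   const struct model_domain *domain)
{
	for (struct model_irq_data *d = top; d; d = d->parent_data)
		if (d->domain == domain)
			return d;
	return NULL;
}
```

Asking for the record of the root domain yields the hardware number
the interrupt controller knows about, while the outermost record holds
the number local to the PCI-MSI level.
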
 arch/powerpc/kvm/book3s_xive.c | 3 ++-
 kernel/irq/irqdomain.c         | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 3a7da42bed57..81b9f4fc3978 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -861,7 +861,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 	struct kvmppc_xive *xive = kvm->arch.xive;
 	struct kvmppc_xive_src_block *sb;
 	struct kvmppc_xive_irq_state *state;
-	struct irq_data *host_data = irq_get_irq_data(host_irq);
+	struct irq_data *host_data =
+		irq_domain_get_irq_data(irq_get_default_host(), host_irq);
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data);
 	u16 idx;
 	u8 prio;
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index d10ab1d689d5..8a073d1ce611 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -481,6 +481,7 @@ struct irq_domain *irq_get_default_host(void)
 {
 	return irq_default_domain;
 }
+EXPORT_SYMBOL_GPL(irq_get_default_host);
 
 static void irq_domain_clear_mapping(struct irq_domain *domain,
 				     irq_hw_number_t hwirq)
-- 
2.26.3



* [PATCH 16/31] powerpc/xics: Remove ICS list
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (14 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 17/31] powerpc/xics: Rename the map handler in a check handler Cédric Le Goater
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

We always had only one ICS per machine. Simplify the XICS driver by
removing the ICS list.

The ICS stored in the chip data of the XICS domain becomes useless: we
no longer need it to migrate IRQs away from a CPU. It will be removed
in a subsequent patch.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
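Illustration only, not part of the commit: replacing a list of
controllers with a single registered instance can be sketched in user
space as follows. The names model_ics, model_register_ics(),
model_get_server() and model_fixed_server() are hypothetical. The NULL
check in model_get_server() is the sketch's own choice and is not a
claim about every call site in the patch.

```c
/* Hypothetical source controller with a single callback. */
struct model_ics {
	long (*get_server)(struct model_ics *ics, unsigned long vec);
};

static struct model_ics *model_xics_ics;	/* the one registered ICS */

/* Accept the first registration only; the patch warns once here. */
static int model_register_ics(struct model_ics *ics)
{
	if (model_xics_ics)
		return -1;
	model_xics_ics = ics;
	return 0;
}

/* Callers dereference the single instance instead of walking a list. */
static long model_get_server(unsigned long vec)
{
	if (!model_xics_ics)
		return -1;
	return model_xics_ics->get_server(model_xics_ics, vec);
}

/* Example backend: derives a server number from the low byte of vec. */
static long model_fixed_server(struct model_ics *ics, unsigned long vec)
{
	(void)ics;
	return (long)(vec & 0xff);
}
```
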
 arch/powerpc/sysdev/xics/xics-common.c | 45 +++++++++++---------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index 7e4305c01bac..509b9432c368 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -38,7 +38,7 @@ DEFINE_PER_CPU(struct xics_cppr, xics_cppr);
 
 struct irq_domain *xics_host;
 
-static LIST_HEAD(ics_list);
+static struct ics *xics_ics;
 
 void xics_update_irq_servers(void)
 {
@@ -111,12 +111,11 @@ void xics_setup_cpu(void)
 
 void xics_mask_unknown_vec(unsigned int vec)
 {
-	struct ics *ics;
-
 	pr_err("Interrupt 0x%x (real) is invalid, disabling it.\n", vec);
 
-	list_for_each_entry(ics, &ics_list, link)
-		ics->mask_unknown(ics, vec);
+	if (WARN_ON(!xics_ics))
+		return;
+	xics_ics->mask_unknown(xics_ics, vec);
 }
 
 
@@ -198,7 +197,6 @@ void xics_migrate_irqs_away(void)
 		struct irq_chip *chip;
 		long server;
 		unsigned long flags;
-		struct ics *ics;
 
 		/* We can't set affinity on ISA interrupts */
 		if (virq < NUM_ISA_INTERRUPTS)
@@ -219,13 +217,10 @@ void xics_migrate_irqs_away(void)
 		raw_spin_lock_irqsave(&desc->lock, flags);
 
 		/* Locate interrupt server */
-		server = -1;
-		ics = irq_desc_get_chip_data(desc);
-		if (ics)
-			server = ics->get_server(ics, irq);
+		server = xics_ics->get_server(xics_ics, irq);
 		if (server < 0) {
-			printk(KERN_ERR "%s: Can't find server for irq %d\n",
-			       __func__, irq);
+			pr_err("%s: Can't find server for irq %d/%x\n",
+			       __func__, virq, irq);
 			goto unlock;
 		}
 
@@ -307,13 +302,9 @@ int xics_get_irq_server(unsigned int virq, const struct cpumask *cpumask,
 static int xics_host_match(struct irq_domain *h, struct device_node *node,
 			   enum irq_domain_bus_token bus_token)
 {
-	struct ics *ics;
-
-	list_for_each_entry(ics, &ics_list, link)
-		if (ics->host_match(ics, node))
-			return 1;
-
-	return 0;
+	if (WARN_ON(!xics_ics))
+		return 0;
+	return xics_ics->host_match(xics_ics, node) ? 1 : 0;
 }
 
 /* Dummies */
@@ -330,8 +321,6 @@ static struct irq_chip xics_ipi_chip = {
 static int xics_host_map(struct irq_domain *h, unsigned int virq,
 			 irq_hw_number_t hw)
 {
-	struct ics *ics;
-
 	pr_devel("xics: map virq %d, hwirq 0x%lx\n", virq, hw);
 
 	/*
@@ -348,12 +337,14 @@ static int xics_host_map(struct irq_domain *h, unsigned int virq,
 		return 0;
 	}
 
+	if (WARN_ON(!xics_ics))
+		return -EINVAL;
+
 	/* Let the ICS setup the chip data */
-	list_for_each_entry(ics, &ics_list, link)
-		if (ics->map(ics, virq) == 0)
-			return 0;
+	if (xics_ics->map(xics_ics, virq))
+		return -EINVAL;
 
-	return -EINVAL;
+	return 0;
 }
 
 static int xics_host_xlate(struct irq_domain *h, struct device_node *ct,
@@ -427,7 +418,9 @@ static void __init xics_init_host(void)
 
 void __init xics_register_ics(struct ics *ics)
 {
-	list_add(&ics->link, &ics_list);
+	if (WARN_ONCE(xics_ics, "XICS: Source Controller is already defined !"))
+		return;
+	xics_ics = ics;
 }
 
 static void __init xics_get_server_size(void)
-- 
2.26.3



* [PATCH 17/31] powerpc/xics: Rename the map handler in a check handler
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (15 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 16/31] powerpc/xics: Remove ICS list Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 18/31] powerpc/xics: Give a name to the default XICS IRQ domain Cédric Le Goater
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

This moves the IRQ initialization done under the OPAL and RTAS backends
into the common part of XICS. The 'map' handler becomes a simple 'check'
on the HW IRQ at the FW level.

As we don't need an ICS anymore in xics_migrate_irqs_away(), the XICS
domain no longer sets chip data for the IRQ.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
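Illustration only, not part of the commit: the division of labor
described above can be sketched in user space. The backend only
validates the hardware number and exposes its chip, and the common
code performs the per-IRQ setup. All names (model_chip, model_ics2,
model_irq, model_fw_check(), model_host_map()) are hypothetical, and
the acceptance rule in model_fw_check() is an arbitrary assumption
standing in for a firmware query.

```c
struct model_chip {
	const char *name;
};

/* Backend descriptor: a validation callback plus the chip to install. */
struct model_ics2 {
	int (*check)(struct model_ics2 *ics, unsigned int hwirq);
	struct model_chip *chip;
};

/* Hypothetical per-IRQ state owned by the common code. */
struct model_irq {
	struct model_chip *chip;
};

/* Example backend check: accept only hwirq below 0x100 (assumption). */
static int model_fw_check(struct model_ics2 *ics, unsigned int hwirq)
{
	(void)ics;
	return hwirq < 0x100 ? 0 : -1;
}

/* Common code: validate with the backend, then do the setup itself. */
static int model_host_map(struct model_ics2 *ics, struct model_irq *irq,
			  unsigned int hwirq)
{
	if (ics->check(ics, hwirq))
		return -1;
	irq->chip = ics->chip;	/* setup now lives in the common part */
	return 0;
}
```

With this split the backend never touches the per-IRQ state, so the
same setup path serves every backend that provides a check callback
and a chip.
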
 arch/powerpc/include/asm/xics.h        |  3 ++-
 arch/powerpc/sysdev/xics/ics-opal.c    | 27 +++++++++----------------
 arch/powerpc/sysdev/xics/ics-rtas.c    | 28 +++++++++-----------------
 arch/powerpc/sysdev/xics/xics-common.c | 15 ++++++++------
 4 files changed, 31 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/xics.h b/arch/powerpc/include/asm/xics.h
index 8e903b3f9c24..01b51a926f56 100644
--- a/arch/powerpc/include/asm/xics.h
+++ b/arch/powerpc/include/asm/xics.h
@@ -85,10 +85,11 @@ static inline int ics_opal_init(void) { return -ENODEV; }
 /* ICS instance, hooked up to chip_data of an irq */
 struct ics {
 	struct list_head link;
-	int (*map)(struct ics *ics, unsigned int virq);
+	int (*check)(struct ics *ics, unsigned int hwirq);
 	void (*mask_unknown)(struct ics *ics, unsigned long vec);
 	long (*get_server)(struct ics *ics, unsigned long vec);
 	int (*host_match)(struct ics *ics, struct device_node *node);
+	struct irq_chip *chip;
 	char data[];
 };
 
diff --git a/arch/powerpc/sysdev/xics/ics-opal.c b/arch/powerpc/sysdev/xics/ics-opal.c
index 823f6c9664cd..8c7ddcc718b6 100644
--- a/arch/powerpc/sysdev/xics/ics-opal.c
+++ b/arch/powerpc/sysdev/xics/ics-opal.c
@@ -157,26 +157,13 @@ static struct irq_chip ics_opal_irq_chip = {
 	.irq_retrigger = xics_retrigger,
 };
 
-static int ics_opal_map(struct ics *ics, unsigned int virq);
-static void ics_opal_mask_unknown(struct ics *ics, unsigned long vec);
-static long ics_opal_get_server(struct ics *ics, unsigned long vec);
-
 static int ics_opal_host_match(struct ics *ics, struct device_node *node)
 {
 	return 1;
 }
 
-/* Only one global & state struct ics */
-static struct ics ics_hal = {
-	.map		= ics_opal_map,
-	.mask_unknown	= ics_opal_mask_unknown,
-	.get_server	= ics_opal_get_server,
-	.host_match	= ics_opal_host_match,
-};
-
-static int ics_opal_map(struct ics *ics, unsigned int virq)
+static int ics_opal_check(struct ics *ics, unsigned int hw_irq)
 {
-	unsigned int hw_irq = (unsigned int)virq_to_hw(virq);
 	int64_t rc;
 	__be16 server;
 	int8_t priority;
@@ -189,9 +176,6 @@ static int ics_opal_map(struct ics *ics, unsigned int virq)
 	if (rc != OPAL_SUCCESS)
 		return -ENXIO;
 
-	irq_set_chip_and_handler(virq, &ics_opal_irq_chip, handle_fasteoi_irq);
-	irq_set_chip_data(virq, &ics_hal);
-
 	return 0;
 }
 
@@ -222,6 +206,15 @@ static long ics_opal_get_server(struct ics *ics, unsigned long vec)
 	return ics_opal_unmangle_server(be16_to_cpu(server));
 }
 
+/* Only one global & state struct ics */
+static struct ics ics_hal = {
+	.check		= ics_opal_check,
+	.mask_unknown	= ics_opal_mask_unknown,
+	.get_server	= ics_opal_get_server,
+	.host_match	= ics_opal_host_match,
+	.chip		= &ics_opal_irq_chip,
+};
+
 int __init ics_opal_init(void)
 {
 	if (!firmware_has_feature(FW_FEATURE_OPAL))
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index 4cf18000f07c..6d19d711ed35 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -24,19 +24,6 @@ static int ibm_set_xive;
 static int ibm_int_on;
 static int ibm_int_off;
 
-static int ics_rtas_map(struct ics *ics, unsigned int virq);
-static void ics_rtas_mask_unknown(struct ics *ics, unsigned long vec);
-static long ics_rtas_get_server(struct ics *ics, unsigned long vec);
-static int ics_rtas_host_match(struct ics *ics, struct device_node *node);
-
-/* Only one global & state struct ics */
-static struct ics ics_rtas = {
-	.map		= ics_rtas_map,
-	.mask_unknown	= ics_rtas_mask_unknown,
-	.get_server	= ics_rtas_get_server,
-	.host_match	= ics_rtas_host_match,
-};
-
 static void ics_rtas_unmask_irq(struct irq_data *d)
 {
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
@@ -169,9 +156,8 @@ static struct irq_chip ics_rtas_irq_chip = {
 	.irq_retrigger = xics_retrigger,
 };
 
-static int ics_rtas_map(struct ics *ics, unsigned int virq)
+static int ics_rtas_check(struct ics *ics, unsigned int hw_irq)
 {
-	unsigned int hw_irq = (unsigned int)virq_to_hw(virq);
 	int status[2];
 	int rc;
 
@@ -183,9 +169,6 @@ static int ics_rtas_map(struct ics *ics, unsigned int virq)
 	if (rc)
 		return -ENXIO;
 
-	irq_set_chip_and_handler(virq, &ics_rtas_irq_chip, handle_fasteoi_irq);
-	irq_set_chip_data(virq, &ics_rtas);
-
 	return 0;
 }
 
@@ -213,6 +196,15 @@ static int ics_rtas_host_match(struct ics *ics, struct device_node *node)
 	return !of_device_is_compatible(node, "chrp,iic");
 }
 
+/* Only one global & state struct ics */
+static struct ics ics_rtas = {
+	.check		= ics_rtas_check,
+	.mask_unknown	= ics_rtas_mask_unknown,
+	.get_server	= ics_rtas_get_server,
+	.host_match	= ics_rtas_host_match,
+	.chip		= &ics_rtas_irq_chip,
+};
+
 __init int ics_rtas_init(void)
 {
 	ibm_get_xive = rtas_token("ibm,get-xive");
diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index 509b9432c368..2fa45cd12a82 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -318,10 +318,10 @@ static struct irq_chip xics_ipi_chip = {
 	.irq_unmask = xics_ipi_unmask,
 };
 
-static int xics_host_map(struct irq_domain *h, unsigned int virq,
-			 irq_hw_number_t hw)
+static int xics_host_map(struct irq_domain *domain, unsigned int virq,
+			 irq_hw_number_t hwirq)
 {
-	pr_devel("xics: map virq %d, hwirq 0x%lx\n", virq, hw);
+	pr_devel("xics: map virq %d, hwirq 0x%lx\n", virq, hwirq);
 
 	/*
 	 * Mark interrupts as edge sensitive by default so that resend
@@ -331,7 +331,7 @@ static int xics_host_map(struct irq_domain *h, unsigned int virq,
 	irq_clear_status_flags(virq, IRQ_LEVEL);
 
 	/* Don't call into ICS for IPIs */
-	if (hw == XICS_IPI) {
+	if (hwirq == XICS_IPI) {
 		irq_set_chip_and_handler(virq, &xics_ipi_chip,
 					 handle_percpu_irq);
 		return 0;
@@ -340,10 +340,13 @@ static int xics_host_map(struct irq_domain *h, unsigned int virq,
 	if (WARN_ON(!xics_ics))
 		return -EINVAL;
 
-	/* Let the ICS setup the chip data */
-	if (xics_ics->map(xics_ics, virq))
+	if (xics_ics->check(xics_ics, hwirq))
 		return -EINVAL;
 
+	/* No chip data for the XICS domain */
+	irq_domain_set_info(domain, virq, hwirq, xics_ics->chip,
+			    NULL, handle_fasteoi_irq, NULL, NULL);
+
 	return 0;
 }
 
-- 
2.26.3
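
The split described in patch 17/31 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name below (`ics_model`, `model_host_map`, `model_slots`, and so on) is hypothetical and stands in for the kernel structures; none of it is the kernel API. The backend only validates the HW IRQ, and the common code installs the chip with no chip data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct irq_chip_model {
	const char *name;
};

struct ics_model {
	/* Backend hook: only validates the HW IRQ, sets nothing up. */
	int (*check)(struct ics_model *ics, unsigned int hwirq);
	/* Chip that the common code installs on behalf of the backend. */
	struct irq_chip_model *chip;
};

/* What the common code records for a mapped virq in this model. */
struct virq_slot {
	unsigned int hwirq;
	struct irq_chip_model *chip;
	void *chip_data;
};

#define MODEL_NR_VIRQS 16
struct virq_slot model_slots[MODEL_NR_VIRQS];

struct irq_chip_model model_chip = { "ICS-MODEL" };

/* Assumption for the sketch: "firmware" only knows hwirqs below 0x100. */
int model_check(struct ics_model *ics, unsigned int hwirq)
{
	(void)ics;
	return hwirq < 0x100 ? 0 : -ENXIO;
}

struct ics_model model_ics = {
	.check	= model_check,
	.chip	= &model_chip,
};

/* Common code: ask the backend to check, then set the chip itself. */
int model_host_map(struct ics_model *ics, unsigned int virq,
		   unsigned int hwirq)
{
	if (!ics || virq >= MODEL_NR_VIRQS)
		return -EINVAL;

	if (ics->check(ics, hwirq))
		return -EINVAL;

	/* No chip data is attached, mirroring the commit message. */
	model_slots[virq].hwirq = hwirq;
	model_slots[virq].chip = ics->chip;
	model_slots[virq].chip_data = NULL;
	return 0;
}
```

In this model, mapping a known hwirq fills the slot with the backend's chip and a NULL chip data pointer, while an hwirq rejected by the backend leaves the slot untouched.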



* [PATCH 18/31] powerpc/xics: Give a name to the default XICS IRQ domain
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (16 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 17/31] powerpc/xics: Rename the map handler in a check handler Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 19/31] powerpc/xics: Add debug logging to the set_irq_affinity handlers Cédric Le Goater
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

Give a name to the default XICS IRQ domain and clean up the error path.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xics/xics-common.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index 2fa45cd12a82..9815873333c7 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -412,11 +412,22 @@ static const struct irq_domain_ops xics_host_ops = {
 	.xlate = xics_host_xlate,
 };
 
-static void __init xics_init_host(void)
+static int __init xics_allocate_domain(void)
 {
-	xics_host = irq_domain_add_tree(NULL, &xics_host_ops, NULL);
-	BUG_ON(xics_host == NULL);
+	struct fwnode_handle *fn;
+
+	fn = irq_domain_alloc_named_fwnode("XICS");
+	if (!fn)
+		return -ENOMEM;
+
+	xics_host = irq_domain_create_tree(fn, &xics_host_ops, NULL);
+	if (!xics_host) {
+		irq_domain_free_fwnode(fn);
+		return -ENOMEM;
+	}
+
 	irq_set_default_host(xics_host);
+	return 0;
 }
 
 void __init xics_register_ics(struct ics *ics)
@@ -478,6 +489,8 @@ void __init xics_init(void)
 	/* Initialize common bits */
 	xics_get_server_size();
 	xics_update_irq_servers();
-	xics_init_host();
+	rc = xics_allocate_domain();
+	if (rc < 0)
+		pr_err("XICS: Failed to create IRQ domain\n");
 	xics_setup_cpu();
 }
-- 
2.26.3
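
The error path introduced by patch 18/31 follows a common allocate-then-unwind shape: if the second allocation fails, the first one is released before returning. The sketch below models that shape in userspace. All names are hypothetical, and the `fail_create` flag exists only to simulate a failure; it is not part of any kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a firmware node and an IRQ domain. */
struct fwnode_model {
	const char *name;
};

struct domain_model {
	struct fwnode_model *fwnode;
};

int model_live_fwnodes;			/* counts fwnodes not yet freed */
struct domain_model *model_host;	/* plays the role of the global host */

struct fwnode_model *model_alloc_fwnode(const char *name)
{
	struct fwnode_model *fn = malloc(sizeof(*fn));

	if (!fn)
		return NULL;
	fn->name = name;
	model_live_fwnodes++;
	return fn;
}

void model_free_fwnode(struct fwnode_model *fn)
{
	free(fn);
	model_live_fwnodes--;
}

/* fail_create lets a caller simulate a domain creation failure. */
struct domain_model *model_create_domain(struct fwnode_model *fn,
					 int fail_create)
{
	struct domain_model *d;

	if (fail_create)
		return NULL;
	d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	d->fwnode = fn;
	return d;
}

int model_allocate_domain(int fail_create)
{
	struct fwnode_model *fn;

	fn = model_alloc_fwnode("XICS");
	if (!fn)
		return -ENOMEM;

	model_host = model_create_domain(fn, fail_create);
	if (!model_host) {
		/* Unwind the first allocation before reporting failure. */
		model_free_fwnode(fn);
		return -ENOMEM;
	}
	return 0;
}
```

The point of the pattern is that a failed call leaves no named node behind, so the caller can log the error and carry on instead of hitting a BUG_ON().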



* [PATCH 19/31] powerpc/xics: Add debug logging to the set_irq_affinity handlers
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (17 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 18/31] powerpc/xics: Give a name to the default XICS IRQ domain Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 20/31] powerpc/xics: Add support for IRQ domain hierarchy Cédric Le Goater
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

It really helps to know how the HW is configured when tweaking the IRQ
subsystem.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xics/ics-opal.c | 2 +-
 arch/powerpc/sysdev/xics/ics-rtas.c | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/xics/ics-opal.c b/arch/powerpc/sysdev/xics/ics-opal.c
index 8c7ddcc718b6..bf26cae1b982 100644
--- a/arch/powerpc/sysdev/xics/ics-opal.c
+++ b/arch/powerpc/sysdev/xics/ics-opal.c
@@ -133,7 +133,7 @@ static int ics_opal_set_affinity(struct irq_data *d,
 	}
 	server = ics_opal_mangle_server(wanted_server);
 
-	pr_devel("ics-hal: set-affinity irq %d [hw 0x%x] server: 0x%x/0x%x\n",
+	pr_debug("ics-hal: set-affinity irq %d [hw 0x%x] server: 0x%x/0x%x\n",
 		 d->irq, hw_irq, wanted_server, server);
 
 	rc = opal_set_xive(hw_irq, server, priority);
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index 6d19d711ed35..b50c6341682e 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -133,6 +133,9 @@ static int ics_rtas_set_affinity(struct irq_data *d,
 		return -1;
 	}
 
+	pr_debug("%s: irq %d [hw 0x%x] server: 0x%x\n", __func__, d->irq,
+		 hw_irq, irq_server);
+
 	status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL,
 				     hw_irq, irq_server, xics_status[1]);
 
-- 
2.26.3



* [PATCH 20/31] powerpc/xics: Add support for IRQ domain hierarchy
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (18 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 19/31] powerpc/xics: Add debug logging to the set_irq_affinity handlers Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 21/31] powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3 Cédric Le Goater
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

XICS doesn't have any state associated with the IRQ. The support is
straightforward and simpler than for XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xics/xics-common.c | 37 ++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index 9815873333c7..05d21005dc79 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -406,7 +406,44 @@ int xics_retrigger(struct irq_data *data)
 	return 0;
 }
 
+static int xics_host_domain_translate(struct irq_domain *d, struct irq_fwspec *fwspec,
+				      unsigned long *hwirq, unsigned int *type)
+{
+	return xics_host_xlate(d, to_of_node(fwspec->fwnode), fwspec->param,
+			       fwspec->param_count, hwirq, type);
+}
+
+static int xics_host_domain_alloc(struct irq_domain *domain, unsigned int virq,
+				  unsigned int nr_irqs, void *arg)
+{
+	struct irq_fwspec *fwspec = arg;
+	irq_hw_number_t hwirq;
+	unsigned int type = IRQ_TYPE_NONE;
+	int i, rc;
+
+	rc = xics_host_domain_translate(domain, fwspec, &hwirq, &type);
+	if (rc)
+		return rc;
+
+	pr_debug("%s %d/%lx #%d\n", __func__, virq, hwirq, nr_irqs);
+
+	for (i = 0; i < nr_irqs; i++)
+		irq_domain_set_info(domain, virq + i, hwirq + i, xics_ics->chip,
+				    xics_ics, handle_fasteoi_irq, NULL, NULL);
+
+	return 0;
+}
+
+static void xics_host_domain_free(struct irq_domain *domain,
+				  unsigned int virq, unsigned int nr_irqs)
+{
+	pr_debug("%s %d #%d\n", __func__, virq, nr_irqs);
+}
+
 static const struct irq_domain_ops xics_host_ops = {
+	.alloc	= xics_host_domain_alloc,
+	.free	= xics_host_domain_free,
+	.translate = xics_host_domain_translate,
 	.match = xics_host_match,
 	.map = xics_host_map,
 	.xlate = xics_host_xlate,
-- 
2.26.3
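
The 'alloc' handler of patch 20/31 translates the firmware spec into a first HW IRQ number and then binds a run of consecutive virqs to consecutive hwirqs. A minimal userspace sketch of that loop follows. The types, the table and the two example specs are hypothetical and only model the behaviour; they are not kernel definitions.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_VIRQ 32

/* Hypothetical, reduced version of a firmware interrupt specifier. */
struct fwspec_model {
	int param_count;
	unsigned int param[2];
};

/* virq -> hwirq table standing in for the domain's mappings. */
unsigned long model_hwirq_of[MODEL_MAX_VIRQ];

/* Example specifiers: one naming hwirq 0x1000, one with no parameters. */
const struct fwspec_model model_example_spec = { 1, { 0x1000, 0 } };
const struct fwspec_model model_empty_spec = { 0, { 0, 0 } };

/* The first cell of the specifier is taken as the HW IRQ number. */
int model_translate(const struct fwspec_model *fwspec, unsigned long *hwirq)
{
	if (fwspec->param_count < 1)
		return -EINVAL;
	*hwirq = fwspec->param[0];
	return 0;
}

int model_domain_alloc(unsigned int virq, unsigned int nr_irqs,
		       const struct fwspec_model *fwspec)
{
	unsigned long hwirq;
	unsigned int i;
	int rc;

	if (virq >= MODEL_MAX_VIRQ || nr_irqs > MODEL_MAX_VIRQ - virq)
		return -EINVAL;

	rc = model_translate(fwspec, &hwirq);
	if (rc)
		return rc;

	/* Consecutive virqs are bound to consecutive hwirqs. */
	for (i = 0; i < nr_irqs; i++)
		model_hwirq_of[virq + i] = hwirq + i;

	return 0;
}
```

Since the model keeps no per-IRQ state, a matching 'free' handler would have nothing to release, which is consistent with the commit message's remark that XICS has no state associated with the IRQ.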



* [PATCH 21/31] powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (19 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 20/31] powerpc/xics: Add support for IRQ domain hierarchy Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 22/31] powerpc/pci: Drop XIVE restriction on MSI domains Cédric Le Goater
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

PHB3s need an extra OPAL call to EOI the interrupt. The call takes an
OPAL HW IRQ number but it is translated into a vector number in OPAL.
Here, we directly use the vector number of the in-the-middle "MSI"
domain instead of grabbing the OPAL HW IRQ number in the XICS parent
domain.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7b75af17dc59..7035be271c34 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2313,12 +2313,33 @@ static void pnv_msi_compose_msg(struct irq_data *d, struct msi_msg *msg)
 			entry->msi_attrib.is_64 ? "64" : "32", d->hwirq, rc);
 }
 
+/*
+ * The IRQ data is mapped in the MSI domain in which HW IRQ numbers
+ * correspond to vector numbers.
+ */
+static void pnv_msi_eoi(struct irq_data *d)
+{
+	struct pci_controller *hose = irq_data_get_irq_chip_data(d);
+	struct pnv_phb *phb = hose->private_data;
+
+	if (phb->model == PNV_PHB_MODEL_PHB3) {
+		/*
+		 * The EOI OPAL call takes an OPAL HW IRQ number but
+		 * since it is translated into a vector number in
+		 * OPAL, use that directly.
+		 */
+		WARN_ON_ONCE(opal_pci_msi_eoi(phb->opal_id, d->hwirq));
+	}
+
+	irq_chip_eoi_parent(d);
+}
+
 static struct irq_chip pnv_msi_irq_chip = {
 	.name			= "PNV-MSI",
 	.irq_shutdown		= pnv_msi_shutdown,
 	.irq_mask		= irq_chip_mask_parent,
 	.irq_unmask		= irq_chip_unmask_parent,
-	.irq_eoi		= irq_chip_eoi_parent,
+	.irq_eoi		= pnv_msi_eoi,
 	.irq_set_affinity	= irq_chip_set_affinity_parent,
 	.irq_compose_msi_msg	= pnv_msi_compose_msg,
 };
-- 
2.26.3
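
The EOI handler of patch 21/31 does a model-specific firmware call first and then always forwards the EOI to the parent domain. The userspace sketch below models only that control flow. The enum, the structures and the call counters are hypothetical and are not the OPAL or irqchip interfaces.

```c
#include <assert.h>

/* Hypothetical PHB models; only PHB3 needs the extra firmware EOI. */
enum phb_model_kind { MODEL_PHB3, MODEL_PHB4 };

struct phb_model {
	enum phb_model_kind model;
	int fw_eoi_calls;	/* how many firmware EOIs this PHB received */
};

struct irq_data_model {
	unsigned long hwirq;	/* vector number in the MSI domain */
	struct phb_model *phb;
};

int model_parent_eoi_calls;	/* how many EOIs reached the parent domain */

struct phb_model model_phb3 = { MODEL_PHB3, 0 };
struct phb_model model_phb4 = { MODEL_PHB4, 0 };
struct irq_data_model model_irq_on_phb3 = { 5, &model_phb3 };
struct irq_data_model model_irq_on_phb4 = { 7, &model_phb4 };

/* Stand-in for the firmware call; takes the vector number directly. */
int model_fw_msi_eoi(struct phb_model *phb, unsigned long vector)
{
	(void)vector;
	phb->fw_eoi_calls++;
	return 0;
}

/* Stand-in for forwarding the EOI to the parent domain's chip. */
void model_parent_eoi(struct irq_data_model *d)
{
	(void)d;
	model_parent_eoi_calls++;
}

void model_msi_eoi(struct irq_data_model *d)
{
	/* Extra firmware EOI on PHB3 only, using the MSI-domain vector. */
	if (d->phb->model == MODEL_PHB3)
		model_fw_msi_eoi(d->phb, d->hwirq);

	/* The parent always sees the EOI, whatever the PHB model. */
	model_parent_eoi(d);
}
```

Keeping the model check inside a single handler means one chip definition can serve every PHB generation, at the cost of one branch per EOI.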



* [PATCH 22/31] powerpc/pci: Drop XIVE restriction on MSI domains
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (20 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 21/31] powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3 Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:03 ` [PATCH 23/31] powerpc/xics: Drop unmask of MSIs at startup Cédric Le Goater
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

The PowerNV and pSeries platforms now have support for both the XICS
and XIVE IRQ domains.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 4 +---
 arch/powerpc/platforms/pseries/msi.c      | 4 ----
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7035be271c34..13b56de92d85 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2476,9 +2476,7 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 	pr_info("  Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
 		count, phb->msi_base);
 
-	/* Only supported by the XIVE driver */
-	if (xive_enabled())
-		pnv_msi_allocate_domains(phb->hose, count);
+	pnv_msi_allocate_domains(phb->hose, count);
 }
 
 static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index d1470941cadf..1886cb5ca4df 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -720,10 +720,6 @@ int pseries_msi_allocate_domains(struct pci_controller *phb)
 {
 	int count;
 
-	/* Only supported by the XIVE driver */
-	if (!xive_enabled())
-		return -ENODEV;
-
 	if (!__find_pe_total_msi(phb->dn, &count)) {
 		pr_err("PCI: failed to find MSIs for bridge %pOF (domain %d)\n",
 		       phb->dn, phb->global_number);
-- 
2.26.3



* [PATCH 23/31] powerpc/xics: Drop unmask of MSIs at startup
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (21 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 22/31] powerpc/pci: Drop XIVE restriction on MSI domains Cédric Le Goater
@ 2021-04-30  8:03 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 24/31] powerpc/pseries/pci: Drop unused MSI code Cédric Le Goater
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

The unmask was a workaround in the XICS domain for the lack of an MSI
domain. The MSI domain now handles it.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xics/ics-opal.c | 11 -----------
 arch/powerpc/sysdev/xics/ics-rtas.c |  9 ---------
 2 files changed, 20 deletions(-)

diff --git a/arch/powerpc/sysdev/xics/ics-opal.c b/arch/powerpc/sysdev/xics/ics-opal.c
index bf26cae1b982..c4d95d8beb6f 100644
--- a/arch/powerpc/sysdev/xics/ics-opal.c
+++ b/arch/powerpc/sysdev/xics/ics-opal.c
@@ -62,17 +62,6 @@ static void ics_opal_unmask_irq(struct irq_data *d)
 
 static unsigned int ics_opal_startup(struct irq_data *d)
 {
-#ifdef CONFIG_PCI_MSI
-	/*
-	 * The generic MSI code returns with the interrupt disabled on the
-	 * card, using the MSI mask bits. Firmware doesn't appear to unmask
-	 * at that level, so we do it here by hand.
-	 */
-	if (irq_data_get_msi_desc(d))
-		pci_msi_unmask_irq(d);
-#endif
-
-	/* unmask it */
 	ics_opal_unmask_irq(d);
 	return 0;
 }
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index b50c6341682e..b9da317b7a2d 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -57,15 +57,6 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
 
 static unsigned int ics_rtas_startup(struct irq_data *d)
 {
-#ifdef CONFIG_PCI_MSI
-	/*
-	 * The generic MSI code returns with the interrupt disabled on the
-	 * card, using the MSI mask bits. Firmware doesn't appear to unmask
-	 * at that level, so we do it here by hand.
-	 */
-	if (irq_data_get_msi_desc(d))
-		pci_msi_unmask_irq(d);
-#endif
 	/* unmask it */
 	ics_rtas_unmask_irq(d);
 	return 0;
-- 
2.26.3



* [PATCH 24/31] powerpc/pseries/pci: Drop unused MSI code
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (22 preceding siblings ...)
  2021-04-30  8:03 ` [PATCH 23/31] powerpc/xics: Drop unmask of MSIs at startup Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 25/31] powerpc/powernv/pci: " Cédric Le Goater
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

MSIs should be fully managed by the PCI and IRQ subsystems now.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/pseries/msi.c | 87 ----------------------------
 1 file changed, 87 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 1886cb5ca4df..7ddce65edb88 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -111,21 +111,6 @@ static int rtas_query_irq_number(struct pci_dn *pdn, int offset)
 	return rtas_ret[0];
 }
 
-static void rtas_teardown_msi_irqs(struct pci_dev *pdev)
-{
-	struct msi_desc *entry;
-
-	for_each_pci_msi_entry(entry, pdev) {
-		if (!entry->irq)
-			continue;
-
-		irq_set_msi_desc(entry->irq, NULL);
-		irq_dispose_mapping(entry->irq);
-	}
-
-	rtas_disable_msi(pdev);
-}
-
 static int check_req(struct pci_dev *pdev, int nvec, char *prop_name)
 {
 	struct device_node *dn;
@@ -459,66 +444,6 @@ static int rtas_prepare_msi_irqs(struct pci_dev *pdev, int nvec_in, int type,
 	return 0;
 }
 
-static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
-{
-	struct pci_dn *pdn;
-	int hwirq, virq, i;
-	int rc;
-	struct msi_desc *entry;
-	struct msi_msg msg;
-
-	rc = rtas_prepare_msi_irqs(pdev, nvec_in, type, NULL);
-	if (rc)
-		return rc;
-
-	pdn = pci_get_pdn(pdev);
-	i = 0;
-	for_each_pci_msi_entry(entry, pdev) {
-		hwirq = rtas_query_irq_number(pdn, i++);
-		if (hwirq < 0) {
-			pr_debug("rtas_msi: error (%d) getting hwirq\n", rc);
-			return hwirq;
-		}
-
-		/*
-		 * Depending on the number of online CPUs in the original
-		 * kernel, it is likely for CPU #0 to be offline in a kdump
-		 * kernel. The associated IRQs in the affinity mappings
-		 * provided by irq_create_affinity_masks() are thus not
-		 * started by irq_startup(), as per-design for managed IRQs.
-		 * This can be a problem with multi-queue block devices driven
-		 * by blk-mq : such a non-started IRQ is very likely paired
-		 * with the single queue enforced by blk-mq during kdump (see
-		 * blk_mq_alloc_tag_set()). This causes the device to remain
-		 * silent and likely hangs the guest at some point.
-		 *
-		 * We don't really care for fine-grained affinity when doing
-		 * kdump actually : simply ignore the pre-computed affinity
-		 * masks in this case and let the default mask with all CPUs
-		 * be used when creating the IRQ mappings.
-		 */
-		if (is_kdump_kernel())
-			virq = irq_create_mapping(NULL, hwirq);
-		else
-			virq = irq_create_mapping_affinity(NULL, hwirq,
-							   entry->affinity);
-
-		if (!virq) {
-			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-			return -ENOSPC;
-		}
-
-		dev_dbg(&pdev->dev, "rtas_msi: allocated virq %d\n", virq);
-		irq_set_msi_desc(virq, entry);
-
-		/* Read config space back so we can restore after reset */
-		__pci_read_msi_msg(entry, &msg);
-		entry->msg = msg;
-	}
-
-	return 0;
-}
-
 static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev,
 				   int nvec, msi_alloc_info_t *arg)
 {
@@ -759,8 +684,6 @@ static void rtas_msi_pci_irq_fixup(struct pci_dev *pdev)
 
 static int rtas_msi_init(void)
 {
-	struct pci_controller *phb;
-
 	query_token  = rtas_token("ibm,query-interrupt-source-number");
 	change_token = rtas_token("ibm,change-msi");
 
@@ -772,16 +695,6 @@ static int rtas_msi_init(void)
 
 	pr_debug("rtas_msi: Registering RTAS MSI callbacks.\n");
 
-	WARN_ON(pseries_pci_controller_ops.setup_msi_irqs);
-	pseries_pci_controller_ops.setup_msi_irqs = rtas_setup_msi_irqs;
-	pseries_pci_controller_ops.teardown_msi_irqs = rtas_teardown_msi_irqs;
-
-	list_for_each_entry(phb, &hose_list, list_node) {
-		WARN_ON(phb->controller_ops.setup_msi_irqs);
-		phb->controller_ops.setup_msi_irqs = rtas_setup_msi_irqs;
-		phb->controller_ops.teardown_msi_irqs = rtas_teardown_msi_irqs;
-	}
-
 	WARN_ON(ppc_md.pci_irq_fixup);
 	ppc_md.pci_irq_fixup = rtas_msi_pci_irq_fixup;
 
-- 
2.26.3



* [PATCH 25/31] powerpc/powernv/pci: Drop unused MSI code
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (23 preceding siblings ...)
  2021-04-30  8:04 ` [PATCH 24/31] powerpc/pseries/pci: Drop unused MSI code Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 26/31] powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interrupt Cédric Le Goater
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater

MSIs should be fully managed by the PCI and IRQ subsystems now.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/powernv/pci.h      |  6 --
 arch/powerpc/platforms/powernv/pci-ioda.c | 29 ----------
 arch/powerpc/platforms/powernv/pci.c      | 67 -----------------------
 3 files changed, 102 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 36d22920f5a3..a075012788df 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -127,11 +127,7 @@ struct pnv_phb {
 #endif
 
 	unsigned int		msi_base;
-	unsigned int		msi32_support;
 	struct msi_bitmap	msi_bmp;
-	int (*msi_setup)(struct pnv_phb *phb, struct pci_dev *dev,
-			 unsigned int hwirq, unsigned int virq,
-			 unsigned int is_64, struct msi_msg *msg);
 	int (*init_m64)(struct pnv_phb *phb);
 	int (*get_pe_state)(struct pnv_phb *phb, int pe_no);
 	void (*freeze_pe)(struct pnv_phb *phb, int pe_no);
@@ -295,8 +291,6 @@ extern void pnv_npu2_map_lpar(struct pnv_ioda_pe *gpe, unsigned long msr);
 extern void pnv_pci_reset_secondary_bus(struct pci_dev *dev);
 extern int pnv_eeh_phb_reset(struct pci_controller *hose, int option);
 
-extern int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type);
-extern void pnv_teardown_msi_irqs(struct pci_dev *pdev);
 extern struct pnv_ioda_pe *pnv_pci_bdfn_to_pe(struct pnv_phb *phb, u16 bdfn);
 extern struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev);
 extern void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 13b56de92d85..c5acd85a9144 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2224,29 +2224,6 @@ static int __pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	return 0;
 }
 
-static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
-				  unsigned int hwirq, unsigned int virq,
-				  unsigned int is_64, struct msi_msg *msg)
-{
-	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(dev);
-	unsigned int xive_num = hwirq - phb->msi_base;
-	int rc;
-
-	rc = __pnv_pci_ioda_msi_setup(phb, dev, xive_num, is_64, msg);
-	if (rc)
-		return rc;
-
-	/* P8 only */
-	pnv_set_msi_irq_chip(phb, virq);
-
-	pr_devel("%s: %s-bit MSI on hwirq %x (xive #%d),"
-		 " address=%x_%08x data=%x PE# %x\n",
-		 pci_name(dev), is_64 ? "64" : "32", hwirq, xive_num,
-		 msg->address_hi, msg->address_lo, msg->data, pe->pe_number);
-
-	return 0;
-}
-
 /*
  * The msi_free() op is called before irq_domain_free_irqs_top() when
  * the handler data is still available. Use that to clear the XIVE
@@ -2471,8 +2448,6 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 		return;
 	}
 
-	phb->msi_setup = pnv_pci_ioda_msi_setup;
-	phb->msi32_support = 1;
 	pr_info("  Allocated bitmap for %d MSIs (base IRQ 0x%x)\n",
 		count, phb->msi_base);
 
@@ -3090,8 +3065,6 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_ioda_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_ioda_dma_bus_setup,
 	.iommu_bypass_supported	= pnv_pci_ioda_iommu_bypass_supported,
-	.setup_msi_irqs		= pnv_setup_msi_irqs,
-	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
 	.enable_device_hook	= pnv_pci_enable_device_hook,
 	.release_device		= pnv_pci_release_device,
 	.window_alignment	= pnv_pci_window_alignment,
@@ -3101,8 +3074,6 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 };
 
 static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
-	.setup_msi_irqs		= pnv_setup_msi_irqs,
-	.teardown_msi_irqs	= pnv_teardown_msi_irqs,
 	.enable_device_hook	= pnv_pci_enable_device_hook,
 	.window_alignment	= pnv_pci_window_alignment,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 9b9bca169275..397b3d7eb150 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -160,73 +160,6 @@ int pnv_pci_set_power_state(uint64_t id, uint8_t state, struct opal_msg *msg)
 }
 EXPORT_SYMBOL_GPL(pnv_pci_set_power_state);
 
-int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
-{
-	struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
-	struct msi_desc *entry;
-	struct msi_msg msg;
-	int hwirq;
-	unsigned int virq;
-	int rc;
-
-	if (WARN_ON(!phb) || !phb->msi_bmp.bitmap)
-		return -ENODEV;
-
-	if (pdev->no_64bit_msi && !phb->msi32_support)
-		return -ENODEV;
-
-	for_each_pci_msi_entry(entry, pdev) {
-		if (!entry->msi_attrib.is_64 && !phb->msi32_support) {
-			pr_warn("%s: Supports only 64-bit MSIs\n",
-				pci_name(pdev));
-			return -ENXIO;
-		}
-		hwirq = msi_bitmap_alloc_hwirqs(&phb->msi_bmp, 1);
-		if (hwirq < 0) {
-			pr_warn("%s: Failed to find a free MSI\n",
-				pci_name(pdev));
-			return -ENOSPC;
-		}
-		virq = irq_create_mapping(NULL, phb->msi_base + hwirq);
-		if (!virq) {
-			pr_warn("%s: Failed to map MSI to linux irq\n",
-				pci_name(pdev));
-			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq, 1);
-			return -ENOMEM;
-		}
-		rc = phb->msi_setup(phb, pdev, phb->msi_base + hwirq,
-				    virq, entry->msi_attrib.is_64, &msg);
-		if (rc) {
-			pr_warn("%s: Failed to setup MSI\n", pci_name(pdev));
-			irq_dispose_mapping(virq);
-			msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq, 1);
-			return rc;
-		}
-		irq_set_msi_desc(virq, entry);
-		pci_write_msi_msg(virq, &msg);
-	}
-	return 0;
-}
-
-void pnv_teardown_msi_irqs(struct pci_dev *pdev)
-{
-	struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
-	struct msi_desc *entry;
-	irq_hw_number_t hwirq;
-
-	if (WARN_ON(!phb))
-		return;
-
-	for_each_pci_msi_entry(entry, pdev) {
-		if (!entry->irq)
-			continue;
-		hwirq = virq_to_hw(entry->irq);
-		irq_set_msi_desc(entry->irq, NULL);
-		irq_dispose_mapping(entry->irq);
-		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, 1);
-	}
-}
-
 /* Nicely print the contents of the PE State Tables (PEST). */
 static void pnv_pci_dump_pest(__be64 pestA[], __be64 pestB[], int pest_size)
 {
-- 
2.26.3



* [PATCH 26/31] powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interrupt
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (24 preceding siblings ...)
  2021-04-30  8:04 ` [PATCH 25/31] powerpc/powernv/pci: " Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 27/31] powerpc/xics: Fix IRQ migration Cédric Le Goater
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Cédric Le Goater

The pnv_ioda2_msi_eoi chip handler is not used anymore for MSIs.
Simply use the check on the PCI-MSI chip (pnv_pci_msi_irq_chip).

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c5acd85a9144..c1598ab730c3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2159,7 +2159,7 @@ static struct irq_chip pnv_pci_msi_irq_chip;
  */
 bool is_pnv_opal_msi(struct irq_chip *chip)
 {
-	return chip->irq_eoi == pnv_ioda2_msi_eoi || chip == &pnv_pci_msi_irq_chip;
+	return chip == &pnv_pci_msi_irq_chip;
 }
 EXPORT_SYMBOL_GPL(is_pnv_opal_msi);
 
-- 
2.26.3



* [PATCH 27/31] powerpc/xics: Fix IRQ migration
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (25 preceding siblings ...)
  2021-04-30  8:04 ` [PATCH 26/31] powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interrupt Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 28/31] powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices Cédric Le Goater
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Cédric Le Goater

desc->irq_data points to the top level IRQ data descriptor which is
not necessarily in the XICS IRQ domain. MSIs are in another domain for
instance. Fix that by looking for a mapping on the low level XICS IRQ
domain.
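
For illustration only, here is a minimal user-space sketch of the lookup
described above. It is not kernel code: the toy_* types, the made-up HW
numbers and toy_domain_get_irq_data() are hypothetical stand-ins. The
sketch assumes that irq_domain_get_irq_data() walks the parent_data
chain of an IRQ until it reaches the level owned by the requested
domain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_domain {
	const char *name;
};

struct toy_irq_data {
	unsigned long hwirq;              /* HW number within 'domain' */
	struct toy_domain *domain;        /* domain owning this level */
	struct toy_irq_data *parent_data; /* next level down, or NULL */
};

/*
 * Walk from the top-level IRQ data down the hierarchy and return the
 * level that belongs to 'domain', or NULL if there is none.
 */
static struct toy_irq_data *
toy_domain_get_irq_data(struct toy_domain *domain, struct toy_irq_data *top)
{
	struct toy_irq_data *d;

	for (d = top; d; d = d->parent_data)
		if (d->domain == domain)
			return d;
	return NULL;
}

/* Made-up hierarchy: an MSI level stacked on top of an XICS level. */
static struct toy_domain toy_msi = { "MSI" };
static struct toy_domain toy_xics = { "XICS" };
static struct toy_irq_data toy_xics_level = { 0x1234, &toy_xics, NULL };
static struct toy_irq_data toy_msi_level = { 7, &toy_msi, &toy_xics_level };

static void toy_demo(void)
{
	struct toy_irq_data *d;

	/* The old test on the top-level data would skip this IRQ. */
	assert(toy_msi_level.domain != &toy_xics);

	/* The lookup finds the XICS level and its HW IRQ number. */
	d = toy_domain_get_irq_data(&toy_xics, &toy_msi_level);
	assert(d && d->hwirq == 0x1234);
}
```

Calling toy_demo() from a test main() exercises both cases. An IRQ with
no level in the requested domain yields NULL, which corresponds to the
'continue' path in the migration loop.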

TODO: Why not use irq_migrate_all_off_this_cpu() instead ?

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xics/xics-common.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index 05d21005dc79..2a3ad7f5c331 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -183,6 +183,8 @@ void xics_migrate_irqs_away(void)
 	unsigned int irq, virq;
 	struct irq_desc *desc;
 
+	pr_debug("%s: CPU %u\n", __func__, cpu);
+
 	/* If we used to be the default server, move to the new "boot_cpuid" */
 	if (hw_cpu == xics_default_server)
 		xics_update_irq_servers();
@@ -197,6 +199,7 @@ void xics_migrate_irqs_away(void)
 		struct irq_chip *chip;
 		long server;
 		unsigned long flags;
+		struct irq_data *irqd;
 
 		/* We can't set affinity on ISA interrupts */
 		if (virq < NUM_ISA_INTERRUPTS)
@@ -204,9 +207,11 @@ void xics_migrate_irqs_away(void)
 		/* We only need to migrate enabled IRQS */
 		if (!desc->action)
 			continue;
-		if (desc->irq_data.domain != xics_host)
+		/* We need a mapping in the XICS IRQ domain */
+		irqd = irq_domain_get_irq_data(xics_host, virq);
+		if (!irqd)
 			continue;
-		irq = desc->irq_data.hwirq;
+		irq = irqd_to_hwirq(irqd);
 		/* We need to get IPIs still. */
 		if (irq == XICS_IPI || irq == XICS_IRQ_SPURIOUS)
 			continue;
-- 
2.26.3



* [PATCH 28/31] powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (26 preceding siblings ...)
  2021-04-30  8:04 ` [PATCH 27/31] powerpc/xics: Fix IRQ migration Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 29/31] powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi() Cédric Le Goater
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Frederic Barrat, Cédric Le Goater, Christophe Lombard

Before MSI domains, the default IRQ chip of PHB3 MSIs was patched by
pnv_set_msi_irq_chip() with the custom EOI handler pnv_ioda2_msi_eoi()
and the owning PHB was deduced from the 'ioda.irq_chip' field. This
path has been deprecated by the MSI domains but it is still in use by
the P8 CAPI 'cxl' driver.

Rewriting this driver to support MSI would be a waste of time.
Nevertheless, we can still remove the IRQ chip patch and set the IRQ
chip data instead. This is cleaner.
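
As a rough user-space sketch of the two ways to find the owning PHB
(the toy_* names are hypothetical and the structures are heavily
simplified; in the patch the chip data is the PCI controller and the
PHB is reached through its private data, here the PHB is stored
directly):

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() helper. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_irq_chip {
	const char *name;
};

struct toy_phb {
	unsigned long opal_id;
	struct toy_irq_chip irq_chip; /* per-PHB copy of the chip */
};

struct toy_irq {
	struct toy_irq_chip *chip;
	void *chip_data;
};

/* Old scheme: deduce the PHB from the address of its embedded chip. */
static struct toy_phb *toy_phb_from_chip(struct toy_irq_chip *chip)
{
	return toy_container_of(chip, struct toy_phb, irq_chip);
}

/* New scheme: the owner is stored explicitly as chip data. */
static struct toy_phb *toy_phb_from_chip_data(struct toy_irq *irq)
{
	return irq->chip_data;
}

static void toy_demo(void)
{
	struct toy_phb phb = { 42, { "toy-chip" } };
	struct toy_irq irq = { &phb.irq_chip, &phb };

	assert(toy_phb_from_chip(irq.chip) == &phb);
	assert(toy_phb_from_chip_data(&irq) == &phb);
}
```

The first lookup is only valid when the chip pointer really is the copy
embedded in the PHB. The second one keeps working when the IRQ uses a
chip that is shared between PHBs.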

Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c1598ab730c3..d496d5b1b45a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2115,19 +2115,23 @@ int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq)
 	return opal_pci_msi_eoi(phb->opal_id, hw_irq);
 }
 
+/*
+ * The IRQ data is mapped in the XICS domain, with OPAL HW IRQ numbers
+ */
 static void pnv_ioda2_msi_eoi(struct irq_data *d)
 {
 	int64_t rc;
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
-	struct irq_chip *chip = irq_data_get_irq_chip(d);
+	struct pci_controller *hose = irq_data_get_irq_chip_data(d);
+	struct pnv_phb *phb = hose->private_data;
 
-	rc = pnv_opal_pci_msi_eoi(chip, hw_irq);
+	rc = opal_pci_msi_eoi(phb->opal_id, hw_irq);
 	WARN_ON_ONCE(rc);
 
 	icp_native_eoi(d);
 }
 
-
+/* P8/CXL only */
 void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 {
 	struct irq_data *idata;
@@ -2149,6 +2153,7 @@ void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
 		phb->ioda.irq_chip.irq_eoi = pnv_ioda2_msi_eoi;
 	}
 	irq_set_chip(virq, &phb->ioda.irq_chip);
+	irq_set_chip_data(virq, phb->hose);
 }
 
 static struct irq_chip pnv_pci_msi_irq_chip;
-- 
2.26.3



* [PATCH 29/31] powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (27 preceding siblings ...)
  2021-04-30  8:04 ` [PATCH 28/31] powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 30/31] KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/ Cédric Le Goater
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater, Alexey Kardashevskiy

pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts
when in real mode. Adding the MSI domains broke the hack that used the
'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the
IRQ chip data in the MSI domain.

The 'ioda.irq_chip' field is now unused and could be removed from the
pnv_phb struct.

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/pnv-pci.h        |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c      |  8 ++++----
 arch/powerpc/platforms/powernv/pci-ioda.c | 17 +++++++++++++----
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index d0ee0ede5767..b3f480799352 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -33,7 +33,7 @@ int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num);
 void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num);
 int pnv_cxl_get_irq_count(struct pci_dev *dev);
 struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev);
-int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq);
+int64_t pnv_opal_pci_msi_eoi(struct irq_data *d);
 bool is_pnv_opal_msi(struct irq_chip *chip);
 
 #ifdef CONFIG_CXL_BASE
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index c2c9c733f359..1772d53526e2 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -713,6 +713,7 @@ static int ics_rm_eoi(struct kvm_vcpu *vcpu, u32 irq)
 		icp->rm_eoied_irq = irq;
 	}
 
+	/* Handle passthrough interrupts */
 	if (state->host_irq) {
 		++vcpu->stat.pthru_all;
 		if (state->intr_cpu != -1) {
@@ -766,7 +767,7 @@ int xics_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 static unsigned long eoi_rc;
 
-static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
+static void icp_eoi(struct irq_data *d, u32 hwirq, __be32 xirr, bool *again)
 {
 	void __iomem *xics_phys;
 	int64_t rc;
@@ -779,7 +780,7 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
 		return;
 	}
 
-	rc = pnv_opal_pci_msi_eoi(c, hwirq);
+	rc = pnv_opal_pci_msi_eoi(d);
 
 	if (rc)
 		eoi_rc = rc;
@@ -887,8 +888,7 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
 		icp_rm_deliver_irq(xics, icp, irq, false);
 
 	/* EOI the interrupt */
-	icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
-		again);
+	icp_eoi(irq_desc_get_irq_data(irq_map->desc), irq_map->r_hwirq, xirr, again);
 
 	if (check_too_hard(xics, icp) == H_TOO_HARD)
 		return 2;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d496d5b1b45a..8406b94cbfca 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2107,12 +2107,21 @@ void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	pe->dma_setup_done = true;
 }
 
-int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq)
+/*
+ * Called from KVM in real mode to EOI passthru interrupts. The ICP
+ * EOI is handled directly in KVM in kvmppc_deliver_irq_passthru().
+ *
+ * The IRQ data is mapped in the PCI-MSI domain and the EOI OPAL call
+ * needs an HW IRQ number mapped in the XICS IRQ domain. The HW IRQ
+ * numbers of the in-the-middle MSI domain are vector numbers and it's
+ * good enough for OPAL. Use that.
+ */
+int64_t pnv_opal_pci_msi_eoi(struct irq_data *d)
 {
-	struct pnv_phb *phb = container_of(chip, struct pnv_phb,
-					   ioda.irq_chip);
+	struct pci_controller *hose = irq_data_get_irq_chip_data(d->parent_data);
+	struct pnv_phb *phb = hose->private_data;
 
-	return opal_pci_msi_eoi(phb->opal_id, hw_irq);
+	return opal_pci_msi_eoi(phb->opal_id, d->parent_data->hwirq);
 }
 
 /*
-- 
2.26.3



* [PATCH 30/31] KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (28 preceding siblings ...)
  2021-04-30  8:04 ` [PATCH 29/31] powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi() Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-04-30  8:04 ` [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/ Cédric Le Goater
  30 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cédric Le Goater, Alexey Kardashevskiy

PCI MSIs now live in an MSI domain but the underlying calls, which
will EOI the interrupt in real mode, need an HW IRQ number mapped in
the XICS IRQ domain. Grab it there.

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_hv.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9f4eb74a11cc..6058bcc5b61e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5126,6 +5126,7 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
 	struct kvmppc_passthru_irqmap *pimap;
 	struct irq_chip *chip;
 	int i, rc = 0;
+	struct irq_data *host_data;
 
 	if (!kvm_irq_bypass)
 		return 1;
@@ -5190,7 +5191,14 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
 	 * the KVM real mode handler.
 	 */
 	smp_wmb();
-	irq_map->r_hwirq = desc->irq_data.hwirq;
+
+	/*
+	 * The 'host_irq' number is mapped in the PCI-MSI domain but
+	 * the underlying calls, which will EOI the interrupt in real
+	 * mode, need an HW IRQ number mapped in the XICS IRQ domain.
+	 */
+	host_data = irq_domain_get_irq_data(irq_get_default_host(), host_irq);
+	irq_map->r_hwirq = (unsigned int)irqd_to_hwirq(host_data);
 
 	if (i == pimap->n_mapped)
 		pimap->n_mapped++;
@@ -5198,7 +5206,7 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
 	if (xics_on_xive())
 		rc = kvmppc_xive_set_mapped(kvm, guest_gsi, host_irq);
 	else
-		kvmppc_xics_set_mapped(kvm, guest_gsi, desc->irq_data.hwirq);
+		kvmppc_xics_set_mapped(kvm, guest_gsi, irq_map->r_hwirq);
 	if (rc)
 		irq_map->r_hwirq = 0;
 
-- 
2.26.3



* [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/
  2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
                   ` (29 preceding siblings ...)
  2021-04-30  8:04 ` [PATCH 30/31] KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts Cédric Le Goater
@ 2021-04-30  8:04 ` Cédric Le Goater
  2021-05-14 20:49   ` Thomas Gleixner
  30 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2021-04-30  8:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Cédric Le Goater

The HW IRQ numbers generated by the PCI MSI layer can be quite large
on a pSeries machine when running under the IBM Hypervisor and they
appear as negative. Use '%u' to show them correctly.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 kernel/irq/irqdesc.c | 2 +-
 kernel/irq/proc.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index cc1a09406c6e..85054eb2ae51 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -188,7 +188,7 @@ static ssize_t hwirq_show(struct kobject *kobj,
 
 	raw_spin_lock_irq(&desc->lock);
 	if (desc->irq_data.domain)
-		ret = sprintf(buf, "%d\n", (int)desc->irq_data.hwirq);
+		ret = sprintf(buf, "%u\n", (int)desc->irq_data.hwirq);
 	raw_spin_unlock_irq(&desc->lock);
 
 	return ret;
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 98138788cb04..e2392f05da04 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -513,7 +513,7 @@ int show_interrupts(struct seq_file *p, void *v)
 		seq_printf(p, " %8s", "None");
 	}
 	if (desc->irq_data.domain)
-		seq_printf(p, " %*d", prec, (int) desc->irq_data.hwirq);
+		seq_printf(p, " %*u", prec, (int)desc->irq_data.hwirq);
 	else
 		seq_printf(p, " %*s", prec, "");
 #ifdef CONFIG_GENERIC_IRQ_SHOW_LEVEL
-- 
2.26.3



* Re: [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI
  2021-04-30  8:03 ` [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI Cédric Le Goater
@ 2021-05-14 20:48   ` Thomas Gleixner
  2021-05-20 17:25     ` Cédric Le Goater
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2021-05-14 20:48 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: Cédric Le Goater

On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
> The MSI affinity is automanaged and it can be set before starting the
> associated IRQ.
>
> ( Should we simply remove the irqd_is_started() test ? )

If the hardware can handle it properly.

But see:

  cffb717ceb8e ("powerpc/xive: Ensure active irqd when setting affinity")

which introduced that condition. It mutters something about migration of
shutdown interrupts:

       [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
       [   77.885859] xive: Error -6 reconfiguring irq 17
       [   77.885862] IRQ17: set affinity failed(-6).

Not that I can decode that :)

Non-managed interrupts have the sequence:

      startup()
      set_affinity()

which is historical and an earlier attempt to flip it caused havoc in
some places.

With managed interrupts we needed to make sure that the affinity is set
correctly right at start. So it needs to be done the other way round,
and it turned out that this works for MSI.

I have no idea whether that might make the above issue reappear or
not. If so, then we need some extra state to make it work.

The root cause which triggered the problem got fixed, so there should be
no issue _if_ this was specifically related to that CPU unplug case.
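
To make the resulting behaviour explicit, here is a small user-space
sketch of the early-return test in the hunk quoted below. The function
name is hypothetical and the two booleans simply stand in for
irqd_is_started() and irqd_affinity_is_managed():

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Skip the reconfiguration only when the interrupt is neither started
 * nor affinity-managed.
 */
static bool toy_skip_set_affinity(bool started, bool managed)
{
	return !started && !managed;
}

static void toy_demo(void)
{
	/* Non-managed and not started: nothing is programmed. */
	assert(toy_skip_set_affinity(false, false));
	/* Managed: the affinity is applied before startup. */
	assert(!toy_skip_set_affinity(false, true));
	/* Started interrupts are always reconfigured. */
	assert(!toy_skip_set_affinity(true, false));
	assert(!toy_skip_set_affinity(true, true));
}
```

Only the second case changes with the patch. Whether the hardware copes
with it is exactly the open question above.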

> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
> index 96737938e8e3..3485baf9ec8c 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -710,7 +710,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
>  		return -EINVAL;
>  
>  	/* Don't do anything if the interrupt isn't started */
> -	if (!irqd_is_started(d))
> +	if (!irqd_is_started(d) && !irqd_affinity_is_managed(d))
>  		return IRQ_SET_MASK_OK;
>  
>  	/*

Thanks,

        tglx


* Re: [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/
  2021-04-30  8:04 ` [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/ Cédric Le Goater
@ 2021-05-14 20:49   ` Thomas Gleixner
  2021-05-20 12:27     ` Cédric Le Goater
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2021-05-14 20:49 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: Cédric Le Goater

On Fri, Apr 30 2021 at 10:04, Cédric Le Goater wrote:
> The HW IRQ numbers generated by the PCI MSI layer can be quite large
> on a pSeries machine when running under the IBM Hypervisor and they
> appear as negative. Use '%u' to show them correctly.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  kernel/irq/irqdesc.c | 2 +-
>  kernel/irq/proc.c    | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index cc1a09406c6e..85054eb2ae51 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -188,7 +188,7 @@ static ssize_t hwirq_show(struct kobject *kobj,
>  
>  	raw_spin_lock_irq(&desc->lock);
>  	if (desc->irq_data.domain)
> -		ret = sprintf(buf, "%d\n", (int)desc->irq_data.hwirq);
> +		ret = sprintf(buf, "%u\n", (int)desc->irq_data.hwirq);

Which makes the (int) cast pointless, right?

>  	raw_spin_unlock_irq(&desc->lock);
>  
>  	return ret;
> diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
> index 98138788cb04..e2392f05da04 100644
> --- a/kernel/irq/proc.c
> +++ b/kernel/irq/proc.c
> @@ -513,7 +513,7 @@ int show_interrupts(struct seq_file *p, void *v)
>  		seq_printf(p, " %8s", "None");
>  	}
>  	if (desc->irq_data.domain)
> -		seq_printf(p, " %*d", prec, (int) desc->irq_data.hwirq);
> +		seq_printf(p, " %*u", prec, (int)desc->irq_data.hwirq);

ditto.

Thanks,

        tglx


* Re: [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts
  2021-04-30  8:03 ` [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts Cédric Le Goater
@ 2021-05-14 20:51   ` Thomas Gleixner
  2021-05-15 10:40     ` Marc Zyngier
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2021-05-14 20:51 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: Cédric Le Goater, Marc Zyngier

On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:

CC: +Marc

> PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
> underlying calls handling the passthrough of the interrupt in the
> guest need a number in the XIVE IRQ domain.
>
> Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
> PCI-MSI domain.
>
> Exporting irq_get_default_host() might not be the best solution.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  arch/powerpc/kvm/book3s_xive.c | 3 ++-
>  kernel/irq/irqdomain.c         | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 3a7da42bed57..81b9f4fc3978 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -861,7 +861,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
>  	struct kvmppc_xive *xive = kvm->arch.xive;
>  	struct kvmppc_xive_src_block *sb;
>  	struct kvmppc_xive_irq_state *state;
> -	struct irq_data *host_data = irq_get_irq_data(host_irq);
> +	struct irq_data *host_data =
> +		irq_domain_get_irq_data(irq_get_default_host(), host_irq);
>  	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data);
>  	u16 idx;
>  	u8 prio;
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index d10ab1d689d5..8a073d1ce611 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -481,6 +481,7 @@ struct irq_domain *irq_get_default_host(void)
>  {
>  	return irq_default_domain;
>  }
> +EXPORT_SYMBOL_GPL(irq_get_default_host);
>  
>  static void irq_domain_clear_mapping(struct irq_domain *domain,
>  				     irq_hw_number_t hwirq)


* Re: [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts
  2021-05-14 20:51   ` Thomas Gleixner
@ 2021-05-15 10:40     ` Marc Zyngier
  2021-05-20 12:09       ` Cédric Le Goater
  0 siblings, 1 reply; 41+ messages in thread
From: Marc Zyngier @ 2021-05-15 10:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linuxppc-dev, Cédric Le Goater

On Fri, 14 May 2021 21:51:51 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
> 
> CC: +Marc

Thanks Thomas.

> 
> > PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
> > underlying calls handling the passthrough of the interrupt in the
> > guest need a number in the XIVE IRQ domain.
> >
> > Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
> > PCI-MSI domain.
> >
> > Exporting irq_get_default_host() might not be the best solution.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Paul Mackerras <paulus@ozlabs.org>
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  arch/powerpc/kvm/book3s_xive.c | 3 ++-
> >  kernel/irq/irqdomain.c         | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> > index 3a7da42bed57..81b9f4fc3978 100644
> > --- a/arch/powerpc/kvm/book3s_xive.c
> > +++ b/arch/powerpc/kvm/book3s_xive.c
> > @@ -861,7 +861,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
> >  	struct kvmppc_xive *xive = kvm->arch.xive;
> >  	struct kvmppc_xive_src_block *sb;
> >  	struct kvmppc_xive_irq_state *state;
> > -	struct irq_data *host_data = irq_get_irq_data(host_irq);
> > +	struct irq_data *host_data =
> > +		irq_domain_get_irq_data(irq_get_default_host(), host_irq);
> >  	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data);
> >  	u16 idx;
> >  	u8 prio;
> > diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> > index d10ab1d689d5..8a073d1ce611 100644
> > --- a/kernel/irq/irqdomain.c
> > +++ b/kernel/irq/irqdomain.c
> > @@ -481,6 +481,7 @@ struct irq_domain *irq_get_default_host(void)
> >  {
> >  	return irq_default_domain;
> >  }
> > +EXPORT_SYMBOL_GPL(irq_get_default_host);
> >  
> >  static void irq_domain_clear_mapping(struct irq_domain *domain,
> >  				     irq_hw_number_t hwirq)
> 

Is there any reason why we should add more users of the "default host"
fallback? I would really hope that new code would actually track its
irqdomain in a more fine-grained way, especially when using the
hierarchical MSI setup, which seems to be the goal of this series.

Don't you have enough topology information that you can make use of to
correctly assign a domain identifier (of_node or otherwise)?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts
  2021-05-15 10:40     ` Marc Zyngier
@ 2021-05-20 12:09       ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-05-20 12:09 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner; +Cc: linuxppc-dev

On 5/15/21 12:40 PM, Marc Zyngier wrote:
> On Fri, 14 May 2021 21:51:51 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
>>
>> CC: +Marc
> 
> Thanks Thomas.
> 
>>
>>> PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
>>> underlying calls handling the passthrough of the interrupt in the
>>> guest need a number in the XIVE IRQ domain.
>>>
>>> Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
>>> PCI-MSI domain.
>>>
>>> Exporting irq_get_default_host() might not be the best solution.
>>>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Paul Mackerras <paulus@ozlabs.org>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>>  arch/powerpc/kvm/book3s_xive.c | 3 ++-
>>>  kernel/irq/irqdomain.c         | 1 +
>>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>>> index 3a7da42bed57..81b9f4fc3978 100644
>>> --- a/arch/powerpc/kvm/book3s_xive.c
>>> +++ b/arch/powerpc/kvm/book3s_xive.c
>>> @@ -861,7 +861,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
>>>  	struct kvmppc_xive *xive = kvm->arch.xive;
>>>  	struct kvmppc_xive_src_block *sb;
>>>  	struct kvmppc_xive_irq_state *state;
>>> -	struct irq_data *host_data = irq_get_irq_data(host_irq);
>>> +	struct irq_data *host_data =
>>> +		irq_domain_get_irq_data(irq_get_default_host(), host_irq);
>>>  	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data);
>>>  	u16 idx;
>>>  	u8 prio;
>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>> index d10ab1d689d5..8a073d1ce611 100644
>>> --- a/kernel/irq/irqdomain.c
>>> +++ b/kernel/irq/irqdomain.c
>>> @@ -481,6 +481,7 @@ struct irq_domain *irq_get_default_host(void)
>>>  {
>>>  	return irq_default_domain;
>>>  }
>>> +EXPORT_SYMBOL_GPL(irq_get_default_host);
>>>  
>>>  static void irq_domain_clear_mapping(struct irq_domain *domain,
>>>  				     irq_hw_number_t hwirq)
>>
> 
> Is there any reason why we should add more users of the "default host"
> fallback? I would really hope that new code would actually track its
> irqdomain in a more fine-grained way, especially when using the
> hierarchical MSI setup, which seems to be the goal of this series.
> 
> Don't you have enough topology information that you can make use of to
> correctly assign a domain identifier (of_node or otherwise)?


PHBs have a node ID and this is taken into account by the MSI domains.
However, one thing PPC (pSeries and PowerNV) lacks is an interrupt
controller node per chip which makes the IRQ domain hierarchy a bit
incomplete.

It will be difficult to change the pseries platform (VM) since the
PAPR architecture only specifies a single interrupt domain for the
whole machine. The PowerNV platform is designed in a similar way
(because the pseries platform preexisted) and the OPAL firmware hides
the interrupt controllers of each chip behind a single node. The
underlying topology is encoded in HW interrupt numbers. This is a bit
unfortunate since some PowerNV Linux drivers need that information.
Rewriting a new interrupt controller driver in OPAL would be a lot of
work and it won't happen any time soon. But it's feasible.

All that to say that we have a default IRQ domain on these platforms
and not one IRQ domain per node/chip.

Also, there are two types of interrupt models to consider: the older
XICS (for P8/P7 processors) and the newer XIVE (for P9/P10).

Regarding MSI passthrough, the XIVE side is simpler (I can't believe I
am saying that, XIVE is anything but simple) and I think we can rework
kvmppc_xive_set_mapped() and xive_irq_set_vcpu_affinity() to remove
the IRQ domain bypass. 

XICS adds optimizations for passthrough done in real mode:

 e3c13e56a471 ("KVM: PPC: Book3S HV: Handle passthrough interrupts in guest")
 5d375199ea96 ("KVM: PPC: Book3S HV: Set server for passed-through interrupts")

That's a ~10% bandwidth improvement on CX5 adapters. It's good to
have, but these optimizations are much more complex to rework. I spent
some time looking for a solution, because the use of irq_to_desc() and
of the host IRQ in the XICS domain is ugly, but nothing comes to mind
yet.

For the time being, I think these changes bypassing the IRQ domains
are fine. I need some more time to mature an alternative.

Thanks,

C. 



* Re: [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/
  2021-05-14 20:49   ` Thomas Gleixner
@ 2021-05-20 12:27     ` Cédric Le Goater
  2021-05-20 12:57       ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2021-05-20 12:27 UTC (permalink / raw)
  To: Thomas Gleixner, linuxppc-dev

On 5/14/21 10:49 PM, Thomas Gleixner wrote:
> On Fri, Apr 30 2021 at 10:04, Cédric Le Goater wrote:
>> The HW IRQ numbers generated by the PCI MSI layer can be quite large
>> on a pSeries machine when running under the IBM Hypervisor and they
>> appear as negative. Use '%u' to show them correctly.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  kernel/irq/irqdesc.c | 2 +-
>>  kernel/irq/proc.c    | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>> index cc1a09406c6e..85054eb2ae51 100644
>> --- a/kernel/irq/irqdesc.c
>> +++ b/kernel/irq/irqdesc.c
>> @@ -188,7 +188,7 @@ static ssize_t hwirq_show(struct kobject *kobj,
>>  
>>  	raw_spin_lock_irq(&desc->lock);
>>  	if (desc->irq_data.domain)
>> -		ret = sprintf(buf, "%d\n", (int)desc->irq_data.hwirq);
>> +		ret = sprintf(buf, "%u\n", (int)desc->irq_data.hwirq);
> 
> Which makes the (int) cast pointless, right?

Well, hwirq is a long. Would you prefer a "%lu" for both ?
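
To compare the options, a minimal user-space sketch (the helper names
are hypothetical, 0x80000001 is a made-up HW IRQ number, and the
negative result assumes the usual two's-complement conversion done by
gcc and clang):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "%d" with an (int) cast: how the value was printed before. */
static void toy_fmt_d(char *buf, size_t len, unsigned long hwirq)
{
	snprintf(buf, len, "%d", (int)hwirq);
}

/* "%u" with an unsigned int argument: fine for 32-bit values. */
static void toy_fmt_u(char *buf, size_t len, unsigned long hwirq)
{
	snprintf(buf, len, "%u", (unsigned int)hwirq);
}

/* "%lu" prints the full unsigned long and needs no cast. */
static void toy_fmt_lu(char *buf, size_t len, unsigned long hwirq)
{
	snprintf(buf, len, "%lu", hwirq);
}

static void toy_demo(void)
{
	char buf[32];

	toy_fmt_d(buf, sizeof(buf), 0x80000001UL);
	assert(strcmp(buf, "-2147483647") == 0);

	toy_fmt_u(buf, sizeof(buf), 0x80000001UL);
	assert(strcmp(buf, "2147483649") == 0);

	toy_fmt_lu(buf, sizeof(buf), 0x80000001UL);
	assert(strcmp(buf, "2147483649") == 0);
}
```

With "%lu" the cast goes away entirely, which also covers HW numbers
that do not fit in 32 bits.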

Thanks,

C.

> 
>>  	raw_spin_unlock_irq(&desc->lock);
>>  
>>  	return ret;
>> diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
>> index 98138788cb04..e2392f05da04 100644
>> --- a/kernel/irq/proc.c
>> +++ b/kernel/irq/proc.c
>> @@ -513,7 +513,7 @@ int show_interrupts(struct seq_file *p, void *v)
>>  		seq_printf(p, " %8s", "None");
>>  	}
>>  	if (desc->irq_data.domain)
>> -		seq_printf(p, " %*d", prec, (int) desc->irq_data.hwirq);
>> +		seq_printf(p, " %*u", prec, (int)desc->irq_data.hwirq);
> 
> ditto.
> 
> Thanks,
> 
>         tglx
> 



* Re: [PATCH 09/31] powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data
  2021-04-30  8:03 ` [PATCH 09/31] powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data Cédric Le Goater
@ 2021-05-20 12:33   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-05-20 12:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Thomas Gleixner, Marc Zyngier

Adding Marc.

On 4/30/21 10:03 AM, Cédric Le Goater wrote:
> The MSI domain clears the IRQ with msi_domain_free(), which calls
> irq_domain_free_irqs_top(), which clears the handler data. This is a
> problem for the XIVE controller since we need to unmap MMIO pages and
> free a specific XIVE structure.
> 
> The 'msi_free()' handler is called before irq_domain_free_irqs_top()
> when the handler data is still available. Use that to clear the XIVE
> controller data.

This feels like a clumsy way of doing so.

irq_domain_free_irqs_parent() would be my preferred way to clear the
low-level handler data but we can't today. Could there be a smarter way?
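
To show why the ordering matters, a small user-space sketch (all toy_*
names are hypothetical; it only models the statement above that the
generic top-level free clears the handler data pointer):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an IRQ with controller-private data. */
struct toy_irq {
	void *handler_data;
};

static int toy_released; /* counts controller data actually released */

/* Models the ->msi_free() hook: release the data while reachable. */
static void toy_msi_free(struct toy_irq *irq)
{
	if (irq->handler_data) {
		free(irq->handler_data);
		irq->handler_data = NULL;
		toy_released++;
	}
}

/* Models the generic top-level free: it only clears the pointer. */
static void toy_free_irqs_top(struct toy_irq *irq)
{
	irq->handler_data = NULL;
}

static void toy_demo(void)
{
	struct toy_irq irq = { malloc(16) };

	assert(irq.handler_data);
	toy_msi_free(&irq);      /* the hook runs first ... */
	toy_free_irqs_top(&irq); /* ... then the generic cleanup */
	assert(toy_released == 1);
}
```

If the two calls were swapped, the hook would find a NULL pointer and
the controller data would never be released.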

Thanks,

C.


> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>
> ---
>  arch/powerpc/include/asm/xive.h      |  1 +
>  arch/powerpc/platforms/pseries/msi.c | 16 +++++++++++++++-
>  arch/powerpc/sysdev/xive/common.c    |  5 ++++-
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index aa094a8655b0..20ae50ab083c 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -111,6 +111,7 @@ void xive_native_free_vp_block(u32 vp_base);
>  int xive_native_populate_irq_data(u32 hw_irq,
>  				  struct xive_irq_data *data);
>  void xive_cleanup_irq_data(struct xive_irq_data *xd);
> +void xive_irq_free_data(unsigned int virq);
>  void xive_native_free_irq(u32 irq);
>  int xive_native_configure_irq(u32 hw_irq, u32 target, u8 prio, u32 sw_irq);
>  
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index a41c448520d4..da9d63a088bb 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -529,6 +529,19 @@ static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev
>  	return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
>  }
>  
> +/*
> + * ->msi_free() is called before irq_domain_free_irqs_top() when the
> + * handler data is still available. Use that to clear the XIVE
> + * controller data.
> + */
> +static void pseries_msi_ops_msi_free(struct irq_domain *domain,
> +				     struct msi_domain_info *info,
> +				     unsigned int irq)
> +{
> +	if (xive_enabled())
> +		xive_irq_free_data(irq);
> +}
> +
>  /*
>   * RTAS can not disable one MSI at a time. It's all or nothing. Do it
>   * at the end after all IRQs have been freed.
> @@ -546,6 +559,7 @@ static void pseries_msi_domain_free_irqs(struct irq_domain *domain,
>  
>  static struct msi_domain_ops pseries_pci_msi_domain_ops = {
>  	.msi_prepare	= pseries_msi_ops_prepare,
> +	.msi_free	= pseries_msi_ops_msi_free,
>  	.domain_free_irqs = pseries_msi_domain_free_irqs,
>  };
>  
> @@ -660,7 +674,7 @@ static void pseries_irq_domain_free(struct irq_domain *domain, unsigned int virq
>  
>  	pr_debug("%s bridge %pOF %d #%d\n", __func__, phb->dn, virq, nr_irqs);
>  
> -	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
> +	/* XIVE domain data is cleared through ->msi_free() */
>  }
>  
>  static const struct irq_domain_ops pseries_irq_domain_ops = {
> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
> index 3485baf9ec8c..191cd80ec534 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -980,6 +980,8 @@ EXPORT_SYMBOL_GPL(is_xive_irq);
>  
>  void xive_cleanup_irq_data(struct xive_irq_data *xd)
>  {
> +	pr_debug("%s for HW %x\n", __func__, xd->hw_irq);
> +
>  	if (xd->eoi_mmio) {
>  		unmap_kernel_range((unsigned long)xd->eoi_mmio,
>  				   1u << xd->esb_shift);
> @@ -1025,7 +1027,7 @@ static int xive_irq_alloc_data(unsigned int virq, irq_hw_number_t hw)
>  	return 0;
>  }
>  
> -static void xive_irq_free_data(unsigned int virq)
> +void xive_irq_free_data(unsigned int virq)
>  {
>  	struct xive_irq_data *xd = irq_get_handler_data(virq);
>  
> @@ -1035,6 +1037,7 @@ static void xive_irq_free_data(unsigned int virq)
>  	xive_cleanup_irq_data(xd);
>  	kfree(xd);
>  }
> +EXPORT_SYMBOL_GPL(xive_irq_free_data);
>  
>  #ifdef CONFIG_SMP
>  
> 
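
The ordering problem described in the commit message above can be sketched
with a toy model. This is not kernel code: the toy_* types and functions
are hypothetical stand-ins for the XIVE per-IRQ data, xive_irq_free_data()
and the generic top-level free. The assumption is only what the commit
message states, namely that the generic free clears the handler data
pointer, so any cleanup that needs that pointer has to run first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the XIVE per-IRQ data: tracks one mapping. */
struct toy_xive_data {
	bool esb_mapped;
};

/* Hypothetical stand-in for an IRQ descriptor holding handler data. */
struct toy_irq {
	struct toy_xive_data *handler_data;
};

/* Models the XIVE cleanup: it needs handler_data to find what to unmap. */
static void toy_xive_free_data(struct toy_irq *irq)
{
	assert(irq != NULL);
	if (!irq->handler_data)
		return;	/* pointer already cleared: nothing can be cleaned up */
	irq->handler_data->esb_mapped = false;
	irq->handler_data = NULL;
}

/* Models the generic top-level free, which only clears the pointer. */
static void toy_free_top(struct toy_irq *irq)
{
	assert(irq != NULL);
	irq->handler_data = NULL;
}
```

Calling toy_free_top() before toy_xive_free_data() leaves the mapping in
place, which is the situation the ->msi_free() hook works around by running
the XIVE cleanup while the handler data is still reachable.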



* Re: [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/
  2021-05-20 12:27     ` Cédric Le Goater
@ 2021-05-20 12:57       ` Thomas Gleixner
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gleixner @ 2021-05-20 12:57 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev

On Thu, May 20 2021 at 14:27, Cédric Le Goater wrote:
> On 5/14/21 10:49 PM, Thomas Gleixner wrote:
>> On Fri, Apr 30 2021 at 10:04, Cédric Le Goater wrote:
>>> The HW IRQ numbers generated by the PCI MSI layer can be quite large
>>> on a pSeries machine when running under the IBM Hypervisor and they
>>> appear as negative. Use '%u' to show them correctly.
>>>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>>  kernel/irq/irqdesc.c | 2 +-
>>>  kernel/irq/proc.c    | 2 +-
>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>>> index cc1a09406c6e..85054eb2ae51 100644
>>> --- a/kernel/irq/irqdesc.c
>>> +++ b/kernel/irq/irqdesc.c
>>> @@ -188,7 +188,7 @@ static ssize_t hwirq_show(struct kobject *kobj,
>>>  
>>>  	raw_spin_lock_irq(&desc->lock);
>>>  	if (desc->irq_data.domain)
>>> -		ret = sprintf(buf, "%d\n", (int)desc->irq_data.hwirq);
>>> +		ret = sprintf(buf, "%u\n", (int)desc->irq_data.hwirq);
>> 
>> Which makes the (int) cast pointless, right?
>
> Well, hwirq is a long.

And that makes '%u' plus an int casted argument any more correct?

Aside from that, hwirq is an unsigned long, not a long.

> Would you prefer a "%lu" for both ?

That's the obvious right thing to do, isn't it?

Thanks,

        tglx
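
To make the format-string point above concrete, here is a small userspace
sketch. It is not the kernel code: hwirq_t is a hypothetical stand-in for
the unsigned long hwirq type mentioned in the reply, and the two helpers
only mirror the old "%d" plus (int) cast and the suggested "%lu". The
negative output of the old form assumes the usual two's-complement
truncation done by common compilers when converting to int.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's hwirq type, which is an unsigned long. */
typedef unsigned long hwirq_t;

/* Old format: "%d" with an (int) cast; large values show up as negative. */
static int fmt_hwirq_old(char *buf, size_t len, hwirq_t hwirq)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%d", (int)hwirq);
}

/* Suggested format: "%lu" with no cast; the value is printed unchanged. */
static int fmt_hwirq_new(char *buf, size_t len, hwirq_t hwirq)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%lu", hwirq);
}
```

With a value such as 0x80000001, the old helper produces a string starting
with '-', while the "%lu" helper prints the value as a plain unsigned
number, with no cast needed.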


* Re: [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI
  2021-05-14 20:48   ` Thomas Gleixner
@ 2021-05-20 17:25     ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2021-05-20 17:25 UTC (permalink / raw)
  To: Thomas Gleixner, linuxppc-dev

On 5/14/21 10:48 PM, Thomas Gleixner wrote:
> On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
>> The MSI affinity is automanaged and it can be set before starting the
>> associated IRQ.
>>
>> ( Should we simply remove the irqd_is_started() test ? )
> 
> If the hardware can handle it properly.
> 
> But see:
> 
>   cffb717ceb8e ("powerpc/xive: Ensure active irqd when setting affinity")

Thanks for digging. That's a patch from the early days of XIVE support. 

> which introduced that condition. It mutters something about migration of
> shutdown interrupts:
> 
>        [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !

The XIVE driver in OPAL is complaining.

Linux is trying to configure the target of HW IRQ number 2 but OPAL refuses
because it's invalid. The first 16 are reserved (like on Linux).

So it's a different problem. The 2 could be a value from an "interrupts"
property giving the INTx number assigned to a PCI device, or an OPAL event
IRQ number leaked into the XIVE domain. Given the low Linux IRQ number,
it might be the latter.

>        [   77.885859] xive: Error -6 reconfiguring irq 17
>        [   77.885862] IRQ17: set affinity failed(-6).
> 
> Not that I can decode that :)

A device name would help but you have guessed most of it ;)

> 
> Non-managed interrupts have the sequence:
> 
>       startup()
>       set_affinity()
> 
> which is historical and an earlier attempt to flip it caused havoc in
> some places.
> 
> With managed we needed to make sure that the affinity is set correctly
> right at start. So it needs to be done the other way round and it turned
> out that for MSI this works.
> 
> I have no idea, whether that might make the above issue reappear or
> not. If so, then we need some extra state to make it work.
> 
> The root cause which triggered the problem got fixed, so there should be
> no issue _if_ this was specifically related to that CPU unplug case.

I would vote for this option. I will simply remove the irqd_is_started()
test, which looks bogus, and run some extra tests on all platforms.

Thanks,

C.


 
>> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
>> index 96737938e8e3..3485baf9ec8c 100644
>> --- a/arch/powerpc/sysdev/xive/common.c
>> +++ b/arch/powerpc/sysdev/xive/common.c
>> @@ -710,7 +710,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
>>  		return -EINVAL;
>>  
>>  	/* Don't do anything if the interrupt isn't started */
>> -	if (!irqd_is_started(d))
>> +	if (!irqd_is_started(d) && !irqd_affinity_is_managed(d))
>>  		return IRQ_SET_MASK_OK;
>>  
>>  	/*
> 
> Thanks,
> 
>         tglx
> 
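
The three variants of the early-return guard discussed in this message can
be compared with a toy model. This is only a sketch: struct toy_irqd and
the skip_*() helpers are hypothetical and merely mirror the conditions
built from irqd_is_started() and irqd_affinity_is_managed() in the quoted
diff. Each helper returns true when the set_affinity handler would return
early without programming the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two irq_data state bits discussed above. */
struct toy_irqd {
	bool started;
	bool affinity_managed;
};

/* Guard before the patch: skip whenever the interrupt is not started. */
static bool skip_before_patch(const struct toy_irqd *d)
{
	assert(d != NULL);
	return !d->started;
}

/* Guard in patch 07: skip only if not started and not managed. */
static bool skip_with_patch(const struct toy_irqd *d)
{
	assert(d != NULL);
	return !d->started && !d->affinity_managed;
}

/* Proposal from the reply: drop the test, so never skip. */
static bool skip_with_test_removed(const struct toy_irqd *d)
{
	assert(d != NULL);
	return false;
}
```

The variants differ only for interrupts that are not started yet: the
patch lets managed ones through, and removing the test lets all of them
through, which is why the reply mentions extra tests on all platforms.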



Thread overview: 41+ messages
2021-04-30  8:03 [PATCH 00/31] powerpc: Modernize the PCI/MSI support Cédric Le Goater
2021-04-30  8:03 ` [PATCH 01/31] powerpc/pseries/pci: Introduce __find_pe_total_msi() Cédric Le Goater
2021-04-30  8:03 ` [PATCH 02/31] powerpc/pseries/pci: Introduce rtas_prepare_msi_irqs() Cédric Le Goater
2021-04-30  8:03 ` [PATCH 03/31] powerpc/xive: Add support for IRQ domain hierarchy Cédric Le Goater
2021-04-30  8:03 ` [PATCH 04/31] powerpc/xive: Ease debugging of xive_irq_set_affinity() Cédric Le Goater
2021-04-30  8:03 ` [PATCH 05/31] powerpc/pseries/pci: Add MSI domains Cédric Le Goater
2021-04-30  8:03 ` [PATCH 06/31] powerpc/xive: Drop unmask of MSIs at startup Cédric Le Goater
2021-04-30  8:03 ` [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI Cédric Le Goater
2021-05-14 20:48   ` Thomas Gleixner
2021-05-20 17:25     ` Cédric Le Goater
2021-04-30  8:03 ` [PATCH 08/31] powerpc/pseries/pci: Add a domain_free_irqs handler Cédric Le Goater
2021-04-30  8:03 ` [PATCH 09/31] powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data Cédric Le Goater
2021-05-20 12:33   ` Cédric Le Goater
2021-04-30  8:03 ` [PATCH 10/31] powerpc/pseries/pci: Add support of MSI domains to PHB hotplug Cédric Le Goater
2021-04-30  8:03 ` [PATCH 11/31] powerpc/powernv/pci: Introduce __pnv_pci_ioda_msi_setup() Cédric Le Goater
2021-04-30  8:03 ` [PATCH 12/31] powerpc/powernv/pci: Add MSI domains Cédric Le Goater
2021-04-30  8:03 ` [PATCH 13/31] KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts Cédric Le Goater
2021-04-30  8:03 ` [PATCH 14/31] KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines Cédric Le Goater
2021-04-30  8:03 ` [PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts Cédric Le Goater
2021-05-14 20:51   ` Thomas Gleixner
2021-05-15 10:40     ` Marc Zyngier
2021-05-20 12:09       ` Cédric Le Goater
2021-04-30  8:03 ` [PATCH 16/31] powerpc/xics: Remove ICS list Cédric Le Goater
2021-04-30  8:03 ` [PATCH 17/31] powerpc/xics: Rename the map handler in a check handler Cédric Le Goater
2021-04-30  8:03 ` [PATCH 18/31] powerpc/xics: Give a name to the default XICS IRQ domain Cédric Le Goater
2021-04-30  8:03 ` [PATCH 19/31] powerpc/xics: Add debug logging to the set_irq_affinity handlers Cédric Le Goater
2021-04-30  8:03 ` [PATCH 20/31] powerpc/xics: Add support for IRQ domain hierarchy Cédric Le Goater
2021-04-30  8:03 ` [PATCH 21/31] powerpc/powernv/pci: Customize the MSI EOI handler to support PHB3 Cédric Le Goater
2021-04-30  8:03 ` [PATCH 22/31] powerpc/pci: Drop XIVE restriction on MSI domains Cédric Le Goater
2021-04-30  8:03 ` [PATCH 23/31] powerpc/xics: Drop unmask of MSIs at startup Cédric Le Goater
2021-04-30  8:04 ` [PATCH 24/31] powerpc/pseries/pci: Drop unused MSI code Cédric Le Goater
2021-04-30  8:04 ` [PATCH 25/31] powerpc/powernv/pci: " Cédric Le Goater
2021-04-30  8:04 ` [PATCH 26/31] powerpc/powernv/pci: Adapt is_pnv_opal_msi() to detect passthrough interrupt Cédric Le Goater
2021-04-30  8:04 ` [PATCH 27/31] powerpc/xics: Fix IRQ migration Cédric Le Goater
2021-04-30  8:04 ` [PATCH 28/31] powerpc/powernv/pci: Set the IRQ chip data for P8/CXL devices Cédric Le Goater
2021-04-30  8:04 ` [PATCH 29/31] powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi() Cédric Le Goater
2021-04-30  8:04 ` [PATCH 30/31] KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts Cédric Le Goater
2021-04-30  8:04 ` [PATCH 31/31] genirq: Improve "hwirq" output in /proc and /sys/ Cédric Le Goater
2021-05-14 20:49   ` Thomas Gleixner
2021-05-20 12:27     ` Cédric Le Goater
2021-05-20 12:57       ` Thomas Gleixner
