* [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node
@ 2021-03-31 14:45 Cédric Le Goater
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: Greg Kurz, Cédric Le Goater
Hello,
ipistorm [*] can be used to benchmark the raw interrupt rate of an
interrupt controller by measuring the number of IPIs a system can
sustain. When applied to the XIVE interrupt controller of POWER9 and
POWER10 systems, a significant drop of the interrupt rate can be
observed when crossing the second node boundary.
This is due to the fact that a single IPI interrupt is used for all
CPUs of the system. The structure is shared, and its cache line
updates greatly impact the traffic between nodes and the overall IPI
performance.
As a workaround, the impact can be reduced by deactivating the IRQ
lockup detector ("noirqdebug") which does a lot of accounting in the
Linux IRQ descriptor structure and is responsible for most of the
performance penalty.
As a fix, this proposal allocates one IPI interrupt per node, shared
by all CPUs of that node. This solves the scaling issue: the IRQ
lockup detector still has an impact, but the XIVE interrupt rate now
scales linearly. It also improves the "noirqdebug" case, as shown in
the tables below.
* P9 DD2.2 - 2s * 64 threads
                                        "noirqdebug"
                    Mint/s                   Mint/s
 chips  cpus      IPI/sys  IPI/chip      IPI/chip  IPI/sys
 --------------------------------------------------------------
1 0-15 4.984023 4.875405 4.996536 5.048892
0-31 10.879164 10.544040 10.757632 11.037859
0-47 15.345301 14.688764 14.926520 15.310053
0-63 17.064907 17.066812 17.613416 17.874511
2 0-79 11.768764 21.650749 22.689120 22.566508
0-95 10.616812 26.878789 28.434703 28.320324
0-111 10.151693 31.397803 31.771773 32.388122
0-127 9.948502 33.139336 34.875716 35.224548
* P10 DD1 - 4s (not homogeneous) 352 threads
                                        "noirqdebug"
                    Mint/s                   Mint/s
 chips  cpus      IPI/sys  IPI/chip      IPI/chip  IPI/sys
 --------------------------------------------------------------
1 0-15 2.409402 2.364108 2.383303 2.395091
0-31 6.028325 6.046075 6.089999 6.073750
0-47 8.655178 8.644531 8.712830 8.724702
0-63 11.629652 11.735953 12.088203 12.055979
0-79 14.392321 14.729959 14.986701 14.973073
0-95 12.604158 13.004034 17.528748 17.568095
2 0-111 9.767753 13.719831 19.968606 20.024218
0-127 6.744566 16.418854 22.898066 22.995110
0-143 6.005699 19.174421 25.425622 25.417541
0-159 5.649719 21.938836 27.952662 28.059603
0-175 5.441410 24.109484 31.133915 31.127996
3 0-191 5.318341 24.405322 33.999221 33.775354
0-207 5.191382 26.449769 36.050161 35.867307
0-223 5.102790 29.356943 39.544135 39.508169
0-239 5.035295 31.933051 42.135075 42.071975
0-255 4.969209 34.477367 44.655395 44.757074
4 0-271 4.907652 35.887016 47.080545 47.318537
0-287 4.839581 38.076137 50.464307 50.636219
0-303 4.786031 40.881319 53.478684 53.310759
0-319 4.743750 43.448424 56.388102 55.973969
0-335 4.709936 45.623532 59.400930 58.926857
0-351 4.681413 45.646151 62.035804 61.830057
[*] https://github.com/antonblanchard/ipistorm
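The per-node scheme can be sketched as a small userspace model (a hedged illustration only: `ipi_per_node`, `cpu_to_node_model` and the irq numbers are invented for this example and are not the kernel's names):

```c
#include <assert.h>

/* Illustrative model of the fix: one IPI descriptor per node instead
 * of a single one for the whole system, so the accounting done on the
 * descriptor only bounces cache lines within a node. */

#define NR_NODES 4

struct ipi_desc {
	unsigned int irq;	/* Linux interrupt number of this node's IPI */
	unsigned long count;	/* contended only within the node now */
};

static struct ipi_desc ipi_per_node[NR_NODES];

/* Fake cpu -> node mapping: 16 threads per node, as on the P9 box above. */
static unsigned int cpu_to_node_model(unsigned int cpu)
{
	return (cpu / 16) % NR_NODES;
}

/* Every CPU of a node resolves to the same per-node IPI. */
static unsigned int ipi_irq_for_cpu(unsigned int cpu)
{
	return ipi_per_node[cpu_to_node_model(cpu)].irq;
}

static void setup_ipis(void)
{
	/* Pretend irq_create_mapping() handed out 100, 101, 102, 103. */
	for (unsigned int n = 0; n < NR_NODES; n++)
		ipi_per_node[n].irq = 100 + n;
}
```

CPUs 0-15 share one descriptor, CPUs 16-31 another, and so on, which is the property that makes the interrupt rate scale past the first node boundary.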
Thanks,
C.
Changes in v3:
- improved commit log for the misuse of "ibm,chip-id"
- better error handling of xive_request_ipi()
- use of a fwnode_handle to name the new domain
- increased IPI name length
- use of early_cpu_to_node() for hotplugged CPUs
- filter CPU-less nodes
Changes in v2:
- extra simplification on xmon
- fixes on issues reported by the kernel test robot
Cédric Le Goater (9):
powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
powerpc/xive: Introduce an IPI interrupt domain
powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
powerpc/xive: Simplify xive_core_debug_show()
powerpc/xive: Drop check on irq_data in xive_core_debug_show()
powerpc/xive: Simplify the dump of XIVE interrupts under xmon
powerpc/xive: Fix xmon command "dxi"
powerpc/xive: Map one IPI interrupt per node
powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
arch/powerpc/include/asm/xive.h | 1 +
arch/powerpc/sysdev/xive/xive-internal.h | 2 -
arch/powerpc/sysdev/xive/common.c | 211 +++++++++++++++--------
arch/powerpc/xmon/xmon.c | 28 +--
4 files changed, 139 insertions(+), 103 deletions(-)
--
2.26.3
* [PATCH v3 1/9] powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: David Gibson, Greg Kurz, Cédric Le Goater
The 'chip_id' field of the XIVE CPU structure is used to choose a
target for a source located on the same chip when possible. The XIVE
driver queries the chip id value from the "ibm,chip-id" DT property
but this property is not available on all platforms. It was first
introduced on the PowerNV platform and later under QEMU for pseries.
However, the property does not exist under PowerVM since it is not
specified in PAPR.
cpu_to_node() is a better alternative. On the PowerNV platform, the
node id is computed from the "ibm,associativity" property of the CPU.
Its value is built in the OPAL firmware from the physical chip id and
is equivalent to "ibm,chip-id". On pSeries, the hcall
H_HOME_NODE_ASSOCIATIVITY returns the node id.
Note also that under QEMU/KVM, "ibm,chip-id" is badly calculated for
unusual SMT configurations, leading to a bogus chip id value being
returned by of_get_ibm_chip_id().
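The change can be modelled in isolation as follows (a sketch with invented names: `cpu_to_node_model` stands in for cpu_to_node() and `xive_cpu_model` for the kernel's struct xive_cpu):

```c
#include <assert.h>

/* Fake node ids as the "ibm,associativity" property would yield them:
 * 8 threads per node in this toy topology. */
static int cpu_to_node_model(unsigned int cpu)
{
	return cpu / 8;
}

struct xive_cpu_model {
	int chip_id;
};

/* After the patch, the chip id used for interrupt targeting comes
 * from the CPU's NUMA node, which exists on every platform, rather
 * than from the "ibm,chip-id" DT property that PAPR does not define. */
static void prepare_cpu(struct xive_cpu_model *xc, unsigned int cpu)
{
	/* Was: of_get_ibm_chip_id(of_get_cpu_node(cpu, NULL)) */
	xc->chip_id = cpu_to_node_model(cpu);
}
```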
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
Changes in v3:
- improved commit log for the misuse of "ibm,chip-id"
arch/powerpc/sysdev/xive/common.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 7e08be5e5e4a..776871274b69 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1336,16 +1336,11 @@ static int xive_prepare_cpu(unsigned int cpu)
xc = per_cpu(xive_cpu, cpu);
if (!xc) {
- struct device_node *np;
-
xc = kzalloc_node(sizeof(struct xive_cpu),
GFP_KERNEL, cpu_to_node(cpu));
if (!xc)
return -ENOMEM;
- np = of_get_cpu_node(cpu, NULL);
- if (np)
- xc->chip_id = of_get_ibm_chip_id(np);
- of_node_put(np);
+ xc->chip_id = cpu_to_node(cpu);
xc->hw_ipi = XIVE_BAD_IRQ;
per_cpu(xive_cpu, cpu) = xc;
--
2.26.3
* [PATCH v3 2/9] powerpc/xive: Introduce an IPI interrupt domain
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: Thomas Gleixner, Greg Kurz, Cédric Le Goater
The IPI interrupt is a special case of the XIVE IRQ domain. When
mapping and unmapping the interrupts in the Linux interrupt number
space, the HW interrupt number 0 (XIVE_IPI_HW_IRQ) is checked to
distinguish the IPI interrupt from other interrupts of the system.
Simplify the XIVE interrupt domain by introducing a specific domain
for the IPI.
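Why a separate domain removes the magic hwirq 0 check can be shown with a toy model (illustrative only; `create_mapping` is a stand-in for irq_create_mapping(), not a kernel API):

```c
#include <assert.h>

/* Two independent interrupt domains: each has its own hwirq -> virq
 * reverse map, so hwirq 0 in the IPI domain no longer collides with
 * hwirq 0 of a device interrupt in the main domain. */
struct domain {
	const char *name;
	unsigned int revmap[8];	/* hwirq -> virq, 0 = unmapped */
};

static struct domain main_domain = { .name = "XIVE" };
static struct domain ipi_domain  = { .name = "XIVE-IPI" };

static unsigned int next_virq = 1;	/* global Linux irq number space */

static unsigned int create_mapping(struct domain *d, unsigned int hwirq)
{
	if (hwirq >= 8)
		return 0;
	d->revmap[hwirq] = next_virq++;
	return d->revmap[hwirq];
}
```

With one shared domain, distinguishing the IPI required testing the HW number everywhere; with two domains, the virq alone identifies which flow handler and chip apply.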
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
Changes in v3:
- better error handling of xive_request_ipi()
- use of a fwnode_handle to name the new domain
- dropped Greg's Reviewed-by because of the changes
arch/powerpc/sysdev/xive/common.c | 79 ++++++++++++++++++-------------
1 file changed, 46 insertions(+), 33 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 776871274b69..98f4dc916fa1 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1068,24 +1068,58 @@ static struct irq_chip xive_ipi_chip = {
.irq_unmask = xive_ipi_do_nothing,
};
-static void __init xive_request_ipi(void)
+/*
+ * IPIs are marked per-cpu. We use separate HW interrupts under the
+ * hood but associated with the same "linux" interrupt
+ */
+static int xive_ipi_irq_domain_map(struct irq_domain *h, unsigned int virq,
+ irq_hw_number_t hw)
{
+ irq_set_chip_and_handler(virq, &xive_ipi_chip, handle_percpu_irq);
+ return 0;
+}
+
+static const struct irq_domain_ops xive_ipi_irq_domain_ops = {
+ .map = xive_ipi_irq_domain_map,
+};
+
+static int __init xive_request_ipi(void)
+{
+ struct fwnode_handle *fwnode;
+ struct irq_domain *ipi_domain;
unsigned int virq;
+ int ret = -ENOMEM;
- /*
- * Initialization failed, move on, we might manage to
- * reach the point where we display our errors before
- * the system falls appart
- */
- if (!xive_irq_domain)
- return;
+ fwnode = irq_domain_alloc_named_fwnode("XIVE-IPI");
+ if (!fwnode)
+ goto out;
+
+ ipi_domain = irq_domain_create_linear(fwnode, 1,
+ &xive_ipi_irq_domain_ops, NULL);
+ if (!ipi_domain)
+ goto out_free_fwnode;
/* Initialize it */
- virq = irq_create_mapping(xive_irq_domain, XIVE_IPI_HW_IRQ);
+ virq = irq_create_mapping(ipi_domain, XIVE_IPI_HW_IRQ);
+ if (!virq) {
+ ret = -EINVAL;
+ goto out_free_domain;
+ }
+
xive_ipi_irq = virq;
- WARN_ON(request_irq(virq, xive_muxed_ipi_action,
- IRQF_PERCPU | IRQF_NO_THREAD, "IPI", NULL));
+ ret = request_irq(virq, xive_muxed_ipi_action,
+ IRQF_PERCPU | IRQF_NO_THREAD, "IPI", NULL);
+
+ WARN(ret < 0, "Failed to request IPI %d: %d\n", virq, ret);
+ return ret;
+
+out_free_domain:
+ irq_domain_remove(ipi_domain);
+out_free_fwnode:
+ irq_domain_free_fwnode(fwnode);
+out:
+ return ret;
}
static int xive_setup_cpu_ipi(unsigned int cpu)
@@ -1179,19 +1213,6 @@ static int xive_irq_domain_map(struct irq_domain *h, unsigned int virq,
*/
irq_clear_status_flags(virq, IRQ_LEVEL);
-#ifdef CONFIG_SMP
- /* IPIs are special and come up with HW number 0 */
- if (hw == XIVE_IPI_HW_IRQ) {
- /*
- * IPIs are marked per-cpu. We use separate HW interrupts under
- * the hood but associated with the same "linux" interrupt
- */
- irq_set_chip_and_handler(virq, &xive_ipi_chip,
- handle_percpu_irq);
- return 0;
- }
-#endif
-
rc = xive_irq_alloc_data(virq, hw);
if (rc)
return rc;
@@ -1203,15 +1224,7 @@ static int xive_irq_domain_map(struct irq_domain *h, unsigned int virq,
static void xive_irq_domain_unmap(struct irq_domain *d, unsigned int virq)
{
- struct irq_data *data = irq_get_irq_data(virq);
- unsigned int hw_irq;
-
- /* XXX Assign BAD number */
- if (!data)
- return;
- hw_irq = (unsigned int)irqd_to_hwirq(data);
- if (hw_irq != XIVE_IPI_HW_IRQ)
- xive_irq_free_data(virq);
+ xive_irq_free_data(virq);
}
static int xive_irq_domain_xlate(struct irq_domain *h, struct device_node *ct,
--
2.26.3
* [PATCH v3 3/9] powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: Greg Kurz, Cédric Le Goater
The IPI interrupt has its own domain now. Testing the HW interrupt
number is not needed anymore.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 98f4dc916fa1..8bca9aca0607 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1417,13 +1417,12 @@ static void xive_flush_cpu_queue(unsigned int cpu, struct xive_cpu *xc)
struct irq_desc *desc = irq_to_desc(irq);
struct irq_data *d = irq_desc_get_irq_data(desc);
struct xive_irq_data *xd;
- unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
/*
* Ignore anything that isn't a XIVE irq and ignore
* IPIs, so can just be dropped.
*/
- if (d->domain != xive_irq_domain || hw_irq == XIVE_IPI_HW_IRQ)
+ if (d->domain != xive_irq_domain)
continue;
/*
--
2.26.3
* [PATCH v3 4/9] powerpc/xive: Simplify xive_core_debug_show()
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: Greg Kurz, Cédric Le Goater
Now that the IPI interrupt has its own domain, the checks on the HW
interrupt number XIVE_IPI_HW_IRQ and on the chip can be replaced by a
check on the domain.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 8bca9aca0607..4149ca846e7c 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1600,17 +1600,14 @@ static void xive_debug_show_cpu(struct seq_file *m, int cpu)
seq_puts(m, "\n");
}
-static void xive_debug_show_irq(struct seq_file *m, u32 hw_irq, struct irq_data *d)
+static void xive_debug_show_irq(struct seq_file *m, struct irq_data *d)
{
- struct irq_chip *chip = irq_data_get_irq_chip(d);
+ unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
int rc;
u32 target;
u8 prio;
u32 lirq;
- if (!is_xive_irq(chip))
- return;
-
rc = xive_ops->get_irq_config(hw_irq, &target, &prio, &lirq);
if (rc) {
seq_printf(m, "IRQ 0x%08x : no config rc=%d\n", hw_irq, rc);
@@ -1648,16 +1645,9 @@ static int xive_core_debug_show(struct seq_file *m, void *private)
for_each_irq_desc(i, desc) {
struct irq_data *d = irq_desc_get_irq_data(desc);
- unsigned int hw_irq;
-
- if (!d)
- continue;
-
- hw_irq = (unsigned int)irqd_to_hwirq(d);
- /* IPIs are special (HW number 0) */
- if (hw_irq != XIVE_IPI_HW_IRQ)
- xive_debug_show_irq(m, hw_irq, d);
+ if (d->domain == xive_irq_domain)
+ xive_debug_show_irq(m, d);
}
return 0;
}
--
2.26.3
* [PATCH v3 5/9] powerpc/xive: Drop check on irq_data in xive_core_debug_show()
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev
Cc: kernel test robot, Greg Kurz, Dan Carpenter, Cédric Le Goater
When looping over IRQ descriptors, the irq_data is always valid.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 930914b7d528 ("powerpc/xive: Add a debugfs file to dump internal XIVE state")
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 4149ca846e7c..f5fe60c194bc 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1607,6 +1607,8 @@ static void xive_debug_show_irq(struct seq_file *m, struct irq_data *d)
u32 target;
u8 prio;
u32 lirq;
+ struct xive_irq_data *xd;
+ u64 val;
rc = xive_ops->get_irq_config(hw_irq, &target, &prio, &lirq);
if (rc) {
@@ -1617,17 +1619,14 @@ static void xive_debug_show_irq(struct seq_file *m, struct irq_data *d)
seq_printf(m, "IRQ 0x%08x : target=0x%x prio=%02x lirq=0x%x ",
hw_irq, target, prio, lirq);
- if (d) {
- struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
- u64 val = xive_esb_read(xd, XIVE_ESB_GET);
-
- seq_printf(m, "flags=%c%c%c PQ=%c%c",
- xd->flags & XIVE_IRQ_FLAG_STORE_EOI ? 'S' : ' ',
- xd->flags & XIVE_IRQ_FLAG_LSI ? 'L' : ' ',
- xd->flags & XIVE_IRQ_FLAG_H_INT_ESB ? 'H' : ' ',
- val & XIVE_ESB_VAL_P ? 'P' : '-',
- val & XIVE_ESB_VAL_Q ? 'Q' : '-');
- }
+ xd = irq_data_get_irq_handler_data(d);
+ val = xive_esb_read(xd, XIVE_ESB_GET);
+ seq_printf(m, "flags=%c%c%c PQ=%c%c",
+ xd->flags & XIVE_IRQ_FLAG_STORE_EOI ? 'S' : ' ',
+ xd->flags & XIVE_IRQ_FLAG_LSI ? 'L' : ' ',
+ xd->flags & XIVE_IRQ_FLAG_H_INT_ESB ? 'H' : ' ',
+ val & XIVE_ESB_VAL_P ? 'P' : '-',
+ val & XIVE_ESB_VAL_Q ? 'Q' : '-');
seq_puts(m, "\n");
}
--
2.26.3
* [PATCH v3 6/9] powerpc/xive: Simplify the dump of XIVE interrupts under xmon
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: Greg Kurz, Cédric Le Goater
Move the xmon routine under the XIVE subsystem and rework the loop
over the interrupts to take the xive_irq_domain into account and
filter out IPIs.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/include/asm/xive.h | 1 +
arch/powerpc/sysdev/xive/common.c | 14 ++++++++++++++
arch/powerpc/xmon/xmon.c | 28 ++--------------------------
3 files changed, 17 insertions(+), 26 deletions(-)
diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index 9a312b975ca8..aa094a8655b0 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -102,6 +102,7 @@ void xive_flush_interrupt(void);
/* xmon hook */
void xmon_xive_do_dump(int cpu);
int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d);
+void xmon_xive_get_irq_all(void);
/* APIs used by KVM */
u32 xive_native_default_eq_shift(void);
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index f5fe60c194bc..4c6e2e1289f5 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -289,6 +289,20 @@ int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d)
return 0;
}
+void xmon_xive_get_irq_all(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_data *d = irq_desc_get_irq_data(desc);
+ unsigned int hwirq = (unsigned int)irqd_to_hwirq(d);
+
+ if (d->domain == xive_irq_domain)
+ xmon_xive_get_irq_config(hwirq, d);
+ }
+}
+
#endif /* CONFIG_XMON */
static unsigned int xive_get_irq(void)
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 3fe37495f63d..80fbf8968f77 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2727,30 +2727,6 @@ static void dump_all_xives(void)
dump_one_xive(cpu);
}
-static void dump_one_xive_irq(u32 num, struct irq_data *d)
-{
- xmon_xive_get_irq_config(num, d);
-}
-
-static void dump_all_xive_irq(void)
-{
- unsigned int i;
- struct irq_desc *desc;
-
- for_each_irq_desc(i, desc) {
- struct irq_data *d = irq_desc_get_irq_data(desc);
- unsigned int hwirq;
-
- if (!d)
- continue;
-
- hwirq = (unsigned int)irqd_to_hwirq(d);
- /* IPIs are special (HW number 0) */
- if (hwirq)
- dump_one_xive_irq(hwirq, d);
- }
-}
-
static void dump_xives(void)
{
unsigned long num;
@@ -2767,9 +2743,9 @@ static void dump_xives(void)
return;
} else if (c == 'i') {
if (scanhex(&num))
- dump_one_xive_irq(num, NULL);
+ xmon_xive_get_irq_config(num, NULL);
else
- dump_all_xive_irq();
+ xmon_xive_get_irq_all();
return;
}
--
2.26.3
* [PATCH v3 7/9] powerpc/xive: Fix xmon command "dxi"
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev
Cc: kernel test robot, Greg Kurz, Dan Carpenter, Cédric Le Goater
When under xmon, the "dxi" command dumps the state of the XIVE
interrupts. If an interrupt number is specified, only the state of
the associated XIVE interrupt is dumped. This form of the command
lacks an irq_data parameter which is nevertheless used by
xmon_xive_get_irq_config(), leading to an xmon crash.
Fix that by doing a lookup in the system IRQ mapping to query the IRQ
descriptor data. Invalid interrupt numbers, or numbers not belonging
to the XIVE IRQ domain (the OPAL event interrupt number, for
instance), should be caught by the previous query done at the
firmware level.
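The shape of the fix, in a self-contained sketch (hypothetical names; `find_irq_data` models the irq_find_mapping() + irq_get_irq_data() pair):

```c
#include <assert.h>
#include <stddef.h>

struct irq_data {
	unsigned int hwirq;
};

/* A few "mapped" interrupts; anything else is unmapped. */
static struct irq_data mapped[3] = { {17}, {23}, {42} };

/* Model of irq_find_mapping()/irq_get_irq_data(): NULL when the hwirq
 * has no Linux mapping. */
static struct irq_data *find_irq_data(unsigned int hwirq)
{
	for (size_t i = 0; i < 3; i++)
		if (mapped[i].hwirq == hwirq)
			return &mapped[i];
	return NULL;
}

static int dump_irq_config(unsigned int hwirq, struct irq_data *d)
{
	if (!d)
		d = find_irq_data(hwirq);	/* fallback lookup, as the patch adds */
	if (!d)
		return -1;	/* nothing more to print; no NULL dereference */
	return 0;
}
```

Before the fix, the equivalent of `dump_irq_config(num, NULL)` dereferenced the missing irq_data and crashed xmon; with the fallback lookup it either finds the descriptor or bails out cleanly.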
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 97ef27507793 ("powerpc/xive: Fix xmon support on the PowerNV platform")
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 4c6e2e1289f5..69980b49afb7 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -253,17 +253,20 @@ notrace void xmon_xive_do_dump(int cpu)
xmon_printf("\n");
}
+static struct irq_data *xive_get_irq_data(u32 hw_irq)
+{
+ unsigned int irq = irq_find_mapping(xive_irq_domain, hw_irq);
+
+ return irq ? irq_get_irq_data(irq) : NULL;
+}
+
int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d)
{
- struct irq_chip *chip = irq_data_get_irq_chip(d);
int rc;
u32 target;
u8 prio;
u32 lirq;
- if (!is_xive_irq(chip))
- return -EINVAL;
-
rc = xive_ops->get_irq_config(hw_irq, &target, &prio, &lirq);
if (rc) {
xmon_printf("IRQ 0x%08x : no config rc=%d\n", hw_irq, rc);
@@ -273,6 +276,9 @@ int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d)
xmon_printf("IRQ 0x%08x : target=0x%x prio=%02x lirq=0x%x ",
hw_irq, target, prio, lirq);
+ if (!d)
+ d = xive_get_irq_data(hw_irq);
+
if (d) {
struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
u64 val = xive_esb_read(xd, XIVE_ESB_GET);
--
2.26.3
* [PATCH v3 8/9] powerpc/xive: Map one IPI interrupt per node
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: Thomas Gleixner, Greg Kurz, Cédric Le Goater
ipistorm [*] can be used to benchmark the raw interrupt rate of an
interrupt controller by measuring the number of IPIs a system can
sustain. When applied to the XIVE interrupt controller of POWER9 and
POWER10 systems, a significant drop of the interrupt rate can be
observed when crossing the second node boundary.
This is due to the fact that a single IPI interrupt is used for all
CPUs of the system. The structure is shared, and its cache line
updates greatly impact the traffic between nodes and the overall IPI
performance.
As a workaround, the impact can be reduced by deactivating the IRQ
lockup detector ("noirqdebug") which does a lot of accounting in the
Linux IRQ descriptor structure and is responsible for most of the
performance penalty.
As a fix, this proposal allocates one IPI interrupt per node, shared
by all CPUs of that node. This solves the scaling issue: the IRQ
lockup detector still has an impact, but the XIVE interrupt rate now
scales linearly. It also improves the "noirqdebug" case, as shown in
the tables below.
* P9 DD2.2 - 2s * 64 threads
                                        "noirqdebug"
                    Mint/s                   Mint/s
 chips  cpus      IPI/sys  IPI/chip      IPI/chip  IPI/sys
 --------------------------------------------------------------
1 0-15 4.984023 4.875405 4.996536 5.048892
0-31 10.879164 10.544040 10.757632 11.037859
0-47 15.345301 14.688764 14.926520 15.310053
0-63 17.064907 17.066812 17.613416 17.874511
2 0-79 11.768764 21.650749 22.689120 22.566508
0-95 10.616812 26.878789 28.434703 28.320324
0-111 10.151693 31.397803 31.771773 32.388122
0-127 9.948502 33.139336 34.875716 35.224548
* P10 DD1 - 4s (not homogeneous) 352 threads
                                        "noirqdebug"
                    Mint/s                   Mint/s
 chips  cpus      IPI/sys  IPI/chip      IPI/chip  IPI/sys
 --------------------------------------------------------------
1 0-15 2.409402 2.364108 2.383303 2.395091
0-31 6.028325 6.046075 6.089999 6.073750
0-47 8.655178 8.644531 8.712830 8.724702
0-63 11.629652 11.735953 12.088203 12.055979
0-79 14.392321 14.729959 14.986701 14.973073
0-95 12.604158 13.004034 17.528748 17.568095
2 0-111 9.767753 13.719831 19.968606 20.024218
0-127 6.744566 16.418854 22.898066 22.995110
0-143 6.005699 19.174421 25.425622 25.417541
0-159 5.649719 21.938836 27.952662 28.059603
0-175 5.441410 24.109484 31.133915 31.127996
3 0-191 5.318341 24.405322 33.999221 33.775354
0-207 5.191382 26.449769 36.050161 35.867307
0-223 5.102790 29.356943 39.544135 39.508169
0-239 5.035295 31.933051 42.135075 42.071975
0-255 4.969209 34.477367 44.655395 44.757074
4 0-271 4.907652 35.887016 47.080545 47.318537
0-287 4.839581 38.076137 50.464307 50.636219
0-303 4.786031 40.881319 53.478684 53.310759
0-319 4.743750 43.448424 56.388102 55.973969
0-335 4.709936 45.623532 59.400930 58.926857
0-351 4.681413 45.646151 62.035804 61.830057
[*] https://github.com/antonblanchard/ipistorm
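The allocation loop of the patch can be sketched in plain C (a hedged userspace model: `node_has_cpus`, the virq numbers and the fixed node count are invented for the example):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_NODES 4

struct xive_ipi_desc {
	unsigned int irq;
	char name[16];	/* "IPI-%d", sized as in the patch */
};

static struct xive_ipi_desc ipis[NR_NODES];

/* Pretend node 2 is CPU-less (memory-only), the case the v3 filter
 * handles with cpumask_empty(cpumask_of_node(node)). */
static int node_has_cpus(int node)
{
	return node != 2;
}

static int request_ipis(void)
{
	unsigned int next_virq = 100;	/* stand-in for irq_create_mapping() */

	for (int node = 0; node < NR_NODES; node++) {
		if (!node_has_cpus(node))
			continue;	/* skip CPU-less nodes */
		ipis[node].irq = next_virq++;
		/* Per-node name so /proc/interrupts can tell them apart. */
		snprintf(ipis[node].name, sizeof(ipis[node].name),
			 "IPI-%d", node);
	}
	return 0;
}
```

Each node with CPUs ends up with its own Linux interrupt and name; a CPU-less node is simply left unmapped.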
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
Changes in v3:
- increased IPI name length
- use of early_cpu_to_node() for hotplugged CPUs
- filter CPU-less nodes
- dropped Greg's Reviewed-by because of the changes
arch/powerpc/sysdev/xive/xive-internal.h | 2 -
arch/powerpc/sysdev/xive/common.c | 60 +++++++++++++++++++-----
2 files changed, 47 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/xive-internal.h b/arch/powerpc/sysdev/xive/xive-internal.h
index 9cf57c722faa..b3a456fdd3a5 100644
--- a/arch/powerpc/sysdev/xive/xive-internal.h
+++ b/arch/powerpc/sysdev/xive/xive-internal.h
@@ -5,8 +5,6 @@
#ifndef __XIVE_INTERNAL_H
#define __XIVE_INTERNAL_H
-#define XIVE_IPI_HW_IRQ 0 /* interrupt source # for IPIs */
-
/*
* A "disabled" interrupt should never fire, to catch problems
* we set its logical number to this
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 69980b49afb7..06f29cd07448 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -63,8 +63,19 @@ static const struct xive_ops *xive_ops;
static struct irq_domain *xive_irq_domain;
#ifdef CONFIG_SMP
-/* The IPIs all use the same logical irq number */
-static u32 xive_ipi_irq;
+/* The IPIs use the same logical irq number when on the same chip */
+static struct xive_ipi_desc {
+ unsigned int irq;
+ char name[16];
+} *xive_ipis;
+
+/*
+ * Use early_cpu_to_node() for hot-plugged CPUs
+ */
+static unsigned int xive_ipi_cpu_to_irq(unsigned int cpu)
+{
+ return xive_ipis[early_cpu_to_node(cpu)].irq;
+}
#endif
/* Xive state for each CPU */
@@ -1107,33 +1118,53 @@ static int __init xive_request_ipi(void)
{
struct fwnode_handle *fwnode;
struct irq_domain *ipi_domain;
- unsigned int virq;
+ unsigned int node;
int ret = -ENOMEM;
fwnode = irq_domain_alloc_named_fwnode("XIVE-IPI");
if (!fwnode)
goto out;
- ipi_domain = irq_domain_create_linear(fwnode, 1,
+ ipi_domain = irq_domain_create_linear(fwnode, nr_node_ids,
&xive_ipi_irq_domain_ops, NULL);
if (!ipi_domain)
goto out_free_fwnode;
- /* Initialize it */
- virq = irq_create_mapping(ipi_domain, XIVE_IPI_HW_IRQ);
- if (!virq) {
- ret = -EINVAL;
+ xive_ipis = kcalloc(nr_node_ids, sizeof(*xive_ipis), GFP_KERNEL | __GFP_NOFAIL);
+ if (!xive_ipis)
goto out_free_domain;
- }
- xive_ipi_irq = virq;
+ for_each_node(node) {
+ struct xive_ipi_desc *xid = &xive_ipis[node];
+ irq_hw_number_t ipi_hwirq = node;
+
+ /* Skip nodes without CPUs */
+ if (cpumask_empty(cpumask_of_node(node)))
+ continue;
+
+ /*
+ * Map one IPI interrupt per node for all cpus of that node.
+ * Since the HW interrupt number doesn't have any meaning,
+ * simply use the node number.
+ */
+ xid->irq = irq_create_mapping(ipi_domain, ipi_hwirq);
+ if (!xid->irq) {
+ ret = -EINVAL;
+ goto out_free_xive_ipis;
+ }
+
+ snprintf(xid->name, sizeof(xid->name), "IPI-%d", node);
+
+ ret = request_irq(xid->irq, xive_muxed_ipi_action,
+ IRQF_PERCPU | IRQF_NO_THREAD, xid->name, NULL);
- ret = request_irq(virq, xive_muxed_ipi_action,
- IRQF_PERCPU | IRQF_NO_THREAD, "IPI", NULL);
+ WARN(ret < 0, "Failed to request IPI %d: %d\n", xid->irq, ret);
+ }
- WARN(ret < 0, "Failed to request IPI %d: %d\n", virq, ret);
return ret;
+out_free_xive_ipis:
+ kfree(xive_ipis);
out_free_domain:
irq_domain_remove(ipi_domain);
out_free_fwnode:
@@ -1144,6 +1175,7 @@ static int __init xive_request_ipi(void)
static int xive_setup_cpu_ipi(unsigned int cpu)
{
+ unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu);
struct xive_cpu *xc;
int rc;
@@ -1186,6 +1218,8 @@ static int xive_setup_cpu_ipi(unsigned int cpu)
static void xive_cleanup_cpu_ipi(unsigned int cpu, struct xive_cpu *xc)
{
+ unsigned int xive_ipi_irq = xive_ipi_cpu_to_irq(cpu);
+
/* Disable the IPI and free the IRQ data */
/* Already cleaned up ? */
--
2.26.3
* [PATCH v3 9/9] powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
From: Cédric Le Goater @ 2021-03-31 14:45 UTC
To: linuxppc-dev; +Cc: Thomas Gleixner, Greg Kurz, Cédric Le Goater
Instead of calling irq_create_mapping() to map the IPI for a node,
introduce an 'alloc' handler. This is usually an extension to support
hierarchical irq_domains, which is not exactly the case for the
XIVE-IPI domain. However, it lets us use the irq_domain_alloc_irqs()
routine, which allocates the IRQ descriptor on the specified node:
even better for cache performance on multi-node machines.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
I didn't rerun the benchmark to check for a difference.
arch/powerpc/sysdev/xive/common.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 06f29cd07448..bb7435ffe99c 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1103,15 +1103,26 @@ static struct irq_chip xive_ipi_chip = {
* IPIs are marked per-cpu. We use separate HW interrupts under the
* hood but associated with the same "linux" interrupt
*/
-static int xive_ipi_irq_domain_map(struct irq_domain *h, unsigned int virq,
- irq_hw_number_t hw)
+struct xive_ipi_alloc_info {
+ irq_hw_number_t hwirq;
+};
+
+static int xive_ipi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *arg)
{
- irq_set_chip_and_handler(virq, &xive_ipi_chip, handle_percpu_irq);
+ struct xive_ipi_alloc_info *info = arg;
+ int i;
+
+ for (i = 0; i < nr_irqs; i++) {
+ irq_domain_set_info(domain, virq + i, info->hwirq + i, &xive_ipi_chip,
+ domain->host_data, handle_percpu_irq,
+ NULL, NULL);
+ }
return 0;
}
static const struct irq_domain_ops xive_ipi_irq_domain_ops = {
- .map = xive_ipi_irq_domain_map,
+ .alloc = xive_ipi_irq_domain_alloc,
};
static int __init xive_request_ipi(void)
@@ -1136,7 +1147,7 @@ static int __init xive_request_ipi(void)
for_each_node(node) {
struct xive_ipi_desc *xid = &xive_ipis[node];
- irq_hw_number_t ipi_hwirq = node;
+ struct xive_ipi_alloc_info info = { node };
/* Skip nodes without CPUs */
if (cpumask_empty(cpumask_of_node(node)))
@@ -1147,9 +1158,9 @@ static int __init xive_request_ipi(void)
* Since the HW interrupt number doesn't have any meaning,
* simply use the node number.
*/
- xid->irq = irq_create_mapping(ipi_domain, ipi_hwirq);
- if (!xid->irq) {
- ret = -EINVAL;
+ xid->irq = irq_domain_alloc_irqs(ipi_domain, 1, node, &info);
+ if (xid->irq < 0) {
+ ret = xid->irq;
goto out_free_xive_ipis;
}
--
2.26.3
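The 'alloc' callback shape used in the patch above can be modeled outside the kernel with toy types. Everything below is a simplified stand-in for the irq_domain API, not the real thing; only the callback signature (domain, virq, nr_irqs, opaque arg) and the one-hwirq-per-virq loop follow xive_ipi_irq_domain_alloc():

```c
#include <assert.h>

/* Toy model of an irq_domain with an 'alloc' handler. */
#define TOY_NR_VIRQS 32

struct toy_domain;

struct toy_alloc_info {
	unsigned long hwirq;			/* mirrors xive_ipi_alloc_info */
};

struct toy_domain {
	unsigned long hw_of_virq[TOY_NR_VIRQS];	/* virq -> hwirq */
	int (*alloc)(struct toy_domain *d, unsigned int virq,
		     unsigned int nr_irqs, void *arg);
};

static int toy_ipi_domain_alloc(struct toy_domain *d, unsigned int virq,
				unsigned int nr_irqs, void *arg)
{
	struct toy_alloc_info *info = arg;

	/* One hwirq per virq, in order, as in xive_ipi_irq_domain_alloc() */
	for (unsigned int i = 0; i < nr_irqs; i++)
		d->hw_of_virq[virq + i] = info->hwirq + i;
	return 0;
}

/* Stand-in for irq_domain_alloc_irqs(): dispatch to the domain's handler */
static int toy_domain_alloc_irqs(struct toy_domain *d, unsigned int virq,
				 unsigned int nr_irqs, void *arg)
{
	return d->alloc(d, virq, nr_irqs, arg);
}

/* Allocate 3 virqs starting at 5 for hwirqs 2..4; return hwirq of virq 7 */
static unsigned long toy_demo(void)
{
	struct toy_domain d = { .alloc = toy_ipi_domain_alloc };
	struct toy_alloc_info info = { .hwirq = 2 };

	toy_domain_alloc_irqs(&d, 5, 3, &info);
	return d.hw_of_virq[7];
}
```

In the kernel, the payoff of routing through the 'alloc' path is not visible in this toy: irq_domain_alloc_irqs() takes a node argument and allocates the IRQ descriptor on that node, which is what the commit message is after.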
* Re: [PATCH v3 1/9] powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
2021-03-31 14:45 ` [PATCH v3 1/9] powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property Cédric Le Goater
@ 2021-04-01 2:49 ` David Gibson
2021-04-01 9:10 ` Cédric Le Goater
0 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2021-04-01 2:49 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: linuxppc-dev, Greg Kurz
On Wed, Mar 31, 2021 at 04:45:06PM +0200, Cédric Le Goater wrote:
> The 'chip_id' field of the XIVE CPU structure is used to choose a
> target for a source located on the same chip when possible. The XIVE
> driver queries the chip id value from the "ibm,chip-id" DT property
> but this property is not available on all platforms. It was first
> introduced on the PowerNV platform and later, under QEMU for pseries.
> However, the property does not exist under PowerVM since it is not
> specified in PAPR.
>
> cpu_to_node() is a better alternative. On the PowerNV platform, the
> node id is computed from the "ibm,associativity" property of the CPU.
> Its value is built in the OPAL firmware from the physical chip id and
> is equivalent to "ibm,chip-id".
Hrm... I mean, for powernv this is certainly correct, but seems to be
relying on pretty specific specifics of the OPAL / chip behaviour,
namely that the NUMA id == chip ID.
> On pSeries, the hcall
> H_HOME_NODE_ASSOCIATIVITY returns the node id.
AFAICT, the chip_id field is never actually used in the PAPR version
of XIVE. The only access to the field outside native.c is in
xive_pick_irq_target(), and it only looks at chip_id if src_chip is
valid. But src_chip is initialized to XIVE_INVALID_CHIP_ID in papr.c
So it would make more sense to me to also initialize chip_id to
XIVE_INVALID_CHIP_ID for PAPR to make it clearer that it's not
relevant.
> Also to be noted that under QEMU/KVM "ibm,chip-id" is badly calculated
> with unusual SMT configuration. This leads to a bogus chip id value
> being returned by of_get_ibm_chip_id().
I *still* don't clearly understand what you think is bogus about the
chip id value that qemu generates. It's clearly not a problem for
XIVE, since PAPR XIVE never uses it.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node
2021-03-31 14:45 [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node Cédric Le Goater
` (8 preceding siblings ...)
2021-03-31 14:45 ` [PATCH v3 9/9] powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler Cédric Le Goater
@ 2021-04-01 8:04 ` Greg Kurz
2021-04-01 9:18 ` Cédric Le Goater
2021-04-01 8:42 ` Cédric Le Goater
2021-04-19 3:59 ` Michael Ellerman
11 siblings, 1 reply; 21+ messages in thread
From: Greg Kurz @ 2021-04-01 8:04 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: linuxppc-dev
On Wed, 31 Mar 2021 16:45:05 +0200
Cédric Le Goater <clg@kaod.org> wrote:
>
> Hello,
>
> ipistorm [*] can be used to benchmark the raw interrupt rate of an
> interrupt controller by measuring the number of IPIs a system can
> sustain. When applied to the XIVE interrupt controller of POWER9 and
> POWER10 systems, a significant drop of the interrupt rate can be
> observed when crossing the second node boundary.
>
> This is due to the fact that a single IPI interrupt is used for all
> CPUs of the system. The structure is shared and the cache line updates
> greatly impact the traffic between nodes and the overall IPI
> performance.
>
> As a workaround, the impact can be reduced by deactivating the IRQ
> lockup detector ("noirqdebug") which does a lot of accounting in the
> Linux IRQ descriptor structure and is responsible for most of the
> performance penalty.
>
> As a fix, this proposal allocates an IPI interrupt per node, to be
> shared by all CPUs of that node. It solves the scaling issue, the IRQ
> lockup detector still has an impact but the XIVE interrupt rate scales
> linearly. It also improves the "noirqdebug" case as shown in the
> tables below.
>
As explained by David and others, NUMA nodes happen to match sockets
with current POWER CPUs but these are really different concepts. NUMA
is about CPU memory access latency, while in the case of XIVE you
really need to identify a XIVE chip localized in a given socket.
PAPR doesn't know about sockets, only cores. In other words, a PAPR
compliant guest sees all vCPUs like they all sit in a single socket.
Same for the XIVE. Trying to introduce a concept of socket, either
by hijacking OPAL's ibm,chip-id or NUMA node ids, is a kind of
spec violation in this context. If the user cares for locality of
the vCPUs and XIVE on the same socket, then it should bind vCPU
threads to host CPUs from the same socket in the first place.
Isn't this enough to solve the performance issues this series
want to fix, without the need for virtual socket ids ?
> * P9 DD2.2 - 2s * 64 threads
>
> "noirqdebug"
> Mint/s Mint/s
> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> --------------------------------------------------------------
> 1 0-15 4.984023 4.875405 4.996536 5.048892
> 0-31 10.879164 10.544040 10.757632 11.037859
> 0-47 15.345301 14.688764 14.926520 15.310053
> 0-63 17.064907 17.066812 17.613416 17.874511
> 2 0-79 11.768764 21.650749 22.689120 22.566508
> 0-95 10.616812 26.878789 28.434703 28.320324
> 0-111 10.151693 31.397803 31.771773 32.388122
> 0-127 9.948502 33.139336 34.875716 35.224548
>
>
> * P10 DD1 - 4s (not homogeneous) 352 threads
>
> "noirqdebug"
> Mint/s Mint/s
> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> --------------------------------------------------------------
> 1 0-15 2.409402 2.364108 2.383303 2.395091
> 0-31 6.028325 6.046075 6.089999 6.073750
> 0-47 8.655178 8.644531 8.712830 8.724702
> 0-63 11.629652 11.735953 12.088203 12.055979
> 0-79 14.392321 14.729959 14.986701 14.973073
> 0-95 12.604158 13.004034 17.528748 17.568095
> 2 0-111 9.767753 13.719831 19.968606 20.024218
> 0-127 6.744566 16.418854 22.898066 22.995110
> 0-143 6.005699 19.174421 25.425622 25.417541
> 0-159 5.649719 21.938836 27.952662 28.059603
> 0-175 5.441410 24.109484 31.133915 31.127996
> 3 0-191 5.318341 24.405322 33.999221 33.775354
> 0-207 5.191382 26.449769 36.050161 35.867307
> 0-223 5.102790 29.356943 39.544135 39.508169
> 0-239 5.035295 31.933051 42.135075 42.071975
> 0-255 4.969209 34.477367 44.655395 44.757074
> 4 0-271 4.907652 35.887016 47.080545 47.318537
> 0-287 4.839581 38.076137 50.464307 50.636219
> 0-303 4.786031 40.881319 53.478684 53.310759
> 0-319 4.743750 43.448424 56.388102 55.973969
> 0-335 4.709936 45.623532 59.400930 58.926857
> 0-351 4.681413 45.646151 62.035804 61.830057
>
> [*] https://github.com/antonblanchard/ipistorm
>
> Thanks,
>
> C.
>
> Changes in v3:
>
> - improved commit log for the misuse of "ibm,chip-id"
> - better error handling of xive_request_ipi()
> - use of a fwnode_handle to name the new domain
> - increased IPI name length
> - use of early_cpu_to_node() for hotplugged CPUs
> - filter CPU-less nodes
>
> Changes in v2:
>
> - extra simplification on xmon
> - fixes on issues reported by the kernel test robot
>
> Cédric Le Goater (9):
> powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
> powerpc/xive: Introduce an IPI interrupt domain
> powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
> powerpc/xive: Simplify xive_core_debug_show()
> powerpc/xive: Drop check on irq_data in xive_core_debug_show()
> powerpc/xive: Simplify the dump of XIVE interrupts under xmon
> powerpc/xive: Fix xmon command "dxi"
> powerpc/xive: Map one IPI interrupt per node
> powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
>
> arch/powerpc/include/asm/xive.h | 1 +
> arch/powerpc/sysdev/xive/xive-internal.h | 2 -
> arch/powerpc/sysdev/xive/common.c | 211 +++++++++++++++--------
> arch/powerpc/xmon/xmon.c | 28 +--
> 4 files changed, 139 insertions(+), 103 deletions(-)
>
* Re: [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node
2021-03-31 14:45 [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node Cédric Le Goater
` (9 preceding siblings ...)
2021-04-01 8:04 ` [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node Greg Kurz
@ 2021-04-01 8:42 ` Cédric Le Goater
2021-04-19 3:59 ` Michael Ellerman
11 siblings, 0 replies; 21+ messages in thread
From: Cédric Le Goater @ 2021-04-01 8:42 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Greg Kurz, David Gibson
Hello,
On 3/31/21 4:45 PM, Cédric Le Goater wrote:
>
> Hello,
>
> ipistorm [*] can be used to benchmark the raw interrupt rate of an
> interrupt controller by measuring the number of IPIs a system can
> sustain. When applied to the XIVE interrupt controller of POWER9 and
> POWER10 systems, a significant drop of the interrupt rate can be
> observed when crossing the second node boundary.
>
> This is due to the fact that a single IPI interrupt is used for all
> CPUs of the system. The structure is shared and the cache line updates
> greatly impact the traffic between nodes and the overall IPI
> performance.
>
> As a workaround, the impact can be reduced by deactivating the IRQ
> lockup detector ("noirqdebug") which does a lot of accounting in the
> Linux IRQ descriptor structure and is responsible for most of the
> performance penalty.
>
> As a fix, this proposal allocates an IPI interrupt per node, to be
> shared by all CPUs of that node. It solves the scaling issue, the IRQ
> lockup detector still has an impact but the XIVE interrupt rate scales
> linearly. It also improves the "noirqdebug" case as shown in the
> tables below.
Hello,
From the comments I received on different email threads, it seems
I am making some wrong assumptions about the code and concepts. We can
postpone this patchset. It's an optimization and there are some
more cleanups that can be done before.
Thanks for the time and the shared expertise,
C.
>
> * P9 DD2.2 - 2s * 64 threads
>
> "noirqdebug"
> Mint/s Mint/s
> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> --------------------------------------------------------------
> 1 0-15 4.984023 4.875405 4.996536 5.048892
> 0-31 10.879164 10.544040 10.757632 11.037859
> 0-47 15.345301 14.688764 14.926520 15.310053
> 0-63 17.064907 17.066812 17.613416 17.874511
> 2 0-79 11.768764 21.650749 22.689120 22.566508
> 0-95 10.616812 26.878789 28.434703 28.320324
> 0-111 10.151693 31.397803 31.771773 32.388122
> 0-127 9.948502 33.139336 34.875716 35.224548
>
>
> * P10 DD1 - 4s (not homogeneous) 352 threads
>
> "noirqdebug"
> Mint/s Mint/s
> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> --------------------------------------------------------------
> 1 0-15 2.409402 2.364108 2.383303 2.395091
> 0-31 6.028325 6.046075 6.089999 6.073750
> 0-47 8.655178 8.644531 8.712830 8.724702
> 0-63 11.629652 11.735953 12.088203 12.055979
> 0-79 14.392321 14.729959 14.986701 14.973073
> 0-95 12.604158 13.004034 17.528748 17.568095
> 2 0-111 9.767753 13.719831 19.968606 20.024218
> 0-127 6.744566 16.418854 22.898066 22.995110
> 0-143 6.005699 19.174421 25.425622 25.417541
> 0-159 5.649719 21.938836 27.952662 28.059603
> 0-175 5.441410 24.109484 31.133915 31.127996
> 3 0-191 5.318341 24.405322 33.999221 33.775354
> 0-207 5.191382 26.449769 36.050161 35.867307
> 0-223 5.102790 29.356943 39.544135 39.508169
> 0-239 5.035295 31.933051 42.135075 42.071975
> 0-255 4.969209 34.477367 44.655395 44.757074
> 4 0-271 4.907652 35.887016 47.080545 47.318537
> 0-287 4.839581 38.076137 50.464307 50.636219
> 0-303 4.786031 40.881319 53.478684 53.310759
> 0-319 4.743750 43.448424 56.388102 55.973969
> 0-335 4.709936 45.623532 59.400930 58.926857
> 0-351 4.681413 45.646151 62.035804 61.830057
>
> [*] https://github.com/antonblanchard/ipistorm
>
> Thanks,
>
> C.
>
> Changes in v3:
>
> - improved commit log for the misuse of "ibm,chip-id"
> - better error handling of xive_request_ipi()
> - use of a fwnode_handle to name the new domain
> - increased IPI name length
> - use of early_cpu_to_node() for hotplugged CPUs
> - filter CPU-less nodes
>
> Changes in v2:
>
> - extra simplification on xmon
> - fixes on issues reported by the kernel test robot
>
> Cédric Le Goater (9):
> powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
> powerpc/xive: Introduce an IPI interrupt domain
> powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
> powerpc/xive: Simplify xive_core_debug_show()
> powerpc/xive: Drop check on irq_data in xive_core_debug_show()
> powerpc/xive: Simplify the dump of XIVE interrupts under xmon
> powerpc/xive: Fix xmon command "dxi"
> powerpc/xive: Map one IPI interrupt per node
> powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
>
> arch/powerpc/include/asm/xive.h | 1 +
> arch/powerpc/sysdev/xive/xive-internal.h | 2 -
> arch/powerpc/sysdev/xive/common.c | 211 +++++++++++++++--------
> arch/powerpc/xmon/xmon.c | 28 +--
> 4 files changed, 139 insertions(+), 103 deletions(-)
>
* Re: [PATCH v3 1/9] powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
2021-04-01 2:49 ` [PATCH v3 1/9] powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property David Gibson
@ 2021-04-01 9:10 ` Cédric Le Goater
0 siblings, 0 replies; 21+ messages in thread
From: Cédric Le Goater @ 2021-04-01 9:10 UTC (permalink / raw)
To: David Gibson; +Cc: linuxppc-dev, Greg Kurz
On 4/1/21 4:49 AM, David Gibson wrote:
> On Wed, Mar 31, 2021 at 04:45:06PM +0200, Cédric Le Goater wrote:
>> The 'chip_id' field of the XIVE CPU structure is used to choose a
>> target for a source located on the same chip when possible. The XIVE
>> driver queries the chip id value from the "ibm,chip-id" DT property
>> but this property is not available on all platforms. It was first
>> introduced on the PowerNV platform and later, under QEMU for pseries.
>> However, the property does not exist under PowerVM since it is not
>> specified in PAPR.
>>
>> cpu_to_node() is a better alternative. On the PowerNV platform, the
>> node id is computed from the "ibm,associativity" property of the CPU.
>> Its value is built in the OPAL firmware from the physical chip id and
>> is equivalent to "ibm,chip-id".
>
> Hrm... I mean, for powernv this is certainly correct, but seems to be
> relying on pretty specific specifics of the OPAL / chip behaviour,
> namely that the NUMA id == chip ID.
Yes. It seems so.
>> On pSeries, the hcall H_HOME_NODE_ASSOCIATIVITY returns the node id.
>
> AFAICT, the chip_id field is never actually used in the PAPR version
> of XIVE. The only access to the field outside native.c is in
> xive_pick_irq_target(), and it only looks at chip_id if src_chip is
> valid.
Yes.
> But src_chip is initialized to XIVE_INVALID_CHIP_ID in papr.c
Yes. The H_INT hcalls do not return any information on the source
location.
> So it would make more sense to me to also initialize chip_id to
> XIVE_INVALID_CHIP_ID for PAPR to make it clearer that it's not
> relevant.
yes. That would clarify that chip_id is only relevant on PowerVM/KVM.
We can drop this patch, it's not a requirement for patches 2-9, simply
a cleanup. I will move the chip_id assignment to a platform handler
in another patch.
>> Also to be noted that under QEMU/KVM "ibm,chip-id" is badly calculated
>> with unusual SMT configuration. This leads to a bogus chip id value
>> being returned by of_get_ibm_chip_id().
>
> I *still* don't clearly understand what you think is bogus about the
> chip id value that qemu generates. It's clearly not a problem for
> XIVE, since PAPR XIVE never uses it.
I am getting confused by socket/node/chip concepts under PPC.
However, when looking at PHB and MSI, there is definitely a "node"
concept being used in the core IRQ layer for allocation and affinity.
We will need to clarify that when we introduce MSI domains.
Thanks,
C.
* Re: [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node
2021-04-01 8:04 ` [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node Greg Kurz
@ 2021-04-01 9:18 ` Cédric Le Goater
2021-04-01 12:45 ` Greg Kurz
0 siblings, 1 reply; 21+ messages in thread
From: Cédric Le Goater @ 2021-04-01 9:18 UTC (permalink / raw)
To: Greg Kurz; +Cc: linuxppc-dev
Hello,
On 4/1/21 10:04 AM, Greg Kurz wrote:
> On Wed, 31 Mar 2021 16:45:05 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
>
>>
>> Hello,
>>
>> ipistorm [*] can be used to benchmark the raw interrupt rate of an
>> interrupt controller by measuring the number of IPIs a system can
>> sustain. When applied to the XIVE interrupt controller of POWER9 and
>> POWER10 systems, a significant drop of the interrupt rate can be
>> observed when crossing the second node boundary.
>>
>> This is due to the fact that a single IPI interrupt is used for all
>> CPUs of the system. The structure is shared and the cache line updates
>> greatly impact the traffic between nodes and the overall IPI
>> performance.
>>
>> As a workaround, the impact can be reduced by deactivating the IRQ
>> lockup detector ("noirqdebug") which does a lot of accounting in the
>> Linux IRQ descriptor structure and is responsible for most of the
>> performance penalty.
>>
>> As a fix, this proposal allocates an IPI interrupt per node, to be
>> shared by all CPUs of that node. It solves the scaling issue, the IRQ
>> lockup detector still has an impact but the XIVE interrupt rate scales
>> linearly. It also improves the "noirqdebug" case as shown in the
>> tables below.
>>
>
> As explained by David and others, NUMA nodes happen to match sockets
> with current POWER CPUs but these are really different concepts. NUMA
> is about CPU memory access latency,
This is exactly our problem. We have cache issues because hw threads
on different chips are trying to access the same structure in memory.
It happens on virtual platforms and baremetal platforms. This is not
restricted to pseries.
> while in the case of XIVE you
> really need to identify a XIVE chip localized in a given socket.
>
> PAPR doesn't know about sockets, only cores. In other words, a PAPR
> compliant guest sees all vCPUs like they all sit in a single socket.
There are also NUMA nodes on PAPR.
> Same for the XIVE. Trying to introduce a concept of socket, either
> by hijacking OPAL's ibm,chip-id or NUMA node ids, is a kind of
> spec violation in this context. If the user cares for locality of
> the vCPUs and XIVE on the same socket, then it should bind vCPU
> threads to host CPUs from the same socket in the first place.
Yes. that's a must have of course. You need to reflect the real HW
topology in the guest or LPAR if you are after performance, or
restrict the virtual machine to be on a single socket/chip/node.
And this is not only a XIVE problem. XICS has the same problem with
a shared single IPI interrupt descriptor but XICS doesn't scale well
by design, so it doesn't show.
> Isn't this enough to solve the performance issues this series
> wants to fix, without the need for virtual socket ids?
What are virtual socket ids? A new concept?
Thanks,
C.
>
>> * P9 DD2.2 - 2s * 64 threads
>>
>> "noirqdebug"
>> Mint/s Mint/s
>> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
>> --------------------------------------------------------------
>> 1 0-15 4.984023 4.875405 4.996536 5.048892
>> 0-31 10.879164 10.544040 10.757632 11.037859
>> 0-47 15.345301 14.688764 14.926520 15.310053
>> 0-63 17.064907 17.066812 17.613416 17.874511
>> 2 0-79 11.768764 21.650749 22.689120 22.566508
>> 0-95 10.616812 26.878789 28.434703 28.320324
>> 0-111 10.151693 31.397803 31.771773 32.388122
>> 0-127 9.948502 33.139336 34.875716 35.224548
>>
>>
>> * P10 DD1 - 4s (not homogeneous) 352 threads
>>
>> "noirqdebug"
>> Mint/s Mint/s
>> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
>> --------------------------------------------------------------
>> 1 0-15 2.409402 2.364108 2.383303 2.395091
>> 0-31 6.028325 6.046075 6.089999 6.073750
>> 0-47 8.655178 8.644531 8.712830 8.724702
>> 0-63 11.629652 11.735953 12.088203 12.055979
>> 0-79 14.392321 14.729959 14.986701 14.973073
>> 0-95 12.604158 13.004034 17.528748 17.568095
>> 2 0-111 9.767753 13.719831 19.968606 20.024218
>> 0-127 6.744566 16.418854 22.898066 22.995110
>> 0-143 6.005699 19.174421 25.425622 25.417541
>> 0-159 5.649719 21.938836 27.952662 28.059603
>> 0-175 5.441410 24.109484 31.133915 31.127996
>> 3 0-191 5.318341 24.405322 33.999221 33.775354
>> 0-207 5.191382 26.449769 36.050161 35.867307
>> 0-223 5.102790 29.356943 39.544135 39.508169
>> 0-239 5.035295 31.933051 42.135075 42.071975
>> 0-255 4.969209 34.477367 44.655395 44.757074
>> 4 0-271 4.907652 35.887016 47.080545 47.318537
>> 0-287 4.839581 38.076137 50.464307 50.636219
>> 0-303 4.786031 40.881319 53.478684 53.310759
>> 0-319 4.743750 43.448424 56.388102 55.973969
>> 0-335 4.709936 45.623532 59.400930 58.926857
>> 0-351 4.681413 45.646151 62.035804 61.830057
>>
>> [*] https://github.com/antonblanchard/ipistorm
>>
>> Thanks,
>>
>> C.
>>
>> Changes in v3:
>>
>> - improved commit log for the misuse of "ibm,chip-id"
>> - better error handling of xive_request_ipi()
>> - use of a fwnode_handle to name the new domain
>> - increased IPI name length
>> - use of early_cpu_to_node() for hotplugged CPUs
>> - filter CPU-less nodes
>>
>> Changes in v2:
>>
>> - extra simplification on xmon
>> - fixes on issues reported by the kernel test robot
>>
>> Cédric Le Goater (9):
>> powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
>> powerpc/xive: Introduce an IPI interrupt domain
>> powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
>> powerpc/xive: Simplify xive_core_debug_show()
>> powerpc/xive: Drop check on irq_data in xive_core_debug_show()
>> powerpc/xive: Simplify the dump of XIVE interrupts under xmon
>> powerpc/xive: Fix xmon command "dxi"
>> powerpc/xive: Map one IPI interrupt per node
>> powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
>>
>> arch/powerpc/include/asm/xive.h | 1 +
>> arch/powerpc/sysdev/xive/xive-internal.h | 2 -
>> arch/powerpc/sysdev/xive/common.c | 211 +++++++++++++++--------
>> arch/powerpc/xmon/xmon.c | 28 +--
>> 4 files changed, 139 insertions(+), 103 deletions(-)
>>
>
* Re: [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node
2021-04-01 9:18 ` Cédric Le Goater
@ 2021-04-01 12:45 ` Greg Kurz
2021-04-01 17:14 ` Cédric Le Goater
0 siblings, 1 reply; 21+ messages in thread
From: Greg Kurz @ 2021-04-01 12:45 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: linuxppc-dev
On Thu, 1 Apr 2021 11:18:10 +0200
Cédric Le Goater <clg@kaod.org> wrote:
> Hello,
>
> On 4/1/21 10:04 AM, Greg Kurz wrote:
> > On Wed, 31 Mar 2021 16:45:05 +0200
> > Cédric Le Goater <clg@kaod.org> wrote:
> >
> >>
> >> Hello,
> >>
> >> ipistorm [*] can be used to benchmark the raw interrupt rate of an
> >> interrupt controller by measuring the number of IPIs a system can
> >> sustain. When applied to the XIVE interrupt controller of POWER9 and
> >> POWER10 systems, a significant drop of the interrupt rate can be
> >> observed when crossing the second node boundary.
> >>
> >> This is due to the fact that a single IPI interrupt is used for all
> >> CPUs of the system. The structure is shared and the cache line updates
> >> greatly impact the traffic between nodes and the overall IPI
> >> performance.
> >>
> >> As a workaround, the impact can be reduced by deactivating the IRQ
> >> lockup detector ("noirqdebug") which does a lot of accounting in the
> >> Linux IRQ descriptor structure and is responsible for most of the
> >> performance penalty.
> >>
> >> As a fix, this proposal allocates an IPI interrupt per node, to be
> >> shared by all CPUs of that node. It solves the scaling issue, the IRQ
> >> lockup detector still has an impact but the XIVE interrupt rate scales
> >> linearly. It also improves the "noirqdebug" case as shown in the
> >> tables below.
> >>
> >
> > As explained by David and others, NUMA nodes happen to match sockets
> > with current POWER CPUs but these are really different concepts. NUMA
> > is about CPU memory access latency,
>
> This is exactly our problem. We have cache issues because hw threads
> on different chips are trying to access the same structure in memory.
> It happens on virtual platforms and baremetal platforms. This is not
> restricted to pseries.
>
Ok, I get it... the XIVE HW accesses structures in RAM, just like HW threads
do, so the closer, the better. This definitely looks NUMA related indeed. So
yes, the idea of having the XIVE HW to only access local in-RAM data when
handling IPIs between vCPUs in the same NUMA node makes sense.
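The contention mechanism described above can be sketched with plain C: a single shared counter sits on one cache line that every node bounces, whereas cache-line-aligned per-node slots never share a line. The 128-byte line size and node count below are assumptions for illustration, not values taken from the thread:

```c
#include <assert.h>
#include <stddef.h>

/* Layout sketch of shared vs. per-node IPI accounting. */
#define CACHE_LINE	128	/* assumed L1 line size, e.g. POWER9 */
#define MAX_NODES	4

/* Shared case: every CPU in the system updates the same counter,
 * so the single line holding it ping-pongs between nodes. */
struct shared_ipi_stats {
	unsigned long count;
};

/* Per-node case: one slot per node, padded out to a full cache line,
 * so no line is ever written by CPUs of two different nodes. */
struct node_ipi_stats {
	unsigned long count;
} __attribute__((aligned(CACHE_LINE)));

static struct node_ipi_stats per_node[MAX_NODES];

/* Byte distance between two adjacent nodes' counters */
static size_t node_stat_stride(void)
{
	return (size_t)((char *)&per_node[1].count -
			(char *)&per_node[0].count);
}
```

The alignment attribute forces each element to start on its own line, which is the layout property the per-node IPI descriptors rely on.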
What is less clear is the exact role of ibm,chip-id actually. This is
currently used on PowerNV only to pick up a default target on the same
"chip" as the source if possible. What is the detailed motivation behind
this?
> > while in the case of XIVE you
> > really need to identify a XIVE chip localized in a given socket.
> >
> > PAPR doesn't know about sockets, only cores. In other words, a PAPR
> > compliant guest sees all vCPUs like they all sit in a single socket.
>
> There are also NUMA nodes on PAPR.
>
Yes, but nothing prevents a NUMA node from spanning multiple sockets
or having several NUMA nodes within the same socket, even if this
isn't the case in practice with current POWER hardware.
> > Same for the XIVE. Trying to introduce a concept of socket, either
> > by hijacking OPAL's ibm,chip-id or NUMA node ids, is a kind of
> > spec violation in this context. If the user cares for locality of
> > the vCPUs and XIVE on the same socket, then it should bind vCPU
> > threads to host CPUs from the same socket in the first place.
>
> Yes. that's a must have of course. You need to reflect the real HW
> topology in the guest or LPAR if you are after performance, or
> restrict the virtual machine to be on a single socket/chip/node.
>
> And this is not only a XIVE problem. XICS has the same problem with
> a shared single IPI interrupt descriptor but XICS doesn't scale well
> by design, so it doesn't show.
>
>
> > Isn't this enough to solve the performance issues this series
> > wants to fix, without the need for virtual socket ids?
> What are virtual socket ids? A new concept?
>
For now, we have virtual CPUs identified by a virtual CPU id.
It thus seems natural to speak of a virtual socket id, but
anyway, the wording isn't really important here and you
don't answer the question ;-)
> Thanks,
>
> C.
>
> >
> >> * P9 DD2.2 - 2s * 64 threads
> >>
> >> "noirqdebug"
> >> Mint/s Mint/s
> >> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> >> --------------------------------------------------------------
> >> 1 0-15 4.984023 4.875405 4.996536 5.048892
> >> 0-31 10.879164 10.544040 10.757632 11.037859
> >> 0-47 15.345301 14.688764 14.926520 15.310053
> >> 0-63 17.064907 17.066812 17.613416 17.874511
> >> 2 0-79 11.768764 21.650749 22.689120 22.566508
> >> 0-95 10.616812 26.878789 28.434703 28.320324
> >> 0-111 10.151693 31.397803 31.771773 32.388122
> >> 0-127 9.948502 33.139336 34.875716 35.224548
> >>
> >>
> >> * P10 DD1 - 4s (not homogeneous) 352 threads
> >>
> >> "noirqdebug"
> >> Mint/s Mint/s
> >> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> >> --------------------------------------------------------------
> >> 1 0-15 2.409402 2.364108 2.383303 2.395091
> >> 0-31 6.028325 6.046075 6.089999 6.073750
> >> 0-47 8.655178 8.644531 8.712830 8.724702
> >> 0-63 11.629652 11.735953 12.088203 12.055979
> >> 0-79 14.392321 14.729959 14.986701 14.973073
> >> 0-95 12.604158 13.004034 17.528748 17.568095
> >> 2 0-111 9.767753 13.719831 19.968606 20.024218
> >> 0-127 6.744566 16.418854 22.898066 22.995110
> >> 0-143 6.005699 19.174421 25.425622 25.417541
> >> 0-159 5.649719 21.938836 27.952662 28.059603
> >> 0-175 5.441410 24.109484 31.133915 31.127996
> >> 3 0-191 5.318341 24.405322 33.999221 33.775354
> >> 0-207 5.191382 26.449769 36.050161 35.867307
> >> 0-223 5.102790 29.356943 39.544135 39.508169
> >> 0-239 5.035295 31.933051 42.135075 42.071975
> >> 0-255 4.969209 34.477367 44.655395 44.757074
> >> 4 0-271 4.907652 35.887016 47.080545 47.318537
> >> 0-287 4.839581 38.076137 50.464307 50.636219
> >> 0-303 4.786031 40.881319 53.478684 53.310759
> >> 0-319 4.743750 43.448424 56.388102 55.973969
> >> 0-335 4.709936 45.623532 59.400930 58.926857
> >> 0-351 4.681413 45.646151 62.035804 61.830057
> >>
> >> [*] https://github.com/antonblanchard/ipistorm
> >>
> >> Thanks,
> >>
> >> C.
> >>
> >> Changes in v3:
> >>
> >> - improved commit log for the misuse of "ibm,chip-id"
> >> - better error handling of xive_request_ipi()
> >> - use of a fwnode_handle to name the new domain
> >> - increased IPI name length
> >> - use of early_cpu_to_node() for hotplugged CPUs
> >> - filter CPU-less nodes
> >>
> >> Changes in v2:
> >>
> >> - extra simplification on xmon
> >> - fixes on issues reported by the kernel test robot
> >>
> >> Cédric Le Goater (9):
> >> powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property
> >> powerpc/xive: Introduce an IPI interrupt domain
> >> powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
> >> powerpc/xive: Simplify xive_core_debug_show()
> >> powerpc/xive: Drop check on irq_data in xive_core_debug_show()
> >> powerpc/xive: Simplify the dump of XIVE interrupts under xmon
> >> powerpc/xive: Fix xmon command "dxi"
> >> powerpc/xive: Map one IPI interrupt per node
> >> powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
> >>
> >> arch/powerpc/include/asm/xive.h | 1 +
> >> arch/powerpc/sysdev/xive/xive-internal.h | 2 -
> >> arch/powerpc/sysdev/xive/common.c | 211 +++++++++++++++--------
> >> arch/powerpc/xmon/xmon.c | 28 +--
> >> 4 files changed, 139 insertions(+), 103 deletions(-)
> >>
> >
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 8/9] powerpc/xive: Map one IPI interrupt per node
2021-03-31 14:45 ` [PATCH v3 8/9] powerpc/xive: Map one IPI interrupt per node Cédric Le Goater
@ 2021-04-01 12:50 ` Nicholas Piggin
2021-04-02 11:31 ` Cédric Le Goater
0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2021-04-01 12:50 UTC (permalink / raw)
To: Cédric Le Goater, linuxppc-dev; +Cc: Thomas Gleixner, Greg Kurz
Excerpts from Cédric Le Goater's message of April 1, 2021 12:45 am:
> ipistorm [*] can be used to benchmark the raw interrupt rate of an
> interrupt controller by measuring the number of IPIs a system can
> sustain. When applied to the XIVE interrupt controller of POWER9 and
> POWER10 systems, a significant drop of the interrupt rate can be
> observed when crossing the second node boundary.
>
> This is because a single IPI interrupt is used for all
> CPUs of the system. The structure is shared, and the cache line updates
> greatly impact the traffic between nodes and the overall IPI
> performance.
>
> As a workaround, the impact can be reduced by deactivating the IRQ
> lockup detector ("noirqdebug") which does a lot of accounting in the
> Linux IRQ descriptor structure and is responsible for most of the
> performance penalty.
>
> As a fix, this proposal allocates an IPI interrupt per node, to be
> shared by all CPUs of that node. It solves the scaling issue: the IRQ
> lockup detector still has an impact, but the XIVE interrupt rate scales
> linearly. It also improves the "noirqdebug" case, as shown in the
> tables below.
>
> * P9 DD2.2 - 2s * 64 threads
>
> "noirqdebug"
> Mint/s Mint/s
> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> --------------------------------------------------------------
> 1 0-15 4.984023 4.875405 4.996536 5.048892
> 0-31 10.879164 10.544040 10.757632 11.037859
> 0-47 15.345301 14.688764 14.926520 15.310053
> 0-63 17.064907 17.066812 17.613416 17.874511
> 2 0-79 11.768764 21.650749 22.689120 22.566508
> 0-95 10.616812 26.878789 28.434703 28.320324
> 0-111 10.151693 31.397803 31.771773 32.388122
> 0-127 9.948502 33.139336 34.875716 35.224548
>
> * P10 DD1 - 4s (not homogeneous) 352 threads
>
> "noirqdebug"
> Mint/s Mint/s
> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
> --------------------------------------------------------------
> 1 0-15 2.409402 2.364108 2.383303 2.395091
> 0-31 6.028325 6.046075 6.089999 6.073750
> 0-47 8.655178 8.644531 8.712830 8.724702
> 0-63 11.629652 11.735953 12.088203 12.055979
> 0-79 14.392321 14.729959 14.986701 14.973073
> 0-95 12.604158 13.004034 17.528748 17.568095
> 2 0-111 9.767753 13.719831 19.968606 20.024218
> 0-127 6.744566 16.418854 22.898066 22.995110
> 0-143 6.005699 19.174421 25.425622 25.417541
> 0-159 5.649719 21.938836 27.952662 28.059603
> 0-175 5.441410 24.109484 31.133915 31.127996
> 3 0-191 5.318341 24.405322 33.999221 33.775354
> 0-207 5.191382 26.449769 36.050161 35.867307
> 0-223 5.102790 29.356943 39.544135 39.508169
> 0-239 5.035295 31.933051 42.135075 42.071975
> 0-255 4.969209 34.477367 44.655395 44.757074
> 4 0-271 4.907652 35.887016 47.080545 47.318537
> 0-287 4.839581 38.076137 50.464307 50.636219
> 0-303 4.786031 40.881319 53.478684 53.310759
> 0-319 4.743750 43.448424 56.388102 55.973969
> 0-335 4.709936 45.623532 59.400930 58.926857
> 0-351 4.681413 45.646151 62.035804 61.830057
>
> [*] https://github.com/antonblanchard/ipistorm
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
Very nice result, but the default-on irqdebug code is quite a slowdown
even with your improvements.
Is the main cacheline bouncing in the fast path coming from
desc->irq_count++ of the percpu handler? Can we do something quick and
dirty like the attached patch?
All this stuff seems totally racy with the percpu handler, but maybe that
doesn't matter too much (and anyway, fixing it would be a much bigger change).
Thanks,
Nick
---
kernel/irq/spurious.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index f865e5f4d382..6b17b737ee6c 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -378,7 +378,8 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
* then we merily delay the spurious detection
* by one hard interrupt. Not a real problem.
*/
- desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
+ if (desc->threads_handled_last & SPURIOUS_DEFERRED)
+ desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
}
}
@@ -403,6 +404,10 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
desc->irqs_unhandled -= ok;
}
+ if (likely(!desc->irqs_unhandled))
+ return;
+
+ /* Now getting into unhandled irq detection */
desc->irq_count++;
if (likely(desc->irq_count < 100000))
return;
--
2.23.0
* Re: [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node
2021-04-01 12:45 ` Greg Kurz
@ 2021-04-01 17:14 ` Cédric Le Goater
0 siblings, 0 replies; 21+ messages in thread
From: Cédric Le Goater @ 2021-04-01 17:14 UTC (permalink / raw)
To: Greg Kurz; +Cc: linuxppc-dev
On 4/1/21 2:45 PM, Greg Kurz wrote:
> On Thu, 1 Apr 2021 11:18:10 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
>
>> Hello,
>>
>> On 4/1/21 10:04 AM, Greg Kurz wrote:
>>> On Wed, 31 Mar 2021 16:45:05 +0200
>>> Cédric Le Goater <clg@kaod.org> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> ipistorm [*] can be used to benchmark the raw interrupt rate of an
>>>> interrupt controller by measuring the number of IPIs a system can
>>>> sustain. When applied to the XIVE interrupt controller of POWER9 and
>>>> POWER10 systems, a significant drop of the interrupt rate can be
>>>> observed when crossing the second node boundary.
>>>>
>>>> This is because a single IPI interrupt is used for all
>>>> CPUs of the system. The structure is shared, and the cache line updates
>>>> greatly impact the traffic between nodes and the overall IPI
>>>> performance.
>>>>
>>>> As a workaround, the impact can be reduced by deactivating the IRQ
>>>> lockup detector ("noirqdebug") which does a lot of accounting in the
>>>> Linux IRQ descriptor structure and is responsible for most of the
>>>> performance penalty.
>>>>
>>>> As a fix, this proposal allocates an IPI interrupt per node, to be
>>>> shared by all CPUs of that node. It solves the scaling issue: the IRQ
>>>> lockup detector still has an impact, but the XIVE interrupt rate scales
>>>> linearly. It also improves the "noirqdebug" case, as shown in the
>>>> tables below.
>>>>
>>>
>>> As explained by David and others, NUMA nodes happen to match sockets
>>> with current POWER CPUs but these are really different concepts. NUMA
>>> is about CPU memory accesses latency,
>>
>> This is exactly our problem. We have cache issues because HW threads
>> on different chips are trying to access the same structure in memory.
>> It happens on both virtual and baremetal platforms. This is not
>> restricted to pseries.
>>
>
> Ok, I get it... the XIVE HW accesses structures in RAM, just like HW threads
> do, so the closer, the better.
No. That's another problem, related to the XIVE internal tables, which
should be allocated on the chip where they are mostly used.
The problem is much simpler. As the commit log says:
This is because a single IPI interrupt is used for all
CPUs of the system. The structure is shared, and the cache line updates
greatly impact the traffic between nodes and the overall IPI
performance.
So, we have multiple threads competing for the same IRQ descriptor and
overloading the PowerBUS with cache update synchronization.
> This definitely looks NUMA related indeed. So
> yes, the idea of having the XIVE HW to only access local in-RAM data when
> handling IPIs between vCPUs in the same NUMA node makes sense.
Yes. That's the goal.
> What is less clear is the exact role of ibm,chip-id actually. This is
> currently used on PowerNV only to pick up a default target on the same
> "chip" as the source if possible. What is the detailed motivation behind
> this ?
The "ibm,chip-id" issue is extra noise and not a requirement for this
patchset.
>>> while in the case of XIVE you
>>> really need to identify a XIVE chip localized in a given socket.
>>>
>>> PAPR doesn't know about sockets, only cores. In other words, a PAPR
>>> compliant guest sees all vCPUs like they all sit in a single socket.
>>
>> There are also NUMA nodes on PAPR.
>>
>
> Yes but nothing prevents a NUMA node to span over multiple sockets
> or having several NUMA nodes within the same socket, even if this
> isn't the case in practice with current POWER hardware.
Yes. A NUMA node could even be a PCI adapter attached to storage.
I don't know what to say. Maybe we are missing a concept.
>>> Same for the XIVE. Trying to introduce a concept of socket, either
>>> by hijacking OPAL's ibm,chip-id or NUMA node ids, is a kind of
>>> spec violation in this context. If the user cares for locality of
>>> the vCPUs and XIVE on the same socket, then it should bind vCPU
>>> threads to host CPUs from the same socket in the first place.
>>
>> Yes, that's a must-have of course. You need to reflect the real HW
>> topology in the guest or LPAR if you are after performance, or
>> restrict the virtual machine to a single socket/chip/node.
>>
>> And this is not only a XIVE problem. XICS has the same problem with
>> a shared single IPI interrupt descriptor but XICS doesn't scale well
>> by design, so it doesn't show.
>>
>>
>>> Isn't this enough to solve the performance issues this series
>>> want to fix, without the need for virtual socket ids ?
>> What are virtual socket ids? A new concept?
>>
>
> For now, we have virtual CPUs identified by a virtual CPU id.
> It thus seems natural to speak of a virtual socket id, but
> anyway, the wording isn't really important here and you
> don't answer the question ;-)
If, on the hypervisor, you restrict the virtual machine's vCPUs to a
single POWER processor/chip, there is no problem. But large KVM guests
and PowerVM LPARs do exist on 16-socket systems.
C.
* Re: [PATCH v3 8/9] powerpc/xive: Map one IPI interrupt per node
2021-04-01 12:50 ` Nicholas Piggin
@ 2021-04-02 11:31 ` Cédric Le Goater
2021-04-02 12:19 ` Cédric Le Goater
0 siblings, 1 reply; 21+ messages in thread
From: Cédric Le Goater @ 2021-04-02 11:31 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Thomas Gleixner, Greg Kurz
On 4/1/21 2:50 PM, Nicholas Piggin wrote:
> Excerpts from Cédric Le Goater's message of April 1, 2021 12:45 am:
>> ipistorm [*] can be used to benchmark the raw interrupt rate of an
>> interrupt controller by measuring the number of IPIs a system can
>> sustain. When applied to the XIVE interrupt controller of POWER9 and
>> POWER10 systems, a significant drop of the interrupt rate can be
>> observed when crossing the second node boundary.
>>
>> This is because a single IPI interrupt is used for all
>> CPUs of the system. The structure is shared, and the cache line updates
>> greatly impact the traffic between nodes and the overall IPI
>> performance.
>>
>> As a workaround, the impact can be reduced by deactivating the IRQ
>> lockup detector ("noirqdebug") which does a lot of accounting in the
>> Linux IRQ descriptor structure and is responsible for most of the
>> performance penalty.
>>
>> As a fix, this proposal allocates an IPI interrupt per node, to be
>> shared by all CPUs of that node. It solves the scaling issue: the IRQ
>> lockup detector still has an impact, but the XIVE interrupt rate scales
>> linearly. It also improves the "noirqdebug" case, as shown in the
>> tables below.
>>
>> * P9 DD2.2 - 2s * 64 threads
>>
>> "noirqdebug"
>> Mint/s Mint/s
>> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
>> --------------------------------------------------------------
>> 1 0-15 4.984023 4.875405 4.996536 5.048892
>> 0-31 10.879164 10.544040 10.757632 11.037859
>> 0-47 15.345301 14.688764 14.926520 15.310053
>> 0-63 17.064907 17.066812 17.613416 17.874511
>> 2 0-79 11.768764 21.650749 22.689120 22.566508
>> 0-95 10.616812 26.878789 28.434703 28.320324
>> 0-111 10.151693 31.397803 31.771773 32.388122
>> 0-127 9.948502 33.139336 34.875716 35.224548
>>
>> * P10 DD1 - 4s (not homogeneous) 352 threads
>>
>> "noirqdebug"
>> Mint/s Mint/s
>> chips cpus IPI/sys IPI/chip IPI/chip IPI/sys
>> --------------------------------------------------------------
>> 1 0-15 2.409402 2.364108 2.383303 2.395091
>> 0-31 6.028325 6.046075 6.089999 6.073750
>> 0-47 8.655178 8.644531 8.712830 8.724702
>> 0-63 11.629652 11.735953 12.088203 12.055979
>> 0-79 14.392321 14.729959 14.986701 14.973073
>> 0-95 12.604158 13.004034 17.528748 17.568095
>> 2 0-111 9.767753 13.719831 19.968606 20.024218
>> 0-127 6.744566 16.418854 22.898066 22.995110
>> 0-143 6.005699 19.174421 25.425622 25.417541
>> 0-159 5.649719 21.938836 27.952662 28.059603
>> 0-175 5.441410 24.109484 31.133915 31.127996
>> 3 0-191 5.318341 24.405322 33.999221 33.775354
>> 0-207 5.191382 26.449769 36.050161 35.867307
>> 0-223 5.102790 29.356943 39.544135 39.508169
>> 0-239 5.035295 31.933051 42.135075 42.071975
>> 0-255 4.969209 34.477367 44.655395 44.757074
>> 4 0-271 4.907652 35.887016 47.080545 47.318537
>> 0-287 4.839581 38.076137 50.464307 50.636219
>> 0-303 4.786031 40.881319 53.478684 53.310759
>> 0-319 4.743750 43.448424 56.388102 55.973969
>> 0-335 4.709936 45.623532 59.400930 58.926857
>> 0-351 4.681413 45.646151 62.035804 61.830057
>>
>> [*] https://github.com/antonblanchard/ipistorm
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>
> Very nice result, but the default-on irqdebug code is quite a slowdown
> even with your improvements.
>
> Is the main cacheline bouncing in the fast path coming from
> desc->irq_count++ of the percpu handler? Can we do something quick and
> dirty like the attached patch?
>
> All this stuff seems totally racy with the percpu handler, but maybe that
> doesn't matter too much (and anyway, fixing it would be a much bigger change).
I gave the patch below a try, and we reach the same results, even
slightly better ones. The simplest solution is always the best. Nick,
you should send that single patch.
Thanks,
C.
> Thanks,
> Nick
>
> ---
> kernel/irq/spurious.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
> index f865e5f4d382..6b17b737ee6c 100644
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -378,7 +378,8 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
> * then we merily delay the spurious detection
> * by one hard interrupt. Not a real problem.
> */
> - desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
> + if (desc->threads_handled_last & SPURIOUS_DEFERRED)
> + desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
> }
> }
>
> @@ -403,6 +404,10 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
> desc->irqs_unhandled -= ok;
> }
>
> + if (likely(!desc->irqs_unhandled))
> + return;
> +
> + /* Now getting into unhandled irq detection */
> desc->irq_count++;
> if (likely(desc->irq_count < 100000))
> return;
>
* Re: [PATCH v3 8/9] powerpc/xive: Map one IPI interrupt per node
2021-04-02 11:31 ` Cédric Le Goater
@ 2021-04-02 12:19 ` Cédric Le Goater
0 siblings, 0 replies; 21+ messages in thread
From: Cédric Le Goater @ 2021-04-02 12:19 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Thomas Gleixner, Greg Kurz
> I gave the patch below a try, and we reach the same results, even
> slightly better ones. The simplest solution is always the best. Nick,
> you should send that single patch.
FYI, here are results in a KVM guests with pinned vCPUs.
* P9 DD2.2 - 2s * 64 threads - KVM guest:

                          IPI/sys                          IPI/chip
 ----------- -------------------------------- --------------------------------
                                    unhandled                         unhandled
 chips  cpus            noirqdebug  detection             noirqdebug  detection
 ----------- -------------------------------- --------------------------------
1 0-15 4.152813 4.084240 4.061028 4.097700 4.042539 4.008314
0-31 8.186328 8.157970 7.937127 8.277942 8.019539 7.831189
0-47 11.391635 11.232960 11.017530 11.278048 10.994501 10.889347
0-63 13.907476 14.022307 11.460280 13.933946 13.506828 11.369188
2 0-79 18.105276 18.084463 8.376047 18.069176 17.587916 15.477006
0-95 22.100683 22.265763 7.338229 22.084006 21.588463 19.502192
0-111 25.305948 25.473068 6.716235 25.429261 24.607570 22.733253
0-127 27.814449 28.029029 6.222736 27.960119 27.253432 23.884916
The three "IPI/chip" columns are the results with this series; the
"IPI/sys" columns are without it. The "unhandled detection" columns are
with Nick's patch.
C.
* Re: [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node
2021-03-31 14:45 [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node Cédric Le Goater
` (10 preceding siblings ...)
2021-04-01 8:42 ` Cédric Le Goater
@ 2021-04-19 3:59 ` Michael Ellerman
11 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2021-04-19 3:59 UTC (permalink / raw)
To: Cédric Le Goater, linuxppc-dev; +Cc: Greg Kurz
On Wed, 31 Mar 2021 16:45:05 +0200, Cédric Le Goater wrote:
> ipistorm [*] can be used to benchmark the raw interrupt rate of an
> interrupt controller by measuring the number of IPIs a system can
> sustain. When applied to the XIVE interrupt controller of POWER9 and
> POWER10 systems, a significant drop of the interrupt rate can be
> observed when crossing the second node boundary.
>
> This is due to the fact that a single IPI interrupt is used for all
> CPUs of the system. The structure is shared and the cache line updates
> impact greatly the traffic between nodes and the overall IPI
> performance.
>
> [...]
Patches 2-9 applied to powerpc/next.
[2/9] powerpc/xive: Introduce an IPI interrupt domain
https://git.kernel.org/powerpc/c/7d348494136c8b47c39d1f7ccba28c47d5094a54
[3/9] powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ
https://git.kernel.org/powerpc/c/1835e72942b5aa779c8ada62aaeba03ab66d92c9
[4/9] powerpc/xive: Simplify xive_core_debug_show()
https://git.kernel.org/powerpc/c/5159d9872823230669b7949ba3caf18c4c314846
[5/9] powerpc/xive: Drop check on irq_data in xive_core_debug_show()
https://git.kernel.org/powerpc/c/a74ce5926b20cd0e6d624a9b2527073a96dfed7f
[6/9] powerpc/xive: Simplify the dump of XIVE interrupts under xmon
https://git.kernel.org/powerpc/c/6bf66eb8f404050030805c65cf39a810892f5f8e
[7/9] powerpc/xive: Fix xmon command "dxi"
https://git.kernel.org/powerpc/c/33e4bc5946432a4ac173fd08e8e30a13ab94d06d
[8/9] powerpc/xive: Map one IPI interrupt per node
https://git.kernel.org/powerpc/c/7dcc37b3eff97379b194adb17eb9a8270512dd1d
[9/9] powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler
https://git.kernel.org/powerpc/c/fd6db2892ebaa1383a93b4a609c65b96e615510a
cheers
Thread overview: 21+ messages
2021-03-31 14:45 [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 1/9] powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property Cédric Le Goater
2021-04-01 2:49 ` [PATCH v3 1/9] powerpc/xive: Use cpu_to_node() instead of "ibm,chip-id" property David Gibson
2021-04-01 9:10 ` Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 2/9] powerpc/xive: Introduce an IPI interrupt domain Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 3/9] powerpc/xive: Remove useless check on XIVE_IPI_HW_IRQ Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 4/9] powerpc/xive: Simplify xive_core_debug_show() Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 5/9] powerpc/xive: Drop check on irq_data in xive_core_debug_show() Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 6/9] powerpc/xive: Simplify the dump of XIVE interrupts under xmon Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 7/9] powerpc/xive: Fix xmon command "dxi" Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 8/9] powerpc/xive: Map one IPI interrupt per node Cédric Le Goater
2021-04-01 12:50 ` Nicholas Piggin
2021-04-02 11:31 ` Cédric Le Goater
2021-04-02 12:19 ` Cédric Le Goater
2021-03-31 14:45 ` [PATCH v3 9/9] powerpc/xive: Modernize XIVE-IPI domain with an 'alloc' handler Cédric Le Goater
2021-04-01 8:04 ` [PATCH v3 0/9] powerpc/xive: Map one IPI interrupt per node Greg Kurz
2021-04-01 9:18 ` Cédric Le Goater
2021-04-01 12:45 ` Greg Kurz
2021-04-01 17:14 ` Cédric Le Goater
2021-04-01 8:42 ` Cédric Le Goater
2021-04-19 3:59 ` Michael Ellerman