Re: [PATCH v11 3/7] genirq: Add mechanism to multiplex a single HW IPI

From: Anup Patel <apatel@ventanamicro.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	 Atish Patra <atishp@atishpatra.org>,
	Alistair Francis <Alistair.Francis@wdc.com>,
	 Anup Patel <anup@brainfault.org>,
	linux-riscv@lists.infradead.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v11 3/7] genirq: Add mechanism to multiplex a single HW IPI
Date: Sat, 26 Nov 2022 19:01:46 +0530	[thread overview]
Message-ID: <CAK9=C2UeMVnMBKOgO_nseUZnmvG+CUu7xaRXtt6j3Esr+ap04g@mail.gmail.com> (raw)
In-Reply-To: <87y1rxubty.wl-maz@kernel.org>

On Sat, Nov 26, 2022 at 6:12 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Mon, 14 Nov 2022 09:39:00 +0000,
> Anup Patel <apatel@ventanamicro.com> wrote:
> >
> > All RISC-V platforms have a single HW IPI provided by the INTC local
> > interrupt controller. The HW method to trigger INTC IPI can be through
> > external irqchip (e.g. RISC-V AIA), through platform specific device
> > (e.g. SiFive CLINT timer), or through firmware (e.g. SBI IPI call).
> >
> > To support multiple IPIs on RISC-V, we add a generic IPI multiplexing
> > mechanism which help us create multiple virtual IPIs using a single
> > HW IPI. This generic IPI multiplexing is inspired from the Apple AIC
> > irqchip driver and it is shared by various RISC-V irqchip drivers.
> >
> > Signed-off-by: Anup Patel <apatel@ventanamicro.com>
> > ---
> >  include/linux/irq.h  |  18 +++
> >  kernel/irq/Kconfig   |   5 +
> >  kernel/irq/Makefile  |   1 +
> >  kernel/irq/ipi-mux.c | 268 +++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 292 insertions(+)
> >  create mode 100644 kernel/irq/ipi-mux.c
> >
> > diff --git a/include/linux/irq.h b/include/linux/irq.h
> > index c3eb89606c2b..5ab702cb0a5b 100644
> > --- a/include/linux/irq.h
> > +++ b/include/linux/irq.h
> > @@ -1266,6 +1266,24 @@ int __ipi_send_mask(struct irq_desc *desc, const struct cpumask *dest);
> >  int ipi_send_single(unsigned int virq, unsigned int cpu);
> >  int ipi_send_mask(unsigned int virq, const struct cpumask *dest);
> >
> > +/**
> > + * struct ipi_mux_ops - IPI multiplex operations
> > + *
> > + * @ipi_mux_pre_handle:      Optional function called before handling parent IPI
> > + * @ipi_mux_post_handle:Optional function called after handling parent IPI
> > + * @ipi_mux_send:    Trigger parent IPI on target CPUs
> > + */
> > +struct ipi_mux_ops {
> > +     void (*ipi_mux_pre_handle)(unsigned int parent_virq, void *data);
> > +     void (*ipi_mux_post_handle)(unsigned int parent_virq, void *data);
>
> I still haven't seen any decent explanation for this other than "we
> need it". What is it that cannot be achieved via the irq_ack() and
> irq_eoi() callbacks?

Sure, even I think these are strange looking callbacks. I will drop
these callbacks in the next patch revision.

>
> > +     void (*ipi_mux_send)(unsigned int parent_virq, void *data,
> > +                          const struct cpumask *mask);
>
> In what context would the 'parent_virq' be useful? You are *sending*
> an IPI, not receiving it, and I expect the mechanism by which you send
> such an IPI to be independent of the Linux view of the irq.

Okay, I will drop the 'parent_virq' parameter.

>
> Also, please swap data and mask in the function signature.

Okay, I will change the order.

>
> > +};
> > +
> > +void ipi_mux_process(void);
> > +int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,
> > +                const struct ipi_mux_ops *ops, void *data);
> > +
> >  #ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
> >  /*
> >   * Registers a generic IRQ handling function as the top-level IRQ handler in
> > diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> > index db3d174c53d4..df17dbc54b02 100644
> > --- a/kernel/irq/Kconfig
> > +++ b/kernel/irq/Kconfig
> > @@ -86,6 +86,11 @@ config GENERIC_IRQ_IPI
> >       depends on SMP
> >       select IRQ_DOMAIN_HIERARCHY
> >
> > +# Generic IRQ IPI Mux support
> > +config GENERIC_IRQ_IPI_MUX
> > +     bool
> > +     depends on SMP
> > +
> >  # Generic MSI interrupt support
> >  config GENERIC_MSI_IRQ
> >       bool
> > diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> > index b4f53717d143..f19d3080bf11 100644
> > --- a/kernel/irq/Makefile
> > +++ b/kernel/irq/Makefile
> > @@ -15,6 +15,7 @@ obj-$(CONFIG_GENERIC_IRQ_MIGRATION) += cpuhotplug.o
> >  obj-$(CONFIG_PM_SLEEP) += pm.o
> >  obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
> >  obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
> > +obj-$(CONFIG_GENERIC_IRQ_IPI_MUX) += ipi-mux.o
> >  obj-$(CONFIG_SMP) += affinity.o
> >  obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
> >  obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
> > diff --git a/kernel/irq/ipi-mux.c b/kernel/irq/ipi-mux.c
> > new file mode 100644
> > index 000000000000..259e00366dd7
> > --- /dev/null
> > +++ b/kernel/irq/ipi-mux.c
> > @@ -0,0 +1,268 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Multiplex several virtual IPIs over a single HW IPI.
> > + *
> > + * Copyright The Asahi Linux Contributors
> > + * Copyright (c) 2022 Ventana Micro Systems Inc.
> > + */
> > +
> > +#define pr_fmt(fmt) "ipi-mux: " fmt
> > +#include <linux/cpu.h>
> > +#include <linux/init.h>
> > +#include <linux/irq.h>
> > +#include <linux/irqchip.h>
> > +#include <linux/irqchip/chained_irq.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/jump_label.h>
> > +#include <linux/percpu.h>
> > +#include <linux/smp.h>
> > +
> > +struct ipi_mux_cpu {
> > +     atomic_t                        enable;
> > +     atomic_t                        bits;
> > +     struct cpumask                  send_mask;
> > +};
> > +
> > +struct ipi_mux_control {
> > +     void                            *data;
> > +     unsigned int                    nr;
>
> Honestly, I think we can get rid of this. The number of IPIs Linux
> uses is pretty small, and assuming a huge value (like 32) would be
> enough. It would save looking up this value on each IPI handling.

I had kept in-case some driver wanted to create fewer (< 32)
muxed IPIs.

>
> > +     unsigned int                    parent_virq;
> > +     struct irq_domain               *domain;
> > +     const struct ipi_mux_ops        *ops;
> > +     struct ipi_mux_cpu __percpu     *cpu;
> > +};
> > +
> > +static struct ipi_mux_control *imux;
> > +static DEFINE_STATIC_KEY_FALSE(imux_pre_handle);
> > +static DEFINE_STATIC_KEY_FALSE(imux_post_handle);
> > +
> > +static void ipi_mux_mask(struct irq_data *d)
> > +{
> > +     struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > +
> > +     atomic_andnot(BIT(irqd_to_hwirq(d)), &icpu->enable);
> > +}
> > +
> > +static void ipi_mux_unmask(struct irq_data *d)
> > +{
> > +     u32 ibit = BIT(irqd_to_hwirq(d));
> > +     struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > +
> > +     atomic_or(ibit, &icpu->enable);
> > +
> > +     /*
> > +      * The atomic_or() above must complete before the atomic_read()
> > +      * below to avoid racing ipi_mux_send_mask().
> > +      */
> > +     smp_mb__after_atomic();
> > +
> > +     /* If a pending IPI was unmasked, raise a parent IPI immediately. */
> > +     if (atomic_read(&icpu->bits) & ibit)
> > +             imux->ops->ipi_mux_send(imux->parent_virq, imux->data,
> > +                                     cpumask_of(smp_processor_id()));
> > +}
> > +
> > +static void ipi_mux_send_mask(struct irq_data *d, const struct cpumask *mask)
> > +{
> > +     u32 ibit = BIT(irqd_to_hwirq(d));
> > +     struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > +     struct cpumask *send_mask = &icpu->send_mask;
> > +     unsigned long flags;
> > +     int cpu;
> > +
> > +     /*
> > +      * We use send_mask as a per-CPU variable so disable local
> > +      * interrupts to avoid being preempted.
> > +      */
> > +     local_irq_save(flags);
> > +
> > +     cpumask_clear(send_mask);
> > +
> > +     for_each_cpu(cpu, mask) {
> > +             icpu = per_cpu_ptr(imux->cpu, cpu);
> > +             atomic_or(ibit, &icpu->bits);
> > +
> > +             /*
> > +              * The atomic_or() above must complete before
> > +              * the atomic_read() below to avoid racing with
> > +              * ipi_mux_unmask().
> > +              */
> > +             smp_mb__after_atomic();
> > +
> > +             if (atomic_read(&icpu->enable) & ibit)
> > +                     cpumask_set_cpu(cpu, send_mask);
> > +     }
> > +
> > +     /* Trigger the parent IPI */
> > +     imux->ops->ipi_mux_send(imux->parent_virq, imux->data, send_mask);
> > +
> > +     local_irq_restore(flags);
> > +}
> > +
> > +static const struct irq_chip ipi_mux_chip = {
> > +     .name           = "IPI Mux",
> > +     .irq_mask       = ipi_mux_mask,
> > +     .irq_unmask     = ipi_mux_unmask,
> > +     .ipi_send_mask  = ipi_mux_send_mask,
> > +};
>
> I really think this could either be supplied by the irqchip, or
> somehow patched to avoid the pointless imux->ops->ipi_mux_send
> indirection. Pointer chasing hurts.

Once we remove ipi_mux_pre/post_handle() callbacks, the
"ops" will be pointless and we will be able to remove one level
of indirection here.

We certainly need a mux irqchip to implement the
mask/unmask semantics for muxed IPIs.

>
> > +
> > +static int ipi_mux_domain_alloc(struct irq_domain *d, unsigned int virq,
> > +                             unsigned int nr_irqs, void *arg)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < nr_irqs; i++) {
> > +             irq_set_percpu_devid(virq + i);
> > +             irq_domain_set_info(d, virq + i, i,
> > +                                 &ipi_mux_chip, d->host_data,
> > +                                 handle_percpu_devid_irq, NULL, NULL);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static const struct irq_domain_ops ipi_mux_domain_ops = {
> > +     .alloc          = ipi_mux_domain_alloc,
> > +     .free           = irq_domain_free_irqs_top,
> > +};
> > +
> > +/**
> > + * ipi_mux_process - Process multiplexed virtual IPIs
> > + */
> > +void ipi_mux_process(void)
> > +{
> > +     struct ipi_mux_cpu *icpu = this_cpu_ptr(imux->cpu);
> > +     irq_hw_number_t hwirq;
> > +     unsigned long ipis;
> > +     int en;
>
> Please use an unsigned type here (see the rationale in atomic_t.txt).

Sure, I will update.

>
> > +
> > +     if (static_branch_unlikely(&imux_pre_handle))
> > +             imux->ops->ipi_mux_pre_handle(imux->parent_virq, imux->data);
> > +
> > +     /*
> > +      * Reading enable mask does not need to be ordered as long as
> > +      * this function called from interrupt handler because only
> > +      * the CPU itself can change it's own enable mask.
> > +      */
> > +     en = atomic_read(&icpu->enable);
> > +
> > +     /*
> > +      * Clear the IPIs we are about to handle. This pairs with the
> > +      * atomic_fetch_or_release() in ipi_mux_send_mask().
> > +      */
> > +     ipis = atomic_fetch_andnot(en, &icpu->bits) & en;
> > +
> > +     for_each_set_bit(hwirq, &ipis, imux->nr)
> > +             generic_handle_domain_irq(imux->domain, hwirq);
> > +
> > +     if (static_branch_unlikely(&imux_post_handle))
> > +             imux->ops->ipi_mux_post_handle(imux->parent_virq, imux->data);
>
> Do you see what I meant about the {pre,post}_handle callback, and how
> they are *exactly* like irq_{ack,eoi}?

I see. I will remove these in the next patch revision.

>
> > +}
> > +
> > +static void ipi_mux_handler(struct irq_desc *desc)
> > +{
> > +     struct irq_chip *chip = irq_desc_get_chip(desc);
> > +
> > +     chained_irq_enter(chip, desc);
> > +     ipi_mux_process();
> > +     chained_irq_exit(chip, desc);
> > +}
> > +
> > +/**
> > + * ipi_mux_create - Create virtual IPIs multiplexed on top of a single
> > + * parent IPI.
> > + * @parent_virq:     virq of the parent per-CPU IRQ
> > + * @nr_ipi:          number of virtual IPIs to create. This should
> > + *                   be <= BITS_PER_TYPE(int)
> > + * @ops:             multiplexing operations for the parent IPI
> > + * @data:            opaque data used by the multiplexing operations
>
> What is the use for data? If anything, that data should be passed via
> the mux interrupt. But the whole point of this is to make the mux
> invisible. So this whole 'data' business is a mystery to me.

This is added only to pass back driver data in ipi_mux_send().

>
> > + *
> > + * If the parent IPI > 0 then ipi_mux_process() will be automatically
> > + * called via chained handler.
> > + *
> > + * If the parent IPI <= 0 then it is responsibility of irqchip drivers
> > + * to explicitly call ipi_mux_process() for processing muxed IPIs.
>
> 0 is a much more idiomatic value for "no parent IRQ". < 0 normally
> represents an error.

I am going to remove the "parent_virq" parameter itself so this
will go away.

>
> > + *
> > + * Returns first virq of the newly created virtual IPIs upon success
> > + * or <=0 upon failure
> > + */
> > +int ipi_mux_create(unsigned int parent_virq, unsigned int nr_ipi,
>
> Tell me how I can express a negative parent_virq here?

Sure, I will fix this.

>
> > +                const struct ipi_mux_ops *ops, void *data)
> > +{
> > +     struct fwnode_handle *fwnode;
> > +     struct irq_domain *domain;
> > +     int rc;
> > +
> > +     if (imux)
> > +             return -EEXIST;
> > +
> > +     if (BITS_PER_TYPE(int) < nr_ipi || !ops || !ops->ipi_mux_send)
> > +             return -EINVAL;
> > +
> > +     if (parent_virq &&
> > +         !irqd_is_per_cpu(irq_desc_get_irq_data(irq_to_desc(parent_virq))))
> > +             return -EINVAL;
>
> See how buggy this is if I follow your definition of "no parent IRQ"?

Sure, I will fix this.

>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Thanks,
Anup

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv