linux-riscv.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: "Björn Töpel" <bjorn@kernel.org>
To: Anup Patel <apatel@ventanamicro.com>
Cc: devicetree@vger.kernel.org, Conor Dooley <conor+dt@kernel.org>,
	Saravana Kannan <saravanak@google.com>,
	Marc Zyngier <maz@kernel.org>, Anup Patel <anup@brainfault.org>,
	Atish Patra <atishp@atishpatra.org>,
	linux-kernel@vger.kernel.org, Rob Herring <robh+dt@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Frank Rowand <frowand.list@gmail.com>,
	linux-riscv@lists.infradead.org,
	Andrew Jones <ajones@ventanamicro.com>
Subject: Re: [PATCH v10 00/15] Linux RISC-V AIA Support
Date: Fri, 20 Oct 2023 16:40:31 +0200	[thread overview]
Message-ID: <87o7gtpdb4.fsf@all.your.base.are.belong.to.us> (raw)
In-Reply-To: <CAK9=C2VE9-L49tMKHjSTGDSpOFZGZw14LtD1V4GMXGiVQ-A=ng@mail.gmail.com>

Anup Patel <apatel@ventanamicro.com> writes:

> On Fri, Oct 20, 2023 at 2:17 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Thanks for the quick reply!
>>
>> Anup Patel <apatel@ventanamicro.com> writes:
>>
>> > On Thu, Oct 19, 2023 at 7:13 PM Björn Töpel <bjorn@kernel.org> wrote:
>> >>
>> >> Hi Anup,
>> >>
>> >> Anup Patel <apatel@ventanamicro.com> writes:
>> >>
>> >> > The RISC-V AIA specification is ratified as-per the RISC-V international
>> >> > process. The latest ratified AIA specifcation can be found at:
>> >> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >> >
>> >> > At a high-level, the AIA specification adds three things:
>> >> > 1) AIA CSRs
>> >> >    - Improved local interrupt support
>> >> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> >> >    - Per-HART MSI controller
>> >> >    - Support MSI virtualization
>> >> >    - Support IPI along with virtualization
>> >> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> >> >    - Wired interrupt controller
>> >> >    - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> >> >    - In Direct-mode, injects external interrupts directly into HARTs
>> >>
>> >> Thanks for working on the AIA support! I had a look at the series, and
>> >> have some concerns about interrupt ID abstraction.
>> >>
>> >> A bit of background, for readers not familiar with the AIA details.
>> >>
>> >> IMSIC allows for 2047 unique MSI ("msi-irq") sources per hart, and
>> >> each MSI is dedicated to a certain hart. The series takes the approach
>> >> to say that there are, e.g., 2047 interrupts ("lnx-irq") globally.
>> >> Each lnx-irq consists of #harts * msi-irq -- a slice -- and in the
>> >> slice only *one* msi-irq is acutally used.
>> >>
>> >> This scheme makes affinity changes more robust, because the interrupt
>> >> sources on "other" harts are pre-allocated. On the other hand it
>> >> requires to propagate irq masking to other harts via IPIs (this is
>> >> mostly done up setup/tear down). It's also wasteful, because msi-irqs
>> >> are hogged, and cannot be used.
>> >>
>> >> Contemporary storage/networking drivers usually uses queues per core
>> >> (or a sub-set of cores). The current scheme wastes a lot of msi-irqs.
>> >> If we instead used a scheme where "msi-irq == lnx-irq", instead of
>> >> "lnq-irq = {hart 0;msi-irq x , ... hart N;msi-irq x}", there would be
>> >> a lot MSIs for other users. 1-1 vs 1-N. E.g., if a storage device
>> >> would like to use 5 queues (5 cores) on a 128 core system, the current
>> >> scheme would consume 5 * 128 MSIs, instead of just 5.
>> >>
>> >> On the plus side:
>> >> * Changing interrupts affinity will never fail, because the interrupts
>> >>   on each hart is pre-allocated.
>> >>
>> >> On the negative side:
>> >> * Wasteful interrupt usage, and a system can potientially "run out" of
>> >>   interrupts. Especially for many core systems.
>> >> * Interrupt masking need to proagate to harts via IPIs (there's no
>> >>   broadcast csr in IMSIC), and a more complex locking scheme IMSIC
>> >>
>> >> Summary:
>> >> The current series caps the number of global interrupts to maximum
>> >> 2047 MSIs for all cores (whole system). A better scheme, IMO, would be
>> >> to expose 2047 * #harts unique MSIs.
>> >>
>> >> I think this could simplify/remove(?) the locking as well.
>> >
>> > Exposing 2047 * #harts unique MSIs has multiple issues:
>> > 1) The irq_set_affinity() does not work for MSIs because each
>> >      IRQ is not tied to a particular HART. This means we can't
>> >      balance the IRQ processing load among HARTs.
>>
>> Yes, you can balance. In your code, each *active* MSI is still
>> bound/active to a specific hard together with the affinity mask. In an
>> 1-1 model you would still need to track the affinity mask, but the
>> irq_set_affinity() would be different. It would try to allocate a new
>> MSI from the target CPU, and then switch to having that MSI active.
>>
>> That's what x86 does AFAIU, which is also constrained by the # of
>> available MSIs.
>>
>> The downside, as I pointed out, is that the set affinity action can
>> fail for a certain target CPU.
>
> Yes, irq_set_affinity() can fail for the suggested approach plus for
> RISC-V AIA, one HART does not have access to other HARTs
> MSI enable/disable bits so the approach will also involve IPI.

Correct, but the current series does a broadcast to all cores, where the
1-1 approach is at most an IPI to a single core.

128+c machines are getting more common, and you have devices that you
bring up/down on a per-core basis. Broadcasting IPIs to all cores, when
dealing with a per-core activity is a pretty noisy neighbor.

This could be fixed in the existing 1-n approach, by not require to sync
the cores that are not handling the MSI in question. "Lazy disable"

>> > 2) All wired IRQs for APLIC MSI-mode will also target a
>> >     fixed HART hence irq_set_affinity() won't work for wired
>> >     IRQs as well.
>>
>> I'm not following here. Why would APLIC put a constraint here? I had a
>> look at the specs, and I didn't see anything supporting the current
>> scheme explicitly.
>
> Lets say the number of APLIC wired interrupts  are greater than the
> number of per-CPU IMSIC IDs. In this case, if all wired interrupts are
> moved to a particular CPU then irq_set_affinity() will fail for some of
> the wired interrupts.

Right, it's the case of "full remote CPU" again. Thanks for clearing
that up.

>> > The idea of treating per-HART MSIs as separate IRQs has
>> > been discussed in the past.
>>
>> Aha! I tried to look for it in lore, but didn't find any. Could you
>> point me to those discussions?
>
> This was done 2 years back in the AIA TG meeting when we were
> doing the PoC for AIA spec.

Ah, too bad. Thanks regardless.

>> My concern is interrupts become a scarce resource with this
>> implementation, but maybe my view is incorrect. I've seen bare-metal
>> x86 systems (no VMs) with ~200 cores, and ~2000 interrupts, but maybe
>> that is considered "a lot of interrupts".
>>
>> As long as we don't get into scenarios where we're running out of
>> interrupts, due to the software design.
>>
>
> The current approach is simpler and ensures irq_set_affinity
> always works. The limit of max 2047 IDs is sufficient for many
> systems (if not all).

Let me give you another view. On a 128c system each core has ~16 unique
interrupts for disposal. E.g. the Intel E800 NIC has more than 2048
network queue pairs for each PF.

> When we encounter a system requiring a large number of MSIs,
> we can either:
> 1) Extend the AIA spec to support greater than 2047 IDs
> 2) Re-think the approach in the IMSIC driver
>
> The choice between #1 and #2 above depends on the
> guarantees we want for irq_set_affinity().

The irq_set_affinity() behavior is better with this series, but I think
the other downsides: number of available interrupt sources, and IPI
broadcast are worse.


Björn

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2023-10-20 14:40 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-03  4:43 [PATCH v10 00/15] Linux RISC-V AIA Support Anup Patel
2023-10-03  4:43 ` [PATCH v10 01/15] RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs Anup Patel
2023-10-03  4:43 ` [PATCH v10 02/15] of: property: Add fw_devlink support for msi-parent Anup Patel
2023-10-03  4:43 ` [PATCH v10 03/15] drivers: irqchip/riscv-intc: Mark all INTC nodes as initialized Anup Patel
2023-10-03  4:43 ` [PATCH v10 04/15] irqchip/sifive-plic: Fix syscore registration for multi-socket systems Anup Patel
2023-10-03  4:43 ` [PATCH v10 05/15] irqchip/sifive-plic: Convert PLIC driver into a platform driver Anup Patel
2023-10-03  4:43 ` [PATCH v10 06/15] irqchip/riscv-intc: Add support for RISC-V AIA Anup Patel
2023-10-03  4:43 ` [PATCH v10 07/15] dt-bindings: interrupt-controller: Add RISC-V incoming MSI controller Anup Patel
2023-10-12 16:35   ` Conor Dooley
2023-10-13  6:46     ` Anup Patel
2023-10-13  7:41       ` Conor Dooley
2023-10-03  4:43 ` [PATCH v10 08/15] irqchip: Add RISC-V incoming MSI controller early driver Anup Patel
2023-10-03  4:43 ` [PATCH v10 09/15] irqchip/riscv-imsic: Add support for platform MSI irqdomain Anup Patel
2023-10-03  4:43 ` [PATCH v10 10/15] irqchip/riscv-imsic: Add support for PCI " Anup Patel
2023-10-03  4:43 ` [PATCH v10 11/15] dt-bindings: interrupt-controller: Add RISC-V advanced PLIC Anup Patel
2023-10-03  4:44 ` [PATCH v10 12/15] irqchip: Add RISC-V advanced PLIC driver for direct-mode Anup Patel
2023-10-03  4:44 ` [PATCH v10 13/15] irqchip/riscv-aplic: Add support for MSI-mode Anup Patel
2023-10-03  4:44 ` [PATCH v10 14/15] RISC-V: Select APLIC and IMSIC drivers Anup Patel
2023-10-03  4:44 ` [PATCH v10 15/15] MAINTAINERS: Add entry for RISC-V AIA drivers Anup Patel
2023-10-19 13:43 ` [PATCH v10 00/15] Linux RISC-V AIA Support Björn Töpel
2023-10-19 16:11   ` Anup Patel
2023-10-20  8:47     ` Björn Töpel
2023-10-20 11:00       ` Anup Patel
2023-10-20 14:40         ` Björn Töpel [this message]
2023-10-20 15:34           ` Anup Patel
2023-10-20 16:36             ` Björn Töpel
2023-10-20 17:13               ` Anup Patel
2023-10-20 19:45                 ` Björn Töpel
2023-10-23  7:02                   ` Björn Töpel
2023-10-23  8:34                     ` Anup Patel
2023-10-23 14:07                       ` Björn Töpel
2023-10-23 14:41                         ` Anup Patel
2023-10-23 15:45                           ` Björn Töpel
2023-10-23 17:25                             ` Anup Patel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o7gtpdb4.fsf@all.your.base.are.belong.to.us \
    --to=bjorn@kernel.org \
    --cc=ajones@ventanamicro.com \
    --cc=anup@brainfault.org \
    --cc=apatel@ventanamicro.com \
    --cc=atishp@atishpatra.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=frowand.list@gmail.com \
    --cc=krzysztof.kozlowski+dt@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=robh+dt@kernel.org \
    --cc=saravanak@google.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).