linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Marc Zyngier <maz@kernel.org>, John Garry <john.garry@huawei.com>
Cc: Zhou Wang <wangzhou1@hisilicon.com>, linux-kernel@vger.kernel.org
Subject: Re: PCI MSI issue with reinserting a driver
Date: Wed, 30 Aug 2023 01:00:38 +0200	[thread overview]
Message-ID: <87350178tl.ffs@tglx> (raw)
In-Reply-To: <3d3d0155e66429968cb4f6b4feeae4b3@kernel.org>

On Wed, Feb 03 2021 at 17:23, Marc Zyngier wrote:
> On 2021-02-02 15:46, John Garry wrote:
>> On 02/02/2021 14:48, Marc Zyngier wrote:
>>> We can't really do that. All the events must be contiguous,
>>> and there is also a lot of assumptions in the ITS driver that
>>> LPI allocations is also contiguous.
>>> 
>>> But there is also the fact that for Multi-MSI, we *must*
>>> allocate 32 vectors. Any driver could assume that if we have
>>> allocated 17 vectors, then there is another 15 available.

In theory. In practice no driver can assume that simply because there is
no corresponding interrupt descriptor. The PCI/MSI code allocates $N
vectors, the MSI code allocates exactly $N interrupt descriptors, so
there is no way that the driver can make up N = round_up_power_of_to($N)
magically.

The only reason why this makes sense is when its_dev and the associated
bitmap is shared between devices because that way you ensure that above
the non-allocated interrupts there is a gap up to the next power of two
to catch malfunctioning hardware/drivers.

If its_dev is not shared, then this makes no difference as the gap is
there naturaly.

> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
...
> @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
> irq_domain *domain,
>   		return;
>
>   	for (i = 0; i < nr_irqs; i++) {
> -		if (irq_domain_get_irq_data(domain, irq_base + i))
> -			domain->ops->free(domain, irq_base + i, 1);
> +		int n ;
> +
> +		/* Find the largest possible span of IRQs to free in one go */
> +		for (n = 0;
> +		     ((i + n) < nr_irqs &&
> +		      irq_domain_get_irq_data(domain, irq_base + i + n));
> +		     n++);
> +
> +		if (!n)
> +			continue;
> +
> +		domain->ops->free(domain, irq_base + i, n);
> +		i += n;

TBH. That makes my eyes bleed and it's just an ugly workaround, which
works by chance for that ITS implementation detail. That's really not
the place to do that.

I've stared at this before when I was doing the initial PoC to convert
GIC over to the per device MSI domain model and did not come up with a
solution to implement support for dynamic PCI-MSI allocation.

I know that you need a fix for the current code, but we really should
move all this muck over to the per device model which makes this way
simpler as you really have per device mechanisms.

The underlying problem is the whole bulk allocation/free assumption in
the ITS driver, which you need to address anyway when you ever want to
support dynamic PCI/MSI-X and PCI/IMS.

For that this really needs to be properly split up:

   1) Initialization: Reserve a LPI bitmap region for a sufficiently
      large number of MSI interrupts which handles the power of 2
      requirement

   2) Alloc: Allocate the individual interrupts within the bitmap
      region

   3) Free: Free the individual interrupts and just clear the
      corresponding bit within the bitmap region

   4) Teardown: Release the bitmap region

But lets look at that later.

For the problem at hand I rather see something like the below instead of
this hideous 'try to find a range' hackery which is incomprehensible
without a comment the size of a planet.

That's not pretty either, but at least it makes it entirely clear that
the underlying irqdomain requires this for whatever insane reason.

Thanks,

        tglx
---
 include/linux/irqdomain.h |    8 ++++++++
 kernel/irq/irqdomain.c    |    9 +++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -208,6 +208,9 @@ enum {
 	/* Irq domain is a MSI device domain */
 	IRQ_DOMAIN_FLAG_MSI_DEVICE	= (1 << 9),
 
+	/* Irq domain requires bulk freeing of interrupts */
+	IRQ_DOMAIN_FREE_BULK		= (1 << 10),
+
 	/*
 	 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 	 * for implementation specific purposes and ignored by the
@@ -572,6 +575,11 @@ static inline bool irq_domain_is_msi_dev
 	return domain->flags & IRQ_DOMAIN_FLAG_MSI_DEVICE;
 }
 
+static inline bool irq_domain_free_bulk(struct irq_domain *domain)
+{
+	return domain->flags & IRQ_DOMAIN_FREE_BULK;
+}
+
 #else	/* CONFIG_IRQ_DOMAIN_HIERARCHY */
 static inline int irq_domain_alloc_irqs(struct irq_domain *domain,
 			unsigned int nr_irqs, int node, void *arg)
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1442,14 +1442,19 @@ static void irq_domain_free_irqs_hierarc
 					   unsigned int irq_base,
 					   unsigned int nr_irqs)
 {
-	unsigned int i;
+	unsigned int i, tofree = 1;
 
 	if (!domain->ops->free)
 		return;
 
+	if (irq_domain_free_bulk(domain)) {
+		tofree = nr_irqs;
+		nr_irqs = 1;
+	}
+
 	for (i = 0; i < nr_irqs; i++) {
 		if (irq_domain_get_irq_data(domain, irq_base + i))
-			domain->ops->free(domain, irq_base + i, 1);
+			domain->ops->free(domain, irq_base + i, tofree);
 	}
 }
 

      parent reply	other threads:[~2023-08-29 23:01 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-01 18:34 PCI MSI issue with reinserting a driver John Garry
2021-02-01 18:50 ` Marc Zyngier
2021-02-02  8:37   ` John Garry
2021-02-02 12:38     ` John Garry
2021-02-02 14:48       ` Marc Zyngier
2021-02-02 15:46         ` John Garry
2021-02-03 17:23           ` Marc Zyngier
2021-02-04 10:45             ` John Garry
2022-08-04 10:59               ` John Garry
2021-04-06  9:46             ` John Garry
2021-08-27  8:33             ` luojiaxing
2023-08-29 23:00             ` Thomas Gleixner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87350178tl.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=john.garry@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=wangzhou1@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).