linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PCI MSI issue with reinserting a driver
@ 2021-02-01 18:34 John Garry
  2021-02-01 18:50 ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2021-02-01 18:34 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner; +Cc: Zhou Wang, linux-kernel

Just a heads-up, by chance I noticed that I can't re-insert a specific 
driver on v5.11-rc6:

[   64.356023] hisi_dma 0000:7b:00.0: Adding to iommu group 31
[   64.368627] hisi_dma 0000:7b:00.0: enabling device (0000 -> 0002)
[   64.384156] hisi_dma 0000:7b:00.0: Failed to allocate MSI vectors!
[   64.397180] hisi_dma: probe of 0000:7b:00.0 failed with error -28

That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y

Bisect tells me that this is the first bad commit:
4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has 
no mapping

The relevant driver code is 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547

That driver only allocates 30 MSI, so maybe there's a problem with not 
allocating (and freeing) all 32 MSI.

I'll have a bit more of a look tomorrow.

Cheers,
john

Bisect log:

git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [1048ba83fb1c00cd24172e23e8263972f6b5d9ac] Linux 5.11-rc6
git bisect bad 1048ba83fb1c00cd24172e23e8263972f6b5d9ac
# bad: [ee249d30fadec7677364063648f5547e243bf93f] Merge branch 
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect bad ee249d30fadec7677364063648f5547e243bf93f
# good: [15b447361794271f4d03c04d82276a841fe06328] mm/lru: revise the 
comments of lru_lock
git bisect good 15b447361794271f4d03c04d82276a841fe06328
# good: [15b447361794271f4d03c04d82276a841fe06328] mm/lru: revise the 
comments of lru_lock
git bisect good 15b447361794271f4d03c04d82276a841fe06328
# good: [2aa899ebd5c3aef707460f58951cc8a1d1f466c1] MAINTAINERS: add 
mvpp2 driver entry
git bisect good 2aa899ebd5c3aef707460f58951cc8a1d1f466c1
# good: [2911ed9f47b47cb5ab87d03314b3b9fe008e607f] Merge tag 
'char-misc-5.11-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc 

git bisect good 2911ed9f47b47cb5ab87d03314b3b9fe008e607f
# bad: [a45f1d43311d3a4f6534e48a3655ba3247a59d48] Merge tag 
'regmap-v5.11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
git bisect bad a45f1d43311d3a4f6534e48a3655ba3247a59d48
# good: [749c1e1481e1d242ded9dd1bf210ddb7c0d22a4f] Merge tag 
'iio-for-5.11a' of 
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into 
staging-next
git bisect good 749c1e1481e1d242ded9dd1bf210ddb7c0d22a4f
# good: [98b32c71a455ff289442779fee02ad60a6217006] staging: rtl8723bs: 
replace HT_CAP_AMPDU_FACTOR
git bisect good 98b32c71a455ff289442779fee02ad60a6217006
# bad: [3c41e57a1e168d879e923c5583adeae47eec9f64] Merge tag 
'irqchip-5.11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into 
irq/core
git bisect bad 3c41e57a1e168d879e923c5583adeae47eec9f64
# good: [e15f2fa959f2cce8a05e8e3a596e75d068cd42c5] driver core: 
platform: Add devm_platform_get_irqs_affinity()
git bisect good e15f2fa959f2cce8a05e8e3a596e75d068cd42c5
# good: [2cb0837e56e1b04b773ed05df72297de4e010063] arm64: irqstat: Get 
rid of duplicated declaration
git bisect good 2cb0837e56e1b04b773ed05df72297de4e010063
# bad: [4615fbc3788ddc8e7c6d697714ad35a53729aa2c] genirq/irqdomain: 
Don't try to free an interrupt that has no mapping
git bisect bad 4615fbc3788ddc8e7c6d697714ad35a53729aa2c
# good: [e091bc90cd2d65f48e4688faead2911558d177d7] irqstat: Move 
declaration into asm-generic/hardirq.h
git bisect good e091bc90cd2d65f48e4688faead2911558d177d7
# good: [ae9ef58996a4447dd44aa638759f913c883ba816] softirq: Move related 
code into one section
git bisect good ae9ef58996a4447dd44aa638759f913c883ba816
# good: [15b8d9372f27c47e17c91f6f16d359314cf11404] sh/irq: Add missing 
closing parentheses in arch_show_interrupts()
git bisect good 15b8d9372f27c47e17c91f6f16d359314cf11404
# first bad commit: [4615fbc3788ddc8e7c6d697714ad35a53729aa2c] 
genirq/irqdomain: Don't try to free an interrupt that has no mapping

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-01 18:34 PCI MSI issue with reinserting a driver John Garry
@ 2021-02-01 18:50 ` Marc Zyngier
  2021-02-02  8:37   ` John Garry
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2021-02-01 18:50 UTC (permalink / raw)
  To: John Garry; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel

Hi John,

On Mon, 01 Feb 2021 18:34:59 +0000,
John Garry <john.garry@huawei.com> wrote:
> 
> Just a heads-up, by chance I noticed that I can't re-insert a specific
> driver on v5.11-rc6:
> 
> [   64.356023] hisi_dma 0000:7b:00.0: Adding to iommu group 31
> [   64.368627] hisi_dma 0000:7b:00.0: enabling device (0000 -> 0002)
> [   64.384156] hisi_dma 0000:7b:00.0: Failed to allocate MSI vectors!
> [   64.397180] hisi_dma: probe of 0000:7b:00.0 failed with error -28
> 
> That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
> 
> Bisect tells me that this is the first bad commit:
> 4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has
> no mapping
> 
> The relevant driver code is
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547
> 
> That driver only allocates 30 MSI, so maybe there's a problem with not
> allocating (and freeing) all 32 MSI.

Are they Multi-MSI (and not MSI-X)?

> I'll have a bit more of a look tomorrow.

Here's my suspicion: two of the interrupts are mapped in the low-level
domain (the ITS, I'd expect in your case), but they have never been
mapped at the higher level.

On teardown, we only get rid of the 30 that were actually mapped, and
leave the last two dangling in the ITS domain, and thus the ITS device
resources are never freed. On reload, we request another 32
interrupts, which can't be satisfied for this device.

Assuming I got it right, the question is: why weren't these interrupts
mapped in the PCI domain the first place. And if I got it wrong, I'm
even more curious!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-01 18:50 ` Marc Zyngier
@ 2021-02-02  8:37   ` John Garry
  2021-02-02 12:38     ` John Garry
  0 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2021-02-02  8:37 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel

On 01/02/2021 18:50, Marc Zyngier wrote:

Hi Marc,

>> Just a heads-up, by chance I noticed that I can't re-insert a specific
>> driver on v5.11-rc6:
>>
>> [   64.356023] hisi_dma 0000:7b:00.0: Adding to iommu group 31
>> [   64.368627] hisi_dma 0000:7b:00.0: enabling device (0000 -> 0002)
>> [   64.384156] hisi_dma 0000:7b:00.0: Failed to allocate MSI vectors!
>> [   64.397180] hisi_dma: probe of 0000:7b:00.0 failed with error -28
>>
>> That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
>>
>> Bisect tells me that this is the first bad commit:
>> 4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has
>> no mapping
>>
>> The relevant driver code is
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547
>>
>> That driver only allocates 30 MSI, so maybe there's a problem with not
>> allocating (and freeing) all 32 MSI.
> Are they Multi-MSI (and not MSI-X)?

multi-msi

> 
>> I'll have a bit more of a look tomorrow.
> Here's my suspicion: two of the interrupts are mapped in the low-level
> domain (the ITS, I'd expect in your case), but they have never been
> mapped at the higher level.
> 
> On teardown, we only get rid of the 30 that were actually mapped, and
> leave the last two dangling in the ITS domain, and thus the ITS device
> resources are never freed. On reload, we request another 32
> interrupts, which can't be satisfied for this device.
> 
> Assuming I got it right, the question is: why weren't these interrupts
> mapped in the PCI domain the first place. And if I got it wrong, I'm
> even more curious!

Not sure. I also now notice an error for the SAS PCI driver on D06 when 
nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks 
the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI.

Anyway, let me have a look today to see what is going wrong.

cheers,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-02  8:37   ` John Garry
@ 2021-02-02 12:38     ` John Garry
  2021-02-02 14:48       ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2021-02-02 12:38 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel

>> Here's my suspicion: two of the interrupts are mapped in the low-level
>> domain (the ITS, I'd expect in your case), but they have never been
>> mapped at the higher level.
>>
>> On teardown, we only get rid of the 30 that were actually mapped, and
>> leave the last two dangling in the ITS domain, and thus the ITS device
>> resources are never freed. On reload, we request another 32
>> interrupts, which can't be satisfied for this device.
>>
>> Assuming I got it right, the question is: why weren't these interrupts
>> mapped in the PCI domain the first place. And if I got it wrong, I'm
>> even more curious!
> 
> Not sure. I also now notice an error for the SAS PCI driver on D06 when 
> nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks 
> the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI.
> 
> Anyway, let me have a look today to see what is going wrong.
> 
Could this be the problem:

nr_cpus=11

In alloc path, we have:
	its_alloc_device_irq(nvecs=27 = 16+11)
	  bitmap_find_free_region(order = 5);
In free path, we have:
	its_irq_domain_free(nvecs = 1) and free each 27 vecs
	  bitmap_release_region(order = 0)

So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.

FWIW, this hack seems to fix it:

---->8-----

diff --git a/drivers/irqchip/irq-gic-v3-its.c 
b/drivers/irqchip/irq-gic-v3-its.c
index ac5412b284e6..458ea0ebea2b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c

@@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct 
irq_domain *domain, unsigned int virq,
  	struct its_device *its_dev = info->scratchpad[0].ptr;
  	struct its_node *its = its_dev->its;
  	struct irq_data *irqd;
-	irq_hw_number_t hwirq;
+	irq_hw_number_t hwirq[nr_irqs]; //vla :(
  	int err;
  	int i;

-	err = its_alloc_device_irq(its_dev, nr_irqs, &hwirq);
+	for (i = 0; i < nr_irqs; i++) {
+		err = its_alloc_device_irq(its_dev, 1, &hwirq[i]);
+		if (err) //tidy
+			return err;
+	}
+	
-	if (err)
-		return err;

  	err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
  	if (err)
  		return err;

  	for (i = 0; i < nr_irqs; i++) {
-		err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
+		err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]);
  		if (err)
  			return err;

  		irq_domain_set_hwirq_and_chip(domain, virq + i,
-					      hwirq + i, &its_irq_chip, its_dev);
+					      hwirq[i], &its_irq_chip, its_dev);
  		irqd = irq_get_irq_data(virq + i);
  		irqd_set_single_target(irqd);
  		irqd_set_affinity_on_activate(irqd);
  		pr_debug("ID:%d pID:%d vID:%d\n",
-			 (int)(hwirq + i - its_dev->event_map.lpi_base),
-			 (int)(hwirq + i), virq + i);
+			 (int)(hwirq[i] - its_dev->event_map.lpi_base),
+			 (int)(hwirq[i]), virq + i);
  	}
----8<-----


But I'm not sure that we have any requirement for those map bits to be 
consecutive.

Thanks,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-02 12:38     ` John Garry
@ 2021-02-02 14:48       ` Marc Zyngier
  2021-02-02 15:46         ` John Garry
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2021-02-02 14:48 UTC (permalink / raw)
  To: John Garry; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel

On 2021-02-02 12:38, John Garry wrote:
>>> Here's my suspicion: two of the interrupts are mapped in the 
>>> low-level
>>> domain (the ITS, I'd expect in your case), but they have never been
>>> mapped at the higher level.
>>> 
>>> On teardown, we only get rid of the 30 that were actually mapped, and
>>> leave the last two dangling in the ITS domain, and thus the ITS 
>>> device
>>> resources are never freed. On reload, we request another 32
>>> interrupts, which can't be satisfied for this device.
>>> 
>>> Assuming I got it right, the question is: why weren't these 
>>> interrupts
>>> mapped in the PCI domain the first place. And if I got it wrong, I'm
>>> even more curious!
>> 
>> Not sure. I also now notice an error for the SAS PCI driver on D06 
>> when nr_cpus < 16, which means number of MSI vectors allocated < 32, 
>> so looks the same problem. There we try to allocate 16 + max(nr cpus, 
>> 16) MSI.
>> 
>> Anyway, let me have a look today to see what is going wrong.
>> 
> Could this be the problem:
> 
> nr_cpus=11
> 
> In alloc path, we have:
> 	its_alloc_device_irq(nvecs=27 = 16+11)
> 	  bitmap_find_free_region(order = 5);
> In free path, we have:
> 	its_irq_domain_free(nvecs = 1) and free each 27 vecs
> 	  bitmap_release_region(order = 0)
> 
> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
> 
> FWIW, this hack seems to fix it:
> 
> ---->8-----
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index ac5412b284e6..458ea0ebea2b 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> 
> @@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct
> irq_domain *domain, unsigned int virq,
>  	struct its_device *its_dev = info->scratchpad[0].ptr;
>  	struct its_node *its = its_dev->its;
>  	struct irq_data *irqd;
> -	irq_hw_number_t hwirq;
> +	irq_hw_number_t hwirq[nr_irqs]; //vla :(
>  	int err;
>  	int i;
> 
> -	err = its_alloc_device_irq(its_dev, nr_irqs, &hwirq);
> +	for (i = 0; i < nr_irqs; i++) {
> +		err = its_alloc_device_irq(its_dev, 1, &hwirq[i]);
> +		if (err) //tidy
> +			return err;
> +	}
> +
> -	if (err)
> -		return err;
> 
>  	err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
>  	if (err)
>  		return err;
> 
>  	for (i = 0; i < nr_irqs; i++) {
> -		err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
> +		err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]);
>  		if (err)
>  			return err;
> 
>  		irq_domain_set_hwirq_and_chip(domain, virq + i,
> -					      hwirq + i, &its_irq_chip, its_dev);
> +					      hwirq[i], &its_irq_chip, its_dev);
>  		irqd = irq_get_irq_data(virq + i);
>  		irqd_set_single_target(irqd);
>  		irqd_set_affinity_on_activate(irqd);
>  		pr_debug("ID:%d pID:%d vID:%d\n",
> -			 (int)(hwirq + i - its_dev->event_map.lpi_base),
> -			 (int)(hwirq + i), virq + i);
> +			 (int)(hwirq[i] - its_dev->event_map.lpi_base),
> +			 (int)(hwirq[i]), virq + i);
>  	}
> ----8<-----
> 
> 
> But I'm not sure that we have any requirement for those map bits to be
> consecutive.

We can't really do that. All the events must be contiguous,
and there is also a lot of assumptions in the ITS driver that
LPI allocations is also contiguous.

But there is also the fact that for Multi-MSI, we *must*
allocate 32 vectors. Any driver could assume that if we have
allocated 17 vectors, then there is another 15 available.

My question still stand: how was this working with the previous
behaviour?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-02 14:48       ` Marc Zyngier
@ 2021-02-02 15:46         ` John Garry
  2021-02-03 17:23           ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2021-02-02 15:46 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel

On 02/02/2021 14:48, Marc Zyngier wrote:
>>>
>>> Not sure. I also now notice an error for the SAS PCI driver on D06 
>>> when nr_cpus < 16, which means number of MSI vectors allocated < 32, 
>>> so looks the same problem. There we try to allocate 16 + max(nr cpus, 
>>> 16) MSI.
>>>
>>> Anyway, let me have a look today to see what is going wrong.
>>>
>> Could this be the problem:
>>
>> nr_cpus=11
>>
>> In alloc path, we have:
>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>       bitmap_find_free_region(order = 5);
>> In free path, we have:
>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>       bitmap_release_region(order = 0)
>>
>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.

[ ... ]

>>
>>
>> But I'm not sure that we have any requirement for those map bits to be
>> consecutive.
> 
> We can't really do that. All the events must be contiguous,
> and there is also a lot of assumptions in the ITS driver that
> LPI allocations is also contiguous.
> 
> But there is also the fact that for Multi-MSI, we *must*
> allocate 32 vectors. Any driver could assume that if we have
> allocated 17 vectors, then there is another 15 available.
> 
> My question still stand: how was this working with the previous
> behaviour?

Because previously in this scenario we would allocate 32 bits and free 
32 bits in the map; but now we allocate 32 bits, yet only free 27 - so 
leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy() 
now frees per-interrupt, instead of all irqs per domain.

Before:
  In free path, we have:
      its_irq_domain_free(nvecs = 27)
        bitmap_release_region(count order = 5 == 32bits)

Current:
  In free path, we have:
      its_irq_domain_free(nvecs = 1) for free each 27 vecs
        bitmap_release_region(count order = 0 == 1bit)

Cheers,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-02 15:46         ` John Garry
@ 2021-02-03 17:23           ` Marc Zyngier
  2021-02-04 10:45             ` John Garry
                               ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Marc Zyngier @ 2021-02-03 17:23 UTC (permalink / raw)
  To: John Garry; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel

On 2021-02-02 15:46, John Garry wrote:
> On 02/02/2021 14:48, Marc Zyngier wrote:
>>>> 
>>>> Not sure. I also now notice an error for the SAS PCI driver on D06 
>>>> when nr_cpus < 16, which means number of MSI vectors allocated < 32, 
>>>> so looks the same problem. There we try to allocate 16 + max(nr 
>>>> cpus, 16) MSI.
>>>> 
>>>> Anyway, let me have a look today to see what is going wrong.
>>>> 
>>> Could this be the problem:
>>> 
>>> nr_cpus=11
>>> 
>>> In alloc path, we have:
>>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>>       bitmap_find_free_region(order = 5);
>>> In free path, we have:
>>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>>       bitmap_release_region(order = 0)
>>> 
>>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
> 
> [ ... ]
> 
>>> 
>>> 
>>> But I'm not sure that we have any requirement for those map bits to 
>>> be
>>> consecutive.
>> 
>> We can't really do that. All the events must be contiguous,
>> and there is also a lot of assumptions in the ITS driver that
>> LPI allocations is also contiguous.
>> 
>> But there is also the fact that for Multi-MSI, we *must*
>> allocate 32 vectors. Any driver could assume that if we have
>> allocated 17 vectors, then there is another 15 available.
>> 
>> My question still stand: how was this working with the previous
>> behaviour?
> 
> Because previously in this scenario we would allocate 32 bits and free
> 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
> leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
> now frees per-interrupt, instead of all irqs per domain.
> 
> Before:
>  In free path, we have:
>      its_irq_domain_free(nvecs = 27)
>        bitmap_release_region(count order = 5 == 32bits)
> 
> Current:
>  In free path, we have:
>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>        bitmap_release_region(count order = 0 == 1bit)

Right. I was focusing on the patch and blindly ignored the explanation
at the top of the email. Apologies for that.

I'm not overly keen on handling this in the ITS though, and I'd rather
we try and do it in the generic code. How about this (compile tested
only)?

Thanks,

         M.

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 6aacd342cd14..cfccad83c2df 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
irq_domain *domain,
  		return;

  	for (i = 0; i < nr_irqs; i++) {
-		if (irq_domain_get_irq_data(domain, irq_base + i))
-			domain->ops->free(domain, irq_base + i, 1);
+		int n ;
+
+		/* Find the largest possible span of IRQs to free in one go */
+		for (n = 0;
+		     ((i + n) < nr_irqs &&
+		      irq_domain_get_irq_data(domain, irq_base + i + n));
+		     n++);
+
+		if (!n)
+			continue;
+
+		domain->ops->free(domain, irq_base + i, n);
+		i += n;
  	}
  }


-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-03 17:23           ` Marc Zyngier
@ 2021-02-04 10:45             ` John Garry
  2022-08-04 10:59               ` John Garry
  2021-04-06  9:46             ` John Garry
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2021-02-04 10:45 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel

On 03/02/2021 17:23, Marc Zyngier wrote:
>>
>> Before:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 27)
>>        bitmap_release_region(count order = 5 == 32bits)
>>
>> Current:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>        bitmap_release_region(count order = 0 == 1bit)
> 
> Right. I was focusing on the patch and blindly ignored the explanation
> at the top of the email. Apologies for that.

Yeah, it was a distraction.

> 
> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
> 
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 6aacd342cd14..cfccad83c2df 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
> irq_domain *domain,
>           return;
> 
>       for (i = 0; i < nr_irqs; i++) {
> -        if (irq_domain_get_irq_data(domain, irq_base + i))
> -            domain->ops->free(domain, irq_base + i, 1);
> +        int n ;
> +
> +        /* Find the largest possible span of IRQs to free in one go */
> +        for (n = 0;
> +             ((i + n) < nr_irqs &&
> +              irq_domain_get_irq_data(domain, irq_base + i + n));
> +             n++);
> +
> +        if (!n)
> +            continue;
> +
> +        domain->ops->free(domain, irq_base + i, n);
> +        i += n;
>       }
>   }

That fixed my problem.

For my benefit, if MSIs must be allocated in power of 2, could we check 
a flag for the dealloc method? Like, if MSI domain, then do as before 
4615fbc3788d. But I'm not sure on the specific scenario which that 
commit fixed. Or even whether you want specifics like that in core code.

Thanks,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-03 17:23           ` Marc Zyngier
  2021-02-04 10:45             ` John Garry
@ 2021-04-06  9:46             ` John Garry
  2021-08-27  8:33             ` luojiaxing
  2023-08-29 23:00             ` Thomas Gleixner
  3 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2021-04-06  9:46 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Thomas Gleixner, Zhou Wang, linux-kernel, luojiaxing

On 03/02/2021 17:23, Marc Zyngier wrote:
> On 2021-02-02 15:46, John Garry wrote:
>> On 02/02/2021 14:48, Marc Zyngier wrote:
>>>>>
>>>>> Not sure. I also now notice an error for the SAS PCI driver on D06 
>>>>> when nr_cpus < 16, which means number of MSI vectors allocated < 
>>>>> 32, so looks the same problem. There we try to allocate 16 + max(nr 
>>>>> cpus, 16) MSI.
>>>>>
>>>>> Anyway, let me have a look today to see what is going wrong.
>>>>>
>>>> Could this be the problem:
>>>>
>>>> nr_cpus=11
>>>>
>>>> In alloc path, we have:
>>>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>>>       bitmap_find_free_region(order = 5);
>>>> In free path, we have:
>>>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>>>       bitmap_release_region(order = 0)
>>>>
>>>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
>>
>> [ ... ]
>>
>>>>

Hi Marc,

Just a friendly reminder on this issue. Let me know if anything required 
some our side.

Cheers,
John

>>>>
>>>> But I'm not sure that we have any requirement for those map bits to be
>>>> consecutive.
>>>
>>> We can't really do that. All the events must be contiguous,
>>> and there is also a lot of assumptions in the ITS driver that
>>> LPI allocations is also contiguous.
>>>
>>> But there is also the fact that for Multi-MSI, we *must*
>>> allocate 32 vectors. Any driver could assume that if we have
>>> allocated 17 vectors, then there is another 15 available.
>>>
>>> My question still stand: how was this working with the previous
>>> behaviour?
>>
>> Because previously in this scenario we would allocate 32 bits and free
>> 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
>> leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
>> now frees per-interrupt, instead of all irqs per domain.
>>
>> Before:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 27)
>>        bitmap_release_region(count order = 5 == 32bits)
>>
>> Current:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>        bitmap_release_region(count order = 0 == 1bit)
> 
> Right. I was focusing on the patch and blindly ignored the explanation
> at the top of the email. Apologies for that.
> 
> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
> 
> Thanks,
> 
>          M.
> 
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 6aacd342cd14..cfccad83c2df 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
> irq_domain *domain,
>           return;
> 
>       for (i = 0; i < nr_irqs; i++) {
> -        if (irq_domain_get_irq_data(domain, irq_base + i))
> -            domain->ops->free(domain, irq_base + i, 1);
> +        int n ;
> +
> +        /* Find the largest possible span of IRQs to free in one go */
> +        for (n = 0;
> +             ((i + n) < nr_irqs &&
> +              irq_domain_get_irq_data(domain, irq_base + i + n));
> +             n++);
> +
> +        if (!n)
> +            continue;
> +
> +        domain->ops->free(domain, irq_base + i, n);
> +        i += n;
>       }
>   }
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-03 17:23           ` Marc Zyngier
  2021-02-04 10:45             ` John Garry
  2021-04-06  9:46             ` John Garry
@ 2021-08-27  8:33             ` luojiaxing
  2023-08-29 23:00             ` Thomas Gleixner
  3 siblings, 0 replies; 12+ messages in thread
From: luojiaxing @ 2021-08-27  8:33 UTC (permalink / raw)
  To: Marc Zyngier, John Garry
  Cc: Thomas Gleixner, Zhou Wang, linux-kernel, qianweili


On 2021/2/4 1:23, Marc Zyngier wrote:
> On 2021-02-02 15:46, John Garry wrote:
>> On 02/02/2021 14:48, Marc Zyngier wrote:
>>>>>
>>>>> Not sure. I also now notice an error for the SAS PCI driver on D06 
>>>>> when nr_cpus < 16, which means number of MSI vectors allocated < 
>>>>> 32, so looks the same problem. There we try to allocate 16 + 
>>>>> max(nr cpus, 16) MSI.
>>>>>
>>>>> Anyway, let me have a look today to see what is going wrong.
>>>>>
>>>> Could this be the problem:
>>>>
>>>> nr_cpus=11
>>>>
>>>> In alloc path, we have:
>>>>     its_alloc_device_irq(nvecs=27 = 16+11)
>>>>       bitmap_find_free_region(order = 5);
>>>> In free path, we have:
>>>>     its_irq_domain_free(nvecs = 1) and free each 27 vecs
>>>>       bitmap_release_region(order = 0)
>>>>
>>>> So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.
>>
>> [ ... ]
>>
>>>>
>>>>
>>>> But I'm not sure that we have any requirement for those map bits to be
>>>> consecutive.
>>>
>>> We can't really do that. All the events must be contiguous,
>>> and there is also a lot of assumptions in the ITS driver that
>>> LPI allocations is also contiguous.
>>>
>>> But there is also the fact that for Multi-MSI, we *must*
>>> allocate 32 vectors. Any driver could assume that if we have
>>> allocated 17 vectors, then there is another 15 available.
>>>
>>> My question still stand: how was this working with the previous
>>> behaviour?
>>
>> Because previously in this scenario we would allocate 32 bits and free
>> 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
>> leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
>> now frees per-interrupt, instead of all irqs per domain.
>>
>> Before:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 27)
>>        bitmap_release_region(count order = 5 == 32bits)
>>
>> Current:
>>  In free path, we have:
>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>        bitmap_release_region(count order = 0 == 1bit)
>
> Right. I was focusing on the patch and blindly ignored the explanation
> at the top of the email. Apologies for that.
>
> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
>
> Thanks,
>
>         M.


Hi, Marc, Just a friendly reminder on this issue. We tested the kernel 
on 5.14-rc4 and found that this issue still existed, and the following 
bugfix code was not incorporated into the kernel.

I wonder if you have any plans to merge this bugfix patch.


Thanks

Jiaxing


>
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 6aacd342cd14..cfccad83c2df 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -1399,8 +1399,19 @@ static void 
> irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>          return;
>
>      for (i = 0; i < nr_irqs; i++) {
> -        if (irq_domain_get_irq_data(domain, irq_base + i))
> -            domain->ops->free(domain, irq_base + i, 1);
> +        int n ;
> +
> +        /* Find the largest possible span of IRQs to free in one go */
> +        for (n = 0;
> +             ((i + n) < nr_irqs &&
> +              irq_domain_get_irq_data(domain, irq_base + i + n));
> +             n++);
> +
> +        if (!n)
> +            continue;
> +
> +        domain->ops->free(domain, irq_base + i, n);
> +        i += n;
>      }
>  }
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-04 10:45             ` John Garry
@ 2022-08-04 10:59               ` John Garry
  0 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2022-08-04 10:59 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Zhou Wang, linux-kernel, shenyang39, zhanjie (F)

On 04/02/2021 10:45, John Garry wrote:

Hi Marc,

Just a friendly reminder that we still have the issue with reinseting a 
PCI driver which does not allocate a power-of-2 MSIs.

> On 03/02/2021 17:23, Marc Zyngier wrote:
>>>
>>> Before:
>>>  In free path, we have:
>>>      its_irq_domain_free(nvecs = 27)
>>>        bitmap_release_region(count order = 5 == 32bits)
>>>
>>> Current:
>>>  In free path, we have:
>>>      its_irq_domain_free(nvecs = 1) for free each 27 vecs
>>>        bitmap_release_region(count order = 0 == 1bit)
>>
>> Right. I was focusing on the patch and blindly ignored the explanation
>> at the top of the email. Apologies for that.
> 
> Yeah, it was a distraction.
> 
>>
>> I'm not overly keen on handling this in the ITS though, and I'd rather
>> we try and do it in the generic code. How about this (compile tested
>> only)?
>>
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 6aacd342cd14..cfccad83c2df 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -1399,8 +1399,19 @@ static void 
>> irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>>           return;
>>
>>       for (i = 0; i < nr_irqs; i++) {
>> -        if (irq_domain_get_irq_data(domain, irq_base + i))
>> -            domain->ops->free(domain, irq_base + i, 1);
>> +        int n ;
>> +
>> +        /* Find the largest possible span of IRQs to free in one go */
>> +        for (n = 0;
>> +             ((i + n) < nr_irqs &&
>> +              irq_domain_get_irq_data(domain, irq_base + i + n));
>> +             n++);
>> +
>> +        if (!n)
>> +            continue;
>> +
>> +        domain->ops->free(domain, irq_base + i, n);
>> +        i += n;
>>       }
>>   }
> 
> That fixed my problem.
> 
> For my benefit, if MSIs must be allocated in power of 2, could we check 
> a flag for the dealloc method? Like, if MSI domain, then do as before 
> 4615fbc3788d. But I'm not sure on the specific scenario which that 
> commit fixed. Or even whether you want specifics like that in core code.

Thanks,
John

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI MSI issue with reinserting a driver
  2021-02-03 17:23           ` Marc Zyngier
                               ` (2 preceding siblings ...)
  2021-08-27  8:33             ` luojiaxing
@ 2023-08-29 23:00             ` Thomas Gleixner
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2023-08-29 23:00 UTC (permalink / raw)
  To: Marc Zyngier, John Garry; +Cc: Zhou Wang, linux-kernel

On Wed, Feb 03 2021 at 17:23, Marc Zyngier wrote:
> On 2021-02-02 15:46, John Garry wrote:
>> On 02/02/2021 14:48, Marc Zyngier wrote:
>>> We can't really do that. All the events must be contiguous,
>>> and there is also a lot of assumptions in the ITS driver that
>>> LPI allocations is also contiguous.
>>> 
>>> But there is also the fact that for Multi-MSI, we *must*
>>> allocate 32 vectors. Any driver could assume that if we have
>>> allocated 17 vectors, then there is another 15 available.

In theory. In practice no driver can assume that simply because there is
no corresponding interrupt descriptor. The PCI/MSI code allocates $N
vectors, the MSI code allocates exactly $N interrupt descriptors, so
there is no way that the driver can make up N = round_up_power_of_to($N)
magically.

The only reason why this makes sense is when its_dev and the associated
bitmap is shared between devices because that way you ensure that above
the non-allocated interrupts there is a gap up to the next power of two
to catch malfunctioning hardware/drivers.

If its_dev is not shared, then this makes no difference as the gap is
there naturaly.

> I'm not overly keen on handling this in the ITS though, and I'd rather
> we try and do it in the generic code. How about this (compile tested
> only)?
...
> @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
> irq_domain *domain,
>   		return;
>
>   	for (i = 0; i < nr_irqs; i++) {
> -		if (irq_domain_get_irq_data(domain, irq_base + i))
> -			domain->ops->free(domain, irq_base + i, 1);
> +		int n ;
> +
> +		/* Find the largest possible span of IRQs to free in one go */
> +		for (n = 0;
> +		     ((i + n) < nr_irqs &&
> +		      irq_domain_get_irq_data(domain, irq_base + i + n));
> +		     n++);
> +
> +		if (!n)
> +			continue;
> +
> +		domain->ops->free(domain, irq_base + i, n);
> +		i += n;

TBH. That makes my eyes bleed and it's just an ugly workaround, which
works by chance for that ITS implementation detail. That's really not
the place to do that.

I've stared at this before when I was doing the initial PoC to convert
GIC over to the per device MSI domain model and did not come up with a
solution to implement support for dynamic PCI-MSI allocation.

I know that you need a fix for the current code, but we really should
move all this muck over to the per device model which makes this way
simpler as you really have per device mechanisms.

The underlying problem is the whole bulk allocation/free assumption in
the ITS driver, which you need to address anyway when you ever want to
support dynamic PCI/MSI-X and PCI/IMS.

For that this really needs to be properly split up:

   1) Initialization: Reserve a LPI bitmap region for a sufficiently
      large number of MSI interrupts which handles the power of 2
      requirement

   2) Alloc: Allocate the individual interrupts within the bitmap
      region

   3) Free: Free the individual interrupts and just clear the
      corresponding bit within the bitmap region

   4) Teardown: Release the bitmap region

But lets look at that later.

For the problem at hand I rather see something like the below instead of
this hideous 'try to find a range' hackery which is incomprehensible
without a comment the size of a planet.

That's not pretty either, but at least it makes it entirely clear that
the underlying irqdomain requires this for whatever insane reason.

Thanks,

        tglx
---
 include/linux/irqdomain.h |    8 ++++++++
 kernel/irq/irqdomain.c    |    9 +++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -208,6 +208,9 @@ enum {
 	/* Irq domain is a MSI device domain */
 	IRQ_DOMAIN_FLAG_MSI_DEVICE	= (1 << 9),
 
+	/* Irq domain requires bulk freeing of interrupts */
+	IRQ_DOMAIN_FREE_BULK		= (1 << 10),
+
 	/*
 	 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 	 * for implementation specific purposes and ignored by the
@@ -572,6 +575,11 @@ static inline bool irq_domain_is_msi_dev
 	return domain->flags & IRQ_DOMAIN_FLAG_MSI_DEVICE;
 }
 
+static inline bool irq_domain_free_bulk(struct irq_domain *domain)
+{
+	return domain->flags & IRQ_DOMAIN_FREE_BULK;
+}
+
 #else	/* CONFIG_IRQ_DOMAIN_HIERARCHY */
 static inline int irq_domain_alloc_irqs(struct irq_domain *domain,
 			unsigned int nr_irqs, int node, void *arg)
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1442,14 +1442,19 @@ static void irq_domain_free_irqs_hierarc
 					   unsigned int irq_base,
 					   unsigned int nr_irqs)
 {
-	unsigned int i;
+	unsigned int i, tofree = 1;
 
 	if (!domain->ops->free)
 		return;
 
+	if (irq_domain_free_bulk(domain)) {
+		tofree = nr_irqs;
+		nr_irqs = 1;
+	}
+
 	for (i = 0; i < nr_irqs; i++) {
 		if (irq_domain_get_irq_data(domain, irq_base + i))
-			domain->ops->free(domain, irq_base + i, 1);
+			domain->ops->free(domain, irq_base + i, tofree);
 	}
 }
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-08-29 23:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-01 18:34 PCI MSI issue with reinserting a driver John Garry
2021-02-01 18:50 ` Marc Zyngier
2021-02-02  8:37   ` John Garry
2021-02-02 12:38     ` John Garry
2021-02-02 14:48       ` Marc Zyngier
2021-02-02 15:46         ` John Garry
2021-02-03 17:23           ` Marc Zyngier
2021-02-04 10:45             ` John Garry
2022-08-04 10:59               ` John Garry
2021-04-06  9:46             ` John Garry
2021-08-27  8:33             ` luojiaxing
2023-08-29 23:00             ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).