linux-pci.vger.kernel.org archive mirror
* pcie-iproc-msi.c: Bug in Multi-MSI support?
@ 2021-05-20 12:00 Pali Rohár
  2021-05-20 13:47 ` Sandor Bodo-Merle
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2021-05-20 12:00 UTC (permalink / raw)
  To: linux-pci, bcm-kernel-feedback-list

Hello!

I think there is a bug in the pcie-iproc-msi.c driver. It declares
Multi-MSI support via the MSI_FLAG_MULTI_PCI_MSI flag, see:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174

but its iproc_msi_irq_domain_alloc() function completely ignores the
nr_irqs argument when allocating interrupt numbers from the bitmap, see:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246

I think this is incorrect, as the alloc callback should allocate the
nr_irqs interrupts that the caller requested. All other drivers with
Multi-MSI support do this.

Could you look at it?
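
For reference, a minimal sketch of what a Multi-MSI-aware ->alloc
callback typically looks like in other host controller drivers (not the
iProc code; the example_* names and structure fields are hypothetical,
while bitmap_find_free_region(), order_base_2() and irq_domain_set_info()
are existing kernel helpers):

/*
 * Minimal sketch of an ->alloc callback that honours nr_irqs by reserving
 * a naturally aligned, power-of-2 sized block, as Multi-MSI requires.
 */
static int example_msi_irq_domain_alloc(struct irq_domain *domain,
                                        unsigned int virq,
                                        unsigned int nr_irqs, void *args)
{
        struct example_msi *msi = domain->host_data;    /* hypothetical */
        int hwirq, i;

        mutex_lock(&msi->bitmap_lock);

        /* Reserve a size-aligned block of nr_irqs (rounded to 2^n) vectors */
        hwirq = bitmap_find_free_region(msi->bitmap, msi->nr_msi_vecs,
                                        order_base_2(nr_irqs));

        mutex_unlock(&msi->bitmap_lock);

        if (hwirq < 0)
                return -ENOSPC;

        for (i = 0; i < nr_irqs; i++)
                irq_domain_set_info(domain, virq + i, hwirq + i,
                                    &msi->irq_chip, msi,
                                    handle_simple_irq, NULL, NULL);

        return 0;
}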

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-20 12:00 pcie-iproc-msi.c: Bug in Multi-MSI support? Pali Rohár
@ 2021-05-20 13:47 ` Sandor Bodo-Merle
  2021-05-20 14:05   ` Pali Rohár
  0 siblings, 1 reply; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-05-20 13:47 UTC (permalink / raw)
  To: Pali Rohár; +Cc: linux-pci, bcm-kernel-feedback-list

Hi Pali,

thanks for catching this - I dug up the follow-up fixup commit we have
for the iProc multi-MSI support (it was sent to Broadcom - but
unfortunately we missed upstreaming it).

Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
failed to reserve the proper number of bits from the inner domain.
We need to allocate the proper number of bits, otherwise the domains for
multiple PCIe endpoints may overlap and freeing one of them will result
in freeing unrelated MSI vectors.

Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
---
 drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
index 708fdb1065f8..a00492dccb74 100644
--- drivers/pci/host/pcie-iproc-msi.c
+++ drivers/pci/host/pcie-iproc-msi.c
@@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,

        mutex_lock(&msi->bitmap_lock);

-       /* Allocate 'nr_cpus' number of MSI vectors each time */
+       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI vectors each time */
        hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
-                                          msi->nr_cpus, 0);
+                                          msi->nr_cpus * nr_irqs, 0);
        if (hwirq < msi->nr_msi_vecs) {
-               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
+               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
        } else {
                mutex_unlock(&msi->bitmap_lock);
                return -ENOSPC;
@@ -292,7 +292,7 @@ static void iproc_msi_irq_domain_free(struct irq_domain *domain,
        mutex_lock(&msi->bitmap_lock);

        hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
-       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
+       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);

        mutex_unlock(&msi->bitmap_lock);


On Thu, May 20, 2021 at 2:04 PM Pali Rohár <pali@kernel.org> wrote:
>
> Hello!
>
> I think there is a bug in pcie-iproc-msi.c driver. It declares
> Multi MSI support via MSI_FLAG_MULTI_PCI_MSI flag, see:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174
>
> but its iproc_msi_irq_domain_alloc() function completely ignores nr_irqs
> argument when allocating interrupt numbers from bitmap, see:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246
>
> I think this this is incorrect as alloc callback should allocate nr_irqs
> multi interrupts as caller requested. All other drivers with Multi MSI
> support are doing it.
>
> Could you look at it?

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-20 13:47 ` Sandor Bodo-Merle
@ 2021-05-20 14:05   ` Pali Rohár
  2021-05-20 14:22     ` Sandor Bodo-Merle
  0 siblings, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2021-05-20 14:05 UTC (permalink / raw)
  To: Sandor Bodo-Merle; +Cc: linux-pci, bcm-kernel-feedback-list

Hello!

On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
> Hi Pali,
> 
> thanks for catching this - i dig up the followup fixup commit we have
> for the iproc multi MSI (it was sent to Broadcom - but unfortunately
> we missed upstreaming it).
> 
> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> failed to reserve the proper number of bits from the inner domain.
> We need to allocate the proper amount of bits otherwise the domains for
> multiple PCIe endpoints may overlap and freeing one of them will result
> in freeing unrelated MSI vectors.
> 
> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> ---
>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
> index 708fdb1065f8..a00492dccb74 100644
> --- drivers/pci/host/pcie-iproc-msi.c
> +++ drivers/pci/host/pcie-iproc-msi.c
> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
> irq_domain *domain,
> 
>         mutex_lock(&msi->bitmap_lock);
> 
> -       /* Allocate 'nr_cpus' number of MSI vectors each time */
> +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
> vectors each time */
>         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> -                                          msi->nr_cpus, 0);
> +                                          msi->nr_cpus * nr_irqs, 0);

I'm not sure if this construction is correct. Multi-MSI interrupts need
to be aligned to the number of requested interrupts. So if a wifi driver
asks for 32 Multi-MSI interrupts, then the first allocated interrupt
number must be divisible by 32.

>         if (hwirq < msi->nr_msi_vecs) {
> -               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
> +               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);

And another issue is that only a power-of-2 number of Multi-MSI
interrupts can be allocated. Otherwise one number may be allocated to
more than one device.

But I'm not sure how the number of CPUs affects this, as other PCIe
controller drivers do not use the number of CPUs.

Other drivers use the bitmap_find_free_region() function with
order_base_2(nr_irqs) as the argument.

I hope that somebody more skilled with MSI interrupts will look at these
constructions and decide whether they are correct or need more rework.
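
As a small stand-alone C illustration of the alignment rule above (a
sketch, not kernel code): an allocation of nr_irqs Multi-MSI vectors must
start at a base number aligned to nr_irqs rounded up to a power of two.

#include <stdbool.h>
#include <stdio.h>

static unsigned int roundup_pow2(unsigned int n)
{
        unsigned int p = 1;

        while (p < n)
                p <<= 1;
        return p;
}

/* A base vector is valid for nr_irqs MSIs only if it is size-aligned */
static bool valid_multi_msi_base(unsigned int base, unsigned int nr_irqs)
{
        unsigned int size = roundup_pow2(nr_irqs);

        return (base % size) == 0;
}

int main(void)
{
        /* for 32 requested vectors: base 0, 32, 64, ... are valid; 8 is not */
        printf("%d %d\n", valid_multi_msi_base(32, 32),
               valid_multi_msi_base(8, 32));
        return 0;
}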

>         } else {
>                 mutex_unlock(&msi->bitmap_lock);
>                 return -ENOSPC;
> @@ -292,7 +292,7 @@ static void iproc_msi_irq_domain_free(struct
> irq_domain *domain,
>         mutex_lock(&msi->bitmap_lock);
> 
>         hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
> -       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
> +       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
> 
>         mutex_unlock(&msi->bitmap_lock);
> 
> 
> On Thu, May 20, 2021 at 2:04 PM Pali Rohár <pali@kernel.org> wrote:
> >
> > Hello!
> >
> > I think there is a bug in pcie-iproc-msi.c driver. It declares
> > Multi MSI support via MSI_FLAG_MULTI_PCI_MSI flag, see:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174
> >
> > but its iproc_msi_irq_domain_alloc() function completely ignores nr_irqs
> > argument when allocating interrupt numbers from bitmap, see:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246
> >
> > I think this this is incorrect as alloc callback should allocate nr_irqs
> > multi interrupts as caller requested. All other drivers with Multi MSI
> > support are doing it.
> >
> > Could you look at it?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-20 14:05   ` Pali Rohár
@ 2021-05-20 14:22     ` Sandor Bodo-Merle
  2021-05-20 17:11       ` Ray Jui
  0 siblings, 1 reply; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-05-20 14:22 UTC (permalink / raw)
  To: Pali Rohár; +Cc: linux-pci, bcm-kernel-feedback-list

On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@kernel.org> wrote:
>
> Hello!
>
> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
> > Hi Pali,
> >
> > thanks for catching this - i dig up the followup fixup commit we have
> > for the iproc multi MSI (it was sent to Broadcom - but unfortunately
> > we missed upstreaming it).
> >
> > Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> > failed to reserve the proper number of bits from the inner domain.
> > We need to allocate the proper amount of bits otherwise the domains for
> > multiple PCIe endpoints may overlap and freeing one of them will result
> > in freeing unrelated MSI vectors.
> >
> > Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> > ---
> >  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
> > index 708fdb1065f8..a00492dccb74 100644
> > --- drivers/pci/host/pcie-iproc-msi.c
> > +++ drivers/pci/host/pcie-iproc-msi.c
> > @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
> > irq_domain *domain,
> >
> >         mutex_lock(&msi->bitmap_lock);
> >
> > -       /* Allocate 'nr_cpus' number of MSI vectors each time */
> > +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
> > vectors each time */
> >         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> > -                                          msi->nr_cpus, 0);
> > +                                          msi->nr_cpus * nr_irqs, 0);
>
> I'm not sure if this construction is correct. Multi-MSI interrupts needs
> to be aligned to number of requested interrupts. So if wifi driver asks
> for 32 Multi-MSI interrupts then first allocated interrupt number must
> be dividable by 32.
>

Ahh - I guess you are right. In our internal engineering we always
request 32 vectors.
IIRC the multiplication by "nr_irqs" was added for IRQ affinity to work
correctly.

> >         if (hwirq < msi->nr_msi_vecs) {
> > -               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
> > +               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
>
> And another issue is that only power of 2 interrupts for Multi-MSI can
> be allocated. Otherwise one number may be allocated to more devices.
>
> But I'm not sure how number of CPUs affects it as other PCIe controller
> drivers do not use number of CPUs.
>
> Other drivers are using bitmap_find_free_region() function with
> order_base_2(nr_irqs) as argument.
>
> I hope that somebody else more skilled with MSI interrupts look at these
> constructions if are correct or needs more rework.
>

I see the point - I'll also ask our engineering department ...

> >         } else {
> >                 mutex_unlock(&msi->bitmap_lock);
> >                 return -ENOSPC;
> > @@ -292,7 +292,7 @@ static void iproc_msi_irq_domain_free(struct
> > irq_domain *domain,
> >         mutex_lock(&msi->bitmap_lock);
> >
> >         hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
> > -       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
> > +       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
> >
> >         mutex_unlock(&msi->bitmap_lock);
> >
> >
> > On Thu, May 20, 2021 at 2:04 PM Pali Rohár <pali@kernel.org> wrote:
> > >
> > > Hello!
> > >
> > > I think there is a bug in pcie-iproc-msi.c driver. It declares
> > > Multi MSI support via MSI_FLAG_MULTI_PCI_MSI flag, see:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174
> > >
> > > but its iproc_msi_irq_domain_alloc() function completely ignores nr_irqs
> > > argument when allocating interrupt numbers from bitmap, see:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246
> > >
> > > I think this this is incorrect as alloc callback should allocate nr_irqs
> > > multi interrupts as caller requested. All other drivers with Multi MSI
> > > support are doing it.
> > >
> > > Could you look at it?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-20 14:22     ` Sandor Bodo-Merle
@ 2021-05-20 17:11       ` Ray Jui
  2021-05-20 19:31         ` Sandor Bodo-Merle
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Ray Jui @ 2021-05-20 17:11 UTC (permalink / raw)
  To: Sandor Bodo-Merle, Pali Rohár; +Cc: linux-pci, bcm-kernel-feedback-list

[-- Attachment #1: Type: text/plain, Size: 6223 bytes --]



On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@kernel.org> wrote:
>>
>> Hello!
>>
>> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
>>> Hi Pali,
>>>
>>> thanks for catching this - i dig up the followup fixup commit we have
>>> for the iproc multi MSI (it was sent to Broadcom - but unfortunately
>>> we missed upstreaming it).
>>>
>>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>>> failed to reserve the proper number of bits from the inner domain.
>>> We need to allocate the proper amount of bits otherwise the domains for
>>> multiple PCIe endpoints may overlap and freeing one of them will result
>>> in freeing unrelated MSI vectors.
>>>
>>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>>> ---
>>>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
>>> index 708fdb1065f8..a00492dccb74 100644
>>> --- drivers/pci/host/pcie-iproc-msi.c
>>> +++ drivers/pci/host/pcie-iproc-msi.c
>>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
>>> irq_domain *domain,
>>>
>>>         mutex_lock(&msi->bitmap_lock);
>>>
>>> -       /* Allocate 'nr_cpus' number of MSI vectors each time */
>>> +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
>>> vectors each time */
>>>         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
>>> -                                          msi->nr_cpus, 0);
>>> +                                          msi->nr_cpus * nr_irqs, 0);
>>
>> I'm not sure if this construction is correct. Multi-MSI interrupts needs
>> to be aligned to number of requested interrupts. So if wifi driver asks
>> for 32 Multi-MSI interrupts then first allocated interrupt number must
>> be dividable by 32.
>>
> 
> Ahh - i guess you are right. In our internal engineering we always
> request 32 vectors.
> IIRC the multiply by "nr_irqs" was added for iqr affinity to work correctly.
> 

May I ask which platforms you guys are running this driver on? Cygnus or
Northstar? Not that it matters, but just out of curiosity.

Let me start by explaining how MSI support works in this driver, or more
precisely, for all platforms that support this iProc based event queue
MSI scheme:

In the iProc PCIe core, each MSI group is serviced by a GIC interrupt
(hwirq) and a dedicated event queue (the event queue is paired up with
the hwirq). Each MSI group can support up to 64 MSI vectors; note that
64 is the depth of the event queue.

The number of MSI groups varies between different iProc SoCs. The total
number of CPU cores also varies. To support MSI IRQ affinity, we
distribute the GIC interrupts across all available CPUs. An MSI vector
is moved from one GIC interrupt to another to steer it to the target CPU.

Assuming:
The number of MSI groups (the total number of hwirqs for this PCIe
controller) is M
The number of CPU cores is N
M is always a multiple of N (we ensured that in the setup function)

Therefore:
Total number of raw MSI vectors = M * 64
Total number of supported MSI vectors = (M * 64) / N
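
For example (hypothetical numbers): with M = 6 MSI groups and N = 2 CPU
cores, that gives 6 * 64 = 384 raw MSI vectors and 384 / 2 = 192
supported MSI vectors.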

I guess I'm not too clear on what you mean by "multi-MSI interrupts need
to be aligned to the number of requested interrupts." Would you be able
to plug this into the above explanation so we can have a clearer
understanding of what you mean here?

In general, I don't see much of an issue with allocating 'msi->nr_cpus *
nr_irqs' here, as long as we can still meet the affinity distribution
requirement mentioned above.

As an example, the dw PCIe driver does the following to reserve a region
in the bitmap:

bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
order_base_2(nr_irqs));
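
A rough, untested sketch of how that dw-style reservation might look if
combined with the per-CPU grouping described above (an illustration, not
a proposed patch; it assumes nr_cpus is itself a power of two so the
reserved region stays size-aligned):

        mutex_lock(&msi->bitmap_lock);

        /* Reserve a naturally aligned block of nr_cpus * nr_irqs slots */
        hwirq = bitmap_find_free_region(msi->bitmap, msi->nr_msi_vecs,
                                        order_base_2(msi->nr_cpus * nr_irqs));

        mutex_unlock(&msi->bitmap_lock);

        if (hwirq < 0)
                return -ENOSPC;

        /* the free path would then use the matching bitmap_release_region() */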

>>>         if (hwirq < msi->nr_msi_vecs) {
>>> -               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
>>> +               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
>>
>> And another issue is that only power of 2 interrupts for Multi-MSI can
>> be allocated. Otherwise one number may be allocated to more devices.
>>
>> But I'm not sure how number of CPUs affects it as other PCIe controller
>> drivers do not use number of CPUs.

As I explained above, we definitely need to take nr_cpus into account,
since we cannot move the physical interrupt around between CPUs.
Instead, we dedicate each physical interrupt to a CPU and service the
MSI across different event queues accordingly when the user changes IRQ
affinity.

>>
>> Other drivers are using bitmap_find_free_region() function with
>> order_base_2(nr_irqs) as argument.
>>

Yes, we should do that too.

>> I hope that somebody else more skilled with MSI interrupts look at these
>> constructions if are correct or needs more rework.
>>
> 
> I see the point - i'll ask also in our engineering department ...
> 
>>>         } else {
>>>                 mutex_unlock(&msi->bitmap_lock);
>>>                 return -ENOSPC;
>>> @@ -292,7 +292,7 @@ static void iproc_msi_irq_domain_free(struct
>>> irq_domain *domain,
>>>         mutex_lock(&msi->bitmap_lock);
>>>
>>>         hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
>>> -       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
>>> +       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
>>>
>>>         mutex_unlock(&msi->bitmap_lock);
>>>
>>>
>>> On Thu, May 20, 2021 at 2:04 PM Pali Rohár <pali@kernel.org> wrote:
>>>>
>>>> Hello!
>>>>
>>>> I think there is a bug in pcie-iproc-msi.c driver. It declares
>>>> Multi MSI support via MSI_FLAG_MULTI_PCI_MSI flag, see:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174
>>>>
>>>> but its iproc_msi_irq_domain_alloc() function completely ignores nr_irqs
>>>> argument when allocating interrupt numbers from bitmap, see:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246
>>>>
>>>> I think this this is incorrect as alloc callback should allocate nr_irqs
>>>> multi interrupts as caller requested. All other drivers with Multi MSI
>>>> support are doing it.
>>>>
>>>> Could you look at it?

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4194 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-20 17:11       ` Ray Jui
@ 2021-05-20 19:31         ` Sandor Bodo-Merle
  2021-05-21 11:39         ` Sandor Bodo-Merle
  2021-05-24 10:37         ` Marc Zyngier
  2 siblings, 0 replies; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-05-20 19:31 UTC (permalink / raw)
  To: Ray Jui; +Cc: Pali Rohár, linux-pci, bcm-kernel-feedback-list

On Thu, May 20, 2021 at 7:11 PM Ray Jui <ray.jui@broadcom.com> wrote:
>
> May I ask which platforms are you guys running this driver on? Cygnus or
> Northstar? Not that it matters, but just out of curiosity.
>
Initial support was added for the XGS platform - initially for the
single-core Saber2, then also for the dual-core Katana2. The XGS
platform is not upstreamed but uses the same in-kernel MSI
implementation.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-20 17:11       ` Ray Jui
  2021-05-20 19:31         ` Sandor Bodo-Merle
@ 2021-05-21 11:39         ` Sandor Bodo-Merle
  2021-05-21 14:00           ` Sandor Bodo-Merle
  2021-05-24 10:37         ` Marc Zyngier
  2 siblings, 1 reply; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-05-21 11:39 UTC (permalink / raw)
  To: Ray Jui; +Cc: Pali Rohár, linux-pci, bcm-kernel-feedback-list

Digging through our patch queue, I found another multi-MSI related fix:

    Unfortunately the reverse mapping of the hwirq - as done by
    irq_find_mapping() - was not applied to the message data only, but
    also to the MSI vector, which was lost as a result.
    Make sure that the reverse mapping is applied to the proper hwirq,
    contained in the message data.
    Tested on Saber2 and Katana2.

    Fixes: fc54bae288182 ("PCI: iproc: Allow allocation of multiple MSIs")

diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
index 990fc906d73d..708fdb1065f8 100644
--- drivers/pci/host/pcie-iproc-msi.c
+++ drivers/pci/host/pcie-iproc-msi.c
@@ -237,7 +237,12 @@ static void iproc_msi_irq_compose_msi_msg(struct irq_data *data,
        addr = msi->msi_addr + iproc_msi_addr_offset(msi, data->hwirq);
        msg->address_lo = lower_32_bits(addr);
        msg->address_hi = upper_32_bits(addr);
-       msg->data = data->hwirq << 5;
+       /*
+        * Since we have multiple hwirq mapped to a single MSI vector,
+        * now we need to derive the hwirq at CPU0.  It can then be used to
+        * mapped back to virq.
+        */
+       msg->data = hwirq_to_canonical_hwirq(msi, data->hwirq) << 5;
 }

 static struct irq_chip iproc_msi_bottom_irq_chip = {
@@ -307,14 +312,8 @@ static inline u32 decode_msi_hwirq(struct iproc_msi *msi, u32 eq, u32 head)
        offs = iproc_msi_eq_offset(msi, eq) + head * sizeof(u32);
        msg = (u32 *)(msi->eq_cpu + offs);
        hwirq = readl(msg);
-       hwirq = (hwirq >> 5) + (hwirq & 0x1f);

-       /*
-        * Since we have multiple hwirq mapped to a single MSI vector,
-        * now we need to derive the hwirq at CPU0.  It can then be used to
-        * mapped back to virq.
-        */
-       return hwirq_to_canonical_hwirq(msi, hwirq);
+       return hwirq;
 }

 static void iproc_msi_handler(struct irq_desc *desc)
@@ -360,7 +359,7 @@ static void iproc_msi_handler(struct irq_desc *desc)
                /* process all outstanding events */
                while (nr_events--) {
                        hwirq = decode_msi_hwirq(msi, eq, head);
-                       virq = irq_find_mapping(msi->inner_domain, hwirq);
+                       virq = irq_find_mapping(msi->inner_domain, hwirq >> 5) + (hwirq & 0x1f);
                        generic_handle_irq(virq);

                        head++;

On Thu, May 20, 2021 at 7:11 PM Ray Jui <ray.jui@broadcom.com> wrote:
>
>
>
> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> > On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@kernel.org> wrote:
> >>
> >> Hello!
> >>
> >> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
> >>> Hi Pali,
> >>>
> >>> thanks for catching this - i dig up the followup fixup commit we have
> >>> for the iproc multi MSI (it was sent to Broadcom - but unfortunately
> >>> we missed upstreaming it).
> >>>
> >>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >>> failed to reserve the proper number of bits from the inner domain.
> >>> We need to allocate the proper amount of bits otherwise the domains for
> >>> multiple PCIe endpoints may overlap and freeing one of them will result
> >>> in freeing unrelated MSI vectors.
> >>>
> >>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >>> ---
> >>>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
> >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
> >>> index 708fdb1065f8..a00492dccb74 100644
> >>> --- drivers/pci/host/pcie-iproc-msi.c
> >>> +++ drivers/pci/host/pcie-iproc-msi.c
> >>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
> >>> irq_domain *domain,
> >>>
> >>>         mutex_lock(&msi->bitmap_lock);
> >>>
> >>> -       /* Allocate 'nr_cpus' number of MSI vectors each time */
> >>> +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
> >>> vectors each time */
> >>>         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> >>> -                                          msi->nr_cpus, 0);
> >>> +                                          msi->nr_cpus * nr_irqs, 0);
> >>
> >> I'm not sure if this construction is correct. Multi-MSI interrupts needs
> >> to be aligned to number of requested interrupts. So if wifi driver asks
> >> for 32 Multi-MSI interrupts then first allocated interrupt number must
> >> be dividable by 32.
> >>
> >
> > Ahh - i guess you are right. In our internal engineering we always
> > request 32 vectors.
> > IIRC the multiply by "nr_irqs" was added for iqr affinity to work correctly.
> >
>
> May I ask which platforms are you guys running this driver on? Cygnus or
> Northstar? Not that it matters, but just out of curiosity.
>
> Let me start by explaining how MSI support works in this driver, or more
> precisely, for all platforms that support this iProc based event queue
> MSI scheme:
>
> In iProc PCIe core, each MSI group is serviced by a GIC interrupt
> (hwirq) and a dedicated event queue (event queue is paired up with
> hwirq).  Each MSI group can support up to 64 MSI vectors. Note 64 is the
> depth of the event queue.
>
> The number of MSI groups varies between different iProc SoCs. The total
> number of CPU cores also varies. To support MSI IRQ affinity, we
> distribute GIC interrupts across all available CPUs.  MSI vector is
> moved from one GIC interrupt to another to steer to the target CPU.
>
> Assuming:
> The number of MSI groups (the number of total hwirq for this PCIe
> controller) is M
> The number of CPU cores is N
> M is always a multiple of N (we ensured that in the setup function)
>
> Therefore:
> Total number of raw MSI vectors = M * 64
> Total number of supported MSI vectors = (M * 64) / N
>
> I guess I'm not too clear on what you mean by "multi-MSI interrupts
> needs to be aligned to number of requested interrupts.". Would you be
> able to plug this into the above explanation so we can have a more clear
> understanding of what you mean here?
>
> In general, I don't see much issue of allocating 'msi->nr_cpus *
> nr_irqs' here as long as we can still meet the affinity distribution
> requirement as mentioned above.
>
> Example in the dw PCIe driver does the following for reserving in the
> bitmap:
>
> bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
> order_base_2(nr_irqs));
>
> >>>         if (hwirq < msi->nr_msi_vecs) {
> >>> -               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
> >>> +               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
> >>
> >> And another issue is that only power of 2 interrupts for Multi-MSI can
> >> be allocated. Otherwise one number may be allocated to more devices.
> >>
> >> But I'm not sure how number of CPUs affects it as other PCIe controller
> >> drivers do not use number of CPUs.
>
> As I explained above, we definitely need to take nr_cpus into account,
> since we cannot move around the physical interrupt between CPUs.
> Instead, we dedicate each physical interrupt to the CPU and service the
> MSI across different event queues accordingly when user change irq affinity.
>
> >>
> >> Other drivers are using bitmap_find_free_region() function with
> >> order_base_2(nr_irqs) as argument.
> >>
>
> Yes we should do that too.
>
> >> I hope that somebody else more skilled with MSI interrupts look at these
> >> constructions if are correct or needs more rework.
> >>
> >
> > I see the point - i'll ask also in our engineering department ...
> >
> >>>         } else {
> >>>                 mutex_unlock(&msi->bitmap_lock);
> >>>                 return -ENOSPC;
> >>> @@ -292,7 +292,7 @@ static void iproc_msi_irq_domain_free(struct
> >>> irq_domain *domain,
> >>>         mutex_lock(&msi->bitmap_lock);
> >>>
> >>>         hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
> >>> -       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
> >>> +       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
> >>>
> >>>         mutex_unlock(&msi->bitmap_lock);
> >>>
> >>>
> >>> On Thu, May 20, 2021 at 2:04 PM Pali Rohár <pali@kernel.org> wrote:
> >>>>
> >>>> Hello!
> >>>>
> >>>> I think there is a bug in pcie-iproc-msi.c driver. It declares
> >>>> Multi MSI support via MSI_FLAG_MULTI_PCI_MSI flag, see:
> >>>>
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174
> >>>>
> >>>> but its iproc_msi_irq_domain_alloc() function completely ignores nr_irqs
> >>>> argument when allocating interrupt numbers from bitmap, see:
> >>>>
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246
> >>>>
> >>>> I think this this is incorrect as alloc callback should allocate nr_irqs
> >>>> multi interrupts as caller requested. All other drivers with Multi MSI
> >>>> support are doing it.
> >>>>
> >>>> Could you look at it?

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-21 11:39         ` Sandor Bodo-Merle
@ 2021-05-21 14:00           ` Sandor Bodo-Merle
  0 siblings, 0 replies; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-05-21 14:00 UTC (permalink / raw)
  To: Ray Jui; +Cc: Pali Rohár, linux-pci, bcm-kernel-feedback-list

Reading back the patches, and comparing them to the multi-MSI driver
versions that use "order_base_2(nr_irqs)", it seems that our engineering
went a different route.
We store the allocated hwirq in the MSI message data field (shifted by
5, to leave room for the maximum of 32 vectors).
So instead of using something like find_next_bit() - and I guess in that
case you really need properly aligned bitfields - we use all the
available data from the MSI vector to infer the proper virq number.

Btw, my interpretation may be totally bogus, since it has been a while
since I looked at the MSI code.
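
A plain-C sketch of the packing described above, as read from the patch
(an assumption drawn from the diff, not from upstream code): the
canonical hwirq lives in the upper bits of the MSI data word and the low
5 bits carry the endpoint's vector index.

#include <stdint.h>

static uint32_t encode_msi_data(uint32_t canonical_hwirq)
{
        return canonical_hwirq << 5;    /* low 5 bits left for the vector index */
}

static uint32_t decode_base_hwirq(uint32_t data)
{
        return data >> 5;               /* which allocation the message belongs to */
}

static uint32_t decode_vector_index(uint32_t data)
{
        return data & 0x1f;             /* which of the up-to-32 MSIs fired */
}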

On Fri, May 21, 2021 at 1:39 PM Sandor Bodo-Merle <sbodomerle@gmail.com> wrote:
>
> Digging up our patch queue - i found another multi-MSI related fix:
>
>     Unfortunately the reverse mapping of the hwirq - as made by
>     irq_find_mapping() was not applied to the message data only, but
>     also to the MSI vector, which was lost as a result.
>     Make sure that the reverse mapping is applied to the proper hwirq,
>     contained in the message data.
>     Tested on Saber2 and Katana2.
>
>     Fixes: fc54bae288182 ("PCI: iproc: Allow allocation of multiple MSIs")
>
> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
> index 990fc906d73d..708fdb1065f8 100644
> --- drivers/pci/host/pcie-iproc-msi.c
> +++ drivers/pci/host/pcie-iproc-msi.c
> @@ -237,7 +237,12 @@ static void iproc_msi_irq_compose_msi_msg(struct
> irq_data *data,
>         addr = msi->msi_addr + iproc_msi_addr_offset(msi, data->hwirq);
>         msg->address_lo = lower_32_bits(addr);
>         msg->address_hi = upper_32_bits(addr);
> -       msg->data = data->hwirq << 5;
> +       /*
> +        * Since we have multiple hwirq mapped to a single MSI vector,
> +        * now we need to derive the hwirq at CPU0.  It can then be used to
> +        * mapped back to virq.
> +        */
> +       msg->data = hwirq_to_canonical_hwirq(msi, data->hwirq) << 5;
>  }
>
>  static struct irq_chip iproc_msi_bottom_irq_chip = {
> @@ -307,14 +312,8 @@ static inline u32 decode_msi_hwirq(struct
> iproc_msi *msi, u32 eq, u32 head)
>         offs = iproc_msi_eq_offset(msi, eq) + head * sizeof(u32);
>         msg = (u32 *)(msi->eq_cpu + offs);
>         hwirq = readl(msg);
> -       hwirq = (hwirq >> 5) + (hwirq & 0x1f);
>
> -       /*
> -        * Since we have multiple hwirq mapped to a single MSI vector,
> -        * now we need to derive the hwirq at CPU0.  It can then be used to
> -        * mapped back to virq.
> -        */
> -       return hwirq_to_canonical_hwirq(msi, hwirq);
> +       return hwirq;
>  }
>
>  static void iproc_msi_handler(struct irq_desc *desc)
> @@ -360,7 +359,7 @@ static void iproc_msi_handler(struct irq_desc *desc)
>                 /* process all outstanding events */
>                 while (nr_events--) {
>                         hwirq = decode_msi_hwirq(msi, eq, head);
> -                       virq = irq_find_mapping(msi->inner_domain, hwirq);
> +                       virq = irq_find_mapping(msi->inner_domain,
> hwirq >> 5) + (hwirq & 0x1f);
>                         generic_handle_irq(virq);
>
>                         head++;
>
> On Thu, May 20, 2021 at 7:11 PM Ray Jui <ray.jui@broadcom.com> wrote:
> >
> >
> >
> > On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> > > On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@kernel.org> wrote:
> > >>
> > >> Hello!
> > >>
> > >> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
> > >>> Hi Pali,
> > >>>
> > >>> thanks for catching this - i dig up the followup fixup commit we have
> > >>> for the iproc multi MSI (it was sent to Broadcom - but unfortunately
> > >>> we missed upstreaming it).
> > >>>
> > >>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> > >>> failed to reserve the proper number of bits from the inner domain.
> > >>> We need to allocate the proper amount of bits otherwise the domains for
> > >>> multiple PCIe endpoints may overlap and freeing one of them will result
> > >>> in freeing unrelated MSI vectors.
> > >>>
> > >>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> > >>> ---
> > >>>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
> > >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> > >>>
> > >>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
> > >>> index 708fdb1065f8..a00492dccb74 100644
> > >>> --- drivers/pci/host/pcie-iproc-msi.c
> > >>> +++ drivers/pci/host/pcie-iproc-msi.c
> > >>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
> > >>> irq_domain *domain,
> > >>>
> > >>>         mutex_lock(&msi->bitmap_lock);
> > >>>
> > >>> -       /* Allocate 'nr_cpus' number of MSI vectors each time */
> > >>> +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
> > >>> vectors each time */
> > >>>         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> > >>> -                                          msi->nr_cpus, 0);
> > >>> +                                          msi->nr_cpus * nr_irqs, 0);
> > >>
> > >> I'm not sure if this construction is correct. Multi-MSI interrupts needs
> > >> to be aligned to number of requested interrupts. So if wifi driver asks
> > >> for 32 Multi-MSI interrupts then first allocated interrupt number must
> > >> be dividable by 32.
> > >>
> > >
> > > Ahh - i guess you are right. In our internal engineering we always
> > > request 32 vectors.
> > > IIRC the multiply by "nr_irqs" was added for iqr affinity to work correctly.
> > >
> >
> > May I ask which platforms are you guys running this driver on? Cygnus or
> > Northstar? Not that it matters, but just out of curiosity.
> >
> > Let me start by explaining how MSI support works in this driver, or more
> > precisely, for all platforms that support this iProc based event queue
> > MSI scheme:
> >
> > In iProc PCIe core, each MSI group is serviced by a GIC interrupt
> > (hwirq) and a dedicated event queue (event queue is paired up with
> > hwirq).  Each MSI group can support up to 64 MSI vectors. Note 64 is the
> > depth of the event queue.
> >
> > The number of MSI groups varies between different iProc SoCs. The total
> > number of CPU cores also varies. To support MSI IRQ affinity, we
> > distribute GIC interrupts across all available CPUs.  MSI vector is
> > moved from one GIC interrupt to another to steer to the target CPU.
> >
> > Assuming:
> > The number of MSI groups (the number of total hwirq for this PCIe
> > controller) is M
> > The number of CPU cores is N
> > M is always a multiple of N (we ensured that in the setup function)
> >
> > Therefore:
> > Total number of raw MSI vectors = M * 64
> > Total number of supported MSI vectors = (M * 64) / N
> >
> > I guess I'm not too clear on what you mean by "multi-MSI interrupts
> > needs to be aligned to number of requested interrupts.". Would you be
> > able to plug this into the above explanation so we can have a more clear
> > understanding of what you mean here?
> >
> > In general, I don't see much issue of allocating 'msi->nr_cpus *
> > nr_irqs' here as long as we can still meet the affinity distribution
> > requirement as mentioned above.
> >
> > Example in the dw PCIe driver does the following for reserving in the
> > bitmap:
> >
> > bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
> > order_base_2(nr_irqs));
> >
> > >>>         if (hwirq < msi->nr_msi_vecs) {
> > >>> -               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
> > >>> +               bitmap_set(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
> > >>
> > >> And another issue is that only power of 2 interrupts for Multi-MSI can
> > >> be allocated. Otherwise one number may be allocated to more devices.
> > >>
> > >> But I'm not sure how number of CPUs affects it as other PCIe controller
> > >> drivers do not use number of CPUs.
> >
> > As I explained above, we definitely need to take nr_cpus into account,
> > since we cannot move around the physical interrupt between CPUs.
> > Instead, we dedicate each physical interrupt to the CPU and service the
> > MSI across different event queues accordingly when user change irq affinity.
> >
> > >>
> > >> Other drivers are using bitmap_find_free_region() function with
> > >> order_base_2(nr_irqs) as argument.
> > >>
> >
> > Yes we should do that too.
> >
> > >> I hope that somebody else more skilled with MSI interrupts look at these
> > >> constructions if are correct or needs more rework.
> > >>
> > >
> > > I see the point - i'll ask also in our engineering department ...
> > >
> > >>>         } else {
> > >>>                 mutex_unlock(&msi->bitmap_lock);
> > >>>                 return -ENOSPC;
> > >>> @@ -292,7 +292,7 @@ static void iproc_msi_irq_domain_free(struct
> > >>> irq_domain *domain,
> > >>>         mutex_lock(&msi->bitmap_lock);
> > >>>
> > >>>         hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
> > >>> -       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
> > >>> +       bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs);
> > >>>
> > >>>         mutex_unlock(&msi->bitmap_lock);
> > >>>
> > >>>
> > >>> On Thu, May 20, 2021 at 2:04 PM Pali Rohár <pali@kernel.org> wrote:
> > >>>>
> > >>>> Hello!
> > >>>>
> > >>>> I think there is a bug in pcie-iproc-msi.c driver. It declares
> > >>>> Multi MSI support via MSI_FLAG_MULTI_PCI_MSI flag, see:
> > >>>>
> > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174
> > >>>>
> > >>>> but its iproc_msi_irq_domain_alloc() function completely ignores nr_irqs
> > >>>> argument when allocating interrupt numbers from bitmap, see:
> > >>>>
> > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246
> > >>>>
> > >>>> I think this this is incorrect as alloc callback should allocate nr_irqs
> > >>>> multi interrupts as caller requested. All other drivers with Multi MSI
> > >>>> support are doing it.
> > >>>>
> > >>>> Could you look at it?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-20 17:11       ` Ray Jui
  2021-05-20 19:31         ` Sandor Bodo-Merle
  2021-05-21 11:39         ` Sandor Bodo-Merle
@ 2021-05-24 10:37         ` Marc Zyngier
  2021-05-25 17:27           ` Ray Jui
  2 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-05-24 10:37 UTC (permalink / raw)
  To: Ray Jui
  Cc: Sandor Bodo-Merle, Pali Rohár, linux-pci, bcm-kernel-feedback-list

On Thu, 20 May 2021 18:11:32 +0100,
Ray Jui <ray.jui@broadcom.com> wrote:
>
> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> > On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@kernel.org> wrote:
> >>
> >> Hello!
> >>
> >> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
> >>> Hi Pali,
> >>>
> >>> thanks for catching this - i dig up the followup fixup commit we have
> >>> for the iproc multi MSI (it was sent to Broadcom - but unfortunately
> >>> we missed upstreaming it).
> >>>
> >>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >>> failed to reserve the proper number of bits from the inner domain.
> >>> We need to allocate the proper amount of bits otherwise the domains for
> >>> multiple PCIe endpoints may overlap and freeing one of them will result
> >>> in freeing unrelated MSI vectors.
> >>>
> >>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >>> ---
> >>>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
> >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
> >>> index 708fdb1065f8..a00492dccb74 100644
> >>> --- drivers/pci/host/pcie-iproc-msi.c
> >>> +++ drivers/pci/host/pcie-iproc-msi.c
> >>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
> >>> irq_domain *domain,
> >>>
> >>>         mutex_lock(&msi->bitmap_lock);
> >>>
> >>> -       /* Allocate 'nr_cpus' number of MSI vectors each time */
> >>> +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
> >>> vectors each time */
> >>>         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> >>> -                                          msi->nr_cpus, 0);
> >>> +                                          msi->nr_cpus * nr_irqs, 0);
> >>
> >> I'm not sure if this construction is correct. Multi-MSI interrupts needs
> >> to be aligned to number of requested interrupts. So if wifi driver asks
> >> for 32 Multi-MSI interrupts then first allocated interrupt number must
> >> be dividable by 32.
> >>
> > 
> > Ahh - i guess you are right. In our internal engineering we always
> > request 32 vectors.
> > IIRC the multiply by "nr_irqs" was added for iqr affinity to work correctly.
> > 
> 
> May I ask which platforms are you guys running this driver on? Cygnus or
> Northstar? Not that it matters, but just out of curiosity.
> 
> Let me start by explaining how MSI support works in this driver, or more
> precisely, for all platforms that support this iProc based event queue
> MSI scheme:
> 
> In iProc PCIe core, each MSI group is serviced by a GIC interrupt
> (hwirq) and a dedicated event queue (event queue is paired up with
> hwirq).  Each MSI group can support up to 64 MSI vectors. Note 64 is the
> depth of the event queue.
> 
> The number of MSI groups varies between different iProc SoCs. The total
> number of CPU cores also varies. To support MSI IRQ affinity, we
> distribute GIC interrupts across all available CPUs.  MSI vector is
> moved from one GIC interrupt to another to steer to the target CPU.
> 
> Assuming:
> The number of MSI groups (the number of total hwirq for this PCIe
> controller) is M
> The number of CPU cores is N
> M is always a multiple of N (we ensured that in the setup function)
> 
> Therefore:
> Total number of raw MSI vectors = M * 64
> Total number of supported MSI vectors = (M * 64) / N
> 
> I guess I'm not too clear on what you mean by "multi-MSI interrupts
> needs to be aligned to number of requested interrupts.". Would you be
> able to plug this into the above explanation so we can have a more clear
> understanding of what you mean here?

That's a generic PCI requirement: if you are providing a Multi-MSI
configuration, the base vector number has to be size-aligned
(2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
supplies up to 5 bits that are orr-ed into the base vector number,
with a *single* doorbell address. You effectively provide a single MSI
number and a single address, and the device knows how to drive 2^n MSIs.

This is different from MSI-X, which defines multiple individual
vectors, each with their own doorbell address.
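
A generic illustration of the Multi-MSI behaviour described above
(endpoint behaviour in general, not driver code; the structure and
function names are made up for the example):

#include <stdint.h>

struct multi_msi_example {
        uint64_t doorbell;      /* single address shared by all 2^n vectors */
        uint32_t base_data;     /* size-aligned base vector number */
};

static uint32_t multi_msi_vector_data(const struct multi_msi_example *m,
                                      uint32_t index, uint32_t nr_vectors)
{
        /*
         * nr_vectors is a power of two <= 32 and base_data is
         * nr_vectors-aligned, so the OR below never carries into the base.
         */
        return m->base_data | (index & (nr_vectors - 1));
}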

The main problem you have here (other than the broken allocation
mechanism) is that moving an interrupt from one core to another
implies moving the doorbell address to that of another MSI
group. This isn't possible for Multi-MSI, as all the MSIs must have
the same doorbell address. As far as I can see, there is no way to
support Multi-MSI together with affinity change on this HW, and you
should stop advertising support for this feature.

There is also a more general problem here, which is the atomicity of
the update on affinity change. If you are moving an interrupt from one
CPU to the other, it seems you change both the vector number and the
target address. If that is the case, this isn't atomic, and you may
end-up with the device generating a message based on a half-applied
update.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-24 10:37         ` Marc Zyngier
@ 2021-05-25 17:27           ` Ray Jui
  2021-05-25 17:35             ` Pali Rohár
  2021-05-26  7:57             ` Marc Zyngier
  0 siblings, 2 replies; 20+ messages in thread
From: Ray Jui @ 2021-05-25 17:27 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sandor Bodo-Merle, Pali Rohár, linux-pci, bcm-kernel-feedback-list

[-- Attachment #1: Type: text/plain, Size: 6089 bytes --]



On 5/24/2021 3:37 AM, Marc Zyngier wrote:
> On Thu, 20 May 2021 18:11:32 +0100,
> Ray Jui <ray.jui@broadcom.com> wrote:
>>
>> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
>>> On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@kernel.org> wrote:
>>>>
>>>> Hello!
>>>>
>>>> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
>>>>> Hi Pali,
>>>>>
>>>>> thanks for catching this - i dig up the followup fixup commit we have
>>>>> for the iproc multi MSI (it was sent to Broadcom - but unfortunately
>>>>> we missed upstreaming it).
>>>>>
>>>>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>>>>> failed to reserve the proper number of bits from the inner domain.
>>>>> We need to allocate the proper amount of bits otherwise the domains for
>>>>> multiple PCIe endpoints may overlap and freeing one of them will result
>>>>> in freeing unrelated MSI vectors.
>>>>>
>>>>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>>>>> ---
>>>>>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
>>>>> index 708fdb1065f8..a00492dccb74 100644
>>>>> --- drivers/pci/host/pcie-iproc-msi.c
>>>>> +++ drivers/pci/host/pcie-iproc-msi.c
>>>>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
>>>>> irq_domain *domain,
>>>>>
>>>>>         mutex_lock(&msi->bitmap_lock);
>>>>>
>>>>> -       /* Allocate 'nr_cpus' number of MSI vectors each time */
>>>>> +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
>>>>> vectors each time */
>>>>>         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
>>>>> -                                          msi->nr_cpus, 0);
>>>>> +                                          msi->nr_cpus * nr_irqs, 0);
>>>>
>>>> I'm not sure if this construction is correct. Multi-MSI interrupts needs
>>>> to be aligned to number of requested interrupts. So if wifi driver asks
>>>> for 32 Multi-MSI interrupts then first allocated interrupt number must
>>>> be dividable by 32.
>>>>
>>>
>>> Ahh - i guess you are right. In our internal engineering we always
>>> request 32 vectors.
>>> IIRC the multiply by "nr_irqs" was added for iqr affinity to work correctly.
>>>
>>
>> May I ask which platforms are you guys running this driver on? Cygnus or
>> Northstar? Not that it matters, but just out of curiosity.
>>
>> Let me start by explaining how MSI support works in this driver, or more
>> precisely, for all platforms that support this iProc based event queue
>> MSI scheme:
>>
>> In iProc PCIe core, each MSI group is serviced by a GIC interrupt
>> (hwirq) and a dedicated event queue (event queue is paired up with
>> hwirq).  Each MSI group can support up to 64 MSI vectors. Note 64 is the
>> depth of the event queue.
>>
>> The number of MSI groups varies between different iProc SoCs. The total
>> number of CPU cores also varies. To support MSI IRQ affinity, we
>> distribute GIC interrupts across all available CPUs.  MSI vector is
>> moved from one GIC interrupt to another to steer to the target CPU.
>>
>> Assuming:
>> The number of MSI groups (the number of total hwirq for this PCIe
>> controller) is M
>> The number of CPU cores is N
>> M is always a multiple of N (we ensured that in the setup function)
>>
>> Therefore:
>> Total number of raw MSI vectors = M * 64
>> Total number of supported MSI vectors = (M * 64) / N
>>
>> I guess I'm not too clear on what you mean by "multi-MSI interrupts
>> needs to be aligned to number of requested interrupts.". Would you be
>> able to plug this into the above explanation so we can have a more clear
>> understanding of what you mean here?
> 
> That's a generic PCI requirement: if you are providing a Multi-MSI
> configuration, the base vector number has to be size-aligned
> (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
> supplies up to 5 bits that are orr-ed into the base vector number,
> with a *single* doorbell address. You effectively provide a single MSI
> number and a single address, and the device knows how to drive 2^n MSIs.
> 
> This is different from MSI-X, which defines multiple individual
> vectors, each with their own doorbell address.
> 
> The main problem you have here (other than the broken allocation
> mechanism) is that moving an interrupt from one core to another
> implies moving the doorbell address to that of another MSI
> group. This isn't possible for Multi-MSI, as all the MSIs must have
> the same doorbell address. As far as I can see, there is no way to
> support Multi-MSI together with affinity change on this HW, and you
> should stop advertising support for this feature.
> 

I was not aware of the fact that multi-MSI needs to use the same
doorbell address (aka MSI posted write address?). Thank you for helping
to point it out. In this case, yes, like you said, we cannot possibly
support both multi-MSI and affinity at the same time, since supporting
affinity requires us to move from one event queue (and IRQ) to another,
which will have a different doorbell address.

Do you think it makes sense to only advertise multi-MSI capability in
the single-CPU-core case (detected at runtime via 'num_possible_cpus')?
This would at least allow multi-MSI to work on the single-CPU-core
platforms that Sandor and Pali use.
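
A sketch of what that conditional might look like (untested; the
variable name and field layout are illustrative, while the MSI_FLAG_*
constants and num_possible_cpus() are existing kernel symbols):

static struct msi_domain_info example_msi_domain_info = {
        .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
                 MSI_FLAG_PCI_MSIX,
        /* .chip etc. as in the existing driver */
};

/* in the MSI setup path: only advertise Multi-MSI on single-CPU systems */
if (num_possible_cpus() == 1)
        example_msi_domain_info.flags |= MSI_FLAG_MULTI_PCI_MSI;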

> There is also a more general problem here, which is the atomicity of
> the update on affinity change. If you are moving an interrupt from one
> CPU to the other, it seems you change both the vector number and the
> target address. If that is the case, this isn't atomic, and you may
> end-up with the device generating a message based on a half-applied
> update.

Are you referring to the callbacks in 'irq_set_affinity' and
'irq_compose_msi_msg'? In that case, can you help recommend a solution
for it (or is there no solution based on such an architecture)? It does
not appear that such atomicity can be enforced at the irq framework
level.

Thanks,

Ray

> 
> Thanks,
> 
> 	M.
> 

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4194 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-25 17:27           ` Ray Jui
@ 2021-05-25 17:35             ` Pali Rohár
  2021-05-25 17:39               ` Ray Jui
  2021-05-26  7:57             ` Marc Zyngier
  1 sibling, 1 reply; 20+ messages in thread
From: Pali Rohár @ 2021-05-25 17:35 UTC (permalink / raw)
  To: Ray Jui
  Cc: Marc Zyngier, Sandor Bodo-Merle, linux-pci, bcm-kernel-feedback-list

On Tuesday 25 May 2021 10:27:54 Ray Jui wrote:
> platforms with single CPU core that Sandor and Pali use?

Just to note that I'm not using this driver or platform. I just spotted
a suspicious function in this driver and therefore started this
discussion.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-25 17:35             ` Pali Rohár
@ 2021-05-25 17:39               ` Ray Jui
  0 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2021-05-25 17:39 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Marc Zyngier, Sandor Bodo-Merle, linux-pci, bcm-kernel-feedback-list

[-- Attachment #1: Type: text/plain, Size: 380 bytes --]



On 5/25/2021 10:35 AM, Pali Rohár wrote:
> On Tuesday 25 May 2021 10:27:54 Ray Jui wrote:
>> platforms with single CPU core that Sandor and Pali use?
> 
> Just to note that I'm not using this driver and platform. I have just
> spotted suspicious function in this driver and therefore I started this
> discussion.
> 

Got it, and thanks for helping to point out the issue.

Ray

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4194 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-25 17:27           ` Ray Jui
  2021-05-25 17:35             ` Pali Rohár
@ 2021-05-26  7:57             ` Marc Zyngier
  2021-05-26 16:10               ` Sandor Bodo-Merle
  2021-05-26 17:34               ` Ray Jui
  1 sibling, 2 replies; 20+ messages in thread
From: Marc Zyngier @ 2021-05-26  7:57 UTC (permalink / raw)
  To: Ray Jui
  Cc: Sandor Bodo-Merle, Pali Rohár, linux-pci, bcm-kernel-feedback-list

On Tue, 25 May 2021 18:27:54 +0100,
Ray Jui <ray.jui@broadcom.com> wrote:
>
> On 5/24/2021 3:37 AM, Marc Zyngier wrote:
> > On Thu, 20 May 2021 18:11:32 +0100,
> > Ray Jui <ray.jui@broadcom.com> wrote:
> >>
> >> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:

[...]

> >> I guess I'm not too clear on what you mean by "multi-MSI interrupts
> >> needs to be aligned to number of requested interrupts.". Would you be
> >> able to plug this into the above explanation so we can have a more clear
> >> understanding of what you mean here?
> > 
> > That's a generic PCI requirement: if you are providing a Multi-MSI
> > configuration, the base vector number has to be size-aligned
> > (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
> > supplies up to 5 bits that are orr-ed into the base vector number,
> > with a *single* doorbell address. You effectively provide a single MSI
> > number and a single address, and the device knows how to drive 2^n MSIs.
> > 
> > This is different from MSI-X, which defines multiple individual
> > vectors, each with their own doorbell address.
> > 
> > The main problem you have here (other than the broken allocation
> > mechanism) is that moving an interrupt from one core to another
> > implies moving the doorbell address to that of another MSI
> > group. This isn't possible for Multi-MSI, as all the MSIs must have
> > the same doorbell address. As far as I can see, there is no way to
> > support Multi-MSI together with affinity change on this HW, and you
> > should stop advertising support for this feature.
> > 
> 
> I was not aware of the fact that multi-MSI needs to use the same
> doorbell address (aka MSI posted write address?). Thank you for helping
> to point it out. In this case, yes, like you said, we cannot possibly
> support both multi-MSI and affinity at the same time, since supporting
> affinity requires us to move from one to another event queue (and irq)
> that will have different doorbell address.
> 
> Do you think it makes sense to do the following by only advertising
> multi-MSI capability in the single CPU core case (detected runtime via
> 'num_possible_cpus')? This will at least allow multi-MSI to work in
> platforms with single CPU core that Sandor and Pali use?

I don't think this makes much sense. Single-CPU machines are an oddity
these days, and I'd rather you simplify this (already pretty
complicated) driver.

> > There is also a more general problem here, which is the atomicity of
> > the update on affinity change. If you are moving an interrupt from one
> > CPU to the other, it seems you change both the vector number and the
> > target address. If that is the case, this isn't atomic, and you may
> > end-up with the device generating a message based on a half-applied
> > update.
> 
> Are you referring to the callback in 'irq_set_affinity' and
> 'irq_compose_msi_msg'? In such case, can you help to recommend a
> solution for it (or there's no solution based on such architecture)? It
> does not appear such atomicity can be enforced from the irq framework level.

irq_compose_msi_msg() is only one part of the problem. The core of the
issue is that the programming of the end-point is not atomic (you need
to update a 32bit payload *and* a 64bit address).

A solution to work around it would be to rework the way you allocate
the vectors, making them constant across all CPUs so that only the
address changes when changing the affinity.
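
To make this concrete, a minimal C sketch of the alignment rule quoted
above and of the rework suggested here (purely illustrative: the struct,
field and function names are invented and are not the pcie-iproc-msi.c
internals):

#include <stdint.h>

/* Hypothetical controller state; the fields are made up for the sketch. */
struct sketch_msi {
	uint64_t doorbell_base;		/* doorbell address of event queue 0 */
	uint64_t doorbell_stride;	/* distance between per-CPU event queues */
};

/*
 * Multi-MSI: the endpoint is given one base vector and one doorbell
 * address, and ORs the interrupt index (up to 5 bits) into the low bits
 * of the data.  This is only correct when 'base' is aligned to the
 * allocation size, hence the need for naturally aligned hwirq regions.
 */
static uint32_t sketch_msi_data(uint32_t base, uint32_t index)
{
	return base | index;	/* equals base + index only if base is size-aligned */
}

/*
 * The suggested rework: keep the vector (data) constant for the lifetime
 * of the interrupt and derive only the doorbell address from the CPU the
 * interrupt is affine to, so an affinity change updates a single 64-bit
 * value instead of address *and* payload.
 */
static uint64_t sketch_msi_addr(const struct sketch_msi *msi, unsigned int cpu)
{
	return msi->doorbell_base + (uint64_t)cpu * msi->doorbell_stride;
}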

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-26  7:57             ` Marc Zyngier
@ 2021-05-26 16:10               ` Sandor Bodo-Merle
  2021-06-02  8:34                 ` Marc Zyngier
  2021-05-26 17:34               ` Ray Jui
  1 sibling, 1 reply; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-05-26 16:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ray Jui, Pali Rohár, linux-pci, bcm-kernel-feedback-list

[-- Attachment #1: Type: text/plain, Size: 4004 bytes --]

The following patch addresses the allocation issue - but indeed - won't
fix the atomicity of IRQ affinity changes in this driver (the majority
of our products rely on single-core SOCs; we also use a dual-core SOC,
but we don't change the initial IRQ affinity).

On Wed, May 26, 2021 at 9:57 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 25 May 2021 18:27:54 +0100,
> Ray Jui <ray.jui@broadcom.com> wrote:
> >
> > On 5/24/2021 3:37 AM, Marc Zyngier wrote:
> > > On Thu, 20 May 2021 18:11:32 +0100,
> > > Ray Jui <ray.jui@broadcom.com> wrote:
> > >>
> > >> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
>
> [...]
>
> > >> I guess I'm not too clear on what you mean by "multi-MSI interrupts
> > >> needs to be aligned to number of requested interrupts.". Would you be
> > >> able to plug this into the above explanation so we can have a more clear
> > >> understanding of what you mean here?
> > >
> > > That's a generic PCI requirement: if you are providing a Multi-MSI
> > > configuration, the base vector number has to be size-aligned
> > > (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
> > > supplies up to 5 bits that are orr-ed into the base vector number,
> > > with a *single* doorbell address. You effectively provide a single MSI
> > > number and a single address, and the device knows how to drive 2^n MSIs.
> > >
> > > This is different from MSI-X, which defines multiple individual
> > > vectors, each with their own doorbell address.
> > >
> > > The main problem you have here (other than the broken allocation
> > > mechanism) is that moving an interrupt from one core to another
> > > implies moving the doorbell address to that of another MSI
> > > group. This isn't possible for Multi-MSI, as all the MSIs must have
> > > the same doorbell address. As far as I can see, there is no way to
> > > support Multi-MSI together with affinity change on this HW, and you
> > > should stop advertising support for this feature.
> > >
> >
> > I was not aware of the fact that multi-MSI needs to use the same
> > doorbell address (aka MSI posted write address?). Thank you for helping
> > to point it out. In this case, yes, like you said, we cannot possibly
> > support both multi-MSI and affinity at the same time, since supporting
> > affinity requires us to move from one to another event queue (and irq)
> > that will have different doorbell address.
> >
> > Do you think it makes sense to do the following by only advertising
> > multi-MSI capability in the single CPU core case (detected runtime via
> > 'num_possible_cpus')? This will at least allow multi-MSI to work in
> > platforms with single CPU core that Sandor and Pali use?
>
> I don't think this makes much sense. Single-CPU machines are an oddity
> these days, and I'd rather you simplify this (already pretty
> complicated) driver.
>
> > > There is also a more general problem here, which is the atomicity of
> > > the update on affinity change. If you are moving an interrupt from one
> > > CPU to the other, it seems you change both the vector number and the
> > > target address. If that is the case, this isn't atomic, and you may
> > > end-up with the device generating a message based on a half-applied
> > > update.
> >
> > Are you referring to the callback in 'irq_set_affinity' and
> > 'irq_compose_msi_msg'? In such case, can you help to recommend a
> > solution for it (or there's no solution based on such architecture)? It
> > does not appear such atomicity can be enforced from the irq framework level.
>
> irq_compose_msi_msg() is only one part of the problem. The core of the
> issue is that the programming of the end-point is not atomic (you need
> to update a 32bit payload *and* a 64bit address).
>
> A solution to workaround it would be to rework the way you allocate
> the vectors, making them constant across all CPUs so that only the
> address changes when changing the affinity.
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

[-- Attachment #2: 0001-PCI-iproc-fix-the-base-vector-number-allocation-for-.patch --]
[-- Type: text/x-diff, Size: 2239 bytes --]

From df31c9c0333ca4922b7978b30719348e368bea3c Mon Sep 17 00:00:00 2001
From: Sandor Bodo-Merle <sbodomerle@gmail.com>
Date: Wed, 26 May 2021 17:48:16 +0200
Subject: [PATCH] PCI: iproc: fix the base vector number allocation for Multi
 MSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
failed to reserve the proper number of bits from the inner domain.
Natural alignment of the base vector number was also not guaranteed.

Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
Reported-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Sandor Bodo-Merle <sbodomerle@gmail.com>
---
 drivers/pci/controller/pcie-iproc-msi.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git drivers/pci/controller/pcie-iproc-msi.c drivers/pci/controller/pcie-iproc-msi.c
index eede4e8f3f75..fa2734dd8482 100644
--- drivers/pci/controller/pcie-iproc-msi.c
+++ drivers/pci/controller/pcie-iproc-msi.c
@@ -252,18 +252,15 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
 
 	mutex_lock(&msi->bitmap_lock);
 
-	/* Allocate 'nr_cpus' number of MSI vectors each time */
-	hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
-					   msi->nr_cpus, 0);
-	if (hwirq < msi->nr_msi_vecs) {
-		bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
-	} else {
-		mutex_unlock(&msi->bitmap_lock);
-		return -ENOSPC;
-	}
+	/* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI vectors each time */
+	hwirq = bitmap_find_free_region(msi->bitmap, msi->nr_msi_vecs,
+					order_base_2(msi->nr_cpus * nr_irqs));
 
 	mutex_unlock(&msi->bitmap_lock);
 
+	if (hwirq < 0)
+		return -ENOSPC;
+
 	for (i = 0; i < nr_irqs; i++) {
 		irq_domain_set_info(domain, virq + i, hwirq + i,
 				    &iproc_msi_bottom_irq_chip,
@@ -284,7 +281,8 @@ static void iproc_msi_irq_domain_free(struct irq_domain *domain,
 	mutex_lock(&msi->bitmap_lock);
 
 	hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
-	bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
+	bitmap_release_region(msi->bitmap, hwirq,
+			      order_base_2(msi->nr_cpus * nr_irqs));
 
 	mutex_unlock(&msi->bitmap_lock);
 
-- 
2.31.0
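
For readers not familiar with these bitmap helpers, a worked example of
the sizing above (illustrative numbers, not taken from the thread): with
nr_cpus = 2 and nr_irqs = 4, order_base_2(2 * 4) = 3, so
bitmap_find_free_region() reserves a 2^3 = 8-slot region and only ever
returns offsets that are a multiple of that size. The base hwirq is
therefore naturally size-aligned, which is the property the commit
message notes the old bitmap_find_next_zero_area() call did not
guarantee.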


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-26  7:57             ` Marc Zyngier
  2021-05-26 16:10               ` Sandor Bodo-Merle
@ 2021-05-26 17:34               ` Ray Jui
  2021-05-26 20:18                 ` Sandor Bodo-Merle
  1 sibling, 1 reply; 20+ messages in thread
From: Ray Jui @ 2021-05-26 17:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sandor Bodo-Merle, Pali Rohár, linux-pci, bcm-kernel-feedback-list

[-- Attachment #1: Type: text/plain, Size: 4371 bytes --]



On 5/26/2021 12:57 AM, Marc Zyngier wrote:
> On Tue, 25 May 2021 18:27:54 +0100,
> Ray Jui <ray.jui@broadcom.com> wrote:
>>
>> On 5/24/2021 3:37 AM, Marc Zyngier wrote:
>>> On Thu, 20 May 2021 18:11:32 +0100,
>>> Ray Jui <ray.jui@broadcom.com> wrote:
>>>>
>>>> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> 
> [...]
> 
>>>> I guess I'm not too clear on what you mean by "multi-MSI interrupts
>>>> needs to be aligned to number of requested interrupts.". Would you be
>>>> able to plug this into the above explanation so we can have a more clear
>>>> understanding of what you mean here?
>>>
>>> That's a generic PCI requirement: if you are providing a Multi-MSI
>>> configuration, the base vector number has to be size-aligned
>>> (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
>>> supplies up to 5 bits that are orr-ed into the base vector number,
>>> with a *single* doorbell address. You effectively provide a single MSI
>>> number and a single address, and the device knows how to drive 2^n MSIs.
>>>
>>> This is different from MSI-X, which defines multiple individual
>>> vectors, each with their own doorbell address.
>>>
>>> The main problem you have here (other than the broken allocation
>>> mechanism) is that moving an interrupt from one core to another
>>> implies moving the doorbell address to that of another MSI
>>> group. This isn't possible for Multi-MSI, as all the MSIs must have
>>> the same doorbell address. As far as I can see, there is no way to
>>> support Multi-MSI together with affinity change on this HW, and you
>>> should stop advertising support for this feature.
>>>
>>
>> I was not aware of the fact that multi-MSI needs to use the same
>> doorbell address (aka MSI posted write address?). Thank you for helping
>> to point it out. In this case, yes, like you said, we cannot possibly
>> support both multi-MSI and affinity at the same time, since supporting
>> affinity requires us to move from one to another event queue (and irq)
>> that will have different doorbell address.
>>
>> Do you think it makes sense to do the following by only advertising
>> multi-MSI capability in the single CPU core case (detected runtime via
>> 'num_possible_cpus')? This will at least allow multi-MSI to work in
>> platforms with single CPU core that Sandor and Pali use?
> 
> I don't think this makes much sense. Single-CPU machines are an oddity
> these days, and I'd rather you simplify this (already pretty
> complicated) driver.
> 

Here's the thing. All Broadcom ARMv8-based SoCs have migrated to use
either gicv2m or gicv3-its for MSI/MSI-X support. The only platforms that
still use the iProc event-queue-based MSI are legacy ARMv7-based
platforms. Out of them:

NSP - dual core
Cygnus - single core
HR2 - single core

So based on this, it seems to me that it still makes sense to allow
Multi-MSI to be supported on single-core platforms, and Sandor's company
seems to need such support in their particular use case. Sandor, can you
confirm?

>>> There is also a more general problem here, which is the atomicity of
>>> the update on affinity change. If you are moving an interrupt from one
>>> CPU to the other, it seems you change both the vector number and the
>>> target address. If that is the case, this isn't atomic, and you may
>>> end-up with the device generating a message based on a half-applied
>>> update.
>>
>> Are you referring to the callback in 'irq_set_affinity' and
>> 'irq_compose_msi_msg'? In such case, can you help to recommend a
>> solution for it (or there's no solution based on such architecture)? It
>> does not appear such atomicity can be enforced from the irq framework level.
> 
> irq_compose_msi_msg() is only one part of the problem. The core of the
> issue is that the programming of the end-point is not atomic (you need
> to update a 32bit payload *and* a 64bit address).
> 
> A solution to workaround it would be to rework the way you allocate
> the vectors, making them constant across all CPUs so that only the
> address changes when changing the affinity.
> 

Thanks. This makes sense. And it looks like this can be addressed
separately from the above issue. I'll have to allocate time to work on
this. In addition, I'd need someone with an NSP dual-core platform to
test it for me, since we don't have these legacy platforms in our office
anymore.

> Thanks,
> 
> 	M.
> 

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4194 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-26 17:34               ` Ray Jui
@ 2021-05-26 20:18                 ` Sandor Bodo-Merle
  0 siblings, 0 replies; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-05-26 20:18 UTC (permalink / raw)
  To: Ray Jui
  Cc: Marc Zyngier, Pali Rohár, linux-pci, bcm-kernel-feedback-list

On Wed, May 26, 2021 at 7:34 PM Ray Jui <ray.jui@broadcom.com> wrote:
>
>
> Here's the thing. All Broadcom ARMv8 based SoCs have migrated to use
> either gicv2m or gicv3-its for MSI/MSIX support. The platforms that
> still use iProc event queue based MSI are only legacy ARMv7 based
> platforms. Out of them:
>
> NSP - dual core
> Cygnus - single core
> HR2 - single core
>
> So based on this, it seems to me that it still makes sense to allow
> multi-msi to be supported on single core platforms, and Sandor's company
> seems to need such support in their particular use case. Sandor, can you
> confirm?

Right - we are using it in production on legacy ARMv7 SOCs.

>
>
> Thanks. This makes sense. And it looks like this can be addressed
> separately from the above issue. I'll have to allocate time to work on
> this. In addition, I'd also need someone else with the NSP dual-core
> platform to test it for me since we don't have these legacy platforms in
> our office anymore.
>

I will be able to test patches on the XGS Katana2 SOC - which is dual core.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-05-26 16:10               ` Sandor Bodo-Merle
@ 2021-06-02  8:34                 ` Marc Zyngier
  2021-06-03 16:59                   ` Ray Jui
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-06-02  8:34 UTC (permalink / raw)
  To: Sandor Bodo-Merle
  Cc: Ray Jui, Pali Rohár, linux-pci, bcm-kernel-feedback-list

On Wed, 26 May 2021 17:10:24 +0100,
Sandor Bodo-Merle <sbodomerle@gmail.com> wrote:
> 
> [1  <text/plain; UTF-8 (7bit)>]
> The following patch addresses the allocation issue - but indeed - won't
> fix the atomicity of IRQ affinity changes in this driver (the majority
> of our products rely on single-core SOCs; we also use a dual-core SOC,
> but we don't change the initial IRQ affinity).
> 
> On Wed, May 26, 2021 at 9:57 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 25 May 2021 18:27:54 +0100,
> > Ray Jui <ray.jui@broadcom.com> wrote:
> > >
> > > On 5/24/2021 3:37 AM, Marc Zyngier wrote:
> > > > On Thu, 20 May 2021 18:11:32 +0100,
> > > > Ray Jui <ray.jui@broadcom.com> wrote:
> > > >>
> > > >> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> >
> > [...]
> >
> > > >> I guess I'm not too clear on what you mean by "multi-MSI interrupts
> > > >> needs to be aligned to number of requested interrupts.". Would you be
> > > >> able to plug this into the above explanation so we can have a more clear
> > > >> understanding of what you mean here?
> > > >
> > > > That's a generic PCI requirement: if you are providing a Multi-MSI
> > > > configuration, the base vector number has to be size-aligned
> > > > (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
> > > > supplies up to 5 bits that are orr-ed into the base vector number,
> > > > with a *single* doorbell address. You effectively provide a single MSI
> > > > number and a single address, and the device knows how to drive 2^n MSIs.
> > > >
> > > > This is different from MSI-X, which defines multiple individual
> > > > vectors, each with their own doorbell address.
> > > >
> > > > The main problem you have here (other than the broken allocation
> > > > mechanism) is that moving an interrupt from one core to another
> > > > implies moving the doorbell address to that of another MSI
> > > > group. This isn't possible for Multi-MSI, as all the MSIs must have
> > > > the same doorbell address. As far as I can see, there is no way to
> > > > support Multi-MSI together with affinity change on this HW, and you
> > > > should stop advertising support for this feature.
> > > >
> > >
> > > I was not aware of the fact that multi-MSI needs to use the same
> > > doorbell address (aka MSI posted write address?). Thank you for helping
> > > to point it out. In this case, yes, like you said, we cannot possibly
> > > support both multi-MSI and affinity at the same time, since supporting
> > > affinity requires us to move from one to another event queue (and irq)
> > > that will have different doorbell address.
> > >
> > > Do you think it makes sense to do the following by only advertising
> > > multi-MSI capability in the single CPU core case (detected runtime via
> > > 'num_possible_cpus')? This will at least allow multi-MSI to work in
> > > platforms with single CPU core that Sandor and Pali use?
> >
> > I don't think this makes much sense. Single-CPU machines are an oddity
> > these days, and I'd rather you simplify this (already pretty
> > complicated) driver.
> >
> > > > There is also a more general problem here, which is the atomicity of
> > > > the update on affinity change. If you are moving an interrupt from one
> > > > CPU to the other, it seems you change both the vector number and the
> > > > target address. If that is the case, this isn't atomic, and you may
> > > > end-up with the device generating a message based on a half-applied
> > > > update.
> > >
> > > Are you referring to the callback in 'irq_set_affinity' and
> > > 'irq_compose_msi_msg'? In such case, can you help to recommend a
> > > solution for it (or there's no solution based on such architecture)? It
> > > does not appear such atomicity can be enforced from the irq framework level.
> >
> > irq_compose_msi_msg() is only one part of the problem. The core of the
> > issue is that the programming of the end-point is not atomic (you need
> > to update a 32bit payload *and* a 64bit address).
> >
> > A solution to workaround it would be to rework the way you allocate
> > the vectors, making them constant across all CPUs so that only the
> > address changes when changing the affinity.
> >
> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
> [2 0001-PCI-iproc-fix-the-base-vector-number-allocation-for-.patch <text/x-diff; UTF-8 (base64)>]
> From df31c9c0333ca4922b7978b30719348e368bea3c Mon Sep 17 00:00:00 2001
> From: Sandor Bodo-Merle <sbodomerle@gmail.com>
> Date: Wed, 26 May 2021 17:48:16 +0200
> Subject: [PATCH] PCI: iproc: fix the base vector number allocation for Multi
>  MSI
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> failed to reserve the proper number of bits from the inner domain.
> Natural alignment of the base vector number was also not guaranteed.
> 
> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> Reported-by: Pali Rohár <pali@kernel.org>
> Signed-off-by: Sandor Bodo-Merle <sbodomerle@gmail.com>
> ---
>  drivers/pci/controller/pcie-iproc-msi.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git drivers/pci/controller/pcie-iproc-msi.c drivers/pci/controller/pcie-iproc-msi.c
> index eede4e8f3f75..fa2734dd8482 100644
> --- drivers/pci/controller/pcie-iproc-msi.c
> +++ drivers/pci/controller/pcie-iproc-msi.c
> @@ -252,18 +252,15 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>  
>  	mutex_lock(&msi->bitmap_lock);
>  
> -	/* Allocate 'nr_cpus' number of MSI vectors each time */
> -	hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> -					   msi->nr_cpus, 0);
> -	if (hwirq < msi->nr_msi_vecs) {
> -		bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
> -	} else {
> -		mutex_unlock(&msi->bitmap_lock);
> -		return -ENOSPC;
> -	}
> +	/* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI vectors each time */
> +	hwirq = bitmap_find_free_region(msi->bitmap, msi->nr_msi_vecs,
> +					order_base_2(msi->nr_cpus * nr_irqs));
>  
>  	mutex_unlock(&msi->bitmap_lock);
>  
> +	if (hwirq < 0)
> +		return -ENOSPC;
> +
>  	for (i = 0; i < nr_irqs; i++) {
>  		irq_domain_set_info(domain, virq + i, hwirq + i,
>  				    &iproc_msi_bottom_irq_chip,
> @@ -284,7 +281,8 @@ static void iproc_msi_irq_domain_free(struct irq_domain *domain,
>  	mutex_lock(&msi->bitmap_lock);
>  
>  	hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
> -	bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
> +	bitmap_release_region(msi->bitmap, hwirq,
> +			      order_base_2(msi->nr_cpus * nr_irqs));
>  
>  	mutex_unlock(&msi->bitmap_lock);
>  

This looks reasonable. However, this doesn't change the issue that you
have with SMP systems and Multi-MSI. I'd like to see a more complete
patch (disabling Multi-MSI on SMP, at the very least).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-06-02  8:34                 ` Marc Zyngier
@ 2021-06-03 16:59                   ` Ray Jui
  2021-06-04  9:17                     ` Sandor Bodo-Merle
  0 siblings, 1 reply; 20+ messages in thread
From: Ray Jui @ 2021-06-03 16:59 UTC (permalink / raw)
  To: Marc Zyngier, Sandor Bodo-Merle
  Cc: Pali Rohár, linux-pci, bcm-kernel-feedback-list

[-- Attachment #1: Type: text/plain, Size: 7116 bytes --]



On 6/2/2021 1:34 AM, Marc Zyngier wrote:
> On Wed, 26 May 2021 17:10:24 +0100,
> Sandor Bodo-Merle <sbodomerle@gmail.com> wrote:
>>
>> [1  <text/plain; UTF-8 (7bit)>]
>> The following patch addresses the allocation issue - but indeed - won't
>> fix the atomicity of IRQ affinity changes in this driver (the majority
>> of our products rely on single-core SOCs; we also use a dual-core SOC,
>> but we don't change the initial IRQ affinity).
>>
>> On Wed, May 26, 2021 at 9:57 AM Marc Zyngier <maz@kernel.org> wrote:
>>>
>>> On Tue, 25 May 2021 18:27:54 +0100,
>>> Ray Jui <ray.jui@broadcom.com> wrote:
>>>>
>>>> On 5/24/2021 3:37 AM, Marc Zyngier wrote:
>>>>> On Thu, 20 May 2021 18:11:32 +0100,
>>>>> Ray Jui <ray.jui@broadcom.com> wrote:
>>>>>>
>>>>>> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
>>>
>>> [...]
>>>
>>>>>> I guess I'm not too clear on what you mean by "multi-MSI interrupts
>>>>>> needs to be aligned to number of requested interrupts.". Would you be
>>>>>> able to plug this into the above explanation so we can have a more clear
>>>>>> understanding of what you mean here?
>>>>>
>>>>> That's a generic PCI requirement: if you are providing a Multi-MSI
>>>>> configuration, the base vector number has to be size-aligned
>>>>> (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
>>>>> supplies up to 5 bits that are orr-ed into the base vector number,
>>>>> with a *single* doorbell address. You effectively provide a single MSI
>>>>> number and a single address, and the device knows how to drive 2^n MSIs.
>>>>>
>>>>> This is different from MSI-X, which defines multiple individual
>>>>> vectors, each with their own doorbell address.
>>>>>
>>>>> The main problem you have here (other than the broken allocation
>>>>> mechanism) is that moving an interrupt from one core to another
>>>>> implies moving the doorbell address to that of another MSI
>>>>> group. This isn't possible for Multi-MSI, as all the MSIs must have
>>>>> the same doorbell address. As far as I can see, there is no way to
>>>>> support Multi-MSI together with affinity change on this HW, and you
>>>>> should stop advertising support for this feature.
>>>>>
>>>>
>>>> I was not aware of the fact that multi-MSI needs to use the same
>>>> doorbell address (aka MSI posted write address?). Thank you for helping
>>>> to point it out. In this case, yes, like you said, we cannot possibly
>>>> support both multi-MSI and affinity at the same time, since supporting
>>>> affinity requires us to move from one to another event queue (and irq)
>>>> that will have different doorbell address.
>>>>
>>>> Do you think it makes sense to do the following by only advertising
>>>> multi-MSI capability in the single CPU core case (detected runtime via
>>>> 'num_possible_cpus')? This will at least allow multi-MSI to work in
>>>> platforms with single CPU core that Sandor and Pali use?
>>>
>>> I don't think this makes much sense. Single-CPU machines are an oddity
>>> these days, and I'd rather you simplify this (already pretty
>>> complicated) driver.
>>>
>>>>> There is also a more general problem here, which is the atomicity of
>>>>> the update on affinity change. If you are moving an interrupt from one
>>>>> CPU to the other, it seems you change both the vector number and the
>>>>> target address. If that is the case, this isn't atomic, and you may
>>>>> end-up with the device generating a message based on a half-applied
>>>>> update.
>>>>
>>>> Are you referring to the callback in 'irq_set_affinity' and
>>>> 'irq_compose_msi_msg'? In such case, can you help to recommend a
>>>> solution for it (or there's no solution based on such architecture)? It
>>>> does not appear such atomicity can be enforced from the irq framework level.
>>>
>>> irq_compose_msi_msg() is only one part of the problem. The core of the
>>> issue is that the programming of the end-point is not atomic (you need
>>> to update a 32bit payload *and* a 64bit address).
>>>
>>> A solution to workaround it would be to rework the way you allocate
>>> the vectors, making them constant across all CPUs so that only the
>>> address changes when changing the affinity.
>>>
>>> Thanks,
>>>
>>>         M.
>>>
>>> --
>>> Without deviation from the norm, progress is not possible.
>> [2 0001-PCI-iproc-fix-the-base-vector-number-allocation-for-.patch <text/x-diff; UTF-8 (base64)>]
>> From df31c9c0333ca4922b7978b30719348e368bea3c Mon Sep 17 00:00:00 2001
>> From: Sandor Bodo-Merle <sbodomerle@gmail.com>
>> Date: Wed, 26 May 2021 17:48:16 +0200
>> Subject: [PATCH] PCI: iproc: fix the base vector number allocation for Multi
>>  MSI
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>> failed to reserve the proper number of bits from the inner domain.
>> Natural alignment of the base vector number was also not guaranteed.
>>
>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>> Reported-by: Pali Rohár <pali@kernel.org>
>> Signed-off-by: Sandor Bodo-Merle <sbodomerle@gmail.com>
>> ---
>>  drivers/pci/controller/pcie-iproc-msi.c | 18 ++++++++----------
>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git drivers/pci/controller/pcie-iproc-msi.c drivers/pci/controller/pcie-iproc-msi.c
>> index eede4e8f3f75..fa2734dd8482 100644
>> --- drivers/pci/controller/pcie-iproc-msi.c
>> +++ drivers/pci/controller/pcie-iproc-msi.c
>> @@ -252,18 +252,15 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>>  
>>  	mutex_lock(&msi->bitmap_lock);
>>  
>> -	/* Allocate 'nr_cpus' number of MSI vectors each time */
>> -	hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
>> -					   msi->nr_cpus, 0);
>> -	if (hwirq < msi->nr_msi_vecs) {
>> -		bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
>> -	} else {
>> -		mutex_unlock(&msi->bitmap_lock);
>> -		return -ENOSPC;
>> -	}
>> +	/* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI vectors each time */
>> +	hwirq = bitmap_find_free_region(msi->bitmap, msi->nr_msi_vecs,
>> +					order_base_2(msi->nr_cpus * nr_irqs));
>>  
>>  	mutex_unlock(&msi->bitmap_lock);
>>  
>> +	if (hwirq < 0)
>> +		return -ENOSPC;
>> +
>>  	for (i = 0; i < nr_irqs; i++) {
>>  		irq_domain_set_info(domain, virq + i, hwirq + i,
>>  				    &iproc_msi_bottom_irq_chip,
>> @@ -284,7 +281,8 @@ static void iproc_msi_irq_domain_free(struct irq_domain *domain,
>>  	mutex_lock(&msi->bitmap_lock);
>>  
>>  	hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
>> -	bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
>> +	bitmap_release_region(msi->bitmap, hwirq,
>> +			      order_base_2(msi->nr_cpus * nr_irqs));
>>  
>>  	mutex_unlock(&msi->bitmap_lock);
>>  
> 
> This looks reasonable. However, this doesn't change the issue that you
> have with SMP systems and Multi-MSI. I'd like to see a more complete
> patch (disabling Multi-MSI on SMP, at the very least).
> 

Yeah, agree with you that we want to see this patch at least disable
Multi-MSI when it detects 'num_possible_cpus' > 1.


> Thanks,
> 
> 	M.
> 

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4194 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-06-03 16:59                   ` Ray Jui
@ 2021-06-04  9:17                     ` Sandor Bodo-Merle
  2021-06-04 16:47                       ` Ray Jui
  0 siblings, 1 reply; 20+ messages in thread
From: Sandor Bodo-Merle @ 2021-06-04  9:17 UTC (permalink / raw)
  To: Ray Jui
  Cc: Marc Zyngier, Pali Rohár, linux-pci, bcm-kernel-feedback-list

Would something like this work on top of the previous patch - or does it
need more checks, like the "nr_irqs" handling in iproc_msi_irq_domain_alloc()?

diff --git drivers/pci/controller/pcie-iproc-msi.c
drivers/pci/controller/pcie-iproc-msi.c
index eede4e8f3f75..49e9d1a761ff 100644
--- drivers/pci/controller/pcie-iproc-msi.c
+++ drivers/pci/controller/pcie-iproc-msi.c
@@ -171,7 +171,7 @@ static struct irq_chip iproc_msi_irq_chip = {

 static struct msi_domain_info iproc_msi_domain_info = {
        .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-               MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
+              MSI_FLAG_PCI_MSIX,
        .chip = &iproc_msi_irq_chip,
 };

@@ -539,6 +539,9 @@ int iproc_msi_init(struct iproc_pcie *pcie, struct
device_node *node)
        mutex_init(&msi->bitmap_lock);
        msi->nr_cpus = num_possible_cpus();

+       if (msi->nr_cpus == 1)
+               iproc_msi_domain_info.flags |=  MSI_FLAG_MULTI_PCI_MSI;
+
        msi->nr_irqs = of_irq_count(node);
        if (!msi->nr_irqs) {
                dev_err(pcie->dev, "found no MSI GIC interrupt\n");

On Thu, Jun 3, 2021 at 6:59 PM Ray Jui <ray.jui@broadcom.com> wrote:
>
>
>
> On 6/2/2021 1:34 AM, Marc Zyngier wrote:
> > On Wed, 26 May 2021 17:10:24 +0100,
> > Sandor Bodo-Merle <sbodomerle@gmail.com> wrote:
> >>
> >> [1  <text/plain; UTF-8 (7bit)>]
> >> The following patch addresses the allocation issue - but indeed - won't
> >> fix the atomicity of IRQ affinity changes in this driver (the majority
> >> of our products rely on single-core SOCs; we also use a dual-core SOC,
> >> but we don't change the initial IRQ affinity).
> >>
> >> On Wed, May 26, 2021 at 9:57 AM Marc Zyngier <maz@kernel.org> wrote:
> >>>
> >>> On Tue, 25 May 2021 18:27:54 +0100,
> >>> Ray Jui <ray.jui@broadcom.com> wrote:
> >>>>
> >>>> On 5/24/2021 3:37 AM, Marc Zyngier wrote:
> >>>>> On Thu, 20 May 2021 18:11:32 +0100,
> >>>>> Ray Jui <ray.jui@broadcom.com> wrote:
> >>>>>>
> >>>>>> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> >>>
> >>> [...]
> >>>
> >>>>>> I guess I'm not too clear on what you mean by "multi-MSI interrupts
> >>>>>> needs to be aligned to number of requested interrupts.". Would you be
> >>>>>> able to plug this into the above explanation so we can have a more clear
> >>>>>> understanding of what you mean here?
> >>>>>
> >>>>> That's a generic PCI requirement: if you are providing a Multi-MSI
> >>>>> configuration, the base vector number has to be size-aligned
> >>>>> (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
> >>>>> supplies up to 5 bits that are orr-ed into the base vector number,
> >>>>> with a *single* doorbell address. You effectively provide a single MSI
> >>>>> number and a single address, and the device knows how to drive 2^n MSIs.
> >>>>>
> >>>>> This is different from MSI-X, which defines multiple individual
> >>>>> vectors, each with their own doorbell address.
> >>>>>
> >>>>> The main problem you have here (other than the broken allocation
> >>>>> mechanism) is that moving an interrupt from one core to another
> >>>>> implies moving the doorbell address to that of another MSI
> >>>>> group. This isn't possible for Multi-MSI, as all the MSIs must have
> >>>>> the same doorbell address. As far as I can see, there is no way to
> >>>>> support Multi-MSI together with affinity change on this HW, and you
> >>>>> should stop advertising support for this feature.
> >>>>>
> >>>>
> >>>> I was not aware of the fact that multi-MSI needs to use the same
> >>>> doorbell address (aka MSI posted write address?). Thank you for helping
> >>>> to point it out. In this case, yes, like you said, we cannot possibly
> >>>> support both multi-MSI and affinity at the same time, since supporting
> >>>> affinity requires us to move from one to another event queue (and irq)
> >>>> that will have different doorbell address.
> >>>>
> >>>> Do you think it makes sense to do the following by only advertising
> >>>> multi-MSI capability in the single CPU core case (detected runtime via
> >>>> 'num_possible_cpus')? This will at least allow multi-MSI to work in
> >>>> platforms with single CPU core that Sandor and Pali use?
> >>>
> >>> I don't think this makes much sense. Single-CPU machines are an oddity
> >>> these days, and I'd rather you simplify this (already pretty
> >>> complicated) driver.
> >>>
> >>>>> There is also a more general problem here, which is the atomicity of
> >>>>> the update on affinity change. If you are moving an interrupt from one
> >>>>> CPU to the other, it seems you change both the vector number and the
> >>>>> target address. If that is the case, this isn't atomic, and you may
> >>>>> end-up with the device generating a message based on a half-applied
> >>>>> update.
> >>>>
> >>>> Are you referring to the callback in 'irq_set_affinity' and
> >>>> 'irq_compose_msi_msg'? In such case, can you help to recommend a
> >>>> solution for it (or there's no solution based on such architecture)? It
> >>>> does not appear such atomicity can be enforced from the irq framework level.
> >>>
> >>> irq_compose_msi_msg() is only one part of the problem. The core of the
> >>> issue is that the programming of the end-point is not atomic (you need
> >>> to update a 32bit payload *and* a 64bit address).
> >>>
> >>> A solution to workaround it would be to rework the way you allocate
> >>> the vectors, making them constant across all CPUs so that only the
> >>> address changes when changing the affinity.
> >>>
> >>> Thanks,
> >>>
> >>>         M.
> >>>
> >>> --
> >>> Without deviation from the norm, progress is not possible.
> >> [2 0001-PCI-iproc-fix-the-base-vector-number-allocation-for-.patch <text/x-diff; UTF-8 (base64)>]
> >> From df31c9c0333ca4922b7978b30719348e368bea3c Mon Sep 17 00:00:00 2001
> >> From: Sandor Bodo-Merle <sbodomerle@gmail.com>
> >> Date: Wed, 26 May 2021 17:48:16 +0200
> >> Subject: [PATCH] PCI: iproc: fix the base vector number allocation for Multi
> >>  MSI
> >> MIME-Version: 1.0
> >> Content-Type: text/plain; charset=UTF-8
> >> Content-Transfer-Encoding: 8bit
> >>
> >> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >> failed to reserve the proper number of bits from the inner domain.
> >> Natural alignment of the base vector number was also not guaranteed.
> >>
> >> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >> Reported-by: Pali Rohár <pali@kernel.org>
> >> Signed-off-by: Sandor Bodo-Merle <sbodomerle@gmail.com>
> >> ---
> >>  drivers/pci/controller/pcie-iproc-msi.c | 18 ++++++++----------
> >>  1 file changed, 8 insertions(+), 10 deletions(-)
> >>
> >> diff --git drivers/pci/controller/pcie-iproc-msi.c drivers/pci/controller/pcie-iproc-msi.c
> >> index eede4e8f3f75..fa2734dd8482 100644
> >> --- drivers/pci/controller/pcie-iproc-msi.c
> >> +++ drivers/pci/controller/pcie-iproc-msi.c
> >> @@ -252,18 +252,15 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
> >>
> >>      mutex_lock(&msi->bitmap_lock);
> >>
> >> -    /* Allocate 'nr_cpus' number of MSI vectors each time */
> >> -    hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> >> -                                       msi->nr_cpus, 0);
> >> -    if (hwirq < msi->nr_msi_vecs) {
> >> -            bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
> >> -    } else {
> >> -            mutex_unlock(&msi->bitmap_lock);
> >> -            return -ENOSPC;
> >> -    }
> >> +    /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI vectors each time */
> >> +    hwirq = bitmap_find_free_region(msi->bitmap, msi->nr_msi_vecs,
> >> +                                    order_base_2(msi->nr_cpus * nr_irqs));
> >>
> >>      mutex_unlock(&msi->bitmap_lock);
> >>
> >> +    if (hwirq < 0)
> >> +            return -ENOSPC;
> >> +
> >>      for (i = 0; i < nr_irqs; i++) {
> >>              irq_domain_set_info(domain, virq + i, hwirq + i,
> >>                                  &iproc_msi_bottom_irq_chip,
> >> @@ -284,7 +281,8 @@ static void iproc_msi_irq_domain_free(struct irq_domain *domain,
> >>      mutex_lock(&msi->bitmap_lock);
> >>
> >>      hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
> >> -    bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
> >> +    bitmap_release_region(msi->bitmap, hwirq,
> >> +                          order_base_2(msi->nr_cpus * nr_irqs));
> >>
> >>      mutex_unlock(&msi->bitmap_lock);
> >>
> >
> > This looks reasonable. However, this doesn't change the issue that you
> > have with SMP systems and Multi-MSI. I'd like to see a more complete
> > patch (disabling Multi-MSI on SMP, at the very least).
> >
>
> Yeah, agree with you that we want to see this patch at least disable
> Multi-MSI when it detects 'num_possible_cpus' > 1.
>
>
> > Thanks,
> >
> >       M.
> >

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
  2021-06-04  9:17                     ` Sandor Bodo-Merle
@ 2021-06-04 16:47                       ` Ray Jui
  0 siblings, 0 replies; 20+ messages in thread
From: Ray Jui @ 2021-06-04 16:47 UTC (permalink / raw)
  To: Sandor Bodo-Merle
  Cc: Marc Zyngier, Pali Rohár, linux-pci, bcm-kernel-feedback-list

[-- Attachment #1: Type: text/plain, Size: 1354 bytes --]



On 6/4/2021 2:17 AM, Sandor Bodo-Merle wrote:
> Would something like this work on top of the previous patch - or does it
> need more checks, like the "nr_irqs" handling in iproc_msi_irq_domain_alloc()?
> 
> diff --git drivers/pci/controller/pcie-iproc-msi.c
> drivers/pci/controller/pcie-iproc-msi.c
> index eede4e8f3f75..49e9d1a761ff 100644
> --- drivers/pci/controller/pcie-iproc-msi.c
> +++ drivers/pci/controller/pcie-iproc-msi.c
> @@ -171,7 +171,7 @@ static struct irq_chip iproc_msi_irq_chip = {
> 
>  static struct msi_domain_info iproc_msi_domain_info = {
>         .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
> -               MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
> +              MSI_FLAG_PCI_MSIX,
>         .chip = &iproc_msi_irq_chip,
>  };
> 
> @@ -539,6 +539,9 @@ int iproc_msi_init(struct iproc_pcie *pcie, struct
> device_node *node)
>         mutex_init(&msi->bitmap_lock);
>         msi->nr_cpus = num_possible_cpus();
> 
> +       if (msi->nr_cpus == 1)
> +               iproc_msi_domain_info.flags |=  MSI_FLAG_MULTI_PCI_MSI;
> +

That looks right to me. Please also add comments explaining that we cannot
support Multi-MSI and MSI affinity at the same time.

Thanks.

>         msi->nr_irqs = of_irq_count(node);
>         if (!msi->nr_irqs) {
>                 dev_err(pcie->dev, "found no MSI GIC interrupt\n");
> 
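
One possible wording for that comment, based on Sandor's hunk above (a
suggestion only, not necessarily what was eventually applied):

	mutex_init(&msi->bitmap_lock);
	msi->nr_cpus = num_possible_cpus();

	/*
	 * Multi-MSI requires all vectors of a function to share a single
	 * doorbell address, while changing IRQ affinity on this hardware
	 * means switching to a different event queue (and doorbell), so
	 * the two cannot be supported at the same time.  Only advertise
	 * Multi-MSI when there is a single possible CPU.
	 */
	if (msi->nr_cpus == 1)
		iproc_msi_domain_info.flags |= MSI_FLAG_MULTI_PCI_MSI;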


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4194 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-06-04 16:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-20 12:00 pcie-iproc-msi.c: Bug in Multi-MSI support? Pali Rohár
2021-05-20 13:47 ` Sandor Bodo-Merle
2021-05-20 14:05   ` Pali Rohár
2021-05-20 14:22     ` Sandor Bodo-Merle
2021-05-20 17:11       ` Ray Jui
2021-05-20 19:31         ` Sandor Bodo-Merle
2021-05-21 11:39         ` Sandor Bodo-Merle
2021-05-21 14:00           ` Sandor Bodo-Merle
2021-05-24 10:37         ` Marc Zyngier
2021-05-25 17:27           ` Ray Jui
2021-05-25 17:35             ` Pali Rohár
2021-05-25 17:39               ` Ray Jui
2021-05-26  7:57             ` Marc Zyngier
2021-05-26 16:10               ` Sandor Bodo-Merle
2021-06-02  8:34                 ` Marc Zyngier
2021-06-03 16:59                   ` Ray Jui
2021-06-04  9:17                     ` Sandor Bodo-Merle
2021-06-04 16:47                       ` Ray Jui
2021-05-26 17:34               ` Ray Jui
2021-05-26 20:18                 ` Sandor Bodo-Merle

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).