On 5/24/2021 3:37 AM, Marc Zyngier wrote:
> On Thu, 20 May 2021 18:11:32 +0100,
> Ray Jui <ray.jui@broadcom.com> wrote:
>>
>> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
>>> On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@kernel.org> wrote:
>>>>
>>>> Hello!
>>>>
>>>> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
>>>>> Hi Pali,
>>>>>
>>>>> thanks for catching this - i dig up the followup fixup commit we have
>>>>> for the iproc multi MSI (it was sent to Broadcom - but unfortunately
>>>>> we missed upstreaming it).
>>>>>
>>>>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>>>>> failed to reserve the proper number of bits from the inner domain.
>>>>> We need to allocate the proper amount of bits otherwise the domains for
>>>>> multiple PCIe endpoints may overlap and freeing one of them will result
>>>>> in freeing unrelated MSI vectors.
>>>>>
>>>>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
>>>>> ---
>>>>>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
>>>>> index 708fdb1065f8..a00492dccb74 100644
>>>>> --- drivers/pci/host/pcie-iproc-msi.c
>>>>> +++ drivers/pci/host/pcie-iproc-msi.c
>>>>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct
>>>>> irq_domain *domain,
>>>>>
>>>>>         mutex_lock(&msi->bitmap_lock);
>>>>>
>>>>> -       /* Allocate 'nr_cpus' number of MSI vectors each time */
>>>>> +       /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI
>>>>> vectors each time */
>>>>>         hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
>>>>> -                                          msi->nr_cpus, 0);
>>>>> +                                          msi->nr_cpus * nr_irqs, 0);
>>>>
>>>> I'm not sure if this construction is correct. Multi-MSI interrupts needs
>>>> to be aligned to number of requested interrupts. So if wifi driver asks
>>>> for 32 Multi-MSI interrupts then first allocated interrupt number must
>>>> be dividable by 32.
>>>>
>>>
>>> Ahh - i guess you are right. In our internal engineering we always
>>> request 32 vectors.
>>> IIRC the multiply by "nr_irqs" was added for iqr affinity to work correctly.
>>>
>>
>> May I ask which platforms are you guys running this driver on? Cygnus or
>> Northstar? Not that it matters, but just out of curiosity.
>>
>> Let me start by explaining how MSI support works in this driver, or more
>> precisely, for all platforms that support this iProc based event queue
>> MSI scheme:
>>
>> In iProc PCIe core, each MSI group is serviced by a GIC interrupt
>> (hwirq) and a dedicated event queue (event queue is paired up with
>> hwirq).  Each MSI group can support up to 64 MSI vectors. Note 64 is the
>> depth of the event queue.
>>
>> The number of MSI groups varies between different iProc SoCs. The total
>> number of CPU cores also varies. To support MSI IRQ affinity, we
>> distribute GIC interrupts across all available CPUs.  MSI vector is
>> moved from one GIC interrupt to another to steer to the target CPU.
>>
>> Assuming:
>> The number of MSI groups (the number of total hwirq for this PCIe
>> controller) is M
>> The number of CPU cores is N
>> M is always a multiple of N (we ensured that in the setup function)
>>
>> Therefore:
>> Total number of raw MSI vectors = M * 64
>> Total number of supported MSI vectors = (M * 64) / N
>>
>> I guess I'm not too clear on what you mean by "multi-MSI interrupts
>> needs to be aligned to number of requested interrupts.". Would you be
>> able to plug this into the above explanation so we can have a more clear
>> understanding of what you mean here?
> 
> That's a generic PCI requirement: if you are providing a Multi-MSI
> configuration, the base vector number has to be size-aligned
> (2-aligned for 2 MSIs, 4 aligned for 4, up to 32), and the end-point
> supplies up to 5 bits that are orr-ed into the base vector number,
> with a *single* doorbell address. You effectively provide a single MSI
> number and a single address, and the device knows how to drive 2^n MSIs.
> 
> This is different from MSI-X, which defines multiple individual
> vectors, each with their own doorbell address.
> 
> The main problem you have here (other than the broken allocation
> mechanism) is that moving an interrupt from one core to another
> implies moving the doorbell address to that of another MSI
> group. This isn't possible for Multi-MSI, as all the MSIs must have
> the same doorbell address. As far as I can see, there is no way to
> support Multi-MSI together with affinity change on this HW, and you
> should stop advertising support for this feature.
> 

I was not aware of the fact that multi-MSI needs to use the same
doorbell address (aka MSI posted write address?). Thank you for helping
to point it out. In this case, yes, like you said, we cannot possibly
support both multi-MSI and affinity at the same time, since supporting
affinity requires us to move from one to another event queue (and irq)
that will have different doorbell address.

Do you think it makes sense to do the following by only advertising
multi-MSI capability in the single CPU core case (detected runtime via
'num_possible_cpus')? This will at least allow multi-MSI to work in
platforms with single CPU core that Sandor and Pali use?

> There is also a more general problem here, which is the atomicity of
> the update on affinity change. If you are moving an interrupt from one
> CPU to the other, it seems you change both the vector number and the
> target address. If that is the case, this isn't atomic, and you may
> end-up with the device generating a message based on a half-applied
> update.

Are you referring to the callback in 'irq_set_addinity" and
'irq_compose_msi_msg'? In such case, can you help to recommend a
solution for it (or there's no solution based on such architecture)? It
does not appear such atomy can be enforced from the irq framework level.

Thanks,

Ray

> 
> Thanks,
> 
> 	M.
>