Date: Mon, 24 May 2021 11:37:15 +0100
Message-ID: <87pmxgwh7o.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ray Jui <ray.jui@broadcom.com>
Cc: Sandor Bodo-Merle <sbodomerle@gmail.com>,
	Pali Rohár <pali@kernel.org>,
	linux-pci@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com
Subject: Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
In-Reply-To: <4e972ecb-43df-639f-052d-8d1518bae9c0@broadcom.com>
References: <20210520120055.jl7vkqanv7wzeipq@pali>
	<20210520140529.rczoz3npjoadzfqc@pali>
	<4e972ecb-43df-639f-052d-8d1518bae9c0@broadcom.com>

On Thu, 20 May 2021 18:11:32 +0100,
Ray Jui wrote:
> 
> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> > On Thu, May 20, 2021 at 4:05 PM Pali Rohár wrote:
> >>
> >> Hello!
> >>
> >> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote:
> >>> Hi Pali,
> >>>
> >>> Thanks for catching this - I dug up the follow-up fixup commit we
> >>> have for the iproc multi-MSI (it was sent to Broadcom - but
> >>> unfortunately we missed upstreaming it).
> >>>
> >>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >>> failed to reserve the proper number of bits from the inner domain.
> >>> We need to allocate the proper number of bits, otherwise the
> >>> domains for multiple PCIe endpoints may overlap and freeing one of
> >>> them will result in freeing unrelated MSI vectors.
> >>>
> >>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> >>> ---
> >>>  drivers/pci/host/pcie-iproc-msi.c | 8 ++++----
> >>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c
> >>> index 708fdb1065f8..a00492dccb74 100644
> >>> --- drivers/pci/host/pcie-iproc-msi.c
> >>> +++ drivers/pci/host/pcie-iproc-msi.c
> >>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
> >>>
> >>>  	mutex_lock(&msi->bitmap_lock);
> >>>
> >>> -	/* Allocate 'nr_cpus' number of MSI vectors each time */
> >>> +	/* Allocate 'nr_irqs' multiplied by 'nr_cpus' MSI vectors each time */
> >>>  	hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> >>> -					   msi->nr_cpus, 0);
> >>> +					   msi->nr_cpus * nr_irqs, 0);
> >>
> >> I'm not sure this construction is correct. Multi-MSI interrupts need
> >> to be aligned to the number of requested interrupts. So if a wifi
> >> driver asks for 32 Multi-MSI interrupts, then the first allocated
> >> interrupt number must be divisible by 32.
> >>
> > 
> > Ahh - I guess you are right. In our internal engineering we always
> > request 32 vectors.
> > IIRC the multiplication by "nr_irqs" was added for IRQ affinity to
> > work correctly.
> > 
> 
> May I ask which platforms you guys are running this driver on? Cygnus
> or Northstar? Not that it matters, but just out of curiosity.
> 
> Let me start by explaining how MSI support works in this driver, or
> more precisely, for all platforms that use this iProc event-queue-based
> MSI scheme:
> 
> In the iProc PCIe core, each MSI group is serviced by a GIC interrupt
> (hwirq) and a dedicated event queue (the event queue is paired up with
> the hwirq). Each MSI group can support up to 64 MSI vectors. Note that
> 64 is the depth of the event queue.
> 
> The number of MSI groups varies between different iProc SoCs. The
> total number of CPU cores also varies. To support MSI IRQ affinity, we
> distribute the GIC interrupts across all available CPUs. An MSI vector
> is moved from one GIC interrupt to another to steer it to the target
> CPU.
> 
> Assuming:
> The number of MSI groups (the total number of hwirqs for this PCIe
> controller) is M
> The number of CPU cores is N
> M is always a multiple of N (we ensured that in the setup function)
> 
> Therefore:
> Total number of raw MSI vectors = M * 64
> Total number of supported MSI vectors = (M * 64) / N
> 
> I guess I'm not too clear on what you mean by "Multi-MSI interrupts
> need to be aligned to the number of requested interrupts". Would you
> be able to plug this into the above explanation so we can have a
> clearer understanding of what you mean here?

That's a generic PCI requirement: if you are providing a Multi-MSI
configuration, the base vector number has to be size-aligned
(2-aligned for 2 MSIs, 4-aligned for 4, up to 32), and the endpoint
supplies up to 5 bits that are OR-ed into the base vector number, with
a *single* doorbell address. You effectively provide a single MSI
number and a single address, and the device knows how to drive 2^n
MSIs.

This is different from MSI-X, which defines multiple individual
vectors, each with their own doorbell address.
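To put some numbers on it (a made-up example, not taken from your
driver): say a function is granted 8 vectors with base vector 0x28,
which is 8-aligned. The function signals vector n by OR-ing n into the
base, along the lines of:

	#include <stdio.h>

	int main(void)
	{
		unsigned int base = 0x28;	/* hypothetical 8-aligned base vector */
		unsigned int i;

		/* a function granted 2^3 vectors ORs its index into the base */
		for (i = 0; i < 8; i++)
			printf("MSI %u -> message data 0x%02x\n", i, base | i);

		/*
		 * With a misaligned base (say 0x2a), 0x2a | 2 is still
		 * 0x2a: the OR stops selecting distinct vectors, which
		 * is why the base must be size-aligned.
		 */
		return 0;
	}

So a request for 32 vectors must land on a 32-aligned base, which the
allocation in the patch above doesn't guarantee.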
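As for the allocation itself, here is a rough sketch of what honouring
that alignment could look like in iproc_msi_irq_domain_alloc()
(completely untested, just to show the shape of the constraint;
bitmap_find_next_zero_area() already takes an align_mask parameter,
and align_mask is a local I'm introducing for the example):

	unsigned long align_mask = roundup_pow_of_two(msi->nr_cpus * nr_irqs) - 1;

	mutex_lock(&msi->bitmap_lock);

	/*
	 * Reserve nr_cpus slots per requested vector, and size-align the
	 * base of the region so that the endpoint can OR its vector index
	 * into the base number it was given.
	 */
	hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
					   msi->nr_cpus * nr_irqs, align_mask);

bitmap_find_free_region() with order_base_2() would be another way of
getting a naturally aligned region. But note that this only addresses
the overlap and the alignment, not what follows.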
The main problem you have here (other than the broken allocation
mechanism) is that moving an interrupt from one core to another
implies moving the doorbell address to that of another MSI group.
This isn't possible for Multi-MSI, as all the MSIs must have the same
doorbell address. As far as I can see, there is no way to support
Multi-MSI together with affinity change on this HW, and you should
stop advertising support for this feature.

There is also a more general problem here, which is the atomicity of
the update on affinity change. If you are moving an interrupt from
one CPU to another, it seems you change both the vector number and
the target address. If that is the case, this isn't atomic, and you
may end up with the device generating a message based on a
half-applied update.

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.