Date: Wed, 02 Jun 2021 09:34:11 +0100
Message-ID: <87bl8o1x8c.wl-maz@kernel.org>
From: Marc Zyngier
To: Sandor Bodo-Merle
Cc: Ray Jui, Pali Rohár, linux-pci@vger.kernel.org,
    bcm-kernel-feedback-list@broadcom.com
Subject: Re: pcie-iproc-msi.c: Bug in Multi-MSI support?
References: <20210520120055.jl7vkqanv7wzeipq@pali>
    <20210520140529.rczoz3npjoadzfqc@pali>
    <4e972ecb-43df-639f-052d-8d1518bae9c0@broadcom.com>
    <87pmxgwh7o.wl-maz@kernel.org>
    <13a7e409-646d-40a7-17a0-4e4be011efb2@broadcom.com>
    <874keqvsf2.wl-maz@kernel.org>
X-Mailing-List: linux-pci@vger.kernel.org

On Wed, 26 May 2021 17:10:24 +0100,
Sandor Bodo-Merle wrote:
>
> The following patch addresses the allocation issue - but indeed - won't
> fix the atomicity of IRQ affinity in this driver (but the majority of
> our product relies on single core SOCs; we also use a dual-core SOC
> - but we don't change the initial IRQ affinity).
>
> On Wed, May 26, 2021 at 9:57 AM Marc Zyngier wrote:
> >
> > On Tue, 25 May 2021 18:27:54 +0100,
> > Ray Jui wrote:
> > >
> > > On 5/24/2021 3:37 AM, Marc Zyngier wrote:
> > > > On Thu, 20 May 2021 18:11:32 +0100,
> > > > Ray Jui wrote:
> > > >>
> > > >> On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote:
> > > >
> > > > [...]
> > > >
> > > >> I guess I'm not too clear on what you mean by "multi-MSI interrupts
> > > >> needs to be aligned to number of requested interrupts.". Would you be
> > > >> able to plug this into the above explanation so we can have a more clear
> > > >> understanding of what you mean here?
> > > >
> > > > That's a generic PCI requirement: if you are providing a Multi-MSI
> > > > configuration, the base vector number has to be size-aligned
> > > > (2-aligned for 2 MSIs, 4-aligned for 4, up to 32), and the end-point
> > > > supplies up to 5 bits that are OR-ed into the base vector number,
> > > > with a *single* doorbell address. You effectively provide a single MSI
> > > > number and a single address, and the device knows how to drive 2^n MSIs.
> > > >
> > > > This is different from MSI-X, which defines multiple individual
> > > > vectors, each with their own doorbell address.
> > > >
> > > > The main problem you have here (other than the broken allocation
> > > > mechanism) is that moving an interrupt from one core to another
> > > > implies moving the doorbell address to that of another MSI
> > > > group. This isn't possible for Multi-MSI, as all the MSIs must have
> > > > the same doorbell address. As far as I can see, there is no way to
> > > > support Multi-MSI together with affinity change on this HW, and you
> > > > should stop advertising support for this feature.
> > > >
> > >
> > > I was not aware of the fact that multi-MSI needs to use the same
> > > doorbell address (aka MSI posted write address?). Thank you for helping
> > > to point it out. In this case, yes, like you said, we cannot possibly
> > > support both multi-MSI and affinity at the same time, since supporting
> > > affinity requires us to move from one event queue (and irq) to another
> > > that will have a different doorbell address.
> > >
> > > Do you think it makes sense to do the following by only advertising
> > > multi-MSI capability in the single CPU core case (detected at runtime via
> > > 'num_possible_cpus')? This will at least allow multi-MSI to work on
> > > platforms with a single CPU core that Sandor and Pali use?
> >
> > I don't think this makes much sense.
> > Single-CPU machines are an oddity these days, and I'd rather you
> > simplify this (already pretty complicated) driver.
> >
> > > > There is also a more general problem here, which is the atomicity of
> > > > the update on affinity change. If you are moving an interrupt from one
> > > > CPU to the other, it seems you change both the vector number and the
> > > > target address. If that is the case, this isn't atomic, and you may
> > > > end up with the device generating a message based on a half-applied
> > > > update.
> > >
> > > Are you referring to the callback in 'irq_set_affinity' and
> > > 'irq_compose_msi_msg'? In that case, can you help to recommend a
> > > solution for it (or there's no solution based on such architecture)? It
> > > does not appear such atomicity can be enforced from the irq framework level.
> >
> > irq_compose_msi_msg() is only one part of the problem. The core of the
> > issue is that the programming of the end-point is not atomic (you need
> > to update a 32-bit payload *and* a 64-bit address).
> >
> > A solution to work around it would be to rework the way you allocate
> > the vectors, making them constant across all CPUs so that only the
> > address changes when changing the affinity.
> >
> > Thanks,
> >
> > M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
> [2 0001-PCI-iproc-fix-the-base-vector-number-allocation-for-.patch ]
> From df31c9c0333ca4922b7978b30719348e368bea3c Mon Sep 17 00:00:00 2001
> From: Sandor Bodo-Merle
> Date: Wed, 26 May 2021 17:48:16 +0200
> Subject: [PATCH] PCI: iproc: fix the base vector number allocation for Multi
>  MSI
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> failed to reserve the proper number of bits from the inner domain.
> Natural alignment of the base vector number was also not guaranteed.
>
> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs")
> Reported-by: Pali Rohár
> Signed-off-by: Sandor Bodo-Merle
> ---
>  drivers/pci/controller/pcie-iproc-msi.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git drivers/pci/controller/pcie-iproc-msi.c drivers/pci/controller/pcie-iproc-msi.c
> index eede4e8f3f75..fa2734dd8482 100644
> --- drivers/pci/controller/pcie-iproc-msi.c
> +++ drivers/pci/controller/pcie-iproc-msi.c
> @@ -252,18 +252,15 @@ static int iproc_msi_irq_domain_alloc(struct irq_domain *domain,
>
>  	mutex_lock(&msi->bitmap_lock);
>
> -	/* Allocate 'nr_cpus' number of MSI vectors each time */
> -	hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0,
> -					   msi->nr_cpus, 0);
> -	if (hwirq < msi->nr_msi_vecs) {
> -		bitmap_set(msi->bitmap, hwirq, msi->nr_cpus);
> -	} else {
> -		mutex_unlock(&msi->bitmap_lock);
> -		return -ENOSPC;
> -	}
> +	/* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI vectors each time */
> +	hwirq = bitmap_find_free_region(msi->bitmap, msi->nr_msi_vecs,
> +					order_base_2(msi->nr_cpus * nr_irqs));
>
>  	mutex_unlock(&msi->bitmap_lock);
>
> +	if (hwirq < 0)
> +		return -ENOSPC;
> +
>  	for (i = 0; i < nr_irqs; i++) {
>  		irq_domain_set_info(domain, virq + i, hwirq + i,
>  				    &iproc_msi_bottom_irq_chip,
> @@ -284,7 +281,8 @@ static void iproc_msi_irq_domain_free(struct irq_domain *domain,
>  	mutex_lock(&msi->bitmap_lock);
>
>  	hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq);
> -	bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus);
> +	bitmap_release_region(msi->bitmap, hwirq,
> +			      order_base_2(msi->nr_cpus * nr_irqs));
>
>  	mutex_unlock(&msi->bitmap_lock);
>

This looks reasonable. However, this doesn't change the issue that you
have with SMP systems and Multi-MSI. I'd like to see a more complete
patch (disabling Multi-MSI on SMP, at the very least).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
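
For readers following the thread, the size-alignment requirement quoted above (the end-point ORs a vector index into the base vector number) can be illustrated with a small standalone sketch. This is not driver code: the helper names multi_msi_count_ok(), multi_msi_base_aligned() and multi_msi_data() are hypothetical and exist only for this example, and the model assumes the simple "OR the index into the base" behaviour described in the quoted explanation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a Multi-MSI block holds 1, 2, 4, 8, 16 or 32 vectors. */
static bool multi_msi_count_ok(uint32_t nvec)
{
	return nvec >= 1 && nvec <= 32 && (nvec & (nvec - 1)) == 0;
}

/* Hypothetical helper: true if 'base' is size-aligned for 'nvec' vectors. */
static bool multi_msi_base_aligned(uint32_t base, uint32_t nvec)
{
	return multi_msi_count_ok(nvec) && (base & (nvec - 1)) == 0;
}

/*
 * Hypothetical model of the end-point: the vector index is OR-ed into
 * the single base vector number that was programmed into the device.
 */
static uint32_t multi_msi_data(uint32_t base, uint32_t nvec, uint32_t idx)
{
	assert(multi_msi_count_ok(nvec) && idx < nvec);
	return base | idx;
}
```

With base 8 and 4 vectors, the indices 0..3 produce 8, 9, 10 and 11, so each MSI gets its own vector. With an unaligned base of 6, the same indices produce 6, 7, 6 and 7, so two pairs of MSIs collide. That is the reason the allocation has to hand out a naturally aligned, power-of-two sized region, which is what bitmap_find_free_region() provides.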