From mboxrd@z Thu Jan 1 00:00:00 1970
From: Phil Edworthy
Date: Fri, 13 Nov 2015 09:36:48 +0000
Subject: RE: [PATCH] PCI: pcie-rcar: Fix OF node passed to MSI irq domain
References: <1446542899-25137-1-git-send-email-phil.edworthy@renesas.com> <20151109161115.GA13870@ulmo.nvidia.com> <20151110155232.GA25368@ulmo.nvidia.com> <20151111163802.3a96080c@arm.com> <20151112203100.2e91da2a@arm.com>
In-Reply-To: <20151112203100.2e91da2a@arm.com>
To: Marc Zyngier
Cc: Thierry Reding , Bjorn Helgaas , Wolfram Sang , Geert Uytterhoeven , Simon Horman , "linux-pci@vger.kernel.org" , "linux-sh@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Ley Foon Tan , Jingoo Han

Hi Marc,

On 12 November 2015 20:31, Marc Zyngier wrote:
> Phil Edworthy wrote:
> > On 11 November 2015 16:38, Marc Zyngier wrote:
> > > On Tue, 10 Nov 2015 16:52:33 +0100
> > > Thierry Reding wrote:
> > >
> > > > On Mon, Nov 09, 2015 at 06:01:49PM +0000, Phil Edworthy wrote:
> > > > > Hi Thierry,
> > > > >
> > > > > On 09 November 2015 17:24, Phil wrote:
> > > > > > On 09 November 2015 16:11, Thierry wrote:
> > > > > > > On Mon, Nov 09, 2015 at 03:20:24PM +0000, Phil Edworthy wrote:
> > > > > > > > cc'ing others (Tegra, Altera, Designware) who may have the same bug
> > > > > > > >
> > > > > > > > On 03 November 2015 09:28, Phil Edworthy wrote:
> > > > > > > > > The OF node passed to irq_domain_add_linear() should be a
> > > > > > > > > pointer to interrupt controller's device tree node, or NULL,
> > > > > > > > > but not the PCI controller's node.
> > > > > > > > >
> > > > > > > > > This fixes an oops in msi_domain_alloc_irqs() when it tries
> > > > > > > > > to call msi_check().
> > > > > > > > >
> > > > > > > > > Signed-off-by: Phil Edworthy
> > > > > > > > > ---
> > > > > > > > >  drivers/pci/host/pcie-rcar.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/pci/host/pcie-rcar.c b/drivers/pci/host/pcie-rcar.c
> > > > > > > > > index 2377bf0..c6fa562 100644
> > > > > > > > > --- a/drivers/pci/host/pcie-rcar.c
> > > > > > > > > +++ b/drivers/pci/host/pcie-rcar.c
> > > > > > > > > @@ -709,7 +709,7 @@ static int rcar_pcie_enable_msi(struct rcar_pcie *pcie)
> > > > > > > > >      msi->chip.setup_irq = rcar_msi_setup_irq;
> > > > > > > > >      msi->chip.teardown_irq = rcar_msi_teardown_irq;
> > > > > > > > >
> > > > > > > > > -    msi->domain = irq_domain_add_linear(pcie->dev->of_node, INT_PCI_MSI_NR,
> > > > > > > > > +    msi->domain = irq_domain_add_linear(NULL, INT_PCI_MSI_NR,
> > > > > > > > >                          &msi_domain_ops, &msi->chip);
> > > > > > > > >      if (!msi->domain) {
> > > > > > > > >          dev_err(&pdev->dev, "failed to create IRQ domain\n");
> > > > > > >
> > > > > > > On Tegra the PCI controller is in fact the interrupt controller for
> > > > > > > MSIs. And looking at the code here it seems like the same would apply to
> > > > > > > RCAR.
> > > > > > Yes you are correct here.
> > > > > >
> > > > > > > I'm also slightly confused as to why this would cause ->msi_check() to
> > > > > > > fail. The default implementation (msi_domain_ops_check()) doesn't do
> > > > > > > anything.
> > > > > > >
> > > > > > > Also, how is passing in NULL instead of a valid struct device_node *
> > > > > > > going to prevent an oops? Perhaps this is one of those reference count
> > > > > > > imbalance bugs that have recently been showing up?
> > > > > > On arm64 (previously I didn't realise this just affects arm64, not arm),
> > > > > > the changes in commit f075915ac0b11 ("PCI/MSI: Drop domain field from
> > > > > > msi_controller") and d8a1cb757550 ("PCI/MSI: Let pci_msi_get_domain use
> > > > > > struct device::msi_domain") return an uninitialized msi domain that leads
> > > > > > to the oops. It appears that these changes assume that msi interrupt
> > > > > > controller is separate from the PCI controller.
> > > > > More accurately, when CONFIG_GENERIC_MSI_IRQ_DOMAIN is enabled,
> > > > > pci_msi_get_domain() calls dev_get_msi_domain() and at this point
> > > > > dev->msi_domain is uninitialized.
> > > >
> > > > Marc, any idea what's going on here?
> > >
> > > Thanks for putting me in the loop.
> > >
> > > No precise idea yet, but the proposed fix definitely looks like the
> > > wrong one. Actually, not passing a node identifier to any domain
> > > constructor is pretty much always a mistake when using DT.
> > >
> > > Can someone post a stack trace for this issue so that I can have a
> > > look? I'm currently traveling, so expect a slightly delayed reply...
> >
> > Unfortunately, not all the code for this arm64 board is upstream
> > yet, this code base is off 4.3-rc7.
>
> Oh, this is arm64? Well, you're not supposed to use the old
> msi_controller stuff on arm64 - I really want all arm64 controllers to
> be converted to generic MSI domains. Please have a look at the xgene
> code, for example.
Oh right, I wasn't aware of that. I had hoped that drivers weren't so
arch specific...

> But irrespective of that, I share Thierry's skepticism:
>
> > systemd-udevd[1315]: undefined instruction: pc=ffffffc03106d41c
> > Code: ffffffc0 311f9740 ffffffc0 3106d138 (ffffffc0)
> > Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP
> > Modules linked in: e1000e(+)
> > CPU: 0 PID: 1315 Comm: systemd-udevd Not tainted 4.3.0-rc7+ #4
> > Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
> > task: ffffffc0307af080 ti: ffffffc030ecc000 task.ti: ffffffc030ecc000
> > PC is at 0xffffffc03106d41c
>
> You are clearly jumping to nowhereland, and I doubt this is related to
> the domain of_node being set. Are you overriding arch_setup_msi_irq one
> way or another?
No, I'm not overriding arch_setup_msi_irq at all.

Since the stack trace doesn't help that much, I added some tracing:
pci_msi_setup_msi_irqs()
  calls pci_msi_get_domain()
    calls dev_get_msi_domain(), gets a non-NULL domain.
pci_msi_setup_msi_irqs()
  calls pci_msi_domain_alloc_irqs()
    calls msi_domain_alloc_irqs()
msi_domain_alloc_irqs:273: ops=ffffffc03193a810
msi_domain_alloc_irqs:274: ops->msi_check=ffffffc031161418
systemd-udevd[1311]: undefined instruction: pc=ffffffc03116141c

That looks to me as though msi_check is off pointing to the weeds.

By passing a NULL OF node into irq_domain_add_linear() you get:
pci_msi_setup_msi_irqs()
  calls pci_msi_get_domain()
    calls dev_get_msi_domain(), gets a NULL domain.
  calls arch_setup_msi_irq()
All ok then.
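
To make the two paths in the tracing above easier to compare, here is a rough user-space sketch of the branch as I understand it. It is not kernel code: every type and function name ending in _model, the enum and check_ok() are made up for illustration, and the real structures carry far more state. It only shows that a non-NULL domain sends the request through the domain's ops, while a NULL domain falls back to the arch hook. The sketch can only test for NULL, so it cannot reproduce the failure seen here, where ops->msi_check was non-NULL but pointed at something that is not code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; names are illustrative. */
struct msi_domain_ops_model {
    int (*msi_check)(void);              /* returns 0 when the request is acceptable */
};

struct irq_domain_model {
    const struct msi_domain_ops_model *ops;
};

struct device_model {
    struct irq_domain_model *msi_domain; /* NULL when no MSI domain was found */
};

enum msi_path {
    PATH_DOMAIN_OPS,    /* request handled through the domain's ops */
    PATH_ARCH_FALLBACK, /* request handled by the arch hook */
    PATH_REJECTED       /* ops missing, or msi_check refused the request */
};

/* Trivial check callback for the model: always accepts. */
int check_ok(void)
{
    return 0;
}

/* Models the decision traced above: domain present or not. */
enum msi_path setup_msi_irqs_model(const struct device_model *dev)
{
    assert(dev != NULL);

    const struct irq_domain_model *domain = dev->msi_domain;

    if (domain == NULL)
        return PATH_ARCH_FALLBACK;

    /* Defensive in the model only; a garbage non-NULL pointer would pass. */
    if (domain->ops == NULL || domain->ops->msi_check == NULL)
        return PATH_REJECTED;

    if (domain->ops->msi_check() != 0)
        return PATH_REJECTED;

    return PATH_DOMAIN_OPS;
}
```

With a device whose msi_domain is set, the model returns PATH_DOMAIN_OPS; with msi_domain left NULL it returns PATH_ARCH_FALLBACK, which corresponds to the "All ok then" case.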
Thanks for your help,
Phil