From: Marc Zyngier <marc.zyngier@arm.com>
To: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Thierry Reding <treding@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Wolfram Sang <wsa@the-dreams.de>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Simon Horman <horms@verge.net.au>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-sh@vger.kernel.org" <linux-sh@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ley Foon Tan <lftan@altera.com>, Jingoo Han <jg1.han@samsung.com>
Subject: Re: [PATCH] PCI: pcie-rcar: Fix OF node passed to MSI irq domain
Date: Mon, 16 Nov 2015 18:31:29 +0000
Message-ID: <564A2101.90600@arm.com>
In-Reply-To: <PS1PR06MB11800B547C8957B226077B83F5110@PS1PR06MB1180.apcprd06.prod.outlook.com>

On 13/11/15 09:36, Phil Edworthy wrote:
> Hi Marc,
> 
> On 12 November 2015 20:31, Marc Zyngier wrote:
>> Phil Edworthy <phil.edworthy@renesas.com> wrote:
>>> On 11 November 2015 16:38, Marc Zyngier wrote:
>>>> On Tue, 10 Nov 2015 16:52:33 +0100
>>>> Thierry Reding <treding@nvidia.com> wrote:
>>>>
>>>>> On Mon, Nov 09, 2015 at 06:01:49PM +0000, Phil Edworthy wrote:
>>>>>> Hi Thierry,
>>>>>>
>>>>>> On 09 November 2015 17:24, Phil wrote:
>>>>>>> On 09 November 2015 16:11, Thierry wrote:
>>>>>>>> On Mon, Nov 09, 2015 at 03:20:24PM +0000, Phil Edworthy wrote:
>>>>>>>>> cc'ing others (Tegra, Altera, Designware) who may have the same bug
>>>>>>>>>
>>>>>>>>> On 03 November 2015 09:28, Phil Edworthy wrote:
>>>>>>>>>> The OF node passed to irq_domain_add_linear() should be a
>>>>>>>>>> pointer to interrupt controller's device tree node, or NULL,
>>>>>>>>>> but not the PCI controller's node.
>>>>>>>>>>
>>>>>>>>>> This fixes an oops in msi_domain_alloc_irqs() when it tries
>>>>>>>>>> to call msi_check().
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/pci/host/pcie-rcar.c | 2 +-
>>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/pci/host/pcie-rcar.c b/drivers/pci/host/pcie-rcar.c
>>>>>>>>>> index 2377bf0..c6fa562 100644
>>>>>>>>>> --- a/drivers/pci/host/pcie-rcar.c
>>>>>>>>>> +++ b/drivers/pci/host/pcie-rcar.c
>>>>>>>>>> @@ -709,7 +709,7 @@ static int rcar_pcie_enable_msi(struct rcar_pcie *pcie)
>>>>>>>>>>  	msi->chip.setup_irq = rcar_msi_setup_irq;
>>>>>>>>>>  	msi->chip.teardown_irq = rcar_msi_teardown_irq;
>>>>>>>>>>
>>>>>>>>>> -	msi->domain = irq_domain_add_linear(pcie->dev->of_node, INT_PCI_MSI_NR,
>>>>>>>>>> +	msi->domain = irq_domain_add_linear(NULL, INT_PCI_MSI_NR,
>>>>>>>>>>  					    &msi_domain_ops, &msi->chip);
>>>>>>>>>>  	if (!msi->domain) {
>>>>>>>>>>  		dev_err(&pdev->dev, "failed to create IRQ domain\n");
>>>>>>>>
>>>>>>>> On Tegra the PCI controller is in fact the interrupt controller for
>>>>>>>> MSIs. And looking at the code here it seems like the same would apply to
>>>>>>>> RCAR.
>>>>>>> Yes you are correct here.
>>>>>>>
>>>>>>>> I'm also slightly confused as to why this would cause ->msi_check() to
>>>>>>>> fail. The default implementation (msi_domain_ops_check()) doesn't do
>>>>>>>> anything.
>>>>>>>>
>>>>>>>> Also, how is passing in NULL instead of a valid struct device_node *
>>>>>>>> going to prevent an oops? Perhaps this is one of those reference count
>>>>>>>> imbalance bugs that have recently been showing up?
>>>>>>> On arm64 (previously I didn't realise this just affects arm64, not arm),
>>>>>>> the changes in commit f075915ac0b11 ("PCI/MSI: Drop domain field from
>>>>>>> msi_controller") and d8a1cb757550 ("PCI/MSI: Let pci_msi_get_domain use
>>>>>>> struct device::msi_domain") return an uninitialized msi domain that leads
>>>>>>> to the oops. It appears that these changes assume that msi interrupt
>>>>>>> controller is separate from the PCI controller.
>>>>>> More accurately, when CONFIG_GENERIC_MSI_IRQ_DOMAIN is enabled,
>>>>>> pci_msi_get_domain() calls dev_get_msi_domain() and at this point
>>>>>> dev->msi_domain is uninitialized.
>>>>>
>>>>> Marc, any idea what's going on here?
>>>>
>>>> Thanks for putting me in the loop.
>>>>
>>>> No precise idea yet, but the proposed fix definitely looks like the
>>>> wrong one. Actually, not passing a node identifier to any domain
>>>> constructor is pretty much always a mistake when using DT.
>>>>
>>>> Can someone post a stack trace for this issue so that I can have a
>>>> look? I'm currently traveling, so expect a slightly delayed reply...
>>>
>>> Unfortunately, not all the code for this arm64 board is upstream
>>> yet, this code base is off 4.3-rc7.
>>
>> Oh, this is arm64? Well, you're not supposed to use the old
>> msi_controller stuff on arm64 - I really want all arm64 controllers to
>> be converted to generic MSI domains. Please have a look at the xgene
>> code, for example.
> Oh right, I wasn't aware of that. I had hoped that drivers weren't so
> arch specific...

They are not. Generic MSI domains are supported on all other
architectures that select this option (arm, x86).

>> But irrespective of that, I share Thierry's skepticism:
>>
>>> systemd-udevd[1315]: undefined instruction: pc=ffffffc03106d41c
>>> Code: ffffffc0 311f9740 ffffffc0 3106d138 (ffffffc0)
>>> Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP
>>> Modules linked in: e1000e(+)
>>> CPU: 0 PID: 1315 Comm: systemd-udevd Not tainted 4.3.0-rc7+ #4
>>> Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
>>> task: ffffffc0307af080 ti: ffffffc030ecc000 task.ti: ffffffc030ecc000
>>> PC is at 0xffffffc03106d41c
>>
>> You are clearly jumping to nowhereland, and I doubt this is related to
>> the domain of_node being set. Are you overriding arch_setup_msi_irq one
>> way or another?
> No, I'm not overriding arch_setup_msi_irq at all.
> 
> Since the stack trace doesn't help that much I added some tracing:
> pci_msi_setup_msi_irqs()
>   calls pci_msi_get_domain()
>     calls dev_get_msi_domain(), gets a non-NULL domain.
> pci_msi_setup_msi_irqs()
>   calls pci_msi_domain_alloc_irqs()
>     calls msi_domain_alloc_irqs()
> msi_domain_alloc_irqs:273: ops=ffffffc03193a810
> msi_domain_alloc_irqs:274: ops->msi_check=ffffffc031161418
> systemd-udevd[1311]: undefined instruction: pc=ffffffc03116141c
> That looks to me as though msi_check is off pointing to the weeds.

So the next step is to find out who initializes msi_check. Assuming
someone does...
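
For illustration only, here is a minimal userspace sketch (not kernel code) of
one way a callback such as msi_check can end up uninitialized: the pointer is
filled in by one constructor, while a domain built by a different constructor
keeps unrelated driver data in the same opaque slot. Every toy_* name is
hypothetical, and nothing in this message confirms that this is the actual
cause of the oops.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all toy_* names are hypothetical. */
struct toy_msi_ops {
	int (*msi_check)(int nvec);	/* optional callback */
};

struct toy_msi_info {
	struct toy_msi_ops *ops;
};

enum toy_kind { TOY_PLAIN, TOY_MSI };

struct toy_domain {
	enum toy_kind kind;	/* records which constructor built the domain */
	void *host_data;	/* opaque: the constructor decides what lives here */
};

/* Default callback that accepts everything, like a no-op check. */
static int toy_default_check(int nvec)
{
	(void)nvec;
	return 0;
}

/* Constructor for an MSI domain: fills in a missing callback. */
static void toy_msi_domain_init(struct toy_domain *d, struct toy_msi_info *info)
{
	assert(info && info->ops);
	if (!info->ops->msi_check)
		info->ops->msi_check = toy_default_check;
	d->kind = TOY_MSI;
	d->host_data = info;
}

/* Constructor for a plain linear domain: host_data is driver-private. */
static void toy_linear_domain_init(struct toy_domain *d, void *driver_private)
{
	d->kind = TOY_PLAIN;
	d->host_data = driver_private;
}

/* Only read host_data as toy_msi_info when the domain really is an MSI one. */
static int toy_alloc_irqs(struct toy_domain *d, int nvec)
{
	struct toy_msi_info *info;

	if (d->kind != TOY_MSI)
		return -1;	/* the caller would have to use another path */
	info = d->host_data;
	return info->ops->msi_check(nvec);
}
```

Without the kind check, toy_alloc_irqs() would read the driver-private data of
a plain domain as a toy_msi_info and call whatever value sits in the msi_check
slot.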

> By passing a NULL of_node into irq_domain_add_linear() you get:
> pci_msi_setup_msi_irqs()
>   calls pci_msi_get_domain()
>     calls dev_get_msi_domain(), gets a NULL domain.
>     calls arch_setup_msi_irq()
> All ok then.

Yes, because you're sidestepping the issue. Any chance you could dig a
bit deeper? I'd really like to nail this one down (before we convert
your PCI driver to the right API... ;-).
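
The two traces quoted above differ only in which branch is taken once the
device's MSI domain has been looked up. A minimal userspace sketch of that
decision, assuming nothing beyond what the traces show (all toy_* names are
hypothetical stand-ins, not the kernel functions themselves):

```c
#include <stddef.h>

/* Which setup path a device ends up on; toy_* names are hypothetical. */
enum toy_path {
	TOY_PATH_DOMAIN,	/* stands in for msi_domain_alloc_irqs() */
	TOY_PATH_ARCH		/* stands in for arch_setup_msi_irq() */
};

struct toy_device {
	void *msi_domain;	/* NULL when no MSI domain was found */
};

/* Stands in for dev_get_msi_domain(): just returns the stored pointer. */
static void *toy_get_msi_domain(const struct toy_device *dev)
{
	return dev->msi_domain;
}

/* Stands in for pci_msi_setup_msi_irqs(): pick the path from the lookup. */
static enum toy_path toy_setup_msi_irqs(const struct toy_device *dev)
{
	if (toy_get_msi_domain(dev))
		return TOY_PATH_DOMAIN;
	return TOY_PATH_ARCH;
}
```

With a NULL lookup result the domain path is never entered, so whatever is
wrong on that path stays in place and is merely no longer reached.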

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
