All of lore.kernel.org
 help / color / mirror / Atom feed
From: Soeren Moch <smoch@web.de>
To: Robin Murphy <robin.murphy@arm.com>,
	Shawn Lin <shawn.lin@rock-chips.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Andrew Murray <amurray@thegoodpenguin.co.uk>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Heiko Stuebner <heiko@sntech.de>,
	linux-rockchip@lists.infradead.org, linux-pci@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [BUG] PCI: rockchip: rk3399: pcie switch support
Date: Mon, 27 Apr 2020 21:32:29 +0200	[thread overview]
Message-ID: <213ff8e4-0921-6e06-a98e-e7d955ca279d@web.de> (raw)
In-Reply-To: <d02e0b72-5fb3-dd47-468c-08b86db07a9a@arm.com>

On 14.04.20 14:28, Robin Murphy wrote:
> On 2020-04-14 12:35 pm, Soeren Moch wrote:
>> On 06.04.20 19:12, Soeren Moch wrote:
>>> On 06.04.20 14:52, Robin Murphy wrote:
>>>> On 2020-04-04 7:41 pm, Soeren Moch wrote:
>>>>> I want to use a PCIe switch on a RK3399 based RockPro64 V2.1 board.
>>>>> "Normal" PCIe cards work (mostly) just fine on this board. The PCIe
>>>>> switches (I tried Pericom and ASMedia based switches) also work
>>>>> fine on
>>>>> other boards. The RK3399 PCIe controller with pcie_rockchip_host
>>>>> driver
>>>>> also recognises the switch, but fails to initialize the buses
>>>>> behind the
>>>>> bridge properly, see syslog from linux-5.6.0.
>>>>>
>>>>> Any ideas what I do wrong, or any suggestions what I can test here?
>>>> See the thread here:
>>>>
>>>> https://lore.kernel.org/linux-pci/CAMdYzYoTwjKz4EN8PtD5pZfu3+SX+68JL+dfvmCrSnLL=K6Few@mail.gmail.com/
>>>>
>>>>
>>> Thanks Robin!
>>>
>>> I also found out in the meantime that device enumeration fails in this
>>> fatal way when probing non-existent devices. So if I hack my complete
>>> bus topology into rockchip_pcie_valid_device, then all existing devices
>>> come up properly. Of course this is not how PCIe should work.
>>>> The conclusion there seems to be that the RK3399 root complex just
>>>> doesn't handle certain types of response in a sensible manner, and
>>>> there's not much that can reasonably be done to change that.
>>> Hm, at least there is the promising suggestion to take over the SError
>>> handler, maybe in ATF, as workaround.
>> Unfortunately it seems to be not that easy. Only when PCIe device
>> probing runs on one of the Cortex-A72 cores of rk3399 we see the SError.
>> When probing runs on one of the A53 cores, we get a synchronous external
>> abort instead.
>>
>> Is this expected to see different error types on big.LITTLE systems? Or
>> is this another special property of the rk3399 pcie controller?
>
> As far as I'm aware, the CPU microarchitecture is indeed one of the
> factors in whether it takes a given external abort synchronously or
> asynchronously, so yes, I'd say that probably is expected. I wouldn't
> necessarily even rely on a single microarchitecture only behaving one
> way, since in principle it's possible that surrounding instructions
> might affect whether the core still has enough context left to take
> the exception synchronously or not at the point the abort does come back.
>
> In general external aborts are a "should never happen" kind of thing,
> so they're not necessarily expected to be recoverable (I think the RAS
> extensions might add a more robustness in terms of reporting, but
> aren't relevant here either way).
>
Okay. In an ideal world we would not need software workarounds for
hardware bugs.

@Shawn: Can you point me to the rk3399 errata you mentioned in commit
712fa1777207c2f2703a6eb618a9699099cbe37b ?

Thanks.


> At this point I'm starting to wonder whether it might be possible to
> do something similar to the Arm N1SDP workaround using the Cortex-M0,
> albeit with the complication that probing would realistically have to
> be explicitly invoked from the Linux driver due to clocks and external
> regulators... :/
>
Sounds complicated.
For me I use the patch below. Of course this hack is not intended for
merging, just as reference to conclude this discussion.

If someone comes up with a better solution, I'm happy to test this.

Thanks,
Soeren


------------------------8<------------------------------------
From 9f2e26186bbf867f1baada057bcbd843c465c381 Mon Sep 17 00:00:00 2001
From: Soeren Moch <smoch@web.de>
Date: Fri, 17 Apr 2020 12:14:04 +0200
Subject: [PATCH] PCI: rockchip: rk3399: pcie switch support

Due to a hardware bug the rk3399 PCIe controller signals error conditions
to the cpu when scanning for PCIe devices, which are not available. So
PCIe bridges are not supported.

The rk3399 Cortex-A72 cores generate SError interrupts for these false
PCIe errors, Cortex-A53 cores generate Synchronuos External Aborts.

This hack enables PCIe device probing on buses behind bridges by
ignoring the generated SError. Device probing needs to be done on
Cortex-A72 cores, e.g. use
taskset -c 4 modprobe pcie_rockchip_host

Signed-off-by: Soeren Moch <smoch@web.de>
---
 arch/arm64/kernel/traps.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be5c573..da2b64d2613f 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -906,8 +906,16 @@ bool arm64_is_fatal_ras_serror(struct pt_regs
*regs, unsigned int esr)
 
 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
-    const bool was_in_nmi = in_nmi();
+    bool was_in_nmi;
 
+    /* ignore SError to enable rk3399 PCIe bus enumeration */
+    if (esr >> ESR_ELx_EC_SHIFT == ESR_ELx_EC_SERROR) {
+        pr_debug("ignoring SError Interrupt on CPU%d\n",
+                smp_processor_id());
+        return;
+    }
+
+    was_in_nmi = in_nmi();
     if (!was_in_nmi)
         nmi_enter();
 
--
2.17.1


WARNING: multiple messages have this Message-ID (diff)
From: Soeren Moch <smoch@web.de>
To: Robin Murphy <robin.murphy@arm.com>,
	Shawn Lin <shawn.lin@rock-chips.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Heiko Stuebner <heiko@sntech.de>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rockchip@lists.infradead.org,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-arm-kernel@lists.infradead.org,
	Andrew Murray <amurray@thegoodpenguin.co.uk>
Subject: Re: [BUG] PCI: rockchip: rk3399: pcie switch support
Date: Mon, 27 Apr 2020 21:32:29 +0200	[thread overview]
Message-ID: <213ff8e4-0921-6e06-a98e-e7d955ca279d@web.de> (raw)
In-Reply-To: <d02e0b72-5fb3-dd47-468c-08b86db07a9a@arm.com>

On 14.04.20 14:28, Robin Murphy wrote:
> On 2020-04-14 12:35 pm, Soeren Moch wrote:
>> On 06.04.20 19:12, Soeren Moch wrote:
>>> On 06.04.20 14:52, Robin Murphy wrote:
>>>> On 2020-04-04 7:41 pm, Soeren Moch wrote:
>>>>> I want to use a PCIe switch on a RK3399 based RockPro64 V2.1 board.
>>>>> "Normal" PCIe cards work (mostly) just fine on this board. The PCIe
>>>>> switches (I tried Pericom and ASMedia based switches) also work
>>>>> fine on
>>>>> other boards. The RK3399 PCIe controller with pcie_rockchip_host
>>>>> driver
>>>>> also recognises the switch, but fails to initialize the buses
>>>>> behind the
>>>>> bridge properly, see syslog from linux-5.6.0.
>>>>>
>>>>> Any ideas what I do wrong, or any suggestions what I can test here?
>>>> See the thread here:
>>>>
>>>> https://lore.kernel.org/linux-pci/CAMdYzYoTwjKz4EN8PtD5pZfu3+SX+68JL+dfvmCrSnLL=K6Few@mail.gmail.com/
>>>>
>>>>
>>> Thanks Robin!
>>>
>>> I also found out in the meantime that device enumeration fails in this
>>> fatal way when probing non-existent devices. So if I hack my complete
>>> bus topology into rockchip_pcie_valid_device, then all existing devices
>>> come up properly. Of course this is not how PCIe should work.
>>>> The conclusion there seems to be that the RK3399 root complex just
>>>> doesn't handle certain types of response in a sensible manner, and
>>>> there's not much that can reasonably be done to change that.
>>> Hm, at least there is the promising suggestion to take over the SError
>>> handler, maybe in ATF, as workaround.
>> Unfortunately it seems to be not that easy. Only when PCIe device
>> probing runs on one of the Cortex-A72 cores of rk3399 we see the SError.
>> When probing runs on one of the A53 cores, we get a synchronous external
>> abort instead.
>>
>> Is this expected to see different error types on big.LITTLE systems? Or
>> is this another special property of the rk3399 pcie controller?
>
> As far as I'm aware, the CPU microarchitecture is indeed one of the
> factors in whether it takes a given external abort synchronously or
> asynchronously, so yes, I'd say that probably is expected. I wouldn't
> necessarily even rely on a single microarchitecture only behaving one
> way, since in principle it's possible that surrounding instructions
> might affect whether the core still has enough context left to take
> the exception synchronously or not at the point the abort does come back.
>
> In general external aborts are a "should never happen" kind of thing,
> so they're not necessarily expected to be recoverable (I think the RAS
> extensions might add a more robustness in terms of reporting, but
> aren't relevant here either way).
>
Okay. In an ideal world we would not need software workarounds for
hardware bugs.

@Shawn: Can you point me to the rk3399 errata you mentioned in commit
712fa1777207c2f2703a6eb618a9699099cbe37b ?

Thanks.


> At this point I'm starting to wonder whether it might be possible to
> do something similar to the Arm N1SDP workaround using the Cortex-M0,
> albeit with the complication that probing would realistically have to
> be explicitly invoked from the Linux driver due to clocks and external
> regulators... :/
>
Sounds complicated.
For me I use the patch below. Of course this hack is not intended for
merging, just as reference to conclude this discussion.

If someone comes up with a better solution, I'm happy to test this.

Thanks,
Soeren


------------------------8<------------------------------------
From 9f2e26186bbf867f1baada057bcbd843c465c381 Mon Sep 17 00:00:00 2001
From: Soeren Moch <smoch@web.de>
Date: Fri, 17 Apr 2020 12:14:04 +0200
Subject: [PATCH] PCI: rockchip: rk3399: pcie switch support

Due to a hardware bug the rk3399 PCIe controller signals error conditions
to the cpu when scanning for PCIe devices, which are not available. So
PCIe bridges are not supported.

The rk3399 Cortex-A72 cores generate SError interrupts for these false
PCIe errors, Cortex-A53 cores generate Synchronuos External Aborts.

This hack enables PCIe device probing on buses behind bridges by
ignoring the generated SError. Device probing needs to be done on
Cortex-A72 cores, e.g. use
taskset -c 4 modprobe pcie_rockchip_host

Signed-off-by: Soeren Moch <smoch@web.de>
---
 arch/arm64/kernel/traps.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be5c573..da2b64d2613f 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -906,8 +906,16 @@ bool arm64_is_fatal_ras_serror(struct pt_regs
*regs, unsigned int esr)
 
 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
-    const bool was_in_nmi = in_nmi();
+    bool was_in_nmi;
 
+    /* ignore SError to enable rk3399 PCIe bus enumeration */
+    if (esr >> ESR_ELx_EC_SHIFT == ESR_ELx_EC_SERROR) {
+        pr_debug("ignoring SError Interrupt on CPU%d\n",
+                smp_processor_id());
+        return;
+    }
+
+    was_in_nmi = in_nmi();
     if (!was_in_nmi)
         nmi_enter();
 
--
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-04-27 19:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-04 18:41 [BUG] PCI: rockchip: rk3399: pcie switch support Soeren Moch
2020-04-04 18:41 ` Soeren Moch
2020-04-06 12:52 ` Robin Murphy
2020-04-06 12:52   ` Robin Murphy
2020-04-06 17:12   ` Soeren Moch
2020-04-06 17:12     ` Soeren Moch
2020-04-06 17:12     ` Soeren Moch
2020-04-14 11:35     ` Soeren Moch
2020-04-14 11:35       ` Soeren Moch
2020-04-14 11:35       ` Soeren Moch
2020-04-14 12:28       ` Robin Murphy
2020-04-14 12:28         ` Robin Murphy
2020-04-14 12:28         ` Robin Murphy
2020-04-27 19:32         ` Soeren Moch [this message]
2020-04-27 19:32           ` Soeren Moch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=213ff8e4-0921-6e06-a98e-e7d955ca279d@web.de \
    --to=smoch@web.de \
    --cc=amurray@thegoodpenguin.co.uk \
    --cc=bhelgaas@google.com \
    --cc=heiko@sntech.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=robin.murphy@arm.com \
    --cc=shawn.lin@rock-chips.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.