From: Soeren Moch <smoch@web.de>
To: Robin Murphy <robin.murphy@arm.com>, Shawn Lin <shawn.lin@rock-chips.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Andrew Murray <amurray@thegoodpenguin.co.uk>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Heiko Stuebner <heiko@sntech.de>,
	linux-rockchip@lists.infradead.org, linux-pci@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG] PCI: rockchip: rk3399: pcie switch support
Date: Mon, 27 Apr 2020 21:32:29 +0200
Message-ID: <213ff8e4-0921-6e06-a98e-e7d955ca279d@web.de>
In-Reply-To: <d02e0b72-5fb3-dd47-468c-08b86db07a9a@arm.com>

On 14.04.20 14:28, Robin Murphy wrote:
> On 2020-04-14 12:35 pm, Soeren Moch wrote:
>> On 06.04.20 19:12, Soeren Moch wrote:
>>> On 06.04.20 14:52, Robin Murphy wrote:
>>>> On 2020-04-04 7:41 pm, Soeren Moch wrote:
>>>>> I want to use a PCIe switch on a RK3399 based RockPro64 V2.1 board.
>>>>> "Normal" PCIe cards work (mostly) just fine on this board. The PCIe
>>>>> switches (I tried Pericom and ASMedia based switches) also work fine on
>>>>> other boards. The RK3399 PCIe controller with pcie_rockchip_host driver
>>>>> also recognises the switch, but fails to initialize the buses behind the
>>>>> bridge properly, see syslog from linux-5.6.0.
>>>>>
>>>>> Any ideas what I do wrong, or any suggestions what I can test here?
>>>> See the thread here:
>>>>
>>>> https://lore.kernel.org/linux-pci/CAMdYzYoTwjKz4EN8PtD5pZfu3+SX+68JL+dfvmCrSnLL=K6Few@mail.gmail.com/
>>>>
>>>>
>>> Thanks Robin!
>>>
>>> I also found out in the meantime that device enumeration fails in this
>>> fatal way when probing non-existent devices. So if I hack my complete
>>> bus topology into rockchip_pcie_valid_device, then all existing devices
>>> come up properly. Of course this is not how PCIe should work.
>>>> The conclusion there seems to be that the RK3399 root complex just
>>>> doesn't handle certain types of response in a sensible manner, and
>>>> there's not much that can reasonably be done to change that.
>>> Hm, at least there is the promising suggestion to take over the SError
>>> handler, maybe in ATF, as workaround.
>> Unfortunately it seems to be not that easy. Only when PCIe device
>> probing runs on one of the Cortex-A72 cores of rk3399 we see the SError.
>> When probing runs on one of the A53 cores, we get a synchronous external
>> abort instead.
>>
>> Is this expected to see different error types on big.LITTLE systems? Or
>> is this another special property of the rk3399 pcie controller?
>
> As far as I'm aware, the CPU microarchitecture is indeed one of the
> factors in whether it takes a given external abort synchronously or
> asynchronously, so yes, I'd say that probably is expected. I wouldn't
> necessarily even rely on a single microarchitecture only behaving one
> way, since in principle it's possible that surrounding instructions
> might affect whether the core still has enough context left to take
> the exception synchronously or not at the point the abort does come back.
>
> In general external aborts are a "should never happen" kind of thing,
> so they're not necessarily expected to be recoverable (I think the RAS
> extensions might add a more robustness in terms of reporting, but
> aren't relevant here either way).
>
Okay. In an ideal world we would not need software workarounds for
hardware bugs.

@Shawn: Can you point me to the rk3399 errata you mentioned in commit
712fa1777207c2f2703a6eb618a9699099cbe37b ?
Thanks.

> At this point I'm starting to wonder whether it might be possible to
> do something similar to the Arm N1SDP workaround using the Cortex-M0,
> albeit with the complication that probing would realistically have to
> be explicitly invoked from the Linux driver due to clocks and external
> regulators... :/
>
Sounds complicated.
For my own use I apply the patch below. Of course this hack is not
intended for merging, just as a reference to conclude this discussion.
If someone comes up with a better solution, I'm happy to test it.

Thanks,
Soeren

------------------------8<------------------------------------
From 9f2e26186bbf867f1baada057bcbd843c465c381 Mon Sep 17 00:00:00 2001
From: Soeren Moch <smoch@web.de>
Date: Fri, 17 Apr 2020 12:14:04 +0200
Subject: [PATCH] PCI: rockchip: rk3399: pcie switch support

Due to a hardware bug, the rk3399 PCIe controller signals error
conditions to the CPU when scanning for PCIe devices that are not
present. So PCIe bridges are not supported.

The rk3399 Cortex-A72 cores generate SError interrupts for these false
PCIe errors, Cortex-A53 cores generate Synchronous External Aborts.

This hack enables PCIe device probing on buses behind bridges by
ignoring the generated SError. Device probing needs to be done on
Cortex-A72 cores, e.g. use
    taskset -c 4 modprobe pcie_rockchip_host

Signed-off-by: Soeren Moch <smoch@web.de>
---
 arch/arm64/kernel/traps.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be5c573..da2b64d2613f 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -906,8 +906,16 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
 
 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
-	const bool was_in_nmi = in_nmi();
+	bool was_in_nmi;
 
+	/* ignore SError to enable rk3399 PCIe bus enumeration */
+	if (esr >> ESR_ELx_EC_SHIFT == ESR_ELx_EC_SERROR) {
+		pr_debug("ignoring SError Interrupt on CPU%d\n",
+			 smp_processor_id());
+		return;
+	}
+
+	was_in_nmi = in_nmi();
 	if (!was_in_nmi)
 		nmi_enter();
 
-- 
2.17.1
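
The exception-class test in the hack above can be tried outside the kernel with a minimal user-space sketch like the one below. The macro names (ESR_EC_SHIFT, ESR_EC_MASK, ESR_EC_SERROR) and the helpers esr_ec() and esr_is_serror() are hypothetical and exist only for this illustration; the numeric values are meant to mirror ESR_ELx_EC_SHIFT and ESR_ELx_EC_SERROR from arch/arm64/include/asm/esr.h. This is an assumption-laden sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed to mirror arch/arm64/include/asm/esr.h. Spelled differently
 * on purpose so nobody mistakes these for the kernel's own macros.
 */
#define ESR_EC_SHIFT   26
#define ESR_EC_MASK    (UINT32_C(0x3F) << ESR_EC_SHIFT)
#define ESR_EC_SERROR  UINT32_C(0x2F)

/* The exception class is a 6-bit field, ESR bits [31:26]. */
static_assert(ESR_EC_SERROR <= 0x3F, "EC must fit in 6 bits");

/* Extract the exception class from a 32-bit syndrome value. */
static inline uint32_t esr_ec(uint32_t esr)
{
	return (esr & ESR_EC_MASK) >> ESR_EC_SHIFT;
}

/* Hypothetical helper: the same comparison the hack does in do_serror(). */
static inline bool esr_is_serror(uint32_t esr)
{
	return esr_ec(esr) == ESR_EC_SERROR;
}
```

A syndrome such as 0xBF000002 has EC 0x2F and is classified as SError, while 0x96000010 has EC 0x25 (data abort taken from the current exception level) and is not. Since do_serror() should only be entered for SError exceptions, the condition in the hack is presumably always true there, so the patch effectively ignores every SError and not only those caused by PCIe enumeration.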