From: Eric Chanudet <echanude@redhat.com>
To: Lucas Karpinski <lkarpins@redhat.com>
Cc: linux-kernel@vger.kernel.org, agross@kernel.org,
andersson@kernel.org, konrad.dybcio@linaro.org,
robh+dt@kernel.org, krzysztof.kozlowski+dt@linaro.org,
linux-arm-msm@vger.kernel.org, devicetree@vger.kernel.org,
ahalaney@redhat.com, bmasney@redhat.com,
quic_shazhuss@quicinc.com
Subject: Re: [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a node"
Date: Mon, 12 Jun 2023 16:05:04 -0400
Message-ID: <6m7xpqs73wrlin2ghhviwc4ijb5kyvk7ba2wpflqkgjivv6ol2@z5i5uli3h7f3>
In-Reply-To: <pmodcoakbs25z2a7mlo5gpuz63zluh35vbgb5itn6k5aqhjnny@jvphbpvahtse>
On Fri, Jun 02, 2023 at 03:33:21PM -0400, Lucas Karpinski wrote:
> This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.
>
> The patch introduced a sporadic error where the Qdrive3 will fail to
> boot occasionally due to an rcu preempt stall.
> Qualcomm has disabled pcie2a downstream:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
> rcu: (t=5253 jiffies g=-175 q=2552 ncpus=8)
> Call trace:
> __do_softirq
> ____do_softirq
> call_on_irq_stack
> do_softirq_own_stack
> __irq_exit_rcu
> irq_exit_rcu
>
> The issue occurs normally once every 3-4 boot cycles.
> There is likely a race condition caused when setting up the two pcie
> domains concurrently (pcie2a and pcie3a).
>
> The issue is not present when only pcie2a is enabled or when only pcie3a
> is enabled.
> A workaround was found that allowed the Qdrive3 to boot with both pcie2a
> and pcie3a enabled.
> Set the .probe_type to PROBE_FORCE_SYNCHRONOUS and add an msleep() to
> the probing function.
> This is not a solution, so this patch is disabling pcie2a as it seems
> Red Hat are the only ones working on the board,
> we're fine with disabling the node until a root cause is found. If
> anyone has further suggestions for debugging, let me know.
>
> Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
> ---
> During debugging:
> - Added additional time for clock/regulator stabilization.
> - Reduced the bandwidth across pcie2a and pcie3a.
> - Replaced the interconnect setup from another driver.
> - The 32-bit/64-bit/config-io space for both pcie2a and pcie3a look to be mapped correctly.
> - Verified interconnects were started successfully.

I was looking at another downstream issue that triggers a soft lockup on
CPU0, but it turns out this could be the same thing, except that the
symptoms here are less noticeable (the 3-4 boot cycles you mention).

Using next-20230609, if I add a return kprobe on dw_handle_msi_irq:

    echo 'r:dwmsi_probe dw_handle_msi_irq $retval' > /sys/kernel/debug/tracing/kprobe_events
    echo 1 > /sys/kernel/debug/tracing/events/kprobes/dwmsi_probe/enable
    cat /sys/kernel/debug/tracing/trace_pipe
    <idle>-0 [000] d.h1. 690.417268: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
    <idle>-0 [000] d.h1. 690.417272: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
    <idle>-0 [000] d.h1. 690.417276: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
    <idle>-0 [000] d.h1. 690.417281: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
    <idle>-0 [000] d.h1. 690.417284: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
    <idle>-0 [000] d.h1. 690.417288: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
    [...]

dw_handle_msi_irq fires constantly and never returns IRQ_HANDLED (the
return value is always 0x0, i.e. IRQ_NONE). This happens consistently
with either pcie2a or pcie3a alone, after I disable the other. I presume
having both enabled might be enough to overwhelm the system and trigger
the stall?
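To put a rough number on "constantly", the gaps between the six trace timestamps above can be averaged. The following is only a back-of-the-envelope sketch: the array `ts` and the helper `mean_gap_us` are hypothetical names introduced here, and six samples are of course not a measurement of the sustained rate.

```c
#include <stddef.h>

/* Timestamps in seconds, copied from the six trace_pipe lines above. */
static const double ts[] = {
    690.417268, 690.417272, 690.417276,
    690.417281, 690.417284, 690.417288,
};

/*
 * Mean gap between consecutive events, in microseconds.
 * Hypothetical helper for illustration; expects n >= 2.
 */
static double mean_gap_us(const double *t, size_t n)
{
    return (t[n - 1] - t[0]) / (double)(n - 1) * 1e6;
}
```

For these samples, `mean_gap_us(ts, 6)` comes out at about 4 us, which would be on the order of 250,000 handler invocations per second on CPU0 if the pattern holds.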

Looking at the handler, the status is always 0 after:

    status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
                               (i * MSI_REG_CTRL_BLOCK_SIZE));

Unfortunately I do not know why that is yet.
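For readers not familiar with the handler, here is a simplified userspace model of why an all-zero status read ends in a return value of 0. It is a sketch of the scan loop only, not the driver code: `model_handle_msi`, the `MODEL_*` constants, and the `status` array (standing in for the per-block DBI register reads) are assumptions made for illustration.

```c
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE (0) and IRQ_HANDLED (1). */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/*
 * status[i] models the value read from
 * PCIE_MSI_INTR0_STATUS + i * MSI_REG_CTRL_BLOCK_SIZE for control block i.
 */
static enum model_irqreturn model_handle_msi(const uint32_t *status,
                                             unsigned int num_ctrls)
{
    enum model_irqreturn ret = MODEL_IRQ_NONE;

    for (unsigned int i = 0; i < num_ctrls; i++) {
        if (!status[i])
            continue;            /* no MSI pending in this block */

        /* The real handler would dispatch each set bit at this point. */
        ret = MODEL_IRQ_HANDLED;
    }

    return ret;
}
```

With every status word reading 0, the loop never reaches the dispatch step and the function returns `MODEL_IRQ_NONE`, which matches the `arg1=0x0` seen in the trace while the chained interrupt keeps firing.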
>
> arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 44 -----------------------
> 1 file changed, 44 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> index 24fa449d48a6..d492723ccf7c 100644
> --- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> +++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> @@ -186,27 +186,6 @@ &i2c18 {
> status = "okay";
> };
>
> -&pcie2a {
> - ranges = <0x01000000 0x0 0x3c200000 0x0 0x3c200000 0x0 0x100000>,
> - <0x02000000 0x0 0x3c300000 0x0 0x3c300000 0x0 0x1d00000>,
> - <0x03000000 0x5 0x00000000 0x5 0x00000000 0x1 0x00000000>;
> -
> - perst-gpios = <&tlmm 143 GPIO_ACTIVE_LOW>;
> - wake-gpios = <&tlmm 145 GPIO_ACTIVE_HIGH>;
> -
> - pinctrl-names = "default";
> - pinctrl-0 = <&pcie2a_default>;
> -
> - status = "okay";
> -};
> -
> -&pcie2a_phy {
> - vdda-phy-supply = <&vreg_l11a>;
> - vdda-pll-supply = <&vreg_l3a>;
> -
> - status = "okay";
> -};
> -
> &pcie3a {
> ranges = <0x01000000 0x0 0x40200000 0x0 0x40200000 0x0 0x100000>,
> <0x02000000 0x0 0x40300000 0x0 0x40300000 0x0 0x20000000>,
> @@ -356,29 +335,6 @@ i2c18_default: i2c18-default-state {
> bias-pull-up;
> };
>
> - pcie2a_default: pcie2a-default-state {
> - perst-pins {
> - pins = "gpio143";
> - function = "gpio";
> - drive-strength = <2>;
> - bias-pull-down;
> - };
> -
> - clkreq-pins {
> - pins = "gpio142";
> - function = "pcie2a_clkreq";
> - drive-strength = <2>;
> - bias-pull-up;
> - };
> -
> - wake-pins {
> - pins = "gpio145";
> - function = "gpio";
> - drive-strength = <2>;
> - bias-pull-up;
> - };
> - };
> -
> pcie3a_default: pcie3a-default-state {
> perst-pins {
> pins = "gpio151";
> --
> 2.40.1
>
--
Eric Chanudet
Thread overview: 3+ messages
2023-06-02 19:33 [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a node" Lucas Karpinski
2023-06-07 13:44 ` Brian Masney
2023-06-12 20:05 ` Eric Chanudet [this message]