linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Chanudet <echanude@redhat.com>
To: Lucas Karpinski <lkarpins@redhat.com>
Cc: linux-kernel@vger.kernel.org, agross@kernel.org,
	andersson@kernel.org, konrad.dybcio@linaro.org,
	robh+dt@kernel.org, krzysztof.kozlowski+dt@linaro.org,
	linux-arm-msm@vger.kernel.org, devicetree@vger.kernel.org,
	ahalaney@redhat.com, bmasney@redhat.com,
	quic_shazhuss@quicinc.com
Subject: Re: [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a node"
Date: Mon, 12 Jun 2023 16:05:04 -0400	[thread overview]
Message-ID: <6m7xpqs73wrlin2ghhviwc4ijb5kyvk7ba2wpflqkgjivv6ol2@z5i5uli3h7f3> (raw)
In-Reply-To: <pmodcoakbs25z2a7mlo5gpuz63zluh35vbgb5itn6k5aqhjnny@jvphbpvahtse>

On Fri, Jun 02, 2023 at 03:33:21PM -0400, Lucas Karpinski wrote:
> This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.
> 
> The patch introduced a sporadic error where the Qdrive3 will fail to
> boot occasionally due to an rcu preempt stall.
> Qualcomm has disabled pcie2a downstream:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:     0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
> rcu:     (t=5253 jiffies g=-175 q=2552 ncpus=8)
> Call trace:
>  __do_softirq
>  ____do_softirq
>  call_on_irq_stack
>  do_softirq_own_stack
>  __irq_exit_rcu
>  irq_exit_rcu
> 
> The issue occurs normally once every 3-4 boot cycles.
> There is likely a race condition caused when setting up the two pcie
> domains concurrently (pcie2a and pcie3a).
> 
> The issue is not present when only pcie2a is enabled or when only pcie3a
> is enabled.
> A workaround was found that allowed the Qdrive3 to boot with both pcie2a
> and pcie3a enabled.
> Set the .probe_type to PROBE_FORCE_SYNCHRONOUS and add an msleep() to
> the probing function.
> This is not a solution, so this patch is disabling pcie2a as it seems
> Red Hat are the only ones working on the board,
> we're find with disabling the node until a root cause is found. If
> anyone has further suggestions for debugging, let me know.
> 
> Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
> ---
>  During debugging:
>         - Added additional time for clock/regulator stabilization.
>         - Reduced the bandwidth across pcie2a and pcie3a.
>         - Replaced the interconnect setup from another driver.
>         - The 32-bit/64-bit/config-io space for both pcie2a and pcie3a look to be mapped correctly.
>         - Verified interconnects were started successfully.

I was looking at another issue downstream triggering a soft lock on
CPU0, but it turns out this could be the same thing except the symptoms
are less noticeable (the 3-4 boot cycles you mention).

Using next-20230609, if I add a return kprobe on dw_handle_msi_irq:

echo 'r:dwmsi_probe dw_handle_msi_irq $retval' > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/dwmsi_probe/enable 
cat /sys/kernel/debug/tracing/trace_pipe
<idle>-0       [000] d.h1.   690.417268: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417272: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417276: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417281: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417284: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417288: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
[...]

dw_handle_msi_irq constantly fires and never returns IRQ_HANDLED. It
happens consistently for pcie2a or pcie3a, after I disable one or the
other. I presume having both might be enough to overwhelm the system and
trigger the stall?

Looking at the handler, the status is always 0 after:
status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
			   (i * MSI_REG_CTRL_BLOCK_SIZE));

Unfortunately I do not know why that is yet.

> 
>  arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 44 -----------------------
>  1 file changed, 44 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> index 24fa449d48a6..d492723ccf7c 100644
> --- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> +++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> @@ -186,27 +186,6 @@ &i2c18 {
>  	status = "okay";
>  };
>  
> -&pcie2a {
> -	ranges = <0x01000000 0x0 0x3c200000 0x0 0x3c200000 0x0 0x100000>,
> -		 <0x02000000 0x0 0x3c300000 0x0 0x3c300000 0x0 0x1d00000>,
> -		 <0x03000000 0x5 0x00000000 0x5 0x00000000 0x1 0x00000000>;
> -
> -	perst-gpios = <&tlmm 143 GPIO_ACTIVE_LOW>;
> -	wake-gpios = <&tlmm 145 GPIO_ACTIVE_HIGH>;
> -
> -	pinctrl-names = "default";
> -	pinctrl-0 = <&pcie2a_default>;
> -
> -	status = "okay";
> -};
> -
> -&pcie2a_phy {
> -	vdda-phy-supply = <&vreg_l11a>;
> -	vdda-pll-supply = <&vreg_l3a>;
> -
> -	status = "okay";
> -};
> -
>  &pcie3a {
>  	ranges = <0x01000000 0x0 0x40200000 0x0 0x40200000 0x0 0x100000>,
>  		 <0x02000000 0x0 0x40300000 0x0 0x40300000 0x0 0x20000000>,
> @@ -356,29 +335,6 @@ i2c18_default: i2c18-default-state {
>  		bias-pull-up;
>  	};
>  
> -	pcie2a_default: pcie2a-default-state {
> -		perst-pins {
> -			pins = "gpio143";
> -			function = "gpio";
> -			drive-strength = <2>;
> -			bias-pull-down;
> -		};
> -
> -		clkreq-pins {
> -			pins = "gpio142";
> -			function = "pcie2a_clkreq";
> -			drive-strength = <2>;
> -			bias-pull-up;
> -		};
> -
> -		wake-pins {
> -			pins = "gpio145";
> -			function = "gpio";
> -			drive-strength = <2>;
> -			bias-pull-up;
> -		};
> -	};
> -
>  	pcie3a_default: pcie3a-default-state {
>  		perst-pins {
>  			pins = "gpio151";
> -- 
> 2.40.1
> 

-- 
Eric Chanudet


      parent reply	other threads:[~2023-06-12 20:06 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-02 19:33 [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a node" Lucas Karpinski
2023-06-07 13:44 ` Brian Masney
2023-06-12 20:05 ` Eric Chanudet [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6m7xpqs73wrlin2ghhviwc4ijb5kyvk7ba2wpflqkgjivv6ol2@z5i5uli3h7f3 \
    --to=echanude@redhat.com \
    --cc=agross@kernel.org \
    --cc=ahalaney@redhat.com \
    --cc=andersson@kernel.org \
    --cc=bmasney@redhat.com \
    --cc=devicetree@vger.kernel.org \
    --cc=konrad.dybcio@linaro.org \
    --cc=krzysztof.kozlowski+dt@linaro.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkarpins@redhat.com \
    --cc=quic_shazhuss@quicinc.com \
    --cc=robh+dt@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).