linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ath10k: Wait until copy complete is actually done before completing
@ 2020-06-09 15:20 Douglas Anderson
  2020-06-12 14:25 ` pillair
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Douglas Anderson @ 2020-06-09 15:20 UTC (permalink / raw)
  To: kvalo
  Cc: kuabhs, pillair, saiprakash.ranjan, linux-arm-msm,
	Douglas Anderson, David S. Miller, Jakub Kicinski, ath10k,
	linux-kernel, linux-wireless, netdev

On wcn3990 we have "per_ce_irq = true".  That makes the
ath10k_ce_interrupt_summary() function always return 0xfff. The
ath10k_ce_per_engine_service_any() function will see this and think
that _all_ copy engines have an interrupt.  Without checking, the
ath10k_ce_per_engine_service() assumes that if it's called that the
"copy complete" (cc) interrupt fired.  This combination seems bad.

Let's add a check to make sure that the "copy complete" interrupt
actually fired in ath10k_ce_per_engine_service().

This might fix a hard-to-reproduce failure where it appears that the
copy complete handlers run before the copy is really complete.
Specifically a symptom was that we were seeing this on a Qualcomm
sc7180 board:
  arm-smmu 15000000.iommu: Unhandled context fault:
  fsr=0x402, iova=0x7fdd45780, fsynr=0x30003, cbfrsynra=0xc1, cb=10

Even on platforms that don't have wcn3990 this still seems like it
would be a sane thing to do.  Specifically the current IRQ handler
comments indicate that there might be other misc interrupt sources
firing that need to be cleared.  If one of those sources was the one
that caused the IRQ handler to be called it would also be important to
double-check that the interrupt we cared about actually fired.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/net/wireless/ath/ath10k/ce.c | 30 +++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index 294fbc1e89ab..ffdd4b995f33 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -481,6 +481,15 @@ static inline void ath10k_ce_engine_int_status_clear(struct ath10k *ar,
 	ath10k_ce_write32(ar, ce_ctrl_addr + wm_regs->addr, mask);
 }
 
+static inline bool ath10k_ce_engine_int_status_check(struct ath10k *ar,
+						     u32 ce_ctrl_addr,
+						     unsigned int mask)
+{
+	struct ath10k_hw_ce_host_wm_regs *wm_regs = ar->hw_ce_regs->wm_regs;
+
+	return ath10k_ce_read32(ar, ce_ctrl_addr + wm_regs->addr) & mask;
+}
+
 /*
  * Guts of ath10k_ce_send.
  * The caller takes responsibility for any needed locking.
@@ -1301,19 +1310,22 @@ void ath10k_ce_per_engine_service(struct ath10k *ar, unsigned int ce_id)
 
 	spin_lock_bh(&ce->ce_lock);
 
-	/* Clear the copy-complete interrupts that will be handled here. */
-	ath10k_ce_engine_int_status_clear(ar, ctrl_addr,
-					  wm_regs->cc_mask);
+	if (ath10k_ce_engine_int_status_check(ar, ctrl_addr,
+					      wm_regs->cc_mask)) {
+		/* Clear before handling */
+		ath10k_ce_engine_int_status_clear(ar, ctrl_addr,
+						  wm_regs->cc_mask);
 
-	spin_unlock_bh(&ce->ce_lock);
+		spin_unlock_bh(&ce->ce_lock);
 
-	if (ce_state->recv_cb)
-		ce_state->recv_cb(ce_state);
+		if (ce_state->recv_cb)
+			ce_state->recv_cb(ce_state);
 
-	if (ce_state->send_cb)
-		ce_state->send_cb(ce_state);
+		if (ce_state->send_cb)
+			ce_state->send_cb(ce_state);
 
-	spin_lock_bh(&ce->ce_lock);
+		spin_lock_bh(&ce->ce_lock);
+	}
 
 	/*
 	 * Misc CE interrupts are not being handled, but still need
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ath10k: Wait until copy complete is actually done before completing
  2020-06-09 15:20 [PATCH] ath10k: Wait until copy complete is actually done before completing Douglas Anderson
@ 2020-06-12 14:25 ` pillair
  2020-06-15 14:32 ` Kalle Valo
  2020-06-16  8:11 ` Kalle Valo
  2 siblings, 0 replies; 7+ messages in thread
From: pillair @ 2020-06-12 14:25 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kvalo, kuabhs, saiprakash.ranjan, linux-arm-msm, David S. Miller,
	Jakub Kicinski, ath10k, linux-kernel, linux-wireless, netdev

Hi Doug,

The send callback for the CEs do check for hw_index/SRRI before trying 
to free the buffer.
But adding a check for copy-complete (before calling the individual CE 
callbacks) seems to be the better approach. Hence I agree that this 
patch should be added.

Thanks,
Rakesh Pillai.

On 2020-06-09 20:50, Douglas Anderson wrote:
> On wcn3990 we have "per_ce_irq = true".  That makes the
> ath10k_ce_interrupt_summary() function always return 0xfff. The
> ath10k_ce_per_engine_service_any() function will see this and think
> that _all_ copy engines have an interrupt.  Without checking, the
> ath10k_ce_per_engine_service() assumes that if it's called that the
> "copy complete" (cc) interrupt fired.  This combination seems bad.
> 
> Let's add a check to make sure that the "copy complete" interrupt
> actually fired in ath10k_ce_per_engine_service().
> 
> This might fix a hard-to-reproduce failure where it appears that the
> copy complete handlers run before the copy is really complete.
> Specifically a symptom was that we were seeing this on a Qualcomm
> sc7180 board:
>   arm-smmu 15000000.iommu: Unhandled context fault:
>   fsr=0x402, iova=0x7fdd45780, fsynr=0x30003, cbfrsynra=0xc1, cb=10
> 
> Even on platforms that don't have wcn3990 this still seems like it
> would be a sane thing to do.  Specifically the current IRQ handler
> comments indicate that there might be other misc interrupt sources
> firing that need to be cleared.  If one of those sources was the one
> that caused the IRQ handler to be called it would also be important to
> double-check that the interrupt we cared about actually fired.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>


Reviewed-by: Rakesh Pillai <pillair@codeaurora.org>


> ---
> 
>  drivers/net/wireless/ath/ath10k/ce.c | 30 +++++++++++++++++++---------
>  1 file changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/ce.c
> b/drivers/net/wireless/ath/ath10k/ce.c
> index 294fbc1e89ab..ffdd4b995f33 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.c
> +++ b/drivers/net/wireless/ath/ath10k/ce.c
> @@ -481,6 +481,15 @@ static inline void
> ath10k_ce_engine_int_status_clear(struct ath10k *ar,
>  	ath10k_ce_write32(ar, ce_ctrl_addr + wm_regs->addr, mask);
>  }
> 
> +static inline bool ath10k_ce_engine_int_status_check(struct ath10k 
> *ar,
> +						     u32 ce_ctrl_addr,
> +						     unsigned int mask)
> +{
> +	struct ath10k_hw_ce_host_wm_regs *wm_regs = ar->hw_ce_regs->wm_regs;
> +
> +	return ath10k_ce_read32(ar, ce_ctrl_addr + wm_regs->addr) & mask;
> +}
> +
>  /*
>   * Guts of ath10k_ce_send.
>   * The caller takes responsibility for any needed locking.
> @@ -1301,19 +1310,22 @@ void ath10k_ce_per_engine_service(struct
> ath10k *ar, unsigned int ce_id)
> 
>  	spin_lock_bh(&ce->ce_lock);
> 
> -	/* Clear the copy-complete interrupts that will be handled here. */
> -	ath10k_ce_engine_int_status_clear(ar, ctrl_addr,
> -					  wm_regs->cc_mask);
> +	if (ath10k_ce_engine_int_status_check(ar, ctrl_addr,
> +					      wm_regs->cc_mask)) {
> +		/* Clear before handling */
> +		ath10k_ce_engine_int_status_clear(ar, ctrl_addr,
> +						  wm_regs->cc_mask);
> 
> -	spin_unlock_bh(&ce->ce_lock);
> +		spin_unlock_bh(&ce->ce_lock);
> 
> -	if (ce_state->recv_cb)
> -		ce_state->recv_cb(ce_state);
> +		if (ce_state->recv_cb)
> +			ce_state->recv_cb(ce_state);
> 
> -	if (ce_state->send_cb)
> -		ce_state->send_cb(ce_state);
> +		if (ce_state->send_cb)
> +			ce_state->send_cb(ce_state);
> 
> -	spin_lock_bh(&ce->ce_lock);
> +		spin_lock_bh(&ce->ce_lock);
> +	}
> 
>  	/*
>  	 * Misc CE interrupts are not being handled, but still need

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ath10k: Wait until copy complete is actually done before completing
  2020-06-09 15:20 [PATCH] ath10k: Wait until copy complete is actually done before completing Douglas Anderson
  2020-06-12 14:25 ` pillair
@ 2020-06-15 14:32 ` Kalle Valo
  2020-06-15 14:39   ` Doug Anderson
  2020-06-16  8:11 ` Kalle Valo
  2 siblings, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2020-06-15 14:32 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kuabhs, pillair, saiprakash.ranjan, linux-arm-msm,
	Douglas Anderson, David S. Miller, Jakub Kicinski, ath10k,
	linux-kernel, linux-wireless, netdev

Douglas Anderson <dianders@chromium.org> wrote:

> On wcn3990 we have "per_ce_irq = true".  That makes the
> ath10k_ce_interrupt_summary() function always return 0xfff. The
> ath10k_ce_per_engine_service_any() function will see this and think
> that _all_ copy engines have an interrupt.  Without checking, the
> ath10k_ce_per_engine_service() assumes that if it's called that the
> "copy complete" (cc) interrupt fired.  This combination seems bad.
> 
> Let's add a check to make sure that the "copy complete" interrupt
> actually fired in ath10k_ce_per_engine_service().
> 
> This might fix a hard-to-reproduce failure where it appears that the
> copy complete handlers run before the copy is really complete.
> Specifically a symptom was that we were seeing this on a Qualcomm
> sc7180 board:
>   arm-smmu 15000000.iommu: Unhandled context fault:
>   fsr=0x402, iova=0x7fdd45780, fsynr=0x30003, cbfrsynra=0xc1, cb=10
> 
> Even on platforms that don't have wcn3990 this still seems like it
> would be a sane thing to do.  Specifically the current IRQ handler
> comments indicate that there might be other misc interrupt sources
> firing that need to be cleared.  If one of those sources was the one
> that caused the IRQ handler to be called it would also be important to
> double-check that the interrupt we cared about actually fired.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

ath10k firmwares work very differently, on what hardware and firmware did you
test this? I'll add that information to the commit log.

-- 
https://patchwork.kernel.org/patch/11595887/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ath10k: Wait until copy complete is actually done before completing
  2020-06-15 14:32 ` Kalle Valo
@ 2020-06-15 14:39   ` Doug Anderson
  2020-06-15 14:56     ` Kalle Valo
  0 siblings, 1 reply; 7+ messages in thread
From: Doug Anderson @ 2020-06-15 14:39 UTC (permalink / raw)
  To: Kalle Valo
  Cc: kuabhs, Rakesh Pillai, Sai Prakash Ranjan, linux-arm-msm,
	David S. Miller, Jakub Kicinski, ath10k, LKML, linux-wireless,
	netdev

Hi,

On Mon, Jun 15, 2020 at 7:32 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Douglas Anderson <dianders@chromium.org> wrote:
>
> > On wcn3990 we have "per_ce_irq = true".  That makes the
> > ath10k_ce_interrupt_summary() function always return 0xfff. The
> > ath10k_ce_per_engine_service_any() function will see this and think
> > that _all_ copy engines have an interrupt.  Without checking, the
> > ath10k_ce_per_engine_service() assumes that if it's called that the
> > "copy complete" (cc) interrupt fired.  This combination seems bad.
> >
> > Let's add a check to make sure that the "copy complete" interrupt
> > actually fired in ath10k_ce_per_engine_service().
> >
> > This might fix a hard-to-reproduce failure where it appears that the
> > copy complete handlers run before the copy is really complete.
> > Specifically a symptom was that we were seeing this on a Qualcomm
> > sc7180 board:
> >   arm-smmu 15000000.iommu: Unhandled context fault:
> >   fsr=0x402, iova=0x7fdd45780, fsynr=0x30003, cbfrsynra=0xc1, cb=10
> >
> > Even on platforms that don't have wcn3990 this still seems like it
> > would be a sane thing to do.  Specifically the current IRQ handler
> > comments indicate that there might be other misc interrupt sources
> > firing that need to be cleared.  If one of those sources was the one
> > that caused the IRQ handler to be called it would also be important to
> > double-check that the interrupt we cared about actually fired.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
>
> ath10k firmwares work very differently, on what hardware and firmware did you
> test this? I'll add that information to the commit log.

I am running on a Qualcomm sc7180 SoC.

> --
> https://patchwork.kernel.org/patch/11595887/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ath10k: Wait until copy complete is actually done before completing
  2020-06-15 14:39   ` Doug Anderson
@ 2020-06-15 14:56     ` Kalle Valo
  2020-06-15 15:02       ` Doug Anderson
  0 siblings, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2020-06-15 14:56 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Sai Prakash Ranjan, linux-arm-msm, linux-wireless, LKML, ath10k,
	Rakesh Pillai, netdev, Jakub Kicinski, David S. Miller, kuabhs

Doug Anderson <dianders@chromium.org> writes:

> On Mon, Jun 15, 2020 at 7:32 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>>
>> Douglas Anderson <dianders@chromium.org> wrote:
>>
>> > On wcn3990 we have "per_ce_irq = true".  That makes the
>> > ath10k_ce_interrupt_summary() function always return 0xfff. The
>> > ath10k_ce_per_engine_service_any() function will see this and think
>> > that _all_ copy engines have an interrupt.  Without checking, the
>> > ath10k_ce_per_engine_service() assumes that if it's called that the
>> > "copy complete" (cc) interrupt fired.  This combination seems bad.
>> >
>> > Let's add a check to make sure that the "copy complete" interrupt
>> > actually fired in ath10k_ce_per_engine_service().
>> >
>> > This might fix a hard-to-reproduce failure where it appears that the
>> > copy complete handlers run before the copy is really complete.
>> > Specifically a symptom was that we were seeing this on a Qualcomm
>> > sc7180 board:
>> >   arm-smmu 15000000.iommu: Unhandled context fault:
>> >   fsr=0x402, iova=0x7fdd45780, fsynr=0x30003, cbfrsynra=0xc1, cb=10
>> >
>> > Even on platforms that don't have wcn3990 this still seems like it
>> > would be a sane thing to do.  Specifically the current IRQ handler
>> > comments indicate that there might be other misc interrupt sources
>> > firing that need to be cleared.  If one of those sources was the one
>> > that caused the IRQ handler to be called it would also be important to
>> > double-check that the interrupt we cared about actually fired.
>> >
>> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> > Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
>>
>> ath10k firmwares work very differently, on what hardware and firmware did you
>> test this? I'll add that information to the commit log.
>
> I am running on a Qualcomm sc7180 SoC.

Sorry, I was unclear, I meant the ath10k hardware :) I guess WCN3990 but
what firmware version?

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ath10k: Wait until copy complete is actually done before completing
  2020-06-15 14:56     ` Kalle Valo
@ 2020-06-15 15:02       ` Doug Anderson
  0 siblings, 0 replies; 7+ messages in thread
From: Doug Anderson @ 2020-06-15 15:02 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Sai Prakash Ranjan, linux-arm-msm, linux-wireless, LKML, ath10k,
	Rakesh Pillai, netdev, Jakub Kicinski, David S. Miller, kuabhs

Hi,

On Mon, Jun 15, 2020 at 7:56 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Doug Anderson <dianders@chromium.org> writes:
>
> > On Mon, Jun 15, 2020 at 7:32 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> >>
> >> Douglas Anderson <dianders@chromium.org> wrote:
> >>
> >> > On wcn3990 we have "per_ce_irq = true".  That makes the
> >> > ath10k_ce_interrupt_summary() function always return 0xfff. The
> >> > ath10k_ce_per_engine_service_any() function will see this and think
> >> > that _all_ copy engines have an interrupt.  Without checking, the
> >> > ath10k_ce_per_engine_service() assumes that if it's called that the
> >> > "copy complete" (cc) interrupt fired.  This combination seems bad.
> >> >
> >> > Let's add a check to make sure that the "copy complete" interrupt
> >> > actually fired in ath10k_ce_per_engine_service().
> >> >
> >> > This might fix a hard-to-reproduce failure where it appears that the
> >> > copy complete handlers run before the copy is really complete.
> >> > Specifically a symptom was that we were seeing this on a Qualcomm
> >> > sc7180 board:
> >> >   arm-smmu 15000000.iommu: Unhandled context fault:
> >> >   fsr=0x402, iova=0x7fdd45780, fsynr=0x30003, cbfrsynra=0xc1, cb=10
> >> >
> >> > Even on platforms that don't have wcn3990 this still seems like it
> >> > would be a sane thing to do.  Specifically the current IRQ handler
> >> > comments indicate that there might be other misc interrupt sources
> >> > firing that need to be cleared.  If one of those sources was the one
> >> > that caused the IRQ handler to be called it would also be important to
> >> > double-check that the interrupt we cared about actually fired.
> >> >
> >> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> >> > Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
> >>
> >> ath10k firmwares work very differently, on what hardware and firmware did you
> >> test this? I'll add that information to the commit log.
> >
> > I am running on a Qualcomm sc7180 SoC.
>
> Sorry, I was unclear, I meant the ath10k hardware :) I guess WCN3990 but
> what firmware version?

Ah, sorry!  Yes, it appears to be wcn3990 based on my device tree:

$ git grep -A2 wifi -- arch/arm64/boot/dts/qcom/sc7180.dtsi

wifi: wifi@18800000 {
        compatible = "qcom,wcn3990-wifi";
        reg = <0 0x18800000 0 0x800000>;
        reg-names = "membase";

Firmware isn't final yet, but currently my boot log shows:

qmi fw_version 0x322a01ea
fw_build_timestamp 2020-05-20 03:47
QC_IMAGE_VERSION_STRING=WLAN.HL.3.2.2-00490-QCAHLSWMTPL-1

-Doug

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ath10k: Wait until copy complete is actually done before completing
  2020-06-09 15:20 [PATCH] ath10k: Wait until copy complete is actually done before completing Douglas Anderson
  2020-06-12 14:25 ` pillair
  2020-06-15 14:32 ` Kalle Valo
@ 2020-06-16  8:11 ` Kalle Valo
  2 siblings, 0 replies; 7+ messages in thread
From: Kalle Valo @ 2020-06-16  8:11 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kuabhs, pillair, saiprakash.ranjan, linux-arm-msm,
	Douglas Anderson, David S. Miller, Jakub Kicinski, ath10k,
	linux-kernel, linux-wireless, netdev

Douglas Anderson <dianders@chromium.org> wrote:

> On wcn3990 we have "per_ce_irq = true".  That makes the
> ath10k_ce_interrupt_summary() function always return 0xfff. The
> ath10k_ce_per_engine_service_any() function will see this and think
> that _all_ copy engines have an interrupt.  Without checking, the
> ath10k_ce_per_engine_service() assumes that if it's called that the
> "copy complete" (cc) interrupt fired.  This combination seems bad.
> 
> Let's add a check to make sure that the "copy complete" interrupt
> actually fired in ath10k_ce_per_engine_service().
> 
> This might fix a hard-to-reproduce failure where it appears that the
> copy complete handlers run before the copy is really complete.
> Specifically a symptom was that we were seeing this on a Qualcomm
> sc7180 board:
>   arm-smmu 15000000.iommu: Unhandled context fault:
>   fsr=0x402, iova=0x7fdd45780, fsynr=0x30003, cbfrsynra=0xc1, cb=10
> 
> Even on platforms that don't have wcn3990 this still seems like it
> would be a sane thing to do.  Specifically the current IRQ handler
> comments indicate that there might be other misc interrupt sources
> firing that need to be cleared.  If one of those sources was the one
> that caused the IRQ handler to be called it would also be important to
> double-check that the interrupt we cared about actually fired.
> 
> Tested-on: WCN3990 SNOC WLAN.HL.3.2.2-00490-QCAHLSWMTPL-1
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

8f9ed93d09a9 ath10k: Wait until copy complete is actually done before completing

-- 
https://patchwork.kernel.org/patch/11595887/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-06-16  8:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-09 15:20 [PATCH] ath10k: Wait until copy complete is actually done before completing Douglas Anderson
2020-06-12 14:25 ` pillair
2020-06-15 14:32 ` Kalle Valo
2020-06-15 14:39   ` Doug Anderson
2020-06-15 14:56     ` Kalle Valo
2020-06-15 15:02       ` Doug Anderson
2020-06-16  8:11 ` Kalle Valo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).