linux-wireless.vger.kernel.org archive mirror
* [PATCHv2] wlcore: fix race for WL1271_FLAG_IRQ_RUNNING
@ 2019-10-07 17:28 Tony Lindgren
  2019-10-08 14:05 ` Tony Lindgren
  0 siblings, 1 reply; 4+ messages in thread
From: Tony Lindgren @ 2019-10-07 17:28 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Eyal Reizer, Kishon Vijay Abraham I, Guy Mishol, linux-wireless,
	linux-omap, Anders Roxell, John Stultz, Ulf Hansson

We set WL1271_FLAG_IRQ_RUNNING in the beginning of wlcore_irq(), and test
for it in wlcore_runtime_resume(). But WL1271_FLAG_IRQ_RUNNING currently
gets cleared too early by wlcore_irq_locked() before wlcore_irq() is done
calling it. And this will race against wlcore_runtime_resume() testing it.

Let's set and clear IRQ_RUNNING in wlcore_irq() so wlcore_runtime_resume()
can rely on it. And let's remove old comments about hardirq, that's no
longer the case as we're using request_threaded_irq().
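
As a rough userspace illustration of the flag lifecycle proposed above (this
is not driver code: the model_* names are hypothetical, and C11 atomics stand
in for the kernel's set_bit(), clear_bit() and test_bit()), the flag is set on
entry to the threaded handler and cleared only on exit, so the inner helper
always sees it set:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for WL1271_FLAG_IRQ_RUNNING in the flag word. */
enum { MODEL_FLAG_IRQ_RUNNING = 0 };

/* Hypothetical, heavily reduced stand-in for struct wl1271. */
struct model_wl {
	atomic_ulong flags;
	int irq_work_done;	/* counts simulated wlcore_irq_locked() runs */
};

static void model_init(struct model_wl *wl)
{
	atomic_init(&wl->flags, 0UL);
	wl->irq_work_done = 0;
}

static void model_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

static void model_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

static bool model_test_bit(int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}

/*
 * Models wlcore_irq_locked() after the patch: it does the work but
 * no longer clears the flag. Returns whether the flag was still set.
 */
static bool model_irq_locked(struct model_wl *wl)
{
	wl->irq_work_done++;
	return model_test_bit(MODEL_FLAG_IRQ_RUNNING, &wl->flags);
}

/* Models wlcore_irq(): the flag is set and cleared in one place. */
static bool model_irq(struct model_wl *wl)
{
	bool seen_running;

	model_set_bit(MODEL_FLAG_IRQ_RUNNING, &wl->flags);
	seen_running = model_irq_locked(wl);
	model_clear_bit(MODEL_FLAG_IRQ_RUNNING, &wl->flags);

	return seen_running;
}
```

The model leaves out the wl_lock spinlock and the mutex; it only shows where
the flag changes state relative to the handler body.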

This fixes occasional annoying wlcore firmware reboots that start with
"wlcore: WARNING ELP wakeup timeout!" followed by a multisecond latency
when the wlcore firmware gets wrongly rebooted waiting for an ELP wake
interrupt that won't be coming.

Note that I also suspect some form of this issue was the root cause why
the wlcore GPIO interrupt has been often configured as a level interrupt
instead of edge as an attempt to work around the ELP wake timeout errors.

Fixes: fa2648a34e73 ("wlcore: Add support for runtime PM")
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Eyal Reizer <eyalr@ti.com>
Cc: Guy Mishol <guym@ti.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---

Changes since v1:

- Add locking around clear_bit like we do elsewhere in the driver

 drivers/net/wireless/ti/wlcore/main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -544,11 +544,6 @@ static int wlcore_irq_locked(struct wl1271 *wl)
 	}
 
 	while (!done && loopcount--) {
-		/*
-		 * In order to avoid a race with the hardirq, clear the flag
-		 * before acknowledging the chip.
-		 */
-		clear_bit(WL1271_FLAG_IRQ_RUNNING, &wl->flags);
 		smp_mb__after_atomic();
 
 		ret = wlcore_fw_status(wl, wl->fw_status);
@@ -668,7 +663,7 @@ static irqreturn_t wlcore_irq(int irq, void *cookie)
 		disable_irq_nosync(wl->irq);
 		pm_wakeup_event(wl->dev, 0);
 		spin_unlock_irqrestore(&wl->wl_lock, flags);
-		return IRQ_HANDLED;
+		goto out_handled;
 	}
 	spin_unlock_irqrestore(&wl->wl_lock, flags);
 
@@ -692,6 +687,11 @@ static irqreturn_t wlcore_irq(int irq, void *cookie)
 
 	mutex_unlock(&wl->mutex);
 
+out_handled:
+	spin_lock_irqsave(&wl->wl_lock, flags);
+	clear_bit(WL1271_FLAG_IRQ_RUNNING, &wl->flags);
+	spin_unlock_irqrestore(&wl->wl_lock, flags);
+
 	return IRQ_HANDLED;
 }
 
-- 
2.23.0


* Re: [PATCHv2] wlcore: fix race for WL1271_FLAG_IRQ_RUNNING
  2019-10-07 17:28 [PATCHv2] wlcore: fix race for WL1271_FLAG_IRQ_RUNNING Tony Lindgren
@ 2019-10-08 14:05 ` Tony Lindgren
  2019-10-08 14:16   ` Kalle Valo
  0 siblings, 1 reply; 4+ messages in thread
From: Tony Lindgren @ 2019-10-08 14:05 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Eyal Reizer, Kishon Vijay Abraham I, Guy Mishol, linux-wireless,
	linux-omap, Anders Roxell, John Stultz, Ulf Hansson

* Tony Lindgren <tony@atomide.com> [191007 17:29]:
> We set WL1271_FLAG_IRQ_RUNNING in the beginning of wlcore_irq(), and test
> for it in wlcore_runtime_resume(). But WL1271_FLAG_IRQ_RUNNING currently
> gets cleared too early by wlcore_irq_locked() before wlcore_irq() is done
> calling it. And this will race against wlcore_runtime_resume() testing it.
> 
> Let's set and clear IRQ_RUNNING in wlcore_irq() so wlcore_runtime_resume()
> can rely on it. And let's remove old comments about hardirq, that's no
> longer the case as we're using request_threaded_irq().
> 
> This fixes occasional annoying wlcore firmware reboots that start with
> "wlcore: WARNING ELP wakeup timeout!" followed by a multisecond latency
> when the wlcore firmware gets wrongly rebooted waiting for an ELP wake
> interrupt that won't be coming.
> 
> Note that I also suspect some form of this issue was the root cause why
> the wlcore GPIO interrupt has been often configured as a level interrupt
> instead of edge as an attempt to work around the ELP wake timeout errors.

So this fixed a reproducible test case where loading some webpages
often produced ELP timeout errors. But looks like I'm still seeing ELP
timeouts elsewhere. So best to wait on this one. Something is still
wrong with the ELP timeout handling.

Regards,

Tony

> Fixes: fa2648a34e73 ("wlcore: Add support for runtime PM")
> Cc: Anders Roxell <anders.roxell@linaro.org>
> Cc: Eyal Reizer <eyalr@ti.com>
> Cc: Guy Mishol <guym@ti.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
> 
> Changes since v1:
> 
> - Add locking around clear_bit like we do elsewhere in the driver
> 
>  drivers/net/wireless/ti/wlcore/main.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c
> --- a/drivers/net/wireless/ti/wlcore/main.c
> +++ b/drivers/net/wireless/ti/wlcore/main.c
> @@ -544,11 +544,6 @@ static int wlcore_irq_locked(struct wl1271 *wl)
>  	}
>  
>  	while (!done && loopcount--) {
> -		/*
> -		 * In order to avoid a race with the hardirq, clear the flag
> -		 * before acknowledging the chip.
> -		 */
> -		clear_bit(WL1271_FLAG_IRQ_RUNNING, &wl->flags);
>  		smp_mb__after_atomic();
>  
>  		ret = wlcore_fw_status(wl, wl->fw_status);
> @@ -668,7 +663,7 @@ static irqreturn_t wlcore_irq(int irq, void *cookie)
>  		disable_irq_nosync(wl->irq);
>  		pm_wakeup_event(wl->dev, 0);
>  		spin_unlock_irqrestore(&wl->wl_lock, flags);
> -		return IRQ_HANDLED;
> +		goto out_handled;
>  	}
>  	spin_unlock_irqrestore(&wl->wl_lock, flags);
>  
> @@ -692,6 +687,11 @@ static irqreturn_t wlcore_irq(int irq, void *cookie)
>  
>  	mutex_unlock(&wl->mutex);
>  
> +out_handled:
> +	spin_lock_irqsave(&wl->wl_lock, flags);
> +	clear_bit(WL1271_FLAG_IRQ_RUNNING, &wl->flags);
> +	spin_unlock_irqrestore(&wl->wl_lock, flags);
> +
>  	return IRQ_HANDLED;
>  }
>  
> -- 
> 2.23.0


* Re: [PATCHv2] wlcore: fix race for WL1271_FLAG_IRQ_RUNNING
  2019-10-08 14:05 ` Tony Lindgren
@ 2019-10-08 14:16   ` Kalle Valo
  2019-10-09 16:42     ` Tony Lindgren
  0 siblings, 1 reply; 4+ messages in thread
From: Kalle Valo @ 2019-10-08 14:16 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Eyal Reizer, Kishon Vijay Abraham I, Guy Mishol, linux-wireless,
	linux-omap, Anders Roxell, John Stultz, Ulf Hansson

Tony Lindgren <tony@atomide.com> writes:

> * Tony Lindgren <tony@atomide.com> [191007 17:29]:
>> We set WL1271_FLAG_IRQ_RUNNING in the beginning of wlcore_irq(), and test
>> for it in wlcore_runtime_resume(). But WL1271_FLAG_IRQ_RUNNING currently
>> gets cleared too early by wlcore_irq_locked() before wlcore_irq() is done
>> calling it. And this will race against wlcore_runtime_resume() testing it.
>> 
>> Let's set and clear IRQ_RUNNING in wlcore_irq() so wlcore_runtime_resume()
>> can rely on it. And let's remove old comments about hardirq, that's no
>> longer the case as we're using request_threaded_irq().
>> 
>> This fixes occasional annoying wlcore firmware reboots that start with
>> "wlcore: WARNING ELP wakeup timeout!" followed by a multisecond latency
>> when the wlcore firmware gets wrongly rebooted waiting for an ELP wake
>> interrupt that won't be coming.
>> 
>> Note that I also suspect some form of this issue was the root cause why
>> the wlcore GPIO interrupt has been often configured as a level interrupt
>> instead of edge as an attempt to work around the ELP wake timeout errors.
>
> So this fixed a reproducible test case where loading some webpages
> often produced ELP timeout errors. But looks like I'm still seeing ELP
> timeouts elsewhere. So best to wait on this one. Something is still
> wrong with the ELP timeout handling.

Ok, I'll drop this then. Please send v3 once you think the patch is
ready to be applied.

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [PATCHv2] wlcore: fix race for WL1271_FLAG_IRQ_RUNNING
  2019-10-08 14:16   ` Kalle Valo
@ 2019-10-09 16:42     ` Tony Lindgren
  0 siblings, 0 replies; 4+ messages in thread
From: Tony Lindgren @ 2019-10-09 16:42 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Eyal Reizer, Kishon Vijay Abraham I, Guy Mishol, linux-wireless,
	linux-omap, Anders Roxell, John Stultz, Ulf Hansson

* Kalle Valo <kvalo@codeaurora.org> [191008 14:17]:
> Tony Lindgren <tony@atomide.com> writes:
> 
> > * Tony Lindgren <tony@atomide.com> [191007 17:29]:
> >> We set WL1271_FLAG_IRQ_RUNNING in the beginning of wlcore_irq(), and test
> >> for it in wlcore_runtime_resume(). But WL1271_FLAG_IRQ_RUNNING currently
> >> gets cleared too early by wlcore_irq_locked() before wlcore_irq() is done
> >> calling it. And this will race against wlcore_runtime_resume() testing it.
> >> 
> >> Let's set and clear IRQ_RUNNING in wlcore_irq() so wlcore_runtime_resume()
> >> can rely on it. And let's remove old comments about hardirq, that's no
> >> longer the case as we're using request_threaded_irq().
> >> 
> >> This fixes occasional annoying wlcore firmware reboots that start with
> >> "wlcore: WARNING ELP wakeup timeout!" followed by a multisecond latency
> >> when the wlcore firmware gets wrongly rebooted waiting for an ELP wake
> >> interrupt that won't be coming.
> >> 
> >> Note that I also suspect some form of this issue was the root cause why
> >> the wlcore GPIO interrupt has been often configured as a level interrupt
> >> instead of edge as an attempt to work around the ELP wake timeout errors.
> >
> > So this fixed a reproducible test case where loading some webpages
> > often produced ELP timeout errors. But looks like I'm still seeing ELP
> > timeouts elsewhere. So best to wait on this one. Something is still
> > wrong with the ELP timeout handling.
> 
> Ok, I'll drop this then. Please send v3 once you think the patch is
> ready to be applied.

Looks like the real fix is to use level instead of edge interrupt
for omap4 and 5 to avoid the check for untriggered interrupts in
omap_gpio_unidle(). Should not be needed for other SoCs as their
l4per can't idle independent of the CPUs.

I'll send a separate patch for that. And I'll send an updated clean-up
patch for $subject patch as the race described above should never
happen.

The clearing of the WL1271_FLAG_IRQ_RUNNING bit already happens within the
pm_runtime_get_sync() section of wlcore_irq_locked(). So this patch just
happened to slightly change the timings for my reproducible test case.
We should not be able to hit the race described above even with super
short autosuspend timeouts between wlcore_irq_locked() and the end of
wlcore_irq() :)
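
A minimal sketch of that reasoning, assuming a simplified usage-count model
of runtime PM (the model_* names are hypothetical and this is not the kernel
API): while a reference taken in the style of pm_runtime_get_sync() is held,
the device cannot autosuspend, so the resume callback that tests the flag
cannot run inside that section.

```c
#include <stdbool.h>

/* Hypothetical model of a runtime-PM-managed device. */
struct model_dev {
	int usage_count;	/* references held by get_sync callers */
	bool suspended;
	int resume_calls;	/* times the resume callback ran */
};

/* Stand-in for the driver's runtime resume callback. */
static void model_runtime_resume(struct model_dev *dev)
{
	dev->resume_calls++;
	dev->suspended = false;
}

/* Models pm_runtime_get_sync(): take a reference, resume if needed. */
static void model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	if (dev->suspended)
		model_runtime_resume(dev);
}

/* Models dropping the reference at the end of the section. */
static void model_put(struct model_dev *dev)
{
	dev->usage_count--;
}

/* Models the autosuspend timer firing: suspends only when unreferenced. */
static bool model_try_autosuspend(struct model_dev *dev)
{
	if (dev->usage_count > 0 || dev->suspended)
		return false;

	dev->suspended = true;
	return true;
}
```

In this model, even an autosuspend timeout of zero cannot trigger a suspend
and resume cycle between model_get_sync() and model_put().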

Regards,

Tony


> -- 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
