* [PATCH] wlcore: Fix BUG with clear completion on timeout
@ 2018-10-01 21:38 ` Tony Lindgren
From: Tony Lindgren @ 2018-10-01 21:38 UTC
  To: Kalle Valo
  Cc: Eyal Reizer, Kishon Vijay Abraham I, Guy Mishol, Luca Coelho,
	Maital Hahn, Maxim Altshul, Shahar Patury, linux-wireless,
	linux-omap

We do not currently clear wl->elp_compl on ELP timeout, which leaves a
bogus lingering pointer that wlcore_irq will then try to access after
recovery is done:

BUG: spinlock bad magic on CPU#1, irq/255-wl12xx/580
...
(spin_dump) from [<c01b9344>] (do_raw_spin_lock+0xc8/0x124)
(do_raw_spin_lock) from [<c09b3970>] (_raw_spin_lock_irqsave+0x68/0x74)
(_raw_spin_lock_irqsave) from [<c01a02f0>] (complete+0x24/0x58)
(complete) from [<bf572610>] (wlcore_irq+0x48/0x17c [wlcore])
(wlcore_irq [wlcore]) from [<c01c5efc>] (irq_thread_fn+0x2c/0x64)
(irq_thread_fn) from [<c01c623c>] (irq_thread+0x148/0x290)
(irq_thread) from [<c016b4b0>] (kthread+0x160/0x17c)
(kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
...

After that the system will hang. Let's fix this by adding a flag for
recovery and moving the recovery work call to the error handling
section.

We also want to set WL1271_FLAG_INTENDED_FW_RECOVERY, actually clear
it in wl1271_recovery_work(), and downgrade the error to a warning to
prevent overly verbose output.

Cc: Eyal Reizer <eyalr@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 drivers/net/wireless/ti/wlcore/main.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -957,6 +957,8 @@ static void wl1271_recovery_work(struct work_struct *work)
 	BUG_ON(wl->conf.recovery.bug_on_recovery &&
 	       !test_bit(WL1271_FLAG_INTENDED_FW_RECOVERY, &wl->flags));
 
+	clear_bit(WL1271_FLAG_INTENDED_FW_RECOVERY, &wl->flags);
+
 	if (wl->conf.recovery.no_recovery) {
 		wl1271_info("No recovery (chosen on module load). Fw will remain stuck.");
 		goto out_unlock;
@@ -6710,6 +6712,7 @@ static int __maybe_unused wlcore_runtime_resume(struct device *dev)
 	int ret;
 	unsigned long start_time = jiffies;
 	bool pending = false;
+	bool recovery = false;
 
 	/* Nothing to do if no ELP mode requested */
 	if (!test_bit(WL1271_FLAG_IN_ELP, &wl->flags))
@@ -6726,7 +6729,7 @@ static int __maybe_unused wlcore_runtime_resume(struct device *dev)
 
 	ret = wlcore_raw_write32(wl, HW_ACCESS_ELP_CTRL_REG, ELPCTRL_WAKE_UP);
 	if (ret < 0) {
-		wl12xx_queue_recovery_work(wl);
+		recovery = true;
 		goto err;
 	}
 
@@ -6734,11 +6737,12 @@ static int __maybe_unused wlcore_runtime_resume(struct device *dev)
 		ret = wait_for_completion_timeout(&compl,
 			msecs_to_jiffies(WL1271_WAKEUP_TIMEOUT));
 		if (ret == 0) {
-			wl1271_error("ELP wakeup timeout!");
-			wl12xx_queue_recovery_work(wl);
+			wl1271_warning("ELP wakeup timeout!");
 
 			/* Return no error for runtime PM for recovery */
-			return 0;
+			ret = 0;
+			recovery = true;
+			goto err;
 		}
 	}
 
@@ -6753,6 +6757,12 @@ static int __maybe_unused wlcore_runtime_resume(struct device *dev)
 	spin_lock_irqsave(&wl->wl_lock, flags);
 	wl->elp_compl = NULL;
 	spin_unlock_irqrestore(&wl->wl_lock, flags);
+
+	if (recovery) {
+		set_bit(WL1271_FLAG_INTENDED_FW_RECOVERY, &wl->flags);
+		wl12xx_queue_recovery_work(wl);
+	}
+
 	return ret;
 }
 
-- 
2.19.0
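
The control-flow change in the patch can be sketched in user-space C.
This is only an illustration under stated assumptions: struct fake_wl,
wait_for_wakeup(), queue_recovery(), resume_fixed() and recovery_work()
are hypothetical stand-ins, not the wlcore API, and the sketch has no
locking or real interrupt. It shows the one point the patch relies on:
every exit path passes through the cleanup that resets the shared
pointer to the on-stack completion, and recovery is queued only after
that cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real wlcore structs. */
struct completion { bool done; };

struct fake_wl {
	struct completion *elp_compl; /* what a simulated IRQ handler would read */
	bool intended_recovery;       /* stands in for WL1271_FLAG_INTENDED_FW_RECOVERY */
	int recovery_queued;          /* counts queue_recovery() calls */
};

static void queue_recovery(struct fake_wl *wl)
{
	wl->recovery_queued++;
}

/* Simulated wait: nonzero if the chip "woke up", 0 on timeout. */
static int wait_for_wakeup(struct completion *c, bool chip_responds)
{
	c->done = chip_responds;
	return chip_responds ? 1 : 0;
}

/*
 * Shape of the resume path after the patch. Before the patch, the
 * timeout branch returned directly and skipped the cleanup at "err",
 * so the shared pointer kept referring to a dead stack frame.
 */
int resume_fixed(struct fake_wl *wl, bool write_ok, bool chip_responds)
{
	struct completion elp_done = { false }; /* lives on this stack frame */
	bool recovery = false;
	int ret = 0;

	wl->elp_compl = &elp_done;

	if (!write_ok) {
		ret = -5; /* arbitrary negative error code for the sketch */
		recovery = true;
		goto err;
	}

	if (wait_for_wakeup(&elp_done, chip_responds) == 0) {
		ret = 0; /* report no error; recovery deals with the timeout */
		recovery = true;
		goto err;
	}

err:
	/* Single cleanup point: the pointer never outlives elp_done. */
	wl->elp_compl = NULL;

	if (recovery) {
		wl->intended_recovery = true;
		queue_recovery(wl);
	}

	return ret;
}

/* Recovery work: check the flag (like the BUG_ON in the patch context), then clear it. */
void recovery_work(struct fake_wl *wl)
{
	assert(wl->intended_recovery);
	wl->intended_recovery = false;
}
```

With a zero-initialized struct fake_wl, calling resume_fixed() with
chip_responds set to false returns 0, leaves elp_compl NULL and queues
recovery once; a following recovery_work() call clears the flag again.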

* Re: [PATCH] wlcore: Fix BUG with clear completion on timeout
@ 2018-10-05  8:33   ` Kalle Valo
From: Kalle Valo @ 2018-10-05  8:33 UTC
  To: Tony Lindgren
  Cc: Eyal Reizer, Kishon Vijay Abraham I, Guy Mishol, Luca Coelho,
	Maital Hahn, Maxim Altshul, Shahar Patury, linux-wireless,
	linux-omap

Tony Lindgren <tony@atomide.com> wrote:

> We do not currently clear wl->elp_compl on ELP timeout, which leaves a
> bogus lingering pointer that wlcore_irq will then try to access after
> recovery is done:
> 
> BUG: spinlock bad magic on CPU#1, irq/255-wl12xx/580
> ...
> (spin_dump) from [<c01b9344>] (do_raw_spin_lock+0xc8/0x124)
> (do_raw_spin_lock) from [<c09b3970>] (_raw_spin_lock_irqsave+0x68/0x74)
> (_raw_spin_lock_irqsave) from [<c01a02f0>] (complete+0x24/0x58)
> (complete) from [<bf572610>] (wlcore_irq+0x48/0x17c [wlcore])
> (wlcore_irq [wlcore]) from [<c01c5efc>] (irq_thread_fn+0x2c/0x64)
> (irq_thread_fn) from [<c01c623c>] (irq_thread+0x148/0x290)
> (irq_thread) from [<c016b4b0>] (kthread+0x160/0x17c)
> (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> ...
> 
> After that the system will hang. Let's fix this by adding a flag for
> recovery and moving the recovery work call to the error handling
> section.
> 
> We also want to set WL1271_FLAG_INTENDED_FW_RECOVERY, actually clear
> it in wl1271_recovery_work(), and downgrade the error to a warning to
> prevent overly verbose output.
> 
> Cc: Eyal Reizer <eyalr@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>

Patch applied to wireless-drivers-next.git, thanks.

4e651bad8489 wlcore: Fix BUG with clear completion on timeout

-- 
https://patchwork.kernel.org/patch/10622767/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [PATCH] wlcore: Fix BUG with clear completion on timeout
@ 2018-11-30 13:16     ` Adam Ford
From: Adam Ford @ 2018-11-30 13:16 UTC
  To: kvalo
  Cc: Tony Lindgren, Reizer, Eyal, Kishon Vijay Abraham I, guym,
	luciano.coelho, maitalm, maxim.altshul, shaharp, linux-wireless,
	linux-omap

On Fri, Oct 5, 2018 at 3:33 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Tony Lindgren <tony@atomide.com> wrote:
>
> > We do not currently clear wl->elp_compl on ELP timeout, which leaves a
> > bogus lingering pointer that wlcore_irq will then try to access after
> > recovery is done:
> >
> > BUG: spinlock bad magic on CPU#1, irq/255-wl12xx/580
> > ...
> > (spin_dump) from [<c01b9344>] (do_raw_spin_lock+0xc8/0x124)
> > (do_raw_spin_lock) from [<c09b3970>] (_raw_spin_lock_irqsave+0x68/0x74)
> > (_raw_spin_lock_irqsave) from [<c01a02f0>] (complete+0x24/0x58)
> > (complete) from [<bf572610>] (wlcore_irq+0x48/0x17c [wlcore])
> > (wlcore_irq [wlcore]) from [<c01c5efc>] (irq_thread_fn+0x2c/0x64)
> > (irq_thread_fn) from [<c01c623c>] (irq_thread+0x148/0x290)
> > (irq_thread) from [<c016b4b0>] (kthread+0x160/0x17c)
> > (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> > ...
> >
> > After that the system will hang. Let's fix this by adding a flag for
> > recovery and moving the recovery work call to the error handling
> > section.
> >
> > We also want to set WL1271_FLAG_INTENDED_FW_RECOVERY, actually clear
> > it in wl1271_recovery_work(), and downgrade the error to a warning to
> > prevent overly verbose output.
> >

Do we know how far back this bug goes and which versions need this
patch applied?  I have seen something similar on 4.19, but I
haven't tried this patch to fix it.  It wasn't clear to me if this is
linux-next or 4.19 or something different.

thanks

adam
> > Cc: Eyal Reizer <eyalr@ti.com>
> > Signed-off-by: Tony Lindgren <tony@atomide.com>
>
> Patch applied to wireless-drivers-next.git, thanks.
>
> 4e651bad8489 wlcore: Fix BUG with clear completion on timeout
>
> --
> https://patchwork.kernel.org/patch/10622767/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>

* Re: [PATCH] wlcore: Fix BUG with clear completion on timeout
@ 2018-11-30 18:32       ` Tony Lindgren
From: Tony Lindgren @ 2018-11-30 18:32 UTC
  To: Adam Ford
  Cc: kvalo, Reizer, Eyal, Kishon Vijay Abraham I, guym,
	luciano.coelho, maitalm, maxim.altshul, shaharp, linux-wireless,
	linux-omap

Hi,

* Adam Ford <aford173@gmail.com> [181130 13:16]:
> On Fri, Oct 5, 2018 at 3:33 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> >
> > Tony Lindgren <tony@atomide.com> wrote:
> >
> > > We do not currently clear wl->elp_compl on ELP timeout, which leaves a
> > > bogus lingering pointer that wlcore_irq will then try to access after
> > > recovery is done:
> > >
> > > BUG: spinlock bad magic on CPU#1, irq/255-wl12xx/580
> > > ...
> > > (spin_dump) from [<c01b9344>] (do_raw_spin_lock+0xc8/0x124)
> > > (do_raw_spin_lock) from [<c09b3970>] (_raw_spin_lock_irqsave+0x68/0x74)
> > > (_raw_spin_lock_irqsave) from [<c01a02f0>] (complete+0x24/0x58)
> > > (complete) from [<bf572610>] (wlcore_irq+0x48/0x17c [wlcore])
> > > (wlcore_irq [wlcore]) from [<c01c5efc>] (irq_thread_fn+0x2c/0x64)
> > > (irq_thread_fn) from [<c01c623c>] (irq_thread+0x148/0x290)
> > > (irq_thread) from [<c016b4b0>] (kthread+0x160/0x17c)
> > > (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> > > ...
> > >
> > > After that the system will hang. Let's fix this by adding a flag for
> > > recovery and moving the recovery work call to the error handling
> > > section.
> > >
> > > We also want to set WL1271_FLAG_INTENDED_FW_RECOVERY, actually clear
> > > it in wl1271_recovery_work(), and downgrade the error to a warning to
> > > prevent overly verbose output.
> > >
> 
> Do we know how far back this bug goes and which versions need this
> patch applied?  I have seen something similar on 4.19, but I
> haven't tried this patch to fix it.  It wasn't clear to me if this is
> linux-next or 4.19 or something different.

I'm not sure if this is needed for v4.19 as the wakeirq patch
is not there. Maybe give it a try and see if it helps with
the issue you're seeing, then request inclusion for stable if
it helps?

BTW any wlcore issues with earlier kernels should be separately
debugged and tested. Fixes done after changing wlcore to use
PM runtime and wakeirq may be incomplete for earlier kernels;
that means the two commits below and any changes related to them.

And in general there seem to be two categories of common issues
with wlcore that I've seen: GPIO interrupt not behaving with the
SoC or old firmware being used for wlcore.

Regards,

Tony

8< -----------------
3c83dd577c7f ("wlcore: Add support for optional wakeirq")
fa2648a34e73 ("wlcore: Add support for runtime PM")
