linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] ath10k: warm reset fixes 2014-05-14
@ 2014-05-14  6:57 Michal Kazior
  2014-05-14  6:57 ` [PATCH 1/2] ath10k: improve warm reset reliability Michal Kazior
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Michal Kazior @ 2014-05-14  6:57 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Michal Kazior

Hi,

Warm reset was not able to recover from a device
crash so ath10k was falling back to cold reset
which would occasionally hang the system or
generate a data bus error with some device chips.

Warm reset now works a lot more reliably (at least
on my T430). It's able to recover from a "hard"
simulate fw crash and some firmware asserts I've
forced meaning it should recover in most cases
without falling back to the cold reset.

There's still one case of a crash warm reset is
incapable to recover from: IOMMU faults (when
supported and enabled and device tries to access
non-DMA mapped host memory) trigger device crash
and leave CE totally unresponsive before and after
the warm reset.


Michal Kazior (2):
  ath10k: improve warm reset reliability
  ath10k: retry warm reset a few times

 drivers/net/wireless/ath/ath10k/pci.c | 53 +++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 3 deletions(-)

-- 
1.8.5.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] ath10k: improve warm reset reliability
  2014-05-14  6:57 [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Michal Kazior
@ 2014-05-14  6:57 ` Michal Kazior
  2014-05-15 10:48   ` Kalle Valo
  2014-05-14  6:57 ` [PATCH 2/2] ath10k: retry warm reset a few times Michal Kazior
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Michal Kazior @ 2014-05-14  6:57 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Michal Kazior

Warm reset is now able to recover after device
crashes which required a cold reset before.

This should greatly reduce chances of getting data
bus errors or host system freezes due to buggy
cold reset on some chips.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/pci.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 66b1f30..c9b14b4 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1802,6 +1802,28 @@ static void ath10k_pci_fw_interrupt_handler(struct ath10k *ar)
 	ath10k_pci_sleep(ar);
 }
 
+/* this function effectively clears target memory controller assert line */
+static void ath10k_pci_warm_reset_si0(struct ath10k *ar)
+{
+	u32 val;
+
+	val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
+				SOC_RESET_CONTROL_ADDRESS);
+	ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
+			   val | SOC_RESET_CONTROL_SI0_RST_MASK);
+	val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
+				SOC_RESET_CONTROL_ADDRESS);
+	msleep(10);
+
+	val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
+				SOC_RESET_CONTROL_ADDRESS);
+	ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
+			   val & ~SOC_RESET_CONTROL_SI0_RST_MASK);
+	val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
+				SOC_RESET_CONTROL_ADDRESS);
+	msleep(10);
+}
+
 static int ath10k_pci_warm_reset(struct ath10k *ar)
 {
 	int ret = 0;
@@ -1860,6 +1882,8 @@ static int ath10k_pci_warm_reset(struct ath10k *ar)
 				SOC_RESET_CONTROL_ADDRESS);
 	msleep(10);
 
+	ath10k_pci_warm_reset_si0(ar);
+
 	/* debug */
 	val = ath10k_pci_read32(ar, SOC_CORE_BASE_ADDRESS +
 				PCIE_INTR_CAUSE_ADDRESS);
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] ath10k: retry warm reset a few times
  2014-05-14  6:57 [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Michal Kazior
  2014-05-14  6:57 ` [PATCH 1/2] ath10k: improve warm reset reliability Michal Kazior
@ 2014-05-14  6:57 ` Michal Kazior
  2014-05-14 14:22 ` [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Kalle Valo
  2014-05-16 13:49 ` Kalle Valo
  3 siblings, 0 replies; 9+ messages in thread
From: Michal Kazior @ 2014-05-14  6:57 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Michal Kazior

Sometimes warm reset works upon retry. It might be
related to imperfect warm reset routine, but for
now let's just do the retries.

This should improve the reliability of some chips
that hang/crash with cold reset which is used as a
last resort if warm reset fails.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/pci.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index c9b14b4..c9e482e 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -59,6 +59,7 @@ MODULE_PARM_DESC(reset_mode, "0: auto, 1: warm only (default: 0)");
 
 /* how long wait to wait for target to initialise, in ms */
 #define ATH10K_PCI_TARGET_WAIT 3000
+#define ATH10K_PCI_NUM_WARM_RESET_ATTEMPTS 3
 
 #define QCA988X_2_0_DEVICE_ID	(0x003c)
 
@@ -2012,6 +2013,28 @@ err:
 	return ret;
 }
 
+static int ath10k_pci_hif_power_up_warm(struct ath10k *ar)
+{
+	int i, ret;
+
+	/*
+	 * Sometime warm reset succeeds after retries.
+	 *
+	 * FIXME: It might be possible to tune ath10k_pci_warm_reset() to work
+	 * at first try.
+	 */
+	for (i = 0; i < ATH10K_PCI_NUM_WARM_RESET_ATTEMPTS; i++) {
+		ret = __ath10k_pci_hif_power_up(ar, false);
+		if (ret == 0)
+			break;
+
+		ath10k_warn("failed to warm reset (attempt %d out of %d): %d\n",
+			    i + 1, ATH10K_PCI_NUM_WARM_RESET_ATTEMPTS, ret);
+	}
+
+	return ret;
+}
+
 static int ath10k_pci_hif_power_up(struct ath10k *ar)
 {
 	int ret;
@@ -2023,10 +2046,10 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
 	 * preferred (and safer) way to perform a device reset is through a
 	 * warm reset.
 	 *
-	 * Warm reset doesn't always work though (notably after a firmware
-	 * crash) so fall back to cold reset if necessary.
+	 * Warm reset doesn't always work though so fall back to cold reset may
+	 * be necessary.
 	 */
-	ret = __ath10k_pci_hif_power_up(ar, false);
+	ret = ath10k_pci_hif_power_up_warm(ar);
 	if (ret) {
 		ath10k_warn("failed to power up target using warm reset: %d\n",
 			    ret);
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] ath10k: warm reset fixes 2014-05-14
  2014-05-14  6:57 [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Michal Kazior
  2014-05-14  6:57 ` [PATCH 1/2] ath10k: improve warm reset reliability Michal Kazior
  2014-05-14  6:57 ` [PATCH 2/2] ath10k: retry warm reset a few times Michal Kazior
@ 2014-05-14 14:22 ` Kalle Valo
  2014-05-14 14:25   ` Kalle Valo
  2014-05-16 13:49 ` Kalle Valo
  3 siblings, 1 reply; 9+ messages in thread
From: Kalle Valo @ 2014-05-14 14:22 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k, linux-wireless

Michal Kazior <michal.kazior@tieto.com> writes:

> Warm reset was not able to recover from a device
> crash so ath10k was falling back to cold reset
> which would occasionally hang the system or
> generate a data bus error with some device chips.
>
> Warm reset now works a lot more reliably (at least
> on my T430). It's able to recover from a "hard"
> simulate fw crash and some firmware asserts I've
> forced meaning it should recover in most cases
> without falling back to the cold reset.
>
> There's still one case of a crash warm reset is
> incapable to recover from: IOMMU faults (when
> supported and enabled and device tries to access
> non-DMA mapped host memory) trigger device crash
> and leave CE totally unresponsive before and after
> the warm reset.
>
>
> Michal Kazior (2):
>   ath10k: improve warm reset reliability
>   ath10k: retry warm reset a few times

The patches don't apply anymore and the conflict was not trivial enough
for me to fix it. Can you rebase, please?

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] ath10k: warm reset fixes 2014-05-14
  2014-05-14 14:22 ` [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Kalle Valo
@ 2014-05-14 14:25   ` Kalle Valo
  0 siblings, 0 replies; 9+ messages in thread
From: Kalle Valo @ 2014-05-14 14:25 UTC (permalink / raw)
  To: Michal Kazior; +Cc: linux-wireless, ath10k

Kalle Valo <kvalo@qca.qualcomm.com> writes:

> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> Warm reset was not able to recover from a device
>> crash so ath10k was falling back to cold reset
>> which would occasionally hang the system or
>> generate a data bus error with some device chips.
>>
>> Warm reset now works a lot more reliably (at least
>> on my T430). It's able to recover from a "hard"
>> simulate fw crash and some firmware asserts I've
>> forced meaning it should recover in most cases
>> without falling back to the cold reset.
>>
>> There's still one case of a crash warm reset is
>> incapable to recover from: IOMMU faults (when
>> supported and enabled and device tries to access
>> non-DMA mapped host memory) trigger device crash
>> and leave CE totally unresponsive before and after
>> the warm reset.
>>
>>
>> Michal Kazior (2):
>>   ath10k: improve warm reset reliability
>>   ath10k: retry warm reset a few times
>
> The patches don't apply anymore and the conflict was not trivial enough
> for me to fix it. Can you rebase, please?

Sorry, I replied to the wrong thread. Please ignore what I said above!

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ath10k: improve warm reset reliability
  2014-05-14  6:57 ` [PATCH 1/2] ath10k: improve warm reset reliability Michal Kazior
@ 2014-05-15 10:48   ` Kalle Valo
  2014-05-15 12:42     ` Michal Kazior
  0 siblings, 1 reply; 9+ messages in thread
From: Kalle Valo @ 2014-05-15 10:48 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k, linux-wireless

Michal Kazior <michal.kazior@tieto.com> writes:

> Warm reset is now able to recover after device
> crashes which required a cold reset before.
>
> This should greatly reduce chances of getting data
> bus errors or host system freezes due to buggy
> cold reset on some chips.
>
> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>

Awesome! This is very much needed. I just have a cosmetic comment:

> +/* this function effectively clears target memory controller assert line */
> +static void ath10k_pci_warm_reset_si0(struct ath10k *ar)
> +{
> +	u32 val;
> +
> +	val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
> +				SOC_RESET_CONTROL_ADDRESS);
> +	ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
> +			   val | SOC_RESET_CONTROL_SI0_RST_MASK);
> +	val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
> +				SOC_RESET_CONTROL_ADDRESS);

We do have the ath10k_pci_soc_ functions for accessing SOC registers, I
would prefer to use those here. I now modified your patch with the diff
below. Is that ok to you?

--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1807,20 +1807,18 @@ static void ath10k_pci_warm_reset_si0(struct ath10k *ar)
 {
        u32 val;
 
-       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
-                               SOC_RESET_CONTROL_ADDRESS);
-       ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
-                          val | SOC_RESET_CONTROL_SI0_RST_MASK);
-       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
-                               SOC_RESET_CONTROL_ADDRESS);
+       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
+       ath10k_pci_soc_write32(ar, SOC_RESET_CONTROL_ADDRESS,
+                              val | SOC_RESET_CONTROL_SI0_RST_MASK);
+       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
+
        msleep(10);
 
-       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
-                               SOC_RESET_CONTROL_ADDRESS);
-       ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
-                          val & ~SOC_RESET_CONTROL_SI0_RST_MASK);
-       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
-                               SOC_RESET_CONTROL_ADDRESS);
+       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
+       ath10k_pci_soc_write32(ar, SOC_RESET_CONTROL_ADDRESS,
+                              val & ~SOC_RESET_CONTROL_SI0_RST_MASK);
+       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
+
        msleep(10);
 }

Full patch here:

https://github.com/kvalo/ath/commit/7b52054308a371d479a9e686e1d8411d19a90fd7

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ath10k: improve warm reset reliability
  2014-05-15 10:48   ` Kalle Valo
@ 2014-05-15 12:42     ` Michal Kazior
  2014-05-15 12:57       ` Kalle Valo
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Kazior @ 2014-05-15 12:42 UTC (permalink / raw)
  To: Kalle Valo; +Cc: linux-wireless, ath10k

On 15 May 2014 12:48, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> Warm reset is now able to recover after device
>> crashes which required a cold reset before.
>>
>> This should greatly reduce chances of getting data
>> bus errors or host system freezes due to buggy
>> cold reset on some chips.
>>
>> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
>
> Awesome! This is very much needed. I just have a cosmetic comment:
>
>> +/* this function effectively clears target memory controller assert line */
>> +static void ath10k_pci_warm_reset_si0(struct ath10k *ar)
>> +{
>> +     u32 val;
>> +
>> +     val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
>> +                             SOC_RESET_CONTROL_ADDRESS);
>> +     ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
>> +                        val | SOC_RESET_CONTROL_SI0_RST_MASK);
>> +     val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
>> +                             SOC_RESET_CONTROL_ADDRESS);
>
> We do have the ath10k_pci_soc_ functions for accessing SOC registers, I
> would prefer to use those here. I now modified your patch with the diff
> below. Is that ok to you?

Ah, yeah. I totally forgot about these functions (again).


> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -1807,20 +1807,18 @@ static void ath10k_pci_warm_reset_si0(struct ath10k *ar)
>  {
>         u32 val;
>
> -       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
> -                               SOC_RESET_CONTROL_ADDRESS);
> -       ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
> -                          val | SOC_RESET_CONTROL_SI0_RST_MASK);
> -       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
> -                               SOC_RESET_CONTROL_ADDRESS);
> +       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
> +       ath10k_pci_soc_write32(ar, SOC_RESET_CONTROL_ADDRESS,
> +                              val | SOC_RESET_CONTROL_SI0_RST_MASK);
> +       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
> +
>         msleep(10);
>
> -       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
> -                               SOC_RESET_CONTROL_ADDRESS);
> -       ath10k_pci_write32(ar, RTC_SOC_BASE_ADDRESS + SOC_RESET_CONTROL_ADDRESS,
> -                          val & ~SOC_RESET_CONTROL_SI0_RST_MASK);
> -       val = ath10k_pci_read32(ar, RTC_SOC_BASE_ADDRESS +
> -                               SOC_RESET_CONTROL_ADDRESS);
> +       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
> +       ath10k_pci_soc_write32(ar, SOC_RESET_CONTROL_ADDRESS,
> +                              val & ~SOC_RESET_CONTROL_SI0_RST_MASK);
> +       val = ath10k_pci_soc_read32(ar, SOC_RESET_CONTROL_ADDRESS);
> +
>         msleep(10);
>  }

Looks good to me :-)


> Full patch here:
>
> https://github.com/kvalo/ath/commit/7b52054308a371d479a9e686e1d8411d19a90fd7

You probably mean:

https://github.com/kvalo/ath/commit/136ef8110cba61752eee5ef6bd7ce170b15cf491


Michał

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] ath10k: improve warm reset reliability
  2014-05-15 12:42     ` Michal Kazior
@ 2014-05-15 12:57       ` Kalle Valo
  0 siblings, 0 replies; 9+ messages in thread
From: Kalle Valo @ 2014-05-15 12:57 UTC (permalink / raw)
  To: Michal Kazior; +Cc: linux-wireless, ath10k

Michal Kazior <michal.kazior@tieto.com> writes:

>> Full patch here:
>>
>> https://github.com/kvalo/ath/commit/7b52054308a371d479a9e686e1d8411d19a90fd7
>
> You probably mean:
>
> https://github.com/kvalo/ath/commit/136ef8110cba61752eee5ef6bd7ce170b15cf491

Yup, sorry for the confusion.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] ath10k: warm reset fixes 2014-05-14
  2014-05-14  6:57 [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Michal Kazior
                   ` (2 preceding siblings ...)
  2014-05-14 14:22 ` [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Kalle Valo
@ 2014-05-16 13:49 ` Kalle Valo
  3 siblings, 0 replies; 9+ messages in thread
From: Kalle Valo @ 2014-05-16 13:49 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k, linux-wireless

Michal Kazior <michal.kazior@tieto.com> writes:

> Hi,
>
> Warm reset was not able to recover from a device
> crash so ath10k was falling back to cold reset
> which would occasionally hang the system or
> generate a data bus error with some device chips.
>
> Warm reset now works a lot more reliably (at least
> on my T430). It's able to recover from a "hard"
> simulate fw crash and some firmware asserts I've
> forced meaning it should recover in most cases
> without falling back to the cold reset.
>
> There's still one case of a crash warm reset is
> incapable to recover from: IOMMU faults (when
> supported and enabled and device tries to access
> non-DMA mapped host memory) trigger device crash
> and leave CE totally unresponsive before and after
> the warm reset.
>
>
> Michal Kazior (2):
>   ath10k: improve warm reset reliability
>   ath10k: retry warm reset a few times

Thanks, both patches applied.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-05-16 13:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-14  6:57 [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Michal Kazior
2014-05-14  6:57 ` [PATCH 1/2] ath10k: improve warm reset reliability Michal Kazior
2014-05-15 10:48   ` Kalle Valo
2014-05-15 12:42     ` Michal Kazior
2014-05-15 12:57       ` Kalle Valo
2014-05-14  6:57 ` [PATCH 2/2] ath10k: retry warm reset a few times Michal Kazior
2014-05-14 14:22 ` [PATCH 0/2] ath10k: warm reset fixes 2014-05-14 Kalle Valo
2014-05-14 14:25   ` Kalle Valo
2014-05-16 13:49 ` Kalle Valo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).