* [PATCH v2] ath10k: Retry pci probe on failure.
@ 2017-10-03 22:13 ` greearb
  0 siblings, 0 replies; 14+ messages in thread
From: greearb @ 2017-10-03 22:13 UTC (permalink / raw)
  To: linux-wireless; +Cc: ath10k, Ben Greear

From: Ben Greear <greearb@candelatech.com>

This works around a problem we see when sometimes the wifi NIC does
not respond the first time.  This seems to happen especially often on
some of the 9984 NICs in mid-range platforms.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

v2:  Change to mdelay instead of udelay to fix compile issue on ARM.

 drivers/net/wireless/ath/ath10k/pci.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 77beb13..0861f7f 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -3487,8 +3487,8 @@ static const struct ath10k_bus_ops ath10k_pci_bus_ops = {
 	.get_num_banks	= ath10k_pci_get_num_banks,
 };
 
-static int ath10k_pci_probe(struct pci_dev *pdev,
-			    const struct pci_device_id *pci_dev)
+static int __ath10k_pci_probe(struct pci_dev *pdev,
+			      const struct pci_device_id *pci_dev)
 {
 	int ret = 0;
 	struct ath10k *ar;
@@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static int ath10k_pci_probe(struct pci_dev *pdev,
+			    const struct pci_device_id *pci_dev)
+{
+	int cnt = 0;
+	int rv;
+	do {
+		rv = __ath10k_pci_probe(pdev, pci_dev);
+		if (rv == 0)
+			return rv;
+		pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
+		mdelay(10); /* let the ath10k firmware gerbil take a small break */
+	} while (cnt++ < 10);
+	return rv;
+}
+
+
 static void ath10k_pci_remove(struct pci_dev *pdev)
 {
 	struct ath10k *ar = pci_get_drvdata(pdev);
-- 
2.4.11

* Re: [PATCH v2] ath10k: Retry pci probe on failure.
  2017-10-03 22:13 ` greearb
@ 2017-10-13 12:41   ` Kalle Valo
  -1 siblings, 0 replies; 14+ messages in thread
From: Kalle Valo @ 2017-10-13 12:41 UTC (permalink / raw)
  To: greearb; +Cc: linux-wireless, ath10k

greearb@candelatech.com writes:

> From: Ben Greear <greearb@candelatech.com>
>
> This works around a problem we see when sometimes the wifi NIC does
> not respond the first time.  This seems to happen especially often on
> some of the 9984 NICs in mid-range platforms.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>

[...]

> -static int ath10k_pci_probe(struct pci_dev *pdev,
> -			    const struct pci_device_id *pci_dev)
> +static int __ath10k_pci_probe(struct pci_dev *pdev,
> +			      const struct pci_device_id *pci_dev)
>  {
>  	int ret = 0;
>  	struct ath10k *ar;
> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>  	return ret;
>  }
>  
> +static int ath10k_pci_probe(struct pci_dev *pdev,
> +			    const struct pci_device_id *pci_dev)
> +{
> +	int cnt = 0;
> +	int rv;
> +	do {
> +		rv = __ath10k_pci_probe(pdev, pci_dev);
> +		if (rv == 0)
> +			return rv;
> +		pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
> +		mdelay(10); /* let the ath10k firmware gerbil take a small break */
> +	} while (cnt++ < 10);
> +	return rv;
> +}

This is a sledgehammer approach and it causes reload for all error
cases, like when hardware is broken or memory allocation is failing.

When the problem happens does it always fail at the same place? Is
it hw reset or something else? It's better to retry the individual action
than to do this hack. Or is it just some more delay needed somewhere?

-- 
Kalle Valo

* Re: [PATCH v2] ath10k: Retry pci probe on failure.
  2017-10-13 12:41   ` Kalle Valo
@ 2017-10-13 15:50     ` Adrian Chadd
  -1 siblings, 0 replies; 14+ messages in thread
From: Adrian Chadd @ 2017-10-13 15:50 UTC (permalink / raw)
  To: Kalle Valo; +Cc: greearb, linux-wireless, ath10k

On 13 October 2017 at 05:41, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> greearb@candelatech.com writes:
>
>> From: Ben Greear <greearb@candelatech.com>
>>
>> This works around a problem we see when sometimes the wifi NIC does
>> not respond the first time.  This seems to happen especially often on
>> some of the 9984 NICs in mid-range platforms.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>
> [...]
>
>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>> -                         const struct pci_device_id *pci_dev)
>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>> +                           const struct pci_device_id *pci_dev)
>>  {
>>       int ret = 0;
>>       struct ath10k *ar;
>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>       return ret;
>>  }
>>
>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>> +                         const struct pci_device_id *pci_dev)
>> +{
>> +     int cnt = 0;
>> +     int rv;
>> +     do {
>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>> +             if (rv == 0)
>> +                     return rv;
>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>> +     } while (cnt++ < 10);
>> +     return rv;
>> +}
>
> This is a sledgehammer approach and it causes reload for all error
> cases, like when hardware is broken or memory allocation is failing.
>
> When the problem happens does it always fail at the same place? Is
> it hw reset or something else? It's better to retry the individual action
> than to do this hack. Or is it just some more delay needed somewhere?

I am seeing WMI timeouts during initial firmware load and wait on
QCA9984 + BCM7444S SoC.
My guess is the WMI wakeup time is not "right" enough and needs to be
extended a little bit.

But then, I have played a lot of whackamole with WMI timeouts during
my loooong porting effort..


-adrian

* Re: [PATCH v2] ath10k: Retry pci probe on failure.
  2017-10-13 15:50     ` Adrian Chadd
@ 2017-10-13 20:41       ` Ben Greear
  -1 siblings, 0 replies; 14+ messages in thread
From: Ben Greear @ 2017-10-13 20:41 UTC (permalink / raw)
  To: Adrian Chadd, Kalle Valo; +Cc: linux-wireless, ath10k

On 10/13/2017 08:50 AM, Adrian Chadd wrote:
> On 13 October 2017 at 05:41, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
>> greearb@candelatech.com writes:
>>
>>> From: Ben Greear <greearb@candelatech.com>
>>>
>>> This works around a problem we see when sometimes the wifi NIC does
>>> not respond the first time.  This seems to happen especially often on
>>> some of the 9984 NICs in mid-range platforms.
>>>
>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>
>> [...]
>>
>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>> -                         const struct pci_device_id *pci_dev)
>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>> +                           const struct pci_device_id *pci_dev)
>>>   {
>>>        int ret = 0;
>>>        struct ath10k *ar;
>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>        return ret;
>>>   }
>>>
>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>> +                         const struct pci_device_id *pci_dev)
>>> +{
>>> +     int cnt = 0;
>>> +     int rv;
>>> +     do {
>>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>>> +             if (rv == 0)
>>> +                     return rv;
>>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>> +     } while (cnt++ < 10);
>>> +     return rv;
>>> +}
>>
>> This is a sledgehammer approach and it causes reload for all error
>> cases, like when hardware is broken or memory allocation is failing.
>>
>> When the problem happens does it always fail at the same place? Is
>> it hw reset or something else? It's better to retry the individual action
>> than to do this hack. Or is it just some more delay needed somewhere?
>
> I am seeing WMI timeouts during initial firmware load and wait on
> QCA9984 + BCM7444S SoC.
> My guess is the WMI wakeup time is not "right" enough and needs to be
> extended a little bit.
>
> But then, I have played a lot of whackamole with WMI timeouts during
> my loooong porting effort..

The failure I saw was a failure to wake pci, and from comments, it seems that
the current wait is longer than what should be required, and it warns on slow
wakes, and I never saw that warning.  So I assume that waiting longer would not help.

I saw it fail twice in a row to wake pci and then succeed on the third try, for instance,
when testing my patch.

As for a big hammer, I guess we could check for certain return codes if you think
that is better than just retrying all failures?
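The return-code idea can be sketched in user space as follows. This is only an illustration under stated assumptions: treating -ETIMEDOUT as the "transient" error is a guess about what a failed wake returns, and fake_probe_flaky(), fake_probe_nomem(), is_transient(), probe_with_retry() and MAX_PROBE_ATTEMPTS are hypothetical names that do not exist in the driver.

```c
#include <errno.h>

#define MAX_PROBE_ATTEMPTS 5	/* hypothetical bound, not from the patch */

/* Hypothetical stand-in for __ath10k_pci_probe(): times out on the
 * first two calls and succeeds on the third. */
static int fake_probe_flaky(int *calls)
{
	(*calls)++;
	return (*calls < 3) ? -ETIMEDOUT : 0;
}

/* Hypothetical stand-in for a probe that can never succeed. */
static int fake_probe_nomem(int *calls)
{
	(*calls)++;
	return -ENOMEM;
}

/* Assumption: only a timeout is worth retrying. */
static int is_transient(int err)
{
	return err == -ETIMEDOUT;
}

/* Retry the probe, but give up at once on errors that are not
 * considered transient. */
static int probe_with_retry(int (*probe)(int *), int *calls)
{
	int attempt;
	int rv = -EINVAL;

	for (attempt = 0; attempt < MAX_PROBE_ATTEMPTS; attempt++) {
		rv = probe(calls);
		if (rv == 0 || !is_transient(rv))
			break;
	}
	return rv;
}
```

With these stand-ins, the flaky probe is called three times before it succeeds, while the out-of-memory probe is called once and its error is returned unchanged.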

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [PATCH v2] ath10k: Retry pci probe on failure.
  2017-10-13 20:41       ` Ben Greear
@ 2017-10-13 20:55         ` Adrian Chadd
  -1 siblings, 0 replies; 14+ messages in thread
From: Adrian Chadd @ 2017-10-13 20:55 UTC (permalink / raw)
  To: Ben Greear; +Cc: Kalle Valo, linux-wireless, ath10k

[snip]

* WMI setup stuff fails locally because of memory fragmentation when
you reload the driver. Heh. Sigh.
* I also see the PCI wakeup failures, so I'm going to go poke that
soon and see what I find.



-adrian

* Re: [PATCH v2] ath10k: Retry pci probe on failure.
  2017-10-13 20:41       ` Ben Greear
@ 2017-10-17  8:45         ` Kalle Valo
  -1 siblings, 0 replies; 14+ messages in thread
From: Kalle Valo @ 2017-10-17  8:45 UTC (permalink / raw)
  To: Ben Greear; +Cc: Adrian Chadd, linux-wireless, ath10k

Ben Greear <greearb@candelatech.com> writes:

> On 10/13/2017 08:50 AM, Adrian Chadd wrote:
>> On 13 October 2017 at 05:41, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
>>> greearb@candelatech.com writes:
>>>
>>>> From: Ben Greear <greearb@candelatech.com>
>>>>
>>>> This works around a problem we see when sometimes the wifi NIC does
>>>> not respond the first time.  This seems to happen especially often on
>>>> some of the 9984 NICs in mid-range platforms.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>
>>> [...]
>>>
>>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> -                         const struct pci_device_id *pci_dev)
>>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>>> +                           const struct pci_device_id *pci_dev)
>>>>   {
>>>>        int ret = 0;
>>>>        struct ath10k *ar;
>>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>        return ret;
>>>>   }
>>>>
>>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> +                         const struct pci_device_id *pci_dev)
>>>> +{
>>>> +     int cnt = 0;
>>>> +     int rv;
>>>> +     do {
>>>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>>>> +             if (rv == 0)
>>>> +                     return rv;
>>>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>>> +     } while (cnt++ < 10);
>>>> +     return rv;
>>>> +}
>>>
>>> This is a sledgehammer approach and it causes reload for all error
>>> cases, like when hardware is broken or memory allocation is failing.
>>>
>>> When the problem happens does it always fail at the same place? Is
>>> it hw reset or something else? It's better to retry the individual action
>>> than to do this hack. Or is it just some more delay needed somewhere?
>>
>> I am seeing WMI timeouts during initial firmware load and wait on
>> QCA9984 + BCM7444S SoC.
>> My guess is the WMI wakeup time is not "right" enough and needs to be
>> extended a little bit.
>>
>> But then, I have played a lot of whackamole with WMI timeouts during
>> my loooong porting effort..
>
> The failure I saw was a failure to wake pci, and from comments, it seems that
> the current wait is longer than what should be required, and it warns on slow
> wakes, and I never saw that warning.  So I assume that waiting longer would not help.
>
> I saw it fail twice in a row to wake pci and then succeed on the third try, for instance,
> when testing my patch.
>
> As for a big hammer, I guess we could check for certain return codes if you think
> that is better than just retrying all failures?

ath10k_pci_probe() has a lot of stuff which should not affect your
problem, like allocating memory, setting up timers and interrupts etc.
It's quite ugly to redo that in every cycle. A more fine-grained
solution, like looping a specific action (reset, wake, whatever), is much
preferred.
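A minimal user-space sketch of that fine-grained shape, assuming the wake step is the one that fails intermittently: the one-time setup runs once and only the wake is looped. All names here (struct probe_stats, fake_setup(), fake_wake(), wake_with_retry(), fine_grained_probe(), WAKE_ATTEMPTS) are hypothetical and are not ath10k functions; the "fails twice, then succeeds" behaviour mirrors what was reported earlier in the thread.

```c
#include <errno.h>

#define WAKE_ATTEMPTS 3	/* hypothetical bound */

struct probe_stats {
	int setups;	/* how often the one-time setup ran */
	int wake_tries;	/* how often the wake step was attempted */
};

/* Hypothetical stand-in for memory, timer and interrupt setup. */
static int fake_setup(struct probe_stats *st)
{
	st->setups++;
	return 0;
}

/* Hypothetical wake step: fails twice, succeeds on the third try. */
static int fake_wake(struct probe_stats *st)
{
	st->wake_tries++;
	return (st->wake_tries < 3) ? -ETIMEDOUT : 0;
}

/* Loop only the action that is known to be flaky. */
static int wake_with_retry(struct probe_stats *st)
{
	int i;
	int rv = -ETIMEDOUT;

	for (i = 0; i < WAKE_ATTEMPTS; i++) {
		rv = fake_wake(st);
		if (rv == 0)
			break;
	}
	return rv;
}

/* Setup happens once; a setup failure is returned without any retry. */
static int fine_grained_probe(struct probe_stats *st)
{
	int rv = fake_setup(st);

	if (rv)
		return rv;
	return wake_with_retry(st);
}
```

In this model a successful probe leaves setups at 1 and wake_tries at 3, whereas wrapping the whole probe in a loop would have repeated the setup on every attempt.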

Do you have debug logs of failing cases?

-- 
Kalle Valo

* Re: [PATCH v2] ath10k: Retry pci probe on failure.
  2017-10-17  8:45         ` Kalle Valo
@ 2017-10-17 15:57           ` Ben Greear
  -1 siblings, 0 replies; 14+ messages in thread
From: Ben Greear @ 2017-10-17 15:57 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Adrian Chadd, linux-wireless, ath10k

On 10/17/2017 01:45 AM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> On 10/13/2017 08:50 AM, Adrian Chadd wrote:
>>> On 13 October 2017 at 05:41, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
>>>> greearb@candelatech.com writes:
>>>>
>>>>> From: Ben Greear <greearb@candelatech.com>
>>>>>
>>>>> This works around a problem we see when sometimes the wifi NIC does
>>>>> not respond the first time.  This seems to happen especially often on
>>>>> some of the 9984 NICs in mid-range platforms.
>>>>>
>>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>>
>>>> [...]
>>>>
>>>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>> -                         const struct pci_device_id *pci_dev)
>>>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>>>> +                           const struct pci_device_id *pci_dev)
>>>>>   {
>>>>>        int ret = 0;
>>>>>        struct ath10k *ar;
>>>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>>        return ret;
>>>>>   }
>>>>>
>>>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>> +                         const struct pci_device_id *pci_dev)
>>>>> +{
>>>>> +     int cnt = 0;
>>>>> +     int rv;
>>>>> +     do {
>>>>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>>>>> +             if (rv == 0)
>>>>> +                     return rv;
>>>>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>>>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>>>> +     } while (cnt++ < 10);
>>>>> +     return rv;
>>>>> +}
>>>>
>>>> This is a sledgehammer approach and it causes reload for all error
>>>> cases, like when hardware is broken or memory allocation is failing.
>>>>
>>>> When the problem happens does it always fail at the same place? Is
>>>> it hw reset or something else? It's better to retry the individual action
>>>> than to do this hack. Or is it just some more delay needed somewhere?
>>>
>>> I am seeing WMI timeouts during initial firmware load and wait on
>>> QCA9984 + BCM7444S SoC.
>>> My guess is the WMI wakeup time is not "right" enough and needs to be
>>> extended a little bit.
>>>
>>> But then, I have played a lot of whackamole with WMI timeouts during
>>> my loooong porting effort..
>>
>> The failure I saw was a failure to wake pci, and from comments, it seems that
>> the current wait is longer than what should be required, and it warns on slow
>> wakes, and I never saw that warning.  So I assume that waiting longer would not help.
>>
>> I saw it fail twice in a row to wake pci and then succeed on the third
>> try, for instance,
>> when testing my patch.
>>
>> As for a big hammer, I guess we could check for certain return codes if you think
>> that is better than just retrying all failures?
>
> ath10k_pci_probe() has a lot of stuff which should not affect your
> problem, like allocating memory, setting up timers and interrupts etc.
> It's quite ugly to redo that in every cycle. A more fine-grained
> solution, like looping a specific action (reset, wake, whatever), is much
> preferred.
>
> Do you have debug logs of failing cases?

I'll gather the logs next time I see this problem.

The patch I wrote likely does more than the minimal required to fix
this problem, but it does not complicate the code much, so I think that
is a benefit.  If we try to make it more specific, it will first likely
require a lot of testing effort to see if it is as effective, and second, it
will likely complicate the probe method quite a bit.

It's not like this is a performance issue... the extra loops will only be run
if the probe fails, and only on driver load.

If the driver fails to load due to issues that my hack cannot work around,
then the user has bigger problems than an extra second of time during the
boot.
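For scale, the retry loop from the patch can be replayed in user space under the assumption that every attempt fails. The sketch below copies the do/while condition from the patch and counts iterations; struct retry_cost, worst_case_cost() and RETRY_DELAY_MS are hypothetical names, and the time spent inside each failed probe call is not modelled.

```c
#define RETRY_DELAY_MS 10	/* matches the mdelay(10) in the patch */

struct retry_cost {
	int attempts;	/* calls that would be made to __ath10k_pci_probe() */
	int delay_ms;	/* total time spent in mdelay() */
};

/* Replay the patch's loop with a probe that never succeeds. */
static struct retry_cost worst_case_cost(void)
{
	struct retry_cost cost = { 0, 0 };
	int cnt = 0;

	do {
		cost.attempts++;			/* probe call, assumed to fail */
		cost.delay_ms += RETRY_DELAY_MS;	/* delay after each failure */
	} while (cnt++ < 10);				/* same condition as the patch */

	return cost;
}
```

Because the counter is tested after the body and incremented afterwards, the body runs 11 times, so a permanently failing device costs 11 probe attempts and 110 ms of busy-wait delay on top of whatever each failed probe takes.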

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2] ath10k: Retry pci probe on failure.
@ 2017-10-17 15:57           ` Ben Greear
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Greear @ 2017-10-17 15:57 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Adrian Chadd, linux-wireless, ath10k

On 10/17/2017 01:45 AM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> On 10/13/2017 08:50 AM, Adrian Chadd wrote:
>>> On 13 October 2017 at 05:41, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
>>>> greearb@candelatech.com writes:
>>>>
>>>>> From: Ben Greear <greearb@candelatech.com>
>>>>>
>>>>> This works around a problem we see when sometimes the wifi NIC does
>>>>> not respond the first time.  This seems to happen especially often on
>>>>> some of the 9984 NICs in mid-range platforms.
>>>>>
>>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>>
>>>> [...]
>>>>
>>>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>> -                         const struct pci_device_id *pci_dev)
>>>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>>>> +                           const struct pci_device_id *pci_dev)
>>>>>   {
>>>>>        int ret = 0;
>>>>>        struct ath10k *ar;
>>>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>>        return ret;
>>>>>   }
>>>>>
>>>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>> +                         const struct pci_device_id *pci_dev)
>>>>> +{
>>>>> +     int cnt = 0;
>>>>> +     int rv;
>>>>> +     do {
>>>>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>>>>> +             if (rv == 0)
>>>>> +                     return rv;
>>>>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>>>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>>>> +     } while (cnt++ < 10);
>>>>> +     return rv;
>>>>> +}
>>>>
>>>> This is a sledgehammer approach and it causes reload for all error
>>>> cases, like when hardware is broken or memory allocation is failing.
>>>>
>>>> When the problem happens does it always fail at the same place? Is
>>>> it hw reset or something else? It's better to retry the individual action
>>>> than to do this hack. Or is it just some more delay needed somewhere?
>>>
>>> I am seeing WMI timeouts during initial firmware load and wait on
>>> QCA9984 + BCM7444S SoC.
>>> My guess is the WMI wakeup time is not "right" enough and needs to be
>>> extended a little bit.
>>>
>>> But then, I have played a lot of whackamole with WMI timeouts during
>>> my loooong porting effort..
>>
>> The failure I saw was a failure to wake pci, and from comments, it seems that
>> the current wait is longer than what should be required, and it warns on slow
>> wakes, and I never saw that warning.  So I assume that waiting longer would not help.
>>
>> I saw it fail twice in a row to wake pci and then succeed on the third
>> try, for instance,
>> when testing my patch.
>>
>> As for a big hammer, I guess we could check for certain return codes if you think
>> that is better than just retrying all failures?
>
> ath10k_pci_probe() has a lot of stuff which should not affect your
> problem, like allocating memory, setting up timers and interrupts etc.
> It's quite ugly to redo that in every cycle. A more fine-grained
> solution, like looping a specific action (reset, wake, whatever), is much
> preferred.
>
> Do you have debug logs of failing cases?

I'll gather the logs next time I see this problem.

The patch I wrote likely does more than the minimum required to fix
this problem, but it does not complicate the code much, so I think that
is a benefit.  If we try to make it more specific, it will first likely
require a lot of testing effort to see if it is as effective, and second, it
will likely complicate the probe method quite a bit.

It's not like this is a performance issue... the extra loops will only be run
if the probe fails, and only on driver load.

If the driver fails to load due to issues that my hack cannot work around,
then the user has bigger problems than an extra second of time during the
boot.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


end of thread, other threads:[~2017-10-17 15:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2017-10-03 22:13 [PATCH v2] ath10k: Retry pci probe on failure greearb
2017-10-03 22:13 ` greearb
2017-10-13 12:41 ` Kalle Valo
2017-10-13 12:41   ` Kalle Valo
2017-10-13 15:50   ` Adrian Chadd
2017-10-13 15:50     ` Adrian Chadd
2017-10-13 20:41     ` Ben Greear
2017-10-13 20:41       ` Ben Greear
2017-10-13 20:55       ` Adrian Chadd
2017-10-13 20:55         ` Adrian Chadd
2017-10-17  8:45       ` Kalle Valo
2017-10-17  8:45         ` Kalle Valo
2017-10-17 15:57         ` Ben Greear
2017-10-17 15:57           ` Ben Greear
