All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ath10k:  Fix crash when using v1 hardware.
@ 2013-07-02 22:42 ` greearb at candelatech.com
  0 siblings, 0 replies; 18+ messages in thread
From: greearb @ 2013-07-02 22:42 UTC (permalink / raw)
  To: ath9k-devel; +Cc: linux-wireless, Ben Greear

From: Ben Greear <greearb@candelatech.com>

I put a v1 NIC from an TP-LINK AC 1750 AP in
a 64-bit PC, and the OS crashes on bootup.  I'm not
sure how broken my hardware is (possibly completely non
functional), but at least with this patch it will no longer
crash the OS.  Not sure it ever got far enough to try,
but I also do not have firmware for the NIC.

With this patch I get this info on module load:

ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
ath10k: MSI-X interrupt handling (8 intrs)
ath10k: Unable to wakeup target
ath10k: target takes too long to wake up (awake count 1)
ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
ath10k: Failed to get pcie state addr: -5
ath10k: early firmware event indicated
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
....
(it hits the warning case about 5-6 times and then seems to quiesce OK).

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/wireless/ath/ath10k/ce.c |   32 +++++++++++++++++++++++++++++++-
 1 files changed, 31 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index 61a8ac7..56669f8 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -756,13 +756,23 @@ void ath10k_ce_per_engine_service(struct ath10k *ar, unsigned int ce_id)
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 	struct ce_state *ce_state = ar_pci->ce_id_to_state[ce_id];
-	u32 ctrl_addr = ce_state->ctrl_addr;
+	u32 ctrl_addr;
 	void *transfer_context;
 	u32 buf;
 	unsigned int nbytes;
 	unsigned int id;
 	unsigned int flags;
 
+	/* On v1 hardware at least, setup can fail, causing ce_id_state to
+	 * be cleaned up, but this method is still called a few times.  Check
+	 * for NULL here so we don't crash.  Probably a better fix is to stop
+	 * the ath10k_pci_ce_tasklet sooner.
+	 */
+	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
+		return;
+
+	ctrl_addr = ce_state->ctrl_addr;
+
 	ath10k_pci_wake(ar);
 	spin_lock_bh(&ar_pci->ce_lock);
 
@@ -954,6 +964,16 @@ static int ath10k_ce_init_src_ring(struct ath10k *ar,
 
 	src_ring->write_index =
 		ath10k_ce_src_ring_write_index_get(ar, ctrl_addr);
+	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
+	 * on a v1 board.
+	 */
+	if (src_ring->write_index > src_ring->nentries_mask) {
+		ath10k_err("src_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
+			   src_ring, src_ring->write_index,
+			   src_ring->nentries_mask);
+		src_ring->write_index = 0;
+	}
+
 	ath10k_pci_sleep(ar);
 
 	src_ring->per_transfer_context = (void **)ptr;
@@ -1037,6 +1057,16 @@ static int ath10k_ce_init_dest_ring(struct ath10k *ar,
 	dest_ring->sw_index = ath10k_ce_dest_ring_read_index_get(ar, ctrl_addr);
 	dest_ring->write_index =
 		ath10k_ce_dest_ring_write_index_get(ar, ctrl_addr);
+	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
+	 * on a v1 board.
+	 */
+	if (dest_ring->write_index > dest_ring->nentries_mask) {
+		ath10k_err("dest_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
+			   dest_ring, dest_ring->write_index,
+			   dest_ring->nentries_mask);
+		dest_ring->write_index = 0;
+	}
+
 	ath10k_pci_sleep(ar);
 
 	dest_ring->per_transfer_context = (void **)ptr;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
@ 2013-07-02 22:42 ` greearb at candelatech.com
  0 siblings, 0 replies; 18+ messages in thread
From: greearb at candelatech.com @ 2013-07-02 22:42 UTC (permalink / raw)
  To: ath9k-devel

From: Ben Greear <greearb@candelatech.com>

I put a v1 NIC from an TP-LINK AC 1750 AP in
a 64-bit PC, and the OS crashes on bootup.  I'm not
sure how broken my hardware is (possibly completely non
functional), but at least with this patch it will no longer
crash the OS.  Not sure it ever got far enough to try,
but I also do not have firmware for the NIC.

With this patch I get this info on module load:

ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
ath10k: MSI-X interrupt handling (8 intrs)
ath10k: Unable to wakeup target
ath10k: target takes too long to wake up (awake count 1)
ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
ath10k: Failed to get pcie state addr: -5
ath10k: early firmware event indicated
------------[ cut here ]------------
WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
....
(it hits the warning case about 5-6 times and then seems to quiesce OK).

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/wireless/ath/ath10k/ce.c |   32 +++++++++++++++++++++++++++++++-
 1 files changed, 31 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index 61a8ac7..56669f8 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -756,13 +756,23 @@ void ath10k_ce_per_engine_service(struct ath10k *ar, unsigned int ce_id)
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 	struct ce_state *ce_state = ar_pci->ce_id_to_state[ce_id];
-	u32 ctrl_addr = ce_state->ctrl_addr;
+	u32 ctrl_addr;
 	void *transfer_context;
 	u32 buf;
 	unsigned int nbytes;
 	unsigned int id;
 	unsigned int flags;
 
+	/* On v1 hardware at least, setup can fail, causing ce_id_state to
+	 * be cleaned up, but this method is still called a few times.  Check
+	 * for NULL here so we don't crash.  Probably a better fix is to stop
+	 * the ath10k_pci_ce_tasklet sooner.
+	 */
+	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
+		return;
+
+	ctrl_addr = ce_state->ctrl_addr;
+
 	ath10k_pci_wake(ar);
 	spin_lock_bh(&ar_pci->ce_lock);
 
@@ -954,6 +964,16 @@ static int ath10k_ce_init_src_ring(struct ath10k *ar,
 
 	src_ring->write_index =
 		ath10k_ce_src_ring_write_index_get(ar, ctrl_addr);
+	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
+	 * on a v1 board.
+	 */
+	if (src_ring->write_index > src_ring->nentries_mask) {
+		ath10k_err("src_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
+			   src_ring, src_ring->write_index,
+			   src_ring->nentries_mask);
+		src_ring->write_index = 0;
+	}
+
 	ath10k_pci_sleep(ar);
 
 	src_ring->per_transfer_context = (void **)ptr;
@@ -1037,6 +1057,16 @@ static int ath10k_ce_init_dest_ring(struct ath10k *ar,
 	dest_ring->sw_index = ath10k_ce_dest_ring_read_index_get(ar, ctrl_addr);
 	dest_ring->write_index =
 		ath10k_ce_dest_ring_write_index_get(ar, ctrl_addr);
+	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
+	 * on a v1 board.
+	 */
+	if (dest_ring->write_index > dest_ring->nentries_mask) {
+		ath10k_err("dest_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
+			   dest_ring, dest_ring->write_index,
+			   dest_ring->nentries_mask);
+		dest_ring->write_index = 0;
+	}
+
 	ath10k_pci_sleep(ar);
 
 	dest_ring->per_transfer_context = (void **)ptr;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] ath10k:  Fix crash when using v1 hardware.
  2013-07-02 22:42 ` [ath9k-devel] " greearb at candelatech.com
@ 2013-07-02 23:08   ` Ben Greear
  -1 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-02 23:08 UTC (permalink / raw)
  To: ath9k-devel; +Cc: linux-wireless

Actually, I have no idea what type of hardware this is.  It was
suggested earlier to me that this AP had a v1 hardware in it, but
lspci shows this fairly unpromising thing, and the ath10k driver
appears to call 003c the V2 hardware....

05:00.0 Network controller: Atheros Communications Inc. Device 003c (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel modules: ath10k_pci


On 07/02/2013 03:42 PM, greearb@candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
>
> I put a v1 NIC from an TP-LINK AC 1750 AP in
> a 64-bit PC, and the OS crashes on bootup.  I'm not
> sure how broken my hardware is (possibly completely non
> functional), but at least with this patch it will no longer
> crash the OS.  Not sure it ever got far enough to try,
> but I also do not have firmware for the NIC.
>
> With this patch I get this info on module load:
>
> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
> ath10k: MSI-X interrupt handling (8 intrs)
> ath10k: Unable to wakeup target
> ath10k: target takes too long to wake up (awake count 1)
> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: Failed to get pcie state addr: -5
> ath10k: early firmware event indicated
> ------------[ cut here ]------------
> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
> ....
> (it hits the warning case about 5-6 times and then seems to quiesce OK).
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
>   drivers/net/wireless/ath/ath10k/ce.c |   32 +++++++++++++++++++++++++++++++-
>   1 files changed, 31 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
> index 61a8ac7..56669f8 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.c
> +++ b/drivers/net/wireless/ath/ath10k/ce.c
> @@ -756,13 +756,23 @@ void ath10k_ce_per_engine_service(struct ath10k *ar, unsigned int ce_id)
>   {
>   	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
>   	struct ce_state *ce_state = ar_pci->ce_id_to_state[ce_id];
> -	u32 ctrl_addr = ce_state->ctrl_addr;
> +	u32 ctrl_addr;
>   	void *transfer_context;
>   	u32 buf;
>   	unsigned int nbytes;
>   	unsigned int id;
>   	unsigned int flags;
>
> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
> +	 * be cleaned up, but this method is still called a few times.  Check
> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
> +	 * the ath10k_pci_ce_tasklet sooner.
> +	 */
> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
> +		return;
> +
> +	ctrl_addr = ce_state->ctrl_addr;
> +
>   	ath10k_pci_wake(ar);
>   	spin_lock_bh(&ar_pci->ce_lock);
>
> @@ -954,6 +964,16 @@ static int ath10k_ce_init_src_ring(struct ath10k *ar,
>
>   	src_ring->write_index =
>   		ath10k_ce_src_ring_write_index_get(ar, ctrl_addr);
> +	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
> +	 * on a v1 board.
> +	 */
> +	if (src_ring->write_index > src_ring->nentries_mask) {
> +		ath10k_err("src_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
> +			   src_ring, src_ring->write_index,
> +			   src_ring->nentries_mask);
> +		src_ring->write_index = 0;
> +	}
> +
>   	ath10k_pci_sleep(ar);
>
>   	src_ring->per_transfer_context = (void **)ptr;
> @@ -1037,6 +1057,16 @@ static int ath10k_ce_init_dest_ring(struct ath10k *ar,
>   	dest_ring->sw_index = ath10k_ce_dest_ring_read_index_get(ar, ctrl_addr);
>   	dest_ring->write_index =
>   		ath10k_ce_dest_ring_write_index_get(ar, ctrl_addr);
> +	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
> +	 * on a v1 board.
> +	 */
> +	if (dest_ring->write_index > dest_ring->nentries_mask) {
> +		ath10k_err("dest_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
> +			   dest_ring, dest_ring->write_index,
> +			   dest_ring->nentries_mask);
> +		dest_ring->write_index = 0;
> +	}
> +
>   	ath10k_pci_sleep(ar);
>
>   	dest_ring->per_transfer_context = (void **)ptr;
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.
@ 2013-07-02 23:08   ` Ben Greear
  0 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-02 23:08 UTC (permalink / raw)
  To: ath9k-devel

Actually, I have no idea what type of hardware this is.  It was
suggested earlier to me that this AP had a v1 hardware in it, but
lspci shows this fairly unpromising thing, and the ath10k driver
appears to call 003c the V2 hardware....

05:00.0 Network controller: Atheros Communications Inc. Device 003c (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel modules: ath10k_pci


On 07/02/2013 03:42 PM, greearb at candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
>
> I put a v1 NIC from an TP-LINK AC 1750 AP in
> a 64-bit PC, and the OS crashes on bootup.  I'm not
> sure how broken my hardware is (possibly completely non
> functional), but at least with this patch it will no longer
> crash the OS.  Not sure it ever got far enough to try,
> but I also do not have firmware for the NIC.
>
> With this patch I get this info on module load:
>
> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
> ath10k: MSI-X interrupt handling (8 intrs)
> ath10k: Unable to wakeup target
> ath10k: target takes too long to wake up (awake count 1)
> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: Failed to get pcie state addr: -5
> ath10k: early firmware event indicated
> ------------[ cut here ]------------
> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
> ....
> (it hits the warning case about 5-6 times and then seems to quiesce OK).
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
>   drivers/net/wireless/ath/ath10k/ce.c |   32 +++++++++++++++++++++++++++++++-
>   1 files changed, 31 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
> index 61a8ac7..56669f8 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.c
> +++ b/drivers/net/wireless/ath/ath10k/ce.c
> @@ -756,13 +756,23 @@ void ath10k_ce_per_engine_service(struct ath10k *ar, unsigned int ce_id)
>   {
>   	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
>   	struct ce_state *ce_state = ar_pci->ce_id_to_state[ce_id];
> -	u32 ctrl_addr = ce_state->ctrl_addr;
> +	u32 ctrl_addr;
>   	void *transfer_context;
>   	u32 buf;
>   	unsigned int nbytes;
>   	unsigned int id;
>   	unsigned int flags;
>
> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
> +	 * be cleaned up, but this method is still called a few times.  Check
> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
> +	 * the ath10k_pci_ce_tasklet sooner.
> +	 */
> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
> +		return;
> +
> +	ctrl_addr = ce_state->ctrl_addr;
> +
>   	ath10k_pci_wake(ar);
>   	spin_lock_bh(&ar_pci->ce_lock);
>
> @@ -954,6 +964,16 @@ static int ath10k_ce_init_src_ring(struct ath10k *ar,
>
>   	src_ring->write_index =
>   		ath10k_ce_src_ring_write_index_get(ar, ctrl_addr);
> +	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
> +	 * on a v1 board.
> +	 */
> +	if (src_ring->write_index > src_ring->nentries_mask) {
> +		ath10k_err("src_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
> +			   src_ring, src_ring->write_index,
> +			   src_ring->nentries_mask);
> +		src_ring->write_index = 0;
> +	}
> +
>   	ath10k_pci_sleep(ar);
>
>   	src_ring->per_transfer_context = (void **)ptr;
> @@ -1037,6 +1057,16 @@ static int ath10k_ce_init_dest_ring(struct ath10k *ar,
>   	dest_ring->sw_index = ath10k_ce_dest_ring_read_index_get(ar, ctrl_addr);
>   	dest_ring->write_index =
>   		ath10k_ce_dest_ring_write_index_get(ar, ctrl_addr);
> +	/* Make sure the value above is sane.  Can get 0xFFFFFFFF
> +	 * on a v1 board.
> +	 */
> +	if (dest_ring->write_index > dest_ring->nentries_mask) {
> +		ath10k_err("dest_ring %p:  write_index is out of bounds: %u  nentries_mask: %u.\n",
> +			   dest_ring, dest_ring->write_index,
> +			   dest_ring->nentries_mask);
> +		dest_ring->write_index = 0;
> +	}
> +
>   	ath10k_pci_sleep(ar);
>
>   	dest_ring->per_transfer_context = (void **)ptr;
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
  2013-07-02 22:42 ` [ath9k-devel] " greearb at candelatech.com
@ 2013-07-11  9:36   ` Kalle Valo
  -1 siblings, 0 replies; 18+ messages in thread
From: Kalle Valo @ 2013-07-11  9:36 UTC (permalink / raw)
  To: greearb; +Cc: ath10k, linux-wireless

greearb@candelatech.com writes:

> From: Ben Greear <greearb@candelatech.com>
>
> I put a v1 NIC from an TP-LINK AC 1750 AP in
> a 64-bit PC, and the OS crashes on bootup.  I'm not
> sure how broken my hardware is (possibly completely non
> functional), but at least with this patch it will no longer
> crash the OS.  Not sure it ever got far enough to try,
> but I also do not have firmware for the NIC.
>
> With this patch I get this info on module load:
>
> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
> ath10k: MSI-X interrupt handling (8 intrs)
> ath10k: Unable to wakeup target
> ath10k: target takes too long to wake up (awake count 1)
> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: Failed to get pcie state addr: -5
> ath10k: early firmware event indicated
> ------------[ cut here ]------------
> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
> ....
> (it hits the warning case about 5-6 times and then seems to quiesce OK).

I haven't seen this myself so it might be a hw problem, but difficult to
say.

> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
> +	 * be cleaned up, but this method is still called a few times.  Check
> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
> +	 * the ath10k_pci_ce_tasklet sooner.
> +	 */
> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
> +		return;
> +
> +	ctrl_addr = ce_state->ctrl_addr;
> +

The tests you add look like workarounds. I would prefer to try fix these
by going to the source of the problem. Maybe we should add
ath10k_pci_wake() and ath10k_do_pci_wake()?

Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
would give more hint there things are going wrong.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
@ 2013-07-11  9:36   ` Kalle Valo
  0 siblings, 0 replies; 18+ messages in thread
From: Kalle Valo @ 2013-07-11  9:36 UTC (permalink / raw)
  To: greearb; +Cc: linux-wireless, ath10k

greearb@candelatech.com writes:

> From: Ben Greear <greearb@candelatech.com>
>
> I put a v1 NIC from an TP-LINK AC 1750 AP in
> a 64-bit PC, and the OS crashes on bootup.  I'm not
> sure how broken my hardware is (possibly completely non
> functional), but at least with this patch it will no longer
> crash the OS.  Not sure it ever got far enough to try,
> but I also do not have firmware for the NIC.
>
> With this patch I get this info on module load:
>
> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
> ath10k: MSI-X interrupt handling (8 intrs)
> ath10k: Unable to wakeup target
> ath10k: target takes too long to wake up (awake count 1)
> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
> ath10k: Failed to get pcie state addr: -5
> ath10k: early firmware event indicated
> ------------[ cut here ]------------
> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
> ....
> (it hits the warning case about 5-6 times and then seems to quiesce OK).

I haven't seen this myself so it might be a hw problem, but difficult to
say.

> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
> +	 * be cleaned up, but this method is still called a few times.  Check
> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
> +	 * the ath10k_pci_ce_tasklet sooner.
> +	 */
> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
> +		return;
> +
> +	ctrl_addr = ce_state->ctrl_addr;
> +

The tests you add look like workarounds. I would prefer to try fix these
by going to the source of the problem. Maybe we should add
ath10k_pci_wake() and ath10k_do_pci_wake()?

Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
would give more hint there things are going wrong.

-- 
Kalle Valo

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.
  2013-07-02 23:08   ` [ath9k-devel] " Ben Greear
@ 2013-07-11  9:37     ` Kalle Valo
  -1 siblings, 0 replies; 18+ messages in thread
From: Kalle Valo @ 2013-07-11  9:37 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k, linux-wireless

Ben Greear <greearb@candelatech.com> writes:

> Actually, I have no idea what type of hardware this is.  It was
> suggested earlier to me that this AP had a v1 hardware in it, but
> lspci shows this fairly unpromising thing, and the ath10k driver
> appears to call 003c the V2 hardware....
>
> 05:00.0 Network controller: Atheros Communications Inc. Device 003c (rev ff) (prog-if ff)
> 	!!! Unknown header type 7f
> 	Kernel modules: ath10k_pci

That unknown header type looks scary. I'm starting to suspect even more
that you have a hw problem of some sort.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.
@ 2013-07-11  9:37     ` Kalle Valo
  0 siblings, 0 replies; 18+ messages in thread
From: Kalle Valo @ 2013-07-11  9:37 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless, ath10k

Ben Greear <greearb@candelatech.com> writes:

> Actually, I have no idea what type of hardware this is.  It was
> suggested earlier to me that this AP had a v1 hardware in it, but
> lspci shows this fairly unpromising thing, and the ath10k driver
> appears to call 003c the V2 hardware....
>
> 05:00.0 Network controller: Atheros Communications Inc. Device 003c (rev ff) (prog-if ff)
> 	!!! Unknown header type 7f
> 	Kernel modules: ath10k_pci

That unknown header type looks scary. I'm starting to suspect even more
that you have a hw problem of some sort.

-- 
Kalle Valo

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
  2013-07-11  9:36   ` Kalle Valo
@ 2013-07-11 15:05     ` Ben Greear
  -1 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-11 15:05 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k, linux-wireless

On 07/11/2013 02:36 AM, Kalle Valo wrote:
> greearb@candelatech.com writes:
>
>> From: Ben Greear <greearb@candelatech.com>
>>
>> I put a v1 NIC from an TP-LINK AC 1750 AP in
>> a 64-bit PC, and the OS crashes on bootup.  I'm not
>> sure how broken my hardware is (possibly completely non
>> functional), but at least with this patch it will no longer
>> crash the OS.  Not sure it ever got far enough to try,
>> but I also do not have firmware for the NIC.
>>
>> With this patch I get this info on module load:
>>
>> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
>> ath10k: MSI-X interrupt handling (8 intrs)
>> ath10k: Unable to wakeup target
>> ath10k: target takes too long to wake up (awake count 1)
>> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
>> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
>> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
>> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: Failed to get pcie state addr: -5
>> ath10k: early firmware event indicated
>> ------------[ cut here ]------------
>> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
>> ....
>> (it hits the warning case about 5-6 times and then seems to quiesce OK).
>
> I haven't seen this myself so it might be a hw problem, but difficult to
> say.
>
>> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
>> +	 * be cleaned up, but this method is still called a few times.  Check
>> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
>> +	 * the ath10k_pci_ce_tasklet sooner.
>> +	 */
>> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>> +		return;
>> +
>> +	ctrl_addr = ce_state->ctrl_addr;
>> +
>
> The tests you add look like workarounds. I would prefer to try fix these
> by going to the source of the problem. Maybe we should add
> ath10k_pci_wake() and ath10k_do_pci_wake()?

These are work-arounds, but you should not let a bad piece of hardware/firmware crash
the entire OS just because you don't want to do sanity checking on the
values you get from the firmware.  Perhaps there is a better fix for the
code above, but the warning splat should still provide incentive to get
it right, while not crashing the OS in the meantime.


> Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
> would give more hint there things are going wrong.


Yes, I can do that.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
@ 2013-07-11 15:05     ` Ben Greear
  0 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-11 15:05 UTC (permalink / raw)
  To: Kalle Valo; +Cc: linux-wireless, ath10k

On 07/11/2013 02:36 AM, Kalle Valo wrote:
> greearb@candelatech.com writes:
>
>> From: Ben Greear <greearb@candelatech.com>
>>
>> I put a v1 NIC from an TP-LINK AC 1750 AP in
>> a 64-bit PC, and the OS crashes on bootup.  I'm not
>> sure how broken my hardware is (possibly completely non
>> functional), but at least with this patch it will no longer
>> crash the OS.  Not sure it ever got far enough to try,
>> but I also do not have firmware for the NIC.
>>
>> With this patch I get this info on module load:
>>
>> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
>> ath10k: MSI-X interrupt handling (8 intrs)
>> ath10k: Unable to wakeup target
>> ath10k: target takes too long to wake up (awake count 1)
>> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
>> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
>> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
>> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: Failed to get pcie state addr: -5
>> ath10k: early firmware event indicated
>> ------------[ cut here ]------------
>> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
>> ....
>> (it hits the warning case about 5-6 times and then seems to quiesce OK).
>
> I haven't seen this myself so it might be a hw problem, but difficult to
> say.
>
>> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
>> +	 * be cleaned up, but this method is still called a few times.  Check
>> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
>> +	 * the ath10k_pci_ce_tasklet sooner.
>> +	 */
>> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>> +		return;
>> +
>> +	ctrl_addr = ce_state->ctrl_addr;
>> +
>
> The tests you add look like workarounds. I would prefer to try fix these
> by going to the source of the problem. Maybe we should add
> ath10k_pci_wake() and ath10k_do_pci_wake()?

These are work-arounds, but you should not let a bad piece of hardware/firmware crash
the entire OS just because you don't want to do sanity checking on the
values you get from the firmware.  Perhaps there is a better fix for the
code above, but the warning splat should still provide incentive to get
it right, while not crashing the OS in the meantime.


> Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
> would give more hint there things are going wrong.


Yes, I can do that.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.
  2013-07-11  9:37     ` Kalle Valo
@ 2013-07-11 15:06       ` Ben Greear
  -1 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-11 15:06 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k, linux-wireless

On 07/11/2013 02:37 AM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> Actually, I have no idea what type of hardware this is.  It was
>> suggested earlier to me that this AP had a v1 hardware in it, but
>> lspci shows this fairly unpromising thing, and the ath10k driver
>> appears to call 003c the V2 hardware....
>>
>> 05:00.0 Network controller: Atheros Communications Inc. Device 003c (rev ff) (prog-if ff)
>> 	!!! Unknown header type 7f
>> 	Kernel modules: ath10k_pci
>
> That unknown header type looks scary. I'm starting to suspect even more
> that you have a hw problem of some sort.

I tried two different NICS extracted from two different APs and the behaviour
is the same, so it is probably not just a manufacturing error.

Please note that I do NOT have v1 firmware, though from what I could tell,
this NIC was showing up as v2 hardware anyway...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.
@ 2013-07-11 15:06       ` Ben Greear
  0 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-11 15:06 UTC (permalink / raw)
  To: Kalle Valo; +Cc: linux-wireless, ath10k

On 07/11/2013 02:37 AM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> Actually, I have no idea what type of hardware this is.  It was
>> suggested earlier to me that this AP had a v1 hardware in it, but
>> lspci shows this fairly unpromising thing, and the ath10k driver
>> appears to call 003c the V2 hardware....
>>
>> 05:00.0 Network controller: Atheros Communications Inc. Device 003c (rev ff) (prog-if ff)
>> 	!!! Unknown header type 7f
>> 	Kernel modules: ath10k_pci
>
> That unknown header type looks scary. I'm starting to suspect even more
> that you have a hw problem of some sort.

I tried two different NICS extracted from two different APs and the behaviour
is the same, so it is probably not just a manufacturing error.

Please note that I do NOT have v1 firmware, though from what I could tell,
this NIC was showing up as v2 hardware anyway...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
  2013-07-11 15:05     ` Ben Greear
@ 2013-07-12 14:51       ` Kalle Valo
  -1 siblings, 0 replies; 18+ messages in thread
From: Kalle Valo @ 2013-07-12 14:51 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k, linux-wireless

Ben Greear <greearb@candelatech.com> writes:

> On 07/11/2013 02:36 AM, Kalle Valo wrote:
>> greearb@candelatech.com writes:
>>
>>> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
>>> +	 * be cleaned up, but this method is still called a few times.  Check
>>> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
>>> +	 * the ath10k_pci_ce_tasklet sooner.
>>> +	 */
>>> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>>> +		return;
>>> +
>>> +	ctrl_addr = ce_state->ctrl_addr;
>>> +
>>
>> The tests you add look like workarounds. I would prefer to try fix these
>> by going to the source of the problem. Maybe we should add
>> ath10k_pci_wake() and ath10k_do_pci_wake()?

Doh, dropped an essential word from a sentence, again. That's what I get
when trying to do multiple things at the same time.

What I was trying to say: Maybe we should add proper error hanling to
ath10k_pci_wake() and ath10k_do_pci_wake() and that way avoid this?

> These are work-arounds, but you should not let a bad piece of
> hardware/firmware crash the entire OS just because you don't want to
> do sanity checking on the values you get from the firmware. Perhaps
> there is a better fix for the code above, but the warning splat should
> still provide incentive to get it right, while not crashing the OS in
> the meantime.

Sure, the driver should not crash if HW is not functioning correctly.
I'm just saying that adding odd checks at random places is not the
"kernel way" to do things, only GTK people do that ;)

I was thinking that the proper way is to check that wakeup succeeds and
not enable interrupts or something like that.

>> Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
>> would give more hint there things are going wrong.
>
> Yes, I can do that.

Thanks.

Oh, did you mention something that ath10k detect the device as hw2.0?
Maybe the PCI ids are wrong? Then you could also force the same
workaround for hw2.0 as hw1.0 has:

--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -2145,10 +2145,8 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
 
        switch (pci_dev->device) {
        case QCA988X_1_0_DEVICE_ID:
-               set_bit(ATH10K_PCI_FEATURE_HW_1_0_WARKAROUND, ar_pci->features);
-               break;
        case QCA988X_2_0_DEVICE_ID:
-               set_bit(ATH10K_PCI_FEATURE_MSI_X, ar_pci->features);
+               set_bit(ATH10K_PCI_FEATURE_HW_1_0_WARKAROUND, ar_pci->features);
                break;
        default:
                ret = -ENODEV;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
@ 2013-07-12 14:51       ` Kalle Valo
  0 siblings, 0 replies; 18+ messages in thread
From: Kalle Valo @ 2013-07-12 14:51 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless, ath10k

Ben Greear <greearb@candelatech.com> writes:

> On 07/11/2013 02:36 AM, Kalle Valo wrote:
>> greearb@candelatech.com writes:
>>
>>> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
>>> +	 * be cleaned up, but this method is still called a few times.  Check
>>> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
>>> +	 * the ath10k_pci_ce_tasklet sooner.
>>> +	 */
>>> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>>> +		return;
>>> +
>>> +	ctrl_addr = ce_state->ctrl_addr;
>>> +
>>
>> The tests you add look like workarounds. I would prefer to try fix these
>> by going to the source of the problem. Maybe we should add
>> ath10k_pci_wake() and ath10k_do_pci_wake()?

Doh, dropped an essential word from a sentence, again. That's what I get
when trying to do multiple things at the same time.

What I was trying to say: Maybe we should add proper error hanling to
ath10k_pci_wake() and ath10k_do_pci_wake() and that way avoid this?

> These are work-arounds, but you should not let a bad piece of
> hardware/firmware crash the entire OS just because you don't want to
> do sanity checking on the values you get from the firmware. Perhaps
> there is a better fix for the code above, but the warning splat should
> still provide incentive to get it right, while not crashing the OS in
> the meantime.

Sure, the driver should not crash if HW is not functioning correctly.
I'm just saying that adding odd checks at random places is not the
"kernel way" to do things, only GTK people do that ;)

I was thinking that the proper way is to check that wakeup succeeds and
not enable interrupts or something like that.

>> Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
>> would give more hint there things are going wrong.
>
> Yes, I can do that.

Thanks.

Oh, did you mention something that ath10k detect the device as hw2.0?
Maybe the PCI ids are wrong? Then you could also force the same
workaround for hw2.0 as hw1.0 has:

--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -2145,10 +2145,8 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
 
        switch (pci_dev->device) {
        case QCA988X_1_0_DEVICE_ID:
-               set_bit(ATH10K_PCI_FEATURE_HW_1_0_WARKAROUND, ar_pci->features);
-               break;
        case QCA988X_2_0_DEVICE_ID:
-               set_bit(ATH10K_PCI_FEATURE_MSI_X, ar_pci->features);
+               set_bit(ATH10K_PCI_FEATURE_HW_1_0_WARKAROUND, ar_pci->features);
                break;
        default:
                ret = -ENODEV;


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
  2013-07-12 14:51       ` Kalle Valo
@ 2013-07-12 15:34         ` Ben Greear
  -1 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-12 15:34 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k, linux-wireless

On 07/12/2013 07:51 AM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> On 07/11/2013 02:36 AM, Kalle Valo wrote:
>>> greearb@candelatech.com writes:
>>>
>>>> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
>>>> +	 * be cleaned up, but this method is still called a few times.  Check
>>>> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
>>>> +	 * the ath10k_pci_ce_tasklet sooner.
>>>> +	 */
>>>> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>>>> +		return;
>>>> +
>>>> +	ctrl_addr = ce_state->ctrl_addr;
>>>> +
>>>
>>> The tests you add look like workarounds. I would prefer to try fix these
>>> by going to the source of the problem. Maybe we should add
>>> ath10k_pci_wake() and ath10k_do_pci_wake()?
>
> Doh, dropped an essential word from a sentence, again. That's what I get
> when trying to do multiple things at the same time.
>
> What I was trying to say: Maybe we should add proper error hanling to
> ath10k_pci_wake() and ath10k_do_pci_wake() and that way avoid this?

Maybe..I haven't spent much time in the driver yet, so not sure
the best place to fix the null ce_state issue.

>> These are work-arounds, but you should not let a bad piece of
>> hardware/firmware crash the entire OS just because you don't want to
>> do sanity checking on the values you get from the firmware. Perhaps
>> there is a better fix for the code above, but the warning splat should
>> still provide incentive to get it right, while not crashing the OS in
>> the meantime.
>
> Sure, the driver should not crash if HW is not functioning correctly.
> I'm just saying that adding odd checks at random places is not the
> "kernel way" to do things, only GTK people do that ;)
>
> I was thinking that the proper way is to check that wakeup succeeds and
> not enable interrupts or something like that.

With the array-index crashes, you are asking firmware for some
value, and not sanity checking the return value to make sure it
is bounded by the array length.  In my case, it's returning something very
wrong because PCI isn't talking, or something like that.

But, even with functional PCI, you could get a value that for
whatever reason doesn't match the expected value (ie, buggy firmware).
I think you must sanity check those values that can crash the OS
with stray memory access, at least.

>>> Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
>>> would give more hint there things are going wrong.
>>
>> Yes, I can do that.
>
> Thanks.
>
> Oh, did you mention something that ath10k detect the device as hw2.0?
> Maybe the PCI ids are wrong? Then you could also force the same
> workaround for hw2.0 as hw1.0 has:

I tried just changing the #defines so that it was treated as V1,
but that did not help.  I can try your suggestion when I get
a chance.  I hope to have some time to do some more
twiddling on these NICs later today or early next week.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k:  Fix crash when using v1 hardware.
@ 2013-07-12 15:34         ` Ben Greear
  0 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2013-07-12 15:34 UTC (permalink / raw)
  To: Kalle Valo; +Cc: linux-wireless, ath10k

On 07/12/2013 07:51 AM, Kalle Valo wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> On 07/11/2013 02:36 AM, Kalle Valo wrote:
>>> greearb@candelatech.com writes:
>>>
>>>> +	/* On v1 hardware at least, setup can fail, causing ce_id_state to
>>>> +	 * be cleaned up, but this method is still called a few times.  Check
>>>> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
>>>> +	 * the ath10k_pci_ce_tasklet sooner.
>>>> +	 */
>>>> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>>>> +		return;
>>>> +
>>>> +	ctrl_addr = ce_state->ctrl_addr;
>>>> +
>>>
>>> The tests you add look like workarounds. I would prefer to try fix these
>>> by going to the source of the problem. Maybe we should add
>>> ath10k_pci_wake() and ath10k_do_pci_wake()?
>
> Doh, dropped an essential word from a sentence, again. That's what I get
> when trying to do multiple things at the same time.
>
> What I was trying to say: Maybe we should add proper error hanling to
> ath10k_pci_wake() and ath10k_do_pci_wake() and that way avoid this?

Maybe..I haven't spent much time in the driver yet, so not sure
the best place to fix the null ce_state issue.

>> These are work-arounds, but you should not let a bad piece of
>> hardware/firmware crash the entire OS just because you don't want to
>> do sanity checking on the values you get from the firmware. Perhaps
>> there is a better fix for the code above, but the warning splat should
>> still provide incentive to get it right, while not crashing the OS in
>> the meantime.
>
> Sure, the driver should not crash if HW is not functioning correctly.
> I'm just saying that adding odd checks at random places is not the
> "kernel way" to do things, only GTK people do that ;)
>
> I was thinking that the proper way is to check that wakeup succeeds and
> not enable interrupts or something like that.

With the array-index crashes, you are asking firmware for some
value, and not sanity checking the return value to make sure it
is bounded by the array length.  In my case, it's returning something very
wrong because PCI isn't talking, or something like that.

But, even with functional PCI, you could get a value that for
whatever reason doesn't match the expected value (ie, buggy firmware).
I think you must sanity check those values that can crash the OS
with stray memory access, at least.

>>> Can you enable few debug logs, like ATH10K_DBG_PCI, and post them? That
>>> would give more hint there things are going wrong.
>>
>> Yes, I can do that.
>
> Thanks.
>
> Oh, did you mention something that ath10k detect the device as hw2.0?
> Maybe the PCI ids are wrong? Then you could also force the same
> workaround for hw2.0 as hw1.0 has:

I tried just changing the #defines so that it was treated as V1,
but that did not help.  I can try your suggestion when I get
a chance.  I hope to have some time to do some more
twiddling on these NICs later today or early next week.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.
  2013-07-11  9:36   ` Kalle Valo
@ 2013-07-15  6:55     ` Michal Kazior
  -1 siblings, 0 replies; 18+ messages in thread
From: Michal Kazior @ 2013-07-15  6:55 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Ben Greear, ath10k, linux-wireless

Hi,

On 11 July 2013 11:36, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> greearb@candelatech.com writes:
>
>> From: Ben Greear <greearb@candelatech.com>
>>
>> I put a v1 NIC from an TP-LINK AC 1750 AP in
>> a 64-bit PC, and the OS crashes on bootup.  I'm not
>> sure how broken my hardware is (possibly completely non
>> functional), but at least with this patch it will no longer
>> crash the OS.  Not sure it ever got far enough to try,
>> but I also do not have firmware for the NIC.
>>
>> With this patch I get this info on module load:
>>
>> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
>> ath10k: MSI-X interrupt handling (8 intrs)
>> ath10k: Unable to wakeup target
>> ath10k: target takes too long to wake up (awake count 1)
>> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
>> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
>> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
>> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: Failed to get pcie state addr: -5
>> ath10k: early firmware event indicated
>> ------------[ cut here ]------------
>> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
>> ....
>> (it hits the warning case about 5-6 times and then seems to quiesce OK).
>
> I haven't seen this myself so it might be a hw problem, but difficult to
> say.
>
>> +     /* On v1 hardware at least, setup can fail, causing ce_id_state to
>> +      * be cleaned up, but this method is still called a few times.  Check
>> +      * for NULL here so we don't crash.  Probably a better fix is to stop
>> +      * the ath10k_pci_ce_tasklet sooner.
>> +      */
>> +     if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>> +             return;
>> +
>> +     ctrl_addr = ce_state->ctrl_addr;
>> +
>
> The tests you add look like workarounds. I would prefer to try fix these
> by going to the source of the problem. Maybe we should add
> ath10k_pci_wake() and ath10k_do_pci_wake()?

The teardown sequence is broken. Interrupts aren't
disabled/unregistered soon enough. I have a pending patch that should
fix this (although it's still more of a workaround than a real fix
since we need proper variable init code/ hw init code separation to be
done). However the patch depends on my recovery patchset so I haven't
posted it yet.


Pozdrawiam / Best regards,
Michał Kazior.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.
@ 2013-07-15  6:55     ` Michal Kazior
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Kazior @ 2013-07-15  6:55 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Ben Greear, linux-wireless, ath10k

Hi,

On 11 July 2013 11:36, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> greearb@candelatech.com writes:
>
>> From: Ben Greear <greearb@candelatech.com>
>>
>> I put a v1 NIC from an TP-LINK AC 1750 AP in
>> a 64-bit PC, and the OS crashes on bootup.  I'm not
>> sure how broken my hardware is (possibly completely non
>> functional), but at least with this patch it will no longer
>> crash the OS.  Not sure it ever got far enough to try,
>> but I also do not have firmware for the NIC.
>>
>> With this patch I get this info on module load:
>>
>> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
>> ath10k: MSI-X interrupt handling (8 intrs)
>> ath10k: Unable to wakeup target
>> ath10k: target takes too long to wake up (awake count 1)
>> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
>> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
>> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
>> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: Failed to get pcie state addr: -5
>> ath10k: early firmware event indicated
>> ------------[ cut here ]------------
>> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
>> ....
>> (it hits the warning case about 5-6 times and then seems to quiesce OK).
>
> I haven't seen this myself so it might be a hw problem, but difficult to
> say.
>
>> +     /* On v1 hardware at least, setup can fail, causing ce_id_state to
>> +      * be cleaned up, but this method is still called a few times.  Check
>> +      * for NULL here so we don't crash.  Probably a better fix is to stop
>> +      * the ath10k_pci_ce_tasklet sooner.
>> +      */
>> +     if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>> +             return;
>> +
>> +     ctrl_addr = ce_state->ctrl_addr;
>> +
>
> The tests you add look like workarounds. I would prefer to try fix these
> by going to the source of the problem. Maybe we should add
> ath10k_pci_wake() and ath10k_do_pci_wake()?

The teardown sequence is broken. Interrupts aren't
disabled/unregistered soon enough. I have a pending patch that should
fix this (although it's still more of a workaround than a real fix
since we need proper variable init code/ hw init code separation to be
done). However the patch depends on my recovery patchset so I haven't
posted it yet.


Pozdrawiam / Best regards,
Michał Kazior.

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-07-15  6:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-02 22:42 [PATCH] ath10k: Fix crash when using v1 hardware greearb
2013-07-02 22:42 ` [ath9k-devel] " greearb at candelatech.com
2013-07-02 23:08 ` Ben Greear
2013-07-02 23:08   ` [ath9k-devel] " Ben Greear
2013-07-11  9:37   ` Kalle Valo
2013-07-11  9:37     ` Kalle Valo
2013-07-11 15:06     ` Ben Greear
2013-07-11 15:06       ` Ben Greear
2013-07-11  9:36 ` Kalle Valo
2013-07-11  9:36   ` Kalle Valo
2013-07-11 15:05   ` Ben Greear
2013-07-11 15:05     ` Ben Greear
2013-07-12 14:51     ` Kalle Valo
2013-07-12 14:51       ` Kalle Valo
2013-07-12 15:34       ` Ben Greear
2013-07-12 15:34         ` Ben Greear
2013-07-15  6:55   ` Michal Kazior
2013-07-15  6:55     ` Michal Kazior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.