* ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
@ 2013-10-11 23:51 ` Ben Greear
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Greear @ 2013-10-11 23:51 UTC (permalink / raw)
  To: ath9k-devel@lists.ath9k.org; +Cc: linux-wireless

My kernel is lightly patched, no patches to ath10k.

Looks like something is not checking a NULL pointer, but I did
not dig into the code.  I'm not actually running any traffic on
the ath10k device.


ath10k: Completion buffers are full
ath10k: Completion buffers are full
ath10k: MSI-X interrupt handling (8 intrs)
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
ath10k: UART prints disabled
ath10k: firmware 1.0.0.636 booted
ath10k: htt target version 2.1

...

ath10k: Completion buffers are full
ath10k: Completion buffers are full
IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
8021q: adding VLAN 0 to HW filter on device eth1
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
ath10k: MSI-X interrupt handling (8 intrs)
ath10k: Unable to wakeup target
ath10k: target took longer 5000 us to wake up (awake count 1)
ath10k: Failed to get pcie state addr: -16
ath10k: early firmware event indicated
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
PGD d1f5b067 PUD d9c1c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs fscache nf_nat_ipv4 nf_nat 8021q ga]
CPU: 1 PID: 17 Comm: ksoftirqd/1 Tainted: G         C   3.12.0-rc3-wl+ #10
Hardware name: To be filled by O.E.M. To be filled by O.E.M./HURONRIVER, BIOS 4.6.5 05/02/2012
task: ffff8802160fdd80 ti: ffff880216306000 task.ti: ffff880216306000
RIP: 0010:[<ffffffffa06ae46c>]  [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
RSP: 0018:ffff880216307c98  EFLAGS: 00010246
RAX: 0000000000005b5b RBX: ffff88020bfbfc50 RCX: 0000000000057400
RDX: ffff88020c7e5f20 RSI: ffff880216307d10 RDI: ffff88020bfbfc48
RBP: ffff880216307cf8 R08: ffff880216307d1c R09: ffff8800d967d438
R10: ffff88021fa92ff0 R11: 0000000000000001 R12: ffff880216307d10
R13: ffff880216307d24 R14: ffff880216307d20 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88021fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 00000000d1c76000 CR4: 00000000000407e0
Stack:
  ffff880216307cc8 ffff8802160fdde8 ffff88021fa92ff0 ffff880216307d1c
  ffff88020bfbfc48 ffff88021fa92ff0 ffff880216307d08 ffff88020c7e5f20
  ffff88020bfbf800 0000000000000000 ffff88020bfbfc48 0000000000057430
Call Trace:
  [<ffffffffa06ab568>] ath10k_pci_bmi_send_done+0x1d/0x32 [ath10k_pci]
  [<ffffffff810a4915>] ? local_bh_enable_ip+0x9/0xb
  [<ffffffff8158e801>] ? _raw_spin_unlock_bh+0x1f/0x21
  [<ffffffffa06ae357>] ath10k_ce_per_engine_service+0xab/0xea [ath10k_pci]
  [<ffffffffa06ab3b4>] ath10k_pci_ce_tasklet+0x15/0x17 [ath10k_pci]
  [<ffffffff810a3e3e>] tasklet_action+0x78/0xc6
  [<ffffffff810a463c>] __do_softirq+0xc4/0x19d
  [<ffffffff810a4738>] run_ksoftirqd+0x23/0x42
  [<ffffffff810c0ea3>] smpboot_thread_fn+0x21e/0x223
  [<ffffffff810c0c85>] ? smpboot_create_threads+0x61/0x61
  [<ffffffff810ba54d>] kthread+0xb0/0xb8
  [<ffffffff810ba49d>] ? kthread_freezable_should_stop+0x5b/0x5b
  [<ffffffff815939cc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810ba49d>] ? kthread_freezable_should_stop+0x5b/0x5b
Code: 38 4c 89 45 b8 48 8b 07 48 8b 80 a0 01 00 00 48 05 48 04 00 00 48 89 c7 48 89 45 c0 e8 b6 07 ee e0
RIP  [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
  RSP <ffff880216307c98>
CR2: 0000000000000004
---[ end trace 6b10a0163cca0cc3 ]---
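
A rough illustration of why the faulting address in the oops above (CR2) is 0000000000000004 rather than 0: when code reads a field through a NULL structure pointer, the CPU faults at the offset of that field. The struct below is hypothetical (it is not an ath10k type), and its layout is chosen only so that the second field sits 4 bytes in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only; not an ath10k structure. */
struct demo_ring {
    uint32_t nentries;   /* offset 0 */
    uint32_t sw_index;   /* offset 4 */
};

/*
 * Address the CPU would try to load for ring->sw_index, given the
 * numeric value of the base pointer. With a NULL base this is simply
 * the offset of the field.
 */
uintptr_t demo_fault_addr(uintptr_t base)
{
    return base + offsetof(struct demo_ring, sw_index);
}
```

With a NULL base, demo_fault_addr(0) is 4. Which real structure and field were involved cannot be determined from the trace alone.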

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-11 23:51 ` [ath9k-devel] " Ben Greear
  (?)
@ 2013-10-16 16:00 ` Ben Greear
  2013-10-16 16:23   ` Kalle Valo
  -1 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2013-10-16 16:00 UTC (permalink / raw)
  To: ath10k

I sent this to the wrong list the first time.

Do you know if this is already addressed?  If not, I'll see if I can
find a fix.

Thanks,
Ben


-------- Original Message --------
Subject: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
Date: Fri, 11 Oct 2013 16:51:40 -0700
From: Ben Greear <greearb@candelatech.com>
Organization: Candela Technologies
To: ath9k-devel@lists.ath9k.org <ath9k-devel@venema.h4ckr.net>
CC: linux-wireless@vger.kernel.org <linux-wireless@vger.kernel.org>

My kernel is lightly patched, no patches to ath10k.

Looks like something is not checking a NULL pointer, but I did
not dig into the code.  I'm not actually running any traffic on
the ath10k device.


ath10k: Completion buffers are full
ath10k: Completion buffers are full
ath10k: MSI-X interrupt handling (8 intrs)
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
ath10k: UART prints disabled
ath10k: firmware 1.0.0.636 booted
ath10k: htt target version 2.1

...

ath10k: Completion buffers are full
ath10k: Completion buffers are full
IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
8021q: adding VLAN 0 to HW filter on device eth1
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
ath10k: MSI-X interrupt handling (8 intrs)
ath10k: Unable to wakeup target
ath10k: target took longer 5000 us to wake up (awake count 1)
ath10k: Failed to get pcie state addr: -16
ath10k: early firmware event indicated
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
PGD d1f5b067 PUD d9c1c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs fscache nf_nat_ipv4 nf_nat 8021q ga]
CPU: 1 PID: 17 Comm: ksoftirqd/1 Tainted: G         C   3.12.0-rc3-wl+ #10
Hardware name: To be filled by O.E.M. To be filled by O.E.M./HURONRIVER, BIOS 4.6.5 05/02/2012
task: ffff8802160fdd80 ti: ffff880216306000 task.ti: ffff880216306000
RIP: 0010:[<ffffffffa06ae46c>]  [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
RSP: 0018:ffff880216307c98  EFLAGS: 00010246
RAX: 0000000000005b5b RBX: ffff88020bfbfc50 RCX: 0000000000057400
RDX: ffff88020c7e5f20 RSI: ffff880216307d10 RDI: ffff88020bfbfc48
RBP: ffff880216307cf8 R08: ffff880216307d1c R09: ffff8800d967d438
R10: ffff88021fa92ff0 R11: 0000000000000001 R12: ffff880216307d10
R13: ffff880216307d24 R14: ffff880216307d20 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88021fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 00000000d1c76000 CR4: 00000000000407e0
Stack:
  ffff880216307cc8 ffff8802160fdde8 ffff88021fa92ff0 ffff880216307d1c
  ffff88020bfbfc48 ffff88021fa92ff0 ffff880216307d08 ffff88020c7e5f20
  ffff88020bfbf800 0000000000000000 ffff88020bfbfc48 0000000000057430
Call Trace:
  [<ffffffffa06ab568>] ath10k_pci_bmi_send_done+0x1d/0x32 [ath10k_pci]
  [<ffffffff810a4915>] ? local_bh_enable_ip+0x9/0xb
  [<ffffffff8158e801>] ? _raw_spin_unlock_bh+0x1f/0x21
  [<ffffffffa06ae357>] ath10k_ce_per_engine_service+0xab/0xea [ath10k_pci]
  [<ffffffffa06ab3b4>] ath10k_pci_ce_tasklet+0x15/0x17 [ath10k_pci]
  [<ffffffff810a3e3e>] tasklet_action+0x78/0xc6
  [<ffffffff810a463c>] __do_softirq+0xc4/0x19d
  [<ffffffff810a4738>] run_ksoftirqd+0x23/0x42
  [<ffffffff810c0ea3>] smpboot_thread_fn+0x21e/0x223
  [<ffffffff810c0c85>] ? smpboot_create_threads+0x61/0x61
  [<ffffffff810ba54d>] kthread+0xb0/0xb8
  [<ffffffff810ba49d>] ? kthread_freezable_should_stop+0x5b/0x5b
  [<ffffffff815939cc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810ba49d>] ? kthread_freezable_should_stop+0x5b/0x5b
Code: 38 4c 89 45 b8 48 8b 07 48 8b 80 a0 01 00 00 48 05 48 04 00 00 48 89 c7 48 89 45 c0 e8 b6 07 ee e0
RIP  [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
  RSP <ffff880216307c98>
CR2: 0000000000000004
---[ end trace 6b10a0163cca0cc3 ]---

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-16 16:00 ` Fwd: " Ben Greear
@ 2013-10-16 16:23   ` Kalle Valo
  2013-10-17  8:43     ` Kalle Valo
  2013-10-21 14:00     ` Kalle Valo
  0 siblings, 2 replies; 12+ messages in thread
From: Kalle Valo @ 2013-10-16 16:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Ben Greear <greearb@candelatech.com> writes:

> I sent this to the wrong list the first time.
>
> Do you know if this is already addressed?  If not, I'll see if I can
> find a fix.

[...]

> ath10k: MSI-X interrupt handling (8 intrs)
> ath10k: Unable to wakeup target
> ath10k: target took longer 5000 us to wake up (awake count 1)
> ath10k: Failed to get pcie state addr: -16
> ath10k: early firmware event indicated
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122
> [ath10k_pci]

I think there are two bugs here:

1) Cold reset doesn't always work; Michal has a patch for that. That's
   why the wakeup fails:

http://lists.infradead.org/pipermail/ath10k/2013-October/000638.html

2) We enable interrupts too early, so if wakeup fails and we get a
   spurious interrupt, ath10k crashes. We don't have a fix for this yet.

-- 
Kalle Valo


* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-16 16:23   ` Kalle Valo
@ 2013-10-17  8:43     ` Kalle Valo
  2013-10-17 22:58       ` Michal Kazior
  2013-10-21 14:00     ` Kalle Valo
  1 sibling, 1 reply; 12+ messages in thread
From: Kalle Valo @ 2013-10-17  8:43 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Kalle Valo <kvalo@qca.qualcomm.com> writes:

> Ben Greear <greearb@candelatech.com> writes:
>
>> I sent this to the wrong list the first time.
>>
>> Do you know if this is already addressed?  If not, I'll see if I can
>> find a fix.
>
> [...]
>
>> ath10k: MSI-X interrupt handling (8 intrs)
>> ath10k: Unable to wakeup target
>> ath10k: target took longer 5000 us to wake up (awake count 1)
>> ath10k: Failed to get pcie state addr: -16
>> ath10k: early firmware event indicated
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122
>> [ath10k_pci]
>
> I think there are two bugs here:
>
> 1) Cold reset doesn't always work; Michal has a patch for that. That's
>    why the wakeup fails:
>
> http://lists.infradead.org/pipermail/ath10k/2013-October/000638.html
>
> 2) We enable interrupts too early, so if wakeup fails and we get a
>    spurious interrupt, ath10k crashes. We don't have a fix for this yet.

I tried to find a way to enable interrupts only after everything is
properly initialised in ath10k, but didn't find any quick way to do
that. I guess one ugly way to work around this race is to add a state
variable which is checked in the interrupt handler.

Does anyone else have any other ideas?
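
The state-variable workaround mentioned above could look roughly like the sketch below. This is a userspace model under stated assumptions, not driver code: struct demo_dev, demo_irq_handler() and demo_finish_init() are hypothetical names, and C11 atomics stand in for whatever synchronisation the driver would actually use.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; names are illustrative, not ath10k's. */
struct demo_dev {
    atomic_bool ready;   /* set only after everything is initialised */
    int *ring;           /* NULL until initialisation completes */
    int handled;         /* accumulates work done by the handler */
};

/* Put the device into its "interrupts may fire, nothing is ready" state. */
void demo_dev_init(struct demo_dev *dev)
{
    atomic_init(&dev->ready, false);
    dev->ring = NULL;
    dev->handled = 0;
}

/* Publish the ring first, then flip the flag with release ordering. */
void demo_finish_init(struct demo_dev *dev, int *ring)
{
    dev->ring = ring;
    atomic_store_explicit(&dev->ready, true, memory_order_release);
}

/*
 * Interrupt handler model: returns false and touches nothing if the
 * device is not ready yet, so an early or spurious interrupt cannot
 * dereference the NULL ring pointer.
 */
bool demo_irq_handler(struct demo_dev *dev)
{
    if (!atomic_load_explicit(&dev->ready, memory_order_acquire))
        return false;
    dev->handled += dev->ring[0];
    return true;
}
```

The guard only hides the early interrupt; it does not remove the underlying ordering problem, which is why it is described above as an ugly workaround.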

-- 
Kalle Valo


* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-17  8:43     ` Kalle Valo
@ 2013-10-17 22:58       ` Michal Kazior
  2013-10-18  6:29         ` Kalle Valo
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Kazior @ 2013-10-17 22:58 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Ben Greear, ath10k

On 17 October 2013 01:43, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>
>> Ben Greear <greearb@candelatech.com> writes:
>>
>>> I sent this to the wrong list the first time.
>>>
>>> Do you know if this is already addressed?  If not, I'll see if I can
>>> find a fix.
>>
>> [...]
>>
>>> ath10k: MSI-X interrupt handling (8 intrs)
>>> ath10k: Unable to wakeup target
>>> ath10k: target took longer 5000 us to wake up (awake count 1)
>>> ath10k: Failed to get pcie state addr: -16
>>> ath10k: early firmware event indicated
>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>> IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122
>>> [ath10k_pci]

Hmm.. if BMI handlers are set, then CE must have been allocated
earlier. I suspect this is because ath10k_pci_ce_deinit() gets
called on the ath10k_pci_power_up() failpath before interrupts are
disabled/handlers unregistered.

In that case the solution is to fix the failpath in
ath10k_pci_power_up(). Disabling interrupts, or moving
ath10k_pci_ce_deinit() to just before the return statement in
ath10k_pci_hif_power_up()'s failpath, should suffice. It should be safe
to call ath10k_pci_ce_deinit() without having called
ath10k_pci_ce_init().
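
The ordering concern in the failpath can be modelled with a small userspace sketch. Everything here is an assumption made for illustration: struct demo_hif and its flags are hypothetical stand-ins for the real driver state, and the error code is arbitrary. The model only shows that tearing down CE state while interrupts are still enabled opens a window in which a handler could run against freed state, whereas disabling interrupts first does not.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the power-up state; not the real ath10k structs. */
struct demo_hif {
    bool irq_enabled;   /* interrupt delivery is on */
    bool ce_ready;      /* copy-engine state the handler relies on exists */
    bool unsafe_seen;   /* latched if IRQs were ever on without CE state */
};

/* Record whether an interrupt arriving right now could touch missing state. */
void demo_check(struct demo_hif *h)
{
    if (h->irq_enabled && !h->ce_ready)
        h->unsafe_seen = true;
}

/*
 * Model a power-up attempt. If wake_ok is false, take the failure path.
 * deinit_first selects the suspected buggy order (tear down CE while
 * interrupts are still enabled) versus the proposed order.
 */
int demo_power_up(struct demo_hif *h, bool wake_ok, bool deinit_first)
{
    h->ce_ready = true;
    demo_check(h);
    h->irq_enabled = true;
    demo_check(h);

    if (wake_ok)
        return 0;

    if (deinit_first) {
        h->ce_ready = false;    /* CE gone, interrupts still on */
        demo_check(h);
        h->irq_enabled = false;
    } else {
        h->irq_enabled = false; /* stop interrupts first */
        demo_check(h);
        h->ce_ready = false;
    }
    demo_check(h);
    return -ETIMEDOUT;          /* illustrative error code */
}
```

In this model the failure path latches unsafe_seen only when deinit_first is true. The model sets up CE state before enabling interrupts, so it does not cover an interrupt arriving before initialisation has finished.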


Michał


* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-17 22:58       ` Michal Kazior
@ 2013-10-18  6:29         ` Kalle Valo
  2013-10-18 15:41           ` Michal Kazior
  0 siblings, 1 reply; 12+ messages in thread
From: Kalle Valo @ 2013-10-18  6:29 UTC (permalink / raw)
  To: Michal Kazior; +Cc: Ben Greear, ath10k

Michal Kazior <michal.kazior@tieto.com> writes:

> On 17 October 2013 01:43, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>>
>>> Ben Greear <greearb@candelatech.com> writes:
>>>
>>>> I sent this to the wrong list the first time.
>>>>
>>>> Do you know if this is already addressed?  If not, I'll see if I can
>>>> find a fix.
>>>
>>> [...]
>>>
>>>> ath10k: MSI-X interrupt handling (8 intrs)
>>>> ath10k: Unable to wakeup target
>>>> ath10k: target took longer 5000 us to wake up (awake count 1)
>>>> ath10k: Failed to get pcie state addr: -16
>>>> ath10k: early firmware event indicated
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>>> IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122
>>>> [ath10k_pci]
>
> Hmm.. if BMI handlers are set, then CE must have been allocated
> earlier. I suspect this is because ath10k_pci_ce_deinit() gets
> called on the ath10k_pci_power_up() failpath before interrupts are
> disabled/handlers unregistered.
>
> In that case the solution is to fix the failpath in
> ath10k_pci_power_up(). Disabling interrupts, or moving
> ath10k_pci_ce_deinit() to just before the return statement in
> ath10k_pci_hif_power_up()'s failpath, should suffice. It should be safe
> to call ath10k_pci_ce_deinit() without having called
> ath10k_pci_ce_init().

But won't that still leave the race of having interrupts enabled
but the tasklet handler not properly initialised? I think the right fix is
to first initialise everything and only then enable interrupts.

-- 
Kalle Valo


* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-18  6:29         ` Kalle Valo
@ 2013-10-18 15:41           ` Michal Kazior
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Kazior @ 2013-10-18 15:41 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Ben Greear, ath10k

On 17 October 2013 23:29, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> On 17 October 2013 01:43, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
>>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>>>
>>>> Ben Greear <greearb@candelatech.com> writes:
>>>>
>>>>> I sent this to the wrong list the first time.
>>>>>
>>>>> Do you know if this is already addressed?  If not, I'll see if I can
>>>>> find a fix.
>>>>
>>>> [...]
>>>>
>>>>> ath10k: MSI-X interrupt handling (8 intrs)
>>>>> ath10k: Unable to wakeup target
>>>>> ath10k: target took longer 5000 us to wake up (awake count 1)
>>>>> ath10k: Failed to get pcie state addr: -16
>>>>> ath10k: early firmware event indicated
>>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>>>> IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122
>>>>> [ath10k_pci]
>>
>> Hmm.. if BMI handlers are set, then CE must have been allocated
>> earlier. I suspect this is because ath10k_pci_ce_deinit() gets
>> called on the ath10k_pci_power_up() failpath before interrupts are
>> disabled/handlers unregistered.
>>
>> In that case the solution is to fix the failpath in
>> ath10k_pci_power_up(). Disabling interrupts, or moving
>> ath10k_pci_ce_deinit() to just before the return statement in
>> ath10k_pci_hif_power_up()'s failpath, should suffice. It should be safe
>> to call ath10k_pci_ce_deinit() without having called
>> ath10k_pci_ce_init().
>
> But won't that still leave the race of having interrupts enabled
> but the tasklet handler not properly initialised? I think the right fix is
> to first initialise everything and only then enable interrupts.

Yes. It would still leave that case broken.


Michał


* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-16 16:23   ` Kalle Valo
  2013-10-17  8:43     ` Kalle Valo
@ 2013-10-21 14:00     ` Kalle Valo
  2013-10-21 15:59       ` Ben Greear
  1 sibling, 1 reply; 12+ messages in thread
From: Kalle Valo @ 2013-10-21 14:00 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Kalle Valo <kvalo@qca.qualcomm.com> writes:

> Ben Greear <greearb@candelatech.com> writes:
>
>> I sent this to the wrong list the first time.
>>
>> Do you know if this is already addressed?  If not, I'll see if I can
>> find a fix.
>
> [...]
>
>> ath10k: MSI-X interrupt handling (8 intrs)
>> ath10k: Unable to wakeup target
>> ath10k: target took longer 5000 us to wake up (awake count 1)
>> ath10k: Failed to get pcie state addr: -16
>> ath10k: early firmware event indicated
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122
>> [ath10k_pci]
>
> I think there are two bugs here:
>
> 1) Cold reset doesn't always work; Michal has a patch for that. That's
>    why the wakeup fails:
>
> http://lists.infradead.org/pipermail/ath10k/2013-October/000638.html
>
> 2) We enable interrupts too early, so if wakeup fails and we get a
>    spurious interrupt, ath10k crashes. We don't have a fix for this yet.

I was able to reproduce your crash and it seems that my patch "ath10k:
add error handling to ath10k_pci_wait()" fixes the crash. But it's not
the ultimate solution, just a band-aid for the issue:

[ 3200.447601] ath10k: MSI-X didn't succeed (1), trying MSI
[ 3200.447898] ath10k_pci 0000:02:00.0: irq 49 for MSI/MSI-X
[ 3200.449101] ath10k: MSI interrupt handling
[ 3201.620679] ath10k: Unable to wakeup target
[ 3201.620754] ath10k: Failed to reset target, target did not wake up: -110
[ 3201.621111] ath10k: early firmware event indicated
[ 3201.621624] ath10k: could not start pci hif (-110)
[ 3201.621691] ath10k: could not probe fw (-110)
[ 3201.621750] ath10k: could not register driver core (-110)
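
The numeric return codes in the logs in this thread (-16 in the original report, -110 here) follow the kernel convention of returning negative errno values. Assuming the generic Linux errno numbering, where EBUSY is 16 and ETIMEDOUT is 110, a small decoder could be sketched as follows; demo_errname() is a hypothetical helper, not a kernel API.

```c
#include <errno.h>

/*
 * Map a kernel-style negative return code to its symbolic errno name.
 * Only the two codes seen in this thread are handled; anything else
 * is reported as "unknown".
 */
const char *demo_errname(int ret)
{
    switch (-ret) {
    case EBUSY:
        return "EBUSY";
    case ETIMEDOUT:
        return "ETIMEDOUT";
    default:
        return "unknown";
    }
}
```

On a system using that numbering, demo_errname(-110) gives "ETIMEDOUT", which matches the target never waking up, and demo_errname(-16) gives "EBUSY".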

We are working on fixing the wakeup issue; doing just a warm reset helps,
but there are other problems with that. We need to investigate more.

And we need to fix the interrupt race properly at some point; I added a
todo entry for this:

http://wireless.kernel.org/en/users/Drivers/ath10k/todo

-- 
Kalle Valo


* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-21 14:00     ` Kalle Valo
@ 2013-10-21 15:59       ` Ben Greear
  2013-10-21 16:46         ` Kalle Valo
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2013-10-21 15:59 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k

On 10/21/2013 07:00 AM, Kalle Valo wrote:
> Kalle Valo <kvalo@qca.qualcomm.com> writes:
> 
>> Ben Greear <greearb@candelatech.com> writes:
>>
>>> I sent this to the wrong list the first time.
>>>
>>> Do you know if this is already addressed?  If not, I'll see if I can
>>> find a fix.
>>
>> [...]
>>
>>> ath10k: MSI-X interrupt handling (8 intrs)
>>> ath10k: Unable to wakeup target
>>> ath10k: target took longer 5000 us to wake up (awake count 1)
>>> ath10k: Failed to get pcie state addr: -16
>>> ath10k: early firmware event indicated
>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>> IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122
>>> [ath10k_pci]
>>
>> I think there are two bugs here:
>>
>> 1) Cold reset doesn't always work; Michal has a patch for that. That's
>>    why the wakeup fails:
>>
>> http://lists.infradead.org/pipermail/ath10k/2013-October/000638.html
>>
>> 2) We enable interrupts too early, so if wakeup fails and we get a
>>    spurious interrupt, ath10k crashes. We don't have a fix for this yet.
> 
> I was able to reproduce your crash and it seems that my patch "ath10k:
> add error handling to ath10k_pci_wait()" fixes the crash. But it's not
> the ultimate solution, just a band-aid for the issue:
> 
> [ 3200.447601] ath10k: MSI-X didn't succeed (1), trying MSI
> [ 3200.447898] ath10k_pci 0000:02:00.0: irq 49 for MSI/MSI-X
> [ 3200.449101] ath10k: MSI interrupt handling
> [ 3201.620679] ath10k: Unable to wakeup target
> [ 3201.620754] ath10k: Failed to reset target, target did not wake up: -110
> [ 3201.621111] ath10k: early firmware event indicated
> [ 3201.621624] ath10k: could not start pci hif (-110)
> [ 3201.621691] ath10k: could not probe fw (-110)
> [ 3201.621750] ath10k: could not register driver core (-110)
> 
> We are working on fixing the wakeup issue; doing just a warm reset helps,
> but there are other problems with that. We need to investigate more.
>
> And we need to fix the interrupt race properly at some point; I added a
> todo entry for this:
> 
> http://wireless.kernel.org/en/users/Drivers/ath10k/todo

Thanks.  Is this the proper repository to develop/test against?

git://github.com/kvalo/ath.git

('master' branch)?


Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Fwd: ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-21 15:59       ` Ben Greear
@ 2013-10-21 16:46         ` Kalle Valo
  0 siblings, 0 replies; 12+ messages in thread
From: Kalle Valo @ 2013-10-21 16:46 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k

Ben Greear <greearb@candelatech.com> writes:

>> We are working on fixing the wakeup issue; doing just a warm reset helps,
>> but there are other problems with that. We need to investigate more.
>>
>> And we need to fix the interrupt race properly at some point; I added a
>> todo entry for this:
>> 
>> http://wireless.kernel.org/en/users/Drivers/ath10k/todo
>
> Thanks.  Is this the proper repository to develop/test against?
>
> git://github.com/kvalo/ath.git
>
> ('master' branch)?

Yes. The master branch is basically a wireless-testing snapshot no more
than two weeks old, plus the latest ath10k/ath6kl patches.

-- 
Kalle Valo


* [ath9k-devel] ath10k related kernel crash in wireless-testing (3.12.0-rc3-wl+)
  2013-10-11 23:51 ` [ath9k-devel] " Ben Greear
  (?)
  (?)
@ 2014-02-28  7:37 ` sanju More
  -1 siblings, 0 replies; 12+ messages in thread
From: sanju More @ 2014-02-28  7:37 UTC (permalink / raw)
  To: ath9k-devel

Hi, I want to know how to unsubscribe from this mailing list.

Sanjeev




On Saturday, 12 October 2013 5:22 AM, Ben Greear <greearb@candelatech.com> wrote:
 
My kernel is lightly patched, no patches to ath10k.

Looks like something is not checking a NULL pointer, but I did
not dig into the code.  I'm not actually running any traffic on
the ath10k device.


ath10k: Completion buffers are full
ath10k: Completion buffers are full
ath10k: MSI-X interrupt handling (8 intrs)
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
ath10k: UART prints disabled
ath10k: firmware 1.0.0.636 booted
ath10k: htt target version 2.1

...

ath10k: Completion buffers are full
ath10k: Completion buffers are full
IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
8021q: adding VLAN 0 to HW filter on device eth1
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): sta0: link is not ready
ath10k: MSI-X interrupt handling (8 intrs)
ath10k: Unable to wakeup target
ath10k: target took longer 5000 us to wake up (awake count 1)
ath10k: Failed to get pcie state addr: -16
ath10k: early firmware event indicated
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
PGD d1f5b067 PUD d9c1c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs fscache nf_nat_ipv4 nf_nat 8021q ga]
CPU: 1 PID: 17 Comm: ksoftirqd/1 Tainted: G         C   3.12.0-rc3-wl+ #10
Hardware name: To be filled by O.E.M. To be filled by O.E.M./HURONRIVER, BIOS 4.6.5 05/02/2012
task: ffff8802160fdd80 ti: ffff880216306000 task.ti: ffff880216306000
RIP: 0010:[<ffffffffa06ae46c>]  [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
RSP: 0018:ffff880216307c98  EFLAGS: 00010246
RAX: 0000000000005b5b RBX: ffff88020bfbfc50 RCX: 0000000000057400
RDX: ffff88020c7e5f20 RSI: ffff880216307d10 RDI: ffff88020bfbfc48
RBP: ffff880216307cf8 R08: ffff880216307d1c R09: ffff8800d967d438
R10: ffff88021fa92ff0 R11: 0000000000000001 R12: ffff880216307d10
R13: ffff880216307d24 R14: ffff880216307d20 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88021fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 00000000d1c76000 CR4: 00000000000407e0
Stack:
  ffff880216307cc8 ffff8802160fdde8 ffff88021fa92ff0 ffff880216307d1c
  ffff88020bfbfc48 ffff88021fa92ff0 ffff880216307d08 ffff88020c7e5f20
  ffff88020bfbf800 0000000000000000 ffff88020bfbfc48 0000000000057430
Call Trace:
  [<ffffffffa06ab568>] ath10k_pci_bmi_send_done+0x1d/0x32 [ath10k_pci]
  [<ffffffff810a4915>] ? local_bh_enable_ip+0x9/0xb
  [<ffffffff8158e801>] ? _raw_spin_unlock_bh+0x1f/0x21
  [<ffffffffa06ae357>] ath10k_ce_per_engine_service+0xab/0xea [ath10k_pci]
  [<ffffffffa06ab3b4>] ath10k_pci_ce_tasklet+0x15/0x17 [ath10k_pci]
  [<ffffffff810a3e3e>] tasklet_action+0x78/0xc6
  [<ffffffff810a463c>] __do_softirq+0xc4/0x19d
  [<ffffffff810a4738>] run_ksoftirqd+0x23/0x42
  [<ffffffff810c0ea3>] smpboot_thread_fn+0x21e/0x223
  [<ffffffff810c0c85>] ? smpboot_create_threads+0x61/0x61
  [<ffffffff810ba54d>] kthread+0xb0/0xb8
  [<ffffffff810ba49d>] ? kthread_freezable_should_stop+0x5b/0x5b
  [<ffffffff815939cc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810ba49d>] ? kthread_freezable_should_stop+0x5b/0x5b
Code: 38 4c 89 45 b8 48 8b 07 48 8b 80 a0 01 00 00 48 05 48 04 00 00 48 89 c7 48 89 45 c0 e8 b6 07 ee e0
RIP  [<ffffffffa06ae46c>] ath10k_ce_completed_send_next+0x47/0x122 [ath10k_pci]
  RSP <ffff880216307c98>
CR2: 0000000000000004
---[ end trace 6b10a0163cca0cc3 ]---

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
ath9k-devel mailing list
ath9k-devel at lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel