* ath9k: race conditions in dma
@ 2010-11-01 15:17 ` Björn Smedman
  0 siblings, 0 replies; 24+ messages in thread
From: Björn Smedman @ 2010-11-01 15:17 UTC (permalink / raw)
  To: linux-wireless, ath9k-devel

Hi all,

I have an application that creates and destroys a lot of ap vifs and
does a lot of monitor frame injection. The recent ath9k rx locking
fixes have helped with stability in this use-case but there still
seems to be some tx/beacon related race condition(s). These manifest
themselves as follows on an AR913x based router running
compat-wireless-2010-10-19 (with locking fixes etc from openwrt):

1. TX DMA hangs under simultaneous high RX and TX load

This can happen within minutes but only seems to happen if there is
high load on both RX and TX. These hangs take several seconds to fully
recover from and seem more severe than the usual ones we used to see
before the pcu locking fixes. Log output looks like this:

Jan  1 00:08:47 user.debug kernel: ath: Failed to stop TX DMA in 100 msec after killing last frame
Jan  1 00:08:47 user.debug kernel: ath: Failed to stop TX DMA in 100 msec after killing last frame
Jan  1 00:08:47 user.debug kernel: ath: Failed to stop TX DMA in 100 msec after killing last frame
Jan  1 00:08:47 user.debug kernel: ath: Failed to stop TX DMA in 100 msec after killing last frame
Jan  1 00:08:47 user.debug kernel: ath: Failed to stop TX DMA. Resetting hardware!
Jan  1 00:08:47 user.debug kernel: ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020
Jan  1 00:08:47 user.debug kernel: ath: ah->misc_mode 0xc
Jan  1 00:08:47 user.debug kernel: ath: Setting CFG 0x10a
Jan  1 00:08:47 user.debug kernel: ath: ah->misc_mode 0xc
Jan  1 00:08:47 user.debug kernel: ath: Setting CFG 0x10a

Also note that in my use-case there is more processing done in
ieee80211_rx() and ieee80211_tx_status() than is perhaps normal.

2. TX is completely hung but chip is never reset

Another similar failure mode under the same conditions as above (when
TX and RX load is high) is that the TX pipeline is somehow hung
(nothing coming out on radio) but there is no log output to suggest
that anything is seriously wrong. My guess here is that the tx queue
might be stopped but I have not been able to verify that.

3. Interrupts completely stop coming

The last failure mode happens when the driver is not RX/TX loaded but
instead left running for a longer period of time (about 12 hours is
enough in most cases but 48 hours basically always does the trick).
The system is fine but it seems ath9k is not receiving any interrupts
("cat /sys/kernel/debug/ath9k/phy0/interrupts" produces the same
result over and over again). If full debug is enabled ("echo 0xffff >
/sys/kernel/debug/ath9k/phy0/debug") only shortcal and longcal related
prints appear (driven by a timer). Bringing the interface down and
then up again with ifconfig does not bring it back to life, but
restarting hostapd does.
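
The "same result over and over again" check can be automated. The
following is only a minimal user-space sketch: read_snapshot() and
counters_moved() are hypothetical helper names, and the only thing
taken from the report above is the debugfs path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read up to len-1 bytes of a text file into buf and NUL-terminate it.
 * Returns 0 on success, -1 if the file cannot be opened.
 */
int read_snapshot(const char *path, char *buf, size_t len)
{
	FILE *f;
	size_t n;

	assert(buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;

	n = fread(buf, 1, len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/*
 * Compare two snapshots of the interrupt counters.
 * Returns 1 if anything changed between them, 0 if they are identical.
 */
int counters_moved(const char *before, const char *after)
{
	return strcmp(before, after) != 0;
}
```

A caller would take one snapshot of
/sys/kernel/debug/ath9k/phy0/interrupts, sleep for a few seconds, take
a second one, and treat counters_moved() == 0 as a hint that
interrupts have stopped arriving.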

Help in tracking these down would be much appreciated. I will follow
up below with some thoughts on contributing factors.

/Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 15:17 ` [ath9k-devel] " Björn Smedman
@ 2010-11-01 15:43   ` Ben Gamari
  -1 siblings, 0 replies; 24+ messages in thread
From: Ben Gamari @ 2010-11-01 15:43 UTC (permalink / raw)
  To: Björn Smedman, linux-wireless, ath9k-devel

On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
> Hi all,
> 
> I have an application that creates and destroys a lot of ap vifs and
> does a lot of monitor frame injection. The recent ath9k rx locking
> fixes have helped with stability in this use-case but there still
> seems to be some tx/beacon related race condition(s). These manifests
> themselves as follows on an AR913x based router running
> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
> 
> 1. TX DMA hangs under simultaneous high RX and TX load
> 2. TX is completely hung but chip is never reset

I have also observed both of these behaviors with just a standard
hostapd single VIF configuration. Quite annoying. It seems to be better
with recent wireless-testing trees.

- Ben

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 15:43   ` Ben Gamari
@ 2010-11-01 15:50     ` Björn Smedman
  -1 siblings, 0 replies; 24+ messages in thread
From: Björn Smedman @ 2010-11-01 15:50 UTC (permalink / raw)
  To: Ben Gamari; +Cc: linux-wireless, ath9k-devel

On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari <bgamari@gmail.com> wrote:
> On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
>> Hi all,
>>
>> I have an application that creates and destroys a lot of ap vifs and
>> does a lot of monitor frame injection. The recent ath9k rx locking
>> fixes have helped with stability in this use-case but there still
>> seems to be some tx/beacon related race condition(s). These manifests
>> themselves as follows on an AR913x based router running
>> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
>>
>> 1. TX DMA hangs under simultaneous high RX and TX load
>> 2. TX is completely hung but chip is never reset
>
> I have also observed both of these behaviors with just a standard
> hostapd single VIF configuration. Quite annoying. It seems to be better
> with recent wireless-testing trees.
>
> - Ben

Thanx Ben, it's a relief to know I'm not the only one suffering from this.

Unfortunately I can't run wireless-testing (built-in system with
out-of-tree arch). Could this be fixed in later compat-wireless
snapshots? Can you recommend a specific snapshot?

/Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 15:43   ` Ben Gamari
@ 2010-11-01 16:20     ` Björn Smedman
  -1 siblings, 0 replies; 24+ messages in thread
From: Björn Smedman @ 2010-11-01 16:20 UTC (permalink / raw)
  To: Ben Gamari; +Cc: linux-wireless, ath9k-devel

On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari <bgamari@gmail.com> wrote:
> On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
>> Hi all,
>>
>> I have an application that creates and destroys a lot of ap vifs and
>> does a lot of monitor frame injection. The recent ath9k rx locking
>> fixes have helped with stability in this use-case but there still
>> seems to be some tx/beacon related race condition(s). These manifests
>> themselves as follows on an AR913x based router running
>> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
>>
>> 1. TX DMA hangs under simultaneous high RX and TX load
>> 2. TX is completely hung but chip is never reset
>
> I have also observed both of these behaviors with just a standard
> hostapd single VIF configuration. Quite annoying. It seems to be better
> with recent wireless-testing trees.
>
> - Ben

Looking at the code, here is the first passage that triggers a bad
fuzzy feeling for me (beacon.c):

        skb = ieee80211_get_buffered_bc(hw, vif);

        /*
         * if the CABQ traffic from previous DTIM is pending and the current
         *  beacon is also a DTIM.
         *  1) if there is only one vif let the cab traffic continue.
         *  2) if there are more than one vif and we are using staggered
         *     beacons, then drain the cabq by dropping all the frames in
         *     the cabq so that the current vifs cab traffic can be scheduled.
         */
        spin_lock_bh(&cabq->axq_lock);
        cabq_depth = cabq->axq_depth;
        spin_unlock_bh(&cabq->axq_lock);

        if (skb && cabq_depth) {
                if (sc->nvifs > 1) {
                        ath_print(common, ATH_DBG_BEACON,
                                  "Flushing previous cabq traffic\n");
                        ath_draintxq(sc, cabq, false);
                }
        }

        ath_beacon_setup(sc, avp, bf, info->control.rates[0].idx);

        while (skb) {
                ath_tx_cabq(hw, skb);
                skb = ieee80211_get_buffered_bc(hw, vif);
        }

From what I can tell there is no guarantee that CABQ TX DMA is stopped
when ath_draintxq() is called. From ath_draintxq()'s point of view that
looks like a bad idea (race between CPU and DMA).

Also, the locking around "cabq_depth = cabq->axq_depth;" looks very
peculiar. I believe it's correct (because nobody else puts anything
into this queue and we don't care if it's shorter later on when we
drain it) but I think it would be nice to have a comment.
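
To make the two concerns concrete, here is a small user-space model.
It is a sketch under stated assumptions, not driver code: struct
model_txq and the model_txq_*() functions are hypothetical, a pthread
mutex stands in for axq_lock, and a boolean stands in for "TX DMA is
running on this queue".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TX queue; not the ath9k structure. */
struct model_txq {
	pthread_mutex_t lock;	/* plays the role of axq_lock */
	unsigned int depth;	/* frames still queued */
	bool dma_running;	/* true while "hardware" may walk the queue */
};

void model_txq_init(struct model_txq *q, unsigned int depth)
{
	pthread_mutex_init(&q->lock, NULL);
	q->depth = depth;
	q->dma_running = true;
}

/*
 * Snapshot the depth under the lock. The value may be stale as soon as
 * the lock is dropped; that is acceptable only if nothing else enqueues
 * and a shorter queue at drain time is harmless.
 */
unsigned int model_txq_depth(struct model_txq *q)
{
	unsigned int depth;

	pthread_mutex_lock(&q->lock);
	depth = q->depth;
	pthread_mutex_unlock(&q->lock);
	return depth;
}

/*
 * Stop "DMA" first, then drop the queued frames, all under the lock,
 * so the drain never overlaps with the hardware side.
 * Returns the number of frames dropped.
 */
unsigned int model_txq_stop_and_drain(struct model_txq *q)
{
	unsigned int dropped;

	pthread_mutex_lock(&q->lock);
	q->dma_running = false;
	dropped = q->depth;
	q->depth = 0;
	pthread_mutex_unlock(&q->lock);
	return dropped;
}
```

The point of the model is only the ordering: the drain happens after
the queue has been marked as stopped, and the depth snapshot is
documented as possibly stale.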

Any thoughts? I can whip up and test a patch if there are no objections.

/Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 15:43   ` Ben Gamari
@ 2010-11-01 16:39     ` Björn Smedman
  -1 siblings, 0 replies; 24+ messages in thread
From: Björn Smedman @ 2010-11-01 16:39 UTC (permalink / raw)
  To: Ben Gamari; +Cc: linux-wireless, ath9k-devel

On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari <bgamari@gmail.com> wrote:
> On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
>> Hi all,
>>
>> I have an application that creates and destroys a lot of ap vifs and
>> does a lot of monitor frame injection. The recent ath9k rx locking
>> fixes have helped with stability in this use-case but there still
>> seems to be some tx/beacon related race condition(s). These manifests
>> themselves as follows on an AR913x based router running
>> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
>>
>> 1. TX DMA hangs under simultaneous high RX and TX load
>> 2. TX is completely hung but chip is never reset
>
> I have also observed both of these behaviors with just a standard
> hostapd single VIF configuration. Quite annoying. It seems to be better
> with recent wireless-testing trees.
>
> - Ben

The next thing that looks racy to me is ath_beacon_alloc() vs
ath_beacon_tasklet() in beacon.c. Beacon queue TX DMA is always
stopped in main.c before calling ath_beacon_alloc() but
ath_beacon_tasklet() is scheduled when we get an SWBA interrupt. My
guess is that these keep coming even if we stop TX DMA on the beacon
queue, no?

/Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 16:39     ` Björn Smedman
@ 2010-11-01 16:44       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 24+ messages in thread
From: Luis R. Rodriguez @ 2010-11-01 16:44 UTC (permalink / raw)
  To: Björn Smedman; +Cc: Ben Gamari, ath9k-devel, linux-wireless

2010/11/1 Björn Smedman <bjorn.smedman@venatech.se>:
> On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari <bgamari@gmail.com> wrote:
>> On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
>>> Hi all,
>>>
>>> I have an application that creates and destroys a lot of ap vifs and
>>> does a lot of monitor frame injection. The recent ath9k rx locking
>>> fixes have helped with stability in this use-case but there still
>>> seems to be some tx/beacon related race condition(s). These manifests
>>> themselves as follows on an AR913x based router running
>>> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
>>>
>>> 1. TX DMA hangs under simultaneous high RX and TX load
>>> 2. TX is completely hung but chip is never reset
>>
>> I have also observed both of these behaviors with just a standard
>> hostapd single VIF configuration. Quite annoying. It seems to be better
>> with recent wireless-testing trees.
>>
>> - Ben
>
> The next thing that looks racy to me is ath_beacon_alloc() vs
> ath_beacon_tasklet() in beacon.c. Beacon queue TX DMA is always
> stopped in main.c before calling ath_beacon_alloc() but
> ath_beacon_tasklet() is scheduled when we get an SWBA interrupt. My
> guess is that these keep coming even if we stop TX DMA on the beacon
> queue, no?

My TX PCU patches for ath9k are not merged yet, try those or wait
until John merges them.

  Luis

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 16:44       ` Luis R. Rodriguez
@ 2010-11-01 16:52         ` Felix Fietkau
  -1 siblings, 0 replies; 24+ messages in thread
From: Felix Fietkau @ 2010-11-01 16:52 UTC (permalink / raw)
  To: Luis R. Rodriguez, Björn Smedman
  Cc: Ben Gamari, ath9k-devel, linux-wireless

On 2010-11-01 5:44 PM, Luis R. Rodriguez wrote:
> 2010/11/1 Björn Smedman <bjorn.smedman@venatech.se>:
>> On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari <bgamari@gmail.com> wrote:
>>> On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
>>>> Hi all,
>>>>
>>>> I have an application that creates and destroys a lot of ap vifs and
>>>> does a lot of monitor frame injection. The recent ath9k rx locking
>>>> fixes have helped with stability in this use-case but there still
>>>> seems to be some tx/beacon related race condition(s). These manifests
>>>> themselves as follows on an AR913x based router running
>>>> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
>>>>
>>>> 1. TX DMA hangs under simultaneous high RX and TX load
>>>> 2. TX is completely hung but chip is never reset
>>>
>>> I have also observed both of these behaviors with just a standard
>>> hostapd single VIF configuration. Quite annoying. It seems to be better
>>> with recent wireless-testing trees.
>>>
>>> - Ben
>>
>> The next thing that looks racy to me is ath_beacon_alloc() vs
>> ath_beacon_tasklet() in beacon.c. Beacon queue TX DMA is always
>> stopped in main.c before calling ath_beacon_alloc() but
>> ath_beacon_tasklet() is scheduled when we get an SWBA interrupt. My
>> guess is that these keep coming even if we stop TX DMA on the beacon
>> queue, no?
> 
> My TX PCU patches for ath9k are not merged yet, try those or wait
> until John merges them.
They are merged in OpenWrt. Björn, which OpenWrt revision did you use in
your tests?

- Felix

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 16:52         ` Felix Fietkau
@ 2010-11-01 17:12           ` Björn Smedman
  -1 siblings, 0 replies; 24+ messages in thread
From: Björn Smedman @ 2010-11-01 17:12 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: Luis R. Rodriguez, Ben Gamari, ath9k-devel, linux-wireless

2010/11/1 Felix Fietkau <nbd@openwrt.org>
> > My TX PCU patches for ath9k are not merged yet, try those or wait
> > until John merges them.
> They are merged in OpenWrt. Björn, which OpenWrt revision did you use in
> your tests?
>
> - Felix

I'm based on openwrt/trunk@23720 when I run code. But when I read code
I'm looking at the latest wireless-testing (and trying to keep track
of pending patches on linux-wireless). I will apply the TX PCU patch
and see if that changes my bad fuzzy feeling.

/Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 15:50     ` Björn Smedman
@ 2010-11-01 23:12       ` Peter Stuge
  -1 siblings, 0 replies; 24+ messages in thread
From: Peter Stuge @ 2010-11-01 23:12 UTC (permalink / raw)
  To: Björn Smedman; +Cc: Ben Gamari, ath9k-devel, linux-wireless

Björn Smedman wrote:
> >> 1. TX DMA hangs under simultaneous high RX and TX load
> >> 2. TX is completely hung but chip is never reset
> >
> > I have also observed both of these behaviors with just a standard
> > hostapd single VIF configuration.
> 
> Thanx Ben, it's a relief to know I'm not the only one suffering
> from this.

Just a note to confirm that I have also seen many different failures
related to this. The lasting impression is that it's a big mess.

I bought my first ath9k hardware roughly a year ago. That was totally
useless as STA up until kernels a few months ago. I am now using an
AR9280 card and for the very first time ath9k hardware and driver is
actually working at all for me, but there are still issues as I noted
in the other email.

Unfortunately they're the kind of issues which can't be debugged much
lacking strong knowledge of device internals.


//Peter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-01 15:43   ` Ben Gamari
@ 2010-11-02 16:55     ` Björn Smedman
  -1 siblings, 0 replies; 24+ messages in thread
From: Björn Smedman @ 2010-11-02 16:55 UTC (permalink / raw)
  To: Ben Gamari; +Cc: linux-wireless, ath9k-devel

On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari <bgamari@gmail.com> wrote:
> On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
>> Hi all,
>>
>> I have an application that creates and destroys a lot of ap vifs and
>> does a lot of monitor frame injection. The recent ath9k rx locking
>> fixes have helped with stability in this use-case but there still
>> seems to be some tx/beacon related race condition(s). These manifests
>> themselves as follows on an AR913x based router running
>> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
>>
>> 1. TX DMA hangs under simultaneous high RX and TX load
>> 2. TX is completely hung but chip is never reset
>
> I have also observed both of these behaviors with just a standard
> hostapd single VIF configuration. Quite annoying. It seems to be better
> with recent wireless-testing trees.
>
> - Ben

I just posted "[RFC] ath9k: fix tx queue selection" with a patch that
fixes (or at least reduces) these two for me. I'm not sure it is the
whole story, but at least in theory problem 1 could be caused by locking
one tx queue and actually transmitting on another. Problem 2 is probably
caused by stopping one mac80211 queue and then starting another.
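
To make the second hypothesis concrete, here is a toy model in plain C
of what a stop/wake index mismatch would do. It is only a sketch and
not ath9k or mac80211 code: struct toy_dev, the toy_* helpers and
NUM_QUEUES are hypothetical names invented for illustration. It assumes
that each software queue has a single "stopped" flag and that only a
wake on the same index clears it.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_QUEUES 4            /* hypothetical: one software queue per access category */

struct toy_dev {
    bool stopped[NUM_QUEUES];   /* true while the upper layer must not submit frames */
};

/* Mark queue q as stopped, e.g. because its hardware queue is full. */
static void toy_stop_queue(struct toy_dev *dev, int q)
{
    assert(q >= 0 && q < NUM_QUEUES);
    dev->stopped[q] = true;
}

/* Mark queue q as runnable again, e.g. after TX completions freed space. */
static void toy_wake_queue(struct toy_dev *dev, int q)
{
    assert(q >= 0 && q < NUM_QUEUES);
    dev->stopped[q] = false;
}

/* Count the queues that are still stopped. */
static int toy_count_stopped(const struct toy_dev *dev)
{
    int n = 0;

    for (int q = 0; q < NUM_QUEUES; q++)
        n += dev->stopped[q] ? 1 : 0;
    return n;
}

/*
 * Mismatched pattern: the stop path and the wake path pick different
 * queue indices, so the wake never reaches the stopped queue.
 */
static int toy_mismatched(void)
{
    struct toy_dev dev = { { false } };

    toy_stop_queue(&dev, 2);
    toy_wake_queue(&dev, 1);    /* wrong index: queue 2 stays stopped */
    return toy_count_stopped(&dev);
}

/* Consistent pattern: both paths use the same index. */
static int toy_matched(void)
{
    struct toy_dev dev = { { false } };

    toy_stop_queue(&dev, 2);
    toy_wake_queue(&dev, 2);
    return toy_count_stopped(&dev);
}
```

In this model toy_mismatched() leaves one queue stopped with nothing
left to wake it, which would look like "TX is completely hung but chip
is never reset", while toy_matched() leaves none stopped. Whether the
real driver takes this path is exactly what the RFC patch is meant to
test.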

Ben, if you can easily trigger these problems on wireless-testing,
could you test with my patch and see if it helps? I'm especially
interested to see if it really fixes problem 1.

/Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-02 16:55     ` Björn Smedman
@ 2010-11-03 16:41       ` Björn Smedman
  -1 siblings, 0 replies; 24+ messages in thread
From: Björn Smedman @ 2010-11-03 16:41 UTC (permalink / raw)
  To: linux-wireless, ath9k-devel

2010/11/2 Björn Smedman <bjorn.smedman@venatech.se>:
> On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari <bgamari@gmail.com> wrote:
>> On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
>>> Hi all,
>>>
>>> I have an application that creates and destroys a lot of ap vifs and
>>> does a lot of monitor frame injection. The recent ath9k rx locking
>>> fixes have helped with stability in this use-case but there still
>>> seems to be some tx/beacon related race condition(s). These manifest
>>> themselves as follows on an AR913x based router running
>>> compat-wireless-2010-10-19 (with locking fixes etc from openwrt):
>>>
>>> 1. TX DMA hangs under simultaneous high RX and TX load
>>> 2. TX is completely hung but chip is never reset
>>
>> I have also observed both of these behaviors with just a standard
>> hostapd single VIF configuration. Quite annoying. It seems to be better
>> with recent wireless-testing trees.
>>
>> - Ben
>
> I just posted "[RFC] ath9k: fix tx queue selection" with a patch that
> fixes (or at least reduces) these two for me. I'm not sure it is the
> whole story but at least in theory 1 could be caused by locking one tx
> queue and actually transmitting on another. 2 is probably caused by
> stopping one mac80211 queue and then starting another.

Problem 1 is still there. After 5-15 hours of varying rx/tx frame
injection load something like this happens and the chip goes
deaf/mute:

      Jan  1 00:18:33 user.debug kernel: ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x40000020
      Jan  1 00:18:33 user.debug kernel: ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020
      Jan  1 00:18:33 user.debug kernel: ath: ah->misc_mode 0xc
      Jan  1 00:18:33 user.debug kernel: ath: Setting CFG 0x10a
      Jan  1 00:18:43 user.debug kernel: ath: Timeout while waiting for nf to load: AR_PHY_AGC_CONTROL=0x40d22
      Jan  1 00:18:44 user.debug kernel: ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x40000020
      Jan  1 00:18:44 user.debug kernel: ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020
      Jan  1 00:18:44 user.debug kernel: ath: ah->misc_mode 0xc
      Jan  1 00:18:44 user.debug kernel: ath: Setting CFG 0x10a
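
The two AR_DIAG_SW values in each pair of "DMA failed to stop" lines
differ in a single bit while AR_CR stays the same. A small C sketch of
that arithmetic follows. The helper names bit_mask, changed_bits and
lowest_set_bit are made up for this example, and the sketch
deliberately does not say what the bit means, since that depends on
the register definitions in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with only bit i set (i must be in 0..31). */
static uint32_t bit_mask(int i)
{
    assert(i >= 0 && i < 32);
    return UINT32_C(1) << i;
}

/* Return the bits that differ between two register snapshots. */
static uint32_t changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

/* Return the index of the lowest set bit, or -1 if no bit is set. */
static int lowest_set_bit(uint32_t v)
{
    for (int i = 0; i < 32; i++)
        if (v & bit_mask(i))
            return i;
    return -1;
}
```

With the logged values, changed_bits(0x40000020, 0x42000020) gives
0x02000000, and lowest_set_bit() of that is 25, so only bit 25 of
AR_DIAG_SW changes between the first and second message of each pair.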

Problem 2 seems gone though.

/Björn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [ath9k-devel] ath9k: race conditions in dma
  2010-11-02 16:55     ` Björn Smedman
@ 2010-11-03 17:47       ` Ben Gamari
  -1 siblings, 0 replies; 24+ messages in thread
From: Ben Gamari @ 2010-11-03 17:47 UTC (permalink / raw)
  To: Björn Smedman; +Cc: ath9k-devel, linux-wireless

On Tue, 2 Nov 2010 17:55:22 +0100, Björn Smedman <bjorn.smedman@venatech.se> wrote:
> Ben, if you can easily trigger these problems on wireless-testing,
> could you test with my patch and see if it helps? I'm especially
> interested to see if it really fixes problem 1.
> 
The only time I've been able to reproduce the issue with
wireless-testing is when using my work laptop. I'll bring it home
tonight and see if your patch makes any difference. Thanks,

- Ben


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2010-11-03 17:47 UTC | newest]

Thread overview: 24+ messages
2010-11-01 15:17 ath9k: race conditions in dma Björn Smedman
2010-11-01 15:17 ` [ath9k-devel] " Björn Smedman
2010-11-01 15:43 ` Ben Gamari
2010-11-01 15:43   ` Ben Gamari
2010-11-01 15:50   ` Björn Smedman
2010-11-01 15:50     ` Björn Smedman
2010-11-01 23:12     ` Peter Stuge
2010-11-01 23:12       ` Peter Stuge
2010-11-01 16:20   ` Björn Smedman
2010-11-01 16:20     ` Björn Smedman
2010-11-01 16:39   ` Björn Smedman
2010-11-01 16:39     ` Björn Smedman
2010-11-01 16:44     ` Luis R. Rodriguez
2010-11-01 16:44       ` Luis R. Rodriguez
2010-11-01 16:52       ` Felix Fietkau
2010-11-01 16:52         ` Felix Fietkau
2010-11-01 17:12         ` Björn Smedman
2010-11-01 17:12           ` Björn Smedman
2010-11-02 16:55   ` Björn Smedman
2010-11-02 16:55     ` Björn Smedman
2010-11-03 16:41     ` Björn Smedman
2010-11-03 16:41       ` Björn Smedman
2010-11-03 17:47     ` Ben Gamari
2010-11-03 17:47       ` Ben Gamari
