* [ath9k-devel] ath9k queue hang
@ 2014-04-15  2:16 Dave Taht
  2014-04-15  3:49 ` Ben Greear
  0 siblings, 1 reply; 2+ messages in thread
From: Dave Taht @ 2014-04-15  2:16 UTC (permalink / raw)
  To: ath9k-devel

For several months now we have been trying to reproduce a bug in which wifi
connections hang in strange ways after a large amount of data has been transferred.

The symptoms varied: anything from multicast failing, to background or best
effort traffic failing, to local access working while remote access did
not.

Last week, we finally got a situation where we had enough debugging on to see
something that matches the symptoms we saw, in that one of the wifi queues
would hang and leave the overlying qdisc full of packets that didn't drain.

Nothing short of a reboot clears it.

More details are at:

http://www.bufferbloat.net/issues/442#note-11

If you are in that state, it is easy to see with:

cat /sys/kernel/debug/ieee80211/phy*/ath9k/queues

(VO): qnum: 0 qdepth: 0 ampdu-depth: 0 pending: 0 stopped: 0
(VI): qnum: 1 qdepth: 0 ampdu-depth: 0 pending: 0 stopped: 0
(BE): qnum: 2 qdepth: 0 ampdu-depth: 0 pending: 0 stopped: 0
(BK): qnum: 3 qdepth: 0 ampdu-depth: 0 pending: 151 stopped: 1
(CAB): qnum: 8 qdepth: 0 ampdu-depth: 0 pending: 0 stopped: 0

It's ok to have pending frames, and even be stopped. It's not ok to have
pending stay stuck or increase.
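
To make that rule concrete, here is a minimal sketch in Python that compares two
snapshots of the queues file and reports any queue whose pending count is
non-zero and has not gone down. The function names are made up for
illustration, the regular expression assumes the exact line layout shown above,
and the sampling interval is left to the caller. Treat it as a heuristic, not a
definitive test: a busy but healthy queue can also show the same count twice.

```python
import re

# Matches one line of the ath9k "queues" debugfs file in the layout shown above.
LINE_RE = re.compile(
    r"\((?P<name>\w+)\):\s*qnum:\s*\d+\s+qdepth:\s*\d+\s+"
    r"ampdu-depth:\s*\d+\s+pending:\s*(?P<pending>\d+)\s+"
    r"stopped:\s*(?P<stopped>\d+)"
)


def parse_queues(text):
    """Return {queue_name: {"pending": int, "stopped": int}} for matching lines."""
    queues = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            queues[m.group("name")] = {
                "pending": int(m.group("pending")),
                "stopped": int(m.group("stopped")),
            }
    return queues


def stuck_queues(before, after):
    """Names of queues whose pending count is > 0 and did not decrease."""
    stuck = []
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        if new["pending"] > 0 and new["pending"] >= old["pending"]:
            stuck.append(name)
    return stuck
```

To use it, read the queues file twice, some seconds apart, pass each read
through parse_queues(), and hand both results to stuck_queues(). A queue that
keeps showing up over several consecutive samples is the one to look at.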

If you are running with a smarter qdisc enabled, you can also see it wedged -
in this case the BK queue (1:4).

root@cerowrt:/mnt/disk1# tc -s qdisc show dev sw00
qdisc mq 1: root
Sent 3926131082 bytes 2998293 pkt (dropped 91657, overlimits 0 requeues 70095)
backlog 77608b 1000p requeues 70095
qdisc fq_codel 10: parent 1:1 limit 800p flows 1024 quantum 500 target 10.0ms interval 100.0ms
Sent 110555 bytes 771 pkt (dropped 0, overlimits 0 requeues 5)
backlog 0b 0p requeues 5
maxpacket 256 drop_overlimit 0 new_flow_count 2 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 20: parent 1:2 limit 800p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
Sent 2526448 bytes 17982 pkt (dropped 1, overlimits 0 requeues 31)
backlog 0b 0p requeues 31
maxpacket 929 drop_overlimit 0 new_flow_count 71 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 30: parent 1:3 limit 1000p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
Sent 15145657 bytes 106290 pkt (dropped 0, overlimits 0 requeues 179)
backlog 0b 0p requeues 179
maxpacket 256 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 40: parent 1:4 limit 1000p flows 1024 quantum 300 target 5.0ms interval 100.0ms
Sent 3908348422 bytes 2873250 pkt (dropped 91656, overlimits 0 requeues 69880)
backlog 77608b 1000p requeues 69880
^^^!!!!!
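
For anyone who wants to watch for this automatically, the following Python
sketch scans "tc -s qdisc show" output and lists qdiscs whose backlog, in
packets, has reached their configured limit. The function name is hypothetical
and the parsing assumes the fq_codel text layout shown above, including header
lines that wrap. A full backlog in one sample does not prove a hang by itself;
it only matters if it stays full across repeated samples.

```python
import re


def full_qdiscs(tc_output):
    """Return [(handle, parent, backlog_pkts, limit_pkts)] for qdiscs at their limit."""
    # Group the text into one chunk per qdisc. Continuation lines, including
    # wrapped header lines, are appended to the chunk they belong to.
    chunks = []
    for line in tc_output.splitlines():
        if line.lstrip().startswith("qdisc "):
            chunks.append(line.strip())
        elif chunks:
            chunks[-1] += " " + line.strip()

    full = []
    for chunk in chunks:
        handle = re.search(r"qdisc\s+\S+\s+(\S+)", chunk)
        parent = re.search(r"\bparent\s+(\S+)", chunk)
        limit = re.search(r"\blimit\s+(\d+)p\b", chunk)
        backlog = re.search(r"\bbacklog\s+\S+\s+(\d+)p\b", chunk)
        # Qdiscs without a packet limit in their header (such as mq) are skipped.
        if not (handle and limit and backlog):
            continue
        limit_pkts = int(limit.group(1))
        backlog_pkts = int(backlog.group(1))
        if limit_pkts > 0 and backlog_pkts >= limit_pkts:
            full.append((
                handle.group(1),
                parent.group(1) if parent else None,
                backlog_pkts,
                limit_pkts,
            ))
    return full
```

Fed the output above, this should single out the 40: qdisc under parent 1:4,
which is the BK queue.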

We have been experiencing this problem for months, on the linux-backports to 3.10.x in OpenWrt.


-- 
Dave Täht


* [ath9k-devel] ath9k queue hang
  2014-04-15  2:16 [ath9k-devel] ath9k queue hang Dave Taht
@ 2014-04-15  3:49 ` Ben Greear
  0 siblings, 0 replies; 2+ messages in thread
From: Ben Greear @ 2014-04-15  3:49 UTC (permalink / raw)
  To: ath9k-devel

On 04/14/2014 07:16 PM, Dave Taht wrote:
> We have been trying to replicate a bug in seeing wifi connections hanging
> in strange ways after tons of data is transferred... for several months now.
>
> The symptoms varied, anything from multicast failing to background or best
> effort traffic failing - from local access working with remote access
> not working...
>
> Last week, we finally got a situation where we had enough debugging on to see
> something that matches the symptoms we saw, in that one of the wifi queues
> would hang and leave the overlying qdisc full of packets that didn't drain.

Sounds familiar...I had a relatively clean patch in the 3.9 days, but had some
issues merging along the way and haven't bothered to rebase it, so patch is
not as clean as it used to be:

http://dmz2.candelatech.com/git/?p=linux-3.14.dev.y/.git;a=commitdiff;h=a34e34f46fbffc627dfc2d93c508f580fbaf29e2;hp=cce0d841338348c69ae6f7ef1b2bc8a6abea3fc4
http://dmz2.candelatech.com/git/?p=linux-3.14.dev.y/.git;a=commitdiff;h=3ecefa9c9f7eed21002dad7a6540d6d250297466;hp=134543c6fec7e28bf91272ce995b550b1bf73c62

I posted the patch to the mailing lists some time back, maybe a year or two ago.

If I recall, we could reproduce our problem fairly reliably by stepping an attenuator
in 10 dB steps while under load.

I'd be curious to know if you try it out and it works for you...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
