* ath10k wake_tx_queue issues
@ 2018-05-15 13:45 ` Niklas Cassel
From: Niklas Cassel @ 2018-05-15 13:45 UTC (permalink / raw)
  To: johannes, kvalo, erik.stromdahl, toke; +Cc: linux-wireless, ath10k

Hello mac80211 and ath10k people,

With ath10k, TX stops working while running iperf:

[  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
[  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
[  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
[  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
[  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec

The problem can be reproduced without specifying a send buffer size;
however, specifying a small send buffer reproduces it faster.

iperf gets -EAGAIN on write(), and it keeps getting -EAGAIN even if it runs
for e.g. 300 seconds. We get -EAGAIN because the socket send buffer is full
(iperf uses non-blocking I/O).
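
For readers less familiar with the userspace side, here is a minimal sketch of
what a full send buffer looks like with non-blocking I/O. It does not use
iperf or a wireless link; it assumes a local socketpair whose receiving end
never reads, and the helper name fill_send_buffer and the buffer sizes are
made up for illustration. On Linux, EAGAIN is errno 11, which matches the
"err: -11" lines later in the trace.

```python
import errno
import socket


def fill_send_buffer(chunk_size=1024, sndbuf=8192):
    """Write to a non-blocking socket until the kernel refuses more data.

    Returns (bytes_accepted, errno_of_the_refusal). The peer never reads,
    so the send buffer can only fill up.
    """
    writer, reader = socket.socketpair()
    try:
        # Ask for a small send buffer so the limit is hit quickly
        # (the kernel may round this value up).
        writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        writer.setblocking(False)

        payload = b"x" * chunk_size
        accepted = 0
        while True:
            try:
                accepted += writer.send(payload)
            except BlockingIOError as exc:
                # Non-blocking write on a full buffer: EAGAIN / EWOULDBLOCK.
                return accepted, exc.errno
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    accepted, err = fill_send_buffer()
    print(f"accepted {accepted} bytes, then errno {err} ({errno.errorcode[err]})")
```

The point is only that once nothing drains the buffer, every later write()
fails the same way, which is what iperf sees here.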

The problem is that the mac80211 wake_tx_queue callback is never called again.

I guess the best way to describe this is to show my ftrace buffer:

     ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
     ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
     ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
     ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
     ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
     ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
     ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
     ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
     ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
     ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
     ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
     ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
     ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
     ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
     ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
     ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
     ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
     ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
     ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
     ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
     ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
     ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
     ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
     ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
     ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
     ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
     ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
     ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
     ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
     ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
     ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
     ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
     ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
     ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
     ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
     ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
     ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
     ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
     ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0

(since we just called ieee80211_stop_queues(), I wouldn't expect to see
wake_tx_queue being called directly after)

     ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
  ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
  ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
  ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0

Here we just called ieee80211_wake_queue(); however, the wake_tx_queue callback
(ath10k_mac_op_wake_tx_queue) is never seen again...

  ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
  ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
  ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
  ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
  ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
  ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
  ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
  ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
  ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
  ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
  ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
  ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
  ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
  ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
  ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
  ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
  ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
  ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
  ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
  ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
  ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
  ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
  ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
  ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
  ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
  ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
  ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
  ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
  ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
  ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
  ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
  ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
  ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
  ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
  ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
  ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
  ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
  ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
  ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
  ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
  ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
  ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
  ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
  ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
  ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
  ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
  ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
  ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
  ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
  ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
  ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
  ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
  ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
  ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
  ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
  ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
  ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
  ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
  ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
  ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
  ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
  ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
  ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0

ath10k just finished transmission of pending frames.
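
If someone wants to check a longer capture for the same pattern, the trace
above can be summarized mechanically. This is a rough sketch, assuming the
line layout and event names exactly as they appear in this mail; the function
name summarize and the returned keys are my own choice.

```python
import re

# task  [cpu]  flags  timestamp: event: rest-of-line
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<rest>.*)$"
)
NUM_PEN_RE = re.compile(r"num_pen: (\d+)")


def summarize(lines):
    """Return the last timestamps of interest and the final num_pen value."""
    summary = {
        "last_wake_tx_queue": None,  # last ath10k_mac_op_wake_tx_queue call
        "last_wake_queue": None,     # last mac80211 wake_queue event
        "final_num_pen": None,       # last num_pen value seen in any event
    }
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # prose or blank line between trace chunks
        ts = float(match.group("ts"))
        event = match.group("event")
        if event == "ath10k_mac_op_wake_tx_queue":
            summary["last_wake_tx_queue"] = ts
        elif event == "wake_queue":
            summary["last_wake_queue"] = ts
        pending = NUM_PEN_RE.search(match.group("rest"))
        if pending:
            summary["final_num_pen"] = int(pending.group(1))
    return summary
```

A capture showing this problem would have last_wake_queue later than
last_wake_tx_queue and final_num_pen equal to 0.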

Half a second later, the send buffer is full, and we start seeing errors
in iperf.

           iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
           iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
           iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11



Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
=> ieee80211_queue_skb => drv_wake_tx_queue.

But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
in the ieee80211_txq that needs to be sent, to allow more data to be written to
the socket. But obviously the callback never comes.
Or how else is this supposed to work?
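
To make the question concrete, here is a toy model of the behaviour I think
I am seeing. It is not mac80211 or ath10k code: the class ToyTxPath, the
pull_on_wake switch and the frame counts are all hypothetical, picked to
mirror the trace (a limit of 64 pending frames, 24 frames left in the txq).
It takes as given that the sender stops enqueueing, and it assumes frames
leave the txq only when something calls the driver's pull routine.

```python
from collections import deque


class ToyTxPath:
    """Toy model: a software txq feeding a driver with a pending-frame limit."""

    def __init__(self, max_pending=64, pull_on_wake=False):
        self.txq = deque()          # frames queued above the driver
        self.pending = 0            # frames handed to the device, not completed
        self.sent = 0               # frames completed by the device
        self.max_pending = max_pending
        self.pull_on_wake = pull_on_wake

    def _pull(self):
        # Stand-in for the driver's wake_tx_queue handler.
        while self.txq and self.pending < self.max_pending:
            self.txq.popleft()
            self.pending += 1

    def enqueue(self, frame):
        # Transmit path: queue the frame, then notify the driver.
        self.txq.append(frame)
        self._pull()

    def complete_one(self):
        # TX completion: one pending frame is done.
        if self.pending == 0:
            return False
        self.pending -= 1
        self.sent += 1
        if self.pull_on_wake:
            # Hypothetical behaviour: freeing a slot also triggers a pull.
            self._pull()
        return True


def run(pull_on_wake, frames=88, max_pending=64):
    """Enqueue a burst, then let completions run with no further enqueues."""
    path = ToyTxPath(max_pending, pull_on_wake)
    for frame in range(frames):
        path.enqueue(frame)
    while path.complete_one():
        pass
    return path.sent, len(path.txq)
```

In this model run(False) ends with frames stranded in the txq and nothing
pending, while run(True) drains everything. Which of the two mac80211 is
supposed to implement is exactly what I am asking.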


Note that setting ops->wake_tx_queue = NULL; works around the problem
(i.e. it lets mac80211 use ath10k's tx callback instead of the wake_tx_queue callback).
But then there might still be a bug for other drivers to stumble upon.

However, since the only wireless drivers using wake_tx_queue are ath9k, ath10k,
and mt76, perhaps it is not such a bad idea to use the tx callback instead of
the wake_tx_queue callback.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* ath10k wake_tx_queue issues
@ 2018-05-15 13:45 ` Niklas Cassel
  0 siblings, 0 replies; 18+ messages in thread
From: Niklas Cassel @ 2018-05-15 13:45 UTC (permalink / raw)
  To: johannes, kvalo, erik.stromdahl, toke; +Cc: linux-wireless, ath10k

Hello mac80211 and ath10k people

Using ath10k, TX stops working when running iperf

[  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
[  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
[  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
[  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
[  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec

The problem can be reproduced without specifying a send buffer size,
however, specifying a small send buffer helps to reproduce the problem faster.

What happens is that iperf gets -EAGAIN on write().
It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
The reason why we get -EAGAIN is because the send socket buffer is full
(iperf uses non-blocking I/O).

The problem is that the mac80211 wake_tx_queue callback never comes.

I guess the best way to describe this is to show my ftrace buffer:

     ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
     ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
     ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
     ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
     ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
     ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
     ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
     ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
     ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
     ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
     ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
     ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
     ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
     ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
     ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
     ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
     ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
     ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
     ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
     ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
     ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
     ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
     ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
     ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
     ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
     ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
     ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
     ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
     ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
     ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
     ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
     ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
     ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
     ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
     ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
     ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
     ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
     ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
     ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0

(since we just called ieee80211_stop_queues(), I wouldn't expect to see
wake_tx_queue being called directly after)

     ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
  ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
  ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
  ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0

here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
(ath10k_mac_op_wake_tx_queue) is never seen again...

  ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
  ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
  ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
  ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
  ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
  ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
  ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
  ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
  ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
  ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
  ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
  ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
  ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
  ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
  ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
  ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
  ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
  ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
  ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
  ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
  ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
  ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
  ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
  ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
  ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
  ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
  ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
  ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
  ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
  ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
  ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
  ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
  ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
  ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
  ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
  ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
  ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
  ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
  ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
  ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
  ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
  ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
  ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
  ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
  ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
  ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
  ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
  ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
  ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
  ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
  ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
  ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
  ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
  ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
  ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
  ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
  ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
  ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
  ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
  ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
  ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
  ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
  ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0

ath10k just finished transmission of pending frames.
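
A trace this long is easier to check mechanically than by eye. The following is a minimal sketch, not part of the original report, that scans ftrace text in the format shown above and reports the timestamp of the last ath10k_mac_op_wake_tx_queue event and the last num_pen value seen. The helper name summarize and the sample excerpt are illustrative assumptions; the regular expression only covers the line layout visible in this thread.

```python
import re

# Matches lines such as:
#   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
# An optional leading "> " (quoted mail) is tolerated.
LINE_RE = re.compile(
    r"^(?:>\s?)?\s*\S+\s+\[\d+\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<rest>.*)$"
)
NUM_PEN_RE = re.compile(r"num_pen: (\d+)")


def summarize(trace_text):
    """Return the last wake_tx_queue timestamp and the final num_pen value."""
    last_wake = None
    num_pen = None
    for line in trace_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip commentary and blank lines
        if m.group("event") == "ath10k_mac_op_wake_tx_queue":
            last_wake = m.group("ts")
        pen = NUM_PEN_RE.search(m.group("rest"))
        if pen:
            num_pen = int(pen.group(1))
    return {"last_wake_tx_queue": last_wake, "final_num_pen": num_pen}


# Short excerpt copied from the trace above (hypothetical test input).
SAMPLE = """\
     ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
  ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
  ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
  ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
"""

summary = summarize(SAMPLE)
print(summary)
```

Run against the full buffer, this kind of summary makes the pattern explicit: the pending counter drains to zero after the last wake_tx_queue timestamp, while the txq still reported 24 queued frames.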

Half a second later, the send buffer is full, and we start seeing errors
in iperf.

           iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
           iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
           iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
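
For readers less familiar with the error code: err: -11 is -EAGAIN on Linux, which is what a non-blocking socket returns when its send buffer is full. The sketch below is an illustration only, unrelated to ath10k or iperf internals. It uses a local socket pair whose peer never reads, so the send buffer fills and the write fails in the same way. The helper name fill_send_buffer and the chunk sizes are assumptions made for the example.

```python
import errno
import socket


def fill_send_buffer(sock, chunk=b"x" * 4096, max_chunks=100000):
    """Write to a non-blocking socket until the kernel refuses more data.

    Returns (bytes_accepted, errno_value); errno_value is None if the
    loop ended without hitting an error.
    """
    sent = 0
    for _ in range(max_chunks):
        try:
            sent += sock.send(chunk)
        except BlockingIOError as exc:
            # Send buffer is full; a blocking socket would sleep here instead.
            return sent, exc.errno
    return sent, None


a, b = socket.socketpair()
a.setblocking(False)
# A small send buffer makes the condition appear sooner, as in the report.
a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)

sent, err = fill_send_buffer(a)  # nobody reads from b, so the buffer fills
print(sent, errno.errorcode.get(err))

a.close()
b.close()
```

In the report the peer is the wireless TX path rather than a silent socket, but the effect on the application is the same: until queued data is actually transmitted and freed, every write keeps failing with EAGAIN.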



Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
=> ieee80211_queue_skb => drv_wake_tx_queue.

But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer and
in the ieee80211_txq that needs to be sent before more data can be written to
the socket. But obviously the callback never comes.
Or how else is this supposed to work?
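
The suspected stall can be illustrated with a toy model. This is a sketch of the hypothesis only and not mac80211 or ath10k code: the class ToyDriver, its methods, and the kick_on_completion switch are all hypothetical. It assumes that wake_tx_queue is invoked only when a frame is enqueued, that the driver stops pulling frames once 64 are pending (the peak num_pen in the trace), and it leaves open which layer should be responsible for restarting the queue.

```python
from collections import deque

MAX_PENDING = 64  # peak num_pen value seen in the trace


class ToyDriver:
    """Toy model: frames wait in txq until wake_tx_queue() pulls them."""

    def __init__(self, kick_on_completion):
        self.txq = deque()        # frames queued but not yet handed to hardware
        self.num_pending = 0      # frames handed to hardware, not yet completed
        self.sent = 0             # frames whose transmission completed
        self.kick_on_completion = kick_on_completion

    def wake_tx_queue(self):
        # Pull frames only while there is room below the pending limit.
        while self.txq and self.num_pending < MAX_PENDING:
            self.txq.popleft()
            self.num_pending += 1

    def enqueue(self, frame):
        # In this model the callback fires only on the enqueue path.
        self.txq.append(frame)
        self.wake_tx_queue()

    def tx_complete(self):
        self.num_pending -= 1
        self.sent += 1
        if self.kick_on_completion:
            self.wake_tx_queue()  # hypothetical restart after a completion


def run(kick_on_completion, frames=88):
    drv = ToyDriver(kick_on_completion)
    for i in range(frames):
        drv.enqueue(i)
    # The sender is now blocked (socket buffer full), so nothing is enqueued
    # any more; only completions arrive.
    while drv.num_pending:
        drv.tx_complete()
    return len(drv.txq), drv.sent


stuck = run(kick_on_completion=False)
drained = run(kick_on_completion=True)
print(stuck, drained)
```

With 88 frames, the run without a restart ends with 24 frames left in the queue and nothing pending, which matches the shape of the trace (frame_cnt: 24, then num_pen draining to 0). The run with a restart on completion drains everything.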


Note that setting ops->wake_tx_queue = NULL; works around the problem
(i.e. let mac80211 use ath10k's tx callback instead of wake_tx_queue callback).
But then there might still be a bug for other drivers to stumble upon.

However, since the only wireless drivers using wake_tx_queue are: ath9k, ath10k,
and mt76, perhaps it is not such a bad idea to use the tx callback instead of
the wake_tx_queue callback.


Kind regards,
Niklas

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
  2018-05-15 13:45 ` Niklas Cassel
@ 2018-05-15 13:50   ` Ben Greear
  -1 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2018-05-15 13:50 UTC (permalink / raw)
  To: Niklas Cassel, johannes, kvalo, erik.stromdahl, toke
  Cc: linux-wireless, ath10k


On 05/15/2018 06:45 AM, Niklas Cassel wrote:
> Hello mac80211 and ath10k people
>
> Using ath10k, TX stops working when running iperf

What kernel and firmware and ath10k hardware are you using?

Does the same thing happen on older kernels?  I had some issues
with ath9k having some sort of tx-hang, but my test case was
using 200 virtual stations to transmit, and others could not reproduce
it, so I am not sure I saw the same issue as you did.

Thanks,
Ben

>
> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>
> The problem can be reproduced without specifying a send buffer size,
> however, specifying a small send buffer helps to reproduce the problem faster.
>
> What happens is that iperf gets -EAGAIN on write().
> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> The reason why we get -EAGAIN is because the send socket buffer is full
> (iperf uses non-blocking I/O).
>
> The problem is that the mac80211 wake_tx_queue callback never comes.
>
> I guess the best way to describe this is to show my ftrace buffer:
>
>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
>
> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> wake_tx_queue being called directly after)
>
>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
>
> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> (ath10k_mac_op_wake_tx_queue) is never seen again...
>
>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
>
> ath10k just finished transmission of pending frames.
>
> Half a second later, the send buffer is full, and we start seeing errors
> in iperf.
>
>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
>
>
>
> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> => ieee80211_queue_skb => drv_wake_tx_queue.
>
> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> in the ieee80211_txq that needs to be sent, to allow more data to be written to
> the socket. But obviously the callback never comes.
> Or how else is this supposed to work?
>
>
> Note that setting ops->wake_tx_queue = NULL; works around the problem
> (i.e. let mac80211 use ath10k's tx callback instead of wake_tx_queue callback).
> But then there might still be a bug for other drivers to stumble upon.
>
> However, since the only wireless drivers using wake_tx_queue are: ath9k, ath10k,
> and mt76, perhaps it is not such a bad idea to use the tx callback instead of
> the wake_tx_queue callback.
>
>
> Kind regards,
> Niklas
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
@ 2018-05-15 13:50   ` Ben Greear
  0 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2018-05-15 13:50 UTC (permalink / raw)
  To: Niklas Cassel, johannes, kvalo, erik.stromdahl, toke
  Cc: linux-wireless, ath10k


On 05/15/2018 06:45 AM, Niklas Cassel wrote:
> Hello mac80211 and ath10k people
>
> Using ath10k, TX stops working when running iperf

What kernel and firmware and ath10k hardware are you using?

Does the same thing happen on older kernels?  I had some issues
with ath9k having some sort of tx-hang, but my test case was
using 200 virtual stations to transmit, and others could not reproduce
it, so I am not sure I saw the same issue as you did.

Thanks,
Ben

>
> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>
> The problem can be reproduced without specifying a send buffer size,
> however, specifying a small send buffer helps to reproduce the problem faster.
>
> What happens is that iperf gets -EAGAIN on write().
> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> The reason why we get -EAGAIN is because the send socket buffer is full
> (iperf uses non-blocking I/O).
>
> The problem is that the mac80211 wake_tx_queue callback never comes.
>
> I guess the best way to describe this is to show my ftrace buffer:
>
>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
>
> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> wake_tx_queue being called directly after)
>
>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
>
> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> (ath10k_mac_op_wake_tx_queue) is never seen again...
>
>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
>
> ath10k just finished transmission of pending frames.
>
> Half a second later, the send buffer is full, and we start seeing errors
> in iperf.
>
>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
>
>
>
> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> => ieee80211_queue_skb => drv_wake_tx_queue.
>
> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> in the ieee80211_txq that needs to be sent, to allow more data to be written to
> the socket. But obviously the callback never comes.
> Or how else is this supposed to work?
>
>
> Note that setting ops->wake_tx_queue = NULL; works around the problem
> (i.e. let mac80211 use ath10k's tx callback instead of wake_tx_queue callback).
> But then there might still be a bug for other drivers to stumble upon.
>
> However, since the only wireless drivers using wake_tx_queue are: ath9k, ath10k,
> and mt76, perhaps it is not such a bad idea to use the tx callback instead of
> the wake_tx_queue callback.
>
>
> Kind regards,
> Niklas
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

* Re: ath10k wake_tx_queue issues
  2018-05-15 13:45 ` Niklas Cassel
@ 2018-05-15 14:13   ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-05-15 14:13 UTC (permalink / raw)
  To: Niklas Cassel, johannes, kvalo, erik.stromdahl, Felix Fietkau
  Cc: linux-wireless, ath10k

[ Adding Felix ]


Niklas Cassel <niklas.cassel@linaro.org> writes:

> Hello mac80211 and ath10k people
>
> Using ath10k, TX stops working when running iperf
>
> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>
> The problem can be reproduced without specifying a send buffer size,
> however, specifying a small send buffer helps to reproduce the problem faster.
>
> What happens is that iperf gets -EAGAIN on write().
> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> The reason why we get -EAGAIN is because the send socket buffer is full
> (iperf uses non-blocking I/O).
>
> The problem is that the mac80211 wake_tx_queue callback never comes.
>
> I guess the best way to describe this is to show my ftrace buffer:
>
>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
>
> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> wake_tx_queue being called directly after)
>
>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
>
> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> (ath10k_mac_op_wake_tx_queue) is never seen again...
>
>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
>
> ath10k just finished transmission of pending frames.
>
> Half a second later, the send buffer is full, and we start seeing errors
> in iperf.
>
>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
>
>
>
> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> => ieee80211_queue_skb => drv_wake_tx_queue.
>
> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> in the ieee80211_txq that needs to be sent, to allow more data to be written to
> the socket. But obviously the callback never comes.
> Or how else is this supposed to work?

The driver should reschedule itself before/after calling
ieee80211_wake_queue. mt76 does this. I'm not sure whether ath9k does
the right thing, as I'm not too familiar with that part of the code: I
can see no direct call to reschedule, but there may be another reason
why ath9k does not need one. I'm sure Felix knows?
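
To illustrate the point, here is a minimal user-space sketch of the stall.
It is a toy model only, not ath10k or mac80211 code: every name in it
(toy_dev, toy_push, toy_tx_complete, toy_run) is made up for illustration,
and the only numbers taken from the trace are the limit of 64 pending
frames and the 24 frames left in the txq. It assumes that waking the queue
does not by itself push queued frames, which is the behaviour the trace
suggests.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 64  /* num_pen ceiling seen in the trace */

/* Hypothetical toy state; not a real driver structure. */
struct toy_dev {
	int backlog;      /* frames waiting in the software TX queue */
	int num_pending;  /* frames handed to the hardware, not yet completed */
	bool stopped;     /* queues stopped because the hardware is full */
};

/* Stand-in for a wake_tx_queue handler: push frames until the hardware is full. */
static void toy_push(struct toy_dev *d)
{
	while (d->backlog > 0 && d->num_pending < MAX_PENDING) {
		d->backlog--;
		d->num_pending++;
	}
	if (d->num_pending == MAX_PENDING)
		d->stopped = true;
}

/* Stand-in for a TX completion: free one slot and wake the queues. */
static void toy_tx_complete(struct toy_dev *d, bool reschedule)
{
	assert(d->num_pending > 0);
	d->num_pending--;
	d->stopped = false;  /* models the wake_queue events in the trace */

	/* Waking alone pushes nothing; the driver has to do that itself. */
	if (reschedule)
		toy_push(d);
}

/* Returns the frames still queued once the hardware has completed everything. */
static int toy_run(int enqueued, bool reschedule)
{
	struct toy_dev d = { .backlog = enqueued };

	toy_push(&d);  /* initial push triggered by the enqueue */
	while (d.num_pending > 0)
		toy_tx_complete(&d, reschedule);
	return d.backlog;
}
```

In this model toy_run(88, false) returns 24: the hardware drains to zero
pending frames while 24 frames stay queued, much like the end of the trace.
toy_run(88, true) returns 0, because each completion pushes the next frame.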

> However, since the only wireless drivers using wake_tx_queue are:
> ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
> tx callback instead of the wake_tx_queue callback.

On the contrary, we want more drivers to move to wake_tx_queue :)

-Toke

* Re: ath10k wake_tx_queue issues
@ 2018-05-15 14:13   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-05-15 14:13 UTC (permalink / raw)
  To: Niklas Cassel, johannes, kvalo, erik.stromdahl, Felix Fietkau
  Cc: linux-wireless, ath10k

[ Adding Felix ]


Niklas Cassel <niklas.cassel@linaro.org> writes:

> Hello mac80211 and ath10k people
>
> Using ath10k, TX stops working when running iperf
>
> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>
> The problem can be reproduced without specifying a send buffer size,
> however, specifying a small send buffer helps to reproduce the problem faster.
>
> What happens is that iperf gets -EAGAIN on write().
> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> The reason why we get -EAGAIN is because the send socket buffer is full
> (iperf uses non-blocking I/O).
>
> The problem is that the mac80211 wake_tx_queue callback never comes.
>
> I guess the best way to describe this is to show my ftrace buffer:
>
>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
>
> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> wake_tx_queue being called directly after)
>
>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
>
> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> (ath10k_mac_op_wake_tx_queue) is never seen again...
>
>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
>
> ath10k just finished transmission of pending frames.
>
> Half a second later, the send buffer is full, and we start seeing errors
> in iperf.
>
>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
>
>
>
> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> => ieee80211_queue_skb => drv_wake_tx_queue.
>
> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> in the ieee80211_txq that needs to be sent, to allow more data to be written to
> the socket. But obviously the callback never comes.
> Or how else is this supposed to work?

The driver should reschedule itself before/after calling
ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
does the right thing either, I'm not too familiar with that part of the
code. There's no direct call to reschedule that I can see, but there may
be another reason why this is not needed for ath9k. I'm sure Felix
knows?
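
To make the stall concrete, here is a toy user-space model of the sequence in
the trace. It is only a sketch under stated assumptions, not ath10k or
mac80211 code: struct sim, sim_wake_tx_queue(), sim_enqueue(),
sim_tx_complete() and sim_run() are all made-up names. The limit of 64
pending frames and the backlog of 24 frames are taken from the trace above.
The model assumes the wake_tx_queue callback only fires when the stack
enqueues a frame, and that the sender stops enqueueing once its send buffer
is full.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 64	/* num_pen ceiling seen in the trace */
#define BACKLOG     24	/* frames queued after the queues were stopped */

/* Hypothetical model state, not a real kernel structure. */
struct sim {
	int txq_frames;		/* frames waiting in the txq */
	int num_pending;	/* frames handed to firmware, not yet completed */
	bool stopped;		/* queues stopped by the driver */
};

/* Models the driver's wake_tx_queue handler: push frames while there is room. */
static void sim_wake_tx_queue(struct sim *s)
{
	while (s->txq_frames > 0 && s->num_pending < MAX_PENDING) {
		s->txq_frames--;
		s->num_pending++;
	}
	if (s->num_pending == MAX_PENDING)
		s->stopped = true;	/* models ieee80211_stop_queues() */
}

/* Models the stack queueing one frame and invoking the callback. */
static void sim_enqueue(struct sim *s)
{
	s->txq_frames++;
	sim_wake_tx_queue(s);
}

/* Models a TX completion; optionally reschedules after waking the queues. */
static void sim_tx_complete(struct sim *s, bool reschedule)
{
	assert(s->num_pending > 0);
	s->num_pending--;
	if (s->stopped) {
		s->stopped = false;	/* models ieee80211_wake_queue() */
		if (reschedule)
			sim_wake_tx_queue(s);	/* drain the backlog ourselves */
	}
}

/* Returns the number of frames left stranded in the txq. */
static int sim_run(bool reschedule)
{
	struct sim s = {0};
	int i;

	/* Sender bursts, then blocks on a full send buffer. */
	for (i = 0; i < MAX_PENDING + BACKLOG; i++)
		sim_enqueue(&s);

	/* Firmware completes everything it was given. */
	while (s.num_pending > 0)
		sim_tx_complete(&s, reschedule);

	return s.txq_frames;
}
```

In this model, sim_run(false) leaves 24 frames stranded in the txq once
num_pending reaches 0, which mirrors the trace. sim_run(true) leaves none,
because every completion that wakes the queues also pulls from the backlog.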

> However, since the only wireless drivers using wake_tx_queue are:
> ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
> tx callback instead of the wake_tx_queue callback.

On the contrary, we want more drivers to move to wake_tx_queue :)

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
  2018-05-15 14:13   ` Toke Høiland-Jørgensen
@ 2018-05-15 20:31     ` Niklas Cassel
  -1 siblings, 0 replies; 18+ messages in thread
From: Niklas Cassel @ 2018-05-15 20:31 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: johannes, kvalo, erik.stromdahl, Felix Fietkau, linux-wireless, ath10k

On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
> [ Adding Felix ]
> 
> 
> Niklas Cassel <niklas.cassel@linaro.org> writes:
> 
> > Hello mac80211 and ath10k people
> >
> > Using ath10k, TX stops working when running iperf
> >
> > [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> > [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> > [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> > [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> > [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
> >
> > The problem can be reproduced without specifying a send buffer size,
> > however, specifying a small send buffer helps to reproduce the problem faster.
> >
> > What happens is that iperf gets -EAGAIN on write().
> > It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> > The reason why we get -EAGAIN is because the send socket buffer is full
> > (iperf uses non-blocking I/O).
> >
> > The problem is that the mac80211 wake_tx_queue callback never comes.
> >
> > I guess the best way to describe this is to show my ftrace buffer:
> >
> >      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
> >      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
> >      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
> >      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
> >      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
> >      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
> >      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
> >      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
> >      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
> >      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
> >      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
> >      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
> >      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
> >      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
> >
> > (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> > wake_tx_queue being called directly after)
> >
> >      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
> >   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
> >   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
> >   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
> >
> > here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> > (ath10k_mac_op_wake_tx_queue) is never seen again...
> >
> >   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
> >   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
> >   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
> >   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
> >   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
> >   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
> >   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
> >   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
> >   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
> >   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
> >   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
> >   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
> >   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
> >   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
> >   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
> >   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
> >   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
> >   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
> >   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
> >   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
> >   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
> >   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
> >   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
> >   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
> >   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
> >   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
> >   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
> >   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
> >   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
> >   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
> >   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
> >   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
> >   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
> >   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
> >   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
> >   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
> >   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
> >   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
> >   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
> >   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
> >   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
> >   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
> >   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
> >   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
> >   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
> >   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
> >   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
> >   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
> >   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
> >   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
> >   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
> >   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
> >   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
> >   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
> >   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
> >   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
> >   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
> >   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
> >   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
> >   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
> >   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
> >   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
> >   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
> >
> > ath10k just finished transmission of pending frames.
> >
> > Half a second later, the send buffer is full, and we start seeing errors
> > in iperf.
> >
> >            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
> >            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
> >            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
> >
> >
> >
> > Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> > ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> > => ieee80211_queue_skb => drv_wake_tx_queue.
> >
> > But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> > to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> > in the ieee80211_txq that needs to be sent, to allow more data to be written to
> > the socket. But obviously the callback never comes.
> > Or how else is this supposed to work?
> 
> The driver should reschedule itself before/after calling
> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
> does the right thing either, I'm not too familiar with that part of the
> code. There's no direct call to reschedule that I can see, but there may
> be another reason why this is not needed for ath9k. I'm sure Felix
> knows?

Hello Toke

Unfortunately, it doesn't look like mt76 uses any ieee80211_* function to
reschedule.

I just came across an ieee80211_schedule_txq() function in
e937b8da5a59 ("mac80211: Add TXQ scheduling API").
However, this commit was reverted. Any plans on resubmitting this?

Regards,
Niklas

> 
> > However, since the only wireless drivers using wake_tx_queue are:
> > ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
> > tx callback instead of the wake_tx_queue callback.
> 
> On the contrary, we want more drivers to move to wake_tx_queue :)
> 
> -Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
@ 2018-05-15 20:31     ` Niklas Cassel
  0 siblings, 0 replies; 18+ messages in thread
From: Niklas Cassel @ 2018-05-15 20:31 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: erik.stromdahl, linux-wireless, ath10k, kvalo, johannes, Felix Fietkau

On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
> [ Adding Felix ]
> 
> 
> Niklas Cassel <niklas.cassel@linaro.org> writes:
> 
> > Hello mac80211 and ath10k people
> >
> > Using ath10k, TX stops working when running iperf
> >
> > [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> > [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> > [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> > [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> > [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
> >
> > The problem can be reproduced without specifying a send buffer size,
> > however, specifying a small send buffer helps to reproduce the problem faster.
> >
> > What happens is that iperf gets -EAGAIN on write().
> > It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> > The reason why we get -EAGAIN is because the send socket buffer is full
> > (iperf uses non-blocking I/O).
> >
> > The problem is that the mac80211 wake_tx_queue callback never comes.
> >
> > I guess the best way to describe this is to show my ftrace buffer:
> >
> >      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
> >      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
> >      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
> >      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
> >      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
> >      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
> >      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
> >      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
> >      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
> >      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
> >      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
> >      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
> >      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
> >      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
> >      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
> >      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
> >
> > (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> > wake_tx_queue being called directly after)
> >
> >      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
> >      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
> >   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
> >   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
> >   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
> >
> > here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> > (ath10k_mac_op_wake_tx_queue) is never seen again...
> >
> >   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
> >   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
> >   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
> >   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
> >   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
> >   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
> >   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
> >   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
> >   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
> >   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
> >   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
> >   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
> >   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
> >   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
> >   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
> >   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
> >   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
> >   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
> >   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
> >   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
> >   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
> >   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
> >   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
> >   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
> >   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
> >   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
> >   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
> >   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
> >   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
> >   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
> >   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
> >   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
> >   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
> >   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
> >   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
> >   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
> >   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
> >   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
> >   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
> >   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
> >   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
> >   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
> >   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
> >   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
> >   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
> >   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
> >   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
> >   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
> >   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
> >   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
> >   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
> >   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
> >   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
> >   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
> >   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
> >   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
> >   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
> >   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
> >   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
> >   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
> >   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
> >   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
> >   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
> >
> > ath10k just finished transmission of pending frames.
> >
> > Half a second later, the send buffer is full, and we start seeing errors
> > in iperf.
> >
> >            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
> >            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
> >            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
> >
> >
> >
> > Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> > ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> > => ieee80211_queue_skb => drv_wake_tx_queue.
> >
> > But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> > to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> > in the ieee80211_txq that needs to be sent, to allow more data to be written to
> > the socket. But obviously the callback never comes.
> > Or how else is this supposed to work?
> 
> The driver should reschedule itself before/after calling
> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
> does the right thing either, I'm not too familiar with that part of the
> code. There's no direct call to reschedule that I can see, but there may
> be another reason why this is not needed for ath9k. I'm sure Felix
> knows?

Hello Toke

Unfortunately, it doesn't look like mt76 uses any ieee80211_* function to
reschedule.
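
As a rough illustration of what "rescheduling" would mean here, the
following is a toy user-space model in Python, not mac80211 or ath10k
code. ToyDriver, push_pending() and replay() are hypothetical names
made up for this sketch. MAX_PENDING = 64 is taken from the num_pen
ceiling in the trace, and the burst of 88 frames is chosen so that 24
frames are left in the txq, as in the trace. The model takes it as
given that no new frame is enqueued after the burst, matching the
report where iperf keeps getting -EAGAIN.

```python
from collections import deque

MAX_PENDING = 64  # assumption: the num_pen ceiling seen in the trace


class ToyDriver:
    """Hypothetical stand-in for a driver with a bounded in-flight count."""

    def __init__(self, reschedule_on_completion):
        self.txq = deque()      # frames held in the intermediate queue
        self.num_pending = 0    # frames handed to the "hardware"
        self.stopped = False    # mirrors stop_queue/wake_queue in the trace
        self.sent = 0           # frames whose transmission has completed
        self.reschedule_on_completion = reschedule_on_completion

    def push_pending(self):
        """Move frames from the txq to the hardware until the limit is hit."""
        while self.txq and self.num_pending < MAX_PENDING:
            self.txq.popleft()
            self.num_pending += 1
            if self.num_pending == MAX_PENDING:
                self.stopped = True

    def wake_tx_queue(self, frame):
        """Models the callback the stack invokes when it enqueues a frame."""
        self.txq.append(frame)
        self.push_pending()

    def tx_complete(self):
        """One frame finished; clear the stopped flag, optionally pull more."""
        self.num_pending -= 1
        self.sent += 1
        self.stopped = False
        if self.reschedule_on_completion:
            # The driver drains its own txq instead of waiting for a
            # wake_tx_queue call that may never come.
            self.push_pending()


def replay(reschedule_on_completion, frames=88):
    """Enqueue one burst, then let every pending frame complete."""
    drv = ToyDriver(reschedule_on_completion)
    for i in range(frames):
        drv.wake_tx_queue(i)
    while drv.num_pending:
        drv.tx_complete()
    return drv.sent, len(drv.txq)


if __name__ == "__main__":
    print("no reschedule:", replay(False))
    print("reschedule:   ", replay(True))
```

Under these assumptions, replay(False) ends with 64 frames sent and 24
frames still sitting in the txq with nothing pending, while
replay(True) sends all 88. Clearing the stopped flag alone does not
move any frame in this model; only the pull from the completion path
does.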

I just came across an ieee80211_schedule_txq() function in
e937b8da5a59 ("mac80211: Add TXQ scheduling API").
However, this commit was reverted. Are there any plans to resubmit it?

Regards,
Niklas

> 
> > However, since the only wireless drivers using wake_tx_queue are:
> > ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
> > tx callback instead of the wake_tx_queue callback.
> 
> On the contrary, we want more drivers to move to wake_tx_queue :)
> 
> -Toke



* Re: ath10k wake_tx_queue issues
  2018-05-15 20:31     ` Niklas Cassel
@ 2018-05-16  5:54       ` Alexander Wetzel
  -1 siblings, 0 replies; 18+ messages in thread
From: Alexander Wetzel @ 2018-05-16  5:54 UTC (permalink / raw)
  To: Niklas Cassel, Toke Høiland-Jørgensen
  Cc: johannes, kvalo, erik.stromdahl, Felix Fietkau, linux-wireless, ath10k

Hello,

This sounds exactly like the issue I just submitted a patch for.
Can you test whether https://patchwork.kernel.org/patch/10399613/ solves
the issue?



On 15.05.2018 at 22:31, Niklas Cassel wrote:
> On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
>> [ Adding Felix ]
>>
>>
>> Niklas Cassel <niklas.cassel@linaro.org> writes:
>>
>>> Hello mac80211 and ath10k people
>>>
>>> Using ath10k, TX stops working when running iperf
>>>
>>> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
>>> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
>>> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
>>> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
>>> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>>>
>>> The problem can be reproduced without specifying a send buffer size,
>>> however, specifying a small send buffer helps to reproduce the problem faster.
>>>
>>> What happens is that iperf gets -EAGAIN on write().
>>> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
>>> The reason why we get -EAGAIN is because the send socket buffer is full
>>> (iperf uses non-blocking I/O).
>>>
>>> The problem is that the mac80211 wake_tx_queue callback never comes.
>>>
>>> I guess the best way to describe this is to show my ftrace buffer:
>>>
>>>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
>>>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
>>>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
>>>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
>>>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
>>>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
>>>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
>>>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
>>>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
>>>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
>>>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
>>>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
>>>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
>>>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
>>>
>>> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
>>> wake_tx_queue being called directly after)
>>>
>>>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
>>>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
>>>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
>>>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
>>>
>>> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
>>> (ath10k_mac_op_wake_tx_queue) is never seen again...
>>>
>>>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
>>>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
>>>
>>> ath10k just finished transmission of pending frames.
>>>
>>> Half a second later, the send buffer is full, and we start seeing errors
>>> in iperf.
>>>
>>>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
>>>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
>>>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
>>>
>>>
>>>
>>> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
>>> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
>>> => ieee80211_queue_skb => drv_wake_tx_queue.
>>>
>>> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
>>> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
>>> in the ieee80211_txq that needs to be sent, to allow more data to be written to
>>> the socket. But obviously the callback never comes.
>>> Or how else is this supposed to work?
>>
>> The driver should reschedule itself before/after calling
>> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
>> does the right thing either, I'm not too familiar with that part of the
>> code. There's no direct call to reschedule that I can see, but there may
>> be another reason why this is not needed for ath9k. I'm sure Felix
>> knows?
> 
> Hello Toke
> 
> Unfortunately, it doesn't look like mt76 uses any ieee80211_* function to
> reschedule.
> 
> I just came across an ieee80211_schedule_txq() function in
> e937b8da5a59 ("mac80211: Add TXQ scheduling API").
> However, this commit was reverted. Any plans on resubmitting this?
> 
> Regards,
> Niklas
> 
>>
>>> However, since the only wireless drivers using wake_tx_queue are:
>>> ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
>>> tx callback instead of the wake_tx_queue callback.
>>
>> On the contrary, we want more drivers to move to wake_tx_queue :)
>>
>> -Toke
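
For readers of the archive, the stall discussed above can be reproduced
in a toy user-space model. This is only a sketch under stated
assumptions and is not ath10k or mac80211 code: the class and method
names are hypothetical, the limit of 64 pending frames and the 24 frames
left in the txq are taken from the trace, and the only behaviour
modelled is that frames leave the txq when something explicitly pushes
them.

```python
from collections import deque


class TxModel:
    """Toy model of a driver that pulls frames from a txq (hypothetical names)."""

    def __init__(self, max_pending=64, reschedule_on_completion=False):
        self.max_pending = max_pending          # 64 is read off the trace (num_pen: 64)
        self.reschedule = reschedule_on_completion
        self.txq = deque()                      # frames waiting in the modelled txq
        self.num_pending = 0                    # frames handed to firmware, not yet completed
        self.sent = 0                           # frames the firmware has completed

    def _push_pending(self):
        # Move frames from the txq to the firmware until the limit is hit.
        while self.txq and self.num_pending < self.max_pending:
            self.txq.popleft()
            self.num_pending += 1

    def enqueue(self, frame):
        # The stack queues a frame, then the wake_tx_queue-style callback runs.
        self.txq.append(frame)
        self._push_pending()

    def complete_one(self):
        # The firmware reports one completed frame.
        if self.num_pending == 0:
            return
        self.num_pending -= 1
        self.sent += 1
        if self.reschedule:
            # Assumed fix: the driver pulls from the txq itself on completion.
            self._push_pending()


def run(reschedule):
    model = TxModel(reschedule_on_completion=reschedule)
    for frame in range(88):                     # 64 go to firmware, 24 stay in the txq
        model.enqueue(frame)
    while model.num_pending:                    # firmware drains everything it was given
        model.complete_one()
    return model.sent, len(model.txq)


if __name__ == "__main__":
    print("no reschedule:", run(False))
    print("reschedule:   ", run(True))
```

Without the reschedule step the model ends with 24 frames stuck in the
txq and nothing left to trigger another push, which matches the point at
which the trace goes quiet. With the reschedule step all 88 frames are
completed.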

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
@ 2018-05-16  5:54       ` Alexander Wetzel
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Wetzel @ 2018-05-16  5:54 UTC (permalink / raw)
  To: Niklas Cassel, Toke Høiland-Jørgensen
  Cc: erik.stromdahl, linux-wireless, ath10k, kvalo, johannes, Felix Fietkau

Hello,

This sounds exactly like the issue I just submitted a patch for.
Can you test whether https://patchwork.kernel.org/patch/10399613/ solves
the issue?



On 15.05.2018 at 22:31, Niklas Cassel wrote:
> On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
>> [ Adding Felix ]
>>
>>
>> Niklas Cassel <niklas.cassel@linaro.org> writes:
>>
>>> Hello mac80211 and ath10k people
>>>
>>> Using ath10k, TX stops working when running iperf
>>>
>>> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
>>> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
>>> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
>>> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
>>> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>>>
>>> The problem can be reproduced without specifying a send buffer size;
>>> however, specifying a small send buffer helps to reproduce the problem faster.
>>>
>>> What happens is that iperf gets -EAGAIN on write().
>>> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
>>> The reason why we get -EAGAIN is because the send socket buffer is full
>>> (iperf uses non-blocking I/O).
>>>
>>> The problem is that the mac80211 wake_tx_queue callback never comes.
>>>
>>> I guess the best way to describe this is to show my ftrace buffer:
>>>
>>>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
>>>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
>>>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
>>>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
>>>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
>>>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
>>>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
>>>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
>>>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
>>>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
>>>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
>>>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
>>>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
>>>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
>>>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
>>>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
>>>
>>> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
>>> wake_tx_queue being called directly after)
>>>
>>>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
>>>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
>>>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
>>>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
>>>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
>>>
>>> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
>>> (ath10k_mac_op_wake_tx_queue) is never seen again...
>>>
>>>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
>>>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
>>>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
>>>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
>>>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
>>>
>>> ath10k just finished transmission of pending frames.
>>>
>>> Half a second later, the send buffer is full, and we start seeing errors
>>> in iperf.
>>>
>>>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
>>>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
>>>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
>>>
>>>
>>>
>>> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
>>> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
>>> => ieee80211_queue_skb => drv_wake_tx_queue.
>>>
>>> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
>>> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
>>> in the ieee80211_txq that needs to be sent, to allow more data to be written to
>>> the socket. But obviously the callback never comes.
>>> Or how else is this supposed to work?
>>
>> The driver should reschedule itself before/after calling
>> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
>> does the right thing either, I'm not too familiar with that part of the
>> code. There's no direct call to reschedule that I can see, but there may
>> be another reason why this is not needed for ath9k. I'm sure Felix
>> knows?
> 
> Hello Toke
> 
> Unfortunately, it doesn't look like mt76 uses any ieee80211_* function to
> reschedule.
> 
> I just came across an ieee80211_schedule_txq() function in
> e937b8da5a59 ("mac80211: Add TXQ scheduling API").
> However, this commit was reverted. Any plans on resubmitting this?
> 
> Regards,
> Niklas
> 
>>
>>> However, since the only wireless drivers using wake_tx_queue are:
>>> ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
>>> tx callback instead of the wake_tx_queue callback.
>>
>> On the contrary, we want more drivers to move to wake_tx_queue :)
>>
>> -Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
  2018-05-16  5:54       ` Alexander Wetzel
@ 2018-05-16  9:16         ` Niklas Cassel
  -1 siblings, 0 replies; 18+ messages in thread
From: Niklas Cassel @ 2018-05-16  9:16 UTC (permalink / raw)
  To: Alexander Wetzel
  Cc: Toke Høiland-Jørgensen, johannes, kvalo,
	erik.stromdahl, Felix Fietkau, linux-wireless, ath10k

On Wed, May 16, 2018 at 07:54:54AM +0200, Alexander Wetzel wrote:
> Hello,
> 
> This sounds exactly like the issue I just submitted a patch for.
> Can you test whether https://patchwork.kernel.org/patch/10399613/ solves
> the issue?

Hello Alexander,


I just tried your fix, and unfortunately it doesn't solve my problem.


Regards,
Niklas

> 
> 
> 
> On 15.05.2018 at 22:31, Niklas Cassel wrote:
> > On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
> >> [ Adding Felix ]
> >>
> >>
> >> Niklas Cassel <niklas.cassel@linaro.org> writes:
> >>
> >>> Hello mac80211 and ath10k people
> >>>
> >>> Using ath10k, TX stops working when running iperf
> >>>
> >>> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> >>> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> >>> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> >>> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> >>> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
> >>>
> >>> The problem can be reproduced without specifying a send buffer size;
> >>> however, specifying a small send buffer helps to reproduce the problem faster.
> >>>
> >>> What happens is that iperf gets -EAGAIN on write().
> >>> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> >>> The reason why we get -EAGAIN is because the send socket buffer is full
> >>> (iperf uses non-blocking I/O).
> >>>
> >>> The problem is that the mac80211 wake_tx_queue callback never comes.
> >>>
> >>> I guess the best way to describe this is to show my ftrace buffer:
> >>>
> >>>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
> >>>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
> >>>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
> >>>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
> >>>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
> >>>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
> >>>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
> >>>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
> >>>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
> >>>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
> >>>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
> >>>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
> >>>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
> >>>
> >>> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> >>> wake_tx_queue being called directly after)
> >>>
> >>>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
> >>>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
> >>>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
> >>>
> >>> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> >>> (ath10k_mac_op_wake_tx_queue) is never seen again...
> >>>
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
> >>>
> >>> ath10k just finished transmission of pending frames.
> >>>
> >>> Half a second later, the send buffer is full, and we start seeing errors
> >>> in iperf.
> >>>
> >>>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
> >>>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
> >>>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
> >>>
> >>>
> >>>
> >>> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> >>> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> >>> => ieee80211_queue_skb => drv_wake_tx_queue.
> >>>
> >>> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> >>> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> >>> in the ieee80211_txq that needs to be sent, to allow more data to be written to
> >>> the socket. But obviously the callback never comes.
> >>> Or how else is this supposed to work?
> >>
> >> The driver should reschedule itself before/after calling
> >> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
> >> does the right thing either, I'm not too familiar with that part of the
> >> code. There's no direct call to reschedule that I can see, but there may
> >> be another reason why this is not needed for ath9k. I'm sure Felix
> >> knows?
> > 
> > Hello Toke
> > 
> > Unfortunately, it doesn't look like mt76 uses any ieee80211_* function to
> > reschedule.
> > 
> > I just came across a ieee80211_schedule_txq() function in
> > e937b8da5a59 ("mac80211: Add TXQ scheduling API").
> > However, this commit was reverted. Any plans on resubmitting this?
> > 
> > Regards,
> > Niklas
> > 
> >>
> >>> However, since the only wireless drivers using wake_tx_queue are:
> >>> ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
> >>> tx callback instead of the wake_tx_queue callback.
> >>
> >> On the contrary, we want more drivers to move to wake_tx_queue :)
> >>
> >> -Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
@ 2018-05-16  9:16         ` Niklas Cassel
  0 siblings, 0 replies; 18+ messages in thread
From: Niklas Cassel @ 2018-05-16  9:16 UTC (permalink / raw)
  To: Alexander Wetzel
  Cc: Toke Høiland-Jørgensen, erik.stromdahl, linux-wireless,
	ath10k, kvalo, johannes, Felix Fietkau

On Wed, May 16, 2018 at 07:54:54AM +0200, Alexander Wetzel wrote:
> Hello,
> 
> This sounds exactly like the issue I just submitted a patch for.
> Can you test https://patchwork.kernel.org/patch/10399613/ if that solves
> the issue?

Hello Alexander,


I just tried your fix, and unfortunately it doesn't solve my problem.


Regards,
Niklas

> 
> 
> 
> On 15.05.2018 at 22:31, Niklas Cassel wrote:
> > On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
> >> [ Adding Felix ]
> >>
> >>
> >> Niklas Cassel <niklas.cassel@linaro.org> writes:
> >>
> >>> Hello mac80211 and ath10k people
> >>>
> >>> Using ath10k, TX stops working when running iperf
> >>>
> >>> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> >>> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> >>> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> >>> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> >>> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
> >>>
> >>> The problem can be reproduced without specifying a send buffer size,
> >>> however, specifying a small send buffer helps to reproduce the problem faster.
> >>>
> >>> What happens is that iperf gets -EAGAIN on write().
> >>> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> >>> The reason why we get -EAGAIN is because the send socket buffer is full
> >>> (iperf uses non-blocking I/O).
> >>>
> >>> The problem is that the mac80211 wake_tx_queue callback never comes.
> >>>
> >>> I guess the best way to describe this is to show my ftrace buffer:
> >>>
> >>>      ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
> >>>      ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
> >>>      ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
> >>>      ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
> >>>      ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
> >>>      ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
> >>>      ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
> >>>      ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
> >>>      ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
> >>>      ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
> >>>      ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
> >>>      ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
> >>>      ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
> >>>      ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
> >>>
> >>> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> >>> wake_tx_queue being called directly after)
> >>>
> >>>      ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
> >>>      ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
> >>>   ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
> >>>   ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
> >>>
> >>> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> >>> (ath10k_mac_op_wake_tx_queue) is never seen again...
> >>>
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
> >>>   ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0
> >>>
> >>> ath10k just finished transmission of pending frames.
> >>>
> >>> Half a second later, the send buffer is full, and we start seeing errors
> >>> in iperf.
> >>>
> >>>            iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
> >>>            iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
> >>>            iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11
> >>>
> >>>
> >>>
> >>> Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
> >>> ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
> >>> => ieee80211_queue_skb => drv_wake_tx_queue.
> >>>
> >>> But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
> >>> to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
> >>> in the ieee80211_txq that needs to be sent, to allow more data to be written to
> >>> the socket. But obviously the callback never comes.
> >>> Or how else is this supposed to work?
> >>
> >> The driver should reschedule itself before/after calling
> >> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
> >> does the right thing either, I'm not too familiar with that part of the
> >> code. There's no direct call to reschedule that I can see, but there may
> >> be another reason why this is not needed for ath9k. I'm sure Felix
> >> knows?
> > 
> > Hello Toke
> > 
> > Unfortunately, it doesn't look like mt76 uses any ieee80211_* function to
> > reschedule.
> > 
> > I just came across a ieee80211_schedule_txq() function in
> > e937b8da5a59 ("mac80211: Add TXQ scheduling API").
> > However, this commit was reverted. Any plans on resubmitting this?
> > 
> > Regards,
> > Niklas
> > 
> >>
> >>> However, since the only wireless drivers using wake_tx_queue are:
> >>> ath9k, ath10k, and mt76, perhaps it is not such a bad idea to use the
> >>> tx callback instead of the wake_tx_queue callback.
> >>
> >> On the contrary, we want more drivers to move to wake_tx_queue :)
> >>
> >> -Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
  2018-05-15 20:31     ` Niklas Cassel
@ 2018-05-16  9:22       ` Toke Høiland-Jørgensen
  -1 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-05-16  9:22 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: johannes, kvalo, erik.stromdahl, Felix Fietkau, linux-wireless, ath10k

Niklas Cassel <niklas.cassel@linaro.org> writes:
> [ .. snip .. ]
>> > Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
>> > ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
>> > => ieee80211_queue_skb => drv_wake_tx_queue.
>> >
>> > But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
>> > to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
>> > in the ieee80211_txq that needs to be sent, to allow more data to be written to
>> > the socket. But obviously the callback never comes.
>> > Or how else is this supposed to work?
>> 
>> The driver should reschedule itself before/after calling
>> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
>> does the right thing either, I'm not too familiar with that part of the
>> code. There's no direct call to reschedule that I can see, but there may
>> be another reason why this is not needed for ath9k. I'm sure Felix
>> knows?
>
> Hello Toke
>
> Unfortunately, it doesn't look like mt76 uses any ieee80211_* function
> to reschedule.

It doesn't need to; it just reschedules itself.

Basically, the wake_tx_queue() callback is just a way for mac80211 to
notify the driver that new packets are available and that it should
start its scheduling function. But in this case it is the driver that is
restarting the queues, so it already knows that. And so it can just call
its internal scheduling function. This is what mt76 does in
mt76_dma_tx_cleanup() with the call to mt76_txq_schedule() before
calling ieee80211_wake_queue().

I think that what ath10k should be doing is calling
ath10k_mac_tx_push_pending() when it restarts the queues.
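
To illustrate the idea, here is a toy userspace model in C. It is not
ath10k or mac80211 code: every toy_* name is made up for this sketch,
toy_push_pending() merely stands in for the driver's internal scheduling
function (ath10k_mac_tx_push_pending() in ath10k's case), and MAX_PENDING
is assumed to be 64 only because the trace shows the queues being stopped
at num_pen: 64.

```c
#include <stdbool.h>

/* Assumed limit; the trace stops the queues at num_pen: 64. */
#define MAX_PENDING 64

/* Hypothetical, heavily simplified driver state. */
struct toy_drv {
	int num_pending;     /* frames handed to the device, not yet completed */
	int txq_backlog;     /* frames still sitting in the txq */
	bool queues_stopped; /* models ieee80211_stop_queues()/wake_queue() */
};

/* Stand-in for the driver's scheduler: pull from the txq while there is room. */
static void toy_push_pending(struct toy_drv *d)
{
	while (d->txq_backlog > 0 && d->num_pending < MAX_PENDING) {
		d->txq_backlog--;
		d->num_pending++;
	}
	if (d->num_pending >= MAX_PENDING)
		d->queues_stopped = true;
}

/* A new frame arrives: the stack notifies the driver (wake_tx_queue). */
static void toy_enqueue(struct toy_drv *d)
{
	d->txq_backlog++;
	toy_push_pending(d);
}

/* TX completion: one pending frame is done, so restart stopped queues. */
static void toy_tx_complete(struct toy_drv *d, bool reschedule)
{
	d->num_pending--;
	if (d->queues_stopped) {
		d->queues_stopped = false; /* models ieee80211_wake_queue() */
		/*
		 * Waking the queues does not trigger a wake_tx_queue
		 * notification, so the driver has to run its own scheduler.
		 */
		if (reschedule)
			toy_push_pending(d);
	}
}

/* Queue a burst, then complete frames; returns frames left stuck in the txq. */
static int toy_run(int burst, bool reschedule)
{
	struct toy_drv d = { 0 };
	int i;

	for (i = 0; i < burst; i++)
		toy_enqueue(&d);
	while (d.num_pending > 0)
		toy_tx_complete(&d, reschedule);
	return d.txq_backlog;
}
```

Under these assumptions, toy_run(88, false) leaves 24 frames stranded in
the txq once all 64 pending frames have completed, which is the shape of
the stall in the trace (frame_cnt: 24, then num_pen counting down to 0).
toy_run(88, true) drains the backlog to 0.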

> I just came across a ieee80211_schedule_txq() function in e937b8da5a59
> ("mac80211: Add TXQ scheduling API"). However, this commit was
> reverted. Any plans on resubmitting this?

Yeah, I have a revised version lying around waiting for Felix to review
it. But that wouldn't help this bug; it's just an API change, it doesn't
change behaviour...

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
  2018-05-15 13:45 ` Niklas Cassel
@ 2018-05-16 17:28   ` Erik Stromdahl
  -1 siblings, 0 replies; 18+ messages in thread
From: Erik Stromdahl @ 2018-05-16 17:28 UTC (permalink / raw)
  To: Niklas Cassel, johannes, kvalo, toke; +Cc: linux-wireless, ath10k

Hello Niklas

Quick question:
Are you using my patch: "ath10k: add htt_tx num_pending window"?

I assume (from your logs below) that you are not...

See more comments below.

<snip>
> 
> I guess the best way to describe this is to show my ftrace buffer:
> 
>       ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
>       ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>       ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
>       ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
>       ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
>       ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>       ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
>       ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
>       ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
>       ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>       ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
>       ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
For high latency devices, when the num_pending counter reaches this point (64),
ath10k calls ieee80211_stop_queues (from ath10k_htt_tx_inc_pending -> ath10k_mac_tx_lock).
This will stop all per-TID queues (16 in total).

>       ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
>       ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
At this point, the driver has received a TX_COMPL_IND (since num_pending has been decremented to 63).
Without the patch I mentioned above, the logic is that the TX queues will be re-enabled immediately.
This is achieved by calling ieee80211_wake_queue.

>       ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
Two queues have been enabled here. I don't know why not all of the queues have been re-enabled.
Perhaps it is because mac80211 immediately gets more data on one of the just re-enabled queues,
resulting in...
>       ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
...the wake_tx_queue op being invoked again ...
>       ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
... and the num_pending counter being incremented as a result of this.
This results in all queues being ...
>       ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
>       ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0
... stopped again (since we reached the upper limit of 64).
> 
> (since we just called ieee80211_stop_queues(), I wouldn't expect to see
> wake_tx_queue being called directly after)
> 
Below looks really odd indeed.
If ath10k_mac_op_wake_tx_queue is being called several times with num_pending == 64,
those calls will fail (nothing will be transmitted).
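
The sequence annotated above can be replayed with a small toy model. This is only a sketch: replay() is a hypothetical helper, and the stop/wake rules are the ones read off the trace (stop when num_pending reaches 64, wake when it drops back to 63), not a copy of the driver code:

```python
MAX_NUM_PENDING = 64  # limit visible in the trace (num_pen: 64)


def replay(events, num_pending):
    """Replay '+' (frame pushed to the device) and '-' (TX completion) events.

    Returns the queue state changes a single-threshold scheme would make.
    """
    transitions = []
    for ev in events:
        if ev == "+":
            if num_pending >= MAX_NUM_PENDING:
                # No free slot: the push fails and the frame stays queued.
                transitions.append("rejected")
                continue
            num_pending += 1
            if num_pending == MAX_NUM_PENDING:
                transitions.append("stop")
        else:
            num_pending -= 1
            if num_pending == MAX_NUM_PENDING - 1:
                transitions.append("wake")
    return transitions
```

Starting from num_pending == 63, replay("+-+" + "+++" + "-", 63) gives stop, wake, stop, three rejected pushes, and a final wake, which mirrors the stop/wake/stop pattern and the failing wake_tx_queue calls in the trace.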

>       ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
>       ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
>    ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
Here we get another TX_COMPL_IND, resulting in a re-enable of the queues.
>    ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
>    ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0
> 
> here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
> (ath10k_mac_op_wake_tx_queue) is never seen again...
> 
Below we get TX_COMPL_INDs for all outstanding TX messages:
>    ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
>    ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
>    ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
>    ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
>    ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
>    ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
...
<snip>
> 
> ath10k just finished transmission of pending frames.
> 
It would be interesting to see what happens if the max_num_pending value were increased.

It is defined by TARGET_TLV_NUM_MSDU_DESC_HL in hw.h

I have tried a value of 105 (used in more recent versions of qcacld) and it seems to work fine.
When combined with the patch I mentioned above, it will create a num_pending window
that can potentially mitigate this problem.

Another thing you could try to tweak is the HTC_HOST_MAX_MSG_PER_TX_BUNDLE constant (only present in my patch).
If you set it to a high enough value (let's say 24 in your case, since you have 24 wake_tx_queue
invocations after the queues have been stopped), ath10k will be able to handle those extra frames
(at least 24 of them), since the queues will be stopped before the maximum num_pending limit is reached.
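
The headroom arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the logic of the patch itself: absorbed_after_stop() is a hypothetical helper, it assumes the queues are stopped once num_pending reaches max_num_pending minus the headroom, and it assumes no TX completions arrive while the late frames come in:

```python
def absorbed_after_stop(max_num_pending, headroom, late_frames):
    """Return (accepted, rejected) for frames that arrive after the queues stop.

    Assumption: the queues are stopped at max_num_pending - headroom, so
    'headroom' descriptor slots are still free at that moment.
    """
    stop_threshold = max_num_pending - headroom
    free_slots = max_num_pending - stop_threshold
    accepted = min(late_frames, free_slots)
    return accepted, late_frames - accepted
```

With no headroom, absorbed_after_stop(64, 0, 24) rejects all 24 late frames, as in the trace; with a headroom of 24, absorbed_after_stop(64, 24, 24) accepts all of them. A burst larger than the headroom, such as absorbed_after_stop(105, 24, 30), still leaves frames rejected.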

--
Erik

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k wake_tx_queue issues
  2018-05-16 17:28   ` Erik Stromdahl
@ 2018-05-17 23:47     ` Niklas Cassel
  -1 siblings, 0 replies; 18+ messages in thread
From: Niklas Cassel @ 2018-05-17 23:47 UTC (permalink / raw)
  To: Erik Stromdahl; +Cc: johannes, kvalo, toke, linux-wireless, ath10k

On Wed, May 16, 2018 at 07:28:21PM +0200, Erik Stromdahl wrote:
> Hello Niklas
> 
> Quick question:
> Are you using my patch: "ath10k: add htt_tx num_pending window"?

Nope, but I definitely think that your patch should be merged,
since the current code can lock/unlock/lock a lot of times for
no good reason.

(I actually tried it, but I could still reproduce the bug with it.)

> 
> I assume (from your logs below) that you are not...
> 

Thanks a lot for your suggestions, Erik!

Increasing max_num_pending might be a good idea (perhaps we will
get better throughput in the SDIO case).

However, increasing either max_num_pending or
HTC_HOST_MAX_MSG_PER_TX_BUNDLE would probably just move the problem,
since it would still be possible for us to get hit by the same problem
again in the future.

I actually took Toke's suggestion and cooked up a patch:
https://marc.info/?l=linux-kernel&m=152659902128543&w=2


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-05-17 23:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-15 13:45 ath10k wake_tx_queue issues Niklas Cassel
2018-05-15 13:45 ` Niklas Cassel
2018-05-15 13:50 ` Ben Greear
2018-05-15 13:50   ` Ben Greear
2018-05-15 14:13 ` Toke Høiland-Jørgensen
2018-05-15 14:13   ` Toke Høiland-Jørgensen
2018-05-15 20:31   ` Niklas Cassel
2018-05-15 20:31     ` Niklas Cassel
2018-05-16  5:54     ` Alexander Wetzel
2018-05-16  5:54       ` Alexander Wetzel
2018-05-16  9:16       ` Niklas Cassel
2018-05-16  9:16         ` Niklas Cassel
2018-05-16  9:22     ` Toke Høiland-Jørgensen
2018-05-16  9:22       ` Toke Høiland-Jørgensen
2018-05-16 17:28 ` Erik Stromdahl
2018-05-16 17:28   ` Erik Stromdahl
2018-05-17 23:47   ` Niklas Cassel
2018-05-17 23:47     ` Niklas Cassel
