ath9k-devel.lists.ath9k.org archive mirror
* [ath9k-devel] [PATCH 0/2] ath9k: Add airtime fairness scheduler
@ 2016-06-17  9:09 Toke Høiland-Jørgensen
  2016-06-17  9:09 ` [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues Toke Høiland-Jørgensen
  2016-06-17  9:09 ` [ath9k-devel] [PATCH 2/2] ath9k: Add a per-station airtime deficit scheduler Toke Høiland-Jørgensen
  0 siblings, 2 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-17  9:09 UTC
  To: ath9k-devel

This is the second version of my airtime fairness patch. This version
has a somewhat reworked scheduler (now closer to the structure of
fq_codel), a different way to measure RX airtime, and a debugfs entry
to control which airtime measurements are included in the scheduling
decisions. For a simple one-way UDP test, the scheduler achieves
near-perfect airtime fairness (as measured by the scheduler itself).
There is not much throughput difference in the UDP case, but TCP tests
see a moderate improvement. I'll write up the performance measurements
in more detail over the weekend and post them in a separate mail.

This patch set is rebased onto mac80211-next, which means it no longer
includes Michal's patch to disable qdiscs. I have kept my version of
Tim's patch to make ath9k use wake_tx_queue in this patch set. It
probably still needs some work, but I believe he is working on that. I
have not tested extensively with the mac80211 FQ-CoDel patches enabled,
but I expect them to complement this work.

Changes since the RFC version:

- The scheduler now enforces fairness more strictly; in some cases the
  previous version would refill the deficit of slow stations too quickly.

- Changed the way RX airtime is measured. For aggregates, the airtime is
  now calculated as the difference between the rs->rs_tstamp of the
  first and last frame in the aggregate. For non-aggregates, the
  previous calculation from the packet size and rate is retained.

- Added an 'airtime_flags' debugfs entry that controls which airtime
  measurements are charged to the deficit: if bit 0 is set, TX airtime
  is accounted, and if bit 1 is set, RX airtime is accounted. If no bits
  are set, the scheduler reverts to simple round-robin scheduling. Both
  TX and RX accounting are enabled by default.

- Squashed the whole thing into one patch and rebased to mac80211-next.

Toke Høiland-Jørgensen (2):
  ath9k: use mac80211 intermediate software queues
  ath9k: Add a per-station airtime deficit scheduler

 drivers/net/wireless/ath/ath9k/ath9k.h     |  34 +++-
 drivers/net/wireless/ath/ath9k/channel.c   |  12 +-
 drivers/net/wireless/ath/ath9k/debug.c     |   3 +
 drivers/net/wireless/ath/ath9k/debug.h     |  29 ++++
 drivers/net/wireless/ath/ath9k/debug_sta.c |  53 +++++-
 drivers/net/wireless/ath/ath9k/init.c      |   2 +
 drivers/net/wireless/ath/ath9k/main.c      |   7 +-
 drivers/net/wireless/ath/ath9k/recv.c      |  60 +++++++
 drivers/net/wireless/ath/ath9k/xmit.c      | 255 ++++++++++++++++++++++-------
 9 files changed, 386 insertions(+), 69 deletions(-)

-- 
2.8.3


* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17  9:09 [ath9k-devel] [PATCH 0/2] ath9k: Add airtime fairness scheduler Toke Høiland-Jørgensen
@ 2016-06-17  9:09 ` Toke Høiland-Jørgensen
  2016-06-17 13:28   ` Felix Fietkau
  2016-06-18 19:06   ` [ath9k-devel] [PATCH] ath9k: Switch to using " Toke Høiland-Jørgensen
  2016-06-17  9:09 ` [ath9k-devel] [PATCH 2/2] ath9k: Add a per-station airtime deficit scheduler Toke Høiland-Jørgensen
  1 sibling, 2 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-17  9:09 UTC
  To: ath9k-devel

This patch leaves ath9k's internal per-node, per-TID queues in place
and modifies the driver to also pull frames from the new mac80211
intermediate software queues. It also implements the .wake_tx_queue
method, which causes mac80211 to deliver frames for transmission via
the new intermediate queues.

Signed-off-by: Tim Shepard <shep@alum.mit.edu>

Reworked to not require the global variable renaming in ath9k.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 drivers/net/wireless/ath/ath9k/ath9k.h     |  16 +++-
 drivers/net/wireless/ath/ath9k/debug_sta.c |   7 +-
 drivers/net/wireless/ath/ath9k/init.c      |   1 +
 drivers/net/wireless/ath/ath9k/main.c      |   1 +
 drivers/net/wireless/ath/ath9k/xmit.c      | 119 +++++++++++++++++++++++++----
 5 files changed, 125 insertions(+), 19 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 93b3793..caeae10 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define BAW_WITHIN(_start, _bawsz, _seqno) \
 	((((_seqno) - (_start)) & 4095) < (_bawsz))
 
-#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
-
 #define IS_HT_RATE(rate)   (rate & 0x80)
 #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
 #define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
@@ -232,8 +230,10 @@ struct ath_buf {
 
 struct ath_atx_tid {
 	struct list_head list;
+	struct sk_buff_head i_q;
 	struct sk_buff_head buf_q;
 	struct sk_buff_head retry_q;
+	struct ieee80211_txq *swq;
 	struct ath_node *an;
 	struct ath_txq *txq;
 	unsigned long tx_buf[BITS_TO_LONGS(ATH_TID_MAX_BUFS)];
@@ -247,13 +247,13 @@ struct ath_atx_tid {
 	s8 bar_index;
 	bool active;
 	bool clear_ps_filter;
+	bool swq_nonempty;
 };
 
 struct ath_node {
 	struct ath_softc *sc;
 	struct ieee80211_sta *sta; /* station struct we're part of */
 	struct ieee80211_vif *vif; /* interface with which we're associated */
-	struct ath_atx_tid tid[IEEE80211_NUM_TIDS];
 
 	u16 maxampdu;
 	u8 mpdudensity;
@@ -271,6 +271,15 @@ struct ath_node {
 	struct list_head list;
 };
 
+static inline
+struct ath_atx_tid *ath_an_2_tid(struct ath_node *an, u8 tidno)
+{
+	struct ieee80211_sta *sta = an->sta;
+	struct ieee80211_vif *vif = an->vif;
+	struct ieee80211_txq *swq = sta ? sta->txq[tidno] : vif->txq;
+	return (struct ath_atx_tid *) swq->drv_priv;
+}
+
 struct ath_tx_control {
 	struct ath_txq *txq;
 	struct ath_node *an;
@@ -585,6 +594,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 				   u16 tids, int nframes,
 				   enum ieee80211_frame_release_type reason,
 				   bool more_data);
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *swq);
 
 /********/
 /* VIFs */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index b66cfa9..0e7f6b5 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -25,6 +25,7 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
 {
 	struct ath_node *an = file->private_data;
 	struct ath_softc *sc = an->sc;
+	struct ieee80211_txq *swq;
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
 	u32 len = 0, size = 4096;
@@ -52,8 +53,10 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
 			 "TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
 			 "BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0;
+	     tidno < IEEE80211_NUM_TIDS; tidno++) {
+		swq = an->sta->txq[tidno];
+		tid = (struct ath_atx_tid *) swq->drv_priv;
 		txq = tid->txq;
 		ath_txq_lock(sc, txq);
 		if (tid->active) {
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 2ee8624..211736c 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -873,6 +873,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
 	hw->max_rate_tries = 10;
 	hw->sta_data_size = sizeof(struct ath_node);
 	hw->vif_data_size = sizeof(struct ath_vif);
+	hw->txq_data_size = sizeof(struct ath_atx_tid);
 	hw->extra_tx_headroom = 4;
 
 	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 8b63988..6ab56e5 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2668,4 +2668,5 @@ struct ieee80211_ops ath9k_ops = {
 	.sw_scan_start	    = ath9k_sw_scan_start,
 	.sw_scan_complete   = ath9k_sw_scan_complete,
 	.get_txpower        = ath9k_get_txpower,
+	.wake_tx_queue      = ath9k_wake_tx_queue,
 };
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 8ddd604..cdc8684 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
 					   struct ath_txq *txq,
 					   struct ath_atx_tid *tid,
 					   struct sk_buff *skb);
+static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
+			  struct ath_tx_control *txctl);
 
 enum {
 	MCS_HT20,
@@ -118,6 +120,21 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
 		list_add_tail(&tid->list, list);
 }
 
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *swq)
+{
+	struct ath_softc *sc = hw->priv;
+	struct ath_atx_tid *tid = (struct ath_atx_tid *) swq->drv_priv;
+	struct ath_txq *txq = tid->txq;
+
+	spin_lock_bh(&txq->axq_lock);
+
+	tid->swq_nonempty = true;
+	ath_tx_queue_tid(sc, txq, tid);
+	ath_txq_schedule(sc, txq);
+
+	spin_unlock_bh(&txq->axq_lock);
+}
+
 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
 {
 	struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
@@ -170,12 +187,51 @@ static struct ath_atx_tid *
 ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
 {
 	u8 tidno = skb->priority & IEEE80211_QOS_CTL_TID_MASK;
-	return ATH_AN_2_TID(an, tidno);
+	return ath_an_2_tid(an, tidno);
 }
 
+static void ath_swq_pull(struct ath_atx_tid *tid)
+{
+	struct sk_buff *skb;
+	struct ath_tx_control txctl;
+	struct ath_frame_info *fi;
+	int r;
+
+	if (!skb_queue_empty(&tid->i_q))
+		return;
+
+	if (!tid->swq_nonempty)
+		return;
+
+	skb = ieee80211_tx_dequeue(tid->an->sc->hw, tid->swq);
+	if (!skb) {
+		tid->swq_nonempty = false;
+	} else {
+		/* sad to do all this with axq_lock held */
+		memset(&txctl, 0, sizeof txctl);
+		txctl.txq = tid->txq;
+		txctl.sta = tid->an->sta;
+		r = ath_tx_prepare(tid->an->sc->hw, skb, &txctl);
+		if (WARN_ON(r != 0)) {
+			/** should not happen ??? */
+		} else {
+			/* perhaps not needed here ??? */
+			fi = get_frame_info(skb);
+			fi->txq = skb_get_queue_mapping(skb);
+
+			__skb_queue_tail(&tid->i_q, skb);
+			++tid->txq->pending_frames;
+		}
+	}
+}
+
+
 static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
 {
-	return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
+	if (!skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q) || !skb_queue_empty(&tid->i_q))
+		return true;
+	ath_swq_pull(tid);
+	return !skb_queue_empty(&tid->i_q);
 }
 
 static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
@@ -185,6 +241,12 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
 	skb = __skb_dequeue(&tid->retry_q);
 	if (!skb)
 		skb = __skb_dequeue(&tid->buf_q);
+	if (!skb)
+		skb = __skb_dequeue(&tid->i_q);
+	if (!skb) {
+		ath_swq_pull(tid);
+		skb = __skb_dequeue(&tid->i_q);
+	}
 
 	return skb;
 }
@@ -870,6 +932,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 		*q = &tid->retry_q;
 		if (skb_queue_empty(*q))
 			*q = &tid->buf_q;
+		if (skb_queue_empty(*q))
+			*q = &tid->i_q;
+		if (skb_queue_empty(*q))
+			ath_swq_pull(tid);
 
 		skb = skb_peek(*q);
 		if (!skb)
@@ -1482,7 +1548,7 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
 	an = (struct ath_node *)sta->drv_priv;
-	txtid = ATH_AN_2_TID(an, tid);
+	txtid = ath_an_2_tid(an, tid);
 	txq = txtid->txq;
 
 	ath_txq_lock(sc, txq);
@@ -1517,7 +1583,7 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
 {
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
 	struct ath_node *an = (struct ath_node *)sta->drv_priv;
-	struct ath_atx_tid *txtid = ATH_AN_2_TID(an, tid);
+	struct ath_atx_tid *txtid = ath_an_2_tid(an, tid);
 	struct ath_txq *txq = txtid->txq;
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
@@ -1533,6 +1599,7 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 		       struct ath_node *an)
 {
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	struct ieee80211_txq *swq;
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
 	bool buffered;
@@ -1540,9 +1607,11 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0;
+	     tidno < IEEE80211_NUM_TIDS; tidno++) {
 
+		swq = an->sta->txq[tidno];
+		tid = (struct ath_atx_tid *) swq->drv_priv;
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -1565,15 +1634,18 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
 {
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	struct ieee80211_txq *swq;
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
 	int tidno;
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0;
+	     tidno < IEEE80211_NUM_TIDS; tidno++) {
 
+		swq = an->sta->txq[tidno];
+		tid = (struct ath_atx_tid *) swq->drv_priv;
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -1599,7 +1671,7 @@ void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
 	an = (struct ath_node *)sta->drv_priv;
-	tid = ATH_AN_2_TID(an, tidno);
+	tid = ath_an_2_tid(an, tidno);
 	txq = tid->txq;
 
 	ath_txq_lock(sc, txq);
@@ -1637,7 +1709,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 		if (!(tids & 1))
 			continue;
 
-		tid = ATH_AN_2_TID(an, i);
+		tid = ath_an_2_tid(an, i);
 
 		ath_txq_lock(sc, tid->txq);
 		while (nframes > 0) {
@@ -2853,12 +2925,18 @@ int ath_tx_init(struct ath_softc *sc, int nbufs)
 
 void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 {
+	struct ieee80211_txq *swq;
+	struct ieee80211_sta *sta = an->sta;
+	struct ieee80211_vif *vif = an->vif;
 	struct ath_atx_tid *tid;
 	int tidno, acno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
+	for (tidno = 0;
 	     tidno < IEEE80211_NUM_TIDS;
-	     tidno++, tid++) {
+	     tidno++) {
+		swq = sta ? sta->txq[tidno] : vif->txq;
+		tid = (struct ath_atx_tid *) swq->drv_priv;
+		tid->swq       = swq;
 		tid->an        = an;
 		tid->tidno     = tidno;
 		tid->seq_start = tid->seq_next = 0;
@@ -2866,23 +2944,33 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 		tid->baw_head  = tid->baw_tail = 0;
 		tid->active	   = false;
 		tid->clear_ps_filter = true;
+		tid->swq_nonempty  = false;
+		__skb_queue_head_init(&tid->i_q);
 		__skb_queue_head_init(&tid->buf_q);
 		__skb_queue_head_init(&tid->retry_q);
 		INIT_LIST_HEAD(&tid->list);
 		acno = TID_TO_WME_AC(tidno);
 		tid->txq = sc->tx.txq_map[acno];
+
+		if (!sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
 void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 {
+	struct ieee80211_txq *swq;
+	struct ieee80211_sta *sta = an->sta;
+	struct ieee80211_vif *vif = an->vif;
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
 	int tidno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0;
+	     tidno < IEEE80211_NUM_TIDS; tidno++) {
 
+		swq = sta ? sta->txq[tidno] : vif->txq;
+		tid = (struct ath_atx_tid *) swq->drv_priv;
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -2894,6 +2982,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 		tid->active = false;
 
 		ath_txq_unlock(sc, txq);
+
+		if (!sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
-- 
2.8.3


* [ath9k-devel] [PATCH 2/2] ath9k: Add a per-station airtime deficit scheduler
  2016-06-17  9:09 [ath9k-devel] [PATCH 0/2] ath9k: Add airtime fairness scheduler Toke Høiland-Jørgensen
  2016-06-17  9:09 ` [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues Toke Høiland-Jørgensen
@ 2016-06-17  9:09 ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-17  9:09 UTC
  To: ath9k-devel

This modifies the logic in ath_txq_schedule to account the airtime
consumed by each station, and uses a deficit-based scheduler derived
from FQ-CoDel to try to enforce airtime fairness. A debugfs entry
controls whether TX airtime, RX airtime or both are accounted to the
deficit on which the scheduler bases its decisions.
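A simplified model of the deficit scheduling described above. It assumes a fixed per-round quantum (QUANTUM_US, a hypothetical value; the quantum used by the patch is not visible in this excerpt) and charges each frame's airtime as soon as it is sent, whereas the driver charges it at TX completion. As in FQ-CoDel, stations on the "new" list are served before those on the "old" list, and a station whose deficit is used up gets a quantum added and moves to the tail of the old list. The types and function names are illustrative, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STA 8
#define QUANTUM_US 300 /* hypothetical per-round quantum, in microseconds */

/* Hypothetical per-station state, loosely modelled on struct ath_node. */
struct sta {
	int64_t deficit;  /* like an->airtime_deficit */
	int pending;      /* frames still queued for this station */
	uint32_t cost_us; /* airtime of one frame at this station's rate */
	uint64_t used_us; /* airtime consumed so far */
};

/* Small FIFO of station ids, standing in for acq_new / acq_old. */
struct idlist {
	int ids[MAX_STA];
	int head, len;
};

static void l_push(struct idlist *l, int id)
{
	assert(l->len < MAX_STA);
	l->ids[(l->head + l->len) % MAX_STA] = id;
	l->len++;
}

static int l_pop(struct idlist *l)
{
	int id = l->ids[l->head];

	l->head = (l->head + 1) % MAX_STA;
	l->len--;
	return id;
}

/* Send one frame; returns the station served, or -1 if all are idle. */
static int schedule_one(struct sta *s, struct idlist *new_l,
			struct idlist *old_l)
{
	for (;;) {
		struct idlist *l = new_l->len ? new_l : old_l;
		int id;

		if (l->len == 0)
			return -1;
		id = l->ids[l->head];

		if (s[id].deficit <= 0) {
			/* Out of airtime: refill, go to the back of the old list. */
			s[id].deficit += QUANTUM_US;
			l_push(old_l, l_pop(l));
			continue;
		}
		if (s[id].pending == 0) {
			/* Nothing queued: drop the station from scheduling. */
			l_pop(l);
			continue;
		}
		/* Send one frame and charge its airtime to the deficit. */
		s[id].pending--;
		s[id].deficit -= s[id].cost_us;
		s[id].used_us += s[id].cost_us;
		return id;
	}
}
```

Starting two backlogged stations at deficit 0 on the new list, a fast station (100 us per frame) and a slow one (400 us per frame) each have 1200 us of airtime after 15 transmissions, even though the fast station sent four times as many frames. Per-frame round-robin would instead give the slow station four times the airtime.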

TX airtime is accounted using ts->duration plus the retry-chain
information, giving the time spent transmitting to a station. RX
airtime is measured as the duration from the first to the last frame
in an aggregate, using the rs_tstamp fields.
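Both measurements reduce to simple arithmetic. The sketch below restates the formulas from ath_tx_count_airtime() and ath_rx_count_airtime() in this patch as standalone C. The function and struct names are hypothetical; the arrays stand in for bf->rates, ts->ts_rateindex, ts->ts_longretry and ath9k_hw_get_duration(). Like the patch's formula, it treats ts_longretry as the number of retries at the final rate only. Non-aggregate RX frames, which the patch times from rate and length, are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * TX: dur_us[i] is the on-air duration of one attempt at rate-series
 * entry i, count[i] the attempts configured for that entry, final_idx the
 * entry last used (ts_rateindex) and longretry the retries at that entry.
 */
static uint32_t tx_airtime(const uint32_t dur_us[4], const uint8_t count[4],
			   int final_idx, int longretry)
{
	uint32_t airtime;

	assert(final_idx >= 0 && final_idx < 4);
	airtime = dur_us[final_idx] * (uint32_t)(longretry + 1);
	for (int i = 0; i < final_idx; i++)
		airtime += dur_us[i] * count[i]; /* all attempts at earlier rates */
	return airtime;
}

/* RX: hypothetical per-station state for timing an A-MPDU. */
struct rx_state {
	uint32_t start_tstamp; /* like an->airtime_rx_start; 0 = none open */
};

/*
 * Returns airtime to charge for one received subframe: nonzero only on
 * the last subframe of an aggregate whose first subframe was seen.
 * Unsigned subtraction also copes with 32-bit timestamp wrap.
 */
static uint32_t rx_aggr_airtime(struct rx_state *st, bool isaggr, bool first,
				bool more, uint32_t tstamp)
{
	if (!isaggr) {
		st->start_tstamp = 0;
		return 0;
	}
	if (first) {
		st->start_tstamp = tstamp;
		return 0;
	}
	if (!more && st->start_tstamp)
		return tstamp - st->start_tstamp;
	return 0;
}
```

For example, a frame that failed twice at a 200 us first rate and twice at a 300 us second rate, then needed two attempts at a 500 us final rate (longretry = 1), is charged 2000 us.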

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 drivers/net/wireless/ath/ath9k/ath9k.h     |  18 +++-
 drivers/net/wireless/ath/ath9k/channel.c   |  12 ++-
 drivers/net/wireless/ath/ath9k/debug.c     |   3 +
 drivers/net/wireless/ath/ath9k/debug.h     |  29 ++++++
 drivers/net/wireless/ath/ath9k/debug_sta.c |  46 ++++++++++
 drivers/net/wireless/ath/ath9k/init.c      |   1 +
 drivers/net/wireless/ath/ath9k/main.c      |   6 +-
 drivers/net/wireless/ath/ath9k/recv.c      |  60 +++++++++++++
 drivers/net/wireless/ath/ath9k/xmit.c      | 136 ++++++++++++++++++++---------
 9 files changed, 261 insertions(+), 50 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index caeae10..e5a930c 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -261,9 +261,12 @@ struct ath_node {
 
 	bool sleeping;
 	bool no_ps_filter;
+	s64 airtime_deficit;
+	u32 airtime_rx_start;
 
 #ifdef CONFIG_ATH9K_STATION_STATISTICS
 	struct ath_rx_rate_stats rx_rate_stats;
+	struct ath_airtime_stats airtime_stats;
 #endif
 	u8 key_idx[4];
 
@@ -331,10 +334,15 @@ struct ath_rx {
 /* Channel Context */
 /*******************/
 
+struct ath_acq {
+	struct list_head acq_new;
+	struct list_head acq_old;
+};
+
 struct ath_chanctx {
 	struct cfg80211_chan_def chandef;
 	struct list_head vifs;
-	struct list_head acq[IEEE80211_NUM_ACS];
+	struct ath_acq acq[IEEE80211_NUM_ACS];
 	int hw_queue_base;
 
 	/* do not dereference, use for comparison only */
@@ -573,6 +581,8 @@ void ath_txq_schedule_all(struct ath_softc *sc);
 int ath_tx_init(struct ath_softc *sc, int nbufs);
 int ath_txq_update(struct ath_softc *sc, int qnum,
 		   struct ath9k_tx_queue_info *q);
+u32 ath_pkt_duration(struct ath_softc *sc, u8 rix, int pktlen,
+		     int width, int half_gi, bool shortPreamble);
 void ath_update_max_aggr_framelen(struct ath_softc *sc, int queue, int txop);
 void ath_assign_seq(struct ath_common *common, struct sk_buff *skb);
 int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
@@ -959,6 +969,10 @@ void ath_ant_comb_scan(struct ath_softc *sc, struct ath_rx_status *rs);
 
 #define ATH9K_NUM_CHANCTX  2 /* supports 2 operating channels */
 
+#define AIRTIME_USE_TX BIT(0)
+#define AIRTIME_USE_RX BIT(1)
+#define AIRTIME_ACTIVE(flags) (!!(flags & (AIRTIME_USE_TX|AIRTIME_USE_RX)))
+
 struct ath_softc {
 	struct ieee80211_hw *hw;
 	struct device *dev;
@@ -1001,6 +1015,8 @@ struct ath_softc {
 	short nbcnvifs;
 	unsigned long ps_usecount;
 
+	u16 airtime_flags; /* AIRTIME_* */
+
 	struct ath_rx rx;
 	struct ath_tx tx;
 	struct ath_beacon beacon;
diff --git a/drivers/net/wireless/ath/ath9k/channel.c b/drivers/net/wireless/ath/ath9k/channel.c
index e56bafc..2594029 100644
--- a/drivers/net/wireless/ath/ath9k/channel.c
+++ b/drivers/net/wireless/ath/ath9k/channel.c
@@ -118,8 +118,10 @@ void ath_chanctx_init(struct ath_softc *sc)
 		INIT_LIST_HEAD(&ctx->vifs);
 		ctx->txpower = ATH_TXPOWER_MAX;
 		ctx->flush_timeout = HZ / 5; /* 200ms */
-		for (j = 0; j < ARRAY_SIZE(ctx->acq); j++)
-			INIT_LIST_HEAD(&ctx->acq[j]);
+		for (j = 0; j < ARRAY_SIZE(ctx->acq); j++) {
+			INIT_LIST_HEAD(&ctx->acq[j].acq_new);
+			INIT_LIST_HEAD(&ctx->acq[j].acq_old);
+		}
 	}
 }
 
@@ -1344,8 +1346,10 @@ void ath9k_offchannel_init(struct ath_softc *sc)
 	ctx->txpower = ATH_TXPOWER_MAX;
 	cfg80211_chandef_create(&ctx->chandef, chan, NL80211_CHAN_HT20);
 
-	for (i = 0; i < ARRAY_SIZE(ctx->acq); i++)
-		INIT_LIST_HEAD(&ctx->acq[i]);
+	for (i = 0; i < ARRAY_SIZE(ctx->acq); i++) {
+		INIT_LIST_HEAD(&ctx->acq[i].acq_new);
+		INIT_LIST_HEAD(&ctx->acq[i].acq_old);
+	}
 
 	sc->offchannel.chan.offchannel = true;
 }
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index c56e40f..413de3c 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -1411,5 +1411,8 @@ int ath9k_init_debug(struct ath_hw *ah)
 	debugfs_create_file("tpc", S_IRUSR | S_IWUSR,
 			    sc->debug.debugfs_phy, sc, &fops_tpc);
 
+	debugfs_create_u16("airtime_flags", S_IRUSR | S_IWUSR,
+			   sc->debug.debugfs_phy, &sc->airtime_flags);
+
 	return 0;
 }
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index cd68c5f..bf1a540 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -223,6 +223,11 @@ struct ath_rx_rate_stats {
 	} cck_stats[4];
 };
 
+struct ath_airtime_stats {
+	u32 rx_airtime;
+	u32 tx_airtime;
+};
+
 #define ANT_MAIN 0
 #define ANT_ALT  1
 
@@ -316,12 +321,36 @@ ath9k_debug_sync_cause(struct ath_softc *sc, u32 sync_cause)
 void ath_debug_rate_stats(struct ath_softc *sc,
 			  struct ath_rx_status *rs,
 			  struct sk_buff *skb);
+void ath_debug_tx_airtime(struct ath_softc *sc,
+			  struct ath_buf *bf,
+			  struct ath_tx_status *ts);
+void ath_debug_rx_airtime(struct ath_softc *sc,
+			  struct ath_rx_status *rs,
+			  struct sk_buff *skb);
+void ath_debug_airtime(struct ath_softc *sc,
+		       struct ath_node *an,
+		       u32 rx, u32 tx);
 #else
 static inline void ath_debug_rate_stats(struct ath_softc *sc,
 					struct ath_rx_status *rs,
 					struct sk_buff *skb)
 {
 }
+static inline void ath_debug_tx_airtime(struct ath_softc *sc,
+					struct ath_buf *bf,
+					struct ath_tx_status *ts)
+{
+}
+static inline void ath_debug_rx_airtime(struct ath_softc *sc,
+					struct ath_rx_status *rs,
+					struct sk_buff *skb)
+{
+}
+static inline void ath_debug_airtime(struct ath_softc *sc,
+				     struct ath_node *an,
+				     u32 rx, u32 tx)
+{
+}
 #endif /* CONFIG_ATH9K_STATION_STATISTICS */
 
 #endif /* DEBUG_H */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index 0e7f6b5..e7f2ef2 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -245,6 +245,51 @@ static const struct file_operations fops_node_recv = {
 	.llseek = default_llseek,
 };
 
+void ath_debug_airtime(struct ath_softc *sc,
+		struct ath_node *an,
+		u32 rx,
+		u32 tx)
+{
+	struct ath_airtime_stats *astats = &an->airtime_stats;
+
+	astats->rx_airtime += rx;
+	astats->tx_airtime += tx;
+}
+
+static ssize_t read_airtime(struct file *file, char __user *user_buf,
+			size_t count, loff_t *ppos)
+{
+	struct ath_node *an = file->private_data;
+	struct ath_airtime_stats *astats;
+	u32 len = 0, size = 128;
+	char *buf;
+	ssize_t retval;
+
+	buf = kzalloc(size, GFP_KERNEL);
+	if (buf == NULL)
+		return -ENOMEM;
+
+	astats = &an->airtime_stats;
+
+	len += scnprintf(buf + len, size - len, "RX: %u us\n", astats->rx_airtime);
+	len += scnprintf(buf + len, size - len, "TX: %u us\n", astats->tx_airtime);
+	len += scnprintf(buf + len, size - len, "Deficit: %lld us\n", an->airtime_deficit);
+
+	retval = simple_read_from_buffer(user_buf, count, ppos, buf, len);
+	kfree(buf);
+
+	return retval;
+}
+
+
+static const struct file_operations fops_airtime = {
+	.read = read_airtime,
+	.open = simple_open,
+	.owner = THIS_MODULE,
+	.llseek = default_llseek,
+};
+
+
 void ath9k_sta_add_debugfs(struct ieee80211_hw *hw,
 			   struct ieee80211_vif *vif,
 			   struct ieee80211_sta *sta,
@@ -254,4 +299,5 @@ void ath9k_sta_add_debugfs(struct ieee80211_hw *hw,
 
 	debugfs_create_file("node_aggr", S_IRUGO, dir, an, &fops_node_aggr);
 	debugfs_create_file("node_recv", S_IRUGO, dir, an, &fops_node_recv);
+	debugfs_create_file("airtime", S_IRUGO, dir, an, &fops_airtime);
 }
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 211736c..f4e9dd3 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -560,6 +560,7 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
 
 	/* Will be cleared in ath9k_start() */
 	set_bit(ATH_OP_INVALID, &common->op_flags);
+	sc->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
 
 	sc->sc_ah = ah;
 	sc->dfs_detector = dfs_pattern_detector_init(common, NL80211_DFS_UNSET);
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 6ab56e5..e13068b 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -70,10 +70,10 @@ static bool ath9k_has_pending_frames(struct ath_softc *sc, struct ath_txq *txq,
 		goto out;
 
 	if (txq->mac80211_qnum >= 0) {
-		struct list_head *list;
+		struct ath_acq *acq;
 
-		list = &sc->cur_chan->acq[txq->mac80211_qnum];
-		if (!list_empty(list))
+		acq = &sc->cur_chan->acq[txq->mac80211_qnum];
+		if (!list_empty(&acq->acq_new) || !list_empty(&acq->acq_old))
 			pending = true;
 	}
 out:
diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index 32160fc..a48667f 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -991,6 +991,65 @@ static void ath9k_apply_ampdu_details(struct ath_softc *sc,
 	}
 }
 
+static void ath_rx_count_airtime(struct ath_softc *sc,
+				 struct ath_rx_status *rs,
+				 struct sk_buff *skb)
+{
+	struct ath_node *an;
+	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
+	struct ath_hw *ah = sc->sc_ah;
+	struct ath_common *common = ath9k_hw_common(ah);
+	struct ieee80211_sta *sta;
+	struct ieee80211_rx_status *rxs;
+	const struct ieee80211_rate *rate;
+	bool is_sgi, is_40, is_sp;
+	int phy;
+	u32 airtime = 0;
+
+	if (!ieee80211_is_data(hdr->frame_control))
+		return;
+
+	rcu_read_lock();
+
+	sta = ieee80211_find_sta_by_ifaddr(sc->hw, hdr->addr2, NULL);
+	if (!sta)
+		goto exit;
+	an = (struct ath_node *) sta->drv_priv;
+
+	if (rs->rs_isaggr && rs->rs_firstaggr) {
+		an->airtime_rx_start = rs->rs_tstamp;
+	} else if (rs->rs_isaggr && !rs->rs_moreaggr && an->airtime_rx_start) {
+		airtime = rs->rs_tstamp - an->airtime_rx_start;
+	} else if (!rs->rs_isaggr) {
+		an->airtime_rx_start = 0;
+
+		rxs = IEEE80211_SKB_RXCB(skb);
+
+		is_sgi = !!(rxs->flag & RX_FLAG_SHORT_GI);
+		is_40 = !!(rxs->flag & RX_FLAG_40MHZ);
+		is_sp = !!(rxs->flag & RX_FLAG_SHORTPRE);
+
+		if (!!(rxs->flag & RX_FLAG_HT)) {
+			/* MCS rates */
+
+			airtime += ath_pkt_duration(sc, rxs->rate_idx, rs->rs_datalen,
+						is_40, is_sgi, is_sp);
+		} else {
+
+			phy = IS_CCK_RATE(rs->rs_rate) ? WLAN_RC_PHY_CCK : WLAN_RC_PHY_OFDM;
+			rate = &common->sbands[rxs->band].bitrates[rxs->rate_idx];
+			airtime += ath9k_hw_computetxtime(ah, phy, rate->bitrate * 100,
+							rs->rs_datalen, rxs->rate_idx, is_sp);
+		}
+	}
+
+	if (!!(sc->airtime_flags & AIRTIME_USE_RX))
+		an->airtime_deficit -= airtime;
+	ath_debug_airtime(sc, an, airtime, 0);
+exit:
+	rcu_read_unlock();
+}
+
 int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
 {
 	struct ath_rxbuf *bf;
@@ -1137,6 +1196,7 @@ int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
 		ath9k_antenna_check(sc, &rs);
 		ath9k_apply_ampdu_details(sc, &rs, rxs);
 		ath_debug_rate_stats(sc, &rs, skb);
+		ath_rx_count_airtime(sc, &rs, skb);
 
 		hdr = (struct ieee80211_hdr *)skb->data;
 		if (ieee80211_is_ack(hdr->frame_control))
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index cdc8684..ef0a4a1 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -108,16 +108,19 @@ void ath_txq_unlock_complete(struct ath_softc *sc, struct ath_txq *txq)
 static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
 			     struct ath_atx_tid *tid)
 {
-	struct list_head *list;
 	struct ath_vif *avp = (struct ath_vif *) tid->an->vif->drv_priv;
 	struct ath_chanctx *ctx = avp->chanctx;
+	struct ath_acq *acq;
+	struct list_head *tid_list;
 
 	if (!ctx)
 		return;
 
-	list = &ctx->acq[TID_TO_WME_AC(tid->tidno)];
+
+	acq = &ctx->acq[TID_TO_WME_AC(tid->tidno)];
+	tid_list = AIRTIME_ACTIVE(sc->airtime_flags) ? &acq->acq_new : &acq->acq_old;
 	if (list_empty(&tid->list))
-		list_add_tail(&tid->list, list);
+		list_add_tail(&tid->list, tid_list);
 }
 
 void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *swq)
@@ -722,6 +725,48 @@ static bool bf_is_ampdu_not_probing(struct ath_buf *bf)
     return bf_isampdu(bf) && !(info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE);
 }
 
+static void ath_tx_count_airtime(struct ath_softc *sc,
+				 struct ath_buf *bf,
+				 struct ath_tx_status *ts)
+{
+	struct ath_node *an;
+	struct sk_buff *skb;
+	struct ieee80211_hdr *hdr;
+	struct ieee80211_hw *hw = sc->hw;
+	struct ieee80211_tx_rate rates[4];
+	struct ieee80211_sta *sta;
+	int i;
+	u32 airtime = 0;
+
+	skb = bf->bf_mpdu;
+	if (!skb)
+		return;
+
+	hdr = (struct ieee80211_hdr *)skb->data;
+	memcpy(rates, bf->rates, sizeof(rates));
+
+	rcu_read_lock();
+
+	sta = ieee80211_find_sta_by_ifaddr(hw, hdr->addr1, hdr->addr2);
+	if (!sta)
+		goto exit;
+
+
+	an = (struct ath_node *) sta->drv_priv;
+
+	airtime += ts->duration * (ts->ts_longretry + 1);
+
+	for (i = 0; i < ts->ts_rateindex; i++)
+		airtime += ath9k_hw_get_duration(sc->sc_ah, bf->bf_desc, i) * rates[i].count;
+
+	if (!!(sc->airtime_flags & AIRTIME_USE_TX))
+		an->airtime_deficit -= airtime;
+	ath_debug_airtime(sc, an, 0, airtime);
+
+exit:
+	rcu_read_unlock();
+}
+
 static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,
 				  struct ath_tx_status *ts, struct ath_buf *bf,
 				  struct list_head *bf_head)
@@ -739,6 +784,8 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,
 
 	ts->duration = ath9k_hw_get_duration(sc->sc_ah, bf->bf_desc,
 					     ts->ts_rateindex);
+	ath_tx_count_airtime(sc, bf, ts);
+
 	if (!bf_isampdu(bf)) {
 		if (!flush) {
 			info = IEEE80211_SKB_CB(bf->bf_mpdu);
@@ -751,6 +798,7 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,
 	} else
 		ath_tx_complete_aggr(sc, txq, bf, bf_head, ts, txok);
 
+
 	if (!flush)
 		ath_txq_schedule(sc, txq);
 }
@@ -1090,8 +1138,8 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
  * width  - 0 for 20 MHz, 1 for 40 MHz
  * half_gi - to use 4us v/s 3.6 us for symbol time
  */
-static u32 ath_pkt_duration(struct ath_softc *sc, u8 rix, int pktlen,
-			    int width, int half_gi, bool shortPreamble)
+u32 ath_pkt_duration(struct ath_softc *sc, u8 rix, int pktlen,
+		     int width, int half_gi, bool shortPreamble)
 {
 	u32 nbits, nsymbits, duration, nsymbols;
 	int streams;
@@ -1490,7 +1538,7 @@ ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 }
 
 static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
-			      struct ath_atx_tid *tid, bool *stop)
+			      struct ath_atx_tid *tid)
 {
 	struct ath_buf *bf;
 	struct ieee80211_tx_info *tx_info;
@@ -1512,7 +1560,6 @@ static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
 	aggr = !!(tx_info->flags & IEEE80211_TX_CTL_AMPDU);
 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
 		(!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
-		*stop = true;
 		return false;
 	}
 
@@ -1984,9 +2031,10 @@ void ath_tx_cleanupq(struct ath_softc *sc, struct ath_txq *txq)
 void ath_txq_schedule(struct ath_softc *sc, struct ath_txq *txq)
 {
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
-	struct ath_atx_tid *tid, *last_tid;
+	struct ath_atx_tid *tid;
 	struct list_head *tid_list;
-	bool sent = false;
+	struct ath_acq *acq;
+	bool active = AIRTIME_ACTIVE(sc->airtime_flags);
 
 	if (txq->mac80211_qnum < 0)
 		return;
@@ -1995,48 +2043,50 @@ void ath_txq_schedule(struct ath_softc *sc, struct ath_txq *txq)
 		return;
 
 	spin_lock_bh(&sc->chan_lock);
-	tid_list = &sc->cur_chan->acq[txq->mac80211_qnum];
+	acq = &sc->cur_chan->acq[txq->mac80211_qnum];
 
-	if (list_empty(tid_list)) {
-		spin_unlock_bh(&sc->chan_lock);
-		return;
-	}
 
 	rcu_read_lock();
+	if (sc->cur_chan->stopped)
+		goto out;
+begin:
+	tid_list = &acq->acq_new;
+	if (list_empty(tid_list)) {
+		tid_list = &acq->acq_old;
+		if (list_empty(tid_list))
+			goto out;
+	}
+	tid = list_first_entry(tid_list, struct ath_atx_tid, list);
 
-	last_tid = list_entry(tid_list->prev, struct ath_atx_tid, list);
-	while (!list_empty(tid_list)) {
-		bool stop = false;
-
-		if (sc->cur_chan->stopped)
-			break;
-
-		tid = list_first_entry(tid_list, struct ath_atx_tid, list);
-		list_del_init(&tid->list);
-
-		if (ath_tx_sched_aggr(sc, txq, tid, &stop))
-			sent = true;
-
-		/*
-		 * add tid to round-robin queue if more frames
-		 * are pending for the tid
-		 */
-		if (ath_tid_has_buffered(tid))
-			ath_tx_queue_tid(sc, txq, tid);
+	if (active && tid->an->airtime_deficit <= 0) {
+		tid->an->airtime_deficit += 300;
+		list_move_tail(&tid->list, &acq->acq_old);
+		goto begin;
+	}
 
-		if (stop)
-			break;
+	if (!ath_tid_has_buffered(tid)) {
+		if ((tid_list == &acq->acq_new) && !list_empty(&acq->acq_old))
+			list_move_tail(&tid->list, &acq->acq_old);
+		else
+			list_del_init(&tid->list);
+		goto begin;
+	}
 
-		if (tid == last_tid) {
-			if (!sent)
-				break;
 
-			sent = false;
-			last_tid = list_entry(tid_list->prev,
-					      struct ath_atx_tid, list);
-		}
+	/* If a station succeeds in queueing something, immediately restart the
+	 * loop. This makes sure the queues are shuffled if the station now has
+	 * no more packets queued, and also ensures we keep the hardware queues
+	 * full.
+	 *
+	 * If we dequeued from a new queue, shuffle the queues, to prevent it
+	 * from hogging too much airtime. */
+	if(ath_tx_sched_aggr(sc, txq, tid)) {
+		if (!active || ((tid_list == &acq->acq_new) && !list_empty(&acq->acq_old)))
+			list_move_tail(&tid->list, &acq->acq_old);
+		goto begin;
 	}
 
+out:
 	rcu_read_unlock();
 	spin_unlock_bh(&sc->chan_lock);
 }
@@ -2931,6 +2981,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 	struct ath_atx_tid *tid;
 	int tidno, acno;
 
+	an->airtime_deficit = 300;
+
 	for (tidno = 0;
 	     tidno < IEEE80211_NUM_TIDS;
 	     tidno++) {
-- 
2.8.3

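The scheduling loop in ath_txq_schedule() above is easier to follow in a small userspace model. The sketch below is illustrative only and is not part of the patch. It assumes a single hardware queue, the scheduler always active (AIRTIME_ACTIVE true), a fixed airtime per packet for each station, and stations that never go idle and come back. It also charges airtime at dequeue time instead of at TX completion as ath_tx_count_airtime() does. The names sim_sta, fifo, sched_one() and sim_run() are hypothetical. The 300 us refill and the new/old list handling follow the patch.

```c
#include <assert.h>

#define QUANTUM_US 300 /* deficit refill, same value as the patch */
#define MAX_STA 8

/* Hypothetical station model: fixed airtime per packet, a backlog counter. */
struct sim_sta {
	int deficit;       /* mirrors an->airtime_deficit */
	int pkt_airtime;   /* airtime (us) charged per packet sent */
	int backlog;       /* packets still queued */
	long airtime_used; /* total airtime consumed */
};

/* Ring buffer of station indices, standing in for acq_new / acq_old. */
struct fifo {
	int idx[MAX_STA];
	int head, len;
};

static void fifo_push(struct fifo *q, int i)
{
	assert(q->len < MAX_STA);
	q->idx[(q->head + q->len) % MAX_STA] = i;
	q->len++;
}

static void fifo_pop(struct fifo *q)
{
	q->head = (q->head + 1) % MAX_STA;
	q->len--;
}

/* One pass of the scheduling loop: sends at most one packet and returns
 * the station index, or -1 once both lists are empty. */
static int sched_one(struct sim_sta *sta, struct fifo *new_q, struct fifo *old_q)
{
	for (;;) {
		struct fifo *q = new_q->len ? new_q : old_q;
		if (!q->len)
			return -1;

		int i = q->idx[q->head];
		struct sim_sta *s = &sta[i];

		if (s->deficit <= 0) {
			/* Out of credit: refill and rotate to the tail of old. */
			s->deficit += QUANTUM_US;
			fifo_pop(q);
			fifo_push(old_q, i);
			continue;
		}

		if (!s->backlog) {
			/* Nothing queued: demote a new flow to old, or drop it. */
			fifo_pop(q);
			if (q == new_q && old_q->len)
				fifo_push(old_q, i);
			continue;
		}

		/* "Transmit" one packet and charge its airtime. */
		s->backlog--;
		s->deficit -= s->pkt_airtime;
		s->airtime_used += s->pkt_airtime;

		/* Served from the new list while old flows wait: rotate it to
		 * old so it cannot hog airtime (as the patch does). */
		if (q == new_q && old_q->len) {
			fifo_pop(q);
			fifo_push(old_q, i);
		}
		return i;
	}
}

/* Put every station on the new list, then schedule up to n_tx packets. */
static void sim_run(struct sim_sta *sta, int n_sta, int n_tx)
{
	struct fifo new_q = { 0 }, old_q = { 0 };

	for (int i = 0; i < n_sta; i++)
		fifo_push(&new_q, i);
	for (int t = 0; t < n_tx; t++)
		if (sched_one(sta, &new_q, &old_q) < 0)
			break;
}
```

With one station at 100 us per packet and one at 1000 us, both permanently backlogged, the fast station ends up sending roughly ten times as many packets while the two airtime totals stay within a small bound set by the quantum and the largest per-packet airtime. Plain per-packet round robin would instead give the slow station about ten times the airtime.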
^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17  9:09 ` [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues Toke Høiland-Jørgensen
@ 2016-06-17 13:28   ` Felix Fietkau
  2016-06-17 13:43     ` Toke Høiland-Jørgensen
                       ` (2 more replies)
  2016-06-18 19:06   ` [ath9k-devel] [PATCH] ath9k: Switch to using " Toke Høiland-Jørgensen
  1 sibling, 3 replies; 50+ messages in thread
From: Felix Fietkau @ 2016-06-17 13:28 UTC (permalink / raw)
  To: ath9k-devel

On 2016-06-17 11:09, Toke Høiland-Jørgensen wrote:
> This patch leaves the code for ath9k's internal per-node per-tid
> queues in place and just modifies the driver to also pull from
> the new mac80211 intermediate software queues, and implements
> the .wake_tx_queue method, which will cause mac80211 to deliver
> packets to be sent via the new intermediate queue.
> 
> Signed-off-by: Tim Shepard <shep@alum.mit.edu>
> 
> Reworked to not require the global variable renaming in ath9k.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
>  drivers/net/wireless/ath/ath9k/ath9k.h     |  16 +++-
>  drivers/net/wireless/ath/ath9k/debug_sta.c |   7 +-
>  drivers/net/wireless/ath/ath9k/init.c      |   1 +
>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>  drivers/net/wireless/ath/ath9k/xmit.c      | 119 +++++++++++++++++++++++++----
>  5 files changed, 125 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
> index 93b3793..caeae10 100644
> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> @@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>  #define BAW_WITHIN(_start, _bawsz, _seqno) \
>  	((((_seqno) - (_start)) & 4095) < (_bawsz))
>  
> -#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
> -
>  #define IS_HT_RATE(rate)   (rate & 0x80)
>  #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
>  #define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
> @@ -232,8 +230,10 @@ struct ath_buf {
>  
>  struct ath_atx_tid {
>  	struct list_head list;
> +	struct sk_buff_head i_q;
Do we really need a third queue here? Instead of adding yet another
layer of queueing here, I think we should even get rid of buf_q.

Channel context based queue handling can be dealt with by
stopping/starting relevant queues on channel context changes.

buf_q becomes unnecessary when you remove all code in the drv_tx
codepath that moves frames to the intermediate queue.

Any frame that was pulled from the intermediate queue and prepared for
tx, but which can't be sent right now can simply be queued to retry_q.

This will also help with getting the diffstat insertion/deletion ratio
under control ;)

>  	struct sk_buff_head buf_q;
>  	struct sk_buff_head retry_q;
> +	struct ieee80211_txq *swq;
No need for this pointer, you can use container_of.

- Felix

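Felix's container_of suggestion works because mac80211 places the driver's per-txq data (sized by hw->txq_data_size) in the drv_priv area at the end of struct ieee80211_txq, so the enclosing txq can be recovered from the address of the driver struct and no back-pointer is needed. A userspace sketch with hypothetical mock structures (mock_txq, mock_tid) and a simplified container_of; the kernel's macro additionally type-checks its argument:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel macro (which also type-checks ptr). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical mock of struct ieee80211_txq: driver data lives at the end. */
struct mock_txq {
	int tid;
	_Alignas(void *) unsigned char drv_priv[];
};

/* Hypothetical mock of struct ath_atx_tid, without a swq back-pointer. */
struct mock_tid {
	int tidno;
};

/* txq -> driver struct: just a cast of the private area. */
static struct mock_tid *txq_to_tid(struct mock_txq *txq)
{
	return (struct mock_tid *)txq->drv_priv;
}

/* driver struct -> txq: replaces tid->swq by subtracting the offset. */
static struct mock_txq *tid_to_txq(struct mock_tid *tid)
{
	return container_of(tid, struct mock_txq, drv_priv);
}

/* Allocate a txq with room for the driver struct (cf. hw->txq_data_size). */
static struct mock_txq *mock_txq_alloc(int tidno)
{
	struct mock_txq *txq = calloc(1, sizeof(*txq) + sizeof(struct mock_tid));

	if (!txq)
		return NULL;
	txq->tid = tidno;
	txq_to_tid(txq)->tidno = tidno;
	return txq;
}
```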

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 13:28   ` Felix Fietkau
@ 2016-06-17 13:43     ` Toke Høiland-Jørgensen
  2016-06-17 13:48       ` Felix Fietkau
  2016-06-17 14:08     ` Tim Shepard
  2016-06-17 14:10     ` Dave Taht
  2 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-17 13:43 UTC (permalink / raw)
  To: ath9k-devel

Felix Fietkau <nbd@nbd.name> writes:

> On 2016-06-17 11:09, Toke Høiland-Jørgensen wrote:
>> This patch leaves the code for ath9k's internal per-node per-tid
>> queues in place and just modifies the driver to also pull from
>> the new mac80211 intermediate software queues, and implements
>> the .wake_tx_queue method, which will cause mac80211 to deliver
>> packets to be sent via the new intermediate queue.
>> 
>> Signed-off-by: Tim Shepard <shep@alum.mit.edu>
>> 
>> Reworked to not require the global variable renaming in ath9k.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> ---
>>  drivers/net/wireless/ath/ath9k/ath9k.h     |  16 +++-
>>  drivers/net/wireless/ath/ath9k/debug_sta.c |   7 +-
>>  drivers/net/wireless/ath/ath9k/init.c      |   1 +
>>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>>  drivers/net/wireless/ath/ath9k/xmit.c      | 119 +++++++++++++++++++++++++----
>>  5 files changed, 125 insertions(+), 19 deletions(-)
>> 
>> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
>> index 93b3793..caeae10 100644
>> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
>> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
>> @@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>>  #define BAW_WITHIN(_start, _bawsz, _seqno) \
>>  	((((_seqno) - (_start)) & 4095) < (_bawsz))
>>  
>> -#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
>> -
>>  #define IS_HT_RATE(rate)   (rate & 0x80)
>>  #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
>>  #define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
>> @@ -232,8 +230,10 @@ struct ath_buf {
>>  
>>  struct ath_atx_tid {
>>  	struct list_head list;
>> +	struct sk_buff_head i_q;
> Do we really need a third queue here? Instead of adding yet another
> layer of queueing here, I think we should even get rid of buf_q.

This is definitely something that needs to be improved. One other
sticking point related to this: in the current version of this patch
ath_tid_has_buffered() gains a side effect of pulling from the mac80211
txq, which is obviously not so nice.

The obvious way to get rid of this is to export a txq_has_buffered()
function at the mac80211 layer. But avoiding that may be possible; the
sticking point is what to do with the code paths that do not dequeue
packets, but check ath_tid_has_buffered() to decide whether to schedule
the queue and/or to tell ieee80211_sta_set_buffered() about it (these
are, for instance, ath_tx_aggr_sleep/wakeup()). Can those just be removed
(i.e., don't call into ieee80211, and always schedule the txq on wakeup)?
I'm not familiar enough with the intermediate queues to make that
call...


> Channel context based queue handling can be dealt with by
> stopping/starting relevant queues on channel context changes.

Noted.

> buf_q becomes unnecessary when you remove all code in the drv_tx
> codepath that moves frames to the intermediate queue.
>
> Any frame that was pulled from the intermediate queue and prepared for
> tx, but which can't be sent right now can simply be queued to retry_q.

Right.

> This will also help with getting the diffstat insertion/deletion ratio
> under control ;)

Yes, that would be good ;)

>>  	struct sk_buff_head buf_q;
>>  	struct sk_buff_head retry_q;
>> +	struct ieee80211_txq *swq;
> No need for this pointer, you can use container_of.

Ah, cool, thanks!

-Toke

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 13:43     ` Toke Høiland-Jørgensen
@ 2016-06-17 13:48       ` Felix Fietkau
  2016-06-17 16:33         ` Felix Fietkau
  0 siblings, 1 reply; 50+ messages in thread
From: Felix Fietkau @ 2016-06-17 13:48 UTC (permalink / raw)
  To: ath9k-devel

On 2016-06-17 15:43, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
> 
>> On 2016-06-17 11:09, Toke Høiland-Jørgensen wrote:
>>> This patch leaves the code for ath9k's internal per-node per-tid
>>> queues in place and just modifies the driver to also pull from
>>> the new mac80211 intermediate software queues, and implements
>>> the .wake_tx_queue method, which will cause mac80211 to deliver
>>> packets to be sent via the new intermediate queue.
>>> 
>>> Signed-off-by: Tim Shepard <shep@alum.mit.edu>
>>> 
>>> Reworked to not require the global variable renaming in ath9k.
>>> 
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>> ---
>>>  drivers/net/wireless/ath/ath9k/ath9k.h     |  16 +++-
>>>  drivers/net/wireless/ath/ath9k/debug_sta.c |   7 +-
>>>  drivers/net/wireless/ath/ath9k/init.c      |   1 +
>>>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>>>  drivers/net/wireless/ath/ath9k/xmit.c      | 119 +++++++++++++++++++++++++----
>>>  5 files changed, 125 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
>>> index 93b3793..caeae10 100644
>>> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
>>> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
>>> @@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>>>  #define BAW_WITHIN(_start, _bawsz, _seqno) \
>>>  	((((_seqno) - (_start)) & 4095) < (_bawsz))
>>>  
>>> -#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
>>> -
>>>  #define IS_HT_RATE(rate)   (rate & 0x80)
>>>  #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
>>>  #define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
>>> @@ -232,8 +230,10 @@ struct ath_buf {
>>>  
>>>  struct ath_atx_tid {
>>>  	struct list_head list;
>>> +	struct sk_buff_head i_q;
>> Do we really need a third queue here? Instead of adding yet another
>> layer of queueing here, I think we should even get rid of buf_q.
> 
> This is definitely something that needs to be improved. One other
> sticking point related to this: in the current version of this patch
> ath_tid_has_buffered() gains a side effect of pulling from the mac80211
> txq, which is obviously not so nice.
> 
> The obvious way to get rid of this is to export a txq_has_buffered()
> function at the mac80211 layer. But avoiding that may be possible; the
> sticking point is what to do with the code paths that do not dequeue
> packets, but check ath_tid_has_buffered() to decide whether to schedule
> the queue and/or to tell ieee80211_sta_set_buffered() about it (these
> are, for instance, ath_tx_aggr_sleep/wakeup()). Can those just be removed
> (i.e., don't call into ieee80211, and always schedule the txq on wakeup)?
> I'm not familiar enough with the intermediate queues to make that
> call...
For tx scheduling, we can use swq_nonempty and deal with false positives.
For power save we should only use ieee80211_sta_set_buffered if the
driver itself has buffered some frames. Indication of packets in the
mac80211 intermediate queue is already taken care of inside mac80211.

- Felix

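A minimal model of "use swq_nonempty and deal with false positives": the wake_tx_queue path only sets a hint bit (a flag along the lines of the has_queued field added in the follow-up patch), the scheduler treats the hint as "maybe has frames", and a pull that comes back empty clears it. All mock_* names are hypothetical; in the driver the pull would be ieee80211_tx_dequeue(). The hint is for scheduling only; per the above, ieee80211_sta_set_buffered() should reflect only frames the driver itself holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the mac80211 intermediate queue: just a count. */
struct mock_swq {
	int pending;
};

struct mock_tid {
	struct mock_swq *swq;
	bool has_queued; /* hint: "mac80211 may have frames for this tid" */
};

/* wake_tx_queue path: record the hint only, no dequeueing here. */
static void mock_wake_tx_queue(struct mock_tid *tid)
{
	tid->has_queued = true;
}

/* Scheduler check: cheap, side-effect free, may be a false positive. */
static bool tid_maybe_has_frames(const struct mock_tid *tid)
{
	return tid->has_queued;
}

/* Actual pull. An empty pull means the hint was stale, so clear it. */
static bool tid_pull(struct mock_tid *tid)
{
	if (tid->swq->pending > 0) {
		tid->swq->pending--;
		return true;
	}
	tid->has_queued = false;
	return false;
}
```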

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 13:28   ` Felix Fietkau
  2016-06-17 13:43     ` Toke Høiland-Jørgensen
@ 2016-06-17 14:08     ` Tim Shepard
  2016-06-17 14:35       ` Felix Fietkau
  2016-06-17 14:10     ` Dave Taht
  2 siblings, 1 reply; 50+ messages in thread
From: Tim Shepard @ 2016-06-17 14:08 UTC (permalink / raw)
  To: ath9k-devel



> > diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
> > index 93b3793..caeae10 100644
> > --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> > +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> > @@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
> >  #define BAW_WITHIN(_start, _bawsz, _seqno) \
> >  	((((_seqno) - (_start)) & 4095) < (_bawsz))
> >  
> > -#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
> > -
> >  #define IS_HT_RATE(rate)   (rate & 0x80)
> >  #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
> >  #define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
> > @@ -232,8 +230,10 @@ struct ath_buf {
> >  
> >  struct ath_atx_tid {
> >  	struct list_head list;
> > +	struct sk_buff_head i_q;
> Do we really need a third queue here? Instead of adding yet another
> layer of queueing here, I think we should even get rid of buf_q.
> 
> Channel context based queue handling can be dealt with by
> stopping/starting relevant queues on channel context changes.
> 
> buf_q becomes unnecessary when you remove all code in the drv_tx
> codepath that moves frames to the intermediate queue.
> 
> Any frame that was pulled from the intermediate queue and prepared for
> tx, but which can't be sent right now can simply be queued to retry_q.
> 
> This will also help with getting the diffstat insertion/deletion ratio
> under control ;)
> 
> >  	struct sk_buff_head buf_q;
> >  	struct sk_buff_head retry_q;
> > +	struct ieee80211_txq *swq;
> No need for this pointer, you can use container_of.


Felix, great to hear from you and thanks for your feedback.  I will
try to work on this.

I was struggling to understand the channel context stuff, and I have
no idea how to test it.  (Is there anyone else listening who might be
able to help with testing the channel context stuff as we improve this
patch and simplify the ath9k driver's use of the new mac80211
intermediate queues?)


Felix, do you have any thoughts on the renaming of txq to hwq that I
had done in my original version of this patch?  I had a good e-mail
discussion with Toke a week or two ago (cc these same various lists)
and I believe he came to understand that perhaps the renaming I had
done in the original version of this patch was worth doing.

Now in Toke's version of my patch he calls the ieee80211 txq a "swq"
and the ath9k hardware queue is called a "txq". (I had called the
ieee80211 txq a "txq" and I renamed the ath9k hardware queue "hwq"
throughout all the ath9k driver code.) This also made ath9k's names of
things more similar to mt76, which I was looking at as an example of a
driver that uses your new ieee80211 txq mechanism.

I think the renaming is worth doing, but I also understand the
renaming can be disruptive to others actively working on ath9k.
It would be nice to have another opinion on this.


			-Tim Shepard
			 shep at alum.mit.edu

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 13:28   ` Felix Fietkau
  2016-06-17 13:43     ` Toke Høiland-Jørgensen
  2016-06-17 14:08     ` Tim Shepard
@ 2016-06-17 14:10     ` Dave Taht
  2 siblings, 0 replies; 50+ messages in thread
From: Dave Taht @ 2016-06-17 14:10 UTC (permalink / raw)
  To: ath9k-devel

>>  struct ath_atx_tid {
>>       struct list_head list;
>> +     struct sk_buff_head i_q;
> Do we really need a third queue here? Instead of adding yet another
> layer of queueing here, I think we should even get rid of buf_q.

Less queues, more filling!

>
> Channel context based queue handling can be dealt with by
> stopping/starting relevant queues on channel context changes.

What can be done to reduce the impact of channel scans?

http://blog.cerowrt.org/post/disabling_channel_scans/

> buf_q becomes unnecessary when you remove all code in the drv_tx
> codepath that moves frames to the intermediate queue.
>
> Any frame that was pulled from the intermediate queue and prepared for
> tx, but which can't be sent right now can simply be queued to retry_q.
>
> This will also help with getting the diffstat insertion/deletion ratio
> under control ;)

The ideas here can apply elsewhere, also. Are you still actively
working with the mt76?

Anything else "out there" besides that and the ath5k worth looking at?

Am I seeing patches and firmware changes for better statistic keeping
on the ath10k that look promising for airtime fairness... or am I
delusional?

> elsewhere powersave was mentioned

How big can a powersave queue get?

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 14:08     ` Tim Shepard
@ 2016-06-17 14:35       ` Felix Fietkau
  2016-06-17 17:45         ` Tim Shepard
  0 siblings, 1 reply; 50+ messages in thread
From: Felix Fietkau @ 2016-06-17 14:35 UTC (permalink / raw)
  To: ath9k-devel

On 2016-06-17 15:41, Tim Shepard wrote:
>> > diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
>> > index 93b3793..caeae10 100644
>> > --- a/drivers/net/wireless/ath/ath9k/ath9k.h
>> > +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
>> > @@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>> >  #define BAW_WITHIN(_start, _bawsz, _seqno) \
>> >  	((((_seqno) - (_start)) & 4095) < (_bawsz))
>> >  
>> > -#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
>> > -
>> >  #define IS_HT_RATE(rate)   (rate & 0x80)
>> >  #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
>> >  #define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
>> > @@ -232,8 +230,10 @@ struct ath_buf {
>> >  
>> >  struct ath_atx_tid {
>> >  	struct list_head list;
>> > +	struct sk_buff_head i_q;
>> Do we really need a third queue here? Instead of adding yet another
>> layer of queueing here, I think we should even get rid of buf_q.
>> 
>> Channel context based queue handling can be dealt with by
>> stopping/starting relevant queues on channel context changes.
>> 
>> buf_q becomes unnecessary when you remove all code in the drv_tx
>> codepath that moves frames to the intermediate queue.
>> 
>> Any frame that was pulled from the intermediate queue and prepared for
>> tx, but which can't be sent right now can simply be queued to retry_q.
>> 
>> This will also help with getting the diffstat insertion/deletion ratio
>> under control ;)
>> 
>> >  	struct sk_buff_head buf_q;
>> >  	struct sk_buff_head retry_q;
>> > +	struct ieee80211_txq *swq;
>> No need for this pointer, you can use container_of.
> 
> 
> Felix, great to hear from you and thanks for your feedback.  I will
> try to work on this.
> 
> I was struggling to understand the channel context stuff, and I have
> no idea how to test it.  (Is there anyone else listening who might be
> able to help with testing the channel context stuff as we improve this
> patch and simplify the ath9k driver's use of the new mac80211
> intermediate queues?)
> 
> 
> Felix, do you have any thoughts on the renaming of txq to hwq that I
> had done in my original version of this patch?  I had a good e-mail
> discussion with Toke a week or two ago (cc these same various lists)
> and I believe he came to understand that perhaps the renaming I had
> done in the original version of this patch was worth doing.
> 
> Now in Toke's version of my patch he calls the ieee80211 txq a "swq"
> and the ath9k hardware queue is called a "txq". (I had called the
> ieee80211 txq a "txq" and I renamed the ath9k hardware queue "hwq"
> throughout all the ath9k driver code.) This also made ath9k's names of
> things more similar to mt76, which I was looking at as an example of a
> driver that uses your new ieee80211 txq mechanism.
> 
> I think the renaming is worth doing, but I also understand the
> renaming can be disruptive to others actively working on ath9k.
> It would be nice to have another opinion on this.
I think we should finish intermediate queues support first and then look
into the rename later.

- Felix

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 13:48       ` Felix Fietkau
@ 2016-06-17 16:33         ` Felix Fietkau
  0 siblings, 0 replies; 50+ messages in thread
From: Felix Fietkau @ 2016-06-17 16:33 UTC (permalink / raw)
  To: ath9k-devel

On 2016-06-17 15:48, Felix Fietkau wrote:
> On 2016-06-17 15:43, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>> 
>>> On 2016-06-17 11:09, Toke Høiland-Jørgensen wrote:
>>>> This patch leaves the code for ath9k's internal per-node per-tid
>>>> queues in place and just modifies the driver to also pull from
>>>> the new mac80211 intermediate software queues, and implements
>>>> the .wake_tx_queue method, which will cause mac80211 to deliver
>>>> packets to be sent via the new intermediate queue.
>>>> 
>>>> Signed-off-by: Tim Shepard <shep@alum.mit.edu>
>>>> 
>>>> Reworked to not require the global variable renaming in ath9k.
>>>> 
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>> ---
>>>>  drivers/net/wireless/ath/ath9k/ath9k.h     |  16 +++-
>>>>  drivers/net/wireless/ath/ath9k/debug_sta.c |   7 +-
>>>>  drivers/net/wireless/ath/ath9k/init.c      |   1 +
>>>>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>>>>  drivers/net/wireless/ath/ath9k/xmit.c      | 119 +++++++++++++++++++++++++----
>>>>  5 files changed, 125 insertions(+), 19 deletions(-)
>>>> 
>>>> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
>>>> index 93b3793..caeae10 100644
>>>> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
>>>> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
>>>> @@ -145,8 +145,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>>>>  #define BAW_WITHIN(_start, _bawsz, _seqno) \
>>>>  	((((_seqno) - (_start)) & 4095) < (_bawsz))
>>>>  
>>>> -#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
>>>> -
>>>>  #define IS_HT_RATE(rate)   (rate & 0x80)
>>>>  #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
>>>>  #define IS_OFDM_RATE(rate) ((rate >= 0x8) && (rate <= 0xf))
>>>> @@ -232,8 +230,10 @@ struct ath_buf {
>>>>  
>>>>  struct ath_atx_tid {
>>>>  	struct list_head list;
>>>> +	struct sk_buff_head i_q;
>>> Do we really need a third queue here? Instead of adding yet another
>>> layer of queueing here, I think we should even get rid of buf_q.
>> 
>> This is definitely something that needs to be improved. One other
>> sticking point related to this: in the current version of this patch
>> ath_tid_has_buffered() gains a side effect of pulling from the mac80211
>> txq, which is obviously not so nice.
>> 
>> The obvious way to get rid of this is to export a txq_has_buffered()
>> function at the mac80211 layer. But avoiding that may be possible; the
>> sticking point is what to do with the code paths that do not dequeue
>> packets, but check ath_tid_has_buffered() to decide whether to schedule
>> the queue and/or to tell ieee80211_sta_set_buffered() about it (these
>> are, for instance, ath_tx_aggr_sleep/wakeup()). Can those just be removed
>> (i.e., don't call into ieee80211, and always schedule the txq on wakeup)?
>> I'm not familiar enough with the intermediate queues to make that
>> call...
> For tx scheduling, we can use swq_nonempty and deal with false positives.
> For power save we should only use ieee80211_sta_set_buffered if the
> driver itself has buffered some frames. Indication of packets in the
> mac80211 intermediate queue is already taken care of inside mac80211.
One more thing that I forgot in my previous reply: on PS wakeup, the
driver does not need to schedule the intermediate queues itself -
mac80211 will call drv_wake_tx_queue if frames are pending.

- Felix

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 14:35       ` Felix Fietkau
@ 2016-06-17 17:45         ` Tim Shepard
  2016-06-17 19:16           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Tim Shepard @ 2016-06-17 17:45 UTC (permalink / raw)
  To: ath9k-devel



> I think we should finish intermediate queues support first and then look
> into the rename later.


Hmm... if the renaming is going to go in mainline, I feel pretty
strongly it should go in *before* a patch to switch over to use the
intermediate queues.  The whole point of the renaming was to make the
code that uses the intermediate queues much more understandable
(avoiding the unfortunate collision of "txq" meaning two different
things throughout the code).

Once it is all done and everyone's done reading and trying to
understand this code, there's much less reason to do the renaming.


Toke, how do you feel about this at this point?

I'm asking because I hope to have a new version of my patch soon
(fixing a bug in how it handles tid->hwq->pending_frames and
hq_max_pending[*]), and I need to decide whether I should do it the
way I did last time (renaming txq in ath9k first) or produce
a new patch that is more like Toke's reworking of my patch.

Hmm...

			-Tim Shepard
			 shep at alum.mit.edu

* [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues
  2016-06-17 17:45         ` Tim Shepard
@ 2016-06-17 19:16           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-17 19:16 UTC (permalink / raw)
  To: ath9k-devel

Tim Shepard <shep@alum.mit.edu> writes:

> Hmm... if the renaming is going to go in mainline, I feel pretty
> strongly it should go in *before* a patch to switch over to use the
> intermediate queues.  The whole point of the renaming was to make the
> code that uses the intermediate queues much more understandable
> (avoiding the unfortunate collision of "txq" meaning two different
> things throughout the code).
>
> Once it is all done and everyone's done reading and trying to
> understand this code, there's much less reason to do the renaming.
>
> Toke, how do you feel about this at this point?

I'm fine with not renaming things for now. Been looking at the current
code enough that it doesn't bother me.

Oh, and you can hide most of the ieee80211_txq stuff behind macros, so
it doesn't have to be all over the code. Makes the patch set smaller
too...
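One way to hide the ieee80211_txq plumbing behind macros, mirroring the ATH_STA_2_TID / ATH_VIF_2_TID / ATH_AN_2_TID definitions in the follow-up patch: a station has one txq per TID, while a vif has a single txq that is used regardless of the TID number. This is a sketch with hypothetical mock structs; the driver data is modeled as a plain member instead of the real drv_priv byte area.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TIDS 16 /* IEEE80211_NUM_TIDS is 16 */

/* Hypothetical mocks: drv_priv is a plain member here, not a byte area. */
struct mock_tid { int tidno; };
struct mock_txq { struct mock_tid drv_priv; };
struct mock_sta { struct mock_txq *txq[NUM_TIDS]; };
struct mock_vif { struct mock_txq *txq; };
struct mock_node {
	struct mock_sta *sta; /* NULL for a node that belongs to a vif only */
	struct mock_vif *vif;
};

#define STA_2_TID(_sta, _tidno) (&(_sta)->txq[_tidno]->drv_priv)
#define VIF_2_TID(_vif)         (&(_vif)->txq->drv_priv)
/* Stations have one txq per TID; a vif has a single txq for all TIDs. */
#define AN_2_TID(_an, _tidno) \
	((_an)->sta ? STA_2_TID((_an)->sta, _tidno) : VIF_2_TID((_an)->vif))
```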

> I'm asking because I hope to have a new version of my patch soon
> (fixing a bug in how it handles tid->hwq->pending_frames and
> hq_max_pending[*]),

Cool. I started looking into what it will take to do a full conversion
(getting rid of the old TX path). Not quite there yet (to say the
least), so if you have a less buggy base I can work from, that would be
cool ;)

-Toke

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-06-17  9:09 ` [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues Toke Høiland-Jørgensen
  2016-06-17 13:28   ` Felix Fietkau
@ 2016-06-18 19:06   ` Toke Høiland-Jørgensen
  2016-06-19  3:17     ` Tim Shepard
                       ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-18 19:06 UTC (permalink / raw)
  To: ath9k-devel

This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.
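The pull model described above can be sketched in userspace: a frame is taken from the retry queue first and only then from the (mock) mac80211 queue. A frame that cannot go out yet is parked at the head of the retry queue so ordering is preserved; here "cannot go out" is modeled as a sequence number at or beyond a hypothetical window_end limit (e.g. a full block-ack window). All names are hypothetical; in the driver the pull would be ieee80211_tx_dequeue() and the retry queue is tid->retry_q.

```c
#include <assert.h>

#define RQ_MAX 64

/* Hypothetical TID model: the mac80211 queue holds seq next_seq..end_seq-1,
 * retry[] is a ring buffer standing in for tid->retry_q. */
struct mock_tid {
	int next_seq, end_seq;
	int retry[RQ_MAX];
	int retry_head, retry_len;
};

static void retry_push_head(struct mock_tid *tid, int seq)
{
	assert(tid->retry_len < RQ_MAX);
	tid->retry_head = (tid->retry_head + RQ_MAX - 1) % RQ_MAX;
	tid->retry[tid->retry_head] = seq;
	tid->retry_len++;
}

/* Try to send one frame; returns its seq, or -1 if nothing went out.
 * window_end is a hypothetical "may not send seq >= this" limit. */
static int tid_try_send(struct mock_tid *tid, int window_end)
{
	int seq;

	if (tid->retry_len > 0) {
		/* Frames already pulled from mac80211 go first. */
		seq = tid->retry[tid->retry_head];
		tid->retry_head = (tid->retry_head + 1) % RQ_MAX;
		tid->retry_len--;
	} else if (tid->next_seq < tid->end_seq) {
		seq = tid->next_seq++; /* stands in for ieee80211_tx_dequeue() */
	} else {
		return -1;
	}

	if (seq >= window_end) {
		/* Pulled but not sendable yet: park it at the head of retry. */
		retry_push_head(tid, seq);
		return -1;
	}
	return seq;
}
```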

The old code path in ath_tx_start that would queue packets has been
disabled and turned into a WARN_ON() and failure. I figure it can be
removed in a v2 (or kept and removed later).

Based on Tim's original patch set, but reworked quite thoroughly.

Cc: Tim Shepard <shep@alum.mit.edu>
Cc: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 drivers/net/wireless/ath/ath9k/ath9k.h     |   8 +-
 drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
 drivers/net/wireless/ath/ath9k/init.c      |   1 +
 drivers/net/wireless/ath/ath9k/main.c      |   1 +
 drivers/net/wireless/ath/ath9k/xmit.c      | 254 ++++++++++++++---------------
 5 files changed, 129 insertions(+), 139 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 5294595..b9cdf20 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -145,7 +145,9 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define BAW_WITHIN(_start, _bawsz, _seqno) \
 	((((_seqno) - (_start)) & 4095) < (_bawsz))
 
-#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
+#define ATH_STA_2_TID(_sta, _tidno) ((struct ath_atx_tid *)(_sta)->txq[_tidno]->drv_priv)
+#define ATH_VIF_2_TID(_vif) ((struct ath_atx_tid *)(_vif)->txq->drv_priv)
+#define ATH_AN_2_TID(_an, _tidno) ((_an)->sta ? ATH_STA_2_TID((_an)->sta, _tidno) : ATH_VIF_2_TID((_an)->vif))
 
 #define IS_HT_RATE(rate)   (rate & 0x80)
 #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
@@ -232,7 +234,6 @@ struct ath_buf {
 
 struct ath_atx_tid {
 	struct list_head list;
-	struct sk_buff_head buf_q;
 	struct sk_buff_head retry_q;
 	struct ath_node *an;
 	struct ath_txq *txq;
@@ -247,13 +248,13 @@ struct ath_atx_tid {
 	s8 bar_index;
 	bool active;
 	bool clear_ps_filter;
+	bool has_queued;
 };
 
 struct ath_node {
 	struct ath_softc *sc;
 	struct ieee80211_sta *sta; /* station struct we're part of */
 	struct ieee80211_vif *vif; /* interface with which we're associated */
-	struct ath_atx_tid tid[IEEE80211_NUM_TIDS];
 
 	u16 maxampdu;
 	u8 mpdudensity;
@@ -585,6 +586,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 				   u16 tids, int nframes,
 				   enum ieee80211_frame_release_type reason,
 				   bool more_data);
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue);
 
 /********/
 /* VIFs */
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index c2ca57a..d789798 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -52,8 +52,8 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
 			 "TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
 			 "BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 		ath_txq_lock(sc, txq);
 		if (tid->active) {
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 1c226d6..1434018 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -867,6 +867,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
 	hw->max_rate_tries = 10;
 	hw->sta_data_size = sizeof(struct ath_node);
 	hw->vif_data_size = sizeof(struct ath_vif);
+	hw->txq_data_size = sizeof(struct ath_atx_tid);
 	hw->extra_tx_headroom = 4;
 
 	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 3aed43a..f584e19 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2673,4 +2673,5 @@ struct ieee80211_ops ath9k_ops = {
 	.sw_scan_start	    = ath9k_sw_scan_start,
 	.sw_scan_complete   = ath9k_sw_scan_complete,
 	.get_txpower        = ath9k_get_txpower,
+	.wake_tx_queue      = ath9k_wake_tx_queue,
 };
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index fe795fc..81fd480 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
 					   struct ath_txq *txq,
 					   struct ath_atx_tid *tid,
 					   struct sk_buff *skb);
+static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
+			  struct ath_tx_control *txctl);
 
 enum {
 	MCS_HT20,
@@ -118,6 +120,26 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
 		list_add_tail(&tid->list, list);
 }
 
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue)
+{
+	struct ath_softc *sc = hw->priv;
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	struct ath_atx_tid *tid = (struct ath_atx_tid *) queue->drv_priv;
+	struct ath_txq *txq = tid->txq;
+
+	ath_dbg(common, QUEUE, "Waking TX queue: %pM (%d)\n",
+		queue->sta ? queue->sta->addr : queue->vif->addr,
+		tid->tidno);
+
+	ath_txq_lock(sc, txq);
+
+	tid->has_queued = true;
+	ath_tx_queue_tid(sc, txq, tid);
+	ath_txq_schedule(sc, txq);
+
+	ath_txq_unlock(sc, txq);
+}
+
 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
 {
 	struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
@@ -173,9 +195,47 @@ ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
 	return ATH_AN_2_TID(an, tidno);
 }
 
+static struct sk_buff *
+ath_tid_pull(struct ath_atx_tid *tid)
+{
+	struct ath_softc *sc = tid->an->sc;
+	struct ieee80211_hw *hw = sc->hw;
+	struct ath_tx_control txctl = {
+		.txq = tid->txq,
+		.sta = tid->an->sta,
+	};
+	struct sk_buff *skb;
+	struct ath_frame_info *fi;
+	int q;
+
+	if (!tid->has_queued)
+		return NULL;
+
+	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
+	if (!skb) {
+		tid->has_queued = false;
+		return NULL;
+	}
+
+	if (ath_tx_prepare(hw, skb, &txctl)) {
+		ieee80211_free_txskb(hw, skb);
+		return NULL;
+	}
+
+	q = skb_get_queue_mapping(skb);
+	if (tid->txq == sc->tx.txq_map[q]) {
+		fi = get_frame_info(skb);
+		fi->txq = q;
+		++tid->txq->pending_frames;
+	}
+
+	return skb;
+ }
+
+
 static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
 {
-	return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
+	return !skb_queue_empty(&tid->retry_q) || tid->has_queued;
 }
 
 static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
@@ -184,46 +244,11 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
 
 	skb = __skb_dequeue(&tid->retry_q);
 	if (!skb)
-		skb = __skb_dequeue(&tid->buf_q);
+		skb = ath_tid_pull(tid);
 
 	return skb;
 }
 
-/*
- * ath_tx_tid_change_state:
- * - clears a-mpdu flag of previous session
- * - force sequence number allocation to fix next BlockAck Window
- */
-static void
-ath_tx_tid_change_state(struct ath_softc *sc, struct ath_atx_tid *tid)
-{
-	struct ath_txq *txq = tid->txq;
-	struct ieee80211_tx_info *tx_info;
-	struct sk_buff *skb, *tskb;
-	struct ath_buf *bf;
-	struct ath_frame_info *fi;
-
-	skb_queue_walk_safe(&tid->buf_q, skb, tskb) {
-		fi = get_frame_info(skb);
-		bf = fi->bf;
-
-		tx_info = IEEE80211_SKB_CB(skb);
-		tx_info->flags &= ~IEEE80211_TX_CTL_AMPDU;
-
-		if (bf)
-			continue;
-
-		bf = ath_tx_setup_buffer(sc, txq, tid, skb);
-		if (!bf) {
-			__skb_unlink(skb, &tid->buf_q);
-			ath_txq_skb_done(sc, txq, skb);
-			ieee80211_free_txskb(sc->hw, skb);
-			continue;
-		}
-	}
-
-}
-
 static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
 {
 	struct ath_txq *txq = tid->txq;
@@ -858,7 +883,7 @@ static int ath_compute_num_delims(struct ath_softc *sc, struct ath_atx_tid *tid,
 
 static struct ath_buf *
 ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
-			struct ath_atx_tid *tid, struct sk_buff_head **q)
+			struct ath_atx_tid *tid)
 {
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
@@ -867,11 +892,7 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	u16 seqno;
 
 	while (1) {
-		*q = &tid->retry_q;
-		if (skb_queue_empty(*q))
-			*q = &tid->buf_q;
-
-		skb = skb_peek(*q);
+		skb = ath_tid_dequeue(tid);
 		if (!skb)
 			break;
 
@@ -883,7 +904,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 			bf->bf_state.stale = false;
 
 		if (!bf) {
-			__skb_unlink(skb, *q);
 			ath_txq_skb_done(sc, txq, skb);
 			ieee80211_free_txskb(sc->hw, skb);
 			continue;
@@ -912,8 +932,18 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 		seqno = bf->bf_state.seqno;
 
 		/* do not step over block-ack window */
-		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
+		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
+			__skb_queue_tail(&tid->retry_q, skb);
+
+			/* If there are other skbs in the retry q, they are
+			 * probably within the BAW, so loop immediately to get
+			 * one of them. Otherwise the queue can get stuck.
+			 *
+			 * FIXME: Do we need to protect against looping forever? */
+			if (!skb_queue_is_first(&tid->retry_q, skb))
+				continue;
 			break;
+		}
 
 		if (tid->bar_index > ATH_BA_INDEX(tid->seq_start, seqno)) {
 			struct ath_tx_status ts = {};
@@ -921,7 +951,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 
 			INIT_LIST_HEAD(&bf_head);
 			list_add(&bf->list, &bf_head);
-			__skb_unlink(skb, *q);
 			ath_tx_update_baw(sc, tid, seqno);
 			ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0);
 			continue;
@@ -933,11 +962,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	return NULL;
 }
 
-static bool
+static int
 ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		 struct ath_atx_tid *tid, struct list_head *bf_q,
-		 struct ath_buf *bf_first, struct sk_buff_head *tid_q,
-		 int *aggr_len)
+		 struct ath_buf *bf_first)
 {
 #define PADBYTES(_len) ((4 - ((_len) % 4)) % 4)
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
@@ -947,12 +975,13 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
 	struct sk_buff *skb;
-	bool closed = false;
+
 
 	bf = bf_first;
 	aggr_limit = ath_lookup_rate(sc, bf, tid);
 
-	do {
+	while (bf)
+	{
 		skb = bf->bf_mpdu;
 		fi = get_frame_info(skb);
 
@@ -961,12 +990,12 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes) {
 			if (aggr_limit < al + bpad + al_delta ||
 			    ath_lookup_legacy(bf) || nframes >= h_baw)
-				break;
+				goto stop;
 
 			tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 			if ((tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) ||
 			    !(tx_info->flags & IEEE80211_TX_CTL_AMPDU))
-				break;
+				goto stop;
 		}
 
 		/* add padding for previous frame to aggregation length */
@@ -988,20 +1017,18 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 			ath_tx_addto_baw(sc, tid, bf);
 		bf->bf_state.ndelim = ndelim;
 
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
 
 		bf_prev = bf;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
-		if (!bf) {
-			closed = true;
-			break;
-		}
-	} while (ath_tid_has_buffered(tid));
-
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
+	}
+	goto finish;
+stop:
+	__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
+finish:
 	bf = bf_first;
 	bf->bf_lastbf = bf_prev;
 
@@ -1012,9 +1039,7 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		TX_STAT_INC(txq->axq_qnum, a_aggr);
 	}
 
-	*aggr_len = al;
-
-	return closed;
+	return al;
 #undef PADBYTES
 }
 
@@ -1391,18 +1416,15 @@ static void ath_tx_fill_desc(struct ath_softc *sc, struct ath_buf *bf,
 static void
 ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		  struct ath_atx_tid *tid, struct list_head *bf_q,
-		  struct ath_buf *bf_first, struct sk_buff_head *tid_q)
+		  struct ath_buf *bf_first)
 {
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
-	struct sk_buff *skb;
 	int nframes = 0;
 
 	do {
 		struct ieee80211_tx_info *tx_info;
-		skb = bf->bf_mpdu;
 
 		nframes++;
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
@@ -1411,13 +1433,15 @@ ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes >= 2)
 			break;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
 		if (!bf)
 			break;
 
 		tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
-		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU)
+		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU) {
+			__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 			break;
+		}
 
 		ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	} while (1);
@@ -1428,34 +1452,33 @@ static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
 {
 	struct ath_buf *bf;
 	struct ieee80211_tx_info *tx_info;
-	struct sk_buff_head *tid_q;
 	struct list_head bf_q;
 	int aggr_len = 0;
-	bool aggr, last = true;
+	bool aggr;
 
 	if (!ath_tid_has_buffered(tid))
 		return false;
 
 	INIT_LIST_HEAD(&bf_q);
 
-	bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+	bf = ath_tx_get_tid_subframe(sc, txq, tid);
 	if (!bf)
 		return false;
 
 	tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 	aggr = !!(tx_info->flags & IEEE80211_TX_CTL_AMPDU);
 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
-		(!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 		*stop = true;
 		return false;
 	}
 
 	ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	if (aggr)
-		last = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf,
-					tid_q, &aggr_len);
+		aggr_len = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf);
 	else
-		ath_tx_form_burst(sc, txq, tid, &bf_q, bf, tid_q);
+		ath_tx_form_burst(sc, txq, tid, &bf_q, bf);
 
 	if (list_empty(&bf_q))
 		return false;
@@ -1498,9 +1521,6 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
 		an->mpdudensity = density;
 	}
 
-	/* force sequence number allocation for pending frames */
-	ath_tx_tid_change_state(sc, txtid);
-
 	txtid->active = true;
 	*ssn = txtid->seq_start = txtid->seq_next;
 	txtid->bar_index = -1;
@@ -1525,7 +1545,6 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
 	ath_txq_lock(sc, txq);
 	txtid->active = false;
 	ath_tx_flush_tid(sc, txtid);
-	ath_tx_tid_change_state(sc, txtid);
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1535,14 +1554,12 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
-	bool buffered;
 	int tidno;
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -1552,13 +1569,9 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 			continue;
 		}
 
-		buffered = ath_tid_has_buffered(tid);
-
 		list_del_init(&tid->list);
 
 		ath_txq_unlock(sc, txq);
-
-		ieee80211_sta_set_buffered(sta, tidno, buffered);
 	}
 }
 
@@ -1571,19 +1584,12 @@ void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
 		tid->clear_ps_filter = true;
-
-		if (ath_tid_has_buffered(tid)) {
-			ath_tx_queue_tid(sc, txq, tid);
-			ath_txq_schedule(sc, txq);
-		}
-
 		ath_txq_unlock_complete(sc, txq);
 	}
 }
@@ -1606,11 +1612,6 @@ void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
 
 	tid->baw_size = IEEE80211_MIN_AMPDU_BUF << sta->ht_cap.ampdu_factor;
 
-	if (ath_tid_has_buffered(tid)) {
-		ath_tx_queue_tid(sc, txq, tid);
-		ath_txq_schedule(sc, txq);
-	}
-
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1626,7 +1627,6 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 	struct ieee80211_tx_info *info;
 	struct list_head bf_q;
 	struct ath_buf *bf_tail = NULL, *bf;
-	struct sk_buff_head *tid_q;
 	int sent = 0;
 	int i;
 
@@ -1641,11 +1641,10 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 
 		ath_txq_lock(sc, tid->txq);
 		while (nframes > 0) {
-			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid, &tid_q);
+			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid);
 			if (!bf)
 				break;
 
-			__skb_unlink(bf->bf_mpdu, tid_q);
 			list_add_tail(&bf->list, &bf_q);
 			ath_set_rates(tid->an->vif, tid->an->sta, bf);
 			if (bf_isampdu(bf)) {
@@ -1660,7 +1659,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 			sent++;
 			TX_STAT_INC(txq->axq_qnum, a_queued_hw);
 
-			if (an->sta && !ath_tid_has_buffered(tid))
+			if (an->sta && skb_queue_empty(&tid->retry_q))
 				ieee80211_sta_set_buffered(an->sta, i, false);
 		}
 		ath_txq_unlock_complete(sc, tid->txq);
@@ -2349,30 +2348,13 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 		skip_uapsd = true;
 	}
 
-	if (txctl->an && queue)
-		tid = ath_get_skb_tid(sc, txctl->an, skb);
-
 	if (!skip_uapsd && ps_resp) {
 		ath_txq_unlock(sc, txq);
 		txq = sc->tx.uapsdq;
 		ath_txq_lock(sc, txq);
-	} else if (txctl->an && queue) {
-		WARN_ON(tid->txq != txctl->txq);
-
-		if (info->flags & IEEE80211_TX_CTL_CLEAR_PS_FILT)
-			tid->clear_ps_filter = true;
-
-		/*
-		 * Add this frame to software queue for scheduling later
-		 * for aggregation.
-		 */
-		TX_STAT_INC(txq->axq_qnum, a_queued_sw);
-		__skb_queue_tail(&tid->buf_q, skb);
-		if (!txctl->an->sleeping)
-			ath_tx_queue_tid(sc, txq, tid);
-
-		ath_txq_schedule(sc, txq);
-		goto out;
+	} else if(WARN_ON(txctl->an && queue)) {
+		ath_txq_unlock(sc, txq);
+		return -EINVAL;
 	}
 
 	bf = ath_tx_setup_buffer(sc, txq, tid, skb);
@@ -2856,9 +2838,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 	struct ath_atx_tid *tid;
 	int tidno, acno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS;
-	     tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		tid->an        = an;
 		tid->tidno     = tidno;
 		tid->seq_start = tid->seq_next = 0;
@@ -2866,11 +2847,14 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 		tid->baw_head  = tid->baw_tail = 0;
 		tid->active	   = false;
 		tid->clear_ps_filter = true;
-		__skb_queue_head_init(&tid->buf_q);
+		tid->has_queued  = false;
 		__skb_queue_head_init(&tid->retry_q);
 		INIT_LIST_HEAD(&tid->list);
 		acno = TID_TO_WME_AC(tidno);
 		tid->txq = sc->tx.txq_map[acno];
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
@@ -2880,9 +2864,8 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 	struct ath_txq *txq;
 	int tidno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -2894,6 +2877,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 		tid->active = false;
 
 		ath_txq_unlock(sc, txq);
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
-- 
2.8.3

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-06-18 19:06   ` [ath9k-devel] [PATCH] ath9k: Switch to using " Toke Høiland-Jørgensen
@ 2016-06-19  3:17     ` Tim Shepard
  2016-06-19  8:52       ` Toke Høiland-Jørgensen
  2016-07-03  3:53     ` Tim Shepard
  2016-07-06 16:17     ` [ath9k-devel] [PATCH v2] " Toke Høiland-Jørgensen
  2 siblings, 1 reply; 50+ messages in thread
From: Tim Shepard @ 2016-06-19  3:17 UTC (permalink / raw)
  To: ath9k-devel




Oh cool.. I will try to understand this patch thoroughly in the next
couple of days.

My patch (both v1 and v2) has one or two bugs that I know of (depending
on exactly how you count bugs and/or my confusion).

At first glance my first bug appears to remain in your reworked patch:

>  
> +static struct sk_buff *
> +ath_tid_pull(struct ath_atx_tid *tid)
> +{
> +	struct ath_softc *sc = tid->an->sc;
> +	struct ieee80211_hw *hw = sc->hw;
> +	struct ath_tx_control txctl = {
> +		.txq = tid->txq,
> +		.sta = tid->an->sta,
> +	};
> +	struct sk_buff *skb;
> +	struct ath_frame_info *fi;
> +	int q;
> +
> +	if (!tid->has_queued)
> +		return NULL;
> +
> +	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
> +	if (!skb) {
> +		tid->has_queued = false;
> +		return NULL;
> +	}
> +
> +	if (ath_tx_prepare(hw, skb, &txctl)) {
> +		ieee80211_free_txskb(hw, skb);
> +		return NULL;
> +	}
> +
> +	q = skb_get_queue_mapping(skb);
> +	if (tid->txq == sc->tx.txq_map[q]) {
> +		fi = get_frame_info(skb);
> +		fi->txq = q;
> +		++tid->txq->pending_frames;
> +	}
> +
> +	return skb;
> + }
> +
> +

The increment of ->pending_frames lacks a corresponding check against
sc->tx.txq_max_pending to see if we've reached the limit.  (Which begs
the question: what to do if it has?)

I discovered this bug doing experiments by trying to turn down the
various /sys/kernel/debug/ieee80211/phy0/ath9k/qlen_* to low numbers
(including as low as one, and then even zero) and found it had no
effect.
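Here is one possible answer to "what to do if it has?", sketched in userspace. It assumes the driver simply stops pulling once the limit is reached. Frames then stay in the upstream (mac80211) queue instead of being dropped. The counters mirror pending_frames and txq_max_pending, but the toy_* names are hypothetical and this is not ath9k code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for one hardware queue. */
struct toy_hwq {
	int upstream;     /* frames waiting in the (modeled) mac80211 txq */
	int pending;      /* frames handed to the driver, not yet completed */
	int max_pending;  /* limit, as a qlen_* knob would set it */
};

/* May another frame be pulled from upstream right now? */
static bool toy_may_pull(const struct toy_hwq *q)
{
	return q->pending < q->max_pending;
}

/* Pull up to 'want' frames; stops early at the limit or when upstream
 * is empty. Returns how many frames were actually taken. */
static int toy_pull_batch(struct toy_hwq *q, int want)
{
	int taken = 0;

	while (taken < want && q->upstream > 0 && toy_may_pull(q)) {
		q->upstream--;  /* frame leaves the upstream queue... */
		q->pending++;   /* ...and is counted, like ++pending_frames */
		taken++;
	}
	return taken;
}

/* TX completion: one frame has left the hardware. */
static void toy_complete(struct toy_hwq *q)
{
	assert(q->pending > 0);
	q->pending--;
}
```

With a limit of zero, as in the qlen experiment described above, toy_may_pull() would always fail and the queue would stall. A real check would presumably need a floor of at least one.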

OK, so that's one bug.


The second, more mysterious bug, which I'm still struggling to
understand, is why large values in these ath9k/qlen_* (or, more
accurately, given the first bug above, the failure to check these qlen
limit values at all) don't allow for increased hardware queue bloat (with
observable delay).  I suspect that is because the driver with my patch
to use the new intermediate queues is doing something silly like
failing to have more than one aggregate at a time hooked up in the
hardware transmit queue for transmission.  But I haven't figured out
what is really happening yet.  And this bug (depending on what exactly
it turns out to be) might make the low latency results some of you
have seen somewhat problematic to understand because it might be the
case that with my patch as it is (up to now) there's a flaw that leads
to low latency and gives up some throughput by simply failing to keep
the device busy transmitting packets when there are packets to send.

Fixing this bug might increase latency...  my plan all along has been
to put something in akin to autotuning like we have in bql/dql in
wired network interfaces.  Note the right amount to queue depends on
CPU performance capability in running the driver... that's why it
needs to be autotuned at run time.
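Here is a rough sketch of what such autotuning might look like. It loosely follows the idea behind the kernel's dynamic queue limits (DQL). The limit grows when the hardware went idle while frames were waiting. It shrinks by the slack that was never used over a window of completions. This is a simplified illustration, not the DQL algorithm itself, and every name in it is hypothetical.

```c
#include <assert.h>

/* Hypothetical auto-tuned cap on frames queued to the hardware. */
struct toy_autolimit {
	int limit;      /* current cap on frames queued to hardware */
	int min_limit;  /* floor; never tune below this */
	int max_limit;  /* ceiling */
	int window;     /* completions per evaluation window */
	int seen;       /* completions seen in the current window */
	int min_slack;  /* fewest frames left in hardware this window */
};

static void toy_autolimit_init(struct toy_autolimit *a, int min, int max,
			       int window)
{
	assert(min >= 1 && min <= max && window >= 1);
	a->limit = min;
	a->min_limit = min;
	a->max_limit = max;
	a->window = window;
	a->seen = 0;
	a->min_slack = max;
}

/* Call on every TX completion. hw_left: frames still queued in hardware
 * after this completion; backlog: frames waiting in software. */
static void toy_autolimit_complete(struct toy_autolimit *a, int hw_left,
				   int backlog)
{
	if (hw_left == 0 && backlog > 0) {
		/* Hardware went idle with work pending: limit too small. */
		a->limit = a->limit * 2 > a->max_limit ? a->max_limit
						       : a->limit * 2;
		a->seen = 0;
		a->min_slack = a->max_limit;
		return;
	}

	if (hw_left < a->min_slack)
		a->min_slack = hw_left;

	if (++a->seen < a->window)
		return;

	/* A whole window without starvation: give back unused slack. */
	a->limit -= a->min_slack;
	if (a->limit < a->min_limit)
		a->limit = a->min_limit;
	a->seen = 0;
	a->min_slack = a->max_limit;
}
```

Doubling on starvation and shrinking only after a full window errs toward keeping the hardware busy. That matches the concern about slower CPUs, at the cost of some extra queueing until the window expires.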


But anyway, Toke, between struggling to understand this second bug and
some distractions I neglected to answer your question of almost two
weeks ago when you said:

> What's the symptom of this? As I said I haven't noticed anything, but it
> might be worth looking out for.

So now I've finally tried to answer that question.  Perhaps with your
recent work on this patch your head is loaded with context that might
be helpful in understanding this.

Tomorrow (after I get some sleep) I'm planning on taking a look at
what ath9k looks like with this patch of yours applied and see if that
makes it any easier to figure out what to do about the above bug(s) in
my original patch.

			-Tim Shepard
			 shep at alum.mit.edu

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-06-19  3:17     ` Tim Shepard
@ 2016-06-19  8:52       ` Toke Høiland-Jørgensen
  2016-06-19 13:40         ` Tim Shepard
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-19  8:52 UTC (permalink / raw)
  To: ath9k-devel

Tim Shepard <shep@alum.mit.edu> writes:

>> +static struct sk_buff *
>> +ath_tid_pull(struct ath_atx_tid *tid)
>> +{
>> +	struct ath_softc *sc = tid->an->sc;
>> +	struct ieee80211_hw *hw = sc->hw;
>> +	struct ath_tx_control txctl = {
>> +		.txq = tid->txq,
>> +		.sta = tid->an->sta,
>> +	};
>> +	struct sk_buff *skb;
>> +	struct ath_frame_info *fi;
>> +	int q;
>> +
>> +	if (!tid->has_queued)
>> +		return NULL;
>> +
>> +	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
>> +	if (!skb) {
>> +		tid->has_queued = false;
>> +		return NULL;
>> +	}
>> +
>> +	if (ath_tx_prepare(hw, skb, &txctl)) {
>> +		ieee80211_free_txskb(hw, skb);
>> +		return NULL;
>> +	}
>> +
>> +	q = skb_get_queue_mapping(skb);
>> +	if (tid->txq == sc->tx.txq_map[q]) {
>> +		fi = get_frame_info(skb);
>> +		fi->txq = q;
>> +		++tid->txq->pending_frames;
>> +	}
>> +
>> +	return skb;
>> + }
>> +
>> +
>
> The increment of ->pending_frames lacks a corresponding check against
> sc->tx.txq_max_pending to see if we've reached the limit.  (Which begs
> the question: what to do if it has?)
>
> I discovered this bug doing experiments by trying to turn down the
> various /sys/kernel/debug/ieee80211/phy0/ath9k/qlen_* to low numbers
> (including as low as one, and then even zero) and found it had no
> effect.

You're right that it doesn't check the max. However, this is less of a
problem now that there is no intermediate queueing in the driver; and
indeed the utility of having the qlen_* tunables is somewhat questionable
with the patch applied: the only thing they are going to control is the
size of the retry queue.
The actual queueing is happening in the mac80211 layer, which these
tunables can't control (and which is not FQ-CoDel controlled in
mac80211-next). So it might actually be that simply removing the
tunables is the right thing to do with this patch.

Removing the limits would also probably mean getting rid of txq->stopped
and the calls to ieee80211_wake_queue() and ieee80211_stop_queue().
I suspect that is fine when using the mac80211 intermediate queues, but
I'm not sure.
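For reference, the stop/wake flow control around txq->stopped can be modeled as a counter with hysteresis. The upstream queue is stopped once pending exceeds the limit and woken once completions bring pending back below it. This is only a toy model of that kind of pattern. The counters stand in for calls to ieee80211_stop_queue() and ieee80211_wake_queue(), and the exact thresholds the driver uses should be checked against its source.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flow-control state; field names are illustrative. */
struct toy_fc {
	int pending;      /* frames submitted, not yet completed */
	int max_pending;  /* threshold for stopping the upstream queue */
	bool stopped;     /* upstream queue currently stopped? */
	int stops, wakes; /* how often stop/wake would have been called */
};

static void toy_fc_submit(struct toy_fc *q)
{
	if (++q->pending > q->max_pending && !q->stopped) {
		q->stopped = true;
		q->stops++;  /* the driver would call ieee80211_stop_queue() */
	}
}

static void toy_fc_complete(struct toy_fc *q)
{
	assert(q->pending > 0);
	if (--q->pending < q->max_pending && q->stopped) {
		q->stopped = false;
		q->wakes++;  /* the driver would call ieee80211_wake_queue() */
	}
}
```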

Felix, care to comment? :)

> The second, more mysterious bug, which I'm still struggling to
> understand, is why large values in these ath9k/qlen_* (or, more
> accurately, given the first bug above, the failure to check these qlen
> limit values at all) don't allow for increased hardware queue bloat (with
> observable delay).

Because there's a second limit in play (which has always been there): in
ath_tx_sched_aggr() there is this check:

	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
		*stop = true;
		return false;
	}

The two constants are 2 and 8 respectively. This means that, with
aggregation enabled, no more than two full aggregates will be queued up.
The size of the aggregates is dynamically computed from the current
rate: they are limited to a maximum of four milliseconds of (estimated)
airtime (for the BE queue; the others have different limits).
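Here is a back-of-the-envelope sketch of these numbers. It uses a plain rate × time calculation that ignores preamble, delimiter and padding overhead, so the driver's own sizing is more detailed. The result is capped at 65535 bytes, the largest A-MPDU length 802.11n allows. The sketch also reproduces the depth gate quoted above. The toy_* names are hypothetical, and the 2 and 8 limits are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_AGGR_MIN_QDEPTH     2       /* ATH_AGGR_MIN_QDEPTH, per the text */
#define TOY_NON_AGGR_MIN_QDEPTH 8       /* ATH_NON_AGGR_MIN_QDEPTH, per the text */
#define TOY_MAX_AMPDU_LEN       65535u  /* 802.11n A-MPDU length ceiling (bytes) */

/* Bytes that fit in budget_us at rate_kbps, ignoring PHY/MAC overhead. */
static uint32_t toy_aggr_bytes(uint32_t rate_kbps, uint32_t budget_us)
{
	uint64_t bits = (uint64_t)rate_kbps * budget_us / 1000u;
	uint64_t bytes = bits / 8u;

	return bytes > TOY_MAX_AMPDU_LEN ? TOY_MAX_AMPDU_LEN : (uint32_t)bytes;
}

/* Same shape as the ath_tx_sched_aggr() check: may another aggregate
 * (or single frame) be handed to the hardware queue? */
static bool toy_may_queue(bool aggr, int ampdu_depth, int depth)
{
	if (aggr)
		return ampdu_depth < TOY_AGGR_MIN_QDEPTH;
	return depth < TOY_NON_AGGR_MIN_QDEPTH;
}

/* Upper bound on aggregate airtime queued in hardware, in microseconds. */
static uint32_t toy_queued_airtime_us(uint32_t aggr_us)
{
	return TOY_AGGR_MIN_QDEPTH * aggr_us;
}
```

At 65 Mbit/s a 4 ms budget gives 32500 bytes, while at 300 Mbit/s the A-MPDU cap is hit first. With at most two aggregates passing the gate, roughly 2 × 4 ms = 8 ms of BE airtime can be sitting in the hardware. At rates where the cap applies, the bound is lower.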

So in a sense there's already a dynamic limit on the hardware queues.
Now, whether four milliseconds is the right maximum aggregate size might
be worth discussing. It is the maximum allowed by the standard. Dave and
I have been 

-Toke

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-06-19  8:52       ` Toke Høiland-Jørgensen
@ 2016-06-19 13:40         ` Tim Shepard
  2016-06-19 13:50           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Tim Shepard @ 2016-06-19 13:40 UTC (permalink / raw)
  To: ath9k-devel



> 
> You're right that it doesn't check the max. However, this is less of a
> problem now that there is no intermediate queueing in the driver; and
> indeed the utility of having the qlen_* tunables is somewhat questionable
> with the patch applied: the only thing they are going to control is the
> size of the retry queue.
> [....]

The driver queues things up for the hardware to DMA and transmit.
Something has to limit the amount of packets handed over to the
hardware.  (We lack access to hardware documentation (grrrr!) but it
appears to me that the hardware has a hard limit on how many packets
can be handed to it.)

> > The second, more mysterious bug, which I'm still struggling to
> > understand, is why large values in these ath9k/qlen_* (or, more
> > accurately, given the first bug above, the failure to check these qlen
> > limit values at all) don't allow for increased hardware queue bloat (with
> > observable delay).
> 
> Because there's a second limit in play (which has always been there): in
> ath_tx_sched_aggr() there is this check:
> 
> 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
> 	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
> 		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
> 		*stop = true;
> 		return false;
> 	}
> 
> The two constants are 2 and 8 respectively. This means that, with
> aggregation enabled, no more than two full aggregates will be queued up.
> The size of the aggregates is dynamically computed from the current
> rate: they are limited to a maximum of four milliseconds of (estimated)
> airtime (for the BE queue; the others have different limits).
> 
> So in a sense there's already a dynamic limit on the hardware queues.
> Now, whether four milliseconds is the right maximum aggregate size might
> be worth discussing. It is the maximum allowed by the standard. Dave and
> I have been 

Ah that may be the clue that I lacked.  There's got to be a dependency
on processor speed (how quickly the system and driver can get another
packet hooked up for transmission after completions) but perhaps with
aggregates being so large in time, with full aggregates even the
slowest processors are fast enough to avoid starvation.

If there's no aggregation, a limit of some sort is needed (probably to
prevent malfunction of the hardware/driver, but in any case to limit
excess latency).  And this limit will depend on processor speed (and
will need to be autotuned at runtime).


			-Tim Shepard
			 shep at alum.mit.edu

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-06-19 13:40         ` Tim Shepard
@ 2016-06-19 13:50           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-06-19 13:50 UTC (permalink / raw)
  To: ath9k-devel

Tim Shepard <shep@alum.mit.edu> writes:

>> 
>> You're right that it doesn't check the max. However, this is less of a
>> problem now that there is no intermediate queueing in the driver; and
>> indeed the utility of having the qlen_* tunables is somewhat questionable
>> with the patch applied: the only thing they are going to control is the
>> size of the retry queue.
>> [....]
>
> The driver queues things up for the hardware to DMA and transmit.
> Something has to limit the amount of packets handed over to the
> hardware.  (We lack access to hardware documentation (grrrr!) but it
> appears to me that the hardware has a hard limit on how many packets
> can be handed to it.)

There's a ring buffer eight entries long that the aggregates (or
packets) are put on when actually being handed to the hardware.

This is in ath_txq->txq_fifo.
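To make the "eight entries" concrete, here is a toy ring buffer with eight slots and head/tail indices. Each slot holds one hardware submission (an aggregate or a single frame), represented only by its frame count. The slot layout and toy_* names are assumptions for illustration, not ath9k's actual txq_fifo definition.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FIFO_DEPTH 8  /* eight entries, per the description above */

/* Each slot records one submission; here just its frame count. */
struct toy_hw_fifo {
	int slot[TOY_FIFO_DEPTH];
	unsigned int head;  /* oldest in-flight submission */
	unsigned int tail;  /* next free slot */
	unsigned int used;  /* slots currently occupied */
};

static bool toy_fifo_full(const struct toy_hw_fifo *f)
{
	return f->used == TOY_FIFO_DEPTH;
}

/* Hand a submission to the hardware; fails when all slots are busy. */
static bool toy_fifo_submit(struct toy_hw_fifo *f, int nframes)
{
	if (toy_fifo_full(f))
		return false;
	f->slot[f->tail] = nframes;
	f->tail = (f->tail + 1) % TOY_FIFO_DEPTH;
	f->used++;
	return true;
}

/* Hardware finished the oldest submission; returns its frame count. */
static int toy_fifo_complete(struct toy_hw_fifo *f)
{
	int n;

	assert(f->used > 0);
	n = f->slot[f->head];
	f->head = (f->head + 1) % TOY_FIFO_DEPTH;
	f->used--;
	return n;
}
```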

>> Because there's a second limit in play (which has always been there): in
>> ath_tx_sched_aggr() there is this check:
>> 
>> 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
>> 	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
>> 		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
>> 		*stop = true;
>> 		return false;
>> 	}
>> 
>> The two constants are 2 and 8 respectively. This means that, with
>> aggregation enabled, no more than two full aggregates will be queued up.
>> The size of the aggregates is dynamically computed from the current
>> rate: they are limited to a maximum of four milliseconds of (estimated)
>> airtime (for the BE queue; the others have different limits).
>> 
>> So in a sense there's already a dynamic limit on the hardware queues.
>> Now, whether four milliseconds is the right maximum aggregate size might
>> be worth discussing. It is the maximum allowed by the standard. Dave and
>> I have been 
>
> Ah that may be the clue that I lacked.  There's got to be a dependency
> on processor speed (how quickly the system and driver can get another
> packet hooked up for transmission after completions) but perhaps with
> aggregates being so large in time, with full aggregates even the
> slowest processors are fast enough to avoid starvation.
>
> If there's no aggregation, a limit of some sort is needed (probably to
> prevent malfunction of the hardware/driver, but in any case to limit
> excess latency).  And this limit will depend on processor speed (and
> will need to be autotuned at runtime).

ATH_NON_AGGR_MIN_QDEPTH is 8 -- so yeah, the limit is higher if there is
no aggregation.

These are hard-coded values, so presumably they are large enough to keep
the hardware busy on most platforms (or someone would have noticed and
changed them?). So I doubt there is much to be gained to add a mechanism
to dynamically tune them (between 0 and 2?).

The exception would be if pulling from the mac80211 queue is too slow
to keep the hardware busy at the current settings. I see no problems
with this on my hardware, but that's an x86 box. I would probably hold
off on the dynamic tuning until it has been shown that there's actually
a bottleneck, though... ;)

-Toke

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-06-18 19:06   ` [ath9k-devel] [PATCH] ath9k: Switch to using " Toke Høiland-Jørgensen
  2016-06-19  3:17     ` Tim Shepard
@ 2016-07-03  3:53     ` Tim Shepard
  2016-07-04 17:47       ` Toke Høiland-Jørgensen
  2016-07-06 16:17     ` [ath9k-devel] [PATCH v2] " Toke Høiland-Jørgensen
  2 siblings, 1 reply; 50+ messages in thread
From: Tim Shepard @ 2016-07-03  3:53 UTC (permalink / raw)
  To: ath9k-devel




Toke,

I've been testing your ath9k patch (using it instead of my earlier
ath9k patch) and plan to continue testing it.

Thanks for unconfusing me a couple weeks ago, and cluing me into how
the limit on ->pending_frames is not really relevant for the data
packets that go through the new intermediate queues.

But I'm not sure if this would allow us to remove the limit on
pending_frames because even though normal data packets would not
normally build up that many packets, there are other packet types
which bypass the intermediate queues and are transmitted directly
(also in most cases bypassing the ath9k internal queues in the way
ath9k worked before we patched it to use the mac80211 intermediate
queues).


Along similar lines, from reading the code I think your patch has
introduced a bug (but I don't know how to demonstrate it at runtime).

Looking in the body of ath_tx_start() at the result of applying your
patch, we now see this:

	[...]

	/* Force queueing of all frames that belong to a virtual interface on
	 * a different channel context, to ensure that they are sent on the
	 * correct channel.
	 */
	if (((avp && avp->chanctx != sc->cur_chan) ||
	     sc->cur_chan->stopped) && !txctl->force_channel) {
		if (!txctl->an)
			txctl->an = &avp->mcast_node;
		queue = true;
		skip_uapsd = true;
	}

	if (!skip_uapsd && ps_resp) {
		ath_txq_unlock(sc, txq);
		txq = sc->tx.uapsdq;
		ath_txq_lock(sc, txq);
	} else if (WARN_ON(txctl->an && queue)) {
		ath_txq_unlock(sc, txq);
		return -EINVAL;
	}

	[...]


In the case where the first if body above is run to force queuing of
all packets (not just normal data packets), then the else case of the
second if statement above will surely run and its if statement will
surely be true, so your new WARN_ON will happen.
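The argument can be checked by reducing the two if-statements to booleans. The following is a hypothetical model, not driver code: the flags stand in for `avp->chanctx != sc->cur_chan`, `sc->cur_chan->stopped`, `txctl->force_channel`, `txctl->an != NULL` and `ps_resp`, and the enum and function names are made up for illustration.

```c
#include <stdbool.h>

enum tx_path { TX_PATH_NORMAL, TX_PATH_UAPSD, TX_PATH_WARN_EINVAL };

/* Hypothetical reduction of the two if-statements quoted above. */
static enum tx_path model_tx_start(bool vif_on_other_chanctx,
                                   bool chan_stopped, bool force_channel,
                                   bool have_an, bool ps_resp)
{
    bool queue = false, skip_uapsd = false;

    /* First block: force queueing for a vif on another channel context. */
    if ((vif_on_other_chanctx || chan_stopped) && !force_channel) {
        have_an = true;   /* txctl->an falls back to &avp->mcast_node */
        queue = true;
        skip_uapsd = true;
    }

    /* Second block: U-APSD response path, or the new WARN_ON. */
    if (!skip_uapsd && ps_resp)
        return TX_PATH_UAPSD;
    else if (have_an && queue)
        return TX_PATH_WARN_EINVAL;
    return TX_PATH_NORMAL;
}
```

Whenever the first block runs, both `queue` and `have_an` end up true, so the WARN_ON branch is the only possible outcome; only callers that set `force_channel` (the scan probe and PS-frame paths in channel.c) bypass it.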

This is why I left the previous ath9k internal queueing mechanisms in
place.  I couldn't figure out how to handle the above case without
leaving the ath9k internal queueing mechanisms.

I'm not sure how to test for this though... I don't know what sort of
configuration scenario I would need to set up to generate the above
situation and trigger the warning. (Presumably, it involves multiple
vifs on different channels.)  But unless the first if statement body
is dead code that can never happen, I think you've introduced a bug
here (with a good WARN_ON to make it obvious when it happens).

Earlier Felix said:

> Channel context based queue handling can be dealt with by
> stopping/starting relevant queues on channel context changes.

But I don't see how to handle the case here where packets get passed
to the driver with ath_tx_start() and wind up in the scenario above.


			-Tim Shepard
			 shep at alum.mit.edu

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-03  3:53     ` Tim Shepard
@ 2016-07-04 17:47       ` Toke Høiland-Jørgensen
  2016-07-06 13:23         ` Felix Fietkau
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-04 17:47 UTC (permalink / raw)
  To: ath9k-devel

Tim Shepard <shep@alum.mit.edu> writes:

> Thanks for unconfusing me a couple weeks ago, and cluing me into how
> the limit on ->pending_frames is not really relevant for the data
> packets that go through the new intermediate queues.
>
> But I'm not sure if this would allow us to remove the limit on
> pending_frames because even though normal data packets would not
> normally build up that many packets, there are other packet types
> which bypass the intermediate queues and are transmitted directly
> (also in most cases bypassing the ath9k internal queues in the way
> ath9k worked before we patched it to use the mac80211 intermediate
> queues).

Yes, but, well, since they're not queued they are not going to overflow
anything. The aggregation building logic stops at two queued aggregates,
so the default limit of 123 packets is never going to be hit when the
queue is moved into the mac80211 layer. So keeping the knobs around only
helps people who purposefully want to cripple their ability to do
aggregation; and it won't be doing what it promises (limiting qlen),
since that is now moved out of the driver. So IMO, removing the knobs is
the right thing to do. I have already updated my patch to do so, which
I'll send as a v2 once the other bits are resolved.
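The default limit of 123 is ATH_MAX_QDEPTH from ath9k.h, defined as ATH_TXBUF / 4 - ATH_TXBUF_RESERVE (the v2 diff in this thread removes it). A quick arithmetic check of the argument follows. It assumes a 64-frame block-ack window and that an aggregate is closed at half the window (the `nframes >= h_baw` test in ath_tx_form_aggr(), with `h_baw` assumed to be half of `baw_size`); it also ignores the retry queue and non-aggregated traffic, so it is only a rough bound.

```c
#include <stdint.h>

#define ATH_TXBUF           512
#define ATH_TXBUF_RESERVE   5
/* Per-AC default pending limit: 512 / 4 - 5 = 123 */
#define ATH_MAX_QDEPTH      (ATH_TXBUF / 4 - ATH_TXBUF_RESERVE)

#define ATH_AGGR_MIN_QDEPTH 2   /* A-MPDUs allowed in hardware at once */

/* Rough upper bound on frames held in hardware by aggregation, assuming
 * each A-MPDU is capped at half of a baw_size-frame block-ack window. */
static uint32_t max_aggr_frames_in_hw(uint32_t baw_size)
{
    return ATH_AGGR_MIN_QDEPTH * (baw_size / 2);
}
```

With a 64-frame window this gives at most 64 frames in flight, comfortably below 123, which is consistent with the claim that the limit is never reached once queueing moves into mac80211.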

> Along similar lines, from reading the code I think your patch has
> introduced a bug (but I don't know how to demonstrate it at runtime).
>
> Looking in the body of ath_tx_start() at the result of applying your
> patch, we now see this:
>
> 	[...]
>
> 	/* Force queueing of all frames that belong to a virtual interface on
> 	 * a different channel context, to ensure that they are sent on the
> 	 * correct channel.
> 	 */
> 	if (((avp && avp->chanctx != sc->cur_chan) ||
> 	     sc->cur_chan->stopped) && !txctl->force_channel) {
> 		if (!txctl->an)
> 			txctl->an = &avp->mcast_node;
> 		queue = true;
> 		skip_uapsd = true;
> 	}
>
> 	if (!skip_uapsd && ps_resp) {
> 		ath_txq_unlock(sc, txq);
> 		txq = sc->tx.uapsdq;
> 		ath_txq_lock(sc, txq);
> 	} else if (WARN_ON(txctl->an && queue)) {
> 		ath_txq_unlock(sc, txq);
> 		return -EINVAL;
> 	}
>
> 	[...]
>
> In the case where the first if body above is run to force queuing of
> all packets (not just normal data packets), then the else case of the
> second if statement above will surely run and its if statement will
> surely be true, so your new WARN_ON will happen.

Yup, I'm aware of that (and it's why I put in the WARN_ON instead of
just removing those code paths). Haven't seen it trigger yet, but
haven't tried very hard either. Guess you're right that it requires
vifs on different channels...

> Earlier Felix said:
>
>> Channel context based queue handling can be dealt with by
>> stopping/starting relevant queues on channel context changes.
>
> But I don't see how to handle the case here where packets get passed
> to the driver with ath_tx_start() and wind up in the scenario above.

Well, presumably the upper layers won't try to transmit anything through
the old TX path if the start/stop logic is implemented properly. The
chanctx code already seems to call the ieee80211_{start,stop}_queue()
functions when changing context, so not sure what else is needed. Guess
I'll go see if I can provoke an actual triggering of the bug, unless
Felix elaborates on what he means before I get around to it (poke,
Felix? :)).
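A minimal model of the start/stop idea: when the active channel context changes, the queues of every vif bound to another context are stopped, and those on the new context are woken. The struct and functions are hypothetical stand-ins (in mac80211 the corresponding calls are ieee80211_stop_queue() and ieee80211_wake_queue()), and the model assumes, as presumed here, that the upper layers honour the stopped state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vif state: the channel context the vif is bound to
 * and whether its transmit queues are currently stopped. */
struct vif_model {
    int chanctx;
    bool stopped;
};

/* On a context switch, wake the queues of vifs on the new context and
 * stop all others. */
static void switch_chanctx(struct vif_model *vifs, size_t n, int new_ctx)
{
    for (size_t i = 0; i < n; i++)
        vifs[i].stopped = (vifs[i].chanctx != new_ctx);
}

/* The upper layer only hands a frame down if the queue is awake, so
 * (under the stated assumption) no frame for an off-channel vif ever
 * reaches the driver's TX path. */
static bool upper_layer_may_send(const struct vif_model *vif)
{
    return !vif->stopped;
}
```

If that assumption holds, the forced-queueing branch in ath_tx_start() has nothing left to catch, which is the conclusion the thread reaches.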

-Toke

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-04 17:47       ` Toke Høiland-Jørgensen
@ 2016-07-06 13:23         ` Felix Fietkau
  2016-07-06 14:46           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Felix Fietkau @ 2016-07-06 13:23 UTC (permalink / raw)
  To: ath9k-devel

On 2016-07-04 19:46, Toke Høiland-Jørgensen wrote:
> Tim Shepard <shep@alum.mit.edu> writes:
> 
>> Thanks for unconfusing me a couple weeks ago, and cluing me into how
>> the limit on ->pending_frames is not really relevant for the data
>> packets that go through the new intermediate queues.
>>
>> But I'm not sure if this would allow us to remove the limit on
>> pending_frames because even though normal data packets would not
>> normally build up that many packets, there are other packet types
>> which bypass the intermediate queues and are transmitted directly
>> (also in most cases bypassing the ath9k internal queues in the way
>> ath9k worked before we patched it to use the mac80211 intermediate
>> queues).
> 
> Yes, but, well, since they're not queued they are not going to overflow
> anything. The aggregation building logic stops at two queued aggregates,
> so the default limit of 123 packets is never going to be hit when the
> queue is moved into the mac80211 layer. So keeping the knobs around only
> helps people who purposefully want to cripple their ability to do
> aggregation; and it won't be doing what it promises (limiting qlen),
> since that is now moved out of the driver. So IMO, removing the knobs is
> the right thing to do. I have already updated my patch to do so, which
> I'll send as a v2 once the other bits are resolved.
I agree.

>> Earlier Felix said:
>>
>>> Channel context based queue handling can be dealt with by
>>> stopping/starting relevant queues on channel context changes.
>>
>> But I don't see how to handle the case here where packets get passed
>> to the driver with ath_tx_start() and wind up in the scenario above.
> 
> Well, presumably the upper layers won't try to transmit anything through
> the old TX path if the start/stop logic is implemented properly. The
> chanctx code already seems to call the ieee80211_{start,stop}_queue()
> functions when changing context, so not sure what else is needed. Guess
> I'll go see if I can provoke an actual triggering of the bug, unless
> Felix elaborates on what he means before I get around to it (poke,
> Felix? :)).
Then I guess the logic in ath_tx_start was a leftover from a time before
some queue-related rework happened to the chanctx code.
In that case you can simply remove the chanctx-related software queueing
stuff from ath_tx_start.

- Felix

* [ath9k-devel] [PATCH] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 13:23         ` Felix Fietkau
@ 2016-07-06 14:46           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-06 14:46 UTC (permalink / raw)
  To: ath9k-devel

Felix Fietkau <nbd@nbd.name> writes:

>> Well, presumably the upper layers won't try to transmit anything through
>> the old TX path if the start/stop logic is implemented properly. The
>> chanctx code already seems to call the ieee80211_{start,stop}_queue()
>> functions when changing context, so not sure what else is needed. Guess
>> I'll go see if I can provoke an actual triggering of the bug, unless
>> Felix elaborates on what he means before I get around to it (poke,
>> Felix? :)).
> Then I guess the logic in ath_tx_start was a leftover from a time before
> some queue related rework happened to the chanctx code.
> In that case you can simply remove the chanctx related software queueing
> stuff from ath_tx_start.

Awesome. I'll double-check that I can't get the WARN_ON to trigger, then
send a v2 :)

-Toke

* [ath9k-devel] [PATCH v2] ath9k: Switch to using mac80211 intermediate software queues.
  2016-06-18 19:06   ` [ath9k-devel] [PATCH] ath9k: Switch to using " Toke Høiland-Jørgensen
  2016-06-19  3:17     ` Tim Shepard
  2016-07-03  3:53     ` Tim Shepard
@ 2016-07-06 16:17     ` Toke Høiland-Jørgensen
  2016-07-06 18:13       ` Felix Fietkau
                         ` (2 more replies)
  2 siblings, 3 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-06 16:17 UTC (permalink / raw)
  To: ath9k-devel

This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.
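The dequeue order described here (retry queue first, then pull from mac80211, and park a pulled frame on the retry queue if it cannot be sent yet) can be sketched as a self-contained model. This is not the driver code: a fixed-size FIFO of frame ids stands in for both the sk_buff queue and the mac80211 TXQ, and all names (`fifo`, `tid_model`, `tid_dequeue`, `tid_requeue`, `tid_has_buffered`) are hypothetical. In the patch the equivalent steps are __skb_dequeue() on tid->retry_q, ieee80211_tx_dequeue() for the pull, and the has_queued flag set by ath9k_wake_tx_queue().

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 16

/* Fixed-size FIFO of frame ids; stands in for sk_buff_head and the TXQ. */
struct fifo {
    int buf[FIFO_CAP];
    size_t head, len;
};

static bool fifo_push(struct fifo *f, int frame)
{
    if (f->len == FIFO_CAP)
        return false;
    f->buf[(f->head + f->len) % FIFO_CAP] = frame;
    f->len++;
    return true;
}

static bool fifo_pop(struct fifo *f, int *frame)
{
    if (f->len == 0)
        return false;
    *frame = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->len--;
    return true;
}

struct tid_model {
    struct fifo retry_q;   /* frames pulled but not yet sent */
    struct fifo *upstream; /* stands in for the mac80211 TXQ */
    bool has_queued;       /* set when mac80211 signals new frames */
};

/* Retry queue first; otherwise pull from upstream. False if nothing. */
static bool tid_dequeue(struct tid_model *tid, int *frame)
{
    if (fifo_pop(&tid->retry_q, frame))
        return true;
    if (!tid->has_queued)
        return false;
    if (!fifo_pop(tid->upstream, frame)) {
        tid->has_queued = false;   /* upstream drained */
        return false;
    }
    return true;
}

/* A pulled frame that cannot go out now is parked at the retry tail. */
static bool tid_requeue(struct tid_model *tid, int frame)
{
    return fifo_push(&tid->retry_q, frame);
}

static bool tid_has_buffered(const struct tid_model *tid)
{
    return tid->retry_q.len > 0 || tid->has_queued;
}
```

As in the patch, a parked frame goes to the tail of the retry queue, so it is sent ahead of newly pulled frames but not ahead of older retries.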

The old code path in ath_tx_start that would queue packets has been
removed completely, as have the qlen limit tunables (since there's no
longer a queue in the driver to limit).

Based on Tim's original patch set, but reworked quite thoroughly.

Cc: Tim Shepard <shep@alum.mit.edu>
Cc: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
Changes since v1:
  - Remove the old intermediate queueing logic completely instead of
    just disabling it.
  - Remove the qlen debug tunables.
  - Remove the force_channel parameter from struct txctl (since we just
    removed the code path that was using it).

 drivers/net/wireless/ath/ath9k/ath9k.h     |  12 +-
 drivers/net/wireless/ath/ath9k/channel.c   |   2 -
 drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
 drivers/net/wireless/ath/ath9k/debug.h     |   2 -
 drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
 drivers/net/wireless/ath/ath9k/init.c      |   2 +-
 drivers/net/wireless/ath/ath9k/main.c      |   1 +
 drivers/net/wireless/ath/ath9k/xmit.c      | 307 +++++++++++------------------
 8 files changed, 130 insertions(+), 214 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 5294595..daf972c 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -91,7 +91,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define ATH_RXBUF               512
 #define ATH_TXBUF               512
 #define ATH_TXBUF_RESERVE       5
-#define ATH_MAX_QDEPTH          (ATH_TXBUF / 4 - ATH_TXBUF_RESERVE)
 #define ATH_TXMAXTRY            13
 #define ATH_MAX_SW_RETRIES      30
 
@@ -145,7 +144,9 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define BAW_WITHIN(_start, _bawsz, _seqno) \
 	((((_seqno) - (_start)) & 4095) < (_bawsz))
 
-#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
+#define ATH_STA_2_TID(_sta, _tidno) ((struct ath_atx_tid *)(_sta)->txq[_tidno]->drv_priv)
+#define ATH_VIF_2_TID(_vif) ((struct ath_atx_tid *)(_vif)->txq->drv_priv)
+#define ATH_AN_2_TID(_an, _tidno) ((_an)->sta ? ATH_STA_2_TID((_an)->sta, _tidno) : ATH_VIF_2_TID((_an)->vif))
 
 #define IS_HT_RATE(rate)   (rate & 0x80)
 #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
@@ -164,7 +165,6 @@ struct ath_txq {
 	spinlock_t axq_lock;
 	u32 axq_depth;
 	u32 axq_ampdu_depth;
-	bool stopped;
 	bool axq_tx_inprogress;
 	struct list_head txq_fifo[ATH_TXFIFO_DEPTH];
 	u8 txq_headidx;
@@ -232,7 +232,6 @@ struct ath_buf {
 
 struct ath_atx_tid {
 	struct list_head list;
-	struct sk_buff_head buf_q;
 	struct sk_buff_head retry_q;
 	struct ath_node *an;
 	struct ath_txq *txq;
@@ -247,13 +246,13 @@ struct ath_atx_tid {
 	s8 bar_index;
 	bool active;
 	bool clear_ps_filter;
+	bool has_queued;
 };
 
 struct ath_node {
 	struct ath_softc *sc;
 	struct ieee80211_sta *sta; /* station struct we're part of */
 	struct ieee80211_vif *vif; /* interface with which we're associated */
-	struct ath_atx_tid tid[IEEE80211_NUM_TIDS];
 
 	u16 maxampdu;
 	u8 mpdudensity;
@@ -276,7 +275,6 @@ struct ath_tx_control {
 	struct ath_node *an;
 	struct ieee80211_sta *sta;
 	u8 paprd;
-	bool force_channel;
 };
 
 
@@ -293,7 +291,6 @@ struct ath_tx {
 	struct ath_descdma txdma;
 	struct ath_txq *txq_map[IEEE80211_NUM_ACS];
 	struct ath_txq *uapsdq;
-	u32 txq_max_pending[IEEE80211_NUM_ACS];
 	u16 max_aggr_framelen[IEEE80211_NUM_ACS][4][32];
 };
 
@@ -585,6 +582,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 				   u16 tids, int nframes,
 				   enum ieee80211_frame_release_type reason,
 				   bool more_data);
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue);
 
 /********/
 /* VIFs */
diff --git a/drivers/net/wireless/ath/ath9k/channel.c b/drivers/net/wireless/ath/ath9k/channel.c
index 319cb5f..a5ce016 100644
--- a/drivers/net/wireless/ath/ath9k/channel.c
+++ b/drivers/net/wireless/ath/ath9k/channel.c
@@ -1007,7 +1007,6 @@ static void ath_scan_send_probe(struct ath_softc *sc,
 		goto error;
 
 	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
-	txctl.force_channel = true;
 	if (ath_tx_start(sc->hw, skb, &txctl))
 		goto error;
 
@@ -1130,7 +1129,6 @@ ath_chanctx_send_vif_ps_frame(struct ath_softc *sc, struct ath_vif *avp,
 	memset(&txctl, 0, sizeof(txctl));
 	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
 	txctl.sta = sta;
-	txctl.force_channel = true;
 	if (ath_tx_start(sc->hw, skb, &txctl)) {
 		ieee80211_free_txskb(sc->hw, skb);
 		return false;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index 6de64cf..48b181d 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -600,7 +600,6 @@ static int read_file_xmit(struct seq_file *file, void *data)
 	PR("MPDUs XRetried:  ", xretries);
 	PR("Aggregates:      ", a_aggr);
 	PR("AMPDUs Queued HW:", a_queued_hw);
-	PR("AMPDUs Queued SW:", a_queued_sw);
 	PR("AMPDUs Completed:", a_completed);
 	PR("AMPDUs Retried:  ", a_retries);
 	PR("AMPDUs XRetried: ", a_xretries);
@@ -629,8 +628,7 @@ static void print_queue(struct ath_softc *sc, struct ath_txq *txq,
 	seq_printf(file, "%s: %d ", "qnum", txq->axq_qnum);
 	seq_printf(file, "%s: %2d ", "qdepth", txq->axq_depth);
 	seq_printf(file, "%s: %2d ", "ampdu-depth", txq->axq_ampdu_depth);
-	seq_printf(file, "%s: %3d ", "pending", txq->pending_frames);
-	seq_printf(file, "%s: %d\n", "stopped", txq->stopped);
+	seq_printf(file, "%s: %3d\n", "pending", txq->pending_frames);
 
 	ath_txq_unlock(sc, txq);
 }
@@ -1190,7 +1188,6 @@ static const char ath9k_gstrings_stats[][ETH_GSTRING_LEN] = {
 	AMKSTR(d_tx_mpdu_xretries),
 	AMKSTR(d_tx_aggregates),
 	AMKSTR(d_tx_ampdus_queued_hw),
-	AMKSTR(d_tx_ampdus_queued_sw),
 	AMKSTR(d_tx_ampdus_completed),
 	AMKSTR(d_tx_ampdu_retries),
 	AMKSTR(d_tx_ampdu_xretries),
@@ -1270,7 +1267,6 @@ void ath9k_get_et_stats(struct ieee80211_hw *hw,
 	AWDATA(xretries);
 	AWDATA(a_aggr);
 	AWDATA(a_queued_hw);
-	AWDATA(a_queued_sw);
 	AWDATA(a_completed);
 	AWDATA(a_retries);
 	AWDATA(a_xretries);
@@ -1328,14 +1324,6 @@ int ath9k_init_debug(struct ath_hw *ah)
 				    read_file_xmit);
 	debugfs_create_devm_seqfile(sc->dev, "queues", sc->debug.debugfs_phy,
 				    read_file_queues);
-	debugfs_create_u32("qlen_bk", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_BK]);
-	debugfs_create_u32("qlen_be", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_BE]);
-	debugfs_create_u32("qlen_vi", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_VI]);
-	debugfs_create_u32("qlen_vo", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_VO]);
 	debugfs_create_devm_seqfile(sc->dev, "misc", sc->debug.debugfs_phy,
 				    read_file_misc);
 	debugfs_create_devm_seqfile(sc->dev, "reset", sc->debug.debugfs_phy,
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index cd68c5f..a078cdd 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -147,7 +147,6 @@ struct ath_interrupt_stats {
  * @completed: Total MPDUs (non-aggr) completed
  * @a_aggr: Total no. of aggregates queued
  * @a_queued_hw: Total AMPDUs queued to hardware
- * @a_queued_sw: Total AMPDUs queued to software queues
  * @a_completed: Total AMPDUs completed
  * @a_retries: No. of AMPDUs retried (SW)
  * @a_xretries: No. of AMPDUs dropped due to xretries
@@ -174,7 +173,6 @@ struct ath_tx_stats {
 	u32 xretries;
 	u32 a_aggr;
 	u32 a_queued_hw;
-	u32 a_queued_sw;
 	u32 a_completed;
 	u32 a_retries;
 	u32 a_xretries;
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index c2ca57a..d789798 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -52,8 +52,8 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
 			 "TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
 			 "BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 		ath_txq_lock(sc, txq);
 		if (tid->active) {
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 1c226d6..752cacb 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -354,7 +354,6 @@ static int ath9k_init_queues(struct ath_softc *sc)
 	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
 		sc->tx.txq_map[i] = ath_txq_setup(sc, ATH9K_TX_QUEUE_DATA, i);
 		sc->tx.txq_map[i]->mac80211_qnum = i;
-		sc->tx.txq_max_pending[i] = ATH_MAX_QDEPTH;
 	}
 	return 0;
 }
@@ -867,6 +866,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
 	hw->max_rate_tries = 10;
 	hw->sta_data_size = sizeof(struct ath_node);
 	hw->vif_data_size = sizeof(struct ath_vif);
+	hw->txq_data_size = sizeof(struct ath_atx_tid);
 	hw->extra_tx_headroom = 4;
 
 	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 3aed43a..f584e19 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2673,4 +2673,5 @@ struct ieee80211_ops ath9k_ops = {
 	.sw_scan_start	    = ath9k_sw_scan_start,
 	.sw_scan_complete   = ath9k_sw_scan_complete,
 	.get_txpower        = ath9k_get_txpower,
+	.wake_tx_queue      = ath9k_wake_tx_queue,
 };
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index fe795fc..4077eeb 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
 					   struct ath_txq *txq,
 					   struct ath_atx_tid *tid,
 					   struct sk_buff *skb);
+static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
+			  struct ath_tx_control *txctl);
 
 enum {
 	MCS_HT20,
@@ -118,6 +120,26 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
 		list_add_tail(&tid->list, list);
 }
 
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue)
+{
+	struct ath_softc *sc = hw->priv;
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	struct ath_atx_tid *tid = (struct ath_atx_tid *) queue->drv_priv;
+	struct ath_txq *txq = tid->txq;
+
+	ath_dbg(common, QUEUE, "Waking TX queue: %pM (%d)\n",
+		queue->sta ? queue->sta->addr : queue->vif->addr,
+		tid->tidno);
+
+	ath_txq_lock(sc, txq);
+
+	tid->has_queued = true;
+	ath_tx_queue_tid(sc, txq, tid);
+	ath_txq_schedule(sc, txq);
+
+	ath_txq_unlock(sc, txq);
+}
+
 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
 {
 	struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
@@ -145,7 +167,6 @@ static void ath_set_rates(struct ieee80211_vif *vif, struct ieee80211_sta *sta,
 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
 			     struct sk_buff *skb)
 {
-	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
 	struct ath_frame_info *fi = get_frame_info(skb);
 	int q = fi->txq;
 
@@ -156,14 +177,6 @@ static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
 	if (WARN_ON(--txq->pending_frames < 0))
 		txq->pending_frames = 0;
 
-	if (txq->stopped &&
-	    txq->pending_frames < sc->tx.txq_max_pending[q]) {
-		if (ath9k_is_chanctx_enabled())
-			ieee80211_wake_queue(sc->hw, info->hw_queue);
-		else
-			ieee80211_wake_queue(sc->hw, q);
-		txq->stopped = false;
-	}
 }
 
 static struct ath_atx_tid *
@@ -173,9 +186,47 @@ ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
 	return ATH_AN_2_TID(an, tidno);
 }
 
+static struct sk_buff *
+ath_tid_pull(struct ath_atx_tid *tid)
+{
+	struct ath_softc *sc = tid->an->sc;
+	struct ieee80211_hw *hw = sc->hw;
+	struct ath_tx_control txctl = {
+		.txq = tid->txq,
+		.sta = tid->an->sta,
+	};
+	struct sk_buff *skb;
+	struct ath_frame_info *fi;
+	int q;
+
+	if (!tid->has_queued)
+		return NULL;
+
+	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
+	if (!skb) {
+		tid->has_queued = false;
+		return NULL;
+	}
+
+	if (ath_tx_prepare(hw, skb, &txctl)) {
+		ieee80211_free_txskb(hw, skb);
+		return NULL;
+	}
+
+	q = skb_get_queue_mapping(skb);
+	if (tid->txq == sc->tx.txq_map[q]) {
+		fi = get_frame_info(skb);
+		fi->txq = q;
+		++tid->txq->pending_frames;
+	}
+
+	return skb;
+ }
+
+
 static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
 {
-	return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
+	return !skb_queue_empty(&tid->retry_q) || tid->has_queued;
 }
 
 static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
@@ -184,46 +235,11 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
 
 	skb = __skb_dequeue(&tid->retry_q);
 	if (!skb)
-		skb = __skb_dequeue(&tid->buf_q);
+		skb = ath_tid_pull(tid);
 
 	return skb;
 }
 
-/*
- * ath_tx_tid_change_state:
- * - clears a-mpdu flag of previous session
- * - force sequence number allocation to fix next BlockAck Window
- */
-static void
-ath_tx_tid_change_state(struct ath_softc *sc, struct ath_atx_tid *tid)
-{
-	struct ath_txq *txq = tid->txq;
-	struct ieee80211_tx_info *tx_info;
-	struct sk_buff *skb, *tskb;
-	struct ath_buf *bf;
-	struct ath_frame_info *fi;
-
-	skb_queue_walk_safe(&tid->buf_q, skb, tskb) {
-		fi = get_frame_info(skb);
-		bf = fi->bf;
-
-		tx_info = IEEE80211_SKB_CB(skb);
-		tx_info->flags &= ~IEEE80211_TX_CTL_AMPDU;
-
-		if (bf)
-			continue;
-
-		bf = ath_tx_setup_buffer(sc, txq, tid, skb);
-		if (!bf) {
-			__skb_unlink(skb, &tid->buf_q);
-			ath_txq_skb_done(sc, txq, skb);
-			ieee80211_free_txskb(sc->hw, skb);
-			continue;
-		}
-	}
-
-}
-
 static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
 {
 	struct ath_txq *txq = tid->txq;
@@ -858,7 +874,7 @@ static int ath_compute_num_delims(struct ath_softc *sc, struct ath_atx_tid *tid,
 
 static struct ath_buf *
 ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
-			struct ath_atx_tid *tid, struct sk_buff_head **q)
+			struct ath_atx_tid *tid)
 {
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
@@ -867,11 +883,7 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	u16 seqno;
 
 	while (1) {
-		*q = &tid->retry_q;
-		if (skb_queue_empty(*q))
-			*q = &tid->buf_q;
-
-		skb = skb_peek(*q);
+		skb = ath_tid_dequeue(tid);
 		if (!skb)
 			break;
 
@@ -883,7 +895,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 			bf->bf_state.stale = false;
 
 		if (!bf) {
-			__skb_unlink(skb, *q);
 			ath_txq_skb_done(sc, txq, skb);
 			ieee80211_free_txskb(sc->hw, skb);
 			continue;
@@ -912,8 +923,16 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 		seqno = bf->bf_state.seqno;
 
 		/* do not step over block-ack window */
-		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
+		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
+			__skb_queue_tail(&tid->retry_q, skb);
+
+			/* If there are other skbs in the retry q, they are
+			 * probably within the BAW, so loop immediately to get
+			 * one of them. Otherwise the queue can get stuck. */
+			if (!skb_queue_is_first(&tid->retry_q, skb))
+				continue;
 			break;
+		}
 
 		if (tid->bar_index > ATH_BA_INDEX(tid->seq_start, seqno)) {
 			struct ath_tx_status ts = {};
@@ -921,7 +940,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 
 			INIT_LIST_HEAD(&bf_head);
 			list_add(&bf->list, &bf_head);
-			__skb_unlink(skb, *q);
 			ath_tx_update_baw(sc, tid, seqno);
 			ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0);
 			continue;
@@ -933,11 +951,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	return NULL;
 }
 
-static bool
+static int
 ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		 struct ath_atx_tid *tid, struct list_head *bf_q,
-		 struct ath_buf *bf_first, struct sk_buff_head *tid_q,
-		 int *aggr_len)
+		 struct ath_buf *bf_first)
 {
 #define PADBYTES(_len) ((4 - ((_len) % 4)) % 4)
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
@@ -947,12 +964,13 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
 	struct sk_buff *skb;
-	bool closed = false;
+
 
 	bf = bf_first;
 	aggr_limit = ath_lookup_rate(sc, bf, tid);
 
-	do {
+	while (bf)
+	{
 		skb = bf->bf_mpdu;
 		fi = get_frame_info(skb);
 
@@ -961,12 +979,12 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes) {
 			if (aggr_limit < al + bpad + al_delta ||
 			    ath_lookup_legacy(bf) || nframes >= h_baw)
-				break;
+				goto stop;
 
 			tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 			if ((tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) ||
 			    !(tx_info->flags & IEEE80211_TX_CTL_AMPDU))
-				break;
+				goto stop;
 		}
 
 		/* add padding for previous frame to aggregation length */
@@ -988,20 +1006,18 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 			ath_tx_addto_baw(sc, tid, bf);
 		bf->bf_state.ndelim = ndelim;
 
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
 
 		bf_prev = bf;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
-		if (!bf) {
-			closed = true;
-			break;
-		}
-	} while (ath_tid_has_buffered(tid));
-
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
+	}
+	goto finish;
+stop:
+	__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
+finish:
 	bf = bf_first;
 	bf->bf_lastbf = bf_prev;
 
@@ -1012,9 +1028,7 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		TX_STAT_INC(txq->axq_qnum, a_aggr);
 	}
 
-	*aggr_len = al;
-
-	return closed;
+	return al;
 #undef PADBYTES
 }
 
@@ -1391,18 +1405,15 @@ static void ath_tx_fill_desc(struct ath_softc *sc, struct ath_buf *bf,
 static void
 ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		  struct ath_atx_tid *tid, struct list_head *bf_q,
-		  struct ath_buf *bf_first, struct sk_buff_head *tid_q)
+		  struct ath_buf *bf_first)
 {
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
-	struct sk_buff *skb;
 	int nframes = 0;
 
 	do {
 		struct ieee80211_tx_info *tx_info;
-		skb = bf->bf_mpdu;
 
 		nframes++;
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
@@ -1411,13 +1422,15 @@ ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes >= 2)
 			break;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
 		if (!bf)
 			break;
 
 		tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
-		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU)
+		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU) {
+			__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 			break;
+		}
 
 		ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	} while (1);
@@ -1428,34 +1441,33 @@ static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
 {
 	struct ath_buf *bf;
 	struct ieee80211_tx_info *tx_info;
-	struct sk_buff_head *tid_q;
 	struct list_head bf_q;
 	int aggr_len = 0;
-	bool aggr, last = true;
+	bool aggr;
 
 	if (!ath_tid_has_buffered(tid))
 		return false;
 
 	INIT_LIST_HEAD(&bf_q);
 
-	bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+	bf = ath_tx_get_tid_subframe(sc, txq, tid);
 	if (!bf)
 		return false;
 
 	tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 	aggr = !!(tx_info->flags & IEEE80211_TX_CTL_AMPDU);
 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
-		(!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 		*stop = true;
 		return false;
 	}
 
 	ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	if (aggr)
-		last = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf,
-					tid_q, &aggr_len);
+		aggr_len = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf);
 	else
-		ath_tx_form_burst(sc, txq, tid, &bf_q, bf, tid_q);
+		ath_tx_form_burst(sc, txq, tid, &bf_q, bf);
 
 	if (list_empty(&bf_q))
 		return false;
@@ -1498,9 +1510,6 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
 		an->mpdudensity = density;
 	}
 
-	/* force sequence number allocation for pending frames */
-	ath_tx_tid_change_state(sc, txtid);
-
 	txtid->active = true;
 	*ssn = txtid->seq_start = txtid->seq_next;
 	txtid->bar_index = -1;
@@ -1525,7 +1534,6 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
 	ath_txq_lock(sc, txq);
 	txtid->active = false;
 	ath_tx_flush_tid(sc, txtid);
-	ath_tx_tid_change_state(sc, txtid);
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1535,14 +1543,12 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
-	bool buffered;
 	int tidno;
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -1552,13 +1558,9 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 			continue;
 		}
 
-		buffered = ath_tid_has_buffered(tid);
-
 		list_del_init(&tid->list);
 
 		ath_txq_unlock(sc, txq);
-
-		ieee80211_sta_set_buffered(sta, tidno, buffered);
 	}
 }
 
@@ -1571,19 +1573,12 @@ void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
 		tid->clear_ps_filter = true;
-
-		if (ath_tid_has_buffered(tid)) {
-			ath_tx_queue_tid(sc, txq, tid);
-			ath_txq_schedule(sc, txq);
-		}
-
 		ath_txq_unlock_complete(sc, txq);
 	}
 }
@@ -1606,11 +1601,6 @@ void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
 
 	tid->baw_size = IEEE80211_MIN_AMPDU_BUF << sta->ht_cap.ampdu_factor;
 
-	if (ath_tid_has_buffered(tid)) {
-		ath_tx_queue_tid(sc, txq, tid);
-		ath_txq_schedule(sc, txq);
-	}
-
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1626,7 +1616,6 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 	struct ieee80211_tx_info *info;
 	struct list_head bf_q;
 	struct ath_buf *bf_tail = NULL, *bf;
-	struct sk_buff_head *tid_q;
 	int sent = 0;
 	int i;
 
@@ -1641,11 +1630,10 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 
 		ath_txq_lock(sc, tid->txq);
 		while (nframes > 0) {
-			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid, &tid_q);
+			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid);
 			if (!bf)
 				break;
 
-			__skb_unlink(bf->bf_mpdu, tid_q);
 			list_add_tail(&bf->list, &bf_q);
 			ath_set_rates(tid->an->vif, tid->an->sta, bf);
 			if (bf_isampdu(bf)) {
@@ -1660,7 +1648,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 			sent++;
 			TX_STAT_INC(txq->axq_qnum, a_queued_hw);
 
-			if (an->sta && !ath_tid_has_buffered(tid))
+			if (an->sta && skb_queue_empty(&tid->retry_q))
 				ieee80211_sta_set_buffered(an->sta, i, false);
 		}
 		ath_txq_unlock_complete(sc, tid->txq);
@@ -1887,13 +1875,7 @@ bool ath_drain_all_txq(struct ath_softc *sc)
 		if (!ATH_TXQ_SETUP(sc, i))
 			continue;
 
-		/*
-		 * The caller will resume queues with ieee80211_wake_queues.
-		 * Mark the queue as not stopped to prevent ath_tx_complete
-		 * from waking the queue too early.
-		 */
 		txq = &sc->tx.txq[i];
-		txq->stopped = false;
 		ath_draintxq(sc, txq);
 	}
 
@@ -2293,15 +2275,12 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 	struct ath_txq *txq = txctl->txq;
 	struct ath_atx_tid *tid = NULL;
 	struct ath_buf *bf;
-	bool queue, skip_uapsd = false, ps_resp;
+	bool ps_resp;
 	int q, ret;
 
 	if (vif)
 		avp = (void *)vif->drv_priv;
 
-	if (info->flags & IEEE80211_TX_CTL_TX_OFFCHAN)
-		txctl->force_channel = true;
-
 	ps_resp = !!(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE);
 
 	ret = ath_tx_prepare(hw, skb, txctl);
@@ -2316,63 +2295,13 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 
 	q = skb_get_queue_mapping(skb);
 
+	if (ps_resp)
+		txq = sc->tx.uapsdq;
+
 	ath_txq_lock(sc, txq);
 	if (txq == sc->tx.txq_map[q]) {
 		fi->txq = q;
-		if (++txq->pending_frames > sc->tx.txq_max_pending[q] &&
-		    !txq->stopped) {
-			if (ath9k_is_chanctx_enabled())
-				ieee80211_stop_queue(sc->hw, info->hw_queue);
-			else
-				ieee80211_stop_queue(sc->hw, q);
-			txq->stopped = true;
-		}
-	}
-
-	queue = ieee80211_is_data_present(hdr->frame_control);
-
-	/* If chanctx, queue all null frames while NOA could be there */
-	if (ath9k_is_chanctx_enabled() &&
-	    ieee80211_is_nullfunc(hdr->frame_control) &&
-	    !txctl->force_channel)
-		queue = true;
-
-	/* Force queueing of all frames that belong to a virtual interface on
-	 * a different channel context, to ensure that they are sent on the
-	 * correct channel.
-	 */
-	if (((avp && avp->chanctx != sc->cur_chan) ||
-	     sc->cur_chan->stopped) && !txctl->force_channel) {
-		if (!txctl->an)
-			txctl->an = &avp->mcast_node;
-		queue = true;
-		skip_uapsd = true;
-	}
-
-	if (txctl->an && queue)
-		tid = ath_get_skb_tid(sc, txctl->an, skb);
-
-	if (!skip_uapsd && ps_resp) {
-		ath_txq_unlock(sc, txq);
-		txq = sc->tx.uapsdq;
-		ath_txq_lock(sc, txq);
-	} else if (txctl->an && queue) {
-		WARN_ON(tid->txq != txctl->txq);
-
-		if (info->flags & IEEE80211_TX_CTL_CLEAR_PS_FILT)
-			tid->clear_ps_filter = true;
-
-		/*
-		 * Add this frame to software queue for scheduling later
-		 * for aggregation.
-		 */
-		TX_STAT_INC(txq->axq_qnum, a_queued_sw);
-		__skb_queue_tail(&tid->buf_q, skb);
-		if (!txctl->an->sleeping)
-			ath_tx_queue_tid(sc, txq, tid);
-
-		ath_txq_schedule(sc, txq);
-		goto out;
+		++txq->pending_frames;
 	}
 
 	bf = ath_tx_setup_buffer(sc, txq, tid, skb);
@@ -2856,9 +2785,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 	struct ath_atx_tid *tid;
 	int tidno, acno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS;
-	     tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		tid->an        = an;
 		tid->tidno     = tidno;
 		tid->seq_start = tid->seq_next = 0;
@@ -2866,11 +2794,14 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 		tid->baw_head  = tid->baw_tail = 0;
 		tid->active	   = false;
 		tid->clear_ps_filter = true;
-		__skb_queue_head_init(&tid->buf_q);
+		tid->has_queued  = false;
 		__skb_queue_head_init(&tid->retry_q);
 		INIT_LIST_HEAD(&tid->list);
 		acno = TID_TO_WME_AC(tidno);
 		tid->txq = sc->tx.txq_map[acno];
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
@@ -2880,9 +2811,8 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 	struct ath_txq *txq;
 	int tidno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -2894,6 +2824,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 		tid->active = false;
 
 		ath_txq_unlock(sc, txq);
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
-- 
2.9.0
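The reworked aggregate builder above no longer peeks and unlinks. It dequeues each frame, so the frame that ends the aggregate has already left its queue and the "stop" path has to park it on retry_q. It also returns the aggregate length directly. The C model below is a minimal userspace sketch of that control flow. It assumes plain integer frame lengths, and len_queue, lq_push_tail, lq_pop_head and form_aggr are hypothetical names, not driver code. Padding, delimiters, legacy-rate, rate-probe and BAW checks are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 32

/* Hypothetical FIFO of frame lengths; stands in for an sk_buff_head. */
struct len_queue {
	int buf[QLEN];
	size_t head, len;
};

static void lq_push_tail(struct len_queue *q, int frame_len)
{
	assert(q->len < QLEN);
	q->buf[(q->head + q->len) % QLEN] = frame_len;
	q->len++;
}

/* Returns the length of the head frame, or -1 if the queue is empty. */
static int lq_pop_head(struct len_queue *q)
{
	int frame_len;

	if (q->len == 0)
		return -1;
	frame_len = q->buf[q->head];
	q->head = (q->head + 1) % QLEN;
	q->len--;
	return frame_len;
}

/*
 * Hypothetical model of the reworked ath_tx_form_aggr(): frames are
 * dequeued (retry queue first, then the upstream queue) until the next one
 * would exceed the byte limit. That frame is already off its queue, so the
 * "stop" path appends it to the retry queue tail. The first frame is always
 * accepted, like the "if (nframes)" guard in the driver.
 */
static int form_aggr(struct len_queue *retry_q, struct len_queue *upstream,
		     int aggr_limit, int *nframes)
{
	int al = 0;

	*nframes = 0;
	for (;;) {
		int len = lq_pop_head(retry_q);

		if (len < 0)
			len = lq_pop_head(upstream);
		if (len < 0)
			break;				/* "finish": nothing left */
		if (*nframes && al + len > aggr_limit) {
			lq_push_tail(retry_q, len);	/* "stop": park it */
			break;
		}
		al += len;
		(*nframes)++;
	}
	return al;	/* aggregate length, as the patched function returns */
}
```

With three 1500-byte frames and a 3200-byte limit, the first call builds a two-frame aggregate and leaves the third frame on retry_q. The next call picks that frame up before touching the upstream queue.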

* [ath9k-devel] [PATCH v2] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 16:17     ` [ath9k-devel] [PATCH v2] " Toke Høiland-Jørgensen
@ 2016-07-06 18:13       ` Felix Fietkau
  2016-07-06 18:52         ` Toke Høiland-Jørgensen
  2016-07-06 18:19       ` Sebastian Gottschall
  2016-07-06 19:38       ` [ath9k-devel] [PATCH v3] " Toke Høiland-Jørgensen
  2 siblings, 1 reply; 50+ messages in thread
From: Felix Fietkau @ 2016-07-06 18:13 UTC (permalink / raw)
  To: ath9k-devel

On 2016-07-06 18:16, Toke Høiland-Jørgensen wrote:
> This switches ath9k over to using the mac80211 intermediate software
> queueing mechanism for data packets. It removes the queueing inside the
> driver, except for the retry queue, and instead pulls from mac80211 when
> a packet is needed. The retry queue is used to store a packet that was
> pulled but can't be sent immediately.
> 
> The old code path in ath_tx_start that would queue packets has been
> removed completely, as have the qlen limit tunables (since there's no
> longer a queue in the driver to limit).
> 
> Based on Tim's original patch set, but reworked quite thoroughly.
> 
> Cc: Tim Shepard <shep@alum.mit.edu>
> Cc: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
> Changes since v1:
>   - Remove the old intermediate queueing logic completely instead of
>     just disabling it.
>   - Remove the qlen debug tunables.
>   - Remove the force_channel parameter from struct txctl (since we just
>     removed the code path that was using it).
> 
>  drivers/net/wireless/ath/ath9k/ath9k.h     |  12 +-
>  drivers/net/wireless/ath/ath9k/channel.c   |   2 -
>  drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
>  drivers/net/wireless/ath/ath9k/debug.h     |   2 -
>  drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
>  drivers/net/wireless/ath/ath9k/init.c      |   2 +-
>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>  drivers/net/wireless/ath/ath9k/xmit.c      | 307 +++++++++++------------------
>  8 files changed, 130 insertions(+), 214 deletions(-)
Nice work!
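The dequeue path described in the commit message above works like this: retry_q is served first, and mac80211 is pulled only when ath9k_wake_tx_queue() has set has_queued. An empty pull clears that flag again. This can be sketched as a small userspace model. It is a sketch under stated assumptions: fifo, tid_model, tid_wake, tid_pull, tid_dequeue and tid_has_buffered are hypothetical names, and the real ath_tid_pull() also runs ath_tx_prepare() and may drop frames, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 16

/* Hypothetical FIFO of frame IDs; stands in for an skb queue. */
struct fifo {
	int buf[QLEN];
	size_t head, len;
};

static void fifo_push(struct fifo *f, int frame)
{
	assert(f->len < QLEN);
	f->buf[(f->head + f->len) % QLEN] = frame;
	f->len++;
}

static bool fifo_pop(struct fifo *f, int *frame)
{
	if (f->len == 0)
		return false;
	*frame = f->buf[f->head];
	f->head = (f->head + 1) % QLEN;
	f->len--;
	return true;
}

/* Hypothetical per-TID state after the patch: no buf_q, only retry_q. */
struct tid_model {
	struct fifo retry_q;	/* frames pulled but not yet sendable */
	struct fifo *upstream;	/* stands in for the mac80211 ieee80211_txq */
	bool has_queued;	/* hint that upstream may hold frames */
};

/* Mirrors ath9k_wake_tx_queue(): mac80211 signals that frames are waiting. */
static void tid_wake(struct tid_model *t)
{
	t->has_queued = true;
}

/* Mirrors ath_tid_pull(): an empty dequeue clears the hint. */
static bool tid_pull(struct tid_model *t, int *frame)
{
	if (!t->has_queued)
		return false;
	if (!fifo_pop(t->upstream, frame)) {
		t->has_queued = false;
		return false;
	}
	return true;
}

/* Mirrors ath_tid_dequeue(): the retry queue always goes first. */
static bool tid_dequeue(struct tid_model *t, int *frame)
{
	return fifo_pop(&t->retry_q, frame) || tid_pull(t, frame);
}

/* Mirrors ath_tid_has_buffered(). */
static bool tid_has_buffered(const struct tid_model *t)
{
	return t->retry_q.len > 0 || t->has_queued;
}
```

Because has_queued is only a hint, tid_has_buffered() can report true until one pull comes back empty. That is cheaper than asking mac80211 for its queue length on every scheduling round.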


> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> index fe795fc..4077eeb 100644
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -912,8 +923,16 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>  		seqno = bf->bf_state.seqno;
>  
>  		/* do not step over block-ack window */
> -		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
> +		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
> +			__skb_queue_tail(&tid->retry_q, skb);
> +
> +			/* If there are other skbs in the retry q, they are
> +			 * probably within the BAW, so loop immediately to get
> +			 * one of them. Otherwise the queue can get stuck. */
> +			if (!skb_queue_is_first(&tid->retry_q, skb))
> +				continue;
Not sure if this can happen, but if we ever somehow end up with two skbs
in the retry queue that do not fit into the Block-Ack window, there's
potential for an infinite loop here.

- Felix
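The scenario above can be reproduced with a small userspace model. It assumes only the retry queue is involved, which holds while retry_q is non-empty because ath_tid_dequeue() always serves it first. seq_queue, sq_push_tail, sq_pop_head, pick_v2 and pick_bounded are hypothetical names. pick_bounded is one possible mitigation, shown for illustration only; it is not what the driver uses. BAW_WITHIN is copied from ath9k.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same arithmetic as BAW_WITHIN() in ath9k.h (12-bit sequence space). */
#define BAW_WITHIN(_start, _bawsz, _seqno) \
	((((_seqno) - (_start)) & 4095) < (_bawsz))

#define QLEN 16

/* Hypothetical ring of sequence numbers standing in for tid->retry_q. */
struct seq_queue {
	unsigned int buf[QLEN];
	size_t head, len;
};

static void sq_push_tail(struct seq_queue *q, unsigned int seqno)
{
	assert(q->len < QLEN);
	q->buf[(q->head + q->len) % QLEN] = seqno;
	q->len++;
}

static bool sq_pop_head(struct seq_queue *q, unsigned int *seqno)
{
	if (q->len == 0)
		return false;
	*seqno = q->buf[q->head];
	q->head = (q->head + 1) % QLEN;
	q->len--;
	return true;
}

/*
 * Model of the v2 loop in ath_tx_get_tid_subframe(). max_iter bounds the
 * simulation: -1 means the loop was still spinning, -2 means it gave up
 * (the driver returns NULL), otherwise the sendable seqno is returned.
 */
static int pick_v2(struct seq_queue *q, unsigned int seq_start,
		   unsigned int baw_size, int max_iter)
{
	unsigned int seqno;

	for (int i = 0; i < max_iter; i++) {
		if (!sq_pop_head(q, &seqno))
			return -2;
		if (BAW_WITHIN(seq_start, baw_size, seqno))
			return (int)seqno;
		sq_push_tail(q, seqno);
		if (q->len == 1)	/* skb_queue_is_first() after the append */
			return -2;
	}
	return -1;
}

/* Hypothetical bounded variant: each queued frame is examined at most once. */
static int pick_bounded(struct seq_queue *q, unsigned int seq_start,
			unsigned int baw_size)
{
	size_t budget = q->len;
	unsigned int seqno;

	while (budget > 0 && sq_pop_head(q, &seqno)) {
		budget--;
		if (BAW_WITHIN(seq_start, baw_size, seqno))
			return (int)seqno;
		sq_push_tail(q, seqno);
	}
	return -2;
}
```

With sequence numbers 100 and 101 queued and a window starting at 10 with size 64, each frame is re-appended behind the other. skb_queue_is_first() therefore never becomes true, and since the txq lock is held, seq_start cannot advance to break the cycle. The bounded variant caps the work at the queue length seen on entry.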

* [ath9k-devel] [PATCH v2] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 16:17     ` [ath9k-devel] [PATCH v2] " Toke Høiland-Jørgensen
  2016-07-06 18:13       ` Felix Fietkau
@ 2016-07-06 18:19       ` Sebastian Gottschall
  2016-07-06 19:38       ` [ath9k-devel] [PATCH v3] " Toke Høiland-Jørgensen
  2 siblings, 0 replies; 50+ messages in thread
From: Sebastian Gottschall @ 2016-07-06 18:19 UTC (permalink / raw)
  To: ath9k-devel

Testing now on my various devices in various operation modes, but it looks
good so far. No stability issues.

Sebastian

On 06.07.2016 at 18:16, Toke Høiland-Jørgensen wrote:
> This switches ath9k over to using the mac80211 intermediate software
> queueing mechanism for data packets. It removes the queueing inside the
> driver, except for the retry queue, and instead pulls from mac80211 when
> a packet is needed. The retry queue is used to store a packet that was
> pulled but can't be sent immediately.
>
> The old code path in ath_tx_start that would queue packets has been
> removed completely, as have the qlen limit tunables (since there's no
> longer a queue in the driver to limit).
>
> Based on Tim's original patch set, but reworked quite thoroughly.
>
> Cc: Tim Shepard <shep@alum.mit.edu>
> Cc: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
> Changes since v1:
>    - Remove the old intermediate queueing logic completely instead of
>      just disabling it.
>    - Remove the qlen debug tunables.
>    - Remove the force_channel parameter from struct txctl (since we just
>      removed the code path that was using it).
>
>   drivers/net/wireless/ath/ath9k/ath9k.h     |  12 +-
>   drivers/net/wireless/ath/ath9k/channel.c   |   2 -
>   drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
>   drivers/net/wireless/ath/ath9k/debug.h     |   2 -
>   drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
>   drivers/net/wireless/ath/ath9k/init.c      |   2 +-
>   drivers/net/wireless/ath/ath9k/main.c      |   1 +
>   drivers/net/wireless/ath/ath9k/xmit.c      | 307 +++++++++++------------------
>   8 files changed, 130 insertions(+), 214 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
> index 5294595..daf972c 100644
> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> @@ -91,7 +91,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>   #define ATH_RXBUF               512
>   #define ATH_TXBUF               512
>   #define ATH_TXBUF_RESERVE       5
> -#define ATH_MAX_QDEPTH          (ATH_TXBUF / 4 - ATH_TXBUF_RESERVE)
>   #define ATH_TXMAXTRY            13
>   #define ATH_MAX_SW_RETRIES      30
>   
> @@ -145,7 +144,9 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>   #define BAW_WITHIN(_start, _bawsz, _seqno) \
>   	((((_seqno) - (_start)) & 4095) < (_bawsz))
>   
> -#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
> +#define ATH_STA_2_TID(_sta, _tidno) ((struct ath_atx_tid *)(_sta)->txq[_tidno]->drv_priv)
> +#define ATH_VIF_2_TID(_vif) ((struct ath_atx_tid *)(_vif)->txq->drv_priv)
> +#define ATH_AN_2_TID(_an, _tidno) ((_an)->sta ? ATH_STA_2_TID((_an)->sta, _tidno) : ATH_VIF_2_TID((_an)->vif))
>   
>   #define IS_HT_RATE(rate)   (rate & 0x80)
>   #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
> @@ -164,7 +165,6 @@ struct ath_txq {
>   	spinlock_t axq_lock;
>   	u32 axq_depth;
>   	u32 axq_ampdu_depth;
> -	bool stopped;
>   	bool axq_tx_inprogress;
>   	struct list_head txq_fifo[ATH_TXFIFO_DEPTH];
>   	u8 txq_headidx;
> @@ -232,7 +232,6 @@ struct ath_buf {
>   
>   struct ath_atx_tid {
>   	struct list_head list;
> -	struct sk_buff_head buf_q;
>   	struct sk_buff_head retry_q;
>   	struct ath_node *an;
>   	struct ath_txq *txq;
> @@ -247,13 +246,13 @@ struct ath_atx_tid {
>   	s8 bar_index;
>   	bool active;
>   	bool clear_ps_filter;
> +	bool has_queued;
>   };
>   
>   struct ath_node {
>   	struct ath_softc *sc;
>   	struct ieee80211_sta *sta; /* station struct we're part of */
>   	struct ieee80211_vif *vif; /* interface with which we're associated */
> -	struct ath_atx_tid tid[IEEE80211_NUM_TIDS];
>   
>   	u16 maxampdu;
>   	u8 mpdudensity;
> @@ -276,7 +275,6 @@ struct ath_tx_control {
>   	struct ath_node *an;
>   	struct ieee80211_sta *sta;
>   	u8 paprd;
> -	bool force_channel;
>   };
>   
>   
> @@ -293,7 +291,6 @@ struct ath_tx {
>   	struct ath_descdma txdma;
>   	struct ath_txq *txq_map[IEEE80211_NUM_ACS];
>   	struct ath_txq *uapsdq;
> -	u32 txq_max_pending[IEEE80211_NUM_ACS];
>   	u16 max_aggr_framelen[IEEE80211_NUM_ACS][4][32];
>   };
>   
> @@ -585,6 +582,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
>   				   u16 tids, int nframes,
>   				   enum ieee80211_frame_release_type reason,
>   				   bool more_data);
> +void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue);
>   
>   /********/
>   /* VIFs */
> diff --git a/drivers/net/wireless/ath/ath9k/channel.c b/drivers/net/wireless/ath/ath9k/channel.c
> index 319cb5f..a5ce016 100644
> --- a/drivers/net/wireless/ath/ath9k/channel.c
> +++ b/drivers/net/wireless/ath/ath9k/channel.c
> @@ -1007,7 +1007,6 @@ static void ath_scan_send_probe(struct ath_softc *sc,
>   		goto error;
>   
>   	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
> -	txctl.force_channel = true;
>   	if (ath_tx_start(sc->hw, skb, &txctl))
>   		goto error;
>   
> @@ -1130,7 +1129,6 @@ ath_chanctx_send_vif_ps_frame(struct ath_softc *sc, struct ath_vif *avp,
>   	memset(&txctl, 0, sizeof(txctl));
>   	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
>   	txctl.sta = sta;
> -	txctl.force_channel = true;
>   	if (ath_tx_start(sc->hw, skb, &txctl)) {
>   		ieee80211_free_txskb(sc->hw, skb);
>   		return false;
> diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
> index 6de64cf..48b181d 100644
> --- a/drivers/net/wireless/ath/ath9k/debug.c
> +++ b/drivers/net/wireless/ath/ath9k/debug.c
> @@ -600,7 +600,6 @@ static int read_file_xmit(struct seq_file *file, void *data)
>   	PR("MPDUs XRetried:  ", xretries);
>   	PR("Aggregates:      ", a_aggr);
>   	PR("AMPDUs Queued HW:", a_queued_hw);
> -	PR("AMPDUs Queued SW:", a_queued_sw);
>   	PR("AMPDUs Completed:", a_completed);
>   	PR("AMPDUs Retried:  ", a_retries);
>   	PR("AMPDUs XRetried: ", a_xretries);
> @@ -629,8 +628,7 @@ static void print_queue(struct ath_softc *sc, struct ath_txq *txq,
>   	seq_printf(file, "%s: %d ", "qnum", txq->axq_qnum);
>   	seq_printf(file, "%s: %2d ", "qdepth", txq->axq_depth);
>   	seq_printf(file, "%s: %2d ", "ampdu-depth", txq->axq_ampdu_depth);
> -	seq_printf(file, "%s: %3d ", "pending", txq->pending_frames);
> -	seq_printf(file, "%s: %d\n", "stopped", txq->stopped);
> +	seq_printf(file, "%s: %3d\n", "pending", txq->pending_frames);
>   
>   	ath_txq_unlock(sc, txq);
>   }
> @@ -1190,7 +1188,6 @@ static const char ath9k_gstrings_stats[][ETH_GSTRING_LEN] = {
>   	AMKSTR(d_tx_mpdu_xretries),
>   	AMKSTR(d_tx_aggregates),
>   	AMKSTR(d_tx_ampdus_queued_hw),
> -	AMKSTR(d_tx_ampdus_queued_sw),
>   	AMKSTR(d_tx_ampdus_completed),
>   	AMKSTR(d_tx_ampdu_retries),
>   	AMKSTR(d_tx_ampdu_xretries),
> @@ -1270,7 +1267,6 @@ void ath9k_get_et_stats(struct ieee80211_hw *hw,
>   	AWDATA(xretries);
>   	AWDATA(a_aggr);
>   	AWDATA(a_queued_hw);
> -	AWDATA(a_queued_sw);
>   	AWDATA(a_completed);
>   	AWDATA(a_retries);
>   	AWDATA(a_xretries);
> @@ -1328,14 +1324,6 @@ int ath9k_init_debug(struct ath_hw *ah)
>   				    read_file_xmit);
>   	debugfs_create_devm_seqfile(sc->dev, "queues", sc->debug.debugfs_phy,
>   				    read_file_queues);
> -	debugfs_create_u32("qlen_bk", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
> -			   &sc->tx.txq_max_pending[IEEE80211_AC_BK]);
> -	debugfs_create_u32("qlen_be", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
> -			   &sc->tx.txq_max_pending[IEEE80211_AC_BE]);
> -	debugfs_create_u32("qlen_vi", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
> -			   &sc->tx.txq_max_pending[IEEE80211_AC_VI]);
> -	debugfs_create_u32("qlen_vo", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
> -			   &sc->tx.txq_max_pending[IEEE80211_AC_VO]);
>   	debugfs_create_devm_seqfile(sc->dev, "misc", sc->debug.debugfs_phy,
>   				    read_file_misc);
>   	debugfs_create_devm_seqfile(sc->dev, "reset", sc->debug.debugfs_phy,
> diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
> index cd68c5f..a078cdd 100644
> --- a/drivers/net/wireless/ath/ath9k/debug.h
> +++ b/drivers/net/wireless/ath/ath9k/debug.h
> @@ -147,7 +147,6 @@ struct ath_interrupt_stats {
>    * @completed: Total MPDUs (non-aggr) completed
>    * @a_aggr: Total no. of aggregates queued
>    * @a_queued_hw: Total AMPDUs queued to hardware
> - * @a_queued_sw: Total AMPDUs queued to software queues
>    * @a_completed: Total AMPDUs completed
>    * @a_retries: No. of AMPDUs retried (SW)
>    * @a_xretries: No. of AMPDUs dropped due to xretries
> @@ -174,7 +173,6 @@ struct ath_tx_stats {
>   	u32 xretries;
>   	u32 a_aggr;
>   	u32 a_queued_hw;
> -	u32 a_queued_sw;
>   	u32 a_completed;
>   	u32 a_retries;
>   	u32 a_xretries;
> diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
> index c2ca57a..d789798 100644
> --- a/drivers/net/wireless/ath/ath9k/debug_sta.c
> +++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
> @@ -52,8 +52,8 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
>   			 "TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
>   			 "BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");
>   
> -	for (tidno = 0, tid = &an->tid[tidno];
> -	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
> +	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
> +		tid = ATH_STA_2_TID(an->sta, tidno);
>   		txq = tid->txq;
>   		ath_txq_lock(sc, txq);
>   		if (tid->active) {
> diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
> index 1c226d6..752cacb 100644
> --- a/drivers/net/wireless/ath/ath9k/init.c
> +++ b/drivers/net/wireless/ath/ath9k/init.c
> @@ -354,7 +354,6 @@ static int ath9k_init_queues(struct ath_softc *sc)
>   	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>   		sc->tx.txq_map[i] = ath_txq_setup(sc, ATH9K_TX_QUEUE_DATA, i);
>   		sc->tx.txq_map[i]->mac80211_qnum = i;
> -		sc->tx.txq_max_pending[i] = ATH_MAX_QDEPTH;
>   	}
>   	return 0;
>   }
> @@ -867,6 +866,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
>   	hw->max_rate_tries = 10;
>   	hw->sta_data_size = sizeof(struct ath_node);
>   	hw->vif_data_size = sizeof(struct ath_vif);
> +	hw->txq_data_size = sizeof(struct ath_atx_tid);
>   	hw->extra_tx_headroom = 4;
>   
>   	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
> index 3aed43a..f584e19 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -2673,4 +2673,5 @@ struct ieee80211_ops ath9k_ops = {
>   	.sw_scan_start	    = ath9k_sw_scan_start,
>   	.sw_scan_complete   = ath9k_sw_scan_complete,
>   	.get_txpower        = ath9k_get_txpower,
> +	.wake_tx_queue      = ath9k_wake_tx_queue,
>   };
> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> index fe795fc..4077eeb 100644
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
>   					   struct ath_txq *txq,
>   					   struct ath_atx_tid *tid,
>   					   struct sk_buff *skb);
> +static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
> +			  struct ath_tx_control *txctl);
>   
>   enum {
>   	MCS_HT20,
> @@ -118,6 +120,26 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
>   		list_add_tail(&tid->list, list);
>   }
>   
> +void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue)
> +{
> +	struct ath_softc *sc = hw->priv;
> +	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
> +	struct ath_atx_tid *tid = (struct ath_atx_tid *) queue->drv_priv;
> +	struct ath_txq *txq = tid->txq;
> +
> +	ath_dbg(common, QUEUE, "Waking TX queue: %pM (%d)\n",
> +		queue->sta ? queue->sta->addr : queue->vif->addr,
> +		tid->tidno);
> +
> +	ath_txq_lock(sc, txq);
> +
> +	tid->has_queued = true;
> +	ath_tx_queue_tid(sc, txq, tid);
> +	ath_txq_schedule(sc, txq);
> +
> +	ath_txq_unlock(sc, txq);
> +}
> +
>   static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
>   {
>   	struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
> @@ -145,7 +167,6 @@ static void ath_set_rates(struct ieee80211_vif *vif, struct ieee80211_sta *sta,
>   static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
>   			     struct sk_buff *skb)
>   {
> -	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
>   	struct ath_frame_info *fi = get_frame_info(skb);
>   	int q = fi->txq;
>   
> @@ -156,14 +177,6 @@ static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
>   	if (WARN_ON(--txq->pending_frames < 0))
>   		txq->pending_frames = 0;
>   
> -	if (txq->stopped &&
> -	    txq->pending_frames < sc->tx.txq_max_pending[q]) {
> -		if (ath9k_is_chanctx_enabled())
> -			ieee80211_wake_queue(sc->hw, info->hw_queue);
> -		else
> -			ieee80211_wake_queue(sc->hw, q);
> -		txq->stopped = false;
> -	}
>   }
>   
>   static struct ath_atx_tid *
> @@ -173,9 +186,47 @@ ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
>   	return ATH_AN_2_TID(an, tidno);
>   }
>   
> +static struct sk_buff *
> +ath_tid_pull(struct ath_atx_tid *tid)
> +{
> +	struct ath_softc *sc = tid->an->sc;
> +	struct ieee80211_hw *hw = sc->hw;
> +	struct ath_tx_control txctl = {
> +		.txq = tid->txq,
> +		.sta = tid->an->sta,
> +	};
> +	struct sk_buff *skb;
> +	struct ath_frame_info *fi;
> +	int q;
> +
> +	if (!tid->has_queued)
> +		return NULL;
> +
> +	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
> +	if (!skb) {
> +		tid->has_queued = false;
> +		return NULL;
> +	}
> +
> +	if (ath_tx_prepare(hw, skb, &txctl)) {
> +		ieee80211_free_txskb(hw, skb);
> +		return NULL;
> +	}
> +
> +	q = skb_get_queue_mapping(skb);
> +	if (tid->txq == sc->tx.txq_map[q]) {
> +		fi = get_frame_info(skb);
> +		fi->txq = q;
> +		++tid->txq->pending_frames;
> +	}
> +
> +	return skb;
> + }
> +
> +
>   static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
>   {
> -	return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
> +	return !skb_queue_empty(&tid->retry_q) || tid->has_queued;
>   }
>   
>   static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
> @@ -184,46 +235,11 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
>   
>   	skb = __skb_dequeue(&tid->retry_q);
>   	if (!skb)
> -		skb = __skb_dequeue(&tid->buf_q);
> +		skb = ath_tid_pull(tid);
>   
>   	return skb;
>   }
>   
> -/*
> - * ath_tx_tid_change_state:
> - * - clears a-mpdu flag of previous session
> - * - force sequence number allocation to fix next BlockAck Window
> - */
> -static void
> -ath_tx_tid_change_state(struct ath_softc *sc, struct ath_atx_tid *tid)
> -{
> -	struct ath_txq *txq = tid->txq;
> -	struct ieee80211_tx_info *tx_info;
> -	struct sk_buff *skb, *tskb;
> -	struct ath_buf *bf;
> -	struct ath_frame_info *fi;
> -
> -	skb_queue_walk_safe(&tid->buf_q, skb, tskb) {
> -		fi = get_frame_info(skb);
> -		bf = fi->bf;
> -
> -		tx_info = IEEE80211_SKB_CB(skb);
> -		tx_info->flags &= ~IEEE80211_TX_CTL_AMPDU;
> -
> -		if (bf)
> -			continue;
> -
> -		bf = ath_tx_setup_buffer(sc, txq, tid, skb);
> -		if (!bf) {
> -			__skb_unlink(skb, &tid->buf_q);
> -			ath_txq_skb_done(sc, txq, skb);
> -			ieee80211_free_txskb(sc->hw, skb);
> -			continue;
> -		}
> -	}
> -
> -}
> -
>   static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
>   {
>   	struct ath_txq *txq = tid->txq;
> @@ -858,7 +874,7 @@ static int ath_compute_num_delims(struct ath_softc *sc, struct ath_atx_tid *tid,
>   
>   static struct ath_buf *
>   ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
> -			struct ath_atx_tid *tid, struct sk_buff_head **q)
> +			struct ath_atx_tid *tid)
>   {
>   	struct ieee80211_tx_info *tx_info;
>   	struct ath_frame_info *fi;
> @@ -867,11 +883,7 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>   	u16 seqno;
>   
>   	while (1) {
> -		*q = &tid->retry_q;
> -		if (skb_queue_empty(*q))
> -			*q = &tid->buf_q;
> -
> -		skb = skb_peek(*q);
> +		skb = ath_tid_dequeue(tid);
>   		if (!skb)
>   			break;
>   
> @@ -883,7 +895,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>   			bf->bf_state.stale = false;
>   
>   		if (!bf) {
> -			__skb_unlink(skb, *q);
>   			ath_txq_skb_done(sc, txq, skb);
>   			ieee80211_free_txskb(sc->hw, skb);
>   			continue;
> @@ -912,8 +923,16 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>   		seqno = bf->bf_state.seqno;
>   
>   		/* do not step over block-ack window */
> -		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
> +		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
> +			__skb_queue_tail(&tid->retry_q, skb);
> +
> +			/* If there are other skbs in the retry q, they are
> +			 * probably within the BAW, so loop immediately to get
> +			 * one of them. Otherwise the queue can get stuck. */
> +			if (!skb_queue_is_first(&tid->retry_q, skb))
> +				continue;
>   			break;
> +		}
>   
>   		if (tid->bar_index > ATH_BA_INDEX(tid->seq_start, seqno)) {
>   			struct ath_tx_status ts = {};
> @@ -921,7 +940,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>   
>   			INIT_LIST_HEAD(&bf_head);
>   			list_add(&bf->list, &bf_head);
> -			__skb_unlink(skb, *q);
>   			ath_tx_update_baw(sc, tid, seqno);
>   			ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0);
>   			continue;
> @@ -933,11 +951,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>   	return NULL;
>   }
>   
> -static bool
> +static int
>   ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
>   		 struct ath_atx_tid *tid, struct list_head *bf_q,
> -		 struct ath_buf *bf_first, struct sk_buff_head *tid_q,
> -		 int *aggr_len)
> +		 struct ath_buf *bf_first)
>   {
>   #define PADBYTES(_len) ((4 - ((_len) % 4)) % 4)
>   	struct ath_buf *bf = bf_first, *bf_prev = NULL;
> @@ -947,12 +964,13 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
>   	struct ieee80211_tx_info *tx_info;
>   	struct ath_frame_info *fi;
>   	struct sk_buff *skb;
> -	bool closed = false;
> +
>   
>   	bf = bf_first;
>   	aggr_limit = ath_lookup_rate(sc, bf, tid);
>   
> -	do {
> +	while (bf)
> +	{
>   		skb = bf->bf_mpdu;
>   		fi = get_frame_info(skb);
>   
> @@ -961,12 +979,12 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
>   		if (nframes) {
>   			if (aggr_limit < al + bpad + al_delta ||
>   			    ath_lookup_legacy(bf) || nframes >= h_baw)
> -				break;
> +				goto stop;
>   
>   			tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
>   			if ((tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) ||
>   			    !(tx_info->flags & IEEE80211_TX_CTL_AMPDU))
> -				break;
> +				goto stop;
>   		}
>   
>   		/* add padding for previous frame to aggregation length */
> @@ -988,20 +1006,18 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
>   			ath_tx_addto_baw(sc, tid, bf);
>   		bf->bf_state.ndelim = ndelim;
>   
> -		__skb_unlink(skb, tid_q);
>   		list_add_tail(&bf->list, bf_q);
>   		if (bf_prev)
>   			bf_prev->bf_next = bf;
>   
>   		bf_prev = bf;
>   
> -		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
> -		if (!bf) {
> -			closed = true;
> -			break;
> -		}
> -	} while (ath_tid_has_buffered(tid));
> -
> +		bf = ath_tx_get_tid_subframe(sc, txq, tid);
> +	}
> +	goto finish;
> +stop:
> +	__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
> +finish:
>   	bf = bf_first;
>   	bf->bf_lastbf = bf_prev;
>   
> @@ -1012,9 +1028,7 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
>   		TX_STAT_INC(txq->axq_qnum, a_aggr);
>   	}
>   
> -	*aggr_len = al;
> -
> -	return closed;
> +	return al;
>   #undef PADBYTES
>   }
>   
> @@ -1391,18 +1405,15 @@ static void ath_tx_fill_desc(struct ath_softc *sc, struct ath_buf *bf,
>   static void
>   ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
>   		  struct ath_atx_tid *tid, struct list_head *bf_q,
> -		  struct ath_buf *bf_first, struct sk_buff_head *tid_q)
> +		  struct ath_buf *bf_first)
>   {
>   	struct ath_buf *bf = bf_first, *bf_prev = NULL;
> -	struct sk_buff *skb;
>   	int nframes = 0;
>   
>   	do {
>   		struct ieee80211_tx_info *tx_info;
> -		skb = bf->bf_mpdu;
>   
>   		nframes++;
> -		__skb_unlink(skb, tid_q);
>   		list_add_tail(&bf->list, bf_q);
>   		if (bf_prev)
>   			bf_prev->bf_next = bf;
> @@ -1411,13 +1422,15 @@ ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
>   		if (nframes >= 2)
>   			break;
>   
> -		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
> +		bf = ath_tx_get_tid_subframe(sc, txq, tid);
>   		if (!bf)
>   			break;
>   
>   		tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
> -		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU)
> +		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU) {
> +			__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
>   			break;
> +		}
>   
>   		ath_set_rates(tid->an->vif, tid->an->sta, bf);
>   	} while (1);
> @@ -1428,34 +1441,33 @@ static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
>   {
>   	struct ath_buf *bf;
>   	struct ieee80211_tx_info *tx_info;
> -	struct sk_buff_head *tid_q;
>   	struct list_head bf_q;
>   	int aggr_len = 0;
> -	bool aggr, last = true;
> +	bool aggr;
>   
>   	if (!ath_tid_has_buffered(tid))
>   		return false;
>   
>   	INIT_LIST_HEAD(&bf_q);
>   
> -	bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
> +	bf = ath_tx_get_tid_subframe(sc, txq, tid);
>   	if (!bf)
>   		return false;
>   
>   	tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
>   	aggr = !!(tx_info->flags & IEEE80211_TX_CTL_AMPDU);
>   	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
> -		(!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
> +	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
> +		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
>   		*stop = true;
>   		return false;
>   	}
>   
>   	ath_set_rates(tid->an->vif, tid->an->sta, bf);
>   	if (aggr)
> -		last = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf,
> -					tid_q, &aggr_len);
> +		aggr_len = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf);
>   	else
> -		ath_tx_form_burst(sc, txq, tid, &bf_q, bf, tid_q);
> +		ath_tx_form_burst(sc, txq, tid, &bf_q, bf);
>   
>   	if (list_empty(&bf_q))
>   		return false;
> @@ -1498,9 +1510,6 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
>   		an->mpdudensity = density;
>   	}
>   
> -	/* force sequence number allocation for pending frames */
> -	ath_tx_tid_change_state(sc, txtid);
> -
>   	txtid->active = true;
>   	*ssn = txtid->seq_start = txtid->seq_next;
>   	txtid->bar_index = -1;
> @@ -1525,7 +1534,6 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
>   	ath_txq_lock(sc, txq);
>   	txtid->active = false;
>   	ath_tx_flush_tid(sc, txtid);
> -	ath_tx_tid_change_state(sc, txtid);
>   	ath_txq_unlock_complete(sc, txq);
>   }
>   
> @@ -1535,14 +1543,12 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
>   	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
>   	struct ath_atx_tid *tid;
>   	struct ath_txq *txq;
> -	bool buffered;
>   	int tidno;
>   
>   	ath_dbg(common, XMIT, "%s called\n", __func__);
>   
> -	for (tidno = 0, tid = &an->tid[tidno];
> -	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
> -
> +	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
> +		tid = ATH_AN_2_TID(an, tidno);
>   		txq = tid->txq;
>   
>   		ath_txq_lock(sc, txq);
> @@ -1552,13 +1558,9 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
>   			continue;
>   		}
>   
> -		buffered = ath_tid_has_buffered(tid);
> -
>   		list_del_init(&tid->list);
>   
>   		ath_txq_unlock(sc, txq);
> -
> -		ieee80211_sta_set_buffered(sta, tidno, buffered);
>   	}
>   }
>   
> @@ -1571,19 +1573,12 @@ void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
>   
>   	ath_dbg(common, XMIT, "%s called\n", __func__);
>   
> -	for (tidno = 0, tid = &an->tid[tidno];
> -	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
> -
> +	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
> +		tid = ATH_AN_2_TID(an, tidno);
>   		txq = tid->txq;
>   
>   		ath_txq_lock(sc, txq);
>   		tid->clear_ps_filter = true;
> -
> -		if (ath_tid_has_buffered(tid)) {
> -			ath_tx_queue_tid(sc, txq, tid);
> -			ath_txq_schedule(sc, txq);
> -		}
> -
>   		ath_txq_unlock_complete(sc, txq);
>   	}
>   }
> @@ -1606,11 +1601,6 @@ void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
>   
>   	tid->baw_size = IEEE80211_MIN_AMPDU_BUF << sta->ht_cap.ampdu_factor;
>   
> -	if (ath_tid_has_buffered(tid)) {
> -		ath_tx_queue_tid(sc, txq, tid);
> -		ath_txq_schedule(sc, txq);
> -	}
> -
>   	ath_txq_unlock_complete(sc, txq);
>   }
>   
> @@ -1626,7 +1616,6 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
>   	struct ieee80211_tx_info *info;
>   	struct list_head bf_q;
>   	struct ath_buf *bf_tail = NULL, *bf;
> -	struct sk_buff_head *tid_q;
>   	int sent = 0;
>   	int i;
>   
> @@ -1641,11 +1630,10 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
>   
>   		ath_txq_lock(sc, tid->txq);
>   		while (nframes > 0) {
> -			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid, &tid_q);
> +			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid);
>   			if (!bf)
>   				break;
>   
> -			__skb_unlink(bf->bf_mpdu, tid_q);
>   			list_add_tail(&bf->list, &bf_q);
>   			ath_set_rates(tid->an->vif, tid->an->sta, bf);
>   			if (bf_isampdu(bf)) {
> @@ -1660,7 +1648,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
>   			sent++;
>   			TX_STAT_INC(txq->axq_qnum, a_queued_hw);
>   
> -			if (an->sta && !ath_tid_has_buffered(tid))
> +			if (an->sta && skb_queue_empty(&tid->retry_q))
>   				ieee80211_sta_set_buffered(an->sta, i, false);
>   		}
>   		ath_txq_unlock_complete(sc, tid->txq);
> @@ -1887,13 +1875,7 @@ bool ath_drain_all_txq(struct ath_softc *sc)
>   		if (!ATH_TXQ_SETUP(sc, i))
>   			continue;
>   
> -		/*
> -		 * The caller will resume queues with ieee80211_wake_queues.
> -		 * Mark the queue as not stopped to prevent ath_tx_complete
> -		 * from waking the queue too early.
> -		 */
>   		txq = &sc->tx.txq[i];
> -		txq->stopped = false;
>   		ath_draintxq(sc, txq);
>   	}
>   
> @@ -2293,15 +2275,12 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
>   	struct ath_txq *txq = txctl->txq;
>   	struct ath_atx_tid *tid = NULL;
>   	struct ath_buf *bf;
> -	bool queue, skip_uapsd = false, ps_resp;
> +	bool ps_resp;
>   	int q, ret;
>   
>   	if (vif)
>   		avp = (void *)vif->drv_priv;
>   
> -	if (info->flags & IEEE80211_TX_CTL_TX_OFFCHAN)
> -		txctl->force_channel = true;
> -
>   	ps_resp = !!(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE);
>   
>   	ret = ath_tx_prepare(hw, skb, txctl);
> @@ -2316,63 +2295,13 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
>   
>   	q = skb_get_queue_mapping(skb);
>   
> +	if (ps_resp)
> +		txq = sc->tx.uapsdq;
> +
>   	ath_txq_lock(sc, txq);
>   	if (txq == sc->tx.txq_map[q]) {
>   		fi->txq = q;
> -		if (++txq->pending_frames > sc->tx.txq_max_pending[q] &&
> -		    !txq->stopped) {
> -			if (ath9k_is_chanctx_enabled())
> -				ieee80211_stop_queue(sc->hw, info->hw_queue);
> -			else
> -				ieee80211_stop_queue(sc->hw, q);
> -			txq->stopped = true;
> -		}
> -	}
> -
> -	queue = ieee80211_is_data_present(hdr->frame_control);
> -
> -	/* If chanctx, queue all null frames while NOA could be there */
> -	if (ath9k_is_chanctx_enabled() &&
> -	    ieee80211_is_nullfunc(hdr->frame_control) &&
> -	    !txctl->force_channel)
> -		queue = true;
> -
> -	/* Force queueing of all frames that belong to a virtual interface on
> -	 * a different channel context, to ensure that they are sent on the
> -	 * correct channel.
> -	 */
> -	if (((avp && avp->chanctx != sc->cur_chan) ||
> -	     sc->cur_chan->stopped) && !txctl->force_channel) {
> -		if (!txctl->an)
> -			txctl->an = &avp->mcast_node;
> -		queue = true;
> -		skip_uapsd = true;
> -	}
> -
> -	if (txctl->an && queue)
> -		tid = ath_get_skb_tid(sc, txctl->an, skb);
> -
> -	if (!skip_uapsd && ps_resp) {
> -		ath_txq_unlock(sc, txq);
> -		txq = sc->tx.uapsdq;
> -		ath_txq_lock(sc, txq);
> -	} else if (txctl->an && queue) {
> -		WARN_ON(tid->txq != txctl->txq);
> -
> -		if (info->flags & IEEE80211_TX_CTL_CLEAR_PS_FILT)
> -			tid->clear_ps_filter = true;
> -
> -		/*
> -		 * Add this frame to software queue for scheduling later
> -		 * for aggregation.
> -		 */
> -		TX_STAT_INC(txq->axq_qnum, a_queued_sw);
> -		__skb_queue_tail(&tid->buf_q, skb);
> -		if (!txctl->an->sleeping)
> -			ath_tx_queue_tid(sc, txq, tid);
> -
> -		ath_txq_schedule(sc, txq);
> -		goto out;
> +		++txq->pending_frames;
>   	}
>   
>   	bf = ath_tx_setup_buffer(sc, txq, tid, skb);
> @@ -2856,9 +2785,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
>   	struct ath_atx_tid *tid;
>   	int tidno, acno;
>   
> -	for (tidno = 0, tid = &an->tid[tidno];
> -	     tidno < IEEE80211_NUM_TIDS;
> -	     tidno++, tid++) {
> +	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
> +		tid = ATH_AN_2_TID(an, tidno);
>   		tid->an        = an;
>   		tid->tidno     = tidno;
>   		tid->seq_start = tid->seq_next = 0;
> @@ -2866,11 +2794,14 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
>   		tid->baw_head  = tid->baw_tail = 0;
>   		tid->active	   = false;
>   		tid->clear_ps_filter = true;
> -		__skb_queue_head_init(&tid->buf_q);
> +		tid->has_queued  = false;
>   		__skb_queue_head_init(&tid->retry_q);
>   		INIT_LIST_HEAD(&tid->list);
>   		acno = TID_TO_WME_AC(tidno);
>   		tid->txq = sc->tx.txq_map[acno];
> +
> +		if (!an->sta)
> +			break; /* just one multicast ath_atx_tid */
>   	}
>   }
>   
> @@ -2880,9 +2811,8 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
>   	struct ath_txq *txq;
>   	int tidno;
>   
> -	for (tidno = 0, tid = &an->tid[tidno];
> -	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
> -
> +	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
> +		tid = ATH_AN_2_TID(an, tidno);
>   		txq = tid->txq;
>   
>   		ath_txq_lock(sc, txq);
> @@ -2894,6 +2824,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
>   		tid->active = false;
>   
>   		ath_txq_unlock(sc, txq);
> +
> +		if (!an->sta)
> +			break; /* just one multicast ath_atx_tid */
>   	}
>   }
>   


-- 
Regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Berliner Ring 101, 64625 Bensheim
Register court: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottschall at dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565

* [ath9k-devel] [PATCH v2] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 18:13       ` Felix Fietkau
@ 2016-07-06 18:52         ` Toke Høiland-Jørgensen
  2016-07-06 18:59           ` Felix Fietkau
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-06 18:52 UTC (permalink / raw)
  To: ath9k-devel

Felix Fietkau <nbd@nbd.name> writes:

> On 2016-07-06 18:16, Toke Høiland-Jørgensen wrote:
>> This switches ath9k over to using the mac80211 intermediate software
>> queueing mechanism for data packets. It removes the queueing inside the
>> driver, except for the retry queue, and instead pulls from mac80211 when
>> a packet is needed. The retry queue is used to store a packet that was
>> pulled but can't be sent immediately.
>> 
>> The old code path in ath_tx_start that would queue packets has been
>> removed completely, as have the qlen limit tunables (since there's no
>> longer a queue in the driver to limit).
>> 
>> Based on Tim's original patch set, but reworked quite thoroughly.
>> 
>> Cc: Tim Shepard <shep@alum.mit.edu>
>> Cc: Felix Fietkau <nbd@nbd.name>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> ---
>> Changes since v1:
>>   - Remove the old intermediate queueing logic completely instead of
>>     just disabling it.
>>   - Remove the qlen debug tunables.
>>   - Remove the force_channel parameter from struct txctl (since we just
>>     removed the code path that was using it).
>> 
>>  drivers/net/wireless/ath/ath9k/ath9k.h     |  12 +-
>>  drivers/net/wireless/ath/ath9k/channel.c   |   2 -
>>  drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
>>  drivers/net/wireless/ath/ath9k/debug.h     |   2 -
>>  drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
>>  drivers/net/wireless/ath/ath9k/init.c      |   2 +-
>>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>>  drivers/net/wireless/ath/ath9k/xmit.c      | 307 +++++++++++------------------
>>  8 files changed, 130 insertions(+), 214 deletions(-)
> Nice work!

Thanks :)

>> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> index fe795fc..4077eeb 100644
>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> @@ -912,8 +923,16 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>>  		seqno = bf->bf_state.seqno;
>>  
>>  		/* do not step over block-ack window */
>> -		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
>> +		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
>> +			__skb_queue_tail(&tid->retry_q, skb);
>> +
>> +			/* If there are other skbs in the retry q, they are
>> +			 * probably within the BAW, so loop immediately to get
>> +			 * one of them. Otherwise the queue can get stuck. */
>> +			if (!skb_queue_is_first(&tid->retry_q, skb))
>> +				continue;
> Not sure if this can happen, but if we ever somehow end up with two skbs
> in the retry queue that do not fit into the Block-Ack window, there's
> potential for an infinite loop here.

Yes, I realise that (v1 contained a comment on that). However, I don't
actually think it can happen: The code will only ever put one skb from
the intermediate queues on the retry queue (ath_tid_pull() is only
called if the retry queue is empty). Everything else on there is an actual
retry that was put there by ath_tx_complete_aggr(). I figure those will
always be within the BAW?

-Toke
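Toke's argument turns on how the block-ack window check treats sequence numbers. The following is a minimal standalone sketch, not driver code: it reuses the BAW_WITHIN() expression from ath9k.h and wraps it in a hypothetical helper, seqno_in_baw(), to show that the distance from seq_start is taken modulo the 12-bit sequence space. A frame just past the wrap from 4095 back to 0 can therefore still be inside the window, while a frame one step behind seq_start is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same expression as BAW_WITHIN() in ath9k.h: 802.11 sequence numbers
 * are 12 bits wide, so the distance from the window start is reduced
 * modulo 4096 before it is compared with the window size. */
#define BAW_WITHIN(_start, _bawsz, _seqno) \
	((((_seqno) - (_start)) & 4095) < (_bawsz))

/* Hypothetical helper: is seqno inside a block-ack window of baw_size
 * frames that starts at seq_start? */
static bool seqno_in_baw(uint16_t seq_start, uint16_t baw_size,
			 uint16_t seqno)
{
	/* Both values must be valid 12-bit sequence numbers. */
	assert(seq_start < 4096 && seqno < 4096);
	return BAW_WITHIN(seq_start, baw_size, seqno);
}
```

For example, seqno_in_baw(4090, 64, 2) is true because the wrapped distance is 8, whereas seqno_in_baw(100, 64, 99) is false because the distance wraps around to 4095.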

* [ath9k-devel] [PATCH v2] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 18:52         ` Toke Høiland-Jørgensen
@ 2016-07-06 18:59           ` Felix Fietkau
  2016-07-06 19:08             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Felix Fietkau @ 2016-07-06 18:59 UTC (permalink / raw)
  To: ath9k-devel

On 2016-07-06 20:52, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
> 
>> On 2016-07-06 18:16, Toke Høiland-Jørgensen wrote:
>>> This switches ath9k over to using the mac80211 intermediate software
>>> queueing mechanism for data packets. It removes the queueing inside the
>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>> a packet is needed. The retry queue is used to store a packet that was
>>> pulled but can't be sent immediately.
>>> 
>>> The old code path in ath_tx_start that would queue packets has been
>>> removed completely, as have the qlen limit tunables (since there's no
>>> longer a queue in the driver to limit).
>>> 
>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>> 
>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>> Cc: Felix Fietkau <nbd@nbd.name>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>> ---
>>> Changes since v1:
>>>   - Remove the old intermediate queueing logic completely instead of
>>>     just disabling it.
>>>   - Remove the qlen debug tunables.
>>>   - Remove the force_channel parameter from struct txctl (since we just
>>>     removed the code path that was using it).
>>> 
>>>  drivers/net/wireless/ath/ath9k/ath9k.h     |  12 +-
>>>  drivers/net/wireless/ath/ath9k/channel.c   |   2 -
>>>  drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
>>>  drivers/net/wireless/ath/ath9k/debug.h     |   2 -
>>>  drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
>>>  drivers/net/wireless/ath/ath9k/init.c      |   2 +-
>>>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>>>  drivers/net/wireless/ath/ath9k/xmit.c      | 307 +++++++++++------------------
>>>  8 files changed, 130 insertions(+), 214 deletions(-)
>> Nice work!
> 
> Thanks :)
> 
>>> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>>> index fe795fc..4077eeb 100644
>>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>>> @@ -912,8 +923,16 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>>>  		seqno = bf->bf_state.seqno;
>>>  
>>>  		/* do not step over block-ack window */
>>> -		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
>>> +		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
>>> +			__skb_queue_tail(&tid->retry_q, skb);
>>> +
>>> +			/* If there are other skbs in the retry q, they are
>>> +			 * probably within the BAW, so loop immediately to get
>>> +			 * one of them. Otherwise the queue can get stuck. */
>>> +			if (!skb_queue_is_first(&tid->retry_q, skb))
>>> +				continue;
>> Not sure if this can happen, but if we ever somehow end up with two skbs
>> in the retry queue that do not fit into the Block-Ack window, there's
>> potential for an infinite loop here.
> 
> Yes, I realise that (v1 contained a comment on that). However, I don't
> actually think it can happen: The code will only ever put one skb from
> the intermediate queues on the retry queue (ath_tid_pull() is only
> called if the retry queue is empty). Everything else on there is an actual
> retry that was put there by ath_tx_complete_aggr(). I figure those will
> always be within the BAW?
I think it would be a good idea to have a check there (with WARN_ON), in
case there's some weird corner case with seqno handling, software retry
and aggregation state changes.

- Felix
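One way to implement the kind of safeguard Felix asks for is to bound the scan by the number of frames already on the retry queue, so that a queue in which no frame fits the window is detected instead of being rotated forever. The sketch below models tid->retry_q as a small FIFO of sequence numbers; struct retry_q, rq_pop(), rq_push_tail() and next_in_window() are hypothetical names, the fprintf() stands in for WARN_ON(), and this is not necessarily how the safeguard in the posted patch is written.

```c
#include <assert.h>
#include <stdio.h>

#define BAW_WITHIN(_start, _bawsz, _seqno) \
	((((_seqno) - (_start)) & 4095) < (_bawsz))

#define RQ_MAX 16

/* Hypothetical stand-in for tid->retry_q: a ring-buffer FIFO of seqnos. */
struct retry_q {
	int seq[RQ_MAX];
	int head, len;
};

static int rq_pop(struct retry_q *q)
{
	int s;

	assert(q->len > 0);
	s = q->seq[q->head];
	q->head = (q->head + 1) % RQ_MAX;
	q->len--;
	return s;
}

static void rq_push_tail(struct retry_q *q, int s)
{
	assert(q->len < RQ_MAX);
	q->seq[(q->head + q->len) % RQ_MAX] = s;
	q->len++;
}

/* Return the first seqno inside the window, rotating out-of-window
 * frames to the tail. Each frame queued on entry is visited at most
 * once, so the loop always terminates; if nothing fits, warn (where the
 * kernel would use WARN_ON) and return -1 with the queue order intact. */
static int next_in_window(struct retry_q *q, int seq_start, int baw_size)
{
	int budget = q->len;

	while (budget-- > 0) {
		int s = rq_pop(q);

		if (BAW_WITHIN(seq_start, baw_size, s))
			return s;
		rq_push_tail(q, s);
	}
	if (q->len > 0)
		fprintf(stderr, "warning: no retry frame fits the BAW\n");
	return -1;
}
```

With two out-of-window frames queued, next_in_window() returns -1 after one pass instead of looping; once an in-window frame is added, that frame is returned and the others stay queued.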

* [ath9k-devel] [PATCH v2] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 18:59           ` Felix Fietkau
@ 2016-07-06 19:08             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-06 19:08 UTC (permalink / raw)
  To: ath9k-devel

Felix Fietkau <nbd@nbd.name> writes:

> On 2016-07-06 20:52, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>> 
>>> On 2016-07-06 18:16, Toke Høiland-Jørgensen wrote:
>>>> This switches ath9k over to using the mac80211 intermediate software
>>>> queueing mechanism for data packets. It removes the queueing inside the
>>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>>> a packet is needed. The retry queue is used to store a packet that was
>>>> pulled but can't be sent immediately.
>>>> 
>>>> The old code path in ath_tx_start that would queue packets has been
>>>> removed completely, as have the qlen limit tunables (since there's no
>>>> longer a queue in the driver to limit).
>>>> 
>>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>>> 
>>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>>> Cc: Felix Fietkau <nbd@nbd.name>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>> ---
>>>> Changes since v1:
>>>>   - Remove the old intermediate queueing logic completely instead of
>>>>     just disabling it.
>>>>   - Remove the qlen debug tunables.
>>>>   - Remove the force_channel parameter from struct txctl (since we just
>>>>     removed the code path that was using it).
>>>> 
>>>>  drivers/net/wireless/ath/ath9k/ath9k.h     |  12 +-
>>>>  drivers/net/wireless/ath/ath9k/channel.c   |   2 -
>>>>  drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
>>>>  drivers/net/wireless/ath/ath9k/debug.h     |   2 -
>>>>  drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
>>>>  drivers/net/wireless/ath/ath9k/init.c      |   2 +-
>>>>  drivers/net/wireless/ath/ath9k/main.c      |   1 +
>>>>  drivers/net/wireless/ath/ath9k/xmit.c      | 307 +++++++++++------------------
>>>>  8 files changed, 130 insertions(+), 214 deletions(-)
>>> Nice work!
>> 
>> Thanks :)
>> 
>>>> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>>>> index fe795fc..4077eeb 100644
>>>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>>>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>>>> @@ -912,8 +923,16 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
>>>>  		seqno = bf->bf_state.seqno;
>>>>  
>>>>  		/* do not step over block-ack window */
>>>> -		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
>>>> +		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
>>>> +			__skb_queue_tail(&tid->retry_q, skb);
>>>> +
>>>> +			/* If there are other skbs in the retry q, they are
>>>> +			 * probably within the BAW, so loop immediately to get
>>>> +			 * one of them. Otherwise the queue can get stuck. */
>>>> +			if (!skb_queue_is_first(&tid->retry_q, skb))
>>>> +				continue;
>>> Not sure if this can happen, but if we ever somehow end up with two skbs
>>> in the retry queue that do not fit into the Block-Ack window, there's
>>> potential for an infinite loop here.
>> 
>> Yes, I realise that (v1 contained a comment on that). However, I don't
>> actually think it can happen: The code will only ever put one skb from
>> the intermediate queues on the retry queue (ath_tid_pull() is only
>> called if the retry queue is empty). Everything else on there is an actual
>> retry that was put there by ath_tx_complete_aggr(). I figure those will
>> always be within the BAW?
> I think it would be a good idea to have a check there (with WARN_ON), in
> case there's some weird corner case with seqno handling, software retry
> and aggregation state changes.

Right, can do :)

-Toke

* [ath9k-devel] [PATCH v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 16:17     ` [ath9k-devel] [PATCH v2] " Toke Høiland-Jørgensen
  2016-07-06 18:13       ` Felix Fietkau
  2016-07-06 18:19       ` Sebastian Gottschall
@ 2016-07-06 19:38       ` Toke Høiland-Jørgensen
  2016-07-08 14:26         ` [ath9k-devel] [v3] " Kalle Valo
                           ` (2 more replies)
  2 siblings, 3 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-06 19:38 UTC (permalink / raw)
  To: ath9k-devel

This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.

The old code path in ath_tx_start that would queue packets has been
removed completely, as have the qlen limit tunables (since there's no
longer a queue in the driver to limit).

Based on Tim's original patch set, but reworked quite thoroughly.

Cc: Tim Shepard <shep@alum.mit.edu>
Cc: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
Changes since v2:
  - Add safeguard against looping infinitely in
    ath_tx_get_tid_subframe().
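The pull model described in the commit message can be illustrated with a toy model. In this sketch, struct tid_model, tid_get_frame() and tid_park_frame() are hypothetical; frames are plain integers, and a counter stands in for the mac80211 intermediate queue that mac80211-based drivers drain with ieee80211_tx_dequeue(). Retries are always served first, and upstream frames are only pulled when the retry FIFO is empty, which is the invariant relied on in the v2 review discussion.

```c
#include <assert.h>
#include <stddef.h>

#define RQ_MAX 8

/* Hypothetical model of one TID: a small retry FIFO plus a counter that
 * stands in for the frames still waiting in the mac80211 queue. */
struct tid_model {
	int retry[RQ_MAX];
	size_t retry_head, retry_len;
	int next_upstream; /* next frame mac80211 would hand out */
	int upstream_left; /* frames still queued in mac80211 */
};

/* Take the next frame: retries first; pull from the upstream queue only
 * when the retry FIFO is empty. Returns -1 if nothing is queued. */
static int tid_get_frame(struct tid_model *t)
{
	if (t->retry_len > 0) {
		int f = t->retry[t->retry_head];

		t->retry_head = (t->retry_head + 1) % RQ_MAX;
		t->retry_len--;
		return f;
	}
	if (t->upstream_left == 0)
		return -1;
	t->upstream_left--;
	return t->next_upstream++;
}

/* A frame that was pulled but cannot be sent yet (for example because
 * the hardware queue is already deep enough) goes to the tail of the
 * retry FIFO. Since upstream frames are only pulled while the FIFO is
 * empty, such a frame is the next one returned. */
static void tid_park_frame(struct tid_model *t, int frame)
{
	assert(t->retry_len < RQ_MAX);
	t->retry[(t->retry_head + t->retry_len) % RQ_MAX] = frame;
	t->retry_len++;
}
```

Parking a freshly pulled frame and calling tid_get_frame() again returns that same frame before anything new is taken from upstream, so no frame is reordered or lost while the hardware queue is full.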
  
 drivers/net/wireless/ath/ath9k/ath9k.h     |  12 +-
 drivers/net/wireless/ath/ath9k/channel.c   |   2 -
 drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
 drivers/net/wireless/ath/ath9k/debug.h     |   2 -
 drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
 drivers/net/wireless/ath/ath9k/init.c      |   2 +-
 drivers/net/wireless/ath/ath9k/main.c      |   1 +
 drivers/net/wireless/ath/ath9k/xmit.c      | 312 ++++++++++++-----------------
 8 files changed, 134 insertions(+), 215 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 5294595..daf972c 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -91,7 +91,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define ATH_RXBUF               512
 #define ATH_TXBUF               512
 #define ATH_TXBUF_RESERVE       5
-#define ATH_MAX_QDEPTH          (ATH_TXBUF / 4 - ATH_TXBUF_RESERVE)
 #define ATH_TXMAXTRY            13
 #define ATH_MAX_SW_RETRIES      30
 
@@ -145,7 +144,9 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define BAW_WITHIN(_start, _bawsz, _seqno) \
 	((((_seqno) - (_start)) & 4095) < (_bawsz))
 
-#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
+#define ATH_STA_2_TID(_sta, _tidno) ((struct ath_atx_tid *)(_sta)->txq[_tidno]->drv_priv)
+#define ATH_VIF_2_TID(_vif) ((struct ath_atx_tid *)(_vif)->txq->drv_priv)
+#define ATH_AN_2_TID(_an, _tidno) ((_an)->sta ? ATH_STA_2_TID((_an)->sta, _tidno) : ATH_VIF_2_TID((_an)->vif))
 
 #define IS_HT_RATE(rate)   (rate & 0x80)
 #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
@@ -164,7 +165,6 @@ struct ath_txq {
 	spinlock_t axq_lock;
 	u32 axq_depth;
 	u32 axq_ampdu_depth;
-	bool stopped;
 	bool axq_tx_inprogress;
 	struct list_head txq_fifo[ATH_TXFIFO_DEPTH];
 	u8 txq_headidx;
@@ -232,7 +232,6 @@ struct ath_buf {
 
 struct ath_atx_tid {
 	struct list_head list;
-	struct sk_buff_head buf_q;
 	struct sk_buff_head retry_q;
 	struct ath_node *an;
 	struct ath_txq *txq;
@@ -247,13 +246,13 @@ struct ath_atx_tid {
 	s8 bar_index;
 	bool active;
 	bool clear_ps_filter;
+	bool has_queued;
 };
 
 struct ath_node {
 	struct ath_softc *sc;
 	struct ieee80211_sta *sta; /* station struct we're part of */
 	struct ieee80211_vif *vif; /* interface with which we're associated */
-	struct ath_atx_tid tid[IEEE80211_NUM_TIDS];
 
 	u16 maxampdu;
 	u8 mpdudensity;
@@ -276,7 +275,6 @@ struct ath_tx_control {
 	struct ath_node *an;
 	struct ieee80211_sta *sta;
 	u8 paprd;
-	bool force_channel;
 };
 
 
@@ -293,7 +291,6 @@ struct ath_tx {
 	struct ath_descdma txdma;
 	struct ath_txq *txq_map[IEEE80211_NUM_ACS];
 	struct ath_txq *uapsdq;
-	u32 txq_max_pending[IEEE80211_NUM_ACS];
 	u16 max_aggr_framelen[IEEE80211_NUM_ACS][4][32];
 };
 
@@ -585,6 +582,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 				   u16 tids, int nframes,
 				   enum ieee80211_frame_release_type reason,
 				   bool more_data);
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue);
 
 /********/
 /* VIFs */
diff --git a/drivers/net/wireless/ath/ath9k/channel.c b/drivers/net/wireless/ath/ath9k/channel.c
index 319cb5f..a5ce016 100644
--- a/drivers/net/wireless/ath/ath9k/channel.c
+++ b/drivers/net/wireless/ath/ath9k/channel.c
@@ -1007,7 +1007,6 @@ static void ath_scan_send_probe(struct ath_softc *sc,
 		goto error;
 
 	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
-	txctl.force_channel = true;
 	if (ath_tx_start(sc->hw, skb, &txctl))
 		goto error;
 
@@ -1130,7 +1129,6 @@ ath_chanctx_send_vif_ps_frame(struct ath_softc *sc, struct ath_vif *avp,
 	memset(&txctl, 0, sizeof(txctl));
 	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
 	txctl.sta = sta;
-	txctl.force_channel = true;
 	if (ath_tx_start(sc->hw, skb, &txctl)) {
 		ieee80211_free_txskb(sc->hw, skb);
 		return false;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index 6de64cf..48b181d 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -600,7 +600,6 @@ static int read_file_xmit(struct seq_file *file, void *data)
 	PR("MPDUs XRetried:  ", xretries);
 	PR("Aggregates:      ", a_aggr);
 	PR("AMPDUs Queued HW:", a_queued_hw);
-	PR("AMPDUs Queued SW:", a_queued_sw);
 	PR("AMPDUs Completed:", a_completed);
 	PR("AMPDUs Retried:  ", a_retries);
 	PR("AMPDUs XRetried: ", a_xretries);
@@ -629,8 +628,7 @@ static void print_queue(struct ath_softc *sc, struct ath_txq *txq,
 	seq_printf(file, "%s: %d ", "qnum", txq->axq_qnum);
 	seq_printf(file, "%s: %2d ", "qdepth", txq->axq_depth);
 	seq_printf(file, "%s: %2d ", "ampdu-depth", txq->axq_ampdu_depth);
-	seq_printf(file, "%s: %3d ", "pending", txq->pending_frames);
-	seq_printf(file, "%s: %d\n", "stopped", txq->stopped);
+	seq_printf(file, "%s: %3d\n", "pending", txq->pending_frames);
 
 	ath_txq_unlock(sc, txq);
 }
@@ -1190,7 +1188,6 @@ static const char ath9k_gstrings_stats[][ETH_GSTRING_LEN] = {
 	AMKSTR(d_tx_mpdu_xretries),
 	AMKSTR(d_tx_aggregates),
 	AMKSTR(d_tx_ampdus_queued_hw),
-	AMKSTR(d_tx_ampdus_queued_sw),
 	AMKSTR(d_tx_ampdus_completed),
 	AMKSTR(d_tx_ampdu_retries),
 	AMKSTR(d_tx_ampdu_xretries),
@@ -1270,7 +1267,6 @@ void ath9k_get_et_stats(struct ieee80211_hw *hw,
 	AWDATA(xretries);
 	AWDATA(a_aggr);
 	AWDATA(a_queued_hw);
-	AWDATA(a_queued_sw);
 	AWDATA(a_completed);
 	AWDATA(a_retries);
 	AWDATA(a_xretries);
@@ -1328,14 +1324,6 @@ int ath9k_init_debug(struct ath_hw *ah)
 				    read_file_xmit);
 	debugfs_create_devm_seqfile(sc->dev, "queues", sc->debug.debugfs_phy,
 				    read_file_queues);
-	debugfs_create_u32("qlen_bk", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_BK]);
-	debugfs_create_u32("qlen_be", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_BE]);
-	debugfs_create_u32("qlen_vi", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_VI]);
-	debugfs_create_u32("qlen_vo", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_VO]);
 	debugfs_create_devm_seqfile(sc->dev, "misc", sc->debug.debugfs_phy,
 				    read_file_misc);
 	debugfs_create_devm_seqfile(sc->dev, "reset", sc->debug.debugfs_phy,
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index cd68c5f..a078cdd 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -147,7 +147,6 @@ struct ath_interrupt_stats {
  * @completed: Total MPDUs (non-aggr) completed
  * @a_aggr: Total no. of aggregates queued
  * @a_queued_hw: Total AMPDUs queued to hardware
- * @a_queued_sw: Total AMPDUs queued to software queues
  * @a_completed: Total AMPDUs completed
  * @a_retries: No. of AMPDUs retried (SW)
  * @a_xretries: No. of AMPDUs dropped due to xretries
@@ -174,7 +173,6 @@ struct ath_tx_stats {
 	u32 xretries;
 	u32 a_aggr;
 	u32 a_queued_hw;
-	u32 a_queued_sw;
 	u32 a_completed;
 	u32 a_retries;
 	u32 a_xretries;
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index c2ca57a..d789798 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -52,8 +52,8 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
 			 "TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
 			 "BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_STA_2_TID(an->sta, tidno);
 		txq = tid->txq;
 		ath_txq_lock(sc, txq);
 		if (tid->active) {
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 1c226d6..752cacb 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -354,7 +354,6 @@ static int ath9k_init_queues(struct ath_softc *sc)
 	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
 		sc->tx.txq_map[i] = ath_txq_setup(sc, ATH9K_TX_QUEUE_DATA, i);
 		sc->tx.txq_map[i]->mac80211_qnum = i;
-		sc->tx.txq_max_pending[i] = ATH_MAX_QDEPTH;
 	}
 	return 0;
 }
@@ -867,6 +866,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
 	hw->max_rate_tries = 10;
 	hw->sta_data_size = sizeof(struct ath_node);
 	hw->vif_data_size = sizeof(struct ath_vif);
+	hw->txq_data_size = sizeof(struct ath_atx_tid);
 	hw->extra_tx_headroom = 4;
 
 	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 3aed43a..f584e19 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2673,4 +2673,5 @@ struct ieee80211_ops ath9k_ops = {
 	.sw_scan_start	    = ath9k_sw_scan_start,
 	.sw_scan_complete   = ath9k_sw_scan_complete,
 	.get_txpower        = ath9k_get_txpower,
+	.wake_tx_queue      = ath9k_wake_tx_queue,
 };
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index fe795fc..c2a2dbe 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
 					   struct ath_txq *txq,
 					   struct ath_atx_tid *tid,
 					   struct sk_buff *skb);
+static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
+			  struct ath_tx_control *txctl);
 
 enum {
 	MCS_HT20,
@@ -118,6 +120,26 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
 		list_add_tail(&tid->list, list);
 }
 
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue)
+{
+	struct ath_softc *sc = hw->priv;
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	struct ath_atx_tid *tid = (struct ath_atx_tid *) queue->drv_priv;
+	struct ath_txq *txq = tid->txq;
+
+	ath_dbg(common, QUEUE, "Waking TX queue: %pM (%d)\n",
+		queue->sta ? queue->sta->addr : queue->vif->addr,
+		tid->tidno);
+
+	ath_txq_lock(sc, txq);
+
+	tid->has_queued = true;
+	ath_tx_queue_tid(sc, txq, tid);
+	ath_txq_schedule(sc, txq);
+
+	ath_txq_unlock(sc, txq);
+}
+
 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
 {
 	struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
@@ -145,7 +167,6 @@ static void ath_set_rates(struct ieee80211_vif *vif, struct ieee80211_sta *sta,
 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
 			     struct sk_buff *skb)
 {
-	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
 	struct ath_frame_info *fi = get_frame_info(skb);
 	int q = fi->txq;
 
@@ -156,14 +177,6 @@ static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
 	if (WARN_ON(--txq->pending_frames < 0))
 		txq->pending_frames = 0;
 
-	if (txq->stopped &&
-	    txq->pending_frames < sc->tx.txq_max_pending[q]) {
-		if (ath9k_is_chanctx_enabled())
-			ieee80211_wake_queue(sc->hw, info->hw_queue);
-		else
-			ieee80211_wake_queue(sc->hw, q);
-		txq->stopped = false;
-	}
 }
 
 static struct ath_atx_tid *
@@ -173,9 +186,47 @@ ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
 	return ATH_AN_2_TID(an, tidno);
 }
 
+static struct sk_buff *
+ath_tid_pull(struct ath_atx_tid *tid)
+{
+	struct ath_softc *sc = tid->an->sc;
+	struct ieee80211_hw *hw = sc->hw;
+	struct ath_tx_control txctl = {
+		.txq = tid->txq,
+		.sta = tid->an->sta,
+	};
+	struct sk_buff *skb;
+	struct ath_frame_info *fi;
+	int q;
+
+	if (!tid->has_queued)
+		return NULL;
+
+	skb = ieee80211_tx_dequeue(hw, container_of((void*)tid, struct ieee80211_txq, drv_priv));
+	if (!skb) {
+		tid->has_queued = false;
+		return NULL;
+	}
+
+	if (ath_tx_prepare(hw, skb, &txctl)) {
+		ieee80211_free_txskb(hw, skb);
+		return NULL;
+	}
+
+	q = skb_get_queue_mapping(skb);
+	if (tid->txq == sc->tx.txq_map[q]) {
+		fi = get_frame_info(skb);
+		fi->txq = q;
+		++tid->txq->pending_frames;
+	}
+
+	return skb;
+}
+
+
 static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
 {
-	return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
+	return !skb_queue_empty(&tid->retry_q) || tid->has_queued;
 }
 
 static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
@@ -184,46 +235,11 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
 
 	skb = __skb_dequeue(&tid->retry_q);
 	if (!skb)
-		skb = __skb_dequeue(&tid->buf_q);
+		skb = ath_tid_pull(tid);
 
 	return skb;
 }
 
-/*
- * ath_tx_tid_change_state:
- * - clears a-mpdu flag of previous session
- * - force sequence number allocation to fix next BlockAck Window
- */
-static void
-ath_tx_tid_change_state(struct ath_softc *sc, struct ath_atx_tid *tid)
-{
-	struct ath_txq *txq = tid->txq;
-	struct ieee80211_tx_info *tx_info;
-	struct sk_buff *skb, *tskb;
-	struct ath_buf *bf;
-	struct ath_frame_info *fi;
-
-	skb_queue_walk_safe(&tid->buf_q, skb, tskb) {
-		fi = get_frame_info(skb);
-		bf = fi->bf;
-
-		tx_info = IEEE80211_SKB_CB(skb);
-		tx_info->flags &= ~IEEE80211_TX_CTL_AMPDU;
-
-		if (bf)
-			continue;
-
-		bf = ath_tx_setup_buffer(sc, txq, tid, skb);
-		if (!bf) {
-			__skb_unlink(skb, &tid->buf_q);
-			ath_txq_skb_done(sc, txq, skb);
-			ieee80211_free_txskb(sc->hw, skb);
-			continue;
-		}
-	}
-
-}
-
 static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
 {
 	struct ath_txq *txq = tid->txq;
@@ -858,20 +874,16 @@ static int ath_compute_num_delims(struct ath_softc *sc, struct ath_atx_tid *tid,
 
 static struct ath_buf *
 ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
-			struct ath_atx_tid *tid, struct sk_buff_head **q)
+			struct ath_atx_tid *tid)
 {
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
-	struct sk_buff *skb;
+	struct sk_buff *skb, *first_skb = NULL;
 	struct ath_buf *bf;
 	u16 seqno;
 
 	while (1) {
-		*q = &tid->retry_q;
-		if (skb_queue_empty(*q))
-			*q = &tid->buf_q;
-
-		skb = skb_peek(*q);
+		skb = ath_tid_dequeue(tid);
 		if (!skb)
 			break;
 
@@ -883,7 +895,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 			bf->bf_state.stale = false;
 
 		if (!bf) {
-			__skb_unlink(skb, *q);
 			ath_txq_skb_done(sc, txq, skb);
 			ieee80211_free_txskb(sc->hw, skb);
 			continue;
@@ -912,8 +923,19 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 		seqno = bf->bf_state.seqno;
 
 		/* do not step over block-ack window */
-		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
+		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
+			__skb_queue_tail(&tid->retry_q, skb);
+
+			/* If there are other skbs in the retry q, they are
+			 * probably within the BAW, so loop immediately to get
+			 * one of them. Otherwise the queue can get stuck. */
+			if (!skb_queue_is_first(&tid->retry_q, skb) && skb != first_skb) {
+				if (!first_skb) /* infinite loop prevention */
+					first_skb = skb;
+				continue;
+			}
 			break;
+		}
 
 		if (tid->bar_index > ATH_BA_INDEX(tid->seq_start, seqno)) {
 			struct ath_tx_status ts = {};
@@ -921,7 +943,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 
 			INIT_LIST_HEAD(&bf_head);
 			list_add(&bf->list, &bf_head);
-			__skb_unlink(skb, *q);
 			ath_tx_update_baw(sc, tid, seqno);
 			ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0);
 			continue;
@@ -933,11 +954,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	return NULL;
 }
 
-static bool
+static int
 ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		 struct ath_atx_tid *tid, struct list_head *bf_q,
-		 struct ath_buf *bf_first, struct sk_buff_head *tid_q,
-		 int *aggr_len)
+		 struct ath_buf *bf_first)
 {
 #define PADBYTES(_len) ((4 - ((_len) % 4)) % 4)
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
@@ -947,12 +967,13 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
 	struct sk_buff *skb;
-	bool closed = false;
+
 
 	bf = bf_first;
 	aggr_limit = ath_lookup_rate(sc, bf, tid);
 
-	do {
+	while (bf)
+	{
 		skb = bf->bf_mpdu;
 		fi = get_frame_info(skb);
 
@@ -961,12 +982,12 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes) {
 			if (aggr_limit < al + bpad + al_delta ||
 			    ath_lookup_legacy(bf) || nframes >= h_baw)
-				break;
+				goto stop;
 
 			tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 			if ((tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) ||
 			    !(tx_info->flags & IEEE80211_TX_CTL_AMPDU))
-				break;
+				goto stop;
 		}
 
 		/* add padding for previous frame to aggregation length */
@@ -988,20 +1009,18 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 			ath_tx_addto_baw(sc, tid, bf);
 		bf->bf_state.ndelim = ndelim;
 
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
 
 		bf_prev = bf;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
-		if (!bf) {
-			closed = true;
-			break;
-		}
-	} while (ath_tid_has_buffered(tid));
-
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
+	}
+	goto finish;
+stop:
+	__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
+finish:
 	bf = bf_first;
 	bf->bf_lastbf = bf_prev;
 
@@ -1012,9 +1031,7 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		TX_STAT_INC(txq->axq_qnum, a_aggr);
 	}
 
-	*aggr_len = al;
-
-	return closed;
+	return al;
 #undef PADBYTES
 }
 
@@ -1391,18 +1408,15 @@ static void ath_tx_fill_desc(struct ath_softc *sc, struct ath_buf *bf,
 static void
 ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		  struct ath_atx_tid *tid, struct list_head *bf_q,
-		  struct ath_buf *bf_first, struct sk_buff_head *tid_q)
+		  struct ath_buf *bf_first)
 {
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
-	struct sk_buff *skb;
 	int nframes = 0;
 
 	do {
 		struct ieee80211_tx_info *tx_info;
-		skb = bf->bf_mpdu;
 
 		nframes++;
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
@@ -1411,13 +1425,15 @@ ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes >= 2)
 			break;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
 		if (!bf)
 			break;
 
 		tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
-		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU)
+		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU) {
+			__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 			break;
+		}
 
 		ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	} while (1);
@@ -1428,34 +1444,33 @@ static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
 {
 	struct ath_buf *bf;
 	struct ieee80211_tx_info *tx_info;
-	struct sk_buff_head *tid_q;
 	struct list_head bf_q;
 	int aggr_len = 0;
-	bool aggr, last = true;
+	bool aggr;
 
 	if (!ath_tid_has_buffered(tid))
 		return false;
 
 	INIT_LIST_HEAD(&bf_q);
 
-	bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+	bf = ath_tx_get_tid_subframe(sc, txq, tid);
 	if (!bf)
 		return false;
 
 	tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 	aggr = !!(tx_info->flags & IEEE80211_TX_CTL_AMPDU);
 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
-		(!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 		*stop = true;
 		return false;
 	}
 
 	ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	if (aggr)
-		last = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf,
-					tid_q, &aggr_len);
+		aggr_len = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf);
 	else
-		ath_tx_form_burst(sc, txq, tid, &bf_q, bf, tid_q);
+		ath_tx_form_burst(sc, txq, tid, &bf_q, bf);
 
 	if (list_empty(&bf_q))
 		return false;
@@ -1498,9 +1513,6 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
 		an->mpdudensity = density;
 	}
 
-	/* force sequence number allocation for pending frames */
-	ath_tx_tid_change_state(sc, txtid);
-
 	txtid->active = true;
 	*ssn = txtid->seq_start = txtid->seq_next;
 	txtid->bar_index = -1;
@@ -1525,7 +1537,6 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
 	ath_txq_lock(sc, txq);
 	txtid->active = false;
 	ath_tx_flush_tid(sc, txtid);
-	ath_tx_tid_change_state(sc, txtid);
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1535,14 +1546,12 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
-	bool buffered;
 	int tidno;
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -1552,13 +1561,9 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 			continue;
 		}
 
-		buffered = ath_tid_has_buffered(tid);
-
 		list_del_init(&tid->list);
 
 		ath_txq_unlock(sc, txq);
-
-		ieee80211_sta_set_buffered(sta, tidno, buffered);
 	}
 }
 
@@ -1571,19 +1576,12 @@ void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
 		tid->clear_ps_filter = true;
-
-		if (ath_tid_has_buffered(tid)) {
-			ath_tx_queue_tid(sc, txq, tid);
-			ath_txq_schedule(sc, txq);
-		}
-
 		ath_txq_unlock_complete(sc, txq);
 	}
 }
@@ -1606,11 +1604,6 @@ void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
 
 	tid->baw_size = IEEE80211_MIN_AMPDU_BUF << sta->ht_cap.ampdu_factor;
 
-	if (ath_tid_has_buffered(tid)) {
-		ath_tx_queue_tid(sc, txq, tid);
-		ath_txq_schedule(sc, txq);
-	}
-
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1626,7 +1619,6 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 	struct ieee80211_tx_info *info;
 	struct list_head bf_q;
 	struct ath_buf *bf_tail = NULL, *bf;
-	struct sk_buff_head *tid_q;
 	int sent = 0;
 	int i;
 
@@ -1641,11 +1633,10 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 
 		ath_txq_lock(sc, tid->txq);
 		while (nframes > 0) {
-			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid, &tid_q);
+			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid);
 			if (!bf)
 				break;
 
-			__skb_unlink(bf->bf_mpdu, tid_q);
 			list_add_tail(&bf->list, &bf_q);
 			ath_set_rates(tid->an->vif, tid->an->sta, bf);
 			if (bf_isampdu(bf)) {
@@ -1660,7 +1651,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 			sent++;
 			TX_STAT_INC(txq->axq_qnum, a_queued_hw);
 
-			if (an->sta && !ath_tid_has_buffered(tid))
+			if (an->sta && skb_queue_empty(&tid->retry_q))
 				ieee80211_sta_set_buffered(an->sta, i, false);
 		}
 		ath_txq_unlock_complete(sc, tid->txq);
@@ -1887,13 +1878,7 @@ bool ath_drain_all_txq(struct ath_softc *sc)
 		if (!ATH_TXQ_SETUP(sc, i))
 			continue;
 
-		/*
-		 * The caller will resume queues with ieee80211_wake_queues.
-		 * Mark the queue as not stopped to prevent ath_tx_complete
-		 * from waking the queue too early.
-		 */
 		txq = &sc->tx.txq[i];
-		txq->stopped = false;
 		ath_draintxq(sc, txq);
 	}
 
@@ -2293,15 +2278,12 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 	struct ath_txq *txq = txctl->txq;
 	struct ath_atx_tid *tid = NULL;
 	struct ath_buf *bf;
-	bool queue, skip_uapsd = false, ps_resp;
+	bool ps_resp;
 	int q, ret;
 
 	if (vif)
 		avp = (void *)vif->drv_priv;
 
-	if (info->flags & IEEE80211_TX_CTL_TX_OFFCHAN)
-		txctl->force_channel = true;
-
 	ps_resp = !!(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE);
 
 	ret = ath_tx_prepare(hw, skb, txctl);
@@ -2316,63 +2298,13 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 
 	q = skb_get_queue_mapping(skb);
 
+	if (ps_resp)
+		txq = sc->tx.uapsdq;
+
 	ath_txq_lock(sc, txq);
 	if (txq == sc->tx.txq_map[q]) {
 		fi->txq = q;
-		if (++txq->pending_frames > sc->tx.txq_max_pending[q] &&
-		    !txq->stopped) {
-			if (ath9k_is_chanctx_enabled())
-				ieee80211_stop_queue(sc->hw, info->hw_queue);
-			else
-				ieee80211_stop_queue(sc->hw, q);
-			txq->stopped = true;
-		}
-	}
-
-	queue = ieee80211_is_data_present(hdr->frame_control);
-
-	/* If chanctx, queue all null frames while NOA could be there */
-	if (ath9k_is_chanctx_enabled() &&
-	    ieee80211_is_nullfunc(hdr->frame_control) &&
-	    !txctl->force_channel)
-		queue = true;
-
-	/* Force queueing of all frames that belong to a virtual interface on
-	 * a different channel context, to ensure that they are sent on the
-	 * correct channel.
-	 */
-	if (((avp && avp->chanctx != sc->cur_chan) ||
-	     sc->cur_chan->stopped) && !txctl->force_channel) {
-		if (!txctl->an)
-			txctl->an = &avp->mcast_node;
-		queue = true;
-		skip_uapsd = true;
-	}
-
-	if (txctl->an && queue)
-		tid = ath_get_skb_tid(sc, txctl->an, skb);
-
-	if (!skip_uapsd && ps_resp) {
-		ath_txq_unlock(sc, txq);
-		txq = sc->tx.uapsdq;
-		ath_txq_lock(sc, txq);
-	} else if (txctl->an && queue) {
-		WARN_ON(tid->txq != txctl->txq);
-
-		if (info->flags & IEEE80211_TX_CTL_CLEAR_PS_FILT)
-			tid->clear_ps_filter = true;
-
-		/*
-		 * Add this frame to software queue for scheduling later
-		 * for aggregation.
-		 */
-		TX_STAT_INC(txq->axq_qnum, a_queued_sw);
-		__skb_queue_tail(&tid->buf_q, skb);
-		if (!txctl->an->sleeping)
-			ath_tx_queue_tid(sc, txq, tid);
-
-		ath_txq_schedule(sc, txq);
-		goto out;
+		++txq->pending_frames;
 	}
 
 	bf = ath_tx_setup_buffer(sc, txq, tid, skb);
@@ -2856,9 +2788,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 	struct ath_atx_tid *tid;
 	int tidno, acno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS;
-	     tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		tid->an        = an;
 		tid->tidno     = tidno;
 		tid->seq_start = tid->seq_next = 0;
@@ -2866,11 +2797,14 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 		tid->baw_head  = tid->baw_tail = 0;
 		tid->active	   = false;
 		tid->clear_ps_filter = true;
-		__skb_queue_head_init(&tid->buf_q);
+		tid->has_queued  = false;
 		__skb_queue_head_init(&tid->retry_q);
 		INIT_LIST_HEAD(&tid->list);
 		acno = TID_TO_WME_AC(tidno);
 		tid->txq = sc->tx.txq_map[acno];
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
@@ -2880,9 +2814,8 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 	struct ath_txq *txq;
 	int tidno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ATH_AN_2_TID(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -2894,6 +2827,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 		tid->active = false;
 
 		ath_txq_unlock(sc, txq);
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
-- 
2.9.0
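The block-ack window check in the ath_tx_get_tid_subframe() hunk above is easiest to follow in isolation. The following is a minimal, self-contained C sketch, not driver code. The BAW_WITHIN() macro is modelled on ath9k's definition (12-bit sequence space, compared modulo 4096). struct seq_fifo and next_in_window() are hypothetical helpers standing in for tid->retry_q and the loop. Frames are identified by sequence number, which assumes no two queued frames share one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Modelled on ath9k's BAW_WITHIN(): 802.11 sequence numbers are 12 bits
 * wide, so the offset from the window start is taken modulo 4096. */
#define SEQ_MASK 4095u
#define BAW_WITHIN(start, bawsz, seqno) \
	((((unsigned)((seqno) - (start))) & SEQ_MASK) < (unsigned)(bawsz))

#define QCAP 16

/* Hypothetical FIFO of sequence numbers standing in for tid->retry_q. */
struct seq_fifo {
	int seq[QCAP];
	size_t head, count;
};

static void fifo_push_tail(struct seq_fifo *q, int seqno)
{
	assert(q->count < QCAP);
	q->seq[(q->head + q->count) % QCAP] = seqno;
	q->count++;
}

static bool fifo_pop_head(struct seq_fifo *q, int *seqno)
{
	if (q->count == 0)
		return false;
	*seqno = q->seq[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	return true;
}

/*
 * Hypothetical helper: return the first queued sequence number inside
 * [start, start + bawsz) (mod 4096), or -1 if none is eligible now.
 */
static int next_in_window(struct seq_fifo *q, int start, int bawsz)
{
	bool have_first = false;
	int first = 0, seqno;

	while (fifo_pop_head(q, &seqno)) {
		if (BAW_WITHIN(start, bawsz, seqno))
			return seqno;

		/* Outside the window: park it at the tail again. */
		fifo_push_tail(q, seqno);

		/* Alone in the queue, or back round to the first frame we
		 * parked: every queued frame is outside the window, so stop. */
		if (q->count == 1 || (have_first && seqno == first))
			break;
		if (!have_first) {
			have_first = true;
			first = seqno;
		}
	}
	return -1;
}
```

As in the patch, a frame outside the window is parked at the tail of the retry queue; the search only continues if other frames were ahead of it, and it stops once it comes back round to the first parked frame. The real loop also falls through to pulling from mac80211 when the retry queue is empty, which this sketch leaves out.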


* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 19:38       ` [ath9k-devel] [PATCH v3] " Toke Høiland-Jørgensen
@ 2016-07-08 14:26         ` Kalle Valo
  2016-07-08 15:53           ` Toke Høiland-Jørgensen
  2016-07-08 16:38         ` [ath9k-devel] [PATCH v3] " Tim Shepard
  2016-08-05 16:05         ` [ath9k-devel] [PATCH v4] " Toke Høiland-Jørgensen
  2 siblings, 1 reply; 50+ messages in thread
From: Kalle Valo @ 2016-07-08 14:26 UTC (permalink / raw)
  To: ath9k-devel

Toke Høiland-Jørgensen wrote:
> This switches ath9k over to using the mac80211 intermediate software
> queueing mechanism for data packets. It removes the queueing inside the
> driver, except for the retry queue, and instead pulls from mac80211 when
> a packet is needed. The retry queue is used to store a packet that was
> pulled but can't be sent immediately.
> 
> The old code path in ath_tx_start that would queue packets has been
> removed completely, as has the qlen limit tunables (since there's no
> longer a queue in the driver to limit).
> 
> Based on Tim's original patch set, but reworked quite thoroughly.
> 
> Cc: Tim Shepard <shep@alum.mit.edu>
> Cc: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
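The pull model described in this commit message (drain the driver's retry queue first, otherwise pull from mac80211 on demand) can be sketched as below. This is an illustrative model under stated assumptions, not the driver code: struct upstream stands in for the mac80211 intermediate queue and ieee80211_tx_dequeue(), and struct tid, tid_wake(), tid_pull(), tid_dequeue() and tid_has_buffered() are hypothetical analogues of ath_atx_tid, ath9k_wake_tx_queue(), ath_tid_pull(), ath_tid_dequeue() and ath_tid_has_buffered(). Frames are plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RETRY_CAP 16

/* Hypothetical stand-in for the mac80211 per-TID intermediate queue. */
struct upstream {
	const int *frames; /* frames waiting upstream, oldest first */
	size_t len;
	size_t next;       /* index of the next frame to hand out */
};

/* Hypothetical analogue of struct ath_atx_tid: a small FIFO retry queue
 * plus a hint that upstream may still hold frames for this TID. */
struct tid {
	int retry[RETRY_CAP];
	size_t head, count;
	bool has_queued;
	struct upstream *up;
};

/* Park a frame that was pulled but could not be sent yet. */
static void retry_push_tail(struct tid *t, int frame)
{
	assert(t->count < RETRY_CAP);
	t->retry[(t->head + t->count) % RETRY_CAP] = frame;
	t->count++;
}

static bool retry_pop_head(struct tid *t, int *frame)
{
	if (t->count == 0)
		return false;
	*frame = t->retry[t->head];
	t->head = (t->head + 1) % RETRY_CAP;
	t->count--;
	return true;
}

/* Wake callback (cf. ath9k_wake_tx_queue): just note that frames exist. */
static void tid_wake(struct tid *t)
{
	t->has_queued = true;
}

/* Pull one frame upstream (cf. ath_tid_pull); drop the hint when empty. */
static bool tid_pull(struct tid *t, int *frame)
{
	if (!t->has_queued)
		return false;
	if (t->up->next >= t->up->len) {
		t->has_queued = false;
		return false;
	}
	*frame = t->up->frames[t->up->next++];
	return true;
}

/* Retry queue first, then upstream (cf. ath_tid_dequeue). */
static bool tid_dequeue(struct tid *t, int *frame)
{
	return retry_pop_head(t, frame) || tid_pull(t, frame);
}

/* cf. ath_tid_has_buffered: answers without peeking into upstream. */
static bool tid_has_buffered(const struct tid *t)
{
	return t->count > 0 || t->has_queued;
}
```

The has_queued flag is only a hint: the wake callback sets it, and it is cleared the first time a pull comes back empty, which is how the patch's ath_tid_has_buffered() can answer without looking inside mac80211's queue.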

Nice work. Because this is such a significant change, and to maximise testing
time, I'm planning to queue this for 4.9 (so I would apply this to ath-next in
3-4 weeks after the merge window closes). But anyone who wants to test this can
use master-pending branch from my ath.git tree (uses wireless-testing as the
baseline). Sounds good?

Testing and review feedback very welcome!

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9216993/


* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 14:26         ` [ath9k-devel] [v3] " Kalle Valo
@ 2016-07-08 15:53           ` Toke Høiland-Jørgensen
  2016-07-08 16:10             ` Felix Fietkau
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-08 15:53 UTC (permalink / raw)
  To: ath9k-devel

Kalle Valo <kvalo@qca.qualcomm.com> writes:

> Toke Høiland-Jørgensen wrote:
>> This switches ath9k over to using the mac80211 intermediate software
>> queueing mechanism for data packets. It removes the queueing inside the
>> driver, except for the retry queue, and instead pulls from mac80211 when
>> a packet is needed. The retry queue is used to store a packet that was
>> pulled but can't be sent immediately.
>> 
>> The old code path in ath_tx_start that would queue packets has been
>> removed completely, as has the qlen limit tunables (since there's no
>> longer a queue in the driver to limit).
>> 
>> Based on Tim's original patch set, but reworked quite thoroughly.
>> 
>> Cc: Tim Shepard <shep@alum.mit.edu>
>> Cc: Felix Fietkau <nbd@nbd.name>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> Nice work.

Thanks :)

> Because this is such a significant change, and to maximise testing
> time, I'm planning to queue this for 4.9 (so I would apply this to
> ath-next in 3-4 weeks after the merge window closes). But anyone who
> wants to test this can use master-pending branch from my ath.git tree
> (uses wireless-testing as the baseline). Sounds good?

Sounds good to me. I'm planning on backporting this and Michal's
mac80211 FQ-CoDel patches to 4.4 and posting them for inclusion in LEDE.
Hopefully that will get it some more testing as well.

> Testing and review feedback very welcome!

My own evaluation results are here:
https://blog.tohojo.dk/2016/06/fixing-the-wifi-performance-anomaly-on-ath9k.html
-- I see aggregate throughput to multiple stations improve by a factor
of ~3 and latency under load decrease by a factor of ~10 now that we can
take advantage of the mac80211 FQ-CoDel patches.

-Toke


* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 15:53           ` Toke Høiland-Jørgensen
@ 2016-07-08 16:10             ` Felix Fietkau
  2016-07-08 16:28               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Felix Fietkau @ 2016-07-08 16:10 UTC (permalink / raw)
  To: ath9k-devel

On 2016-07-08 17:53, Toke Høiland-Jørgensen wrote:
> Kalle Valo <kvalo@qca.qualcomm.com> writes:
> 
>> Toke Høiland-Jørgensen wrote:
>>> This switches ath9k over to using the mac80211 intermediate software
>>> queueing mechanism for data packets. It removes the queueing inside the
>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>> a packet is needed. The retry queue is used to store a packet that was
>>> pulled but can't be sent immediately.
>>> 
>>> The old code path in ath_tx_start that would queue packets has been
>>> removed completely, as has the qlen limit tunables (since there's no
>>> longer a queue in the driver to limit).
>>> 
>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>> 
>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>> Cc: Felix Fietkau <nbd@nbd.name>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>
>> Nice work.
> 
> Thanks :)
> 
>> Because this is such a significant change, and to maximise testing
>> time, I'm planning to queue this for 4.9 (so I would apply this to
>> ath-next in 3-4 weeks after the merge window closes). But anyone who
>> wants to test this can use master-pending branch from my ath.git tree
>> (uses wireless-testing as the baseline). Sounds good?
> 
> Sounds good to me. I'm planning on backporting this and Michal's
> mac80211 FQ-CoDel patches to 4.4 and posting them for inclusion in LEDE.
> Hopefully that will get it some more testing as well.
I've pushed a backport of this into my LEDE staging tree:
https://git.lede-project.org/?p=lede/nbd/staging.git;a=summary

I don't have time for testing it myself at the moment, but I'll try to
get some people to do so.

- Felix


* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 16:10             ` Felix Fietkau
@ 2016-07-08 16:28               ` Toke Høiland-Jørgensen
  2016-07-08 16:31                 ` Felix Fietkau
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-08 16:28 UTC (permalink / raw)
  To: ath9k-devel

Felix Fietkau <nbd@nbd.name> writes:

> On 2016-07-08 17:53, Toke Høiland-Jørgensen wrote:
>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>> 
>>> Toke Høiland-Jørgensen wrote:
>>>> This switches ath9k over to using the mac80211 intermediate software
>>>> queueing mechanism for data packets. It removes the queueing inside the
>>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>>> a packet is needed. The retry queue is used to store a packet that was
>>>> pulled but can't be sent immediately.
>>>> 
>>>> The old code path in ath_tx_start that would queue packets has been
>>>> removed completely, as has the qlen limit tunables (since there's no
>>>> longer a queue in the driver to limit).
>>>> 
>>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>>> 
>>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>>> Cc: Felix Fietkau <nbd@nbd.name>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>
>>> Nice work.
>> 
>> Thanks :)
>> 
>>> Because this is such a significant change, and to maximise testing
>>> time, I'm planning to queue this for 4.9 (so I would apply this to
>>> ath-next in 3-4 weeks after the merge window closes). But anyone who
>>> wants to test this can use master-pending branch from my ath.git tree
>>> (uses wireless-testing as the baseline). Sounds good?
>> 
>> Sounds good to me. I'm planning on backporting this and Michal's
>> mac80211 FQ-CoDel patches to 4.4 and posting them for inclusion in LEDE.
>> Hopefully that will get it some more testing as well.
> I've pushed a backport of this into my LEDE staging tree:
> https://git.lede-project.org/?p=lede/nbd/staging.git;a=summary

Awesome! What about the FQ-CoDel mac80211 patches themselves? I have a
tree where I've separated out the needed patches and rebased them on
mainline 4.4.9. Can I post that somewhere (or just email you the series)
and get you to include those as well? Or do I just dump the patch files
into the LEDE patches dir and send that as a patch to LEDE? (I see your
patch also refreshed subsequent patches; is there a script to do that
automatically?)

> I don't have time for testing it myself at the moment, but I'll try to
> get some people to do so.

Awesome :)

-Toke


* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 16:28               ` Toke Høiland-Jørgensen
@ 2016-07-08 16:31                 ` Felix Fietkau
  2016-07-08 16:38                   ` Toke Høiland-Jørgensen
  2016-07-08 18:24                   ` Sebastian Gottschall
  0 siblings, 2 replies; 50+ messages in thread
From: Felix Fietkau @ 2016-07-08 16:31 UTC (permalink / raw)
  To: ath9k-devel

On 2016-07-08 18:28, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
> 
>> On 2016-07-08 17:53, Toke Høiland-Jørgensen wrote:
>>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>>> 
>>>> Toke Høiland-Jørgensen wrote:
>>>>> This switches ath9k over to using the mac80211 intermediate software
>>>>> queueing mechanism for data packets. It removes the queueing inside the
>>>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>>>> a packet is needed. The retry queue is used to store a packet that was
>>>>> pulled but can't be sent immediately.
>>>>> 
>>>>> The old code path in ath_tx_start that would queue packets has been
>>>>> removed completely, as has the qlen limit tunables (since there's no
>>>>> longer a queue in the driver to limit).
>>>>> 
>>>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>>>> 
>>>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>>>> Cc: Felix Fietkau <nbd@nbd.name>
>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>>
>>>> Nice work.
>>> 
>>> Thanks :)
>>> 
>>>> Because this is such a significant change, and to maximise testing
>>>> time, I'm planning to queue this for 4.9 (so I would apply this to
>>>> ath-next in 3-4 weeks after the merge window closes). But anyone who
>>>> wants to test this can use master-pending branch from my ath.git tree
>>>> (uses wireless-testing as the baseline). Sounds good?
>>> 
>>> Sounds good to me. I'm planning on backporting this and Michal's
>>> mac80211 FQ-CoDel patches to 4.4 and posting them for inclusion in LEDE.
>>> Hopefully that will get it some more testing as well.
>> I've pushed a backport of this into my LEDE staging tree:
>> https://git.lede-project.org/?p=lede/nbd/staging.git;a=summary
> 
> Awesome! What about the FQ-CoDel mac80211 patches themselves? I have a
> tree where I've separated out the needed patches and rebased them on
> mainline 4.4.9. Can I post that somewhere (or just email you the series)
> and get you to include those as well? Or do I just dump the patch files
> into the LEDE patches dir and send that as a patch to LEDE? (I see your
> patch also refreshed subsequent patches; is there a script to do that
> automatically?)
You don't need to do anything here. LEDE does not use mac80211 and the
drivers from the kernel tree; it's built using backports.
It's currently using a backports snapshot that I built myself from
wireless-testing 2016-06-20, which already includes FQ-CoDel.

- Felix


* [ath9k-devel] [PATCH v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 19:38       ` [ath9k-devel] [PATCH v3] " Toke Høiland-Jørgensen
  2016-07-08 14:26         ` [ath9k-devel] [v3] " Kalle Valo
@ 2016-07-08 16:38         ` Tim Shepard
  2016-07-09 15:45           ` Toke Høiland-Jørgensen
  2016-08-05 16:05         ` [ath9k-devel] [PATCH v4] " Toke Høiland-Jørgensen
  2 siblings, 1 reply; 50+ messages in thread
From: Tim Shepard @ 2016-07-08 16:38 UTC (permalink / raw)
  To: ath9k-devel


> The old code path in ath_tx_start that would queue packets has been
> removed completely, 

It seems to me that this breaks the ath9k driver for non-data packets,
which mac80211 will not queue on the new intermediate queues; see
ieee80211_drv_tx() in mac80211/tx.c, where it says

	if (!ieee80211_is_data(hdr->frame_control))
		goto tx_normal;

This means that non-data packets can come down from mac80211 via
ath_tx --> ath_tx_start, and such a packet might be for a vif that is
not on the same channel, so it cannot be sent straight away but must be
queued. Maybe this problem should be fixed in mac80211, but as of now
it is a problem. It appears to me that your new patch just sends them
on the wrong channel.

Maybe I'm confused about something, hints appreciated.


> as has the qlen limit tunables (since there's no
> longer a queue in the driver to limit).

> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
> index 5294595..daf972c 100644
> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> @@ -91,7 +91,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>  #define ATH_RXBUF               512
>  #define ATH_TXBUF               512
>  #define ATH_TXBUF_RESERVE       5
> -#define ATH_MAX_QDEPTH          (ATH_TXBUF / 4 - ATH_TXBUF_RESERVE)
>  #define ATH_TXMAXTRY            13
>  #define ATH_MAX_SW_RETRIES      30

I thought the purpose of ATH_MAX_QDEPTH was due to a limit on the
depth of a hardware FIFO.  Not all packets that get handed to the
hardware come through a software queue (e.g. those that bypass the
intermediate queueing in mac80211 now) and (it seems to me) there
needs to be a limit to prevent overflowing a hardware fifo.   Yes,
normal data packets are (much) further limited as you taught me a few
weeks ago, but not all packets are subject to that constraint (as far
as I can understand at the moment).   I'm not entirely sure of the
details, but I think it is the sorts of packets sent directly by
hostapd and wpa_supplicant that bypass all the queueing.  And maybe
those things aren't likely to be sending a burst of hundreds of
packets in a very short period of time (where it could overrun the
FIFO), but there may be other tools that send raw 802.11 non-data
packets which could then overflow the above limit, and it seems you
are removing the check.    Actually I think my original version of
this patch may have had a flaw in that some combination of non-data
and data packets could be combined to overflow this limit (since I
failed to check overflowing this limit where I pulled from the
mac80211 intermediate queue).



My hope is that I'm just confused and you all understand what's really
going on better than I do and have an explanation why all the above
doesn't matter and is handled in some other way.   But in case you
don't, I don't want these issues to be overlooked.

(I'm not testing using vifs on multiple channels.   But even if I was,
 I'm not sure if normal operation of wpa_supplicant or hostapd would
 be enough to trigger these problems.)



			-Tim Shepard
			 shep at alum.mit.edu

* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 16:31                 ` Felix Fietkau
@ 2016-07-08 16:38                   ` Toke Høiland-Jørgensen
  2016-07-08 18:24                   ` Sebastian Gottschall
  1 sibling, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-08 16:38 UTC
  To: ath9k-devel

Felix Fietkau <nbd@nbd.name> writes:

> On 2016-07-08 18:28, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>> 
>>> On 2016-07-08 17:53, Toke Høiland-Jørgensen wrote:
>>>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>>>> 
>>>>> Toke Høiland-Jørgensen wrote:
>>>>>> This switches ath9k over to using the mac80211 intermediate software
>>>>>> queueing mechanism for data packets. It removes the queueing inside the
>>>>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>>>>> a packet is needed. The retry queue is used to store a packet that was
>>>>>> pulled but can't be sent immediately.
>>>>>> 
>>>>>> The old code path in ath_tx_start that would queue packets has been
>>>>>> removed completely, as has the qlen limit tunables (since there's no
>>>>>> longer a queue in the driver to limit).
>>>>>> 
>>>>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>>>>> 
>>>>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>>>>> Cc: Felix Fietkau <nbd@nbd.name>
>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>>>
>>>>> Nice work.
>>>> 
>>>> Thanks :)
>>>> 
>>>>> Because this is such a significant change, and to maximise testing
>>>>> time, I'm planning to queue this for 4.9 (so I would apply this to
>>>>> ath-next in 3-4 weeks after the merge window closes). But anyone who
>>>>> wants to test this can use master-pending branch from my ath.git tree
>>>>> (uses wireless-testing as the baseline). Sounds good?
>>>> 
>>>> Sounds good to me. I'm planning on backporting this and Michael's
>>>> mac80211 FQ-CoDel patches to 4.4 and post them for inclusion in LEDE.
>>>> Hopefully that will get it some more testing as well.
>>> I've pushed a backport of this into my LEDE staging tree:
>>> https://git.lede-project.org/?p=lede/nbd/staging.git;a=summary
>> 
>> Awesome! What about the FQ-CoDel mac80211 patches themselves? I have a
>> tree where I've separated out the needed patches and rebased them on
>> mainline 4.4.9. Can I post that somewhere (or just email you the series)
>> and get you to include those as well? Or do I just dump the patch files
>> into the LEDE patches dir and send that as a patch to LEDE? (I see your
>> patch also refreshed subsequent patches; is there a script to do that
>> automatically?)
> You don't need to do anything here. LEDE does not use mac80211 and
> drivers from the kernel tree, it's built using backports.
> It's currently using a backports snapshot that I built myself from
> wireless-testing 2016-06-20, which already includes FQ-Codel.

Ah, didn't know that. Cool; and thanks for taking care of the
backporting :)

-Toke

* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 16:31                 ` Felix Fietkau
  2016-07-08 16:38                   ` Toke Høiland-Jørgensen
@ 2016-07-08 18:24                   ` Sebastian Gottschall
  2016-07-09 12:00                     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 50+ messages in thread
From: Sebastian Gottschall @ 2016-07-08 18:24 UTC
  To: ath9k-devel

For me it crashes in WDS station mode on 3.18 kernels. I need to solder
on a serial console to get more logs.

On 08.07.2016 at 18:31, Felix Fietkau wrote:
> On 2016-07-08 18:28, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>>
>>> On 2016-07-08 17:53, Toke Høiland-Jørgensen wrote:
>>>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>>>>
>>>>> Toke Høiland-Jørgensen wrote:
>>>>>> This switches ath9k over to using the mac80211 intermediate software
>>>>>> queueing mechanism for data packets. It removes the queueing inside the
>>>>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>>>>> a packet is needed. The retry queue is used to store a packet that was
>>>>>> pulled but can't be sent immediately.
>>>>>>
>>>>>> The old code path in ath_tx_start that would queue packets has been
>>>>>> removed completely, as has the qlen limit tunables (since there's no
>>>>>> longer a queue in the driver to limit).
>>>>>>
>>>>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>>>>>
>>>>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>>>>> Cc: Felix Fietkau <nbd@nbd.name>
>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>>> Nice work.
>>>> Thanks :)
>>>>
>>>>> Because this is such a significant change, and to maximise testing
>>>>> time, I'm planning to queue this for 4.9 (so I would apply this to
>>>>> ath-next in 3-4 weeks after the merge window closes). But anyone who
>>>>> wants to test this can use master-pending branch from my ath.git tree
>>>>> (uses wireless-testing as the baseline). Sounds good?
>>>> Sounds good to me. I'm planning on backporting this and Michael's
>>>> mac80211 FQ-CoDel patches to 4.4 and post them for inclusion in LEDE.
>>>> Hopefully that will get it some more testing as well.
>>> I've pushed a backport of this into my LEDE staging tree:
>>> https://git.lede-project.org/?p=lede/nbd/staging.git;a=summary
>> Awesome! What about the FQ-CoDel mac80211 patches themselves? I have a
>> tree where I've separated out the needed patches and rebased them on
>> mainline 4.4.9. Can I post that somewhere (or just email you the series)
>> and get you to include those as well? Or do I just dump the patch files
>> into the LEDE patches dir and send that as a patch to LEDE? (I see your
>> patch also refreshed subsequent patches; is there a script to do that
>> automatically?)
> You don't need to do anything here. LEDE does not use mac80211 and
> drivers from the kernel tree, it's built using backports.
> It's currently using a backports snapshot that I built myself from
> wireless-testing 2016-06-20, which already includes FQ-Codel.
>
> - Felix


-- 
Mit freundlichen Grüssen / Regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Berliner Ring 101, 64625 Bensheim
Court of registration: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: s.gottschall at dd-wrt.com
Tel.: +496251-582650 / Fax: +496251-5826565

* [ath9k-devel] [v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 18:24                   ` Sebastian Gottschall
@ 2016-07-09 12:00                     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-09 12:00 UTC
  To: ath9k-devel

Sebastian Gottschall <s.gottschall@dd-wrt.com> writes:

> for me it crashes on wds sta on 3.18 kernels. 

Bugger :/

> need to solder a serial to get more logs

That would be helpful :)

-Toke

* [ath9k-devel] [PATCH v3] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-08 16:38         ` [ath9k-devel] [PATCH v3] " Tim Shepard
@ 2016-07-09 15:45           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-07-09 15:45 UTC
  To: ath9k-devel

Tim Shepard <shep@alum.mit.edu> writes:

>> The old code path in ath_tx_start that would queue packets has been
>> removed completely, 
>
> It seems to me that this breaks the ath9k driver for non-data packets,
> which mac80211 will not queue on the new intermediate queues; see
> ieee80211_drv_tx() in mac80211/tx.c, where it says
>
> 	if (!ieee80211_is_data(hdr->frame_control))
> 		goto tx_normal;
>
> This means that non-data packets can come down from mac80211 via
> ath_tx --> ath_tx_start which might be for a vif that is not on the
> same channel, and so cannot be sent straight away but must be queued.
> Maybe this problem should be fixed in mac80211, but as of now this is
> a problem.   It appears to me that your new patch just sends them on
> the wrong channel.

Well, the idea is that the chanctx code will call ieee80211_stop_queue()
to make sure that packets for the wrong context are not pushed down to
the driver.

>> as has the qlen limit tunables (since there's no
>> longer a queue in the driver to limit).
>
>> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
>> index 5294595..daf972c 100644
>> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
>> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
>> @@ -91,7 +91,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
>>  #define ATH_RXBUF               512
>>  #define ATH_TXBUF               512
>>  #define ATH_TXBUF_RESERVE       5
>> -#define ATH_MAX_QDEPTH          (ATH_TXBUF / 4 - ATH_TXBUF_RESERVE)
>>  #define ATH_TXMAXTRY            13
>>  #define ATH_MAX_SW_RETRIES      30
>
> I thought the purpose of ATH_MAX_QDEPTH was due to a limit on the
> depth of a hardware FIFO.

It's not. The limit is way too high for that. I'm a little fuzzy on the
details, but the hardware queue depth is somewhat lower:

#define ATH_TXFIFO_DEPTH           8

The ATH_MAX_QDEPTH was supposed to keep the number of packets queued in
the driver under control. And since we're no longer queueing in the
driver, there's no longer a need for it.

> Not all packets that get handed to the hardware come through a
> software queue (e.g. those that bypass the intermediate queueing in
> mac80211 now) and (it seems to me) there needs to be a limit to
> prevent overflowing a hardware fifo. Yes, normal data packets are
> (much) further limited as you taught me a few weeks ago, but not all
> packets are subject to that constraint (as far as I can understand at
> the moment). I'm not entirely sure of the details, but I think it is
> the sorts of packets sent directly by hostapd and wpa_supplicant that
> bypass all the queueing. And maybe those things aren't likely to be
> sending a burst of hundreds of packets in a very short period of time
> (where it could overrun the FIFO), but there may be other tools that
> send raw 802.11 non-data packets which could then overflow the above
> limit, and it seems you are removing the check. Actually I think my
> original version of this patch may have had a flaw in that some
> combination of non-data and data packets could be combined to overflow
> this limit (since I failed to check overflowing this limit where I
> pulled from the mac80211 intermediate queue).

Hmm, I'm not sure I can confidently say that what you describe would
never happen. But I'm pretty sure that ATH_MAX_QDEPTH wasn't what was
keeping it from happening...

-Toke

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-07-06 19:38       ` [ath9k-devel] [PATCH v3] " Toke Høiland-Jørgensen
  2016-07-08 14:26         ` [ath9k-devel] [v3] " Kalle Valo
  2016-07-08 16:38         ` [ath9k-devel] [PATCH v3] " Tim Shepard
@ 2016-08-05 16:05         ` Toke Høiland-Jørgensen
  2016-08-22 15:43           ` Kalle Valo
  2 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-08-05 16:05 UTC
  To: ath9k-devel

This switches ath9k over to using the mac80211 intermediate software
queueing mechanism for data packets. It removes the queueing inside the
driver, except for the retry queue, and instead pulls from mac80211 when
a packet is needed. The retry queue is used to store a packet that was
pulled but can't be sent immediately.

The old code path in ath_tx_start that would queue packets has been
removed completely, as have the qlen limit tunables (since there's no
longer a queue in the driver to limit).

Based on Tim's original patch set, but reworked quite thoroughly.

Cc: Tim Shepard <shep@alum.mit.edu>
Cc: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
Changes since v3 (most due to Felix; thanks!):
  - Correctly notify mac80211 when there are packets in the retry queue
    on powersave start/stop.
  - Get rid of ath_tx_aggr_resume().
  - Some readability changes and additional WARN_ON/BUG_ON in
    appropriate places.
    
 drivers/net/wireless/ath/ath9k/ath9k.h     |  27 ++-
 drivers/net/wireless/ath/ath9k/channel.c   |   2 -
 drivers/net/wireless/ath/ath9k/debug.c     |  14 +-
 drivers/net/wireless/ath/ath9k/debug.h     |   2 -
 drivers/net/wireless/ath/ath9k/debug_sta.c |   4 +-
 drivers/net/wireless/ath/ath9k/init.c      |   2 +-
 drivers/net/wireless/ath/ath9k/main.c      |   9 +-
 drivers/net/wireless/ath/ath9k/xmit.c      | 332 +++++++++++------------------
 8 files changed, 157 insertions(+), 235 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 5294595..7e0a976 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -91,7 +91,6 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define ATH_RXBUF               512
 #define ATH_TXBUF               512
 #define ATH_TXBUF_RESERVE       5
-#define ATH_MAX_QDEPTH          (ATH_TXBUF / 4 - ATH_TXBUF_RESERVE)
 #define ATH_TXMAXTRY            13
 #define ATH_MAX_SW_RETRIES      30
 
@@ -145,7 +144,7 @@ int ath_descdma_setup(struct ath_softc *sc, struct ath_descdma *dd,
 #define BAW_WITHIN(_start, _bawsz, _seqno) \
 	((((_seqno) - (_start)) & 4095) < (_bawsz))
 
-#define ATH_AN_2_TID(_an, _tidno)  (&(_an)->tid[(_tidno)])
+#define ATH_AN_2_TID(_an, _tidno) ath_node_to_tid(_an, _tidno)
 
 #define IS_HT_RATE(rate)   (rate & 0x80)
 #define IS_CCK_RATE(rate)  ((rate >= 0x18) && (rate <= 0x1e))
@@ -164,7 +163,6 @@ struct ath_txq {
 	spinlock_t axq_lock;
 	u32 axq_depth;
 	u32 axq_ampdu_depth;
-	bool stopped;
 	bool axq_tx_inprogress;
 	struct list_head txq_fifo[ATH_TXFIFO_DEPTH];
 	u8 txq_headidx;
@@ -232,7 +230,6 @@ struct ath_buf {
 
 struct ath_atx_tid {
 	struct list_head list;
-	struct sk_buff_head buf_q;
 	struct sk_buff_head retry_q;
 	struct ath_node *an;
 	struct ath_txq *txq;
@@ -247,13 +244,13 @@ struct ath_atx_tid {
 	s8 bar_index;
 	bool active;
 	bool clear_ps_filter;
+	bool has_queued;
 };
 
 struct ath_node {
 	struct ath_softc *sc;
 	struct ieee80211_sta *sta; /* station struct we're part of */
 	struct ieee80211_vif *vif; /* interface with which we're associated */
-	struct ath_atx_tid tid[IEEE80211_NUM_TIDS];
 
 	u16 maxampdu;
 	u8 mpdudensity;
@@ -276,7 +273,6 @@ struct ath_tx_control {
 	struct ath_node *an;
 	struct ieee80211_sta *sta;
 	u8 paprd;
-	bool force_channel;
 };
 
 
@@ -293,7 +289,6 @@ struct ath_tx {
 	struct ath_descdma txdma;
 	struct ath_txq *txq_map[IEEE80211_NUM_ACS];
 	struct ath_txq *uapsdq;
-	u32 txq_max_pending[IEEE80211_NUM_ACS];
 	u16 max_aggr_framelen[IEEE80211_NUM_ACS][4][32];
 };
 
@@ -421,6 +416,22 @@ struct ath_offchannel {
 	int duration;
 };
 
+static inline struct ath_atx_tid *
+ath_node_to_tid(struct ath_node *an, u8 tidno)
+{
+	struct ieee80211_sta *sta = an->sta;
+	struct ieee80211_vif *vif = an->vif;
+	struct ieee80211_txq *txq;
+
+	BUG_ON(!vif);
+	if (sta)
+		txq = sta->txq[tidno % ARRAY_SIZE(sta->txq)];
+	else
+		txq = vif->txq;
+
+	return (struct ath_atx_tid *) txq->drv_priv;
+}
+
 #define case_rtn_string(val) case val: return #val
 
 #define ath_for_each_chanctx(_sc, _ctx)                             \
@@ -575,7 +586,6 @@ void ath_tx_edma_tasklet(struct ath_softc *sc);
 int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
 		      u16 tid, u16 *ssn);
 void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid);
-void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid);
 
 void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an);
 void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
@@ -585,6 +595,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 				   u16 tids, int nframes,
 				   enum ieee80211_frame_release_type reason,
 				   bool more_data);
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue);
 
 /********/
 /* VIFs */
diff --git a/drivers/net/wireless/ath/ath9k/channel.c b/drivers/net/wireless/ath/ath9k/channel.c
index 319cb5f..a5ce016 100644
--- a/drivers/net/wireless/ath/ath9k/channel.c
+++ b/drivers/net/wireless/ath/ath9k/channel.c
@@ -1007,7 +1007,6 @@ static void ath_scan_send_probe(struct ath_softc *sc,
 		goto error;
 
 	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
-	txctl.force_channel = true;
 	if (ath_tx_start(sc->hw, skb, &txctl))
 		goto error;
 
@@ -1130,7 +1129,6 @@ ath_chanctx_send_vif_ps_frame(struct ath_softc *sc, struct ath_vif *avp,
 	memset(&txctl, 0, sizeof(txctl));
 	txctl.txq = sc->tx.txq_map[IEEE80211_AC_VO];
 	txctl.sta = sta;
-	txctl.force_channel = true;
 	if (ath_tx_start(sc->hw, skb, &txctl)) {
 		ieee80211_free_txskb(sc->hw, skb);
 		return false;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index 6de64cf..48b181d 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -600,7 +600,6 @@ static int read_file_xmit(struct seq_file *file, void *data)
 	PR("MPDUs XRetried:  ", xretries);
 	PR("Aggregates:      ", a_aggr);
 	PR("AMPDUs Queued HW:", a_queued_hw);
-	PR("AMPDUs Queued SW:", a_queued_sw);
 	PR("AMPDUs Completed:", a_completed);
 	PR("AMPDUs Retried:  ", a_retries);
 	PR("AMPDUs XRetried: ", a_xretries);
@@ -629,8 +628,7 @@ static void print_queue(struct ath_softc *sc, struct ath_txq *txq,
 	seq_printf(file, "%s: %d ", "qnum", txq->axq_qnum);
 	seq_printf(file, "%s: %2d ", "qdepth", txq->axq_depth);
 	seq_printf(file, "%s: %2d ", "ampdu-depth", txq->axq_ampdu_depth);
-	seq_printf(file, "%s: %3d ", "pending", txq->pending_frames);
-	seq_printf(file, "%s: %d\n", "stopped", txq->stopped);
+	seq_printf(file, "%s: %3d\n", "pending", txq->pending_frames);
 
 	ath_txq_unlock(sc, txq);
 }
@@ -1190,7 +1188,6 @@ static const char ath9k_gstrings_stats[][ETH_GSTRING_LEN] = {
 	AMKSTR(d_tx_mpdu_xretries),
 	AMKSTR(d_tx_aggregates),
 	AMKSTR(d_tx_ampdus_queued_hw),
-	AMKSTR(d_tx_ampdus_queued_sw),
 	AMKSTR(d_tx_ampdus_completed),
 	AMKSTR(d_tx_ampdu_retries),
 	AMKSTR(d_tx_ampdu_xretries),
@@ -1270,7 +1267,6 @@ void ath9k_get_et_stats(struct ieee80211_hw *hw,
 	AWDATA(xretries);
 	AWDATA(a_aggr);
 	AWDATA(a_queued_hw);
-	AWDATA(a_queued_sw);
 	AWDATA(a_completed);
 	AWDATA(a_retries);
 	AWDATA(a_xretries);
@@ -1328,14 +1324,6 @@ int ath9k_init_debug(struct ath_hw *ah)
 				    read_file_xmit);
 	debugfs_create_devm_seqfile(sc->dev, "queues", sc->debug.debugfs_phy,
 				    read_file_queues);
-	debugfs_create_u32("qlen_bk", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_BK]);
-	debugfs_create_u32("qlen_be", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_BE]);
-	debugfs_create_u32("qlen_vi", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_VI]);
-	debugfs_create_u32("qlen_vo", S_IRUSR | S_IWUSR, sc->debug.debugfs_phy,
-			   &sc->tx.txq_max_pending[IEEE80211_AC_VO]);
 	debugfs_create_devm_seqfile(sc->dev, "misc", sc->debug.debugfs_phy,
 				    read_file_misc);
 	debugfs_create_devm_seqfile(sc->dev, "reset", sc->debug.debugfs_phy,
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index cd68c5f..a078cdd 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -147,7 +147,6 @@ struct ath_interrupt_stats {
  * @completed: Total MPDUs (non-aggr) completed
  * @a_aggr: Total no. of aggregates queued
  * @a_queued_hw: Total AMPDUs queued to hardware
- * @a_queued_sw: Total AMPDUs queued to software queues
  * @a_completed: Total AMPDUs completed
  * @a_retries: No. of AMPDUs retried (SW)
  * @a_xretries: No. of AMPDUs dropped due to xretries
@@ -174,7 +173,6 @@ struct ath_tx_stats {
 	u32 xretries;
 	u32 a_aggr;
 	u32 a_queued_hw;
-	u32 a_queued_sw;
 	u32 a_completed;
 	u32 a_retries;
 	u32 a_xretries;
diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index c2ca57a..2e8371a 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -52,8 +52,8 @@ static ssize_t read_file_node_aggr(struct file *file, char __user *user_buf,
 			 "TID", "SEQ_START", "SEQ_NEXT", "BAW_SIZE",
 			 "BAW_HEAD", "BAW_TAIL", "BAR_IDX", "SCHED", "PAUSED");
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ath_node_to_tid(an, tidno);
 		txq = tid->txq;
 		ath_txq_lock(sc, txq);
 		if (tid->active) {
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 1c226d6..752cacb 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -354,7 +354,6 @@ static int ath9k_init_queues(struct ath_softc *sc)
 	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
 		sc->tx.txq_map[i] = ath_txq_setup(sc, ATH9K_TX_QUEUE_DATA, i);
 		sc->tx.txq_map[i]->mac80211_qnum = i;
-		sc->tx.txq_max_pending[i] = ATH_MAX_QDEPTH;
 	}
 	return 0;
 }
@@ -867,6 +866,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
 	hw->max_rate_tries = 10;
 	hw->sta_data_size = sizeof(struct ath_node);
 	hw->vif_data_size = sizeof(struct ath_vif);
+	hw->txq_data_size = sizeof(struct ath_atx_tid);
 	hw->extra_tx_headroom = 4;
 
 	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 3aed43a..eb48f91 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1874,9 +1874,11 @@ static int ath9k_ampdu_action(struct ieee80211_hw *hw,
 	bool flush = false;
 	int ret = 0;
 	struct ieee80211_sta *sta = params->sta;
+	struct ath_node *an = (struct ath_node *)sta->drv_priv;
 	enum ieee80211_ampdu_mlme_action action = params->action;
 	u16 tid = params->tid;
 	u16 *ssn = &params->ssn;
+	struct ath_atx_tid *atid;
 
 	mutex_lock(&sc->mutex);
 
@@ -1909,9 +1911,9 @@ static int ath9k_ampdu_action(struct ieee80211_hw *hw,
 		ath9k_ps_restore(sc);
 		break;
 	case IEEE80211_AMPDU_TX_OPERATIONAL:
-		ath9k_ps_wakeup(sc);
-		ath_tx_aggr_resume(sc, sta, tid);
-		ath9k_ps_restore(sc);
+		atid = ath_node_to_tid(an, tid);
+		atid->baw_size = IEEE80211_MIN_AMPDU_BUF <<
+			        sta->ht_cap.ampdu_factor;
 		break;
 	default:
 		ath_err(ath9k_hw_common(sc->sc_ah), "Unknown AMPDU action\n");
@@ -2673,4 +2675,5 @@ struct ieee80211_ops ath9k_ops = {
 	.sw_scan_start	    = ath9k_sw_scan_start,
 	.sw_scan_complete   = ath9k_sw_scan_complete,
 	.get_txpower        = ath9k_get_txpower,
+	.wake_tx_queue      = ath9k_wake_tx_queue,
 };
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index fe795fc..8103954 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -65,6 +65,8 @@ static struct ath_buf *ath_tx_setup_buffer(struct ath_softc *sc,
 					   struct ath_txq *txq,
 					   struct ath_atx_tid *tid,
 					   struct sk_buff *skb);
+static int ath_tx_prepare(struct ieee80211_hw *hw, struct sk_buff *skb,
+			  struct ath_tx_control *txctl);
 
 enum {
 	MCS_HT20,
@@ -118,6 +120,26 @@ static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq,
 		list_add_tail(&tid->list, list);
 }
 
+void ath9k_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *queue)
+{
+	struct ath_softc *sc = hw->priv;
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	struct ath_atx_tid *tid = (struct ath_atx_tid *) queue->drv_priv;
+	struct ath_txq *txq = tid->txq;
+
+	ath_dbg(common, QUEUE, "Waking TX queue: %pM (%d)\n",
+		queue->sta ? queue->sta->addr : queue->vif->addr,
+		tid->tidno);
+
+	ath_txq_lock(sc, txq);
+
+	tid->has_queued = true;
+	ath_tx_queue_tid(sc, txq, tid);
+	ath_txq_schedule(sc, txq);
+
+	ath_txq_unlock(sc, txq);
+}
+
 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
 {
 	struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
@@ -145,7 +167,6 @@ static void ath_set_rates(struct ieee80211_vif *vif, struct ieee80211_sta *sta,
 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
 			     struct sk_buff *skb)
 {
-	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
 	struct ath_frame_info *fi = get_frame_info(skb);
 	int q = fi->txq;
 
@@ -156,14 +177,6 @@ static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
 	if (WARN_ON(--txq->pending_frames < 0))
 		txq->pending_frames = 0;
 
-	if (txq->stopped &&
-	    txq->pending_frames < sc->tx.txq_max_pending[q]) {
-		if (ath9k_is_chanctx_enabled())
-			ieee80211_wake_queue(sc->hw, info->hw_queue);
-		else
-			ieee80211_wake_queue(sc->hw, q);
-		txq->stopped = false;
-	}
 }
 
 static struct ath_atx_tid *
@@ -173,9 +186,48 @@ ath_get_skb_tid(struct ath_softc *sc, struct ath_node *an, struct sk_buff *skb)
 	return ATH_AN_2_TID(an, tidno);
 }
 
+static struct sk_buff *
+ath_tid_pull(struct ath_atx_tid *tid)
+{
+	struct ieee80211_txq *txq = container_of((void*)tid, struct ieee80211_txq, drv_priv);
+	struct ath_softc *sc = tid->an->sc;
+	struct ieee80211_hw *hw = sc->hw;
+	struct ath_tx_control txctl = {
+		.txq = tid->txq,
+		.sta = tid->an->sta,
+	};
+	struct sk_buff *skb;
+	struct ath_frame_info *fi;
+	int q;
+
+	if (!tid->has_queued)
+		return NULL;
+
+	skb = ieee80211_tx_dequeue(hw, txq);
+	if (!skb) {
+		tid->has_queued = false;
+		return NULL;
+	}
+
+	if (ath_tx_prepare(hw, skb, &txctl)) {
+		ieee80211_free_txskb(hw, skb);
+		return NULL;
+	}
+
+	q = skb_get_queue_mapping(skb);
+	if (tid->txq == sc->tx.txq_map[q]) {
+		fi = get_frame_info(skb);
+		fi->txq = q;
+		++tid->txq->pending_frames;
+	}
+
+	return skb;
+ }
+
+
 static bool ath_tid_has_buffered(struct ath_atx_tid *tid)
 {
-	return !skb_queue_empty(&tid->buf_q) || !skb_queue_empty(&tid->retry_q);
+	return !skb_queue_empty(&tid->retry_q) || tid->has_queued;
 }
 
 static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
@@ -184,46 +236,11 @@ static struct sk_buff *ath_tid_dequeue(struct ath_atx_tid *tid)
 
 	skb = __skb_dequeue(&tid->retry_q);
 	if (!skb)
-		skb = __skb_dequeue(&tid->buf_q);
+		skb = ath_tid_pull(tid);
 
 	return skb;
 }
 
-/*
- * ath_tx_tid_change_state:
- * - clears a-mpdu flag of previous session
- * - force sequence number allocation to fix next BlockAck Window
- */
-static void
-ath_tx_tid_change_state(struct ath_softc *sc, struct ath_atx_tid *tid)
-{
-	struct ath_txq *txq = tid->txq;
-	struct ieee80211_tx_info *tx_info;
-	struct sk_buff *skb, *tskb;
-	struct ath_buf *bf;
-	struct ath_frame_info *fi;
-
-	skb_queue_walk_safe(&tid->buf_q, skb, tskb) {
-		fi = get_frame_info(skb);
-		bf = fi->bf;
-
-		tx_info = IEEE80211_SKB_CB(skb);
-		tx_info->flags &= ~IEEE80211_TX_CTL_AMPDU;
-
-		if (bf)
-			continue;
-
-		bf = ath_tx_setup_buffer(sc, txq, tid, skb);
-		if (!bf) {
-			__skb_unlink(skb, &tid->buf_q);
-			ath_txq_skb_done(sc, txq, skb);
-			ieee80211_free_txskb(sc->hw, skb);
-			continue;
-		}
-	}
-
-}
-
 static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid)
 {
 	struct ath_txq *txq = tid->txq;
@@ -858,20 +875,16 @@ static int ath_compute_num_delims(struct ath_softc *sc, struct ath_atx_tid *tid,
 
 static struct ath_buf *
 ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
-			struct ath_atx_tid *tid, struct sk_buff_head **q)
+			struct ath_atx_tid *tid)
 {
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
-	struct sk_buff *skb;
+	struct sk_buff *skb, *first_skb = NULL;
 	struct ath_buf *bf;
 	u16 seqno;
 
 	while (1) {
-		*q = &tid->retry_q;
-		if (skb_queue_empty(*q))
-			*q = &tid->buf_q;
-
-		skb = skb_peek(*q);
+		skb = ath_tid_dequeue(tid);
 		if (!skb)
 			break;
 
@@ -883,7 +896,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 			bf->bf_state.stale = false;
 
 		if (!bf) {
-			__skb_unlink(skb, *q);
 			ath_txq_skb_done(sc, txq, skb);
 			ieee80211_free_txskb(sc->hw, skb);
 			continue;
@@ -912,8 +924,20 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 		seqno = bf->bf_state.seqno;
 
 		/* do not step over block-ack window */
-		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno))
+		if (!BAW_WITHIN(tid->seq_start, tid->baw_size, seqno)) {
+			__skb_queue_tail(&tid->retry_q, skb);
+
+			/* If there are other skbs in the retry q, they are
+			 * probably within the BAW, so loop immediately to get
+			 * one of them. Otherwise the queue can get stuck. */
+			if (!skb_queue_is_first(&tid->retry_q, skb) &&
+			    !WARN_ON(skb == first_skb)) {
+				if(!first_skb) /* infinite loop prevention */
+					first_skb = skb;
+				continue;
+			}
 			break;
+		}
 
 		if (tid->bar_index > ATH_BA_INDEX(tid->seq_start, seqno)) {
 			struct ath_tx_status ts = {};
@@ -921,7 +945,6 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 
 			INIT_LIST_HEAD(&bf_head);
 			list_add(&bf->list, &bf_head);
-			__skb_unlink(skb, *q);
 			ath_tx_update_baw(sc, tid, seqno);
 			ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0);
 			continue;
@@ -933,11 +956,10 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq,
 	return NULL;
 }
 
-static bool
+static int
 ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		 struct ath_atx_tid *tid, struct list_head *bf_q,
-		 struct ath_buf *bf_first, struct sk_buff_head *tid_q,
-		 int *aggr_len)
+		 struct ath_buf *bf_first)
 {
 #define PADBYTES(_len) ((4 - ((_len) % 4)) % 4)
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
@@ -947,12 +969,13 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 	struct ieee80211_tx_info *tx_info;
 	struct ath_frame_info *fi;
 	struct sk_buff *skb;
-	bool closed = false;
+
 
 	bf = bf_first;
 	aggr_limit = ath_lookup_rate(sc, bf, tid);
 
-	do {
+	while (bf)
+	{
 		skb = bf->bf_mpdu;
 		fi = get_frame_info(skb);
 
@@ -961,12 +984,12 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes) {
 			if (aggr_limit < al + bpad + al_delta ||
 			    ath_lookup_legacy(bf) || nframes >= h_baw)
-				break;
+				goto stop;
 
 			tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 			if ((tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) ||
 			    !(tx_info->flags & IEEE80211_TX_CTL_AMPDU))
-				break;
+				goto stop;
 		}
 
 		/* add padding for previous frame to aggregation length */
@@ -988,20 +1011,18 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 			ath_tx_addto_baw(sc, tid, bf);
 		bf->bf_state.ndelim = ndelim;
 
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
 
 		bf_prev = bf;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
-		if (!bf) {
-			closed = true;
-			break;
-		}
-	} while (ath_tid_has_buffered(tid));
-
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
+	}
+	goto finish;
+stop:
+	__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
+finish:
 	bf = bf_first;
 	bf->bf_lastbf = bf_prev;
 
@@ -1012,9 +1033,7 @@ ath_tx_form_aggr(struct ath_softc *sc, struct ath_txq *txq,
 		TX_STAT_INC(txq->axq_qnum, a_aggr);
 	}
 
-	*aggr_len = al;
-
-	return closed;
+	return al;
 #undef PADBYTES
 }
 
@@ -1391,18 +1410,15 @@ static void ath_tx_fill_desc(struct ath_softc *sc, struct ath_buf *bf,
 static void
 ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		  struct ath_atx_tid *tid, struct list_head *bf_q,
-		  struct ath_buf *bf_first, struct sk_buff_head *tid_q)
+		  struct ath_buf *bf_first)
 {
 	struct ath_buf *bf = bf_first, *bf_prev = NULL;
-	struct sk_buff *skb;
 	int nframes = 0;
 
 	do {
 		struct ieee80211_tx_info *tx_info;
-		skb = bf->bf_mpdu;
 
 		nframes++;
-		__skb_unlink(skb, tid_q);
 		list_add_tail(&bf->list, bf_q);
 		if (bf_prev)
 			bf_prev->bf_next = bf;
@@ -1411,13 +1427,15 @@ ath_tx_form_burst(struct ath_softc *sc, struct ath_txq *txq,
 		if (nframes >= 2)
 			break;
 
-		bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+		bf = ath_tx_get_tid_subframe(sc, txq, tid);
 		if (!bf)
 			break;
 
 		tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
-		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU)
+		if (tx_info->flags & IEEE80211_TX_CTL_AMPDU) {
+			__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 			break;
+		}
 
 		ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	} while (1);
@@ -1428,34 +1446,33 @@ static bool ath_tx_sched_aggr(struct ath_softc *sc, struct ath_txq *txq,
 {
 	struct ath_buf *bf;
 	struct ieee80211_tx_info *tx_info;
-	struct sk_buff_head *tid_q;
 	struct list_head bf_q;
 	int aggr_len = 0;
-	bool aggr, last = true;
+	bool aggr;
 
 	if (!ath_tid_has_buffered(tid))
 		return false;
 
 	INIT_LIST_HEAD(&bf_q);
 
-	bf = ath_tx_get_tid_subframe(sc, txq, tid, &tid_q);
+	bf = ath_tx_get_tid_subframe(sc, txq, tid);
 	if (!bf)
 		return false;
 
 	tx_info = IEEE80211_SKB_CB(bf->bf_mpdu);
 	aggr = !!(tx_info->flags & IEEE80211_TX_CTL_AMPDU);
 	if ((aggr && txq->axq_ampdu_depth >= ATH_AGGR_MIN_QDEPTH) ||
-		(!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+	    (!aggr && txq->axq_depth >= ATH_NON_AGGR_MIN_QDEPTH)) {
+		__skb_queue_tail(&tid->retry_q, bf->bf_mpdu);
 		*stop = true;
 		return false;
 	}
 
 	ath_set_rates(tid->an->vif, tid->an->sta, bf);
 	if (aggr)
-		last = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf,
-					tid_q, &aggr_len);
+		aggr_len = ath_tx_form_aggr(sc, txq, tid, &bf_q, bf);
 	else
-		ath_tx_form_burst(sc, txq, tid, &bf_q, bf, tid_q);
+		ath_tx_form_burst(sc, txq, tid, &bf_q, bf);
 
 	if (list_empty(&bf_q))
 		return false;
@@ -1498,9 +1515,6 @@ int ath_tx_aggr_start(struct ath_softc *sc, struct ieee80211_sta *sta,
 		an->mpdudensity = density;
 	}
 
-	/* force sequence number allocation for pending frames */
-	ath_tx_tid_change_state(sc, txtid);
-
 	txtid->active = true;
 	*ssn = txtid->seq_start = txtid->seq_next;
 	txtid->bar_index = -1;
@@ -1525,7 +1539,6 @@ void ath_tx_aggr_stop(struct ath_softc *sc, struct ieee80211_sta *sta, u16 tid)
 	ath_txq_lock(sc, txq);
 	txtid->active = false;
 	ath_tx_flush_tid(sc, txtid);
-	ath_tx_tid_change_state(sc, txtid);
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -1535,14 +1548,12 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
 	struct ath_atx_tid *tid;
 	struct ath_txq *txq;
-	bool buffered;
 	int tidno;
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ath_node_to_tid(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -1552,13 +1563,12 @@ void ath_tx_aggr_sleep(struct ieee80211_sta *sta, struct ath_softc *sc,
 			continue;
 		}
 
-		buffered = ath_tid_has_buffered(tid);
+		if (!skb_queue_empty(&tid->retry_q))
+			ieee80211_sta_set_buffered(sta, tid->tidno, true);
 
 		list_del_init(&tid->list);
 
 		ath_txq_unlock(sc, txq);
-
-		ieee80211_sta_set_buffered(sta, tidno, buffered);
 	}
 }
 
@@ -1571,49 +1581,20 @@ void ath_tx_aggr_wakeup(struct ath_softc *sc, struct ath_node *an)
 
 	ath_dbg(common, XMIT, "%s called\n", __func__);
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ath_node_to_tid(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
 		tid->clear_ps_filter = true;
-
 		if (ath_tid_has_buffered(tid)) {
 			ath_tx_queue_tid(sc, txq, tid);
 			ath_txq_schedule(sc, txq);
 		}
-
 		ath_txq_unlock_complete(sc, txq);
 	}
 }
 
-void ath_tx_aggr_resume(struct ath_softc *sc, struct ieee80211_sta *sta,
-			u16 tidno)
-{
-	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
-	struct ath_atx_tid *tid;
-	struct ath_node *an;
-	struct ath_txq *txq;
-
-	ath_dbg(common, XMIT, "%s called\n", __func__);
-
-	an = (struct ath_node *)sta->drv_priv;
-	tid = ATH_AN_2_TID(an, tidno);
-	txq = tid->txq;
-
-	ath_txq_lock(sc, txq);
-
-	tid->baw_size = IEEE80211_MIN_AMPDU_BUF << sta->ht_cap.ampdu_factor;
-
-	if (ath_tid_has_buffered(tid)) {
-		ath_tx_queue_tid(sc, txq, tid);
-		ath_txq_schedule(sc, txq);
-	}
-
-	ath_txq_unlock_complete(sc, txq);
-}
-
 void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 				   struct ieee80211_sta *sta,
 				   u16 tids, int nframes,
@@ -1626,7 +1607,6 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 	struct ieee80211_tx_info *info;
 	struct list_head bf_q;
 	struct ath_buf *bf_tail = NULL, *bf;
-	struct sk_buff_head *tid_q;
 	int sent = 0;
 	int i;
 
@@ -1641,11 +1621,10 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 
 		ath_txq_lock(sc, tid->txq);
 		while (nframes > 0) {
-			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid, &tid_q);
+			bf = ath_tx_get_tid_subframe(sc, sc->tx.uapsdq, tid);
 			if (!bf)
 				break;
 
-			__skb_unlink(bf->bf_mpdu, tid_q);
 			list_add_tail(&bf->list, &bf_q);
 			ath_set_rates(tid->an->vif, tid->an->sta, bf);
 			if (bf_isampdu(bf)) {
@@ -1660,7 +1639,7 @@ void ath9k_release_buffered_frames(struct ieee80211_hw *hw,
 			sent++;
 			TX_STAT_INC(txq->axq_qnum, a_queued_hw);
 
-			if (an->sta && !ath_tid_has_buffered(tid))
+			if (an->sta && skb_queue_empty(&tid->retry_q))
 				ieee80211_sta_set_buffered(an->sta, i, false);
 		}
 		ath_txq_unlock_complete(sc, tid->txq);
@@ -1887,13 +1866,7 @@ bool ath_drain_all_txq(struct ath_softc *sc)
 		if (!ATH_TXQ_SETUP(sc, i))
 			continue;
 
-		/*
-		 * The caller will resume queues with ieee80211_wake_queues.
-		 * Mark the queue as not stopped to prevent ath_tx_complete
-		 * from waking the queue too early.
-		 */
 		txq = &sc->tx.txq[i];
-		txq->stopped = false;
 		ath_draintxq(sc, txq);
 	}
 
@@ -2293,15 +2266,12 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 	struct ath_txq *txq = txctl->txq;
 	struct ath_atx_tid *tid = NULL;
 	struct ath_buf *bf;
-	bool queue, skip_uapsd = false, ps_resp;
+	bool ps_resp;
 	int q, ret;
 
 	if (vif)
 		avp = (void *)vif->drv_priv;
 
-	if (info->flags & IEEE80211_TX_CTL_TX_OFFCHAN)
-		txctl->force_channel = true;
-
 	ps_resp = !!(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE);
 
 	ret = ath_tx_prepare(hw, skb, txctl);
@@ -2316,63 +2286,13 @@ int ath_tx_start(struct ieee80211_hw *hw, struct sk_buff *skb,
 
 	q = skb_get_queue_mapping(skb);
 
+	if (ps_resp)
+		txq = sc->tx.uapsdq;
+
 	ath_txq_lock(sc, txq);
 	if (txq == sc->tx.txq_map[q]) {
 		fi->txq = q;
-		if (++txq->pending_frames > sc->tx.txq_max_pending[q] &&
-		    !txq->stopped) {
-			if (ath9k_is_chanctx_enabled())
-				ieee80211_stop_queue(sc->hw, info->hw_queue);
-			else
-				ieee80211_stop_queue(sc->hw, q);
-			txq->stopped = true;
-		}
-	}
-
-	queue = ieee80211_is_data_present(hdr->frame_control);
-
-	/* If chanctx, queue all null frames while NOA could be there */
-	if (ath9k_is_chanctx_enabled() &&
-	    ieee80211_is_nullfunc(hdr->frame_control) &&
-	    !txctl->force_channel)
-		queue = true;
-
-	/* Force queueing of all frames that belong to a virtual interface on
-	 * a different channel context, to ensure that they are sent on the
-	 * correct channel.
-	 */
-	if (((avp && avp->chanctx != sc->cur_chan) ||
-	     sc->cur_chan->stopped) && !txctl->force_channel) {
-		if (!txctl->an)
-			txctl->an = &avp->mcast_node;
-		queue = true;
-		skip_uapsd = true;
-	}
-
-	if (txctl->an && queue)
-		tid = ath_get_skb_tid(sc, txctl->an, skb);
-
-	if (!skip_uapsd && ps_resp) {
-		ath_txq_unlock(sc, txq);
-		txq = sc->tx.uapsdq;
-		ath_txq_lock(sc, txq);
-	} else if (txctl->an && queue) {
-		WARN_ON(tid->txq != txctl->txq);
-
-		if (info->flags & IEEE80211_TX_CTL_CLEAR_PS_FILT)
-			tid->clear_ps_filter = true;
-
-		/*
-		 * Add this frame to software queue for scheduling later
-		 * for aggregation.
-		 */
-		TX_STAT_INC(txq->axq_qnum, a_queued_sw);
-		__skb_queue_tail(&tid->buf_q, skb);
-		if (!txctl->an->sleeping)
-			ath_tx_queue_tid(sc, txq, tid);
-
-		ath_txq_schedule(sc, txq);
-		goto out;
+		++txq->pending_frames;
 	}
 
 	bf = ath_tx_setup_buffer(sc, txq, tid, skb);
@@ -2856,9 +2776,8 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 	struct ath_atx_tid *tid;
 	int tidno, acno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS;
-	     tidno++, tid++) {
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ath_node_to_tid(an, tidno);
 		tid->an        = an;
 		tid->tidno     = tidno;
 		tid->seq_start = tid->seq_next = 0;
@@ -2866,11 +2785,14 @@ void ath_tx_node_init(struct ath_softc *sc, struct ath_node *an)
 		tid->baw_head  = tid->baw_tail = 0;
 		tid->active	   = false;
 		tid->clear_ps_filter = true;
-		__skb_queue_head_init(&tid->buf_q);
+		tid->has_queued  = false;
 		__skb_queue_head_init(&tid->retry_q);
 		INIT_LIST_HEAD(&tid->list);
 		acno = TID_TO_WME_AC(tidno);
 		tid->txq = sc->tx.txq_map[acno];
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
@@ -2880,9 +2802,8 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 	struct ath_txq *txq;
 	int tidno;
 
-	for (tidno = 0, tid = &an->tid[tidno];
-	     tidno < IEEE80211_NUM_TIDS; tidno++, tid++) {
-
+	for (tidno = 0; tidno < IEEE80211_NUM_TIDS; tidno++) {
+		tid = ath_node_to_tid(an, tidno);
 		txq = tid->txq;
 
 		ath_txq_lock(sc, txq);
@@ -2894,6 +2815,9 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an)
 		tid->active = false;
 
 		ath_txq_unlock(sc, txq);
+
+		if (!an->sta)
+			break; /* just one multicast ath_atx_tid */
 	}
 }
 
-- 
2.9.2

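A minimal user-space sketch of the new block-ack-window handling in ath_tx_get_tid_subframe() follows. The patch re-queues an out-of-window frame at the tail of tid->retry_q, keeps scanning while other frames are queued, and remembers the first re-queued frame so the scan cannot loop forever. The sketch rests on several assumptions. Sequence numbers stand in for skbs. A small ring buffer stands in for retry_q. baw_within() assumes the usual modulo-4096 window test behind BAW_WITHIN(). All names are hypothetical, and this is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>

#define SEQ_MOD 4096u	/* 802.11 sequence numbers are 12 bits wide */
#define QCAP 16

/* Assumed window test, modelled on the modulo-4096 BAW_WITHIN() idea. */
static bool baw_within(unsigned int start, unsigned int bawsz,
		       unsigned int seqno)
{
	return ((seqno - start) & (SEQ_MOD - 1)) < bawsz;
}

/* Hypothetical ring buffer standing in for tid->retry_q. */
struct fifo {
	unsigned int seq[QCAP];
	unsigned int head, count;
};

static void fifo_push(struct fifo *q, unsigned int seq)
{
	assert(q->count < QCAP);
	q->seq[(q->head + q->count) % QCAP] = seq;
	q->count++;
}

static bool fifo_pop(struct fifo *q, unsigned int *seq)
{
	if (q->count == 0)
		return false;
	*seq = q->seq[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	return true;
}

/*
 * Store the first in-window frame in *out and return true. Frames outside
 * the window go back to the tail of the queue. The scan stops (returning
 * false) if the frame was alone in the queue, or when the first re-queued
 * frame comes round again; in that second case the driver also WARNs.
 */
static bool next_in_window(struct fifo *q, unsigned int seq_start,
			   unsigned int baw_size, unsigned int *out)
{
	unsigned int seq, first = 0;
	bool have_first = false;

	while (fifo_pop(q, &seq)) {
		if (baw_within(seq_start, baw_size, seq)) {
			*out = seq;
			return true;
		}
		fifo_push(q, seq);	/* back to the tail of the retry queue */
		if (q->count > 1 && !(have_first && seq == first)) {
			if (!have_first) {
				first = seq;	/* where the cycle started */
				have_first = true;
			}
			continue;
		}
		break;
	}
	return false;
}
```

For example, take seq_start 100 and a window of 8, with a queue holding 110, 103 and 120. The first call returns 103 and leaves 120 and 110 queued. A second call returns false after one full pass instead of spinning.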
* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-08-05 16:05         ` [ath9k-devel] [PATCH v4] " Toke Høiland-Jørgensen
@ 2016-08-22 15:43           ` Kalle Valo
  2016-08-22 16:16             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Kalle Valo @ 2016-08-22 15:43 UTC (permalink / raw)
  To: ath9k-devel

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> This switches ath9k over to using the mac80211 intermediate software
> queueing mechanism for data packets. It removes the queueing inside the
> driver, except for the retry queue, and instead pulls from mac80211 when
> a packet is needed. The retry queue is used to store a packet that was
> pulled but can't be sent immediately.
>
> The old code path in ath_tx_start that would queue packets has been
> removed completely, as have the qlen limit tunables (since there's no
> longer a queue in the driver to limit).
>
> Based on Tim's original patch set, but reworked quite thoroughly.
>
> Cc: Tim Shepard <shep@alum.mit.edu>
> Cc: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
> Changes since v3 (most due to Felix; thanks!):
>   - Correctly notify mac80211 when there are packets in the retry queue
>     on powersave start/stop.
>   - Get rid of ath_tx_aggr_resume().
>   - Some readability changes and additional WARN_ON/BUG_ON in
>     appropriate places.

This is great work but due to the regressions I'm not sure if this will
be ready for 4.9. To get more testing time I wonder if we should wait
for 4.10? IMHO applying this at the end of the cycle is too risky and we
should try to maximise the time in linux-next by applying this just after
-rc1 is released.

Thoughts?

-- 
Kalle Valo

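The pull model in the quoted commit message can be illustrated with a small user-space C sketch. In that model, the driver pulls from mac80211 only when it needs a packet and parks a pulled-but-unsendable packet in the retry queue. The sketch is simplified. struct tid_model, tid_dequeue() and form_aggr() are hypothetical stand-ins for ath_atx_tid, ath_tid_dequeue() and ath_tx_form_aggr(). The upstream array plays the role of the mac80211 TXQ. The sketch assumes that the retry queue is served before anything new is pulled, as the old retry_q/buf_q code did. Only a byte limit is checked here. The real code also checks the block-ack window, rate-control probes and the frame count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame: only its length matters for this sketch. */
struct frame {
	size_t len;
};

#define RETRY_CAP 8

/* Hypothetical per-TID state: an upstream queue plus a small retry ring. */
struct tid_model {
	const struct frame *upstream;		/* stands in for the mac80211 TXQ */
	size_t up_n, up_next;
	const struct frame *retry[RETRY_CAP];	/* stands in for tid->retry_q */
	size_t retry_head, retry_count;
};

/* Serve the retry queue first; otherwise pull the next upstream frame. */
static const struct frame *tid_dequeue(struct tid_model *t)
{
	if (t->retry_count > 0) {
		const struct frame *f = t->retry[t->retry_head];

		t->retry_head = (t->retry_head + 1) % RETRY_CAP;
		t->retry_count--;
		return f;
	}
	if (t->up_next < t->up_n)
		return &t->upstream[t->up_next++];
	return NULL;
}

/* Park a frame that was pulled but cannot be sent right now. */
static void retry_push_tail(struct tid_model *t, const struct frame *f)
{
	assert(t->retry_count < RETRY_CAP);
	t->retry[(t->retry_head + t->retry_count) % RETRY_CAP] = f;
	t->retry_count++;
}

/*
 * Build one aggregate of at most aggr_limit bytes (the first frame is
 * always taken, as with the "if (nframes)" check in the patch) and return
 * its length; *nframes receives the number of frames in it.
 */
static size_t form_aggr(struct tid_model *t, size_t aggr_limit,
			size_t *nframes)
{
	const struct frame *f;
	size_t al = 0;

	*nframes = 0;
	while ((f = tid_dequeue(t)) != NULL) {
		if (*nframes > 0 && al + f->len > aggr_limit) {
			retry_push_tail(t, f);	/* cf. the "stop:" label */
			break;
		}
		al += f->len;
		(*nframes)++;
	}
	return al;
}
```

Consider three 1500-byte frames and a 3200-byte limit. The first call returns 3000 (two frames) and leaves the third frame in the retry queue. The next call sends that frame from the retry queue before pulling anything new.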
* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-08-22 15:43           ` Kalle Valo
@ 2016-08-22 16:16             ` Toke Høiland-Jørgensen
  2016-08-22 17:02               ` Kalle Valo
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-08-22 16:16 UTC (permalink / raw)
  To: ath9k-devel

Kalle Valo <kvalo@codeaurora.org> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> This switches ath9k over to using the mac80211 intermediate software
>> queueing mechanism for data packets. It removes the queueing inside the
>> driver, except for the retry queue, and instead pulls from mac80211 when
>> a packet is needed. The retry queue is used to store a packet that was
>> pulled but can't be sent immediately.
>>
>> The old code path in ath_tx_start that would queue packets has been
>> removed completely, as have the qlen limit tunables (since there's no
>> longer a queue in the driver to limit).
>>
>> Based on Tim's original patch set, but reworked quite thoroughly.
>>
>> Cc: Tim Shepard <shep@alum.mit.edu>
>> Cc: Felix Fietkau <nbd@nbd.name>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> ---
>> Changes since v3 (most due to Felix; thanks!):
>>   - Correctly notify mac80211 when there are packets in the retry queue
>>     on powersave start/stop.
>>   - Get rid of ath_tx_aggr_resume().
>>   - Some readability changes and additional WARN_ON/BUG_ON in
>>     appropriate places.
>
> This is great work but due to the regressions I'm not sure if this
> will be ready for 4.9. To get more testing time I wonder if we should
> wait for 4.10? IMHO applying this at the end of the cycle is too risky
> and we should try to maximise the time in linux-next by applying this
> just after -rc1 is released.
>
> Thoughts?

Well, now that we understand what is causing the throughput regressions,
fixing them should be fairly straightforward (yeah, famous last words,
but still...). I already have a patch for the fast path and will go poke
at the slow path next. It'll probably require another workaround or two,
so I guess it won't be the architecturally clean ideal solution; but it
would make it possible to have something that works for 4.9 and then
iterate for a cleaner design for 4.10.

-Toke

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-08-22 16:16             ` Toke Høiland-Jørgensen
@ 2016-08-22 17:02               ` Kalle Valo
  2016-08-22 17:13                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Kalle Valo @ 2016-08-22 17:02 UTC (permalink / raw)
  To: ath9k-devel

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Kalle Valo <kvalo@codeaurora.org> writes:
>
>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>
>>> This switches ath9k over to using the mac80211 intermediate software
>>> queueing mechanism for data packets. It removes the queueing inside the
>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>> a packet is needed. The retry queue is used to store a packet that was
>>> pulled but can't be sent immediately.
>>>
>>> The old code path in ath_tx_start that would queue packets has been
>>> removed completely, as have the qlen limit tunables (since there's no
>>> longer a queue in the driver to limit).
>>>
>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>>
>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>> Cc: Felix Fietkau <nbd@nbd.name>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>> ---
>>> Changes since v3 (most due to Felix; thanks!):
>>>   - Correctly notify mac80211 when there are packets in the retry queue
>>>     on powersave start/stop.
>>>   - Get rid of ath_tx_aggr_resume().
>>>   - Some readability changes and additional WARN_ON/BUG_ON in
>>>     appropriate places.
>>
>> This is great work but due to the regressions I'm not sure if this
>> will be ready for 4.9. To get more testing time I wonder if we should
>> wait for 4.10? IMHO applying this at the end of the cycle is too risky
>> and we should try to maximise the time in linux-next by applying this
>> just after -rc1 is released.
>>
>> Thoughts?
>
> Well, now that we understand what is causing the throughput regressions,
> fixing them should be fairly straightforward (yeah, famous last words,
> but still...). I already have a patch for the fast path and will go poke
> at the slow path next. It'll probably require another workaround or two,
> so I guess it won't be the architecturally clean ideal solution; but it
> would make it possible to have something that works for 4.9 and then
> iterate for a cleaner design for 4.10.

But if we try to rush this to 4.9 it won't be in linux-next for long. We
are now in -rc3 and let's say that the patches are ready to apply in two
weeks. That would leave us only two weeks of -next time before the merge
window, which I think is not enough for a controversial patch like this
one. There might be other bugs lurking which haven't been found yet.

-- 
Kalle Valo

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-08-22 17:02               ` Kalle Valo
@ 2016-08-22 17:13                 ` Toke Høiland-Jørgensen
  2016-08-23  6:59                   ` Kalle Valo
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-08-22 17:13 UTC (permalink / raw)
  To: ath9k-devel

Kalle Valo <kvalo@codeaurora.org> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> Kalle Valo <kvalo@codeaurora.org> writes:
>>
>>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>>
>>>> This switches ath9k over to using the mac80211 intermediate software
>>>> queueing mechanism for data packets. It removes the queueing inside the
>>>> driver, except for the retry queue, and instead pulls from mac80211 when
>>>> a packet is needed. The retry queue is used to store a packet that was
>>>> pulled but can't be sent immediately.
>>>>
>>>> The old code path in ath_tx_start that would queue packets has been
>>>> removed completely, as have the qlen limit tunables (since there's no
>>>> longer a queue in the driver to limit).
>>>>
>>>> Based on Tim's original patch set, but reworked quite thoroughly.
>>>>
>>>> Cc: Tim Shepard <shep@alum.mit.edu>
>>>> Cc: Felix Fietkau <nbd@nbd.name>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>> ---
>>>> Changes since v3 (most due to Felix; thanks!):
>>>>   - Correctly notify mac80211 when there are packets in the retry queue
>>>>     on powersave start/stop.
>>>>   - Get rid of ath_tx_aggr_resume().
>>>>   - Some readability changes and additional WARN_ON/BUG_ON in
>>>>     appropriate places.
>>>
>>> This is great work but due to the regressions I'm not sure if this
>>> will be ready for 4.9. To get more testing time I wonder if we should
>>> wait for 4.10? IMHO applying this at the end of the cycle is too risky
>>> and we should try to maximise the time in linux-next by applying this
>>> just after -rc1 is released.
>>>
>>> Thoughts?
>>
>> Well, now that we understand what is causing the throughput regressions,
>> fixing them should be fairly straightforward (yeah, famous last words,
>> but still...). I already have a patch for the fast path and will go poke
>> at the slow path next. It'll probably require another workaround or two,
>> so I guess it won't be the architecturally clean ideal solution; but it
>> would make it possible to have something that works for 4.9 and then
>> iterate for a cleaner design for 4.10.
>
> But if we try to rush this to 4.9 it won't be in linux-next for long. We
> are now in -rc3 and let's say that the patches are ready to apply in two
> weeks. That would leave us only two weeks of -next time before the merge
> window, which I think is not enough for a controversial patch like this
> one. There might be other bugs lurking which haven't been found yet.

What, other hidden bugs? Unpossible! :)

Would it be possible to merge the partial solution (which is ready now,
basically) and fix the slow path in a separate patch later?

(Just spit-balling here; I'm still fairly new to this process. But I am
concerned that we'll hit a catch-22 where we can't get wider testing
before it's "ready" and we can't prove that it's "ready" until we've had
wider testing...)

-Toke

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-08-22 17:13                 ` Toke Høiland-Jørgensen
@ 2016-08-23  6:59                   ` Kalle Valo
  2016-08-23  8:58                     ` Arend van Spriel
  2016-10-05 14:09                     ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 50+ messages in thread
From: Kalle Valo @ 2016-08-23  6:59 UTC (permalink / raw)
  To: ath9k-devel

Toke Høiland-Jørgensen <toke@toke.dk> writes:

>>>> This is great work but due to the regressions I'm not sure if this
>>>> will be ready for 4.9. To get more testing time I wonder if we should
>>>> wait for 4.10? IMHO applying this at the end of the cycle is too risky
>>>> and we should try to maximise the time in linux-next by applying this
>>>> just after -rc1 is released.
>>>>
>>>> Thoughts?
>>>
>>> Well, now that we understand what is causing the throughput regressions,
>>> fixing them should be fairly straightforward (yeah, famous last words,
>>> but still...). I already have a patch for the fast path and will go poke
>>> at the slow path next. It'll probably require another workaround or two,
>>> so I guess it won't be the architecturally clean ideal solution; but it
>>> would make it possible to have something that works for 4.9 and then
>>> iterate for a cleaner design for 4.10.
>>
>> But if we try to rush this to 4.9 it won't be in linux-next for long. We
>> are now in -rc3 and let's say that the patches are ready to apply in two
>> weeks. That would leave us only two weeks of -next time before the merge
>> window, which I think is not enough for a controversial patch like this
>> one. There might be other bugs lurking which haven't been found yet.
>
> What, other hidden bugs? Unpossible! :)

Yeah, right ;)

> Would it be possible to merge the partial solution (which is ready now,
> basically) and fix the slow path in a separate patch later?

What do you mean by the partial solution? You mean ath9k users would
suffer from regressions until they are fixed? We can't do that.

> (Just spit-balling here; I'm still fairly new to this process. But I am
> concerned that we'll hit a catch-22 where we can't get wider testing
> before it's "ready" and we can't prove that it's "ready" until we've had
> wider testing...)

I understand your point, but I don't want to rush this to 4.9 and then
start getting lots of bug reports and eventually be forced to revert it. If
we just found a new serious regression the chances are that there are
more lurking somewhere and this patch is just not ready yet.

-- 
Kalle Valo

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-08-23  6:59                   ` Kalle Valo
@ 2016-08-23  8:58                     ` Arend van Spriel
  2016-10-05 14:09                     ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 50+ messages in thread
From: Arend van Spriel @ 2016-08-23  8:58 UTC (permalink / raw)
  To: ath9k-devel



On 23-08-16 08:59, Kalle Valo wrote:
> Toke Høiland-Jørgensen <toke@toke.dk> writes:
> 
>>>>> This is great work but due to the regressions I'm not sure if this
>>>>> will be ready for 4.9. To get more testing time I wonder if we should
>>>>> wait for 4.10? IMHO applying this at the end of the cycle is too risky
>>>>> and we should try to maximise the time in linux-next by applying this
>>>>> just after -rc1 is released.
>>>>>
>>>>> Thoughts?
>>>>
>>>> Well, now that we understand what is causing the throughput regressions,
>>>> fixing them should be fairly straightforward (yeah, famous last words,
>>>> but still...). I already have a patch for the fast path and will go poke
>>>> at the slow path next. It'll probably require another workaround or two,
>>>> so I guess it won't be the architecturally clean ideal solution; but it
>>>> would make it possible to have something that works for 4.9 and then
>>>> iterate for a cleaner design for 4.10.
>>>
>>> But if we try to rush this to 4.9 it won't be in linux-next for long. We
>>> are now in -rc3 and let's say that the patches are ready to apply in two
>>> weeks. That would leave us only two weeks of -next time before the merge
>>> window, which I think is not enough for a controversial patch like this
>>> one. There might be other bugs lurking which haven't been found yet.
>>
>> What, other hidden bugs? Unpossible! :)
> 
> Yeah, right ;)
> 
>> Would it be possible to merge the partial solution (which is ready now,
>> basically) and fix the slow path in a separate patch later?
> 
> What do you mean by the partial solution? You mean ath9k users would
> suffer from regressions until they are fixed? We can't do that.
> 
>> (Just spit-balling here; I'm still fairly new to this process. But I am
>> concerned that we'll hit a catch-22 where we can't get wider testing
>> before it's "ready" and we can't prove that it's "ready" until we've had
>> wider testing...)

So could the wider testing be accomplished by working on a branch in the
wireless-testing repo and making its availability known on wireless-list,
ath?k-list, LWN or whatever?

Regards,
Arend

> I understand your point, but I don't want to rush this to 4.9 and then
> start getting lots of bug reports and eventually be forced to revert it. If
> we just found a new serious regression the chances are that there are
> more lurking somewhere and this patch is just not ready yet.

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-08-23  6:59                   ` Kalle Valo
  2016-08-23  8:58                     ` Arend van Spriel
@ 2016-10-05 14:09                     ` Toke Høiland-Jørgensen
  2016-10-05 15:50                       ` Kalle Valo
  1 sibling, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-10-05 14:09 UTC (permalink / raw)
  To: ath9k-devel

Kalle Valo <kvalo@codeaurora.org> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>>>>> This is great work but due to the regressions I'm not sure if this
>>>>> will be ready for 4.9. To get more testing time I wonder if we should
>>>>> wait for 4.10? IMHO applying this at the end of the cycle is too risky
>>>>> and we should try to maximise the time in linux-next by applying this
>>>>> just after -rc1 is released.
>>>>>
>>>>> Thoughts?
>>>>
>>>> Well, now that we understand what is causing the throughput regressions,
>>>> fixing them should be fairly straightforward (yeah, famous last words,
>>>> but still...). I already have a patch for the fast path and will go poke
>>>> at the slow path next. It'll probably require another workaround or two,
>>>> so I guess it won't be the architecturally clean ideal solution; but it
>>>> would make it possible to have something that works for 4.9 and then
>>>> iterate for a cleaner design for 4.10.
>>>
>>> But if we try to rush this to 4.9 it won't be in linux-next for long. We
>>> are now in -rc3 and let's say that the patches are ready to apply in two
>>> weeks. That would leave us only two weeks of -next time before the merge
>>> window, which I think is not enough for a controversial patch like this
>>> one. There might be other bugs lurking which haven't been found yet.
>>
>> What, other hidden bugs? Unpossible! :)
>
> Yeah, right ;)
>
>> Would it be possible to merge the partial solution (which is ready now,
>> basically) and fix the slow path in a separate patch later?
>
> What do you mean by the partial solution? You mean ath9k users would
> suffer from regressions until they are fixed? We can't do that.
>
>> (Just spit-balling here; I'm still fairly new to this process. But I am
>> concerned that we'll hit a catch-22 where we can't get wider testing
>> before it's "ready" and we can't prove that it's "ready" until we've had
>> wider testing...)
>
> I understand your point, but I don't want to rush this to 4.9 and then
> start getting lots of bug reports and eventually be forced to revert it. If
> we just found a new serious regression the chances are that there are
> more lurking somewhere and this patch is just not ready yet.

So, the changes to mac80211 that fix the known regressions of this
patch have gone in. Any chance of seeing this merged during the current
merge window? :)

-Toke

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-10-05 14:09                     ` Toke Høiland-Jørgensen
@ 2016-10-05 15:50                       ` Kalle Valo
  2016-10-05 16:55                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Kalle Valo @ 2016-10-05 15:50 UTC (permalink / raw)
  To: ath9k-devel

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Kalle Valo <kvalo@codeaurora.org> writes:
>
>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>
>> I understand your point, but I don't want to rush this to 4.9 and then
>> start getting lots of bug reports and eventually be forced to revert it. If
>> we just found a new serious regression the chances are that there are
>> more lurking somewhere and this patch is just not ready yet.
>
> So, the changes to mac80211 that fix the known regressions of this
> patch have gone in.

I guess you mean this commit:

bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue

(Just making sure that I have the same commit in my tree when I apply this)

> Any chance of seeing this merged during the current merge window? :)

I sent the last new-feature ("-next") patches for 4.9 last week, sorry. So
this has to wait for 4.10.

And I assume I need to take v5:

https://patchwork.kernel.org/patch/9311037/

-- 
Kalle Valo

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-10-05 15:50                       ` Kalle Valo
@ 2016-10-05 16:55                         ` Toke Høiland-Jørgensen
  2016-10-05 17:54                           ` Kalle Valo
  0 siblings, 1 reply; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-10-05 16:55 UTC (permalink / raw)
  To: ath9k-devel

Kalle Valo <kvalo@codeaurora.org> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> Kalle Valo <kvalo@codeaurora.org> writes:
>>
>>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>>
>>> I understand your point, but I don't want to rush this to 4.9 and then
>>> start getting lots of bug reports and eventually forced to revert it. If
>>> we just found a new serious regression the chances are that there are
>>> more lurking somewhere and this patch is just not ready yet.
>>
>> So, the changes to mac80211 that fix the known regressions of this
>> patch have gone in.
>
> I guess you mean this commit:
>
> bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue
>
> (Just making sure that I have the same commit in my tree when I apply
> this)

Yup, that's the one :)

>> Any chance of seeing this merged during the current merge window? :)
>
> I sent the last new-feature ("-next") patches for 4.9 last week, sorry. So
> this has to wait for 4.10.

Ah, right, I think I got my merge windows confused. You already said you
wouldn't take it for 4.9. So I guess what I'm asking is for you to put
it into the appropriate -next tree so it can get some wider exposure
ahead of the *next* merge window...

> And I assume I need to take v5:
>
> https://patchwork.kernel.org/patch/9311037/

Yes. Haven't noticed anything that changed since that might conflict
with it, but let me know if I missed something and you want a refreshed
version.

-Toke

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-10-05 16:55                         ` Toke Høiland-Jørgensen
@ 2016-10-05 17:54                           ` Kalle Valo
  2016-10-05 19:56                             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 50+ messages in thread
From: Kalle Valo @ 2016-10-05 17:54 UTC (permalink / raw)
  To: ath9k-devel

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Kalle Valo <kvalo@codeaurora.org> writes:
>
>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>
>>> Kalle Valo <kvalo@codeaurora.org> writes:
>>>
>>>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>>>
>>>> I understand your point, but I don't want to rush this to 4.9 and then
>>>> start getting lots of bug reports and eventually forced to revert it. If
>>>> we just found a new serious regression the chances are that there are
>>>> more lurking somewhere and this patch is just not ready yet.
>>>
>>> So, the changes to mac80211 that fix the known regressions of this
>>> patch have gone in.
>>
>> I guess you mean this commit:
>>
>> bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue
>>
>> (Just making sure that I have the same commit in my tree when I apply
>> this)
>
> Yup, that's the one :)
>
>>> Any chance of seeing this merged during the current merge window? :)
>>
>> I sent the last new-feature ("-next") patches for 4.9 last week, sorry. So
>> this has to wait for 4.10.
>
> Ah, right, I think I got my merge windows confused. You already said you
> wouldn't take it for 4.9. So I guess what I'm asking is for you to put
> it into the appropriate -next tree so it can get some wider exposure
> ahead of the *next* merge window...

Yeah, we have plenty of time for 4.10 :) So my plan is to apply this
after I open wireless-drivers-next in 2-3 weeks or so. That would mean
that the patch would hit Linus' tree when 4.10-rc1 is released
(estimated to happen on 2017-01-01). The timing is actually perfect as
now we get maximal testing time on -next.

>> And I assume I need to take v5:
>>
>> https://patchwork.kernel.org/patch/9311037/
>
> Yes. Haven't noticed anything that changed since that might conflict
> with it, but let me know if I missed something and you want a refreshed
> version.

Thanks, I'll let you know if there are any problems.

-- 
Kalle Valo

* [ath9k-devel] [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
  2016-10-05 17:54                           ` Kalle Valo
@ 2016-10-05 19:56                             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 50+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-10-05 19:56 UTC (permalink / raw)
  To: ath9k-devel

Kalle Valo <kvalo@codeaurora.org> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> Kalle Valo <kvalo@codeaurora.org> writes:
>>
>>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>>
>>>> Kalle Valo <kvalo@codeaurora.org> writes:
>>>>
>>>>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>>>>
>>>>> I understand your point, but I don't want to rush this to 4.9 and then
>>>>> start getting lots of bug reports and eventually forced to revert it. If
>>>>> we just found a new serious regression the chances are that there are
>>>>> more lurking somewhere and this patch is just not ready yet.
>>>>
>>>> So, the changes to mac80211 that fix the known regressions of this
>>>> patch have gone in.
>>>
>>> I guess you mean this commit:
>>>
>>> bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue
>>>
>>> (Just making sure that I have the same commit in my tree when I apply
>>> this)
>>
>> Yup, that's the one :)
>>
>>>> Any chance of seeing this merged during the current merge window? :)
>>>
>>> I sent the last new-feature ("-next") patches for 4.9 last week, sorry. So
>>> this has to wait for 4.10.
>>
>> Ah, right, I think I got my merge windows confused. You already said you
>> wouldn't take it for 4.9. So I guess what I'm asking is for you to put
>> it into the appropriate -next tree so it can get some wider exposure
>> ahead of the *next* merge window...
>
> Yeah, we have plenty of time for 4.10 :) So my plan is to apply this
> after I open wireless-drivers-next in 2-3 weeks or so. That would mean
> that the patch would hit Linus' tree when 4.10-rc1 is released
> (estimated to happen on 2017-01-01). The timing is actually perfect as
> now we get maximal testing time on -next.

So the -next trees are those that are open outside the merge window.
Right, got it; thanks :)

>>> And I assume I need to take v5:
>>>
>>> https://patchwork.kernel.org/patch/9311037/
>>
>> Yes. Haven't noticed anything that changed since that might conflict
>> with it, but let me know if I missed something and you want a refreshed
>> version.
>
> Thanks, I'll let you know if there are any problems.

Cool.

-Toke

end of thread

Thread overview: 50+ messages
2016-06-17  9:09 [ath9k-devel] [PATCH 0/2] ath9k: Add airtime fairness scheduler Toke Høiland-Jørgensen
2016-06-17  9:09 ` [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues Toke Høiland-Jørgensen
2016-06-17 13:28   ` Felix Fietkau
2016-06-17 13:43     ` Toke Høiland-Jørgensen
2016-06-17 13:48       ` Felix Fietkau
2016-06-17 16:33         ` Felix Fietkau
2016-06-17 14:08     ` Tim Shepard
2016-06-17 14:35       ` Felix Fietkau
2016-06-17 17:45         ` Tim Shepard
2016-06-17 19:16           ` Toke Høiland-Jørgensen
2016-06-17 14:10     ` Dave Taht
2016-06-18 19:06   ` [ath9k-devel] [PATCH] ath9k: Switch to using " Toke Høiland-Jørgensen
2016-06-19  3:17     ` Tim Shepard
2016-06-19  8:52       ` Toke Høiland-Jørgensen
2016-06-19 13:40         ` Tim Shepard
2016-06-19 13:50           ` Toke Høiland-Jørgensen
2016-07-03  3:53     ` Tim Shepard
2016-07-04 17:47       ` Toke Høiland-Jørgensen
2016-07-06 13:23         ` Felix Fietkau
2016-07-06 14:46           ` Toke Høiland-Jørgensen
2016-07-06 16:17     ` [ath9k-devel] [PATCH v2] " Toke Høiland-Jørgensen
2016-07-06 18:13       ` Felix Fietkau
2016-07-06 18:52         ` Toke Høiland-Jørgensen
2016-07-06 18:59           ` Felix Fietkau
2016-07-06 19:08             ` Toke Høiland-Jørgensen
2016-07-06 18:19       ` Sebastian Gottschall
2016-07-06 19:38       ` [ath9k-devel] [PATCH v3] " Toke Høiland-Jørgensen
2016-07-08 14:26         ` [ath9k-devel] [v3] " Kalle Valo
2016-07-08 15:53           ` Toke Høiland-Jørgensen
2016-07-08 16:10             ` Felix Fietkau
2016-07-08 16:28               ` Toke Høiland-Jørgensen
2016-07-08 16:31                 ` Felix Fietkau
2016-07-08 16:38                   ` Toke Høiland-Jørgensen
2016-07-08 18:24                   ` Sebastian Gottschall
2016-07-09 12:00                     ` Toke Høiland-Jørgensen
2016-07-08 16:38         ` [ath9k-devel] [PATCH v3] " Tim Shepard
2016-07-09 15:45           ` Toke Høiland-Jørgensen
2016-08-05 16:05         ` [ath9k-devel] [PATCH v4] " Toke Høiland-Jørgensen
2016-08-22 15:43           ` Kalle Valo
2016-08-22 16:16             ` Toke Høiland-Jørgensen
2016-08-22 17:02               ` Kalle Valo
2016-08-22 17:13                 ` Toke Høiland-Jørgensen
2016-08-23  6:59                   ` Kalle Valo
2016-08-23  8:58                     ` Arend van Spriel
2016-10-05 14:09                     ` Toke Høiland-Jørgensen
2016-10-05 15:50                       ` Kalle Valo
2016-10-05 16:55                         ` Toke Høiland-Jørgensen
2016-10-05 17:54                           ` Kalle Valo
2016-10-05 19:56                             ` Toke Høiland-Jørgensen
2016-06-17  9:09 ` [ath9k-devel] [PATCH 2/2] ath9k: Add a per-station airtime deficit scheduler Toke Høiland-Jørgensen