All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load
@ 2011-02-13 17:56 Nathaniel J. Smith
  2011-02-13 17:56 ` [PATCH 1/5] iwlwifi: Simplify tx queue management Nathaniel J. Smith
                   ` (7 more replies)
  0 siblings, 8 replies; 48+ messages in thread
From: Nathaniel J. Smith @ 2011-02-13 17:56 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, wey-yi.w.guy, ilw, Nathaniel J. Smith

I have an iwl3945 in my laptop, and find that whenever I saturate the
outgoing connection -- which happens daily when my rsync backup kicks
in -- then latency becomes astronomical (what Jim Gettys calls
"bufferbloat"). I've measured 12-13 second ping times to my router. As
you can imagine, this makes web browsing or interactive SSH somewhat
uncomfortable.

The problem is the very large TX queue in the iwlwifi driver, which is
non-adjustable and willing to buffer several hundred packets at a
time. My pings, HTTP connections, and so on get stuck at the end of
this line, and have to wait for multiple seconds while all the packets
ahead of them get sent.

(A secondary problem is the txqueue in the network layer, but this can
be fixed via 'ip link set wlan0 txqueuelen $LEN', or by traffic
shaping. But so long as the driver queue is so huge, neither really
makes any difference.)

This patch series teaches the driver to measure the average rate of
packet transmission for each tx queue, and adjusts the queue size
dynamically in an attempt to achieve ~2 ms of added latency.

Patches are against wireless-testing (0fd54899). All patches tested on
iwl3945 hardware. I've been running the full patch (against an earlier
wireless-testing) full time for more than a month, and it's been just
as reliable as the standard driver.

Hard numbers
------------
I made a test file:
  dd if=/dev/urandom of=/tmp/50-megs bs=1M count=50
Then ran a test load and a latency measurement, started at the same time:
  scp /tmp/50-megs 192.168.16.1:/dev/zero
  sleep 20 && ping -n 192.168.16.1 -c 20 -q
Measurements are on my home network, which is not the cleanest
environment, and bandwidth fluctuates enough that it isn't worth
reporting (scp reports between ~100KB/s and 200KB/s). But at least
it's realistic... Each test performed twice, in intermixed order.

(numbers are RTT min/avg/max/mdev in ms, as reported by ping)

Baseline: current wireless-testing, with txqueuelen 1000
  1550.601  5404.818  8672.305  2423.160 ms
  1484.907  4554.160  6632.850  1326.948 ms

Best currently achievable: current wireless-testing, with txqueuelen 2
  2196.802  4121.507  6009.160   910.079 ms
  1092.415  2170.247  3860.628   674.256 ms

New best: patched wireless-testing, with txqueuelen 2
     3.812    44.419    95.883    26.462 ms
     1.793    34.926   137.325    33.161 ms

As you can see, this patch achieves something like a 2 order of
magnitude reduction in RTT, and if there's any effect on throughput
then it's too subtle to notice in my environment. (The single highest
observed throughput was actually in the second test of the patched
kernel.)

Nathaniel J. Smith (5):
  iwlwifi: Simplify tx queue management
  iwlwifi: Convert the tx queue high_mark to an atomic_t
  iwlwifi: Invert the sense of the queue high_mark
  iwlwifi: auto-tune tx queue size to minimize latency
  iwlwifi: make current tx queue sizes visible in debugfs

 drivers/net/wireless/iwlwifi/iwl-3945.c     |    6 ++-
 drivers/net/wireless/iwlwifi/iwl-4965.c     |    4 +-
 drivers/net/wireless/iwlwifi/iwl-agn-lib.c  |    4 +-
 drivers/net/wireless/iwlwifi/iwl-agn-tx.c   |   11 +++-
 drivers/net/wireless/iwlwifi/iwl-core.h     |    1 +
 drivers/net/wireless/iwlwifi/iwl-debugfs.c  |    6 ++-
 drivers/net/wireless/iwlwifi/iwl-dev.h      |   20 ++++++--
 drivers/net/wireless/iwlwifi/iwl-tx.c       |   62 ++++++++++++++++++++-------
 drivers/net/wireless/iwlwifi/iwl3945-base.c |    7 ++-
 9 files changed, 87 insertions(+), 34 deletions(-)


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
@ 2011-02-13 17:56 ` Nathaniel J. Smith
  2011-02-14  9:57   ` Johannes Berg
  2011-02-14 15:33   ` wwguy
  2011-02-13 17:56 ` [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t Nathaniel J. Smith
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 48+ messages in thread
From: Nathaniel J. Smith @ 2011-02-13 17:56 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, wey-yi.w.guy, ilw, Nathaniel J. Smith

Previously, the iwlwifi driver filled its transmit queue until it
reached a high-water mark, and then stopped until it had fallen to a
low-water mark. This basic logic makes sense for interrupt mitigation
-- you might not want to wake up the CPU after every packet, but
instead wait until a batch of packets has been sent -- except the
iwlwifi driver doesn't actually do any interrupt mitigation; the CPU
wakes up after every packet transmitted anyway. So we simplify the
code to maintain only a single limit on total queue length, and
whenever we drop below that limit we allow more packets in.

This patch should have no user-visible effect.

Signed-off-by: Nathaniel J. Smith <njs@pobox.com>
---
 drivers/net/wireless/iwlwifi/iwl-3945.c     |    4 ++--
 drivers/net/wireless/iwlwifi/iwl-4965.c     |    4 ++--
 drivers/net/wireless/iwlwifi/iwl-agn-lib.c  |    4 ++--
 drivers/net/wireless/iwlwifi/iwl-agn-tx.c   |    6 +++---
 drivers/net/wireless/iwlwifi/iwl-dev.h      |    2 --
 drivers/net/wireless/iwlwifi/iwl-tx.c       |   18 +++++++-----------
 drivers/net/wireless/iwlwifi/iwl3945-base.c |    4 ++--
 7 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c
index 5b6932c..6d2fb06 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945.c
+++ b/drivers/net/wireless/iwlwifi/iwl-3945.c
@@ -274,7 +274,7 @@ int iwl3945_rs_next_rate(struct iwl_priv *priv, int rate)
  *
  * When FW advances 'R' index, all entries between old and new 'R' index
  * need to be reclaimed. As result, some free space forms. If there is
- * enough free space (> low mark), wake the stack that feeds us.
+ * enough free space (> high mark), wake the stack that feeds us.
  */
 static void iwl3945_tx_queue_reclaim(struct iwl_priv *priv,
 				     int txq_id, int index)
@@ -294,7 +294,7 @@ static void iwl3945_tx_queue_reclaim(struct iwl_priv *priv,
 		priv->cfg->ops->lib->txq_free_tfd(priv, txq);
 	}
 
-	if (iwl_queue_space(q) > q->low_mark && (txq_id >= 0) &&
+	if (iwl_queue_space(q) > q->high_mark && (txq_id >= 0) &&
 			(txq_id != IWL39_CMD_QUEUE_NUM) &&
 			priv->mac80211_registered)
 		iwl_wake_queue(priv, txq);
diff --git a/drivers/net/wireless/iwlwifi/iwl-4965.c b/drivers/net/wireless/iwlwifi/iwl-4965.c
index 8998ed1..b8955d7 100644
--- a/drivers/net/wireless/iwlwifi/iwl-4965.c
+++ b/drivers/net/wireless/iwlwifi/iwl-4965.c
@@ -2230,7 +2230,7 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
 						       tid, freed);
 
 			if (priv->mac80211_registered &&
-			    (iwl_queue_space(&txq->q) > txq->q.low_mark) &&
+			    (iwl_queue_space(&txq->q) > txq->q.high_mark) &&
 			    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 				iwl_wake_queue(priv, txq);
 		}
@@ -2255,7 +2255,7 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
 			IWL_DEBUG_TX_REPLY(priv, "Station not known\n");
 
 		if (priv->mac80211_registered &&
-		    (iwl_queue_space(&txq->q) > txq->q.low_mark))
+		    (iwl_queue_space(&txq->q) > txq->q.high_mark))
 			iwl_wake_queue(priv, txq);
 	}
 	if (qc && likely(sta_id != IWL_INVALID_STATION))
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
index 3aa4864..f229543 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
@@ -445,7 +445,7 @@ static void iwlagn_rx_reply_tx(struct iwl_priv *priv,
 			iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
 			if (priv->mac80211_registered &&
-			    (iwl_queue_space(&txq->q) > txq->q.low_mark) &&
+			    (iwl_queue_space(&txq->q) > txq->q.high_mark) &&
 			    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 				iwl_wake_queue(priv, txq);
 		}
@@ -455,7 +455,7 @@ static void iwlagn_rx_reply_tx(struct iwl_priv *priv,
 		iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
 		if (priv->mac80211_registered &&
-		    (iwl_queue_space(&txq->q) > txq->q.low_mark))
+		    (iwl_queue_space(&txq->q) > txq->q.high_mark))
 			iwl_wake_queue(priv, txq);
 	}
 
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
index 266490d..bffba7e 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
@@ -637,7 +637,7 @@ int iwlagn_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	txq = &priv->txq[txq_id];
 	q = &txq->q;
 
-	if (unlikely(iwl_queue_space(q) < q->high_mark)) {
+	if (unlikely(iwl_queue_space(q) <= q->high_mark)) {
 		spin_unlock(&priv->sta_lock);
 		goto drop_unlock;
 	}
@@ -788,7 +788,7 @@ int iwlagn_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	if (sta_priv && sta_priv->client && !is_agg)
 		atomic_inc(&sta_priv->pending_frames);
 
-	if ((iwl_queue_space(q) < q->high_mark) && priv->mac80211_registered) {
+	if ((iwl_queue_space(q) <= q->high_mark) && priv->mac80211_registered) {
 		if (wait_write_ptr) {
 			spin_lock_irqsave(&priv->lock, flags);
 			txq->need_update = 1;
@@ -1431,7 +1431,7 @@ void iwlagn_rx_reply_compressed_ba(struct iwl_priv *priv,
 		int freed = iwlagn_tx_queue_reclaim(priv, scd_flow, index);
 		iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
-		if ((iwl_queue_space(&txq->q) > txq->q.low_mark) &&
+		if ((iwl_queue_space(&txq->q) > txq->q.high_mark) &&
 		    priv->mac80211_registered &&
 		    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 			iwl_wake_queue(priv, txq);
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index ecfbef4..eb6586d 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -134,8 +134,6 @@ struct iwl_queue {
 	dma_addr_t dma_addr;   /* physical addr for BD's */
 	int n_window;	       /* safe queue window */
 	u32 id;
-	int low_mark;	       /* low watermark, resume queue if free
-				* space more than this */
 	int high_mark;         /* high watermark, stop queue if free
 				* space less than this */
 };
diff --git a/drivers/net/wireless/iwlwifi/iwl-tx.c b/drivers/net/wireless/iwlwifi/iwl-tx.c
index 073b6ce..78a9896 100644
--- a/drivers/net/wireless/iwlwifi/iwl-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-tx.c
@@ -210,10 +210,10 @@ EXPORT_SYMBOL(iwl_cmd_queue_free);
  * The device reads or writes the data in the queues via the device's several
  * DMA/FIFO channels.  Each queue is mapped to a single DMA channel.
  *
- * For Tx queue, there are low mark and high mark limits. If, after queuing
- * the packet for Tx, free space become < low mark, Tx queue stopped. When
- * reclaiming packets (on 'tx done IRQ), if free space become > high mark,
- * Tx queue resumed.
+ * For Tx queue, there is a high mark limit. If, after queuing the packet for
+ * Tx, free space becomes <= high mark, Tx queue stopped. When reclaiming
+ * packets (on 'tx done IRQ), if free space become > high mark, Tx queue
+ * resumed.
  *
  * See more detailed info in iwl-4965-hw.h.
  ***************************************************/
@@ -237,7 +237,7 @@ EXPORT_SYMBOL(iwl_queue_space);
 
 
 /**
- * iwl_queue_init - Initialize queue's high/low-water and read/write indexes
+ * iwl_queue_init - Initialize queue's high-water and read/write indexes
  */
 static int iwl_queue_init(struct iwl_priv *priv, struct iwl_queue *q,
 			  int count, int slots_num, u32 id)
@@ -254,10 +254,6 @@ static int iwl_queue_init(struct iwl_priv *priv, struct iwl_queue *q,
 	 * get_cmd_index is broken. */
 	BUG_ON(!is_power_of_2(slots_num));
 
-	q->low_mark = q->n_window / 4;
-	if (q->low_mark < 4)
-		q->low_mark = 4;
-
 	q->high_mark = q->n_window / 8;
 	if (q->high_mark < 2)
 		q->high_mark = 2;
@@ -368,7 +364,7 @@ int iwl_tx_queue_init(struct iwl_priv *priv, struct iwl_tx_queue *txq,
 	 * iwl_queue_inc_wrap and iwl_queue_dec_wrap are broken. */
 	BUILD_BUG_ON(TFD_QUEUE_SIZE_MAX & (TFD_QUEUE_SIZE_MAX - 1));
 
-	/* Initialize queue's high/low-water marks, and head/tail indexes */
+	/* Initialize queue's high-water mark, and head/tail indexes */
 	iwl_queue_init(priv, &txq->q, TFD_QUEUE_SIZE_MAX, slots_num, txq_id);
 
 	/* Tell device where to find queue */
@@ -547,7 +543,7 @@ int iwl_enqueue_hcmd(struct iwl_priv *priv, struct iwl_host_cmd *cmd)
  *
  * When FW advances 'R' index, all entries between old and new 'R' index
  * need to be reclaimed. As result, some free space forms.  If there is
- * enough free space (> low mark), wake the stack that feeds us.
+ * enough free space (> high mark), wake the stack that feeds us.
  */
 static void iwl_hcmd_queue_reclaim(struct iwl_priv *priv, int txq_id,
 				   int idx, int cmd_idx)
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index adcef73..e0d4c83 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -536,7 +536,7 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	txq = &priv->txq[txq_id];
 	q = &txq->q;
 
-	if ((iwl_queue_space(q) < q->high_mark))
+	if ((iwl_queue_space(q) <= q->high_mark))
 		goto drop;
 
 	spin_lock_irqsave(&priv->lock, flags);
@@ -646,7 +646,7 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	iwl_txq_update_write_ptr(priv, txq);
 	spin_unlock_irqrestore(&priv->lock, flags);
 
-	if ((iwl_queue_space(q) < q->high_mark)
+	if ((iwl_queue_space(q) <= q->high_mark)
 	    && priv->mac80211_registered) {
 		if (wait_write_ptr) {
 			spin_lock_irqsave(&priv->lock, flags);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
  2011-02-13 17:56 ` [PATCH 1/5] iwlwifi: Simplify tx queue management Nathaniel J. Smith
@ 2011-02-13 17:56 ` Nathaniel J. Smith
  2011-02-14 12:16   ` Johannes Berg
  2011-02-13 17:56 ` [PATCH 3/5] iwlwifi: Invert the sense of the queue high_mark Nathaniel J. Smith
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Nathaniel J. Smith @ 2011-02-13 17:56 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, wey-yi.w.guy, ilw, Nathaniel J. Smith

This patch should cause no behavioral change whatsoever.

Signed-off-by: Nathaniel J. Smith <njs@pobox.com>
---
 drivers/net/wireless/iwlwifi/iwl-3945.c     |    2 +-
 drivers/net/wireless/iwlwifi/iwl-4965.c     |    4 ++--
 drivers/net/wireless/iwlwifi/iwl-agn-lib.c  |    4 ++--
 drivers/net/wireless/iwlwifi/iwl-agn-tx.c   |    8 +++++---
 drivers/net/wireless/iwlwifi/iwl-dev.h      |    5 +++--
 drivers/net/wireless/iwlwifi/iwl-tx.c       |    7 ++++---
 drivers/net/wireless/iwlwifi/iwl3945-base.c |    5 +++--
 7 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c
index 6d2fb06..63999ae 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945.c
+++ b/drivers/net/wireless/iwlwifi/iwl-3945.c
@@ -294,7 +294,7 @@ static void iwl3945_tx_queue_reclaim(struct iwl_priv *priv,
 		priv->cfg->ops->lib->txq_free_tfd(priv, txq);
 	}
 
-	if (iwl_queue_space(q) > q->high_mark && (txq_id >= 0) &&
+	if (iwl_queue_space(q) > atomic_read(&q->high_mark) && (txq_id >= 0) &&
 			(txq_id != IWL39_CMD_QUEUE_NUM) &&
 			priv->mac80211_registered)
 		iwl_wake_queue(priv, txq);
diff --git a/drivers/net/wireless/iwlwifi/iwl-4965.c b/drivers/net/wireless/iwlwifi/iwl-4965.c
index b8955d7..548cbe9 100644
--- a/drivers/net/wireless/iwlwifi/iwl-4965.c
+++ b/drivers/net/wireless/iwlwifi/iwl-4965.c
@@ -2230,7 +2230,7 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
 						       tid, freed);
 
 			if (priv->mac80211_registered &&
-			    (iwl_queue_space(&txq->q) > txq->q.high_mark) &&
+			    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)) &&
 			    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 				iwl_wake_queue(priv, txq);
 		}
@@ -2255,7 +2255,7 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
 			IWL_DEBUG_TX_REPLY(priv, "Station not known\n");
 
 		if (priv->mac80211_registered &&
-		    (iwl_queue_space(&txq->q) > txq->q.high_mark))
+		    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)))
 			iwl_wake_queue(priv, txq);
 	}
 	if (qc && likely(sta_id != IWL_INVALID_STATION))
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
index f229543..b2b841f 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
@@ -445,7 +445,7 @@ static void iwlagn_rx_reply_tx(struct iwl_priv *priv,
 			iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
 			if (priv->mac80211_registered &&
-			    (iwl_queue_space(&txq->q) > txq->q.high_mark) &&
+			    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)) &&
 			    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 				iwl_wake_queue(priv, txq);
 		}
@@ -455,7 +455,7 @@ static void iwlagn_rx_reply_tx(struct iwl_priv *priv,
 		iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
 		if (priv->mac80211_registered &&
-		    (iwl_queue_space(&txq->q) > txq->q.high_mark))
+		    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)))
 			iwl_wake_queue(priv, txq);
 	}
 
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
index bffba7e..0c808f0 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
@@ -31,6 +31,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/sched.h>
+#include <linux/time.h>
 
 #include "iwl-dev.h"
 #include "iwl-core.h"
@@ -637,7 +638,7 @@ int iwlagn_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	txq = &priv->txq[txq_id];
 	q = &txq->q;
 
-	if (unlikely(iwl_queue_space(q) <= q->high_mark)) {
+	if (unlikely(iwl_queue_space(q) <= atomic_read(&q->high_mark))) {
 		spin_unlock(&priv->sta_lock);
 		goto drop_unlock;
 	}
@@ -788,7 +789,8 @@ int iwlagn_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	if (sta_priv && sta_priv->client && !is_agg)
 		atomic_inc(&sta_priv->pending_frames);
 
-	if ((iwl_queue_space(q) <= q->high_mark) && priv->mac80211_registered) {
+	if ((iwl_queue_space(q) <= atomic_read(&q->high_mark))
+	     && priv->mac80211_registered) {
 		if (wait_write_ptr) {
 			spin_lock_irqsave(&priv->lock, flags);
 			txq->need_update = 1;
@@ -1431,7 +1433,7 @@ void iwlagn_rx_reply_compressed_ba(struct iwl_priv *priv,
 		int freed = iwlagn_tx_queue_reclaim(priv, scd_flow, index);
 		iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
-		if ((iwl_queue_space(&txq->q) > txq->q.high_mark) &&
+		if ((iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)) &&
 		    priv->mac80211_registered &&
 		    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 			iwl_wake_queue(priv, txq);
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index eb6586d..cf9807c 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -37,6 +37,7 @@
 #include <linux/wait.h>
 #include <linux/leds.h>
 #include <net/ieee80211_radiotap.h>
+#include <linux/time.h>
 
 #include "iwl-eeprom.h"
 #include "iwl-csr.h"
@@ -134,8 +135,8 @@ struct iwl_queue {
 	dma_addr_t dma_addr;   /* physical addr for BD's */
 	int n_window;	       /* safe queue window */
 	u32 id;
-	int high_mark;         /* high watermark, stop queue if free
-				* space less than this */
+	atomic_t high_mark;    /* high watermark, stop queue if free
+				* space less than or equal to this */
 };
 
 /* One for each TFD */
diff --git a/drivers/net/wireless/iwlwifi/iwl-tx.c b/drivers/net/wireless/iwlwifi/iwl-tx.c
index 78a9896..1867f99 100644
--- a/drivers/net/wireless/iwlwifi/iwl-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-tx.c
@@ -30,6 +30,7 @@
 #include <linux/etherdevice.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/time.h>
 #include <net/mac80211.h>
 #include "iwl-eeprom.h"
 #include "iwl-dev.h"
@@ -254,9 +255,9 @@ static int iwl_queue_init(struct iwl_priv *priv, struct iwl_queue *q,
 	 * get_cmd_index is broken. */
 	BUG_ON(!is_power_of_2(slots_num));
 
-	q->high_mark = q->n_window / 8;
-	if (q->high_mark < 2)
-		q->high_mark = 2;
+	atomic_set(&q->high_mark, q->n_window / 8);
+	if (atomic_read(&q->high_mark) < 2)
+		atomic_set(&q->high_mark, 2);
 
 	q->write_ptr = q->read_ptr = 0;
 
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index e0d4c83..b3f8f3e 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -44,6 +44,7 @@
 #include <linux/firmware.h>
 #include <linux/etherdevice.h>
 #include <linux/if_arp.h>
+#include <linux/time.h>
 
 #include <net/ieee80211_radiotap.h>
 #include <net/mac80211.h>
@@ -536,7 +537,7 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	txq = &priv->txq[txq_id];
 	q = &txq->q;
 
-	if ((iwl_queue_space(q) <= q->high_mark))
+	if ((iwl_queue_space(q) <= atomic_read(&q->high_mark)))
 		goto drop;
 
 	spin_lock_irqsave(&priv->lock, flags);
@@ -646,7 +647,7 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	iwl_txq_update_write_ptr(priv, txq);
 	spin_unlock_irqrestore(&priv->lock, flags);
 
-	if ((iwl_queue_space(q) <= q->high_mark)
+	if ((iwl_queue_space(q) <= atomic_read(&q->high_mark))
 	    && priv->mac80211_registered) {
 		if (wait_write_ptr) {
 			spin_lock_irqsave(&priv->lock, flags);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/5] iwlwifi: Invert the sense of the queue high_mark
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
  2011-02-13 17:56 ` [PATCH 1/5] iwlwifi: Simplify tx queue management Nathaniel J. Smith
  2011-02-13 17:56 ` [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t Nathaniel J. Smith
@ 2011-02-13 17:56 ` Nathaniel J. Smith
  2011-02-13 17:56 ` [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency Nathaniel J. Smith
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 48+ messages in thread
From: Nathaniel J. Smith @ 2011-02-13 17:56 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, wey-yi.w.guy, ilw, Nathaniel J. Smith

Before, the high_mark was the minimum amount of free space allowed in a
queue. Now, it's the maximum number of packets which are allowed to be
in the queue.

No behavioral changes, but this makes the high_mark much easier to reason
about. (You might think it wouldn't make a difference, but in fact the
relationship between the number of packets enqueued and the queue free
space is complicated by the need to distinguish the queue full and queue
empty conditions. Removing this complication makes the next patches
simpler.)

Signed-off-by: Nathaniel J. Smith <njs@pobox.com>
---
 drivers/net/wireless/iwlwifi/iwl-3945.c     |    3 ++-
 drivers/net/wireless/iwlwifi/iwl-4965.c     |    4 ++--
 drivers/net/wireless/iwlwifi/iwl-agn-lib.c  |    4 ++--
 drivers/net/wireless/iwlwifi/iwl-agn-tx.c   |    6 +++---
 drivers/net/wireless/iwlwifi/iwl-dev.h      |    6 +++---
 drivers/net/wireless/iwlwifi/iwl-tx.c       |   19 +++++++++++++------
 drivers/net/wireless/iwlwifi/iwl3945-base.c |    4 ++--
 7 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c
index 63999ae..f2e9e59 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945.c
+++ b/drivers/net/wireless/iwlwifi/iwl-3945.c
@@ -294,7 +294,8 @@ static void iwl3945_tx_queue_reclaim(struct iwl_priv *priv,
 		priv->cfg->ops->lib->txq_free_tfd(priv, txq);
 	}
 
-	if (iwl_queue_space(q) > atomic_read(&q->high_mark) && (txq_id >= 0) &&
+	if (iwl_queue_space_used(q) < atomic_read(&q->high_mark) &&
+			(txq_id >= 0) &&
 			(txq_id != IWL39_CMD_QUEUE_NUM) &&
 			priv->mac80211_registered)
 		iwl_wake_queue(priv, txq);
diff --git a/drivers/net/wireless/iwlwifi/iwl-4965.c b/drivers/net/wireless/iwlwifi/iwl-4965.c
index 548cbe9..6203c45 100644
--- a/drivers/net/wireless/iwlwifi/iwl-4965.c
+++ b/drivers/net/wireless/iwlwifi/iwl-4965.c
@@ -2230,7 +2230,7 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
 						       tid, freed);
 
 			if (priv->mac80211_registered &&
-			    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)) &&
+			    (iwl_queue_space_used(&txq->q) < atomic_read(&txq->q.high_mark)) &&
 			    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 				iwl_wake_queue(priv, txq);
 		}
@@ -2255,7 +2255,7 @@ static void iwl4965_rx_reply_tx(struct iwl_priv *priv,
 			IWL_DEBUG_TX_REPLY(priv, "Station not known\n");
 
 		if (priv->mac80211_registered &&
-		    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)))
+		    (iwl_queue_space_used(&txq->q) < atomic_read(&txq->q.high_mark)))
 			iwl_wake_queue(priv, txq);
 	}
 	if (qc && likely(sta_id != IWL_INVALID_STATION))
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
index b2b841f..a84b4a7 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
@@ -445,7 +445,7 @@ static void iwlagn_rx_reply_tx(struct iwl_priv *priv,
 			iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
 			if (priv->mac80211_registered &&
-			    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)) &&
+			    (iwl_queue_space_used(&txq->q) < atomic_read(&txq->q.high_mark)) &&
 			    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 				iwl_wake_queue(priv, txq);
 		}
@@ -455,7 +455,7 @@ static void iwlagn_rx_reply_tx(struct iwl_priv *priv,
 		iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
 		if (priv->mac80211_registered &&
-		    (iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)))
+		    (iwl_queue_space_used(&txq->q) < atomic_read(&txq->q.high_mark)))
 			iwl_wake_queue(priv, txq);
 	}
 
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
index 0c808f0..921876d 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
@@ -638,7 +638,7 @@ int iwlagn_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	txq = &priv->txq[txq_id];
 	q = &txq->q;
 
-	if (unlikely(iwl_queue_space(q) <= atomic_read(&q->high_mark))) {
+	if (unlikely(iwl_queue_space_used(q) >= atomic_read(&q->high_mark))) {
 		spin_unlock(&priv->sta_lock);
 		goto drop_unlock;
 	}
@@ -789,7 +789,7 @@ int iwlagn_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	if (sta_priv && sta_priv->client && !is_agg)
 		atomic_inc(&sta_priv->pending_frames);
 
-	if ((iwl_queue_space(q) <= atomic_read(&q->high_mark))
+	if ((iwl_queue_space_used(q) >= atomic_read(&q->high_mark))
 	     && priv->mac80211_registered) {
 		if (wait_write_ptr) {
 			spin_lock_irqsave(&priv->lock, flags);
@@ -1433,7 +1433,7 @@ void iwlagn_rx_reply_compressed_ba(struct iwl_priv *priv,
 		int freed = iwlagn_tx_queue_reclaim(priv, scd_flow, index);
 		iwl_free_tfds_in_queue(priv, sta_id, tid, freed);
 
-		if ((iwl_queue_space(&txq->q) > atomic_read(&txq->q.high_mark)) &&
+		if ((iwl_queue_space_used(&txq->q) < atomic_read(&txq->q.high_mark)) &&
 		    priv->mac80211_registered &&
 		    (agg->state != IWL_EMPTYING_HW_QUEUE_DELBA))
 			iwl_wake_queue(priv, txq);
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index cf9807c..dc6dd5f 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -135,8 +135,8 @@ struct iwl_queue {
 	dma_addr_t dma_addr;   /* physical addr for BD's */
 	int n_window;	       /* safe queue window */
 	u32 id;
-	atomic_t high_mark;    /* high watermark, stop queue if free
-				* space less than or equal to this */
+	atomic_t high_mark;    /* high watermark -- don't queue more than this
+				* many entries */
 };
 
 /* One for each TFD */
@@ -729,7 +729,7 @@ extern void iwl_update_chain_flags(struct iwl_priv *priv);
 extern const u8 iwl_bcast_addr[ETH_ALEN];
 extern int iwl_rxq_stop(struct iwl_priv *priv);
 extern void iwl_txq_ctx_stop(struct iwl_priv *priv);
-extern int iwl_queue_space(const struct iwl_queue *q);
+extern int iwl_queue_space_used(const struct iwl_queue *q);
 static inline int iwl_queue_used(const struct iwl_queue *q, int i)
 {
 	return q->write_ptr >= q->read_ptr ?
diff --git a/drivers/net/wireless/iwlwifi/iwl-tx.c b/drivers/net/wireless/iwlwifi/iwl-tx.c
index 1867f99..ce62981 100644
--- a/drivers/net/wireless/iwlwifi/iwl-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-tx.c
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(iwl_cmd_queue_free);
  * See more detailed info in iwl-4965-hw.h.
  ***************************************************/
 
-int iwl_queue_space(const struct iwl_queue *q)
+static int iwl_queue_space_free(const struct iwl_queue *q)
 {
 	int s = q->read_ptr - q->write_ptr;
 
@@ -234,8 +234,15 @@ int iwl_queue_space(const struct iwl_queue *q)
 		s = 0;
 	return s;
 }
-EXPORT_SYMBOL(iwl_queue_space);
 
+int iwl_queue_space_used(const struct iwl_queue *q)
+{
+	int s = q->write_ptr - q->read_ptr;
+	if (s < 0)
+		s += q->n_bd;
+	return s;
+}
+EXPORT_SYMBOL(iwl_queue_space_used);
 
 /**
  * iwl_queue_init - Initialize queue's high-water and read/write indexes
@@ -255,9 +262,9 @@ static int iwl_queue_init(struct iwl_priv *priv, struct iwl_queue *q,
 	 * get_cmd_index is broken. */
 	BUG_ON(!is_power_of_2(slots_num));
 
-	atomic_set(&q->high_mark, q->n_window / 8);
-	if (atomic_read(&q->high_mark) < 2)
-		atomic_set(&q->high_mark, 2);
+	atomic_set(&q->high_mark, q->n_window - q->n_window / 8);
+	if (q->n_window - atomic_read(&q->high_mark) < 2)
+		atomic_set(&q->high_mark, q->n_window - 2);
 
 	q->write_ptr = q->read_ptr = 0;
 
@@ -445,7 +452,7 @@ int iwl_enqueue_hcmd(struct iwl_priv *priv, struct iwl_host_cmd *cmd)
 		return -EIO;
 	}
 
-	if (iwl_queue_space(q) < ((cmd->flags & CMD_ASYNC) ? 2 : 1)) {
+	if (iwl_queue_space_free(q) < ((cmd->flags & CMD_ASYNC) ? 2 : 1)) {
 		IWL_ERR(priv, "No space in command queue\n");
 		if (priv->cfg->ops->lib->tt_ops.ct_kill_check) {
 			is_ct_kill =
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index b3f8f3e..d0aa01c 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -537,7 +537,7 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	txq = &priv->txq[txq_id];
 	q = &txq->q;
 
-	if ((iwl_queue_space(q) <= atomic_read(&q->high_mark)))
+	if ((iwl_queue_space_used(q) >= atomic_read(&q->high_mark)))
 		goto drop;
 
 	spin_lock_irqsave(&priv->lock, flags);
@@ -647,7 +647,7 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	iwl_txq_update_write_ptr(priv, txq);
 	spin_unlock_irqrestore(&priv->lock, flags);
 
-	if ((iwl_queue_space(q) <= atomic_read(&q->high_mark))
+	if ((iwl_queue_space_used(q) >= atomic_read(&q->high_mark))
 	    && priv->mac80211_registered) {
 		if (wait_write_ptr) {
 			spin_lock_irqsave(&priv->lock, flags);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
                   ` (2 preceding siblings ...)
  2011-02-13 17:56 ` [PATCH 3/5] iwlwifi: Invert the sense of the queue high_mark Nathaniel J. Smith
@ 2011-02-13 17:56 ` Nathaniel J. Smith
  2011-02-14 12:17   ` Johannes Berg
  2011-02-14 15:46   ` wwguy
  2011-02-13 17:56 ` [PATCH 5/5] iwlwifi: make current tx queue sizes visible in debugfs Nathaniel J. Smith
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 48+ messages in thread
From: Nathaniel J. Smith @ 2011-02-13 17:56 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, wey-yi.w.guy, ilw, Nathaniel J. Smith

We maintain several ring buffers that queue up packets for the
hardware to transmit. These buffers can be quite large, and the
quality of wireless connections can vary greatly; as a result, it can
be that a full queue might take multiple seconds to drain.

For instance, if there is a high-bandwidth outgoing flow, like a large
file upload, then it will completely fill the tx queues. Once the
queues are full, then any other outgoing packets -- like those sent
when a user clicks on an HTTP link, or types into an SSH session --
will have to wait at the end of the line, and will not actually be
transmitted until multiple seconds have passed.  This results in a
suboptimal level of interactive response.

So we really don't want to allow too many packets to get queued up. On
the other hand, we do want to queue up *some* packets, to maintain
throughput -- and the queue size that maintains interactivity for a
degraded 1 Mb/s connection might not be so great for some
super-fancy 802.11n 600 Mb/s connection.

This patch estimates how long it takes the hardware to transmit one
packet (by comparing each packet's queue residency time to the number
of packets it had to wait behind), and then further smooths these
estimates with an EWMA. Then, it uses the estimated packet
transmission time to choose the maximum queue size that will still
produce reasonably bounded queue residency times, given current
conditions.

Signed-off-by: Nathaniel J. Smith <njs@pobox.com>
---
 drivers/net/wireless/iwlwifi/iwl-3945.c     |    1 +
 drivers/net/wireless/iwlwifi/iwl-agn-tx.c   |    3 +++
 drivers/net/wireless/iwlwifi/iwl-core.h     |    1 +
 drivers/net/wireless/iwlwifi/iwl-dev.h      |   11 +++++++++++
 drivers/net/wireless/iwlwifi/iwl-tx.c       |   26 ++++++++++++++++++++++++++
 drivers/net/wireless/iwlwifi/iwl3945-base.c |    2 ++
 6 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c
index f2e9e59..d52bcb3 100644
--- a/drivers/net/wireless/iwlwifi/iwl-3945.c
+++ b/drivers/net/wireless/iwlwifi/iwl-3945.c
@@ -288,6 +288,7 @@ static void iwl3945_tx_queue_reclaim(struct iwl_priv *priv,
 	for (index = iwl_queue_inc_wrap(index, q->n_bd); q->read_ptr != index;
 		q->read_ptr = iwl_queue_inc_wrap(q->read_ptr, q->n_bd)) {
 
+		iwl_tx_queue_update_high_mark(txq, txq->q.read_ptr);
 		tx_info = &txq->txb[txq->q.read_ptr];
 		ieee80211_tx_status_irqsafe(priv->hw, tx_info->skb);
 		tx_info->skb = NULL;
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
index 921876d..06347e3 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c
@@ -655,6 +655,8 @@ int iwlagn_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	memset(&(txq->txb[q->write_ptr]), 0, sizeof(struct iwl_tx_info));
 	txq->txb[q->write_ptr].skb = skb;
 	txq->txb[q->write_ptr].ctx = ctx;
+	getrawmonotonic(&txq->txb[q->write_ptr].enqueue_time);
+	txq->txb[q->write_ptr].enqueue_depth = iwl_queue_space_used(q) + 1;
 
 	/* Set up first empty entry in queue's array of Tx/cmd buffers */
 	out_cmd = txq->cmd[q->write_ptr];
@@ -1215,6 +1217,7 @@ int iwlagn_tx_queue_reclaim(struct iwl_priv *priv, int txq_id, int index)
 	     q->read_ptr != index;
 	     q->read_ptr = iwl_queue_inc_wrap(q->read_ptr, q->n_bd)) {
 
+		iwl_tx_queue_update_high_mark(txq, txq->q.read_ptr);
 		tx_info = &txq->txb[txq->q.read_ptr];
 		iwlagn_tx_status(priv, tx_info,
 				 txq_id >= IWLAGN_FIRST_AMPDU_QUEUE);
diff --git a/drivers/net/wireless/iwlwifi/iwl-core.h b/drivers/net/wireless/iwlwifi/iwl-core.h
index e0ec170..b97a6dc 100644
--- a/drivers/net/wireless/iwlwifi/iwl-core.h
+++ b/drivers/net/wireless/iwlwifi/iwl-core.h
@@ -532,6 +532,7 @@ int iwl_tx_queue_init(struct iwl_priv *priv, struct iwl_tx_queue *txq,
 		      int slots_num, u32 txq_id);
 void iwl_tx_queue_reset(struct iwl_priv *priv, struct iwl_tx_queue *txq,
 			int slots_num, u32 txq_id);
+void iwl_tx_queue_update_high_mark(struct iwl_tx_queue *txq, int index);
 void iwl_tx_queue_free(struct iwl_priv *priv, int txq_id);
 void iwl_setup_watchdog(struct iwl_priv *priv);
 /*****************************************************
diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index dc6dd5f..3b421fd 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -38,6 +38,7 @@
 #include <linux/leds.h>
 #include <net/ieee80211_radiotap.h>
 #include <linux/time.h>
+#include <linux/average.h>
 
 #include "iwl-eeprom.h"
 #include "iwl-csr.h"
@@ -137,12 +138,22 @@ struct iwl_queue {
 	u32 id;
 	atomic_t high_mark;    /* high watermark -- don't queue more than this
 				* many entries */
+	struct ewma ns_per_packet; /* tracks how long it takes each packet
+				    * to be transmitted, in nanoseconds */
 };
 
+/* We attempt to find a setting for high_mark (which is measured in packets)
+ * so that no packet spends longer than this (measured in wall clock time)
+ * waiting around in the transmit queue:
+ */
+#define IWL_TX_HIGH_MARK_IN_NSEC (2 * NSEC_PER_MSEC)
+
 /* One for each TFD */
 struct iwl_tx_info {
 	struct sk_buff *skb;
 	struct iwl_rxon_context *ctx;
+	struct timespec enqueue_time;
+	int enqueue_depth;
 };
 
 /**
diff --git a/drivers/net/wireless/iwlwifi/iwl-tx.c b/drivers/net/wireless/iwlwifi/iwl-tx.c
index ce62981..07fac3d 100644
--- a/drivers/net/wireless/iwlwifi/iwl-tx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-tx.c
@@ -266,6 +266,8 @@ static int iwl_queue_init(struct iwl_priv *priv, struct iwl_queue *q,
 	if (q->n_window - atomic_read(&q->high_mark) < 2)
 		atomic_set(&q->high_mark, q->n_window - 2);
 
+	ewma_init(&q->ns_per_packet, 1, 8);
+
 	q->write_ptr = q->read_ptr = 0;
 
 	return 0;
@@ -410,6 +412,30 @@ void iwl_tx_queue_reset(struct iwl_priv *priv, struct iwl_tx_queue *txq,
 }
 EXPORT_SYMBOL(iwl_tx_queue_reset);
 
+void iwl_tx_queue_update_high_mark(struct iwl_tx_queue *txq, int index)
+{
+	struct iwl_queue *q = &txq->q;
+	struct timespec now, diff;
+	int enqueue_depth = txq->txb[index].enqueue_depth;
+	int this_ns_per_packet, avg_ns_per_packet;
+	int new_high_mark;
+
+	getrawmonotonic(&now);
+	diff = timespec_sub(now, txq->txb[index].enqueue_time);
+	this_ns_per_packet = (NSEC_PER_SEC / enqueue_depth) * diff.tv_sec;
+	this_ns_per_packet += diff.tv_nsec / enqueue_depth;
+	/* Just some sanity checks for paranoia's sake */
+	if (this_ns_per_packet < 0 && this_ns_per_packet > NSEC_PER_SEC)
+		return;
+	ewma_add(&q->ns_per_packet, this_ns_per_packet);
+	avg_ns_per_packet = ewma_read(&q->ns_per_packet);
+	new_high_mark = IWL_TX_HIGH_MARK_IN_NSEC / avg_ns_per_packet;
+	new_high_mark = clamp(new_high_mark, 2, q->n_window - 2);
+	atomic_set(&q->high_mark, new_high_mark);
+}
+EXPORT_SYMBOL(iwl_tx_queue_update_high_mark);
+
+
 /*************** HOST COMMAND QUEUE FUNCTIONS   *****/
 
 /**
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index d0aa01c..a2ae4ad 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -548,6 +548,8 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	memset(&(txq->txb[q->write_ptr]), 0, sizeof(struct iwl_tx_info));
 	txq->txb[q->write_ptr].skb = skb;
 	txq->txb[q->write_ptr].ctx = &priv->contexts[IWL_RXON_CTX_BSS];
+	getrawmonotonic(&txq->txb[q->write_ptr].enqueue_time);
+	txq->txb[q->write_ptr].enqueue_depth = iwl_queue_space_used(q) + 1;
 
 	/* Init first empty entry in queue's array of Tx/cmd buffers */
 	out_cmd = txq->cmd[idx];
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/5] iwlwifi: make current tx queue sizes visible in debugfs
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
                   ` (3 preceding siblings ...)
  2011-02-13 17:56 ` [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency Nathaniel J. Smith
@ 2011-02-13 17:56 ` Nathaniel J. Smith
  2011-02-14  0:32 ` [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Julian Calaby
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 48+ messages in thread
From: Nathaniel J. Smith @ 2011-02-13 17:56 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, wey-yi.w.guy, ilw, Nathaniel J. Smith

Signed-off-by: Nathaniel J. Smith <njs@pobox.com>
---
 drivers/net/wireless/iwlwifi/iwl-debugfs.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-debugfs.c b/drivers/net/wireless/iwlwifi/iwl-debugfs.c
index bc7a965..31c58fe 100644
--- a/drivers/net/wireless/iwlwifi/iwl-debugfs.c
+++ b/drivers/net/wireless/iwlwifi/iwl-debugfs.c
@@ -980,7 +980,7 @@ static ssize_t iwl_dbgfs_tx_queue_read(struct file *file,
 	int pos = 0;
 	int cnt;
 	int ret;
-	const size_t bufsz = sizeof(char) * 64 *
+	const size_t bufsz = sizeof(char) * 128 *
 				priv->cfg->base_params->num_of_queues;
 
 	if (!priv->txq) {
@@ -995,9 +995,11 @@ static ssize_t iwl_dbgfs_tx_queue_read(struct file *file,
 		txq = &priv->txq[cnt];
 		q = &txq->q;
 		pos += scnprintf(buf + pos, bufsz - pos,
-				"hwq %.2d: read=%u write=%u stop=%d"
+				"hwq %.2d: read=%u write=%u"
+				" high_mark=%u stop=%d"
 				" swq_id=%#.2x (ac %d/hwq %d)\n",
 				cnt, q->read_ptr, q->write_ptr,
+				atomic_read(&q->high_mark),
 				!!test_bit(cnt, priv->queue_stopped),
 				txq->swq_id, txq->swq_id & 3,
 				(txq->swq_id >> 2) & 0x1f);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
                   ` (4 preceding siblings ...)
  2011-02-13 17:56 ` [PATCH 5/5] iwlwifi: make current tx queue sizes visible in debugfs Nathaniel J. Smith
@ 2011-02-14  0:32 ` Julian Calaby
  2011-02-14  3:28   ` Nathaniel Smith
  2011-02-16 15:50 ` John W. Linville
  2011-02-17  1:49 ` [RFC] mac80211: implement eBDP algorithm to fight bufferbloat John W. Linville
  7 siblings, 1 reply; 48+ messages in thread
From: Julian Calaby @ 2011-02-14  0:32 UTC (permalink / raw)
  To: Nathaniel J. Smith; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Mon, Feb 14, 2011 at 04:56, Nathaniel J. Smith <njs@pobox.com> wrote:
> I have an iwl3945 in my laptop, and find that whenever I saturate the
> outgoing connection -- which happens daily when my rsync backup kicks
> in -- then latency becomes astronomical (what Jim Gettys calls
> "bufferbloat"). I've measured 12-13 second ping times to my router. As
> you can imagine, this makes web browsing or interactive SSH somewhat
> uncomfortable.

For those who aren't aware of this, more info:
http://gettys.wordpress.com/2010/12/03/introducing-the-criminal-mastermind-bufferbloat/

Re-(re-)stating the issue in question: When a network link is
congested, packets are being buffered for a "long" time in an attempt
to not drop any traffic; despite almost all protocols being either
built on TCP or designed to deal with packet loss. This causes latency
issues as the sender is not receiving any indication that the network
is congested and hence continuing to send as fast as it can, filling
these buffers and causing high latencies as they are emptied out over
the congested link.

The short term solution (but not an overall solution by any means) is
to reduce these buffers so that packets are dropped when links are
congested.

> This patch series teaches the driver to measure the average rate of
> packet transmission for each tx queue, and adjusts the queue size
> dynamically in an attempt to achieve ~2 ms of added latency.

IMHO, I'm not sure that this is the best method of implementing this
solution: most wireless (and ethernet) devices have some form of
buffer for the same reason, so implementing a solution for iwlwifi is
not an overall solution - we need some method of doing this for *all*
network devices, not just iwlwifi.

That said, your reasoning for the queue management simplification
seems sound - but as I don't know the iwlwifi code so I can't really
comment on it any further than that.

Finally, your numbers are damn impressive, and this is the first
attempt that I'm aware of to solve this problem.

Thanks,

-- 

Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load
  2011-02-14  0:32 ` [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Julian Calaby
@ 2011-02-14  3:28   ` Nathaniel Smith
  0 siblings, 0 replies; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-14  3:28 UTC (permalink / raw)
  To: Julian Calaby; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Sun, Feb 13, 2011 at 4:32 PM, Julian Calaby <julian.calaby@gmail.com> wrote:
> The short term solution (but not an overall solution by any means) is
> to reduce these buffers so that packets are dropped when links are
> congested.

The job of a device transmit buffer is just to couple the "real"
generic networking layer buffer, where QoS, AQM, etc. are applied, to
the network hardware. They should just be as small as is consistent
with reasonable CPU utilization and throughput. There are many
different problems with over-large buffers, but this particular one is
very simple.

>> This patch series teaches the driver to measure the average rate of
>> packet transmission for each tx queue, and adjusts the queue size
>> dynamically in an attempt to achieve ~2 ms of added latency.
>
> IMHO, I'm not sure that this is the best method of implementing this
> solution: most wireless (and ethernet) devices have some form of
> buffer for the same reason, so implementing a solution for iwlwifi is
> not an overall solution - we need some method of doing this for *all*
> network devices, not just iwlwifi.

I agree, but I figure a good start is a good start, and at least it
scratches my itch. Maybe the other driver developers will move faster
once iwlwifi starts showing them up ;-).

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-13 17:56 ` [PATCH 1/5] iwlwifi: Simplify tx queue management Nathaniel J. Smith
@ 2011-02-14  9:57   ` Johannes Berg
  2011-02-14 22:17     ` Nathaniel Smith
  2011-02-14 15:33   ` wwguy
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Berg @ 2011-02-14  9:57 UTC (permalink / raw)
  To: Nathaniel J. Smith; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
> Previously, the iwlwifi driver filled its transmit queue until it
> reached a high-water mark, and then stopped until it had fallen to a
> low-water mark. This basic logic makes sense for interrupt mitigation
> -- you might not want to wake up the CPU after every packet, but
> instead wait until a batch of packets has been sent -- except the
> iwlwifi driver doesn't actually do any interrupt mitigation; the CPU
> wakes up after every packet transmitted anyway. So we simplify the
> code to maintain only a single limit on total queue length, and
> whenever we drop below that limit we allow more packets in.
> 
> This patch should have no user-visible effect.

I'm pretty sure the devices (but maybe not 3945) implement interrupt
mitigation at least in some cases. How did you arrive at the conclusion
that "the driver doesn't actually do any interrupt mitigation"?

johannes



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t
  2011-02-13 17:56 ` [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t Nathaniel J. Smith
@ 2011-02-14 12:16   ` Johannes Berg
  2011-02-14 22:35     ` Nathaniel Smith
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Berg @ 2011-02-14 12:16 UTC (permalink / raw)
  To: Nathaniel J. Smith; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
> This patch should cause no behavioral change whatsoever.

Right -- but it makes the TX path quite a bit more expensive on some
architectures. Why is this necessary? There should be locking in most
places already, and where not it should be easy to move this under the
existing lock.

johannes



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-13 17:56 ` [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency Nathaniel J. Smith
@ 2011-02-14 12:17   ` Johannes Berg
  2011-02-14 21:58     ` Nathaniel Smith
  2011-02-14 15:46   ` wwguy
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Berg @ 2011-02-14 12:17 UTC (permalink / raw)
  To: Nathaniel J. Smith; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
> We maintain several ring buffers that queue up packets for the
> hardware to transmit. These buffers can be quite large, and the
> quality of wireless connections can vary greatly; as a result, it can
> be that a full queue might take multiple seconds to drain.
> 
> For instance, if there is a high-bandwidth outgoing flow, like a large
> file upload, then it will completely fill the tx queues. Once the
> queues are full, then any other outgoing packets -- like those sent
> when a user clicks on an HTTP link, or types into an SSH session --
> will have to wait at the end of the line, and will not actually be
> transmitted until multiple seconds have passed.  This results in a
> suboptimal level of interactive response.
> 
> So we really don't want to allow too many packets to get queued up. On
> the other hand, we do want to queue up *some* packets, to maintain
> throughput -- and the queue size that maintains interactivity for a
> degraded 1 Mb/s connection might not be so great for some
> super-fancy 802.11n 600 Mb/s connection.
> 
> This patch estimates how long it takes the hardware to transmit one
> packet (by comparing each packet's queue residency time to the number
> of packets it had to wait behind), and then further smooths these
> estimates with an EWMA. Then, it uses the estimated packet
> transmission time to choose the maximum queue size that will still
> produce reasonably bounded queue residency times, given current
> conditions.

This sounds nice -- but couldn't it just as well be implemented in
mac80211, at least for most devices (that have accurate TX status
reporting)?

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-13 17:56 ` [PATCH 1/5] iwlwifi: Simplify tx queue management Nathaniel J. Smith
  2011-02-14  9:57   ` Johannes Berg
@ 2011-02-14 15:33   ` wwguy
  1 sibling, 0 replies; 48+ messages in thread
From: wwguy @ 2011-02-14 15:33 UTC (permalink / raw)
  To: Nathaniel J. Smith; +Cc: linville, linux-wireless, ilw

Hi Nathaniel,

On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
> Previously, the iwlwifi driver filled its transmit queue until it
> reached a high-water mark, and then stopped until it had fallen to a
> low-water mark. This basic logic makes sense for interrupt mitigation
> -- you might not want to wake up the CPU after every packet, but
> instead wait until a batch of packets has been sent -- except the
> iwlwifi driver doesn't actually do any interrupt mitigation; the CPU
> wakes up after every packet transmitted anyway. So we simplify the
> code to maintain only a single limit on total queue length, and
> whenever we drop below that limit we allow more packets in.
> 
> This patch should have no user-visible effect.
> 
> Signed-off-by: Nathaniel J. Smith <njs@pobox.com>
> ---
>  drivers/net/wireless/iwlwifi/iwl-3945.c     |    4 ++--
>  drivers/net/wireless/iwlwifi/iwl-4965.c     |    4 ++--
>  drivers/net/wireless/iwlwifi/iwl-agn-lib.c  |    4 ++--
>  drivers/net/wireless/iwlwifi/iwl-agn-tx.c   |    6 +++---
>  drivers/net/wireless/iwlwifi/iwl-dev.h      |    2 --
>  drivers/net/wireless/iwlwifi/iwl-tx.c       |   18 +++++++-----------
>  drivers/net/wireless/iwlwifi/iwl3945-base.c |    4 ++--
>  7 files changed, 18 insertions(+), 24 deletions(-)
> 
what device you test this with? I will like to see this changes with
newer device, especially with aggregation.

Thanks
Wey


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-13 17:56 ` [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency Nathaniel J. Smith
  2011-02-14 12:17   ` Johannes Berg
@ 2011-02-14 15:46   ` wwguy
  1 sibling, 0 replies; 48+ messages in thread
From: wwguy @ 2011-02-14 15:46 UTC (permalink / raw)
  To: Nathaniel J. Smith; +Cc: linville, linux-wireless, ilw

Hi Nathaniel,

On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
> We maintain several ring buffers that queue up packets for the
> hardware to transmit. These buffers can be quite large, and the
> quality of wireless connections can vary greatly; as a result, it can
> be that a full queue might take multiple seconds to drain.
> 
> For instance, if there is a high-bandwidth outgoing flow, like a large
> file upload, then it will completely fill the tx queues. Once the
> queues are full, then any other outgoing packets -- like those sent
> when a user clicks on an HTTP link, or types into an SSH session --
> will have to wait at the end of the line, and will not actually be
> transmitted until multiple seconds have passed.  This results in a
> suboptimal level of interactive response.
> 
> So we really don't want to allow too many packets to get queued up. On
> the other hand, we do want to queue up *some* packets, to maintain
> throughput -- and the queue size that maintains interactivity for a
> degraded 1 Mb/s connection might not be so great for some
> super-fancy 802.11n 600 Mb/s connection.
> 
> This patch estimates how long it takes the hardware to transmit one
> packet (by comparing each packet's queue residency time to the number
> of packets it had to wait behind), and then further smooths these
> estimates with an EWMA. Then, it uses the estimated packet
> transmission time to choose the maximum queue size that will still
> produce reasonably bounded queue residency times, given current
> conditions.
> 
> Signed-off-by: Nathaniel J. Smith <njs@pobox.com>
> ---
>  drivers/net/wireless/iwlwifi/iwl-3945.c     |    1 +
>  drivers/net/wireless/iwlwifi/iwl-agn-tx.c   |    3 +++
>  drivers/net/wireless/iwlwifi/iwl-core.h     |    1 +
>  drivers/net/wireless/iwlwifi/iwl-dev.h      |   11 +++++++++++
>  drivers/net/wireless/iwlwifi/iwl-tx.c       |   26 ++++++++++++++++++++++++++
>  drivers/net/wireless/iwlwifi/iwl3945-base.c |    2 ++
>  6 files changed, 44 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c
> index f2e9e59..d52bcb3 100644

>  
> +void iwl_tx_queue_update_high_mark(struct iwl_tx_queue *txq, int index)
> +{
> +	struct iwl_queue *q = &txq->q;
> +	struct timespec now, diff;
> +	int enqueue_depth = txq->txb[index].enqueue_depth;
> +	int this_ns_per_packet, avg_ns_per_packet;
> +	int new_high_mark;
> +
> +	getrawmonotonic(&now);
> +	diff = timespec_sub(now, txq->txb[index].enqueue_time);
> +	this_ns_per_packet = (NSEC_PER_SEC / enqueue_depth) * diff.tv_sec;
> +	this_ns_per_packet += diff.tv_nsec / enqueue_depth;
> +	/* Just some sanity checks for paranoia's sake */
> +	if (this_ns_per_packet < 0 && this_ns_per_packet > NSEC_PER_SEC)
> +		return;
> +	ewma_add(&q->ns_per_packet, this_ns_per_packet);
> +	avg_ns_per_packet = ewma_read(&q->ns_per_packet);
> +	new_high_mark = IWL_TX_HIGH_MARK_IN_NSEC / avg_ns_per_packet;
> +	new_high_mark = clamp(new_high_mark, 2, q->n_window - 2);
> +	atomic_set(&q->high_mark, new_high_mark);
> +}
> +EXPORT_SYMBOL(iwl_tx_queue_update_high_mark);
> +
> +

should the high_mark get reset when interface up or queue reset.

Wey



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-14 12:17   ` Johannes Berg
@ 2011-02-14 21:58     ` Nathaniel Smith
  2011-02-15 12:13       ` Johannes Berg
  0 siblings, 1 reply; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-14 21:58 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Mon, Feb 14, 2011 at 4:17 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
>> This patch estimates how long it takes the hardware to transmit one
>> packet (by comparing each packet's queue residency time to the number
>> of packets it had to wait behind), and then further smooths these
>> estimates with an EWMA. Then, it uses the estimated packet
>> transmission time to choose the maximum queue size that will still
>> produce reasonably bounded queue residency times, given current
>> conditions.
>
> This sounds nice -- but couldn't it just as well be implemented in
> mac80211, at least for most devices (that have accurate TX status
> reporting)?

I don't know. I take it you're imagining a scheme where mac80211 keeps
a hash table or similar of all packets that have been queued to the
device, and does its own tx rate estimation and calls start/stop_queue
as needed? If that works, then it would be lovely. (Heck, it should
probably go in the generic net layer eventually; ethernet cards have
the same problem.) But there are enough complications around
multiqueue devices, packet aggregation strategies, interrupt
mitigation, etc., that I don't think I sit down and write this fully
general solution myself, at least not without a lot more experience
with the needs of different devices. So this patch is intended as a
start.

On Mon, Feb 14, 2011 at 7:46 AM, wwguy <wey-yi.w.guy@intel.com> wrote:
> Hi Nathaniel,
[...]
> should the high_mark get reset when interface up or queue reset.

Hi Wey,

I thought about that, but I can't see what difference it would make.
If we have no information about connection quality, then any high_mark
is as good as any other (and if anything, the previous connection's
quality is probably as good an estimate as we have, since people often
reconnect to the same network repeatedly after suspend, etc.). And in
any case, once connected, we should adapt quite quickly (esp. when
using the uplink heavily, which is the only time when the queue size
has an effect), so our starting value should hardly matter. But I have
no objection to resetting it if that seems cleaner somehow.

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-14  9:57   ` Johannes Berg
@ 2011-02-14 22:17     ` Nathaniel Smith
  2011-02-14 22:45       ` wwguy
  2011-02-15 12:11       ` Johannes Berg
  0 siblings, 2 replies; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-14 22:17 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Mon, Feb 14, 2011 at 1:57 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
>> Previously, the iwlwifi driver filled its transmit queue until it
>> reached a high-water mark, and then stopped until it had fallen to a
>> low-water mark. This basic logic makes sense for interrupt mitigation
>> -- you might not want to wake up the CPU after every packet, but
>> instead wait until a batch of packets has been sent -- except the
>> iwlwifi driver doesn't actually do any interrupt mitigation; the CPU
>> wakes up after every packet transmitted anyway. So we simplify the
>> code to maintain only a single limit on total queue length, and
>> whenever we drop below that limit we allow more packets in.
>>
>> This patch should have no user-visible effect.
>
> I'm pretty sure the devices (but maybe not 3945) implement interrupt
> mitigation at least in some cases. How did you arrive at the conclusion
> that "the driver doesn't actually do any interrupt mitigation"?

Two reasons:
  -- I searched the code and couldn't find any evidence for it
  -- If I'm wrong then the quickest way to find out is to make loud
and confident claims in front of people who know better ;-)
I might be wrong.

If so, then it'd be pretty straightforward to extend this approach to
handle interrupt mitigation -- you set the low mark to N ms, and the
high mark to N+M ms, where N is the amount of time you think you need
to refill the queue after it drops, and 1/M is the maximum interrupt
rate you're willing to tolerate.

On Mon, Feb 14, 2011 at 7:33 AM, wwguy <wey-yi.w.guy@intel.com> wrote:
> Hi Nathaniel,
[...]
> what device you test this with? I will like to see this changes with
> newer device, especially with aggregation.

Hi Wey,

03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG
[Golan] Network Connection (rev 02)

Sadly, I don't have access to any newer hardware. I'd be happy to take
donations ;-), but probably someone else will need to do the testing?

I wouldn't be surprised if it needed some tweaks to properly handle
aggregation (in particular, I'd want to start by clamping the minimum
queue size to match our best guess at the number of packets we can
currently combine into a single transmission), but I'm not enough of
an 802.11 expert to be confident about how to do that either. That
number varies with the rate, yes? So we'd need some way to ask the
rate control algorithm what rate it plans to use next?

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t
  2011-02-14 12:16   ` Johannes Berg
@ 2011-02-14 22:35     ` Nathaniel Smith
  2011-02-15 12:08       ` Johannes Berg
  0 siblings, 1 reply; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-14 22:35 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Mon, Feb 14, 2011 at 4:16 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
>> This patch should cause no behavioral change whatsoever.
>
> Right -- but it makes the TX path quite a bit more expensive on some
> architectures. Why is this necessary? There should be locking in most
> places already, and where not it should be easy to move this under the
> existing lock.

Yes, I suspect we could use the priv->lock for this purpose (at the
expense of adding a lock/unlock on the TX notification path), but I'd
like some confirmation from someone more familiar with the code before
trying. high_mark is read from all over the place, and was previously
read-only, so it really needs someone who understands this driver's
locking "big picture" to check correctness.

In particular, is it safe to acquire priv->lock from inside
{iwl3945,iwlagn}_tx_queue_reclaim? Or can they be called on a path
where the that lock is already held?

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-14 22:17     ` Nathaniel Smith
@ 2011-02-14 22:45       ` wwguy
  2011-02-15  0:15         ` Dave Täht
  2011-02-16  9:16         ` Stanislaw Gruszka
  2011-02-15 12:11       ` Johannes Berg
  1 sibling, 2 replies; 48+ messages in thread
From: wwguy @ 2011-02-14 22:45 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: Johannes Berg, linville, linux-wireless, ilw

On Mon, 2011-02-14 at 14:17 -0800, Nathaniel Smith wrote:
> On Mon, Feb 14, 2011 at 1:57 AM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
> > On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
> >> Previously, the iwlwifi driver filled its transmit queue until it
> >> reached a high-water mark, and then stopped until it had fallen to a
> >> low-water mark. This basic logic makes sense for interrupt mitigation
> >> -- you might not want to wake up the CPU after every packet, but
> >> instead wait until a batch of packets has been sent -- except the
> >> iwlwifi driver doesn't actually do any interrupt mitigation; the CPU
> >> wakes up after every packet transmitted anyway. So we simplify the
> >> code to maintain only a single limit on total queue length, and
> >> whenever we drop below that limit we allow more packets in.
> >>
> >> This patch should have no user-visible effect.
> >
> > I'm pretty sure the devices (but maybe not 3945) implement interrupt
> > mitigation at least in some cases. How did you arrive at the conclusion
> > that "the driver doesn't actually do any interrupt mitigation"?
> 
> Two reasons:
>   -- I searched the code and couldn't find any evidence for it
>   -- If I'm wrong then the quickest way to find out is to make loud
> and confident claims in front of people who know better ;-)
> I might be wrong.
> 
> If so, then it'd be pretty straightforward to extend this approach to
> handle interrupt mitigation -- you set the low mark to N ms, and the
> high mark to N+M ms, where N is the amount of time you think you need
> to refill the queue after it drops, and 1/M is the maximum interrupt
> rate you're willing to tolerate.
> 
> On Mon, Feb 14, 2011 at 7:33 AM, wwguy <wey-yi.w.guy@intel.com> wrote:
> > Hi Nathaniel,
> [...]
> > what device you test this with? I will like to see this changes with
> > newer device, especially with aggregation.
> 
> Hi Wey,
> 
> 03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG
> [Golan] Network Connection (rev 02)
> 
> Sadly, I don't have access to any newer hardware. I'd be happy to take
> donations ;-), but probably someone else will need to do the testing?
> 
> I wouldn't be surprised if it needed some tweaks to properly handle
> aggregation (in particular, I'd want to start by clamping the minimum
> queue size to match our best guess at the number of packets we can
> currently combine into a single transmission), but I'm not enough of
> an 802.11 expert to be confident about how to do that either. That
> number varies with the rate, yes? So we'd need some way to ask the
> rate control algorithm what rate it plans to use next?
> 

right, I will really like to see those changes being test with agn
device first.

btw, we are in the process of splitting 3945/4965 out of iwlwifi into it
own directory (iwlwifilegacy). The work should be done and upstream in
the next few days. I will really prefer you wait for us done the split
works first, then consider those changes.

Thanks
Wey



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-14 22:45       ` wwguy
@ 2011-02-15  0:15         ` Dave Täht
  2011-02-16  9:16         ` Stanislaw Gruszka
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Täht @ 2011-02-15  0:15 UTC (permalink / raw)
  To: linux-wireless

wwguy <wey-yi.w.guy@intel.com> writes:

> On Mon, 2011-02-14 at 14:17 -0800, Nathaniel Smith wrote:
>> On Mon, Feb 14, 2011 at 1:57 AM, Johannes Berg
>> <johannes@sipsolutions.net> wrote:
>> > On Sun, 2011-02-13 at 09:56 -0800, Nathaniel J. Smith wrote:
>> >> Previously, the iwlwifi driver filled its transmit queue until it
>> >> reached a high-water mark, and then stopped until it had fallen to
>> >> a low-water mark. This basic logic makes sense for interrupt
>> >> mitigation -- you might not want to wake up the CPU after every
>> >> packet, but instead wait until a batch of packets has been sent --
>> >> except the iwlwifi driver doesn't actually do any interrupt
>> >> mitigation; the CPU wakes up after every packet transmitted
>> >> anyway. So we simplify the code to maintain only a single limit on
>> >> total queue length, and whenever we drop below that limit we allow
>> >> more packets in.
>> >>
>> >> This patch should have no user-visible effect.
>> >
>> > I'm pretty sure the devices (but maybe not 3945) implement
>> > interrupt mitigation at least in some cases. How did you arrive at
>> > the conclusion that "the driver doesn't actually do any interrupt
>> > mitigation"?
>> 
>> Two reasons: -- I searched the code and couldn't find any evidence
>> for it -- If I'm wrong then the quickest way to find out is to make
>> loud and confident claims in front of people who know better ;-) I
>> might be wrong.
>> 
>> If so, then it'd be pretty straightforward to extend this approach to
>> handle interrupt mitigation -- you set the low mark to N ms, and the
>> high mark to N+M ms, where N is the amount of time you think you need
>> to refill the queue after it drops, and 1/M is the maximum interrupt
>> rate you're willing to tolerate.
>> 
>> On Mon, Feb 14, 2011 at 7:33 AM, wwguy <wey-yi.w.guy@intel.com> wrote:
>> > Hi Nathaniel,
>> [...]
>> > what device you test this with? I will like to see this changes
>> > with newer device, especially with aggregation.
>> 
>> Hi Wey,
>> 
>> 03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG
>> [Golan] Network Connection (rev 02)
>> 
>> Sadly, I don't have access to any newer hardware. I'd be happy to
>> take donations ;-), but probably someone else will need to do the
>> testing?
>> 
>> I wouldn't be surprised if it needed some tweaks to properly handle
>> aggregation (in particular, I'd want to start by clamping the minimum
>> queue size to match our best guess at the number of packets we can
>> currently combine into a single transmission), but I'm not enough of
>> an 802.11 expert to be confident about how to do that either. That
>> number varies with the rate, yes? So we'd need some way to ask the
>> rate control algorithm what rate it plans to use next?

Concur there may be issues with aggregation. 

However, I'll take two orders of magnitude latency improvement over
extreme bandwidth, ANY DAY.

>> 
>
> right, I will really like to see those changes being test with agn
> device first.

Is this modern enough?

03:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)

If so, I'll apply Nathans patch series (could you email the latest cut to me?)


-- 
Dave Taht
http://nex-6.taht.net


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t
  2011-02-14 22:35     ` Nathaniel Smith
@ 2011-02-15 12:08       ` Johannes Berg
  2011-02-15 17:37         ` Nathaniel Smith
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Berg @ 2011-02-15 12:08 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Mon, 2011-02-14 at 14:35 -0800, Nathaniel Smith wrote:

> > Right -- but it makes the TX path quite a bit more expensive on some
> > architectures. Why is this necessary? There should be locking in most
> > places already, and where not it should be easy to move this under the
> > existing lock.
> 
> Yes, I suspect we could use the priv->lock for this purpose (at the
> expense of adding a lock/unlock on the TX notification path), but I'd
> like some confirmation from someone more familiar with the code before
> trying. high_mark is read from all over the place, and was previously
> read-only, so it really needs someone who understands this driver's
> locking "big picture" to check correctness.

Actually, it doesn't really matter. Linux assumes that reads and writes
to "unsigned long" are always atomic against each other (i.e. you get
either the old or the new value) and that's all you care about here --
this never needs anything like atomic_inc/dec etc.

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-14 22:17     ` Nathaniel Smith
  2011-02-14 22:45       ` wwguy
@ 2011-02-15 12:11       ` Johannes Berg
  1 sibling, 0 replies; 48+ messages in thread
From: Johannes Berg @ 2011-02-15 12:11 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Mon, 2011-02-14 at 14:17 -0800, Nathaniel Smith wrote:

> > I'm pretty sure the devices (but maybe not 3945) implement interrupt
> > mitigation at least in some cases. How did you arrive at the conclusion
> > that "the driver doesn't actually do any interrupt mitigation"?
> 
> Two reasons:
>   -- I searched the code and couldn't find any evidence for it
>   -- If I'm wrong then the quickest way to find out is to make loud
> and confident claims in front of people who know better ;-)
> I might be wrong.

Well, I think you are -- at least for some cases. The driver doesn't
need to do anything to implement mitigation, if anything the ucode will
do this by itself -- I imagine it'll do this especially for aggregation
queues (which I realise aren't relevant in this patch, but still).

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-14 21:58     ` Nathaniel Smith
@ 2011-02-15 12:13       ` Johannes Berg
  2011-02-15 15:03         ` John W. Linville
  2011-02-15 17:31         ` Nathaniel Smith
  0 siblings, 2 replies; 48+ messages in thread
From: Johannes Berg @ 2011-02-15 12:13 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Mon, 2011-02-14 at 13:58 -0800, Nathaniel Smith wrote:

> > This sounds nice -- but couldn't it just as well be implemented in
> > mac80211, at least for most devices (that have accurate TX status
> > reporting)?
> 
> I don't know. I take it you're imagining a scheme where mac80211 keeps
> a hash table or similar of all packets that have been queued to the
> device, and does its own tx rate estimation and calls start/stop_queue
> as needed? If that works, then it would be lovely. (Heck, it should
> probably go in the generic net layer eventually; ethernet cards have
> the same problem.)

Well, no, why would it keep a hash table?? And no -- it can't be done in
the generic net layer.

mac80211 has the advantage that TX status is reported back, so it could
pretty accurately know how many frames are queued up -- more generically
this cannot be done.

But since it's not perfectly accurate right now, it may not be the best
idea to do this at this time.

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-15 12:13       ` Johannes Berg
@ 2011-02-15 15:03         ` John W. Linville
  2011-02-16  8:59           ` Johannes Berg
  2011-02-15 17:31         ` Nathaniel Smith
  1 sibling, 1 reply; 48+ messages in thread
From: John W. Linville @ 2011-02-15 15:03 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Nathaniel Smith, linux-wireless, wey-yi.w.guy, ilw

On Tue, Feb 15, 2011 at 01:13:06PM +0100, Johannes Berg wrote:

> mac80211 has the advantage that TX status is reported back, so it could
> pretty accurately know how many frames are queued up -- more generically
> this cannot be done.
> 
> But since it's not perfectly accurate right now, it may not be the best
> idea to do this at this time.

Let's not make the perfect the enemy of the good.  An imperfect
implementation at the mac80211 layer is likely better than a somewhat
better implementation for one driver and no implementation for
the rest.

Now, someone help me understand how .11n aggregation complicates this... :-)

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-15 12:13       ` Johannes Berg
  2011-02-15 15:03         ` John W. Linville
@ 2011-02-15 17:31         ` Nathaniel Smith
  1 sibling, 0 replies; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-15 17:31 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Tue, Feb 15, 2011 at 4:13 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Mon, 2011-02-14 at 13:58 -0800, Nathaniel Smith wrote:
>> I don't know. I take it you're imagining a scheme where mac80211 keeps
>> a hash table or similar of all packets that have been queued to the
>> device, and does its own tx rate estimation and calls start/stop_queue
>> as needed? If that works, then it would be lovely. (Heck, it should
>> probably go in the generic net layer eventually; ethernet cards have
>> the same problem.)
>
> Well, no, why would it keep a hash table??

The current patch, whenever it queues a packet, stores
  t = the current time
  S = the current size of the queue
so that later when the TX completes, it can compute (now - t)/S to
estimate the per-packet transmission time. I just mean that you'd need
somewhere to store this stuff, or else find another algorithm for rate
estimation.

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t
  2011-02-15 12:08       ` Johannes Berg
@ 2011-02-15 17:37         ` Nathaniel Smith
  0 siblings, 0 replies; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-15 17:37 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linville, linux-wireless, wey-yi.w.guy, ilw

On Tue, Feb 15, 2011 at 4:08 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> Actually, it doesn't really matter. Linux assumes that reads and writes
> to "unsigned long" are always atomic against each other (i.e. you get
> either the old or the new value) and that's all you care about here --
> this never needs anything like atomic_inc/dec etc.

Excellent point, thank you.

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency
  2011-02-15 15:03         ` John W. Linville
@ 2011-02-16  8:59           ` Johannes Berg
  0 siblings, 0 replies; 48+ messages in thread
From: Johannes Berg @ 2011-02-16  8:59 UTC (permalink / raw)
  To: John W. Linville; +Cc: Nathaniel Smith, linux-wireless, wey-yi.w.guy, ilw

On Tue, 2011-02-15 at 10:03 -0500, John W. Linville wrote:
> On Tue, Feb 15, 2011 at 01:13:06PM +0100, Johannes Berg wrote:
> 
> > mac80211 has the advantage that TX status is reported back, so it could
> > pretty accurately know how many frames are queued up -- more generically
> > this cannot be done.
> > 
> > But since it's not perfectly accurate right now, it may not be the best
> > idea to do this at this time.
> 
> Let's not make the perfect the enemy of the good.  An imperfect
> implementation at the mac80211 layer is likely better than a somewhat
> better implementation for one driver and no implementation for
> the rest.

Well, no, I think the fact that it's not perfect probably means that the
bandwidth/packet estimate will drift if any packets are dropped etc.

> Now, someone help me understand how .11n aggregation complicates this... :-)

I'll try ... later :-)

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-14 22:45       ` wwguy
  2011-02-15  0:15         ` Dave Täht
@ 2011-02-16  9:16         ` Stanislaw Gruszka
  2011-02-16 14:41           ` John W. Linville
  1 sibling, 1 reply; 48+ messages in thread
From: Stanislaw Gruszka @ 2011-02-16  9:16 UTC (permalink / raw)
  To: wwguy; +Cc: Nathaniel Smith, Johannes Berg, linville, linux-wireless, ilw

On Mon, Feb 14, 2011 at 02:45:41PM -0800, wwguy wrote:
> btw, we are in the process of splitting 3945/4965 out of iwlwifi into it
> own directory (iwlwifilegacy). 
Uhh, we do not have suchlongdrivernames in kernel, what about iwlegacy :-)

Stanislaw


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-16  9:16         ` Stanislaw Gruszka
@ 2011-02-16 14:41           ` John W. Linville
  2011-02-16 15:13             ` wwguy
  0 siblings, 1 reply; 48+ messages in thread
From: John W. Linville @ 2011-02-16 14:41 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: wwguy, Nathaniel Smith, Johannes Berg, linux-wireless, ilw

On Wed, Feb 16, 2011 at 10:16:06AM +0100, Stanislaw Gruszka wrote:
> On Mon, Feb 14, 2011 at 02:45:41PM -0800, wwguy wrote:
> > btw, we are in the process of splitting 3945/4965 out of iwlwifi into it
> > own directory (iwlwifilegacy). 
> Uhh, we do not have suchlongdrivernames in kernel, what about iwlegacy :-)

I do prefer iwlegacy to iwlwifilegacy as well...

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/5] iwlwifi: Simplify tx queue management
  2011-02-16 14:41           ` John W. Linville
@ 2011-02-16 15:13             ` wwguy
  0 siblings, 0 replies; 48+ messages in thread
From: wwguy @ 2011-02-16 15:13 UTC (permalink / raw)
  To: John W. Linville
  Cc: Stanislaw Gruszka, Nathaniel Smith, Johannes Berg, linux-wireless, ilw

On Wed, 2011-02-16 at 06:41 -0800, John W. Linville wrote:
> On Wed, Feb 16, 2011 at 10:16:06AM +0100, Stanislaw Gruszka wrote:
> > On Mon, Feb 14, 2011 at 02:45:41PM -0800, wwguy wrote:
> > > btw, we are in the process of splitting 3945/4965 out of iwlwifi into it
> > > own directory (iwlwifilegacy). 
> > Uhh, we do not have suchlongdrivernames in kernel, what about iwlegacy :-)
> 
> I do prefer iwlegacy to iwlwifilegacy as well...
> 
ok

Thanks
Wey



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
                   ` (5 preceding siblings ...)
  2011-02-14  0:32 ` [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Julian Calaby
@ 2011-02-16 15:50 ` John W. Linville
  2011-02-16 23:08   ` Nathaniel Smith
  2011-02-17  1:49 ` [RFC] mac80211: implement eBDP algorithm to fight bufferbloat John W. Linville
  7 siblings, 1 reply; 48+ messages in thread
From: John W. Linville @ 2011-02-16 15:50 UTC (permalink / raw)
  To: Nathaniel J. Smith; +Cc: linux-wireless, wey-yi.w.guy, ilw

On Sun, Feb 13, 2011 at 09:56:37AM -0800, Nathaniel J. Smith wrote:

> This patch series teaches the driver to measure the average rate of
> packet transmission for each tx queue, and adjusts the queue size
> dynamically in an attempt to achieve ~2 ms of added latency.

How did you pick this number?  Is there research to support this as
the correct target for link latency?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load
  2011-02-16 15:50 ` John W. Linville
@ 2011-02-16 23:08   ` Nathaniel Smith
  2011-02-16 23:42     ` wwguy
  0 siblings, 1 reply; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-16 23:08 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, wey-yi.w.guy, ilw

On Wed, Feb 16, 2011 at 7:50 AM, John W. Linville
<linville@tuxdriver.com> wrote:
> On Sun, Feb 13, 2011 at 09:56:37AM -0800, Nathaniel J. Smith wrote:
>
>> This patch series teaches the driver to measure the average rate of
>> packet transmission for each tx queue, and adjusts the queue size
>> dynamically in an attempt to achieve ~2 ms of added latency.
>
> How did you pick this number?  Is there research to support this as
> the correct target for link latency?

I'm not aware of any such research, no. My reasoning is that in this
scheme the ideal latency is based on the kernel's scheduling
promptness -- at some moment t0 the hardware sends the packet that
drops us below our low water mark, and then it takes some time for the
kernel to be informed of this fact, for the driver's tasklet to be
scheduled, the TX queue to be restarted, and for packets to get loaded
into it, so that eventually at time t1 the queue is refilled. To
maintain throughput, we want the queue length to be such that all this
can be accomplished before the queue drains completely; to maintain
latency, we want it to be no longer than necessary.

So the ideal latency for the TX queue is whatever number L is, say,
95% of the time, greater than (t1 - t0). Note that this is
specifically for the driver queue, whose job is just to couple the
"real" queue to the hardware; the "real" queue should be larger and
smarter to properly handle bursty behavior and such.

I made up "2 ms" out of thin air as a number that seemed plausible to
me, and because I don't know how to measure the real number L. Ideally
we'd measure it on the fly, since it surely varies somewhat between
machines. Maybe someone else has a better idea how to do this?

The queue refill process for iwl3945 looks like:
  1) hardware transmits a packet, sends a tx notification to the driver
  2) iwl_isr_legacy receives the interrupt, and tasklet_schedule()s
the irq tasklet
  3) iwl3945_irq_tasklet runs, and eventually from
iwl3945_tx_queue_reclaim we wake the queue
  4) Waking the queue raises a softirq (netif_wake_subqueue -> __netif_schedule)
  5) The softirq runs, and, if there are packets queued, eventually
calls iwl3945_tx_skb

So IIUC there are actually two trips through the scheduler -- between
(2) and (3), and between (4) and (5). I assume that these are the only
sources of significant latency, so our goal is to measure the time
elapsed from (2) to (5).

Complications: (a) In the ISR, I'm not sure we have access to a
reliable realtime clock. (b) If there aren't any packets queued and
waiting, then we'll never get called in step (5).

The best bet -- assuming access to the clock from the interrupt
handler -- might be to measure the time from iwl_isr_legacy being
called to the tasklet being scheduled, and then multiply that by 2 + a
fudge factor.

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load
  2011-02-16 23:08   ` Nathaniel Smith
@ 2011-02-16 23:42     ` wwguy
  0 siblings, 0 replies; 48+ messages in thread
From: wwguy @ 2011-02-16 23:42 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: John W. Linville, linux-wireless, ilw

On Wed, 2011-02-16 at 15:08 -0800, Nathaniel Smith wrote:
> On Wed, Feb 16, 2011 at 7:50 AM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > On Sun, Feb 13, 2011 at 09:56:37AM -0800, Nathaniel J. Smith wrote:
> >
> >> This patch series teaches the driver to measure the average rate of
> >> packet transmission for each tx queue, and adjusts the queue size
> >> dynamically in an attempt to achieve ~2 ms of added latency.
> >
> > How did you pick this number?  Is there research to support this as
> > the correct target for link latency?
> 
> I'm not aware of any such research, no. My reasoning is that in this
> scheme the ideal latency is based on the kernel's scheduling
> promptness -- at some moment t0 the hardware sends the packet that
> drops us below our low water mark, and then it takes some time for the
> kernel to be informed of this fact, for the driver's tasklet to be
> scheduled, the TX queue to be restarted, and for packets to get loaded
> into it, so that eventually at time t1 the queue is refilled. To
> maintain throughput, we want the queue length to be such that all this
> can be accomplished before the queue drains completely; to maintain
> latency, we want it to be no longer than necessary.
> 
> So the ideal latency for the TX queue is whatever number L is, say,
> 95% of the time, greater than (t1 - t0). Note that this is
> specifically for the driver queue, whose job is just to couple the
> "real" queue to the hardware; the "real" queue should be larger and
> smarter to properly handle bursty behavior and such.
> 
> I made up "2 ms" out of thin air as a number that seemed plausible to
> me, and because I don't know how to measure the real number L. Ideally
> we'd measure it on the fly, since it surely varies somewhat between
> machines. Maybe someone else has a better idea how to do this?
> 
> The queue refill process for iwl3945 looks like:
>   1) hardware transmits a packet, sends a tx notification to the driver
>   2) iwl_isr_legacy receives the interrupt, and tasklet_schedule()s
> the irq tasklet
>   3) iwl3945_irq_tasklet runs, and eventually from
> iwl3945_tx_queue_reclaim we wake the queue
>   4) Waking the queue raises a softirq (netif_wake_subqueue -> __netif_schedule)
>   5) The softirq runs, and, if there are packets queued, eventually
> calls iwl3945_tx_skb
> 
> So IIUC there are actually two trips through the scheduler -- between
> (2) and (3), and between (4) and (5). I assume that these are the only
> sources of significant latency, so our goal is to measure the time
> elapsed from (2) to (5).
> 
> Complications: (a) In the ISR, I'm not sure we have access to a
> reliable realtime clock. (b) If there aren't any packets queued and
> waiting, then we'll never get called in step (5).
> 
> The best bet -- assuming access to the clock from the interrupt
> handler -- might be to measure the time from iwl_isr_legacy being
> called to the tasklet being scheduled, and then multiply that by 2 + a
> fudge factor.
> 

I believe we need to test this on the newer device and re-think the
approach, keep in mind the newer iwlagn device has "interrupt
coalescing" to reduce the power and CPU usage, which will change the
timing. Also with 11n enable and aggregation is on, the behavior also
different.

Wey



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
                   ` (6 preceding siblings ...)
  2011-02-16 15:50 ` John W. Linville
@ 2011-02-17  1:49 ` John W. Linville
  2011-02-17  3:31   ` Ben Greear
                     ` (3 more replies)
  7 siblings, 4 replies; 48+ messages in thread
From: John W. Linville @ 2011-02-17  1:49 UTC (permalink / raw)
  To: linux-wireless
  Cc: johannes, nbd, bloat-devel, Nathaniel J. Smith, John W. Linville

This is an implementation of the eBDP algorithm as documented in
Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
et al.

	http://www.hamilton.ie/tianji_li/buffersizing.pdf

This implementation timestamps an skb before handing it to the hardware
driver, then computes the service time when the transmit status is
reported back from the driver.  An exponentially weighted moving
average of per packet service times is used to restrict queueing
delays in hopes of achieving a target packet transmission latency.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
This is preliminary, but it seems to put some limits on latencies
for me.  I haven't even really done much testing, so YMMV...

I'm sure this isn't ideal.  This should be combined with the ALT
algorithm to yield the A* algorithm.  There are parameters that
probably should be tunable (or at least better researched).  This may
not be ideal for 802.11n -- it may even be detrimental to it.

Still, it is an attempt at addressing buffer bloat.  Plus, it should
pertain to all mac80211-based drivers.  So, feel free to test it,
suggest different parameters, report real numbers, post patches,
etc... :-)

 net/mac80211/ieee80211_i.h |    7 +++++++
 net/mac80211/iface.c       |    6 ++++++
 net/mac80211/status.c      |    9 +++++++++
 net/mac80211/tx.c          |   16 ++++++++++++++++
 4 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index f2ef15d..1c69b29 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -604,6 +604,13 @@ struct ieee80211_sub_if_data {
 		struct dentry *default_mgmt_key;
 	} debugfs;
 #endif
+
+	/* number of packets currently in flight */
+	atomic_t enqueued;
+
+	/* average packet service time */
+	struct ewma tserv_ns_avg;
+
 	/* must be last, dynamically sized area in this! */
 	struct ieee80211_vif vif;
 };
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 5a4e19b..ccfb659 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -857,6 +857,12 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 	skb_queue_head_init(&sdata->skb_queue);
 	INIT_WORK(&sdata->work, ieee80211_iface_work);
 
+	/* initialize service time estimate for buffer control */
+	ewma_init(&sdata->tserv_ns_avg, 1, 8);
+	ewma_add(&sdata->tserv_ns_avg, 2 * NSEC_PER_MSEC);
+
+	atomic_set(&sdata->enqueued, 0);
+
 	switch (type) {
 	case NL80211_IFTYPE_P2P_GO:
 		type = NL80211_IFTYPE_AP;
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 010a559..9e5eed3 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -172,6 +172,7 @@ static void ieee80211_frame_acked(struct sta_info *sta, struct sk_buff *skb)
 
 void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 {
+	ktime_t tserv;
 	struct sk_buff *skb2;
 	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
 	struct ieee80211_local *local = hw_to_local(hw);
@@ -188,6 +189,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	bool send_to_cooked;
 	bool acked;
 
+	/* grab timestamp info for buffer control estimates */
+	tserv = ktime_sub(ktime_get(), skb->tstamp);
+
 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++) {
 		/* the HW cannot have attempted that rate */
 		if (i >= hw->max_report_rates) {
@@ -212,6 +216,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 		if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
 			continue;
 
+		atomic_dec(&sta->sdata->enqueued);
+
+		/* factor current tserv into service time estimate */
+		ewma_add(&sta->sdata->tserv_ns_avg, ktime_to_ns(tserv));
+
 		acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
 		if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
 			/*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 17ef4f4..8ccf60b 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1298,6 +1298,8 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 	while (skb) {
 		int q = skb_get_queue_mapping(skb);
+		int max_enqueued;
+		unsigned long tserv_ns_avg;
 		__le16 fc;
 
 		spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
@@ -1323,6 +1325,20 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 		sdata = vif_to_sdata(info->control.vif);
 
+		/* test for queue admission qualifications */
+		tserv_ns_avg = ewma_read(&sdata->tserv_ns_avg);
+		/* constants 2 msec and offset 5 should be tunable? */
+		max_enqueued = 2 * NSEC_PER_MSEC / tserv_ns_avg + 5;
+		if (atomic_read(&sdata->enqueued) > max_enqueued) {
+			/* silently drop */
+			dev_kfree_skb(skb);
+			return IEEE80211_TX_OK;
+		}
+
+		/* timestamp queue entry and increment queue length */
+		skb->tstamp = ktime_get();
+		atomic_inc(&sdata->enqueued);
+
 		switch (sdata->vif.type) {
 		case NL80211_IFTYPE_MONITOR:
 			info->control.vif = NULL;
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [RFC] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-17  1:49 ` [RFC] mac80211: implement eBDP algorithm to fight bufferbloat John W. Linville
@ 2011-02-17  3:31   ` Ben Greear
  2011-02-17  4:26   ` Nathaniel Smith
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 48+ messages in thread
From: Ben Greear @ 2011-02-17  3:31 UTC (permalink / raw)
  To: John W. Linville
  Cc: linux-wireless, johannes, nbd, bloat-devel, Nathaniel J. Smith

On 02/16/2011 05:49 PM, John W. Linville wrote:
> This is an implementation of the eBDP algorithm as documented in
> Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
> et al.
>
> 	http://www.hamilton.ie/tianji_li/buffersizing.pdf
>
> This implementation timestamps an skb before handing it to the hardware
> driver, then computes the service time when the transmit status is
> reported back from the driver.  An exponentially weighted moving
> average of per packet service times is used to restrict queueing
> delays in hopes of achieving a target packet transmission latency.
>
> Signed-off-by: John W. Linville<linville@tuxdriver.com>
> ---
> This is preliminary, but it seems to put some limits on latencies
> for me.  I haven't even really done much testing, so YMMV...
>
> I'm sure this isn't ideal.  This should be combined with the ALT
> algorithm to yield the A* algorithm.  There are parameters that
> probably should be tunable (or at least better researched).  This may
> not be ideal for 802.11n -- it may even be detrimental to it.
>
> Still, it is an attempt at addressing buffer bloat.  Plus, it should
> pertain to all mac80211-based drivers.  So, feel free to test it,
> suggest different parameters, report real numbers, post patches,
> etc... :-)

> +		/* test for queue admission qualifications */
> +		tserv_ns_avg = ewma_read(&sdata->tserv_ns_avg);
> +		/* constants 2 msec and offset 5 should be tunable? */
> +		max_enqueued = 2 * NSEC_PER_MSEC / tserv_ns_avg + 5;
> +		if (atomic_read(&sdata->enqueued)>  max_enqueued) {
> +			/* silently drop */
> +			dev_kfree_skb(skb);
> +			return IEEE80211_TX_OK;
> +		}

We should have some visible stat to increment when you drop on xmit,
  tx-fifo error or similar?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-17  1:49 ` [RFC] mac80211: implement eBDP algorithm to fight bufferbloat John W. Linville
  2011-02-17  3:31   ` Ben Greear
@ 2011-02-17  4:26   ` Nathaniel Smith
  2011-02-17  8:31   ` Johannes Berg
  2011-02-18 21:21   ` [RFC v2] " John W. Linville
  3 siblings, 0 replies; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-17  4:26 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, johannes, nbd, bloat-devel

On Wed, Feb 16, 2011 at 5:49 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> I'm sure this isn't ideal.  This should be combined with the ALT
> algorithm to yield the A* algorithm.  There are parameters that
> probably should be tunable (or at least better researched).  This may
> not be ideal for 802.11n -- it may even be detrimental to it.

Excellent!

General thoughts:

I think you're wiping out any interrupt mitigation any drivers are
doing, because they'll never get filled up to even their low water
mark? (http://article.gmane.org/gmane.linux.kernel.wireless.general/64843
has a scheme for adapting the eBDP idea to interrupt mitigation)

It's important to keep in mind the distinction between:
 -- a host's total tx buffer
 -- the individual queues that make up that buffer
In Linux we have two queues in serial, the net subsystem's Qdisc
thing, which feeds the driver's tx queue. It's this distinction that
makes it reasonable to shrink the tx queue down to really tiny sizes
(a few ms), because while a router needs a few hundred milliseconds
(~a RTT) of *total* buffering to absorb bursty packet arrivals, we
want as much of that buffering as possible to be happening in the
Qdisc, where it can have AMQ and QoS and applied.

A* is an algorithm for estimating the right total host buffer size.

(Of course, we might want to just disable the Qdisc buffering and move
everything inside the driver -- Felix Fietkau is considering this[1],
because when aggregating you really want a separate buffer per
destination STA, and there's no easy way to do this with the current
system -- but obviously that raises its own issues...)

[1] https://lists.bufferbloat.net/pipermail/bloat-devel/2011-February/000013.html

> @@ -212,6 +216,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
>                if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
>                        continue;
>
> +               atomic_dec(&sta->sdata->enqueued);
> +
> +               /* factor current tserv into service time estimate */
> +               ewma_add(&sta->sdata->tserv_ns_avg, ktime_to_ns(tserv));
> +

I think you're calculating the total time that the packet was resident
in the queue, and treating it like the time to service a single
packet. In my patch I also stored the current queue length at the time
the packet was enqueued, and then divided the time delta by the number
of packets that were serviced in that time.

> @@ -1323,6 +1325,20 @@ static int __ieee80211_tx(struct ieee80211_local *local,
>
>                sdata = vif_to_sdata(info->control.vif);
>
> +               /* test for queue admission qualifications */
> +               tserv_ns_avg = ewma_read(&sdata->tserv_ns_avg);
> +               /* constants 2 msec and offset 5 should be tunable? */
> +               max_enqueued = 2 * NSEC_PER_MSEC / tserv_ns_avg + 5;

5 packets worth of fudge factor seems high. I can measure 15-20 ms
single packet service times here just by turning down to 1Mbs on an
uncontended network; the Li et al paper you link has a graph
suggesting that on contented networks, 50-100 ms/packet is not
uncommon. (And even if I don't force my card to 1Mbs, with my patch
I'm still seeing appropriate buffer sizes in the 1-2 packet range.) So
you may be unconditionally adding a few hundred milliseconds of
latency here.

They make a good point that you might want some extra space to absorb
short-term fluctuations in packet service times, but I think it'd make
more sense to do that by clamping the queue length to some minimum
value (2 packets?), and possibly bumping up the magic "2 ms" constant.
Or be even more clever and estimate the standard deviation of the
single packet service times...

(http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm)

> +               if (atomic_read(&sdata->enqueued) > max_enqueued) {
> +                       /* silently drop */
> +                       dev_kfree_skb(skb);
> +                       return IEEE80211_TX_OK;
> +               }

Shouldn't you be stopping the queue, too?

I think by leaving the queue open and discarding everything, you
effectively disable the net layer's Qdisc buffering, which might also
be a factor in your observations of reduced bufferbloat :-).

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-17  1:49 ` [RFC] mac80211: implement eBDP algorithm to fight bufferbloat John W. Linville
  2011-02-17  3:31   ` Ben Greear
  2011-02-17  4:26   ` Nathaniel Smith
@ 2011-02-17  8:31   ` Johannes Berg
  2011-02-18 21:21   ` [RFC v2] " John W. Linville
  3 siblings, 0 replies; 48+ messages in thread
From: Johannes Berg @ 2011-02-17  8:31 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, nbd, bloat-devel, Nathaniel J. Smith

On Wed, 2011-02-16 at 20:49 -0500, John W. Linville wrote:

> @@ -212,6 +216,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
>  		if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
>  			continue;
>  
> +		atomic_dec(&sta->sdata->enqueued);

Note that TX status isn't reliable under any circumstances. Many drivers
will reliably return the SKB if it was enqueued correctly to the device,
but that may already fail (failure to allocate URBs, DMA memory etc.) In
this case, the counter will continually be too high, leading the code
below to drop many more packets than necessary. This is what I was
referring to earlier :-)

> @@ -1323,6 +1325,20 @@ static int __ieee80211_tx(struct ieee80211_local *local,
>  
>  		sdata = vif_to_sdata(info->control.vif);
>  
> +		/* test for queue admission qualifications */
> +		tserv_ns_avg = ewma_read(&sdata->tserv_ns_avg);
> +		/* constants 2 msec and offset 5 should be tunable? */
> +		max_enqueued = 2 * NSEC_PER_MSEC / tserv_ns_avg + 5;
> +		if (atomic_read(&sdata->enqueued) > max_enqueued) {
> +			/* silently drop */
> +			dev_kfree_skb(skb);
> +			return IEEE80211_TX_OK;
> +		}

As Nathaniel pointed out, this should instead stop the queue to give
normal QoS mechanisms a chance (this one packet can be kept for later,
or transmitted anyway, or the code to test all this can be after the
packet is sent).

However, since "sdata->enqueued" will drift as explained above, this
isn't feasible anyhow.

> +		/* timestamp queue entry and increment queue length */
> +		skb->tstamp = ktime_get();
> +		atomic_inc(&sdata->enqueued);

Additionally, this assumes that tx() cannot fail -- it can until my void
patch.

Sorry, this approach just won't work because we don't have perfect skb
accounting across the stack.

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-17  1:49 ` [RFC] mac80211: implement eBDP algorithm to fight bufferbloat John W. Linville
                     ` (2 preceding siblings ...)
  2011-02-17  8:31   ` Johannes Berg
@ 2011-02-18 21:21   ` John W. Linville
  2011-02-19  3:44     ` Nathaniel Smith
                       ` (2 more replies)
  3 siblings, 3 replies; 48+ messages in thread
From: John W. Linville @ 2011-02-18 21:21 UTC (permalink / raw)
  To: linux-wireless
  Cc: bloat-devel, Nathaniel J. Smith, johannes, nbd, John W. Linville

This is an implementation of the eBDP algorithm as documented in
Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
et al.

	http://www.hamilton.ie/tianji_li/buffersizing.pdf

This implementation timestamps an skb before handing it to the
hardware driver, then computes the service time when the frame is
freed by the driver.  An exponentially weighted moving average of per
fragment service times is used to restrict queueing delays in hopes
of achieving a target fragment transmission latency.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
v1 -> v2:
- execute algorithm separately for each WMM queue
- change ewma scaling parameters
- calculate max queue len only when new latency data is received
- stop queues when occupancy limit is reached rather than dropping
- use skb->destructor for tracking queue occupancy

Johannes' comment about tx status reporting being unreliable (and what
he was really saying) finally sunk-in.  So, this version uses
skb->destructor to track in-flight fragments.  That should handle
fragments that get silently dropped in the driver for whatever reason
without leaking queue capacity.  Correct me if I'm wrong!

This version runs separate instances of the algorithm for each WMM
queue -- I reckon that makes sense.  I still ignore HT aggregation, as
I'm pretty sure I don't understand how it realy effects the issue.

I still haven't done any objective testing, so still no numbers -- feel
free to provide your own! ;-)  But for me, this version "feels" pretty
good where I meet the chair.

There are still bits that I would like to be adjustable.  But since I'm
still not sure what this ultimately becomes, I'm in no hurry to finalize
a UI.  I would be happy to consider a contribution from someone else in
that regard.

No doubt that this is all still quite preliminary!  I'm pretty sure
it still eats puppies.  So, don't get it wet!  And never, ever feed
it after midnight...

 net/mac80211/ieee80211_i.h |   12 ++++++++++++
 net/mac80211/iface.c       |   12 ++++++++++++
 net/mac80211/status.c      |   23 +++++++++++++++++++++++
 net/mac80211/tx.c          |   26 ++++++++++++++++++++++++++
 4 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index f2ef15d..47e65bd 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -604,6 +604,18 @@ struct ieee80211_sub_if_data {
 		struct dentry *default_mgmt_key;
 	} debugfs;
 #endif
+
+	struct {
+		/* number of packets currently in flight */
+		atomic_t enqueued;
+
+		/* computed maximum packets in flight */
+		int max_enqueued;
+
+		/* average packet service time */
+		struct ewma tserv_ns_avg;
+	} qdata[4];
+
 	/* must be last, dynamically sized area in this! */
 	struct ieee80211_vif vif;
 };
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 5a4e19b..3e04ddb 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -839,6 +839,8 @@ static void ieee80211_iface_work(struct work_struct *work)
 static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 				  enum nl80211_iftype type)
 {
+	int i;
+
 	/* clear type-dependent union */
 	memset(&sdata->u, 0, sizeof(sdata->u));
 
@@ -857,6 +859,16 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 	skb_queue_head_init(&sdata->skb_queue);
 	INIT_WORK(&sdata->work, ieee80211_iface_work);
 
+	for (i=0; i<4; i++) {
+		/* initialize service time estimate for buffer control */
+		ewma_init(&sdata->qdata[i].tserv_ns_avg, 1, 64);
+
+		atomic_set(&sdata->qdata[i].enqueued, 0);
+
+		/* default max should be ??? */
+		sdata->qdata[i].max_enqueued = 2;
+	}
+
 	switch (type) {
 	case NL80211_IFTYPE_P2P_GO:
 		type = NL80211_IFTYPE_AP;
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 010a559..8cb6539 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -172,6 +172,7 @@ static void ieee80211_frame_acked(struct sta_info *sta, struct sk_buff *skb)
 
 void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 {
+	ktime_t tserv;
 	struct sk_buff *skb2;
 	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
 	struct ieee80211_local *local = hw_to_local(hw);
@@ -188,6 +189,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	bool send_to_cooked;
 	bool acked;
 
+	/* grab timestamp info for buffer control estimates */
+	tserv = ktime_sub(ktime_get(), skb->tstamp);
+
 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++) {
 		/* the HW cannot have attempted that rate */
 		if (i >= hw->max_report_rates) {
@@ -208,10 +212,29 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	fc = hdr->frame_control;
 
 	for_each_sta_info(local, hdr->addr1, sta, tmp) {
+		int q = skb_get_queue_mapping(skb);
+		unsigned long tserv_ns_avg;
+
 		/* skip wrong virtual interface */
 		if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
 			continue;
 
+		/* factor current tserv into service time estimate */
+		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
+			 ktime_to_ns(tserv));
+
+		/* determine queue admission qualifications */
+		tserv_ns_avg = ewma_read(&sta->sdata->qdata[q].tserv_ns_avg);
+		if (tserv_ns_avg) {
+			/* latency target and min limit should be tunable? */
+			sta->sdata->qdata[q].max_enqueued =
+				max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);
+		}
+		if (atomic_read(&sta->sdata->qdata[q].enqueued) >=
+				sta->sdata->qdata[q].max_enqueued &&
+		    !ieee80211_queue_stopped(&local->hw, q))
+			ieee80211_stop_queue(&local->hw, q);
+
 		acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
 		if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
 			/*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 17ef4f4..cba9060 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1284,6 +1284,20 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
 	return TX_CONTINUE;
 }
 
+static void ieee80211_tx_skb_free(struct sk_buff *skb)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
+	struct ieee80211_local *local = sdata->local;
+	int q = skb_get_queue_mapping(skb);
+
+	/* decrement enqueued count */
+	atomic_dec(&sdata->qdata[q].enqueued);
+
+	/* if queue stopped, wake it */
+	if (ieee80211_queue_stopped(&local->hw, q))
+		ieee80211_wake_queue(&local->hw, q);
+}
+
 static int __ieee80211_tx(struct ieee80211_local *local,
 			  struct sk_buff **skbp,
 			  struct sta_info *sta,
@@ -1323,6 +1337,18 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 		sdata = vif_to_sdata(info->control.vif);
 
+		/* timestamp queue entry and increment queue length */
+		skb->tstamp = ktime_get();
+		atomic_inc(&sdata->qdata[q].enqueued);
+
+		/* take ownership of skb */
+		skb_orphan(skb);
+		skb->dev = sdata->dev;
+		skb->destructor = ieee80211_tx_skb_free;
+		if (atomic_read(&sdata->qdata[q].enqueued) >=
+				sdata->qdata[q].max_enqueued)
+			ieee80211_stop_queue(&local->hw, q);
+
 		switch (sdata->vif.type) {
 		case NL80211_IFTYPE_MONITOR:
 			info->control.vif = NULL;
-- 
1.7.4


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-18 21:21   ` [RFC v2] " John W. Linville
@ 2011-02-19  3:44     ` Nathaniel Smith
  2011-02-21 18:47       ` John W. Linville
  2011-02-20  0:37     ` Nathaniel Smith
  2011-02-21 15:28     ` Johannes Berg
  2 siblings, 1 reply; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-19  3:44 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, bloat-devel, johannes, nbd

On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> +       /* grab timestamp info for buffer control estimates */
> +       tserv = ktime_sub(ktime_get(), skb->tstamp);
[...]
> +               ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
> +                        ktime_to_ns(tserv));

I think you're still measuring how long it takes one packet to get
from the end of the queue to the beginning, rather than measuring how
long it takes each packet to go out?

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-18 21:21   ` [RFC v2] " John W. Linville
  2011-02-19  3:44     ` Nathaniel Smith
@ 2011-02-20  0:37     ` Nathaniel Smith
  2011-02-21 18:52       ` John W. Linville
  2011-02-21 15:28     ` Johannes Berg
  2 siblings, 1 reply; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-20  0:37 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, bloat-devel, johannes, nbd

Actually, a few more comments just occurred to me...

On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> Johannes' comment about tx status reporting being unreliable (and what
> he was really saying) finally sunk-in.  So, this version uses
> skb->destructor to track in-flight fragments.  That should handle
> fragments that get silently dropped in the driver for whatever reason
> without leaking queue capacity.  Correct me if I'm wrong!

Should we be somehow filtering out and ignoring the packets that get
dropped, when we're calculating the average packet transmission rate?
Presumably they're getting dropped almost instantly, so they don't
really take up queue space and they have abnormally fast transmission
times, which will tend to cause us to overestimate max_enqueued? They
should be rare, though, at least. (And presumably we *should* include
packets that get dropped because their retry timer ran out, since they
were sitting in the queue for that long.) Possibly we should just
ignore any packet that was handled in less than, oh, say, a few
microseconds?

Alternatively, if we aren't worried about those packets, then is there
any reason this patch should be mac80211 specific?

> +static void ieee80211_tx_skb_free(struct sk_buff *skb)
> +{
> +       struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
> +       struct ieee80211_local *local = sdata->local;
> +       int q = skb_get_queue_mapping(skb);
> +
> +       /* decrement enqueued count */
> +       atomic_dec(&sdata->qdata[q].enqueued);
> +
> +       /* if queue stopped, wake it */
> +       if (ieee80211_queue_stopped(&local->hw, q))
> +               ieee80211_wake_queue(&local->hw, q);
> +}

I think you need to check that .enqueued is < max_enqueued here,
instead of waking the queue unconditionally.

Suppose the data rate drops while there's a steady flow -- our
max_enqueued value will drop below the current queue size, but because
we wake the queue unconditionally after each packet goes out, and then
only stop it again after we've queued at least one new packet, we
might get 'stuck' with an over-large queue.

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-18 21:21   ` [RFC v2] " John W. Linville
  2011-02-19  3:44     ` Nathaniel Smith
  2011-02-20  0:37     ` Nathaniel Smith
@ 2011-02-21 15:28     ` Johannes Berg
  2011-02-21 19:06       ` John W. Linville
  2 siblings, 1 reply; 48+ messages in thread
From: Johannes Berg @ 2011-02-21 15:28 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, bloat-devel, Nathaniel J. Smith, nbd

On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote:
> This is an implementation of the eBDP algorithm as documented in
> Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
> et al.
> 
> 	http://www.hamilton.ie/tianji_li/buffersizing.pdf
> 
> This implementation timestamps an skb before handing it to the
> hardware driver, then computes the service time when the frame is
> freed by the driver.  An exponentially weighted moving average of per
> fragment service times is used to restrict queueing delays in hopes
> of achieving a target fragment transmission latency.
> 
> Signed-off-by: John W. Linville <linville@tuxdriver.com>
> ---
> v1 -> v2:
> - execute algorithm separately for each WMM queue
> - change ewma scaling parameters
> - calculate max queue len only when new latency data is received
> - stop queues when occupancy limit is reached rather than dropping
> - use skb->destructor for tracking queue occupancy
> 
> Johannes' comment about tx status reporting being unreliable (and what
> he was really saying) finally sunk-in.  So, this version uses
> skb->destructor to track in-flight fragments.  That should handle
> fragments that get silently dropped in the driver for whatever reason
> without leaking queue capacity.  Correct me if I'm wrong!

Yeah, I had that idea as well. Could unify the existing skb_orphan()
call though :-)

However, Nathaniel is right -- if the skb is freed right away during
tx() you kinda estimate its queue time to be virtually zero. That
doesn't make a lot of sense and might in certain conditions exacerbate
the problem, for example if the system is out of memory more packets
might be allowed through than in normal operation etc.

Also, for some USB drivers I believe SKB lifetime has no relation to
queue size at all because the data is just shuffled into an URB. I'm not
sure we can solve this generically. I'm not really sure how this works
for USB drivers, I think they queue up frames with the HCI controller
rather than directly with the device.

Finally, this isn't taking into account any of the issues about
aggregation and AP mode. Remember that both with multiple streams (on
different ACs) and even more so going to different stations
(AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and
let's not forget 11ac/ad) there may be vast differences in the time
different frames spend on a queue which are not just due to bloated
queues. I'm concerned about this since none of it has been taken into
account in the paper you're basing this on, all evaluations seem to be
pretty much based on a single traffic stream.

Overall, I think there should be some more research first. This might
help in some cases, but do we know it won't completely break throughput
in other cases?

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-19  3:44     ` Nathaniel Smith
@ 2011-02-21 18:47       ` John W. Linville
  2011-02-21 23:26         ` Nathaniel Smith
  0 siblings, 1 reply; 48+ messages in thread
From: John W. Linville @ 2011-02-21 18:47 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: linux-wireless, bloat-devel, johannes, nbd

On Fri, Feb 18, 2011 at 07:44:30PM -0800, Nathaniel Smith wrote:
> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > +       /* grab timestamp info for buffer control estimates */
> > +       tserv = ktime_sub(ktime_get(), skb->tstamp);
> [...]
> > +               ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
> > +                        ktime_to_ns(tserv));
> 
> I think you're still measuring how long it takes one packet to get
> from the end of the queue to the beginning, rather than measuring how
> long it takes each packet to go out?

Yes, I am measuring how long the driver+device takes to release each
skb back to me (using that as a proxy for how long it takes to get
the fragment to the next hop).  Actually, FWIW I'm only measuring
that time for those skb's that result in a tx status report.

I tried to see how your measurement would be useful, but I just don't
see how the number of frames ahead of me in the queue is relevant to
the measured link latency?  I mean, I realize that having more packets
ahead of me in the queue is likely to increase the latency for this
frame, but I don't understand why I should use that information to
discount the measured latency...?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-20  0:37     ` Nathaniel Smith
@ 2011-02-21 18:52       ` John W. Linville
  0 siblings, 0 replies; 48+ messages in thread
From: John W. Linville @ 2011-02-21 18:52 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: linux-wireless, bloat-devel, johannes, nbd

On Sat, Feb 19, 2011 at 04:37:00PM -0800, Nathaniel Smith wrote:
> Actually, a few more comments just occurred to me...
> 
> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > Johannes' comment about tx status reporting being unreliable (and what
> > he was really saying) finally sunk-in.  So, this version uses
> > skb->destructor to track in-flight fragments.  That should handle
> > fragments that get silently dropped in the driver for whatever reason
> > without leaking queue capacity.  Correct me if I'm wrong!
> 
> Should we be somehow filtering out and ignoring the packets that get
> dropped, when we're calculating the average packet transmission rate?
> Presumably they're getting dropped almost instantly, so they don't
> really take up queue space and they have abnormally fast transmission
> times, which will tend to cause us to overestimate max_enqueued? They
> should be rare, though, at least. (And presumably we *should* include
> packets that get dropped because their retry timer ran out, since they
> were sitting in the queue for that long.) Possibly we should just
> ignore any packet that was handled in less than, oh, say, a few
> microseconds?

If you look, I only do the timing measurement for frames that
result in a tx status report.  Frames that are dropped will only hit
ieee80211_tx_skb_free (which reduces the enqueued count but doesn't
recalculate max_enqueue).

> Alternatively, if we aren't worried about those packets, then is there
> any reason this patch should be mac80211 specific?

No, in fact I was thinking the same thing.  Some thought needs to be
put to whether or not we can ignore the effects of letting dropped
packets effect the latency estimate...
 
> > +static void ieee80211_tx_skb_free(struct sk_buff *skb)
> > +{
> > +       struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
> > +       struct ieee80211_local *local = sdata->local;
> > +       int q = skb_get_queue_mapping(skb);
> > +
> > +       /* decrement enqueued count */
> > +       atomic_dec(&sdata->qdata[q].enqueued);
> > +
> > +       /* if queue stopped, wake it */
> > +       if (ieee80211_queue_stopped(&local->hw, q))
> > +               ieee80211_wake_queue(&local->hw, q);
> > +}
> 
> I think you need to check that .enqueued is < max_enqueued here,
> instead of waking the queue unconditionally.
> 
> Suppose the data rate drops while there's a steady flow -- our
> max_enqueued value will drop below the current queue size, but because
> we wake the queue unconditionally after each packet goes out, and then
> only stop it again after we've queued at least one new packet, we
> might get 'stuck' with an over-large queue.

Yes, thanks for pointing that out.  My non-thorough tests seem to do
a better job at holding the latency lower with that change.

Thanks for taking a look!

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 15:28     ` Johannes Berg
@ 2011-02-21 19:06       ` John W. Linville
  2011-02-21 20:26         ` Tianji Li
  2011-02-28 13:07         ` Johannes Berg
  0 siblings, 2 replies; 48+ messages in thread
From: John W. Linville @ 2011-02-21 19:06 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless, bloat-devel, Nathaniel J. Smith, nbd

On Mon, Feb 21, 2011 at 04:28:06PM +0100, Johannes Berg wrote:
> On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote:
> > This is an implementation of the eBDP algorithm as documented in
> > Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
> > et al.
> > 
> > 	http://www.hamilton.ie/tianji_li/buffersizing.pdf
> > 
> > This implementation timestamps an skb before handing it to the
> > hardware driver, then computes the service time when the frame is
> > freed by the driver.  An exponentially weighted moving average of per
> > fragment service times is used to restrict queueing delays in hopes
> > of achieving a target fragment transmission latency.
> > 
> > Signed-off-by: John W. Linville <linville@tuxdriver.com>
> > ---
> > v1 -> v2:
> > - execute algorithm separately for each WMM queue
> > - change ewma scaling parameters
> > - calculate max queue len only when new latency data is received
> > - stop queues when occupancy limit is reached rather than dropping
> > - use skb->destructor for tracking queue occupancy
> > 
> > Johannes' comment about tx status reporting being unreliable (and what
> > he was really saying) finally sunk-in.  So, this version uses
> > skb->destructor to track in-flight fragments.  That should handle
> > fragments that get silently dropped in the driver for whatever reason
> > without leaking queue capacity.  Correct me if I'm wrong!
> 
> Yeah, I had that idea as well. Could unify the existing skb_orphan()
> call though :-)

The one in ieee80211_skb_resize?  Any idea how that would look?
 
> However, Nathaniel is right -- if the skb is freed right away during
> tx() you kinda estimate its queue time to be virtually zero. That
> doesn't make a lot of sense and might in certain conditions exacerbate
> the problem, for example if the system is out of memory more packets
> might be allowed through than in normal operation etc.

As in my reply to Nathaniel, please notice that the timing estimate
(and the max_enqueued calculation) only happens for frames that result
in a tx status report -- at least for now...

However, if this were generalized beyond mac80211 then we wouldn't
be able to rely on tx status reports.  I can see that dropping frames
in the driver would lead to timing estimates that would cascade into
a wide-open queue size.  But I'm not sure that would be a big deal,
since in the long run those dropped frames should still result in IP
cwnd reductions, etc...?

> Also, for some USB drivers I believe SKB lifetime has no relation to
> queue size at all because the data is just shuffled into an URB. I'm not
> sure we can solve this generically. I'm not really sure how this works
> for USB drivers, I think they queue up frames with the HCI controller
> rather than directly with the device.

How do you think the time spent handling URBs in the USB stack relates
to the time spent transmitting frames?  At what point do those SKBs
get freed?

> Finally, this isn't taking into account any of the issues about
> aggregation and AP mode. Remember that both with multiple streams (on
> different ACs) and even more so going to different stations
> (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and
> let's not forget 11ac/ad) there may be vast differences in the time
> different frames spend on a queue which are not just due to bloated
> queues. I'm concerned about this since none of it has been taken into
> account in the paper you're basing this on, all evaluations seem to be
> pretty much based on a single traffic stream.

Yeah, I'm still not sure we all have our heads around these issues.
I mean, on the one hand it seems wrong to limit queueing for one
stream or station just because some other stream or station is
higher latency.  But on the other hand, it seems to me that those
streams/stations still have to share the same link and that higher
real latency for one stream/station could still result in a higher
perceived latency for another stream/station sharing the same link,
since they still have to share the same air...no?

> Overall, I think there should be some more research first. This might
> help in some cases, but do we know it won't completely break throughput
> in other cases?

That's why it is posted RFC, of course. :-)

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 19:06       ` John W. Linville
@ 2011-02-21 20:26         ` Tianji Li
  2011-02-28 13:07         ` Johannes Berg
  1 sibling, 0 replies; 48+ messages in thread
From: Tianji Li @ 2011-02-21 20:26 UTC (permalink / raw)
  To: John W. Linville, Johannes Berg, Nathaniel J. Smith
  Cc: bloat-devel, linux-wireless


On 02/21/2011 01:06 PM, John W. Linville wrote:
> On Mon, Feb 21, 2011 at 04:28:06PM +0100, Johannes Berg wrote:
>> On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote:
>>> This is an implementation of the eBDP algorithm as documented in
>>> Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
>>> et al.
>>>
>>> 	http://www.hamilton.ie/tianji_li/buffersizing.pdf
>>>
>>> This implementation timestamps an skb before handing it to the
>>> hardware driver, then computes the service time when the frame is
>>> freed by the driver.  An exponentially weighted moving average of per
>>> fragment service times is used to restrict queueing delays in hopes
>>> of achieving a target fragment transmission latency.
>>>
>>> Signed-off-by: John W. Linville<linville@tuxdriver.com>
>>> ---
>>> v1 ->  v2:
>>> - execute algorithm separately for each WMM queue
>>> - change ewma scaling parameters
>>> - calculate max queue len only when new latency data is received
>>> - stop queues when occupancy limit is reached rather than dropping
>>> - use skb->destructor for tracking queue occupancy
>>>
>>> Johannes' comment about tx status reporting being unreliable (and what
>>> he was really saying) finally sunk-in.  So, this version uses
>>> skb->destructor to track in-flight fragments.  That should handle
>>> fragments that get silently dropped in the driver for whatever reason
>>> without leaking queue capacity.  Correct me if I'm wrong!
>>
>> Yeah, I had that idea as well. Could unify the existing skb_orphan()
>> call though :-)
>
> The one in ieee80211_skb_resize?  Any idea how that would look?
>
>> However, Nathaniel is right -- if the skb is freed right away during
>> tx() you kinda estimate its queue time to be virtually zero. That
>> doesn't make a lot of sense and might in certain conditions exacerbate
>> the problem, for example if the system is out of memory more packets
>> might be allowed through than in normal operation etc.
>
> As in my reply to Nathaniel, please notice that the timing estimate
> (and the max_enqueued calculation) only happens for frames that result
> in a tx status report -- at least for now...
>
> However, if this were generalized beyond mac80211 then we wouldn't
> be able to rely on tx status reports.  I can see that dropping frames
> in the driver would lead to timing estimates that would cascade into
> a wide-open queue size.  But I'm not sure that would be a big deal,
> since in the long run those dropped frames should still result in IP
> cwnd reductions, etc...?
>
>> Also, for some USB drivers I believe SKB lifetime has no relation to
>> queue size at all because the data is just shuffled into an URB. I'm not
>> sure we can solve this generically. I'm not really sure how this works
>> for USB drivers, I think they queue up frames with the HCI controller
>> rather than directly with the device.
>
> How do you think the time spent handling URBs in the USB stack relates
> to the time spent transmitting frames?  At what point do those SKBs
> get freed?
>
>> Finally, this isn't taking into account any of the issues about
>> aggregation and AP mode. Remember that both with multiple streams (on
>> different ACs) and even more so going to different stations
>> (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and
>> let's not forget 11ac/ad) there may be vast differences in the time
>> different frames spend on a queue which are not just due to bloated
>> queues. I'm concerned about this since none of it has been taken into
>> account in the paper you're basing this on, all evaluations seem to be
>> pretty much based on a single traffic stream.
>
> Yeah, I'm still not sure we all have our heads around these issues.
> I mean, on the one hand it seems wrong to limit queueing for one
> stream or station just because some other stream or station is
> higher latency.  But on the other hand, it seems to me that those
> streams/stations still have to share the same link and that higher
> real latency for one stream/station could still result in a higher
> perceived latency for another stream/station sharing the same link,
> since they still have to share the same air...no?

This is a good point.

A buffer builds up when there are long-lived TCP flows. They can block 
the buffer since they are elastic in the sense that they send more 
packets when previous are acknowledged, which means that if the flow is 
long, lots of packets will arrive at the buffer almost at the same time. 
If the buffer is large, the waiting to be serviced can be long. This is 
fine for long-lived flows since when we are download a large file, we do 
not quite care if we are done in 2 minutes or 3. However, if there are a 
couple of email checks, no one can tolerate a 'fresh' click takes 3-2=1 
minute.

To mitigate, we shorten the buffer sizes (by dropping) so that the 
waiting can be shorter. Since the long-lived flows are dominating, 
dropping happens much more likely on them, some packets from short-lived 
can also be dropped too if they are not lucky. Still due to the elastic 
(it is both bad and good :-)) nature of TCP, the dropping on long-lived 
flows makes them backoff, which gives time to short ones.

If a buffer is not used by elastic traffic, there is no need to do 
buffering. (Note that UDP can be elastic as well. The application layer 
of UDP normally has some logic to backoff if waiting is too long)

In 802.11 standard, there is only one queue. While in 802.11e/n, there 
are four, and by default only one of which is used for TCP (but this can 
be changed). There are some other queues in the drive for control 
purposes, but they do not count.

In our paper, we were doing buffersizing on the 802.11e/n TCP queue 
only. For 802.11, we need a few buffers on top of the 802.11 standard 
one to mimic those of 802.11e, and use sizing on the TCP buffer only.

The scheduling of which queues should be active at the MAC layer is 
another issue, which can not be solved with the sizing logic.

AQM may not may not be a better issue, but the issue is that it is not 
enabled even if so well known.

My 2 cents,
Tianji

>
>> Overall, I think there should be some more research first. This might
>> help in some cases, but do we know it won't completely break throughput
>> in other cases?
>
> That's why it is posted RFC, of course. :-)
>
> John

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 18:47       ` John W. Linville
@ 2011-02-21 23:26         ` Nathaniel Smith
  2011-02-23 22:28           ` John W. Linville
  0 siblings, 1 reply; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-21 23:26 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, bloat-devel, johannes, nbd

On Mon, Feb 21, 2011 at 10:47 AM, John W. Linville
<linville@tuxdriver.com> wrote:
> On Fri, Feb 18, 2011 at 07:44:30PM -0800, Nathaniel Smith wrote:
>> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
>> <linville@tuxdriver.com> wrote:
>> > +       /* grab timestamp info for buffer control estimates */
>> > +       tserv = ktime_sub(ktime_get(), skb->tstamp);
>> [...]
>> > +               ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
>> > +                        ktime_to_ns(tserv));
>>
>> I think you're still measuring how long it takes one packet to get
>> from the end of the queue to the beginning, rather than measuring how
>> long it takes each packet to go out?
>
> Yes, I am measuring how long the driver+device takes to release each
> skb back to me (using that as a proxy for how long it takes to get
> the fragment to the next hop).  Actually, FWIW I'm only measuring
> that time for those skb's that result in a tx status report.
>
> I tried to see how your measurement would be useful, but I just don't
> see how the number of frames ahead of me in the queue is relevant to
> the measured link latency?  I mean, I realize that having more packets
> ahead of me in the queue is likely to increase the latency for this
> frame, but I don't understand why I should use that information to
> discount the measured latency...?

It depends on which latency you want to measure. The way that I
reasoned was, suppose that at some given time, the card is able to
transmit 1 fragment every T nanoseconds. Then it can transmit n
fragments in n*T nanoseconds, so if we want the queue depth to be 2
ms, then we have
  n * T = 2 * NSEC_PER_MSEC
  n = 2 * NSEC_PER_MSEC / T

Which is the calculation that you're doing:

+                       sta->sdata->qdata[q].max_enqueued =
+                               max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);

But for this calculation to make sense, we need T to be the time it
takes the card to transmit 1 fragment. In your patch, you're not
measuring that. You're measuring the total time between when a packet
is enqueued and when it is transmitted; if there were K packets in the
queue ahead of it, then this is the time to send *all* of them --
you're measuring (K+1)*T. That's why in my patch, I recorded the
current size of the queue when each packet is enqueued, so I could
compute T = total_time / (K+1).

Under saturation conditions, K+1 will always equal max_enqueued, so I
guess in your algorithm, at the steady state we have

  max_enqueued = K+1 = 2 * NSEC_PER_MSEC / ((K+1) * T)
  (K+1)^2 = 2 * NSEC_PER_MSEC / T
  K+1 = sqrt(2 * NSEC_PER_MSEC / T)

So I think under saturation, you converge to setting the queue to the
square root of the appropriate size?

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 23:26         ` Nathaniel Smith
@ 2011-02-23 22:28           ` John W. Linville
  2011-02-25 18:21             ` Nathaniel Smith
  0 siblings, 1 reply; 48+ messages in thread
From: John W. Linville @ 2011-02-23 22:28 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: linux-wireless, bloat-devel, johannes, nbd

On Mon, Feb 21, 2011 at 03:26:32PM -0800, Nathaniel Smith wrote:
> On Mon, Feb 21, 2011 at 10:47 AM, John W. Linville
> <linville@tuxdriver.com> wrote:

> > I tried to see how your measurement would be useful, but I just don't
> > see how the number of frames ahead of me in the queue is relevant to
> > the measured link latency?  I mean, I realize that having more packets
> > ahead of me in the queue is likely to increase the latency for this
> > frame, but I don't understand why I should use that information to
> > discount the measured latency...?
> 
> It depends on which latency you want to measure. The way that I
> reasoned was, suppose that at some given time, the card is able to
> transmit 1 fragment every T nanoseconds. Then it can transmit n
> fragments in n*T nanoseconds, so if we want the queue depth to be 2
> ms, then we have
>   n * T = 2 * NSEC_PER_MSEC
>   n = 2 * NSEC_PER_MSEC / T
> 
> Which is the calculation that you're doing:
> 
> +                       sta->sdata->qdata[q].max_enqueued =
> +                               max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);
> 
> But for this calculation to make sense, we need T to be the time it
> takes the card to transmit 1 fragment. In your patch, you're not
> measuring that. You're measuring the total time between when a packet
> is enqueued and when it is transmitted; if there were K packets in the
> queue ahead of it, then this is the time to send *all* of them --
> you're measuring (K+1)*T. That's why in my patch, I recorded the
> current size of the queue when each packet is enqueued, so I could
> compute T = total_time / (K+1).

Thanks for the math!  I think I see what you are saying now.  Since the
measured time is being used to determine the queue size, we need to
factor-in the length of the queue that resulted in that measurment.

Unfortunately, I'm not sure how to apply this with the technique I
am using for the timing measurements. :-(  I'll have to think about
this some more...

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-23 22:28           ` John W. Linville
@ 2011-02-25 18:21             ` Nathaniel Smith
  2011-02-25 18:27               ` Nathaniel Smith
  0 siblings, 1 reply; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-25 18:21 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, bloat-devel, johannes, nbd

On Wed, Feb 23, 2011 at 2:28 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> On Mon, Feb 21, 2011 at 03:26:32PM -0800, Nathaniel Smith wrote:
>> But for this calculation to make sense, we need T to be the time it
>> takes the card to transmit 1 fragment. In your patch, you're not
>> measuring that. You're measuring the total time between when a packet
>> is enqueued and when it is transmitted; if there were K packets in the
>> queue ahead of it, then this is the time to send *all* of them --
>> you're measuring (K+1)*T. That's why in my patch, I recorded the
>> current size of the queue when each packet is enqueued, so I could
>> compute T = total_time / (K+1).
>
> Thanks for the math!  I think I see what you are saying now.  Since the
> measured time is being used to determine the queue size, we need to
> factor-in the length of the queue that resulted in that measurment.
>
> Unfortunately, I'm not sure how to apply this with the technique I
> am using for the timing measurements. :-(  I'll have to think about
> this some more...

I guess an ugly but effective approach would be to steal some bits
from the timestamp... something like (untested, and probably needs
some sign-extension bugs fixed):

#define LATENCY_BITS 8
#define ENQUEUED_BITS 24
BUILD_BUG_ON(LATENCY_BITS + ENQUEUED_BITS != 32);

static ktime_t _ktime_get_and_stash(int enqueued) {
        timespec now = ktime_to_timespec(ktime_get());
        now.sec &= (1 << LATENCY_BITS) - 1;
        BUG_ON(enqueued > (1 << ENQUEUED_BITS));
        now.sec |= enqueued << LATENCY_BITS;
        return timespec_to_ktime(now);
}

static u64 _ktime_diff_to_now_and_unstash(ktime_t then, int * enqueued) {

        timespec ts_then = ktime_to_timespec(then);
        timespec ts_now = ktime_to_timespec(ktime_get());
        *enqueued = ts_then.tv_sec >> LATENCY_BITS;
        ts_then.tv_sec &= (1 << LATENCY_BITS) - 1;
        ts_now.tv_sec &= (1 << LATENCY_BITS) - 1;
        if (ts_now.tv_sec < ts_then.tv_sec)
                ts_now.tv_sec += (1 << LATENCY_BITS);
         timespec_sub(ts_now, ts_then);
}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-25 18:21             ` Nathaniel Smith
@ 2011-02-25 18:27               ` Nathaniel Smith
  0 siblings, 0 replies; 48+ messages in thread
From: Nathaniel Smith @ 2011-02-25 18:27 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, bloat-devel, johannes, nbd

On Fri, Feb 25, 2011 at 10:21 AM, Nathaniel Smith <njs@pobox.com> wrote:
> static u64 _ktime_diff_to_now_and_unstash(ktime_t then, int * enqueued) {
>
>        timespec ts_then = ktime_to_timespec(then);
>        timespec ts_now = ktime_to_timespec(ktime_get());
>        *enqueued = ts_then.tv_sec >> LATENCY_BITS;
>        ts_then.tv_sec &= (1 << LATENCY_BITS) - 1;
>        ts_now.tv_sec &= (1 << LATENCY_BITS) - 1;
>        if (ts_now.tv_sec < ts_then.tv_sec)
>                ts_now.tv_sec += (1 << LATENCY_BITS);
>         timespec_sub(ts_now, ts_then);
> }

Err, plus the 'return timespec_to_ns(...)' on the last line, that I
was trying to add when my computer suddenly decided I wanted to send
the message. How embarrassing.

Anyway, not sure this is a *good* idea, but it should work. Hopefully
we don't actually need to measure latencies > 256 seconds in this
context...

-- Nathaniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 19:06       ` John W. Linville
  2011-02-21 20:26         ` Tianji Li
@ 2011-02-28 13:07         ` Johannes Berg
  1 sibling, 0 replies; 48+ messages in thread
From: Johannes Berg @ 2011-02-28 13:07 UTC (permalink / raw)
  To: John W. Linville; +Cc: linux-wireless, bloat-devel, Nathaniel J. Smith, nbd

On Mon, 2011-02-21 at 14:06 -0500, John W. Linville wrote:

> > Yeah, I had that idea as well. Could unify the existing skb_orphan()
> > call though :-)
> 
> The one in ieee80211_skb_resize?  Any idea how that would look?

Yeah. I think it'd have to be moved out of _skb_resize and made
unconditional in that path, since eventually with this patch you'd do it
anyway.

> As in my reply to Nathaniel, please notice that the timing estimate
> (and the max_enqueued calculation) only happens for frames that result
> in a tx status report -- at least for now...

Oops, right.

> However, if this were generalized beyond mac80211 then we wouldn't
> be able to rely on tx status reports.  I can see that dropping frames
> in the driver would lead to timing estimates that would cascade into
> a wide-open queue size.  But I'm not sure that would be a big deal,
> since in the long run those dropped frames should still result in IP
> cwnd reductions, etc...?

I don't think we can generically rely on skb_orphan() in the network
stack since that will make socket buffer limits meaningless. In fact, it
pains me a bit that we had to do this in wireless before buffering the
skb, and doing it unconditionally may be worse?

> How do you think the time spent handling URBs in the USB stack relates
> to the time spent transmitting frames?  At what point do those SKBs
> get freed?

I honestly don't know. I would hope they are only freed when the URB was
processed (i.e. at least DMA'd to the target device) but I suppose a
driver might also copy the TX frame completely.

> Yeah, I'm still not sure we all have our heads around these issues.
> I mean, on the one hand it seems wrong to limit queueing for one
> stream or station just because some other stream or station is
> higher latency.  But on the other hand, it seems to me that those
> streams/stations still have to share the same link and that higher
> real latency for one stream/station could still result in a higher
> perceived latency for another stream/station sharing the same link,
> since they still have to share the same air...no?

Yeah, but retries (robustness) and aggregation (throughput) will
invariably affect latency for everybody else using the shared medium. I
suppose it would be better if queueing would be limited to a certain
amount of air time use *per peer station*, so that each connection can
have fairly low latency. However, this seems much harder to do. But what
could happen here is that bursty traffic to a far-away (slow) station
severely affects latency for and because there's also high traffic to a
closer station that caused a buffering increase.

johannes


^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2011-02-28 13:07 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-13 17:56 [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Nathaniel J. Smith
2011-02-13 17:56 ` [PATCH 1/5] iwlwifi: Simplify tx queue management Nathaniel J. Smith
2011-02-14  9:57   ` Johannes Berg
2011-02-14 22:17     ` Nathaniel Smith
2011-02-14 22:45       ` wwguy
2011-02-15  0:15         ` Dave Täht
2011-02-16  9:16         ` Stanislaw Gruszka
2011-02-16 14:41           ` John W. Linville
2011-02-16 15:13             ` wwguy
2011-02-15 12:11       ` Johannes Berg
2011-02-14 15:33   ` wwguy
2011-02-13 17:56 ` [PATCH 2/5] iwlwifi: Convert the tx queue high_mark to an atomic_t Nathaniel J. Smith
2011-02-14 12:16   ` Johannes Berg
2011-02-14 22:35     ` Nathaniel Smith
2011-02-15 12:08       ` Johannes Berg
2011-02-15 17:37         ` Nathaniel Smith
2011-02-13 17:56 ` [PATCH 3/5] iwlwifi: Invert the sense of the queue high_mark Nathaniel J. Smith
2011-02-13 17:56 ` [PATCH 4/5] iwlwifi: auto-tune tx queue size to minimize latency Nathaniel J. Smith
2011-02-14 12:17   ` Johannes Berg
2011-02-14 21:58     ` Nathaniel Smith
2011-02-15 12:13       ` Johannes Berg
2011-02-15 15:03         ` John W. Linville
2011-02-16  8:59           ` Johannes Berg
2011-02-15 17:31         ` Nathaniel Smith
2011-02-14 15:46   ` wwguy
2011-02-13 17:56 ` [PATCH 5/5] iwlwifi: make current tx queue sizes visible in debugfs Nathaniel J. Smith
2011-02-14  0:32 ` [PATCH 0/5] iwlwifi: Auto-tune tx queue size to maintain latency under load Julian Calaby
2011-02-14  3:28   ` Nathaniel Smith
2011-02-16 15:50 ` John W. Linville
2011-02-16 23:08   ` Nathaniel Smith
2011-02-16 23:42     ` wwguy
2011-02-17  1:49 ` [RFC] mac80211: implement eBDP algorithm to fight bufferbloat John W. Linville
2011-02-17  3:31   ` Ben Greear
2011-02-17  4:26   ` Nathaniel Smith
2011-02-17  8:31   ` Johannes Berg
2011-02-18 21:21   ` [RFC v2] " John W. Linville
2011-02-19  3:44     ` Nathaniel Smith
2011-02-21 18:47       ` John W. Linville
2011-02-21 23:26         ` Nathaniel Smith
2011-02-23 22:28           ` John W. Linville
2011-02-25 18:21             ` Nathaniel Smith
2011-02-25 18:27               ` Nathaniel Smith
2011-02-20  0:37     ` Nathaniel Smith
2011-02-21 18:52       ` John W. Linville
2011-02-21 15:28     ` Johannes Berg
2011-02-21 19:06       ` John W. Linville
2011-02-21 20:26         ` Tianji Li
2011-02-28 13:07         ` Johannes Berg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.