linux-wireless.vger.kernel.org archive mirror
From: greearb@candelatech.com
To: linux-wireless@vger.kernel.org
Cc: ath9k-devel@venema.h4ckr.net, Ben Greear <greearb@candelatech.com>
Subject: [RFC] ath9k:  Detect and work-around tx-queue hang.
Date: Thu, 21 Feb 2013 18:06:37 -0800
Message-ID: <1361498797-14361-1-git-send-email-greearb@candelatech.com>

From: Ben Greear <greearb@candelatech.com>

We see TX lockups on ar9380 NICs when running 32 stations,
each with a 56kbps stream of MTU-sized UDP packets.
We see lockups on both the AP and the stations; which side
locks up first appears to be random.

The test case further involves a programmable attenuator:
the attenuation is stepped so that the signal level goes
from -30 to -85 in steps of 10 dB.  Each step runs for
1 minute before the attenuation is increased.  The problem
normally shows up around a signal level of -70 (noise is
reported as around -95).

When the lockup hits, it is typically on a single queue
(BE).  The symptom is that there is no obvious transmit
activity on that queue: axq-depth and axq-ampdu-depth
are zero, the queue is stopped, and pending-frames is
at or above the maximum allowed.  The VO queue continues
to function, and the RX logic works fine.
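
For illustration only, the check described above can be
sketched as a standalone function.  This is a simplified
model, not the driver code: struct txq_model and
txq_poll_soft_hang() are hypothetical names, and the sketch
assumes that the normal tx completion path clears
axq_tx_inprogress between polls.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few ath_txq fields the check reads. */
struct txq_model {
	unsigned int axq_depth;
	unsigned int axq_ampdu_depth;
	int pending_frames;
	bool stopped;
	bool axq_tx_inprogress;	/* armed by the first suspicious poll */
};

/*
 * Returns true only when the queue looks hung on two consecutive
 * polls.  The first suspicious poll just arms axq_tx_inprogress;
 * a tx completion in between is assumed to clear it again.
 */
static bool txq_poll_soft_hang(struct txq_model *q)
{
	bool suspicious = q->axq_depth == 0 &&
			  q->axq_ampdu_depth == 0 &&
			  q->stopped &&
			  q->pending_frames > 0;

	if (!suspicious)
		return false;

	if (q->axq_tx_inprogress)
		return true;	/* no progress since the last poll */

	q->axq_tx_inprogress = true;
	return false;
}
```

Requiring two polls in a row keeps a queue that is merely
slow from triggering a chip reset.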

Just resetting the chip does not fix the problem:  the
pending-frames count usually stays at max.  So, this patch
also adds hacks to force pending-frames to zero.  It also
quiets some warnings about pending-frames underruns,
because sometimes the tx status does appear many seconds
later.
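
As a rough sketch of the counter handling described above
(the real changes are in ath_draintxq() and
ath_tx_complete() in the diff below), assuming a simplified
stand-in struct:  struct txq_counter, txq_complete_one()
and txq_drain_fixup() are hypothetical names used only for
this example, and logging is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two ath_txq fields involved. */
struct txq_counter {
	int pending_frames;
	bool clear_pending_frames_on_flush;
};

/*
 * Completion path: decrement the counter and clamp it at zero.
 * Returns true when a late tx status drove it negative, which is
 * where the driver would emit a rate-limited message.
 */
static bool txq_complete_one(struct txq_counter *q)
{
	if (--q->pending_frames < 0) {
		q->pending_frames = 0;
		return true;
	}
	return false;
}

/*
 * Drain path: if the hang detector asked for it, drop whatever is
 * still counted as pending.  The request flag is one-shot.
 * Returns the number of stale frames removed from the counter.
 */
static int txq_drain_fixup(struct txq_counter *q)
{
	int stale = 0;

	if (q->clear_pending_frames_on_flush && q->pending_frames != 0) {
		stale = q->pending_frames;
		q->pending_frames = 0;
	}
	q->clear_pending_frames_on_flush = false;
	return stale;
}
```

Forcing the counter to zero on drain is what makes the
clamp in the completion path necessary:  a tx status that
arrives after the reset would otherwise drive the counter
negative.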

Finally, the reset fixup code logs at ath_err level because
I think everyone should be aware of events like this.

We see the same problem with ath9k rate control and
minstrel-ht.  We have not tested other ath9k chipsets
in this manner.

Small numbers of high-speed stations do not hit this
problem, or at least not in our test cases.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/wireless/ath/ath9k/ath9k.h |    2 ++
 drivers/net/wireless/ath/ath9k/link.c  |   30 ++++++++++++++++++++++++++++--
 drivers/net/wireless/ath/ath9k/main.c  |    5 +++--
 drivers/net/wireless/ath/ath9k/xmit.c  |   15 ++++++++++++++-
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index d7897dcf..cc8d560 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -194,6 +194,7 @@ struct ath_txq {
 	u32 axq_ampdu_depth;
 	bool stopped;
 	bool axq_tx_inprogress;
+	bool clear_pending_frames_on_flush;
 	struct list_head axq_acq;
 	struct list_head txq_fifo[ATH_TXFIFO_DEPTH];
 	u8 txq_headidx;
@@ -684,6 +685,7 @@ struct ath_softc {
 	u16 curtxpow;
 	bool ps_enabled;
 	bool ps_idle;
+	bool reset_force_noretry;
 	short nbcnvifs;
 	short nvifs;
 	unsigned long ps_usecount;
diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index 7b88b9c..b59565c 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -38,18 +38,44 @@ void ath_tx_complete_poll_work(struct work_struct *work)
 			if (txq->axq_depth) {
 				if (txq->axq_tx_inprogress) {
 					needreset = true;
+					ath_err(ath9k_hw_common(sc->sc_ah),
+						"tx hung, queue: %i axq-depth: %i, ampdu-depth: %i resetting the chip\n",
+						i, txq->axq_depth,
+						txq->axq_ampdu_depth);
 					ath_txq_unlock(sc, txq);
 					break;
 				} else {
 					txq->axq_tx_inprogress = true;
 				}
+			} else {
+				/* Check for software TX hang.  It seems
+				 * sometimes pending-frames is not properly
+				 * decremented, and the tx queue hangs.
+				 * Considered hung if:  axq-depth is zero,
+				 *  ampdu-depth is zero, queue-is-stopped,
+				 *  and we have pending frames.
+				 */
+				if (txq->stopped &&
+				    (txq->axq_ampdu_depth == 0) &&
+				    (txq->pending_frames > 0)) {
+					if (txq->axq_tx_inprogress) {
+						ath_err(ath9k_hw_common(sc->sc_ah),
+							"soft tx hang: queue: %i pending-frames: %i, resetting chip\n",
+							i, txq->pending_frames);
+						needreset = true;
+						txq->clear_pending_frames_on_flush = true;
+						sc->reset_force_noretry = true;
+						ath_txq_unlock(sc, txq);
+						break;
+					} else {
+						txq->axq_tx_inprogress = true;
+					}
+				}
 			}
 			ath_txq_unlock_complete(sc, txq);
 		}
 
 	if (needreset) {
-		ath_dbg(ath9k_hw_common(sc->sc_ah), RESET,
-			"tx hung, resetting the chip\n");
 		ath9k_queue_reset(sc, RESET_TYPE_TX_HANG);
 		return;
 	}
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 5c8758d..0de0e50 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -587,8 +587,9 @@ void ath9k_queue_reset(struct ath_softc *sc, enum ath_reset_type type)
 void ath_reset_work(struct work_struct *work)
 {
 	struct ath_softc *sc = container_of(work, struct ath_softc, hw_reset_work);
-
-	ath_reset(sc, true);
+	bool retry_tx = !sc->reset_force_noretry;
+	sc->reset_force_noretry = false;
+	ath_reset(sc, retry_tx);
 }
 
 /**********************/
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 741918a..093c77e 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -1543,6 +1543,15 @@ void ath_draintxq(struct ath_softc *sc, struct ath_txq *txq, bool retry_tx)
 	if ((sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_HT) && !retry_tx)
 		ath_txq_drain_pending_buffers(sc, txq);
 
+	if (txq->clear_pending_frames_on_flush && (txq->pending_frames != 0)) {
+		ath_err(ath9k_hw_common(sc->sc_ah),
+			"Pending frames still exist on txq: %i after drain: %i  axq-depth: %i  ampdu-depth: %i\n",
+			txq->mac80211_qnum, txq->pending_frames, txq->axq_depth,
+			txq->axq_ampdu_depth);
+		txq->pending_frames = 0;
+	}
+	txq->clear_pending_frames_on_flush = false;
+
 	ath_txq_unlock_complete(sc, txq);
 }
 
@@ -2066,8 +2075,12 @@ static void ath_tx_complete(struct ath_softc *sc, struct sk_buff *skb,
 
 	q = skb_get_queue_mapping(skb);
 	if (txq == sc->tx.txq_map[q]) {
-		if (WARN_ON(--txq->pending_frames < 0))
+		if (--txq->pending_frames < 0) {
+			if (net_ratelimit())
+				ath_err(common, "txq: %p had negative pending_frames, q: %i\n",
+					txq, q);
 			txq->pending_frames = 0;
+		}
 
 		if (txq->stopped &&
 		    txq->pending_frames < sc->tx.txq_max_pending[q]) {
-- 
1.7.3.4

