From: greearb@candelatech.com
To: linux-wireless@vger.kernel.org
Cc: ath9k-devel@venema.h4ckr.net, Ben Greear <greearb@candelatech.com>
Subject: [RFC] ath9k: Detect and work-around tx-queue hang.
Date: Thu, 21 Feb 2013 18:06:37 -0800
Message-ID: <1361498797-14361-1-git-send-email-greearb@candelatech.com>
From: Ben Greear <greearb@candelatech.com>
We see TX lockups on ar9380 NICs when running 32 stations
each with a 56kbps stream of MTU sized UDP packets.
We see lockups on the AP and also on the station, seems
random which hits first.
The test case further involves a programmable attenuator,
and the attenuation is taken from -30 to -85 signal level
in steps of 10 dB. Each step runs for 1 minute before
increasing the attenuation. The problem normally
shows up around signal level of -70 (noise is reported
as around -95).
When the lockup hits, it is typically on a single queue
(BE). The symptom is that there is no obvious transmit
activity on that queue, the axq-depth and axq-ampdu-depth
are zero, the queue is stopped, and the pending-frames is
at or above the maximum allowed. The VO queue continues
to function, and RX logic functions fine.
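As a rough illustration of the symptom described above, the check can be
sketched as a standalone predicate. This is not the driver code: struct
txq_state, its seen_last_poll field, and both function names are
hypothetical stand-ins, and only the fields the check reads are modelled.
The sketch tests pending_frames > 0 rather than comparing against the
per-queue maximum, on the assumption that a stopped queue is already at
its limit.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the driver's struct ath_txq. */
struct txq_state {
	unsigned int axq_depth;       /* frames queued to hardware */
	unsigned int axq_ampdu_depth; /* aggregates queued to hardware */
	int pending_frames;           /* frames accepted but not yet completed */
	bool stopped;                 /* mac80211 queue has been stopped */
	bool seen_last_poll;          /* symptoms were present on the previous poll */
};

/* True when the queue is stopped with frames pending but nothing in flight. */
static bool txq_looks_soft_hung(const struct txq_state *q)
{
	return q->axq_depth == 0 &&
	       q->axq_ampdu_depth == 0 &&
	       q->stopped &&
	       q->pending_frames > 0;
}

/*
 * Poll helper: request a reset only when the symptoms persist across two
 * consecutive polls, so that a queue which is merely idle for a moment
 * does not trigger one.
 */
static bool txq_poll_needs_reset(struct txq_state *q)
{
	if (!txq_looks_soft_hung(q)) {
		q->seen_last_poll = false;
		return false;
	}
	if (q->seen_last_poll)
		return true;
	q->seen_last_poll = true;
	return false;
}
```

Calling txq_poll_needs_reset() from a periodic poll returns false the
first time the symptoms appear and true on the next poll if nothing has
changed in between.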
Just resetting the chip does not fix the problem: The
pending-frames usually stays at max. So, this patch also
adds hacks to force pending-frames to zero. It also
quietens some warnings about pending-frame underruns
because sometimes, the tx status does appear many seconds
later.
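The counter handling described above can be sketched in isolation as
follows. The struct and function names here are hypothetical and only
model the two counters involved; the real driver does this under the
txq lock and logs through ath_err.

```c
#include <stdbool.h>

/* Hypothetical model of the two fields this part of the patch touches. */
struct txq_counters {
	int pending_frames;
	bool clear_pending_frames_on_flush;
};

/*
 * Account for one completed frame. A late tx status arriving after the
 * counter was forced to zero would drive it negative, so clamp instead
 * of warning. Returns true if the counter had to be clamped.
 */
static bool txq_complete_one(struct txq_counters *q)
{
	if (--q->pending_frames < 0) {
		q->pending_frames = 0;
		return true;
	}
	return false;
}

/*
 * Called at the end of a drain. If the hang detector asked for it,
 * discard whatever count is left. The flag is one-shot and is always
 * cleared. Returns the number of stale frames that were discarded.
 */
static int txq_finish_drain(struct txq_counters *q)
{
	int stale = 0;

	if (q->clear_pending_frames_on_flush) {
		stale = q->pending_frames;
		q->pending_frames = 0;
	}
	q->clear_pending_frames_on_flush = false;
	return stale;
}
```

The return values exist only so that a caller can decide whether to log;
they are not part of the patch.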
Finally, the reset fixup code is logged at ath_err because
I think everyone should be aware of events like this.
We see the same problem with ath9k rate control and
minstrel-ht. We have not tested other ath9k chipsets
in this manner.
Small numbers of high-speed stations do not hit this
problem, or at least not in our test cases.
Signed-off-by: Ben Greear <greearb@candelatech.com>
---
drivers/net/wireless/ath/ath9k/ath9k.h | 2 ++
drivers/net/wireless/ath/ath9k/link.c | 30 ++++++++++++++++++++++++++++--
drivers/net/wireless/ath/ath9k/main.c | 5 +++--
drivers/net/wireless/ath/ath9k/xmit.c | 15 ++++++++++++++-
4 files changed, 47 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index d7897dcf..cc8d560 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -194,6 +194,7 @@ struct ath_txq {
u32 axq_ampdu_depth;
bool stopped;
bool axq_tx_inprogress;
+ bool clear_pending_frames_on_flush;
struct list_head axq_acq;
struct list_head txq_fifo[ATH_TXFIFO_DEPTH];
u8 txq_headidx;
@@ -684,6 +685,7 @@ struct ath_softc {
u16 curtxpow;
bool ps_enabled;
bool ps_idle;
+ bool reset_force_noretry;
short nbcnvifs;
short nvifs;
unsigned long ps_usecount;
diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index 7b88b9c..b59565c 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -38,18 +38,44 @@ void ath_tx_complete_poll_work(struct work_struct *work)
if (txq->axq_depth) {
if (txq->axq_tx_inprogress) {
needreset = true;
+ ath_err(ath9k_hw_common(sc->sc_ah),
+ "tx hung, queue: %i axq-depth: %i, ampdu-depth: %i resetting the chip\n",
+ i, txq->axq_depth,
+ txq->axq_ampdu_depth);
ath_txq_unlock(sc, txq);
break;
} else {
txq->axq_tx_inprogress = true;
}
+ } else {
+ /* Check for software TX hang. It seems
+ * sometimes pending-frames is not properly
+ * decremented, and the tx queue hangs.
+ * Considered hung if: axq-depth is zero,
+ * ampdu-depth is zero, queue-is-stopped,
+ * and we have pending frames.
+ */
+ if (txq->stopped &&
+ (txq->axq_ampdu_depth == 0) &&
+ (txq->pending_frames > 0)) {
+ if (txq->axq_tx_inprogress) {
+ ath_err(ath9k_hw_common(sc->sc_ah),
+ "soft tx hang: queue: %i pending-frames: %i, resetting chip\n",
+ i, txq->pending_frames);
+ needreset = true;
+ txq->clear_pending_frames_on_flush = true;
+ sc->reset_force_noretry = true;
+ ath_txq_unlock(sc, txq);
+ break;
+ } else {
+ txq->axq_tx_inprogress = true;
+ }
+ }
}
ath_txq_unlock_complete(sc, txq);
}
if (needreset) {
- ath_dbg(ath9k_hw_common(sc->sc_ah), RESET,
- "tx hung, resetting the chip\n");
ath9k_queue_reset(sc, RESET_TYPE_TX_HANG);
return;
}
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 5c8758d..0de0e50 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -587,8 +587,9 @@ void ath9k_queue_reset(struct ath_softc *sc, enum ath_reset_type type)
void ath_reset_work(struct work_struct *work)
{
struct ath_softc *sc = container_of(work, struct ath_softc, hw_reset_work);
-
- ath_reset(sc, true);
+ bool retry_tx = !sc->reset_force_noretry;
+ sc->reset_force_noretry = false;
+ ath_reset(sc, retry_tx);
}
/**********************/
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index 741918a..093c77e 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -1543,6 +1543,15 @@ void ath_draintxq(struct ath_softc *sc, struct ath_txq *txq, bool retry_tx)
if ((sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_HT) && !retry_tx)
ath_txq_drain_pending_buffers(sc, txq);
+ if (txq->clear_pending_frames_on_flush && (txq->pending_frames != 0)) {
+ ath_err(ath9k_hw_common(sc->sc_ah),
+ "Pending frames still exist on txq: %i after drain: %i axq-depth: %i ampdu-depth: %i\n",
+ txq->mac80211_qnum, txq->pending_frames, txq->axq_depth,
+ txq->axq_ampdu_depth);
+ txq->pending_frames = 0;
+ }
+ txq->clear_pending_frames_on_flush = false;
+
ath_txq_unlock_complete(sc, txq);
}
@@ -2066,8 +2075,12 @@ static void ath_tx_complete(struct ath_softc *sc, struct sk_buff *skb,
q = skb_get_queue_mapping(skb);
if (txq == sc->tx.txq_map[q]) {
- if (WARN_ON(--txq->pending_frames < 0))
+ if (--txq->pending_frames < 0) {
+ if (net_ratelimit())
+ ath_err(common, "txq: %p had negative pending_frames, q: %i\n",
+ txq, q);
txq->pending_frames = 0;
+ }
if (txq->stopped &&
txq->pending_frames < sc->tx.txq_max_pending[q]) {
--
1.7.3.4