linux-kernel.vger.kernel.org archive mirror
From: "Linus Lüssing" <linus.luessing@c0d3.blue>
To: Kalle Valo <kvalo@codeaurora.org>, Felix Fietkau <nbd@nbd.name>,
	Sujith Manoharan <c_manoha@qca.qualcomm.com>,
	ath9k-devel@qca.qualcomm.com
Cc: linux-wireless@vger.kernel.org,
	"David S . Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"John W . Linville" <linville@tuxdriver.com>,
	"Felix Fietkau" <nbd@openwrt.org>,
	"Simon Wunderlich" <sw@simonwunderlich.de>,
	"Sven Eckelmann" <sven@narfation.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Linus Lüssing" <ll@simonwunderlich.de>,
	"Linus Lüssing" <linus.luessing@c0d3.blue>
Subject: [PATCH 2/3] ath9k: Fix potential interrupt storm on queue reset
Date: Tue, 14 Sep 2021 21:25:14 +0200
Message-ID: <20210914192515.9273-3-linus.luessing@c0d3.blue>
In-Reply-To: <20210914192515.9273-1-linus.luessing@c0d3.blue>

From: Linus Lüssing <ll@simonwunderlich.de>

In tests with two Lima boards from 8devices (QCA4531 based) on OpenWrt
19.07 we could force a silent restart of a device, with no serial
output, by sending a large amount of UDP traffic (iperf3 at 80
MBit/s in both directions from external hosts, saturating the wifi and
causing a load of about 4.5 to 6) and then triggering an
ath9k_queue_reset().

Further debugging showed that the restart was caused by the ath79
watchdog. With the watchdog disabled we could observe that the device
was constantly entering the ath_isr() interrupt handler and returning
early after the ATH_OP_HW_RESET flag test, without clearing any
interrupts, even though ath9k_queue_reset() calls
ath9k_hw_kill_interrupts().

With JTAG we could observe the following race condition:

1) ath9k_queue_reset()
   ...
   -> ath9k_hw_kill_interrupts()
   -> set_bit(ATH_OP_HW_RESET, &common->op_flags);
   ...
   <- returns

      2) ath9k_tasklet()
         ...
         -> ath9k_hw_resume_interrupts()
         ...
         <- returns

                 3) loops around:
                    ...
                    handle_int()
                    -> ath_isr()
                       ...
                       -> if (test_bit(ATH_OP_HW_RESET,
                                       &common->op_flags))
                            return IRQ_HANDLED;

                    x) ath_reset_internal():
                       => never reached <=

And in ath_isr() we would typically see the following interrupts /
interrupt causes:

* status: 0x00111030 or 0x00110030
* async_cause: 2 (AR_INTR_MAC_IPQ)
* sync_cause: 0

So ath9k_tasklet() re-enables the ath9k interrupts through
ath9k_hw_resume_interrupts(), which ath9k_queue_reset() had just
disabled. ath_isr() then keeps firing because it returns IRQ_HANDLED
without actually clearing the interrupt.

To fix this IRQ storm, also disable the interrupts again in ath_isr()
when we are in reset state.

Cc: Sven Eckelmann <sven@narfation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Linus Lüssing <linus.luessing@c0d3.blue>
Fixes: 872b5d814f99 ("ath9k: do not access hardware on IRQs during reset")
Signed-off-by: Linus Lüssing <ll@simonwunderlich.de>
---
 drivers/net/wireless/ath/ath9k/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 139831539da3..98090e40e1cf 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -533,8 +533,10 @@ irqreturn_t ath_isr(int irq, void *dev)
 	ath9k_debug_sync_cause(sc, sync_cause);
 	status &= ah->imask;	/* discard unasked-for bits */
 
-	if (test_bit(ATH_OP_HW_RESET, &common->op_flags))
+	if (test_bit(ATH_OP_HW_RESET, &common->op_flags)) {
+		ath9k_hw_kill_interrupts(sc->sc_ah);
 		return IRQ_HANDLED;
+	}
 
 	/*
 	 * If there are no status bits set, then this interrupt was not
-- 
2.31.0


