All of lore.kernel.org
 help / color / mirror / Atom feed
* ath9k: irq storm after suspend/resume
@ 2011-08-27 11:32 Clemens Buchacher
  2011-08-29  7:55 ` Mohammed Shafi
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-08-27 11:32 UTC (permalink / raw)
  To: linux-wireless

Hi,

I am trying to debug this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=39112

An IRQ storm triggers during resume after suspend to RAM. The IRQ
storm also triggers during insmod after suspend/resume, even if
ath9k is not loaded during suspend.

In the former case, the IRQ storm happens right after the PCI
subsystem restores the config space for ath9k.

In the latter case, the IRQ storm happens immediately after
request_irq in ath_pci_probe. So I have moved ath9k_init_device
ahead of request_irq, in order to call ath9k_hw_set_interrupts(ah,
0) before enabling the interrupt.

But still, the interrupts come. Note that according to
/proc/interrupts, the IRQ line is not shared with any other device.
I did not manage to determine which interrupt it is exactly,
because the device is not in a ready state (SC_OP_INVALID is set)
when they happen (in either scenario that triggers the IRQ storm).
And SC_OP_INVALID is cleared only much later in ath9k_start.

So, I am at a loss. Any ideas?

Below is the output I get during insmod'ing right after resume
(complete logs can be found in the bug report). Below that are the
changes I made to produce the output.

Clemens
---

[   53.387989] ath9k 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[   53.389306] ath9k 0000:02:00.0: setting latency timer to 64
[   53.389371] ath: reset power on
[   53.392725] ath: UNDEFINED -> AWAKE
[   53.392783] ath: serialize_regmode is 0
[   53.438096] ath: Eeprom VER: 14, REV: 13
[   53.438098] ath: Initialize ANI
[   53.438112] ath: Enable MIB counters
[   53.438143] ath: Setup TX queue: 9
[   53.438144] ath: Set queue properties for: 9
[   53.438146] ath: Setup TX queue: 8
[   53.438147] ath: Set queue properties for: 8
[   53.438149] ath: Set queue properties for: 8
[   53.438151] ath: Reset TX queue: 8
[   53.438176] ath: tx ok 0x0 err 0x0 desc 0x100 eol 0x100 urn 0x0
[   53.438178] ath: Setup TX queue: 0
[   53.438179] ath: Set queue properties for: 0
[   53.438181] ath: Setup TX queue: 1
[   53.438182] ath: Set queue properties for: 1
[   53.438184] ath: Setup TX queue: 2
[   53.438185] ath: Set queue properties for: 2
[   53.438186] ath: Setup TX queue: 3
[   53.438187] ath: Set queue properties for: 3
[   53.439692] ath: TX streams 1, RX streams: 1
[   53.439694] ath: EEPROM regdomain: 0x60
[   53.439695] ath: EEPROM indicates we should expect a direct regpair map
[   53.439698] ath: Country alpha2 being used: 00
[   53.439699] ath: Regpair used: 0x60
[   53.439702] cfg80211: Updating information on frequency 2412 MHz for a 20 MHz width channel with regulatory rule:
[   53.439704] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439706] cfg80211: Updating information on frequency 2417 MHz for a 20 MHz width channel with regulatory rule:
[   53.439709] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439710] cfg80211: Updating information on frequency 2422 MHz for a 20 MHz width channel with regulatory rule:
[   53.439713] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439714] cfg80211: Updating information on frequency 2427 MHz for a 20 MHz width channel with regulatory rule:
[   53.439717] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439719] cfg80211: Updating information on frequency 2432 MHz for a 20 MHz width channel with regulatory rule:
[   53.439721] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439723] cfg80211: Updating information on frequency 2437 MHz for a 20 MHz width channel with regulatory rule:
[   53.439725] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439727] cfg80211: Updating information on frequency 2442 MHz for a 20 MHz width channel with regulatory rule:
[   53.439729] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439731] cfg80211: Updating information on frequency 2447 MHz for a 20 MHz width channel with regulatory rule:
[   53.439733] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439735] cfg80211: Updating information on frequency 2452 MHz for a 20 MHz width channel with regulatory rule:
[   53.439737] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439739] cfg80211: Updating information on frequency 2457 MHz for a 20 MHz width channel with regulatory rule:
[   53.439741] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439743] cfg80211: Updating information on frequency 2462 MHz for a 20 MHz width channel with regulatory rule:
[   53.439746] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439747] cfg80211: Updating information on frequency 2467 MHz for a 20 MHz width channel with regulatory rule:
[   53.439750] cfg80211: 2457000 KHz - 2482000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439751] cfg80211: Updating information on frequency 2472 MHz for a 20 MHz width channel with regulatory rule:
[   53.439754] cfg80211: 2457000 KHz - 2482000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439756] cfg80211: Updating information on frequency 2484 MHz for a 20 MHz width channel with regulatory rule:
[   53.439758] cfg80211: 2474000 KHz - 2494000 KHz @  KHz), (N/A mBi, 2000 mBm)
[   53.439761] ath: tx DMA: 512 buffers 1 desc/buf
[   53.439772] ath: tx DMA map: ffff880071f90000 (54600) -> 71f90000 (54600)
[   53.439792] ath: beacon DMA: 4 buffers 1 desc/buf
[   53.439795] ath: beacon DMA map: ffff880071ecc000 (416) -> 71ecc000 (416)
[   53.439797] ath: cachelsz 64 rxbufsize 1926
[   53.439799] ath: rx DMA: 512 buffers 1 desc/buf
[   53.439811] ath: rx DMA map: ffff880071fb0000 (54600) -> 71fb0000 (54600)
[   53.441193] cfg80211: Ignoring regulatory request Set by core since the driver uses its own custom regulatory domain 
[   53.441269] ieee80211 phy1: Selected rate control algorithm 'ath9k_rate_control'
[   53.441628] Registered led device: ath9k-phy1
[   53.441641] ath9k: disabling interrupts
[   53.442979] ath: disable IER
[   53.442990] ath: 0x0 => 0x0
[   53.442992] ath: new IMR 0x0
[   53.443013] ath9k: requesting irq
[   53.444293] ath9k: interrupt with invalid flag set (1 times)
[   53.444296] wait 500 ms
[   53.446255] ath9k: interrupt with invalid flag set (2 times)
[   53.446288] ath9k: interrupt with invalid flag set (4 times)
[   53.446330] ath9k: interrupt with invalid flag set (8 times)
[   53.446374] ath9k: interrupt with invalid flag set (16 times)
[   53.446453] ath9k: interrupt with invalid flag set (32 times)
[   53.446572] ath9k: interrupt with invalid flag set (64 times)
[   53.446821] ath9k: interrupt with invalid flag set (128 times)
[   53.447319] ath9k: interrupt with invalid flag set (256 times)
[   53.448526] ath9k: interrupt with invalid flag set (512 times)
[   53.450942] ath9k: interrupt with invalid flag set (1024 times)
[   53.454793] ath9k: interrupt with invalid flag set (2048 times)
[   53.462493] ath9k: interrupt with invalid flag set (4096 times)
[   53.477738] ath9k: interrupt with invalid flag set (8192 times)
[   53.508164] ath9k: interrupt with invalid flag set (16384 times)
[   53.571798] ath9k: interrupt with invalid flag set (32768 times)
[   53.697472] ath9k: interrupt with invalid flag set (65536 times)
[   53.832522] irq 17: nobody cared (try booting with the "irqpoll" option)
[   53.834253] Pid: 0, comm: kworker/0:0 Tainted: G         C  3.1.0-rc2-ath9k-debug #16
[   53.835991] Call Trace:
[   53.837727]  <IRQ>  [<ffffffff810c64fd>] __report_bad_irq+0x3d/0xe0
[   53.839487]  [<ffffffff810c693d>] note_interrupt+0x14d/0x210
[   53.841220]  [<ffffffff810c49d9>] handle_irq_event_percpu+0xc9/0x2a0
[   53.842952]  [<ffffffff810c4bf8>] handle_irq_event+0x48/0x70
[   53.844659]  [<ffffffff810c727a>] handle_fasteoi_irq+0x5a/0xe0
[   53.846379]  [<ffffffff81016982>] handle_irq+0x22/0x40
[   53.848073]  [<ffffffff813eb45a>] do_IRQ+0x5a/0xe0
[   53.849751]  [<ffffffff813e80ae>] common_interrupt+0x6e/0x6e
[   53.851438]  <EOI>  [<ffffffff813e4a96>] ? schedule+0x316/0x9b0
[   53.853151]  [<ffffffff812732bb>] ? intel_idle+0xcb/0x120
[   53.854822]  [<ffffffff8127329d>] ? intel_idle+0xad/0x120
[   53.856502]  [<ffffffff81311e66>] cpuidle_idle_call+0xc6/0x350
[   53.858166]  [<ffffffff81013229>] cpu_idle+0xc9/0x120
[   53.859806]  [<ffffffff813d6f71>] start_secondary+0x202/0x209
[   53.861438] handlers:
[   53.863009] [<ffffffffa0557600>] ath_isr
[   53.864524] Disabling IRQ #17
[   53.943777] done waiting
[   53.944849] ieee80211 phy1: Atheros AR9285 Rev:2 mem=0xffffc90010ea0000, irq=17

diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
index 8dcefe7..fe7780c 100644
--- a/drivers/net/wireless/ath/ath9k/hw.c
+++ b/drivers/net/wireless/ath/ath9k/hw.c
@@ -16,6 +16,7 @@
 
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/delay.h>
 #include <asm/unaligned.h>
 
 #include "hw.h"
@@ -532,6 +533,7 @@ static int __ath9k_hw_init(struct ath_hw *ah)
 	ah->WARegVal |= (AR_WA_D3_L1_DISABLE |
 			 AR_WA_ASPM_TIMER_BASED_DISABLE);
 
+	ath_dbg(common, ATH_DBG_RESET, "reset power on\n");
 	if (!ath9k_hw_set_reset_reg(ah, ATH9K_RESET_POWER_ON)) {
 		ath_err(common, "Couldn't reset chip\n");
 		return -EIO;
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 9098aaa..fdd5dd1 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -16,6 +16,7 @@
 
 #include <linux/nl80211.h>
 #include <linux/delay.h>
+#include <linux/log2.h>
 #include "ath9k.h"
 #include "btcoex.h"
 
@@ -770,14 +771,28 @@ irqreturn_t ath_isr(int irq, void *dev)
 	 * touch anything. Note this can happen early
 	 * on if the IRQ is shared.
 	 */
-	if (sc->sc_flags & SC_OP_INVALID)
+	if (sc->sc_flags & SC_OP_INVALID) {
+		static int count = 0;
+		if (count < INT_MAX) {
+			count++;
+			if (is_power_of_2(count))
+				printk(KERN_DEBUG "ath9k: interrupt with invalid flag set (%d times)\n", count);
+		}
 		return IRQ_NONE;
+	}
 
 
 	/* shared irq, not for us */
 
-	if (!ath9k_hw_intrpend(ah))
+	if (!ath9k_hw_intrpend(ah)) {
+		static int count = 0;
+		if (count < INT_MAX) {
+			count++;
+			if (is_power_of_2(count))
+				printk(KERN_DEBUG "ath9k: shared irq, not for us (%d times)\n", count);
+		}
 		return IRQ_NONE;
+	}
 
 	/*
 	 * Figure out the reason(s) for the interrupt.  Note
@@ -792,8 +807,15 @@ irqreturn_t ath_isr(int irq, void *dev)
 	 * If there are no status bits set, then this interrupt was not
 	 * for me (should have been caught above).
 	 */
-	if (!status)
+	if (!status) {
+		static int count = 0;
+		if (count < INT_MAX) {
+			count++;
+			if (is_power_of_2(count))
+				printk(KERN_DEBUG "ath9k: interrupt, but no status bit set (%d times)\n", count);
+		}
 		return IRQ_NONE;
+	}
 
 	/* Cache the status */
 	sc->intrstatus = status;
diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
index be4ea13..6fdabd8 100644
--- a/drivers/net/wireless/ath/ath9k/pci.c
+++ b/drivers/net/wireless/ath/ath9k/pci.c
@@ -242,21 +242,35 @@ static int ath_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* Will be cleared in ath9k_start() */
 	sc->sc_flags |= SC_OP_INVALID;
 
+	pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &subsysid);
+	ret = ath9k_init_device(id->device, sc, subsysid, &ath_pci_bus_ops);
+	if (ret) {
+		dev_err(&pdev->dev, "Failed to initialize device\n");
+		goto err_init;
+	}
+
+	{
+		struct ath_hw *ah = sc->sc_ah;
+		printk(KERN_INFO "ath9k: disabling interrupts\n");
+		ath9k_ps_wakeup(sc);
+		mutex_lock(&sc->mutex);
+		ath9k_hw_set_interrupts(ah, 0);
+		mutex_unlock(&sc->mutex);
+		ath9k_ps_restore(sc);
+	}
+
+	printk(KERN_INFO "ath9k: requesting irq\n");
 	ret = request_irq(pdev->irq, ath_isr, IRQF_SHARED, "ath9k", sc);
 	if (ret) {
 		dev_err(&pdev->dev, "request_irq failed\n");
 		goto err_irq;
 	}
+	printk(KERN_INFO "wait 500 ms\n");
+	msleep(500);
+	printk(KERN_INFO "done waiting\n");
 
 	sc->irq = pdev->irq;
 
-	pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &subsysid);
-	ret = ath9k_init_device(id->device, sc, subsysid, &ath_pci_bus_ops);
-	if (ret) {
-		dev_err(&pdev->dev, "Failed to initialize device\n");
-		goto err_init;
-	}
-
 	ath9k_hw_name(sc->sc_ah, hw_name, sizeof(hw_name));
 	wiphy_info(hw->wiphy, "%s mem=0x%lx, irq=%d\n",
 		   hw_name, (unsigned long)mem, pdev->irq);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-08-27 11:32 ath9k: irq storm after suspend/resume Clemens Buchacher
@ 2011-08-29  7:55 ` Mohammed Shafi
  2011-08-29 11:53   ` Mohammed Shafi
  0 siblings, 1 reply; 38+ messages in thread
From: Mohammed Shafi @ 2011-08-29  7:55 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: linux-wireless, beta992

On Sat, Aug 27, 2011 at 5:02 PM, Clemens Buchacher <drizzd@aon.at> wrote:
> Hi,
>
Hi Clemens,

> I am trying to debug this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=39112
>
> An IRQ storm triggers during resume after suspend to RAM. The IRQ
> storm also triggers during insmod after suspend/resume, even if
> ath9k is not loaded during suspend.

a recent commit, not sure this will help suspend/resume

commit 0682c9b52bf51fbc67c4e79fcbdadcf70bd600f8
Author: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Date:   Sat Aug 13 10:28:09 2011 +0530

    ath9k: Fix rx overrun interrupt storm

    Whenever RXEOL is received, both RXORN and RXEOL got cleared
    to avoid rx overrun interrupt storm. This was handled only for
    edma chips. The same scenario was also observered with AR9280,
    doing frequent channel type switch b/w HT20/40 with bidi traffic
    that is causing failure to stop rx dma. This patch clears
    the RXEOL & RXORN interrupts for all chips.


>
> In the former case, the IRQ storm happens right after the PCI
> subsystem restores the config space for ath9k.
>
> In the latter case, the IRQ storm happens immediately after
> request_irq in ath_pci_probe. So I have moved ath9k_init_device
> ahead of request_irq, in order to call ath9k_hw_set_interrupts(ah,
> 0) before enabling the interrupt.
>
> But still, the interrupts come. Note that according to
> /proc/interrupts, the IRQ line is not shared with any other device.
> I did not manage to determine which interrupt it is exactly,
> because the device is not in a ready state (SC_OP_INVALID is set)
> when they happen (in either scenario that triggers the IRQ storm).
> And SC_OP_INVALID is cleared only much later in ath9k_start.
>
> So, I am at a loss. Any ideas?

please provide the lspci -vvvxx.
also looking at
/sys/kernel/debug/ieee80211/phy0/ath9k$ sudo cat interrupt.

also this additional information might help:
in case have you seen this is happening in 32 bit also ?
is this happening in wireless-testing  Linux 3.1-rc3 ? or the latest
compat wireless?
i did some preliminary testing, not able to recreate it. will try
further.thanks!

>
> Below is the output I get during insmod'ing right after resume
> (complete logs can be found in the bug report). Below that are the
> changes I made to produce the output.
>
> Clemens
> ---
>
> [   53.387989] ath9k 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [   53.389306] ath9k 0000:02:00.0: setting latency timer to 64
> [   53.389371] ath: reset power on
> [   53.392725] ath: UNDEFINED -> AWAKE
> [   53.392783] ath: serialize_regmode is 0
> [   53.438096] ath: Eeprom VER: 14, REV: 13
> [   53.438098] ath: Initialize ANI
> [   53.438112] ath: Enable MIB counters
> [   53.438143] ath: Setup TX queue: 9
> [   53.438144] ath: Set queue properties for: 9
> [   53.438146] ath: Setup TX queue: 8
> [   53.438147] ath: Set queue properties for: 8
> [   53.438149] ath: Set queue properties for: 8
> [   53.438151] ath: Reset TX queue: 8
> [   53.438176] ath: tx ok 0x0 err 0x0 desc 0x100 eol 0x100 urn 0x0
> [   53.438178] ath: Setup TX queue: 0
> [   53.438179] ath: Set queue properties for: 0
> [   53.438181] ath: Setup TX queue: 1
> [   53.438182] ath: Set queue properties for: 1
> [   53.438184] ath: Setup TX queue: 2
> [   53.438185] ath: Set queue properties for: 2
> [   53.438186] ath: Setup TX queue: 3
> [   53.438187] ath: Set queue properties for: 3
> [   53.439692] ath: TX streams 1, RX streams: 1
> [   53.439694] ath: EEPROM regdomain: 0x60
> [   53.439695] ath: EEPROM indicates we should expect a direct regpair map
> [   53.439698] ath: Country alpha2 being used: 00
> [   53.439699] ath: Regpair used: 0x60
> [   53.439702] cfg80211: Updating information on frequency 2412 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439704] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439706] cfg80211: Updating information on frequency 2417 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439709] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439710] cfg80211: Updating information on frequency 2422 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439713] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439714] cfg80211: Updating information on frequency 2427 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439717] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439719] cfg80211: Updating information on frequency 2432 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439721] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439723] cfg80211: Updating information on frequency 2437 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439725] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439727] cfg80211: Updating information on frequency 2442 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439729] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439731] cfg80211: Updating information on frequency 2447 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439733] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439735] cfg80211: Updating information on frequency 2452 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439737] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439739] cfg80211: Updating information on frequency 2457 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439741] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439743] cfg80211: Updating information on frequency 2462 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439746] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439747] cfg80211: Updating information on frequency 2467 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439750] cfg80211: 2457000 KHz - 2482000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439751] cfg80211: Updating information on frequency 2472 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439754] cfg80211: 2457000 KHz - 2482000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439756] cfg80211: Updating information on frequency 2484 MHz for a 20 MHz width channel with regulatory rule:
> [   53.439758] cfg80211: 2474000 KHz - 2494000 KHz @  KHz), (N/A mBi, 2000 mBm)
> [   53.439761] ath: tx DMA: 512 buffers 1 desc/buf
> [   53.439772] ath: tx DMA map: ffff880071f90000 (54600) -> 71f90000 (54600)
> [   53.439792] ath: beacon DMA: 4 buffers 1 desc/buf
> [   53.439795] ath: beacon DMA map: ffff880071ecc000 (416) -> 71ecc000 (416)
> [   53.439797] ath: cachelsz 64 rxbufsize 1926
> [   53.439799] ath: rx DMA: 512 buffers 1 desc/buf
> [   53.439811] ath: rx DMA map: ffff880071fb0000 (54600) -> 71fb0000 (54600)
> [   53.441193] cfg80211: Ignoring regulatory request Set by core since the driver uses its own custom regulatory domain
> [   53.441269] ieee80211 phy1: Selected rate control algorithm 'ath9k_rate_control'
> [   53.441628] Registered led device: ath9k-phy1
> [   53.441641] ath9k: disabling interrupts
> [   53.442979] ath: disable IER
> [   53.442990] ath: 0x0 => 0x0
> [   53.442992] ath: new IMR 0x0
> [   53.443013] ath9k: requesting irq
> [   53.444293] ath9k: interrupt with invalid flag set (1 times)
> [   53.444296] wait 500 ms
> [   53.446255] ath9k: interrupt with invalid flag set (2 times)
> [   53.446288] ath9k: interrupt with invalid flag set (4 times)
> [   53.446330] ath9k: interrupt with invalid flag set (8 times)
> [   53.446374] ath9k: interrupt with invalid flag set (16 times)
> [   53.446453] ath9k: interrupt with invalid flag set (32 times)
> [   53.446572] ath9k: interrupt with invalid flag set (64 times)
> [   53.446821] ath9k: interrupt with invalid flag set (128 times)
> [   53.447319] ath9k: interrupt with invalid flag set (256 times)
> [   53.448526] ath9k: interrupt with invalid flag set (512 times)
> [   53.450942] ath9k: interrupt with invalid flag set (1024 times)
> [   53.454793] ath9k: interrupt with invalid flag set (2048 times)
> [   53.462493] ath9k: interrupt with invalid flag set (4096 times)
> [   53.477738] ath9k: interrupt with invalid flag set (8192 times)
> [   53.508164] ath9k: interrupt with invalid flag set (16384 times)
> [   53.571798] ath9k: interrupt with invalid flag set (32768 times)
> [   53.697472] ath9k: interrupt with invalid flag set (65536 times)
> [   53.832522] irq 17: nobody cared (try booting with the "irqpoll" option)
> [   53.834253] Pid: 0, comm: kworker/0:0 Tainted: G         C  3.1.0-rc2-ath9k-debug #16
> [   53.835991] Call Trace:
> [   53.837727]  <IRQ>  [<ffffffff810c64fd>] __report_bad_irq+0x3d/0xe0
> [   53.839487]  [<ffffffff810c693d>] note_interrupt+0x14d/0x210
> [   53.841220]  [<ffffffff810c49d9>] handle_irq_event_percpu+0xc9/0x2a0
> [   53.842952]  [<ffffffff810c4bf8>] handle_irq_event+0x48/0x70
> [   53.844659]  [<ffffffff810c727a>] handle_fasteoi_irq+0x5a/0xe0
> [   53.846379]  [<ffffffff81016982>] handle_irq+0x22/0x40
> [   53.848073]  [<ffffffff813eb45a>] do_IRQ+0x5a/0xe0
> [   53.849751]  [<ffffffff813e80ae>] common_interrupt+0x6e/0x6e
> [   53.851438]  <EOI>  [<ffffffff813e4a96>] ? schedule+0x316/0x9b0
> [   53.853151]  [<ffffffff812732bb>] ? intel_idle+0xcb/0x120
> [   53.854822]  [<ffffffff8127329d>] ? intel_idle+0xad/0x120
> [   53.856502]  [<ffffffff81311e66>] cpuidle_idle_call+0xc6/0x350
> [   53.858166]  [<ffffffff81013229>] cpu_idle+0xc9/0x120
> [   53.859806]  [<ffffffff813d6f71>] start_secondary+0x202/0x209
> [   53.861438] handlers:
> [   53.863009] [<ffffffffa0557600>] ath_isr
> [   53.864524] Disabling IRQ #17
> [   53.943777] done waiting
> [   53.944849] ieee80211 phy1: Atheros AR9285 Rev:2 mem=0xffffc90010ea0000, irq=17
>
> diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
> index 8dcefe7..fe7780c 100644
> --- a/drivers/net/wireless/ath/ath9k/hw.c
> +++ b/drivers/net/wireless/ath/ath9k/hw.c
> @@ -16,6 +16,7 @@
>
>  #include <linux/io.h>
>  #include <linux/slab.h>
> +#include <linux/delay.h>
>  #include <asm/unaligned.h>
>
>  #include "hw.h"
> @@ -532,6 +533,7 @@ static int __ath9k_hw_init(struct ath_hw *ah)
>        ah->WARegVal |= (AR_WA_D3_L1_DISABLE |
>                         AR_WA_ASPM_TIMER_BASED_DISABLE);
>
> +       ath_dbg(common, ATH_DBG_RESET, "reset power on\n");
>        if (!ath9k_hw_set_reset_reg(ah, ATH9K_RESET_POWER_ON)) {
>                ath_err(common, "Couldn't reset chip\n");
>                return -EIO;
> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
> index 9098aaa..fdd5dd1 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -16,6 +16,7 @@
>
>  #include <linux/nl80211.h>
>  #include <linux/delay.h>
> +#include <linux/log2.h>
>  #include "ath9k.h"
>  #include "btcoex.h"
>
> @@ -770,14 +771,28 @@ irqreturn_t ath_isr(int irq, void *dev)
>         * touch anything. Note this can happen early
>         * on if the IRQ is shared.
>         */
> -       if (sc->sc_flags & SC_OP_INVALID)
> +       if (sc->sc_flags & SC_OP_INVALID) {
> +               static int count = 0;
> +               if (count < INT_MAX) {
> +                       count++;
> +                       if (is_power_of_2(count))
> +                               printk(KERN_DEBUG "ath9k: interrupt with invalid flag set (%d times)\n", count);
> +               }
>                return IRQ_NONE;
> +       }
>
>
>        /* shared irq, not for us */
>
> -       if (!ath9k_hw_intrpend(ah))
> +       if (!ath9k_hw_intrpend(ah)) {
> +               static int count = 0;
> +               if (count < INT_MAX) {
> +                       count++;
> +                       if (is_power_of_2(count))
> +                               printk(KERN_DEBUG "ath9k: shared irq, not for us (%d times)\n", count);
> +               }
>                return IRQ_NONE;
> +       }
>
>        /*
>         * Figure out the reason(s) for the interrupt.  Note
> @@ -792,8 +807,15 @@ irqreturn_t ath_isr(int irq, void *dev)
>         * If there are no status bits set, then this interrupt was not
>         * for me (should have been caught above).
>         */
> -       if (!status)
> +       if (!status) {
> +               static int count = 0;
> +               if (count < INT_MAX) {
> +                       count++;
> +                       if (is_power_of_2(count))
> +                               printk(KERN_DEBUG "ath9k: interrupt, but no status bit set (%d times)\n", count);
> +               }
>                return IRQ_NONE;
> +       }
>
>        /* Cache the status */
>        sc->intrstatus = status;
> diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
> index be4ea13..6fdabd8 100644
> --- a/drivers/net/wireless/ath/ath9k/pci.c
> +++ b/drivers/net/wireless/ath/ath9k/pci.c
> @@ -242,21 +242,35 @@ static int ath_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>        /* Will be cleared in ath9k_start() */
>        sc->sc_flags |= SC_OP_INVALID;
>
> +       pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &subsysid);
> +       ret = ath9k_init_device(id->device, sc, subsysid, &ath_pci_bus_ops);
> +       if (ret) {
> +               dev_err(&pdev->dev, "Failed to initialize device\n");
> +               goto err_init;
> +       }
> +
> +       {
> +               struct ath_hw *ah = sc->sc_ah;
> +               printk(KERN_INFO "ath9k: disabling interrupts\n");
> +               ath9k_ps_wakeup(sc);
> +               mutex_lock(&sc->mutex);
> +               ath9k_hw_set_interrupts(ah, 0);
> +               mutex_unlock(&sc->mutex);
> +               ath9k_ps_restore(sc);
> +       }
> +
> +       printk(KERN_INFO "ath9k: requesting irq\n");
>        ret = request_irq(pdev->irq, ath_isr, IRQF_SHARED, "ath9k", sc);
>        if (ret) {
>                dev_err(&pdev->dev, "request_irq failed\n");
>                goto err_irq;
>        }
> +       printk(KERN_INFO "wait 500 ms\n");
> +       msleep(500);
> +       printk(KERN_INFO "done waiting\n");
>
>        sc->irq = pdev->irq;
>
> -       pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &subsysid);
> -       ret = ath9k_init_device(id->device, sc, subsysid, &ath_pci_bus_ops);
> -       if (ret) {
> -               dev_err(&pdev->dev, "Failed to initialize device\n");
> -               goto err_init;
> -       }
> -
>        ath9k_hw_name(sc->sc_ah, hw_name, sizeof(hw_name));
>        wiphy_info(hw->wiphy, "%s mem=0x%lx, irq=%d\n",
>                   hw_name, (unsigned long)mem, pdev->irq);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
shafi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-08-29  7:55 ` Mohammed Shafi
@ 2011-08-29 11:53   ` Mohammed Shafi
  2011-08-29 15:12     ` Mohammed Shafi
  0 siblings, 1 reply; 38+ messages in thread
From: Mohammed Shafi @ 2011-08-29 11:53 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: linux-wireless, beta992

On Mon, Aug 29, 2011 at 1:25 PM, Mohammed Shafi
<shafi.wireless@gmail.com> wrote:
> On Sat, Aug 27, 2011 at 5:02 PM, Clemens Buchacher <drizzd@aon.at> wrote:
>> Hi,
>>
> Hi Clemens,
>
>> I am trying to debug this issue:
>> https://bugzilla.kernel.org/show_bug.cgi?id=39112
>>
>> An IRQ storm triggers during resume after suspend to RAM. The IRQ
>> storm also triggers during insmod after suspend/resume, even if
>> ath9k is not loaded during suspend.
>
> a recent commit, not sure this will help suspend/resume
>
> commit 0682c9b52bf51fbc67c4e79fcbdadcf70bd600f8
> Author: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
> Date:   Sat Aug 13 10:28:09 2011 +0530
>
>    ath9k: Fix rx overrun interrupt storm
>
>    Whenever RXEOL is received, both RXORN and RXEOL got cleared
>    to avoid rx overrun interrupt storm. This was handled only for
>    edma chips. The same scenario was also observered with AR9280,
>    doing frequent channel type switch b/w HT20/40 with bidi traffic
>    that is causing failure to stop rx dma. This patch clears
>    the RXEOL & RXORN interrupts for all chips.
>
>
>>
>> In the former case, the IRQ storm happens right after the PCI
>> subsystem restores the config space for ath9k.
>>
>> In the latter case, the IRQ storm happens immediately after
>> request_irq in ath_pci_probe. So I have moved ath9k_init_device
>> ahead of request_irq, in order to call ath9k_hw_set_interrupts(ah,
>> 0) before enabling the interrupt.
>>
>> But still, the interrupts come. Note that according to
>> /proc/interrupts, the IRQ line is not shared with any other device.
>> I did not manage to determine which interrupt it is exactly,
>> because the device is not in a ready state (SC_OP_INVALID is set)
>> when they happen (in either scenario that triggers the IRQ storm).
>> And SC_OP_INVALID is cleared only much later in ath9k_start.
>>
>> So, I am at a loss. Any ideas?
>
> please provide the lspci -vvvxx.
> also looking at
> /sys/kernel/debug/ieee80211/phy0/ath9k$ sudo cat interrupt.

hi, i think this will help, please get the message sudo modprobe ath9k
debug=0xffffffff.
few fatal PCI interrupt messages are based on ATH_DEBUG_ANY.
thanks.

>
> also this additional information might help:
> in case have you seen this is happening in 32 bit also ?
> is this happening in wireless-testing  Linux 3.1-rc3 ? or the latest
> compat wireless?
> i did some preliminary testing, not able to recreate it. will try
> further.thanks!
>
>>
>> Below is the output I get during insmod'ing right after resume
>> (complete logs can be found in the bug report). Below that are the
>> changes I made to produce the output.
>>
>> Clemens
>> ---
>>
>> [   53.387989] ath9k 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
>> [   53.389306] ath9k 0000:02:00.0: setting latency timer to 64
>> [   53.389371] ath: reset power on
>> [   53.392725] ath: UNDEFINED -> AWAKE
>> [   53.392783] ath: serialize_regmode is 0
>> [   53.438096] ath: Eeprom VER: 14, REV: 13
>> [   53.438098] ath: Initialize ANI
>> [   53.438112] ath: Enable MIB counters
>> [   53.438143] ath: Setup TX queue: 9
>> [   53.438144] ath: Set queue properties for: 9
>> [   53.438146] ath: Setup TX queue: 8
>> [   53.438147] ath: Set queue properties for: 8
>> [   53.438149] ath: Set queue properties for: 8
>> [   53.438151] ath: Reset TX queue: 8
>> [   53.438176] ath: tx ok 0x0 err 0x0 desc 0x100 eol 0x100 urn 0x0
>> [   53.438178] ath: Setup TX queue: 0
>> [   53.438179] ath: Set queue properties for: 0
>> [   53.438181] ath: Setup TX queue: 1
>> [   53.438182] ath: Set queue properties for: 1
>> [   53.438184] ath: Setup TX queue: 2
>> [   53.438185] ath: Set queue properties for: 2
>> [   53.438186] ath: Setup TX queue: 3
>> [   53.438187] ath: Set queue properties for: 3
>> [   53.439692] ath: TX streams 1, RX streams: 1
>> [   53.439694] ath: EEPROM regdomain: 0x60
>> [   53.439695] ath: EEPROM indicates we should expect a direct regpair map
>> [   53.439698] ath: Country alpha2 being used: 00
>> [   53.439699] ath: Regpair used: 0x60
>> [   53.439702] cfg80211: Updating information on frequency 2412 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439704] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439706] cfg80211: Updating information on frequency 2417 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439709] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439710] cfg80211: Updating information on frequency 2422 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439713] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439714] cfg80211: Updating information on frequency 2427 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439717] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439719] cfg80211: Updating information on frequency 2432 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439721] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439723] cfg80211: Updating information on frequency 2437 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439725] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439727] cfg80211: Updating information on frequency 2442 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439729] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439731] cfg80211: Updating information on frequency 2447 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439733] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439735] cfg80211: Updating information on frequency 2452 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439737] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439739] cfg80211: Updating information on frequency 2457 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439741] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439743] cfg80211: Updating information on frequency 2462 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439746] cfg80211: 2402000 KHz - 2472000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439747] cfg80211: Updating information on frequency 2467 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439750] cfg80211: 2457000 KHz - 2482000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439751] cfg80211: Updating information on frequency 2472 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439754] cfg80211: 2457000 KHz - 2482000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439756] cfg80211: Updating information on frequency 2484 MHz for a 20 MHz width channel with regulatory rule:
>> [   53.439758] cfg80211: 2474000 KHz - 2494000 KHz @  KHz), (N/A mBi, 2000 mBm)
>> [   53.439761] ath: tx DMA: 512 buffers 1 desc/buf
>> [   53.439772] ath: tx DMA map: ffff880071f90000 (54600) -> 71f90000 (54600)
>> [   53.439792] ath: beacon DMA: 4 buffers 1 desc/buf
>> [   53.439795] ath: beacon DMA map: ffff880071ecc000 (416) -> 71ecc000 (416)
>> [   53.439797] ath: cachelsz 64 rxbufsize 1926
>> [   53.439799] ath: rx DMA: 512 buffers 1 desc/buf
>> [   53.439811] ath: rx DMA map: ffff880071fb0000 (54600) -> 71fb0000 (54600)
>> [   53.441193] cfg80211: Ignoring regulatory request Set by core since the driver uses its own custom regulatory domain
>> [   53.441269] ieee80211 phy1: Selected rate control algorithm 'ath9k_rate_control'
>> [   53.441628] Registered led device: ath9k-phy1
>> [   53.441641] ath9k: disabling interrupts
>> [   53.442979] ath: disable IER
>> [   53.442990] ath: 0x0 => 0x0
>> [   53.442992] ath: new IMR 0x0
>> [   53.443013] ath9k: requesting irq
>> [   53.444293] ath9k: interrupt with invalid flag set (1 times)
>> [   53.444296] wait 500 ms
>> [   53.446255] ath9k: interrupt with invalid flag set (2 times)
>> [   53.446288] ath9k: interrupt with invalid flag set (4 times)
>> [   53.446330] ath9k: interrupt with invalid flag set (8 times)
>> [   53.446374] ath9k: interrupt with invalid flag set (16 times)
>> [   53.446453] ath9k: interrupt with invalid flag set (32 times)
>> [   53.446572] ath9k: interrupt with invalid flag set (64 times)
>> [   53.446821] ath9k: interrupt with invalid flag set (128 times)
>> [   53.447319] ath9k: interrupt with invalid flag set (256 times)
>> [   53.448526] ath9k: interrupt with invalid flag set (512 times)
>> [   53.450942] ath9k: interrupt with invalid flag set (1024 times)
>> [   53.454793] ath9k: interrupt with invalid flag set (2048 times)
>> [   53.462493] ath9k: interrupt with invalid flag set (4096 times)
>> [   53.477738] ath9k: interrupt with invalid flag set (8192 times)
>> [   53.508164] ath9k: interrupt with invalid flag set (16384 times)
>> [   53.571798] ath9k: interrupt with invalid flag set (32768 times)
>> [   53.697472] ath9k: interrupt with invalid flag set (65536 times)
>> [   53.832522] irq 17: nobody cared (try booting with the "irqpoll" option)
>> [   53.834253] Pid: 0, comm: kworker/0:0 Tainted: G         C  3.1.0-rc2-ath9k-debug #16
>> [   53.835991] Call Trace:
>> [   53.837727]  <IRQ>  [<ffffffff810c64fd>] __report_bad_irq+0x3d/0xe0
>> [   53.839487]  [<ffffffff810c693d>] note_interrupt+0x14d/0x210
>> [   53.841220]  [<ffffffff810c49d9>] handle_irq_event_percpu+0xc9/0x2a0
>> [   53.842952]  [<ffffffff810c4bf8>] handle_irq_event+0x48/0x70
>> [   53.844659]  [<ffffffff810c727a>] handle_fasteoi_irq+0x5a/0xe0
>> [   53.846379]  [<ffffffff81016982>] handle_irq+0x22/0x40
>> [   53.848073]  [<ffffffff813eb45a>] do_IRQ+0x5a/0xe0
>> [   53.849751]  [<ffffffff813e80ae>] common_interrupt+0x6e/0x6e
>> [   53.851438]  <EOI>  [<ffffffff813e4a96>] ? schedule+0x316/0x9b0
>> [   53.853151]  [<ffffffff812732bb>] ? intel_idle+0xcb/0x120
>> [   53.854822]  [<ffffffff8127329d>] ? intel_idle+0xad/0x120
>> [   53.856502]  [<ffffffff81311e66>] cpuidle_idle_call+0xc6/0x350
>> [   53.858166]  [<ffffffff81013229>] cpu_idle+0xc9/0x120
>> [   53.859806]  [<ffffffff813d6f71>] start_secondary+0x202/0x209
>> [   53.861438] handlers:
>> [   53.863009] [<ffffffffa0557600>] ath_isr
>> [   53.864524] Disabling IRQ #17
>> [   53.943777] done waiting
>> [   53.944849] ieee80211 phy1: Atheros AR9285 Rev:2 mem=0xffffc90010ea0000, irq=17
>>
>> diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
>> index 8dcefe7..fe7780c 100644
>> --- a/drivers/net/wireless/ath/ath9k/hw.c
>> +++ b/drivers/net/wireless/ath/ath9k/hw.c
>> @@ -16,6 +16,7 @@
>>
>>  #include <linux/io.h>
>>  #include <linux/slab.h>
>> +#include <linux/delay.h>
>>  #include <asm/unaligned.h>
>>
>>  #include "hw.h"
>> @@ -532,6 +533,7 @@ static int __ath9k_hw_init(struct ath_hw *ah)
>>        ah->WARegVal |= (AR_WA_D3_L1_DISABLE |
>>                         AR_WA_ASPM_TIMER_BASED_DISABLE);
>>
>> +       ath_dbg(common, ATH_DBG_RESET, "reset power on\n");
>>        if (!ath9k_hw_set_reset_reg(ah, ATH9K_RESET_POWER_ON)) {
>>                ath_err(common, "Couldn't reset chip\n");
>>                return -EIO;
>> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
>> index 9098aaa..fdd5dd1 100644
>> --- a/drivers/net/wireless/ath/ath9k/main.c
>> +++ b/drivers/net/wireless/ath/ath9k/main.c
>> @@ -16,6 +16,7 @@
>>
>>  #include <linux/nl80211.h>
>>  #include <linux/delay.h>
>> +#include <linux/log2.h>
>>  #include "ath9k.h"
>>  #include "btcoex.h"
>>
>> @@ -770,14 +771,28 @@ irqreturn_t ath_isr(int irq, void *dev)
>>         * touch anything. Note this can happen early
>>         * on if the IRQ is shared.
>>         */
>> -       if (sc->sc_flags & SC_OP_INVALID)
>> +       if (sc->sc_flags & SC_OP_INVALID) {
>> +               static int count = 0;
>> +               if (count < INT_MAX) {
>> +                       count++;
>> +                       if (is_power_of_2(count))
>> +                               printk(KERN_DEBUG "ath9k: interrupt with invalid flag set (%d times)\n", count);
>> +               }
>>                return IRQ_NONE;
>> +       }
>>
>>
>>        /* shared irq, not for us */
>>
>> -       if (!ath9k_hw_intrpend(ah))
>> +       if (!ath9k_hw_intrpend(ah)) {
>> +               static int count = 0;
>> +               if (count < INT_MAX) {
>> +                       count++;
>> +                       if (is_power_of_2(count))
>> +                               printk(KERN_DEBUG "ath9k: shared irq, not for us (%d times)\n", count);
>> +               }
>>                return IRQ_NONE;
>> +       }
>>
>>        /*
>>         * Figure out the reason(s) for the interrupt.  Note
>> @@ -792,8 +807,15 @@ irqreturn_t ath_isr(int irq, void *dev)
>>         * If there are no status bits set, then this interrupt was not
>>         * for me (should have been caught above).
>>         */
>> -       if (!status)
>> +       if (!status) {
>> +               static int count = 0;
>> +               if (count < INT_MAX) {
>> +                       count++;
>> +                       if (is_power_of_2(count))
>> +                               printk(KERN_DEBUG "ath9k: interrupt, but no status bit set (%d times)\n", count);
>> +               }
>>                return IRQ_NONE;
>> +       }
>>
>>        /* Cache the status */
>>        sc->intrstatus = status;
>> diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
>> index be4ea13..6fdabd8 100644
>> --- a/drivers/net/wireless/ath/ath9k/pci.c
>> +++ b/drivers/net/wireless/ath/ath9k/pci.c
>> @@ -242,21 +242,35 @@ static int ath_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>        /* Will be cleared in ath9k_start() */
>>        sc->sc_flags |= SC_OP_INVALID;
>>
>> +       pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &subsysid);
>> +       ret = ath9k_init_device(id->device, sc, subsysid, &ath_pci_bus_ops);
>> +       if (ret) {
>> +               dev_err(&pdev->dev, "Failed to initialize device\n");
>> +               goto err_init;
>> +       }
>> +
>> +       {
>> +               struct ath_hw *ah = sc->sc_ah;
>> +               printk(KERN_INFO "ath9k: disabling interrupts\n");
>> +               ath9k_ps_wakeup(sc);
>> +               mutex_lock(&sc->mutex);
>> +               ath9k_hw_set_interrupts(ah, 0);
>> +               mutex_unlock(&sc->mutex);
>> +               ath9k_ps_restore(sc);
>> +       }
>> +
>> +       printk(KERN_INFO "ath9k: requesting irq\n");
>>        ret = request_irq(pdev->irq, ath_isr, IRQF_SHARED, "ath9k", sc);
>>        if (ret) {
>>                dev_err(&pdev->dev, "request_irq failed\n");
>>                goto err_irq;
>>        }
>> +       printk(KERN_INFO "wait 500 ms\n");
>> +       msleep(500);
>> +       printk(KERN_INFO "done waiting\n");
>>
>>        sc->irq = pdev->irq;
>>
>> -       pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &subsysid);
>> -       ret = ath9k_init_device(id->device, sc, subsysid, &ath_pci_bus_ops);
>> -       if (ret) {
>> -               dev_err(&pdev->dev, "Failed to initialize device\n");
>> -               goto err_init;
>> -       }
>> -
>>        ath9k_hw_name(sc->sc_ah, hw_name, sizeof(hw_name));
>>        wiphy_info(hw->wiphy, "%s mem=0x%lx, irq=%d\n",
>>                   hw_name, (unsigned long)mem, pdev->irq);
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> shafi
>



-- 
shafi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-08-29 11:53   ` Mohammed Shafi
@ 2011-08-29 15:12     ` Mohammed Shafi
  2011-08-30  6:41       ` Clemens Buchacher
  0 siblings, 1 reply; 38+ messages in thread
From: Mohammed Shafi @ 2011-08-29 15:12 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: linux-wireless, beta992

>>> But still, the interrupts come. Note that according to
>>> /proc/interrupts, the IRQ line is not shared with any other device.
>>> I did not manage to determine which interrupt it is exactly,
>>> because the device is not in a ready state (SC_OP_INVALID is set)
>>> when they happen (in either scenario that triggers the IRQ storm).
>>> And SC_OP_INVALID is cleared only much later in ath9k_start.
>>>
>>> So, I am at a loss. Any ideas?
>>
>> please provide the lspci -vvvxx.
>> also looking at
>> /sys/kernel/debug/ieee80211/phy0/ath9k$ sudo cat interrupt.
>
> hi, i think this will help, please get the message sudo modprobe ath9k
> debug=0xffffffff.
> few fatal PCI interrupt messages are based on ATH_DEBUG_ANY.
> thanks.
>
we can also try to disable MIB interrupts though its handled properly
now in ath9k

http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2008-09-25/0001-ath9k-disable-MIB-interrupts-to-fix-interrupt-storm.patch


-- 
shafi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-08-29 15:12     ` Mohammed Shafi
@ 2011-08-30  6:41       ` Clemens Buchacher
  2011-08-30  9:33         ` Mohammed Shafi
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-08-30  6:41 UTC (permalink / raw)
  To: Mohammed Shafi; +Cc: linux-wireless, beta992

Hi Mohammed,

On Mon, Aug 29, 2011 at 08:42:33PM +0530, Mohammed Shafi wrote:
>
> >> But still, the interrupts come. Note that according to
> >> /proc/interrupts, the IRQ line is not shared with any other device.
> >> I did not manage to determine which interrupt it is exactly,
> >> because the device is not in a ready state (SC_OP_INVALID is set)
> >> when they happen (in either scenario that triggers the IRQ storm).
> >> And SC_OP_INVALID is cleared only much later in ath9k_start.
> >>
> >> So, I am at a loss. Any ideas?
> >
> > please provide the lspci -vvvxx.

Please see below.

> >> also looking at
> >> /sys/kernel/debug/ieee80211/phy0/ath9k$ sudo cat interrupt.

Those interrupt counters are always zero, because ath_isr never
gets to the point where it would gather statistics. The interrupt
routine exits right at the start, because SC_OP_INVALID is still
set.

        if (sc->sc_flags & SC_OP_INVALID)
                return IRQ_NONE;

By the time the invalid flag is cleared, the IRQ line has long
since been disabled, due to 10000 spurios interrupts during less
than 500 ms.

> > hi, i think this will help, please get the message sudo modprobe ath9k
> > debug=0xffffffff.
> > few fatal PCI interrupt messages are based on ATH_DEBUG_ANY.

Whenever I did that in the past, it just added lots of PDADC debug
messages.

> we can also try to disable MIB interrupts though its handled properly
> now in ath9k
> 
> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2008-09-25/0001-ath9k-disable-MIB-interrupts-to-fix-interrupt-storm.patch

But I am already disabling all interrupts by setting the mask to 0.
Unless there are some non-maskable ones?

I wonder if the device is in some crashed state at this point. Is
it possible to reset the device in ath_pci_probe?

> a recent commit, not sure this will help suspend/resume
>
> commit 0682c9b52bf51fbc67c4e79fcbdadcf70bd600f8
> Author: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
> Date:   Sat Aug 13 10:28:09 2011 +0530
>
>    ath9k: Fix rx overrun interrupt storm

For the same reason as above, this patch does not touch any code
that would get executed.

> > also this additional information might help:
> > in case have you seen this is happening in 32 bit also ?

I have never had a 32-bit system on this machine.

> > is this happening in wireless-testing  Linux 3.1-rc3 ? or the latest
> > compat wireless?

I think I tried last week, but I can try again.

> > i did some preliminary testing, not able to recreate it. will try
> > further.thanks!

Thanks for trying. Did you turn off network manager? As I described
here, it can make the bug go away.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=39112#c5

Clemens
---

02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
	Subsystem: AzureWave Device 1089
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at d2c00000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 00-15-17-ff-ff-24-14-12
	Capabilities: [170 v1] Power Budgeting <?>
	Kernel driver in use: ath9k
	Kernel modules: ath9k
00: 8c 16 2b 00 07 00 10 00 01 00 80 02 10 00 00 00
10: 04 00 c0 d2 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3b 1a 89 10
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 00 00

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-08-30  6:41       ` Clemens Buchacher
@ 2011-08-30  9:33         ` Mohammed Shafi
  2011-08-30  9:41           ` Mohammed Shafi
  2011-09-01  6:24           ` Clemens Buchacher
  0 siblings, 2 replies; 38+ messages in thread
From: Mohammed Shafi @ 2011-08-30  9:33 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: linux-wireless, beta992

On Tue, Aug 30, 2011 at 12:11 PM, Clemens Buchacher <drizzd@aon.at> wrote:
> Hi Mohammed,
>
> On Mon, Aug 29, 2011 at 08:42:33PM +0530, Mohammed Shafi wrote:
>>
>> >> But still, the interrupts come. Note that according to
>> >> /proc/interrupts, the IRQ line is not shared with any other device.
>> >> I did not manage to determine which interrupt it is exactly,
>> >> because the device is not in a ready state (SC_OP_INVALID is set)
>> >> when they happen (in either scenario that triggers the IRQ storm).
>> >> And SC_OP_INVALID is cleared only much later in ath9k_start.
>> >>
>> >> So, I am at a loss. Any ideas?
>> >
>> > please provide the lspci -vvvxx.
>
> Please see below.

thanks!

>
>> >> also looking at
>> >> /sys/kernel/debug/ieee80211/phy0/ath9k$ sudo cat interrupt.
>
> Those interrupt counters are always zero, because ath_isr never
> gets to the point where it would gather statistics. The interrupt
> routine exits right at the start, because SC_OP_INVALID is still
> set.

yes it is, though not a good idea, just thinking of we could get some
thing by not setting SC_OP_INVALID flag in ath_pci_probe(it was added
to fix a panic, but it did not cause panic for me now).

>
>        if (sc->sc_flags & SC_OP_INVALID)
>                return IRQ_NONE;
>
> By the time the invalid flag is cleared, the IRQ line has long
> since been disabled, due to 10000 spurios interrupts during less
> than 500 ms.
>
>> > hi, i think this will help, please get the message sudo modprobe ath9k
>> > debug=0xffffffff.
>> > few fatal PCI interrupt messages are based on ATH_DEBUG_ANY.
>
> Whenever I did that in the past, it just added lots of PDADC debug
> messages.

though we might get some PCI fatal interrupts.


>
>> we can also try to disable MIB interrupts though its handled properly
>> now in ath9k
>>
>> http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2008-09-25/0001-ath9k-disable-MIB-interrupts-to-fix-interrupt-storm.patch
>
> But I am already disabling all interrupts by setting the mask to 0.
> Unless there are some non-maskable ones?
>
> I wonder if the device is in some crashed state at this point. Is
> it possible to reset the device in ath_pci_probe?

i don't think ath_reset cannot be called

>
>> a recent commit, not sure this will help suspend/resume
>>
>> commit 0682c9b52bf51fbc67c4e79fcbdadcf70bd600f8
>> Author: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
>> Date:   Sat Aug 13 10:28:09 2011 +0530
>>
>>    ath9k: Fix rx overrun interrupt storm
>
> For the same reason as above, this patch does not touch any code
> that would get executed.
>
>> > also this additional information might help:
>> > in case have you seen this is happening in 32 bit also ?
>
> I have never had a 32-bit system on this machine.
>
>> > is this happening in wireless-testing  Linux 3.1-rc3 ? or the latest
>> > compat wireless?
>
> I think I tried last week, but I can try again.
>
>> > i did some preliminary testing, not able to recreate it. will try
>> > further.thanks!
>
> Thanks for trying. Did you turn off network manager? As I described
> here, it can make the bug go away.

i am bit confused looking at bug comments. please correct me.
the bug comments say that disabling/making the Network-Manager to
sleep triggers the problem.


>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=39112#c5
>
> Clemens
> ---
>
> 02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
>        Subsystem: AzureWave Device 1089
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0, Cache Line Size: 64 bytes
>        Interrupt: pin A routed to IRQ 17
>        Region 0: Memory at d2c00000 (64-bit, non-prefetchable) [size=64K]
>        Capabilities: [40] Power Management version 3
>                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
>                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
>                Address: 00000000  Data: 0000
>        Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
>                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                        MaxPayload 128 bytes, MaxReadReq 512 bytes
>                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
>                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
>                        ClockPM- Surprise- LLActRep- BwNot-
>                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                DevCap2: Completion Timeout: Not Supported, TimeoutDis+
>                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
>                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
>                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                         Compliance De-emphasis: -6dB
>                LnkSta2: Current De-emphasis Level: -6dB
>        Capabilities: [100 v1] Advanced Error Reporting
>                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>        Capabilities: [140 v1] Virtual Channel
>                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
>                Arb:    Fixed- WRR32- WRR64- WRR128-
>                Ctrl:   ArbSelect=Fixed
>                Status: InProgress-
>                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                        Status: NegoPending- InProgress-
>        Capabilities: [160 v1] Device Serial Number 00-15-17-ff-ff-24-14-12
>        Capabilities: [170 v1] Power Budgeting <?>
>        Kernel driver in use: ath9k
>        Kernel modules: ath9k
> 00: 8c 16 2b 00 07 00 10 00 01 00 80 02 10 00 00 00
> 10: 04 00 c0 d2 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 3b 1a 89 10
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 00 00
>



-- 
shafi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-08-30  9:33         ` Mohammed Shafi
@ 2011-08-30  9:41           ` Mohammed Shafi
  2011-09-01  6:24           ` Clemens Buchacher
  1 sibling, 0 replies; 38+ messages in thread
From: Mohammed Shafi @ 2011-08-30  9:41 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: linux-wireless, beta992

> yes it is, though not a good idea, just thinking of we could get some
> thing by not setting SC_OP_INVALID flag in ath_pci_probe(it was added
> to fix a panic, but it did not cause panic for me now).

sorry please don't try it, not setting the SC_OP_INVALID flag in
ath_pci_probe, the system did panic.

>
>>
>>        if (sc->sc_flags & SC_OP_INVALID)
>>                return IRQ_NONE;
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-08-30  9:33         ` Mohammed Shafi
  2011-08-30  9:41           ` Mohammed Shafi
@ 2011-09-01  6:24           ` Clemens Buchacher
  2011-09-26  9:24             ` Mohammed Shafi
  1 sibling, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-09-01  6:24 UTC (permalink / raw)
  To: Mohammed Shafi; +Cc: linux-wireless, beta992

Mohammed Shafi <shafi.wireless@gmail.com> wrote:
>
>i am bit confused looking at bug comments. please correct me.
>the bug comments say that disabling/making the Network-Manager to
>sleep triggers the problem.
That is correct.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-09-01  6:24           ` Clemens Buchacher
@ 2011-09-26  9:24             ` Mohammed Shafi
  2011-09-27 21:42               ` Clemens Buchacher
  0 siblings, 1 reply; 38+ messages in thread
From: Mohammed Shafi @ 2011-09-26  9:24 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: linux-wireless, beta992

[-- Attachment #1: Type: text/plain, Size: 489 bytes --]

Hi,
On Thu, Sep 1, 2011 at 11:54 AM, Clemens Buchacher <drizzd@aon.at> wrote:
> Mohammed Shafi <shafi.wireless@gmail.com> wrote:
>>
>>i am bit confused looking at bug comments. please correct me.
>>the bug comments say that disabling/making the Network-Manager to
>>sleep triggers the problem.
> That is correct.

is the issue still seen? any improvement with the attached patch..
will just again take a look at this thread and ath_pci_probe once i
finish my work. thanks!

>



-- 
shafi

[-- Attachment #2: 0001-ath9k-disable-interrupt-before-free_irq.patch --]
[-- Type: text/x-diff, Size: 1271 bytes --]

From 3550cd7a113e17c4e7ced8acbd7f14a3a73586ed Mon Sep 17 00:00:00 2001
From: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Date: Sat, 24 Sep 2011 21:07:16 +0530
Subject: [PATCH] ath9k: disable interrupt before free_irq

free_irq claims that for a shared IRQ the caller must ensure that the interrupt is
disabled on the card it drives before calling it. will soon test and
send a proper patch

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
---
 drivers/net/wireless/ath/ath9k/pci.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
index d67d6ee..35537ca 100644
--- a/drivers/net/wireless/ath/ath9k/pci.c
+++ b/drivers/net/wireless/ath/ath9k/pci.c
@@ -285,11 +285,18 @@ static void ath_pci_remove(struct pci_dev *pdev)
 {
 	struct ieee80211_hw *hw = pci_get_drvdata(pdev);
 	struct ath_softc *sc = hw->priv;
+	struct ath_hw *ah = sc->sc_ah;
 	void __iomem *mem = sc->mem;
 
 	if (!is_ath9k_unloaded)
 		sc->sc_ah->ah_flags |= AH_UNPLUGGED;
+
+	/* disable interrupts */
+	ah->imask &= ~ATH9K_INT_GLOBAL;
+	ath9k_hw_disable_interrupts(ah);
+
 	ath9k_deinit_device(sc);
+
 	free_irq(sc->irq, sc);
 	ieee80211_free_hw(sc->hw);
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-09-26  9:24             ` Mohammed Shafi
@ 2011-09-27 21:42               ` Clemens Buchacher
  2011-09-29  8:18                 ` Mohammed Shafi
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-09-27 21:42 UTC (permalink / raw)
  To: Mohammed Shafi; +Cc: linux-wireless, beta992

Hello Mohammed,

On Mon, Sep 26, 2011 at 02:54:18PM +0530, Mohammed Shafi wrote:
> 
> is the issue still seen? any improvement with the attached patch..

I still have the issue. No change with current wl-testing plus your
patch.

Below dmesg output of:

 modprobe ath9k debug=0x0000fffb
 echo mem >/sys/power/state
 # resume

Clemens

---
[   27.575562] ath9k 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[   27.577970] ath9k 0000:02:00.0: setting latency timer to 64
[   27.581405] ath: UNDEFINED -> AWAKE
[   27.581469] ath: serialize_regmode is 0
[   27.626195] ath: Eeprom VER: 14, REV: 13
[   27.626199] ath: Initialize ANI
[   27.626203] ath: Enable MIB counters
[   27.626240] ath: Setup TX queue: 9
[   27.626243] ath: Set queue properties for: 9
[   27.626248] ath: Setup TX queue: 8
[   27.626251] ath: Set queue properties for: 8
[   27.626255] ath: Set queue properties for: 8
[   27.626259] ath: Reset TX queue: 8
[   27.626286] ath: tx ok 0x0 err 0x0 desc 0x100 eol 0x100 urn 0x0
[   27.626290] ath: Setup TX queue: 0
[   27.626293] ath: Set queue properties for: 0
[   27.626297] ath: Setup TX queue: 1
[   27.626300] ath: Set queue properties for: 1
[   27.626304] ath: Setup TX queue: 2
[   27.626307] ath: Set queue properties for: 2
[   27.626310] ath: Setup TX queue: 3
[   27.626313] ath: Set queue properties for: 3
[   27.627816] ath: TX streams 1, RX streams: 1
[   27.627819] ath: EEPROM regdomain: 0x60
[   27.627822] ath: EEPROM indicates we should expect a direct regpair map
[   27.627828] ath: Country alpha2 being used: 00
[   27.627830] ath: Regpair used: 0x60
[   27.627838] ath: tx DMA: 512 buffers 1 desc/buf
[   27.627857] ath: tx DMA map: ffff88006f250000 (54600) -> 6f250000 (54600)
[   27.627891] ath: beacon DMA: 4 buffers 1 desc/buf
[   27.627897] ath: beacon DMA map: ffff88007198d000 (416) -> 7198d000 (416)
[   27.627902] ath: cachelsz 64 rxbufsize 1926
[   27.627906] ath: rx DMA: 512 buffers 1 desc/buf
[   27.627924] ath: rx DMA map: ffff88006f2b0000 (54600) -> 6f2b0000 (54600)
[   27.629650] ieee80211 phy1: Selected rate control algorithm 'ath9k_rate_control'
[   27.630460] Registered led device: ath9k-phy1
[   27.630475] ieee80211 phy1: Atheros AR9285 Rev:2 mem=0xffffc90021ba0000, irq=17
[   43.855411] PM: Syncing filesystems ... done.
[   43.948410] PM: Preparing system for mem sleep
[   43.949937] Freezing user space processes ... (elapsed 0.01 seconds) done.
[   43.963254] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[   43.976527] PM: Entering mem sleep
[   43.976595] Suspending console(s) (use no_console_suspend to debug)
[   43.977193] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   43.989161] jmb38x_ms 0000:06:00.3: PCI INT B disabled
[   43.989194] sdhci-pci 0000:06:00.0: PCI INT B disabled
[   43.989232] ath: AWAKE -> FULL-SLEEP
[   43.998370] sd 0:0:0:0: [sda] Stopping disk
[   44.196369] snd_hda_intel 0000:00:1b.0: PCI INT A disabled
[   44.608701] PM: suspend of devices complete after 633.019 msecs
[   44.609142] jme 0000:06:00.5: PME# enabled
[   44.609152] jme 0000:06:00.5: wake-up capability enabled by ACPI
[   44.648588] PM: late suspend of devices complete after 39.960 msecs
[   44.648748] ACPI: Preparing to enter system sleep state S3
[   44.951903] PM: Saving platform NVS memory
[   44.952780] Disabling non-boot CPUs ...
[   44.955081] CPU 1 is now offline
[   45.057696] CPU 2 is now offline
[   45.160822] CPU 3 is now offline
[   45.161227] Extended CMOS year: 2000
[   45.161461] ACPI: Low-level resume complete
[   45.161521] PM: Restoring platform NVS memory
[   45.161951] Extended CMOS year: 2000
[   45.161974] Enabling non-boot CPUs ...
[   45.162070] Booting Node 0 Processor 1 APIC 0x4
[   45.162071] smpboot cpu 1: start_ip = 96000
[   45.173166] Calibrating delay loop (skipped) already calibrated this CPU
[   45.193433] NMI watchdog enabled, takes one hw-pmu counter.
[   45.193554] CPU1 is up
[   45.193622] Booting Node 0 Processor 2 APIC 0x1
[   45.193623] smpboot cpu 2: start_ip = 96000
[   45.194742] Switched to NOHz mode on CPU #1
[   45.204719] Calibrating delay loop (skipped) already calibrated this CPU
[   45.225042] NMI watchdog enabled, takes one hw-pmu counter.
[   45.225176] CPU2 is up
[   45.225257] Booting Node 0 Processor 3 APIC 0x5
[   45.225259] smpboot cpu 3: start_ip = 96000
[   45.228010] Switched to NOHz mode on CPU #2
[   45.236352] Calibrating delay loop (skipped) already calibrated this CPU
[   45.256763] NMI watchdog enabled, takes one hw-pmu counter.
[   45.256925] CPU3 is up
[   45.258831] Switched to NOHz mode on CPU #3
[   45.259072] ACPI: Waking up from system sleep state S3
[   45.580717] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 20253, limit 9000
[   45.658244] i915 0000:00:02.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[   45.658253] i915 0000:00:02.0: restoring config space at offset 0x1 (was 0x900007, writing 0x900407)
[   45.658277] mei 0000:00:16.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[   45.658296] mei 0000:00:16.0: restoring config space at offset 0x4 (was 0x4, writing 0xd540a004)
[   45.658304] mei 0000:00:16.0: restoring config space at offset 0x1 (was 0x180000, writing 0x180006)
[   45.658332] pci 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[   45.658350] pci 0000:00:1a.0: restoring config space at offset 0x4 (was 0x0, writing 0xd5408000)
[   45.658358] pci 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900006)
[   45.658391] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x105)
[   45.658410] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x4 (was 0x4, writing 0xd5400004)
[   45.658415] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[   45.658421] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100004, writing 0x100002)
[   45.658455] pcieport 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x10010b)
[   45.658467] pcieport 0000:00:1c.0: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c717c61)
[   45.658479] pcieport 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
[   45.658485] pcieport 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
[   45.658538] pcieport 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x100203)
[   45.658550] pcieport 0000:00:1c.1: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c517c41)
[   45.658561] pcieport 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
[   45.658568] pcieport 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
[   45.658620] pcieport 0000:00:1c.2: restoring config space at offset 0xf (was 0x300, writing 0x100304)
[   45.658632] pcieport 0000:00:1c.2: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c317c21)
[   45.658643] pcieport 0000:00:1c.2: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
[   45.658650] pcieport 0000:00:1c.2: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
[   45.658703] pcieport 0000:00:1c.5: restoring config space at offset 0xf (was 0x200, writing 0x100203)
[   45.658714] pcieport 0000:00:1c.5: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c117c01)
[   45.658726] pcieport 0000:00:1c.5: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
[   45.658732] pcieport 0000:00:1c.5: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
[   45.658780] pci 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[   45.658799] pci 0000:00:1d.0: restoring config space at offset 0x4 (was 0x0, writing 0xd5407000)
[   45.658807] pci 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900006)
[   45.658831] pci 0000:00:1e.0: restoring config space at offset 0xf (was 0x0, writing 0x100000)
[   45.658930] ahci 0000:00:1f.2: restoring config space at offset 0x1 (was 0x2b00007, writing 0x2b00407)
[   45.659017] intel ips 0000:00:1f.6: restoring config space at offset 0x1 (was 0x100400, writing 0x100406)
[   45.659057] ath9k 0000:02:00.0: restoring config space at offset 0xf (was 0x1ff, writing 0x103)
[   45.659077] ath9k 0000:02:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xd2c00004)
[   45.659082] ath9k 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[   45.659088] ath9k 0000:02:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[   45.659129] sdhci-pci 0000:06:00.0: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
[   45.659157] sdhci-pci 0000:06:00.0: restoring config space at offset 0x4 (was 0x0, writing 0xd0407000)
[   45.659163] sdhci-pci 0000:06:00.0: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
[   45.659172] sdhci-pci 0000:06:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[   45.659218] pci 0000:06:00.2: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
[   45.659246] pci 0000:06:00.2: restoring config space at offset 0x4 (was 0x0, writing 0xd0406000)
[   45.659253] pci 0000:06:00.2: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
[   45.659262] pci 0000:06:00.2: restoring config space at offset 0x1 (was 0x100000, writing 0x100003)
[   45.659308] jmb38x_ms 0000:06:00.3: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
[   45.659336] jmb38x_ms 0000:06:00.3: restoring config space at offset 0x4 (was 0x0, writing 0xd0405000)
[   45.659342] jmb38x_ms 0000:06:00.3: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
[   45.659351] jmb38x_ms 0000:06:00.3: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[   45.659397] pci 0000:06:00.4: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
[   45.659425] pci 0000:06:00.4: restoring config space at offset 0x4 (was 0x0, writing 0xd0404000)
[   45.659431] pci 0000:06:00.4: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
[   45.659441] pci 0000:06:00.4: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[   45.659487] jme 0000:06:00.5: restoring config space at offset 0xf (was 0x100, writing 0x103)
[   45.659508] jme 0000:06:00.5: restoring config space at offset 0x7 (was 0x1, writing 0xa001)
[   45.659520] jme 0000:06:00.5: restoring config space at offset 0x4 (was 0x0, writing 0xd0400000)
[   45.659526] jme 0000:06:00.5: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
[   45.659535] jme 0000:06:00.5: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
[   45.659759] PM: early resume of devices complete after 1.603 msecs
[   45.659979] i915 0000:00:02.0: setting latency timer to 64
[   45.659996] snd_hda_intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[   45.660003] snd_hda_intel 0000:00:1b.0: setting latency timer to 64
[   45.660050] snd_hda_intel 0000:00:1b.0: irq 45 for MSI/MSI-X
[   45.660126] pci 0000:00:1e.0: setting latency timer to 64
[   45.660144] ahci 0000:00:1f.2: setting latency timer to 64
[   45.660211] ath: FULL-SLEEP -> AWAKE
[   45.663932] sdhci-pci 0000:06:00.0: PCI INT B -> GSI 18 (level, low) -> IRQ 18
[   45.663974] sdhci-pci 0000:06:00.0: setting latency timer to 64
[   45.664037] jmb38x_ms 0000:06:00.3: PCI INT B -> GSI 18 (level, low) -> IRQ 18
[   45.664051] jmb38x_ms 0000:06:00.3: setting latency timer to 64
[   45.664113] jme 0000:06:00.5: wake-up capability disabled by ACPI
[   45.664124] jme 0000:06:00.5: PME# disabled
[   45.664503] ath: disable IER
[   45.664521] ath: Disable MIB counters
[   45.664560] ath: disable IER
[   45.664575] ath: 0x30 => 0x30
[   45.664576] ath: new IMR 0x30
[   45.664672] sd 0:0:0:0: [sda] Starting disk
[   45.667161] ath: Getting spur idx:0 is2Ghz:1 val:8000
[   45.668810] ath: Reset TX queue: 0
[   45.668813] ath: tx ok 0x0 err 0x0 desc 0x101 eol 0x101 urn 0x0
[   45.668815] ath: Reset TX queue: 1
[   45.668817] ath: tx ok 0x0 err 0x0 desc 0x103 eol 0x103 urn 0x0
[   45.668819] ath: Reset TX queue: 2
[   45.668821] ath: tx ok 0x0 err 0x0 desc 0x107 eol 0x107 urn 0x0
[   45.668823] ath: Reset TX queue: 3
[   45.668825] ath: tx ok 0x0 err 0x0 desc 0x10f eol 0x10f urn 0x0
[   45.668827] ath: Reset TXQ, inactive queue: 4
[   45.668828] ath: Reset TXQ, inactive queue: 5
[   45.668830] ath: Reset TXQ, inactive queue: 6
[   45.668831] ath: Reset TXQ, inactive queue: 7
[   45.668832] ath: Reset TX queue: 8
[   45.668842] ath: tx ok 0x0 err 0x0 desc 0x10f eol 0x10f urn 0x0
[   45.668844] ath: Reset TX queue: 9
[   45.668854] ath: tx ok 0x0 err 0x0 desc 0x10f eol 0x10f urn 0x0
[   45.668861] ath: ah->misc_mode 0x4
[   45.670264] ath: Running PA Calibration
[   45.670963] ath: enabling ADC Gain Calibration.
[   45.670964] ath: enabling ADC DC Calibration.
[   45.670965] ath: enabling IQ Calibration.
[   45.670969] ath: starting ADC Gain Calibration
[   45.671139] ath: AWAKE -> FULL-SLEEP
[   45.679552] Extended CMOS year: 2000
[   45.983193] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   45.985133] ata2.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[   45.985136] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[   45.988885] ata2.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[   45.988889] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[   45.989852] ata6: SATA link down (SStatus 0 SControl 300)
[   45.990375] ata2.00: configured for UDMA/100
[   46.120272] irq 17: nobody cared (try booting with the "irqpoll" option)
[   46.120275] Pid: 0, comm: swapper Tainted: G         C  3.1.0-rc7-wl-ARCH3.1.0-rc7-ath9k-debug #2
[   46.120277] Call Trace:
[   46.120280]  <IRQ>  [<ffffffff810c68bd>] __report_bad_irq+0x3d/0xe0
[   46.120290]  [<ffffffff810c6cfd>] note_interrupt+0x14d/0x210
[   46.120293]  [<ffffffff810c4c39>] handle_irq_event_percpu+0xc9/0x2a0
[   46.120296]  [<ffffffff810c4e58>] handle_irq_event+0x48/0x70
[   46.120299]  [<ffffffff810c763a>] handle_fasteoi_irq+0x5a/0xe0
[   46.120303]  [<ffffffff81016982>] handle_irq+0x22/0x40
[   46.120308]  [<ffffffff813ec09a>] do_IRQ+0x5a/0xe0
[   46.120312]  [<ffffffff813e8cee>] common_interrupt+0x6e/0x6e
[   46.120313]  <EOI>  [<ffffffff8131234a>] ? poll_idle+0x3a/0x80
[   46.120319]  [<ffffffff81312323>] ? poll_idle+0x13/0x80
[   46.120321]  [<ffffffff813126f6>] cpuidle_idle_call+0xc6/0x350
[   46.120324]  [<ffffffff81013229>] cpu_idle+0xc9/0x120
[   46.120328]  [<ffffffff813c64e7>] rest_init+0x9b/0xa4
[   46.120330]  [<ffffffff8174dc15>] start_kernel+0x3bf/0x3cc
[   46.120333]  [<ffffffff8174d347>] x86_64_start_reservations+0x132/0x136
[   46.120336]  [<ffffffff8174d140>] ? early_idt_handlers+0x140/0x140
[   46.120338]  [<ffffffff8174d44d>] x86_64_start_kernel+0x102/0x111
[   46.120340] handlers:
[   46.120344] [<ffffffffa053c020>] ath_isr
[   46.120345] Disabling IRQ #17
[   48.371839] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   48.479286] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[   48.479457] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[   48.479463] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[   48.516626] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[   48.516804] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[   48.516809] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[   48.518731] ata1.00: configured for UDMA/133
[   48.526624] ath: FULL-SLEEP -> AWAKE
[   48.530116] ath: ah->misc_mode 0x4
[   48.530142] ath: AWAKE -> FULL-SLEEP
[   48.530172] PM: resume of devices complete after 2875.994 msecs
[   48.530607] PM: Finishing wakeup.
[   48.530608] Restarting tasks ... done.
[   48.533504] video LNXVIDEO:02: Restoring backlight state

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-09-27 21:42               ` Clemens Buchacher
@ 2011-09-29  8:18                 ` Mohammed Shafi
  2011-09-29 10:33                   ` Adrian Chadd
  0 siblings, 1 reply; 38+ messages in thread
From: Mohammed Shafi @ 2011-09-29  8:18 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: linux-wireless, beta992

On Wed, Sep 28, 2011 at 3:12 AM, Clemens Buchacher <drizzd@aon.at> wrote:
> Hello Mohammed,
>
> On Mon, Sep 26, 2011 at 02:54:18PM +0530, Mohammed Shafi wrote:
>>
>> is the issue still seen? any improvement with the attached patch..
>
> I still have the issue. No change with current wl-testing plus your
> patch.

Clemens thanks for taking time  and test it, I wll see if i can find
something else.

>
> Below dmesg output of:
>
>  modprobe ath9k debug=0x0000fffb
>  echo mem >/sys/power/state
>  # resume
>
> Clemens
>
> ---
> [   27.575562] ath9k 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [   27.577970] ath9k 0000:02:00.0: setting latency timer to 64
> [   27.581405] ath: UNDEFINED -> AWAKE
> [   27.581469] ath: serialize_regmode is 0
> [   27.626195] ath: Eeprom VER: 14, REV: 13
> [   27.626199] ath: Initialize ANI
> [   27.626203] ath: Enable MIB counters
> [   27.626240] ath: Setup TX queue: 9
> [   27.626243] ath: Set queue properties for: 9
> [   27.626248] ath: Setup TX queue: 8
> [   27.626251] ath: Set queue properties for: 8
> [   27.626255] ath: Set queue properties for: 8
> [   27.626259] ath: Reset TX queue: 8
> [   27.626286] ath: tx ok 0x0 err 0x0 desc 0x100 eol 0x100 urn 0x0
> [   27.626290] ath: Setup TX queue: 0
> [   27.626293] ath: Set queue properties for: 0
> [   27.626297] ath: Setup TX queue: 1
> [   27.626300] ath: Set queue properties for: 1
> [   27.626304] ath: Setup TX queue: 2
> [   27.626307] ath: Set queue properties for: 2
> [   27.626310] ath: Setup TX queue: 3
> [   27.626313] ath: Set queue properties for: 3
> [   27.627816] ath: TX streams 1, RX streams: 1
> [   27.627819] ath: EEPROM regdomain: 0x60
> [   27.627822] ath: EEPROM indicates we should expect a direct regpair map
> [   27.627828] ath: Country alpha2 being used: 00
> [   27.627830] ath: Regpair used: 0x60
> [   27.627838] ath: tx DMA: 512 buffers 1 desc/buf
> [   27.627857] ath: tx DMA map: ffff88006f250000 (54600) -> 6f250000 (54600)
> [   27.627891] ath: beacon DMA: 4 buffers 1 desc/buf
> [   27.627897] ath: beacon DMA map: ffff88007198d000 (416) -> 7198d000 (416)
> [   27.627902] ath: cachelsz 64 rxbufsize 1926
> [   27.627906] ath: rx DMA: 512 buffers 1 desc/buf
> [   27.627924] ath: rx DMA map: ffff88006f2b0000 (54600) -> 6f2b0000 (54600)
> [   27.629650] ieee80211 phy1: Selected rate control algorithm 'ath9k_rate_control'
> [   27.630460] Registered led device: ath9k-phy1
> [   27.630475] ieee80211 phy1: Atheros AR9285 Rev:2 mem=0xffffc90021ba0000, irq=17
> [   43.855411] PM: Syncing filesystems ... done.
> [   43.948410] PM: Preparing system for mem sleep
> [   43.949937] Freezing user space processes ... (elapsed 0.01 seconds) done.
> [   43.963254] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> [   43.976527] PM: Entering mem sleep
> [   43.976595] Suspending console(s) (use no_console_suspend to debug)
> [   43.977193] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   43.989161] jmb38x_ms 0000:06:00.3: PCI INT B disabled
> [   43.989194] sdhci-pci 0000:06:00.0: PCI INT B disabled
> [   43.989232] ath: AWAKE -> FULL-SLEEP
> [   43.998370] sd 0:0:0:0: [sda] Stopping disk
> [   44.196369] snd_hda_intel 0000:00:1b.0: PCI INT A disabled
> [   44.608701] PM: suspend of devices complete after 633.019 msecs
> [   44.609142] jme 0000:06:00.5: PME# enabled
> [   44.609152] jme 0000:06:00.5: wake-up capability enabled by ACPI
> [   44.648588] PM: late suspend of devices complete after 39.960 msecs
> [   44.648748] ACPI: Preparing to enter system sleep state S3
> [   44.951903] PM: Saving platform NVS memory
> [   44.952780] Disabling non-boot CPUs ...
> [   44.955081] CPU 1 is now offline
> [   45.057696] CPU 2 is now offline
> [   45.160822] CPU 3 is now offline
> [   45.161227] Extended CMOS year: 2000
> [   45.161461] ACPI: Low-level resume complete
> [   45.161521] PM: Restoring platform NVS memory
> [   45.161951] Extended CMOS year: 2000
> [   45.161974] Enabling non-boot CPUs ...
> [   45.162070] Booting Node 0 Processor 1 APIC 0x4
> [   45.162071] smpboot cpu 1: start_ip = 96000
> [   45.173166] Calibrating delay loop (skipped) already calibrated this CPU
> [   45.193433] NMI watchdog enabled, takes one hw-pmu counter.
> [   45.193554] CPU1 is up
> [   45.193622] Booting Node 0 Processor 2 APIC 0x1
> [   45.193623] smpboot cpu 2: start_ip = 96000
> [   45.194742] Switched to NOHz mode on CPU #1
> [   45.204719] Calibrating delay loop (skipped) already calibrated this CPU
> [   45.225042] NMI watchdog enabled, takes one hw-pmu counter.
> [   45.225176] CPU2 is up
> [   45.225257] Booting Node 0 Processor 3 APIC 0x5
> [   45.225259] smpboot cpu 3: start_ip = 96000
> [   45.228010] Switched to NOHz mode on CPU #2
> [   45.236352] Calibrating delay loop (skipped) already calibrated this CPU
> [   45.256763] NMI watchdog enabled, takes one hw-pmu counter.
> [   45.256925] CPU3 is up
> [   45.258831] Switched to NOHz mode on CPU #3
> [   45.259072] ACPI: Waking up from system sleep state S3
> [   45.580717] intel ips 0000:00:1f.6: MCP limit exceeded: Avg temp 20253, limit 9000
> [   45.658244] i915 0000:00:02.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
> [   45.658253] i915 0000:00:02.0: restoring config space at offset 0x1 (was 0x900007, writing 0x900407)
> [   45.658277] mei 0000:00:16.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
> [   45.658296] mei 0000:00:16.0: restoring config space at offset 0x4 (was 0x4, writing 0xd540a004)
> [   45.658304] mei 0000:00:16.0: restoring config space at offset 0x1 (was 0x180000, writing 0x180006)
> [   45.658332] pci 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
> [   45.658350] pci 0000:00:1a.0: restoring config space at offset 0x4 (was 0x0, writing 0xd5408000)
> [   45.658358] pci 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900006)
> [   45.658391] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x105)
> [   45.658410] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x4 (was 0x4, writing 0xd5400004)
> [   45.658415] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
> [   45.658421] snd_hda_intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100004, writing 0x100002)
> [   45.658455] pcieport 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x10010b)
> [   45.658467] pcieport 0000:00:1c.0: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c717c61)
> [   45.658479] pcieport 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
> [   45.658485] pcieport 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> [   45.658538] pcieport 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x100203)
> [   45.658550] pcieport 0000:00:1c.1: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c517c41)
> [   45.658561] pcieport 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
> [   45.658568] pcieport 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> [   45.658620] pcieport 0000:00:1c.2: restoring config space at offset 0xf (was 0x300, writing 0x100304)
> [   45.658632] pcieport 0000:00:1c.2: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c317c21)
> [   45.658643] pcieport 0000:00:1c.2: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
> [   45.658650] pcieport 0000:00:1c.2: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> [   45.658703] pcieport 0000:00:1c.5: restoring config space at offset 0xf (was 0x200, writing 0x100203)
> [   45.658714] pcieport 0000:00:1c.5: restoring config space at offset 0x9 (was 0x1fff1, writing 0x7c117c01)
> [   45.658726] pcieport 0000:00:1c.5: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
> [   45.658732] pcieport 0000:00:1c.5: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> [   45.658780] pci 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> [   45.658799] pci 0000:00:1d.0: restoring config space at offset 0x4 (was 0x0, writing 0xd5407000)
> [   45.658807] pci 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900006)
> [   45.658831] pci 0000:00:1e.0: restoring config space at offset 0xf (was 0x0, writing 0x100000)
> [   45.658930] ahci 0000:00:1f.2: restoring config space at offset 0x1 (was 0x2b00007, writing 0x2b00407)
> [   45.659017] intel ips 0000:00:1f.6: restoring config space at offset 0x1 (was 0x100400, writing 0x100406)
> [   45.659057] ath9k 0000:02:00.0: restoring config space at offset 0xf (was 0x1ff, writing 0x103)
> [   45.659077] ath9k 0000:02:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xd2c00004)
> [   45.659082] ath9k 0000:02:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
> [   45.659088] ath9k 0000:02:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> [   45.659129] sdhci-pci 0000:06:00.0: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
> [   45.659157] sdhci-pci 0000:06:00.0: restoring config space at offset 0x4 (was 0x0, writing 0xd0407000)
> [   45.659163] sdhci-pci 0000:06:00.0: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
> [   45.659172] sdhci-pci 0000:06:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> [   45.659218] pci 0000:06:00.2: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
> [   45.659246] pci 0000:06:00.2: restoring config space at offset 0x4 (was 0x0, writing 0xd0406000)
> [   45.659253] pci 0000:06:00.2: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
> [   45.659262] pci 0000:06:00.2: restoring config space at offset 0x1 (was 0x100000, writing 0x100003)
> [   45.659308] jmb38x_ms 0000:06:00.3: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
> [   45.659336] jmb38x_ms 0000:06:00.3: restoring config space at offset 0x4 (was 0x0, writing 0xd0405000)
> [   45.659342] jmb38x_ms 0000:06:00.3: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
> [   45.659351] jmb38x_ms 0000:06:00.3: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> [   45.659397] pci 0000:06:00.4: restoring config space at offset 0xf (was 0x2ff, writing 0x204)
> [   45.659425] pci 0000:06:00.4: restoring config space at offset 0x4 (was 0x0, writing 0xd0404000)
> [   45.659431] pci 0000:06:00.4: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
> [   45.659441] pci 0000:06:00.4: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> [   45.659487] jme 0000:06:00.5: restoring config space at offset 0xf (was 0x100, writing 0x103)
> [   45.659508] jme 0000:06:00.5: restoring config space at offset 0x7 (was 0x1, writing 0xa001)
> [   45.659520] jme 0000:06:00.5: restoring config space at offset 0x4 (was 0x0, writing 0xd0400000)
> [   45.659526] jme 0000:06:00.5: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
> [   45.659535] jme 0000:06:00.5: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> [   45.659759] PM: early resume of devices complete after 1.603 msecs
> [   45.659979] i915 0000:00:02.0: setting latency timer to 64
> [   45.659996] snd_hda_intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
> [   45.660003] snd_hda_intel 0000:00:1b.0: setting latency timer to 64
> [   45.660050] snd_hda_intel 0000:00:1b.0: irq 45 for MSI/MSI-X
> [   45.660126] pci 0000:00:1e.0: setting latency timer to 64
> [   45.660144] ahci 0000:00:1f.2: setting latency timer to 64
> [   45.660211] ath: FULL-SLEEP -> AWAKE
> [   45.663932] sdhci-pci 0000:06:00.0: PCI INT B -> GSI 18 (level, low) -> IRQ 18
> [   45.663974] sdhci-pci 0000:06:00.0: setting latency timer to 64
> [   45.664037] jmb38x_ms 0000:06:00.3: PCI INT B -> GSI 18 (level, low) -> IRQ 18
> [   45.664051] jmb38x_ms 0000:06:00.3: setting latency timer to 64
> [   45.664113] jme 0000:06:00.5: wake-up capability disabled by ACPI
> [   45.664124] jme 0000:06:00.5: PME# disabled
> [   45.664503] ath: disable IER
> [   45.664521] ath: Disable MIB counters
> [   45.664560] ath: disable IER
> [   45.664575] ath: 0x30 => 0x30
> [   45.664576] ath: new IMR 0x30
> [   45.664672] sd 0:0:0:0: [sda] Starting disk
> [   45.667161] ath: Getting spur idx:0 is2Ghz:1 val:8000
> [   45.668810] ath: Reset TX queue: 0
> [   45.668813] ath: tx ok 0x0 err 0x0 desc 0x101 eol 0x101 urn 0x0
> [   45.668815] ath: Reset TX queue: 1
> [   45.668817] ath: tx ok 0x0 err 0x0 desc 0x103 eol 0x103 urn 0x0
> [   45.668819] ath: Reset TX queue: 2
> [   45.668821] ath: tx ok 0x0 err 0x0 desc 0x107 eol 0x107 urn 0x0
> [   45.668823] ath: Reset TX queue: 3
> [   45.668825] ath: tx ok 0x0 err 0x0 desc 0x10f eol 0x10f urn 0x0
> [   45.668827] ath: Reset TXQ, inactive queue: 4
> [   45.668828] ath: Reset TXQ, inactive queue: 5
> [   45.668830] ath: Reset TXQ, inactive queue: 6
> [   45.668831] ath: Reset TXQ, inactive queue: 7
> [   45.668832] ath: Reset TX queue: 8
> [   45.668842] ath: tx ok 0x0 err 0x0 desc 0x10f eol 0x10f urn 0x0
> [   45.668844] ath: Reset TX queue: 9
> [   45.668854] ath: tx ok 0x0 err 0x0 desc 0x10f eol 0x10f urn 0x0
> [   45.668861] ath: ah->misc_mode 0x4
> [   45.670264] ath: Running PA Calibration
> [   45.670963] ath: enabling ADC Gain Calibration.
> [   45.670964] ath: enabling ADC DC Calibration.
> [   45.670965] ath: enabling IQ Calibration.
> [   45.670969] ath: starting ADC Gain Calibration
> [   45.671139] ath: AWAKE -> FULL-SLEEP
> [   45.679552] Extended CMOS year: 2000
> [   45.983193] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [   45.985133] ata2.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
> [   45.985136] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [   45.988885] ata2.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
> [   45.988889] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [   45.989852] ata6: SATA link down (SStatus 0 SControl 300)
> [   45.990375] ata2.00: configured for UDMA/100
> [   46.120272] irq 17: nobody cared (try booting with the "irqpoll" option)
> [   46.120275] Pid: 0, comm: swapper Tainted: G         C  3.1.0-rc7-wl-ARCH3.1.0-rc7-ath9k-debug #2
> [   46.120277] Call Trace:
> [   46.120280]  <IRQ>  [<ffffffff810c68bd>] __report_bad_irq+0x3d/0xe0
> [   46.120290]  [<ffffffff810c6cfd>] note_interrupt+0x14d/0x210
> [   46.120293]  [<ffffffff810c4c39>] handle_irq_event_percpu+0xc9/0x2a0
> [   46.120296]  [<ffffffff810c4e58>] handle_irq_event+0x48/0x70
> [   46.120299]  [<ffffffff810c763a>] handle_fasteoi_irq+0x5a/0xe0
> [   46.120303]  [<ffffffff81016982>] handle_irq+0x22/0x40
> [   46.120308]  [<ffffffff813ec09a>] do_IRQ+0x5a/0xe0
> [   46.120312]  [<ffffffff813e8cee>] common_interrupt+0x6e/0x6e
> [   46.120313]  <EOI>  [<ffffffff8131234a>] ? poll_idle+0x3a/0x80
> [   46.120319]  [<ffffffff81312323>] ? poll_idle+0x13/0x80
> [   46.120321]  [<ffffffff813126f6>] cpuidle_idle_call+0xc6/0x350
> [   46.120324]  [<ffffffff81013229>] cpu_idle+0xc9/0x120
> [   46.120328]  [<ffffffff813c64e7>] rest_init+0x9b/0xa4
> [   46.120330]  [<ffffffff8174dc15>] start_kernel+0x3bf/0x3cc
> [   46.120333]  [<ffffffff8174d347>] x86_64_start_reservations+0x132/0x136
> [   46.120336]  [<ffffffff8174d140>] ? early_idt_handlers+0x140/0x140
> [   46.120338]  [<ffffffff8174d44d>] x86_64_start_kernel+0x102/0x111
> [   46.120340] handlers:
> [   46.120344] [<ffffffffa053c020>] ath_isr
> [   46.120345] Disabling IRQ #17
> [   48.371839] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [   48.479286] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [   48.479457] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
> [   48.479463] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [   48.516626] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> [   48.516804] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
> [   48.516809] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> [   48.518731] ata1.00: configured for UDMA/133
> [   48.526624] ath: FULL-SLEEP -> AWAKE
> [   48.530116] ath: ah->misc_mode 0x4
> [   48.530142] ath: AWAKE -> FULL-SLEEP
> [   48.530172] PM: resume of devices complete after 2875.994 msecs
> [   48.530607] PM: Finishing wakeup.
> [   48.530608] Restarting tasks ... done.
> [   48.533504] video LNXVIDEO:02: Restoring backlight state
>



-- 
shafi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-09-29  8:18                 ` Mohammed Shafi
@ 2011-09-29 10:33                   ` Adrian Chadd
  2011-09-29 17:11                     ` Clemens Buchacher
       [not found]                     ` <CAD2nsn0Z2J4r4tN_fLjx5bbvz2bg6NVcQ8vppJbbNcgOF8pFew@mail.gmail.com>
  0 siblings, 2 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-09-29 10:33 UTC (permalink / raw)
  To: Mohammed Shafi; +Cc: Clemens Buchacher, linux-wireless, beta992

Hm,

Has someone figured out which ISR bits are being triggered?

If not, I can likely whip up a patch which adds some relevant
printk's. I think it's worth establishing:

* is it a sync/async interrupt;
* is it a fatal interrupt (eg something like a PCI bus error or
transaction timeout) or is it a normal ISR bit that keeps firing;
* .. and which one is firing.


Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-09-29 10:33                   ` Adrian Chadd
@ 2011-09-29 17:11                     ` Clemens Buchacher
       [not found]                     ` <CAD2nsn0Z2J4r4tN_fLjx5bbvz2bg6NVcQ8vppJbbNcgOF8pFew@mail.gmail.com>
  1 sibling, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-09-29 17:11 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless, beta992

Hi Adrian,

On Thu, Sep 29, 2011 at 06:33:42PM +0800, Adrian Chadd wrote:
> 
> Has someone figured out which ISR bits are being triggered?

No.

I tried to move the SC_OP_INVALID check until after the call to
ath9k_hw_getisr in ath_isr. But that caused the kernel to freeze
IIRC.

What I also tried and failed to do a hardware reset before
request_irq gets called. For ath9k_hw_reset I need an initialized
ath_hw struct, but that's done after request_irq in
ath9k_init_device. I remember I tried to delay request_irq, but for
some reason that did not work out either.

> If not, I can likely whip up a patch which adds some relevant
> printk's. I think it's worth establishing:
> 
> * is it a sync/async interrupt;
> * is it a fatal interrupt (eg something like a PCI bus error or
> transaction timeout) or is it a normal ISR bit that keeps firing;
> * .. and which one is firing.

If you have an idea how to do this, I think that would be most
helpful.

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
       [not found]                       ` <CAJ-VmokcM4KmzV7Rn9PA68iEiTJiPw=ffYgNFLDAHShxD0HNAg@mail.gmail.com>
@ 2011-10-03  8:48                         ` Clemens Buchacher
  2011-10-04  7:58                           ` Adrian Chadd
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-03  8:48 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless

(re-added linux-wireless)

On Fri, Sep 30, 2011 at 01:46:06PM +0800, Adrian Chadd wrote:
> 
> if (sc->sc_flags & SC_OP_INVALID) {
> printk(KERN_ERR "AR_INTR_SYNC_CAUSE: 0x%08x, AR_INTR_ASYNC_CAUSE: 0x%08x\n",
>     REG_READ(ah, AR_INTR_SYNC_CAUSE), REG_READ(ah, AR_INTR_ASYNC_CAUSE));
> printk(KERN_ERR "AR_IMR: 0x%08x, AR_ISR: 0x%08x\n",
>     REG_READ(ah, AR_IMR),
>     REG_READ(ah, AR_ISR));
>     return IRQ_NONE;
> }

I only output the first 99 and then every 2^n'th interrupt. The log
gets truncated otherwise. If I load the module first, then
suspend/resume, I get this until "nobody cared".

for 1, 2, ..., 99, 128, 256:
AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
AR_IMR: 0xdeadbeef, AR_ISR: 0xdeadbeef

for 512:
AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
AR_IMR: 0x00000000, AR_ISR: 0x00000208

for 1024:
AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
AR_IMR: 0x81800964, AR_ISR: 0x00000008

for 2048:
AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
AR_IMR: 0xdeadbeef, AR_ISR: 0xdeadbeef

for 2^12, 2^13, ..., 2^16
AR_INTR_SYNC_CAUSE: 0x00020000, AR_INTR_ASYNC_CAUSE: 0x00000000
AR_IMR: 0xdeadbeef, AR_ISR: 0xdeadbeef

The last ones indeed correspond to AR_INTR_SYNC_MAC_SLEEP_ACCESS
that you mentioned. Note that this output is interleaved with
device initialization. So I don't know if that causes the register
contents to become valid, or if it changes the contents.

If I suspend/resume first, then load the module, then wait for 500
ms after request_irq I get only zeroes, repeated 109 times:

for 1, 2, ..., 99, 2^7, 2^8, ..., 2^16:
AR_IMR: 0x00000000, AR_ISR: 0x00000000
AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000

Clemens
---

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index edaa784..634256b 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -16,6 +16,7 @@
 
 #include <linux/nl80211.h>
 #include <linux/delay.h>
+#include <linux/log2.h>
 #include "ath9k.h"
 #include "btcoex.h"
 
@@ -765,8 +766,19 @@ irqreturn_t ath_isr(int irq, void *dev)
 	 * touch anything. Note this can happen early
 	 * on if the IRQ is shared.
 	 */
-	if (sc->sc_flags & SC_OP_INVALID)
+	if (sc->sc_flags & SC_OP_INVALID) {
+		static int count = 0;
+		if (count < INT_MAX)
+			count++;
+		if (count < 100 || is_power_of_2(count)) {
+			printk(KERN_ERR "AR_INTR_SYNC_CAUSE: 0x%08x, AR_INTR_ASYNC_CAUSE: 0x%08x\n",
+					REG_READ(ah, AR_INTR_SYNC_CAUSE), REG_READ(ah, AR_INTR_ASYNC_CAUSE));
+			printk(KERN_ERR "AR_IMR: 0x%08x, AR_ISR: 0x%08x\n",
+					REG_READ(ah, AR_IMR),
+					REG_READ(ah, AR_ISR));
+		}
 		return IRQ_NONE;
+	}
 
 
 	/* shared irq, not for us */
diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
index d67d6ee..76006a9 100644
--- a/drivers/net/wireless/ath/ath9k/pci.c
+++ b/drivers/net/wireless/ath/ath9k/pci.c
@@ -246,20 +246,24 @@ static int ath_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* Will be cleared in ath9k_start() */
 	sc->sc_flags |= SC_OP_INVALID;
 
+	ret = ath9k_init_device(id->device, sc, &ath_pci_bus_ops);
+	if (ret) {
+		dev_err(&pdev->dev, "Failed to initialize device\n");
+		goto err_init;
+	}
+
+	printk(KERN_INFO "ath9k: requesting irq\n");
 	ret = request_irq(pdev->irq, ath_isr, IRQF_SHARED, "ath9k", sc);
 	if (ret) {
 		dev_err(&pdev->dev, "request_irq failed\n");
 		goto err_irq;
 	}
+	printk(KERN_INFO "wait 500 ms\n");
+	msleep(500);
+	printk(KERN_INFO "done waiting\n");
 
 	sc->irq = pdev->irq;
 
-	ret = ath9k_init_device(id->device, sc, &ath_pci_bus_ops);
-	if (ret) {
-		dev_err(&pdev->dev, "Failed to initialize device\n");
-		goto err_init;
-	}
-
 	ath9k_hw_name(sc->sc_ah, hw_name, sizeof(hw_name));
 	wiphy_info(hw->wiphy, "%s mem=0x%lx, irq=%d\n",
 		   hw_name, (unsigned long)mem, pdev->irq);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-03  8:48                         ` Clemens Buchacher
@ 2011-10-04  7:58                           ` Adrian Chadd
  2011-10-04 18:15                             ` Clemens Buchacher
  2011-10-04 18:36                             ` Clemens Buchacher
  0 siblings, 2 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-10-04  7:58 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless

Right, this looks a bit evil. ;)

On 3 October 2011 16:48, Clemens Buchacher <drizzd@aon.at> wrote:

> for 1, 2, ..., 99, 128, 256:
> AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
> AR_IMR: 0xdeadbeef, AR_ISR: 0xdeadbeef

.. that says some part of the MAC is very asleep. But no interrupt
cause bits are set. So I bet it's a shared interrupt.

> for 512:
> AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
> AR_IMR: 0x00000000, AR_ISR: 0x00000208

That looks valid, but why are you receiving an interrupt here? Is it
some other device on a shared irq? Or a non-acked interrupt?
Ie - there's no sync/async cause bits set and IMR is 0, the status
bits are no tx pkt / no rx pkt. So that looks fine. You shouldn't have
received an interrupt.

> for 1024:
> AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
> AR_IMR: 0x81800964, AR_ISR: 0x00000008

.. hm,  why'd you receive an interrupt here? the ISR bits aren't set that way.
Again, I think this is another shared or non-acked interrupt coming in.

> for 2048:
> AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
> AR_IMR: 0xdeadbeef, AR_ISR: 0xdeadbeef
>
> for 2^12, 2^13, ..., 2^16
> AR_INTR_SYNC_CAUSE: 0x00020000, AR_INTR_ASYNC_CAUSE: 0x00000000
> AR_IMR: 0xdeadbeef, AR_ISR: 0xdeadbeef
>
> The last ones indeed correspond to AR_INTR_SYNC_MAC_SLEEP_ACCESS
> that you mentioned. Note that this output is interleaved with
> device initialization. So I don't know if that causes the register
> contents to become valid, or if it changes the contents.
>
> If I suspend/resume first, then load the module, then wait for 500
> ms after request_irq I get only zeroes, repeated 109 times:
>
> for 1, 2, ..., 99, 2^7, 2^8, ..., 2^16:
> AR_IMR: 0x00000000, AR_ISR: 0x00000000
> AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000

Would you mind also printing the contents of AR_IER there? I wonder if
AR_IER is incorrectly being enabled.

I think this may be a shared interrupt. What else is using the same
IRQ line? What about only printing out those lines if the interrupt is
actually generated by the NIC? There's a function further down in the
isr routine which checks whether the NIC itself posted the interrupt
(it checks SYNC_CAUSE/ASYNC_CAUSE.) That should cut down the spam
quite a bit and only log the relevant entries for ath9k.




Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-04  7:58                           ` Adrian Chadd
@ 2011-10-04 18:15                             ` Clemens Buchacher
  2011-10-04 21:11                               ` Adrian Chadd
  2011-10-04 18:36                             ` Clemens Buchacher
  1 sibling, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-04 18:15 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless

On Tue, Oct 04, 2011 at 03:58:09PM +0800, Adrian Chadd wrote:
> 
> On 3 October 2011 16:48, Clemens Buchacher <drizzd@aon.at> wrote:
> 
> > for 1, 2, ..., 99, 128, 256:
> > AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
> > AR_IMR: 0xdeadbeef, AR_ISR: 0xdeadbeef
> 
> .. that says some part of the MAC is very asleep. But no interrupt
> cause bits are set. So I bet it's a shared interrupt.

Unfortunately, that's not it. From /proc/interrupts:

            CPU0       CPU1       CPU2       CPU3       
  17:      24958      25047      25012      24984   IO-APIC-fasteoi   ath9k
  
This is right after the IRQ is disabled because of too many
unhandled interrupts.

> > If I suspend/resume first, then load the module, then wait for 500
> > ms after request_irq I get only zeroes, repeated 109 times:
> >
> > for 1, 2, ..., 99, 2^7, 2^8, ..., 2^16:
> > AR_IMR: 0x00000000, AR_ISR: 0x00000000
> > AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
> 
> Would you mind also printing the contents of AR_IER there? I wonder if
> AR_IER is incorrectly being enabled.

Not much more helpful, I'm afraid:

AR_IMR: 0x00000000, AR_ISR: 0x00000000, AR_IER: 0x00000000
AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
(repeated 109 times)

It seems that at this point the device is in a bad state.

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-04  7:58                           ` Adrian Chadd
  2011-10-04 18:15                             ` Clemens Buchacher
@ 2011-10-04 18:36                             ` Clemens Buchacher
  1 sibling, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-04 18:36 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless

On Tue, Oct 04, 2011 at 03:58:09PM +0800, Adrian Chadd wrote:
> 
> There's a function further down in the isr routine which checks
> whether the NIC itself posted the interrupt (it checks
> SYNC_CAUSE/ASYNC_CAUSE.) That should cut down the spam quite a
> bit and only log the relevant entries for ath9k.

If I do this no interrupts are logged at all.

In previous traces SYNC_CAUSE became non-zero only because the
debugging output delayed the interrupt handler enough to give the
initialization code a chance to run.

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-04 18:15                             ` Clemens Buchacher
@ 2011-10-04 21:11                               ` Adrian Chadd
  2011-10-05  6:28                                 ` Clemens Buchacher
  0 siblings, 1 reply; 38+ messages in thread
From: Adrian Chadd @ 2011-10-04 21:11 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless

On 5 October 2011 02:15, Clemens Buchacher <drizzd@aon.at> wrote:
>> Would you mind also printing the contents of AR_IER there? I wonder if
>> AR_IER is incorrectly being enabled.
>
> Not much more helpful, I'm afraid:
>
> AR_IMR: 0x00000000, AR_ISR: 0x00000000, AR_IER: 0x00000000
> AR_INTR_SYNC_CAUSE: 0x00000000, AR_INTR_ASYNC_CAUSE: 0x00000000
> (repeated 109 times)
>
> It seems that at this point the device is in a bad state.

Well, the question now is whether the device itself is actually firing
off interrupts.

>From what you've shown there, the device _isn't_ firing off interrupts
the normal way.
So I wonder if it's the card or whether it's something else (eg the
PCI bus glue.)

Here, you could try inserting this in the SC_OP_INVALID bit, right
after you do some logging. I'm still assuming your AR9285 is somehow
firing off interrupts.
At this point however I'm all just guessing here. It'd be nice to know
whether the device is _really_ sending interrupts or not.

/* Mark _all_ async interrupts as serviced */
REG_WRITE(ah, AR_INTR_ASYNC_CAUSE, 0xffffffff);
REG_READ(ah, AR_INTR_ASYNC_CAUSE);

/* Disable sync interrupt sources */
REG_WRITE(ah, AR_INTR_SYNC_ENABLE, 0);
/* Mark _all_ sync interrupts as serviced */
REG_WRITE(ah, AR_INTR_SYNC_CAUSE, 0xffffffff);
REG_READ(ah, AR_INTR_SYNC_CAUSE);

/* Do a reset of the AHB and host interface */
REG_WRITE(ah, AR_RC, AR_RC_AHB | AR_RC_HOSTIF);
DELAY(100);
REG_WRITE(ah, AR_RC, 0);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-04 21:11                               ` Adrian Chadd
@ 2011-10-05  6:28                                 ` Clemens Buchacher
  2011-10-05 13:02                                   ` Adrian Chadd
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-05  6:28 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless

On Wed, Oct 05, 2011 at 05:11:00AM +0800, Adrian Chadd wrote:
> 
> Well, the question now is whether the device itself is actually firing
> off interrupts.
> 
> From what you've shown there, the device _isn't_ firing off interrupts
> the normal way.
> So I wonder if it's the card or whether it's something else (eg the
> PCI bus glue.)
> 
> Here, you could try inserting this in the SC_OP_INVALID bit, right
> after you do some logging. I'm still assuming your AR9285 is somehow
> firing off interrupts.
> At this point however I'm all just guessing here. It'd be nice to know
> whether the device is _really_ sending interrupts or not.

No change with the code below, except that the timing is slightly
different.

> /* Mark _all_ async interrupts as serviced */
> REG_WRITE(ah, AR_INTR_ASYNC_CAUSE, 0xffffffff);
> REG_READ(ah, AR_INTR_ASYNC_CAUSE);
> 
> /* Disable sync interrupt sources */
> REG_WRITE(ah, AR_INTR_SYNC_ENABLE, 0);
> /* Mark _all_ sync interrupts as serviced */
> REG_WRITE(ah, AR_INTR_SYNC_CAUSE, 0xffffffff);
> REG_READ(ah, AR_INTR_SYNC_CAUSE);
> 
> /* Do a reset of the AHB and host interface */
> REG_WRITE(ah, AR_RC, AR_RC_AHB | AR_RC_HOSTIF);
> DELAY(100);

Since I could not find a suitable DELAY macro, I used this instead:

 int delay = 100;
 DO_DELAY(delay);

> REG_WRITE(ah, AR_RC, 0);

For the record, instead of going to full sleep, I can reproduce the
issue with

 echo devices >/sys/power/pm_test
 echo mem >/sys/power/state

but not with

 echo freezer >/sys/power/pm_test
 echo mem >/sys/power/state

So it could still be a different device that's causing the issue? I
can try unloading all kinds of modules. Or is there an option to
suspend just one device?

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-05  6:28                                 ` Clemens Buchacher
@ 2011-10-05 13:02                                   ` Adrian Chadd
  2011-10-12 13:10                                     ` Mohammed Shafi
  0 siblings, 1 reply; 38+ messages in thread
From: Adrian Chadd @ 2011-10-05 13:02 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless

On 5 October 2011 14:28, Clemens Buchacher <drizzd@aon.at> wrote:
> For the record, instead of going to full sleep, I can reproduce the
> issue with
>
>  echo devices >/sys/power/pm_test
>  echo mem >/sys/power/state
>
> but not with
>
>  echo freezer >/sys/power/pm_test
>  echo mem >/sys/power/state
>
> So it could still be a different device that's causing the issue? I
> can try unloading all kinds of modules. Or is there an option to
> suspend just one device?

Right. So its likely something (more) odd. Yes, see if that's the case
:) At this point I'm going to punt this to the power management/ACPI
people .:)

Good luck!


Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-05 13:02                                   ` Adrian Chadd
@ 2011-10-12 13:10                                     ` Mohammed Shafi
  2011-10-15  9:39                                       ` Clemens Buchacher
  0 siblings, 1 reply; 38+ messages in thread
From: Mohammed Shafi @ 2011-10-12 13:10 UTC (permalink / raw)
  To: Adrian Chadd, Clemens Buchacher; +Cc: linux-wireless

Hi Clemens,

please try this in case it helps anything
to debug the issue.

 this should avoid dead beef and the interrupt when we access MAC
registers when the MAC is asleep. also i did not see MAC irq being not
triggerred in your case(AR_INTR_ASYNC_CAUSE should be 0x2) or may be
its not an interrupt triggered by the chip itself. thanks

diff --git a/drivers/net/wireless/ath/ath9k/main.c
b/drivers/net/wireless/ath/ath9k/main.c
index 93fbe6f..51b7c54 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -770,6 +770,7 @@ irqreturn_t ath_isr(int irq, void *dev)
        enum ath9k_int status;
        bool sched = false;

+       ath9k_ps_wakeup(sc);
        /*
         * The hardware is not ready/present, don't
         * touch anything. Note this can happen early


On Wed, Oct 5, 2011 at 6:32 PM, Adrian Chadd <adrian@freebsd.org> wrote:
> On 5 October 2011 14:28, Clemens Buchacher <drizzd@aon.at> wrote:
>> For the record, instead of going to full sleep, I can reproduce the
>> issue with
>>
>>  echo devices >/sys/power/pm_test
>>  echo mem >/sys/power/state
>>
>> but not with
>>
>>  echo freezer >/sys/power/pm_test
>>  echo mem >/sys/power/state
>>
>> So it could still be a different device that's causing the issue? I
>> can try unloading all kinds of modules. Or is there an option to
>> suspend just one device?
>
> Right. So its likely something (more) odd. Yes, see if that's the case
> :) At this point I'm going to punt this to the power management/ACPI
> people .:)
>
> Good luck!
>
>
> Adrian
>



-- 
shafi

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-12 13:10                                     ` Mohammed Shafi
@ 2011-10-15  9:39                                       ` Clemens Buchacher
  2011-10-15 10:01                                         ` Adrian Chadd
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-15  9:39 UTC (permalink / raw)
  To: Mohammed Shafi; +Cc: Adrian Chadd, linux-wireless

Hi Mohammed,

On Wed, Oct 12, 2011 at 06:40:06PM +0530, Mohammed Shafi wrote:
> 
> please try this in case it helps anything
> to debug the issue.

With this, the previous instaces of 0xdeadbeaf are replaced with
zeroes, and I do not get any non-zero traces for SYNC_CAUSE or
ASYNC_CAUSE any more.

>  this should avoid dead beef and the interrupt when we access MAC
> registers when the MAC is asleep. also i did not see MAC irq being not
> triggerred in your case(AR_INTR_ASYNC_CAUSE should be 0x2) or may be
> its not an interrupt triggered by the chip itself. thanks

If it's not triggered by ath9k, I do wonder why it works if I
suspend in connected mode, and why reloading the driver usually
fixes the issue. Maybe there is something in the driver unloading
code that we could look at?

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-15  9:39                                       ` Clemens Buchacher
@ 2011-10-15 10:01                                         ` Adrian Chadd
  2011-10-18  6:44                                           ` Clemens Buchacher
  0 siblings, 1 reply; 38+ messages in thread
From: Adrian Chadd @ 2011-10-15 10:01 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless

I've found some perhaps relevant bits of code here which force the
chip into and out of reset/sleep states.

It's possible that your BIOS doesn't fully reset the NIC upon startup.
It's also possible there are issues with the ath9k power save/resume
code. It's also possible they're all compounding to make things worse.
Are you able to completely and utterly disable any/all power saving on
ath9k, including APSM if you have it enabled?

Unfortunately the information I have is a bit sketchy. I'll do some
further digging (and hope I don't bug Atheros people too much :) to
see if I can extract out some more information about what's going on.

Thanks,


Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-15 10:01                                         ` Adrian Chadd
@ 2011-10-18  6:44                                           ` Clemens Buchacher
  2011-10-18  7:05                                             ` Adrian Chadd
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-18  6:44 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless

On Sat, Oct 15, 2011 at 06:01:38PM +0800, Adrian Chadd wrote:
>
> Are you able to completely and utterly disable any/all power saving on
> ath9k, including APSM if you have it enabled?

Unfortunately, no change with the following options disabled:

 PCIEAER y -> n
 PCIEASPM y -> n
 PM_RUNTIME y -> n

Anything else I should disable?

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-18  6:44                                           ` Clemens Buchacher
@ 2011-10-18  7:05                                             ` Adrian Chadd
  2011-10-21 10:22                                               ` Clemens Buchacher
  0 siblings, 1 reply; 38+ messages in thread
From: Adrian Chadd @ 2011-10-18  7:05 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless

Hm, which version of the AR9285 is it, exactly?
Would you just do a quick check (or code hack) and see if it matches
the AR_SREV_9285SE_20 (or similar) macro?

(I see a WAR in the reference code for that particular model AR9285,
that isn't in ath9k.)

I'm honestly still not convinced that it's (soley) the AR9285. You
_could_ do something naughty like disabling all interrupts and then
powering off the NIC (but not detaching it from the bus glue, like
unloading the module does.)

Good luck,


Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-18  7:05                                             ` Adrian Chadd
@ 2011-10-21 10:22                                               ` Clemens Buchacher
  2011-10-21 14:10                                                   ` [ath9k-devel] " Adrian Chadd
  0 siblings, 1 reply; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-21 10:22 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless

On Tue, Oct 18, 2011 at 03:05:38PM +0800, Adrian Chadd wrote:
>
> Hm, which version of the AR9285 is it, exactly?
> Would you just do a quick check (or code hack) and see if it matches
> the AR_SREV_9285SE_20 (or similar) macro?

It matches AR_SREV_9285, but it does not match AR_SREV_9285E_20.

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-21 10:22                                               ` Clemens Buchacher
@ 2011-10-21 14:10                                                   ` Adrian Chadd
  0 siblings, 0 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-10-21 14:10 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless, ath9k-devel

On 21 October 2011 18:22, Clemens Buchacher <drizzd@aon.at> wrote:

> It matches AR_SREV_9285, but it does not match AR_SREV_9285E_20.

Ok. Well, I've read the ar9285 host documentation that I've gotten my
hands on and I can't see anything obvious. All I can think of is that
something else is throwing those interrupts - whether it's the NIC
throwing the error, or whether its the bus itself throwing an error
and you're seeing the AR9285 because it's the only thing attached. I
however know next to nothing about PCI/PCIe and the various fruity
behaviour seen.

I assume you have disabled the NIC power save option. Not just APSM,
etc, but the powersave options in iw. I don't know what they are,
sorry, only that they exist.

I suggest modifying ath_intr() so if there is an interrupt storm,
disable _all_ sources of interrupts (ie, set sync_cause, async_cause
to 0, set imr to 0, set ier to 0) and see if it continues occuring.

If the interrupt storm continue occuring, try putting the whole NIC to
sleep entirely (ie, call pcipowersave to put the NIC to sleep, then
make sure the NIC -stays- asleep somehow, ie so nothing else tries
waking it up - I don't know how to do this, likely you'll have to
clear some interface flag so the rest of the ath9k and 802.11 stack
thinks the nic is actually down for the count) and see if you're still
getting spurious interrupts.

If you are at this point, the only other possible thing I can think of
doing involves a PCIe bus analyser, and you/I don't have access to
those. :-)



Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [ath9k-devel] ath9k: irq storm after suspend/resume
@ 2011-10-21 14:10                                                   ` Adrian Chadd
  0 siblings, 0 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-10-21 14:10 UTC (permalink / raw)
  To: ath9k-devel

On 21 October 2011 18:22, Clemens Buchacher <drizzd@aon.at> wrote:

> It matches AR_SREV_9285, but it does not match AR_SREV_9285E_20.

Ok. Well, I've read the ar9285 host documentation that I've gotten my
hands on and I can't see anything obvious. All I can think of is that
something else is throwing those interrupts - whether it's the NIC
throwing the error, or whether its the bus itself throwing an error
and you're seeing the AR9285 because it's the only thing attached. I
however know next to nothing about PCI/PCIe and the various fruity
behaviour seen.

I assume you have disabled the NIC power save option. Not just APSM,
etc, but the powersave options in iw. I don't know what they are,
sorry, only that they exist.

I suggest modifying ath_intr() so if there is an interrupt storm,
disable _all_ sources of interrupts (ie, set sync_cause, async_cause
to 0, set imr to 0, set ier to 0) and see if it continues occuring.

If the interrupt storm continue occuring, try putting the whole NIC to
sleep entirely (ie, call pcipowersave to put the NIC to sleep, then
make sure the NIC -stays- asleep somehow, ie so nothing else tries
waking it up - I don't know how to do this, likely you'll have to
clear some interface flag so the rest of the ath9k and 802.11 stack
thinks the nic is actually down for the count) and see if you're still
getting spurious interrupts.

If you are at this point, the only other possible thing I can think of
doing involves a PCIe bus analyser, and you/I don't have access to
those. :-)



Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-21 14:10                                                   ` [ath9k-devel] " Adrian Chadd
@ 2011-10-21 19:03                                                     ` Clemens Buchacher
  -1 siblings, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-21 19:03 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless, ath9k-devel

On Fri, Oct 21, 2011 at 10:10:21PM +0800, Adrian Chadd wrote:
> 
> If the interrupt storm continue occuring, try putting the whole NIC to
> sleep entirely (ie, call pcipowersave to put the NIC to sleep, then
> make sure the NIC -stays- asleep somehow, ie so nothing else tries
> waking it up.

Is it possible to figure out whether or not it is asleep in the
interrupt routine?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [ath9k-devel] ath9k: irq storm after suspend/resume
@ 2011-10-21 19:03                                                     ` Clemens Buchacher
  0 siblings, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-21 19:03 UTC (permalink / raw)
  To: ath9k-devel

On Fri, Oct 21, 2011 at 10:10:21PM +0800, Adrian Chadd wrote:
> 
> If the interrupt storm continue occuring, try putting the whole NIC to
> sleep entirely (ie, call pcipowersave to put the NIC to sleep, then
> make sure the NIC -stays- asleep somehow, ie so nothing else tries
> waking it up.

Is it possible to figure out whether or not it is asleep in the
interrupt routine?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-21 14:10                                                   ` [ath9k-devel] " Adrian Chadd
@ 2011-10-21 20:20                                                     ` Clemens Buchacher
  -1 siblings, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-21 20:20 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless, ath9k-devel

On Fri, Oct 21, 2011 at 10:10:21PM +0800, Adrian Chadd wrote:
> 
> I assume you have disabled the NIC power save option. Not just APSM,
> etc, but the powersave options in iw. I don't know what they are,
> sorry, only that they exist.

I am not sure if that's what you mean, but if I set "iwconfig wlan0
power off" it makes no difference.

> I suggest modifying ath_intr() so if there is an interrupt storm,
> disable _all_ sources of interrupts (ie, set sync_cause, async_cause
> to 0, set imr to 0, set ier to 0) and see if it continues occuring.

They keep coming with this too.

> If the interrupt storm continue occuring, try putting the whole NIC to
> sleep entirely (ie, call pcipowersave to put the NIC to sleep, then
> make sure the NIC -stays- asleep somehow, ie so nothing else tries
> waking it up - I don't know how to do this, likely you'll have to
> clear some interface flag so the rest of the ath9k and 802.11 stack
> thinks the nic is actually down for the count) and see if you're still
> getting spurious interrupts.

I am now calling

 ath9k_hw_ops(sc->sc_ah)->config_pci_powersave(sc->sc_ah, true);
 ath9k_setpower(sc, ATH9K_PM_FULL_SLEEP);

right before request_irq, and the interrupts still come. And the
device is in FULL_SLEEP the whole time, at least according to
ah->power_mode.

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [ath9k-devel] ath9k: irq storm after suspend/resume
@ 2011-10-21 20:20                                                     ` Clemens Buchacher
  0 siblings, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-21 20:20 UTC (permalink / raw)
  To: ath9k-devel

On Fri, Oct 21, 2011 at 10:10:21PM +0800, Adrian Chadd wrote:
> 
> I assume you have disabled the NIC power save option. Not just APSM,
> etc, but the powersave options in iw. I don't know what they are,
> sorry, only that they exist.

I am not sure if that's what you mean, but if I set "iwconfig wlan0
power off" it makes no difference.

> I suggest modifying ath_intr() so if there is an interrupt storm,
> disable _all_ sources of interrupts (ie, set sync_cause, async_cause
> to 0, set imr to 0, set ier to 0) and see if it continues occuring.

They keep coming with this too.

> If the interrupt storm continue occuring, try putting the whole NIC to
> sleep entirely (ie, call pcipowersave to put the NIC to sleep, then
> make sure the NIC -stays- asleep somehow, ie so nothing else tries
> waking it up - I don't know how to do this, likely you'll have to
> clear some interface flag so the rest of the ath9k and 802.11 stack
> thinks the nic is actually down for the count) and see if you're still
> getting spurious interrupts.

I am now calling

 ath9k_hw_ops(sc->sc_ah)->config_pci_powersave(sc->sc_ah, true);
 ath9k_setpower(sc, ATH9K_PM_FULL_SLEEP);

right before request_irq, and the interrupts still come. And the
device is in FULL_SLEEP the whole time, at least according to
ah->power_mode.

Clemens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-21 20:20                                                     ` [ath9k-devel] " Clemens Buchacher
@ 2011-10-22  0:47                                                       ` Adrian Chadd
  -1 siblings, 0 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-10-22  0:47 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless, ath9k-devel

.. then I don't think it's specifically the NIC.

>From what I understand, unless there are bus errors happening (and
that'd occur with the NIC awake or having something being written to
it) the NIC shouldn't be generating interrupts if:

* sync/async cause are 0;
* ier is 0;
* imr is 0.

Now maybe some part of ath9k is still trying to write to the NIC when
it's asleep, but it would set some bits in sync_cause/async_cause.

At this point I'd really suggest whacking other devices into the slot
to see if they generate the same kind of behaviour.



Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [ath9k-devel] ath9k: irq storm after suspend/resume
@ 2011-10-22  0:47                                                       ` Adrian Chadd
  0 siblings, 0 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-10-22  0:47 UTC (permalink / raw)
  To: ath9k-devel

.. then I don't think it's specifically the NIC.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-22  0:47                                                       ` [ath9k-devel] " Adrian Chadd
@ 2011-10-22  7:22                                                         ` Clemens Buchacher
  -1 siblings, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-22  7:22 UTC (permalink / raw)
  To: Adrian Chadd; +Cc: Mohammed Shafi, linux-wireless, ath9k-devel

Hi,

On Sat, Oct 22, 2011 at 08:47:33AM +0800, Adrian Chadd wrote:
> 
> At this point I'd really suggest whacking other devices into the slot
> to see if they generate the same kind of behaviour.

Looks like you had it right from the start. After unloading the jme
module [*1*] I cannot reproduce the IRQ storm any more. From
looking at dmesg it should have been obvious, since it also gets
IRQ 17 assigned. I don't know how I missed that, I was sure I had
checked.

Sorry guys, for most likely wasting your time. And thanks for your
patience.

Best,
Clemens

[*1*]

$ lspci
06:00.5 Ethernet controller: JMicron Technology Corp. JMC250 PCI Express Gigabit Ethernet Controller (rev 03))

$ dmesg|grep -i jme
[    6.160187] jme: JMicron JMC2XX ethernet driver version 1.0.8
[    6.160278] jme 0000:06:00.5: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    6.160305] jme 0000:06:00.5: setting latency timer to 64
[    6.161154] jme 0000:06:00.5: eth0: JMC250 Gigabit Ethernet chiprev:23 pcirev:3 macaddr:bc:ae:c5:2f:91:47
[   18.354697] jme 0000:06:00.5: PCI INT A disabled

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [ath9k-devel] ath9k: irq storm after suspend/resume
@ 2011-10-22  7:22                                                         ` Clemens Buchacher
  0 siblings, 0 replies; 38+ messages in thread
From: Clemens Buchacher @ 2011-10-22  7:22 UTC (permalink / raw)
  To: ath9k-devel

Hi,

On Sat, Oct 22, 2011 at 08:47:33AM +0800, Adrian Chadd wrote:
> 
> At this point I'd really suggest whacking other devices into the slot
> to see if they generate the same kind of behaviour.

Looks like you had it right from the start. After unloading the jme
module [*1*] I cannot reproduce the IRQ storm any more. From
looking at dmesg it should have been obvious, since it also gets
IRQ 17 assigned. I don't know how I missed that, I was sure I had
checked.

Sorry guys, for most likely wasting your time. And thanks for your
patience.

Best,
Clemens

[*1*]

$ lspci
06:00.5 Ethernet controller: JMicron Technology Corp. JMC250 PCI Express Gigabit Ethernet Controller (rev 03))

$ dmesg|grep -i jme
[    6.160187] jme: JMicron JMC2XX ethernet driver version 1.0.8
[    6.160278] jme 0000:06:00.5: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    6.160305] jme 0000:06:00.5: setting latency timer to 64
[    6.161154] jme 0000:06:00.5: eth0: JMC250 Gigabit Ethernet chiprev:23 pcirev:3 macaddr:bc:ae:c5:2f:91:47
[   18.354697] jme 0000:06:00.5: PCI INT A disabled

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ath9k: irq storm after suspend/resume
  2011-10-22  7:22                                                         ` [ath9k-devel] " Clemens Buchacher
@ 2011-10-22  7:29                                                           ` Adrian Chadd
  -1 siblings, 0 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-10-22  7:29 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Mohammed Shafi, linux-wireless, ath9k-devel

On 22 October 2011 15:22, Clemens Buchacher <drizzd@aon.at> wrote:

> Looks like you had it right from the start. After unloading the jme
> module [*1*] I cannot reproduce the IRQ storm any more. From
> looking at dmesg it should have been obvious, since it also gets
> IRQ 17 assigned. I don't know how I missed that, I was sure I had
> checked.
>
> Sorry guys, for most likely wasting your time. And thanks for your
> patience.

Ah, eek! Well, you're welcome! And I hope you/others have a better
understanding of how the interrupt setup on these NICs work.
(I certainly have a better understanding now of the host interface and
what we can poke when it's awake and asleep.)

Good luck!



Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [ath9k-devel] ath9k: irq storm after suspend/resume
@ 2011-10-22  7:29                                                           ` Adrian Chadd
  0 siblings, 0 replies; 38+ messages in thread
From: Adrian Chadd @ 2011-10-22  7:29 UTC (permalink / raw)
  To: ath9k-devel

On 22 October 2011 15:22, Clemens Buchacher <drizzd@aon.at> wrote:

> Looks like you had it right from the start. After unloading the jme
> module [*1*] I cannot reproduce the IRQ storm any more. From
> looking at dmesg it should have been obvious, since it also gets
> IRQ 17 assigned. I don't know how I missed that, I was sure I had
> checked.
>
> Sorry guys, for most likely wasting your time. And thanks for your
> patience.

Ah, eek! Well, you're welcome! And I hope you/others have a better
understanding of how the interrupt setup on these NICs work.
(I certainly have a better understanding now of the host interface and
what we can poke when it's awake and asleep.)

Good luck!



Adrian

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2011-10-22  7:29 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-27 11:32 ath9k: irq storm after suspend/resume Clemens Buchacher
2011-08-29  7:55 ` Mohammed Shafi
2011-08-29 11:53   ` Mohammed Shafi
2011-08-29 15:12     ` Mohammed Shafi
2011-08-30  6:41       ` Clemens Buchacher
2011-08-30  9:33         ` Mohammed Shafi
2011-08-30  9:41           ` Mohammed Shafi
2011-09-01  6:24           ` Clemens Buchacher
2011-09-26  9:24             ` Mohammed Shafi
2011-09-27 21:42               ` Clemens Buchacher
2011-09-29  8:18                 ` Mohammed Shafi
2011-09-29 10:33                   ` Adrian Chadd
2011-09-29 17:11                     ` Clemens Buchacher
     [not found]                     ` <CAD2nsn0Z2J4r4tN_fLjx5bbvz2bg6NVcQ8vppJbbNcgOF8pFew@mail.gmail.com>
     [not found]                       ` <CAJ-VmokcM4KmzV7Rn9PA68iEiTJiPw=ffYgNFLDAHShxD0HNAg@mail.gmail.com>
2011-10-03  8:48                         ` Clemens Buchacher
2011-10-04  7:58                           ` Adrian Chadd
2011-10-04 18:15                             ` Clemens Buchacher
2011-10-04 21:11                               ` Adrian Chadd
2011-10-05  6:28                                 ` Clemens Buchacher
2011-10-05 13:02                                   ` Adrian Chadd
2011-10-12 13:10                                     ` Mohammed Shafi
2011-10-15  9:39                                       ` Clemens Buchacher
2011-10-15 10:01                                         ` Adrian Chadd
2011-10-18  6:44                                           ` Clemens Buchacher
2011-10-18  7:05                                             ` Adrian Chadd
2011-10-21 10:22                                               ` Clemens Buchacher
2011-10-21 14:10                                                 ` Adrian Chadd
2011-10-21 14:10                                                   ` [ath9k-devel] " Adrian Chadd
2011-10-21 19:03                                                   ` Clemens Buchacher
2011-10-21 19:03                                                     ` [ath9k-devel] " Clemens Buchacher
2011-10-21 20:20                                                   ` Clemens Buchacher
2011-10-21 20:20                                                     ` [ath9k-devel] " Clemens Buchacher
2011-10-22  0:47                                                     ` Adrian Chadd
2011-10-22  0:47                                                       ` [ath9k-devel] " Adrian Chadd
2011-10-22  7:22                                                       ` Clemens Buchacher
2011-10-22  7:22                                                         ` [ath9k-devel] " Clemens Buchacher
2011-10-22  7:29                                                         ` Adrian Chadd
2011-10-22  7:29                                                           ` [ath9k-devel] " Adrian Chadd
2011-10-04 18:36                             ` Clemens Buchacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.