All of lore.kernel.org
 help / color / mirror / Atom feed
* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
@ 2012-11-12  9:06 Felix Liao
  2012-11-12  9:43 ` Adrian Chadd
  0 siblings, 1 reply; 9+ messages in thread
From: Felix Liao @ 2012-11-12  9:06 UTC (permalink / raw)
  To: ath9k-devel

Hi ath9k team,
I trace the ath9k_hw_getisr() today to find out why I get the zero ISR mask, and I found that when I get the zero ISR mask, this function return the boolean true, then I trace into the function ar9003_hw_get_isr(), 

static bool ar9003_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
{
        ....
        async_cause = REG_READ(ah, AR_INTR_ASYNC_CAUSE);

        if (async_cause & (AR_INTR_MAC_IRQ | AR_INTR_ASYNC_MASK_MCI)) {
                if ((REG_READ(ah, AR_RTC_STATUS) & AR_RTC_STATUS_M)
                                == AR_RTC_STATUS_ON)
                        isr = REG_READ(ah, AR_ISR);
        }


        sync_cause = REG_READ(ah, AR_INTR_SYNC_CAUSE) & AR_INTR_SYNC_DEFAULT;

        *masked = 0;

        if (!isr && !sync_cause && !async_cause)
                return false;

        if (isr == 0) {
                //debug value here is isr=0, async_cause=0, sync_cause=0x20 --> AR_INTR_SYNC_HOST1_FATAL
        }
        ...
}

I have no idea why I always get the isr=0 and sync_cause=0x20 which means AR_INTR_SYNC_HOST1_FATAL, but we can't deal with this value in the source code, do you know any about this?

Thanks,
- Felix

-----Original Message-----
From: Felix Liao 
Sent: Monday, November 12, 2012 10:10 AM
To: 'ath9k-devel at lists.ath9k.org'
Subject: RE: ath9k_tasklet() can be interrupted by a hardware IRQ

Hi ath9k team:
Another interest question is since the ath9k interrupt was disabled, how can we get a new IRQ to interrupt the tasklet? 
So I investigated and found that the cause of the issue "irq 16: nobody cared" in my box is the ath9k's irq handler return too many IRQ_NONE since I read the ISR register always return 0 as below.

        /*
         * Figure out the reason(s) for the interrupt.  Note
         * that the hal returns a pseudo-ISR that may include
         * bits we haven't explicitly enabled so we mask the
         * value to insure we only process bits we requested.
         */
        ath9k_hw_getisr(ah, &status);   /* NB: clears ISR too */
        status &= ah->imask;    /* discard unasked-for bits */

        /*
         * If there are no status bits set, then this interrupt was not
         * for me (should have been caught above).
         */
        if (!status) 
                return IRQ_NONE;

if I go to reset chip instead of return IRQ_NONE in this position, the things become a little healthier than before, I will never see the message "irq 16: nobody cared".

I check each IRQ using "if(AR_SREV_9300(ah))" and they are all passed, that is, no other hardware share the irq with ath9k.
I need to know why we read the ISR register and get value 0, and return IRQ_NONE, the chip I used is AR9300.
Can we using this change below in the ath_isr() function:
        if (!status) 
                goto reset_chip;

Thanks,
- Felix
-----
From: Felix Liao
Sent: Friday, November 09, 2012 9:33 AM
To: 'ath9k-devel at lists.ath9k.org'
Subject: ath9k_tasklet() can be interrupted by a hardware IRQ

Hi all,
I met a kernel call trace below using the ath9k driver, we can see that the ath9k_tasklet() was interrupted by a hardware IRQ, and at this time the ath9k IRQ was disabled, so the kernel will complain that there is none irq handler for this interrupt.

[?? 60.977474] irq 16: nobody cared (try booting with the "irqpoll" option) [?? 60.984194] Call Trace:
[?? 60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable) [?? 60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8 [?? 60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200/0x260 [?? 61.004396] [dfff1fe0] [c0078704] handle_fasteoi_irq+0xc4/0x10c [?? 61.010340] [dfff1ff0] [c0010fdc] call_handle_irq+0x18/0x28 [?? 61.015929] [dfff3e50] [c00051c8] do_IRQ+0xcc/0x1b4 [?? 61.020825] [dfff3e80] [c0011fc4] ret_from_except+0x0/0x18 [?? 61.026349] --- Exception: 501 at ioread32+0x8/0x14 [?? 61.026355]???? LR = ath_descdma_setup+0x294/0x664 [ath9k] [?? 61.036733] [dfff3f40] [c04f5bdc] softirq_to_name+0x0/0x28 (unreliable) [?? 61.043390] [dfff3f50] [e5ad66fc] ath9k_hw_enable_interrupts+0x160/0x1b0 [ath9k_hw] [?? 61.051074] [dfff3f70] [e5b41368] ath9k_tasklet+0xc0/0x208 [ath9k] [?? 61.057274] [dfff3f90] [c0043298] tasklet_action+0xcc/0xf8 [?? 61.062777] [dfff3fb0] [c0043d48] __do_softirq+0xdc/0x18c [?? 61.068194] [dfff3ff0] [c0010fb4] call_do_softirq+0x14/0x24 [?? 61.073784] [d8613e60] [c0004fec] do_softirq+0x8c/0x98 [?? 61.078939] [d8613e80] [c0043bd0] local_bh_enable+0x9c/0xa0 [?? 61.084536] [d8613e90] [c036d910] unix_create1+0x184/0x1ac [?? 61.090039] [d8613eb0] [c036d9a0] unix_create+0x68/0xbc [?? 61.095286] [d8613ec0] [c02c6c34] __sock_create+0x114/0x248 [?? 61.100877] [d8613ef0] [c02c7008] sys_socket+0x54/0x84 [?? 61.106031] [d8613f10] [c02c70e8] sys_socketcall+0xb0/0x258 [?? 61.111622] [d8613f40] [c0011970] ret_from_syscall+0x0/0x3c [?? 61.117338] --- Exception: c01 at 0xeff41fc [?? 61.117343]???? LR = 0xeff4350 [?? 61.124578] handlers:
[?? 61.126851] [<e5b3eb44>] (ath_isr+0x0/0x21c [ath9k]) [?? 61.131840] Disabling IRQ #16 [?? 61.143234] ath: phy0: Failed to stop TX DMA, queues=0x001!

I made a patch below for this issue and use it to reduce the probability of this issue, you can see that, I can't disable the IRQ from start to end of this function, I need to restore the IRQ flags before we call ath_rx_tasklet and ath_tx_tasklet, since these functions will disable and enable the local softirq using spin_lock_bh and spin_unlock_bh, ?since with this gap of time, the tasklet still can be interrupted by the hardware IRQ.

diff -pNaur compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c
--- compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c??? 2012-11-08 00:17:08.000000000 +0800
+++ compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c?????????????? 
+++ 2012-11-09 07:27:28.459837051 +0800
@@ -368,6 +368,7 @@ void ath9k_tasklet(unsigned long data)
????????????? u32 status = sc->intrstatus;
????????????? u32 rxmask;

+???????????? local_irq_save(flags);
????????????? ath9k_ps_wakeup(sc);
????????????? spin_lock(&sc->sc_pcu_lock);

@@ -383,7 +384,7 @@ void ath9k_tasklet(unsigned long data)
???????????????????????????? goto out;
????????????? }

-????????????? spin_lock_irqsave(&sc->sc_pm_lock, flags);
+???????????? spin_lock(&sc->sc_pm_lock);
????????????? if ((status & ATH9K_INT_TSFOOR) && sc->ps_enabled) {
???????????????????????????? /*
???????????????????????????? ?* TSF sync does not look correct; remain awake to sync with @@ -392,7 +393,7 @@ void ath9k_tasklet(unsigned long data)
???????????????????????????? ath_dbg(common, PS, "TSFOOR - Sync with next Beacon\n");
???????????????????????????? sc->ps_flags |= PS_WAIT_FOR_BEACON | PS_BEACON_SYNC;
????????????? }
-????????????? spin_unlock_irqrestore(&sc->sc_pm_lock, flags);
+???????????? spin_unlock(&sc->sc_pm_lock);

?????????????? if (ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
???????????????????????????? rxmask = (ATH9K_INT_RXHP | ATH9K_INT_RXLP | ATH9K_INT_RXEOL | @@ -400,6 +401,7 @@ void ath9k_tasklet(unsigned long data)
????????????? else
???????????????????????????? rxmask = (ATH9K_INT_RX | ATH9K_INT_RXEOL | ATH9K_INT_RXORN);

+???????????? local_irq_restore(flags);
????????????? if (status & rxmask) {
???????????????????????????? /* Check for high priority Rx first */
???????????????????????????? if ((ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) && @@ -416,6 +418,7 @@ void ath9k_tasklet(unsigned long data)
??????????????????????????????????????????? ath_tx_tasklet(sc);
????????????? }

+???????????? local_irq_save(flags);
????????????? ath9k_btcoex_handle_interrupt(sc, status);

?out:
@@ -424,6 +427,7 @@ out:

?????????????? spin_unlock(&sc->sc_pcu_lock);
????????????? ath9k_ps_restore(sc);
+???????????? local_irq_restore(flags);
}

?irqreturn_t ath_isr(int irq, void *dev)

but what I need to say is we should disable the IRQ entire of this function, so we need to use the spin_lock/spin_unlock instead of spin_lock_bh/spin_unlock_bh in the functions called by it, but I know it is a big job to do, so I ask for your suggestion.
In a simple way, ?this patch can fix this issue incompletely.

Thanks,
- Felix

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
  2012-11-12  9:06 [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ Felix Liao
@ 2012-11-12  9:43 ` Adrian Chadd
  2012-11-12  9:54   ` Felix Liao
  0 siblings, 1 reply; 9+ messages in thread
From: Adrian Chadd @ 2012-11-12  9:43 UTC (permalink / raw)
  To: ath9k-devel

Well, any sync error like that where the PCIe glue throws an error
like that is a bad thing.

It could be from a variety of places. It could be a busted PCIe
interconnect somehow. I've seen this happen before. The only real way
to diagnose that is to grab a PCIe bus analyser.

It however could also be from the NIC going in and out of network
sleep. Is this in STA or AP mode?


Adrian

On 12 November 2012 01:06, Felix Liao <Felix.Liao@watchguard.com> wrote:
> Hi ath9k team,
> I trace the ath9k_hw_getisr() today to find out why I get the zero ISR mask, and I found that when I get the zero ISR mask, this function return the boolean true, then I trace into the function ar9003_hw_get_isr(),
>
> static bool ar9003_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked)
> {
>         ....
>         async_cause = REG_READ(ah, AR_INTR_ASYNC_CAUSE);
>
>         if (async_cause & (AR_INTR_MAC_IRQ | AR_INTR_ASYNC_MASK_MCI)) {
>                 if ((REG_READ(ah, AR_RTC_STATUS) & AR_RTC_STATUS_M)
>                                 == AR_RTC_STATUS_ON)
>                         isr = REG_READ(ah, AR_ISR);
>         }
>
>
>         sync_cause = REG_READ(ah, AR_INTR_SYNC_CAUSE) & AR_INTR_SYNC_DEFAULT;
>
>         *masked = 0;
>
>         if (!isr && !sync_cause && !async_cause)
>                 return false;
>
>         if (isr == 0) {
>                 //debug value here is isr=0, async_cause=0, sync_cause=0x20 --> AR_INTR_SYNC_HOST1_FATAL
>         }
>         ...
> }
>
> I have no idea why I always get the isr=0 and sync_cause=0x20 which means AR_INTR_SYNC_HOST1_FATAL, but we can't deal with this value in the source code, do you know any about this?
>
> Thanks,
> - Felix
>
> -----Original Message-----
> From: Felix Liao
> Sent: Monday, November 12, 2012 10:10 AM
> To: 'ath9k-devel at lists.ath9k.org'
> Subject: RE: ath9k_tasklet() can be interrupted by a hardware IRQ
>
> Hi ath9k team:
> Another interest question is since the ath9k interrupt was disabled, how can we get a new IRQ to interrupt the tasklet?
> So I investigated and found that the cause of the issue "irq 16: nobody cared" in my box is the ath9k's irq handler return too many IRQ_NONE since I read the ISR register always return 0 as below.
>
>         /*
>          * Figure out the reason(s) for the interrupt.  Note
>          * that the hal returns a pseudo-ISR that may include
>          * bits we haven't explicitly enabled so we mask the
>          * value to insure we only process bits we requested.
>          */
>         ath9k_hw_getisr(ah, &status);   /* NB: clears ISR too */
>         status &= ah->imask;    /* discard unasked-for bits */
>
>         /*
>          * If there are no status bits set, then this interrupt was not
>          * for me (should have been caught above).
>          */
>         if (!status)
>                 return IRQ_NONE;
>
> if I go to reset chip instead of return IRQ_NONE in this position, the things become a little healthier than before, I will never see the message "irq 16: nobody cared".
>
> I check each IRQ using "if(AR_SREV_9300(ah))" and they are all passed, that is, no other hardware share the irq with ath9k.
> I need to know why we read the ISR register and get value 0, and return IRQ_NONE, the chip I used is AR9300.
> Can we using this change below in the ath_isr() function:
>         if (!status)
>                 goto reset_chip;
>
> Thanks,
> - Felix
> -----
> From: Felix Liao
> Sent: Friday, November 09, 2012 9:33 AM
> To: 'ath9k-devel at lists.ath9k.org'
> Subject: ath9k_tasklet() can be interrupted by a hardware IRQ
>
> Hi all,
> I met a kernel call trace below using the ath9k driver, we can see that the ath9k_tasklet() was interrupted by a hardware IRQ, and at this time the ath9k IRQ was disabled, so the kernel will complain that there is none irq handler for this interrupt.
>
> [   60.977474] irq 16: nobody cared (try booting with the "irqpoll" option) [   60.984194] Call Trace:
> [   60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable) [   60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8 [   60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200/0x260 [   61.004396] [dfff1fe0] [c0078704] handle_fasteoi_irq+0xc4/0x10c [   61.010340] [dfff1ff0] [c0010fdc] call_handle_irq+0x18/0x28 [   61.015929] [dfff3e50] [c00051c8] do_IRQ+0xcc/0x1b4 [   61.020825] [dfff3e80] [c0011fc4] ret_from_except+0x0/0x18 [   61.026349] --- Exception: 501 at ioread32+0x8/0x14 [   61.026355]     LR = ath_descdma_setup+0x294/0x664 [ath9k] [   61.036733] [dfff3f40] [c04f5bdc] softirq_to_name+0x0/0x28 (unreliable) [   61.043390] [dfff3f50] [e5ad66fc] ath9k_hw_enable_interrupts+0x160/0x1b0 [ath9k_hw] [   61.051074] [dfff3f70] [e5b41368] ath9k_tasklet+0xc0/0x208 [ath9k] [   61.057274] [dfff3f90] [c0043298] tasklet_action+0xcc/0xf8 [   61.062777] [dfff3fb0] [c0043d48] __do_softirq+0xdc/0x18c [   61.068194] [dfff3ff0] [c0010fb4] call_do_softirq+0x14/0x24 [   61.073784] [d8613e60] [c0004fec] do_softirq+0x8c/0x98 [   61.078939] [d8613e80] [c0043bd0] local_bh_enable+0x9c/0xa0 [   61.084536] [d8613e90] [c036d910] unix_create1+0x184/0x1ac [   61.090039] [d8613eb0] [c036d9a0] unix_create+0x68/0xbc [   61.095286] [d8613ec0] [c02c6c34] __sock_create+0x114/0x248 [   61.100877] [d8613ef0] [c02c7008] sys_socket+0x54/0x84 [   61.106031] [d8613f10] [c02c70e8] sys_socketcall+0xb0/0x258 [   61.111622] [d8613f40] [c0011970] ret_from_syscall+0x0/0x3c [   61.117338] --- Exception: c01 at 0xeff41fc [   61.117343]     LR = 0xeff4350 [   61.124578] handlers:
> [   61.126851] [<e5b3eb44>] (ath_isr+0x0/0x21c [ath9k]) [   61.131840] Disabling IRQ #16 [   61.143234] ath: phy0: Failed to stop TX DMA, queues=0x001!
>
> I made a patch below for this issue and use it to reduce the probability of this issue, you can see that, I can't disable the IRQ from start to end of this function, I need to restore the IRQ flags before we call ath_rx_tasklet and ath_tx_tasklet, since these functions will disable and enable the local softirq using spin_lock_bh and spin_unlock_bh,  since with this gap of time, the tasklet still can be interrupted by the hardware IRQ.
>
> diff -pNaur compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c
> --- compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c    2012-11-08 00:17:08.000000000 +0800
> +++ compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c
> +++ 2012-11-09 07:27:28.459837051 +0800
> @@ -368,6 +368,7 @@ void ath9k_tasklet(unsigned long data)
>               u32 status = sc->intrstatus;
>               u32 rxmask;
>
> +             local_irq_save(flags);
>               ath9k_ps_wakeup(sc);
>               spin_lock(&sc->sc_pcu_lock);
>
> @@ -383,7 +384,7 @@ void ath9k_tasklet(unsigned long data)
>                              goto out;
>               }
>
> -              spin_lock_irqsave(&sc->sc_pm_lock, flags);
> +             spin_lock(&sc->sc_pm_lock);
>               if ((status & ATH9K_INT_TSFOOR) && sc->ps_enabled) {
>                              /*
>                               * TSF sync does not look correct; remain awake to sync with @@ -392,7 +393,7 @@ void ath9k_tasklet(unsigned long data)
>                              ath_dbg(common, PS, "TSFOOR - Sync with next Beacon\n");
>                              sc->ps_flags |= PS_WAIT_FOR_BEACON | PS_BEACON_SYNC;
>               }
> -              spin_unlock_irqrestore(&sc->sc_pm_lock, flags);
> +             spin_unlock(&sc->sc_pm_lock);
>
>                if (ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
>                              rxmask = (ATH9K_INT_RXHP | ATH9K_INT_RXLP | ATH9K_INT_RXEOL | @@ -400,6 +401,7 @@ void ath9k_tasklet(unsigned long data)
>               else
>                              rxmask = (ATH9K_INT_RX | ATH9K_INT_RXEOL | ATH9K_INT_RXORN);
>
> +             local_irq_restore(flags);
>               if (status & rxmask) {
>                              /* Check for high priority Rx first */
>                              if ((ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) && @@ -416,6 +418,7 @@ void ath9k_tasklet(unsigned long data)
>                                             ath_tx_tasklet(sc);
>               }
>
> +             local_irq_save(flags);
>               ath9k_btcoex_handle_interrupt(sc, status);
>
>  out:
> @@ -424,6 +427,7 @@ out:
>
>                spin_unlock(&sc->sc_pcu_lock);
>               ath9k_ps_restore(sc);
> +             local_irq_restore(flags);
> }
>
>  irqreturn_t ath_isr(int irq, void *dev)
>
> but what I need to say is we should disable the IRQ entire of this function, so we need to use the spin_lock/spin_unlock instead of spin_lock_bh/spin_unlock_bh in the functions called by it, but I know it is a big job to do, so I ask for your suggestion.
> In a simple way,  this patch can fix this issue incompletely.
>
> Thanks,
> - Felix
>
> _______________________________________________
> ath9k-devel mailing list
> ath9k-devel at lists.ath9k.org
> https://lists.ath9k.org/mailman/listinfo/ath9k-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
  2012-11-12  9:43 ` Adrian Chadd
@ 2012-11-12  9:54   ` Felix Liao
  2012-11-18 14:11     ` Adrian Chadd
  0 siblings, 1 reply; 9+ messages in thread
From: Felix Liao @ 2012-11-12  9:54 UTC (permalink / raw)
  To: ath9k-devel

Thanks for you information.
It is running in AP mode, and uses compat-wireless-3.5-3, but the "irq 16: nobody cared" issue still happens when I update to compat-wireless-3.6.6-1.
I found a mail list talk about this, https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, and just set the mask to ATH9K_INT_FATAL, 
then the ath_isr() will just reset the hardware, it is just same with what I think in my mind, and I will take a try for it.

Another question I said, do you think we should spin_lock_irqsave and spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock?

Thanks,
- Felix

-----Original Message-----
From: adrian.chadd@gmail.com [mailto:adrian.chadd at gmail.com] On Behalf Of Adrian Chadd
Sent: Monday, November 12, 2012 5:44 PM
To: Felix Liao
Cc: ath9k-devel at lists.ath9k.org
Subject: Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

Well, any sync error like that where the PCIe glue throws an error like that is a bad thing.

It could be from a variety of places. It could be a busted PCIe interconnect somehow. I've seen this happen before. The only real way to diagnose that is to grab a PCIe bus analyser.

It however could also be from the NIC going in and out of network sleep. Is this in STA or AP mode?


Adrian

On 12 November 2012 01:06, Felix Liao <Felix.Liao@watchguard.com> wrote:
> Hi ath9k team,
> I trace the ath9k_hw_getisr() today to find out why I get the zero ISR 
> mask, and I found that when I get the zero ISR mask, this function 
> return the boolean true, then I trace into the function 
> ar9003_hw_get_isr(),
>
> static bool ar9003_hw_get_isr(struct ath_hw *ah, enum ath9k_int 
> *masked) {
>         ....
>         async_cause = REG_READ(ah, AR_INTR_ASYNC_CAUSE);
>
>         if (async_cause & (AR_INTR_MAC_IRQ | AR_INTR_ASYNC_MASK_MCI)) {
>                 if ((REG_READ(ah, AR_RTC_STATUS) & AR_RTC_STATUS_M)
>                                 == AR_RTC_STATUS_ON)
>                         isr = REG_READ(ah, AR_ISR);
>         }
>
>
>         sync_cause = REG_READ(ah, AR_INTR_SYNC_CAUSE) & 
> AR_INTR_SYNC_DEFAULT;
>
>         *masked = 0;
>
>         if (!isr && !sync_cause && !async_cause)
>                 return false;
>
>         if (isr == 0) {
>                 //debug value here is isr=0, async_cause=0, sync_cause=0x20 --> AR_INTR_SYNC_HOST1_FATAL
>         }
>         ...
> }
>
> I have no idea why I always get the isr=0 and sync_cause=0x20 which means AR_INTR_SYNC_HOST1_FATAL, but we can't deal with this value in the source code, do you know any about this?
>
> Thanks,
> - Felix
>
> -----Original Message-----
> From: Felix Liao
> Sent: Monday, November 12, 2012 10:10 AM
> To: 'ath9k-devel at lists.ath9k.org'
> Subject: RE: ath9k_tasklet() can be interrupted by a hardware IRQ
>
> Hi ath9k team:
> Another interest question is since the ath9k interrupt was disabled, how can we get a new IRQ to interrupt the tasklet?
> So I investigated and found that the cause of the issue "irq 16: nobody cared" in my box is the ath9k's irq handler return too many IRQ_NONE since I read the ISR register always return 0 as below.
>
>         /*
>          * Figure out the reason(s) for the interrupt.  Note
>          * that the hal returns a pseudo-ISR that may include
>          * bits we haven't explicitly enabled so we mask the
>          * value to insure we only process bits we requested.
>          */
>         ath9k_hw_getisr(ah, &status);   /* NB: clears ISR too */
>         status &= ah->imask;    /* discard unasked-for bits */
>
>         /*
>          * If there are no status bits set, then this interrupt was not
>          * for me (should have been caught above).
>          */
>         if (!status)
>                 return IRQ_NONE;
>
> if I go to reset chip instead of return IRQ_NONE in this position, the things become a little healthier than before, I will never see the message "irq 16: nobody cared".
>
> I check each IRQ using "if(AR_SREV_9300(ah))" and they are all passed, that is, no other hardware share the irq with ath9k.
> I need to know why we read the ISR register and get value 0, and return IRQ_NONE, the chip I used is AR9300.
> Can we using this change below in the ath_isr() function:
>         if (!status)
>                 goto reset_chip;
>
> Thanks,
> - Felix
> -----
> From: Felix Liao
> Sent: Friday, November 09, 2012 9:33 AM
> To: 'ath9k-devel at lists.ath9k.org'
> Subject: ath9k_tasklet() can be interrupted by a hardware IRQ
>
> Hi all,
> I met a kernel call trace below using the ath9k driver, we can see that the ath9k_tasklet() was interrupted by a hardware IRQ, and at this time the ath9k IRQ was disabled, so the kernel will complain that there is none irq handler for this interrupt.
>
> [   60.977474] irq 16: nobody cared (try booting with the "irqpoll" option) [   60.984194] Call Trace:
> [   60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable) [   60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8 [   60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200/0x260 [   61.004396] [dfff1fe0] [c0078704] handle_fasteoi_irq+0xc4/0x10c [   61.010340] [dfff1ff0] [c0010fdc] call_handle_irq+0x18/0x28 [   61.015929] [dfff3e50] [c00051c8] do_IRQ+0xcc/0x1b4 [   61.020825] [dfff3e80] [c0011fc4] ret_from_except+0x0/0x18 [   61.026349] --- Exception: 501 at ioread32+0x8/0x14 [   61.026355]     LR = ath_descdma_setup+0x294/0x664 [ath9k] [   61.036733] [dfff3f40] [c04f5bdc] softirq_to_name+0x0/0x28 (unreliable) [   61.043390] [dfff3f50] [e5ad66fc] ath9k_hw_enable_interrupts+0x160/0x1b0 [ath9k_hw] [   61.051074] [dfff3f70] [e5b41368] ath9k_tasklet+0xc0/0x208 [ath9k] [   61.057274] [dfff3f90] [c0043298] tasklet_action+0xcc/0xf8 [   61.062777] [dfff3fb0] [c0043d48] __do_softirq+0xdc/0x18c [   61.068194] [dfff3ff0] [c0010fb4] call_do_softirq+0x14/0x24 [   61.073784] [d8613e60] [c0004fec] do_softirq+0x8c/0x98 [   61.078939] [d8613e80] [c0043bd0] local_bh_enable+0x9c/0xa0 [   61.084536] [d8613e90] [c036d910] unix_create1+0x184/0x1ac [   61.090039] [d8613eb0] [c036d9a0] unix_create+0x68/0xbc [   61.095286] [d8613ec0] [c02c6c34] __sock_create+0x114/0x248 [   61.100877] [d8613ef0] [c02c7008] sys_socket+0x54/0x84 [   61.106031] [d8613f10] [c02c70e8] sys_socketcall+0xb0/0x258 [   61.111622] [d8613f40] [c0011970] ret_from_syscall+0x0/0x3c [   61.117338] --- Exception: c01 at 0xeff41fc [   61.117343]     LR = 0xeff4350 [   61.124578] handlers:
> [   61.126851] [<e5b3eb44>] (ath_isr+0x0/0x21c [ath9k]) [   61.131840] Disabling IRQ #16 [   61.143234] ath: phy0: Failed to stop TX DMA, queues=0x001!
>
> I made a patch below for this issue and use it to reduce the probability of this issue, you can see that, I can't disable the IRQ from start to end of this function, I need to restore the IRQ flags before we call ath_rx_tasklet and ath_tx_tasklet, since these functions will disable and enable the local softirq using spin_lock_bh and spin_unlock_bh,  since with this gap of time, the tasklet still can be interrupted by the hardware IRQ.
>
> diff -pNaur compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c
> --- compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c    2012-11-08 00:17:08.000000000 +0800
> +++ compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.
> +++ c
> +++ 2012-11-09 07:27:28.459837051 +0800
> @@ -368,6 +368,7 @@ void ath9k_tasklet(unsigned long data)
>               u32 status = sc->intrstatus;
>               u32 rxmask;
>
> +             local_irq_save(flags);
>               ath9k_ps_wakeup(sc);
>               spin_lock(&sc->sc_pcu_lock);
>
> @@ -383,7 +384,7 @@ void ath9k_tasklet(unsigned long data)
>                              goto out;
>               }
>
> -              spin_lock_irqsave(&sc->sc_pm_lock, flags);
> +             spin_lock(&sc->sc_pm_lock);
>               if ((status & ATH9K_INT_TSFOOR) && sc->ps_enabled) {
>                              /*
>                               * TSF sync does not look correct; remain awake to sync with @@ -392,7 +393,7 @@ void ath9k_tasklet(unsigned long data)
>                              ath_dbg(common, PS, "TSFOOR - Sync with next Beacon\n");
>                              sc->ps_flags |= PS_WAIT_FOR_BEACON | PS_BEACON_SYNC;
>               }
> -              spin_unlock_irqrestore(&sc->sc_pm_lock, flags);
> +             spin_unlock(&sc->sc_pm_lock);
>
>                if (ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
>                              rxmask = (ATH9K_INT_RXHP | ATH9K_INT_RXLP | ATH9K_INT_RXEOL | @@ -400,6 +401,7 @@ void ath9k_tasklet(unsigned long data)
>               else
>                              rxmask = (ATH9K_INT_RX | ATH9K_INT_RXEOL 
> | ATH9K_INT_RXORN);
>
> +             local_irq_restore(flags);
>               if (status & rxmask) {
>                              /* Check for high priority Rx first */
>                              if ((ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) && @@ -416,6 +418,7 @@ void ath9k_tasklet(unsigned long data)
>                                             ath_tx_tasklet(sc);
>               }
>
> +             local_irq_save(flags);
>               ath9k_btcoex_handle_interrupt(sc, status);
>
>  out:
> @@ -424,6 +427,7 @@ out:
>
>                spin_unlock(&sc->sc_pcu_lock);
>               ath9k_ps_restore(sc);
> +             local_irq_restore(flags);
> }
>
>  irqreturn_t ath_isr(int irq, void *dev)
>
> but what I need to say is we should disable the IRQ entire of this function, so we need to use the spin_lock/spin_unlock instead of spin_lock_bh/spin_unlock_bh in the functions called by it, but I know it is a big job to do, so I ask for your suggestion.
> In a simple way,  this patch can fix this issue incompletely.
>
> Thanks,
> - Felix
>
> _______________________________________________
> ath9k-devel mailing list
> ath9k-devel at lists.ath9k.org
> https://lists.ath9k.org/mailman/listinfo/ath9k-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
  2012-11-12  9:54   ` Felix Liao
@ 2012-11-18 14:11     ` Adrian Chadd
  2012-11-20 10:05       ` Felix Liao
  0 siblings, 1 reply; 9+ messages in thread
From: Adrian Chadd @ 2012-11-18 14:11 UTC (permalink / raw)
  To: ath9k-devel

On 12 November 2012 01:54, Felix Liao <Felix.Liao@watchguard.com> wrote:
> Thanks for you information.
> It is running in AP mode, and uses compat-wireless-3.5-3, but the "irq 16: nobody cared" issue still happens when I update to compat-wireless-3.6.6-1.
> I found a mail list talk about this, https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, and just set the mask to ATH9K_INT_FATAL,
> then the ath_isr() will just reset the hardware, it is just same with what I think in my mind, and I will take a try for it.
>
> Another question I said, do you think we should spin_lock_irqsave and spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock?

That's a question for someone with a more intimate knowledge of the
current workings of ath9k.

Felix (F) - would you mind taking a look at this thread? He's having
issues with fatal sync interrupts not causing the NIC to be reset?


Adrian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
  2012-11-18 14:11     ` Adrian Chadd
@ 2012-11-20 10:05       ` Felix Liao
  2012-12-03  9:45         ` Felix Fietkau
  0 siblings, 1 reply; 9+ messages in thread
From: Felix Liao @ 2012-11-20 10:05 UTC (permalink / raw)
  To: ath9k-devel

Hi Felix F, I'm Felix Liao from WatchGuard, I think we should spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock 
in the ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the tasklet being interrupted by the hardware IRQ,
do you think so?

Thanks,
- Felix Liao

-----Original Message-----
From: adrian.chadd@gmail.com [mailto:adrian.chadd at gmail.com] On Behalf Of Adrian Chadd
Sent: Sunday, November 18, 2012 10:11 PM
To: Felix Liao; Felix Fietkau
Cc: ath9k-devel at lists.ath9k.org
Subject: Re: [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ

On 12 November 2012 01:54, Felix Liao <Felix.Liao@watchguard.com> wrote:
> Thanks for you information.
> It is running in AP mode, and uses compat-wireless-3.5-3, but the "irq 16: nobody cared" issue still happens when I update to compat-wireless-3.6.6-1.
> I found a mail list talk about this, 
> https://patchwork.kernel.org/patch/1536701/, I see this patch not in 3.6.6-1, and just set the mask to ATH9K_INT_FATAL, then the ath_isr() will just reset the hardware, it is just same with what I think in my mind, and I will take a try for it.
>
> Another question I said, do you think we should spin_lock_irqsave and spin_unlock_irqrestore in the ath9k_tasklet instead of spin_lock/spin_unlock?

That's a question for someone with a more intimate knowledge of the current workings of ath9k.

Felix (F) - would you mind taking a look at this thread? He's having issues with fatal sync interrupts not causing the NIC to be reset?


Adrian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
  2012-11-20 10:05       ` Felix Liao
@ 2012-12-03  9:45         ` Felix Fietkau
  2012-12-03 14:51           ` Chaoxing Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Felix Fietkau @ 2012-12-03  9:45 UTC (permalink / raw)
  To: ath9k-devel

On 2012-11-20 11:05 AM, Felix Liao wrote:
> Hi Felix F, I'm Felix Liao from WatchGuard, I think we should spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock 
> in the ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the tasklet being interrupted by the hardware IRQ,
> do you think so?
Using an IRQ spinlock in the tasklet is not the correct solution. If the
IRQ fires while the tasklet is running, it means that the driver does
not properly disable IRQs in the handler. They're supposed to get
disabled there and re-enabled only after the tasklet has completed.

- Felix

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
  2012-12-03  9:45         ` Felix Fietkau
@ 2012-12-03 14:51           ` Chaoxing Lin
  0 siblings, 0 replies; 9+ messages in thread
From: Chaoxing Lin @ 2012-12-03 14:51 UTC (permalink / raw)
  To: ath9k-devel


On 2012-11-20 11:05 AM, Felix Liao wrote:
>> Hi Felix F, I'm Felix Liao from WatchGuard, I think we should 
>> spin_lock_irqsave and spin_unlock_irqrestore on sc_pcu_lock in the 
>> ath9k_tasklet() instead of spin_lock/spin_unlock, just to avoid the tasklet being interrupted by the hardware IRQ, do you think so?
>Using an IRQ spinlock in the tasklet is not the correct solution. If the IRQ fires while the tasklet is running, it means that the driver does not properly disable IRQs in the handler. They're supposed to get disabled there and re-enabled only after the tasklet has completed.


My 2 cents. 
I disagree this statement "They're supposed to get disabled there and re-enabled only after the tasklet has completed."
In general, IRQs should be able to interrupt tasklet. Otherwise, it defeats the purpose of using tasklet. 

Only when both tasklet and IRQs can access common resources(critical region), IRQ needs to be disabled before tasklet access this resource and re-enable after that.

_______________________________________________
ath9k-devel mailing list
ath9k-devel at lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
@ 2012-11-12  2:09 Felix Liao
  0 siblings, 0 replies; 9+ messages in thread
From: Felix Liao @ 2012-11-12  2:09 UTC (permalink / raw)
  To: ath9k-devel

Hi ath9k team:
Another interest question is since the ath9k interrupt was disabled, how can we get a new IRQ to interrupt the tasklet? 
So I investigated and found that the cause of the issue "irq 16: nobody cared" in my box is the ath9k's irq handler return 
too many IRQ_NONE since I read the ISR register always return 0 as below.

        /*
         * Figure out the reason(s) for the interrupt.  Note
         * that the hal returns a pseudo-ISR that may include
         * bits we haven't explicitly enabled so we mask the
         * value to insure we only process bits we requested.
         */
        ath9k_hw_getisr(ah, &status);   /* NB: clears ISR too */
        status &= ah->imask;    /* discard unasked-for bits */

        /*
         * If there are no status bits set, then this interrupt was not
         * for me (should have been caught above).
         */
        if (!status) 
                return IRQ_NONE;

if I go to reset chip instead of return IRQ_NONE in this position, the things become a little healthier than before, 
I will never see the message "irq 16: nobody cared".

I check each IRQ using "if(AR_SREV_9300(ah))" and they are all passed, that is, no other hardware share the irq with ath9k.
I need to know why we read the ISR register and get value 0, and return IRQ_NONE, the chip I used is AR9300.
Can we using this change below in the ath_isr() function:
        if (!status) 
                goto reset_chip;

Thanks,
- Felix
-----
From: Felix Liao 
Sent: Friday, November 09, 2012 9:33 AM
To: 'ath9k-devel at lists.ath9k.org'
Subject: ath9k_tasklet() can be interrupted by a hardware IRQ

Hi all,
I met a kernel call trace below using the ath9k driver, we can see that the ath9k_tasklet() was interrupted by a hardware IRQ, 
and at this time the ath9k IRQ was disabled, so the kernel will complain that there is none irq handler for this interrupt.

[?? 60.977474] irq 16: nobody cared (try booting with the "irqpoll" option)
[?? 60.984194] Call Trace:
[?? 60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable)
[?? 60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8
[?? 60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200/0x260
[?? 61.004396] [dfff1fe0] [c0078704] handle_fasteoi_irq+0xc4/0x10c
[?? 61.010340] [dfff1ff0] [c0010fdc] call_handle_irq+0x18/0x28
[?? 61.015929] [dfff3e50] [c00051c8] do_IRQ+0xcc/0x1b4
[?? 61.020825] [dfff3e80] [c0011fc4] ret_from_except+0x0/0x18
[?? 61.026349] --- Exception: 501 at ioread32+0x8/0x14
[?? 61.026355]???? LR = ath_descdma_setup+0x294/0x664 [ath9k]
[?? 61.036733] [dfff3f40] [c04f5bdc] softirq_to_name+0x0/0x28 (unreliable)
[?? 61.043390] [dfff3f50] [e5ad66fc] ath9k_hw_enable_interrupts+0x160/0x1b0 [ath9k_hw]
[?? 61.051074] [dfff3f70] [e5b41368] ath9k_tasklet+0xc0/0x208 [ath9k]
[?? 61.057274] [dfff3f90] [c0043298] tasklet_action+0xcc/0xf8
[?? 61.062777] [dfff3fb0] [c0043d48] __do_softirq+0xdc/0x18c
[?? 61.068194] [dfff3ff0] [c0010fb4] call_do_softirq+0x14/0x24
[?? 61.073784] [d8613e60] [c0004fec] do_softirq+0x8c/0x98
[?? 61.078939] [d8613e80] [c0043bd0] local_bh_enable+0x9c/0xa0
[?? 61.084536] [d8613e90] [c036d910] unix_create1+0x184/0x1ac
[?? 61.090039] [d8613eb0] [c036d9a0] unix_create+0x68/0xbc
[?? 61.095286] [d8613ec0] [c02c6c34] __sock_create+0x114/0x248
[?? 61.100877] [d8613ef0] [c02c7008] sys_socket+0x54/0x84
[?? 61.106031] [d8613f10] [c02c70e8] sys_socketcall+0xb0/0x258
[?? 61.111622] [d8613f40] [c0011970] ret_from_syscall+0x0/0x3c
[?? 61.117338] --- Exception: c01 at 0xeff41fc
[?? 61.117343]???? LR = 0xeff4350
[?? 61.124578] handlers:
[?? 61.126851] [<e5b3eb44>] (ath_isr+0x0/0x21c [ath9k])
[?? 61.131840] Disabling IRQ #16
[?? 61.143234] ath: phy0: Failed to stop TX DMA, queues=0x001!

I made a patch below for this issue and use it to reduce the probability of this issue, you can see that, 
I can't disable the IRQ from start to end of this function, I need to restore the IRQ flags before we call ath_rx_tasklet and ath_tx_tasklet, 
since these functions will disable and enable the local softirq using spin_lock_bh and spin_unlock_bh, ?since with this gap of time, 
the tasklet still can be interrupted by the hardware IRQ.

diff -pNaur compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c
--- compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c??? 2012-11-08 00:17:08.000000000 +0800
+++ compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c?????????????? 2012-11-09 07:27:28.459837051 +0800
@@ -368,6 +368,7 @@ void ath9k_tasklet(unsigned long data)
????????????? u32 status = sc->intrstatus;
????????????? u32 rxmask;

+???????????? local_irq_save(flags);
????????????? ath9k_ps_wakeup(sc);
????????????? spin_lock(&sc->sc_pcu_lock);

@@ -383,7 +384,7 @@ void ath9k_tasklet(unsigned long data)
???????????????????????????? goto out;
????????????? }

-????????????? spin_lock_irqsave(&sc->sc_pm_lock, flags);
+???????????? spin_lock(&sc->sc_pm_lock);
????????????? if ((status & ATH9K_INT_TSFOOR) && sc->ps_enabled) {
???????????????????????????? /*
???????????????????????????? ?* TSF sync does not look correct; remain awake to sync with
@@ -392,7 +393,7 @@ void ath9k_tasklet(unsigned long data)
???????????????????????????? ath_dbg(common, PS, "TSFOOR - Sync with next Beacon\n");
???????????????????????????? sc->ps_flags |= PS_WAIT_FOR_BEACON | PS_BEACON_SYNC;
????????????? }
-????????????? spin_unlock_irqrestore(&sc->sc_pm_lock, flags);
+???????????? spin_unlock(&sc->sc_pm_lock);

?????????????? if (ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
???????????????????????????? rxmask = (ATH9K_INT_RXHP | ATH9K_INT_RXLP | ATH9K_INT_RXEOL |
@@ -400,6 +401,7 @@ void ath9k_tasklet(unsigned long data)
????????????? else
???????????????????????????? rxmask = (ATH9K_INT_RX | ATH9K_INT_RXEOL | ATH9K_INT_RXORN);

+???????????? local_irq_restore(flags);
????????????? if (status & rxmask) {
???????????????????????????? /* Check for high priority Rx first */
???????????????????????????? if ((ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) &&
@@ -416,6 +418,7 @@ void ath9k_tasklet(unsigned long data)
??????????????????????????????????????????? ath_tx_tasklet(sc);
????????????? }

+???????????? local_irq_save(flags);
????????????? ath9k_btcoex_handle_interrupt(sc, status);

?out:
@@ -424,6 +427,7 @@ out:

?????????????? spin_unlock(&sc->sc_pcu_lock);
????????????? ath9k_ps_restore(sc);
+???????????? local_irq_restore(flags);
}

?irqreturn_t ath_isr(int irq, void *dev)

but what I need to say is we should disable the IRQ entire of this function, so we need to use the 
spin_lock/spin_unlock instead of spin_lock_bh/spin_unlock_bh in the functions called by it, 
but I know it is a big job to do, so I ask for your suggestion.
In a simple way, ?this patch can fix this issue incompletely.

Thanks,
- Felix

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ
@ 2012-11-09  1:33 Felix Liao
  0 siblings, 0 replies; 9+ messages in thread
From: Felix Liao @ 2012-11-09  1:33 UTC (permalink / raw)
  To: ath9k-devel

Hi all,
I met a kernel call trace below using the ath9k driver, we can see that the ath9k_tasklet() was interrupted by a hardware IRQ, and at this time the ath9k IRQ was disabled, so the kernel will complain that there is none irq handler for this interrupt.

[   60.977474] irq 16: nobody cared (try booting with the "irqpoll" option)
[   60.984194] Call Trace:
[   60.986660] [dfff1f50] [c000843c] show_stack+0x70/0x1c8 (unreliable)
[   60.993041] [dfff1f90] [c00777a0] __report_bad_irq+0x44/0xe8
[   60.998718] [dfff1fb0] [c0077a44] note_interrupt+0x200/0x260
[   61.004396] [dfff1fe0] [c0078704] handle_fasteoi_irq+0xc4/0x10c
[   61.010340] [dfff1ff0] [c0010fdc] call_handle_irq+0x18/0x28
[   61.015929] [dfff3e50] [c00051c8] do_IRQ+0xcc/0x1b4
[   61.020825] [dfff3e80] [c0011fc4] ret_from_except+0x0/0x18
[   61.026349] --- Exception: 501 at ioread32+0x8/0x14
[   61.026355]     LR = ath_descdma_setup+0x294/0x664 [ath9k]
[   61.036733] [dfff3f40] [c04f5bdc] softirq_to_name+0x0/0x28 (unreliable)
[   61.043390] [dfff3f50] [e5ad66fc] ath9k_hw_enable_interrupts+0x160/0x1b0 [ath9k_hw]
[   61.051074] [dfff3f70] [e5b41368] ath9k_tasklet+0xc0/0x208 [ath9k]
[   61.057274] [dfff3f90] [c0043298] tasklet_action+0xcc/0xf8
[   61.062777] [dfff3fb0] [c0043d48] __do_softirq+0xdc/0x18c
[   61.068194] [dfff3ff0] [c0010fb4] call_do_softirq+0x14/0x24
[   61.073784] [d8613e60] [c0004fec] do_softirq+0x8c/0x98
[   61.078939] [d8613e80] [c0043bd0] local_bh_enable+0x9c/0xa0
[   61.084536] [d8613e90] [c036d910] unix_create1+0x184/0x1ac
[   61.090039] [d8613eb0] [c036d9a0] unix_create+0x68/0xbc
[   61.095286] [d8613ec0] [c02c6c34] __sock_create+0x114/0x248
[   61.100877] [d8613ef0] [c02c7008] sys_socket+0x54/0x84
[   61.106031] [d8613f10] [c02c70e8] sys_socketcall+0xb0/0x258
[   61.111622] [d8613f40] [c0011970] ret_from_syscall+0x0/0x3c
[   61.117338] --- Exception: c01 at 0xeff41fc
[   61.117343]     LR = 0xeff4350
[   61.124578] handlers:
[   61.126851] [<e5b3eb44>] (ath_isr+0x0/0x21c [ath9k])
[   61.131840] Disabling IRQ #16
[   61.143234] ath: phy0: Failed to stop TX DMA, queues=0x001!

I made a patch below for this issue and use it to reduce the probability of this issue, you can see that, I can't disable the IRQ from start to end of this function, I need to restore the IRQ flags before we call ath_rx_tasklet and ath_tx_tasklet, since these functions will disable and enable the local softirq using spin_lock_bh and spin_unlock_bh,  since with this gap of time, the tasklet still can be interrupted by the hardware IRQ.

diff -pNaur compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c
--- compat-wireless-3.6.6-1-orig/drivers/net/wireless/ath/ath9k/main.c    2012-11-08 00:17:08.000000000 +0800
+++ compat-wireless-3.6.6-1-fixed/drivers/net/wireless/ath/ath9k/main.c               2012-11-09 07:27:28.459837051 +0800
@@ -368,6 +368,7 @@ void ath9k_tasklet(unsigned long data)
              u32 status = sc->intrstatus;
              u32 rxmask;
+             local_irq_save(flags);
              ath9k_ps_wakeup(sc);
              spin_lock(&sc->sc_pcu_lock);
@@ -383,7 +384,7 @@ void ath9k_tasklet(unsigned long data)
                             goto out;
              }
-              spin_lock_irqsave(&sc->sc_pm_lock, flags);
+             spin_lock(&sc->sc_pm_lock);
              if ((status & ATH9K_INT_TSFOOR) && sc->ps_enabled) {
                             /*
                              * TSF sync does not look correct; remain awake to sync with
@@ -392,7 +393,7 @@ void ath9k_tasklet(unsigned long data)
                             ath_dbg(common, PS, "TSFOOR - Sync with next Beacon\n");
                             sc->ps_flags |= PS_WAIT_FOR_BEACON | PS_BEACON_SYNC;
              }
-              spin_unlock_irqrestore(&sc->sc_pm_lock, flags);
+             spin_unlock(&sc->sc_pm_lock);
               if (ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
                             rxmask = (ATH9K_INT_RXHP | ATH9K_INT_RXLP | ATH9K_INT_RXEOL |
@@ -400,6 +401,7 @@ void ath9k_tasklet(unsigned long data)
              else
                             rxmask = (ATH9K_INT_RX | ATH9K_INT_RXEOL | ATH9K_INT_RXORN);
+             local_irq_restore(flags);
              if (status & rxmask) {
                             /* Check for high priority Rx first */
                             if ((ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) &&
@@ -416,6 +418,7 @@ void ath9k_tasklet(unsigned long data)
                                            ath_tx_tasklet(sc);
              }
+             local_irq_save(flags);
              ath9k_btcoex_handle_interrupt(sc, status);
 out:
@@ -424,6 +427,7 @@ out:
               spin_unlock(&sc->sc_pcu_lock);
              ath9k_ps_restore(sc);
+             local_irq_restore(flags);
}
 irqreturn_t ath_isr(int irq, void *dev)

but what I need to say is we should disable the IRQ entire of this function, so we need to use the spin_lock/spin_unlock instead of spin_lock_bh/spin_unlock_bh in the functions called by it, but I know it is a big job to do, so I ask for your suggestion.
In a simple way,  this patch can fix this issue incompletely.

Thanks,
- Felix

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ath9k.org/pipermail/ath9k-devel/attachments/20121109/6ecc0e1e/attachment.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ath9k_tasklet_disable_irq.patch
Type: application/octet-stream
Size: 1800 bytes
Desc: ath9k_tasklet_disable_irq.patch
Url : http://lists.ath9k.org/pipermail/ath9k-devel/attachments/20121109/6ecc0e1e/attachment.obj 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-12-03 14:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-12  9:06 [ath9k-devel] ath9k_tasklet() can be interrupted by a hardware IRQ Felix Liao
2012-11-12  9:43 ` Adrian Chadd
2012-11-12  9:54   ` Felix Liao
2012-11-18 14:11     ` Adrian Chadd
2012-11-20 10:05       ` Felix Liao
2012-12-03  9:45         ` Felix Fietkau
2012-12-03 14:51           ` Chaoxing Lin
  -- strict thread matches above, loose matches on Subject: below --
2012-11-12  2:09 Felix Liao
2012-11-09  1:33 Felix Liao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.