From: Michal Kazior <michal.kazior@tieto.com>
To: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: ath10k@lists.infradead.org, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] ath10k: move irq setup
Date: Wed, 31 Jul 2013 07:50:37 +0200
Message-ID: <CA+BoTQn-VF-Ehio4Az6GenGb5cBTm2t2a9bQFAuALFhx+MQ+cw@mail.gmail.com>
In-Reply-To: <87d2pzuc90.fsf@kamboji.qca.qualcomm.com>

On 30 July 2013 20:35, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> There was a slight race during PCI shutdown. Since
>> interrupts weren't really stopped (only Copy
>> Engine interrupts were disabled through device hw
>> registers) it was possible for a firmware
>> indication (crash) interrupt to come in after
>> tasklets were synced/killed. This would cause
>> memory corruption and a panic in most cases. It
>> was also possible for interrupt to come before CE
>> was initialized during device probing.
>>
>> Interrupts are required for BMI phase so they are enabled as soon as
>> power_up() is called but are freed upon both power_down() and stop()
>> so there's asymmetry here. As by design stop() cannot be followed by
>> start() it is okay. Both power_down() and stop() should be merged
>> later on to avoid confusion.
>
> Why are the interrupts freed both in power_down() and stop()? I don't
> get that.
>
> What if we call disable_irq() in power_down() instead?

power_down() must call free_irq(), because power_up() calls
request_irq() (if you want the symmetry). If anything, stop()
should call disable_irq(), but wouldn't that mean start() should call
enable_irq()? But then, irqs are needed before start().

I did think about disable_irq(), but AFAIR you need to call enable_irq()
later on (so either way you need to keep track of the irq state or
you'll get a ton of WARN_ONs from the system). I'll double check that
and report back later.
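
To illustrate the bookkeeping concern: in the kernel, disable_irq()
calls nest, and an enable_irq() without a matching disable produces an
"Unbalanced enable" warning. The following is only a userspace mock of
that depth counting, not kernel code; struct mock_irq and all mock_*
names are hypothetical and exist just for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mock of a per-IRQ disable depth; not the kernel's irq_desc. */
struct mock_irq {
	unsigned int depth;     /* 0 means the line is enabled */
	unsigned int warnings;  /* counts unbalanced enable attempts */
};

/* Disables nest: every call increases the depth. */
static void mock_disable_irq(struct mock_irq *irq)
{
	irq->depth++;
}

/* An enable at depth 0 is unbalanced; the real kernel warns here. */
static void mock_enable_irq(struct mock_irq *irq)
{
	if (irq->depth == 0) {
		irq->warnings++;
		return;
	}
	irq->depth--;
}

static bool mock_irq_enabled(const struct mock_irq *irq)
{
	return irq->depth == 0;
}

/* Walks through a balanced pair followed by one extra enable. */
static unsigned int mock_irq_demo(void)
{
	struct mock_irq irq = { 0, 0 };

	mock_disable_irq(&irq);
	mock_enable_irq(&irq);   /* balanced: line is enabled again */
	assert(mock_irq_enabled(&irq));

	mock_enable_irq(&irq);   /* unbalanced: counted as a warning */
	return irq.warnings;
}
```

The point of the sketch is that a caller who cannot tell whether the
line is currently disabled has to carry its own state anyway, which is
the same cost as tracking request_irq()/free_irq().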


>> Before this can be really properly fixed var/hw
>> init code split is necessary.
>>
>> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
>> ---
>>
>> Please note: this is based on my (still under
>> review at the time of posting) previous patchests:
>> device setup refactor and recovery.
>>
>> I'm posting this before those patchsets are merged
>> so anyone interested in testing this fix (I can't
>> reproduce the problem on my setup) can give it a
>> try.
>
> This was reported by Ben, right? So this sould have a Reported-by line
> attributing him.

Yes. I'll fix that, provided we get through the review with the patch :)


>> @@ -1783,16 +1792,24 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>>       return 0;
>>
>>  err_ce:
>> +     /* XXX: Until var/hw init is split it's impossible to fix the ordering
>> +      * here so we must call stop_intr() here too to prevent interrupts after
>> +      * CE is teared down. It's okay to double call the stop_intr() */
>
> "FIXME:"

Ok.



>>  exit:
>> +     ar_pci->intr_started = ret == 0;
>
> A bit too clever for the sake of readibility for my taste, but I guess
> it's ok.
>
>> --- a/drivers/net/wireless/ath/ath10k/pci.h
>> +++ b/drivers/net/wireless/ath/ath10k/pci.h
>> @@ -198,6 +198,7 @@ struct ath10k_pci {
>>        * interrupts.
>>        */
>>       int num_msi_intrs;
>> +     bool intr_started;
>
> Adding a new state variable makes me worried. I really would prefer a
> solution which would not require that.

I know that. That's why I mentioned in the commit log that it is more
of a workaround than a real fix. I don't like this either, but a
real fix requires a lot of rework from what I can tell.

Apparently this bug can be triggered more easily now that the recovery
patches went in. I'm not experiencing it myself, but I get reports of
rare panics when a machine is left idle for a very long time with
interfaces brought down.
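
For readers following along, here is a rough userspace sketch of how a
flag like intr_started can make a double stop_intr() call harmless. It
is an assumption about the shape of the workaround, not the patch
itself: struct mock_pci, mock_start_intr(), mock_stop_intr() and the
irqs_requested counter are hypothetical stand-ins for the driver state
and for request_irq()/free_irq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's PCI state. */
struct mock_pci {
	bool intr_started;   /* mirrors the flag added by the patch */
	int irqs_requested;  /* stands in for request_irq()/free_irq() */
};

/* Pretend to request the IRQ; 'fail' simulates request_irq() failing. */
static int mock_start_intr(struct mock_pci *p, bool fail)
{
	int ret = fail ? -1 : 0;

	if (ret == 0)
		p->irqs_requested++;

	/* Same idea as "intr_started = ret == 0" in the quoted hunk. */
	p->intr_started = (ret == 0);
	return ret;
}

/* Safe to call twice: the second call sees the flag cleared. */
static void mock_stop_intr(struct mock_pci *p)
{
	if (!p->intr_started)
		return;

	p->irqs_requested--;
	p->intr_started = false;
}

/* Start once, stop twice, and report how many IRQs remain requested. */
static int mock_pci_demo(void)
{
	struct mock_pci p = { false, 0 };

	assert(mock_start_intr(&p, false) == 0);
	mock_stop_intr(&p);
	mock_stop_intr(&p);  /* no-op, nothing is freed twice */
	return p.irqs_requested;
}
```

The cost is exactly the one raised in the review: one more piece of
state that has to stay in sync with what was really requested.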


> Also if we call request_irq() in ath10k_pci_probe() we should also call
> free_irq() in ath10k_pci_remove() for symmetry. Just doing a temporary
> hack will most likely stay forever :)

With the patch, interrupts are temporarily enabled and disabled for
probe_fw() during pci_probe() and are then not requested again until
mac80211 start().


Best regards,
Michał Kazior.
