* Re: ath11k: QCA6390 on Dell XPS 13 and kernel crashes
@ 2020-12-06 17:38 Mitchell Nordine
  2020-12-06 17:53 ` wi nk
  0 siblings, 1 reply; 31+ messages in thread
From: Mitchell Nordine @ 2020-12-06 17:38 UTC (permalink / raw)
  To: ath11k

I recently tried updating to the latest set of patches on `ath11k-qca6390-bringup`, and as expected the crashing still remains (XPS 13 9310 with the QCA6390). I'm finding it difficult to test any of the other behaviour (like improved suspend, etc) as I'm seeing crashes the vast majority of the time. Normally this occurs when the wifi first attempts to connect to a network. On the rare occasion where it does connect successfully, it appears to run smoothly for a seemingly random amount of time before spontaneously crashing and freezing the system. I haven't managed to identify any particular action that causes this.

FWIW, I still haven't managed to enable Bluetooth in my kernel, so there's very little chance that it's contributing to the issue in my case. I think wi nk's conclusion is correct: the apparent effect of Bluetooth on the race was just a coincidence.
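
As a rough way to put numbers on small boot samples such as the 10/10 versus 2/10 freezes quoted in this thread, here is a minimal sketch of a one-sided Fisher exact test in plain Python. The function name and table layout are illustrative assumptions; nobody in the thread ran this. The test assumes each boot is an independent trial, which a racy failure may not satisfy, and a later run with Bluetooth disabled still produced 6 failed boots in a row, so a small p-value should not be read as proof of causation.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two conditions (e.g. Bluetooth on / off), columns are
    the outcomes (freeze / no freeze). Returns the probability of seeing
    at least `a` freezes in the first row if the condition had no effect.
    """
    row1 = a + b            # boots in the first condition
    col1 = a + c            # total freezes across both conditions
    n = a + b + c + d       # total boots
    denom = comb(n, row1)   # ways to choose which boots fall in row 1
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for outcomes at least as extreme.
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom
```

With the counts quoted in the thread, the call would be `fisher_exact_one_sided(10, 0, 2, 8)`.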

Let me know if there is anything else I can test to help, or any particular kinds of debugging output you would like to see and I'll give it a go next time I get the chance to test.


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, December 6, 2020 6:00 PM, <ath11k-request@lists.infradead.org> wrote:

> Message: 1
> Date: Sat, 5 Dec 2020 20:17:10 +0100
> From: wi nk wink@technolu.st
> To: Kalle Valo kvalo@codeaurora.org
> Cc: Thomas Krause thomaskrause@posteo.de, ath11k@lists.infradead.org
> Subject: Re: ath11k: QCA6390 on Dell XPS 13 and kernel crashes
> Message-ID:
> CAHUdJJX6JWbNY+=B2D1fFGZPqzbJSw0V0C2i+bZ=xabE56cv_A@mail.gmail.com
> Content-Type: text/plain; charset="UTF-8"
>
> On Tue, Dec 1, 2020 at 11:17 AM wi nk wink@technolu.st wrote:
>
> > On Mon, Nov 30, 2020 at 6:02 PM wi nk wink@technolu.st wrote:
> >
> > > On Mon, Nov 30, 2020 at 5:55 PM Kalle Valo kvalo@codeaurora.org wrote:
> > >
> > > > Hi Wi and Thomas,
> > > > I'll start a new thread about problems on XPS 13. The information is
> > > > scattered to different threads and hard to find everything, it's much
> > > > easier to have everything in one place. So let's continue the discussion
> > > > about the kernel crashes on this thread.
> > > > Here's what I have understood so far:
> > > >
> > > > -   On Dell XPS 15 there are no issues with QCA6390 and it seems to work
> > > >     with 32 MSI vectors.
> > > >
> > > > -   On Dell XPS 13 there's a BIOS bug and kernel prints:
> > > >
> > > >
> > > > [ 0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
> > > > BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:
> > > >
> > > > -   Because of this BIOS bug QCA6390 only gets one MSI vector on Dell XPS
> > > >     13. We added a hack to ath11k to make it work with only one vector and
> > > >     after that it's possible to boot the firmware, connect to the AP and
> > > >     use the device for a while.
> > > >
> > > > -   But the problem now is that the kernel is crashing almost immediately
> > > >     and almost every time(?). And these crashes only happen on Dell XPS
> > > >     13, all other systems (including Dell XPS 15) seem to work without
> > > >     issues.
> > > >
> > > >
> > > > Is my understanding correct? Did I miss anything?
> > > > About the symptoms Wi reports:
> > > >
> > > > So up until this point, everything is working without issues.
> > > > Everything seems to spiral out of control a couple of seconds later
> > > > when my system attempts to actually bring up the adapter. In most of
> > > > the crash states I will see this:
> > > > [ 31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> > > > [ 31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
> > > > [ 31.391928] wlp85s0: authenticated
> > > > [ 31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> > > > [ 31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=6)
> > > > [ 31.407730] wlp85s0: associated
> > > > [ 31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
> > > > And then either somewhere in that pile of messages, or a second or two
> > > > after this my machine will start to stutter as I mentioned before, and
> > > > then it either hangs, or I see this message (I'm truncating the
> > > > timestamp):
> > > > [ 35.xxxx ] sched: RT throttling activated
> > > > After that moment, the machine is unresponsive. Sorry I can't seem to
> > > > extract this data other than screenshots from my phone at the moment,
> > > > you can see the dmesg output from 6 different hangs here:
> > > >
> > > > https://github.com/w1nk/ath11k-debug
> > > >
> > > > -------------------------------------
> > > >
> > > > And Thomas Krause reports:
> > > >
> > > > I can confirm this behavior on my configuration. I managed to log in
> > > > once, select the Wifi and connect to it. Curiously enough, it seemed
> > > > to be stable long enough to enter the Wifi passphrase. After the
> > > > connection was established, the system hung, and on each attempt to
> > > > reboot into the graphical system it would freeze at some point
> > > > (sometimes even before showing the login screen).
> > > >
> > > > ----------------------------------------------------------------------
> > > >
> > > > --
> > > > https://patchwork.kernel.org/project/linux-wireless/list/
> > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > >
> > > Hi Kalle,
> > > Again, thanks much for your work. I think you've summarized
> > > everything up until this point. On my XPS 13 9310, the RT throttling
> > > still occurs occasionally when loading the driver/associating with
> > > an AP. The throttling consistently occurs after a few sets of the MHI
> > > debug printing showing the EE entering an invalid state
> > > ( AMSS -> INVALID_EE ). I'm now building the latest tag
> > > to see if there are any differences.
> > > Thanks!
> >
> > Just to follow up, the first boot resulted in the RT throttling
> > message as the adapter was coming up/associating. Shortly after, the
> > firmware crashed; the kernel didn't fully freeze, but I needed to
> > reboot to bring the adapter back.
>
> Kalle -
>
> I've noticed one additional behavior that may give someone with
> familiarity with the QCA hardware a clue. I'm running
> ath11k-qca6390-bringup-202011301608 on the dell xps 13 9310. For
> whatever reason, having the bluetooth subsystem enabled (with a paired
> device) on this dell basically guarantees I'll hit the scheduler
> throttling issue as the ath11k driver is initializing / associating.
> The bluetooth system is using the btqca driver. I don't have any
> useful debugging (I'll gladly collect some if there is a way to do it)
> other than tracking some simple statistics. I booted my system 20
> times, 10 times with bluetooth enabled (and some headphones turned on
> ready to pair), and 10 times without. In both scenarios, I'm booting
> into X and manually modprobing the ath11k driver. The difference is
> that with bluetooth on and by the time I modprobe the driver, the
> headphones are paired and I received the throttling message and
> subsequent freezing 10/10 times. With bluetooth off / my headphones
> not paired, I only saw it 2/10. I know it's not much hard information
> but it's reliably reproducible for me, is there anything useful I can
> collect?
>
>
> ----------------------------------------------------------------------
>
> Message: 2
> Date: Sun, 6 Dec 2020 09:05:57 +0100
> From: wi nk wink@technolu.st
> To: Kalle Valo kvalo@codeaurora.org
> Cc: Thomas Krause thomaskrause@posteo.de, ath11k@lists.infradead.org
> Subject: Re: ath11k: QCA6390 on Dell XPS 13 and kernel crashes
> Message-ID:
> CAHUdJJU0ykf96GbaMrhkcPv2xSF62CDPNSNSgtoGP6BtBTAk6Q@mail.gmail.com
> Content-Type: text/plain; charset="UTF-8"
>
> Well, unfortunately I think the bluetooth was just a red herring in the
> racing. To chase that, I disabled all bluetooth and was able to get
> into a state where I had 6 failed boots in a row. To further poke
> around, I rebuilt the kernel with localmodconfig to disable building
> big chunks of things. This kernel is way less stable and seems to
> freeze most of the time (but does occasionally remain stable). I'm not
> sure what else got disabled in there, but it seems to have had a
> negative impact on the crash racing.
>
>



-- 
ath11k mailing list
ath11k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath11k

* Re: Re: ath11k: QCA6390 on Dell XPS 13 and kernel crashes
@ 2020-12-02 23:49 Stephen Liang
  2020-12-09 15:09 ` Kalle Valo
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Liang @ 2020-12-02 23:49 UTC (permalink / raw)
  To: ath11k

For reference, I am running Fedora Rawhide kernel 5.10.0-0.rc6 with
commit 59c6d022df8efb450f82d33dd6a6812935bd022f and a revert of
7fef431be9c9 patched in.

I am able to connect to WiFi and did not see any immediate
freezing, hard kernel panics, etc. Uptime is now going on 35 minutes
and I am able to get 50 up/down throughput.

I do see a number of taints in dmesg while connecting to an access
point but the taint doesn't seem to affect functionality, for example:
https://pastebin.com/raw/gbzuvs3q

Finally, rebooting the computer does require a hard power down.
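
For readers who want to interpret taint reports like the ones mentioned above, here is a minimal sketch that decodes a kernel taint bitmask into its flag letters. The helper name `decode_taint` is hypothetical, and the bit assignments follow the kernel's tainted-kernels documentation as best I recall it; please verify them against the documentation for your kernel version before relying on them.

```python
# (bit, letter, meaning) tuples, per the kernel's tainted-kernels document.
TAINT_FLAGS = [
    (0, "P", "proprietary module was loaded"),
    (1, "F", "module was force loaded"),
    (2, "S", "kernel running on an out of specification system"),
    (3, "R", "module was force unloaded"),
    (4, "M", "processor reported a Machine Check Exception"),
    (5, "B", "bad page referenced or unexpected page flags"),
    (6, "U", "taint requested by userspace application"),
    (7, "D", "kernel died recently (OOPS or BUG)"),
    (8, "A", "ACPI table overridden by user"),
    (9, "W", "kernel issued a warning"),
    (10, "C", "staging driver was loaded"),
    (11, "I", "workaround for platform firmware bug applied"),
    (12, "O", "externally built (out-of-tree) module was loaded"),
    (13, "E", "unsigned module was loaded"),
    (14, "L", "soft lockup occurred"),
    (15, "K", "kernel has been live patched"),
    (16, "X", "auxiliary taint, defined by distributions"),
    (17, "T", "kernel built with the struct randomization plugin"),
]


def decode_taint(mask):
    """Return a list of (letter, meaning) pairs for each bit set in mask."""
    return [
        (letter, meaning)
        for bit, letter, meaning in TAINT_FLAGS
        if mask & (1 << bit)
    ]
```

On a running Linux system the mask can be read from /proc/sys/kernel/tainted and passed in as an integer, e.g. `decode_taint(int(open("/proc/sys/kernel/tainted").read()))`.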


* ath11k: QCA6390 on Dell XPS 13 and kernel crashes
@ 2020-11-30 16:55 Kalle Valo
  2020-11-30 17:02 ` wi nk
  0 siblings, 1 reply; 31+ messages in thread
From: Kalle Valo @ 2020-11-30 16:55 UTC (permalink / raw)
  To: wi nk, Thomas Krause; +Cc: ath11k

Hi Wi and Thomas,

I'll start a new thread about problems on XPS 13. The information is
scattered to different threads and hard to find everything, it's much
easier to have everything in one place. So let's continue the discussion
about the kernel crashes on this thread.

Here's what I have understood so far:

* On Dell XPS 15 there are no issues with QCA6390 and it seems to work
  with 32 MSI vectors.

* On Dell XPS 13 there's a BIOS bug and kernel prints:

[    0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
               BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:

* Because of this BIOS bug QCA6390 only gets one MSI vector on Dell XPS
  13. We added a hack to ath11k to make it work with only one vector and
  after that it's possible to boot the firmware, connect to the AP and
  use the device for a while.

* But the problem now is that the kernel is crashing almost immediately
  and almost every time(?). And these crashes only happen on Dell XPS
  13, all other systems (including Dell XPS 15) seem to work without
  issues.

Is my understanding correct? Did I miss anything?

About the symptoms Wi reports:

----------------------------------------------------------------------
So up until this point, everything is working without issues.
Everything seems to spiral out of control a couple of seconds later
when my system attempts to actually bring up the adapter.  In most of
the crash states I will see this:

[   31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
[   31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
[   31.391928] wlp85s0: authenticated
[   31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
[   31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=6)
[   31.407730] wlp85s0: associated
[   31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready

And then either somewhere in that pile of messages, or a second or two
after this my machine will start to stutter as I mentioned before, and
then it either hangs, or I see this message (I'm truncating the
timestamp):

[   35.xxxx ] sched: RT throttling activated

After that moment, the machine is unresponsive.  Sorry I can't seem to
extract this data other than screenshots from my phone at the moment,
you can see the dmesg output from 6 different hangs here:

https://github.com/w1nk/ath11k-debug
----------------------------------------------------------------------

And Thomas Krause reports:

----------------------------------------------------------------------
I can confirm this behavior on my configuration. I managed to log in
once, select the Wifi and connect to it. Curiously enough, it seemed
to be stable long enough to enter the Wifi passphrase. After the
connection was established, the system hung, and on each attempt to
reboot into the graphical system it would freeze at some point
(sometimes even before showing the login screen).
----------------------------------------------------------------------
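
To make reports like the ones above easier to compare, here is a minimal sketch that scans saved dmesg text for the messages mentioned in this thread. The names `SIGNATURES` and `scan_log` are hypothetical, and the `INVALID_EE` pattern is an assumption based on how the MHI state was described in a reply, since the exact log line is not shown here.

```python
import re

# Patterns taken from messages quoted in this thread. The INVALID_EE
# pattern is a guess at the MHI debug output, not a confirmed format.
SIGNATURES = {
    "dmar_firmware_bug": re.compile(r"DMAR: \[Firmware Bug\]"),
    "rt_throttling": re.compile(r"sched: RT throttling activated"),
    "mhi_invalid_ee": re.compile(r"INVALID_EE"),
}


def scan_log(text):
    """Return a dict mapping each signature name to its matching lines."""
    hits = {name: [] for name in SIGNATURES}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits
```

The text could come from a file saved with `dmesg > boot.log`, or from a transcription of a screenshot when the machine hangs before logs can be written to disk.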

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



end of thread

Thread overview: 31+ messages
2020-12-06 17:38 ath11k: QCA6390 on Dell XPS 13 and kernel crashes Mitchell Nordine
2020-12-06 17:53 ` wi nk
2020-12-06 21:45   ` wi nk
2020-12-07  1:17     ` wi nk
2020-12-07 14:45       ` Mitchell Nordine
2020-12-07 17:01         ` wi nk
2020-12-09  1:52           ` wi nk
2020-12-09  9:43             ` wi nk
2020-12-09 15:28               ` wi nk
2020-12-09 15:35     ` Kalle Valo
2020-12-09 15:39       ` wi nk
2020-12-09 15:50         ` wi nk
2020-12-09 15:50         ` Kalle Valo
2020-12-09 15:55           ` wi nk
2020-12-09 21:46             ` wi nk
2020-12-11 12:28               ` wi nk
2020-12-12  5:37                 ` Kalle Valo
2020-12-12 11:46                   ` wi nk
2020-12-12 23:29                     ` wi nk
2020-12-13  0:03                       ` wi nk
2020-12-13  0:59                         ` Mitchell Nordine
2020-12-13 22:09                           ` Stephen Liang
2020-12-16  8:50                           ` Kalle Valo
  -- strict thread matches above, loose matches on Subject: below --
2020-12-02 23:49 Stephen Liang
2020-12-09 15:09 ` Kalle Valo
2020-12-10  3:07   ` Stephen Liang
2020-12-10  7:37     ` Stephen Liang
2020-11-30 16:55 Kalle Valo
2020-11-30 17:02 ` wi nk
2020-12-01 10:17   ` wi nk
2020-12-05 19:17     ` wi nk
2020-12-06  8:05       ` wi nk
