linux-pci.vger.kernel.org archive mirror
* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
       [not found]   ` <1ce6f735-21ff-db7e-c8dc-d567761964aa@posteo.de>
@ 2020-11-02 18:49     ` Kalle Valo
  2020-11-02 20:57       ` Bjorn Helgaas
  0 siblings, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-02 18:49 UTC (permalink / raw)
  To: Thomas Krause; +Cc: ath11k, linux-wireless, linux-pci, Devin Bayer

+ linux-wireless, linux-pci, devin

Thomas Krause <thomaskrause@posteo.de> writes:

>> I had the same problem as well back in the days, for me enabling
>> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> mention that in the ath11k warning above :)
>
> CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
> is behind a PCI bridge which is also disabled, could this be a
> problem?
>
> 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
> [Normal decode])
> 	Flags: bus master, fast devsel, latency 0, IRQ 123
> 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> 	I/O behind bridge: [disabled]
> 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
> 	Prefetchable memory behind bridge: [disabled]
> 	Capabilities: [40] Express Root Port (Slot+), MSI 00
> 	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> 	Capabilities: [90] Subsystem: Dell Device 0991
> 	Capabilities: [a0] Power Management version 3
> 	Capabilities: [100] Advanced Error Reporting
> 	Capabilities: [220] Access Control Services
> 	Capabilities: [150] Precision Time Measurement
> 	Capabilities: [200] L1 PM Substates
> 	Capabilities: [a00] Downstream Port Containment
> 	Kernel driver in use: pcieport

I don't know enough about PCI to say if the bridge is a problem or not.
I'm adding linux-wireless and linux-pci in case someone can help. Also,
Devin seems to have a similar problem.

To summarise: Thomas is reporting[1] a problem with ath11k on a QCA6390
PCI device where not enough MSI vectors are available. ath11k needs 32
vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
in ath11k and was introduced in v5.10-rc1. The irq allocation code is in
drivers/net/wireless/ath/ath11k/pci.c. [2]

Can the PCI folks help? What could cause this, and how can we debug it
further?

I would first try with a full distro kernel config, just in case some
other important kernel config option is missing.

[1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c#n633

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-02 18:49     ` pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310 Kalle Valo
@ 2020-11-02 20:57       ` Bjorn Helgaas
  2020-11-03  3:01         ` Carl Huang
                           ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Bjorn Helgaas @ 2020-11-02 20:57 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Thomas Krause, ath11k, linux-wireless, linux-pci, Devin Bayer,
	Govind Singh

[+cc Govind, author of 5697a564d369 ("ath11k: pci: add MSI config
initialisation")]

On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
> + linux-wireless, linux-pci, devin
> 
> Thomas Krause <thomaskrause@posteo.de> writes:
> 
> >> I had the same problem as well back in the days, for me enabling
> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
> >> mention that in the ath11k warning above :)
> >
> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
> > is behind a PCI bridge which is also disabled, could this be a
> > problem?
> >
> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
> > [Normal decode])
> > 	Flags: bus master, fast devsel, latency 0, IRQ 123
> > 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> > 	I/O behind bridge: [disabled]
> > 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
> > 	Prefetchable memory behind bridge: [disabled]
> > 	Capabilities: [40] Express Root Port (Slot+), MSI 00
> > 	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> > 	Capabilities: [90] Subsystem: Dell Device 0991
> > 	Capabilities: [a0] Power Management version 3
> > 	Capabilities: [100] Advanced Error Reporting
> > 	Capabilities: [220] Access Control Services
> > 	Capabilities: [150] Precision Time Measurement
> > 	Capabilities: [200] L1 PM Substates
> > 	Capabilities: [a00] Downstream Port Containment
> > 	Kernel driver in use: pcieport
> 
> I don't know enough about PCI to say if the bridge is a problem or not.

I don't think the bridge is an issue here.  AFAICT the bridge's I/O
and prefetchable memory windows are disabled, but the non-prefetchable
window *is* enabled and contains the space consumed by the ath11k
device:

  00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
	Memory behind bridge: 8c300000-8c3fffff [size=1M]
  56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
     Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]

> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
> PCI device where he is not having enough MSI vectors. ath11k needs 32
> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
> drivers/net/wireless/ath/ath11k/pci.c. [2]

This code is needlessly complicated.  If you absolutely need
msi_config.total_vectors and can't settle for any less, you can do
this:

  num_vectors = pci_alloc_irq_vectors(ab_pci->pdev,
                                      msi_config.total_vectors,
                                      msi_config.total_vectors,
                                      PCI_IRQ_MSI);

  if (num_vectors < 0) {
    ath11k_err(ab, "failed to get %d MSI vectors (%d)\n",
               msi_config.total_vectors, num_vectors);
    return num_vectors;
  }

But it seems a little greedy if the device can't operate at all unless
it gets 32 vectors.  Are you sure that's a hard requirement?  Most
devices can work with fewer vectors, even if it reduces performance.

> I would first try with a full distro kernel config, just in case there's
> some another important kernel config missing.
> 
> [1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html

Tangent: have you considered getting this list archived on
https://lore.kernel.org/lists.html?

> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c#n633
> 
> -- 
> https://patchwork.kernel.org/project/linux-wireless/list/
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-02 20:57       ` Bjorn Helgaas
@ 2020-11-03  3:01         ` Carl Huang
  2020-11-03  6:49         ` Kalle Valo
  2020-11-03 11:20         ` Devin Bayer
  2 siblings, 0 replies; 40+ messages in thread
From: Carl Huang @ 2020-11-03  3:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Kalle Valo, Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Thomas Krause, ath11k

On 2020-11-03 04:57, Bjorn Helgaas wrote:
> [+cc Govind, author of 5697a564d369 ("ath11k: pci: add MSI config
> initialisation")]
> 
> On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
>> + linux-wireless, linux-pci, devin
>> 
>> Thomas Krause <thomaskrause@posteo.de> writes:
>> 
>> >> I had the same problem as well back in the days, for me enabling
>> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> >> mention that in the ath11k warning above :)
>> >
>> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
>> > is behind a PCI bridge which is also disabled, could this be a
>> > problem?
>> >
>> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
>> > [Normal decode])
>> > 	Flags: bus master, fast devsel, latency 0, IRQ 123
>> > 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
>> > 	I/O behind bridge: [disabled]
>> > 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
>> > 	Prefetchable memory behind bridge: [disabled]
>> > 	Capabilities: [40] Express Root Port (Slot+), MSI 00
>> > 	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> > 	Capabilities: [90] Subsystem: Dell Device 0991
>> > 	Capabilities: [a0] Power Management version 3
>> > 	Capabilities: [100] Advanced Error Reporting
>> > 	Capabilities: [220] Access Control Services
>> > 	Capabilities: [150] Precision Time Measurement
>> > 	Capabilities: [200] L1 PM Substates
>> > 	Capabilities: [a00] Downstream Port Containment
>> > 	Kernel driver in use: pcieport
>> 
>> I don't know enough about PCI to say if the bridge is a problem or 
>> not.
> 
> I don't think the bridge is an issue here.  AFAICT the bridge's I/O
> and prefetchable memory windows are disabled, but the non-prefetchable
> window *is* enabled and contains the space consumed by the ath11k
> device:
> 
>   00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
> 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
>   56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
>      Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]
> 

Have you enabled VT-d in the BIOS? This is required at least on some old
laptops.


>> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
>> PCI device where he is not having enough MSI vectors. ath11k needs 32
>> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is 
>> new
>> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
>> drivers/net/wireless/ath/ath11k/pci.c. [2]
> 
> This code is needlessly complicated.  If you absolutely need
> msi_config.total_vectors and can't settle for any less, you can do
> this:
> 
>   num_vectors = pci_alloc_irq_vectors(ab_pci->pdev,
>                                       msi_config.total_vectors,
>                                       msi_config.total_vectors,
>                                       PCI_IRQ_MSI);
> 
>   if (num_vectors < 0) {
>     ath11k_err(ab, "failed to get %d MSI vectors (%d)\n",
>                msi_config.total_vectors, num_vectors);
>     return num_vectors;
>   }
> 
> But it seems a little greedy if the device can't operate at all unless
> it gets 32 vectors.  Are you sure that's a hard requirement?  Most
> devices can work with fewer vectors, even if it reduces performance.
> 
>> I would first try with a full distro kernel config, just in case 
>> there's
>> some another important kernel config missing.
>> 
>> [1] 
>> http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html
> 
> Tangent: have you considered getting this list archived on
> https://lore.kernel.org/lists.html?
> 
>> [2] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c#n633
>> 
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>> 
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-02 20:57       ` Bjorn Helgaas
  2020-11-03  3:01         ` Carl Huang
@ 2020-11-03  6:49         ` Kalle Valo
  2020-11-03 16:08           ` Bjorn Helgaas
  2020-11-03 11:20         ` Devin Bayer
  2 siblings, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-03  6:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Thomas Krause, ath11k

Bjorn Helgaas <helgaas@kernel.org> writes:

> [+cc Govind, author of 5697a564d369 ("ath11k: pci: add MSI config
> initialisation")]
>
> On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
>> + linux-wireless, linux-pci, devin
>> 
>> Thomas Krause <thomaskrause@posteo.de> writes:
>> 
>> >> I had the same problem as well back in the days, for me enabling
>> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> >> mention that in the ath11k warning above :)
>> >
>> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
>> > is behind a PCI bridge which is also disabled, could this be a
>> > problem?
>> >
>> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
>> > [Normal decode])
>> > 	Flags: bus master, fast devsel, latency 0, IRQ 123
>> > 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
>> > 	I/O behind bridge: [disabled]
>> > 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
>> > 	Prefetchable memory behind bridge: [disabled]
>> > 	Capabilities: [40] Express Root Port (Slot+), MSI 00
>> > 	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> > 	Capabilities: [90] Subsystem: Dell Device 0991
>> > 	Capabilities: [a0] Power Management version 3
>> > 	Capabilities: [100] Advanced Error Reporting
>> > 	Capabilities: [220] Access Control Services
>> > 	Capabilities: [150] Precision Time Measurement
>> > 	Capabilities: [200] L1 PM Substates
>> > 	Capabilities: [a00] Downstream Port Containment
>> > 	Kernel driver in use: pcieport
>> 
>> I don't know enough about PCI to say if the bridge is a problem or not.
>
> I don't think the bridge is an issue here.  AFAICT the bridge's I/O
> and prefetchable memory windows are disabled, but the non-prefetchable
> window *is* enabled and contains the space consumed by the ath11k
> device:
>
>   00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
> 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
>   56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
>      Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]

Good to know that the bridge shouldn't be the problem. Do you have any
ideas how to make more vectors available to ath11k, besides
CONFIG_IRQ_REMAP? Because QCA6390 works in Windows I doubt this is a
hardware problem.

>> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
>> PCI device where he is not having enough MSI vectors. ath11k needs 32
>> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
>> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
>> drivers/net/wireless/ath/ath11k/pci.c. [2]
>
> This code is needlessly complicated.  If you absolutely need
> msi_config.total_vectors and can't settle for any less, you can do
> this:
>
>   num_vectors = pci_alloc_irq_vectors(ab_pci->pdev,
>                                       msi_config.total_vectors,
>                                       msi_config.total_vectors,
>                                       PCI_IRQ_MSI);
>
>   if (num_vectors < 0) {
>     ath11k_err(ab, "failed to get %d MSI vectors (%d)\n",
>                msi_config.total_vectors, num_vectors);
>     return num_vectors;
>   }

True, this should be cleaned up. But of course this won't solve the
actual problem.

> But it seems a little greedy if the device can't operate at all unless
> it gets 32 vectors.  Are you sure that's a hard requirement?  Most
> devices can work with fewer vectors, even if it reduces performance.

This was my first reaction as well when I saw the code for the first
time. The reply I got was that the firmware needs all 32 vectors; it
won't work with fewer.

>> I would first try with a full distro kernel config, just in case there's
>> some another important kernel config missing.
>> 
>> [1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html
>
> Tangent: have you considered getting this list archived on
> https://lore.kernel.org/lists.html?

Good point, actually I have not. I'll add both ath10k and ath11k lists
to lore. It's even more important now that lists.infradead.org had a
hard drive crash and lost years of archives.

Thanks for the help!

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-02 20:57       ` Bjorn Helgaas
  2020-11-03  3:01         ` Carl Huang
  2020-11-03  6:49         ` Kalle Valo
@ 2020-11-03 11:20         ` Devin Bayer
  2 siblings, 0 replies; 40+ messages in thread
From: Devin Bayer @ 2020-11-03 11:20 UTC (permalink / raw)
  To: Bjorn Helgaas, Kalle Valo
  Cc: Thomas Krause, ath11k, linux-wireless, linux-pci, Govind Singh

On 02/11/2020 21.57, Bjorn Helgaas wrote:
>>>
>>> CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
>>> is behind a PCI bridge which is also disabled, could this be a
>>> problem?

Just to provide another case, I have the same issue with this driver.

CONFIG_IRQ_REMAP=y is set and doesn't have any effect.

I'm unsure if the issue could be my system (Atom / Intel J1900) or the
fact that I'm using a slightly different card. Is there any way to tell
from the lspci output? Here is what I guess is most relevant:

00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 3 (rev 0e) (prog-if 00 [Normal decode])
        Memory behind bridge: d0000000-d0ffffff [size=16M]
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee08004  Data: 4021

03:00.0 Unassigned class [ff00]: Qualcomm Device 1101
        Subsystem: Qualcomm Device 0108
        Region 0: Memory at d0000000 (64-bit, non-prefetchable) [size=16M]
        Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit-
                Address: fee01004  Data: 40ef
                Masking: ffffffff  Pending: 00000000
        Capabilities: [70] Express (v2) Endpoint, MSI 00

Thanks,
Devin


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-03  6:49         ` Kalle Valo
@ 2020-11-03 16:08           ` Bjorn Helgaas
  2020-11-03 21:08             ` Thomas Gleixner
  2020-11-09 18:48             ` Kalle Valo
  0 siblings, 2 replies; 40+ messages in thread
From: Bjorn Helgaas @ 2020-11-03 16:08 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Thomas Krause, ath11k, Thomas Gleixner, Christoph Hellwig

[+cc Thomas, Christoph for question about not enough MSI IRQ vectors]

On Tue, Nov 03, 2020 at 08:49:06AM +0200, Kalle Valo wrote:
> Bjorn Helgaas <helgaas@kernel.org> writes:
> > On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
> >> + linux-wireless, linux-pci, devin
> >> 
> >> Thomas Krause <thomaskrause@posteo.de> writes:
> >> 
> >> >> I had the same problem as well back in the days, for me enabling
> >> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
> >> >> mention that in the ath11k warning above :)
> >> >
> >> > CONFIG_IRQ_REMAP did not do the trick. I noticed that the Wi-Fi card
> >> > is behind a PCI bridge which is also disabled, could this be a
> >> > problem?
> >> >
> >> > 00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20) (prog-if 00
> >> > [Normal decode])
> >> > 	Flags: bus master, fast devsel, latency 0, IRQ 123
> >> > 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> >> > 	I/O behind bridge: [disabled]
> >> > 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
> >> > 	Prefetchable memory behind bridge: [disabled]
> >> > 	Capabilities: [40] Express Root Port (Slot+), MSI 00
> >> > 	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >> > 	Capabilities: [90] Subsystem: Dell Device 0991
> >> > 	Capabilities: [a0] Power Management version 3
> >> > 	Capabilities: [100] Advanced Error Reporting
> >> > 	Capabilities: [220] Access Control Services
> >> > 	Capabilities: [150] Precision Time Measurement
> >> > 	Capabilities: [200] L1 PM Substates
> >> > 	Capabilities: [a00] Downstream Port Containment
> >> > 	Kernel driver in use: pcieport
> >> 
> >> I don't know enough about PCI to say if the bridge is a problem or not.
> >
> > I don't think the bridge is an issue here.  AFAICT the bridge's I/O
> > and prefetchable memory windows are disabled, but the non-prefetchable
> > window *is* enabled and contains the space consumed by the ath11k
> > device:
> >
> >   00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
> > 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
> > 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
> >   56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
> >      Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]
> 
> Good to know that the bridge shouldn't be the problem. Do you have any
> ideas how to make more vectors available to ath11k, besides
> CONFIG_IRQ_REMAP? Because QCA6390 works in Windows I doubt this is a
> hardware problem.
> 
> >> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
> >> PCI device where he is not having enough MSI vectors. ath11k needs 32
> >> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
> >> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
> >> drivers/net/wireless/ath/ath11k/pci.c. [2]

> > But it seems a little greedy if the device can't operate at all unless
> > it gets 32 vectors.  Are you sure that's a hard requirement?  Most
> > devices can work with fewer vectors, even if it reduces performance.
> 
> This was my first reaction as well when I saw the code for the first
> time. And the reply I got is that the firmware needs all 32 vectors, it
> won't work with less.

I do see a couple other drivers that are completely inflexible (they
request min==max).  But I don't know the system constraint you're
hitting.  CC'd Thomas & Christoph in case they have time to give us a
hint.

> >> I would first try with a full distro kernel config, just in case there's
> >> some another important kernel config missing.
> >> 
> >> [1] http://lists.infradead.org/pipermail/ath11k/2020-October/000466.html
> >
> > Tangent: have you considered getting this list archived on
> > https://lore.kernel.org/lists.html?
> 
> Good point, actually I have not. I'll add both ath10k and ath11k lists
> to lore. It's even more important now that lists.infradead.org had a
> hard drive crash and lost years of archives.

Or you could just add linux-wireless, e.g.,

  L:      ath11k@lists.infradead.org
  L:      linux-wireless@vger.kernel.org

or even consider moving from ath10k and ath11k to
linux-wireless@vger.kernel.org.  I think there's some value in
consolidating low-volume lists.  It looks like ath11k had < 90
messages for all of October.


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-03 16:08           ` Bjorn Helgaas
@ 2020-11-03 21:08             ` Thomas Gleixner
  2020-11-03 22:42               ` Thomas Gleixner
                                 ` (2 more replies)
  2020-11-09 18:48             ` Kalle Valo
  1 sibling, 3 replies; 40+ messages in thread
From: Thomas Gleixner @ 2020-11-03 21:08 UTC (permalink / raw)
  To: Bjorn Helgaas, Kalle Valo
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Thomas Krause, ath11k, Christoph Hellwig

On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
> On Tue, Nov 03, 2020 at 08:49:06AM +0200, Kalle Valo wrote:
>> Bjorn Helgaas <helgaas@kernel.org> writes:
>> > On Mon, Nov 02, 2020 at 08:49:51PM +0200, Kalle Valo wrote:
>> >> Thomas Krause <thomaskrause@posteo.de> writes:
>> >> 
>> >> >> I had the same problem as well back in the days, for me enabling
>> >> >> CONFIG_IRQ_REMAP helped. If it helps for you also I wonder if we should
>> >> >> mention that in the ath11k warning above :)

Interrupt remapping only helps when the device supports only MSI (not
MSI-X) because x86 (kernel) does not support multiple MSI interrupts
without remapping.

So if only MSI is available then you get exactly _one_ MSI vector
without remapping.

>> >> > CONFIG_IRQ_REMAP did not do the trick.

The config alone does not help. The hardware has to support it and the
BIOS has to enable it.

Check the BIOS for a switch which is named 'VT-d' or such. It might
depend on 'Intel Virtualization Technology' or such.

>> >   00:1c.0 PCI bridge: Intel Corporation Device a0b8 (rev 20)
>> > 	Bus: primary=00, secondary=56, subordinate=56, sec-latency=0
>> > 	Memory behind bridge: 8c300000-8c3fffff [size=1M]
>> >   56:00.0 Network controller: Qualcomm Device 1101 (rev 01)
>> >      Region 0: Memory at 8c300000 (64-bit, non-prefetchable) [size=1M]

So I grabbed the PCI info from the link and it has:

     Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit-

So no MSI-X, ergo only one MSI interrupt without remapping.
 
>> >> To summarise: Thomas is reporting[1] a problem with ath11k on QCA6390
>> >> PCI device where he is not having enough MSI vectors. ath11k needs 32
>> >> vectors but pci_alloc_irq_vectors() returns -ENOSPC. PCI support is new
>> >> for ath11k and introduced in v5.10-rc1. The irq allocation code is in
>> >> drivers/net/wireless/ath/ath11k/pci.c. [2]
>
>> > But it seems a little greedy if the device can't operate at all unless
>> > it gets 32 vectors.  Are you sure that's a hard requirement?  Most
>> > devices can work with fewer vectors, even if it reduces performance.

Right, even most high end network cards work with one interrupt.

>> This was my first reaction as well when I saw the code for the first
>> time. And the reply I got is that the firmware needs all 32 vectors, it
>> won't work with less.

Great design.

> I do see a couple other drivers that are completely inflexible (they
> request min==max).  But I don't know the system constraint you're
> hitting.  CC'd Thomas & Christoph in case they have time to give us a
> hint.

Can I have a full dmesg please?

Please enable CONFIG_IRQ_REMAP and CONFIG_INTEL_IOMMU (not strictly
required, but it's a Dell BIOS after all). Also set
CONFIG_INTEL_IOMMU_DEFAULT_ON.

Or simply try a distro kernel.

Thanks,

        tglx


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-03 21:08             ` Thomas Gleixner
@ 2020-11-03 22:42               ` Thomas Gleixner
  2020-11-09 18:44                 ` Kalle Valo
       [not found]               ` <fa26ac8b-ed48-7ea3-c21b-b133532716b8@posteo.de>
  2020-11-06 11:45               ` Devin Bayer
  2 siblings, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2020-11-03 22:42 UTC (permalink / raw)
  To: Bjorn Helgaas, Kalle Valo
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Thomas Krause, ath11k, Christoph Hellwig

On Tue, Nov 03 2020 at 22:08, Thomas Gleixner wrote:
> On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
>>> > But it seems a little greedy if the device can't operate at all unless
>>> > it gets 32 vectors.  Are you sure that's a hard requirement?  Most
>>> > devices can work with fewer vectors, even if it reduces performance.
>
> Right, even most high end network cards work with one interrupt.
>
>>> This was my first reaction as well when I saw the code for the first
>>> time. And the reply I got is that the firmware needs all 32 vectors, it
>>> won't work with less.
>
> Great design.

Just to add more information on this:

Enforcing 32 vectors with MSI is beyond silly. Due to the limitations of
MSI all of these vectors will be affine to a single CPU unless irq
remapping is available and enabled.

So if irq remapping is not enabled, then what are the 32 vectors buying?
Exactly nothing because they just compete to be handled on the very same
CPU. If the design requires more than one vector, then this should be
done with MSI-X (which allows individual affinities and individual
masking).

That has been known for 20 years, and MSI-X exists for exactly that
reason. But hardware people still insist on implementing MSI (probably
because it saves 0.002$ per chip).

But there is also the firmware side. Enforcing the availability of 32
vectors on MSI is silly to begin with as explained above, but it's also
silly given the constraints of the x86 vector space. It takes just 6
devices having the same 32 vector requirement to exhaust it. Oh well...

Thanks,

        tglx


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
       [not found]               ` <fa26ac8b-ed48-7ea3-c21b-b133532716b8@posteo.de>
@ 2020-11-04 15:26                 ` Thomas Gleixner
  2020-11-05 13:23                   ` Kalle Valo
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2020-11-04 15:26 UTC (permalink / raw)
  To: Thomas Krause, Bjorn Helgaas, Kalle Valo
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer, ath11k,
	Christoph Hellwig, David Woodhouse

On Wed, Nov 04 2020 at 14:04, Thomas Krause wrote:
> config) but CONFIG_INTEL_IOMMU_DEFAULT_ON needed to be set manually. I 
> hope this helps, if there is more I can do to debug it on my side I'm 
> happy to do so.

> [    0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
>                BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:

> [    0.103693] DMAR: Host address width 39
> [    0.103693] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
> [    0.103697] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 69e2ff0505e
> [    0.103698] DMAR: DRHD base: 0x000000fed84000 flags: 0x0
> [    0.103701] DMAR: dmar1: reg_base_addr fed84000 ver 1:0 cap d2008c40660462 ecap f050da
> [    0.103702] DMAR: DRHD base: 0x000000fed86000 flags: 0x0
> [    0.103706] DMAR: dmar2: reg_base_addr fed86000 ver 1:0 cap d2008c40660462 ecap f050da
> [    0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
> [    0.103707] DMAR: Parse DMAR table failure.

which disables interrupt remapping and therefore the driver gets only
one MSI which makes it unhappy.

Not that I'm surprised, it's Dell.... Can you check whether they have a
BIOS update for that box?

Thanks,

        tglx


* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-04 15:26                 ` Thomas Gleixner
@ 2020-11-05 13:23                   ` Kalle Valo
  2020-11-10  8:33                     ` Kalle Valo
  0 siblings, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-05 13:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Thomas Krause, Bjorn Helgaas, Govind Singh, linux-pci,
	linux-wireless, Devin Bayer, ath11k, Christoph Hellwig,
	David Woodhouse

Thomas Gleixner <tglx@linutronix.de> writes:

> On Wed, Nov 04 2020 at 14:04, Thomas Krause wrote:
>> config) but CONFIG_INTEL_IOMMU_DEFAULT_ON needed to be set manually. I 
>> hope this helps, if there is more I can do to debug it on my side I'm 
>> happy to do so.
>
>> [    0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR reported at address 0!
>>                BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:
>
>> [    0.103693] DMAR: Host address width 39
>> [    0.103693] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
>> [    0.103697] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 69e2ff0505e
>> [    0.103698] DMAR: DRHD base: 0x000000fed84000 flags: 0x0
>> [    0.103701] DMAR: dmar1: reg_base_addr fed84000 ver 1:0 cap d2008c40660462 ecap f050da
>> [    0.103702] DMAR: DRHD base: 0x000000fed86000 flags: 0x0
>> [    0.103706] DMAR: dmar2: reg_base_addr fed86000 ver 1:0 cap d2008c40660462 ecap f050da
>> [    0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
>> [    0.103707] DMAR: Parse DMAR table failure.
>
> which disables interrupt remapping and therefore the driver gets only
> one MSI which makes it unhappy.
>
> Not that I'm surprised, it's Dell.... Can you check whether they have a
> BIOS update for that box?

I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
separate "Virtualisation" setting in BIOS. See if you have that and try
enabling it.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-03 21:08             ` Thomas Gleixner
  2020-11-03 22:42               ` Thomas Gleixner
       [not found]               ` <fa26ac8b-ed48-7ea3-c21b-b133532716b8@posteo.de>
@ 2020-11-06 11:45               ` Devin Bayer
  2 siblings, 0 replies; 40+ messages in thread
From: Devin Bayer @ 2020-11-06 11:45 UTC (permalink / raw)
  To: Thomas Gleixner, Bjorn Helgaas, Kalle Valo
  Cc: Govind Singh, linux-pci, linux-wireless, Christoph Hellwig,
	Thomas Krause, ath11k

On 03/11/2020 22.08, Thomas Gleixner wrote:
> On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
> 
> Check the BIOS for a switch which is named 'VT-d' or such. It might
> depend on 'Intel Virtualization Technology' or such.
> 

Thanks for this info. The platform I have, J1900, indeed does not support VT-d.

So I guess I'm not able to use this card. That's unfortunate.

It doesn't seem like the Windows driver works either. It doesn't give any errors
but it fails to find any wireless networks.

~ dev

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-03 22:42               ` Thomas Gleixner
@ 2020-11-09 18:44                 ` Kalle Valo
  0 siblings, 0 replies; 40+ messages in thread
From: Kalle Valo @ 2020-11-09 18:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Govind Singh, linux-pci, linux-wireless,
	Devin Bayer, Christoph Hellwig, Thomas Krause, ath11k,
	Stefani Seibold

Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, Nov 03 2020 at 22:08, Thomas Gleixner wrote:
>> On Tue, Nov 03 2020 at 10:08, Bjorn Helgaas wrote:
>>>> > But it seems a little greedy if the device can't operate at all unless
>>>> > it gets 32 vectors.  Are you sure that's a hard requirement?  Most
>>>> > devices can work with fewer vectors, even if it reduces performance.
>>
>> Right, even most high end network cards work with one interrupt.
>>
>>>> This was my first reaction as well when I saw the code for the first
>>>> time. And the reply I got is that the firmware needs all 32 vectors, it
>>>> won't work with less.
>>
>> Great design.
>
> Just to put more information to this:
>
> Enforcing 32 vectors with MSI is beyond silly. Due to the limitations of
> MSI all of these vectors will be affine to a single CPU unless irq
> remapping is available and enabled.
>
> So if irq remapping is not enabled, then what are the 32 vectors buying?
> Exactly nothing because they just compete to be handled on the very same
> CPU. If the design requires more than one vector, then this should be
> done with MSI-X (which allows individual affinities and individual
> masking).
>
> That's known for 20 years and MSI-X exists for exactly that reason. But
> hardware people still insist on implementing MSI (probably because it
> saves 0.002$ per chip).
>
> But there is also the firmware side. Enforcing the availability of 32
> vectors on MSI is silly to begin with as explained above, but it's also
> silly given the constraints of the x86 vector space. It takes just 6
> devices having the same 32 vector requirement to exhaust it. Oh well...

Thanks Thomas, this is great info. I'm pushing this internally and we
will try to get ath11k working with just one MSI vector.
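
The "6 devices" figure quoted above can be reproduced with simple arithmetic. The Python sketch below is only a rough budget: x86 has 256 interrupt vectors per CPU and the first 32 are reserved for exceptions, but the number of vectors the kernel keeps for itself (`ASSUMED_SYSTEM_VECTORS`) is an illustrative assumption that varies with kernel version and configuration, and the function name is hypothetical. Any alignment constraints on multi-vector MSI blocks are ignored.

```python
IDT_ENTRIES = 256        # x86 interrupt vectors per CPU
EXCEPTION_VECTORS = 32   # vectors 0-31 are reserved for CPU exceptions
# Assumption for illustration only: the real number of vectors reserved for
# IPIs, the local timer and so on depends on kernel version and config.
ASSUMED_SYSTEM_VECTORS = 20


def devices_that_fit(vectors_per_device, system_vectors=ASSUMED_SYSTEM_VECTORS):
    """Return (devices, leftover_vectors) for one CPU's vector space."""
    usable = IDT_ENTRIES - EXCEPTION_VECTORS - system_vectors
    return usable // vectors_per_device, usable % vectors_per_device


devices, leftover = devices_that_fit(32)
print(f"{devices} devices with 32 vectors each, {leftover} vectors left over")
```

With any number of system vectors between 1 and 32, the result is six devices, so the conclusion does not depend on the exact value assumed here.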

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-03 16:08           ` Bjorn Helgaas
  2020-11-03 21:08             ` Thomas Gleixner
@ 2020-11-09 18:48             ` Kalle Valo
  1 sibling, 0 replies; 40+ messages in thread
From: Kalle Valo @ 2020-11-09 18:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Christoph Hellwig, Thomas Krause, Thomas Gleixner, ath11k

Bjorn Helgaas <helgaas@kernel.org> writes:

>> > Tangent: have you considered getting this list archived on
>> > https://lore.kernel.org/lists.html?
>> 
>> Good point, actually I have not. I'll add both ath10k and ath11k lists
>> to lore. It's even more important now that lists.infradead.org had a
>> hard drive crash and lost years of archives.
>
> Or you could just add linux-wireless, e.g.,
>
>   L:      ath11k@lists.infradead.org
>   L:      linux-wireless@vger.kernel.org
>
> or even consider moving from ath10k and ath11k to
> linux-wireless@vger.kernel.org.  I think there's some value in
> consolidating low-volume lists.  It looks like ath11k had < 90
> messages for all of October.

The background here is that linux-wireless is quite a high-volume list and
not everyone has time to follow it, so having specific ath10k and
ath11k lists makes it easier for those people. So I'm hesitant to
shut down the driver lists for that reason.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-05 13:23                   ` Kalle Valo
@ 2020-11-10  8:33                     ` Kalle Valo
  2020-11-11  8:53                       ` Thomas Krause
  0 siblings, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-10  8:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Christoph Hellwig, Thomas Krause, Bjorn Helgaas, ath11k,
	David Woodhouse, Stefani Seibold

Kalle Valo <kvalo@codeaurora.org> writes:

> Thomas Gleixner <tglx@linutronix.de> writes:
>
>> On Wed, Nov 04 2020 at 14:04, Thomas Krause wrote:
>>> config) but CONFIG_INTEL_IOMMU_DEFAULT_ON needed to be set manually. I 
>>> hope this helps, if there is more I can do to debug it on my side I'm 
>>> happy to do so.
>>
>>> [ 0.050130] DMAR: [Firmware Bug]: Your BIOS is broken; DMAR
>>> reported at address 0!
>>>                BIOS vendor: Dell Inc.; Ver: 1.1.1; Product Version:
>>
>>> [    0.103693] DMAR: Host address width 39
>>> [    0.103693] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
>>> [ 0.103697] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap
>>> 1c0000c40660462 ecap 69e2ff0505e
>>> [    0.103698] DMAR: DRHD base: 0x000000fed84000 flags: 0x0
>>> [ 0.103701] DMAR: dmar1: reg_base_addr fed84000 ver 1:0 cap
>>> d2008c40660462 ecap f050da
>>> [    0.103702] DMAR: DRHD base: 0x000000fed86000 flags: 0x0
>>> [ 0.103706] DMAR: dmar2: reg_base_addr fed86000 ver 1:0 cap
>>> d2008c40660462 ecap f050da
>>> [    0.103707] DMAR: DRHD base: 0x00000000000000 flags: 0x1
>>> [    0.103707] DMAR: Parse DMAR table failure.
>>
>> which disables interrupt remapping and therefore the driver gets only
>> one MSI which makes it unhappy.
>>
>> Not that I'm surprised, it's Dell.... Can you check whether they have a
>> BIOS update for that box?
>
> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
> separate "Virtualisation" setting in BIOS. See if you have that and try
> enabling it.

I was informed about another setting to test: try disabling "Enable
Secure Boot" in the BIOS. I don't know yet why it would help, but that's
what a few people have recommended.

Please let me know how it goes.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-10  8:33                     ` Kalle Valo
@ 2020-11-11  8:53                       ` Thomas Krause
  2020-11-11  9:22                         ` Kalle Valo
  2020-11-11  9:39                         ` Thomas Gleixner
  0 siblings, 2 replies; 40+ messages in thread
From: Thomas Krause @ 2020-11-11  8:53 UTC (permalink / raw)
  To: Kalle Valo, Thomas Gleixner
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Christoph Hellwig, Bjorn Helgaas, ath11k, David Woodhouse,
	Stefani Seibold


Am 10.11.20 um 09:33 schrieb Kalle Valo:
>
>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>> separate "Virtualisation" setting in BIOS. See if you have that and try
>> enabling it.
> I was informed about another setting to test: try disabling "Enable
> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
> what few people have recommended.
>
> Please let me know how it goes.
>
I have two options under "Virtualization" in the BIOS: "Enable Intel 
Virtualization Technology (VT)" and "VT for Direct I/O". Both were 
enabled. Secure boot was also turned off. BIOS version is also at the 
most current version 1.1.1. Because of the dmesg errors Thomas Gleixner 
mentioned, I assume it would be best to contact Dell directly (even if 
I'm not sure if and how fast they will respond). If the driver would 
manage to work with only 1 vector, I assume this would also make it work 
on my configuration, even with possible performance hits.

Best,

Thomas



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11  8:53                       ` Thomas Krause
@ 2020-11-11  9:22                         ` Kalle Valo
  2020-11-11 19:10                           ` Kalle Valo
  2020-11-11  9:39                         ` Thomas Gleixner
  1 sibling, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-11  9:22 UTC (permalink / raw)
  To: Thomas Krause
  Cc: Thomas Gleixner, Govind Singh, linux-pci, Stefani Seibold,
	linux-wireless, Devin Bayer, ath11k, Bjorn Helgaas,
	Christoph Hellwig, David Woodhouse

Thomas Krause <thomaskrause@posteo.de> writes:

> Am 10.11.20 um 09:33 schrieb Kalle Valo:
>>
>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>>> separate "Virtualisation" setting in BIOS. See if you have that and try
>>> enabling it.
>> I was informed about another setting to test: try disabling "Enable
>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
>> what few people have recommended.
>>
>> Please let me know how it goes.
>>
> I have two options under "Virtualization" in the BIOS: "Enable Intel
> Virtualization Technology (VT)" and "VT for Direct I/O". Both were
> enabled. Secure boot was also turned off. BIOS version is also at the
> most current version 1.1.1.

This is good to know, thanks for testing. Now we have explored all the
BIOS options that I know of.

> Because of the dmesg errors Thomas Gleixner mentioned, I assume it
> would be best to contact Dell directly (even if I'm not sure if and
> how fast they will respond).

I have asked our people to report this to Dell, but no response yet.

> If the driver would manage to work with only 1 vector, I assume this
> would also make it work on my configuration, even with possible
> performance hits.

This is the workaround we are working on at the moment. There's now a
proof of concept patch but I'm not certain if it will work. I'll post it
as soon as I can and will provide the link in this thread.
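
To illustrate the general idea of falling back to a single vector, here is a toy Python model. It is not the proof of concept patch and not kernel code: `model_alloc_irq_vectors` is a hypothetical stand-in that only mimics the documented contract of `pci_alloc_irq_vectors()` (return the number of vectors granted, or -ENOSPC when fewer than `min_vecs` are available), and `available` is an assumed input describing what the platform can offer.

```python
ENOSPC = 28  # Linux errno value for "No space left on device"


def model_alloc_irq_vectors(min_vecs, max_vecs, available):
    """Toy model: grant up to max_vecs, fail if min_vecs cannot be met."""
    if available < min_vecs:
        return -ENOSPC
    return min(max_vecs, available)


def alloc_with_fallback(wanted, available):
    """Ask for `wanted` vectors first, then retry with exactly one."""
    granted = model_alloc_irq_vectors(wanted, wanted, available)
    if granted < 0:
        # Hypothetical fallback path: share one vector between all sources.
        granted = model_alloc_irq_vectors(1, 1, available)
    return granted


# Without interrupt remapping only one MSI vector is available here.
print(model_alloc_irq_vectors(32, 32, available=1))
print(alloc_with_fallback(32, available=1))
```

The first call models the failure in the subject line; the second shows why a driver that can run on one vector would still load on this machine.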

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11  8:53                       ` Thomas Krause
  2020-11-11  9:22                         ` Kalle Valo
@ 2020-11-11  9:39                         ` Thomas Gleixner
  1 sibling, 0 replies; 40+ messages in thread
From: Thomas Gleixner @ 2020-11-11  9:39 UTC (permalink / raw)
  To: Thomas Krause, Kalle Valo
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Christoph Hellwig, Bjorn Helgaas, ath11k, David Woodhouse,
	Stefani Seibold

On Wed, Nov 11 2020 at 09:53, Thomas Krause wrote:
> Am 10.11.20 um 09:33 schrieb Kalle Valo:
>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>>> separate "Virtualisation" setting in BIOS. See if you have that and try
>>> enabling it.
>> I was informed about another setting to test: try disabling "Enable
>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
>> what few people have recommended.
>>
>> Please let me know how it goes.
>>
> I have two options under "Virtualization" in the BIOS: "Enable Intel 
> Virtualization Technology (VT)" and "VT for Direct I/O". Both were

VT for Direct I/O enables the IOMMU and the interrupt remapping unit,
but the kernel can't use it because the ACPI tables are busted.

> enabled. Secure boot was also turned off. BIOS version is also at the 
> most current version 1.1.1. Because of the dmesg errors Thomas Gleixner 
> mentioned, I assume it would be best to contact Dell directly (even if 
> I'm not sure if and how fast they will respond). If the driver would

Good luck.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11  9:22                         ` Kalle Valo
@ 2020-11-11 19:10                           ` Kalle Valo
  2020-11-11 19:24                             ` wi nk
                                               ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Kalle Valo @ 2020-11-11 19:10 UTC (permalink / raw)
  To: Thomas Krause
  Cc: Govind Singh, linux-pci, Stefani Seibold, linux-wireless,
	Devin Bayer, Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner,
	ath11k, David Woodhouse, wink

Kalle Valo <kvalo@codeaurora.org> writes:

> Thomas Krause <thomaskrause@posteo.de> writes:
>
>> Am 10.11.20 um 09:33 schrieb Kalle Valo:
>>>
>>>> I was told that on Dell XPS 15 (with a working QCA6390 setup) there's a
>>>> separate "Virtualisation" setting in BIOS. See if you have that and try
>>>> enabling it.
>>> I was informed about another setting to test: try disabling "Enable
>>> Secure Boot" in the BIOS. I don't know yet why it would help, but that's
>>> what few people have recommended.
>>>
>>> Please let me know how it goes.
>>>
>> I have two options under "Virtualization" in the BIOS: "Enable Intel
>> Virtualization Technology (VT)" and "VT for Direct I/O". Both were
>> enabled. Secure boot was also turned off. BIOS version is also at the
>> most current version 1.1.1.
>
> This is good to know, thanks for testing. Now we have explored all
> possible BIOS options as I know of.
>
>> Because of the dmesg errors Thomas Gleixner mentioned, I assume it
>> would be best to contact Dell directly (even if I'm not sure if and
>> how fast they will respond).
>
> I have asked our people to report this to Dell, but no response yet.
>
>> If the driver would manage to work with only 1 vector, I assume this
>> would also make it work on my configuration, even with possible
>> performance hits.
>
> This is the workaround we are working on at the moment. There's now a
> proof of concept patch but I'm not certain if it will work. I'll post it
> as soon as I can and will provide the link in this thread.

The proof of concept patch for v5.10-rc2 is here:

https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/

Hopefully it makes it possible to boot the firmware now. But this is a
quick hack and most likely buggy, so keep your expectations low :)

In case there are these warnings during firmware initialisation:

ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110

Try reverting this commit:

7fef431be9c9 mm/page_alloc: place pages to tail in __free_pages_core()

That's another issue which is debugged here:

http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11 19:10                           ` Kalle Valo
@ 2020-11-11 19:24                             ` wi nk
  2020-11-11 19:30                               ` wi nk
  2020-11-11 21:35                             ` Stefani Seibold
  2020-11-11 22:02                             ` Stefani Seibold
  2 siblings, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-11 19:24 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Thomas Krause, Govind Singh, linux-pci, Stefani Seibold,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

Kalle,

  Thanks so much for your and your team's efforts.  I've applied the
patch, and I'm receiving some errors similar to what you thought might
occur:

[    7.802756] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is experimental!
[    7.802797] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem 0x8e300000-0x8e3fffff 64bit]
[    7.802815] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[    7.803291] ath11k_pci 0000:55:00.0: MSI vectors: 1
[    8.172623] ath11k_pci 0000:55:00.0: Respond mem req failed, result: 1, err: 48
[    8.172624] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22

I've reverted the commit you mentioned and am rebuilding now.  I'll
test in a few minutes.

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11 19:24                             ` wi nk
@ 2020-11-11 19:30                               ` wi nk
  2020-11-11 19:45                                 ` Kalle Valo
  0 siblings, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-11 19:30 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Thomas Krause, Govind Singh, linux-pci, Stefani Seibold,
	linux-wireless, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, Devin Bayer

OK, with 7fef431be9c9 reverted, the initialization doesn't seem to change at all:

[    7.961867] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is experimental!
[    7.961913] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem 0x8e300000-0x8e3fffff 64bit]
[    7.961930] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[    7.962009] ath11k_pci 0000:55:00.0: MSI vectors: 1
[    8.461553] ath11k_pci 0000:55:00.0: Respond mem req failed, result: 1, err: 48
[    8.461556] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22

and just for thoroughness, here are my firmware file checksums (sha256):

9cc48d1dce819ead4112c6a8051c51e4d75e2b11f99ba9d8738cf8108967b70e  amss.bin
5081930c3b207f8ed82ff250f9b90fb77e87b2a92c3cf80ad020a58dea0bc5b7  board.bin
596482f780d21645f72a48acd9aed6c6fc8cf2d039ac31552a19800674d253cc  m3.bin


Thanks!



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11 19:30                               ` wi nk
@ 2020-11-11 19:45                                 ` Kalle Valo
  2020-11-11 20:12                                   ` wi nk
  0 siblings, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-11 19:45 UTC (permalink / raw)
  To: wi nk
  Cc: Govind Singh, linux-pci, Stefani Seibold, linux-wireless,
	Devin Bayer, ath11k, Thomas Krause, Bjorn Helgaas,
	Thomas Gleixner, Christoph Hellwig

(please don't top-post, it makes emails harder to read)

wi nk <wink@technolu.st> writes:

> Ok with 7fef431be9c9 reverted, it doesn't seem to change the initialization any:
>
> [    7.961867] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
> experimental!
> [    7.961913] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
> 0x8e300000-0x8e3fffff 64bit]
> [    7.961930] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
> [    7.962009] ath11k_pci 0000:55:00.0: MSI vectors: 1
> [    8.461553] ath11k_pci 0000:55:00.0: Respond mem req failed,
> result: 1, err: 48
> [    8.461556] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22

I also see this -22 error (see my logs in [1]), even when the firmware
reboots normally. Do you see anything after these messages?

The problem that reverting 7fef431be9c9 helps with shows these errors:

ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110

[1] http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html

> and just for thoroughness, here are my firmware file checksums (sha256):
>
> 9cc48d1dce819ead4112c6a8051c51e4d75e2b11f99ba9d8738cf8108967b70e  amss.bin
> 5081930c3b207f8ed82ff250f9b90fb77e87b2a92c3cf80ad020a58dea0bc5b7  board.bin
> 596482f780d21645f72a48acd9aed6c6fc8cf2d039ac31552a19800674d253cc  m3.bin

But these do not look the same. I have:

a101dc90f8e876f39383b60c9da64ec4  /lib/firmware/ath11k/QCA6390/hw2.0/amss.bin
4c0781f659d2b7d6bef10a2e3d457728  /lib/firmware/ath11k/QCA6390/hw2.0/board-2.bin
d4c912a3501a3694a3f460d13de06d28  /lib/firmware/ath11k/QCA6390/hw2.0/m3.bin

Download them like this:

wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/amss.bin

wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/m3.bin

wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/board-2.bin
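
Note that the two checksum lists above cannot be compared directly: the first has 64 hex digits per entry (sha256, as stated), while the second has 32, which is the length of an md5 digest, and the board file names also differ (board.bin versus board-2.bin). The Python sketch below computes both digests for a file so that like can be compared with like; the function name `file_digests` is hypothetical and the paths passed to it are up to the reader.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return the md5 and sha256 hex digests of the file at `path`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large firmware images need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

For example, calling `file_digests("/lib/firmware/ath11k/QCA6390/hw2.0/amss.bin")` on each machine and comparing the values for the same algorithm would show whether the files really differ.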

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11 19:45                                 ` Kalle Valo
@ 2020-11-11 20:12                                   ` wi nk
  0 siblings, 0 replies; 40+ messages in thread
From: wi nk @ 2020-11-11 20:12 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Govind Singh, linux-pci, Stefani Seibold, linux-wireless,
	Devin Bayer, ath11k, Thomas Krause, Bjorn Helgaas,
	Thomas Gleixner, Christoph Hellwig

On Wed, Nov 11, 2020 at 8:45 PM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> (please don't top post, makes it harder to read emails)
>
> wi nk <wink@technolu.st> writes:
>
> > Ok with 7fef431be9c9 reverted, it doesn't seem to change the initialization any:
> >
> > [    7.961867] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
> > experimental!
> > [    7.961913] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
> > 0x8e300000-0x8e3fffff 64bit]
> > [    7.961930] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
> > [    7.962009] ath11k_pci 0000:55:00.0: MSI vectors: 1
> > [    8.461553] ath11k_pci 0000:55:00.0: Respond mem req failed,
> > result: 1, err: 48
> > [    8.461556] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22
>
> I also see this -22 error (see my logs in [1]), even when the firmware
> reboots normally. Do you see anything after these messages?
>
> The problem which reverting 7fef431be9c9 helps has these errors:
>
> ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110
>
> [1] http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
>
> > and just for thoroughness, here are my firmware file checksums (sha256):
> >
> > 9cc48d1dce819ead4112c6a8051c51e4d75e2b11f99ba9d8738cf8108967b70e  amss.bin
> > 5081930c3b207f8ed82ff250f9b90fb77e87b2a92c3cf80ad020a58dea0bc5b7  board.bin
> > 596482f780d21645f72a48acd9aed6c6fc8cf2d039ac31552a19800674d253cc  m3.bin
>
> But these do not look same. I have:
>
> a101dc90f8e876f39383b60c9da64ec4  /lib/firmware/ath11k/QCA6390/hw2.0/amss.bin
> 4c0781f659d2b7d6bef10a2e3d457728  /lib/firmware/ath11k/QCA6390/hw2.0/board-2.bin
> d4c912a3501a3694a3f460d13de06d28  /lib/firmware/ath11k/QCA6390/hw2.0/m3.bin
>
> Download them like this:
>
> wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/amss.bin
>
> wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/1.0.1/WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1/m3.bin
>
> wget https://github.com/kvalo/ath11k-firmware/raw/master/QCA6390/hw2.0/board-2.bin
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Sorry for the top posting, web email has ruined my mailing list
etiquette.  It seems having the correct firmware in place has caused
some forward movement.  I now see this:

[    8.513210] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is experimental!
[    8.513251] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem 0x8e300000-0x8e3fffff 64bit]
[    8.513269] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[    8.513348] ath11k_pci 0000:55:00.0: MSI vectors: 1
[    8.789499] ath11k_pci 0000:55:00.0: Respond mem req failed, result: 1, err: 0
[    8.789500] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22
[    8.794236] ath11k_pci 0000:55:00.0: req mem_seg[0] 0x28100000 524288 1
[    8.794237] ath11k_pci 0000:55:00.0: req mem_seg[1] 0x28180000 524288 1
[    8.794238] ath11k_pci 0000:55:00.0: req mem_seg[2] 0x28200000 524288 1
[    8.794238] ath11k_pci 0000:55:00.0: req mem_seg[3] 0x28280000 294912 1
[    8.794239] ath11k_pci 0000:55:00.0: req mem_seg[4] 0x28300000 524288 1
[    8.794239] ath11k_pci 0000:55:00.0: req mem_seg[5] 0x28380000 524288 1
[    8.794240] ath11k_pci 0000:55:00.0: req mem_seg[6] 0x27c00000 458752 1
[    8.794240] ath11k_pci 0000:55:00.0: req mem_seg[7] 0x27c80000 131072 1
[    8.794240] ath11k_pci 0000:55:00.0: req mem_seg[8] 0x27d00000 524288 4
[    8.794241] ath11k_pci 0000:55:00.0: req mem_seg[9] 0x27d80000 360448 4
[    8.794241] ath11k_pci 0000:55:00.0: req mem_seg[10] 0x28578000 16384 1
[    8.807053] ath11k_pci 0000:55:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
[    8.807054] ath11k_pci 0000:55:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
[    8.910984] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
[    9.446566] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0
[   11.296620] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
[   22.088028] ath11k_pci 0000:55:00.0: wmi command 12290 timeout
[   22.088030] ath11k_pci 0000:55:00.0: failed to send WMI_STOP_SCAN_CMDID
[   22.088031] ath11k_pci 0000:55:00.0: failed to stop wmi scan: -11
[   22.088032] ath11k_pci 0000:55:00.0: failed to stop scan: -11
[   22.088033] ath11k_pci 0000:55:00.0: failed to start hw scan: -110
[   28.232066] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[   28.232069] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[   28.232073] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[   38.216054] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[   38.216057] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[   38.216061] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[   51.783961] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[   51.783965] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[   51.783970] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[   71.695627] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[   71.695629] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[   71.695630] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[  100.864905] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[  100.864909] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[  100.864913] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[  107.306896] mhi 0000:55:00.0: Device failed to exit MHI Reset state
[  143.868561] ath11k_pci 0000:55:00.0: wmi command 12289 timeout
[  143.868564] ath11k_pci 0000:55:00.0: failed to send WMI_START_SCAN_CMDID
[  143.868566] ath11k_pci 0000:55:00.0: failed to start hw scan: -11
[  199.464250] mhi 0000:55:00.0: Device failed to exit MHI Reset state
<snip>

Occasionally my kernel panics at random spots (this is probably
related to the other patch, I guess), but I do have something of an
adapter now; ifconfig shows it.  I don't seem to be able to find any
networks with it, however.
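
For reading the log above: the negative numbers at the end of several lines are kernel error codes. On Linux, 11 is EAGAIN, 22 is EINVAL, 28 is ENOSPC (the error in the subject line) and 110 is ETIMEDOUT. A small Python sketch, not part of ath11k or any kernel tool, that pulls the trailing code off a log line; the table is hardcoded and the helper name is hypothetical:

```python
import re

# Linux errno values that show up in this thread, hardcoded so the
# sketch gives the same answer on any platform.
LINUX_ERRNO = {11: "EAGAIN", 22: "EINVAL", 28: "ENOSPC", 110: "ETIMEDOUT"}

# A negative integer at the very end of a line, e.g. "...: -11" or "req:-22".
_TRAILING_CODE = re.compile(r"(-\d+)\s*$")


def decode_error(line):
    """Return (code, name) for a trailing negative error code, or None."""
    match = _TRAILING_CODE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO.get(-code, "unknown")


if __name__ == "__main__":
    for sample in (
        "ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22",
        "ath11k_pci 0000:55:00.0: failed to start hw scan: -110",
        "ath11k_pci 0000:55:00.0: wmi command 12289 timeout",
    ):
        print(decode_error(sample))
```

The pattern is deliberately naive: a line that happens to end in something like a date ("2020-06-24") would be misread as an error code, so it only suits lines already known to be error messages.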

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11 19:10                           ` Kalle Valo
  2020-11-11 19:24                             ` wi nk
@ 2020-11-11 21:35                             ` Stefani Seibold
  2020-11-11 22:02                             ` Stefani Seibold
  2 siblings, 0 replies; 40+ messages in thread
From: Stefani Seibold @ 2020-11-11 21:35 UTC (permalink / raw)
  To: Kalle Valo, Thomas Krause
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner, ath11k,
	David Woodhouse, wink

On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> 
> 
> The proof of concept patch for v5.10-rc2 is here:
> 
> https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/
> 
> Hopefully it makes it possible to boot the firmware now. But this is
> a
> quick hack and most likely buggy, so keep your expectations low :)
> 
> In case there are these warnings during firmware initialisation:
> 
> ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> 
> Try reverting this commit:
> 
> 7fef431be9c9 mm/page_alloc: place pages to tail in
> __free_pages_core()
> 
> That's another issue which is debugged here:
> 
> http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> 

Success on Dell XPS 13 9310. Applying the patch and reverting commit
7fef431be9c9 worked for me.

Thanks!



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11 19:10                           ` Kalle Valo
  2020-11-11 19:24                             ` wi nk
  2020-11-11 21:35                             ` Stefani Seibold
@ 2020-11-11 22:02                             ` Stefani Seibold
  2020-11-12  0:24                               ` wi nk
  2 siblings, 1 reply; 40+ messages in thread
From: Stefani Seibold @ 2020-11-11 22:02 UTC (permalink / raw)
  To: Kalle Valo, Thomas Krause
  Cc: Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner, ath11k,
	David Woodhouse, wink

On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> 
> The proof of concept patch for v5.10-rc2 is here:
> 
> https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/
> 
> Hopefully it makes it possible to boot the firmware now. But this is
> a
> quick hack and most likely buggy, so keep your expectations low :)
> 
> In case there are these warnings during firmware initialisation:
> 
> ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> 
> Try reverting this commit:
> 
> 7fef431be9c9 mm/page_alloc: place pages to tail in
> __free_pages_core()
> 
> That's another issue which is debugged here:
> 
> http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> 

Applying the patch and reverting commit 7fef431be9c9 worked at first
glance.

After a couple of minutes the connection broke. The kernel log
shows the following error:

ath11k_pci 0000:55:00.0: wmi command 16387 timeout
ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11

It is also not possible to unload ath11k_pci; rmmod hangs.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-11 22:02                             ` Stefani Seibold
@ 2020-11-12  0:24                               ` wi nk
  2020-11-12  1:10                                 ` wi nk
  0 siblings, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-12  0:24 UTC (permalink / raw)
  To: Stefani Seibold
  Cc: Kalle Valo, Thomas Krause, Govind Singh, linux-pci,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Wed, Nov 11, 2020 at 11:02 PM Stefani Seibold <stefani@seibold.net> wrote:
>
> On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> >
> > The proof of concept patch for v5.10-rc2 is here:
> >
> > https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/
> >
> > Hopefully it makes it possible to boot the firmware now. But this is
> > a
> > quick hack and most likely buggy, so keep your expectations low :)
> >
> > In case there are these warnings during firmware initialisation:
> >
> > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> >
> > Try reverting this commit:
> >
> > 7fef431be9c9 mm/page_alloc: place pages to tail in
> > __free_pages_core()
> >
> > That's another issue which is debugged here:
> >
> > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> >
>
> Applying the patch and revert patch 7fef431be9c9 worked on the first
> glance.
>
> After a couple of minutes the connection get broken. The kernel log
> shows the following error:
>
> ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
>
> It is also not possible to unload the ath11k_pci, rmmod will hang.
>
>

I can confirm the same behavior as Stefani so far.  After applying the
patch, and reverting commit 7fef431be9c9, I am able to connect to a
network.  It hasn't disconnected yet (I'm sending this email via that
connection).  I'll report what I find next.

Thanks again for the help!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  0:24                               ` wi nk
@ 2020-11-12  1:10                                 ` wi nk
  2020-11-12  1:11                                   ` wi nk
  2020-11-12  7:05                                   ` Stefani Seibold
  0 siblings, 2 replies; 40+ messages in thread
From: wi nk @ 2020-11-12  1:10 UTC (permalink / raw)
  To: Stefani Seibold
  Cc: Kalle Valo, Thomas Krause, Govind Singh, linux-pci,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

I've yet to see any instability after 45 minutes of exercising it,
though I do see a couple of messages that came out of the driver:

[    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
[   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a

then when it associates:

[   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
[   16.722636] wlp85s0: authenticated
[   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
[   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=8)
[   16.738443] wlp85s0: associated
[   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready

The adapter is achieving around 500 Mbps on my gigabit connection; my
2018 MacBook Pro sees around 650, so it's doing pretty well so far.

Stefani - when you applied the patch that Kalle shared, which branch
did you apply it to?  I applied it to ath11k-qca6390-bringup and when
I revert 7fef431be9c9 there is a small merge conflict I needed to
resolve.  I wonder if either the starting branch or your chosen
resolution is related to the instability you see (or I'm just lucky
so far! :)).

On Thu, Nov 12, 2020 at 1:24 AM wi nk <wink@technolu.st> wrote:
>
> On Wed, Nov 11, 2020 at 11:02 PM Stefani Seibold <stefani@seibold.net> wrote:
> >
> > On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> > >
> > > The proof of concept patch for v5.10-rc2 is here:
> > >
> > > https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/
> > >
> > > Hopefully it makes it possible to boot the firmware now. But this is
> > > a
> > > quick hack and most likely buggy, so keep your expectations low :)
> > >
> > > In case there are these warnings during firmware initialisation:
> > >
> > > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> > >
> > > Try reverting this commit:
> > >
> > > 7fef431be9c9 mm/page_alloc: place pages to tail in
> > > __free_pages_core()
> > >
> > > That's another issue which is debugged here:
> > >
> > > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> > >
> >
> > Applying the patch and revert patch 7fef431be9c9 worked on the first
> > glance.
> >
> > After a couple of minutes the connection get broken. The kernel log
> > shows the following error:
> >
> > ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> > ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> > ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> >
> > It is also not possible to unload the ath11k_pci, rmmod will hang.
> >
> >
>
> I can confirm the same behavior as Stefani so far.  After applying the
> patch, and reverting commit 7fef431be9c9, I am able to connect to a
> network.  It hasn't disconnected yet (I'm sending this email via that
> connection).  I'll report what I find next.
>
> Thanks again for the help!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  1:10                                 ` wi nk
@ 2020-11-12  1:11                                   ` wi nk
  2020-11-12  2:31                                     ` wi nk
  2020-11-12  7:05                                   ` Stefani Seibold
  1 sibling, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-12  1:11 UTC (permalink / raw)
  To: Stefani Seibold
  Cc: Kalle Valo, Thomas Krause, Govind Singh, linux-pci,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Thu, Nov 12, 2020 at 2:10 AM wi nk <wink@technolu.st> wrote:
>
> I've yet to see any instability after 45 minutes of exercising it, I
> do see a couple of messages that came out of the driver:
>
> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
>
> then when it associates:
>
> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> [   16.722636] wlp85s0: authenticated
> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> (capab=0x411 status=0 aid=8)
> [   16.738443] wlp85s0: associated
> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
>
> The adapter is achieving around 500 mbps on my gigabit connection, my
> 2018 mbp sees around 650, so it's doing pretty well so far.
>
> Stefani - when you applied the patch that Kalle shared, which branch
> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> I revert 7fef431be9c9 there is a small merge conflict I needed to
> resolve.  I wonder if either the starting branch, or your chosen
> resolution are related to the instability you see (or I'm just lucky
> so far! :)).
>
> On Thu, Nov 12, 2020 at 1:24 AM wi nk <wink@technolu.st> wrote:
> >
> > On Wed, Nov 11, 2020 at 11:02 PM Stefani Seibold <stefani@seibold.net> wrote:
> > >
> > > On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> > > >
> > > > The proof of concept patch for v5.10-rc2 is here:
> > > >
> > > > https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/
> > > >
> > > > Hopefully it makes it possible to boot the firmware now. But this is
> > > > a
> > > > quick hack and most likely buggy, so keep your expectations low :)
> > > >
> > > > In case there are these warnings during firmware initialisation:
> > > >
> > > > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > > > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> > > >
> > > > Try reverting this commit:
> > > >
> > > > 7fef431be9c9 mm/page_alloc: place pages to tail in
> > > > __free_pages_core()
> > > >
> > > > That's another issue which is debugged here:
> > > >
> > > > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> > > >
> > >
> > > Applying the patch and revert patch 7fef431be9c9 worked on the first
> > > glance.
> > >
> > > After a couple of minutes the connection get broken. The kernel log
> > > shows the following error:
> > >
> > > ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> > > ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> > > ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> > >
> > > It is also not possible to unload the ath11k_pci, rmmod will hang.
> > >
> > >
> >
> > I can confirm the same behavior as Stefani so far.  After applying the
> > patch, and reverting commit 7fef431be9c9, I am able to connect to a
> > network.  It hasn't disconnected yet (I'm sending this email via that
> > connection).  I'll report what I find next.
> >
> > Thanks again for the help!

Sigh.... sorry for the top post again.  I'll now get a real email client.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  1:11                                   ` wi nk
@ 2020-11-12  2:31                                     ` wi nk
  2020-11-12  6:29                                       ` Carl Huang
  0 siblings, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-12  2:31 UTC (permalink / raw)
  To: Stefani Seibold
  Cc: Kalle Valo, Thomas Krause, Govind Singh, linux-pci,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Thu, Nov 12, 2020 at 2:11 AM wi nk <wink@technolu.st> wrote:
>
> On Thu, Nov 12, 2020 at 2:10 AM wi nk <wink@technolu.st> wrote:
> >
> > I've yet to see any instability after 45 minutes of exercising it, I
> > do see a couple of messages that came out of the driver:
> >
> > [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> > [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
> >
> > then when it associates:
> >
> > [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> > [   16.722636] wlp85s0: authenticated
> > [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> > [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> > (capab=0x411 status=0 aid=8)
> > [   16.738443] wlp85s0: associated
> > [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
> >
> > The adapter is achieving around 500 mbps on my gigabit connection, my
> > 2018 mbp sees around 650, so it's doing pretty well so far.
> >
> > Stefani - when you applied the patch that Kalle shared, which branch
> > did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> > I revert 7fef431be9c9 there is a small merge conflict I needed to
> > resolve.  I wonder if either the starting branch, or your chosen
> > resolution are related to the instability you see (or I'm just lucky
> > so far! :)).
> >
> > On Thu, Nov 12, 2020 at 1:24 AM wi nk <wink@technolu.st> wrote:
> > >
> > > On Wed, Nov 11, 2020 at 11:02 PM Stefani Seibold <stefani@seibold.net> wrote:
> > > >
> > > > On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
> > > > >
> > > > > The proof of concept patch for v5.10-rc2 is here:
> > > > >
> > > > > https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/
> > > > >
> > > > > Hopefully it makes it possible to boot the firmware now. But this is
> > > > > a
> > > > > quick hack and most likely buggy, so keep your expectations low :)
> > > > >
> > > > > In case there are these warnings during firmware initialisation:
> > > > >
> > > > > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
> > > > > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
> > > > >
> > > > > Try reverting this commit:
> > > > >
> > > > > 7fef431be9c9 mm/page_alloc: place pages to tail in
> > > > > __free_pages_core()
> > > > >
> > > > > That's another issue which is debugged here:
> > > > >
> > > > > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
> > > > >
> > > >
> > > > Applying the patch and revert patch 7fef431be9c9 worked on the first
> > > > glance.
> > > >
> > > > After a couple of minutes the connection get broken. The kernel log
> > > > shows the following error:
> > > >
> > > > ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> > > > ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> > > > ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> > > >
> > > > It is also not possible to unload the ath11k_pci, rmmod will hang.
> > > >
> > > >
> > >
> > > I can confirm the same behavior as Stefani so far.  After applying the
> > > patch, and reverting commit 7fef431be9c9, I am able to connect to a
> > > network.  It hasn't disconnected yet (I'm sending this email via that
> > > connection).  I'll report what I find next.
> > >
> > > Thanks again for the help!
>
> Sigh.... sorry for the top post again.  I'll now get a real email client.

So the connection remained super stable for a while, so I decided to
tempt fate and suspend the laptop to see what would happen :).

[ 5994.143715] PM: suspend exit
[ 5997.260351] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 5997.260353] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 5997.260356] ath11k_pci 0000:55:00.0: failed to enable dynamic bw: -11
[ 6000.332299] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6000.332303] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6000.332308] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6003.404365] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6003.404368] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6003.404373] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6016.204347] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6016.204351] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6016.204357] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6019.276319] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6019.276323] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6019.276329] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6031.052272] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6031.052275] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6031.052279] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6034.128257] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 6034.128261] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6034.128265] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
[ 6039.500241] ath11k_pci 0000:55:00.0: qmi failed set mode request, mode: 4, err = -110
[ 6039.500244] ath11k_pci 0000:55:00.0: qmi failed to send wlan mode off

I was able to remove the ath11k module using rmmod -f, and then
modprobe ath11k + ath11k_pci, and the device was able to reassociate
and bring the connection back up.
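
The resume log above repeats the same timeout for WMI command 16387 seven times, each followed by a failed WMI_PDEV_SET_PARAM send. When comparing logs from different machines it can help to tally the timeouts per command ID. A minimal Python sketch, assuming the "wmi command <id> timeout" wording seen in this thread; the function name is hypothetical and this is not an ath11k tool:

```python
import re
from collections import Counter

# "wmi command <id> timeout", as printed in the logs in this thread.
_WMI_TIMEOUT = re.compile(r"wmi command (\d+) timeout")


def count_wmi_timeouts(log_text):
    """Count timed-out WMI commands per command ID in a dmesg excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _WMI_TIMEOUT.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
[ 5997.260351] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
[ 5997.260353] ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 6000.332299] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
"""
    print(count_wmi_timeouts(sample))
```

Feeding it the output of dmesg gives one count per command ID, which makes it easier to see whether a single command is stuck or the whole WMI channel stopped responding.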

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  2:31                                     ` wi nk
@ 2020-11-12  6:29                                       ` Carl Huang
  0 siblings, 0 replies; 40+ messages in thread
From: Carl Huang @ 2020-11-12  6:29 UTC (permalink / raw)
  To: wi nk
  Cc: Stefani Seibold, Govind Singh, linux-pci, linux-wireless,
	Devin Bayer, ath11k, Thomas Krause, Bjorn Helgaas,
	David Woodhouse, Thomas Gleixner, Christoph Hellwig, Kalle Valo

On 2020-11-12 10:31, wi nk wrote:
> On Thu, Nov 12, 2020 at 2:11 AM wi nk <wink@technolu.st> wrote:
>> 
>> On Thu, Nov 12, 2020 at 2:10 AM wi nk <wink@technolu.st> wrote:
>> >
>> > I've yet to see any instability after 45 minutes of exercising it, I
>> > do see a couple of messages that came out of the driver:
>> >
>> > [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
>> > [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
>> >
>> > then when it associates:
>> >
>> > [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
>> > [   16.722636] wlp85s0: authenticated
>> > [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
>> > [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
>> > (capab=0x411 status=0 aid=8)
>> > [   16.738443] wlp85s0: associated
>> > [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
>> >
>> > The adapter is achieving around 500 mbps on my gigabit connection, my
>> > 2018 mbp sees around 650, so it's doing pretty well so far.
>> >
>> > Stefani - when you applied the patch that Kalle shared, which branch
>> > did you apply it to?  I applied it to ath11k-qca6390-bringup and when
>> > I revert 7fef431be9c9 there is a small merge conflict I needed to
>> > resolve.  I wonder if either the starting branch, or your chosen
>> > resolution are related to the instability you see (or I'm just lucky
>> > so far! :)).
>> >
>> > On Thu, Nov 12, 2020 at 1:24 AM wi nk <wink@technolu.st> wrote:
>> > >
>> > > On Wed, Nov 11, 2020 at 11:02 PM Stefani Seibold <stefani@seibold.net> wrote:
>> > > >
>> > > > On Wed, 2020-11-11 at 21:10 +0200, Kalle Valo wrote:
>> > > > >
>> > > > > The proof of concept patch for v5.10-rc2 is here:
>> > > > >
>> > > > > https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@codeaurora.org/
>> > > > >
>> > > > > Hopefully it makes it possible to boot the firmware now. But this is
>> > > > > a
>> > > > > quick hack and most likely buggy, so keep your expectations low :)
>> > > > >
>> > > > > In case there are these warnings during firmware initialisation:
>> > > > >
>> > > > > ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
>> > > > > ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
>> > > > >
>> > > > > Try reverting this commit:
>> > > > >
>> > > > > 7fef431be9c9 mm/page_alloc: place pages to tail in
>> > > > > __free_pages_core()
>> > > > >
>> > > > > That's another issue which is debugged here:
>> > > > >
>> > > > > http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html
>> > > > >
>> > > >
>> > > > Applying the patch and revert patch 7fef431be9c9 worked on the first
>> > > > glance.
>> > > >
>> > > > After a couple of minutes the connection get broken. The kernel log
>> > > > shows the following error:
>> > > >
>> > > > ath11k_pci 0000:55:00.0: wmi command 16387 timeout
>> > > > ath11k_pci 0000:55:00.0: failed to send WMI_PDEV_SET_PARAM cmd
>> > > > ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
>> > > >
>> > > > It is also not possible to unload the ath11k_pci, rmmod will hang.
>> > > >
>> > > >
>> > >
>> > > I can confirm the same behavior as Stefani so far.  After applying the
>> > > patch, and reverting commit 7fef431be9c9, I am able to connect to a
>> > > network.  It hasn't disconnected yet (I'm sending this email via that
>> > > connection).  I'll report what I find next.
>> > >
>> > > Thanks again for the help!
>> 
>> Sigh.... sorry for the top post again.  I'll now get a real email 
>> client.
> 
> So the connection remained super stable for a while, so I decided to
> tempt fate and suspend the laptop to see what would happen :).
> 
> [ 5994.143715] PM: suspend exit
> [ 5997.260351] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 5997.260353] ath11k_pci 0000:55:00.0: failed to send 
> WMI_PDEV_SET_PARAM cmd
> [ 5997.260356] ath11k_pci 0000:55:00.0: failed to enable dynamic bw: 
> -11
> [ 6000.332299] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6000.332303] ath11k_pci 0000:55:00.0: failed to send 
> WMI_PDEV_SET_PARAM cmd
> [ 6000.332308] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6003.404365] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6003.404368] ath11k_pci 0000:55:00.0: failed to send 
> WMI_PDEV_SET_PARAM cmd
> [ 6003.404373] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6016.204347] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6016.204351] ath11k_pci 0000:55:00.0: failed to send 
> WMI_PDEV_SET_PARAM cmd
> [ 6016.204357] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6019.276319] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6019.276323] ath11k_pci 0000:55:00.0: failed to send 
> WMI_PDEV_SET_PARAM cmd
> [ 6019.276329] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6031.052272] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6031.052275] ath11k_pci 0000:55:00.0: failed to send 
> WMI_PDEV_SET_PARAM cmd
> [ 6031.052279] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6034.128257] ath11k_pci 0000:55:00.0: wmi command 16387 timeout
> [ 6034.128261] ath11k_pci 0000:55:00.0: failed to send 
> WMI_PDEV_SET_PARAM cmd
> [ 6034.128265] ath11k_pci 0000:55:00.0: failed to enable PMF QOS: (-11
> [ 6039.500241] ath11k_pci 0000:55:00.0: qmi failed set mode request,
> mode: 4, err = -110
> [ 6039.500244] ath11k_pci 0000:55:00.0: qmi failed to send wlan mode 
> off
> 
> I was able to remove the ath11k module using rmmod -f , and then
> modprobe ath11k + atk11k_pci and the device was able to reassociate
> and bring the connection back up.

Please try applying the patch below:
https://patchwork.kernel.org/project/linux-wireless/patch/20201112062555.3335-1-cjhuang@codeaurora.org/


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  1:10                                 ` wi nk
  2020-11-12  1:11                                   ` wi nk
@ 2020-11-12  7:05                                   ` Stefani Seibold
  2020-11-12  7:15                                     ` Kalle Valo
  1 sibling, 1 reply; 40+ messages in thread
From: Stefani Seibold @ 2020-11-12  7:05 UTC (permalink / raw)
  To: wi nk
  Cc: Kalle Valo, Thomas Krause, Govind Singh, linux-pci,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Thu, 2020-11-12 at 02:10 +0100, wi nk wrote:
> I've yet to see any instability after 45 minutes of exercising it, I
> do see a couple of messages that came out of the driver:
> 
> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
> 
> then when it associates:
> 
> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> [   16.722636] wlp85s0: authenticated
> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> (capab=0x411 status=0 aid=8)
> [   16.738443] wlp85s0: associated
> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
> ready
> 
> The adapter is achieving around 500 mbps on my gigabit connection, my
> 2018 mbp sees around 650, so it's doing pretty well so far.
> 
> Stefani - when you applied the patch that Kalle shared, which branch
> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> I revert 7fef431be9c9 there is a small merge conflict I needed to
> resolve.  I wonder if either the starting branch, or your chosen
> resolution are related to the instability you see (or I'm just lucky
> so far! :)).
> 

I used the vanilla kernel tree
https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
I applied

RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch

and reverted commit 7fef431be9c9.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  7:05                                   ` Stefani Seibold
@ 2020-11-12  7:15                                     ` Kalle Valo
  2020-11-12  7:41                                       ` wi nk
  0 siblings, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-12  7:15 UTC (permalink / raw)
  To: Stefani Seibold
  Cc: wi nk, Govind Singh, linux-pci, linux-wireless, Devin Bayer,
	ath11k, Thomas Krause, Bjorn Helgaas, David Woodhouse,
	Thomas Gleixner, Christoph Hellwig

Stefani Seibold <stefani@seibold.net> writes:

> On Thu, 2020-11-12 at 02:10 +0100, wi nk wrote:
>> I've yet to see any instability after 45 minutes of exercising it, I
>> do see a couple of messages that came out of the driver:
>> 
>> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
>> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
>> 
>> then when it associates:
>> 
>> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
>> [   16.722636] wlp85s0: authenticated
>> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
>> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
>> (capab=0x411 status=0 aid=8)
>> [   16.738443] wlp85s0: associated
>> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
>> ready
>> 
>> The adapter is achieving around 500 mbps on my gigabit connection, my
>> 2018 mbp sees around 650, so it's doing pretty well so far.
>> 
>> Stefani - when you applied the patch that Kalle shared, which branch
>> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
>> I revert 7fef431be9c9 there is a small merge conflict I needed to
>> resolve.  I wonder if either the starting branch, or your chosen
>> resolution are related to the instability you see (or I'm just lucky
>> so far! :)).
>> 
>
> I used the vanilla kernel tree 
> https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
> i applied the 
>
> RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
>
> and reverted the patch 7fef431be9c9

I also did my testing on v5.10-rc2 and I recommend using that as the
baseline when debugging these ath11k problems. It helps to compare the
results if everyone has the same baseline.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  7:15                                     ` Kalle Valo
@ 2020-11-12  7:41                                       ` wi nk
  2020-11-12  8:59                                         ` Kalle Valo
  0 siblings, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-12  7:41 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Stefani Seibold, Govind Singh, linux-pci, linux-wireless,
	Devin Bayer, ath11k, Thomas Krause, Bjorn Helgaas,
	David Woodhouse, Thomas Gleixner, Christoph Hellwig

On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> Stefani Seibold <stefani@seibold.net> writes:
>
> > Am Donnerstag, den 12.11.2020, 02:10 +0100 schrieb wi nk:
> >> I've yet to see any instability after 45 minutes of exercising it, I
> >> do see a couple of messages that came out of the driver:
> >>
> >> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> >> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
> >>
> >> then when it associates:
> >>
> >> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> >> [   16.722636] wlp85s0: authenticated
> >> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> >> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> >> (capab=0x411 status=0 aid=8)
> >> [   16.738443] wlp85s0: associated
> >> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
> >> ready
> >>
> >> The adapter is achieving around 500 mbps on my gigabit connection, my
> >> 2018 mbp sees around 650, so it's doing pretty well so far.
> >>
> >> Stefani - when you applied the patch that Kalle shared, which branch
> >> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> >> I revert 7fef431be9c9 there is a small merge conflict I needed to
> >> resolve.  I wonder if either the starting branch, or your chosen
> >> resolution are related to the instability you see (or I'm just lucky
> >> so far! :)).
> >>
> >
> > I used the vanilla kernel tree
> > https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
> > i applied the
> >
> > RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
> >
> > and reverted the patch 7fef431be9c9
>
> I did also my testing on v5.10-rc2 and I recommend to use that as the
> baseline when debuggin these ath11k problems. It helps to compare the
> results if everyone have the same baseline.
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Absolutely, I'll rebuild to 5.10 later today and apply the same series
of patches and report back.  I'll also test out the patch on both
versions from Carl to fix resuming.  It stands to reason that we may
be seeing another regression between Stefani (5.10) and myself (5.9
bringup branch) as I don't see any disconnections or instability once
the interface is online.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  7:41                                       ` wi nk
@ 2020-11-12  8:59                                         ` Kalle Valo
  2020-11-12 15:44                                           ` wi nk
  0 siblings, 1 reply; 40+ messages in thread
From: Kalle Valo @ 2020-11-12  8:59 UTC (permalink / raw)
  To: wi nk
  Cc: Govind Singh, linux-pci, Stefani Seibold, linux-wireless,
	Devin Bayer, Christoph Hellwig, Thomas Krause, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

wi nk <wink@technolu.st> writes:

> On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>>
>> Stefani Seibold <stefani@seibold.net> writes:
>>
>> > Am Donnerstag, den 12.11.2020, 02:10 +0100 schrieb wi nk:
>> >> I've yet to see any instability after 45 minutes of exercising it, I
>> >> do see a couple of messages that came out of the driver:
>> >>
>> >> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
>> >> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
>> >>
>> >> then when it associates:
>> >>
>> >> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
>> >> [   16.722636] wlp85s0: authenticated
>> >> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
>> >> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
>> >> (capab=0x411 status=0 aid=8)
>> >> [   16.738443] wlp85s0: associated
>> >> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
>> >> ready
>> >>
>> >> The adapter is achieving around 500 mbps on my gigabit connection, my
>> >> 2018 mbp sees around 650, so it's doing pretty well so far.
>> >>
>> >> Stefani - when you applied the patch that Kalle shared, which branch
>> >> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
>> >> I revert 7fef431be9c9 there is a small merge conflict I needed to
>> >> resolve.  I wonder if either the starting branch, or your chosen
>> >> resolution are related to the instability you see (or I'm just lucky
>> >> so far! :)).
>> >>
>> >
>> > I used the vanilla kernel tree
>> > https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
>> > i applied the
>> >
>> > RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
>> >
>> > and reverted the patch 7fef431be9c9
>>
>> I did also my testing on v5.10-rc2 and I recommend to use that as the
>> baseline when debuggin these ath11k problems. It helps to compare the
>> results if everyone have the same baseline.
>>
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>>
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>
> Absolutely, I'll rebuild to 5.10 later today and apply the same series
> of patches and report back.

Great, thanks.

> I'll also test out the patch on both versions from Carl to fix
> resuming. It stands to reason that we may be seeing another regression
> between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
> any disconnections or instability once the interface is online.

Yeah, there is something strange happening between v5.9 and v5.10 we
have not yet figured out. Most likely it has something to do with memory
allocations and DMA transfers failing, but no clear understanding yet.

But to keep things simple let's only discuss the MSI problem on this
thread, and discuss the timeouts in another thread:

http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html

I'll add you and the other reporters to that thread.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12  8:59                                         ` Kalle Valo
@ 2020-11-12 15:44                                           ` wi nk
  2020-11-13  9:52                                             ` wi nk
  2020-11-15 13:30                                             ` Thomas Krause
  0 siblings, 2 replies; 40+ messages in thread
From: wi nk @ 2020-11-12 15:44 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Govind Singh, linux-pci, Stefani Seibold, linux-wireless,
	Devin Bayer, Christoph Hellwig, Thomas Krause, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Thu, Nov 12, 2020 at 10:00 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> wi nk <wink@technolu.st> writes:
>
> > On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> >>
> >> Stefani Seibold <stefani@seibold.net> writes:
> >>
> >> > Am Donnerstag, den 12.11.2020, 02:10 +0100 schrieb wi nk:
> >> >> I've yet to see any instability after 45 minutes of exercising it, I
> >> >> do see a couple of messages that came out of the driver:
> >> >>
> >> >> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> >> >> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
> >> >>
> >> >> then when it associates:
> >> >>
> >> >> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> >> >> [   16.722636] wlp85s0: authenticated
> >> >> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> >> >> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> >> >> (capab=0x411 status=0 aid=8)
> >> >> [   16.738443] wlp85s0: associated
> >> >> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
> >> >> ready
> >> >>
> >> >> The adapter is achieving around 500 mbps on my gigabit connection, my
> >> >> 2018 mbp sees around 650, so it's doing pretty well so far.
> >> >>
> >> >> Stefani - when you applied the patch that Kalle shared, which branch
> >> >> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> >> >> I revert 7fef431be9c9 there is a small merge conflict I needed to
> >> >> resolve.  I wonder if either the starting branch, or your chosen
> >> >> resolution are related to the instability you see (or I'm just lucky
> >> >> so far! :)).
> >> >>
> >> >
> >> > I used the vanilla kernel tree
> >> > https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
> >> > i applied the
> >> >
> >> > RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
> >> >
> >> > and reverted the patch 7fef431be9c9
> >>
> >> I did also my testing on v5.10-rc2 and I recommend to use that as the
> >> baseline when debuggin these ath11k problems. It helps to compare the
> >> results if everyone have the same baseline.
> >>
> >> --
> >> https://patchwork.kernel.org/project/linux-wireless/list/
> >>
> >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> >
> > Absolutely, I'll rebuild to 5.10 later today and apply the same series
> > of patches and report back.
>
> Great, thanks.
>
> > I'll also test out the patch on both versions from Carl to fix
> > resuming. It stands to reason that we may be seeing another regression
> > between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
> > any disconnections or instability once the interface is online.
>
> Yeah, there is something strange happening between v5.9 and v5.10 we
> have not yet figured out. Most likely it has something to do with memory
> allocations and DMA transfers failing, but no clear understanding yet.
>
> But to keep things simple let's only discuss the MSI problem on this
> thread, and discuss the timeouts in the another thread:
>
> http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
>
> I'll include you and other reporters to that thread.
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Ok, I've tried a clean checkout of 5.10-rc2 with the one MSI patch
applied and 7fef431be9c9 reverted.  I can't get my machine to boot
into anything usable with that configuration.  I'm running Ubuntu so
it's starting right into X and sometime between showing the available
users and me clicking the icon to login the machine freezes.  I can
see in the system tray that the wifi adapter is being activated and
appears to have associated with an AP, I just can't do much beyond
that as the keyboard backlight wakes up, but the caps lock key doesn't
work.  I see similar behavior with the 5.9 configuration, but after a
reboot or two I win whatever race is occurring.  With 5.10, I tried
maybe 10-15 times with 0 success.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12 15:44                                           ` wi nk
@ 2020-11-13  9:52                                             ` wi nk
  2020-11-15 13:30                                             ` Thomas Krause
  1 sibling, 0 replies; 40+ messages in thread
From: wi nk @ 2020-11-13  9:52 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Govind Singh, linux-pci, Stefani Seibold, linux-wireless,
	Devin Bayer, Christoph Hellwig, Thomas Krause, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Thu, Nov 12, 2020 at 4:44 PM wi nk <wink@technolu.st> wrote:
>
> On Thu, Nov 12, 2020 at 10:00 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> >
> > wi nk <wink@technolu.st> writes:
> >
> > > On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> > >>
> > >> Stefani Seibold <stefani@seibold.net> writes:
> > >>
> > >> > Am Donnerstag, den 12.11.2020, 02:10 +0100 schrieb wi nk:
> > >> >> I've yet to see any instability after 45 minutes of exercising it, I
> > >> >> do see a couple of messages that came out of the driver:
> > >> >>
> > >> >> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> > >> >> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
> > >> >>
> > >> >> then when it associates:
> > >> >>
> > >> >> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> > >> >> [   16.722636] wlp85s0: authenticated
> > >> >> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> > >> >> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> > >> >> (capab=0x411 status=0 aid=8)
> > >> >> [   16.738443] wlp85s0: associated
> > >> >> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
> > >> >> ready
> > >> >>
> > >> >> The adapter is achieving around 500 mbps on my gigabit connection, my
> > >> >> 2018 mbp sees around 650, so it's doing pretty well so far.
> > >> >>
> > >> >> Stefani - when you applied the patch that Kalle shared, which branch
> > >> >> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> > >> >> I revert 7fef431be9c9 there is a small merge conflict I needed to
> > >> >> resolve.  I wonder if either the starting branch, or your chosen
> > >> >> resolution are related to the instability you see (or I'm just lucky
> > >> >> so far! :)).
> > >> >>
> > >> >
> > >> > I used the vanilla kernel tree
> > >> > https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
> > >> > i applied the
> > >> >
> > >> > RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
> > >> >
> > >> > and reverted the patch 7fef431be9c9
> > >>
> > >> I did also my testing on v5.10-rc2 and I recommend to use that as the
> > >> baseline when debuggin these ath11k problems. It helps to compare the
> > >> results if everyone have the same baseline.
> > >>
> > >> --
> > >> https://patchwork.kernel.org/project/linux-wireless/list/
> > >>
> > >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > >
> > > Absolutely, I'll rebuild to 5.10 later today and apply the same series
> > > of patches and report back.
> >
> > Great, thanks.
> >
> > > I'll also test out the patch on both versions from Carl to fix
> > > resuming. It stands to reason that we may be seeing another regression
> > > between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
> > > any disconnections or instability once the interface is online.
> >
> > Yeah, there is something strange happening between v5.9 and v5.10 we
> > have not yet figured out. Most likely it has something to do with memory
> > allocations and DMA transfers failing, but no clear understanding yet.
> >
> > But to keep things simple let's only discuss the MSI problem on this
> > thread, and discuss the timeouts in the another thread:
> >
> > http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
> >
> > I'll include you and other reporters to that thread.
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>
> Ok, I've tried a clean checkout of 5.10-rc2 with the one MSI patch
> applied and 7fef431be9c9 reverted.  I can't get my machine to  boot
> into anything usable with that configuration.  I'm running ubuntu so
> its starting right into X and sometime between showing the available
> users and me clicking the icon to login the machine freezes.  I can
> see in the system tray that the wifi adapter is being activated and
> appears to have associated with an AP, I just can't do much beyond
> that as the keyboard backlight wakes up, but the caps lock key doesn't
> work.  I see similar behavior with the 5.9 configuration, but after a
> reboot or two I win whatever race is occuring.  With 5.10, I tried
> maybe 10-15 times with 0 success.

Kalle, what would be a useful next move for trying to hunt this?  It
seems I can't really test the single MSI patch on 5.10 since with the
patch (+ the reverted commit) the driver isn't stable enough for my
machine to stay running.  It seems your hunch is that this is related
to the issues in the other thread
(http://lists.infradead.org/pipermail/ath11k/2020-November/000550.html)?
I see the SOTA for debugging these things would be to use the kdump
tools and let the secondary kernel dump diagnostics for me.  Would
such logs be useful for you/this?

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-12 15:44                                           ` wi nk
  2020-11-13  9:52                                             ` wi nk
@ 2020-11-15 13:30                                             ` Thomas Krause
  2020-11-15 19:55                                               ` wi nk
  1 sibling, 1 reply; 40+ messages in thread
From: Thomas Krause @ 2020-11-15 13:30 UTC (permalink / raw)
  To: wi nk, Kalle Valo
  Cc: Govind Singh, linux-pci, Stefani Seibold, linux-wireless,
	Devin Bayer, Christoph Hellwig, Bjorn Helgaas, Thomas Gleixner,
	ath11k, David Woodhouse


Am 12.11.20 um 16:44 schrieb wi nk:
> On Thu, Nov 12, 2020 at 10:00 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>> wi nk <wink@technolu.st> writes:
>>
>>> On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>>>> Stefani Seibold <stefani@seibold.net> writes:
>>>>
>>>>> Am Donnerstag, den 12.11.2020, 02:10 +0100 schrieb wi nk:
>>>>>> I've yet to see any instability after 45 minutes of exercising it, I
>>>>>> do see a couple of messages that came out of the driver:
>>>>>>
>>>>>> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
>>>>>> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
>>>>>>
>>>>>> then when it associates:
>>>>>>
>>>>>> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
>>>>>> [   16.722636] wlp85s0: authenticated
>>>>>> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
>>>>>> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
>>>>>> (capab=0x411 status=0 aid=8)
>>>>>> [   16.738443] wlp85s0: associated
>>>>>> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
>>>>>> ready
>>>>>>
>>>>>> The adapter is achieving around 500 mbps on my gigabit connection, my
>>>>>> 2018 mbp sees around 650, so it's doing pretty well so far.
>>>>>>
>>>>>> Stefani - when you applied the patch that Kalle shared, which branch
>>>>>> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
>>>>>> I revert 7fef431be9c9 there is a small merge conflict I needed to
>>>>>> resolve.  I wonder if either the starting branch, or your chosen
>>>>>> resolution are related to the instability you see (or I'm just lucky
>>>>>> so far! :)).
>>>>>>
>>>>> I used the vanilla kernel tree
>>>>> https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
>>>>> i applied the
>>>>>
>>>>> RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
>>>>>
>>>>> and reverted the patch 7fef431be9c9
>>>> I did also my testing on v5.10-rc2 and I recommend to use that as the
>>>> baseline when debuggin these ath11k problems. It helps to compare the
>>>> results if everyone have the same baseline.
>>>>
>>>> --
>>>> https://patchwork.kernel.org/project/linux-wireless/list/
>>>>
>>>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>>> Absolutely, I'll rebuild to 5.10 later today and apply the same series
>>> of patches and report back.
>> Great, thanks.
>>
>>> I'll also test out the patch on both versions from Carl to fix
>>> resuming. It stands to reason that we may be seeing another regression
>>> between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
>>> any disconnections or instability once the interface is online.
>> Yeah, there is something strange happening between v5.9 and v5.10 we
>> have not yet figured out. Most likely it has something to do with memory
>> allocations and DMA transfers failing, but no clear understanding yet.
>>
>> But to keep things simple let's only discuss the MSI problem on this
>> thread, and discuss the timeouts in the another thread:
>>
>> http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
>>
>> I'll include you and other reporters to that thread.
>>
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>>
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> Ok, I've tried a clean checkout of 5.10-rc2 with the one MSI patch
> applied and 7fef431be9c9 reverted.  I can't get my machine to  boot
> into anything usable with that configuration.  I'm running ubuntu so
> its starting right into X and sometime between showing the available
> users and me clicking the icon to login the machine freezes.  I can
> see in the system tray that the wifi adapter is being activated and
> appears to have associated with an AP, I just can't do much beyond
> that as the keyboard backlight wakes up, but the caps lock key doesn't
> work.  I see similar behavior with the 5.9 configuration, but after a
> reboot or two I win whatever race is occuring.  With 5.10, I tried
> maybe 10-15 times with 0 success.

I can confirm this behavior on my configuration. I managed to log in once
and select the Wifi and connect to it. Curiously enough, it seemed to be
stable long enough to enter the Wifi passphrase. After the connection
was established, the system hung, and on each attempt to reboot into the
graphical system it would freeze at some point (sometimes even before
showing the login screen).

The kernel was based on both 5.10-rc2 and 5.10-rc3 (I saw the same
behavior on each) with the patch applied, 7fef431be9c9 reverted and firmware
downloaded and copied to /lib/firmware/ath11k/QCA6390/hw2.0/.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-15 13:30                                             ` Thomas Krause
@ 2020-11-15 19:55                                               ` wi nk
  2020-11-17 15:49                                                 ` wi nk
  0 siblings, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-15 19:55 UTC (permalink / raw)
  To: Thomas Krause
  Cc: Kalle Valo, Govind Singh, linux-pci, Stefani Seibold,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Sun, Nov 15, 2020 at 2:30 PM Thomas Krause <thomaskrause@posteo.de> wrote:
>
>
> Am 12.11.20 um 16:44 schrieb wi nk:
> > On Thu, Nov 12, 2020 at 10:00 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> >> wi nk <wink@technolu.st> writes:
> >>
> >>> On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> >>>> Stefani Seibold <stefani@seibold.net> writes:
> >>>>
> >>>>> Am Donnerstag, den 12.11.2020, 02:10 +0100 schrieb wi nk:
> >>>>>> I've yet to see any instability after 45 minutes of exercising it, I
> >>>>>> do see a couple of messages that came out of the driver:
> >>>>>>
> >>>>>> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> >>>>>> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
> >>>>>>
> >>>>>> then when it associates:
> >>>>>>
> >>>>>> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> >>>>>> [   16.722636] wlp85s0: authenticated
> >>>>>> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> >>>>>> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> >>>>>> (capab=0x411 status=0 aid=8)
> >>>>>> [   16.738443] wlp85s0: associated
> >>>>>> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
> >>>>>> ready
> >>>>>>
> >>>>>> The adapter is achieving around 500 mbps on my gigabit connection, my
> >>>>>> 2018 mbp sees around 650, so it's doing pretty well so far.
> >>>>>>
> >>>>>> Stefani - when you applied the patch that Kalle shared, which branch
> >>>>>> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> >>>>>> I revert 7fef431be9c9 there is a small merge conflict I needed to
> >>>>>> resolve.  I wonder if either the starting branch, or your chosen
> >>>>>> resolution are related to the instability you see (or I'm just lucky
> >>>>>> so far! :)).
> >>>>>>
> >>>>> I used the vanilla kernel tree
> >>>>> https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
> >>>>> i applied the
> >>>>>
> >>>>> RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
> >>>>>
> >>>>> and reverted the patch 7fef431be9c9
> >>>> I did also my testing on v5.10-rc2 and I recommend to use that as the
> >>>> baseline when debuggin these ath11k problems. It helps to compare the
> >>>> results if everyone have the same baseline.
> >>>>
> >>>> --
> >>>> https://patchwork.kernel.org/project/linux-wireless/list/
> >>>>
> >>>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> >>> Absolutely, I'll rebuild to 5.10 later today and apply the same series
> >>> of patches and report back.
> >> Great, thanks.
> >>
> >>> I'll also test out the patch on both versions from Carl to fix
> >>> resuming. It stands to reason that we may be seeing another regression
> >>> between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
> >>> any disconnections or instability once the interface is online.
> >> Yeah, there is something strange happening between v5.9 and v5.10 we
> >> have not yet figured out. Most likely it has something to do with memory
> >> allocations and DMA transfers failing, but no clear understanding yet.
> >>
> >> But to keep things simple let's only discuss the MSI problem on this
> >> thread, and discuss the timeouts in the another thread:
> >>
> >> http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
> >>
> >> I'll include you and other reporters to that thread.
> >>
> >> --
> >> https://patchwork.kernel.org/project/linux-wireless/list/
> >>
> >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > Ok, I've tried a clean checkout of 5.10-rc2 with the one MSI patch
> > applied and 7fef431be9c9 reverted.  I can't get my machine to  boot
> > into anything usable with that configuration.  I'm running ubuntu so
> > its starting right into X and sometime between showing the available
> > users and me clicking the icon to login the machine freezes.  I can
> > see in the system tray that the wifi adapter is being activated and
> > appears to have associated with an AP, I just can't do much beyond
> > that as the keyboard backlight wakes up, but the caps lock key doesn't
> > work.  I see similar behavior with the 5.9 configuration, but after a
> > reboot or two I win whatever race is occuring.  With 5.10, I tried
> > maybe 10-15 times with 0 success.
>
> I can confirm this behavior on my configuration. I managed to login once
> and select the Wifi and connect to it. It seemed curiously enough be
> stable long enough to enter the Wifi passphrase. After the connection
> was established, the system hang and on each attempt to reboot into the
> graphical system it would freeze at some point (sometimes even before
> showing the login screen).
>
> Kernel was both based on 5.10-rc2 and 5.10-rc3 (I did see the same
> behavior) with the patch applied, 7fef431be9c9 reverted and firmware
> downloaded and copied to /lib/firmware/ath11k/QCA6390/hw2.0/.
>
>

I did a bit more digging to see if I could find any new information,
I'm not sure I did but here's what I did / found.  I spent the time to
get a kdump kernel running and enabled, I was able to SysRq-C (both
via keyboard and echo c > /proc/sysrq-trigger) and generate a crash
dump.  Actually viewing them at the moment will require reverting a
couple of patches to printk to fix the file for the crash utility
(https://github.com/crash-utility/crash/issues/67), but right now
that's not super important since the mechanism isn't being triggered.
As reported here and by Mitchell, the adapter will work occasionally,
but more often it will hang the machine (I too tried 5.10-rc3 with no
noticeable differences).  Whatever is causing the system to hang isn't
triggering the kdump kernel to take over and dump the vmcore.  I've
set watchdog=1, nmi_watchdog=1, hung_task_panic=1, softlockup_panic=1
trying to convince the kernel to dump its state during this.  I've
not been able to make it write a crash, it just sits 'hung'.  One
interesting observation that may be related, is that if the lockup
occurs during my login, I can actually see the system grind to a halt
over the course of a number of frames (the rendering of the login
animations starts to stutter/get really slow, then after a few frames
everything is frozen).  If something were spin locking/ed, I'd expect
the soft lockup panic to find it, but I don't know these mechanisms
well.
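
As a rough way to confirm that the settings listed above actually took
effect on a running system, the values can be read back from procfs.
The sketch below is only an illustration and assumes the four settings
were applied as kernel.* sysctls under /proc/sys/kernel (they may
equally have been given as boot parameters).  The helper name
read_lockup_sysctls and its base argument are made up for this example.

```python
from pathlib import Path

# Sysctl names corresponding to the settings mentioned above.
LOCKUP_SYSCTLS = ("watchdog", "nmi_watchdog", "hung_task_panic", "softlockup_panic")


def read_lockup_sysctls(base="/proc/sys/kernel"):
    """Return {name: value or None} for each lockup-related sysctl under base.

    base is a hypothetical parameter so the function can be pointed at a
    test directory instead of the real procfs tree.
    """
    result = {}
    for name in LOCKUP_SYSCTLS:
        path = Path(base) / name
        try:
            result[name] = path.read_text().strip()
        except OSError:
            # File missing or unreadable: record that the knob is unavailable.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_lockup_sysctls().items():
        shown = value if value is not None else "not available"
        print(f"kernel.{name} = {shown}")
```

A value of None for a given name may mean that the running kernel does
not expose that knob at all, which would be one possible reason the
corresponding panic never fires.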

The only consistent behavior that I managed to create is that if the
wifi adapter / machine are in a 'working' state (ie: I can browse the
internet, etc) and I issue sysrq-c to crash the kernel and then let
the crash dump write and reboot the machine, once booted the adapter
is no longer seen by the kernel, and there are zero messages in dmesg
that match "ath11k".  The driver shows up in lsmod, but it reports
zero messages and it's like the adapter is completely invisible.  A
power off and back on of the machine will re-enter it into the
freezing/wifi working lottery.
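
The check for "zero messages matching ath11k" can be scripted.  The
following is a minimal sketch, assuming the kernel log has been captured
as plain text (for example the output of dmesg).  driver_messages is a
hypothetical helper, and the sample lines are taken from the log excerpt
quoted earlier in this thread.

```python
import re

# Matches a printk timestamp prefix such as "[    8.963389] ".
_TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def driver_messages(log_text, needle="ath11k"):
    """Return (timestamp, message) pairs for log lines that mention needle.

    The timestamp is None when a matching line has no printk timestamp.
    """
    hits = []
    for line in log_text.splitlines():
        if needle not in line:
            continue
        match = _TIMESTAMP.match(line)
        if match:
            hits.append((float(match.group(1)), match.group(2)))
        else:
            hits.append((None, line.strip()))
    return hits


sample = """\
[    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
[   16.722636] wlp85s0: authenticated
[   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
"""

for timestamp, message in driver_messages(sample):
    print(timestamp, message)
```

An empty list from a full boot log would correspond to the "completely
invisible" state described above, where the driver never logs anything.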

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-15 19:55                                               ` wi nk
@ 2020-11-17 15:49                                                 ` wi nk
  2020-11-17 20:59                                                   ` Thomas Gleixner
  0 siblings, 1 reply; 40+ messages in thread
From: wi nk @ 2020-11-17 15:49 UTC (permalink / raw)
  To: Thomas Krause
  Cc: Kalle Valo, Govind Singh, linux-pci, Stefani Seibold,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	Thomas Gleixner, ath11k, David Woodhouse

On Sun, Nov 15, 2020 at 8:55 PM wi nk <wink@technolu.st> wrote:
>
> On Sun, Nov 15, 2020 at 2:30 PM Thomas Krause <thomaskrause@posteo.de> wrote:
> >
> >
> > Am 12.11.20 um 16:44 schrieb wi nk:
> > > On Thu, Nov 12, 2020 at 10:00 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> > >> wi nk <wink@technolu.st> writes:
> > >>
> > >>> On Thu, Nov 12, 2020 at 8:15 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> > >>>> Stefani Seibold <stefani@seibold.net> writes:
> > >>>>
> > >>>>> Am Donnerstag, den 12.11.2020, 02:10 +0100 schrieb wi nk:
> > >>>>>> I've yet to see any instability after 45 minutes of exercising it, I
> > >>>>>> do see a couple of messages that came out of the driver:
> > >>>>>>
> > >>>>>> [    8.963389] ath11k_pci 0000:55:00.0: Unknown eventid: 0x16005
> > >>>>>> [   11.342317] ath11k_pci 0000:55:00.0: Unknown eventid: 0x1d00a
> > >>>>>>
> > >>>>>> then when it associates:
> > >>>>>>
> > >>>>>> [   16.718895] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> > >>>>>> [   16.722636] wlp85s0: authenticated
> > >>>>>> [   16.724150] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> > >>>>>> [   16.726486] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> > >>>>>> (capab=0x411 status=0 aid=8)
> > >>>>>> [   16.738443] wlp85s0: associated
> > >>>>>> [   16.764966] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes
> > >>>>>> ready
> > >>>>>>
> > >>>>>> The adapter is achieving around 500 mbps on my gigabit connection, my
> > >>>>>> 2018 mbp sees around 650, so it's doing pretty well so far.
> > >>>>>>
> > >>>>>> Stefani - when you applied the patch that Kalle shared, which branch
> > >>>>>> did you apply it to?  I applied it to ath11k-qca6390-bringup and when
> > >>>>>> I revert 7fef431be9c9 there is a small merge conflict I needed to
> > >>>>>> resolve.  I wonder if either the starting branch, or your chosen
> > >>>>>> resolution are related to the instability you see (or I'm just lucky
> > >>>>>> so far! :)).
> > >>>>>>
> > >>>>> I used the vanilla kernel tree
> > >>>>> https://git.kernel.org/torvalds/t/linux-5.10-rc2.tar.gz. On top of this
> > >>>>> i applied the
> > >>>>>
> > >>>>> RFT-ath11k-pci-support-platforms-with-one-MSI-vector.patch
> > >>>>>
> > >>>>> and reverted the patch 7fef431be9c9
> > > >>>> I also did my testing on v5.10-rc2, and I recommend using that as the
> > > >>>> baseline when debugging these ath11k problems. It helps to compare the
> > > >>>> results if everyone has the same baseline.
> > >>>>
> > >>>> --
> > >>>> https://patchwork.kernel.org/project/linux-wireless/list/
> > >>>>
> > >>>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > >>> Absolutely, I'll rebuild to 5.10 later today and apply the same series
> > >>> of patches and report back.
> > >> Great, thanks.
> > >>
> > >>> I'll also test out the patch on both versions from Carl to fix
> > >>> resuming. It stands to reason that we may be seeing another regression
> > >>> between Stefani (5.10) and myself (5.9 bringup branch) as I don't see
> > >>> any disconnections or instability once the interface is online.
> > >> Yeah, there is something strange happening between v5.9 and v5.10 we
> > >> have not yet figured out. Most likely it has something to do with memory
> > >> allocations and DMA transfers failing, but no clear understanding yet.
> > >>
> > >> But to keep things simple let's only discuss the MSI problem on this
> > > >> thread, and discuss the timeouts in another thread:
> > >>
> > >> http://lists.infradead.org/pipermail/ath11k/2020-November/000641.html
> > >>
> > >> I'll include you and other reporters to that thread.
> > >>
> > >> --
> > >> https://patchwork.kernel.org/project/linux-wireless/list/
> > >>
> > >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > > Ok, I've tried a clean checkout of 5.10-rc2 with the one MSI patch
> > > > applied and 7fef431be9c9 reverted.  I can't get my machine to boot
> > > > into anything usable with that configuration.  I'm running Ubuntu, so
> > > > it's starting right into X and sometime between showing the available
> > > users and me clicking the icon to login the machine freezes.  I can
> > > see in the system tray that the wifi adapter is being activated and
> > > appears to have associated with an AP, I just can't do much beyond
> > > that as the keyboard backlight wakes up, but the caps lock key doesn't
> > > work.  I see similar behavior with the 5.9 configuration, but after a
> > > > reboot or two I win whatever race is occurring.  With 5.10, I tried
> > > maybe 10-15 times with 0 success.
> >
> > I can confirm this behavior on my configuration. I managed to log in once,
> > select the Wifi and connect to it. Curiously enough, it seemed to be
> > stable long enough to enter the Wifi passphrase. After the connection
> > was established, the system hung, and on each attempt to reboot into the
> > graphical system it would freeze at some point (sometimes even before
> > showing the login screen).
> >
> > Kernel was both based on 5.10-rc2 and 5.10-rc3 (I did see the same
> > behavior) with the patch applied, 7fef431be9c9 reverted and firmware
> > downloaded and copied to /lib/firmware/ath11k/QCA6390/hw2.0/.
> >
> >
>
> I did a bit more digging to see if I could find any new information,
> I'm not sure I did but here's what I did / found.  I spent the time to
> get a kdump kernel running and enabled, I was able to SysRq-C (both
> via keyboard and echo c > /proc/sysrq-trigger) and generate a crash
> dump.  Actually viewing them at the moment will require reverting a
> couple of patches to printk to fix the file for the crash utility
> (https://github.com/crash-utility/crash/issues/67), but right now
> that's not super important since the mechanism isn't being triggered.
> As reported here and by Mitchell, the adapter will work occasionally,
> but more often it will hang the machine (I too tried 5.10-rc3 with no
> noticeable differences).  Whatever is causing the system to hang isn't
> triggering the kdump kernel to take over and dump the vmcore.  I've
> set watchdog=1, nmi_watchdog=1, hung_task_panic=1, softlockup_panic=1
> trying to convince the kernel to dump its state during this.  I've
> not been able to make it write a crash, it just sits 'hung'.  One
> interesting observation that may be related, is that if the lockup
> occurs during my login, I can actually see the system grind to a halt
> over the course of a number of frames (the rendering of the login
> animations starts to stutter/get really slow, then after a few frames
> everything is frozen).  If something were spin locking/ed, I'd expect
> the soft lockup panic to find it, but I don't know these mechanisms
> well.
>
> The only consistent behavior that I managed to create is that if the
> wifi adapter / machine are in a 'working' state (ie: I can browse the
> internet, etc) and I issue sysrq-c to crash the kernel and then let
> the crash dump write and reboot the machine, once booted the adapter
> is no longer seen by the kernel, and there are zero messages in dmesg
> that match "ath11k".  The driver shows up in lsmod , but it reports
> zero messages and it's like the adapter is completely invisible.  A
> power off and back on of the machine will re-enter it into the
> freezing/wifi working lottery.

Good evening all!  Just wanted to follow up as I think I've started to
uncover some of what's happening with the XPS and this driver.  So
since I can't get the kdump kernel to dump anything for me, I took a
bit more of a naive approach.  I blacklisted the modules (ath11k /
ath11k_pci) from modprobe so I could at least control when it was
loaded.  I managed to capture a series of crashes (in phone pics, but
I'll transcribe the relevant bits here) that seem to indicate some
kind of runaway / spin locked behavior.  In all but one case[*], both
the crash and the eventual working state, the driver completely
initialized successfully with messaging like this:

[   23.209335] ath11k_pci 0000:55:00.0: WARNING: ath11k PCI support is
experimental!
[   23.209404] ath11k_pci 0000:55:00.0: BAR 0: assigned [mem
0x8e300000-0x8e3fffff 64bit]
[   23.209421] ath11k_pci 0000:55:00.0: enabling device (0000 -> 0002)
[   23.209502] ath11k_pci 0000:55:00.0: MSI vectors: 1
[   23.454227] ath11k_pci 0000:55:00.0: Respond mem req failed,
result: 1, err: 0
[   23.454233] ath11k_pci 0000:55:00.0: qmi failed to respond fw mem req:-22
[   23.455810] ath11k_pci 0000:55:00.0: req mem_seg[0] 0x27d00000 524288 1
[   23.455814] ath11k_pci 0000:55:00.0: req mem_seg[1] 0x27d80000 524288 1
[   23.455816] ath11k_pci 0000:55:00.0: req mem_seg[2] 0x27e00000 524288 1
[   23.455817] ath11k_pci 0000:55:00.0: req mem_seg[3] 0x27e80000 294912 1
[   23.455819] ath11k_pci 0000:55:00.0: req mem_seg[4] 0x27f00000 524288 1
[   23.455820] ath11k_pci 0000:55:00.0: req mem_seg[5] 0x27f80000 524288 1
[   23.455822] ath11k_pci 0000:55:00.0: req mem_seg[6] 0x27800000 458752 1
[   23.455824] ath11k_pci 0000:55:00.0: req mem_seg[7] 0x27cc0000 131072 1
[   23.455825] ath11k_pci 0000:55:00.0: req mem_seg[8] 0x27880000 524288 4
[   23.455827] ath11k_pci 0000:55:00.0: req mem_seg[9] 0x27900000 360448 4
[   23.455829] ath11k_pci 0000:55:00.0: req mem_seg[10] 0x27ca4000 16384 1
[   23.466226] ath11k_pci 0000:55:00.0: chip_id 0x0 chip_family 0xb
board_id 0xff soc_id 0xffffffff
[   23.466230] ath11k_pci 0000:55:00.0: fw_version 0x101c06cc
fw_build_timestamp 2020-06-24 19:50 fw_build_id
[   23.677675] ath11k_pci 0000:55:00.0 wlp85s0: renamed from wlan0

So up until this point, everything is working without issues.
Everything seems to spiral out of control a couple of seconds later
when my system attempts to actually bring up the adapter.  In most of
the crash states I will see this:

[   31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
[   31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
[   31.391928] wlp85s0: authenticated
[   31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
[   31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
(capab=0x411 status=0 aid=6)
[   31.407730] wlp85s0: associated
[   31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready

And then either somewhere in that pile of messages, or a second or two
after this my machine will start to stutter as I mentioned before, and
then it either hangs, or I see this message (I'm truncating the
timestamp):

[   35.xxxx ] sched: RT throttling activated

After that moment, the machine is unresponsive.  Sorry I can't seem to
extract this data other than screenshots from my phone at the moment,
you can see the dmesg output from 6 different hangs here:
https://github.com/w1nk/ath11k-debug

* - In the case where the driver didn't fully initialize successfully
and hung; during the initialization right after the "MSI vectors: %d"
printk, I started seeing these:

[ 77.xxx ] alloc_contig_range: [88d8e0, 88d8e9) PFNs busy

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-17 15:49                                                 ` wi nk
@ 2020-11-17 20:59                                                   ` Thomas Gleixner
  2020-11-18 10:22                                                     ` wi nk
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2020-11-17 20:59 UTC (permalink / raw)
  To: wi nk, Thomas Krause
  Cc: Kalle Valo, Govind Singh, linux-pci, Stefani Seibold,
	linux-wireless, Devin Bayer, Christoph Hellwig, Bjorn Helgaas,
	ath11k, David Woodhouse

On Tue, Nov 17 2020 at 16:49, wi nk wrote:
> On Sun, Nov 15, 2020 at 8:55 PM wi nk <wink@technolu.st> wrote:
> So up until this point, everything is working without issues.
> Everything seems to spiral out of control a couple of seconds later
> when my system attempts to actually bring up the adapter.  In most of
> the crash states I will see this:
>
> [   31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> [   31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
> [   31.391928] wlp85s0: authenticated
> [   31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> [   31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> (capab=0x411 status=0 aid=6)
> [   31.407730] wlp85s0: associated
> [   31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
>
> And then either somewhere in that pile of messages, or a second or two
> after this my machine will start to stutter as I mentioned before, and
> then it either hangs, or I see this message (I'm truncating the
> timestamp):
>
> [   35.xxxx ] sched: RT throttling activated

As this driver uses threaded interrupts, this looks like an interrupt
storm and the interrupt thread consumes the CPU fully. The RT throttler
limits its RT runtime, which allows other tasks to make some
progress. That's what you observe as stutter.
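
For reference, the throttling budget described above is governed by two
sysctls, sched_rt_runtime_us and sched_rt_period_us; by default they are
950000 and 1000000, so realtime tasks get at most 95% of each one-second
period. What follows is only a minimal userspace sketch that reads the two
values and computes that budget. The helper names rt_budget_percent(),
read_long() and print_rt_budget() are made up for illustration and do not
come from the kernel or from the ath11k driver.

```c
#include <stdio.h>

/*
 * Percentage of each period that realtime tasks may run before the
 * throttler steps in.  A negative runtime (-1) means throttling is off.
 */
double rt_budget_percent(long runtime_us, long period_us)
{
	if (runtime_us < 0)
		return 100.0;		/* throttling disabled */
	if (period_us <= 0)
		return 0.0;		/* invalid period: report no budget */
	if (runtime_us > period_us)
		return 100.0;		/* clamp nonsensical input */
	return 100.0 * (double)runtime_us / (double)period_us;
}

/* Read one integer from a procfs file; returns 0 on success, -1 on error. */
int read_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Print the budget currently configured on this machine. */
int print_rt_budget(void)
{
	long runtime, period;

	if (read_long("/proc/sys/kernel/sched_rt_runtime_us", &runtime) ||
	    read_long("/proc/sys/kernel/sched_rt_period_us", &period)) {
		fprintf(stderr, "could not read sched_rt_*_us sysctls\n");
		return -1;
	}
	printf("RT tasks may use %.1f%% of every %ld us\n",
	       rt_budget_percent(runtime, period), period);
	return 0;
}
```

Calling print_rt_budget() from a small main() shows how much room the
throttler leaves for non-RT tasks once "sched: RT throttling activated"
has been logged.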

You can apply the hack below so the irq thread(s) run in the SCHED_OTHER
class which prevents them from monopolizing the CPU. That might make the
problem simpler to debug.
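
To check which scheduling class a given interrupt thread actually ended up
in, a small userspace helper along these lines can be used. It is only a
sketch: policy_name() and print_policy() are hypothetical names, and the
thread ID has to be supplied by the caller (interrupt threads are named
"irq/<nr>-<name>").

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Map a scheduling policy number to a printable name. */
const char *policy_name(int policy)
{
#ifdef SCHED_RESET_ON_FORK
	/* Strip the flag bit if the C library exposes it. */
	policy &= ~SCHED_RESET_ON_FORK;
#endif
	switch (policy) {
	case SCHED_OTHER:
		return "SCHED_OTHER";
	case SCHED_FIFO:
		return "SCHED_FIFO";
	case SCHED_RR:
		return "SCHED_RR";
	default:
		return "unknown";
	}
}

/*
 * Print the scheduling policy of the thread with the given TID.
 * A TID of 0 means the calling thread.  Returns 0 on success.
 */
int print_policy(pid_t tid)
{
	int policy = sched_getscheduler(tid);

	if (policy < 0) {
		perror("sched_getscheduler");
		return -1;
	}
	printf("tid %d: %s\n", (int)tid, policy_name(policy));
	return 0;
}
```

Thread IDs can be listed with "ps -eLo tid,cls,rtprio,comm". On an
unpatched kernel an irq thread would be expected to report SCHED_FIFO, and
SCHED_OTHER with the hack below applied.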

Thanks,

        tglx
---
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c460e0496006..8473ecacac7a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1320,7 +1320,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 	if (IS_ERR(t))
 		return PTR_ERR(t);
 
-	sched_set_fifo(t);
+	//sched_set_fifo(t);
 
 	/*
 	 * We keep the reference to the task struct even if

* Re: pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310
  2020-11-17 20:59                                                   ` Thomas Gleixner
@ 2020-11-18 10:22                                                     ` wi nk
  0 siblings, 0 replies; 40+ messages in thread
From: wi nk @ 2020-11-18 10:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Thomas Krause, Kalle Valo, Govind Singh, linux-pci,
	Stefani Seibold, linux-wireless, Devin Bayer, Christoph Hellwig,
	Bjorn Helgaas, ath11k, David Woodhouse

On Tue, Nov 17, 2020 at 9:59 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, Nov 17 2020 at 16:49, wi nk wrote:
> > On Sun, Nov 15, 2020 at 8:55 PM wi nk <wink@technolu.st> wrote:
> > So up until this point, everything is working without issues.
> > Everything seems to spiral out of control a couple of seconds later
> > when my system attempts to actually bring up the adapter.  In most of
> > the crash states I will see this:
> >
> > [   31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
> > [   31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
> > [   31.391928] wlp85s0: authenticated
> > [   31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
> > [   31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea
> > (capab=0x411 status=0 aid=6)
> > [   31.407730] wlp85s0: associated
> > [   31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready
> >
> > And then either somewhere in that pile of messages, or a second or two
> > after this my machine will start to stutter as I mentioned before, and
> > then it either hangs, or I see this message (I'm truncating the
> > timestamp):
> >
> > [   35.xxxx ] sched: RT throttling activated
>
> As this driver uses threaded interrupts, this looks like an interrupt
> storm and the interrupt thread consumes the CPU fully. The RT throttler
> limits its RT runtime, which allows other tasks to make some
> progress. That's what you observe as stutter.
>
> You can apply the hack below so the irq thread(s) run in the SCHED_OTHER
> class which prevents them from monopolizing the CPU. That might make the
> problem simpler to debug.
>
> Thanks,
>
>         tglx
> ---
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index c460e0496006..8473ecacac7a 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1320,7 +1320,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>         if (IS_ERR(t))
>                 return PTR_ERR(t);
>
> -       sched_set_fifo(t);
> +       //sched_set_fifo(t);
>
>         /*
>          * We keep the reference to the task struct even if

I was able to apply this patch and play a little bit.  Unfortunately,
whatever is still going on is mostly the same.  It seems this patch
extends the 'stuttering' I see a little bit, but the end result is
still an unresponsive machine.  I didn't get tons of time to play yet,
so the extra time may make it possible to finally get sysrq-c issued
and get a vmcore dump.  I also tried to replicate a Google Android
patch I found that basically calls BUG() when RT throttling activates
(https://groups.google.com/a/chromium.org/g/chromium-os-reviews/c/NDyPucYrvRY)
but that path hasn't activated for me since I booted it.  I'll
hopefully have a chance again this evening.

end of thread, other threads:[~2020-11-18 10:22 UTC | newest]

Thread overview: 40+ messages
     [not found] <2849fd39-a7a6-8366-7c78-fc9fec4dffa4@posteo.de>
     [not found] ` <87tuuqhc1i.fsf@codeaurora.org>
     [not found]   ` <1ce6f735-21ff-db7e-c8dc-d567761964aa@posteo.de>
2020-11-02 18:49     ` pci_alloc_irq_vectors fails ENOSPC for XPS 13 9310 Kalle Valo
2020-11-02 20:57       ` Bjorn Helgaas
2020-11-03  3:01         ` Carl Huang
2020-11-03  6:49         ` Kalle Valo
2020-11-03 16:08           ` Bjorn Helgaas
2020-11-03 21:08             ` Thomas Gleixner
2020-11-03 22:42               ` Thomas Gleixner
2020-11-09 18:44                 ` Kalle Valo
     [not found]               ` <fa26ac8b-ed48-7ea3-c21b-b133532716b8@posteo.de>
2020-11-04 15:26                 ` Thomas Gleixner
2020-11-05 13:23                   ` Kalle Valo
2020-11-10  8:33                     ` Kalle Valo
2020-11-11  8:53                       ` Thomas Krause
2020-11-11  9:22                         ` Kalle Valo
2020-11-11 19:10                           ` Kalle Valo
2020-11-11 19:24                             ` wi nk
2020-11-11 19:30                               ` wi nk
2020-11-11 19:45                                 ` Kalle Valo
2020-11-11 20:12                                   ` wi nk
2020-11-11 21:35                             ` Stefani Seibold
2020-11-11 22:02                             ` Stefani Seibold
2020-11-12  0:24                               ` wi nk
2020-11-12  1:10                                 ` wi nk
2020-11-12  1:11                                   ` wi nk
2020-11-12  2:31                                     ` wi nk
2020-11-12  6:29                                       ` Carl Huang
2020-11-12  7:05                                   ` Stefani Seibold
2020-11-12  7:15                                     ` Kalle Valo
2020-11-12  7:41                                       ` wi nk
2020-11-12  8:59                                         ` Kalle Valo
2020-11-12 15:44                                           ` wi nk
2020-11-13  9:52                                             ` wi nk
2020-11-15 13:30                                             ` Thomas Krause
2020-11-15 19:55                                               ` wi nk
2020-11-17 15:49                                                 ` wi nk
2020-11-17 20:59                                                   ` Thomas Gleixner
2020-11-18 10:22                                                     ` wi nk
2020-11-11  9:39                         ` Thomas Gleixner
2020-11-06 11:45               ` Devin Bayer
2020-11-09 18:48             ` Kalle Valo
2020-11-03 11:20         ` Devin Bayer