* QCA6174 pcie wifi: Add pci quirks
From: Ingmar Klein @ 2021-04-09  9:26 UTC
To: bhelgaas, linux-pci, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1107 bytes --]

Edit: Retry, as I did not consider that my mail client would make this
partly HTML.

Dear maintainers,
I recently encountered an issue on my Proxmox server system, which
includes a Qualcomm QCA6174 M.2 PCIe wifi module.
https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX

On system boot and subsequent virtual machine start (with passed-through
QCA6174), the VM would just freeze/hang at the point where the ath10k
driver loads.
A quick search in the Proxmox-related topics brought me to the following
discussion, which suggested a PCI quirk entry for the QCA6174 in the kernel:
https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/

I then went ahead, got the Proxmox kernel source (v5.4.106) and applied
the attached patch.
The effect was as hoped: the VM hangs are now gone. The system boots and
runs as intended.

Judging by the existing quirk entries for Atheros, I would think that
my proposed "fix" could be included in the vanilla kernel.
As far as I saw, there is no entry yet, even in the latest kernel sources.

Thank you very much!
Best regards,
Ingmar

[-- Attachment #2: qualcomm_qca6174_add_pci_quirks.patch --]
[-- Type: text/plain, Size: 645 bytes --]

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 706f27a86a8e..ecfe80ec5b9c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3584,6 +3584,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
 
 /*
  * Root port on some Cavium CN8xxx chips do not successfully complete a bus

^ permalink raw reply related [flat|nested] 18+ messages in thread
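The effect of the quirk entry in the patch can be pictured with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel implementation: fake_pci_dev, FLAG_NO_BUS_RESET, apply_fixups() and
bus_reset_allowed() are hypothetical names invented for illustration. Only
the device IDs come from the diff; 0x168c is the Atheros PCI vendor ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ATHEROS     0x168c     /* Atheros PCI vendor ID */
#define FLAG_NO_BUS_RESET  (1u << 0)  /* hypothetical flag bit for this model */

/* Hypothetical, heavily simplified stand-in for a PCI device structure. */
struct fake_pci_dev {
    uint16_t vendor;
    uint16_t device;
    unsigned int flags;
};

/* One table entry: run hook() on devices matching vendor/device. */
struct fixup {
    uint16_t vendor;
    uint16_t device;
    void (*hook)(struct fake_pci_dev *dev);
};

/* Mark the device so that reset code skips the secondary bus reset. */
static void quirk_no_bus_reset(struct fake_pci_dev *dev)
{
    dev->flags |= FLAG_NO_BUS_RESET;
}

/* Device IDs listed in the diff, including the new 0x003e entry. */
static const struct fixup fixups[] = {
    { VENDOR_ATHEROS, 0x0032, quirk_no_bus_reset },
    { VENDOR_ATHEROS, 0x003c, quirk_no_bus_reset },
    { VENDOR_ATHEROS, 0x0033, quirk_no_bus_reset },
    { VENDOR_ATHEROS, 0x0034, quirk_no_bus_reset },
    { VENDOR_ATHEROS, 0x003e, quirk_no_bus_reset },
};

/* Run every hook whose vendor/device pair matches the device. */
static void apply_fixups(struct fake_pci_dev *dev)
{
    assert(dev != NULL);
    for (size_t i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
        if (fixups[i].vendor == dev->vendor &&
            fixups[i].device == dev->device)
            fixups[i].hook(dev);
    }
}

/* A reset path would consult the flag before attempting a bus reset. */
static int bus_reset_allowed(const struct fake_pci_dev *dev)
{
    return !(dev->flags & FLAG_NO_BUS_RESET);
}
```

In this model, a device with IDs 0x168c:0x003e passed through apply_fixups()
ends up with bus_reset_allowed() returning 0, while a device that is not in
the table is left untouched.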
* Re: QCA6174 pcie wifi: Add pci quirks
From: Bjorn Helgaas @ 2021-04-14 21:03 UTC
To: Ingmar Klein; +Cc: bhelgaas, linux-pci, linux-kernel, Alex Williamson

[+cc Alex]

On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:
> Judging by the existing quirk entries for Atheros, I would think, that
> my proposed "fix" could be included in the vanilla kernel.
> As far as I saw, there is no entry yet, even in the latest kernel sources.

This would need a signed-off-by; see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361

This is an old issue, and likely we'll end up just applying this as
yet another quirk.  But looking at c3e59ee4e766 ("PCI: Mark Atheros
AR93xx to avoid bus reset"), where it started, it seems to be
connected to 425c1b223dac ("PCI: Add Virtual Channel to save/restore
support").

I'd like to dig into that a bit more to see if there are any clues.
AFAIK Linux itself still doesn't use VC at all, and 425c1b223dac added
a fair bit of code.  I wonder if we're restoring something out of
order or making some simple mistake in the way to restore VC config.
* Re: QCA6174 pcie wifi: Add pci quirks
From: Alex Williamson @ 2021-04-15  2:36 UTC
To: Bjorn Helgaas; +Cc: Ingmar Klein, bhelgaas, linux-pci, linux-kernel

On Wed, 14 Apr 2021 16:03:50 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> This is an old issue, and likely we'll end up just applying this as
> yet another quirk.  But looking at c3e59ee4e766 ("PCI: Mark Atheros
> AR93xx to avoid bus reset"), where it started, it seems to be
> connected to 425c1b223dac ("PCI: Add Virtual Channel to save/restore
> support").
>
> I'd like to dig into that a bit more to see if there are any clues.
> AFAIK Linux itself still doesn't use VC at all, and 425c1b223dac added
> a fair bit of code.  I wonder if we're restoring something out of
> order or making some simple mistake in the way to restore VC config.

I don't really have any faith in that bisect report in commit
c3e59ee4e766.  To double-check, I dug out the card from that commit,
installed an old Fedora release so I could build kernel v3.13 (which
predates 425c1b223dac), and tested triggering a bus reset both via
setpci and by masking PM reset so that sysfs can trigger the bus reset
path with the kernel save/restore code.  Both result in the system
hanging when the device is accessed, either while restoring state after
the kernel bus reset or when reading from the device after the setpci
reset.  Thanks,

Alex
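A bus reset of the kind tested above is driven by the Secondary Bus Reset
bit (bit 6, mask 0x40) in the Bridge Control register at config offset 0x3E
of the upstream bridge. The following is a minimal sketch that toggles that
bit in an in-memory copy of a bridge's config space. The helper names
(cfg_read16, cfg_write16, set_secondary_bus_reset) are hypothetical, and the
sketch deliberately leaves out the delays real hardware needs between
asserting and deasserting the reset and before the first access afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BRIDGE_CONTROL_OFFSET  0x3E    /* Bridge Control register (type 1 header) */
#define BRIDGE_CTL_BUS_RESET   0x0040  /* Secondary Bus Reset bit */

/* Read a little-endian 16-bit value from a config space buffer. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Write a little-endian 16-bit value into a config space buffer. */
static void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t val)
{
    cfg[off] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Assert (on != 0) or deassert (on == 0) the secondary bus reset,
 * leaving all other Bridge Control bits unchanged. */
static void set_secondary_bus_reset(uint8_t *cfg, int on)
{
    assert(cfg != NULL);
    uint16_t ctl = cfg_read16(cfg, BRIDGE_CONTROL_OFFSET);

    if (on)
        ctl = (uint16_t)(ctl | BRIDGE_CTL_BUS_RESET);
    else
        ctl = (uint16_t)(ctl & ~BRIDGE_CTL_BUS_RESET);
    cfg_write16(cfg, BRIDGE_CONTROL_OFFSET, ctl);
}
```

Because the helper uses read-modify-write, other Bridge Control bits keep
their values while the reset bit is set and cleared again.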
* Re: QCA6174 pcie wifi: Add pci quirks
From: Ingmar Klein @ 2021-04-15 18:02 UTC
To: Alex Williamson, Bjorn Helgaas; +Cc: bhelgaas, linux-pci, linux-kernel

First, thanks to you both, Alex and Bjorn!
I am in no way an expert on this topic, so I have to rely fully on your
feedback concerning this issue.

If you have any other approach to a solution, in the form of a patch set,
I would be glad to test it. Just let me know what you think might
make sense.
I will wait for your further feedback on the issue. In the meantime I
have my current workaround via the quirk entry.

By the way, a layman's question:
Do you think that the following topic might also apply to the QCA6174?
https://www.spinics.net/lists/linux-pci/msg106395.html
In other words, should a similar approach be tried for the QCA6174,
and if so, would it bring any benefit at all?
I hope you can excuse me if the questions do not make much sense.

Best regards,
Ingmar
* Re: QCA6174 pcie wifi: Add pci quirks
From: Alex Williamson @ 2021-04-15 19:01 UTC
To: Ingmar Klein; +Cc: Bjorn Helgaas, bhelgaas, linux-pci, linux-kernel, Pali Rohár

[cc +Pali]

On Thu, 15 Apr 2021 20:02:23 +0200
Ingmar Klein <ingmar_klein@web.de> wrote:

> By the way, my layman's question:
> Do you think, that the following topic might also apply for the QCA6174?
> https://www.spinics.net/lists/linux-pci/msg106395.html
> Or in other words, should a similar approach be tried for the QCA6174
> and if yes, would it bring any benefit at all?
> I hope you can excuse me, in case the questions should not make too much
> sense.

If you run lspci -vvv on your device, what do LnkCap and LnkSta report
under the express capability?  I wonder if your device even supports
speeds above Gen1; mine does not.

I would not expect that patch to be relevant to you based on your
report.  I understand it to resolve an issue during link retraining to a
higher speed on boot, not during a bus reset.  Pali can correct me if I'm
wrong.  Thanks,

Alex
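The LnkCap and LnkSta values asked about above come from the PCI Express
Link Capabilities and Link Status registers, where bits 3:0 hold the link
speed and bits 9:4 the link width. The following is a minimal sketch of
decoding those fields from raw register values. The function names are
hypothetical, and the speed table assumes the conventional encoding
(1 = 2.5GT/s, 2 = 5GT/s, and so on).

```c
#include <assert.h>
#include <stdint.h>

/* Bits 3:0: Max Link Speed (LnkCap) or Current Link Speed (LnkSta). */
static unsigned int link_speed_code(uint32_t reg)
{
    return reg & 0xF;
}

/* Bits 9:4: Maximum Link Width (LnkCap) or Negotiated Link Width (LnkSta). */
static unsigned int link_width(uint32_t reg)
{
    return (reg >> 4) & 0x3F;
}

/* Map a speed code to a readable name (conventional encoding assumed). */
static const char *link_speed_name(unsigned int code)
{
    assert(code <= 0xF);
    switch (code) {
    case 1: return "2.5GT/s";  /* Gen1 */
    case 2: return "5GT/s";    /* Gen2 */
    case 3: return "8GT/s";    /* Gen3 */
    case 4: return "16GT/s";   /* Gen4 */
    case 5: return "32GT/s";   /* Gen5 */
    default: return "unknown";
    }
}

/* True if the device advertises any speed above Gen1 in LnkCap. */
static int supports_above_gen1(uint32_t lnkcap)
{
    return link_speed_code(lnkcap) > 1;
}
```

For a device whose LnkCap speed field is 1, supports_above_gen1() returns 0,
which corresponds to the "Gen1 only" case mentioned in the reply.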
* Re: QCA6174 pcie wifi: Add pci quirks
From: Pali Rohár @ 2021-04-15 19:53 UTC
To: Alex Williamson; +Cc: Ingmar Klein, Bjorn Helgaas, bhelgaas, linux-pci, linux-kernel

Hello!

On Thursday 15 April 2021 13:01:19 Alex Williamson wrote:
> On Thu, 15 Apr 2021 20:02:23 +0200
> Ingmar Klein <ingmar_klein@web.de> wrote:
> > Do you think, that the following topic might also apply for the QCA6174?
> > https://www.spinics.net/lists/linux-pci/msg106395.html

I have been testing more ath cards and I'm going to send a new version
of this patch that includes more PCI IDs.

> I would not expect that patch to be relevant to you based on your
> report.  I understand it to resolve an issue during link retraining to a
> higher speed on boot, not during a bus reset.  Pali can correct if I'm
> wrong.  Thanks,

These two issues are related. Both operations (PCIe Hot Reset and
PCIe Link Retraining) cause a reset of ath chips, and it seems that they
cause a double reset. After a reset, these chips read their configuration
from internal EEPROM/OTP, and if another reset is triggered before the
chip finishes reading its internal configuration, it stops working. My
testing showed that ath10k chips completely disappear from the PCIe bus,
some ath9k chips work fine but start reporting an incorrect PCI ID
(0xABCD), and some other ath9k chips report the correct PCI ID but do not
work. I had a discussion with Adrian Chadd, who probably knows everything
about ath9k, and he confirmed to me that this issue exists with ath9k and
ath10k chips.

He wrote to me that the workaround to bring a card back from this
"broken" state is to do a PCIe Cold Reset of the card, which means turning
off the power supply for the particular PCIe slot. This is not supported
on many low-end boards, so the workaround cannot be applied there.

I was able to recover my test cards from this "broken" state with a PCIe
Warm Reset (= reset via the PERST# pin).

I have tried many other reset methods (PCIe PM reset, Link Down, PCIe
Hot Reset with a bigger interval, ...) but nothing worked. So it seems
that the only workaround is to do a PCIe Cold Reset or PCIe Warm Reset.

I will send V2 of my patch with details and an explanation.

As the kernel does not have an API for doing a PCIe Warm Reset, I think
this is another argument for why the kernel really needs one.

I do not have any QCA6174 card for testing, but based on the fact that I
reproduced this issue with more ath9k and ath10k cards and Adrian
confirmed that the above reset issue is there, I think that it affects all
AR9xxx and QCAxxxx cards handled by the ath9k and ath10k drivers.

I was told that AMI patched the BIOSes found in notebooks to avoid
triggering this issue with the notebooks' ath9k cards.
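Two of the failure modes described above can in principle be spotted from
the ID registers read back after a reset. The following is a rough sketch
of such a check, under assumptions: the names id_state and classify_id are
hypothetical, a device that has dropped off the bus is assumed to read back
as all ones (0xFFFF), and the bogus 0xABCD value is assumed to show up in
the device ID field. The third failure mode, a correct ID on a chip that
still does not work, cannot be detected this way.

```c
#include <assert.h>
#include <stdint.h>

enum id_state {
    ID_OK,           /* IDs match what was seen before the reset */
    ID_DEVICE_GONE,  /* config reads return all ones: nothing responds */
    ID_BOGUS         /* device responds, but with unexpected IDs */
};

/* Classify the vendor/device IDs read back after a reset against the
 * IDs the device reported before the reset. */
static enum id_state classify_id(uint16_t vendor, uint16_t device,
                                 uint16_t expected_vendor,
                                 uint16_t expected_device)
{
    /* 0xFFFF is not a valid vendor ID, so it cannot be "expected". */
    assert(expected_vendor != 0xFFFF);

    if (vendor == 0xFFFF)
        return ID_DEVICE_GONE;
    if (vendor != expected_vendor || device != expected_device)
        return ID_BOGUS;
    return ID_OK;
}
```

With the IDs from the proposed quirk entry (0x168c:0x003e) as the expected
values, a readback of 0xFFFF:0xFFFF is classified as ID_DEVICE_GONE and a
readback of 0x168c:0xABCD as ID_BOGUS.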
* Re: QCA6174 pcie wifi: Add pci quirks
From: Bjorn Helgaas @ 2021-05-25 22:12 UTC
To: Pali Rohár; +Cc: Alex Williamson, Ingmar Klein, bhelgaas, linux-pci, linux-kernel

On Thu, Apr 15, 2021 at 09:53:38PM +0200, Pali Rohár wrote:
> I have been testing more ath cards and I'm going to send a new version
> of this patch with including more PCI ids.

Dropping this patch in favor of Pali's new version.
* Re: QCA6174 pcie wifi: Add pci quirks
  2021-05-25 22:12 ` Bjorn Helgaas
@ 2021-05-28 18:08 ` Ingmar Klein
  2021-05-28 18:21 ` Pali Rohár
  2021-07-21 8:54 ` Pali Rohár
  1 sibling, 1 reply; 18+ messages in thread
From: Ingmar Klein @ 2021-05-28 18:08 UTC (permalink / raw)
  To: Bjorn Helgaas, Pali Rohár
  Cc: Alex Williamson, bhelgaas, linux-pci, linux-kernel

Thanks to both of you, Bjorn and Pali!
I had hoped that Pali would come with an appropriate fix. Good to know,
that this is taken care of.

Will test ASAP, but I am confident, that it will work anyway.
Should it unexpectedly not fix my issues, I'll let you know.
Have a nice weekend!
Best regards,
Ingmar


On 26.05.2021 at 00:12, Bjorn Helgaas wrote:
> On Thu, Apr 15, 2021 at 09:53:38PM +0200, Pali Rohár wrote:
>> Hello!
>>
>> On Thursday 15 April 2021 13:01:19 Alex Williamson wrote:
>>> [cc +Pali]
>>>
>>> On Thu, 15 Apr 2021 20:02:23 +0200
>>> Ingmar Klein <ingmar_klein@web.de> wrote:
>>>
>>>> First thanks to you both, Alex and Bjorn!
>>>> I am in no way an expert on this topic, so I have to fully rely on your
>>>> feedback, concerning this issue.
>>>>
>>>> If you should have any other solution approach, in form of patch-set, I
>>>> would be glad to test it out. Just let me know, what you think might
>>>> make sense.
>>>> I will wait for your further feedback on the issue. In the meantime I
>>>> have my current workaround via quirk entry.
>>>>
>>>> By the way, my layman's question:
>>>> Do you think, that the following topic might also apply for the QCA6174?
>>>> https://www.spinics.net/lists/linux-pci/msg106395.html
>> I have been testing more ath cards and I'm going to send a new version
>> of this patch with including more PCI ids.
> Dropping this patch in favor of Pali's new version.
>
>>>> Or in other words, should a similar approach be tried for the QCA6174
>>>> and if yes, would it bring any benefit at all?
>>>> I hope you can excuse me, in case the questions should not make too much
>>>> sense.
>>> If you run lspci -vvv on your device, what do LnkCap and LnkSta report
>>> under the express capability? I wonder if your device even supports
>>> >Gen1 speeds, mine does not.
>>> I would not expect that patch to be relevant to you based on your
>>> report. I understand it to resolve an issue during link retraining to a
>>> higher speed on boot, not during a bus reset. Pali can correct if I'm
>>> wrong. Thanks,
>> These two issues are related. Both operations (PCIe Hot Reset and
>> PCIe Link Retraining) cause reset of ath chips. Seems that they cause
>> double reset. After reset these chips reads configuration from internal
>> EEPROM/OTP and if another reset is triggered prior chip finishes
>> internal configuration read then it stops working. My testing showed
>> that ath10k chips completely disappear from the PCIe bus, some ath9k
>> chips works fine but starts reporting incorrect PCI ID (0xABCD) and some
>> other ath9k chips reports correct PCI ID but does not work. I had
>> discussion with Adrian Chadd who knows probably everything about ath9k
>> and confirmed me that this issue is there with ath9k and ath10k chips.
>>
>> He wrote me that workaround to turn card back from this "broken" state
>> is to do PCIe Cold Reset of the card, which means turning power supply
>> off for particular PCIe slot. Such thing is not supported on many
>> low-end boards, so workaround cannot be applied.
>>
>> I was able to recover my testing cards from this "broken" state by PCIe
>> Warm Reset (= reset via PERST# pin).
>>
>> I have tried many other reset methods (PCIe PM reset, Link Down, PCIe
>> Hot Reset with bigger interval, ...) but nothing worked. So seems that
>> the only workaround is to do PCIe Cold Reset or PCIe Warm Reset.
>>
>> I will send V2 of my patch with details and explanation.
>>
>> As kernel does not have API for doing PCIe Warm Reset, I think is
>> another argument why kernel really needs it.
>>
>> I do not have any QCA6174 card for testing, but based on the fact I
>> reproduced this issue with more ath9k and ath10k cards and Adrian
>> confirmed that above reset issue is there, I think that it affects all
>> AR9xxx and QCAxxxx cards handled by ath9k and ath10k drivers.
>>
>> I was told that AMI BIOS was patching their BIOSes found in notebooks to
>> avoid triggering this issue on notebooks ath9k cards.
>>
>>> Alex
>>>
>>>> On 15.04.2021 at 04:36, Alex Williamson wrote:
>>>>> On Wed, 14 Apr 2021 16:03:50 -0500
>>>>> Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>>>
>>>>>> [+cc Alex]
>>>>>>
>>>>>> On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:
>>>>>>> Edit: Retry, as I did not consider, that my mail-client would make this
>>>>>>> partly html.
>>>>>>>
>>>>>>> Dear maintainers,
>>>>>>> I recently encountered an issue on my Proxmox server system, that
>>>>>>> includes a Qualcomm QCA6174 m.2 PCIe wifi module.
>>>>>>> https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX
>>>>>>>
>>>>>>> On system boot and subsequent virtual machine start (with passed-through
>>>>>>> QCA6174), the VM would just freeze/hang, at the point where the ath10k
>>>>>>> driver loads.
>>>>>>> Quick search in the proxmox related topics, brought me to the following
>>>>>>> discussion, which suggested a PCI quirk entry for the QCA6174 in the kernel:
>>>>>>> https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/
>>>>>>>
>>>>>>> I then went ahead, got the Proxmox kernel source (v5.4.106) and applied
>>>>>>> the attached patch.
>>>>>>> Effect was as hoped, that the VM hangs are now gone. System boots and
>>>>>>> runs as intended.
>>>>>>>
>>>>>>> Judging by the existing quirk entries for Atheros, I would think, that
>>>>>>> my proposed "fix" could be included in the vanilla kernel.
>>>>>>> As far as I saw, there is no entry yet, even in the latest kernel sources.
>>>>>> This would need a signed-off-by; see
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361
>>>>>>
>>>>>> This is an old issue, and likely we'll end up just applying this as
>>>>>> yet another quirk. But looking at c3e59ee4e766 ("PCI: Mark Atheros
>>>>>> AR93xx to avoid bus reset"), where it started, it seems to be
>>>>>> connected to 425c1b223dac ("PCI: Add Virtual Channel to save/restore
>>>>>> support").
>>>>>>
>>>>>> I'd like to dig into that a bit more to see if there are any clues.
>>>>>> AFAIK Linux itself still doesn't use VC at all, and 425c1b223dac added
>>>>>> a fair bit of code. I wonder if we're restoring something out of
>>>>>> order or making some simple mistake in the way to restore VC config.
>>>>> I don't really have any faith in that bisect report in commit
>>>>> c3e59ee4e766. To double check I dug out the card from that commit,
>>>>> installed an old Fedora release so I could build kernel v3.13,
>>>>> pre-dating 425c1b223dac and tested triggering a bus reset both via
>>>>> setpci and by masking PM reset so that sysfs can trigger the bus reset
>>>>> path with the kernel save/restore code. Both result in the system
>>>>> hanging when the device is accessed either restoring from the kernel
>>>>> bus reset or reading from the device after the setpci reset. Thanks,
>>>>>
>>>>> Alex
>>>>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: QCA6174 pcie wifi: Add pci quirks
  2021-05-28 18:08 ` Ingmar Klein
@ 2021-05-28 18:21 ` Pali Rohár
  2021-05-28 18:47 ` Ingmar Klein
  0 siblings, 1 reply; 18+ messages in thread
From: Pali Rohár @ 2021-05-28 18:21 UTC (permalink / raw)
  To: Ingmar Klein
  Cc: Bjorn Helgaas, Alex Williamson, bhelgaas, linux-pci, linux-kernel

Hello Ingmar!

Now I see that in your patch you have Atheros card with id 0x003e:
https://lore.kernel.org/linux-pci/08982e05-b6e8-5a8d-24ab-da1488ee50a8@web.de/

With my patch I have tested 5 different Atheros cards but none has id 0x003e:
https://lore.kernel.org/linux-pci/20210505163357.16012-1-pali@kernel.org/

So my patch does not fix that issue for your 0x003e card. I just do not
have such card for testing.

Could you try to apply my patch and then add your id 0x003e into quirk
list if it helps?

On Friday 28 May 2021 20:08:52 Ingmar Klein wrote:
> Thanks to both of you, Bjorn and Pali!
> I had hoped that Pali would come with an appropriate fix. Good to know,
> that this is taken care of.
>
> Will test ASAP, but I am confident, that it will work anyway.
> Should it unexpectedly not fix my issues, I'll let you know.
> Have a nice weekend!
> Best regards,
> Ingmar
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: QCA6174 pcie wifi: Add pci quirks
  2021-05-28 18:21 ` Pali Rohár
@ 2021-05-28 18:47 ` Ingmar Klein
  2021-06-05 14:46 ` Ingmar Klein
  0 siblings, 1 reply; 18+ messages in thread
From: Ingmar Klein @ 2021-05-28 18:47 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Bjorn Helgaas, Alex Williamson, bhelgaas, linux-pci, linux-kernel

Hi Pali,
sorry for not checking that detail!
Of course no problem that you couldn't test that ID. Will be glad to do so.

I'll let you know how this turns out.

Best regards,
Ingmar


On 28.05.2021 at 20:21, Pali Rohár wrote:
> Hello Ingmar!
>
> Now I see that in your patch you have Atheros card with id 0x003e:
> https://lore.kernel.org/linux-pci/08982e05-b6e8-5a8d-24ab-da1488ee50a8@web.de/
>
> With my patch I have tested 5 different Atheros cards but none has id 0x003e:
> https://lore.kernel.org/linux-pci/20210505163357.16012-1-pali@kernel.org/
>
> So my patch does not fix that issue for your 0x003e card. I just do not
> have such card for testing.
>
> Could you try to apply my patch and then add your id 0x003e into quirk
> list if it helps?
^ permalink raw reply [flat|nested] 18+ messages in thread
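Pali's request above to add id 0x003e to the quirk list amounts to extending a table of (vendor, device) pairs for which a bus reset must be avoided. In the kernel this is done with `DECLARE_PCI_FIXUP_HEADER(..., quirk_no_bus_reset)` entries in drivers/pci/quirks.c. The block below is only a standalone userspace sketch of that lookup idea, not the kernel mechanism: the struct, table, and function names are hypothetical, and the table holds just the four Atheros ids visible in the patch context at the top of the thread plus the proposed 0x003e.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PCI vendor id of Atheros (PCI_VENDOR_ID_ATHEROS in the kernel). */
#define VENDOR_ID_ATHEROS 0x168c

/* Hypothetical table entry: one device that must not get a bus reset. */
struct no_bus_reset_entry {
	uint16_t vendor;
	uint16_t device;
};

/* Ids taken from the patch context shown in this thread, plus 0x003e. */
static const struct no_bus_reset_entry no_bus_reset_ids[] = {
	{ VENDOR_ID_ATHEROS, 0x0032 },
	{ VENDOR_ID_ATHEROS, 0x003c },
	{ VENDOR_ID_ATHEROS, 0x0033 },
	{ VENDOR_ID_ATHEROS, 0x0034 },
	{ VENDOR_ID_ATHEROS, 0x003e }, /* QCA6174, the id proposed here */
};

/* Returns true if the (vendor, device) pair is listed in the table. */
static bool device_avoids_bus_reset(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(no_bus_reset_ids) / sizeof(no_bus_reset_ids[0]);

	for (size_t i = 0; i < n; i++) {
		if (no_bus_reset_ids[i].vendor == vendor &&
		    no_bus_reset_ids[i].device == device)
			return true;
	}
	return false;
}
```

Calling `device_avoids_bus_reset(0x168c, 0x003e)` returns true once the last entry is present and false without it, which mirrors the one-line change in the original patch.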
* Re: QCA6174 pcie wifi: Add pci quirks
  2021-05-28 18:47 ` Ingmar Klein
@ 2021-06-05 14:46 ` Ingmar Klein
  2021-06-08 18:34 ` Pali Rohár
  0 siblings, 1 reply; 18+ messages in thread
From: Ingmar Klein @ 2021-06-05 14:46 UTC (permalink / raw)
  To: Pali Rohár, bhelgaas, Bjorn Helgaas
  Cc: Alex Williamson, linux-pci, linux-kernel

Hi Pali and Bjorn,

finally found the time to test.
Pali's v3 patch seems to work like a charm for my card with "0x003e" id
as well.
Just finished compiling a pve-kernel v5.11.21 with Pali's patch,
slightly adjusted for my test card and the Ubuntu kernel source (no
functional differences, just minor adjustments to make it fit the
Proxmox pve-kernel).

System works just fine, in contrast to without patch. Of course, no long
term tests, yet. However, it is looking really good.
Thanks guys!

Best regards,
Ingmar


On 28.05.2021 at 20:47, Ingmar Klein wrote:
> Hi Pali,
> sorry for not checking that detail!
> Of course no problem that you couldn't test that ID. Will be glad to
> do so.
>
> I'll let you know how this turns out.
>
> Best regards,
> Ingmar
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: QCA6174 pcie wifi: Add pci quirks
From: Pali Rohár @ 2021-06-08 18:34 UTC
To: Ingmar Klein
Cc: bhelgaas, Bjorn Helgaas, Alex Williamson, linux-pci, linux-kernel

Hello! So should I also add the 0x003e device ID in the next patch iteration?

On Saturday 05 June 2021 16:46:36 Ingmar Klein wrote:
> Hi Pali and Bjorn,
>
> finally found the time to test.
> Pali's v3 patch seems to work like a charm for my card with "0x003e" id
> as well.
> Just finished compiling a pve-kernel v5.11.21 with Pali's patch,
> slightly adjusted for my test card and the Ubuntu kernel source (no
> functional differences, just minor adjustments to make it fit the
> Proxmox pve-kernel).
>
> System works just fine, in contrast to without patch. Of course, no long
> term tests, yet. However, it is looking really good.
> Thanks guys!
>
> Best regards,
> Ingmar
* Re: QCA6174 pcie wifi: Add pci quirks
From: Ingmar Klein @ 2021-06-09 17:07 UTC
To: Pali Rohár
Cc: bhelgaas, Bjorn Helgaas, Alex Williamson, linux-pci, linux-kernel

Yes, it would be really nice if you could do that.
It seems to work perfectly fine.
Thanks and have a nice rest of the day!

Best regards,
Ingmar

On 08.06.2021 at 20:34, Pali Rohár wrote:
> Hello! So should I add also 0x003e device id in next patch iteration?
* Re: QCA6174 pcie wifi: Add pci quirks
From: Pali Rohár @ 2021-07-21 8:54 UTC
To: Bjorn Helgaas
Cc: Alex Williamson, Ingmar Klein, bhelgaas, linux-pci, linux-kernel

On Tuesday 25 May 2021 17:12:15 Bjorn Helgaas wrote:
> On Thu, Apr 15, 2021 at 09:53:38PM +0200, Pali Rohár wrote:
> > Hello!
> >
> > On Thursday 15 April 2021 13:01:19 Alex Williamson wrote:
> > > [cc +Pali]
> > >
> > > On Thu, 15 Apr 2021 20:02:23 +0200
> > > Ingmar Klein <ingmar_klein@web.de> wrote:
> > >
> > > > First thanks to you both, Alex and Bjorn!
> > > > I am in no way an expert on this topic, so I have to fully rely on your
> > > > feedback, concerning this issue.
> > > >
> > > > If you should have any other solution approach, in form of patch-set, I
> > > > would be glad to test it out. Just let me know, what you think might
> > > > make sense.
> > > > I will wait for your further feedback on the issue. In the meantime I
> > > > have my current workaround via quirk entry.
> > > >
> > > > By the way, my layman's question:
> > > > Do you think, that the following topic might also apply for the QCA6174?
> > > > https://www.spinics.net/lists/linux-pci/msg106395.html
> >
> > I have been testing more ath cards and I'm going to send a new version
> > of this patch with including more PCI ids.
>
> Dropping this patch in favor of Pali's new version.

Hello Bjorn! It seems that it would take much more time to finish my
version of the patch. So could you take Ingmar's patch with a cc:stable tag
for now, which just adds the PCI device ID to the list of problematic devices?
* Re: QCA6174 pcie wifi: Add pci quirks
From: Pali Rohár @ 2021-08-20 23:22 UTC
To: Bjorn Helgaas
Cc: Alex Williamson, Ingmar Klein, bhelgaas, linux-pci, linux-kernel

On Wednesday 21 July 2021 10:54:53 Pali Rohár wrote:
> Hello Bjorn! Seems that it would take much more time to finish my
> version of patch. So could you take Ingmar's patch with cc:stable tag
> for now, which just adds PCI device id into list of problematic devices?

Ping, gentle reminder...
* Re: QCA6174 pcie wifi: Add pci quirks
From: Ingmar Klein @ 2021-09-08 19:18 UTC
To: Pali Rohár, Bjorn Helgaas
Cc: Alex Williamson, bhelgaas, linux-pci, linux-kernel

On 21.08.2021 at 01:22, Pali Rohár wrote:
> On Wednesday 21 July 2021 10:54:53 Pali Rohár wrote:
>> Hello Bjorn! Seems that it would take much more time to finish my
>> version of patch. So could you take Ingmar's patch with cc:stable tag
>> for now, which just adds PCI device id into list of problematic devices?
> Ping, gentle reminder...

Hi Pali and Bjorn,

here is the original trivial patch again.
For the moment, this could do.

Thank you!
Best regards,
Ingmar

[-- Attachment #2: qualcomm_qca6174_add_pci_quirks_signed_off.patch --]
[-- Type: text/plain, Size: 696 bytes --]

Signed-off-by: Ingmar Klein <ingmar_klein@web.de>
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 706f27a86a8e..ecfe80ec5b9c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3584,6 +3584,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
 
 /*
  * Root port on some Cavium CN8xxx chips do not successfully complete a bus
* Re: QCA6174 pcie wifi: Add pci quirks
From: Pali Rohár @ 2021-09-08 20:35 UTC
To: Ingmar Klein
Cc: Bjorn Helgaas, Alex Williamson, bhelgaas, linux-pci, linux-kernel

On Wednesday 08 September 2021 21:18:00 Ingmar Klein wrote:
> Hi Pali and Bjorn,
>
> here is the original trivial patch again.
> For the moment, this could do.
>
> Thank you!
> Best regards,
> Ingmar
>
> Signed-off-by: Ingmar Klein <ingmar_klein@web.de>

Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: stable@vger.kernel.org

> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 706f27a86a8e..ecfe80ec5b9c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3584,6 +3584,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
>
>  /*
>   * Root port on some Cavium CN8xxx chips do not successfully complete a bus
* Re: QCA6174 pcie wifi: Add pci quirks
From: Bjorn Helgaas @ 2021-09-14 21:11 UTC
To: Ingmar Klein
Cc: bhelgaas, linux-pci, linux-kernel

On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:
> Edit: Retry, as I did not consider, that my mail-client would make this
> party html.
>
> Dear maintainers,
> I recently encountered an issue on my Proxmox server system, that
> includes a Qualcomm QCA6174 m.2 PCIe wifi module.
> https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX
>
> On system boot and subsequent virtual machine start (with passed-through
> QCA6174), the VM would just freeze/hang, at the point where the ath10k
> driver loads.
> Quick search in the proxmox related topics, brought me to the following
> discussion, which suggested a PCI quirk entry for the QCA6174 in the kernel:
> https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/
>
> I then went ahead, got the Proxmox kernel source (v5.4.106) and applied
> the attached patch.
> Effect was as hoped, that the VM hangs are now gone. System boots and
> runs as intended.
>
> Judging by the existing quirk entries for Atheros, I would think, that
> my proposed "fix" could be included in the vanilla kernel.
> As far as I saw, there is no entry yet, even in the latest kernel sources.
>
> Thank you very much!
> Best regards,
> Ingmar

I wrote a commit log and applied this to pci/virtualization for v5.16
with Pali's reviewed-by and your signed-off-by from
https://lore.kernel.org/r/408e5b45-3eaa-fa13-318d-48f7d1ecdacf@web.de

    PCI: Mark Atheros QCA6174 to avoid bus reset

    When passing the Atheros QCA6174 through to a virtual machine, the VM
    hangs at the point where the ath10k driver loads.

    Add a quirk to avoid bus resets on this device, which avoids the hang.

    [bhelgaas: commit log]
    Link: https://lore.kernel.org/r/08982e05-b6e8-5a8d-24ab-da1488ee50a8@web.de
    Signed-off-by: Ingmar Klein <ingmar_klein@web.de>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Pali Rohár <pali@kernel.org>
    Cc: stable@vger.kernel.org

Thank you!

> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 706f27a86a8e..ecfe80ec5b9c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3584,6 +3584,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
>
>  /*
>   * Root port on some Cavium CN8xxx chips do not successfully complete a bus
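
For readers unfamiliar with the fixup mechanism, the effect of the one-line
patch can be approximated with a small user-space model. This is only a
sketch and not kernel code: the structs, the table, and every model_* name
are hypothetical stand-ins, and the flag value is illustrative. It assumes
that the kernel matches each device against the registered (vendor, device)
pairs, runs the matching hook, and later declines a bus reset for any device
that carries the no-bus-reset flag. The device IDs in the table are the ones
visible in the diff above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants for this model only (assumptions, not kernel headers). */
#define MODEL_VENDOR_ID_ATHEROS  0x168c
#define MODEL_FLAG_NO_BUS_RESET  (1u << 6)

/* Hypothetical stand-in for the few device fields the quirk touches. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int dev_flags;
};

/* One table entry: which device it matches and which hook to run. */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_pci_dev *dev);
};

/* Models the quirk hook: it only marks the device, it resets nothing. */
static void model_quirk_no_bus_reset(struct model_pci_dev *dev)
{
	dev->dev_flags |= MODEL_FLAG_NO_BUS_RESET;
}

/* The entries from the diff context, plus the newly added 0x003e. */
static const struct model_fixup model_fixups[] = {
	{ MODEL_VENDOR_ID_ATHEROS, 0x0032, model_quirk_no_bus_reset },
	{ MODEL_VENDOR_ID_ATHEROS, 0x003c, model_quirk_no_bus_reset },
	{ MODEL_VENDOR_ID_ATHEROS, 0x0033, model_quirk_no_bus_reset },
	{ MODEL_VENDOR_ID_ATHEROS, 0x0034, model_quirk_no_bus_reset },
	{ MODEL_VENDOR_ID_ATHEROS, 0x003e, model_quirk_no_bus_reset },
};

/* Run every hook whose vendor/device pair matches the given device. */
static void model_apply_header_fixups(struct model_pci_dev *dev)
{
	size_t count = sizeof(model_fixups) / sizeof(model_fixups[0]);

	for (size_t i = 0; i < count; i++) {
		if (model_fixups[i].vendor == dev->vendor &&
		    model_fixups[i].device == dev->device)
			model_fixups[i].hook(dev);
	}
}

/* A reset path would consult the flag before attempting a bus reset. */
static bool model_bus_reset_allowed(const struct model_pci_dev *dev)
{
	return !(dev->dev_flags & MODEL_FLAG_NO_BUS_RESET);
}
```

In this model, a device with vendor 0x168c and device 0x003e passed through
model_apply_header_fixups() ends up with the flag set, so
model_bus_reset_allowed() returns false for it, while an unlisted device is
left untouched. That mirrors why the patch needs no new logic, only a new
table entry.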
Thread overview: 18+ messages
[not found] <d74205a4-8a69-c383-e265-1ed5b8508422@web.de>
2021-04-09  9:26 ` QCA6174 pcie wifi: Add pci quirks Ingmar Klein
2021-04-14 21:03 ` Bjorn Helgaas
2021-04-15  2:36 ` Alex Williamson
2021-04-15 18:02 ` Ingmar Klein
2021-04-15 19:01 ` Alex Williamson
2021-04-15 19:53 ` Pali Rohár
2021-05-25 22:12 ` Bjorn Helgaas
2021-05-28 18:08 ` Ingmar Klein
2021-05-28 18:21 ` Pali Rohár
2021-05-28 18:47 ` Ingmar Klein
2021-06-05 14:46 ` Ingmar Klein
2021-06-08 18:34 ` Pali Rohár
2021-06-09 17:07 ` Ingmar Klein
2021-07-21  8:54 ` Pali Rohár
2021-08-20 23:22 ` Pali Rohár
2021-09-08 19:18 ` Ingmar Klein
2021-09-08 20:35 ` Pali Rohár
2021-09-14 21:11 ` Bjorn Helgaas