* Enumeration issue with QCA9005 AR9462
From: Bjorn Helgaas @ 2018-08-20 23:06 UTC
To: linux-pci; +Cc: Lukas Wunner, linux-kernel, mmyangfl

mmyangfl@gmail.com reported a problem [1]: on v4.17, a QCA9005 AR9462
wifi device was present at boot, but disappeared after suspend/resume.

He/she also tested a recent kernel (5c60a7389d79, from Thu Aug 16),
where the suspend/resume problem doesn't seem to happen, but the wifi
device isn't enumerated correctly at boot-time.

  pci 0000:00:1c.3: PCIe Root Port to [bus 03-07]
  pci 0000:03:00.0: [1ae9:0101] PCIe Switch Upstream Port to [bus 04-07]
  pci 0000:04:00.0: [1ae9:0200] PCIe Switch Downstream Port to [bus 05]
  pci 0000:05:00.0: [168c:0034] QCA9005 AR9462 wifi NIC

These messages are extracted from [2]:

  [ 0.928714] pciehp 0000:04:00.0:pcie204: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
  [ 0.928752] pciehp 0000:04:00.0:pcie204: Slot(0-1): Card not present
  [ 0.928811] pciehp 0000:04:00.0:pcie204: Slot(0-1): Link Up
  [ 0.928815] pciehp 0000:04:00.0:pcie204: Slot(0-1): No adapter

After manually executing:

  echo 1 > /sys/devices/pci0000\:00/0000\:00\:1c.3/0000\:03\:00.0/0000\:04\:00.0/rescan

the wifi NIC is discovered correctly:

  [ 114.649896] pci 0000:05:00.0: [168c:0034] type 00 class 0x028000
  [ 114.649977] pci 0000:05:00.0: reg 0x10: [mem 0xf7400000-0xf747ffff 64bit]
  [ 114.650090] pci 0000:05:00.0: reg 0x30: [mem 0xf7480000-0xf748ffff pref]

[1] https://bugzilla.kernel.org/show_bug.cgi?id=200839
[2] https://bugzilla.kernel.org/attachment.cgi?id=277923
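The manual rescan in the report writes "1" to the "rescan" attribute of the
switch downstream port in sysfs. A minimal C sketch of the same step follows.
The helper names build_rescan_path() and trigger_rescan() are made up for
illustration and are not part of any kernel or library API; the path layout
is taken from the echo command in the report.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path of a PCI device's "rescan" attribute from the root
 * bus name and the chain of devices leading to the target.
 * Returns 0 on success, -1 if buf is too small.
 * (Hypothetical helper, for illustration only.)
 */
int build_rescan_path(const char *root, const char *const *chain, size_t n,
                      char *buf, size_t size)
{
    int len = snprintf(buf, size, "/sys/devices/%s", root);
    if (len < 0 || (size_t)len >= size)
        return -1;

    for (size_t i = 0; i < n; i++) {
        int r = snprintf(buf + len, size - (size_t)len, "/%s", chain[i]);
        if (r < 0 || (size_t)r >= size - (size_t)len)
            return -1;
        len += r;
    }

    int r = snprintf(buf + len, size - (size_t)len, "/rescan");
    if (r < 0 || (size_t)r >= size - (size_t)len)
        return -1;
    return 0;
}

/*
 * Write "1" to the attribute, as the echo command in the report does.
 * This needs root privileges. Returns 0 on success, -1 on failure.
 */
int trigger_rescan(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs("1", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With root "pci0000:00" and the chain 0000:00:1c.3, 0000:03:00.0, 0000:04:00.0,
build_rescan_path() yields the path used in the report (without the shell
escaping of the colons).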
* Re: Enumeration issue with QCA9005 AR9462
From: Lukas Wunner @ 2018-08-21 5:47 UTC
To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, mmyangfl

On Mon, Aug 20, 2018 at 06:06:24PM -0500, Bjorn Helgaas wrote:
> mmyangfl@gmail.com reported a problem [1]: on v4.17, a QCA9005 AR9462
> wifi device was present at boot, but disappeared after suspend/resume.
>
> He also tested a recent kernel (5c60a7389d79, from Thu Aug 16),
> where the suspend/resume problem doesn't seem to happen, but the wifi
> device isn't enumerated correctly at boot-time.
>
> [ 0.928714] pciehp 0000:04:00.0:pcie204: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> [ 0.928752] pciehp 0000:04:00.0:pcie204: Slot(0-1): Card not present
> [ 0.928811] pciehp 0000:04:00.0:pcie204: Slot(0-1): Link Up
> [ 0.928815] pciehp 0000:04:00.0:pcie204: Slot(0-1): No adapter
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=200839
> [2] https://bugzilla.kernel.org/attachment.cgi?id=277923

The hardware appears to be broken in that the Presence Detect State bit
in the Slot Status register is 0 (Slot Empty) even though the slot is
occupied.

Thus, as of v4.19, pciehp will initially consider the slot to be in
ON_STATE when it probes (because there are enumerated children).
It then looks at the PDS bit, sees that it's 0, believes that there is
no longer anything in the slot and synthesizes a Presence Detect Changed
event to bring down the slot. The IRQ thread then removes the device in
the slot, sees that the link is up, tries to bring the slot up again,
but that fails because __pciehp_enable_slot() complains that the
Presence Detect State bit isn't set ("No adapter").

The slot is then considered to be in OFF_STATE by pciehp, even though
the rescan made the device reappear behind pciehp's back. On resume
from system sleep, pciehp sees that the Presence Detect State bit in
the Slot Status register is still 0, and because it's already in
OFF_STATE, there's nothing to do.

Up until v4.18, an unoccupied slot was only brought down on resume:

    /* Check if slot is occupied */
    pciehp_get_adapter_status(slot, &status);
    mutex_lock(&slot->hotplug_lock);
    if (status)
        pciehp_enable_slot(slot);
    else
        pciehp_disable_slot(slot);
    mutex_unlock(&slot->hotplug_lock);

From v4.19, this is now also done on probe for consistency.

The above hypothesis is confirmed by the lspci -vv output:

    LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt+ ABWMgmt-
                                                            ^^^^^^^^^
    SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                                                     ^^^^^^^^

Possible solutions:

(a) Be lenient towards broken hardware and accept DLActive+ as a proxy
    for PresDet+.

(b) Add a blacklist to pciehp such that it doesn't bind to [1ae9:0200].
    The bug reporter writes that "it's a single Half Mini PCIe card,
    with two chipsets (Wil6110? + AR9462) combined by a PCIe hub".
    This sounds like it's not really hotpluggable.
    (Is Mini PCIe hotplug capable at all?)

Let me go through the driver and see if (a) is feasible and how intrusive
it would be.

Thanks,

Lukas
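The difference between the current behaviour and option (a) can be sketched
as two presence policies over the Slot Status and Link Status registers. This
is a userspace illustration, not the pciehp implementation: the two mask
values are the ones defined in include/uapi/linux/pci_regs.h, while the
function names slot_occupied_strict() and slot_occupied_lenient() are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active (DLActive) */
#define PCI_EXP_SLTSTA_PDS   0x0040 /* Presence Detect State (PresDet) */

/* Strict policy: only the Presence Detect State bit counts as "occupied". */
bool slot_occupied_strict(uint16_t sltsta)
{
    return (sltsta & PCI_EXP_SLTSTA_PDS) != 0;
}

/*
 * Lenient policy, option (a): an active data link is accepted as a proxy
 * for presence when the Presence Detect State bit is clear.
 * (Hypothetical helper, for illustration only.)
 */
bool slot_occupied_lenient(uint16_t sltsta, uint16_t lnksta)
{
    return slot_occupied_strict(sltsta) ||
           (lnksta & PCI_EXP_LNKSTA_DLLLA) != 0;
}
```

For the reported port (PresDet- with DLActive+), the strict policy reports an
empty slot while the lenient one reports an occupied slot. A slot with neither
bit set is empty under both policies.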
* Re: Enumeration issue with QCA9005 AR9462
From: Lukas Wunner @ 2018-08-21 7:25 UTC
To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, mmyangfl

On Tue, Aug 21, 2018 at 07:47:04AM +0200, Lukas Wunner wrote:
> On Mon, Aug 20, 2018 at 06:06:24PM -0500, Bjorn Helgaas wrote:
> > mmyangfl@gmail.com reported a problem [1]: on v4.17, a QCA9005 AR9462
> > wifi device was present at boot, but disappeared after suspend/resume.
> >
> > He also tested a recent kernel (5c60a7389d79, from Thu Aug 16),
> > where the suspend/resume problem doesn't seem to happen, but the wifi
> > device isn't enumerated correctly at boot-time.
> >
> > [ 0.928714] pciehp 0000:04:00.0:pcie204: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> > [ 0.928752] pciehp 0000:04:00.0:pcie204: Slot(0-1): Card not present
> > [ 0.928811] pciehp 0000:04:00.0:pcie204: Slot(0-1): Link Up
> > [ 0.928815] pciehp 0000:04:00.0:pcie204: Slot(0-1): No adapter
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=200839
> > [2] https://bugzilla.kernel.org/attachment.cgi?id=277923
>
> The hardware appears to be broken in that the Presence Detect State bit
> in the Slot Status register is 0 (Slot Empty) even though the slot is
> occupied.
[...]
> Possible solutions:
>
> (a) Be lenient towards broken hardware and accept DLActive+ as a proxy
>     for PresDet+.
>
> (b) Add a blacklist to pciehp such that it doesn't bind to [1ae9:0200].
>     The bug reporter writes that "it's a single Half Mini PCIe card,
>     with two chipsets (Wil6110? + AR9462) combined by a PCIe hub".
>     This sounds like it's not really hotpluggable.
>     (Is Mini PCIe hotplug capable at all?)
>
> Let me go through the driver and see if (a) is feasible and how intrusive
> it would be.

So (a) would seem to be feasible, we could add a quirk for devices like
this to call pciehp_check_link_active() in pciehp_get_adapter_status().

Alternatively, we could generally add a call to pciehp_check_link_active()
in get_adapter_status(), pciehp_check_presence() and pcie_init() and thus
avoid a quirk for this specific device.
The existing call in __pciehp_enable_slot() could actually be removed,
this code path is only entered if either PDS or DLLLA is set.

And the third option would be to add a quirk like quirk_hotplug_bridge()
which sets is_hotplug_bridge = 0 on this broken device such that pciehp
doesn't bind to it in the first place.

Bjorn, please let me know which of these options you'd prefer.

Thanks,

Lukas
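The third option, a quirk that clears is_hotplug_bridge, can be modelled in
userspace as a lookup against a list of known-broken ports. This is only a
sketch of the idea: struct fake_pci_dev stands in for the kernel's
struct pci_dev, and quirk_no_hotplug_bridge() and the table name are
hypothetical. In the kernel such a quirk would be registered as a PCI fixup
rather than called directly. The [1ae9:0200] ID is the downstream port from
the bug report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct pci_dev (hypothetical). */
struct fake_pci_dev {
    uint16_t vendor;
    uint16_t device;
    bool is_hotplug_bridge;
};

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Ports whose Presence Detect State bit is known to be unreliable. */
static const struct pci_id broken_presence_ports[] = {
    { 0x1ae9, 0x0200 }, /* switch downstream port from the bug report */
};

/*
 * Clear is_hotplug_bridge on listed devices so that a hotplug driver
 * would not bind to them. Returns true if the quirk applied.
 */
bool quirk_no_hotplug_bridge(struct fake_pci_dev *dev)
{
    size_t n = sizeof(broken_presence_ports) / sizeof(broken_presence_ports[0]);

    for (size_t i = 0; i < n; i++) {
        if (dev->vendor == broken_presence_ports[i].vendor &&
            dev->device == broken_presence_ports[i].device) {
            dev->is_hotplug_bridge = false;
            return true;
        }
    }
    return false;
}
```

The trade-off against option (a) is that every affected device ID has to be
listed explicitly, whereas treating an active link as presence needs no
per-device table.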
* Re: Enumeration issue with QCA9005 AR9462
From: Bjorn Helgaas @ 2018-08-21 16:50 UTC
To: Lukas Wunner; +Cc: linux-pci, linux-kernel, mmyangfl, Ashok Raj, Rajat Jain

[+cc Rajat, Ashok]

On Tue, Aug 21, 2018 at 09:25:41AM +0200, Lukas Wunner wrote:
> On Tue, Aug 21, 2018 at 07:47:04AM +0200, Lukas Wunner wrote:
> > On Mon, Aug 20, 2018 at 06:06:24PM -0500, Bjorn Helgaas wrote:
> > > mmyangfl@gmail.com reported a problem [1]: on v4.17, a QCA9005 AR9462
> > > wifi device was present at boot, but disappeared after suspend/resume.
> > >
> > > He also tested a recent kernel (5c60a7389d79, from Thu Aug 16),
> > > where the suspend/resume problem doesn't seem to happen, but the wifi
> > > device isn't enumerated correctly at boot-time.
> > >
> > > [ 0.928714] pciehp 0000:04:00.0:pcie204: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> > > [ 0.928752] pciehp 0000:04:00.0:pcie204: Slot(0-1): Card not present
> > > [ 0.928811] pciehp 0000:04:00.0:pcie204: Slot(0-1): Link Up
> > > [ 0.928815] pciehp 0000:04:00.0:pcie204: Slot(0-1): No adapter
> > >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=200839
> > > [2] https://bugzilla.kernel.org/attachment.cgi?id=277923
> >
> > The hardware appears to be broken in that the Presence Detect State bit
> > in the Slot Status register is 0 (Slot Empty) even though the slot is
> > occupied.
> [...]
> > Possible solutions:
> >
> > (a) Be lenient towards broken hardware and accept DLActive+ as a proxy
> >     for PresDet+.
> >
> > (b) Add a blacklist to pciehp such that it doesn't bind to [1ae9:0200].
> >     The bug reporter writes that "it's a single Half Mini PCIe card,
> >     with two chipsets (Wil6110? + AR9462) combined by a PCIe hub".
> >     This sounds like it's not really hotpluggable.
> >     (Is Mini PCIe hotplug capable at all?)
> >
> > Let me go through the driver and see if (a) is feasible and how intrusive
> > it would be.
>
> So (a) would seem to be feasible, we could add a quirk for devices like
> this to call pciehp_check_link_active() in pciehp_get_adapter_status().
>
> Alternatively, we could generally add a call to pciehp_check_link_active()
> in get_adapter_status(), pciehp_check_presence() and pcie_init() and thus
> avoid a quirk for this specific device.
> The existing call in __pciehp_enable_slot() could actually be removed,
> this code path is only entered if either PDS or DLLLA is set.
>
> And the third option would be to add a quirk like quirk_hotplug_bridge()
> which sets is_hotplug_bridge = 0 on this broken device such that pciehp
> doesn't bind to it in the first place.

It sounds like with (a), you could make this work without having a
Wil6110-specific quirk, i.e., if the Link Status says the link is
active, we assume a device is present. That seems reasonable to me
and it sort of fits with these previous changes:

  385895fef6b5 ("PCI: pciehp: Prioritize data-link event over presence detect")
  e48f1b67f668 ("PCI: pciehp: Use link change notifications for hot-plug and removal")

Bjorn
* Re: Enumeration issue with QCA9005 AR9462
From: Rajat Jain @ 2018-08-21 20:24 UTC
To: Bjorn Helgaas
Cc: Lukas Wunner, linux-pci, Linux Kernel Mailing List, mmyangfl, ashok.raj

On Tue, Aug 21, 2018 at 9:50 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Rajat, Ashok]
>
> On Tue, Aug 21, 2018 at 09:25:41AM +0200, Lukas Wunner wrote:
> > On Tue, Aug 21, 2018 at 07:47:04AM +0200, Lukas Wunner wrote:
> > > On Mon, Aug 20, 2018 at 06:06:24PM -0500, Bjorn Helgaas wrote:
> > > > mmyangfl@gmail.com reported a problem [1]: on v4.17, a QCA9005 AR9462
> > > > wifi device was present at boot, but disappeared after suspend/resume.
> > > >
> > > > He also tested a recent kernel (5c60a7389d79, from Thu Aug 16),
> > > > where the suspend/resume problem doesn't seem to happen, but the wifi
> > > > device isn't enumerated correctly at boot-time.
> > > >
> > > > [ 0.928714] pciehp 0000:04:00.0:pcie204: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
> > > > [ 0.928752] pciehp 0000:04:00.0:pcie204: Slot(0-1): Card not present
> > > > [ 0.928811] pciehp 0000:04:00.0:pcie204: Slot(0-1): Link Up
> > > > [ 0.928815] pciehp 0000:04:00.0:pcie204: Slot(0-1): No adapter
> > > >
> > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=200839
> > > > [2] https://bugzilla.kernel.org/attachment.cgi?id=277923
> > >
> > > The hardware appears to be broken in that the Presence Detect State bit
> > > in the Slot Status register is 0 (Slot Empty) even though the slot is
> > > occupied.
> > [...]
> > > Possible solutions:
> > >
> > > (a) Be lenient towards broken hardware and accept DLActive+ as a proxy
> > >     for PresDet+.
> > >
> > > (b) Add a blacklist to pciehp such that it doesn't bind to [1ae9:0200].
> > >     The bug reporter writes that "it's a single Half Mini PCIe card,
> > >     with two chipsets (Wil6110? + AR9462) combined by a PCIe hub".
> > >     This sounds like it's not really hotpluggable.
> > >     (Is Mini PCIe hotplug capable at all?)
> > >
> > > Let me go through the driver and see if (a) is feasible and how intrusive
> > > it would be.
> >
> > So (a) would seem to be feasible, we could add a quirk for devices like
> > this to call pciehp_check_link_active() in pciehp_get_adapter_status().
> >
> > Alternatively, we could generally add a call to pciehp_check_link_active()
> > in get_adapter_status(), pciehp_check_presence() and pcie_init() and thus
> > avoid a quirk for this specific device.
> > The existing call in __pciehp_enable_slot() could actually be removed,
> > this code path is only entered if either PDS or DLLLA is set.
> >
> > And the third option would be to add a quirk like quirk_hotplug_bridge()
> > which sets is_hotplug_bridge = 0 on this broken device such that pciehp
> > doesn't bind to it in the first place.
>
> It sounds like with (a), you could make this work without having a
> Wil6110-specific quirk, i.e., if the Link Status says the link is
> active, we assume a device is present. That seems reasonable to me
> and it sort of fits with these previous changes:

I also like idea (a) and think it makes sense. One thing to note is
that we may pass the same confusion ("how is my wifi detected when no
card is present in the slot") to userspace in case anyone looks in
sysfs (but I don't think anyone looks that deeply).

Thanks,

Rajat

>
>   385895fef6b5 ("PCI: pciehp: Prioritize data-link event over presence detect")
>   e48f1b67f668 ("PCI: pciehp: Use link change notifications for hot-plug and removal")
>
> Bjorn