* [PATCH] PCI: pciehp: Fix false command timeouts on boot
@ 2019-04-10 18:07 Spencer Lingard
2019-04-14 19:36 ` Lukas Wunner
0 siblings, 1 reply; 4+ messages in thread
From: Spencer Lingard @ 2019-04-10 18:07 UTC (permalink / raw)
To: lukas; +Cc: linux-pci, Spencer Lingard
During command writes, pcie_poll_cmd() is invoked if
Command Completed notifications are disabled. When polling,
if Command Completed is set, the bit is reset and success is returned,
but ctrl->cmd_busy is not set back to 0. The next command write
then attempts to wait on a command that has already been completed,
timing out after 2 seconds. This delay occurs more frequently at
boot time, since pcie_init() disables notifications when powering
down empty slots.
Clear cmd_busy upon successful command completion during
pcie_poll_cmd().
Signed-off-by: Spencer Lingard <spencer@mellanox.com>
Cc: Lukas Wunner <lukas@wunner.de>
---
drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 6a2365c..28c70cf 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -77,6 +77,7 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout)
if (slot_status & PCI_EXP_SLTSTA_CC) {
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_CC);
+ ctrl->cmd_busy = 0;
return 1;
}
if (timeout < 0)
--
2.1.2
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] PCI: pciehp: Fix false command timeouts on boot
2019-04-10 18:07 [PATCH] PCI: pciehp: Fix false command timeouts on boot Spencer Lingard
@ 2019-04-14 19:36 ` Lukas Wunner
2019-04-15 2:59 ` Lukas Wunner
0 siblings, 1 reply; 4+ messages in thread
From: Lukas Wunner @ 2019-04-14 19:36 UTC (permalink / raw)
To: Spencer Lingard
Cc: linux-pci, Mika Westerberg, Sergey Miroshnichenko, Keith Busch
[cc += Mika, Sergey, Keith]
On Wed, Apr 10, 2019 at 02:07:48PM -0400, Spencer Lingard wrote:
> During command writes, pcie_poll_cmd() is invoked if
> Command Completed notifications are disabled. When polling,
> if Command Completed is set, the bit is reset and success is returned,
> but ctrl->cmd_busy is not set back to 0. The next command write
> then attempts to wait on a command that has already been completed,
> timing out after 2 seconds. This delay occurs more frequently at
> boot time, since pcie_init() disables notifications when powering
> down empty slots.
>
> Clear cmd_busy upon successful command completion during
> pcie_poll_cmd().
>
> Signed-off-by: Spencer Lingard <spencer@mellanox.com>
I suppose this can happen if a write to the Slot Control register is
performed while HPIE and/or CCIE is disabled, the two notifications
are subsequently enabled and another write to the Slot Control is
performed. That second write will then call wait_event_timeout()
because of the stale ctrl->cmd_busy == 1, but the Command Complete
notification has already happened and was cleared by pcie_poll_cmd(),
hence it times out.
Sounds reasonable, I'm a little suprised though that I've never seen
this myself. I guess we've been doing this wrong for years, so:
Cc: stable@vger.kernel.org
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Thanks,
Lukas
> ---
> drivers/pci/hotplug/pciehp_hpc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 6a2365c..28c70cf 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -77,6 +77,7 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout)
> if (slot_status & PCI_EXP_SLTSTA_CC) {
> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> PCI_EXP_SLTSTA_CC);
> + ctrl->cmd_busy = 0;
> return 1;
> }
> if (timeout < 0)
> --
> 2.1.2
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] PCI: pciehp: Fix false command timeouts on boot
2019-04-14 19:36 ` Lukas Wunner
@ 2019-04-15 2:59 ` Lukas Wunner
2019-04-16 18:32 ` Spencer Lingard
0 siblings, 1 reply; 4+ messages in thread
From: Lukas Wunner @ 2019-04-15 2:59 UTC (permalink / raw)
To: Spencer Lingard
Cc: linux-pci, Mika Westerberg, Sergey Miroshnichenko, Keith Busch
On Sun, Apr 14, 2019 at 09:36:41PM +0200, Lukas Wunner wrote:
> I suppose this can happen if a write to the Slot Control register is
> performed while HPIE and/or CCIE is disabled, the two notifications
> are subsequently enabled and another write to the Slot Control is
> performed. That second write will then call wait_event_timeout()
> because of the stale ctrl->cmd_busy == 1, but the Command Complete
> notification has already happened and was cleared by pcie_poll_cmd(),
> hence it times out.
>
> Sounds reasonable, I'm a little suprised though that I've never seen
> this myself. I guess we've been doing this wrong for years, so:
On second thought, it's not surprising at all that I never saw this
because Thunderbolt sets NoCompl+, so doesn't use Command Complete
notifications.
I suspect that even though we've been doing this wrong for a long time,
the bug was exposed by a recent change to pciehp. Do you happen to
know since which kernel version or commit you've been witnessing the
timeouts?
Thanks,
Lukas
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] PCI: pciehp: Fix false command timeouts on boot
2019-04-15 2:59 ` Lukas Wunner
@ 2019-04-16 18:32 ` Spencer Lingard
0 siblings, 0 replies; 4+ messages in thread
From: Spencer Lingard @ 2019-04-16 18:32 UTC (permalink / raw)
To: Lukas Wunner
Cc: linux-pci, Mika Westerberg, Sergey Miroshnichenko, Keith Busch
On 4/14/2019 10:59 PM, Lukas Wunner wrote:
> On Sun, Apr 14, 2019 at 09:36:41PM +0200, Lukas Wunner wrote:
>> I suppose this can happen if a write to the Slot Control register is
>> performed while HPIE and/or CCIE is disabled, the two notifications
>> are subsequently enabled and another write to the Slot Control is
>> performed. That second write will then call wait_event_timeout()
>> because of the stale ctrl->cmd_busy == 1, but the Command Complete
>> notification has already happened and was cleared by pcie_poll_cmd(),
>> hence it times out.
>>
>> Sounds reasonable, I'm a little suprised though that I've never seen
>> this myself. I guess we've been doing this wrong for years, so:
> On second thought, it's not surprising at all that I never saw this
> because Thunderbolt sets NoCompl+, so doesn't use Command Complete
> notifications.
>
> I suspect that even though we've been doing this wrong for a long time,
> the bug was exposed by a recent change to pciehp. Do you happen to
> know since which kernel version or commit you've been witnessing the
> timeouts?
Hi Lukas, thank you for your time.
We started seeing these timeouts when we went to 4.20.5 from 4.14.61.
In pcie_init(), there's a check that turns off a slot if it's powered on
but unoccupied. Before 4e6a13356f1c ("PCI: pciehp: Deduplicate presence
check on probe & resume"), that power check was at the end of
pcie_probe(), after the IRQ is requested. I've investigated a little and
found that the delays go away if the power check is moved back where it
was before that commit.
- Spencer
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-04-16 18:33 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-10 18:07 [PATCH] PCI: pciehp: Fix false command timeouts on boot Spencer Lingard
2019-04-14 19:36 ` Lukas Wunner
2019-04-15 2:59 ` Lukas Wunner
2019-04-16 18:32 ` Spencer Lingard
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.