All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PCI: pciehp: Fix false command timeouts on boot
@ 2019-04-10 18:07 Spencer Lingard
  2019-04-14 19:36 ` Lukas Wunner
  0 siblings, 1 reply; 4+ messages in thread
From: Spencer Lingard @ 2019-04-10 18:07 UTC (permalink / raw)
  To: lukas; +Cc: linux-pci, Spencer Lingard

During command writes, pcie_poll_cmd() is invoked if
Command Completed notifications are disabled. When polling,
if Command Completed is set, the bit is reset and success is returned,
but ctrl->cmd_busy is not set back to 0. The next command write
then attempts to wait on a command that has already been completed,
timing out after 2 seconds. This delay occurs more frequently at
boot time, since pcie_init() disables notifications when powering
down empty slots.

Clear cmd_busy upon successful command completion during
pcie_poll_cmd().

Signed-off-by: Spencer Lingard <spencer@mellanox.com>
Cc: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_hpc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 6a2365c..28c70cf 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -77,6 +77,7 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout)
 		if (slot_status & PCI_EXP_SLTSTA_CC) {
 			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 						   PCI_EXP_SLTSTA_CC);
+			ctrl->cmd_busy = 0;
 			return 1;
 		}
 		if (timeout < 0)
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] PCI: pciehp: Fix false command timeouts on boot
  2019-04-10 18:07 [PATCH] PCI: pciehp: Fix false command timeouts on boot Spencer Lingard
@ 2019-04-14 19:36 ` Lukas Wunner
  2019-04-15  2:59   ` Lukas Wunner
  0 siblings, 1 reply; 4+ messages in thread
From: Lukas Wunner @ 2019-04-14 19:36 UTC (permalink / raw)
  To: Spencer Lingard
  Cc: linux-pci, Mika Westerberg, Sergey Miroshnichenko, Keith Busch

[cc += Mika, Sergey, Keith]

On Wed, Apr 10, 2019 at 02:07:48PM -0400, Spencer Lingard wrote:
> During command writes, pcie_poll_cmd() is invoked if
> Command Completed notifications are disabled. When polling,
> if Command Completed is set, the bit is reset and success is returned,
> but ctrl->cmd_busy is not set back to 0. The next command write
> then attempts to wait on a command that has already been completed,
> timing out after 2 seconds. This delay occurs more frequently at
> boot time, since pcie_init() disables notifications when powering
> down empty slots.
> 
> Clear cmd_busy upon successful command completion during
> pcie_poll_cmd().
> 
> Signed-off-by: Spencer Lingard <spencer@mellanox.com>

I suppose this can happen if a write to the Slot Control register is
performed while HPIE and/or CCIE is disabled, the two notifications
are subsequently enabled and another write to the Slot Control is
performed.  That second write will then call wait_event_timeout()
because of the stale ctrl->cmd_busy == 1, but the Command Complete
notification has already happened and was cleared by pcie_poll_cmd(),
hence it times out.

Sounds reasonable, I'm a little suprised though that I've never seen
this myself.  I guess we've been doing this wrong for years, so:

Cc: stable@vger.kernel.org
Reviewed-by: Lukas Wunner <lukas@wunner.de>

Thanks,

Lukas

> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 6a2365c..28c70cf 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -77,6 +77,7 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout)
>  		if (slot_status & PCI_EXP_SLTSTA_CC) {
>  			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>  						   PCI_EXP_SLTSTA_CC);
> +			ctrl->cmd_busy = 0;
>  			return 1;
>  		}
>  		if (timeout < 0)
> -- 
> 2.1.2

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] PCI: pciehp: Fix false command timeouts on boot
  2019-04-14 19:36 ` Lukas Wunner
@ 2019-04-15  2:59   ` Lukas Wunner
  2019-04-16 18:32     ` Spencer Lingard
  0 siblings, 1 reply; 4+ messages in thread
From: Lukas Wunner @ 2019-04-15  2:59 UTC (permalink / raw)
  To: Spencer Lingard
  Cc: linux-pci, Mika Westerberg, Sergey Miroshnichenko, Keith Busch

On Sun, Apr 14, 2019 at 09:36:41PM +0200, Lukas Wunner wrote:
> I suppose this can happen if a write to the Slot Control register is
> performed while HPIE and/or CCIE is disabled, the two notifications
> are subsequently enabled and another write to the Slot Control is
> performed.  That second write will then call wait_event_timeout()
> because of the stale ctrl->cmd_busy == 1, but the Command Complete
> notification has already happened and was cleared by pcie_poll_cmd(),
> hence it times out.
> 
> Sounds reasonable, I'm a little suprised though that I've never seen
> this myself.  I guess we've been doing this wrong for years, so:

On second thought, it's not surprising at all that I never saw this
because Thunderbolt sets NoCompl+, so doesn't use Command Complete
notifications.

I suspect that even though we've been doing this wrong for a long time,
the bug was exposed by a recent change to pciehp.  Do you happen to
know since which kernel version or commit you've been witnessing the
timeouts?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] PCI: pciehp: Fix false command timeouts on boot
  2019-04-15  2:59   ` Lukas Wunner
@ 2019-04-16 18:32     ` Spencer Lingard
  0 siblings, 0 replies; 4+ messages in thread
From: Spencer Lingard @ 2019-04-16 18:32 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, Mika Westerberg, Sergey Miroshnichenko, Keith Busch

On 4/14/2019 10:59 PM, Lukas Wunner wrote:
> On Sun, Apr 14, 2019 at 09:36:41PM +0200, Lukas Wunner wrote:
>> I suppose this can happen if a write to the Slot Control register is
>> performed while HPIE and/or CCIE is disabled, the two notifications
>> are subsequently enabled and another write to the Slot Control is
>> performed.  That second write will then call wait_event_timeout()
>> because of the stale ctrl->cmd_busy == 1, but the Command Complete
>> notification has already happened and was cleared by pcie_poll_cmd(),
>> hence it times out.
>>
>> Sounds reasonable, I'm a little suprised though that I've never seen
>> this myself.  I guess we've been doing this wrong for years, so:
> On second thought, it's not surprising at all that I never saw this
> because Thunderbolt sets NoCompl+, so doesn't use Command Complete
> notifications.
>
> I suspect that even though we've been doing this wrong for a long time,
> the bug was exposed by a recent change to pciehp.  Do you happen to
> know since which kernel version or commit you've been witnessing the
> timeouts?

Hi Lukas, thank you for your time.

We started seeing these timeouts when we went to 4.20.5 from 4.14.61.

In pcie_init(), there's a check that turns off a slot if it's powered on
but unoccupied. Before 4e6a13356f1c ("PCI: pciehp: Deduplicate presence
check on probe & resume"), that power check was at the end of
pcie_probe(), after the IRQ is requested. I've investigated a little and
found that the delays go away if the power check is moved back where it
was before that commit.

- Spencer

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-04-16 18:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-10 18:07 [PATCH] PCI: pciehp: Fix false command timeouts on boot Spencer Lingard
2019-04-14 19:36 ` Lukas Wunner
2019-04-15  2:59   ` Lukas Wunner
2019-04-16 18:32     ` Spencer Lingard

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.