linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Spencer Lingard <spencer@mellanox.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: linux-pci@vger.kernel.org,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Sergey Miroshnichenko <s.miroshnichenko@yadro.com>,
	Keith Busch <keith.busch@intel.com>
Subject: Re: [PATCH] PCI: pciehp: Fix false command timeouts on boot
Date: Tue, 16 Apr 2019 14:32:55 -0400	[thread overview]
Message-ID: <a24e3667-9aa6-1ee5-0746-c780523cc0bf@mellanox.com> (raw)
In-Reply-To: <20190415025908.bd45jqtp62eeuiuz@wunner.de>

On 4/14/2019 10:59 PM, Lukas Wunner wrote:
> On Sun, Apr 14, 2019 at 09:36:41PM +0200, Lukas Wunner wrote:
>> I suppose this can happen if a write to the Slot Control register is
>> performed while HPIE and/or CCIE is disabled, the two notifications
>> are subsequently enabled and another write to the Slot Control is
>> performed.  That second write will then call wait_event_timeout()
>> because of the stale ctrl->cmd_busy == 1, but the Command Complete
>> notification has already happened and was cleared by pcie_poll_cmd(),
>> hence it times out.
>>
>> Sounds reasonable, I'm a little suprised though that I've never seen
>> this myself.  I guess we've been doing this wrong for years, so:
> On second thought, it's not surprising at all that I never saw this
> because Thunderbolt sets NoCompl+, so doesn't use Command Complete
> notifications.
>
> I suspect that even though we've been doing this wrong for a long time,
> the bug was exposed by a recent change to pciehp.  Do you happen to
> know since which kernel version or commit you've been witnessing the
> timeouts?

Hi Lukas, thank you for your time.

We started seeing these timeouts when we went to 4.20.5 from 4.14.61.

In pcie_init(), there's a check that turns off a slot if it's powered on
but unoccupied. Before 4e6a13356f1c ("PCI: pciehp: Deduplicate presence
check on probe & resume"), that power check was at the end of
pcie_probe(), after the IRQ is requested. I've investigated a little and
found that the delays go away if the power check is moved back where it
was before that commit.

- Spencer

      reply	other threads:[~2019-04-16 18:33 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-10 18:07 [PATCH] PCI: pciehp: Fix false command timeouts on boot Spencer Lingard
2019-04-14 19:36 ` Lukas Wunner
2019-04-15  2:59   ` Lukas Wunner
2019-04-16 18:32     ` Spencer Lingard [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a24e3667-9aa6-1ee5-0746-c780523cc0bf@mellanox.com \
    --to=spencer@mellanox.com \
    --cc=keith.busch@intel.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=s.miroshnichenko@yadro.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).