From: Joerg Roedel <joro@8bytes.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
Stuart Hayes <stuart.w.hayes@gmail.com>,
Austin Bolen <austin_bolen@dell.com>,
Keith Busch <kbusch@kernel.org>,
Alexandru Gagniuc <mr.nuke.me@gmail.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Mika Westerberg <mika.westerberg@linux.intel.com>,
Andy Shevchenko <andy.shevchenko@gmail.com>,
Sinan Kaya <okaya@kernel.org>,
Oza Pawandeep <poza@codeaurora.org>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
narendra_k@dell.com, Enzo Matsumiya <ematsumiya@suse.com>
Subject: Re: [PATCH v4] PCI: pciehp: Fix MSI interrupt race
Date: Fri, 20 Mar 2020 16:16:21 +0100 [thread overview]
Message-ID: <20200320151621.GA15468@8bytes.org> (raw)
In-Reply-To: <78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de>
On Wed, Feb 19, 2020 at 03:31:13PM +0100, Lukas Wunner wrote:
> From: Stuart Hayes <stuart.w.hayes@gmail.com>
>
> Without this commit, a PCIe hotplug port can stop generating interrupts
> on hotplug events, so device adds and removals will not be seen:
>
> The pciehp interrupt handler pciehp_isr() reads the Slot Status register
> and then writes back to it to clear the bits that caused the interrupt.
> If a different interrupt event bit gets set between the read and the
> write, pciehp_isr() returns without having cleared all of the interrupt
> event bits. If this happens when the MSI isn't masked (which by default
> it isn't in handle_edge_irq(), and which it will never be when MSI
> per-vector masking is not supported), we won't get any more hotplug
> interrupts from that device.
>
> That is expected behavior, according to the PCIe Base Spec r5.0, section
> 6.7.3.4, "Software Notification of Hot-Plug Events".
>
> Because the Presence Detect Changed and Data Link Layer State Changed
> event bits can both get set at nearly the same time when a device is
> added or removed, this is more likely to happen than it might seem.
> The issue was found (and can be reproduced rather easily) by connecting
> and disconnecting an NVMe storage device on at least one system model
> where the NVMe devices were being connected to an AMD PCIe port (PCI
> device 0x1022/0x1483).
>
> Fix the issue by modifying pciehp_isr() to loop back and re-read the
> Slot Status register immediately after writing to it, until it sees that
> all of the event status bits have been cleared.
>
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> [lukas: drop loop count limitation, write "events" instead of "status",
> don't loop back in INTx and poll modes, tweak code comment & commit msg]
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
Looks good to me:
Reviewed-by: Joerg Roedel <jroedel@suse.de>
> ---
> v4 (lukas):
> * drop "MAX_ISR_STATUS_READS" loop count limitation
> * drop unnecessary braces around PCI_EXP_SLTSTA_* flags
> * write "events" instead of "status" variable to Slot Status register
> to avoid unnecessary loop iterations if the same bit gets set
> repeatedly
> * don't loop back in INTx and poll modes
> * shorten and tweak code comment & commit message
>
> v3:
> * removed pvm_capable flag (from v2) since MSI may not be masked
> regardless of whether per-vector masking is supported
> * tweaked comments
>
> v2:
> * fixed ctrl_warn() call
> * improved comments
> * added pvm_capable flag and changed pciehp_isr() to loop back only when
> pvm_capable flag not set (suggested by Lukas Wunner)
>
> drivers/pci/hotplug/pciehp_hpc.c | 26 ++++++++++++++++++++------
> 1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 8a2cb1764386..f64d10df9eb5 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -527,7 +527,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
> struct controller *ctrl = (struct controller *)dev_id;
> struct pci_dev *pdev = ctrl_dev(ctrl);
> struct device *parent = pdev->dev.parent;
> - u16 status, events;
> + u16 status, events = 0;
>
> /*
> * Interrupts only occur in D3hot or shallower and only if enabled
> @@ -552,6 +552,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
> }
> }
>
> +read_status:
> pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
> if (status == (u16) ~0) {
> ctrl_info(ctrl, "%s: no response from device\n", __func__);
> @@ -564,24 +565,37 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
> * Slot Status contains plain status bits as well as event
> * notification bits; right now we only want the event bits.
> */
> - events = status & (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> - PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
> - PCI_EXP_SLTSTA_DLLSC);
> + status &= PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> + PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
> + PCI_EXP_SLTSTA_DLLSC;
>
> /*
> * If we've already reported a power fault, don't report it again
> * until we've done something to handle it.
> */
> if (ctrl->power_fault_detected)
> - events &= ~PCI_EXP_SLTSTA_PFD;
> + status &= ~PCI_EXP_SLTSTA_PFD;
>
> + events |= status;
> if (!events) {
> if (parent)
> pm_runtime_put(parent);
> return IRQ_NONE;
> }
>
> - pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
> + if (status) {
> + pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
> +
> + /*
> + * In MSI mode, all event bits must be zero before the port
> + * will send a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).
> + * So re-read the Slot Status register in case a bit was set
> + * between read and write.
> + */
> + if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode)
> + goto read_status;
> + }
> +
> ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
> if (parent)
> pm_runtime_put(parent);
> --
> 2.24.0
>
next prev parent reply other threads:[~2020-03-20 15:16 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-07 19:54 [PATCH v3] PCI: pciehp: Make sure pciehp_isr clears interrupt events Stuart Hayes
2020-02-09 15:03 ` Lukas Wunner
2020-02-09 18:07 ` Lukas Wunner
2020-02-09 20:25 ` Lukas Wunner
2020-02-10 21:40 ` Stuart Hayes
2020-02-13 20:19 ` Stuart Hayes
2020-02-19 15:57 ` Lukas Wunner
2020-02-19 14:31 ` [PATCH v4] PCI: pciehp: Fix MSI interrupt race Lukas Wunner
2020-03-06 19:59 ` Stuart Hayes
2020-03-20 15:16 ` Joerg Roedel [this message]
2020-03-28 21:27 ` Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200320151621.GA15468@8bytes.org \
--to=joro@8bytes.org \
--cc=andy.shevchenko@gmail.com \
--cc=austin_bolen@dell.com \
--cc=ematsumiya@suse.com \
--cc=helgaas@kernel.org \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mika.westerberg@linux.intel.com \
--cc=mr.nuke.me@gmail.com \
--cc=narendra_k@dell.com \
--cc=okaya@kernel.org \
--cc=poza@codeaurora.org \
--cc=rafael.j.wysocki@intel.com \
--cc=stuart.w.hayes@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).