All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: okaya@codeaurora.org, Pavel Machek <pavel@ucw.cz>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Lukas Wunner <lukas@wunner.de>
Subject: Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
Date: Sun, 6 May 2018 11:35:01 +0200	[thread overview]
Message-ID: <f60a69c1-865e-e919-48ce-92416e44879d@molgen.mpg.de> (raw)
In-Reply-To: <20180504133327.GF15790@bhelgaas-glaptop.roam.corp.google.com>

Dear Bjorn,


Am 04.05.2018 um 15:33 schrieb Bjorn Helgaas:

[…]

> Yes, definitely.  I intended to do that but got a little lazy.  What
> do you think about the following?  Paul, if you haven't tested the
> first patch, can you try this one instead?  The logic is pretty much
> the same.
> 
> 3461a068661c ("PCI: pciehp: Wait for hotplug command completion
> lazily") mentions AMD and Nvidia devices with the same issue, but
> unfortunately doesn't include any specifics.
> 
> 
> commit b0d6f2230e12c85ae3b65a854a53c67c7c1f6406
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu May 3 18:39:38 2018 -0500
> 
>      PCI: pciehp: Add quirk for Intel Command Completed erratum
>      
>      The Intel CF118 erratum means the controller does not set the Command
>      Completed bit unless writes to the Slot Command register change "Control"
>      bits.  Command Completed is never set for writes that only change software
>      notification "Enable" bits.  This results in timeouts like this:
>      
>        pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
>      
>      When this erratum is present, avoid these timeouts by marking commands
>      "completed" immediately unless they change the "Control" bits.
>      
>      Here's the text of the erratum from the Intel document:
>      
>        CF118        PCIe Slot Status Register Command Completed bit not always
>                     updated on any configuration write to the Slot Control
>                     Register
>      
>        Problem:     For PCIe root ports (devices 0 - 10) supporting hot-plug,
>                     the Slot Status Register (offset AAh) Command Completed
>                     (bit[4]) status is updated under the following condition:
>                     IOH will set Command Completed bit after delivering the new
>                     commands written in the Slot Controller register (offset
>                     A8h) to VPP. The IOH detects new commands written in Slot
>                     Control register by checking the change of value for Power
>                     Controller Control (bit[10]), Power Indicator Control
>                     (bits[9:8]), Attention Indicator Control (bits[7:6]), or
>                     Electromechanical Interlock Control (bit[11]) fields. Any
>                     other configuration writes to the Slot Control register
>                     without changing the values of these fields will not cause
>                     Command Completed bit to be set.
>      
>                     The PCIe Base Specification Revision 2.0 or later describes
>                     the “Slot Control Register” in section 7.8.10, as follows
>                     (Reference section 7.8.10, Slot Control Register, Offset
>                     18h). In hot-plug capable Downstream Ports, a write to the
>                     Slot Control register must cause a hot-plug command to be
>                     generated (see Section 6.7.3.2 for details on hot-plug
>                     commands). A write to the Slot Control register in a
>                     Downstream Port that is not hotplug capable must not cause a
>                     hot-plug command to be executed.
>      
>                     The PCIe Spec intended that every write to the Slot Control
>                     Register is a command and expected a command complete status
>                     to abstract the VPP implementation specific nuances from the
>                     OS software. IOH PCIe Slot Control Register implementation
>                     is not fully conforming to the PCIe Specification in this
>                     respect.
>      
>        Implication: Software checking on the Command Completed status after
>                     writing to the Slot Control register may time out.
>      
>        Workaround:  Software can read the Slot Control register and compare the
>                     existing and new values to determine if it should check the
>                     Command Completed status after writing to the Slot Control
>                     register.
>      
>      Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
>      Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
>      Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de>
>      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 18a42f8f5dc5..e70eba5ea906 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -10,7 +10,6 @@
>    * All rights reserved.
>    *
>    * Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com>
> - *
>    */
>   
>   #include <linux/kernel.h>
> @@ -147,25 +146,22 @@ static void pcie_wait_cmd(struct controller *ctrl)
>   	else
>   		rc = pcie_poll_cmd(ctrl, jiffies_to_msecs(timeout));
>   
> -	/*
> -	 * Controllers with errata like Intel CF118 don't generate
> -	 * completion notifications unless the power/indicator/interlock
> -	 * control bits are changed.  On such controllers, we'll emit this
> -	 * timeout message when we wait for completion of commands that
> -	 * don't change those bits, e.g., commands that merely enable
> -	 * interrupts.
> -	 */
>   	if (!rc)
>   		ctrl_info(ctrl, "Timeout on hotplug command %#06x (issued %u msec ago)\n",
>   			  ctrl->slot_ctrl,
>   			  jiffies_to_msecs(jiffies - ctrl->cmd_started));
>   }
>   
> +#define CC_ERRATUM_MASK		(PCI_EXP_SLTCTL_PCC |	\
> +				 PCI_EXP_SLTCTL_PIC |	\
> +				 PCI_EXP_SLTCTL_AIC |	\
> +				 PCI_EXP_SLTCTL_EIC)
> +
>   static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
>   			      u16 mask, bool wait)
>   {
>   	struct pci_dev *pdev = ctrl_dev(ctrl);
> -	u16 slot_ctrl;
> +	u16 slot_ctrl_orig, slot_ctrl;
>   
>   	mutex_lock(&ctrl->ctrl_lock);
>   
> @@ -180,6 +176,7 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
>   		goto out;
>   	}
>   
> +	slot_ctrl_orig = slot_ctrl;
>   	slot_ctrl &= ~mask;
>   	slot_ctrl |= (cmd & mask);
>   	ctrl->cmd_busy = 1;
> @@ -188,6 +185,17 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
>   	ctrl->cmd_started = jiffies;
>   	ctrl->slot_ctrl = slot_ctrl;
>   
> +	/*
> +	 * Controllers with the Intel CF118 and similar errata advertise
> +	 * Command Completed support, but they only set Command Completed
> +	 * if we change the "Control" bits for power, power indicator,
> +	 * attention indicator, or interlock.  If we only change the
> +	 * "Enable" bits, they never set the Command Completed bit.
> +	 */
> +	if (pdev->broken_cmd_compl &&
> +	    (slot_ctrl_orig & CC_ERRATUM_MASK) == (slot_ctrl & CC_ERRATUM_MASK))
> +		ctrl->cmd_busy = 0;
> +
>   	/*
>   	 * Optionally wait for the hardware to be ready for a new command,
>   	 * indicating completion of the above issued command.
> @@ -861,7 +869,7 @@ struct controller *pcie_init(struct pcie_device *dev)
>   		PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_CC |
>   		PCI_EXP_SLTSTA_DLLSC);
>   
> -	ctrl_info(ctrl, "Slot #%d AttnBtn%c PwrCtrl%c MRL%c AttnInd%c PwrInd%c HotPlug%c Surprise%c Interlock%c NoCompl%c LLActRep%c\n",
> +	ctrl_info(ctrl, "Slot #%d AttnBtn%c PwrCtrl%c MRL%c AttnInd%c PwrInd%c HotPlug%c Surprise%c Interlock%c NoCompl%c LLActRep%c%s\n",
>   		(slot_cap & PCI_EXP_SLTCAP_PSN) >> 19,
>   		FLAG(slot_cap, PCI_EXP_SLTCAP_ABP),
>   		FLAG(slot_cap, PCI_EXP_SLTCAP_PCP),
> @@ -872,7 +880,8 @@ struct controller *pcie_init(struct pcie_device *dev)
>   		FLAG(slot_cap, PCI_EXP_SLTCAP_HPS),
>   		FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
>   		FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
> -		FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC));
> +		FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
> +		pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
>   
>   	if (pcie_init_slot(ctrl))
>   		goto abort_ctrl;
> @@ -891,3 +900,17 @@ void pciehp_release_ctrl(struct controller *ctrl)
>   	pcie_cleanup_slot(ctrl);
>   	kfree(ctrl);
>   }
> +
> +static void quirk_cmd_compl(struct pci_dev *pdev)
> +{
> +	u32 slot_cap;
> +
> +	if (pci_is_pcie(pdev)) {
> +		pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
> +		if (slot_cap & PCI_EXP_SLTCAP_HPC &&
> +		    !(slot_cap & PCI_EXP_SLTCAP_NCCS))
> +			pdev->broken_cmd_compl = 1;
> +	}
> +}
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
> +			      PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 73178a2fcee0..60cb5350ad28 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -406,6 +406,9 @@ struct pci_dev {
>   	struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
>   	struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
>   
> +#ifdef CONFIG_HOTPLUG_PCI_PCIE
> +	unsigned int	broken_cmd_compl:1;	/* Command Complete broken */
> +#endif
>   #ifdef CONFIG_PCIE_PTM
>   	unsigned int	ptm_root:1;
>   	unsigned int	ptm_enabled:1;
> 

With this change, the message is also not shown anymore on the Lenovo 
X60. Thank you.

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>


Kind regards,

Paul

  parent reply	other threads:[~2018-05-06  9:35 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-26 10:17 pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago) Paul Menzel
2018-04-27 19:22 ` Bjorn Helgaas
2018-04-27 19:34   ` Sinan Kaya
2018-04-27 21:12     ` Bjorn Helgaas
2018-04-27 21:12       ` Bjorn Helgaas
2018-04-28  0:56       ` Dave Young
2018-04-28  0:56         ` Dave Young
2018-04-28  1:18         ` Dave Young
2018-04-28  1:18           ` Dave Young
2018-04-28 13:03           ` okaya
2018-04-28 13:03             ` okaya
2018-04-30 20:48             ` Sinan Kaya
2018-04-30 20:48               ` Sinan Kaya
2018-04-30 21:17               ` Bjorn Helgaas
2018-04-30 21:17                 ` Bjorn Helgaas
2018-04-30 21:27                 ` Sinan Kaya
2018-04-30 21:27                   ` Sinan Kaya
2018-04-30 21:38                   ` Lukas Wunner
2018-05-01 12:38                   ` Sinan Kaya
2018-05-01 12:38                     ` Sinan Kaya
2018-05-01 12:59                     ` Marc Zyngier
2018-05-01 12:59                       ` Marc Zyngier
2018-05-01 13:25                       ` Bjorn Helgaas
2018-05-01 13:25                         ` Bjorn Helgaas
2018-05-01 16:31                         ` Marc Zyngier
2018-05-01 16:31                           ` Marc Zyngier
2018-05-01 22:32                           ` Eric W. Biederman
2018-05-01 22:32                             ` Eric W. Biederman
2018-05-01 22:32                             ` Eric W. Biederman
2018-05-03  8:49   ` Paul Menzel
2018-05-04  2:45     ` Bjorn Helgaas
2018-05-04  6:37       ` okaya
2018-05-04 13:33         ` Bjorn Helgaas
2018-05-04 14:24           ` okaya
2018-05-06  9:35           ` Paul Menzel [this message]
2018-05-07 21:33           ` Bjorn Helgaas
2018-05-08  6:59             ` Paul Menzel
2018-05-08 12:34               ` Bjorn Helgaas
2018-05-08 13:22                 ` Paul Menzel
2018-05-09 11:41   ` Lukas Wunner
2018-05-09 12:57     ` Bjorn Helgaas
2018-05-09 13:16       ` Lukas Wunner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f60a69c1-865e-e919-48ce-92416e44879d@molgen.mpg.de \
    --to=pmenzel+linux-pci@molgen.mpg.de \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=okaya@codeaurora.org \
    --cc=pavel@ucw.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.