From: Bjorn Helgaas <helgaas@kernel.org>
To: Keith Busch <keith.busch@intel.com>
Cc: Linux PCI <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Sinan Kaya <okaya@kernel.org>, Thomas Tai <thomas.tai@oracle.com>,
	poza@codeaurora.org, Lukas Wunner <lukas@wunner.de>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCHv2 15/20] PCI/pciehp: Fix powerfault detection order
Date: Fri, 7 Sep 2018 11:53:52 -0500
Message-ID: <20180907165352.GA250890@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <20180906195047.GD31024@localhost.localdomain>

On Thu, Sep 06, 2018 at 01:50:47PM -0600, Keith Busch wrote:
> On Thu, Sep 06, 2018 at 02:36:57PM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 05, 2018 at 02:35:41PM -0600, Keith Busch wrote:
> > > A device add in a power controller controlled slot will power on and
> > > clear power fault slot events, but this was happening before the interrupt
> > > handler attempted to set the sticky status and attention indicators. The
> > > wrong status will be set if a hot-add and power fault are handled in
> > > one interrupt. This patch fixes that by checking for power faults before
> > > checking for new devices.
> > 
> > Can you clarify the part about "the interrupt handler attempting to set the
> > sticky status and attention indicators"?  My first impression is that
> > you're talking about bits in the Slot Status register, but that's
> > obviously wrong because those bits are set by hardware (not the interrupt
> > handler) and they're RW1C so software clears them by writing 1 to them.
> 
> The sticky status being the pciehp driver's "power_fault_detected"
> field. We set it on the first observation of a slot's PFD and do not
> clear it until we have a successful board_added event.
> 
> > Lukas suggests that this patch should be in v4.19.  Do you agree, and if
> > so, can you help me justify it by describing the user-visible effect of
> > this?  I'm not sure what "setting the wrong status" means to a user, e.g.,
> > does this result in a non-functional device, an incorrect status LED on the
> > slot, something else?  Does it fix a regression or something we merged for
> > v4.19?
> 
> From a user point of view, it is possible the attention LED light could be
> on after a successful hot add.

Great, thanks!  Also, it looks like the power LED will be off even though
the power is actually on.

    pciehp_ist
      if (events & (PDC | DLLSC))
        pciehp_handle_presence_or_link_change
          case OFF_STATE:
            pciehp_enable_slot
              __pciehp_enable_slot
                board_added
                  pciehp_power_on_slot
                    ctrl->power_fault_detected = 0
                    pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
      if (PFD && !ctrl->power_fault_detected)
        ctrl->power_fault_detected = 1
        pciehp_set_attention_status(slot, 1)     # attention LED on
        pciehp_green_led_off(slot)               # power LED off


Tangent: how annoying that the spec refers to "Power Indicator" and
"Attention Indicator", but (a) we call them the "green_led" and
"attention_status", and (b) both can be on/off/blinking, but the interfaces
are totally different.

> The only reason this was successful before was how everything was chained
> through work queues, the work order being:
> 
>   INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ
> 
> The ENABLE_REQ cleared the power fault at the end, but now everything
> is handled inline with the interrupt thread (which was a great change,
> IMO), such that the work ENABLE_REQ was doing happens before power
> fault handling now.
> 
> The commit that changed that order:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=0e94916e6091f48391b65110e71c87c583021640
> 
>  
> > > Signed-off-by: Keith Busch <keith.busch@intel.com>
> > > Reviewed-by: Lukas Wunner <lukas@wunner.de>
> > > ---
> > >  drivers/pci/hotplug/pciehp_hpc.c | 16 ++++++++--------
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> > > index 9eb28a06cac6..52a18a7ec2a2 100644
> > > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > > @@ -630,6 +630,14 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> > >  		pciehp_handle_button_press(slot);
> > >  	}
> > >  
> > > +	/* Check Power Fault Detected */
> > > +	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
> > > +		ctrl->power_fault_detected = 1;
> > > +		ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot));
> > > +		pciehp_set_attention_status(slot, 1);
> > > +		pciehp_green_led_off(slot);
> > > +	}
> > > +
> > >  	/*
> > >  	 * Disable requests have higher priority than Presence Detect Changed
> > >  	 * or Data Link Layer State Changed events.
> > > @@ -641,14 +649,6 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> > >  		pciehp_handle_presence_or_link_change(slot, events);
> > >  	up_read(&ctrl->reset_lock);
> > >  
> > > -	/* Check Power Fault Detected */
> > > -	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
> > > -		ctrl->power_fault_detected = 1;
> > > -		ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot));
> > > -		pciehp_set_attention_status(slot, 1);
> > > -		pciehp_green_led_off(slot);
> > > -	}
> > > -
> > >  	pci_config_pm_runtime_put(pdev);
> > >  	wake_up(&ctrl->requester);
> > >  	return IRQ_HANDLED;
> > > -- 
> > > 2.14.4
> > > 
