linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
@ 2019-03-12 12:05 Sergey Miroshnichenko
  2019-03-12 12:20 ` Lukas Wunner
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-12 12:05 UTC (permalink / raw)
  To: linux-pci; +Cc: linux, Sergey Miroshnichenko, Lukas Wunner

During the safe removal procedure, a Data Link Layer State Changed event
may occur after pciehp_power_off_slot(), and it is handled when the slot is
already set to OFF_STATE. This results in re-enabling the device and makes
it impossible to actually safely remove it.

Clear out the Presence Detect Changed and Data Link Layer State Changed
events when the disabled slot has settled down.

It is still possible to re-enable the device if it remains in the slot
after pressing the Attention Button by pressing it again.

Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
Cc: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 3f3df4c29f6e..905282a8ddaa 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -115,6 +115,10 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
 		 * removed from the slot/adapter.
 		 */
 		msleep(1000);
+
+		/* Ignore link or presence changes caused by power off */
+		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
+			   &ctrl->pending_events);
 	}
 
 	/* turn off Green LED */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-03-12 12:05 [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal Sergey Miroshnichenko
@ 2019-03-12 12:20 ` Lukas Wunner
  2019-03-13 13:42   ` Sergey Miroshnichenko
  2019-04-04 19:44 ` Bjorn Helgaas
  2019-04-08 19:55 ` Bjorn Helgaas
  2 siblings, 1 reply; 9+ messages in thread
From: Lukas Wunner @ 2019-03-12 12:20 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linux

On Tue, Mar 12, 2019 at 03:05:48PM +0300, Sergey Miroshnichenko wrote:
> During the safe removal procedure, a Data Link Layer State Changed event
> may occur after pciehp_power_off_slot(), and it is handled when the slot is
> already set to OFF_STATE. This results in re-enabling the device and makes
> it impossible to actually safely remove it.
> 
> Clear out the Presence Detect Changed and Data Link Layer State Changed
> events when the disabled slot has settled down.
> 
> It is still possible to re-enable the device if it remains in the slot
> after pressing the Attention Button by pressing it again.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>

Reviewed-by: Lukas Wunner <lukas@wunner.de>

Did this work correctly before v4.19 or is it a regression caused by
d331710ea78f ("PCI: pciehp: Become resilient to missed events")?
Either way, it seems reasonable to backport it to stable and
v4.19 is the earliest that it should apply cleanly to, so:

Cc: stable@vger.kernel.org # v4.19+

Thanks a lot for catching this, I wasn't able to test the code paths
which are only exercised if a power controller is present (because
Thunderbolt doesn't have one).

Lukas

> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 3f3df4c29f6e..905282a8ddaa 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -115,6 +115,10 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
>  		 * removed from the slot/adapter.
>  		 */
>  		msleep(1000);
> +
> +		/* Ignore link or presence changes caused by power off */
> +		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
> +			   &ctrl->pending_events);
>  	}
>  
>  	/* turn off Green LED */
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-03-12 12:20 ` Lukas Wunner
@ 2019-03-13 13:42   ` Sergey Miroshnichenko
  0 siblings, 0 replies; 9+ messages in thread
From: Sergey Miroshnichenko @ 2019-03-13 13:42 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux


[-- Attachment #1.1: Type: text/plain, Size: 2165 bytes --]

On 3/12/19 3:20 PM, Lukas Wunner wrote:
> On Tue, Mar 12, 2019 at 03:05:48PM +0300, Sergey Miroshnichenko wrote:
>> During the safe removal procedure, a Data Link Layer State Changed event
>> may occur after pciehp_power_off_slot(), and it is handled when the slot is
>> already set to OFF_STATE. This results in re-enabling the device and makes
>> it impossible to actually safely remove it.
>>
>> Clear out the Presence Detect Changed and Data Link Layer State Changed
>> events when the disabled slot has settled down.
>>
>> It is still possible to re-enable the device if it remains in the slot
>> after pressing the Attention Button by pressing it again.
>>
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> 
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> 

Thank you!

> Did this work correctly before v4.19 or is it a regression caused by
> d331710ea78f ("PCI: pciehp: Become resilient to missed events")?
> Either way, it seems reasonable to backport it to stable and
> v4.19 is the earliest that it should apply cleanly to, so:
> 
> Cc: stable@vger.kernel.org # v4.19+
> 

Yes, that was a regression caused by this commit. I've verified it
applies cleanly to v4.19.

Best regards,
Serge

> Thanks a lot for catching this, I wasn't able to test the code paths
> which are only exercised if a power controller is present (because
> Thunderbolt doesn't have one).
> 
> Lukas
> 
>> ---
>>  drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
>> index 3f3df4c29f6e..905282a8ddaa 100644
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -115,6 +115,10 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
>>  		 * removed from the slot/adapter.
>>  		 */
>>  		msleep(1000);
>> +
>> +		/* Ignore link or presence changes caused by power off */
>> +		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
>> +			   &ctrl->pending_events);
>>  	}
>>  
>>  	/* turn off Green LED */
>> -- 
>> 2.20.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-03-12 12:05 [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal Sergey Miroshnichenko
  2019-03-12 12:20 ` Lukas Wunner
@ 2019-04-04 19:44 ` Bjorn Helgaas
  2019-04-08 19:55 ` Bjorn Helgaas
  2 siblings, 0 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2019-04-04 19:44 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linux, Lukas Wunner

On Tue, Mar 12, 2019 at 03:05:48PM +0300, Sergey Miroshnichenko wrote:
> During the safe removal procedure, a Data Link Layer State Changed event
> may occur after pciehp_power_off_slot(), and it is handled when the slot is
> already set to OFF_STATE. This results in re-enabling the device and makes
> it impossible to actually safely remove it.
> 
> Clear out the Presence Detect Changed and Data Link Layer State Changed
> events when the disabled slot has settled down.
> 
> It is still possible to re-enable the device if it remains in the slot
> after pressing the Attention Button by pressing it again.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> Cc: Lukas Wunner <lukas@wunner.de>

Applied to pci/hotplug with Lukas' reviewed-by and stable tag for v5.2,
thanks!

> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 3f3df4c29f6e..905282a8ddaa 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -115,6 +115,10 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
>  		 * removed from the slot/adapter.
>  		 */
>  		msleep(1000);
> +
> +		/* Ignore link or presence changes caused by power off */
> +		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
> +			   &ctrl->pending_events);
>  	}
>  
>  	/* turn off Green LED */
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-03-12 12:05 [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal Sergey Miroshnichenko
  2019-03-12 12:20 ` Lukas Wunner
  2019-04-04 19:44 ` Bjorn Helgaas
@ 2019-04-08 19:55 ` Bjorn Helgaas
  2019-04-09 23:41   ` Micah Parrish
  2019-04-10  8:26   ` Lukas Wunner
  2 siblings, 2 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2019-04-08 19:55 UTC (permalink / raw)
  To: Sergey Miroshnichenko; +Cc: linux-pci, linux, Lukas Wunner

On Tue, Mar 12, 2019 at 03:05:48PM +0300, Sergey Miroshnichenko wrote:
> During the safe removal procedure, a Data Link Layer State Changed event
> may occur after pciehp_power_off_slot(), and it is handled when the slot is
> already set to OFF_STATE. This results in re-enabling the device and makes
> it impossible to actually safely remove it.
> 
> Clear out the Presence Detect Changed and Data Link Layer State Changed
> events when the disabled slot has settled down.
> 
> It is still possible to re-enable the device if it remains in the slot
> after pressing the Attention Button by pressing it again.
> 
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
> Cc: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 3f3df4c29f6e..905282a8ddaa 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -115,6 +115,10 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
>  		 * removed from the slot/adapter.
>  		 */
>  		msleep(1000);
> +
> +		/* Ignore link or presence changes caused by power off */
> +		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
> +			   &ctrl->pending_events);

I did apply this and I'm told that it fixes an issue where pressing an
NVMe drive power button does not turn off the drive.

But I wonder if it would be feasible to just turn off reporting of the
events we don't care about, as opposed to figuring out that we should
ignore them if they do occur.  E.g., maybe when we write
PCI_EXP_SLTCTL_PWR_OFF, we should clear PCI_EXP_SLTCTL_DLLSCE and
PCI_EXP_SLTCTL_PDCE at the same time.  Of course, then we'd have to clear
any pending events and re-enable them later.

It's somewhat non-obvious that we ignore these events by clearing out
bits in pending_events that might have been set somewhere else (I
assume by pciehp_isr()) and will be consumed in a third place (maybe
pciehp_ist()?)

Bjorn

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-04-08 19:55 ` Bjorn Helgaas
@ 2019-04-09 23:41   ` Micah Parrish
  2019-04-10  8:33     ` Lukas Wunner
  2019-04-10 21:22     ` Bjorn Helgaas
  2019-04-10  8:26   ` Lukas Wunner
  1 sibling, 2 replies; 9+ messages in thread
From: Micah Parrish @ 2019-04-09 23:41 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, lukas, s.miroshnichenko

Because this bug affects our Proliant nvme hardware, I would like to 
request the patch be merged to 5.1 and 4.19 longterm.

I have filed kernel BZ https://bugzilla.kernel.org/show_bug.cgi?id=203237


Micah Parrish



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-04-08 19:55 ` Bjorn Helgaas
  2019-04-09 23:41   ` Micah Parrish
@ 2019-04-10  8:26   ` Lukas Wunner
  1 sibling, 0 replies; 9+ messages in thread
From: Lukas Wunner @ 2019-04-10  8:26 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Sergey Miroshnichenko, linux-pci, linux, Micah Parrish

[cc += Micah]

On Mon, Apr 08, 2019 at 02:55:40PM -0500, Bjorn Helgaas wrote:
> On Tue, Mar 12, 2019 at 03:05:48PM +0300, Sergey Miroshnichenko wrote:
> > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > @@ -115,6 +115,10 @@ static void remove_board(struct controller *ctrl, bool safe_removal)
> >  		 * removed from the slot/adapter.
> >  		 */
> >  		msleep(1000);
> > +
> > +		/* Ignore link or presence changes caused by power off */
> > +		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
> > +			   &ctrl->pending_events);
> 
> I did apply this and I'm told that it fixes an issue where pressing an
> NVMe drive power button does not turn off the drive.
> 
> But I wonder if it would be feasible to just turn off reporting of the
> events we don't care about, as opposed to figuring out that we should
> ignore them if they do occur.  E.g., maybe when we write
> PCI_EXP_SLTCTL_PWR_OFF, we should clear PCI_EXP_SLTCTL_DLLSCE and
> PCI_EXP_SLTCTL_PDCE at the same time.  Of course, then we'd have to clear
> any pending events and re-enable them later.

I think we'd have to re-enable them immediately after disabling them:

My understanding is that even if the slot is powered off, it may signal
a presence change from out-of-band presence detection.  For such form
factors, the user may power off the slot (via sysfs or Attention Button),
remove the card and replace it with another once.  The slot then signals
a presence change, whereupon pciehp automatically brings up the slot
again (without the need for another button press or sysfs interaction).
At least that's how the code would have worked up until v4.18 for such
form factors AFAICS.

So remove_board() would have to look like this:

	pciehp_unconfigure_device(ctrl, safe_removal);

	if (POWER_CTRL(ctrl)) {
+		pcie_disable_notification(ctrl);
		pciehp_power_off_slot(ctrl);
		msleep(1000);
+		pcie_clear_hotplug_events(ctrl);
+		pcie_enable_notification(ctrl);
	}

So, three lines instead of just one line to achieve the same thing
and also three MMIO accesses instead of just a single memory change,
hence the solution in Sergey's patch seems simpler to me.


> It's somewhat non-obvious that we ignore these events by clearing out
> bits in pending_events that might have been set somewhere else (I
> assume by pciehp_isr()) and will be consumed in a third place (maybe
> pciehp_ist()?)

Yes that is correct, pciehp_isr() collects events for later consumption
by pciehp_ist().  They're stored in pending_events in struct controller.

We already use the same atomic_and() trick of Sergey's patch in
pciehp_check_link_status() to ignore link or presence flaps that
occur during the first 100 ms on slot bringup.  This was done to
fix an issue reported by Stefan Roese, cf. commit 6c35a1ac3da6
("PCI: pciehp: Tolerate initially unstable link").

We also use the pending_events to synthesize events in
pciehp_sysfs_enable_slot(), pciehp_sysfs_disable_slot() and
pciehp_queue_pushbutton_work() by setting bits in pending_events
and calling irq_wake_thread() to force pciehp_ist() to run.
If pciehp_ist() is already running at the moment irq_wake_thread()
is called, another run is automagically queued up by the IRQ core.
We take advantage of the fact that the IRQ core already has this
race-free, high-performance event handling implemented so that
there's no need to duplicate the functionality in pciehp.

Basically all the event handling is in one place and that's
pciehp_ist() and everything else interfaces with it via the
pending_events.  So the pending_events is quite central to how
pciehp now works.

I meant to write a few paragraphs to document the inner workings
of pciehp, just haven't gotten to it yet, sorry.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-04-09 23:41   ` Micah Parrish
@ 2019-04-10  8:33     ` Lukas Wunner
  2019-04-10 21:22     ` Bjorn Helgaas
  1 sibling, 0 replies; 9+ messages in thread
From: Lukas Wunner @ 2019-04-10  8:33 UTC (permalink / raw)
  To: Micah Parrish; +Cc: helgaas, linux-pci, linux, s.miroshnichenko

On Tue, Apr 09, 2019 at 05:41:24PM -0600, Micah Parrish wrote:
> Because this bug affects our Proliant nvme hardware, I would like to request
> the patch be merged to 5.1 and 4.19 longterm.
> 
> I have filed kernel BZ https://bugzilla.kernel.org/show_bug.cgi?id=203237

Thanks for the bug report and sorry for the breakage.

The patch has already been committed to Bjorn's pci/hotplug branch
and is marked for stable, hence will appear in the releases you've
mentioned above in about 5 weeks.  If you need it in mainline earlier
I guess Bjorn could cherry-pick the commit onto his for-linus branch
for inclusion in v5.1, that's his call.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal
  2019-04-09 23:41   ` Micah Parrish
  2019-04-10  8:33     ` Lukas Wunner
@ 2019-04-10 21:22     ` Bjorn Helgaas
  1 sibling, 0 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2019-04-10 21:22 UTC (permalink / raw)
  To: Micah Parrish; +Cc: linux-pci, linux, lukas, s.miroshnichenko

On Tue, Apr 09, 2019 at 05:41:24PM -0600, Micah Parrish wrote:
> Because this bug affects our Proliant nvme hardware, I would like to request
> the patch be merged to 5.1 and 4.19 longterm.
> 
> I have filed kernel BZ https://bugzilla.kernel.org/show_bug.cgi?id=203237

Thanks, I added that BZ as well as a reported-by and tested-by from you and
moved the patch to my for-linus branch for v5.1.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-04-10 21:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-12 12:05 [PATCH v2] PCI: pciehp: Fix re-enabling the slot marked for safe removal Sergey Miroshnichenko
2019-03-12 12:20 ` Lukas Wunner
2019-03-13 13:42   ` Sergey Miroshnichenko
2019-04-04 19:44 ` Bjorn Helgaas
2019-04-08 19:55 ` Bjorn Helgaas
2019-04-09 23:41   ` Micah Parrish
2019-04-10  8:33     ` Lukas Wunner
2019-04-10 21:22     ` Bjorn Helgaas
2019-04-10  8:26   ` Lukas Wunner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).