[PATCH] pci/pciehp: bail on bogus pcie reads from removed devices

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
@ 2015-07-21 16:25 Jarod Wilson
  2015-08-03  4:14 ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Jarod Wilson @ 2015-07-21 16:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Bjorn Helgaas, Rafael J. Wysocki, linux-pci

https://bugzilla.kernel.org/show_bug.cgi?id=99841

Seems like a read of all 1's from a register of a device that has gone
away should be taken as a sign that the device has gone away.
Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as
much with this snippet:

|IMPLEMENTATION NOTE
|Data Value of All 1’s
|Many platforms, including those supporting RP Extensions for DPC, can
|return a data value of all 1’s to software when an error is associated
|with a PCI Express Configuration, I/O, or Memory Read Request. During DPC,
|the Downstream Port discards Requests destined for the Link and completes
|them with an error (i.e., either with an Unsupported Request (UR) or
|Completer Abort (CA) Completion Status). By ending a series of MMIO or
|configuration space operations with a read to an address with a known
|data value not equal to all 1’s, software may determine if a Completer
|has been removed or DPC has been triggered.

I'm not sure the above is directly relevant to this case, but the same
principle (reading all 1's means the device is probably gone) seems to
hold.

This is based on part of a debugging patch Bjorn posted in the referenced
bugzilla, and its required to make the HP ZBook G2 I've got here not barf
when disconnecting a thunderbolt ethernet adapter and corrupt memory.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: linux-pci@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/pci/hotplug/pciehp_hpc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bb1d0de..c397851 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -547,6 +547,11 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 	do {
 		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &detected);

+		if (detected == (u16) ~0) {
+			ctrl_info(ctrl, "Device has gone away\n");
+			return IRQ_HANDLED;
+		}
+
 		detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
 			     PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
 			     PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
  2015-07-21 16:25 [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices Jarod Wilson
@ 2015-08-03  4:14 ` Bjorn Helgaas
       [not found]   ` <CAKfmpSdrdjyiQ-WWBdXFuPuTyo0WkTTsTX5ByHBt7haZeF0w=g@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2015-08-03  4:14 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=99841
> 
> Seems like a read of all 1's from a register of a device that has gone
> away should be taken as a sign that the device has gone away.
> Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as
> much with this snippet:
> 
> |IMPLEMENTATION NOTE
> |Data Value of All 1’s
> |Many platforms, including those supporting RP Extensions for DPC, can
> |return a data value of all 1’s to software when an error is associated
> |with a PCI Express Configuration, I/O, or Memory Read Request. During DPC,
> |the Downstream Port discards Requests destined for the Link and completes
> |them with an error (i.e., either with an Unsupported Request (UR) or
> |Completer Abort (CA) Completion Status). By ending a series of MMIO or
> |configuration space operations with a read to an address with a known
> |data value not equal to all 1’s, software may determine if a Completer
> |has been removed or DPC has been triggered.
> 
> I'm not sure the above is directly relevant to this case, but the same
> principle (reading all 1's means the device is probably gone) seems to
> hold.
> 
> This is based on part of a debugging patch Bjorn posted in the referenced
> bugzilla, and its required to make the HP ZBook G2 I've got here not barf
> when disconnecting a thunderbolt ethernet adapter and corrupt memory.
> 
> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> CC: Bjorn Helgaas <bhelgaas@google.com>
> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> CC: linux-pci@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>

Hi Jarod,

I think there are two issues here:

  1) pciehp doesn't handle all 1's correctly
  2) use-after-free of hotplug_slot

This patch is for the first issue.  I think it's correct, but I still
have a question or two.   I attached an updated version of the patch
and changelog.

Here's the path I think we're taking: 03:03.0 receives pciehp
interrupt for removal (link down and card not present), and we call
pciehp_unconfigure_device() for 05:00.0 and everything downstream from
it.  Part of this is removing 06:00.0.  I expected this would use this
path:

  pcie_port_remove_service            # .remove method for 06:00.0
    dev_printk("unloading service driver ...")
    pciehp_remove                     # hpdriver_portdrv.remove
      pciehp_release_ctrl
        pcie_shutdown_notification
          pcie_disable_notification
            pcie_write_cmd
              pcie_do_write_cmd(..., true)    # wait
                pcie_wait_cmd
                  pcie_poll_cmd
                    read PCI_EXP_SLTSTA        # would get 0xffff
                read PCI_EXPT_SLTCTL        # would get 0xffff

so I added checks for ~0 data in pcie_poll_cmd() and
pcie_do_write_cmd().

But the dmesg log shows that we were in pcie_isr(), and I don't
understand yet how we got there.  Can you help figure that out?  Maybe
put a dump_stack() in pcie_isr() or something?


OK, now for the second issue.  I think we have a lifetime issue with
the hotplug_slot structure.

  pcie_port_remove_service            # .remove method
    "unloading service driver ..."
    pciehp_remove                     # hpdriver_portdrv.remove
      cleanup_slot
        pci_hp_deregister(ctrl->slot->hotplug_slot)
          hotplug->release
            release_slot              # hotplug->release
              ctrl_dbg("release_slot: physical_slot = 9")
              kfree(hotplug_slot->ops)
              kfree(hotplug_slot->info)
              kfree(hotplug_slot)     # <--- FREE
          pci_slot->hotplug = NULL
          pci_destroy_slot
            kobject_put(pci_slot->kobj)
      pciehp_release_ctrl
        pcie_shutdown_notification
          pcie_disable_notification
            pcie_write_cmd
            ...

      pcie_isr                                # not sure how we got here
        ctrl_info("Latch open on Slot(%s)", slot_name(slot))  # <--- USE

I haven't chased this down completely either, but I'm pretty sure
we're looking at ctrl->slot->hotplug_slot to get the name after we've
already freed it, and this accounts for the garbage slot names we
print.

This seems like a pretty serious problem as well, but I don't
understand it well enough to propose a fix.

I suspect both of these issues affect all the hotplug drivers, not
just pciehp.

Bjorn


commit b24e231a9e846f0420746a56cea7a48b41f3798b
Author: Jarod Wilson <jarod@redhat.com>
Date:   Tue Jul 21 12:25:30 2015 -0400

    PCI: pciehp: Handle invalid data when reading from non-existent devices
    
    It's platform-dependent, but an MMIO read to a non-existent PCI device
    generally returns data with all bits set.  This happens when the host
    bridge or Root Complex times out waiting for a response from the device and
    fabricates return data to complete the CPU's read.
    
    One example, reported in the bugzilla below, involved this hierarchy:
    
      pci 0000:00:1c.0: PCI bridge to [bus 02-3a] Root Port
      pci 0000:02:00.0: PCI bridge to [bus 03-0a] Upstream Port
      pci 0000:03:03.0: PCI bridge to [bus 05-07] Downstream Port
      pci 0000:05:00.0: PCI bridge to [bus 06-07] Thunderbolt Upstream Port
      pci 0000:06:00.0: PCI bridge to [bus 07]    Thunderbolt Downstream Port
      pci 0000:07:00.0: BCM57762 NIC
    
    Unplugging the Thunderbolt switch and the NIC below it resulted in this:
    
      pciehp 0000:03:03.0: Surprise Removal
      tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
      pciehp 0000:06:00.0: unloading service driver pciehp
      pciehp 0000:06:00.0: pcie_isr: intr_loc 11f
      pciehp 0000:06:00.0: Switch interrupt received
      pciehp 0000:06:00.0: Latch open on Slot
      pciehp 0000:06:00.0: Attention button interrupt received
      pciehp 0000:06:00.0: Button pressed on Slot
      pciehp 0000:06:00.0: Presence/Notify input change
      pciehp 0000:06:00.0: Card present on Slot
      pciehp 0000:06:00.0: Power fault interrupt received
      pciehp 0000:06:00.0: Data Link Layer State change
      pciehp 0000:06:00.0: Link Up event
    
    The pciehp driver correctly noticed that the Thunderbolt switch (05:00.0
    and 06:00.0) and NIC (07:00.0) had been removed, and it called their driver
    remove methods.
    
    Since the NIC was already gone, tg3 received 0xffffffff when it tried to
    read from the device.  The resulting timeout is a tg3 issue and not of
    interest here.
    
    Similarly, since the 06:00.0 Thunderbolt switch was already gone,
    pcie_isr() received 0xffff when it tried to read PCI_EXP_SLTSTA, and pciehp
    thought that was valid status showing that many events had happened: the
    latch had been opened, the attention button had been pressed, a card was
    now present, and the link was now up.  These are all wrong, of course, but
    pciehp went on to try to power up and enumerate devices below the
    non-existent bridge:
    
      pciehp 0000:06:00.0: PCI slot - powering on due to button press
      pciehp 0000:06:00.0: Surprise Insertion
      pci 0000:07:00.0 id reading try 50 times with interval 20 ms to get ffffffff
    
    [bhelgaas: changelog, also check in pcie_poll_cmd() & pcie_do_write_cmd()]
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99841
    Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
    Signed-off-by: Jarod Wilson <jarod@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index daf54be..8f3d3cf 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -111,6 +111,12 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout)
 
 	while (true) {
 		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
+		if (slot_status == (u16) ~0) {
+			ctrl_info(ctrl, "%s: no response from device\n",
+				  __func__);
+			return 0;
+		}
+
 		if (slot_status & PCI_EXP_SLTSTA_CC) {
 			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 						   PCI_EXP_SLTSTA_CC);
@@ -186,6 +192,11 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
 	pcie_wait_cmd(ctrl);
 
 	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
+	if (slot_ctrl == (u16) ~0) {
+		ctrl_info(ctrl, "%s: no response from device\n", __func__);
+		goto out;
+	}
+
 	slot_ctrl &= ~mask;
 	slot_ctrl |= (cmd & mask);
 	ctrl->cmd_busy = 1;
@@ -201,6 +212,7 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
 	if (wait)
 		pcie_wait_cmd(ctrl);
 
+out:
 	mutex_unlock(&ctrl->ctrl_lock);
 }
 
@@ -542,6 +554,11 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 	intr_loc = 0;
 	do {
 		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &detected);
+		if (detected == (u16) ~0) {
+			ctrl_info(ctrl, "%s: no response from device\n",
+				  __func__);
+			return IRQ_HANDLED;
+		}
 
 		detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
 			     PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
       [not found]   ` <CAKfmpSdrdjyiQ-WWBdXFuPuTyo0WkTTsTX5ByHBt7haZeF0w=g@mail.gmail.com>
@ 2015-08-04 14:10     ` Jarod Wilson
  2015-08-04 16:56     ` Bjorn Helgaas
  1 sibling, 0 replies; 10+ messages in thread
From: Jarod Wilson @ 2015-08-04 14:10 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

Re-sending, hopefully without multi-part now, so it gets to the list.
Temporarily forced into using a different mail client...

On Tue, Aug 4, 2015 at 10:05 AM, Jarod Wilson <jarod@redhat.com> wrote:
> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:
>> > https://bugzilla.kernel.org/show_bug.cgi?id=99841
>> >
>> > Seems like a read of all 1's from a register of a device that has gone
>> > away should be taken as a sign that the device has gone away.
>> > Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests
>> > as
>> > much with this snippet:
>> >
>> > |IMPLEMENTATION NOTE
>> > |Data Value of All 1’s
>> > |Many platforms, including those supporting RP Extensions for DPC, can
>> > |return a data value of all 1’s to software when an error is associated
>> > |with a PCI Express Configuration, I/O, or Memory Read Request. During
>> > DPC,
>> > |the Downstream Port discards Requests destined for the Link and
>> > completes
>> > |them with an error (i.e., either with an Unsupported Request (UR) or
>> > |Completer Abort (CA) Completion Status). By ending a series of MMIO or
>> > |configuration space operations with a read to an address with a known
>> > |data value not equal to all 1’s, software may determine if a Completer
>> > |has been removed or DPC has been triggered.
>> >
>> > I'm not sure the above is directly relevant to this case, but the same
>> > principle (reading all 1's means the device is probably gone) seems to
>> > hold.
>> >
>> > This is based on part of a debugging patch Bjorn posted in the
>> > referenced
>> > bugzilla, and its required to make the HP ZBook G2 I've got here not
>> > barf
>> > when disconnecting a thunderbolt ethernet adapter and corrupt memory.
>> >
>> > Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
>> > CC: Bjorn Helgaas <bhelgaas@google.com>
>> > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > CC: linux-pci@vger.kernel.org
>> > Signed-off-by: Jarod Wilson <jarod@redhat.com>
>>
>> Hi Jarod,
>>
>> I think there are two issues here:
>>
>>   1) pciehp doesn't handle all 1's correctly
>>   2) use-after-free of hotplug_slot
>>
>> This patch is for the first issue.  I think it's correct, but I still
>> have a question or two.   I attached an updated version of the patch
>> and changelog.
>>
>> Here's the path I think we're taking: 03:03.0 receives pciehp
>> interrupt for removal (link down and card not present), and we call
>> pciehp_unconfigure_device() for 05:00.0 and everything downstream from
>> it.  Part of this is removing 06:00.0.  I expected this would use this
>> path:
>>
>>   pcie_port_remove_service            # .remove method for 06:00.0
>>     dev_printk("unloading service driver ...")
>>     pciehp_remove                     # hpdriver_portdrv.remove
>>       pciehp_release_ctrl
>>         pcie_shutdown_notification
>>           pcie_disable_notification
>>             pcie_write_cmd
>>               pcie_do_write_cmd(..., true)    # wait
>>                 pcie_wait_cmd
>>                   pcie_poll_cmd
>>                     read PCI_EXP_SLTSTA        # would get 0xffff
>>                 read PCI_EXPT_SLTCTL        # would get 0xffff
>>
>> so I added checks for ~0 data in pcie_poll_cmd() and
>> pcie_do_write_cmd().
>>
>> But the dmesg log shows that we were in pcie_isr(), and I don't
>> understand yet how we got there.  Can you help figure that out?  Maybe
>> put a dump_stack() in pcie_isr() or something?
>
>
> pciehp_power_thread looks to be the entry point.
>
> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108
> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change
> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3)
> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change
> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event
> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal
> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on
> slot(3): already powering off
> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling
> domain:bus:device=0000:05:00
> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device:
> domain:bus:dev = 0000:05:00
> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will
> not clear MAC_TX_MODE=ffffffff
> [ 1950.525986] [sched_delayed] sched: RT throttling activated
> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp
> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9
> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command 0x1038
> (issued 19092 msec ago)
> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response
> from device
> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification:
> SLOTCTRL d8 write cmd 0
> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away
> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted
> 3.10.0-302.el7.hp.x86_64 #1
> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70
> Ver. 01.07 02/26/2015
> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread
> [ 1950.545034]  0000000000000000 00000000f721dd13 ffff8804822ffa78
> ffffffff81632729
> [ 1950.545036]  ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e
> 00000000f721dd13
> [ 1950.545038]  ffff8804818fab00 ffff880468f70cc0 000000000000002e
> 0000000000000282
> [ 1950.545039] Call Trace:
> [ 1950.545044]  [<ffffffff81632729>] dump_stack+0x19/0x1b
> [ 1950.545046]  [<ffffffff8133bf64>] pcie_isr+0x264/0x280
> [ 1950.545048]  [<ffffffff8111b6b9>] __free_irq+0x189/0x220
> [ 1950.545049]  [<ffffffff8111b7e9>] free_irq+0x49/0xb0
> [ 1950.545051]  [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0
> [ 1950.545053]  [<ffffffff81339db3>] pciehp_remove+0x23/0x30
> [ 1950.545055]  [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60
> [ 1950.545058]  [<ffffffff813f39ff>] __device_release_driver+0x7f/0xf0
> [ 1950.545059]  [<ffffffff813f3a93>] device_release_driver+0x23/0x30
> [ 1950.545061]  [<ffffffff813f3208>] bus_remove_device+0x108/0x180
> [ 1950.545062]  [<ffffffff813ef725>] device_del+0x135/0x1f0
> [ 1950.545064]  [<ffffffff81334360>] ? cleanup_service_irqs+0x40/0x40
> [ 1950.545065]  [<ffffffff813ef7fe>] device_unregister+0x1e/0x60
> [ 1950.545066]  [<ffffffff81334381>] remove_iter+0x21/0x30
> [ 1950.545068]  [<ffffffff813ef093>] device_for_each_child+0x53/0x90
> [ 1950.545069]  [<ffffffff81334ae2>] pcie_port_device_remove+0x22/0x40
> [ 1950.545070]  [<ffffffff81334e1e>] pcie_portdrv_remove+0xe/0x10
> [ 1950.545074]  [<ffffffff813266eb>] pci_device_remove+0x3b/0xb0
> [ 1950.545075]  [<ffffffff813f39ff>] __device_release_driver+0x7f/0xf0
> [ 1950.545077]  [<ffffffff813f3a93>] device_release_driver+0x23/0x30
> [ 1950.545078]  [<ffffffff8131fa54>] pci_stop_bus_device+0x94/0xa0
> [ 1950.545079]  [<ffffffff8131f9fb>] pci_stop_bus_device+0x3b/0xa0
> [ 1950.545081]  [<ffffffff8131fb42>]
> pci_stop_and_remove_bus_device+0x12/0x20
> [ 1950.545082]  [<ffffffff8133b8a0>] pciehp_unconfigure_device+0xb0/0x1b0
> [ 1950.545084]  [<ffffffff8133b282>] pciehp_disable_slot+0x52/0xd0
> [ 1950.545085]  [<ffffffff8133b3ea>] pciehp_power_thread+0xea/0x150
> [ 1950.545088]  [<ffffffff8109c57b>] process_one_work+0x17b/0x470
> [ 1950.545089]  [<ffffffff8109d34b>] worker_thread+0x11b/0x400
> [ 1950.545090]  [<ffffffff8109d230>] ? rescuer_thread+0x400/0x400
> [ 1950.545093]  [<ffffffff810a4a6f>] kthread+0xcf/0xe0
> [ 1950.545094]  [<ffffffff810a49a0>] ? kthread_create_on_node+0x140/0x140
> [ 1950.545097]  [<ffffffff81642d58>] ret_from_fork+0x58/0x90
> [ 1950.545099]  [<ffffffff810a49a0>] ? kthread_create_on_node+0x140/0x140
>
>
> I get pretty much identical stacks for these devices as well, in this order
> after the above:
> [ 1959.103493] pciehp 0000:03:06.0:pcie24: Device has gone away
> [ 1959.104152] pciehp 0000:03:05.0:pcie24: Device has gone away
> [ 1959.104736] pciehp 0000:03:04.0:pcie24: Device has gone away
> [ 1959.105265] pciehp 0000:03:03.0:pcie24: Device has gone away
> [ 1959.105841] pciehp 0000:03:00.0:pcie24: Device has gone away
>
>
> --jarod
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
       [not found]   ` <CAKfmpSdrdjyiQ-WWBdXFuPuTyo0WkTTsTX5ByHBt7haZeF0w=g@mail.gmail.com>
  2015-08-04 14:10     ` Jarod Wilson
@ 2015-08-04 16:56     ` Bjorn Helgaas
  2015-08-04 17:21       ` Jarod Wilson
  1 sibling, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2015-08-04 16:56 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote:
> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> 
> > On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=99841
> > >
> > > Seems like a read of all 1's from a register of a device that has gone
> > > away should be taken as a sign that the device has gone away.
> > > Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as
> > > much with this snippet:
> > >
> > > |IMPLEMENTATION NOTE
> > > |Data Value of All 1’s
> > > |Many platforms, including those supporting RP Extensions for DPC, can
> > > |return a data value of all 1’s to software when an error is associated
> > > |with a PCI Express Configuration, I/O, or Memory Read Request. During
> > DPC,
> > > |the Downstream Port discards Requests destined for the Link and
> > completes
> > > |them with an error (i.e., either with an Unsupported Request (UR) or
> > > |Completer Abort (CA) Completion Status). By ending a series of MMIO or
> > > |configuration space operations with a read to an address with a known
> > > |data value not equal to all 1’s, software may determine if a Completer
> > > |has been removed or DPC has been triggered.
> > >
> > > I'm not sure the above is directly relevant to this case, but the same
> > > principle (reading all 1's means the device is probably gone) seems to
> > > hold.
> > >
> > > This is based on part of a debugging patch Bjorn posted in the referenced
> > > bugzilla, and its required to make the HP ZBook G2 I've got here not barf
> > > when disconnecting a thunderbolt ethernet adapter and corrupt memory.
> > >
> > > Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> > > CC: Bjorn Helgaas <bhelgaas@google.com>
> > > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > CC: linux-pci@vger.kernel.org
> > > Signed-off-by: Jarod Wilson <jarod@redhat.com>
> >
> > Hi Jarod,
> >
> > I think there are two issues here:
> >
> >   1) pciehp doesn't handle all 1's correctly
> >   2) use-after-free of hotplug_slot
> >
> > This patch is for the first issue.  I think it's correct, but I still
> > have a question or two.   I attached an updated version of the patch
> > and changelog.
> >
> > Here's the path I think we're taking: 03:03.0 receives pciehp
> > interrupt for removal (link down and card not present), and we call
> > pciehp_unconfigure_device() for 05:00.0 and everything downstream from
> > it.  Part of this is removing 06:00.0.  I expected this would use this
> > path:
> >
> >   pcie_port_remove_service            # .remove method for 06:00.0
> >     dev_printk("unloading service driver ...")
> >     pciehp_remove                     # hpdriver_portdrv.remove
> >       pciehp_release_ctrl
> >         pcie_shutdown_notification
> >           pcie_disable_notification
> >             pcie_write_cmd
> >               pcie_do_write_cmd(..., true)    # wait
> >                 pcie_wait_cmd
> >                   pcie_poll_cmd
> >                     read PCI_EXP_SLTSTA        # would get 0xffff
> >                 read PCI_EXPT_SLTCTL        # would get 0xffff
> >
> > so I added checks for ~0 data in pcie_poll_cmd() and
> > pcie_do_write_cmd().
> >
> > But the dmesg log shows that we were in pcie_isr(), and I don't
> > understand yet how we got there.  Can you help figure that out?  Maybe
> > put a dump_stack() in pcie_isr() or something?
> 
> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108
> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change
> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3)
> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change
> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event
> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal
> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on
> slot(3): already powering off
> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling
> domain:bus:device=0000:05:00
> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device:
> domain:bus:dev = 0000:05:00
> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE
> will not clear MAC_TX_MODE=ffffffff
> [ 1950.525986] [sched_delayed] sched: RT throttling activated
> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp
> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9
> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command
> 0x1038 (issued 19092 msec ago)
> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response
> from device
> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification:
> SLOTCTRL d8 write cmd 0
> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away
> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted
> 3.10.0-302.el7.hp.x86_64 #1
> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70
> Ver. 01.07 02/26/2015
> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread
> [ 1950.545034]  0000000000000000 00000000f721dd13 ffff8804822ffa78
> ffffffff81632729
> [ 1950.545036]  ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e
> 00000000f721dd13
> [ 1950.545038]  ffff8804818fab00 ffff880468f70cc0 000000000000002e
> 0000000000000282
> [ 1950.545039] Call Trace:
> [ 1950.545044]  [<ffffffff81632729>] dump_stack+0x19/0x1b
> [ 1950.545046]  [<ffffffff8133bf64>] pcie_isr+0x264/0x280
> [ 1950.545048]  [<ffffffff8111b6b9>] __free_irq+0x189/0x220
> [ 1950.545049]  [<ffffffff8111b7e9>] free_irq+0x49/0xb0
> [ 1950.545051]  [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0
> [ 1950.545053]  [<ffffffff81339db3>] pciehp_remove+0x23/0x30
> [ 1950.545055]  [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60

Do you have CONFIG_DEBUG_SHIRQ set?  When CONFIG_DEBUG_SHIRQ is set,
__free_irq() calls the ISR one last time.  It does make sense that the
driver must be prepared for the ISR to be called at any time before
free_irq() returns.  I just didn't see a path for the already-removed
device to generate an actual interrupt.

Can you try the version I posted, with the additional tests in
pcie_poll_cmd() and pcie_do_write_cmd()?  We should try to read from
the device there, even before we free the IRQ, so we might see several
messages.  Maybe there's a way we can be smarter about bailing out
there.

Bjorn

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
  2015-08-04 16:56     ` Bjorn Helgaas
@ 2015-08-04 17:21       ` Jarod Wilson
  2015-08-04 17:51         ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Jarod Wilson @ 2015-08-04 17:21 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On 8/4/2015 12:56 PM, Bjorn Helgaas wrote:
> On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote:
>> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>>> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=99841
>>>>
>>>> Seems like a read of all 1's from a register of a device that has gone
>>>> away should be taken as a sign that the device has gone away.
>>>> Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as
>>>> much with this snippet:
>>>>
>>>> |IMPLEMENTATION NOTE
>>>> |Data Value of All 1’s
>>>> |Many platforms, including those supporting RP Extensions for DPC, can
>>>> |return a data value of all 1’s to software when an error is associated
>>>> |with a PCI Express Configuration, I/O, or Memory Read Request. During
>>> DPC,
>>>> |the Downstream Port discards Requests destined for the Link and
>>> completes
>>>> |them with an error (i.e., either with an Unsupported Request (UR) or
>>>> |Completer Abort (CA) Completion Status). By ending a series of MMIO or
>>>> |configuration space operations with a read to an address with a known
>>>> |data value not equal to all 1’s, software may determine if a Completer
>>>> |has been removed or DPC has been triggered.
>>>>
>>>> I'm not sure the above is directly relevant to this case, but the same
>>>> principle (reading all 1's means the device is probably gone) seems to
>>>> hold.
>>>>
>>>> This is based on part of a debugging patch Bjorn posted in the referenced
>>>> bugzilla, and its required to make the HP ZBook G2 I've got here not barf
>>>> when disconnecting a thunderbolt ethernet adapter and corrupt memory.
>>>>
>>>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
>>>> CC: Bjorn Helgaas <bhelgaas@google.com>
>>>> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>> CC: linux-pci@vger.kernel.org
>>>> Signed-off-by: Jarod Wilson <jarod@redhat.com>
>>>
>>> Hi Jarod,
>>>
>>> I think there are two issues here:
>>>
>>>    1) pciehp doesn't handle all 1's correctly
>>>    2) use-after-free of hotplug_slot
>>>
>>> This patch is for the first issue.  I think it's correct, but I still
>>> have a question or two.   I attached an updated version of the patch
>>> and changelog.
>>>
>>> Here's the path I think we're taking: 03:03.0 receives pciehp
>>> interrupt for removal (link down and card not present), and we call
>>> pciehp_unconfigure_device() for 05:00.0 and everything downstream from
>>> it.  Part of this is removing 06:00.0.  I expected this would use this
>>> path:
>>>
>>>    pcie_port_remove_service            # .remove method for 06:00.0
>>>      dev_printk("unloading service driver ...")
>>>      pciehp_remove                     # hpdriver_portdrv.remove
>>>        pciehp_release_ctrl
>>>          pcie_shutdown_notification
>>>            pcie_disable_notification
>>>              pcie_write_cmd
>>>                pcie_do_write_cmd(..., true)    # wait
>>>                  pcie_wait_cmd
>>>                    pcie_poll_cmd
>>>                      read PCI_EXP_SLTSTA        # would get 0xffff
>>>                  read PCI_EXPT_SLTCTL        # would get 0xffff
>>>
>>> so I added checks for ~0 data in pcie_poll_cmd() and
>>> pcie_do_write_cmd().
>>>
>>> But the dmesg log shows that we were in pcie_isr(), and I don't
>>> understand yet how we got there.  Can you help figure that out?  Maybe
>>> put a dump_stack() in pcie_isr() or something?
>>
>> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108
>> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change
>> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3)
>> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change
>> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event
>> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal
>> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on
>> slot(3): already powering off
>> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling
>> domain:bus:device=0000:05:00
>> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device:
>> domain:bus:dev = 0000:05:00
>> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE
>> will not clear MAC_TX_MODE=ffffffff
>> [ 1950.525986] [sched_delayed] sched: RT throttling activated
>> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp
>> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9
>> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command
>> 0x1038 (issued 19092 msec ago)
>> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response
>> from device
>> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification:
>> SLOTCTRL d8 write cmd 0
>> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away
>> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted
>> 3.10.0-302.el7.hp.x86_64 #1
>> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70
>> Ver. 01.07 02/26/2015
>> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread
>> [ 1950.545034]  0000000000000000 00000000f721dd13 ffff8804822ffa78
>> ffffffff81632729
>> [ 1950.545036]  ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e
>> 00000000f721dd13
>> [ 1950.545038]  ffff8804818fab00 ffff880468f70cc0 000000000000002e
>> 0000000000000282
>> [ 1950.545039] Call Trace:
>> [ 1950.545044]  [<ffffffff81632729>] dump_stack+0x19/0x1b
>> [ 1950.545046]  [<ffffffff8133bf64>] pcie_isr+0x264/0x280
>> [ 1950.545048]  [<ffffffff8111b6b9>] __free_irq+0x189/0x220
>> [ 1950.545049]  [<ffffffff8111b7e9>] free_irq+0x49/0xb0
>> [ 1950.545051]  [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0
>> [ 1950.545053]  [<ffffffff81339db3>] pciehp_remove+0x23/0x30
>> [ 1950.545055]  [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60
>
> Do you have CONFIG_DEBUG_SHIRQ set?  When CONFIG_DEBUG_SHIRQ is set,
> __free_irq() calls the ISR one last time.  It does make sense that the
> driver must be prepared for the ISR to be called at any time before
> free_irq() returns.  I just didn't see a path for the already-removed
> device to generate an actual interrupt.

Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're 
hitting the action->handler(irq, dev_id) bit in __free_irq(), after 
we've already done a bunch of teardown/removal?

> Can you try the version I posted, with the additional tests in
> pcie_poll_cmd() and pcie_do_write_cmd()?  We should try to read from
> the device there, even before we free the IRQ, so we might see several
> messages.  Maybe there's a way we can be smarter about bailing out
> there.

The above was with your additions munged in with the older patch, I 
actually do see "pcie_do_write_cmd: no response from device" just two 
lines ahead of each "Device has gone away" message from pcie_isr().

pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device
pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 write 
cmd 0
pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()


-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
  2015-08-04 17:21       ` Jarod Wilson
@ 2015-08-04 17:51         ` Bjorn Helgaas
  2015-08-04 18:46           ` Jarod Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2015-08-04 17:51 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On Tue, Aug 04, 2015 at 01:21:23PM -0400, Jarod Wilson wrote:
> On 8/4/2015 12:56 PM, Bjorn Helgaas wrote:
> >On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote:
> >>On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >>>On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:
> >>>>https://bugzilla.kernel.org/show_bug.cgi?id=99841
> >>>>
> >>>>Seems like a read of all 1's from a register of a device that has gone
> >>>>away should be taken as a sign that the device has gone away.
> >>>>Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as
> >>>>much with this snippet:
> >>>>
> >>>>|IMPLEMENTATION NOTE
> >>>>|Data Value of All 1’s
> >>>>|Many platforms, including those supporting RP Extensions for DPC, can
> >>>>|return a data value of all 1’s to software when an error is associated
> >>>>|with a PCI Express Configuration, I/O, or Memory Read Request. During
> >>>DPC,
> >>>>|the Downstream Port discards Requests destined for the Link and
> >>>completes
> >>>>|them with an error (i.e., either with an Unsupported Request (UR) or
> >>>>|Completer Abort (CA) Completion Status). By ending a series of MMIO or
> >>>>|configuration space operations with a read to an address with a known
> >>>>|data value not equal to all 1’s, software may determine if a Completer
> >>>>|has been removed or DPC has been triggered.
> >>>>
> >>>>I'm not sure the above is directly relevant to this case, but the same
> >>>>principle (reading all 1's means the device is probably gone) seems to
> >>>>hold.
> >>>>
> >>>>This is based on part of a debugging patch Bjorn posted in the referenced
> >>>>bugzilla, and its required to make the HP ZBook G2 I've got here not barf
> >>>>when disconnecting a thunderbolt ethernet adapter and corrupt memory.
> >>>>
> >>>>Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> >>>>CC: Bjorn Helgaas <bhelgaas@google.com>
> >>>>CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>>CC: linux-pci@vger.kernel.org
> >>>>Signed-off-by: Jarod Wilson <jarod@redhat.com>
> >>>
> >>>Hi Jarod,
> >>>
> >>>I think there are two issues here:
> >>>
> >>>   1) pciehp doesn't handle all 1's correctly
> >>>   2) use-after-free of hotplug_slot
> >>>
> >>>This patch is for the first issue.  I think it's correct, but I still
> >>>have a question or two.   I attached an updated version of the patch
> >>>and changelog.
> >>>
> >>>Here's the path I think we're taking: 03:03.0 receives pciehp
> >>>interrupt for removal (link down and card not present), and we call
> >>>pciehp_unconfigure_device() for 05:00.0 and everything downstream from
> >>>it.  Part of this is removing 06:00.0.  I expected this would use this
> >>>path:
> >>>
> >>>   pcie_port_remove_service            # .remove method for 06:00.0
> >>>     dev_printk("unloading service driver ...")
> >>>     pciehp_remove                     # hpdriver_portdrv.remove
> >>>       pciehp_release_ctrl
> >>>         pcie_shutdown_notification
> >>>           pcie_disable_notification
> >>>             pcie_write_cmd
> >>>               pcie_do_write_cmd(..., true)    # wait
> >>>                 pcie_wait_cmd
> >>>                   pcie_poll_cmd
> >>>                     read PCI_EXP_SLTSTA        # would get 0xffff
> >>>                 read PCI_EXPT_SLTCTL        # would get 0xffff
> >>>
> >>>so I added checks for ~0 data in pcie_poll_cmd() and
> >>>pcie_do_write_cmd().
> >>>
> >>>But the dmesg log shows that we were in pcie_isr(), and I don't
> >>>understand yet how we got there.  Can you help figure that out?  Maybe
> >>>put a dump_stack() in pcie_isr() or something?
> >>
> >>[ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108
> >>[ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change
> >>[ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3)
> >>[ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change
> >>[ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event
> >>[ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal
> >>[ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on
> >>slot(3): already powering off
> >>[ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling
> >>domain:bus:device=0000:05:00
> >>[ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device:
> >>domain:bus:dev = 0000:05:00
> >>[ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE
> >>will not clear MAC_TX_MODE=ffffffff
> >>[ 1950.525986] [sched_delayed] sched: RT throttling activated
> >>[ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp
> >>[ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9
> >>[ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command
> >>0x1038 (issued 19092 msec ago)
> >>[ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response
> >>from device
> >>[ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification:
> >>SLOTCTRL d8 write cmd 0
> >>[ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away
> >>[ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted
> >>3.10.0-302.el7.hp.x86_64 #1
> >>[ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70
> >>Ver. 01.07 02/26/2015
> >>[ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread
> >>[ 1950.545034]  0000000000000000 00000000f721dd13 ffff8804822ffa78
> >>ffffffff81632729
> >>[ 1950.545036]  ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e
> >>00000000f721dd13
> >>[ 1950.545038]  ffff8804818fab00 ffff880468f70cc0 000000000000002e
> >>0000000000000282
> >>[ 1950.545039] Call Trace:
> >>[ 1950.545044]  [<ffffffff81632729>] dump_stack+0x19/0x1b
> >>[ 1950.545046]  [<ffffffff8133bf64>] pcie_isr+0x264/0x280
> >>[ 1950.545048]  [<ffffffff8111b6b9>] __free_irq+0x189/0x220
> >>[ 1950.545049]  [<ffffffff8111b7e9>] free_irq+0x49/0xb0
> >>[ 1950.545051]  [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0
> >>[ 1950.545053]  [<ffffffff81339db3>] pciehp_remove+0x23/0x30
> >>[ 1950.545055]  [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60
> >
> >Do you have CONFIG_DEBUG_SHIRQ set?  When CONFIG_DEBUG_SHIRQ is set,
> >__free_irq() calls the ISR one last time.  It does make sense that the
> >driver must be prepared for the ISR to be called at any time before
> >free_irq() returns.  I just didn't see a path for the already-removed
> >device to generate an actual interrupt.
> 
> Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're
> hitting the action->handler(irq, dev_id) bit in __free_irq(), after
> we've already done a bunch of teardown/removal?

OK, good, that explains why we called pcie_isr().

> >Can you try the version I posted, with the additional tests in
> >pcie_poll_cmd() and pcie_do_write_cmd()?  We should try to read from
> >the device there, even before we free the IRQ, so we might see several
> >messages.  Maybe there's a way we can be smarter about bailing out
> >there.
> 
> The above was with your additions munged in with the older patch, I
> actually do see "pcie_do_write_cmd: no response from device" just
> two lines ahead of each "Device has gone away" message from
> pcie_isr().
> 
> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device
> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8
> write cmd 0
> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()

Oh, sorry!  I should have noticed that.  I just wanted to make sure I
didn't cause a flood of extra messages.

I think I'll merge this version (with all three checks).  We still have a
slot lifetime issue, but that's a separate problem.

Bjorn

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
  2015-08-04 17:51         ` Bjorn Helgaas
@ 2015-08-04 18:46           ` Jarod Wilson
  2015-08-04 19:27             ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Jarod Wilson @ 2015-08-04 18:46 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On 8/4/2015 1:51 PM, Bjorn Helgaas wrote:
> On Tue, Aug 04, 2015 at 01:21:23PM -0400, Jarod Wilson wrote:
>> On 8/4/2015 12:56 PM, Bjorn Helgaas wrote:
>>> On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote:
>>>> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>>
>>>>> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=99841
>>>>>>
>>>>>> Seems like a read of all 1's from a register of a device that has gone
>>>>>> away should be taken as a sign that the device has gone away.
>>>>>> Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as
>>>>>> much with this snippet:
>>>>>>
>>>>>> |IMPLEMENTATION NOTE
>>>>>> |Data Value of All 1’s
>>>>>> |Many platforms, including those supporting RP Extensions for DPC, can
>>>>>> |return a data value of all 1’s to software when an error is associated
>>>>>> |with a PCI Express Configuration, I/O, or Memory Read Request. During
>>>>> DPC,
>>>>>> |the Downstream Port discards Requests destined for the Link and
>>>>> completes
>>>>>> |them with an error (i.e., either with an Unsupported Request (UR) or
>>>>>> |Completer Abort (CA) Completion Status). By ending a series of MMIO or
>>>>>> |configuration space operations with a read to an address with a known
>>>>>> |data value not equal to all 1’s, software may determine if a Completer
>>>>>> |has been removed or DPC has been triggered.
>>>>>>
>>>>>> I'm not sure the above is directly relevant to this case, but the same
>>>>>> principle (reading all 1's means the device is probably gone) seems to
>>>>>> hold.
>>>>>>
>>>>>> This is based on part of a debugging patch Bjorn posted in the referenced
>>>>>> bugzilla, and its required to make the HP ZBook G2 I've got here not barf
>>>>>> when disconnecting a thunderbolt ethernet adapter and corrupt memory.
>>>>>>
>>>>>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
>>>>>> CC: Bjorn Helgaas <bhelgaas@google.com>
>>>>>> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>>> CC: linux-pci@vger.kernel.org
>>>>>> Signed-off-by: Jarod Wilson <jarod@redhat.com>
>>>>>
>>>>> Hi Jarod,
>>>>>
>>>>> I think there are two issues here:
>>>>>
>>>>>    1) pciehp doesn't handle all 1's correctly
>>>>>    2) use-after-free of hotplug_slot
>>>>>
>>>>> This patch is for the first issue.  I think it's correct, but I still
>>>>> have a question or two.   I attached an updated version of the patch
>>>>> and changelog.
>>>>>
>>>>> Here's the path I think we're taking: 03:03.0 receives pciehp
>>>>> interrupt for removal (link down and card not present), and we call
>>>>> pciehp_unconfigure_device() for 05:00.0 and everything downstream from
>>>>> it.  Part of this is removing 06:00.0.  I expected this would use this
>>>>> path:
>>>>>
>>>>>    pcie_port_remove_service            # .remove method for 06:00.0
>>>>>      dev_printk("unloading service driver ...")
>>>>>      pciehp_remove                     # hpdriver_portdrv.remove
>>>>>        pciehp_release_ctrl
>>>>>          pcie_shutdown_notification
>>>>>            pcie_disable_notification
>>>>>              pcie_write_cmd
>>>>>                pcie_do_write_cmd(..., true)    # wait
>>>>>                  pcie_wait_cmd
>>>>>                    pcie_poll_cmd
>>>>>                      read PCI_EXP_SLTSTA        # would get 0xffff
>>>>>                  read PCI_EXPT_SLTCTL        # would get 0xffff
>>>>>
>>>>> so I added checks for ~0 data in pcie_poll_cmd() and
>>>>> pcie_do_write_cmd().
>>>>>
>>>>> But the dmesg log shows that we were in pcie_isr(), and I don't
>>>>> understand yet how we got there.  Can you help figure that out?  Maybe
>>>>> put a dump_stack() in pcie_isr() or something?
>>>>
>>>> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108
>>>> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change
>>>> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3)
>>>> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change
>>>> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event
>>>> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal
>>>> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on
>>>> slot(3): already powering off
>>>> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling
>>>> domain:bus:device=0000:05:00
>>>> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device:
>>>> domain:bus:dev = 0000:05:00
>>>> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE
>>>> will not clear MAC_TX_MODE=ffffffff
>>>> [ 1950.525986] [sched_delayed] sched: RT throttling activated
>>>> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp
>>>> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9
>>>> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command
>>>> 0x1038 (issued 19092 msec ago)
>>>> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response
>>> >from device
>>>> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification:
>>>> SLOTCTRL d8 write cmd 0
>>>> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away
>>>> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted
>>>> 3.10.0-302.el7.hp.x86_64 #1
>>>> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70
>>>> Ver. 01.07 02/26/2015
>>>> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread
>>>> [ 1950.545034]  0000000000000000 00000000f721dd13 ffff8804822ffa78
>>>> ffffffff81632729
>>>> [ 1950.545036]  ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e
>>>> 00000000f721dd13
>>>> [ 1950.545038]  ffff8804818fab00 ffff880468f70cc0 000000000000002e
>>>> 0000000000000282
>>>> [ 1950.545039] Call Trace:
>>>> [ 1950.545044]  [<ffffffff81632729>] dump_stack+0x19/0x1b
>>>> [ 1950.545046]  [<ffffffff8133bf64>] pcie_isr+0x264/0x280
>>>> [ 1950.545048]  [<ffffffff8111b6b9>] __free_irq+0x189/0x220
>>>> [ 1950.545049]  [<ffffffff8111b7e9>] free_irq+0x49/0xb0
>>>> [ 1950.545051]  [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0
>>>> [ 1950.545053]  [<ffffffff81339db3>] pciehp_remove+0x23/0x30
>>>> [ 1950.545055]  [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60
>>>
>>> Do you have CONFIG_DEBUG_SHIRQ set?  When CONFIG_DEBUG_SHIRQ is set,
>>> __free_irq() calls the ISR one last time.  It does make sense that the
>>> driver must be prepared for the ISR to be called at any time before
>>> free_irq() returns.  I just didn't see a path for the already-removed
>>> device to generate an actual interrupt.
>>
>> Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're
>> hitting the action->handler(irq, dev_id) bit in __free_irq(), after
>> we've already done a bunch of teardown/removal?
>
> OK, good, that explains why we called pcie_isr().
>
>>> Can you try the version I posted, with the additional tests in
>>> pcie_poll_cmd() and pcie_do_write_cmd()?  We should try to read from
>>> the device there, even before we free the IRQ, so we might see several
>>> messages.  Maybe there's a way we can be smarter about bailing out
>>> there.
>>
>> The above was with your additions munged in with the older patch, I
>> actually do see "pcie_do_write_cmd: no response from device" just
>> two lines ahead of each "Device has gone away" message from
>> pcie_isr().
>>
>> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device
>> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8
>> write cmd 0
>> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()
>
> Oh, sorry!  I should have noticed that.  I just wanted to make sure I
> didn't cause a flood of extra messages.
>
> I think I'll merge this version (with all three checks).  We still have a
> slot lifetime issue, but that's a separate problem.

Sounds good to me, thanks much for your help on this.

Do we really still have a slot lifetime issue though? It looks to be the 
path from pciehp_release_ctrl that leads to free_irq and __free_irq 
calling pcie_isr one last time, and there's a ctrl_info("Latch %s on 
Slot(%s)", open ? ..., slot_name(slot)); in pcie_isr *if* we aren't 
bailing when we read all 1's from PCI_EXP_SLTSTA. I think when we bail 
early, we should never see the subsequent attempt to read the freed slot.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
  2015-08-04 18:46           ` Jarod Wilson
@ 2015-08-04 19:27             ` Bjorn Helgaas
  2015-08-04 20:22               ` Jarod Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2015-08-04 19:27 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On Tue, Aug 4, 2015 at 1:46 PM, Jarod Wilson <jarod@redhat.com> wrote:
> On 8/4/2015 1:51 PM, Bjorn Helgaas wrote:
>>
>> On Tue, Aug 04, 2015 at 01:21:23PM -0400, Jarod Wilson wrote:
>>>
>>> On 8/4/2015 12:56 PM, Bjorn Helgaas wrote:
>>>>
>>>> On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote:
>>>>>
>>>>> On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:
>>>>>>>
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=99841
>>>>>>>
>>>>>>> Seems like a read of all 1's from a register of a device that has
>>>>>>> gone
>>>>>>> away should be taken as a sign that the device has gone away.
>>>>>>> Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014)
>>>>>>> suggests as
>>>>>>> much with this snippet:
>>>>>>>
>>>>>>> |IMPLEMENTATION NOTE
>>>>>>> |Data Value of All 1’s
>>>>>>> |Many platforms, including those supporting RP Extensions for DPC,
>>>>>>> can
>>>>>>> |return a data value of all 1’s to software when an error is
>>>>>>> associated
>>>>>>> |with a PCI Express Configuration, I/O, or Memory Read Request.
>>>>>>> During
>>>>>>
>>>>>> DPC,
>>>>>>>
>>>>>>> |the Downstream Port discards Requests destined for the Link and
>>>>>>
>>>>>> completes
>>>>>>>
>>>>>>> |them with an error (i.e., either with an Unsupported Request (UR) or
>>>>>>> |Completer Abort (CA) Completion Status). By ending a series of MMIO
>>>>>>> or
>>>>>>> |configuration space operations with a read to an address with a
>>>>>>> known
>>>>>>> |data value not equal to all 1’s, software may determine if a
>>>>>>> Completer
>>>>>>> |has been removed or DPC has been triggered.
>>>>>>>
>>>>>>> I'm not sure the above is directly relevant to this case, but the
>>>>>>> same
>>>>>>> principle (reading all 1's means the device is probably gone) seems
>>>>>>> to
>>>>>>> hold.
>>>>>>>
>>>>>>> This is based on part of a debugging patch Bjorn posted in the
>>>>>>> referenced
>>>>>>> bugzilla, and its required to make the HP ZBook G2 I've got here not
>>>>>>> barf
>>>>>>> when disconnecting a thunderbolt ethernet adapter and corrupt memory.
>>>>>>>
>>>>>>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
>>>>>>> CC: Bjorn Helgaas <bhelgaas@google.com>
>>>>>>> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>>>> CC: linux-pci@vger.kernel.org
>>>>>>> Signed-off-by: Jarod Wilson <jarod@redhat.com>
>>>>>>
>>>>>>
>>>>>> Hi Jarod,
>>>>>>
>>>>>> I think there are two issues here:
>>>>>>
>>>>>>    1) pciehp doesn't handle all 1's correctly
>>>>>>    2) use-after-free of hotplug_slot
>>>>>>
>>>>>> This patch is for the first issue.  I think it's correct, but I still
>>>>>> have a question or two.   I attached an updated version of the patch
>>>>>> and changelog.
>>>>>>
>>>>>> Here's the path I think we're taking: 03:03.0 receives pciehp
>>>>>> interrupt for removal (link down and card not present), and we call
>>>>>> pciehp_unconfigure_device() for 05:00.0 and everything downstream from
>>>>>> it.  Part of this is removing 06:00.0.  I expected this would use this
>>>>>> path:
>>>>>>
>>>>>>    pcie_port_remove_service            # .remove method for 06:00.0
>>>>>>      dev_printk("unloading service driver ...")
>>>>>>      pciehp_remove                     # hpdriver_portdrv.remove
>>>>>>        pciehp_release_ctrl
>>>>>>          pcie_shutdown_notification
>>>>>>            pcie_disable_notification
>>>>>>              pcie_write_cmd
>>>>>>                pcie_do_write_cmd(..., true)    # wait
>>>>>>                  pcie_wait_cmd
>>>>>>                    pcie_poll_cmd
>>>>>>                      read PCI_EXP_SLTSTA        # would get 0xffff
>>>>>>                  read PCI_EXPT_SLTCTL        # would get 0xffff
>>>>>>
>>>>>> so I added checks for ~0 data in pcie_poll_cmd() and
>>>>>> pcie_do_write_cmd().
>>>>>>
>>>>>> But the dmesg log shows that we were in pcie_isr(), and I don't
>>>>>> understand yet how we got there.  Can you help figure that out?  Maybe
>>>>>> put a dump_stack() in pcie_isr() or something?
>>>>>
>>>>>
>>>>> [ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108
>>>>> [ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change
>>>>> [ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3)
>>>>> [ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change
>>>>> [ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event
>>>>> [ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal
>>>>> [ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on
>>>>> slot(3): already powering off
>>>>> [ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling
>>>>> domain:bus:device=0000:05:00
>>>>> [ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device:
>>>>> domain:bus:dev = 0000:05:00
>>>>> [ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE
>>>>> will not clear MAC_TX_MODE=ffffffff
>>>>> [ 1950.525986] [sched_delayed] sched: RT throttling activated
>>>>> [ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver
>>>>> pciehp
>>>>> [ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot
>>>>> = 9
>>>>> [ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command
>>>>> 0x1038 (issued 19092 msec ago)
>>>>> [ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no
>>>>> response
>>>>
>>>> >from device
>>>>>
>>>>> [ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification:
>>>>> SLOTCTRL d8 write cmd 0
>>>>> [ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away
>>>>> [ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted
>>>>> 3.10.0-302.el7.hp.x86_64 #1
>>>>> [ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS
>>>>> M70
>>>>> Ver. 01.07 02/26/2015
>>>>> [ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread
>>>>> [ 1950.545034]  0000000000000000 00000000f721dd13 ffff8804822ffa78
>>>>> ffffffff81632729
>>>>> [ 1950.545036]  ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e
>>>>> 00000000f721dd13
>>>>> [ 1950.545038]  ffff8804818fab00 ffff880468f70cc0 000000000000002e
>>>>> 0000000000000282
>>>>> [ 1950.545039] Call Trace:
>>>>> [ 1950.545044]  [<ffffffff81632729>] dump_stack+0x19/0x1b
>>>>> [ 1950.545046]  [<ffffffff8133bf64>] pcie_isr+0x264/0x280
>>>>> [ 1950.545048]  [<ffffffff8111b6b9>] __free_irq+0x189/0x220
>>>>> [ 1950.545049]  [<ffffffff8111b7e9>] free_irq+0x49/0xb0
>>>>> [ 1950.545051]  [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0
>>>>> [ 1950.545053]  [<ffffffff81339db3>] pciehp_remove+0x23/0x30
>>>>> [ 1950.545055]  [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60
>>>>
>>>>
>>>> Do you have CONFIG_DEBUG_SHIRQ set?  When CONFIG_DEBUG_SHIRQ is set,
>>>> __free_irq() calls the ISR one last time.  It does make sense that the
>>>> driver must be prepared for the ISR to be called at any time before
>>>> free_irq() returns.  I just didn't see a path for the already-removed
>>>> device to generate an actual interrupt.
>>>
>>>
>>> Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're
>>> hitting the action->handler(irq, dev_id) bit in __free_irq(), after
>>> we've already done a bunch of teardown/removal?
>>
>>
>> OK, good, that explains why we called pcie_isr().
>>
>>>> Can you try the version I posted, with the additional tests in
>>>> pcie_poll_cmd() and pcie_do_write_cmd()?  We should try to read from
>>>> the device there, even before we free the IRQ, so we might see several
>>>> messages.  Maybe there's a way we can be smarter about bailing out
>>>> there.
>>>
>>>
>>> The above was with your additions munged in with the older patch, I
>>> actually do see "pcie_do_write_cmd: no response from device" just
>>> two lines ahead of each "Device has gone away" message from
>>> pcie_isr().
>>>
>>> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device
>>> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8
>>> write cmd 0
>>> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()
>>
>>
>> Oh, sorry!  I should have noticed that.  I just wanted to make sure I
>> didn't cause a flood of extra messages.
>>
>> I think I'll merge this version (with all three checks).  We still have a
>> slot lifetime issue, but that's a separate problem.
>
>
> Sounds good to me, thanks much for your help on this.
>
> Do we really still have a slot lifetime issue though? It looks to be the
> path from pciehp_release_ctrl that leads to free_irq and __free_irq calling
> pcie_isr one last time, and there's a ctrl_info("Latch %s on Slot(%s)", open
> ? ..., slot_name(slot)); in pcie_isr *if* we aren't bailing when we read all
> 1's from PCI_EXP_SLTSTA. I think when we bail early, we should never see the
> subsequent attempt to read the freed slot.

It's possible that we avoid referencing the freed data, but I don't
have warm fuzzies because it's hard to prove that by analyzing the
source code.  It's hard to even know what to look for -- there's no
clue in the code that says "don't reference slot->hotplug_slot after
this point."  And it feels like a poor design to hang on to that
pointer after the slot has been freed.

IIRC, your initial report mentioned possible memory corruption, and I
don't even have a theory about where that happened.  The
slot->hotplug_slot references I saw were all reads where we printed
junk but shouldn't have actually corrupted anything.

Anyway, I don't know what to do about it, and I don't have time right
now to poke at it, but it does worry me a bit.

Bjorn

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
  2015-08-04 19:27             ` Bjorn Helgaas
@ 2015-08-04 20:22               ` Jarod Wilson
  2015-08-04 21:11                 ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Jarod Wilson @ 2015-08-04 20:22 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On 8/4/2015 3:27 PM, Bjorn Helgaas wrote:
> On Tue, Aug 4, 2015 at 1:46 PM, Jarod Wilson <jarod@redhat.com> wrote:
>> On 8/4/2015 1:51 PM, Bjorn Helgaas wrote:
...
>>>>> Can you try the version I posted, with the additional tests in
>>>>> pcie_poll_cmd() and pcie_do_write_cmd()?  We should try to read from
>>>>> the device there, even before we free the IRQ, so we might see several
>>>>> messages.  Maybe there's a way we can be smarter about bailing out
>>>>> there.
>>>>
>>>>
>>>> The above was with your additions munged in with the older patch, I
>>>> actually do see "pcie_do_write_cmd: no response from device" just
>>>> two lines ahead of each "Device has gone away" message from
>>>> pcie_isr().
>>>>
>>>> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device
>>>> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8
>>>> write cmd 0
>>>> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()
>>>
>>>
>>> Oh, sorry!  I should have noticed that.  I just wanted to make sure I
>>> didn't cause a flood of extra messages.
>>>
>>> I think I'll merge this version (with all three checks).  We still have a
>>> slot lifetime issue, but that's a separate problem.
>>
>>
>> Sounds good to me, thanks much for your help on this.
>>
>> Do we really still have a slot lifetime issue though? It looks to be the
>> path from pciehp_release_ctrl that leads to free_irq and __free_irq calling
>> pcie_isr one last time, and there's a ctrl_info("Latch %s on Slot(%s)", open
>> ? ..., slot_name(slot)); in pcie_isr *if* we aren't bailing when we read all
>> 1's from PCI_EXP_SLTSTA. I think when we bail early, we should never see the
>> subsequent attempt to read the freed slot.
>
> It's possible that we avoid referencing the freed data, but I don't
> have warm fuzzies because it's hard to prove that by analyzing the
> source code.  It's hard to even know what to look for -- there's no
> clue in the code that says "don't reference slot->hotplug_slot after
> this point."  And it feels like a poor design to hang on to that
> pointer after the slot has been freed.
>
> IIRC, your initial report mentioned possible memory corruption, and I
> don't even have a theory about where that happened.  The
> slot->hotplug_slot references I saw were all reads where we printed
> junk but shouldn't have actually corrupted anything.

Looking at the output I was seeing, it looks like one of the ~0 reads is 
interpreted as a switch interrupt received, data link layer state 
change, etc., followed by "Enabling domain:bus:device=0000:0x:00" from 
pciehp_power_thread. Subsequently, we're calling pciehp_enable_slot, 
which calls board_added, and in the output I've got, its tripping over 
board_added's call to pciehp_check_link_status ("Failed to check link 
status"), which means going to err_exit and calling set_slot_off.

Next up, set_slot_off is calling pciehp_power_off_slot, which does a 
pcie_write_cmd(). Is it possible that write might lead to memory corruption?

> Anyway, I don't know what to do about it, and I don't have time right
> now to poke at it, but it does worry me a bit.

Definitely a bit worrying, still trying to comprehend it all here myself...

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices
  2015-08-04 20:22               ` Jarod Wilson
@ 2015-08-04 21:11                 ` Bjorn Helgaas
  0 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2015-08-04 21:11 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-kernel, Rafael J. Wysocki, linux-pci

On Tue, Aug 4, 2015 at 3:22 PM, Jarod Wilson <jarod@redhat.com> wrote:
> On 8/4/2015 3:27 PM, Bjorn Helgaas wrote:
>>
>> On Tue, Aug 4, 2015 at 1:46 PM, Jarod Wilson <jarod@redhat.com> wrote:
>>>
>>> On 8/4/2015 1:51 PM, Bjorn Helgaas wrote:
>
> ...
>
>>>>>> Can you try the version I posted, with the additional tests in
>>>>>> pcie_poll_cmd() and pcie_do_write_cmd()?  We should try to read from
>>>>>> the device there, even before we free the IRQ, so we might see several
>>>>>> messages.  Maybe there's a way we can be smarter about bailing out
>>>>>> there.
>>>>>
>>>>>
>>>>>
>>>>> The above was with your additions munged in with the older patch, I
>>>>> actually do see "pcie_do_write_cmd: no response from device" just
>>>>> two lines ahead of each "Device has gone away" message from
>>>>> pcie_isr().
>>>>>
>>>>> pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device
>>>>> pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8
>>>>> write cmd 0
>>>>> pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()
>>>>
>>>>
>>>>
>>>> Oh, sorry!  I should have noticed that.  I just wanted to make sure I
>>>> didn't cause a flood of extra messages.
>>>>
>>>> I think I'll merge this version (with all three checks).  We still have
>>>> a
>>>> slot lifetime issue, but that's a separate problem.
>>>
>>>
>>>
>>> Sounds good to me, thanks much for your help on this.
>>>
>>> Do we really still have a slot lifetime issue though? It looks to be the
>>> path from pciehp_release_ctrl that leads to free_irq and __free_irq
>>> calling
>>> pcie_isr one last time, and there's a ctrl_info("Latch %s on Slot(%s)",
>>> open
>>> ? ..., slot_name(slot)); in pcie_isr *if* we aren't bailing when we read
>>> all
>>> 1's from PCI_EXP_SLTSTA. I think when we bail early, we should never see
>>> the
>>> subsequent attempt to read the freed slot.
>>
>>
>> It's possible that we avoid referencing the freed data, but I don't
>> have warm fuzzies because it's hard to prove that by analyzing the
>> source code.  It's hard to even know what to look for -- there's no
>> clue in the code that says "don't reference slot->hotplug_slot after
>> this point."  And it feels like a poor design to hang on to that
>> pointer after the slot has been freed.
>>
>> IIRC, your initial report mentioned possible memory corruption, and I
>> don't even have a theory about where that happened.  The
>> slot->hotplug_slot references I saw were all reads where we printed
>> junk but shouldn't have actually corrupted anything.
>
>
> Looking at the output I was seeing, it looks like one of the ~0 reads is
> interpreted as a switch interrupt received, data link layer state change,
> etc., followed by "Enabling domain:bus:device=0000:0x:00" from
> pciehp_power_thread. Subsequently, we're calling pciehp_enable_slot, which
> calls board_added, and in the output I've got, its tripping over
> board_added's call to pciehp_check_link_status ("Failed to check link
> status"), which means going to err_exit and calling set_slot_off.
>
> Next up, set_slot_off is calling pciehp_power_off_slot, which does a
> pcie_write_cmd(). Is it possible that write might lead to memory corruption?

I doubt it; pcie_write_cmd() by itself just writes to a bridge
register.  Even if the device is gone, that shouldn't corrupt memory.
But I don't know what really happened, and I don't remember what led
to the corruption hypothesis in the first place.  I think the
corrupted-looking slot name strings are just a consequence of reading
memory that had already been freed.  With some work, we might be able
to confirm that by matching it with a poison pattern or something.

Bjorn

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-08-04 21:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-21 16:25 [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices Jarod Wilson
2015-08-03  4:14 ` Bjorn Helgaas
     [not found]   ` <CAKfmpSdrdjyiQ-WWBdXFuPuTyo0WkTTsTX5ByHBt7haZeF0w=g@mail.gmail.com>
2015-08-04 14:10     ` Jarod Wilson
2015-08-04 16:56     ` Bjorn Helgaas
2015-08-04 17:21       ` Jarod Wilson
2015-08-04 17:51         ` Bjorn Helgaas
2015-08-04 18:46           ` Jarod Wilson
2015-08-04 19:27             ` Bjorn Helgaas
2015-08-04 20:22               ` Jarod Wilson
2015-08-04 21:11                 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).