All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (28 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 18/32] PCI: pciehp: Drop enable/disable lock Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-17 16:44   ` Andy Shevchenko
  2018-06-16 19:25 ` [PATCH 24/32] PCI: portdrv: Deduplicate PM callback iterator Lukas Wunner
                   ` (4 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Len Brown, Scott Murray,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Gavin Shan, Sebastian Ott, Gerald Schaefer, Corentin Chary,
	Darren Hart, Andy Shevchenko, linux-acpi, linux-s390,
	platform-driver-x86, acpi4asus-user

When a hotplug driver calls pci_hp_register(), all steps necessary for
registration are carried out in one go, including creation of a kobject
and addition to sysfs.  That's a problem for pciehp once it's converted
to enable/disable the slot exclusively from the IRQ thread:  The thread
needs to be spawned after creation of the kobject (because it uses the
kobject's name), but before addition to sysfs (because it will handle
enable/disable requests submitted via sysfs).

pci_hp_deregister() does offer a ->release callback that's invoked
after deletion from sysfs and before destruction of the kobject.  But
because pci_hp_register() doesn't offer a counterpart, hotplug drivers'
->probe and ->remove code becomes asymmetric, which is error prone
as recently discovered use-after-free bugs in pciehp's ->remove hook
have shown.

In a sense, this appears to be a case of the midlayer antipattern:

   "The core thesis of the "midlayer mistake" is that midlayers are
    bad and should not exist.  That common functionality which it is
    so tempting to put in a midlayer should instead be provided as
    library routines which can [be] used, augmented, or ignored by
    each bottom level driver independently.  Thus every subsystem
    that supports multiple implementations (or drivers) should
    provide a very thin top layer which calls directly into the
    bottom layer drivers, and a rich library of support code that
    eases the implementation of those drivers.  This library is
    available to, but not forced upon, those drivers."
        --  Neil Brown (2009), https://lwn.net/Articles/336262/

The presence of midlayer traits in the PCI hotplug core might be ascribed
to its age:  When it was introduced in February 2002, the blessings of a
library approach might not have been well known:
https://git.kernel.org/tglx/history/c/a8a2069f432c

For comparison, the driver core does offer split functions for creating
a kobject (device_initialize()) and addition to sysfs (device_add()) as
an alternative to carrying out everything at once (device_register()).
This was introduced in October 2002:
https://git.kernel.org/tglx/history/c/8b290eb19962

The odd ->release callback in the PCI hotplug core was added in 2003:
https://git.kernel.org/tglx/history/c/69f8d663b595

Clearly, a library approach would not force every hotplug driver to
implement a ->release callback, but rather allow the driver to remove
the sysfs files, release its data structures and finally destroy the
kobject.  Alternatively, a driver may choose to remove everything with
pci_hp_deregister(), then release its data structures.

To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a
split-up version of pci_hp_register().  Likewise, offer pci_hp_del()
and pci_hp_destroy() as a split-up version of pci_hp_deregister().

Eliminate the ->release callback and move its code into each driver's
teardown routine.

Declare pci_hp_deregister() void, in keeping with the usual kernel
pattern that enablement can fail, but disablement cannot.  It only
returned an error if the caller passed in a NULL pointer or a slot which
has never or is no longer registered or is sharing its name with another
slot.  Those would be bugs, so WARN about them.  Few hotplug drivers
actually checked the return value and those that did only printed a
useless error message to dmesg.  Remove that.

For most drivers the conversion was straightforward since it doesn't
matter whether the code in the ->release callback is executed before or
after destruction of the kobject.  But in the case of ibmphp, it was
unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to
NULL needs to happen before the kobject is destroyed, so I erred on
the side of caution and ensured that the order stays the same.  Another
nontrivial case is pnv_php, I've found the list and kref logic difficult
to understand, however my impression was that it is safe to delete the
list element and drop the references until after the kobject is
destroyed.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/acpiphp_core.c      |  22 +---
 drivers/pci/hotplug/cpci_hotplug_core.c |  14 +--
 drivers/pci/hotplug/cpqphp_core.c       |  16 +--
 drivers/pci/hotplug/ibmphp_core.c       |  15 ++-
 drivers/pci/hotplug/ibmphp_ebda.c       |  20 ----
 drivers/pci/hotplug/pci_hotplug_core.c  | 139 +++++++++++++++++-------
 drivers/pci/hotplug/pciehp_core.c       |  19 +---
 drivers/pci/hotplug/pcihp_skeleton.c    |  16 +--
 drivers/pci/hotplug/pnv_php.c           |   5 +-
 drivers/pci/hotplug/rpaphp_core.c       |   2 +-
 drivers/pci/hotplug/rpaphp_slot.c       |  13 +--
 drivers/pci/hotplug/s390_pci_hpc.c      |  13 +--
 drivers/pci/hotplug/sgi_hotplug.c       |   9 +-
 drivers/pci/hotplug/shpchp_core.c       |  20 +---
 drivers/platform/x86/asus-wmi.c         |  12 +-
 drivers/platform/x86/eeepc-laptop.c     |  12 +-
 include/linux/pci_hotplug.h             |  15 ++-
 17 files changed, 171 insertions(+), 191 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_core.c b/drivers/pci/hotplug/acpiphp_core.c
index 12b5655fd390..ad32ffbc4b91 100644
--- a/drivers/pci/hotplug/acpiphp_core.c
+++ b/drivers/pci/hotplug/acpiphp_core.c
@@ -254,20 +254,6 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 	return 0;
 }
 
-/**
- * release_slot - free up the memory used by a slot
- * @hotplug_slot: slot to free
- */
-static void release_slot(struct hotplug_slot *hotplug_slot)
-{
-	struct slot *slot = hotplug_slot->private;
-
-	pr_debug("%s - physical_slot = %s\n", __func__, slot_name(slot));
-
-	kfree(slot->hotplug_slot);
-	kfree(slot);
-}
-
 /* callback routine to initialize 'struct slot' for each slot */
 int acpiphp_register_hotplug_slot(struct acpiphp_slot *acpiphp_slot,
 				  unsigned int sun)
@@ -287,7 +273,6 @@ int acpiphp_register_hotplug_slot(struct acpiphp_slot *acpiphp_slot,
 	slot->hotplug_slot->info = &slot->info;
 
 	slot->hotplug_slot->private = slot;
-	slot->hotplug_slot->release = &release_slot;
 	slot->hotplug_slot->ops = &acpi_hotplug_slot_ops;
 
 	slot->acpi_slot = acpiphp_slot;
@@ -324,13 +309,12 @@ int acpiphp_register_hotplug_slot(struct acpiphp_slot *acpiphp_slot,
 void acpiphp_unregister_hotplug_slot(struct acpiphp_slot *acpiphp_slot)
 {
 	struct slot *slot = acpiphp_slot->slot;
-	int retval = 0;
 
 	pr_info("Slot [%s] unregistered\n", slot_name(slot));
 
-	retval = pci_hp_deregister(slot->hotplug_slot);
-	if (retval)
-		pr_err("pci_hp_deregister failed with error %d\n", retval);
+	pci_hp_deregister(slot->hotplug_slot);
+	kfree(slot->hotplug_slot);
+	kfree(slot);
 }
 
 
diff --git a/drivers/pci/hotplug/cpci_hotplug_core.c b/drivers/pci/hotplug/cpci_hotplug_core.c
index 07b533adc9df..52a339baf06c 100644
--- a/drivers/pci/hotplug/cpci_hotplug_core.c
+++ b/drivers/pci/hotplug/cpci_hotplug_core.c
@@ -195,10 +195,8 @@ get_latch_status(struct hotplug_slot *hotplug_slot, u8 *value)
 	return 0;
 }
 
-static void release_slot(struct hotplug_slot *hotplug_slot)
+static void release_slot(struct slot *slot)
 {
-	struct slot *slot = hotplug_slot->private;
-
 	kfree(slot->hotplug_slot->info);
 	kfree(slot->hotplug_slot);
 	pci_dev_put(slot->dev);
@@ -253,7 +251,6 @@ cpci_hp_register_bus(struct pci_bus *bus, u8 first, u8 last)
 		snprintf(name, SLOT_NAME_SIZE, "%02x:%02x", bus->number, i);
 
 		hotplug_slot->private = slot;
-		hotplug_slot->release = &release_slot;
 		hotplug_slot->ops = &cpci_hotplug_slot_ops;
 
 		/*
@@ -308,12 +305,8 @@ cpci_hp_unregister_bus(struct pci_bus *bus)
 			slots--;
 
 			dbg("deregistering slot %s", slot_name(slot));
-			status = pci_hp_deregister(slot->hotplug_slot);
-			if (status) {
-				err("pci_hp_deregister failed with error %d",
-				    status);
-				break;
-			}
+			pci_hp_deregister(slot->hotplug_slot);
+			release_slot(slot);
 		}
 	}
 	up_write(&list_rwsem);
@@ -623,6 +616,7 @@ cleanup_slots(void)
 	list_for_each_entry_safe(slot, tmp, &slot_list, slot_list) {
 		list_del(&slot->slot_list);
 		pci_hp_deregister(slot->hotplug_slot);
+		release_slot(slot);
 	}
 cleanup_null:
 	up_write(&list_rwsem);
diff --git a/drivers/pci/hotplug/cpqphp_core.c b/drivers/pci/hotplug/cpqphp_core.c
index 1797e36ec586..5a06636e910a 100644
--- a/drivers/pci/hotplug/cpqphp_core.c
+++ b/drivers/pci/hotplug/cpqphp_core.c
@@ -266,17 +266,6 @@ static void __iomem *get_SMBIOS_entry(void __iomem *smbios_start,
 	return previous;
 }
 
-static void release_slot(struct hotplug_slot *hotplug_slot)
-{
-	struct slot *slot = hotplug_slot->private;
-
-	dbg("%s - physical_slot = %s\n", __func__, slot_name(slot));
-
-	kfree(slot->hotplug_slot->info);
-	kfree(slot->hotplug_slot);
-	kfree(slot);
-}
-
 static int ctrl_slot_cleanup(struct controller *ctrl)
 {
 	struct slot *old_slot, *next_slot;
@@ -285,9 +274,11 @@ static int ctrl_slot_cleanup(struct controller *ctrl)
 	ctrl->slot = NULL;
 
 	while (old_slot) {
-		/* memory will be freed by the release_slot callback */
 		next_slot = old_slot->next;
 		pci_hp_deregister(old_slot->hotplug_slot);
+		kfree(old_slot->hotplug_slot->info);
+		kfree(old_slot->hotplug_slot);
+		kfree(old_slot);
 		old_slot = next_slot;
 	}
 
@@ -678,7 +669,6 @@ static int ctrl_slot_setup(struct controller *ctrl,
 			((read_slot_enable(ctrl) << 2) >> ctrl_slot) & 0x04;
 
 		/* register this slot with the hotplug pci core */
-		hotplug_slot->release = &release_slot;
 		hotplug_slot->private = slot;
 		snprintf(name, SLOT_NAME_SIZE, "%u", slot->number);
 		hotplug_slot->ops = &cpqphp_hotplug_slot_ops;
diff --git a/drivers/pci/hotplug/ibmphp_core.c b/drivers/pci/hotplug/ibmphp_core.c
index 1869b0411ce0..4ea57e9019f1 100644
--- a/drivers/pci/hotplug/ibmphp_core.c
+++ b/drivers/pci/hotplug/ibmphp_core.c
@@ -673,7 +673,20 @@ static void free_slots(void)
 
 	list_for_each_entry_safe(slot_cur, next, &ibmphp_slot_head,
 				 ibm_slot_list) {
-		pci_hp_deregister(slot_cur->hotplug_slot);
+		pci_hp_del(slot_cur->hotplug_slot);
+		slot_cur->ctrl = NULL;
+		slot_cur->bus_on = NULL;
+
+		/*
+		 * We don't want to actually remove the resources,
+		 * since ibmphp_free_resources() will do just that.
+		 */
+		ibmphp_unconfigure_card(&slot_cur, -1);
+
+		pci_hp_destroy(slot_cur->hotplug_slot);
+		kfree(slot_cur->hotplug_slot->info);
+		kfree(slot_cur->hotplug_slot);
+		kfree(slot_cur);
 	}
 	debug("%s -- exit\n", __func__);
 }
diff --git a/drivers/pci/hotplug/ibmphp_ebda.c b/drivers/pci/hotplug/ibmphp_ebda.c
index 64549aa24c0f..6f8e90e3ec08 100644
--- a/drivers/pci/hotplug/ibmphp_ebda.c
+++ b/drivers/pci/hotplug/ibmphp_ebda.c
@@ -699,25 +699,6 @@ static int fillslotinfo(struct hotplug_slot *hotplug_slot)
 	return rc;
 }
 
-static void release_slot(struct hotplug_slot *hotplug_slot)
-{
-	struct slot *slot;
-
-	if (!hotplug_slot || !hotplug_slot->private)
-		return;
-
-	slot = hotplug_slot->private;
-	kfree(slot->hotplug_slot->info);
-	kfree(slot->hotplug_slot);
-	slot->ctrl = NULL;
-	slot->bus_on = NULL;
-
-	/* we don't want to actually remove the resources, since free_resources will do just that */
-	ibmphp_unconfigure_card(&slot, -1);
-
-	kfree(slot);
-}
-
 static struct pci_driver ibmphp_driver;
 
 /*
@@ -941,7 +922,6 @@ static int __init ebda_rsrc_controller(void)
 			tmp_slot->hotplug_slot = hp_slot_ptr;
 
 			hp_slot_ptr->private = tmp_slot;
-			hp_slot_ptr->release = release_slot;
 
 			rc = fillslotinfo(hp_slot_ptr);
 			if (rc)
diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
index fd93783a87b0..90fde5f106d8 100644
--- a/drivers/pci/hotplug/pci_hotplug_core.c
+++ b/drivers/pci/hotplug/pci_hotplug_core.c
@@ -396,8 +396,9 @@ static struct hotplug_slot *get_slot_from_name(const char *name)
  * @owner: caller module owner
  * @mod_name: caller module name
  *
- * Registers a hotplug slot with the pci hotplug subsystem, which will allow
- * userspace interaction to the slot.
+ * Prepares a hotplug slot for in-kernel use and immediately publishes it to
+ * user space in one go.  Drivers may alternatively carry out the two steps
+ * separately by invoking pci_hp_initialize() and pci_hp_add().
  *
  * Returns 0 if successful, anything else for an error.
  */
@@ -406,54 +407,91 @@ int __pci_hp_register(struct hotplug_slot *slot, struct pci_bus *bus,
 		      struct module *owner, const char *mod_name)
 {
 	int result;
+
+	result = __pci_hp_initialize(slot, bus, devnr, name, owner, mod_name);
+	if (result)
+		return result;
+
+	result = pci_hp_add(slot);
+	if (result)
+		pci_hp_destroy(slot);
+
+	return result;
+}
+EXPORT_SYMBOL_GPL(__pci_hp_register);
+
+/**
+ * __pci_hp_initialize - prepare hotplug slot for in-kernel use
+ * @slot: pointer to the &struct hotplug_slot to initialize
+ * @bus: bus this slot is on
+ * @devnr: slot number
+ * @name: name registered with kobject core
+ * @owner: caller module owner
+ * @mod_name: caller module name
+ *
+ * Allocate and fill in a PCI slot for use by a hotplug driver.  Once this has
+ * been called, the driver may invoke hotplug_slot_name() to get the slot's
+ * unique name.  The driver must be prepared to handle a ->reset_slot callback
+ * from this point on.
+ *
+ * Returns 0 on success or a negative int on error.
+ */
+int __pci_hp_initialize(struct hotplug_slot *slot, struct pci_bus *bus,
+			int devnr, const char *name, struct module *owner,
+			const char *mod_name)
+{
 	struct pci_slot *pci_slot;
 
 	if (slot == NULL)
 		return -ENODEV;
 	if ((slot->info == NULL) || (slot->ops == NULL))
 		return -EINVAL;
-	if (slot->release == NULL) {
-		dbg("Why are you trying to register a hotplug slot without a proper release function?\n");
-		return -EINVAL;
-	}
 
 	slot->ops->owner = owner;
 	slot->ops->mod_name = mod_name;
 
-	mutex_lock(&pci_hp_mutex);
 	/*
 	 * No problems if we call this interface from both ACPI_PCI_SLOT
 	 * driver and call it here again. If we've already created the
 	 * pci_slot, the interface will simply bump the refcount.
 	 */
 	pci_slot = pci_create_slot(bus, devnr, name, slot);
-	if (IS_ERR(pci_slot)) {
-		result = PTR_ERR(pci_slot);
-		goto out;
-	}
+	if (IS_ERR(pci_slot))
+		return PTR_ERR(pci_slot);
 
 	slot->pci_slot = pci_slot;
 	pci_slot->hotplug = slot;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__pci_hp_initialize);
 
-	list_add(&slot->slot_list, &pci_hotplug_slot_list);
+/**
+ * pci_hp_add - publish hotplug slot to user space
+ * @slot: pointer to the &struct hotplug_slot to publish
+ *
+ * Make a hotplug slot's sysfs interface available and inform user space of its
+ * addition by sending a uevent.  The hotplug driver must be prepared to handle
+ * all &struct hotplug_slot_ops callbacks from this point on.
+ *
+ * Returns 0 on success or a negative int on error.
+ */
+int pci_hp_add(struct hotplug_slot *slot)
+{
+	struct pci_slot *pci_slot = slot->pci_slot;
+	int result;
 
 	result = fs_add_slot(pci_slot);
 	if (result)
-		goto err_list_del;
+		return result;
 
 	kobject_uevent(&pci_slot->kobj, KOBJ_ADD);
-	dbg("Added slot %s to the list\n", name);
-	goto out;
-
-err_list_del:
-	list_del(&slot->slot_list);
-	pci_slot->hotplug = NULL;
-	pci_destroy_slot(pci_slot);
-out:
+	mutex_lock(&pci_hp_mutex);
+	list_add(&slot->slot_list, &pci_hotplug_slot_list);
 	mutex_unlock(&pci_hp_mutex);
-	return result;
+	dbg("Added slot %s to the list\n", hotplug_slot_name(slot));
+	return 0;
 }
-EXPORT_SYMBOL_GPL(__pci_hp_register);
+EXPORT_SYMBOL_GPL(pci_hp_add);
 
 /**
  * pci_hp_deregister - deregister a hotplug_slot with the PCI hotplug subsystem
@@ -464,35 +502,62 @@ EXPORT_SYMBOL_GPL(__pci_hp_register);
  *
  * Returns 0 if successful, anything else for an error.
  */
-int pci_hp_deregister(struct hotplug_slot *slot)
+void pci_hp_deregister(struct hotplug_slot *slot)
+{
+	pci_hp_del(slot);
+	pci_hp_destroy(slot);
+}
+EXPORT_SYMBOL_GPL(pci_hp_deregister);
+
+/**
+ * pci_hp_del - unpublish hotplug slot from user space
+ * @slot: pointer to the &struct hotplug_slot to unpublish
+ *
+ * Remove a hotplug slot's sysfs interface.
+ *
+ * Returns 0 on success or a negative int on error.
+ */
+void pci_hp_del(struct hotplug_slot *slot)
 {
 	struct hotplug_slot *temp;
-	struct pci_slot *pci_slot;
 
-	if (!slot)
-		return -ENODEV;
+	if (WARN_ON(!slot))
+		return;
 
 	mutex_lock(&pci_hp_mutex);
 	temp = get_slot_from_name(hotplug_slot_name(slot));
-	if (temp != slot) {
+	if (WARN_ON(temp != slot)) {
 		mutex_unlock(&pci_hp_mutex);
-		return -ENODEV;
+		return;
 	}
 
 	list_del(&slot->slot_list);
-
-	pci_slot = slot->pci_slot;
-	fs_remove_slot(pci_slot);
+	mutex_unlock(&pci_hp_mutex);
 	dbg("Removed slot %s from the list\n", hotplug_slot_name(slot));
+	fs_remove_slot(slot->pci_slot);
+}
+EXPORT_SYMBOL_GPL(pci_hp_del);
+
+/**
+ * pci_hp_destroy - remove hotplug slot from in-kernel use
+ * @slot: pointer to the &struct hotplug_slot to destroy
+ *
+ * Destroy a PCI slot used by a hotplug driver.  Once this has been called,
+ * the driver may no longer invoke hotplug_slot_name() to get the slot's
+ * unique name.  The driver no longer needs to handle a ->reset_slot callback
+ * from this point on.
+ *
+ * Returns 0 on success or a negative int on error.
+ */
+void pci_hp_destroy(struct hotplug_slot *slot)
+{
+	struct pci_slot *pci_slot = slot->pci_slot;
 
-	slot->release(slot);
+	slot->pci_slot = NULL;
 	pci_slot->hotplug = NULL;
 	pci_destroy_slot(pci_slot);
-	mutex_unlock(&pci_hp_mutex);
-
-	return 0;
 }
-EXPORT_SYMBOL_GPL(pci_hp_deregister);
+EXPORT_SYMBOL_GPL(pci_hp_destroy);
 
 /**
  * pci_hp_change_slot_info - changes the slot's information structure in the core
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index b360645377c2..37d8f81e548f 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -56,17 +56,6 @@ static int get_latch_status(struct hotplug_slot *slot, u8 *value);
 static int get_adapter_status(struct hotplug_slot *slot, u8 *value);
 static int reset_slot(struct hotplug_slot *slot, int probe);
 
-/**
- * release_slot - free up the memory used by a slot
- * @hotplug_slot: slot to free
- */
-static void release_slot(struct hotplug_slot *hotplug_slot)
-{
-	kfree(hotplug_slot->ops);
-	kfree(hotplug_slot->info);
-	kfree(hotplug_slot);
-}
-
 static int init_slot(struct controller *ctrl)
 {
 	struct slot *slot = ctrl->slot;
@@ -107,7 +96,6 @@ static int init_slot(struct controller *ctrl)
 	/* register this slot with the hotplug pci core */
 	hotplug->info = info;
 	hotplug->private = slot;
-	hotplug->release = &release_slot;
 	hotplug->ops = ops;
 	slot->hotplug_slot = hotplug;
 	snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl));
@@ -127,7 +115,12 @@ static int init_slot(struct controller *ctrl)
 
 static void cleanup_slot(struct controller *ctrl)
 {
-	pci_hp_deregister(ctrl->slot->hotplug_slot);
+	struct hotplug_slot *hotplug_slot = ctrl->slot->hotplug_slot;
+
+	pci_hp_deregister(hotplug_slot);
+	kfree(hotplug_slot->ops);
+	kfree(hotplug_slot->info);
+	kfree(hotplug_slot);
 }
 
 /*
diff --git a/drivers/pci/hotplug/pcihp_skeleton.c b/drivers/pci/hotplug/pcihp_skeleton.c
index c19694a04d2c..94ad7cd72878 100644
--- a/drivers/pci/hotplug/pcihp_skeleton.c
+++ b/drivers/pci/hotplug/pcihp_skeleton.c
@@ -210,16 +210,6 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 	return retval;
 }
 
-static void release_slot(struct hotplug_slot *hotplug_slot)
-{
-	struct slot *slot = hotplug_slot->private;
-
-	dbg("%s - physical_slot = %s\n", __func__, hotplug_slot->name);
-	kfree(slot->hotplug_slot->info);
-	kfree(slot->hotplug_slot);
-	kfree(slot);
-}
-
 static void make_slot_name(struct slot *slot)
 {
 	/*
@@ -270,7 +260,6 @@ static int __init init_slots(void)
 
 		hotplug_slot->name = slot->name;
 		hotplug_slot->private = slot;
-		hotplug_slot->release = &release_slot;
 		make_slot_name(slot);
 		hotplug_slot->ops = &skel_hotplug_slot_ops;
 
@@ -311,12 +300,13 @@ static void __exit cleanup_slots(void)
 
 	/*
 	 * Unregister all of our slots with the pci_hotplug subsystem.
-	 * Memory will be freed in release_slot() callback after slot's
-	 * lifespan is finished.
 	 */
 	list_for_each_entry_safe(slot, next, &slot_list, slot_list) {
 		list_del(&slot->slot_list);
 		pci_hp_deregister(slot->hotplug_slot);
+		kfree(slot->hotplug_slot->info);
+		kfree(slot->hotplug_slot);
+		kfree(slot);
 	}
 }
 
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 6c2e8d7307c6..3276a5e4c430 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -538,9 +538,8 @@ static struct hotplug_slot_ops php_slot_ops = {
 	.disable_slot		= pnv_php_disable_slot,
 };
 
-static void pnv_php_release(struct hotplug_slot *slot)
+static void pnv_php_release(struct pnv_php_slot *php_slot)
 {
-	struct pnv_php_slot *php_slot = slot->private;
 	unsigned long flags;
 
 	/* Remove from global or child list */
@@ -596,7 +595,6 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
 	php_slot->power_state_check     = false;
 	php_slot->slot.ops              = &php_slot_ops;
 	php_slot->slot.info             = &php_slot->slot_info;
-	php_slot->slot.release          = pnv_php_release;
 	php_slot->slot.private          = php_slot;
 
 	INIT_LIST_HEAD(&php_slot->children);
@@ -924,6 +922,7 @@ static void pnv_php_unregister_one(struct device_node *dn)
 
 	php_slot->state = PNV_PHP_STATE_OFFLINE;
 	pci_hp_deregister(&php_slot->slot);
+	pnv_php_release(php_slot);
 	pnv_php_put_slot(php_slot);
 }
 
diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
index fb5e0845429d..857c358b727b 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -404,13 +404,13 @@ static void __exit cleanup_slots(void)
 	/*
 	 * Unregister all of our slots with the pci_hotplug subsystem,
 	 * and free up all memory that we had allocated.
-	 * memory will be freed in release_slot callback.
 	 */
 
 	list_for_each_entry_safe(slot, next, &rpaphp_slot_head,
 				 rpaphp_slot_list) {
 		list_del(&slot->rpaphp_slot_list);
 		pci_hp_deregister(slot->hotplug_slot);
+		dealloc_slot_struct(slot);
 	}
 	return;
 }
diff --git a/drivers/pci/hotplug/rpaphp_slot.c b/drivers/pci/hotplug/rpaphp_slot.c
index 3840a2075e6a..b916c8e4372d 100644
--- a/drivers/pci/hotplug/rpaphp_slot.c
+++ b/drivers/pci/hotplug/rpaphp_slot.c
@@ -19,12 +19,6 @@
 #include "rpaphp.h"
 
 /* free up the memory used by a slot */
-static void rpaphp_release_slot(struct hotplug_slot *hotplug_slot)
-{
-	struct slot *slot = (struct slot *) hotplug_slot->private;
-	dealloc_slot_struct(slot);
-}
-
 void dealloc_slot_struct(struct slot *slot)
 {
 	kfree(slot->hotplug_slot->info);
@@ -56,7 +50,6 @@ struct slot *alloc_slot_struct(struct device_node *dn,
 	slot->power_domain = power_domain;
 	slot->hotplug_slot->private = slot;
 	slot->hotplug_slot->ops = &rpaphp_hotplug_slot_ops;
-	slot->hotplug_slot->release = &rpaphp_release_slot;
 
 	return (slot);
 
@@ -90,10 +83,8 @@ int rpaphp_deregister_slot(struct slot *slot)
 		__func__, slot->name);
 
 	list_del(&slot->rpaphp_slot_list);
-
-	retval = pci_hp_deregister(php_slot);
-	if (retval)
-		err("Problem unregistering a slot %s\n", slot->name);
+	pci_hp_deregister(php_slot);
+	dealloc_slot_struct(slot);
 
 	dbg("%s - Exit: rc[%d]\n", __func__, retval);
 	return retval;
diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
index ffdc2977395d..93b5341d282c 100644
--- a/drivers/pci/hotplug/s390_pci_hpc.c
+++ b/drivers/pci/hotplug/s390_pci_hpc.c
@@ -130,15 +130,6 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 	return 0;
 }
 
-static void release_slot(struct hotplug_slot *hotplug_slot)
-{
-	struct slot *slot = hotplug_slot->private;
-
-	kfree(slot->hotplug_slot->info);
-	kfree(slot->hotplug_slot);
-	kfree(slot);
-}
-
 static struct hotplug_slot_ops s390_hotplug_slot_ops = {
 	.enable_slot =		enable_slot,
 	.disable_slot =		disable_slot,
@@ -175,7 +166,6 @@ int zpci_init_slot(struct zpci_dev *zdev)
 	hotplug_slot->info = info;
 
 	hotplug_slot->ops = &s390_hotplug_slot_ops;
-	hotplug_slot->release = &release_slot;
 
 	get_power_status(hotplug_slot, &info->power_status);
 	get_adapter_status(hotplug_slot, &info->adapter_status);
@@ -209,5 +199,8 @@ void zpci_exit_slot(struct zpci_dev *zdev)
 			continue;
 		list_del(&slot->slot_list);
 		pci_hp_deregister(slot->hotplug_slot);
+		kfree(slot->hotplug_slot->info);
+		kfree(slot->hotplug_slot);
+		kfree(slot);
 	}
 }
diff --git a/drivers/pci/hotplug/sgi_hotplug.c b/drivers/pci/hotplug/sgi_hotplug.c
index 78b6bdbb3a39..babd23409f61 100644
--- a/drivers/pci/hotplug/sgi_hotplug.c
+++ b/drivers/pci/hotplug/sgi_hotplug.c
@@ -628,7 +628,6 @@ static int sn_hotplug_slot_register(struct pci_bus *pci_bus)
 			goto alloc_err;
 		}
 		bss_hotplug_slot->ops = &sn_hotplug_slot_ops;
-		bss_hotplug_slot->release = &sn_release_slot;
 
 		rc = pci_hp_register(bss_hotplug_slot, pci_bus, device, name);
 		if (rc)
@@ -656,8 +655,10 @@ static int sn_hotplug_slot_register(struct pci_bus *pci_bus)
 		sn_release_slot(bss_hotplug_slot);
 
 	/* destroy anything else on the list */
-	while ((bss_hotplug_slot = sn_hp_destroy()))
+	while ((bss_hotplug_slot = sn_hp_destroy())) {
 		pci_hp_deregister(bss_hotplug_slot);
+		sn_release_slot(bss_hotplug_slot);
+	}
 
 	return rc;
 }
@@ -703,8 +704,10 @@ static void __exit sn_pci_hotplug_exit(void)
 {
 	struct hotplug_slot *bss_hotplug_slot;
 
-	while ((bss_hotplug_slot = sn_hp_destroy()))
+	while ((bss_hotplug_slot = sn_hp_destroy())) {
 		pci_hp_deregister(bss_hotplug_slot);
+		sn_release_slot(bss_hotplug_slot);
+	}
 
 	if (!list_empty(&sn_hp_list))
 		printk(KERN_ERR "%s: internal list is not empty\n", __FILE__);
diff --git a/drivers/pci/hotplug/shpchp_core.c b/drivers/pci/hotplug/shpchp_core.c
index e91be287f292..5990695d1737 100644
--- a/drivers/pci/hotplug/shpchp_core.c
+++ b/drivers/pci/hotplug/shpchp_core.c
@@ -61,22 +61,6 @@ static struct hotplug_slot_ops shpchp_hotplug_slot_ops = {
 	.get_adapter_status =	get_adapter_status,
 };
 
-/**
- * release_slot - free up the memory used by a slot
- * @hotplug_slot: slot to free
- */
-static void release_slot(struct hotplug_slot *hotplug_slot)
-{
-	struct slot *slot = hotplug_slot->private;
-
-	ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
-		 __func__, slot_name(slot));
-
-	kfree(slot->hotplug_slot->info);
-	kfree(slot->hotplug_slot);
-	kfree(slot);
-}
-
 static int init_slots(struct controller *ctrl)
 {
 	struct slot *slot;
@@ -125,7 +109,6 @@ static int init_slots(struct controller *ctrl)
 
 		/* register this slot with the hotplug pci core */
 		hotplug_slot->private = slot;
-		hotplug_slot->release = &release_slot;
 		snprintf(name, SLOT_NAME_SIZE, "%d", slot->number);
 		hotplug_slot->ops = &shpchp_hotplug_slot_ops;
 
@@ -171,6 +154,9 @@ void cleanup_slots(struct controller *ctrl)
 		cancel_delayed_work(&slot->work);
 		destroy_workqueue(slot->wq);
 		pci_hp_deregister(slot->hotplug_slot);
+		kfree(slot->hotplug_slot->info);
+		kfree(slot->hotplug_slot);
+		kfree(slot);
 	}
 }
 
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 3d523ca64694..d67f32a29bb4 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -858,12 +858,6 @@ static int asus_get_adapter_status(struct hotplug_slot *hotplug_slot,
 	return 0;
 }
 
-static void asus_cleanup_pci_hotplug(struct hotplug_slot *hotplug_slot)
-{
-	kfree(hotplug_slot->info);
-	kfree(hotplug_slot);
-}
-
 static struct hotplug_slot_ops asus_hotplug_slot_ops = {
 	.owner = THIS_MODULE,
 	.get_adapter_status = asus_get_adapter_status,
@@ -905,7 +899,6 @@ static int asus_setup_pci_hotplug(struct asus_wmi *asus)
 		goto error_info;
 
 	asus->hotplug_slot->private = asus;
-	asus->hotplug_slot->release = &asus_cleanup_pci_hotplug;
 	asus->hotplug_slot->ops = &asus_hotplug_slot_ops;
 	asus_get_adapter_status(asus->hotplug_slot,
 				&asus->hotplug_slot->info->adapter_status);
@@ -1051,8 +1044,11 @@ static void asus_wmi_rfkill_exit(struct asus_wmi *asus)
 	 * asus_unregister_rfkill_notifier()
 	 */
 	asus_rfkill_hotplug(asus);
-	if (asus->hotplug_slot)
+	if (asus->hotplug_slot) {
 		pci_hp_deregister(asus->hotplug_slot);
+		kfree(asus->hotplug_slot->info);
+		kfree(asus->hotplug_slot);
+	}
 	if (asus->hotplug_workqueue)
 		destroy_workqueue(asus->hotplug_workqueue);
 
diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
index 4c38904a8a32..a4bbf6ecd1f0 100644
--- a/drivers/platform/x86/eeepc-laptop.c
+++ b/drivers/platform/x86/eeepc-laptop.c
@@ -726,12 +726,6 @@ static int eeepc_get_adapter_status(struct hotplug_slot *hotplug_slot,
 	return 0;
 }
 
-static void eeepc_cleanup_pci_hotplug(struct hotplug_slot *hotplug_slot)
-{
-	kfree(hotplug_slot->info);
-	kfree(hotplug_slot);
-}
-
 static struct hotplug_slot_ops eeepc_hotplug_slot_ops = {
 	.owner = THIS_MODULE,
 	.get_adapter_status = eeepc_get_adapter_status,
@@ -758,7 +752,6 @@ static int eeepc_setup_pci_hotplug(struct eeepc_laptop *eeepc)
 		goto error_info;
 
 	eeepc->hotplug_slot->private = eeepc;
-	eeepc->hotplug_slot->release = &eeepc_cleanup_pci_hotplug;
 	eeepc->hotplug_slot->ops = &eeepc_hotplug_slot_ops;
 	eeepc_get_adapter_status(eeepc->hotplug_slot,
 				 &eeepc->hotplug_slot->info->adapter_status);
@@ -837,8 +830,11 @@ static void eeepc_rfkill_exit(struct eeepc_laptop *eeepc)
 		eeepc->wlan_rfkill = NULL;
 	}
 
-	if (eeepc->hotplug_slot)
+	if (eeepc->hotplug_slot) {
 		pci_hp_deregister(eeepc->hotplug_slot);
+		kfree(eeepc->hotplug_slot->info);
+		kfree(eeepc->hotplug_slot);
+	}
 
 	if (eeepc->bluetooth_rfkill) {
 		rfkill_unregister(eeepc->bluetooth_rfkill);
diff --git a/include/linux/pci_hotplug.h b/include/linux/pci_hotplug.h
index cf5e22103f68..a6d6650a0490 100644
--- a/include/linux/pci_hotplug.h
+++ b/include/linux/pci_hotplug.h
@@ -80,15 +80,12 @@ struct hotplug_slot_info {
  * @ops: pointer to the &struct hotplug_slot_ops to be used for this slot
  * @info: pointer to the &struct hotplug_slot_info for the initial values for
  * this slot.
- * @release: called during pci_hp_deregister to free memory allocated in a
- * hotplug_slot structure.
  * @private: used by the hotplug pci controller driver to store whatever it
  * needs.
  */
 struct hotplug_slot {
 	struct hotplug_slot_ops		*ops;
 	struct hotplug_slot_info	*info;
-	void (*release) (struct hotplug_slot *slot);
 	void				*private;
 
 	/* Variables below this are for use only by the hotplug pci core. */
@@ -104,13 +101,23 @@ static inline const char *hotplug_slot_name(const struct hotplug_slot *slot)
 int __pci_hp_register(struct hotplug_slot *slot, struct pci_bus *pbus, int nr,
 		      const char *name, struct module *owner,
 		      const char *mod_name);
-int pci_hp_deregister(struct hotplug_slot *slot);
+int __pci_hp_initialize(struct hotplug_slot *slot, struct pci_bus *bus, int nr,
+			const char *name, struct module *owner,
+			const char *mod_name);
+int pci_hp_add(struct hotplug_slot *slot);
+
+void pci_hp_del(struct hotplug_slot *slot);
+void pci_hp_destroy(struct hotplug_slot *slot);
+void pci_hp_deregister(struct hotplug_slot *slot);
+
 int __must_check pci_hp_change_slot_info(struct hotplug_slot *slot,
 					 struct hotplug_slot_info *info);
 
 /* use a define to avoid include chaining to get THIS_MODULE & friends */
 #define pci_hp_register(slot, pbus, devnr, name) \
 	__pci_hp_register(slot, pbus, devnr, name, THIS_MODULE, KBUILD_MODNAME)
+#define pci_hp_initialize(slot, bus, nr, name) \
+	__pci_hp_initialize(slot, bus, nr, name, THIS_MODULE, KBUILD_MODNAME)
 
 /* PCI Setting Record (Type 0) */
 struct hpp_type0 {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 07/32] PCI: pciehp: Document struct slot and struct controller
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (22 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 09/32] PCI: pciehp: Convert to threaded IRQ Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 20/32] PCI: pciehp: Tolerate initially unstable link Lukas Wunner
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Document the driver's data structures to lower the barrier to entry for
contributors.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h | 48 +++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index acdeb08cbcba..c3d63e5b650f 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -57,11 +57,26 @@ do {									\
 	dev_warn(&ctrl->pcie->device, format, ## arg)
 
 #define SLOT_NAME_SIZE 10
+
+/**
+ * struct slot - PCIe hotplug slot
+ * @state: current state machine position
+ * @ctrl: pointer to the slot's controller structure
+ * @hotplug_slot: pointer to the structure registered with the PCI hotplug core
+ * @work: work item to turn the slot on or off after 5 seconds in response to
+ *	an Attention Button press
+ * @lock: protects reads and writes of @state;
+ *	protects scheduling, execution and cancellation of @work
+ * @hotplug_lock: serializes calls to pciehp_enable_slot() and
+ *	pciehp_disable_slot()
+ * @wq: work queue on which @work is scheduled;
+ *	also used to queue interrupt events and slot enablement and disablement
+ */
 struct slot {
 	u8 state;
 	struct controller *ctrl;
 	struct hotplug_slot *hotplug_slot;
-	struct delayed_work work;	/* work for button event */
+	struct delayed_work work;
 	struct mutex lock;
 	struct mutex hotplug_lock;
 	struct workqueue_struct *wq;
@@ -73,11 +88,36 @@ struct event_info {
 	struct work_struct work;
 };
 
+/**
+ * struct controller - PCIe hotplug controller
+ * @ctrl_lock: serializes writes to the Slot Control register
+ * @pcie: pointer to the controller's PCIe port service device
+ * @slot: pointer to the controller's slot structure
+ * @queue: wait queue to wake up on reception of a Command Completed event,
+ *	used for synchronous writes to the Slot Control register
+ * @slot_cap: cached copy of the Slot Capabilities register
+ * @slot_ctrl: cached copy of the Slot Control register
+ * @poll_timer: timer to poll for slot events if no IRQ is available,
+ *	enabled with pciehp_poll_mode module parameter
+ * @cmd_started: jiffies when the Slot Control register was last written;
+ *	the next write is allowed 1 second later, absent a Command Completed
+ *	interrupt (PCIe r4.0, sec 6.7.3.2)
+ * @cmd_busy: flag set on Slot Control register write, cleared by IRQ handler
+ *	on reception of a Command Completed event
+ * @link_active_reporting: cached copy of Data Link Layer Link Active Reporting
+ *	Capable bit in Link Capabilities register; if this bit is zero, the
+ *	Data Link Layer Link Active bit in the Link Status register will never
+ *	be set and the driver is thus confined to wait 1 second before assuming
+ *	the link to a hotplugged device is up and accessing it
+ * @notification_enabled: whether the IRQ was requested successfully
+ * @power_fault_detected: whether a power fault was detected by the hardware
+ *	that has not yet been cleared by the user
+ */
 struct controller {
-	struct mutex ctrl_lock;		/* controller lock */
-	struct pcie_device *pcie;	/* PCI Express port service */
+	struct mutex ctrl_lock;
+	struct pcie_device *pcie;
 	struct slot *slot;
-	wait_queue_head_t queue;	/* sleep & wake process */
+	wait_queue_head_t queue;
 	u32 slot_cap;
 	u16 slot_ctrl;
 	struct timer_list poll_timer;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 28/32] PCI: pciehp: Resume to D0 on enable/disable
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (11 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 06/32] PCI: pciehp: Declare pciehp_unconfigure_device() void Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 22/32] PCI: pciehp: Always enable occupied slot on probe Lukas Wunner
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

pciehp's IRQ thread ensures accessibility of the port by runtime resuming
its parent to D0.  However when the slot is enabled/disabled, the port
itself needs to be in D0 because its secondary bus is accessed in:

    pciehp_check_link_status(),
    pciehp_configure_device() (both called from board_added())
and
    pciehp_unconfigure_device() (called from remove_board()).

Thus, acquire a runtime PM ref on enable/disablement of the slot.

Yinghai Lu additionally discovered that some SkyLake servers feature a
Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
which requires the port to be in D0 when invoking
    pciehp_power_on_slot() (likewise called from board_added()).

If slot power is turned on while in D3hot, link training later fails:
https://lkml.org/lkml/2017/2/5/13

The spec is silent about such a requirement, but it seems prudent to
assume that any hotplug port with a Power Controller may need this.

The present commit holds a runtime PM ref whenever slot power is turned
on and off, but it doesn't keep the port in D0 as long as slot power is
on.  If vendors determine that's necessary, they need to amend pciehp to
acquire a runtime PM ref in pciehp_power_on_slot() and release one in
pciehp_power_off_slot().

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 811019902ada..6855933ab372 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -17,6 +17,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/slab.h>
+#include <linux/pm_runtime.h>
 #include <linux/pci.h>
 #include "../pci.h"
 #include "pciehp.h"
@@ -310,9 +311,11 @@ static int pciehp_enable_slot(struct slot *slot)
 	struct controller *ctrl = slot->ctrl;
 	int ret;
 
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	ret = __pciehp_enable_slot(slot);
 	if (ret && ATTN_BUTTN(ctrl))
 		pciehp_green_led_off(slot); /* may be blinking */
+	pm_runtime_put(&ctrl->pcie->port->dev);
 
 	mutex_lock(&slot->lock);
 	slot->state = ret ? OFF_STATE : ON_STATE;
@@ -341,9 +344,12 @@ static int __pciehp_disable_slot(struct slot *p_slot)
 
 static int pciehp_disable_slot(struct slot *slot)
 {
+	struct controller *ctrl = slot->ctrl;
 	int ret;
 
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	ret = __pciehp_disable_slot(slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
 
 	mutex_lock(&slot->lock);
 	slot->state = OFF_STATE;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 22/32] PCI: pciehp: Always enable occupied slot on probe
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (12 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 28/32] PCI: pciehp: Resume to D0 on enable/disable Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 10/32] PCI: pciehp: Convert to threaded polling Lukas Wunner
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
there are hot-plug events that occur while interrupt generation is
disabled, and interrupt generation is subsequently enabled."

On probe, we currently clear all event bits in the Slot Status register
with the notable exception of the Presence Detect Changed bit.  Thereby
we seek to receive an interrupt for an already occupied slot once event
notification is enabled.

But because the interrupt is optional, users may have to specify the
pciehp_force parameter on the command line, which is inconvenient.

Moreover, now that pciehp's event handling has become resilient to
missed events, a Presence Detect Changed interrupt for a slot which is
powered on is interpreted as removal of the card.  If the slot has
already been brought up by the BIOS, receiving such an interrupt on
probe causes the slot to be powered off and immediately back on, which
is likewise undesirable.

Avoid both issues by making the behavior of pciehp_force the default and
clearing the Presence Detect Changed bit on probe.

Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
("Force pciehp, even if OSHP is missing") seems nonsensical because the
OSHP control method is only relevant for SHCP slots according to the
PCI Firmware specification r3.0, sec 4.8.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_core.c | 12 ++++--------
 drivers/pci/hotplug/pciehp_hpc.c  |  9 ++-------
 2 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index b11f0db0695f..f4eaa9944699 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -30,7 +30,6 @@
 bool pciehp_debug;
 bool pciehp_poll_mode;
 int pciehp_poll_time;
-static bool pciehp_force;
 
 /*
  * not really modular, but the easiest way to keep compat with existing
@@ -39,11 +38,9 @@ static bool pciehp_force;
 module_param(pciehp_debug, bool, 0644);
 module_param(pciehp_poll_mode, bool, 0644);
 module_param(pciehp_poll_time, int, 0644);
-module_param(pciehp_force, bool, 0644);
 MODULE_PARM_DESC(pciehp_debug, "Debugging mode enabled or not");
 MODULE_PARM_DESC(pciehp_poll_mode, "Using polling mechanism for hot-plug events or not");
 MODULE_PARM_DESC(pciehp_poll_time, "Polling mechanism frequency, in seconds");
-MODULE_PARM_DESC(pciehp_force, "Force pciehp, even if OSHP is missing");
 
 #define PCIE_MODULE_NAME "pciehp"
 
@@ -243,11 +240,10 @@ static int pciehp_probe(struct pcie_device *dev)
 	mutex_lock(&slot->lock);
 	pciehp_get_adapter_status(slot, &occupied);
 	pciehp_get_power_status(slot, &poweron);
-	if (pciehp_force &&
-	    ((occupied && (slot->state == OFF_STATE ||
-			   slot->state == BLINKINGON_STATE)) ||
-	     (!occupied && (slot->state == ON_STATE ||
-			    slot->state == BLINKINGOFF_STATE))))
+	if ((occupied && (slot->state == OFF_STATE ||
+			  slot->state == BLINKINGON_STATE)) ||
+	    (!occupied && (slot->state == ON_STATE ||
+			   slot->state == BLINKINGOFF_STATE)))
 		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
 	/* If empty slot's power status is on, turn power off */
 	if (!occupied && poweron && POWER_CTRL(ctrl))
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index b746c3b52719..3c0adf4967a8 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -843,16 +843,11 @@ struct controller *pcie_init(struct pcie_device *dev)
 	if (link_cap & PCI_EXP_LNKCAP_DLLLARC)
 		ctrl->link_active_reporting = 1;
 
-	/*
-	 * Clear all remaining event bits in Slot Status register except
-	 * Presence Detect Changed. We want to make sure possible
-	 * hotplug event is triggered when the interrupt is unmasked so
-	 * that we don't lose that event.
-	 */
+	/* Clear all remaining event bits in Slot Status register. */
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 		PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
 		PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_CC |
-		PCI_EXP_SLTSTA_DLLSC);
+		PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC);
 
 	ctrl_info(ctrl, "Slot #%d AttnBtn%c PwrCtrl%c MRL%c AttnInd%c PwrInd%c HotPlug%c Surprise%c Interlock%c NoCompl%c LLActRep%c%s\n",
 		(slot_cap & PCI_EXP_SLTCAP_PSN) >> 19,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 04/32] PCI: pciehp: Fix unprotected list iteration in IRQ handler
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 08/32] genirq: Synchronize only with single thread on free_irq() Lukas Wunner
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Commit b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug
events for a device") iterates over the devices on a hotplug port's
subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem.
It is thus possible for a user to cause a crash by concurrently
manipulating the device list, e.g. by disabling slot power via sysfs
on a different CPU or by initiating a remove/rescan via sysfs.

This can't be fixed by acquiring pci_bus_sem because it may sleep.
The simplest fix is to avoid the list iteration altogether and just
check the ignore_hotplug flag on the port itself.  This works because
pci_ignore_hotplug() sets the flag both on the device as well as on its
parent bridge.

We do lose the ability to print the name of the device blocking hotplug
in the debug message, but that's probably bearable.

Fixes: b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device")
Cc: stable@vger.kernel.org
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_hpc.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 26b85440053e..044087e2683d 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -539,8 +539,6 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
-	struct pci_bus *subordinate = pdev->subordinate;
-	struct pci_dev *dev;
 	struct slot *slot = ctrl->slot;
 	u16 status, events;
 	u8 present;
@@ -588,14 +586,9 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		wake_up(&ctrl->queue);
 	}
 
-	if (subordinate) {
-		list_for_each_entry(dev, &subordinate->devices, bus_list) {
-			if (dev->ignore_hotplug) {
-				ctrl_dbg(ctrl, "ignoring hotplug event %#06x (%s requested no hotplug)\n",
-					 events, pci_name(dev));
-				return IRQ_HANDLED;
-			}
-		}
+	if (pdev->ignore_hotplug) {
+		ctrl_dbg(ctrl, "ignoring hotplug event %#06x\n", events);
+		return IRQ_HANDLED;
 	}
 
 	/* Check Attention Button Pressed */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 21/32] PCI: pciehp: Become resilient to missed events
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (16 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 12/32] PCI: pciehp: Handle events synchronously Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 25/32] PCI: pciehp: Clear spurious events earlier on resume Lukas Wunner
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Stefan Roese,
	Mayurkumar Patel, Kenji Kaneshige

A hotplug port's Slot Status register does not count how often each type
of event occurred, it only records the fact *that* an event has occurred.

Previously pciehp queued a work item for each event.  But if it missed
an event, e.g. removal of a card in-between two back-to-back insertions,
it queued up the wrong work item or no work item at all.  Commit
fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking
for new ones") sought to improve the situation by shrinking the window
during which events may be missed.

But Stefan Roese reports unbalanced Card present and Link Up events,
suggesting that we're still missing events if they occur very rapidly.
Bjorn Helgaas responds that he considers pciehp's event handling
"baroque" and calls for its simplification and rationalization:
https://patchwork.ozlabs.org/patch/867418/

It gets worse once a hotplug port is runtime suspended:  The port can
signal an interrupt while it and its parents are in D3hot, i.e. while
it is inaccessible.  By the time we've runtime resumed all parents to D0
and read the port's Slot Status register, we may have missed an arbitrary
number of events.  Event handling therefore needs to be reworked to
become resilient to missed events.

Assume that a Presence Detect Changed event has occurred.
Consider the following truth table:
- Slot is in OFF_STATE and is currently empty.    => Do nothing.
  (The event is trailing a Link Down or we've
  missed an insertion and subsequent removal.)
- Slot is in OFF_STATE and is currently occupied. => Turn the slot on.
- Slot is in ON_STATE  and is currently empty.    => Turn the slot off.
- Slot is in ON_STATE  and is currently occupied. => Turn the slot off,
  (Be cautious and assume the card in                then back on.
  the slot isn't the same as before.)

This leads to the following simple algorithm:
1 If the slot is in ON_STATE, turn it off unconditionally.
2 If the slot is currently occupied, turn it on.

Because those actions are now carried out synchronously, rather than by
scheduled work items, pciehp reacts to the *current* situation and
missed events no longer matter.

Data Link Layer State Changed events can be handled identically to
Presence Detect Changed events.  Note that in the above truth table,
a Link Up trailing a Card present event didn't have to be accounted for:
It is filtered out by pciehp_check_link_status().

As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says:
"Once the Power Indicator begins blinking, a 5-second abort interval
exists during which a second depression of the Attention Button cancels
the operation."  In other words, the user can only expect the system to
react to a button press after it starts blinking.  Missed button presses
that occur in-between are irrelevant.

Cc: Stefan Roese <sr@denx.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      |  3 +-
 drivers/pci/hotplug/pciehp_ctrl.c | 80 ++++++++++++++-----------------
 drivers/pci/hotplug/pciehp_hpc.c  | 10 +---
 3 files changed, 40 insertions(+), 53 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 9c75acd291fb..47cd9af5caf3 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -169,8 +169,7 @@ int pciehp_sysfs_disable_slot(struct slot *slot);
 void pciehp_request(struct controller *ctrl, int action);
 void pciehp_handle_button_press(struct slot *slot);
 void pciehp_handle_disable_request(struct slot *slot);
-void pciehp_handle_link_change(struct slot *slot);
-void pciehp_handle_presence_change(struct slot *slot);
+void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events);
 int pciehp_configure_device(struct slot *p_slot);
 void pciehp_unconfigure_device(struct slot *p_slot);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 4a12e70aacd0..811019902ada 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -217,66 +217,60 @@ void pciehp_handle_disable_request(struct slot *slot)
 	ctrl->request_result = pciehp_disable_slot(slot);
 }
 
-void pciehp_handle_link_change(struct slot *p_slot)
+void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events)
 {
-	struct controller *ctrl = p_slot->ctrl;
+	struct controller *ctrl = slot->ctrl;
 	bool link_active;
+	u8 present;
 
-	mutex_lock(&p_slot->lock);
-	link_active = pciehp_check_link_active(ctrl);
-
-	switch (p_slot->state) {
-	case BLINKINGON_STATE:
+	/*
+	 * If the slot is on and presence or link has changed, turn it off.
+	 * Even if it's occupied again, we cannot assume the card is the same.
+	 */
+	mutex_lock(&slot->lock);
+	switch (slot->state) {
 	case BLINKINGOFF_STATE:
-		cancel_delayed_work(&p_slot->work);
-		/* Fall through */
+		cancel_delayed_work(&slot->work);
 	case ON_STATE:
-	case OFF_STATE:
-		if (link_active) {
-			p_slot->state = POWERON_STATE;
-			mutex_unlock(&p_slot->lock);
-			ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(p_slot));
-			pciehp_enable_slot(p_slot);
-		} else {
-			p_slot->state = POWEROFF_STATE;
-			mutex_unlock(&p_slot->lock);
-			ctrl_info(ctrl, "Slot(%s): Link Down\n", slot_name(p_slot));
-			pciehp_disable_slot(p_slot);
-		}
-		return;
+		slot->state = POWEROFF_STATE;
+		mutex_unlock(&slot->lock);
+		if (events & PCI_EXP_SLTSTA_DLLSC)
+			ctrl_info(ctrl, "Slot(%s): Link Down\n",
+				  slot_name(slot));
+		if (events & PCI_EXP_SLTSTA_PDC)
+			ctrl_info(ctrl, "Slot(%s): Card not present\n",
+				  slot_name(slot));
+		pciehp_disable_slot(slot);
 		break;
 	default:
-		ctrl_err(ctrl, "Slot(%s): Ignoring invalid state %#x\n",
-			 slot_name(p_slot), p_slot->state);
-		break;
+		mutex_unlock(&slot->lock);
 	}
-	mutex_unlock(&p_slot->lock);
-}
-
-void pciehp_handle_presence_change(struct slot *slot)
-{
-	struct controller *ctrl = slot->ctrl;
-	u8 present;
 
+	/* Turn the slot on if it's occupied or link is up */
 	mutex_lock(&slot->lock);
+	pciehp_get_adapter_status(slot, &present);
+	link_active = pciehp_check_link_active(ctrl);
+	if (!present && !link_active) {
+		mutex_unlock(&slot->lock);
+		return;
+	}
+
 	switch (slot->state) {
 	case BLINKINGON_STATE:
-	case BLINKINGOFF_STATE:
 		cancel_delayed_work(&slot->work);
-	}
-
-	pciehp_get_adapter_status(slot, &present);
-	ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
-		  present ? "" : "not ");
-
-	if (present) {
+	case OFF_STATE:
 		slot->state = POWERON_STATE;
 		mutex_unlock(&slot->lock);
+		if (present)
+			ctrl_info(ctrl, "Slot(%s): Card present\n",
+				  slot_name(slot));
+		if (link_active)
+			ctrl_info(ctrl, "Slot(%s): Link Up\n",
+				  slot_name(slot));
 		ctrl->request_result = pciehp_enable_slot(slot);
-	} else {
-		slot->state = POWEROFF_STATE;
+		break;
+	default:
 		mutex_unlock(&slot->lock);
-		pciehp_disable_slot(slot);
 	}
 }
 
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index adc6a89a3b5d..b746c3b52719 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -602,17 +602,11 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	/*
 	 * Disable requests have higher priority than Presence Detect Changed
 	 * or Data Link Layer State Changed events.
-	 *
-	 * Check Link Status Changed at higher precedence than Presence
-	 * Detect Changed.  The PDS value may be set to "card present" from
-	 * out-of-band detection, which may be in conflict with a Link Down.
 	 */
 	if (events & DISABLE_SLOT)
 		pciehp_handle_disable_request(slot);
-	else if (events & PCI_EXP_SLTSTA_DLLSC)
-		pciehp_handle_link_change(slot);
-	else if (events & PCI_EXP_SLTSTA_PDC)
-		pciehp_handle_presence_change(slot);
+	else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
+		pciehp_handle_presence_or_link_change(slot, events);
 
 	/* Check Power Fault Detected */
 	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 27/32] PCI: pciehp: Support interrupts sent from D3hot
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (30 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 24/32] PCI: portdrv: Deduplicate PM callback iterator Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-07-12 23:03   ` Bjorn Helgaas
  2018-06-21 12:19 ` [PATCH 00/32] Rework pciehp event handling & add runtime PM Mika Westerberg
                   ` (2 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Thomas Gleixner

If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment.  After all, if its parent is in
D3hot and the link to the hotplug port is thus down, how should an
interrupt come through?

It turns out that assumption is wrong at least for Thunderbolt:  Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts.  Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.

If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events.  For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.

That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port.  The worst case is a Thunderbolt downstream port at the
end of a daisy chain:  There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port.  That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.

Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)

PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."

This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended.  pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there.  On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled.  (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)

(Confusingly, section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector".  That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports.)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h     |  5 ++++
 drivers/pci/hotplug/pciehp_hpc.c | 45 ++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 247681963063..811cf83f956d 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -156,8 +156,13 @@ struct controller {
  *
  * %DISABLE_SLOT: Disable the slot in response to a user request via sysfs or
  *	an Attention Button press after the 5 second delay
+ * %RERUN_ISR: Used by the IRQ handler to inform the IRQ thread that the
+ *	hotplug port was inaccessible when the interrupt occurred, requiring
+ *	that the IRQ handler is rerun by the IRQ thread after it has made the
+ *	hotplug port accessible by runtime resuming its parents to D0
  */
 #define DISABLE_SLOT		(1 << 16)
+#define RERUN_ISR		(1 << 17)
 
 #define ATTN_BUTTN(ctrl)	((ctrl)->slot_cap & PCI_EXP_SLTCAP_ABP)
 #define POWER_CTRL(ctrl)	((ctrl)->slot_cap & PCI_EXP_SLTCAP_PCP)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 165b940de783..4bf7dda433b3 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -19,6 +19,7 @@
 #include <linux/jiffies.h>
 #include <linux/kthread.h>
 #include <linux/pci.h>
+#include <linux/pm_runtime.h>
 #include <linux/interrupt.h>
 #include <linux/time.h>
 #include <linux/slab.h>
@@ -521,6 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
+	struct device *parent = pdev->dev.parent;
 	u16 status, events;
 
 	/*
@@ -529,9 +531,26 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	if (pdev->current_state == PCI_D3cold)
 		return IRQ_NONE;
 
+	/*
+	 * Keep the port accessible by holding a runtime PM ref on its parent.
+	 * Defer resume of the parent to the IRQ thread if it's suspended.
+	 * Mask the interrupt until then.
+	 */
+	if (parent) {
+		pm_runtime_get_noresume(parent);
+		if (!pm_runtime_active(parent)) {
+			pm_runtime_put(parent);
+			disable_irq_nosync(irq);
+			atomic_or(RERUN_ISR, &ctrl->pending_events);
+			return IRQ_WAKE_THREAD;
+		}
+	}
+
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
 	if (status == (u16) ~0) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
+		if (parent)
+			pm_runtime_put(parent);
 		return IRQ_NONE;
 	}
 
@@ -550,11 +569,16 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	if (ctrl->power_fault_detected)
 		events &= ~PCI_EXP_SLTSTA_PFD;
 
-	if (!events)
+	if (!events) {
+		if (parent)
+			pm_runtime_put(parent);
 		return IRQ_NONE;
+	}
 
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
 	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
+	if (parent)
+		pm_runtime_put(parent);
 
 	/*
 	 * Command Completed notifications are not deferred to the
@@ -584,13 +608,29 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 static irqreturn_t pciehp_ist(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
+	struct pci_dev *pdev = ctrl_dev(ctrl);
 	struct slot *slot = ctrl->slot;
+	irqreturn_t ret;
 	u32 events;
 
+	pci_config_pm_runtime_get(pdev);
+
+	/* rerun pciehp_isr() if the port was inaccessible on interrupt */
+	if (atomic_fetch_and(~RERUN_ISR, &ctrl->pending_events) & RERUN_ISR) {
+		ret = pciehp_isr(irq, dev_id);
+		enable_irq(irq);
+		if (ret != IRQ_WAKE_THREAD) {
+			pci_config_pm_runtime_put(pdev);
+			return ret;
+		}
+	}
+
 	synchronize_hardirq(irq);
 	events = atomic_xchg(&ctrl->pending_events, 0);
-	if (!events)
+	if (!events) {
+		pci_config_pm_runtime_put(pdev);
 		return IRQ_NONE;
+	}
 
 	/* Check Attention Button Pressed */
 	if (events & PCI_EXP_SLTSTA_ABP) {
@@ -618,6 +658,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		pciehp_green_led_off(slot);
 	}
 
+	pci_config_pm_runtime_put(pdev);
 	wake_up(&ctrl->requester);
 	return IRQ_HANDLED;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 15/32] PCI: pciehp: Publish to user space last on probe
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (7 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 26/32] PCI: pciehp: Obey compulsory command delay after resume Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 16/32] PCI: pciehp: Track enable/disable status Lukas Wunner
                   ` (25 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

The PCI hotplug core has just been refactored to separate slot
initialization for in-kernel use from publication to user space.

Take advantage of it in pciehp by publishing to user space last on
probe.  This will allow enable/disablement of the slot exclusively from
the IRQ thread because the IRQ is requested after initialization for
in-kernel use (thereby getting its unique name needed by the IRQ thread)
but before user space is able to submit enable/disable requests.

On teardown, the order is the same in reverse:  The user space interface
is removed prior to freeing the IRQ and destroying the slot.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_core.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 37d8f81e548f..cde32e137f6c 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -100,10 +100,10 @@ static int init_slot(struct controller *ctrl)
 	slot->hotplug_slot = hotplug;
 	snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl));
 
-	retval = pci_hp_register(hotplug,
-				 ctrl->pcie->port->subordinate, 0, name);
+	retval = pci_hp_initialize(hotplug,
+				   ctrl->pcie->port->subordinate, 0, name);
 	if (retval)
-		ctrl_err(ctrl, "pci_hp_register failed: error %d\n", retval);
+		ctrl_err(ctrl, "pci_hp_initialize failed: error %d\n", retval);
 out:
 	if (retval) {
 		kfree(ops);
@@ -117,7 +117,7 @@ static void cleanup_slot(struct controller *ctrl)
 {
 	struct hotplug_slot *hotplug_slot = ctrl->slot->hotplug_slot;
 
-	pci_hp_deregister(hotplug_slot);
+	pci_hp_destroy(hotplug_slot);
 	kfree(hotplug_slot->ops);
 	kfree(hotplug_slot->info);
 	kfree(hotplug_slot);
@@ -231,8 +231,15 @@ static int pciehp_probe(struct pcie_device *dev)
 		goto err_out_free_ctrl_slot;
 	}
 
-	/* Check if slot is occupied */
+	/* Publish to user space */
 	slot = ctrl->slot;
+	rc = pci_hp_add(slot->hotplug_slot);
+	if (rc) {
+		ctrl_err(ctrl, "Publication to user space failed (%d)\n", rc);
+		goto err_out_shutdown_notification;
+	}
+
+	/* Check if slot is occupied */
 	pciehp_get_adapter_status(slot, &occupied);
 	pciehp_get_power_status(slot, &poweron);
 	if (occupied && pciehp_force)
@@ -243,6 +250,8 @@ static int pciehp_probe(struct pcie_device *dev)
 
 	return 0;
 
+err_out_shutdown_notification:
+	pcie_shutdown_notification(ctrl);
 err_out_free_ctrl_slot:
 	cleanup_slot(ctrl);
 err_out_release_ctlr:
@@ -254,6 +263,7 @@ static void pciehp_remove(struct pcie_device *dev)
 {
 	struct controller *ctrl = get_service_data(dev);
 
+	pci_hp_del(ctrl->slot->hotplug_slot);
 	pcie_shutdown_notification(ctrl);
 	cleanup_slot(ctrl);
 	pciehp_release_ctrl(ctrl);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 13/32] PCI: pciehp: Drop slot workqueue
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (14 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 10/32] PCI: pciehp: Convert to threaded polling Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 12/32] PCI: pciehp: Handle events synchronously Lukas Wunner
                   ` (18 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Previously the slot workqueue was used to handle events and enable or
disable the slot.  That's no longer the case as those tasks are done
synchronously in the IRQ thread.  The slot workqueue is thus merely used
to handle a button press after the 5 second delay and only one such work
item may be in flight at any given time.  A separate workqueue isn't
necessary for this simple task, so use the system workqueue instead.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      | 2 --
 drivers/pci/hotplug/pciehp_core.c | 6 ------
 drivers/pci/hotplug/pciehp_ctrl.c | 2 +-
 drivers/pci/hotplug/pciehp_hpc.c  | 9 +--------
 4 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 82ff77cc92f5..0d005c5fabfa 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -69,7 +69,6 @@ do {									\
  *	protects scheduling, execution and cancellation of @work
  * @hotplug_lock: serializes calls to pciehp_enable_slot() and
  *	pciehp_disable_slot()
- * @wq: work queue on which @work is scheduled
  */
 struct slot {
 	u8 state;
@@ -78,7 +77,6 @@ struct slot {
 	struct delayed_work work;
 	struct mutex lock;
 	struct mutex hotplug_lock;
-	struct workqueue_struct *wq;
 };
 
 /**
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 8a2133856dfd..b360645377c2 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -62,12 +62,6 @@ static int reset_slot(struct hotplug_slot *slot, int probe);
  */
 static void release_slot(struct hotplug_slot *hotplug_slot)
 {
-	struct slot *slot = hotplug_slot->private;
-
-	/* queued work needs hotplug_slot name */
-	cancel_delayed_work(&slot->work);
-	drain_workqueue(slot->wq);
-
 	kfree(hotplug_slot->ops);
 	kfree(hotplug_slot->info);
 	kfree(hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 5763e81be2ed..a4a8a5457aca 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -165,7 +165,7 @@ void pciehp_handle_button_press(struct slot *p_slot)
 		/* blink green LED and turn off amber */
 		pciehp_green_led_blink(p_slot);
 		pciehp_set_attention_status(p_slot, 0);
-		queue_delayed_work(p_slot->wq, &p_slot->work, 5*HZ);
+		schedule_delayed_work(&p_slot->work, 5 * HZ);
 		break;
 	case BLINKINGOFF_STATE:
 	case BLINKINGON_STATE:
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bc70762cbba1..0898501eea93 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -767,25 +767,18 @@ static int pcie_init_slot(struct controller *ctrl)
 	if (!slot)
 		return -ENOMEM;
 
-	slot->wq = alloc_ordered_workqueue("pciehp-%u", 0, PSN(ctrl));
-	if (!slot->wq)
-		goto abort;
-
 	slot->ctrl = ctrl;
 	mutex_init(&slot->lock);
 	mutex_init(&slot->hotplug_lock);
 	INIT_DELAYED_WORK(&slot->work, pciehp_queue_pushbutton_work);
 	ctrl->slot = slot;
 	return 0;
-abort:
-	kfree(slot);
-	return -ENOMEM;
 }
 
 static void pcie_cleanup_slot(struct controller *ctrl)
 {
 	struct slot *slot = ctrl->slot;
-	destroy_workqueue(slot->wq);
+	cancel_delayed_work_sync(&slot->work);
 	kfree(slot);
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (26 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 30/32] PCI: sysfs: Resume to D0 on function reset Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-09-06 16:01   ` Mika Westerberg
  2018-06-16 19:25 ` [PATCH 18/32] PCI: pciehp: Drop enable/disable lock Lukas Wunner
                   ` (6 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

When pciehp was conceived it apparently wasn't considered that one
hotplug port may be the parent of another, but with Thunderbolt this is
par for the course.

If the sysfs interface is used to initiate a simultaneous card removal
of two hotplug ports where one is a parent of the other, or if the two
ports simultaneously signal interrupts, e.g. because a link or presence
change was detected on both, a deadlock may occur if:

- The parent acquires pci_lock_rescan_remove(), starts removal of the
  child and waits for it to be unbound.
- The child waits to acquire pci_lock_rescan_remove() in order to
  service the pending sysfs command or interrupt before it can unbind.

Fix by using pci_dev_is_disconnected() as an indicator whether a parent
is removing the child, and avoid acquiring the lock in the child if so.
Should the child happen to acquire the lock first, there's no deadlock.

Testing or setting the disconnected flag needs to happen atomically with
acquisition of the lock, so introduce a mutex to protect that critical
section.

Previously the disconnected flag was only set if the slot was no longer
occupied.  That doesn't seem to make sense because the (logical) child
devices are going to be removed regardless of occupancy, so set the flag
unconditionally.  (Why is occupancy checked at all?  Because if the slot
was disabled via sysfs or a button press, the device remains physically
present for the time being, so the Command register is modified to
quiesce the device.)

The deadlock is probably rarely encountered in practice, but I can
easily reproduce it in poll mode:

    INFO: task pciehp_poll-4-2:102 blocked for more than 120 seconds.
    __schedule+0x291/0x880
    schedule+0x28/0x80
    schedule_timeout+0x1e3/0x370
    wait_for_completion+0x123/0x190
    kthread_stop+0x42/0xf0
    pciehp_release_ctrl+0xaa/0xb0
    pcie_port_remove_service+0x2f/0x40
    device_release_driver_internal+0x157/0x220
    bus_remove_device+0xe2/0x150
    device_del+0x124/0x340
    device_unregister+0x16/0x60
    remove_iter+0x1a/0x20
    device_for_each_child+0x4b/0x90
    pcie_port_device_remove+0x1e/0x30
    pci_device_remove+0x36/0xb0
    device_release_driver_internal+0x157/0x220
    pci_stop_bus_device+0x7d/0xa0
    pci_stop_bus_device+0x2b/0xa0
    pci_stop_and_remove_bus_device+0xe/0x20
    pciehp_unconfigure_device+0xb8/0x160
    pciehp_disable_slot+0x84/0x130
    pciehp_handle_card_not_present+0xd6/0x120
    pciehp_ist+0x111/0x150
    pciehp_poll+0x37/0x90
    kthread+0x111/0x130

    INFO: task pciehp_poll-9:104 blocked for more than 120 seconds.
    schedule+0x28/0x80
    schedule_preempt_disabled+0xa/0x10
    __mutex_lock.isra.1+0x1a0/0x4e0
    pciehp_configure_device+0x24/0x130
    pciehp_enable_slot+0x236/0x390
    pciehp_handle_card_present+0xe2/0x160
    pciehp_ist+0x11e/0x150
    pciehp_poll+0x37/0x90
    kthread+0x111/0x130

Cc: stable@vger.kernel.org
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_pci.c | 38 ++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index 3f518dea856d..242395389293 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -20,15 +20,27 @@
 #include "../pci.h"
 #include "pciehp.h"
 
+static DEFINE_MUTEX(acquire_unless_disconnected);
+
 int pciehp_configure_device(struct slot *p_slot)
 {
 	struct pci_dev *dev;
-	struct pci_dev *bridge = p_slot->ctrl->pcie->port;
+	struct controller *ctrl = p_slot->ctrl;
+	struct pci_dev *bridge = ctrl->pcie->port;
 	struct pci_bus *parent = bridge->subordinate;
 	int num, ret = 0;
-	struct controller *ctrl = p_slot->ctrl;
 
+	/*
+	 * Avoid deadlock if an upstream hotplug port has already acquired
+	 * pci_lock_rescan_remove() in order to remove this hotplug port.
+	 */
+	mutex_lock(&acquire_unless_disconnected);
+	if (pci_dev_is_disconnected(bridge)) {
+		mutex_unlock(&acquire_unless_disconnected);
+		return -ENODEV;
+	}
 	pci_lock_rescan_remove();
+	mutex_unlock(&acquire_unless_disconnected);
 
 	dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
 	if (dev) {
@@ -67,16 +79,24 @@ int pciehp_unconfigure_device(struct slot *p_slot)
 	int rc = 0;
 	u8 presence = 0;
 	struct pci_dev *dev, *temp;
-	struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate;
-	u16 command;
 	struct controller *ctrl = p_slot->ctrl;
+	struct pci_dev *bridge = ctrl->pcie->port;
+	struct pci_bus *parent = bridge->subordinate;
+	u16 command;
+
+	mutex_lock(&acquire_unless_disconnected);
+	if (pci_dev_is_disconnected(bridge)) {
+		mutex_unlock(&acquire_unless_disconnected);
+		return -ENODEV;
+	}
+	pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
+	pci_lock_rescan_remove();
+	mutex_unlock(&acquire_unless_disconnected);
 
 	ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
 		 __func__, pci_domain_nr(parent), parent->number);
 	pciehp_get_adapter_status(p_slot, &presence);
 
-	pci_lock_rescan_remove();
-
 	/*
 	 * Stopping an SR-IOV PF device removes all the associated VFs,
 	 * which will update the bus->devices list and confuse the
@@ -86,12 +106,6 @@ int pciehp_unconfigure_device(struct slot *p_slot)
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
 					 bus_list) {
 		pci_dev_get(dev);
-		if (!presence) {
-			pci_dev_set_disconnected(dev, NULL);
-			if (pci_has_subordinate(dev))
-				pci_walk_bus(dev->subordinate,
-					     pci_dev_set_disconnected, NULL);
-		}
 		pci_stop_and_remove_bus_device(dev);
 		/*
 		 * Ensure that no new Requests will be generated from
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 05/32] PCI: pciehp: Drop unnecessary NULL pointer check
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (24 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 20/32] PCI: pciehp: Tolerate initially unstable link Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 30/32] PCI: sysfs: Resume to D0 on function reset Lukas Wunner
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

pciehp_disable_slot() checks if the ctrl attribute of the slot is NULL
and bails out if so.  However the function is not called prior to the
attribute being set in pcie_init_slot(), and pcie_init_slot() is not
called if ctrl is NULL.  So the check is unnecessary.  Drop it.

It has been present ever since the driver was introduced in 2004, but it
was already unnecessary back then:
https://git.kernel.org/tglx/history/c/c16b4b14d980

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index c684faa43387..4a4639b7a479 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -412,9 +412,6 @@ int pciehp_disable_slot(struct slot *p_slot)
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
 
-	if (!p_slot->ctrl)
-		return 1;
-
 	if (POWER_CTRL(p_slot->ctrl)) {
 		pciehp_get_power_status(p_slot, &getstatus);
 		if (!getstatus) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 19/32] PCI: pciehp: Declare pciehp_enable/disable_slot() static
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (2 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 31/32] PCI: Whitelist native hotplug ports for runtime D3 Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 11/32] PCI: pciehp: Stop blinking on slot enable failure Lukas Wunner
                   ` (30 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

No callers of pciehp_enable/disable_slot() outside of pciehp_ctrl.c
remain, so declare the functions static.  For now this requires forward
declarations.  Those can be eliminated by reshuffling functions once the
ongoing effort to refactor the driver has settled.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      | 2 --
 drivers/pci/hotplug/pciehp_ctrl.c | 7 +++++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 5417180c3aff..9c75acd291fb 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -177,8 +177,6 @@ void pciehp_queue_pushbutton_work(struct work_struct *work);
 struct controller *pcie_init(struct pcie_device *dev);
 int pcie_init_notification(struct controller *ctrl);
 void pcie_shutdown_notification(struct controller *ctrl);
-int pciehp_enable_slot(struct slot *p_slot);
-int pciehp_disable_slot(struct slot *p_slot);
 void pcie_reenable_notification(struct controller *ctrl);
 int pciehp_power_on_slot(struct slot *slot);
 void pciehp_power_off_slot(struct slot *slot);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 8ba937599fb3..4a12e70aacd0 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -122,6 +122,9 @@ static void remove_board(struct slot *p_slot)
 	pciehp_green_led_off(p_slot);
 }
 
+static int pciehp_enable_slot(struct slot *slot);
+static int pciehp_disable_slot(struct slot *slot);
+
 void pciehp_request(struct controller *ctrl, int action)
 {
 	atomic_or(action, &ctrl->pending_events);
@@ -308,7 +311,7 @@ static int __pciehp_enable_slot(struct slot *p_slot)
 	return board_added(p_slot);
 }
 
-int pciehp_enable_slot(struct slot *slot)
+static int pciehp_enable_slot(struct slot *slot)
 {
 	struct controller *ctrl = slot->ctrl;
 	int ret;
@@ -342,7 +345,7 @@ static int __pciehp_disable_slot(struct slot *p_slot)
 	return 0;
 }
 
-int pciehp_disable_slot(struct slot *slot)
+static int pciehp_disable_slot(struct slot *slot)
 {
 	int ret;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 16/32] PCI: pciehp: Track enable/disable status
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (8 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 15/32] PCI: pciehp: Publish to user space last on probe Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3 Lukas Wunner
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

handle_button_press_event() currently determines whether the slot has
been turned on or off by looking at the Power Controller Control bit in
the Slot Control register.  This assumes that an attention button
implies presence of a power controller even though that's not mandated
by the spec.  Moreover the Power Controller Control bit is unreliable
when a power fault occurs (PCIe r4.0, sec 6.7.1.8).  This issue has
existed since the driver was introduced in 2004.

Fix by replacing STATIC_STATE with ON_STATE and OFF_STATE and tracking
whether the slot has been turned on or off.  This is also a required
ingredient to make pciehp resilient to missed events, which is the
object of an upcoming commit.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      | 15 ++++++++++++++-
 drivers/pci/hotplug/pciehp_ctrl.c | 28 ++++++++++++++++------------
 drivers/pci/hotplug/pciehp_hpc.c  |  5 +++++
 3 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 0d005c5fabfa..1ba335d6563a 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -122,11 +122,24 @@ struct controller {
 	atomic_t pending_events;
 };
 
-#define STATIC_STATE			0
+/**
+ * DOC: Slot state
+ *
+ * @OFF_STATE: slot is powered off, no subordinate devices are enumerated
+ * @BLINKINGON_STATE: slot will be powered on after the 5 second delay,
+ *	green led is blinking
+ * @BLINKINGOFF_STATE: slot will be powered off after the 5 second delay,
+ *	green led is blinking
+ * @POWERON_STATE: slot is currently powering on
+ * @POWEROFF_STATE: slot is currently powering off
+ * @ON_STATE: slot is powered on, subordinate devices have been enumerated
+ */
+#define OFF_STATE			0
 #define BLINKINGON_STATE		1
 #define BLINKINGOFF_STATE		2
 #define POWERON_STATE			3
 #define POWEROFF_STATE			4
+#define ON_STATE			5
 
 #define ATTN_BUTTN(ctrl)	((ctrl)->slot_cap & PCI_EXP_SLTCAP_ABP)
 #define POWER_CTRL(ctrl)	((ctrl)->slot_cap & PCI_EXP_SLTCAP_PCP)
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index a4a8a5457aca..627e846df802 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -147,13 +147,12 @@ void pciehp_queue_pushbutton_work(struct work_struct *work)
 void pciehp_handle_button_press(struct slot *p_slot)
 {
 	struct controller *ctrl = p_slot->ctrl;
-	u8 getstatus;
 
 	mutex_lock(&p_slot->lock);
 	switch (p_slot->state) {
-	case STATIC_STATE:
-		pciehp_get_power_status(p_slot, &getstatus);
-		if (getstatus) {
+	case OFF_STATE:
+	case ON_STATE:
+		if (p_slot->state == ON_STATE) {
 			p_slot->state = BLINKINGOFF_STATE;
 			ctrl_info(ctrl, "Slot(%s): Powering off due to button press\n",
 				  slot_name(p_slot));
@@ -176,14 +175,16 @@ void pciehp_handle_button_press(struct slot *p_slot)
 		 */
 		ctrl_info(ctrl, "Slot(%s): Button cancel\n", slot_name(p_slot));
 		cancel_delayed_work(&p_slot->work);
-		if (p_slot->state == BLINKINGOFF_STATE)
+		if (p_slot->state == BLINKINGOFF_STATE) {
+			p_slot->state = ON_STATE;
 			pciehp_green_led_on(p_slot);
-		else
+		} else {
+			p_slot->state = OFF_STATE;
 			pciehp_green_led_off(p_slot);
+		}
 		pciehp_set_attention_status(p_slot, 0);
 		ctrl_info(ctrl, "Slot(%s): Action canceled due to button press\n",
 			  slot_name(p_slot));
-		p_slot->state = STATIC_STATE;
 		break;
 	case POWEROFF_STATE:
 	case POWERON_STATE:
@@ -216,7 +217,8 @@ void pciehp_handle_link_change(struct slot *p_slot)
 	case BLINKINGOFF_STATE:
 		cancel_delayed_work(&p_slot->work);
 		/* Fall through */
-	case STATIC_STATE:
+	case ON_STATE:
+	case OFF_STATE:
 		if (link_active) {
 			p_slot->state = POWERON_STATE;
 			mutex_unlock(&p_slot->lock);
@@ -332,7 +334,7 @@ int pciehp_enable_slot(struct slot *slot)
 		pciehp_green_led_off(slot); /* may be blinking */
 
 	mutex_lock(&slot->lock);
-	slot->state = STATIC_STATE;
+	slot->state = ret ? OFF_STATE : ON_STATE;
 	mutex_unlock(&slot->lock);
 
 	return ret;
@@ -368,7 +370,7 @@ int pciehp_disable_slot(struct slot *slot)
 	mutex_unlock(&slot->hotplug_lock);
 
 	mutex_lock(&slot->lock);
-	slot->state = STATIC_STATE;
+	slot->state = OFF_STATE;
 	mutex_unlock(&slot->lock);
 
 	return ret;
@@ -382,7 +384,7 @@ int pciehp_sysfs_enable_slot(struct slot *p_slot)
 	switch (p_slot->state) {
 	case BLINKINGON_STATE:
 		cancel_delayed_work(&p_slot->work);
-	case STATIC_STATE:
+	case OFF_STATE:
 		p_slot->state = POWERON_STATE;
 		mutex_unlock(&p_slot->lock);
 		return pciehp_enable_slot(p_slot);
@@ -391,6 +393,7 @@ int pciehp_sysfs_enable_slot(struct slot *p_slot)
 			  slot_name(p_slot));
 		break;
 	case BLINKINGOFF_STATE:
+	case ON_STATE:
 	case POWEROFF_STATE:
 		ctrl_info(ctrl, "Slot(%s): Already enabled\n",
 			  slot_name(p_slot));
@@ -413,7 +416,7 @@ int pciehp_sysfs_disable_slot(struct slot *p_slot)
 	switch (p_slot->state) {
 	case BLINKINGOFF_STATE:
 		cancel_delayed_work(&p_slot->work);
-	case STATIC_STATE:
+	case ON_STATE:
 		p_slot->state = POWEROFF_STATE;
 		mutex_unlock(&p_slot->lock);
 		return pciehp_disable_slot(p_slot);
@@ -422,6 +425,7 @@ int pciehp_sysfs_disable_slot(struct slot *p_slot)
 			  slot_name(p_slot));
 		break;
 	case BLINKINGON_STATE:
+	case OFF_STATE:
 	case POWERON_STATE:
 		ctrl_info(ctrl, "Slot(%s): Already disabled\n",
 			  slot_name(p_slot));
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 0898501eea93..564a904d7fa9 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -761,12 +761,17 @@ void pcie_shutdown_notification(struct controller *ctrl)
 
 static int pcie_init_slot(struct controller *ctrl)
 {
+	struct pci_bus *subordinate = ctrl_dev(ctrl)->subordinate;
 	struct slot *slot;
 
 	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
 	if (!slot)
 		return -ENOMEM;
 
+	down_read(&pci_bus_sem);
+	slot->state = list_empty(&subordinate->devices) ? OFF_STATE : ON_STATE;
+	up_read(&pci_bus_sem);
+
 	slot->ctrl = ctrl;
 	mutex_init(&slot->lock);
 	mutex_init(&slot->hotplug_lock);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 12/32] PCI: pciehp: Handle events synchronously
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (15 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 13/32] PCI: pciehp: Drop slot workqueue Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 21/32] PCI: pciehp: Become resilient to missed events Lukas Wunner
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Up until now, pciehp's IRQ handler schedules a work item for each event,
which in turn schedules a work item to enable or disable the slot.  This
double indirection was necessary because sleeping wasn't allowed in the
IRQ handler.

However it is now that pciehp has been converted to threaded IRQ handling
and polling, so handle events synchronously in pciehp_ist() and remove
the work item infrastructure (with the exception of work items to handle
a button press after the 5 second delay).

For link or presence change events, move the register read to determine
the current link or presence state behind acquisition of the slot lock
to prevent if from becoming stale while the lock is contended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      |  20 +---
 drivers/pci/hotplug/pciehp_ctrl.c | 178 +++++++++---------------------
 drivers/pci/hotplug/pciehp_hpc.c  |  27 ++---
 3 files changed, 67 insertions(+), 158 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 9f11360e3318..82ff77cc92f5 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -69,8 +69,7 @@ do {									\
  *	protects scheduling, execution and cancellation of @work
  * @hotplug_lock: serializes calls to pciehp_enable_slot() and
  *	pciehp_disable_slot()
- * @wq: work queue on which @work is scheduled;
- *	also used to queue interrupt events and slot enablement and disablement
+ * @wq: work queue on which @work is scheduled
  */
 struct slot {
 	u8 state;
@@ -82,12 +81,6 @@ struct slot {
 	struct workqueue_struct *wq;
 };
 
-struct event_info {
-	u32 event_type;
-	struct slot *p_slot;
-	struct work_struct work;
-};
-
 /**
  * struct controller - PCIe hotplug controller
  * @ctrl_lock: serializes writes to the Slot Control register
@@ -131,13 +124,6 @@ struct controller {
 	atomic_t pending_events;
 };
 
-#define INT_PRESENCE_ON			1
-#define INT_PRESENCE_OFF		2
-#define INT_POWER_FAULT			3
-#define INT_BUTTON_PRESS		4
-#define INT_LINK_UP			5
-#define INT_LINK_DOWN			6
-
 #define STATIC_STATE			0
 #define BLINKINGON_STATE		1
 #define BLINKINGOFF_STATE		2
@@ -156,7 +142,9 @@ struct controller {
 
 int pciehp_sysfs_enable_slot(struct slot *slot);
 int pciehp_sysfs_disable_slot(struct slot *slot);
-void pciehp_queue_interrupt_event(struct slot *slot, u32 event_type);
+void pciehp_handle_button_press(struct slot *slot);
+void pciehp_handle_link_change(struct slot *slot);
+void pciehp_handle_presence_change(struct slot *slot);
 int pciehp_configure_device(struct slot *p_slot);
 void pciehp_unconfigure_device(struct slot *p_slot);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 9d6343a35db1..5763e81be2ed 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -21,24 +21,6 @@
 #include "../pci.h"
 #include "pciehp.h"
 
-static void interrupt_event_handler(struct work_struct *work);
-
-void pciehp_queue_interrupt_event(struct slot *p_slot, u32 event_type)
-{
-	struct event_info *info;
-
-	info = kmalloc(sizeof(*info), GFP_ATOMIC);
-	if (!info) {
-		ctrl_err(p_slot->ctrl, "dropped event %d (ENOMEM)\n", event_type);
-		return;
-	}
-
-	INIT_WORK(&info->work, interrupt_event_handler);
-	info->event_type = event_type;
-	info->p_slot = p_slot;
-	queue_work(p_slot->wq, &info->work);
-}
-
 /* The following routines constitute the bulk of the
    hotplug controller logic
  */
@@ -140,59 +122,6 @@ static void remove_board(struct slot *p_slot)
 	pciehp_green_led_off(p_slot);
 }
 
-struct power_work_info {
-	struct slot *p_slot;
-	struct work_struct work;
-	unsigned int req;
-#define DISABLE_REQ 0
-#define ENABLE_REQ  1
-};
-
-/**
- * pciehp_power_thread - handle pushbutton events
- * @work: &struct work_struct describing work to be done
- *
- * Scheduled procedure to handle blocking stuff for the pushbuttons.
- * Handles all pending events and exits.
- */
-static void pciehp_power_thread(struct work_struct *work)
-{
-	struct power_work_info *info =
-		container_of(work, struct power_work_info, work);
-	struct slot *p_slot = info->p_slot;
-
-	switch (info->req) {
-	case DISABLE_REQ:
-		pciehp_disable_slot(p_slot);
-		break;
-	case ENABLE_REQ:
-		pciehp_enable_slot(p_slot);
-		break;
-	default:
-		break;
-	}
-
-	kfree(info);
-}
-
-static void pciehp_queue_power_work(struct slot *p_slot, int req)
-{
-	struct power_work_info *info;
-
-	p_slot->state = (req == ENABLE_REQ) ? POWERON_STATE : POWEROFF_STATE;
-
-	info = kmalloc(sizeof(*info), GFP_KERNEL);
-	if (!info) {
-		ctrl_err(p_slot->ctrl, "no memory to queue %s request\n",
-			 (req == ENABLE_REQ) ? "poweron" : "poweroff");
-		return;
-	}
-	info->p_slot = p_slot;
-	INIT_WORK(&info->work, pciehp_power_thread);
-	info->req = req;
-	queue_work(p_slot->wq, &info->work);
-}
-
 void pciehp_queue_pushbutton_work(struct work_struct *work)
 {
 	struct slot *p_slot = container_of(work, struct slot, work.work);
@@ -200,25 +129,27 @@ void pciehp_queue_pushbutton_work(struct work_struct *work)
 	mutex_lock(&p_slot->lock);
 	switch (p_slot->state) {
 	case BLINKINGOFF_STATE:
-		pciehp_queue_power_work(p_slot, DISABLE_REQ);
-		break;
+		p_slot->state = POWEROFF_STATE;
+		mutex_unlock(&p_slot->lock);
+		pciehp_disable_slot(p_slot);
+		return;
 	case BLINKINGON_STATE:
-		pciehp_queue_power_work(p_slot, ENABLE_REQ);
-		break;
+		p_slot->state = POWERON_STATE;
+		mutex_unlock(&p_slot->lock);
+		pciehp_enable_slot(p_slot);
+		return;
 	default:
 		break;
 	}
 	mutex_unlock(&p_slot->lock);
 }
 
-/*
- * Note: This function must be called with slot->lock held
- */
-static void handle_button_press_event(struct slot *p_slot)
+void pciehp_handle_button_press(struct slot *p_slot)
 {
 	struct controller *ctrl = p_slot->ctrl;
 	u8 getstatus;
 
+	mutex_lock(&p_slot->lock);
 	switch (p_slot->state) {
 	case STATIC_STATE:
 		pciehp_get_power_status(p_slot, &getstatus);
@@ -269,14 +200,16 @@ static void handle_button_press_event(struct slot *p_slot)
 			 slot_name(p_slot), p_slot->state);
 		break;
 	}
+	mutex_unlock(&p_slot->lock);
 }
 
-/*
- * Note: This function must be called with slot->lock held
- */
-static void handle_link_event(struct slot *p_slot, u32 event)
+void pciehp_handle_link_change(struct slot *p_slot)
 {
 	struct controller *ctrl = p_slot->ctrl;
+	bool link_active;
+
+	mutex_lock(&p_slot->lock);
+	link_active = pciehp_check_link_active(ctrl);
 
 	switch (p_slot->state) {
 	case BLINKINGON_STATE:
@@ -284,24 +217,40 @@ static void handle_link_event(struct slot *p_slot, u32 event)
 		cancel_delayed_work(&p_slot->work);
 		/* Fall through */
 	case STATIC_STATE:
-		pciehp_queue_power_work(p_slot, event == INT_LINK_UP ?
-					ENABLE_REQ : DISABLE_REQ);
+		if (link_active) {
+			p_slot->state = POWERON_STATE;
+			mutex_unlock(&p_slot->lock);
+			ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(p_slot));
+			pciehp_enable_slot(p_slot);
+		} else {
+			p_slot->state = POWEROFF_STATE;
+			mutex_unlock(&p_slot->lock);
+			ctrl_info(ctrl, "Slot(%s): Link Down\n", slot_name(p_slot));
+			pciehp_disable_slot(p_slot);
+		}
+		return;
 		break;
 	case POWERON_STATE:
-		if (event == INT_LINK_UP) {
+		if (link_active) {
 			ctrl_info(ctrl, "Slot(%s): Link Up event ignored; already powering on\n",
 				  slot_name(p_slot));
 		} else {
+			p_slot->state = POWEROFF_STATE;
+			mutex_unlock(&p_slot->lock);
 			ctrl_info(ctrl, "Slot(%s): Link Down event queued; currently getting powered on\n",
 				  slot_name(p_slot));
-			pciehp_queue_power_work(p_slot, DISABLE_REQ);
+			pciehp_disable_slot(p_slot);
+			return;
 		}
 		break;
 	case POWEROFF_STATE:
-		if (event == INT_LINK_UP) {
+		if (link_active) {
+			p_slot->state = POWERON_STATE;
+			mutex_unlock(&p_slot->lock);
 			ctrl_info(ctrl, "Slot(%s): Link Up event queued; currently getting powered off\n",
 				  slot_name(p_slot));
-			pciehp_queue_power_work(p_slot, ENABLE_REQ);
+			pciehp_enable_slot(p_slot);
+			return;
 		} else {
 			ctrl_info(ctrl, "Slot(%s): Link Down event ignored; already powering off\n",
 				  slot_name(p_slot));
@@ -312,45 +261,28 @@ static void handle_link_event(struct slot *p_slot, u32 event)
 			 slot_name(p_slot), p_slot->state);
 		break;
 	}
+	mutex_unlock(&p_slot->lock);
 }
 
-static void interrupt_event_handler(struct work_struct *work)
+void pciehp_handle_presence_change(struct slot *slot)
 {
-	struct event_info *info = container_of(work, struct event_info, work);
-	struct slot *p_slot = info->p_slot;
-	struct controller *ctrl = p_slot->ctrl;
+	struct controller *ctrl = slot->ctrl;
+	u8 present;
 
-	mutex_lock(&p_slot->lock);
-	switch (info->event_type) {
-	case INT_BUTTON_PRESS:
-		handle_button_press_event(p_slot);
-		break;
-	case INT_POWER_FAULT:
-		if (!POWER_CTRL(ctrl))
-			break;
-		pciehp_set_attention_status(p_slot, 1);
-		pciehp_green_led_off(p_slot);
-		break;
-	case INT_PRESENCE_ON:
-		pciehp_queue_power_work(p_slot, ENABLE_REQ);
-		break;
-	case INT_PRESENCE_OFF:
-		/*
-		 * Regardless of surprise capability, we need to
-		 * definitely remove a card that has been pulled out!
-		 */
-		pciehp_queue_power_work(p_slot, DISABLE_REQ);
-		break;
-	case INT_LINK_UP:
-	case INT_LINK_DOWN:
-		handle_link_event(p_slot, info->event_type);
-		break;
-	default:
-		break;
+	mutex_lock(&slot->lock);
+	pciehp_get_adapter_status(slot, &present);
+	ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
+		  present ? "" : "not ");
+
+	if (present) {
+		slot->state = POWERON_STATE;
+		mutex_unlock(&slot->lock);
+		pciehp_enable_slot(slot);
+	} else {
+		slot->state = POWEROFF_STATE;
+		mutex_unlock(&slot->lock);
+		pciehp_disable_slot(slot);
 	}
-	mutex_unlock(&p_slot->lock);
-
-	kfree(info);
 }
 
 /*
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index be4d9698f254..bc70762cbba1 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -581,8 +581,6 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct slot *slot = ctrl->slot;
 	u32 events;
-	u8 present;
-	bool link;
 
 	synchronize_hardirq(irq);
 	events = atomic_xchg(&ctrl->pending_events, 0);
@@ -593,34 +591,25 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	if (events & PCI_EXP_SLTSTA_ABP) {
 		ctrl_info(ctrl, "Slot(%s): Attention button pressed\n",
 			  slot_name(slot));
-		pciehp_queue_interrupt_event(slot, INT_BUTTON_PRESS);
+		pciehp_handle_button_press(slot);
 	}
 
 	/*
 	 * Check Link Status Changed at higher precedence than Presence
 	 * Detect Changed.  The PDS value may be set to "card present" from
-	 * out-of-band detection, which may be in conflict with a Link Down
-	 * and cause the wrong event to queue.
+	 * out-of-band detection, which may be in conflict with a Link Down.
 	 */
-	if (events & PCI_EXP_SLTSTA_DLLSC) {
-		link = pciehp_check_link_active(ctrl);
-		ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
-			  link ? "Up" : "Down");
-		pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
-					     INT_LINK_DOWN);
-	} else if (events & PCI_EXP_SLTSTA_PDC) {
-		pciehp_get_adapter_status(slot, &present);
-		ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
-			  present ? "" : "not ");
-		pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
-					     INT_PRESENCE_OFF);
-	}
+	if (events & PCI_EXP_SLTSTA_DLLSC)
+		pciehp_handle_link_change(slot);
+	else if (events & PCI_EXP_SLTSTA_PDC)
+		pciehp_handle_presence_change(slot);
 
 	/* Check Power Fault Detected */
 	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
 		ctrl->power_fault_detected = 1;
 		ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot));
-		pciehp_queue_interrupt_event(slot, INT_POWER_FAULT);
+		pciehp_set_attention_status(slot, 1);
+		pciehp_green_led_off(slot);
 	}
 
 	return IRQ_HANDLED;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 29/32] PCI: pciehp: Resume parent to D0 on config space access
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (19 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 02/32] PCI: pciehp: Fix UAF on unplug Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 17/32] PCI: pciehp: Enable/disable exclusively from IRQ thread Lukas Wunner
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Ensure accessibility of a hotplug port's config space when accessed via
sysfs by resuming its parent to D0.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_core.c | 14 ++++++++++++++
 drivers/pci/hotplug/pciehp_hpc.c  |  7 +++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 3e73b12ed656..01fcf1fa0f66 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -26,6 +26,8 @@
 #include <linux/interrupt.h>
 #include <linux/time.h>
 
+#include "../pci.h"
+
 /* Global variables */
 bool pciehp_debug;
 bool pciehp_poll_mode;
@@ -126,8 +128,11 @@ static void cleanup_slot(struct controller *ctrl)
 static int set_attention_status(struct hotplug_slot *hotplug_slot, u8 status)
 {
 	struct slot *slot = hotplug_slot->private;
+	struct pci_dev *pdev = slot->ctrl->pcie->port;
 
+	pci_config_pm_runtime_get(pdev);
 	pciehp_set_attention_status(slot, status);
+	pci_config_pm_runtime_put(pdev);
 	return 0;
 }
 
@@ -150,8 +155,11 @@ static int disable_slot(struct hotplug_slot *hotplug_slot)
 static int get_power_status(struct hotplug_slot *hotplug_slot, u8 *value)
 {
 	struct slot *slot = hotplug_slot->private;
+	struct pci_dev *pdev = slot->ctrl->pcie->port;
 
+	pci_config_pm_runtime_get(pdev);
 	pciehp_get_power_status(slot, value);
+	pci_config_pm_runtime_put(pdev);
 	return 0;
 }
 
@@ -166,16 +174,22 @@ static int get_attention_status(struct hotplug_slot *hotplug_slot, u8 *value)
 static int get_latch_status(struct hotplug_slot *hotplug_slot, u8 *value)
 {
 	struct slot *slot = hotplug_slot->private;
+	struct pci_dev *pdev = slot->ctrl->pcie->port;
 
+	pci_config_pm_runtime_get(pdev);
 	pciehp_get_latch_status(slot, value);
+	pci_config_pm_runtime_put(pdev);
 	return 0;
 }
 
 static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 {
 	struct slot *slot = hotplug_slot->private;
+	struct pci_dev *pdev = slot->ctrl->pcie->port;
 
+	pci_config_pm_runtime_get(pdev);
 	pciehp_get_adapter_status(slot, value);
+	pci_config_pm_runtime_put(pdev);
 	return 0;
 }
 
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 4bf7dda433b3..73b1d02a36d0 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -322,7 +322,9 @@ int pciehp_get_raw_indicator_status(struct hotplug_slot *hotplug_slot,
 	struct pci_dev *pdev = ctrl_dev(slot->ctrl);
 	u16 slot_ctrl;
 
+	pci_config_pm_runtime_get(pdev);
 	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
+	pci_config_pm_runtime_put(pdev);
 	*status = (slot_ctrl & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
 	return 0;
 }
@@ -333,7 +335,9 @@ void pciehp_get_attention_status(struct slot *slot, u8 *status)
 	struct pci_dev *pdev = ctrl_dev(ctrl);
 	u16 slot_ctrl;
 
+	pci_config_pm_runtime_get(pdev);
 	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
+	pci_config_pm_runtime_put(pdev);
 	ctrl_dbg(ctrl, "%s: SLOTCTRL %x, value read %x\n", __func__,
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_ctrl);
 
@@ -408,9 +412,12 @@ int pciehp_set_raw_indicator_status(struct hotplug_slot *hotplug_slot,
 {
 	struct slot *slot = hotplug_slot->private;
 	struct controller *ctrl = slot->ctrl;
+	struct pci_dev *pdev = ctrl_dev(ctrl);
 
+	pci_config_pm_runtime_get(pdev);
 	pcie_write_cmd_nowait(ctrl, status << 6,
 			      PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC);
+	pci_config_pm_runtime_put(pdev);
 	return 0;
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 24/32] PCI: portdrv: Deduplicate PM callback iterator
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (29 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 27/32] PCI: pciehp: Support interrupts sent from D3hot Lukas Wunner
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Replace suspend_iter() and resume_iter() with a single function pm_iter()
to allow addition of port service callbacks for further power management
phases without having to add another iterator each time.

No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/portdrv_core.c | 30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e0261ad4bcdd..13a248575a14 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -353,14 +353,19 @@ int pcie_port_device_register(struct pci_dev *dev)
 }
 
 #ifdef CONFIG_PM
-static int suspend_iter(struct device *dev, void *data)
+typedef int (*pcie_pm_callback_t)(struct pcie_device *);
+
+static int pm_iter(struct device *dev, void *data)
 {
 	struct pcie_port_service_driver *service_driver;
+	size_t offset = *(size_t *)data;
+	pcie_pm_callback_t cb;
 
 	if ((dev->bus == &pcie_port_bus_type) && dev->driver) {
 		service_driver = to_service_driver(dev->driver);
-		if (service_driver->suspend)
-			service_driver->suspend(to_pcie_device(dev));
+		cb = *(pcie_pm_callback_t *)((void *)service_driver + offset);
+		if (cb)
+			return cb(to_pcie_device(dev));
 	}
 	return 0;
 }
@@ -371,20 +376,8 @@ static int suspend_iter(struct device *dev, void *data)
  */
 int pcie_port_device_suspend(struct device *dev)
 {
-	return device_for_each_child(dev, NULL, suspend_iter);
-}
-
-static int resume_iter(struct device *dev, void *data)
-{
-	struct pcie_port_service_driver *service_driver;
-
-	if ((dev->bus == &pcie_port_bus_type) &&
-	    (dev->driver)) {
-		service_driver = to_service_driver(dev->driver);
-		if (service_driver->resume)
-			service_driver->resume(to_pcie_device(dev));
-	}
-	return 0;
+	size_t off = offsetof(struct pcie_port_service_driver, suspend);
+	return device_for_each_child(dev, &off, pm_iter);
 }
 
 /**
@@ -393,7 +386,8 @@ static int resume_iter(struct device *dev, void *data)
  */
 int pcie_port_device_resume(struct device *dev)
 {
-	return device_for_each_child(dev, NULL, resume_iter);
+	size_t off = offsetof(struct pcie_port_service_driver, resume);
+	return device_for_each_child(dev, &off, pm_iter);
 }
 #endif /* PM */
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 01/32] PCI: hotplug: Don't leak pci_slot on registration failure
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (5 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 23/32] PCI: pciehp: Avoid slot access during reset Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 26/32] PCI: pciehp: Obey compulsory command delay after resume Lukas Wunner
                   ` (27 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman

If addition of sysfs files fails on registration of a hotplug slot, the
struct pci_slot as well as the entry in the slot_list is leaked.  The
issue has been present since the hotplug core was introduced in 2002:
https://git.kernel.org/tglx/history/c/a8a2069f432c

Perhaps the idea was that even though sysfs addition fails, the slot
should still be usable.  But that's not how drivers use the interface,
they abort probe if a non-zero value is returned.

Cc: stable@vger.kernel.org # v2.4.15+
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pci_hotplug_core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
index af92fed46ab7..fd93783a87b0 100644
--- a/drivers/pci/hotplug/pci_hotplug_core.c
+++ b/drivers/pci/hotplug/pci_hotplug_core.c
@@ -438,8 +438,17 @@ int __pci_hp_register(struct hotplug_slot *slot, struct pci_bus *bus,
 	list_add(&slot->slot_list, &pci_hotplug_slot_list);
 
 	result = fs_add_slot(pci_slot);
+	if (result)
+		goto err_list_del;
+
 	kobject_uevent(&pci_slot->kobj, KOBJ_ADD);
 	dbg("Added slot %s to the list\n", name);
+	goto out;
+
+err_list_del:
+	list_del(&slot->slot_list);
+	pci_slot->hotplug = NULL;
+	pci_destroy_slot(pci_slot);
 out:
 	mutex_unlock(&pci_hp_mutex);
 	return result;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 00/32] Rework pciehp event handling & add runtime PM
@ 2018-06-16 19:25 Lukas Wunner
  2018-06-16 19:25 ` [PATCH 04/32] PCI: pciehp: Fix unprotected list iteration in IRQ handler Lukas Wunner
                   ` (34 more replies)
  0 siblings, 35 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Mayurkumar Patel, Kenji Kaneshige, Stefan Roese,
	Rajat Jain, Alex Williamson, Andreas Noever

Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
on and off synchronously in the IRQ thread, no indirection via a work
queue anymore.

When the slot is enabled/disabled by the user via sysfs or an Attention
Button press, a request is sent to the IRQ thread.  The IRQ thread is
thus the sole entity enabling/disabling the slot.

The IRQ thread can cope with missed events, e.g. if a card is inserted
and immediately pulled out before the IRQ thread had a chance to react.
It also tolerates an initially unstable link as observed in the wild by
Stefan Roese.

Finally, runtime PM support is added.  This was the original motivation
of the series because runtime suspending hotplug ports is needed to power
down Thunderbolt controllers on idle, which saves ~1.5W per controller.
Runtime resuming ports takes tenths of milliseconds during which events
may be missed, this in turn necessitated the event handling rework.

I've pushed the series to GitHub to ease reviewing/fetching:
https://github.com/l1k/linux/commits/pciehp_runpm_v2

Please review and test.

Thanks,

Lukas


Lukas Wunner (32):
  PCI: hotplug: Don't leak pci_slot on registration failure
  PCI: pciehp: Fix UAF on unplug
  PCI: pciehp: Fix deadlock on unplug
  PCI: pciehp: Fix unprotected list iteration in IRQ handler
  PCI: pciehp: Drop unnecessary NULL pointer check
  PCI: pciehp: Declare pciehp_unconfigure_device() void
  PCI: pciehp: Document struct slot and struct controller
  genirq: Synchronize only with single thread on free_irq()
  PCI: pciehp: Convert to threaded IRQ
  PCI: pciehp: Convert to threaded polling
  PCI: pciehp: Stop blinking on slot enable failure
  PCI: pciehp: Handle events synchronously
  PCI: pciehp: Drop slot workqueue
  PCI: hotplug: Demidlayer registration with the core
  PCI: pciehp: Publish to user space last on probe
  PCI: pciehp: Track enable/disable status
  PCI: pciehp: Enable/disable exclusively from IRQ thread
  PCI: pciehp: Drop enable/disable lock
  PCI: pciehp: Declare pciehp_enable/disable_slot() static
  PCI: pciehp: Tolerate initially unstable link
  PCI: pciehp: Become resilient to missed events
  PCI: pciehp: Always enable occupied slot on probe
  PCI: pciehp: Avoid slot access during reset
  PCI: portdrv: Deduplicate PM callback iterator
  PCI: pciehp: Clear spurious events earlier on resume
  PCI: pciehp: Obey compulsory command delay after resume
  PCI: pciehp: Support interrupts sent from D3hot
  PCI: pciehp: Resume to D0 on enable/disable
  PCI: pciehp: Resume parent to D0 on config space access
  PCI: sysfs: Resume to D0 on function reset
  PCI: Whitelist native hotplug ports for runtime D3
  PCI: Whitelist Thunderbolt ports for runtime D3

 drivers/pci/hotplug/acpiphp_core.c      |  22 +-
 drivers/pci/hotplug/cpci_hotplug_core.c |  14 +-
 drivers/pci/hotplug/cpqphp_core.c       |  16 +-
 drivers/pci/hotplug/ibmphp_core.c       |  15 +-
 drivers/pci/hotplug/ibmphp_ebda.c       |  20 --
 drivers/pci/hotplug/pci_hotplug_core.c  | 134 +++++++--
 drivers/pci/hotplug/pciehp.h            | 117 ++++++--
 drivers/pci/hotplug/pciehp_core.c       | 104 ++++---
 drivers/pci/hotplug/pciehp_ctrl.c       | 352 ++++++++++--------------
 drivers/pci/hotplug/pciehp_hpc.c        | 261 ++++++++++--------
 drivers/pci/hotplug/pciehp_pci.c        |  42 ++-
 drivers/pci/hotplug/pcihp_skeleton.c    |  16 +-
 drivers/pci/hotplug/pnv_php.c           |   5 +-
 drivers/pci/hotplug/rpaphp_core.c       |   2 +-
 drivers/pci/hotplug/rpaphp_slot.c       |  13 +-
 drivers/pci/hotplug/s390_pci_hpc.c      |  13 +-
 drivers/pci/hotplug/sgi_hotplug.c       |   9 +-
 drivers/pci/hotplug/shpchp_core.c       |  20 +-
 drivers/pci/pci-sysfs.c                 |   2 +
 drivers/pci/pci.c                       |  21 +-
 drivers/pci/pcie/portdrv.h              |   2 +
 drivers/pci/pcie/portdrv_core.c         |  30 +-
 drivers/pci/pcie/portdrv_pci.c          |   2 +
 drivers/platform/x86/asus-wmi.c         |  12 +-
 drivers/platform/x86/eeepc-laptop.c     |  12 +-
 include/linux/pci_hotplug.h             |  15 +-
 kernel/irq/manage.c                     |  21 +-
 27 files changed, 687 insertions(+), 605 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 11/32] PCI: pciehp: Stop blinking on slot enable failure
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (3 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 19/32] PCI: pciehp: Declare pciehp_enable/disable_slot() static Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 23/32] PCI: pciehp: Avoid slot access during reset Lukas Wunner
                   ` (29 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

If the attention button is pressed to power on the slot AND the user
powers on the slot via sysfs before 5 seconds have elapsed AND powering
on the slot fails because either the slot is unoccupied OR the latch is
open, we neglect turning off the green led so it keeps on blinking.

That's because the error path of pciehp_sysfs_enable_slot() doesn't call
pciehp_green_led_off(), unlike pciehp_power_thread() which does.
The bug has been present since 2004 when the driver was introduced.

Fix by deduplicating common code in pciehp_sysfs_enable_slot() and
pciehp_power_thread() into a wrapper function pciehp_enable_slot() and
renaming the existing function to __pciehp_enable_slot().  Same for
pciehp_disable_slot().  This will also simplify the upcoming rework of
pciehp's event handling.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_core.c |  7 +--
 drivers/pci/hotplug/pciehp_ctrl.c | 73 +++++++++++++++++--------------
 2 files changed, 42 insertions(+), 38 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 2ba59fc94827..8a2133856dfd 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -248,11 +248,8 @@ static int pciehp_probe(struct pcie_device *dev)
 	slot = ctrl->slot;
 	pciehp_get_adapter_status(slot, &occupied);
 	pciehp_get_power_status(slot, &poweron);
-	if (occupied && pciehp_force) {
-		mutex_lock(&slot->hotplug_lock);
+	if (occupied && pciehp_force)
 		pciehp_enable_slot(slot);
-		mutex_unlock(&slot->hotplug_lock);
-	}
 	/* If empty slot's power status is on, turn power off */
 	if (!occupied && poweron && POWER_CTRL(ctrl))
 		pciehp_power_off_slot(slot);
@@ -296,12 +293,10 @@ static int pciehp_resume(struct pcie_device *dev)
 
 	/* Check if slot is occupied */
 	pciehp_get_adapter_status(slot, &status);
-	mutex_lock(&slot->hotplug_lock);
 	if (status)
 		pciehp_enable_slot(slot);
 	else
 		pciehp_disable_slot(slot);
-	mutex_unlock(&slot->hotplug_lock);
 	return 0;
 }
 #endif /* PM */
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 76a9b8c26d2b..9d6343a35db1 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -160,26 +160,13 @@ static void pciehp_power_thread(struct work_struct *work)
 	struct power_work_info *info =
 		container_of(work, struct power_work_info, work);
 	struct slot *p_slot = info->p_slot;
-	int ret;
 
 	switch (info->req) {
 	case DISABLE_REQ:
-		mutex_lock(&p_slot->hotplug_lock);
 		pciehp_disable_slot(p_slot);
-		mutex_unlock(&p_slot->hotplug_lock);
-		mutex_lock(&p_slot->lock);
-		p_slot->state = STATIC_STATE;
-		mutex_unlock(&p_slot->lock);
 		break;
 	case ENABLE_REQ:
-		mutex_lock(&p_slot->hotplug_lock);
-		ret = pciehp_enable_slot(p_slot);
-		mutex_unlock(&p_slot->hotplug_lock);
-		if (ret)
-			pciehp_green_led_off(p_slot);
-		mutex_lock(&p_slot->lock);
-		p_slot->state = STATIC_STATE;
-		mutex_unlock(&p_slot->lock);
+		pciehp_enable_slot(p_slot);
 		break;
 	default:
 		break;
@@ -369,7 +356,7 @@ static void interrupt_event_handler(struct work_struct *work)
 /*
  * Note: This function must be called with slot->hotplug_lock held
  */
-int pciehp_enable_slot(struct slot *p_slot)
+static int __pciehp_enable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
@@ -400,10 +387,29 @@ int pciehp_enable_slot(struct slot *p_slot)
 	return board_added(p_slot);
 }
 
+int pciehp_enable_slot(struct slot *slot)
+{
+	struct controller *ctrl = slot->ctrl;
+	int ret;
+
+	mutex_lock(&slot->hotplug_lock);
+	ret = __pciehp_enable_slot(slot);
+	mutex_unlock(&slot->hotplug_lock);
+
+	if (ret && ATTN_BUTTN(ctrl))
+		pciehp_green_led_off(slot); /* may be blinking */
+
+	mutex_lock(&slot->lock);
+	slot->state = STATIC_STATE;
+	mutex_unlock(&slot->lock);
+
+	return ret;
+}
+
 /*
  * Note: This function must be called with slot->hotplug_lock held
  */
-int pciehp_disable_slot(struct slot *p_slot)
+static int __pciehp_disable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
@@ -421,9 +427,23 @@ int pciehp_disable_slot(struct slot *p_slot)
 	return 0;
 }
 
+int pciehp_disable_slot(struct slot *slot)
+{
+	int ret;
+
+	mutex_lock(&slot->hotplug_lock);
+	ret = __pciehp_disable_slot(slot);
+	mutex_unlock(&slot->hotplug_lock);
+
+	mutex_lock(&slot->lock);
+	slot->state = STATIC_STATE;
+	mutex_unlock(&slot->lock);
+
+	return ret;
+}
+
 int pciehp_sysfs_enable_slot(struct slot *p_slot)
 {
-	int retval = -ENODEV;
 	struct controller *ctrl = p_slot->ctrl;
 
 	mutex_lock(&p_slot->lock);
@@ -433,12 +453,7 @@ int pciehp_sysfs_enable_slot(struct slot *p_slot)
 	case STATIC_STATE:
 		p_slot->state = POWERON_STATE;
 		mutex_unlock(&p_slot->lock);
-		mutex_lock(&p_slot->hotplug_lock);
-		retval = pciehp_enable_slot(p_slot);
-		mutex_unlock(&p_slot->hotplug_lock);
-		mutex_lock(&p_slot->lock);
-		p_slot->state = STATIC_STATE;
-		break;
+		return pciehp_enable_slot(p_slot);
 	case POWERON_STATE:
 		ctrl_info(ctrl, "Slot(%s): Already in powering on state\n",
 			  slot_name(p_slot));
@@ -455,12 +470,11 @@ int pciehp_sysfs_enable_slot(struct slot *p_slot)
 	}
 	mutex_unlock(&p_slot->lock);
 
-	return retval;
+	return -ENODEV;
 }
 
 int pciehp_sysfs_disable_slot(struct slot *p_slot)
 {
-	int retval = -ENODEV;
 	struct controller *ctrl = p_slot->ctrl;
 
 	mutex_lock(&p_slot->lock);
@@ -470,12 +484,7 @@ int pciehp_sysfs_disable_slot(struct slot *p_slot)
 	case STATIC_STATE:
 		p_slot->state = POWEROFF_STATE;
 		mutex_unlock(&p_slot->lock);
-		mutex_lock(&p_slot->hotplug_lock);
-		retval = pciehp_disable_slot(p_slot);
-		mutex_unlock(&p_slot->hotplug_lock);
-		mutex_lock(&p_slot->lock);
-		p_slot->state = STATIC_STATE;
-		break;
+		return pciehp_disable_slot(p_slot);
 	case POWEROFF_STATE:
 		ctrl_info(ctrl, "Slot(%s): Already in powering off state\n",
 			  slot_name(p_slot));
@@ -492,5 +501,5 @@ int pciehp_sysfs_disable_slot(struct slot *p_slot)
 	}
 	mutex_unlock(&p_slot->lock);
 
-	return retval;
+	return -ENODEV;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 30/32] PCI: sysfs: Resume to D0 on function reset
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (25 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 05/32] PCI: pciehp: Drop unnecessary NULL pointer check Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug Lukas Wunner
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

When performing a function reset via sysfs, the device's config space is
accessed in places such as pcie_flr() and its mmio space is accessed e.g.
in reset_ivb_igd(), so ensure accessibility by resuming the device to D0.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci-sysfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 0c4653c1d2ce..a3df6b57df0f 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1449,7 +1449,9 @@ static ssize_t reset_store(struct device *dev, struct device_attribute *attr,
 	if (val != 1)
 		return -EINVAL;
 
+	pm_runtime_get_sync(dev);
 	result = pci_reset_function(pdev);
+	pm_runtime_put(dev);
 	if (result < 0)
 		return result;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 17/32] PCI: pciehp: Enable/disable exclusively from IRQ thread
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (20 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 29/32] PCI: pciehp: Resume parent to D0 on config space access Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-21 11:58   ` Mika Westerberg
  2018-06-16 19:25 ` [PATCH 09/32] PCI: pciehp: Convert to threaded IRQ Lukas Wunner
                   ` (12 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:

- pciehp_probe() enables the slot if it's occupied and the pciehp_force
  module parameter is used.

- pciehp_resume() enables or disables the slot after system sleep.

- pciehp_queue_pushbutton_work() enables or disables the slot after the
  5 second delay following an Attention Button press.

- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
  disable the slot on sysfs write.

This requires locking and complicates pciehp's state machine.

A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.

Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request.  The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.

For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result.  There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.

The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition.  Afterwards, it moves to STATIC_STATE.  The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.

An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot().  So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      | 18 ++++++
 drivers/pci/hotplug/pciehp_core.c | 22 +++++--
 drivers/pci/hotplug/pciehp_ctrl.c | 99 +++++++++++++++----------------
 drivers/pci/hotplug/pciehp_hpc.c  | 14 ++++-
 4 files changed, 93 insertions(+), 60 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 1ba335d6563a..ed42dde5f9ac 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -105,6 +105,9 @@ struct slot {
  *	that has not yet been cleared by the user
  * @pending_events: used by the IRQ handler to save events retrieved from the
  *	Slot Status register for later consumption by the IRQ thread
+ * @request_result: result of last user request submitted to the IRQ thread
+ * @requester: wait queue to wake up on completion of user request,
+ *	used for synchronous slot enable/disable request via sysfs
  */
 struct controller {
 	struct mutex ctrl_lock;
@@ -120,6 +123,8 @@ struct controller {
 	unsigned int notification_enabled:1;
 	unsigned int power_fault_detected;
 	atomic_t pending_events;
+	int request_result;
+	wait_queue_head_t requester;
 };
 
 /**
@@ -141,6 +146,17 @@ struct controller {
 #define POWEROFF_STATE			4
 #define ON_STATE			5
 
+/**
+ * DOC: Flags to request an action from the IRQ thread
+ *
+ * These are stored together with events read from the Slot Status register,
+ * hence must be greater than its 16-bit width.
+ *
+ * %DISABLE_SLOT: Disable the slot in response to a user request via sysfs or
+ *	an Attention Button press after the 5 second delay
+ */
+#define DISABLE_SLOT		(1 << 16)
+
 #define ATTN_BUTTN(ctrl)	((ctrl)->slot_cap & PCI_EXP_SLTCAP_ABP)
 #define POWER_CTRL(ctrl)	((ctrl)->slot_cap & PCI_EXP_SLTCAP_PCP)
 #define MRL_SENS(ctrl)		((ctrl)->slot_cap & PCI_EXP_SLTCAP_MRLSP)
@@ -153,7 +169,9 @@ struct controller {
 
 int pciehp_sysfs_enable_slot(struct slot *slot);
 int pciehp_sysfs_disable_slot(struct slot *slot);
+void pciehp_request(struct controller *ctrl, int action);
 void pciehp_handle_button_press(struct slot *slot);
+void pciehp_handle_disable_request(struct slot *slot);
 void pciehp_handle_link_change(struct slot *slot);
 void pciehp_handle_presence_change(struct slot *slot);
 int pciehp_configure_device(struct slot *p_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index cde32e137f6c..b11f0db0695f 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -240,13 +240,19 @@ static int pciehp_probe(struct pcie_device *dev)
 	}
 
 	/* Check if slot is occupied */
+	mutex_lock(&slot->lock);
 	pciehp_get_adapter_status(slot, &occupied);
 	pciehp_get_power_status(slot, &poweron);
-	if (occupied && pciehp_force)
-		pciehp_enable_slot(slot);
+	if (pciehp_force &&
+	    ((occupied && (slot->state == OFF_STATE ||
+			   slot->state == BLINKINGON_STATE)) ||
+	     (!occupied && (slot->state == ON_STATE ||
+			    slot->state == BLINKINGOFF_STATE))))
+		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
 	/* If empty slot's power status is on, turn power off */
 	if (!occupied && poweron && POWER_CTRL(ctrl))
 		pciehp_power_off_slot(slot);
+	mutex_unlock(&slot->lock);
 
 	return 0;
 
@@ -290,10 +296,14 @@ static int pciehp_resume(struct pcie_device *dev)
 
 	/* Check if slot is occupied */
 	pciehp_get_adapter_status(slot, &status);
-	if (status)
-		pciehp_enable_slot(slot);
-	else
-		pciehp_disable_slot(slot);
+	mutex_lock(&slot->lock);
+	if ((status && (slot->state == OFF_STATE ||
+			slot->state == BLINKINGON_STATE)) ||
+	    (!status && (slot->state == ON_STATE ||
+			 slot->state == BLINKINGOFF_STATE)))
+		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
+	mutex_unlock(&slot->lock);
+
 	return 0;
 }
 #endif /* PM */
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 627e846df802..70bad847a450 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -122,22 +122,26 @@ static void remove_board(struct slot *p_slot)
 	pciehp_green_led_off(p_slot);
 }
 
+void pciehp_request(struct controller *ctrl, int action)
+{
+	atomic_or(action, &ctrl->pending_events);
+	if (!pciehp_poll_mode)
+		irq_wake_thread(ctrl->pcie->irq, ctrl);
+}
+
 void pciehp_queue_pushbutton_work(struct work_struct *work)
 {
 	struct slot *p_slot = container_of(work, struct slot, work.work);
+	struct controller *ctrl = p_slot->ctrl;
 
 	mutex_lock(&p_slot->lock);
 	switch (p_slot->state) {
 	case BLINKINGOFF_STATE:
-		p_slot->state = POWEROFF_STATE;
-		mutex_unlock(&p_slot->lock);
-		pciehp_disable_slot(p_slot);
-		return;
+		pciehp_request(ctrl, DISABLE_SLOT);
+		break;
 	case BLINKINGON_STATE:
-		p_slot->state = POWERON_STATE;
-		mutex_unlock(&p_slot->lock);
-		pciehp_enable_slot(p_slot);
-		return;
+		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
+		break;
 	default:
 		break;
 	}
@@ -186,16 +190,6 @@ void pciehp_handle_button_press(struct slot *p_slot)
 		ctrl_info(ctrl, "Slot(%s): Action canceled due to button press\n",
 			  slot_name(p_slot));
 		break;
-	case POWEROFF_STATE:
-	case POWERON_STATE:
-		/*
-		 * Ignore if the slot is on power-on or power-off state;
-		 * this means that the previous attention button action
-		 * to hot-add or hot-remove is undergoing
-		 */
-		ctrl_info(ctrl, "Slot(%s): Button ignored\n",
-			  slot_name(p_slot));
-		break;
 	default:
 		ctrl_err(ctrl, "Slot(%s): Ignoring invalid state %#x\n",
 			 slot_name(p_slot), p_slot->state);
@@ -204,6 +198,22 @@ void pciehp_handle_button_press(struct slot *p_slot)
 	mutex_unlock(&p_slot->lock);
 }
 
+void pciehp_handle_disable_request(struct slot *slot)
+{
+	struct controller *ctrl = slot->ctrl;
+
+	mutex_lock(&slot->lock);
+	switch (slot->state) {
+	case BLINKINGON_STATE:
+	case BLINKINGOFF_STATE:
+		cancel_delayed_work(&slot->work);
+	}
+	slot->state = POWEROFF_STATE;
+	mutex_unlock(&slot->lock);
+
+	ctrl->request_result = pciehp_disable_slot(slot);
+}
+
 void pciehp_handle_link_change(struct slot *p_slot)
 {
 	struct controller *ctrl = p_slot->ctrl;
@@ -232,32 +242,6 @@ void pciehp_handle_link_change(struct slot *p_slot)
 		}
 		return;
 		break;
-	case POWERON_STATE:
-		if (link_active) {
-			ctrl_info(ctrl, "Slot(%s): Link Up event ignored; already powering on\n",
-				  slot_name(p_slot));
-		} else {
-			p_slot->state = POWEROFF_STATE;
-			mutex_unlock(&p_slot->lock);
-			ctrl_info(ctrl, "Slot(%s): Link Down event queued; currently getting powered on\n",
-				  slot_name(p_slot));
-			pciehp_disable_slot(p_slot);
-			return;
-		}
-		break;
-	case POWEROFF_STATE:
-		if (link_active) {
-			p_slot->state = POWERON_STATE;
-			mutex_unlock(&p_slot->lock);
-			ctrl_info(ctrl, "Slot(%s): Link Up event queued; currently getting powered off\n",
-				  slot_name(p_slot));
-			pciehp_enable_slot(p_slot);
-			return;
-		} else {
-			ctrl_info(ctrl, "Slot(%s): Link Down event ignored; already powering off\n",
-				  slot_name(p_slot));
-		}
-		break;
 	default:
 		ctrl_err(ctrl, "Slot(%s): Ignoring invalid state %#x\n",
 			 slot_name(p_slot), p_slot->state);
@@ -272,6 +256,12 @@ void pciehp_handle_presence_change(struct slot *slot)
 	u8 present;
 
 	mutex_lock(&slot->lock);
+	switch (slot->state) {
+	case BLINKINGON_STATE:
+	case BLINKINGOFF_STATE:
+		cancel_delayed_work(&slot->work);
+	}
+
 	pciehp_get_adapter_status(slot, &present);
 	ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
 		  present ? "" : "not ");
@@ -279,7 +269,7 @@ void pciehp_handle_presence_change(struct slot *slot)
 	if (present) {
 		slot->state = POWERON_STATE;
 		mutex_unlock(&slot->lock);
-		pciehp_enable_slot(slot);
+		ctrl->request_result = pciehp_enable_slot(slot);
 	} else {
 		slot->state = POWEROFF_STATE;
 		mutex_unlock(&slot->lock);
@@ -383,11 +373,17 @@ int pciehp_sysfs_enable_slot(struct slot *p_slot)
 	mutex_lock(&p_slot->lock);
 	switch (p_slot->state) {
 	case BLINKINGON_STATE:
-		cancel_delayed_work(&p_slot->work);
 	case OFF_STATE:
-		p_slot->state = POWERON_STATE;
 		mutex_unlock(&p_slot->lock);
-		return pciehp_enable_slot(p_slot);
+		/*
+		 * The IRQ thread becomes a no-op if the user pulls out the
+		 * card before the thread wakes up, so initialize to -ENODEV.
+		 */
+		ctrl->request_result = -ENODEV;
+		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
+		wait_event(ctrl->requester,
+			   !atomic_read(&ctrl->pending_events));
+		return ctrl->request_result;
 	case POWERON_STATE:
 		ctrl_info(ctrl, "Slot(%s): Already in powering on state\n",
 			  slot_name(p_slot));
@@ -415,11 +411,12 @@ int pciehp_sysfs_disable_slot(struct slot *p_slot)
 	mutex_lock(&p_slot->lock);
 	switch (p_slot->state) {
 	case BLINKINGOFF_STATE:
-		cancel_delayed_work(&p_slot->work);
 	case ON_STATE:
-		p_slot->state = POWEROFF_STATE;
 		mutex_unlock(&p_slot->lock);
-		return pciehp_disable_slot(p_slot);
+		pciehp_request(ctrl, DISABLE_SLOT);
+		wait_event(ctrl->requester,
+			   !atomic_read(&ctrl->pending_events));
+		return ctrl->request_result;
 	case POWEROFF_STATE:
 		ctrl_info(ctrl, "Slot(%s): Already in powering off state\n",
 			  slot_name(p_slot));
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 564a904d7fa9..9aea0bdbb019 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -595,11 +595,16 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	}
 
 	/*
+	 * Disable requests have higher priority than Presence Detect Changed
+	 * or Data Link Layer State Changed events.
+	 *
 	 * Check Link Status Changed at higher precedence than Presence
 	 * Detect Changed.  The PDS value may be set to "card present" from
 	 * out-of-band detection, which may be in conflict with a Link Down.
 	 */
-	if (events & PCI_EXP_SLTSTA_DLLSC)
+	if (events & DISABLE_SLOT)
+		pciehp_handle_disable_request(slot);
+	else if (events & PCI_EXP_SLTSTA_DLLSC)
 		pciehp_handle_link_change(slot);
 	else if (events & PCI_EXP_SLTSTA_PDC)
 		pciehp_handle_presence_change(slot);
@@ -612,6 +617,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 		pciehp_green_led_off(slot);
 	}
 
+	wake_up(&ctrl->requester);
 	return IRQ_HANDLED;
 }
 
@@ -625,8 +631,9 @@ static int pciehp_poll(void *data)
 		if (kthread_should_park())
 			kthread_parkme();
 
-		/* poll for interrupt events */
-		while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD)
+		/* poll for interrupt events or user requests */
+		while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD ||
+		       atomic_read(&ctrl->pending_events))
 			pciehp_ist(IRQ_NOTCONNECTED, ctrl);
 
 		if (pciehp_poll_time <= 0 || pciehp_poll_time > 60)
@@ -829,6 +836,7 @@ struct controller *pcie_init(struct pcie_device *dev)
 
 	ctrl->slot_cap = slot_cap;
 	mutex_init(&ctrl->ctrl_lock);
+	init_waitqueue_head(&ctrl->requester);
 	init_waitqueue_head(&ctrl->queue);
 	dbg_ctrl(ctrl);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 10/32] PCI: pciehp: Convert to threaded polling
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (13 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 22/32] PCI: pciehp: Always enable occupied slot on probe Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 13/32] PCI: pciehp: Drop slot workqueue Lukas Wunner
                   ` (19 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Thomas Gleixner

We've just converted pciehp to threaded IRQ handling, but still cannot
sleep in pciehp_ist() because the function is also called in poll mode,
which runs in softirq context (from a timer).

Convert poll mode to a kthread so that pciehp_ist() always runs in task
context.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h     |  4 +-
 drivers/pci/hotplug/pciehp_hpc.c | 67 +++++++++++++++-----------------
 2 files changed, 34 insertions(+), 37 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index ab1d97a1822d..9f11360e3318 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -97,7 +97,7 @@ struct event_info {
  *	used for synchronous writes to the Slot Control register
  * @slot_cap: cached copy of the Slot Capabilities register
  * @slot_ctrl: cached copy of the Slot Control register
- * @poll_timer: timer to poll for slot events if no IRQ is available,
+ * @poll_thread: thread to poll for slot events if no IRQ is available,
  *	enabled with pciehp_poll_mode module parameter
  * @cmd_started: jiffies when the Slot Control register was last written;
  *	the next write is allowed 1 second later, absent a Command Completed
@@ -122,7 +122,7 @@ struct controller {
 	wait_queue_head_t queue;
 	u32 slot_cap;
 	u16 slot_ctrl;
-	struct timer_list poll_timer;
+	struct task_struct *poll_thread;
 	unsigned long cmd_started;	/* jiffies */
 	unsigned int cmd_busy:1;
 	unsigned int link_active_reporting:1;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5f41ba788cb2..be4d9698f254 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -17,7 +17,7 @@
 #include <linux/types.h>
 #include <linux/signal.h>
 #include <linux/jiffies.h>
-#include <linux/timer.h>
+#include <linux/kthread.h>
 #include <linux/pci.h>
 #include <linux/interrupt.h>
 #include <linux/time.h>
@@ -33,43 +33,17 @@ static inline struct pci_dev *ctrl_dev(struct controller *ctrl)
 
 static irqreturn_t pciehp_isr(int irq, void *dev_id);
 static irqreturn_t pciehp_ist(int irq, void *dev_id);
-static void start_int_poll_timer(struct controller *ctrl, int sec);
-
-/* This is the interrupt polling timeout function. */
-static void int_poll_timeout(struct timer_list *t)
-{
-	struct controller *ctrl = from_timer(ctrl, t, poll_timer);
-
-	/* Poll for interrupt events.  regs == NULL => polling */
-	while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD)
-		pciehp_ist(IRQ_NOTCONNECTED, ctrl);
-
-	if (!pciehp_poll_time)
-		pciehp_poll_time = 2; /* default polling interval is 2 sec */
-
-	start_int_poll_timer(ctrl, pciehp_poll_time);
-}
-
-/* This function starts the interrupt polling timer. */
-static void start_int_poll_timer(struct controller *ctrl, int sec)
-{
-	/* Clamp to sane value */
-	if ((sec <= 0) || (sec > 60))
-		sec = 2;
-
-	ctrl->poll_timer.expires = jiffies + sec * HZ;
-	add_timer(&ctrl->poll_timer);
-}
+static int pciehp_poll(void *data);
 
 static inline int pciehp_request_irq(struct controller *ctrl)
 {
 	int retval, irq = ctrl->pcie->irq;
 
-	/* Install interrupt polling timer. Start with 10 sec delay */
 	if (pciehp_poll_mode) {
-		timer_setup(&ctrl->poll_timer, int_poll_timeout, 0);
-		start_int_poll_timer(ctrl, 10);
-		return 0;
+		ctrl->poll_thread = kthread_run(&pciehp_poll, ctrl,
+						"pciehp_poll-%s",
+						slot_name(ctrl->slot));
+		return PTR_ERR_OR_ZERO(ctrl->poll_thread);
 	}
 
 	/* Installs the interrupt handler */
@@ -84,7 +58,7 @@ static inline int pciehp_request_irq(struct controller *ctrl)
 static inline void pciehp_free_irq(struct controller *ctrl)
 {
 	if (pciehp_poll_mode)
-		del_timer_sync(&ctrl->poll_timer);
+		kthread_stop(ctrl->poll_thread);
 	else
 		free_irq(ctrl->pcie->irq, ctrl);
 }
@@ -652,6 +626,29 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static int pciehp_poll(void *data)
+{
+	struct controller *ctrl = data;
+
+	schedule_timeout_idle(10 * HZ); /* start with 10 sec delay */
+
+	while (!kthread_should_stop()) {
+		if (kthread_should_park())
+			kthread_parkme();
+
+		/* poll for interrupt events */
+		while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD)
+			pciehp_ist(IRQ_NOTCONNECTED, ctrl);
+
+		if (pciehp_poll_time <= 0 || pciehp_poll_time > 60)
+			pciehp_poll_time = 2; /* clamp to sane value */
+
+		schedule_timeout_idle(pciehp_poll_time * HZ);
+	}
+
+	return 0;
+}
+
 static void pcie_enable_notification(struct controller *ctrl)
 {
 	u16 cmd, mask;
@@ -742,7 +739,7 @@ int pciehp_reset_slot(struct slot *slot, int probe)
 	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
 	if (pciehp_poll_mode)
-		del_timer_sync(&ctrl->poll_timer);
+		kthread_park(ctrl->poll_thread);
 
 	pci_reset_bridge_secondary_bus(ctrl->pcie->port);
 
@@ -751,7 +748,7 @@ int pciehp_reset_slot(struct slot *slot, int probe)
 	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
 	if (pciehp_poll_mode)
-		int_poll_timeout(&ctrl->poll_timer);
+		kthread_unpark(ctrl->poll_thread);
 	return 0;
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 02/32] PCI: pciehp: Fix UAF on unplug
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (18 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 25/32] PCI: pciehp: Clear spurious events earlier on resume Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 29/32] PCI: pciehp: Resume parent to D0 on config space access Lukas Wunner
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the
hotplug_slot struct is deregistered and thus freed before freeing the
IRQ.  The IRQ handler and the work items it schedules print the slot
name referenced from the freed structure in various informational and
debug log messages, each time resulting in a quadruple dereference of
freed pointers (hotplug_slot -> pci_slot -> kobject -> name).

At best the slot name is logged as "(null)", at worst kernel memory is
exposed in logs or the driver crashes:
    pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present

An attacker may provoke the bug by unplugging multiple devices on a
Thunderbolt daisy chain at once.  Unplugging can also be simulated by
powering down slots via sysfs.  The bug is particularly easy to trigger
in poll mode.

It has been present since the driver's introduction in 2004:
https://git.kernel.org/tglx/history/c/c16b4b14d980

Fix by rearranging teardown such that the IRQ is freed first.  Run the
work items queued by the IRQ handler to completion before freeing the
hotplug_slot struct by draining the work queue from the ->release_slot
callback which is invoked by pci_hp_deregister().

Cc: stable@vger.kernel.org # v2.6.4
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      | 1 +
 drivers/pci/hotplug/pciehp_core.c | 7 +++++++
 drivers/pci/hotplug/pciehp_hpc.c  | 4 +---
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 5f892065585e..fca87a1a2b22 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -119,6 +119,7 @@ int pciehp_unconfigure_device(struct slot *p_slot);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
 struct controller *pcie_init(struct pcie_device *dev);
 int pcie_init_notification(struct controller *ctrl);
+void pcie_shutdown_notification(struct controller *ctrl);
 int pciehp_enable_slot(struct slot *p_slot);
 int pciehp_disable_slot(struct slot *p_slot);
 void pcie_reenable_notification(struct controller *ctrl);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 44a6a63802d5..2ba59fc94827 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -62,6 +62,12 @@ static int reset_slot(struct hotplug_slot *slot, int probe);
  */
 static void release_slot(struct hotplug_slot *hotplug_slot)
 {
+	struct slot *slot = hotplug_slot->private;
+
+	/* queued work needs hotplug_slot name */
+	cancel_delayed_work(&slot->work);
+	drain_workqueue(slot->wq);
+
 	kfree(hotplug_slot->ops);
 	kfree(hotplug_slot->info);
 	kfree(hotplug_slot);
@@ -264,6 +270,7 @@ static void pciehp_remove(struct pcie_device *dev)
 {
 	struct controller *ctrl = get_service_data(dev);
 
+	pcie_shutdown_notification(ctrl);
 	cleanup_slot(ctrl);
 	pciehp_release_ctrl(ctrl);
 }
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 718b6073afad..26b85440053e 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -765,7 +765,7 @@ int pcie_init_notification(struct controller *ctrl)
 	return 0;
 }
 
-static void pcie_shutdown_notification(struct controller *ctrl)
+void pcie_shutdown_notification(struct controller *ctrl)
 {
 	if (ctrl->notification_enabled) {
 		pcie_disable_notification(ctrl);
@@ -800,7 +800,6 @@ static int pcie_init_slot(struct controller *ctrl)
 static void pcie_cleanup_slot(struct controller *ctrl)
 {
 	struct slot *slot = ctrl->slot;
-	cancel_delayed_work(&slot->work);
 	destroy_workqueue(slot->wq);
 	kfree(slot);
 }
@@ -893,7 +892,6 @@ struct controller *pcie_init(struct pcie_device *dev)
 
 void pciehp_release_ctrl(struct controller *ctrl)
 {
-	pcie_shutdown_notification(ctrl);
 	pcie_cleanup_slot(ctrl);
 	kfree(ctrl);
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 23/32] PCI: pciehp: Avoid slot access during reset
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (4 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 11/32] PCI: pciehp: Stop blinking on slot enable failure Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-21 12:06   ` Mika Westerberg
  2018-06-16 19:25 ` [PATCH 01/32] PCI: hotplug: Don't leak pci_slot on registration failure Lukas Wunner
                   ` (28 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Rajat Jain, Alex Williamson

The ->reset_slot callback introduced by commits:

2e35afaefe64 ("PCI: pciehp: Add reset_slot() method") and
06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset")

disables notification of Presence Detect Changed and Data Link Layer
State Changed events for the duration of a secondary bus reset.

Presumably a bus reset not only triggers these events, but also clears
the Presence Detect State bit in the Slot Status register and the Data
Link Layer Link Active bit in the Link Status register momentarily.

pciehp should therefore ensure that any parts of the driver that access
those bits do not run concurrently.  The only precaution the commits
took to that effect was to halt interrupt polling.  They made no effort
to drain the slot workqueue, cancel an outstanding Attention Button
work, or block slot enable/disable requests via sysfs and in the ->probe
hook.

Now that pciehp is converted to enable/disable the slot exclusively from
the IRQ thread, the only places accessing the two above-mentioned bits
are the IRQ thread and the ->probe hook.  Add locking to serialize them
with a bus reset.  This obviates the need to halt interrupt polling.
Do not add locking to the ->get_adapter_status sysfs callback to afford
users unfettered access to that bit.

Cc: Rajat Jain <rajatja@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      |  5 +++++
 drivers/pci/hotplug/pciehp_core.c |  2 ++
 drivers/pci/hotplug/pciehp_hpc.c  | 14 +++++++-------
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 47cd9af5caf3..bb015be34894 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -21,6 +21,7 @@
 #include <linux/delay.h>
 #include <linux/sched/signal.h>		/* signal_pending() */
 #include <linux/mutex.h>
+#include <linux/rwsem.h>
 #include <linux/workqueue.h>
 
 #include "../pcie/portdrv.h"
@@ -80,6 +81,9 @@ struct slot {
  * struct controller - PCIe hotplug controller
  * @ctrl_lock: serializes writes to the Slot Control register
  * @pcie: pointer to the controller's PCIe port service device
+ * @reset_lock: prevents access to the Data Link Layer Link Active bit in the
+ *	Link Status register and to the Presence Detect State bit in the Slot
+ *	Status register during a slot reset which may cause them to flap
  * @slot: pointer to the controller's slot structure
  * @queue: wait queue to wake up on reception of a Command Completed event,
  *	used for synchronous writes to the Slot Control register
@@ -109,6 +113,7 @@ struct slot {
 struct controller {
 	struct mutex ctrl_lock;
 	struct pcie_device *pcie;
+	struct rw_semaphore reset_lock;
 	struct slot *slot;
 	wait_queue_head_t queue;
 	u32 slot_cap;
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index f4eaa9944699..1be8871e7439 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -237,6 +237,7 @@ static int pciehp_probe(struct pcie_device *dev)
 	}
 
 	/* Check if slot is occupied */
+	down_read(&ctrl->reset_lock);
 	mutex_lock(&slot->lock);
 	pciehp_get_adapter_status(slot, &occupied);
 	pciehp_get_power_status(slot, &poweron);
@@ -249,6 +250,7 @@ static int pciehp_probe(struct pcie_device *dev)
 	if (!occupied && poweron && POWER_CTRL(ctrl))
 		pciehp_power_off_slot(slot);
 	mutex_unlock(&slot->lock);
+	up_read(&ctrl->reset_lock);
 
 	return 0;
 
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3c0adf4967a8..a89ae65d3c7c 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -603,10 +603,12 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
 	 * Disable requests have higher priority than Presence Detect Changed
 	 * or Data Link Layer State Changed events.
 	 */
+	down_read(&ctrl->reset_lock);
 	if (events & DISABLE_SLOT)
 		pciehp_handle_disable_request(slot);
 	else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
 		pciehp_handle_presence_or_link_change(slot, events);
+	up_read(&ctrl->reset_lock);
 
 	/* Check Power Fault Detected */
 	if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
@@ -627,9 +629,6 @@ static int pciehp_poll(void *data)
 	schedule_timeout_idle(10 * HZ); /* start with 10 sec delay */
 
 	while (!kthread_should_stop()) {
-		if (kthread_should_park())
-			kthread_parkme();
-
 		/* poll for interrupt events or user requests */
 		while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD ||
 		       atomic_read(&ctrl->pending_events))
@@ -723,6 +722,8 @@ int pciehp_reset_slot(struct slot *slot, int probe)
 	if (probe)
 		return 0;
 
+	down_write(&ctrl->reset_lock);
+
 	if (!ATTN_BUTTN(ctrl)) {
 		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
 		stat_mask |= PCI_EXP_SLTSTA_PDC;
@@ -733,8 +734,6 @@ int pciehp_reset_slot(struct slot *slot, int probe)
 	pcie_write_cmd(ctrl, 0, ctrl_mask);
 	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
-	if (pciehp_poll_mode)
-		kthread_park(ctrl->poll_thread);
 
 	pci_reset_bridge_secondary_bus(ctrl->pcie->port);
 
@@ -742,8 +741,8 @@ int pciehp_reset_slot(struct slot *slot, int probe)
 	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
 	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
-	if (pciehp_poll_mode)
-		kthread_unpark(ctrl->poll_thread);
+
+	up_write(&ctrl->reset_lock);
 	return 0;
 }
 
@@ -834,6 +833,7 @@ struct controller *pcie_init(struct pcie_device *dev)
 
 	ctrl->slot_cap = slot_cap;
 	mutex_init(&ctrl->ctrl_lock);
+	init_rwsem(&ctrl->reset_lock);
 	init_waitqueue_head(&ctrl->requester);
 	init_waitqueue_head(&ctrl->queue);
 	dbg_ctrl(ctrl);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 31/32] PCI: Whitelist native hotplug ports for runtime D3
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
  2018-06-16 19:25 ` [PATCH 04/32] PCI: pciehp: Fix unprotected list iteration in IRQ handler Lukas Wunner
  2018-06-16 19:25 ` [PATCH 08/32] genirq: Synchronize only with single thread on free_irq() Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 19/32] PCI: pciehp: Declare pciehp_enable/disable_slot() static Lukas Wunner
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Previously we blacklisted PCIe hotplug ports for runtime D3 because:

(a) Ports handled by the firmware must not be transitioned to D3 by the
    OS behind the firmware's back:
    https://bugzilla.kernel.org/show_bug.cgi?id=53811

(b) Ports handled natively by the OS lacked runtime D3 support in the
    pciehp driver.

We've just rectified the latter, so allow users to manually enable and
test it by passing pcie_port_pm=force on the command line.  Vendors are
thus put in a position to validate hotplug ports for runtime D3 and
perhaps we can someday enable it by default, but with a BIOS cutoff date.

Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in
2017 and encountered Hardware Error NMIs, so this feature clearly cannot
be enabled for everyone yet:
https://lkml.org/lkml/2017/5/3/480

While at it, remove an erroneous code comment I added with 97a90aee5dab
("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which
claims that parents of a hotplug port must stay awake lest interrupts
cannot be delivered.  That has turned out to be wrong at least for
Thunderbolt hotplug ports.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 97acba712e4e..4099a6c14b6d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2305,18 +2305,23 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 			return false;
 
 		/*
-		 * Hotplug interrupts cannot be delivered if the link is down,
-		 * so parents of a hotplug port must stay awake. In addition,
-		 * hotplug ports handled by firmware in System Management Mode
+		 * Hotplug ports handled by firmware in System Management Mode
 		 * may not be put into D3 by the OS (Thunderbolt on non-Macs).
-		 * For simplicity, disallow in general for now.
 		 */
-		if (bridge->is_hotplug_bridge)
+		if (bridge->is_hotplug_bridge && !pciehp_is_native(bridge))
 			return false;
 
 		if (pci_bridge_d3_force)
 			return true;
 
+		/*
+		 * Hotplug ports handled natively by the OS were not validated
+		 * by vendors for runtime D3 at least until 2018 because there
+		 * was no OS support.
+		 */
+		if (bridge->is_hotplug_bridge)
+			return false;
+
 		/*
 		 * It should be safe to put PCIe ports from 2015 or newer
 		 * to D3.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 20/32] PCI: pciehp: Tolerate initially unstable link
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (23 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 07/32] PCI: pciehp: Document struct slot and struct controller Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 05/32] PCI: pciehp: Drop unnecessary NULL pointer check Lukas Wunner
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Stefan Roese

When a device is hotplugged, Presence Detect and Link Up events often do
not occur simultaneously, but with a lag of a few milliseconds.  Only
the first event received is relevant, the other one can be disregarded.

Moreover, Stefan Roese reports that on certain platforms, Link State and
Presence Detect may flap for up to 100 ms before stabilizing, suggesting
that such events should be disregarded for at least this long:
https://patchwork.ozlabs.org/patch/867418/

On slot enablement, pciehp_check_link_status() waits for 100 ms per
PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor
register for up to 1 second.

If this succeeds, the link is definitely up, so ignore any Presence
Detect or Link State events that occurred up to this point.

pciehp_check_link_status() then checks the Link Training bit in the
Link Status register.  This is the final opportunity to detect
inaccessibility of the device and abort slot enablement.  Any link
or presence change that occurs afterwards will cause the slot to be
disabled again immediately after attempting to enable it.

The astute reviewer may appreciate that achieving this behavior would be
more complicated had pciehp not just been converted to enable/disable
the slot exclusively from the IRQ thread:  When the slot is enabled via
sysfs, each link or presence flap would otherwise cause the IRQ thread
to run and it would have to sense that those events are belonging to a
concurrent slot enablement operation and disregard them.  It would be
much more difficult than this mere 3 line change.

Cc: Stefan Roese <sr@denx.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_hpc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index c795d32dbb73..adc6a89a3b5d 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -270,6 +270,11 @@ int pciehp_check_link_status(struct controller *ctrl)
 	found = pci_bus_check_dev(ctrl->pcie->port->subordinate,
 					PCI_DEVFN(0, 0));
 
+	/* ignore link or presence changes up to this point */
+	if (found)
+		atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
+			   &ctrl->pending_events);
+
 	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
 	ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status);
 	if ((lnk_status & PCI_EXP_LNKSTA_LT) ||
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 08/32] genirq: Synchronize only with single thread on free_irq()
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
  2018-06-16 19:25 ` [PATCH 04/32] PCI: pciehp: Fix unprotected list iteration in IRQ handler Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-07-12 22:21   ` Bjorn Helgaas
  2018-06-16 19:25 ` [PATCH 31/32] PCI: Whitelist native hotplug ports for runtime D3 Lukas Wunner
                   ` (32 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Thomas Gleixner

When pciehp is converted to threaded IRQ handling, removal of unplugged
devices below a PCIe hotplug port happens synchronously in the IRQ
thread.  Removal of devices typically entails a call to free_irq() by
their drivers.

If those devices share their IRQ with the hotplug port, free_irq()
deadlocks because it calls synchronize_irq() to wait for all hard IRQ
handlers as well as all threads sharing the IRQ to finish.

Actually it's sufficient to wait only for the IRQ thread of the removed
device, so call synchronize_hardirq() to wait for all hard IRQ handlers
to finish, but no longer for any threads.  Compensate by rearranging the
control flow in irq_wait_for_interrupt() such that the device's thread
is allowed to run one last time after kthread_stop() has been called.

Stack trace for posterity:
    INFO: task irq/17-pciehp:94 blocked for more than 120 seconds.
    schedule+0x28/0x80
    synchronize_irq+0x6e/0xa0
    __free_irq+0x15a/0x2b0
    free_irq+0x33/0x70
    pciehp_release_ctrl+0x98/0xb0
    pcie_port_remove_service+0x2f/0x40
    device_release_driver_internal+0x157/0x220
    bus_remove_device+0xe2/0x150
    device_del+0x124/0x340
    device_unregister+0x16/0x60
    remove_iter+0x1a/0x20
    device_for_each_child+0x4b/0x90
    pcie_port_device_remove+0x1e/0x30
    pci_device_remove+0x36/0xb0
    device_release_driver_internal+0x157/0x220
    pci_stop_bus_device+0x7d/0xa0
    pci_stop_bus_device+0x3d/0xa0
    pci_stop_and_remove_bus_device+0xe/0x20
    pciehp_unconfigure_device+0xb8/0x160
    pciehp_disable_slot+0x84/0x130
    pciehp_ist+0x158/0x190
    irq_thread_fn+0x1b/0x50
    irq_thread+0x143/0x1a0
    kthread+0x111/0x130

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 kernel/irq/manage.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index daeabd791d58..0531f645bfea 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -790,9 +790,19 @@ static irqreturn_t irq_forced_secondary_handler(int irq, void *dev_id)
 
 static int irq_wait_for_interrupt(struct irqaction *action)
 {
-	set_current_state(TASK_INTERRUPTIBLE);
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
 
-	while (!kthread_should_stop()) {
+		if (kthread_should_stop()) {
+			/* may need to run one last time. */
+			if (test_and_clear_bit(IRQTF_RUNTHREAD,
+					       &action->thread_flags)) {
+				__set_current_state(TASK_RUNNING);
+				return 0;
+			}
+			__set_current_state(TASK_RUNNING);
+			return -1;
+		}
 
 		if (test_and_clear_bit(IRQTF_RUNTHREAD,
 				       &action->thread_flags)) {
@@ -800,10 +810,7 @@ static int irq_wait_for_interrupt(struct irqaction *action)
 			return 0;
 		}
 		schedule();
-		set_current_state(TASK_INTERRUPTIBLE);
 	}
-	__set_current_state(TASK_RUNNING);
-	return -1;
 }
 
 /*
@@ -1024,7 +1031,7 @@ static int irq_thread(void *data)
 	/*
 	 * This is the regular exit path. __free_irq() is stopping the
 	 * thread via kthread_stop() after calling
-	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
+	 * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
 	 * oneshot mask bit can be set. We cannot verify that as we
 	 * cannot touch the oneshot mask at this point anymore as
 	 * __setup_irq() might have given out currents thread_mask
@@ -1629,7 +1636,7 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
 	unregister_handler_proc(irq, action);
 
 	/* Make sure it's not being used on another CPU: */
-	synchronize_irq(irq);
+	synchronize_hardirq(irq);
 
 #ifdef CONFIG_DEBUG_SHIRQ
 	/*
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 18/32] PCI: pciehp: Drop enable/disable lock
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (27 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core Lukas Wunner
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Previously slot enablement and disablement could happen concurrently.
But now it's under the exclusive control of the IRQ thread, rendering
the locking obsolete.  Drop it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      |  3 ---
 drivers/pci/hotplug/pciehp_ctrl.c | 11 -----------
 drivers/pci/hotplug/pciehp_hpc.c  |  1 -
 3 files changed, 15 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index ed42dde5f9ac..5417180c3aff 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -67,8 +67,6 @@ do {									\
  *	an Attention Button press
  * @lock: protects reads and writes of @state;
  *	protects scheduling, execution and cancellation of @work
- * @hotplug_lock: serializes calls to pciehp_enable_slot() and
- *	pciehp_disable_slot()
  */
 struct slot {
 	u8 state;
@@ -76,7 +74,6 @@ struct slot {
 	struct hotplug_slot *hotplug_slot;
 	struct delayed_work work;
 	struct mutex lock;
-	struct mutex hotplug_lock;
 };
 
 /**
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 70bad847a450..8ba937599fb3 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -277,9 +277,6 @@ void pciehp_handle_presence_change(struct slot *slot)
 	}
 }
 
-/*
- * Note: This function must be called with slot->hotplug_lock held
- */
 static int __pciehp_enable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
@@ -316,10 +313,7 @@ int pciehp_enable_slot(struct slot *slot)
 	struct controller *ctrl = slot->ctrl;
 	int ret;
 
-	mutex_lock(&slot->hotplug_lock);
 	ret = __pciehp_enable_slot(slot);
-	mutex_unlock(&slot->hotplug_lock);
-
 	if (ret && ATTN_BUTTN(ctrl))
 		pciehp_green_led_off(slot); /* may be blinking */
 
@@ -330,9 +324,6 @@ int pciehp_enable_slot(struct slot *slot)
 	return ret;
 }
 
-/*
- * Note: This function must be called with slot->hotplug_lock held
- */
 static int __pciehp_disable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
@@ -355,9 +346,7 @@ int pciehp_disable_slot(struct slot *slot)
 {
 	int ret;
 
-	mutex_lock(&slot->hotplug_lock);
 	ret = __pciehp_disable_slot(slot);
-	mutex_unlock(&slot->hotplug_lock);
 
 	mutex_lock(&slot->lock);
 	slot->state = OFF_STATE;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 9aea0bdbb019..c795d32dbb73 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -781,7 +781,6 @@ static int pcie_init_slot(struct controller *ctrl)
 
 	slot->ctrl = ctrl;
 	mutex_init(&slot->lock);
-	mutex_init(&slot->hotplug_lock);
 	INIT_DELAYED_WORK(&slot->work, pciehp_queue_pushbutton_work);
 	ctrl->slot = slot;
 	return 0;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 09/32] PCI: pciehp: Convert to threaded IRQ
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (21 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 17/32] PCI: pciehp: Enable/disable exclusively from IRQ thread Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-19 23:16   ` Keith Busch
  2018-06-16 19:25 ` [PATCH 07/32] PCI: pciehp: Document struct slot and struct controller Lukas Wunner
                   ` (11 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Thomas Gleixner,
	Mayurkumar Patel, Kenji Kaneshige

pciehp's IRQ handler queues up a work item for each event signaled by
the hardware.  A more modern alternative is to let a long running
kthread service the events.  The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.

One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments.  Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items.  pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years.  (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)

Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.

By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.

pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred.  If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions.  We're about to make pciehp_ist() resilient
to missed events.  The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h     |  3 ++
 drivers/pci/hotplug/pciehp_hpc.c | 70 +++++++++++++++++---------------
 2 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index c3d63e5b650f..ab1d97a1822d 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -112,6 +112,8 @@ struct event_info {
  * @notification_enabled: whether the IRQ was requested successfully
  * @power_fault_detected: whether a power fault was detected by the hardware
  *	that has not yet been cleared by the user
+ * @pending_events: used by the IRQ handler to save events retrieved from the
+ *	Slot Status register for later consumption by the IRQ thread
  */
 struct controller {
 	struct mutex ctrl_lock;
@@ -126,6 +128,7 @@ struct controller {
 	unsigned int link_active_reporting:1;
 	unsigned int notification_enabled:1;
 	unsigned int power_fault_detected;
+	atomic_t pending_events;
 };
 
 #define INT_PRESENCE_ON			1
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 044087e2683d..5f41ba788cb2 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -31,7 +31,8 @@ static inline struct pci_dev *ctrl_dev(struct controller *ctrl)
 	return ctrl->pcie->port;
 }
 
-static irqreturn_t pcie_isr(int irq, void *dev_id);
+static irqreturn_t pciehp_isr(int irq, void *dev_id);
+static irqreturn_t pciehp_ist(int irq, void *dev_id);
 static void start_int_poll_timer(struct controller *ctrl, int sec);
 
 /* This is the interrupt polling timeout function. */
@@ -40,7 +41,8 @@ static void int_poll_timeout(struct timer_list *t)
 	struct controller *ctrl = from_timer(ctrl, t, poll_timer);
 
 	/* Poll for interrupt events.  regs == NULL => polling */
-	pcie_isr(0, ctrl);
+	while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD)
+		pciehp_ist(IRQ_NOTCONNECTED, ctrl);
 
 	if (!pciehp_poll_time)
 		pciehp_poll_time = 2; /* default polling interval is 2 sec */
@@ -71,7 +73,8 @@ static inline int pciehp_request_irq(struct controller *ctrl)
 	}
 
 	/* Installs the interrupt handler */
-	retval = request_irq(irq, pcie_isr, IRQF_SHARED, MY_NAME, ctrl);
+	retval = request_threaded_irq(irq, pciehp_isr, pciehp_ist,
+				      IRQF_SHARED, MY_NAME, ctrl);
 	if (retval)
 		ctrl_err(ctrl, "Cannot get irq %d for the hotplug controller\n",
 			 irq);
@@ -539,12 +542,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 {
 	struct controller *ctrl = (struct controller *)dev_id;
 	struct pci_dev *pdev = ctrl_dev(ctrl);
-	struct slot *slot = ctrl->slot;
 	u16 status, events;
-	u8 present;
-	bool link;
 
-	/* Interrupts cannot originate from a controller that's asleep */
+	/*
+	 * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
+	 */
 	if (pdev->current_state == PCI_D3cold)
 		return IRQ_NONE;
 
@@ -572,18 +574,22 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	if (!events)
 		return IRQ_NONE;
 
-	/* Capture link status before clearing interrupts */
-	if (events & PCI_EXP_SLTSTA_DLLSC)
-		link = pciehp_check_link_active(ctrl);
-
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
 	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
 
-	/* Check Command Complete Interrupt Pending */
+	/*
+	 * Command Completed notifications are not deferred to the
+	 * IRQ thread because it may be waiting for their arrival.
+	 */
 	if (events & PCI_EXP_SLTSTA_CC) {
 		ctrl->cmd_busy = 0;
 		smp_mb();
 		wake_up(&ctrl->queue);
+
+		if (events == PCI_EXP_SLTSTA_CC)
+			return IRQ_HANDLED;
+
+		events &= ~PCI_EXP_SLTSTA_CC;
 	}
 
 	if (pdev->ignore_hotplug) {
@@ -591,6 +597,24 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 		return IRQ_HANDLED;
 	}
 
+	/* Save pending events for consumption by IRQ thread. */
+	atomic_or(events, &ctrl->pending_events);
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t pciehp_ist(int irq, void *dev_id)
+{
+	struct controller *ctrl = (struct controller *)dev_id;
+	struct slot *slot = ctrl->slot;
+	u32 events;
+	u8 present;
+	bool link;
+
+	synchronize_hardirq(irq);
+	events = atomic_xchg(&ctrl->pending_events, 0);
+	if (!events)
+		return IRQ_NONE;
+
 	/* Check Attention Button Pressed */
 	if (events & PCI_EXP_SLTSTA_ABP) {
 		ctrl_info(ctrl, "Slot(%s): Attention button pressed\n",
@@ -605,12 +629,13 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	 * and cause the wrong event to queue.
 	 */
 	if (events & PCI_EXP_SLTSTA_DLLSC) {
+		link = pciehp_check_link_active(ctrl);
 		ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
 			  link ? "Up" : "Down");
 		pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
 					     INT_LINK_DOWN);
 	} else if (events & PCI_EXP_SLTSTA_PDC) {
-		present = !!(status & PCI_EXP_SLTSTA_PDS);
+		pciehp_get_adapter_status(slot, &present);
 		ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
 			  present ? "" : "not ");
 		pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
@@ -627,25 +652,6 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t pcie_isr(int irq, void *dev_id)
-{
-	irqreturn_t rc, handled = IRQ_NONE;
-
-	/*
-	 * To guarantee that all interrupt events are serviced, we need to
-	 * re-inspect Slot Status register after clearing what is presumed
-	 * to be the last pending interrupt.
-	 */
-	do {
-		rc = pciehp_isr(irq, dev_id);
-		if (rc == IRQ_HANDLED)
-			handled = IRQ_HANDLED;
-	} while (rc == IRQ_HANDLED);
-
-	/* Return IRQ_HANDLED if we handled one or more events */
-	return handled;
-}
-
 static void pcie_enable_notification(struct controller *ctrl)
 {
 	u16 cmd, mask;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 26/32] PCI: pciehp: Obey compulsory command delay after resume
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (6 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 01/32] PCI: hotplug: Don't leak pci_slot on registration failure Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 15/32] PCI: pciehp: Publish to user space last on probe Lukas Wunner
                   ` (26 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Upon resume from system sleep, the Slot Control register is written via:

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()
        pci_restore_pcie_state()

PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that
targets any portion of the Port's Slot Control register, [...] software
must wait for [the] command to complete before issuing the next command".

pciehp currently fails to enforce that rule after the above-mentioned
write.  Fix it.

(Moving restoration of the Slot Control register to pciehp doesn't seem
to make sense because the other PCIe hotplug drivers may need it as
well.)

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index a5bd3b055c0e..3e73b12ed656 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -284,6 +284,10 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
 	struct controller *ctrl = get_service_data(dev);
 	struct slot *slot = ctrl->slot;
 
+	/* pci_restore_state() just wrote to the Slot Control register */
+	ctrl->cmd_started = jiffies;
+	ctrl->cmd_busy = true;
+
 	/* clear spurious events from rediscovery of inserted card */
 	if (slot->state == ON_STATE || slot->state == BLINKINGOFF_STATE)
 		pcie_clear_hotplug_events(ctrl);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 06/32] PCI: pciehp: Declare pciehp_unconfigure_device() void
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (10 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3 Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 28/32] PCI: pciehp: Resume to D0 on enable/disable Lukas Wunner
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Since commit 0f4bd8014db5 ("PCI: hotplug: Drop checking of PCI_BRIDGE_
CONTROL in *_unconfigure_device()"), pciehp_unconfigure_device() can no
longer fail, so declare it and its sole caller remove_board() void, in
keeping with the usual kernel pattern that enablement can fail, but
disablement cannot.  No functional change intended.

Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      |  2 +-
 drivers/pci/hotplug/pciehp_ctrl.c | 11 ++++-------
 drivers/pci/hotplug/pciehp_pci.c  |  6 ++----
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index fca87a1a2b22..acdeb08cbcba 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -115,7 +115,7 @@ int pciehp_sysfs_enable_slot(struct slot *slot);
 int pciehp_sysfs_disable_slot(struct slot *slot);
 void pciehp_queue_interrupt_event(struct slot *slot, u32 event_type);
 int pciehp_configure_device(struct slot *p_slot);
-int pciehp_unconfigure_device(struct slot *p_slot);
+void pciehp_unconfigure_device(struct slot *p_slot);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
 struct controller *pcie_init(struct pcie_device *dev);
 int pcie_init_notification(struct controller *ctrl);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 4a4639b7a479..76a9b8c26d2b 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -119,14 +119,11 @@ static int board_added(struct slot *p_slot)
  * remove_board - Turns off slot and LEDs
  * @p_slot: slot where board is being removed
  */
-static int remove_board(struct slot *p_slot)
+static void remove_board(struct slot *p_slot)
 {
-	int retval;
 	struct controller *ctrl = p_slot->ctrl;
 
-	retval = pciehp_unconfigure_device(p_slot);
-	if (retval)
-		return retval;
+	pciehp_unconfigure_device(p_slot);
 
 	if (POWER_CTRL(ctrl)) {
 		pciehp_power_off_slot(p_slot);
@@ -141,7 +138,6 @@ static int remove_board(struct slot *p_slot)
 
 	/* turn off Green LED */
 	pciehp_green_led_off(p_slot);
-	return 0;
 }
 
 struct power_work_info {
@@ -421,7 +417,8 @@ int pciehp_disable_slot(struct slot *p_slot)
 		}
 	}
 
-	return remove_board(p_slot);
+	remove_board(p_slot);
+	return 0;
 }
 
 int pciehp_sysfs_enable_slot(struct slot *p_slot)
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index 242395389293..a3aa575c248a 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -74,9 +74,8 @@ int pciehp_configure_device(struct slot *p_slot)
 	return ret;
 }
 
-int pciehp_unconfigure_device(struct slot *p_slot)
+void pciehp_unconfigure_device(struct slot *p_slot)
 {
-	int rc = 0;
 	u8 presence = 0;
 	struct pci_dev *dev, *temp;
 	struct controller *ctrl = p_slot->ctrl;
@@ -87,7 +86,7 @@ int pciehp_unconfigure_device(struct slot *p_slot)
 	mutex_lock(&acquire_unless_disconnected);
 	if (pci_dev_is_disconnected(bridge)) {
 		mutex_unlock(&acquire_unless_disconnected);
-		return -ENODEV;
+		return;
 	}
 	pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
 	pci_lock_rescan_remove();
@@ -121,5 +120,4 @@ int pciehp_unconfigure_device(struct slot *p_slot)
 	}
 
 	pci_unlock_rescan_remove();
-	return rc;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (9 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 16/32] PCI: pciehp: Track enable/disable status Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-21 11:13   ` Mika Westerberg
  2018-06-16 19:25 ` [PATCH 06/32] PCI: pciehp: Declare pciehp_unconfigure_device() void Lukas Wunner
                   ` (23 subsequent siblings)
  34 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Andreas Noever

Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W.
This requires that runtime D3 is allowed on its PCIe ports, so whitelist
them.

The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports
is unnecessary on Thunderbolt because we know that even the oldest
controller, Light Ridge (2010), is able to suspend its ports to D3 just
fine -- specifically including its hotplug ports.  And the power saving
should be afforded to machines even if their BIOS predates 2015.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andreas Noever <andreas.noever@gmail.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4099a6c14b6d..c4d10726c59a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2290,7 +2290,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
  * @bridge: Bridge to check
  *
  * This function checks if it is possible to move the bridge to D3.
- * Currently we only allow D3 for recent enough PCIe ports.
+ * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt.
  */
 bool pci_bridge_d3_possible(struct pci_dev *bridge)
 {
@@ -2314,6 +2314,10 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 		if (pci_bridge_d3_force)
 			return true;
 
+		/* Even the oldest 2010 Thunderbolt controller supports D3. */
+		if (bridge->is_thunderbolt)
+			return true;
+
 		/*
 		 * Hotplug ports handled natively by the OS were not validated
 		 * by vendors for runtime D3 at least until 2018 because there
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 25/32] PCI: pciehp: Clear spurious events earlier on resume
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (17 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 21/32] PCI: pciehp: Become resilient to missed events Lukas Wunner
@ 2018-06-16 19:25 ` Lukas Wunner
  2018-06-16 19:25 ` [PATCH 02/32] PCI: pciehp: Fix UAF on unplug Lukas Wunner
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-16 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Thunderbolt hotplug ports that were occupied before system sleep resume
with their downstream link in "off" state.  Only after the Thunderbolt
controller has reestablished the PCIe tunnels does the link go up.
As a result, a spurious Presence Detect Changed and/or Data Link Layer
State Changed event occurs.

The events are not immediately acted upon because tunnel reestablishment
happens in the ->resume_noirq phase, when interrupts are still disabled.
Also, notification of events may initially be disabled in the Slot
Control register when coming out of system sleep and is reenabled in the
->resume_noirq phase through:

  pci_pm_resume_noirq()
    pci_pm_default_resume_early()
      pci_restore_state()
        pci_restore_pcie_state()

It is not guaranteed that the events are acted upon at all:  PCIe r4.0,
sec 6.7.3.4 says that "a port may optionally send an MSI when there are
hot-plug events that occur while interrupt generation is disabled, and
interrupt generation is subsequently enabled."  Note the "optionally".

If an MSI is sent, pciehp will gratuitously turn the slot off and back
on once the ->resume_early phase has commenced.

If an MSI is not sent, the extant, unacknowledged events in the Slot
Status register will prevent future notification of presence or link
changes.

Commit 13c65840feab ("PCI: pciehp: Clear Presence Detect and Data Link
Layer Status Changed on resume") fixed the latter by clearing the events
in the ->resume phase.  Move this to the ->resume_noirq phase to also
fix the gratuitous disable/enablement of the slot.

The commit further restored the Slot Control register in the ->resume
phase, but that's dispensable because as shown above it's already been
done in the ->resume_noirq phase.

Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp.h      |  2 +-
 drivers/pci/hotplug/pciehp_core.c | 17 +++++++++++++----
 drivers/pci/hotplug/pciehp_hpc.c  | 17 ++++++-----------
 drivers/pci/pcie/portdrv.h        |  2 ++
 drivers/pci/pcie/portdrv_core.c   |  6 ++++++
 drivers/pci/pcie/portdrv_pci.c    |  2 ++
 6 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index bb015be34894..247681963063 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -181,7 +181,7 @@ void pciehp_queue_pushbutton_work(struct work_struct *work);
 struct controller *pcie_init(struct pcie_device *dev);
 int pcie_init_notification(struct controller *ctrl);
 void pcie_shutdown_notification(struct controller *ctrl);
-void pcie_reenable_notification(struct controller *ctrl);
+void pcie_clear_hotplug_events(struct controller *ctrl);
 int pciehp_power_on_slot(struct slot *slot);
 void pciehp_power_off_slot(struct slot *slot);
 void pciehp_get_power_status(struct slot *slot, u8 *status);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 1be8871e7439..a5bd3b055c0e 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -279,6 +279,18 @@ static int pciehp_suspend(struct pcie_device *dev)
 	return 0;
 }
 
+static int pciehp_resume_noirq(struct pcie_device *dev)
+{
+	struct controller *ctrl = get_service_data(dev);
+	struct slot *slot = ctrl->slot;
+
+	/* clear spurious events from rediscovery of inserted card */
+	if (slot->state == ON_STATE || slot->state == BLINKINGOFF_STATE)
+		pcie_clear_hotplug_events(ctrl);
+
+	return 0;
+}
+
 static int pciehp_resume(struct pcie_device *dev)
 {
 	struct controller *ctrl;
@@ -286,10 +298,6 @@ static int pciehp_resume(struct pcie_device *dev)
 	u8 status;
 
 	ctrl = get_service_data(dev);
-
-	/* reinitialize the chipset's event detection logic */
-	pcie_reenable_notification(ctrl);
-
 	slot = ctrl->slot;
 
 	/* Check if slot is occupied */
@@ -316,6 +324,7 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
 
 #ifdef	CONFIG_PM
 	.suspend	= pciehp_suspend,
+	.resume_noirq	= pciehp_resume_noirq,
 	.resume		= pciehp_resume,
 #endif	/* PM */
 };
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index a89ae65d3c7c..165b940de783 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -681,17 +681,6 @@ static void pcie_enable_notification(struct controller *ctrl)
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
 }
 
-void pcie_reenable_notification(struct controller *ctrl)
-{
-	/*
-	 * Clear both Presence and Data Link Layer Changed to make sure
-	 * those events still fire after we have re-enabled them.
-	 */
-	pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_SLTSTA,
-				   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
-	pcie_enable_notification(ctrl);
-}
-
 static void pcie_disable_notification(struct controller *ctrl)
 {
 	u16 mask;
@@ -705,6 +694,12 @@ static void pcie_disable_notification(struct controller *ctrl)
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
 }
 
+void pcie_clear_hotplug_events(struct controller *ctrl)
+{
+	pcie_capability_write_word(ctrl_dev(ctrl), PCI_EXP_SLTSTA,
+				   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
+}
+
 /*
  * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
  * bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6ffc797a0dc1..d59afa42fc14 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -50,6 +50,7 @@ struct pcie_port_service_driver {
 	int (*probe) (struct pcie_device *dev);
 	void (*remove) (struct pcie_device *dev);
 	int (*suspend) (struct pcie_device *dev);
+	int (*resume_noirq) (struct pcie_device *dev);
 	int (*resume) (struct pcie_device *dev);
 
 	/* Device driver may resume normal operations */
@@ -82,6 +83,7 @@ extern struct bus_type pcie_port_bus_type;
 int pcie_port_device_register(struct pci_dev *dev);
 #ifdef CONFIG_PM
 int pcie_port_device_suspend(struct device *dev);
+int pcie_port_device_resume_noirq(struct device *dev);
 int pcie_port_device_resume(struct device *dev);
 #endif
 void pcie_port_device_remove(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 13a248575a14..7c37d815229e 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -380,6 +380,12 @@ int pcie_port_device_suspend(struct device *dev)
 	return device_for_each_child(dev, &off, pm_iter);
 }
 
+int pcie_port_device_resume_noirq(struct device *dev)
+{
+	size_t off = offsetof(struct pcie_port_service_driver, resume_noirq);
+	return device_for_each_child(dev, &off, pm_iter);
+}
+
 /**
  * pcie_port_device_resume - resume port services associated with a PCIe port
  * @dev: PCI Express port to handle
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 973f1b80a038..5e0f02c68faa 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -76,10 +76,12 @@ static int pcie_port_runtime_idle(struct device *dev)
 
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.suspend	= pcie_port_device_suspend,
+	.resume_noirq	= pcie_port_device_resume_noirq,
 	.resume		= pcie_port_device_resume,
 	.freeze		= pcie_port_device_suspend,
 	.thaw		= pcie_port_device_resume,
 	.poweroff	= pcie_port_device_suspend,
+	.restore_noirq	= pcie_port_device_resume_noirq,
 	.restore	= pcie_port_device_resume,
 	.runtime_suspend = pcie_port_runtime_suspend,
 	.runtime_resume	= pcie_port_runtime_resume,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core
  2018-06-16 19:25 ` [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core Lukas Wunner
@ 2018-06-17 16:44   ` Andy Shevchenko
  2018-07-16 12:46     ` Bjorn Helgaas
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Shevchenko @ 2018-06-17 16:44 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Rafael J. Wysocki, ashok.raj,
	Keith Busch, Yinghai Lu, okaya, linux-pci, Len Brown, scott,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, gwshan,
	Sebastian Ott, gerald.schaefer, Corentin Chary, Darren Hart,
	Andy Shevchenko, ACPI Devel Maling List, linux-s390,
	Platform Driver, acpi4asus-user

On Sat, Jun 16, 2018 at 10:37 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> When a hotplug driver calls pci_hp_register(), all steps necessary for
> registration are carried out in one go, including creation of a kobject
> and addition to sysfs.  That's a problem for pciehp once it's converted
> to enable/disable the slot exclusively from the IRQ thread:  The thread
> needs to be spawned after creation of the kobject (because it uses the
> kobject's name), but before addition to sysfs (because it will handle
> enable/disable requests submitted via sysfs).
>
> pci_hp_deregister() does offer a ->release callback that's invoked
> after deletion from sysfs and before destruction of the kobject.  But
> because pci_hp_register() doesn't offer a counterpart, hotplug drivers'
> ->probe and ->remove code becomes asymmetric, which is error prone
> as recently discovered use-after-free bugs in pciehp's ->remove hook
> have shown.
>
> In a sense, this appears to be a case of the midlayer antipattern:
>
>    "The core thesis of the "midlayer mistake" is that midlayers are
>     bad and should not exist.  That common functionality which it is
>     so tempting to put in a midlayer should instead be provided as
>     library routines which can [be] used, augmented, or ignored by
>     each bottom level driver independently.  Thus every subsystem
>     that supports multiple implementations (or drivers) should
>     provide a very thin top layer which calls directly into the
>     bottom layer drivers, and a rich library of support code that
>     eases the implementation of those drivers.  This library is
>     available to, but not forced upon, those drivers."
>         --  Neil Brown (2009), https://lwn.net/Articles/336262/
>
> The presence of midlayer traits in the PCI hotplug core might be ascribed
> to its age:  When it was introduced in February 2002, the blessings of a
> library approach might not have been well known:
> https://git.kernel.org/tglx/history/c/a8a2069f432c
>
> For comparison, the driver core does offer split functions for creating
> a kobject (device_initialize()) and addition to sysfs (device_add()) as
> an alternative to carrying out everything at once (device_register()).
> This was introduced in October 2002:
> https://git.kernel.org/tglx/history/c/8b290eb19962
>
> The odd ->release callback in the PCI hotplug core was added in 2003:
> https://git.kernel.org/tglx/history/c/69f8d663b595
>
> Clearly, a library approach would not force every hotplug driver to
> implement a ->release callback, but rather allow the driver to remove
> the sysfs files, release its data structures and finally destroy the
> kobject.  Alternatively, a driver may choose to remove everything with
> pci_hp_deregister(), then release its data structures.
>
> To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a
> split-up version of pci_hp_register().  Likewise, offer pci_hp_del()
> and pci_hp_destroy() as a split-up version of pci_hp_deregister().
>
> Eliminate the ->release callback and move its code into each driver's
> teardown routine.
>
> Declare pci_hp_deregister() void, in keeping with the usual kernel
> pattern that enablement can fail, but disablement cannot.  It only
> returned an error if the caller passed in a NULL pointer or a slot which
> has never or is no longer registered or is sharing its name with another
> slot.  Those would be bugs, so WARN about them.  Few hotplug drivers
> actually checked the return value and those that did only printed a
> useless error message to dmesg.  Remove that.
>
> For most drivers the conversion was straightforward since it doesn't
> matter whether the code in the ->release callback is executed before or
> after destruction of the kobject.  But in the case of ibmphp, it was
> unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to
> NULL needs to happen before the kobject is destroyed, so I erred on
> the side of caution and ensured that the order stays the same.  Another
> nontrivial case is pnv_php, I've found the list and kref logic difficult
> to understand, however my impression was that it is safe to delete the
> list element and drop the references until after the kobject is
> destroyed.
>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Scott Murray <scott@spiteful.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Cc: Corentin Chary <corentin.chary@gmail.com>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Andy Shevchenko <andy@infradead.org>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/hotplug/acpiphp_core.c      |  22 +---
>  drivers/pci/hotplug/cpci_hotplug_core.c |  14 +--
>  drivers/pci/hotplug/cpqphp_core.c       |  16 +--
>  drivers/pci/hotplug/ibmphp_core.c       |  15 ++-
>  drivers/pci/hotplug/ibmphp_ebda.c       |  20 ----
>  drivers/pci/hotplug/pci_hotplug_core.c  | 139 +++++++++++++++++-------
>  drivers/pci/hotplug/pciehp_core.c       |  19 +---
>  drivers/pci/hotplug/pcihp_skeleton.c    |  16 +--
>  drivers/pci/hotplug/pnv_php.c           |   5 +-
>  drivers/pci/hotplug/rpaphp_core.c       |   2 +-
>  drivers/pci/hotplug/rpaphp_slot.c       |  13 +--
>  drivers/pci/hotplug/s390_pci_hpc.c      |  13 +--
>  drivers/pci/hotplug/sgi_hotplug.c       |   9 +-
>  drivers/pci/hotplug/shpchp_core.c       |  20 +---

>  drivers/platform/x86/asus-wmi.c         |  12 +-
>  drivers/platform/x86/eeepc-laptop.c     |  12 +-

So, as far as I can see these are just mechanical changes due to
internal API change.
Thus,

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>

for PDx86 bits only.

>  include/linux/pci_hotplug.h             |  15 ++-
>  17 files changed, 171 insertions(+), 191 deletions(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_core.c b/drivers/pci/hotplug/acpiphp_core.c
> index 12b5655fd390..ad32ffbc4b91 100644
> --- a/drivers/pci/hotplug/acpiphp_core.c
> +++ b/drivers/pci/hotplug/acpiphp_core.c
> @@ -254,20 +254,6 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
>         return 0;
>  }
>
> -/**
> - * release_slot - free up the memory used by a slot
> - * @hotplug_slot: slot to free
> - */
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       struct slot *slot = hotplug_slot->private;
> -
> -       pr_debug("%s - physical_slot = %s\n", __func__, slot_name(slot));
> -
> -       kfree(slot->hotplug_slot);
> -       kfree(slot);
> -}
> -
>  /* callback routine to initialize 'struct slot' for each slot */
>  int acpiphp_register_hotplug_slot(struct acpiphp_slot *acpiphp_slot,
>                                   unsigned int sun)
> @@ -287,7 +273,6 @@ int acpiphp_register_hotplug_slot(struct acpiphp_slot *acpiphp_slot,
>         slot->hotplug_slot->info = &slot->info;
>
>         slot->hotplug_slot->private = slot;
> -       slot->hotplug_slot->release = &release_slot;
>         slot->hotplug_slot->ops = &acpi_hotplug_slot_ops;
>
>         slot->acpi_slot = acpiphp_slot;
> @@ -324,13 +309,12 @@ int acpiphp_register_hotplug_slot(struct acpiphp_slot *acpiphp_slot,
>  void acpiphp_unregister_hotplug_slot(struct acpiphp_slot *acpiphp_slot)
>  {
>         struct slot *slot = acpiphp_slot->slot;
> -       int retval = 0;
>
>         pr_info("Slot [%s] unregistered\n", slot_name(slot));
>
> -       retval = pci_hp_deregister(slot->hotplug_slot);
> -       if (retval)
> -               pr_err("pci_hp_deregister failed with error %d\n", retval);
> +       pci_hp_deregister(slot->hotplug_slot);
> +       kfree(slot->hotplug_slot);
> +       kfree(slot);
>  }
>
>
> diff --git a/drivers/pci/hotplug/cpci_hotplug_core.c b/drivers/pci/hotplug/cpci_hotplug_core.c
> index 07b533adc9df..52a339baf06c 100644
> --- a/drivers/pci/hotplug/cpci_hotplug_core.c
> +++ b/drivers/pci/hotplug/cpci_hotplug_core.c
> @@ -195,10 +195,8 @@ get_latch_status(struct hotplug_slot *hotplug_slot, u8 *value)
>         return 0;
>  }
>
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> +static void release_slot(struct slot *slot)
>  {
> -       struct slot *slot = hotplug_slot->private;
> -
>         kfree(slot->hotplug_slot->info);
>         kfree(slot->hotplug_slot);
>         pci_dev_put(slot->dev);
> @@ -253,7 +251,6 @@ cpci_hp_register_bus(struct pci_bus *bus, u8 first, u8 last)
>                 snprintf(name, SLOT_NAME_SIZE, "%02x:%02x", bus->number, i);
>
>                 hotplug_slot->private = slot;
> -               hotplug_slot->release = &release_slot;
>                 hotplug_slot->ops = &cpci_hotplug_slot_ops;
>
>                 /*
> @@ -308,12 +305,8 @@ cpci_hp_unregister_bus(struct pci_bus *bus)
>                         slots--;
>
>                         dbg("deregistering slot %s", slot_name(slot));
> -                       status = pci_hp_deregister(slot->hotplug_slot);
> -                       if (status) {
> -                               err("pci_hp_deregister failed with error %d",
> -                                   status);
> -                               break;
> -                       }
> +                       pci_hp_deregister(slot->hotplug_slot);
> +                       release_slot(slot);
>                 }
>         }
>         up_write(&list_rwsem);
> @@ -623,6 +616,7 @@ cleanup_slots(void)
>         list_for_each_entry_safe(slot, tmp, &slot_list, slot_list) {
>                 list_del(&slot->slot_list);
>                 pci_hp_deregister(slot->hotplug_slot);
> +               release_slot(slot);
>         }
>  cleanup_null:
>         up_write(&list_rwsem);
> diff --git a/drivers/pci/hotplug/cpqphp_core.c b/drivers/pci/hotplug/cpqphp_core.c
> index 1797e36ec586..5a06636e910a 100644
> --- a/drivers/pci/hotplug/cpqphp_core.c
> +++ b/drivers/pci/hotplug/cpqphp_core.c
> @@ -266,17 +266,6 @@ static void __iomem *get_SMBIOS_entry(void __iomem *smbios_start,
>         return previous;
>  }
>
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       struct slot *slot = hotplug_slot->private;
> -
> -       dbg("%s - physical_slot = %s\n", __func__, slot_name(slot));
> -
> -       kfree(slot->hotplug_slot->info);
> -       kfree(slot->hotplug_slot);
> -       kfree(slot);
> -}
> -
>  static int ctrl_slot_cleanup(struct controller *ctrl)
>  {
>         struct slot *old_slot, *next_slot;
> @@ -285,9 +274,11 @@ static int ctrl_slot_cleanup(struct controller *ctrl)
>         ctrl->slot = NULL;
>
>         while (old_slot) {
> -               /* memory will be freed by the release_slot callback */
>                 next_slot = old_slot->next;
>                 pci_hp_deregister(old_slot->hotplug_slot);
> +               kfree(old_slot->hotplug_slot->info);
> +               kfree(old_slot->hotplug_slot);
> +               kfree(old_slot);
>                 old_slot = next_slot;
>         }
>
> @@ -678,7 +669,6 @@ static int ctrl_slot_setup(struct controller *ctrl,
>                         ((read_slot_enable(ctrl) << 2) >> ctrl_slot) & 0x04;
>
>                 /* register this slot with the hotplug pci core */
> -               hotplug_slot->release = &release_slot;
>                 hotplug_slot->private = slot;
>                 snprintf(name, SLOT_NAME_SIZE, "%u", slot->number);
>                 hotplug_slot->ops = &cpqphp_hotplug_slot_ops;
> diff --git a/drivers/pci/hotplug/ibmphp_core.c b/drivers/pci/hotplug/ibmphp_core.c
> index 1869b0411ce0..4ea57e9019f1 100644
> --- a/drivers/pci/hotplug/ibmphp_core.c
> +++ b/drivers/pci/hotplug/ibmphp_core.c
> @@ -673,7 +673,20 @@ static void free_slots(void)
>
>         list_for_each_entry_safe(slot_cur, next, &ibmphp_slot_head,
>                                  ibm_slot_list) {
> -               pci_hp_deregister(slot_cur->hotplug_slot);
> +               pci_hp_del(slot_cur->hotplug_slot);
> +               slot_cur->ctrl = NULL;
> +               slot_cur->bus_on = NULL;
> +
> +               /*
> +                * We don't want to actually remove the resources,
> +                * since ibmphp_free_resources() will do just that.
> +                */
> +               ibmphp_unconfigure_card(&slot_cur, -1);
> +
> +               pci_hp_destroy(slot_cur->hotplug_slot);
> +               kfree(slot_cur->hotplug_slot->info);
> +               kfree(slot_cur->hotplug_slot);
> +               kfree(slot_cur);
>         }
>         debug("%s -- exit\n", __func__);
>  }
> diff --git a/drivers/pci/hotplug/ibmphp_ebda.c b/drivers/pci/hotplug/ibmphp_ebda.c
> index 64549aa24c0f..6f8e90e3ec08 100644
> --- a/drivers/pci/hotplug/ibmphp_ebda.c
> +++ b/drivers/pci/hotplug/ibmphp_ebda.c
> @@ -699,25 +699,6 @@ static int fillslotinfo(struct hotplug_slot *hotplug_slot)
>         return rc;
>  }
>
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       struct slot *slot;
> -
> -       if (!hotplug_slot || !hotplug_slot->private)
> -               return;
> -
> -       slot = hotplug_slot->private;
> -       kfree(slot->hotplug_slot->info);
> -       kfree(slot->hotplug_slot);
> -       slot->ctrl = NULL;
> -       slot->bus_on = NULL;
> -
> -       /* we don't want to actually remove the resources, since free_resources will do just that */
> -       ibmphp_unconfigure_card(&slot, -1);
> -
> -       kfree(slot);
> -}
> -
>  static struct pci_driver ibmphp_driver;
>
>  /*
> @@ -941,7 +922,6 @@ static int __init ebda_rsrc_controller(void)
>                         tmp_slot->hotplug_slot = hp_slot_ptr;
>
>                         hp_slot_ptr->private = tmp_slot;
> -                       hp_slot_ptr->release = release_slot;
>
>                         rc = fillslotinfo(hp_slot_ptr);
>                         if (rc)
> diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c
> index fd93783a87b0..90fde5f106d8 100644
> --- a/drivers/pci/hotplug/pci_hotplug_core.c
> +++ b/drivers/pci/hotplug/pci_hotplug_core.c
> @@ -396,8 +396,9 @@ static struct hotplug_slot *get_slot_from_name(const char *name)
>   * @owner: caller module owner
>   * @mod_name: caller module name
>   *
> - * Registers a hotplug slot with the pci hotplug subsystem, which will allow
> - * userspace interaction to the slot.
> + * Prepares a hotplug slot for in-kernel use and immediately publishes it to
> + * user space in one go.  Drivers may alternatively carry out the two steps
> + * separately by invoking pci_hp_initialize() and pci_hp_add().
>   *
>   * Returns 0 if successful, anything else for an error.
>   */
> @@ -406,54 +407,91 @@ int __pci_hp_register(struct hotplug_slot *slot, struct pci_bus *bus,
>                       struct module *owner, const char *mod_name)
>  {
>         int result;
> +
> +       result = __pci_hp_initialize(slot, bus, devnr, name, owner, mod_name);
> +       if (result)
> +               return result;
> +
> +       result = pci_hp_add(slot);
> +       if (result)
> +               pci_hp_destroy(slot);
> +
> +       return result;
> +}
> +EXPORT_SYMBOL_GPL(__pci_hp_register);
> +
> +/**
> + * __pci_hp_initialize - prepare hotplug slot for in-kernel use
> + * @slot: pointer to the &struct hotplug_slot to initialize
> + * @bus: bus this slot is on
> + * @devnr: slot number
> + * @name: name registered with kobject core
> + * @owner: caller module owner
> + * @mod_name: caller module name
> + *
> + * Allocate and fill in a PCI slot for use by a hotplug driver.  Once this has
> + * been called, the driver may invoke hotplug_slot_name() to get the slot's
> + * unique name.  The driver must be prepared to handle a ->reset_slot callback
> + * from this point on.
> + *
> + * Returns 0 on success or a negative int on error.
> + */
> +int __pci_hp_initialize(struct hotplug_slot *slot, struct pci_bus *bus,
> +                       int devnr, const char *name, struct module *owner,
> +                       const char *mod_name)
> +{
>         struct pci_slot *pci_slot;
>
>         if (slot == NULL)
>                 return -ENODEV;
>         if ((slot->info == NULL) || (slot->ops == NULL))
>                 return -EINVAL;
> -       if (slot->release == NULL) {
> -               dbg("Why are you trying to register a hotplug slot without a proper release function?\n");
> -               return -EINVAL;
> -       }
>
>         slot->ops->owner = owner;
>         slot->ops->mod_name = mod_name;
>
> -       mutex_lock(&pci_hp_mutex);
>         /*
>          * No problems if we call this interface from both ACPI_PCI_SLOT
>          * driver and call it here again. If we've already created the
>          * pci_slot, the interface will simply bump the refcount.
>          */
>         pci_slot = pci_create_slot(bus, devnr, name, slot);
> -       if (IS_ERR(pci_slot)) {
> -               result = PTR_ERR(pci_slot);
> -               goto out;
> -       }
> +       if (IS_ERR(pci_slot))
> +               return PTR_ERR(pci_slot);
>
>         slot->pci_slot = pci_slot;
>         pci_slot->hotplug = slot;
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(__pci_hp_initialize);
>
> -       list_add(&slot->slot_list, &pci_hotplug_slot_list);
> +/**
> + * pci_hp_add - publish hotplug slot to user space
> + * @slot: pointer to the &struct hotplug_slot to publish
> + *
> + * Make a hotplug slot's sysfs interface available and inform user space of its
> + * addition by sending a uevent.  The hotplug driver must be prepared to handle
> + * all &struct hotplug_slot_ops callbacks from this point on.
> + *
> + * Returns 0 on success or a negative int on error.
> + */
> +int pci_hp_add(struct hotplug_slot *slot)
> +{
> +       struct pci_slot *pci_slot = slot->pci_slot;
> +       int result;
>
>         result = fs_add_slot(pci_slot);
>         if (result)
> -               goto err_list_del;
> +               return result;
>
>         kobject_uevent(&pci_slot->kobj, KOBJ_ADD);
> -       dbg("Added slot %s to the list\n", name);
> -       goto out;
> -
> -err_list_del:
> -       list_del(&slot->slot_list);
> -       pci_slot->hotplug = NULL;
> -       pci_destroy_slot(pci_slot);
> -out:
> +       mutex_lock(&pci_hp_mutex);
> +       list_add(&slot->slot_list, &pci_hotplug_slot_list);
>         mutex_unlock(&pci_hp_mutex);
> -       return result;
> +       dbg("Added slot %s to the list\n", hotplug_slot_name(slot));
> +       return 0;
>  }
> -EXPORT_SYMBOL_GPL(__pci_hp_register);
> +EXPORT_SYMBOL_GPL(pci_hp_add);
>
>  /**
>   * pci_hp_deregister - deregister a hotplug_slot with the PCI hotplug subsystem
> @@ -464,35 +502,62 @@ EXPORT_SYMBOL_GPL(__pci_hp_register);
>   *
>   * Returns 0 if successful, anything else for an error.
>   */
> -int pci_hp_deregister(struct hotplug_slot *slot)
> +void pci_hp_deregister(struct hotplug_slot *slot)
> +{
> +       pci_hp_del(slot);
> +       pci_hp_destroy(slot);
> +}
> +EXPORT_SYMBOL_GPL(pci_hp_deregister);
> +
> +/**
> + * pci_hp_del - unpublish hotplug slot from user space
> + * @slot: pointer to the &struct hotplug_slot to unpublish
> + *
> + * Remove a hotplug slot's sysfs interface.
> + *
> + * Returns 0 on success or a negative int on error.
> + */
> +void pci_hp_del(struct hotplug_slot *slot)
>  {
>         struct hotplug_slot *temp;
> -       struct pci_slot *pci_slot;
>
> -       if (!slot)
> -               return -ENODEV;
> +       if (WARN_ON(!slot))
> +               return;
>
>         mutex_lock(&pci_hp_mutex);
>         temp = get_slot_from_name(hotplug_slot_name(slot));
> -       if (temp != slot) {
> +       if (WARN_ON(temp != slot)) {
>                 mutex_unlock(&pci_hp_mutex);
> -               return -ENODEV;
> +               return;
>         }
>
>         list_del(&slot->slot_list);
> -
> -       pci_slot = slot->pci_slot;
> -       fs_remove_slot(pci_slot);
> +       mutex_unlock(&pci_hp_mutex);
>         dbg("Removed slot %s from the list\n", hotplug_slot_name(slot));
> +       fs_remove_slot(slot->pci_slot);
> +}
> +EXPORT_SYMBOL_GPL(pci_hp_del);
> +
> +/**
> + * pci_hp_destroy - remove hotplug slot from in-kernel use
> + * @slot: pointer to the &struct hotplug_slot to destroy
> + *
> + * Destroy a PCI slot used by a hotplug driver.  Once this has been called,
> + * the driver may no longer invoke hotplug_slot_name() to get the slot's
> + * unique name.  The driver no longer needs to handle a ->reset_slot callback
> + * from this point on.
> + *
> + * Returns 0 on success or a negative int on error.
> + */
> +void pci_hp_destroy(struct hotplug_slot *slot)
> +{
> +       struct pci_slot *pci_slot = slot->pci_slot;
>
> -       slot->release(slot);
> +       slot->pci_slot = NULL;
>         pci_slot->hotplug = NULL;
>         pci_destroy_slot(pci_slot);
> -       mutex_unlock(&pci_hp_mutex);
> -
> -       return 0;
>  }
> -EXPORT_SYMBOL_GPL(pci_hp_deregister);
> +EXPORT_SYMBOL_GPL(pci_hp_destroy);
>
>  /**
>   * pci_hp_change_slot_info - changes the slot's information structure in the core
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index b360645377c2..37d8f81e548f 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -56,17 +56,6 @@ static int get_latch_status(struct hotplug_slot *slot, u8 *value);
>  static int get_adapter_status(struct hotplug_slot *slot, u8 *value);
>  static int reset_slot(struct hotplug_slot *slot, int probe);
>
> -/**
> - * release_slot - free up the memory used by a slot
> - * @hotplug_slot: slot to free
> - */
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       kfree(hotplug_slot->ops);
> -       kfree(hotplug_slot->info);
> -       kfree(hotplug_slot);
> -}
> -
>  static int init_slot(struct controller *ctrl)
>  {
>         struct slot *slot = ctrl->slot;
> @@ -107,7 +96,6 @@ static int init_slot(struct controller *ctrl)
>         /* register this slot with the hotplug pci core */
>         hotplug->info = info;
>         hotplug->private = slot;
> -       hotplug->release = &release_slot;
>         hotplug->ops = ops;
>         slot->hotplug_slot = hotplug;
>         snprintf(name, SLOT_NAME_SIZE, "%u", PSN(ctrl));
> @@ -127,7 +115,12 @@ static int init_slot(struct controller *ctrl)
>
>  static void cleanup_slot(struct controller *ctrl)
>  {
> -       pci_hp_deregister(ctrl->slot->hotplug_slot);
> +       struct hotplug_slot *hotplug_slot = ctrl->slot->hotplug_slot;
> +
> +       pci_hp_deregister(hotplug_slot);
> +       kfree(hotplug_slot->ops);
> +       kfree(hotplug_slot->info);
> +       kfree(hotplug_slot);
>  }
>
>  /*
> diff --git a/drivers/pci/hotplug/pcihp_skeleton.c b/drivers/pci/hotplug/pcihp_skeleton.c
> index c19694a04d2c..94ad7cd72878 100644
> --- a/drivers/pci/hotplug/pcihp_skeleton.c
> +++ b/drivers/pci/hotplug/pcihp_skeleton.c
> @@ -210,16 +210,6 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
>         return retval;
>  }
>
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       struct slot *slot = hotplug_slot->private;
> -
> -       dbg("%s - physical_slot = %s\n", __func__, hotplug_slot->name);
> -       kfree(slot->hotplug_slot->info);
> -       kfree(slot->hotplug_slot);
> -       kfree(slot);
> -}
> -
>  static void make_slot_name(struct slot *slot)
>  {
>         /*
> @@ -270,7 +260,6 @@ static int __init init_slots(void)
>
>                 hotplug_slot->name = slot->name;
>                 hotplug_slot->private = slot;
> -               hotplug_slot->release = &release_slot;
>                 make_slot_name(slot);
>                 hotplug_slot->ops = &skel_hotplug_slot_ops;
>
> @@ -311,12 +300,13 @@ static void __exit cleanup_slots(void)
>
>         /*
>          * Unregister all of our slots with the pci_hotplug subsystem.
> -        * Memory will be freed in release_slot() callback after slot's
> -        * lifespan is finished.
>          */
>         list_for_each_entry_safe(slot, next, &slot_list, slot_list) {
>                 list_del(&slot->slot_list);
>                 pci_hp_deregister(slot->hotplug_slot);
> +               kfree(slot->hotplug_slot->info);
> +               kfree(slot->hotplug_slot);
> +               kfree(slot);
>         }
>  }
>
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index 6c2e8d7307c6..3276a5e4c430 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -538,9 +538,8 @@ static struct hotplug_slot_ops php_slot_ops = {
>         .disable_slot           = pnv_php_disable_slot,
>  };
>
> -static void pnv_php_release(struct hotplug_slot *slot)
> +static void pnv_php_release(struct pnv_php_slot *php_slot)
>  {
> -       struct pnv_php_slot *php_slot = slot->private;
>         unsigned long flags;
>
>         /* Remove from global or child list */
> @@ -596,7 +595,6 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>         php_slot->power_state_check     = false;
>         php_slot->slot.ops              = &php_slot_ops;
>         php_slot->slot.info             = &php_slot->slot_info;
> -       php_slot->slot.release          = pnv_php_release;
>         php_slot->slot.private          = php_slot;
>
>         INIT_LIST_HEAD(&php_slot->children);
> @@ -924,6 +922,7 @@ static void pnv_php_unregister_one(struct device_node *dn)
>
>         php_slot->state = PNV_PHP_STATE_OFFLINE;
>         pci_hp_deregister(&php_slot->slot);
> +       pnv_php_release(php_slot);
>         pnv_php_put_slot(php_slot);
>  }
>
> diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
> index fb5e0845429d..857c358b727b 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -404,13 +404,13 @@ static void __exit cleanup_slots(void)
>         /*
>          * Unregister all of our slots with the pci_hotplug subsystem,
>          * and free up all memory that we had allocated.
> -        * memory will be freed in release_slot callback.
>          */
>
>         list_for_each_entry_safe(slot, next, &rpaphp_slot_head,
>                                  rpaphp_slot_list) {
>                 list_del(&slot->rpaphp_slot_list);
>                 pci_hp_deregister(slot->hotplug_slot);
> +               dealloc_slot_struct(slot);
>         }
>         return;
>  }
> diff --git a/drivers/pci/hotplug/rpaphp_slot.c b/drivers/pci/hotplug/rpaphp_slot.c
> index 3840a2075e6a..b916c8e4372d 100644
> --- a/drivers/pci/hotplug/rpaphp_slot.c
> +++ b/drivers/pci/hotplug/rpaphp_slot.c
> @@ -19,12 +19,6 @@
>  #include "rpaphp.h"
>
>  /* free up the memory used by a slot */
> -static void rpaphp_release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       struct slot *slot = (struct slot *) hotplug_slot->private;
> -       dealloc_slot_struct(slot);
> -}
> -
>  void dealloc_slot_struct(struct slot *slot)
>  {
>         kfree(slot->hotplug_slot->info);
> @@ -56,7 +50,6 @@ struct slot *alloc_slot_struct(struct device_node *dn,
>         slot->power_domain = power_domain;
>         slot->hotplug_slot->private = slot;
>         slot->hotplug_slot->ops = &rpaphp_hotplug_slot_ops;
> -       slot->hotplug_slot->release = &rpaphp_release_slot;
>
>         return (slot);
>
> @@ -90,10 +83,8 @@ int rpaphp_deregister_slot(struct slot *slot)
>                 __func__, slot->name);
>
>         list_del(&slot->rpaphp_slot_list);
> -
> -       retval = pci_hp_deregister(php_slot);
> -       if (retval)
> -               err("Problem unregistering a slot %s\n", slot->name);
> +       pci_hp_deregister(php_slot);
> +       dealloc_slot_struct(slot);
>
>         dbg("%s - Exit: rc[%d]\n", __func__, retval);
>         return retval;
> diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
> index ffdc2977395d..93b5341d282c 100644
> --- a/drivers/pci/hotplug/s390_pci_hpc.c
> +++ b/drivers/pci/hotplug/s390_pci_hpc.c
> @@ -130,15 +130,6 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
>         return 0;
>  }
>
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       struct slot *slot = hotplug_slot->private;
> -
> -       kfree(slot->hotplug_slot->info);
> -       kfree(slot->hotplug_slot);
> -       kfree(slot);
> -}
> -
>  static struct hotplug_slot_ops s390_hotplug_slot_ops = {
>         .enable_slot =          enable_slot,
>         .disable_slot =         disable_slot,
> @@ -175,7 +166,6 @@ int zpci_init_slot(struct zpci_dev *zdev)
>         hotplug_slot->info = info;
>
>         hotplug_slot->ops = &s390_hotplug_slot_ops;
> -       hotplug_slot->release = &release_slot;
>
>         get_power_status(hotplug_slot, &info->power_status);
>         get_adapter_status(hotplug_slot, &info->adapter_status);
> @@ -209,5 +199,8 @@ void zpci_exit_slot(struct zpci_dev *zdev)
>                         continue;
>                 list_del(&slot->slot_list);
>                 pci_hp_deregister(slot->hotplug_slot);
> +               kfree(slot->hotplug_slot->info);
> +               kfree(slot->hotplug_slot);
> +               kfree(slot);
>         }
>  }
> diff --git a/drivers/pci/hotplug/sgi_hotplug.c b/drivers/pci/hotplug/sgi_hotplug.c
> index 78b6bdbb3a39..babd23409f61 100644
> --- a/drivers/pci/hotplug/sgi_hotplug.c
> +++ b/drivers/pci/hotplug/sgi_hotplug.c
> @@ -628,7 +628,6 @@ static int sn_hotplug_slot_register(struct pci_bus *pci_bus)
>                         goto alloc_err;
>                 }
>                 bss_hotplug_slot->ops = &sn_hotplug_slot_ops;
> -               bss_hotplug_slot->release = &sn_release_slot;
>
>                 rc = pci_hp_register(bss_hotplug_slot, pci_bus, device, name);
>                 if (rc)
> @@ -656,8 +655,10 @@ static int sn_hotplug_slot_register(struct pci_bus *pci_bus)
>                 sn_release_slot(bss_hotplug_slot);
>
>         /* destroy anything else on the list */
> -       while ((bss_hotplug_slot = sn_hp_destroy()))
> +       while ((bss_hotplug_slot = sn_hp_destroy())) {
>                 pci_hp_deregister(bss_hotplug_slot);
> +               sn_release_slot(bss_hotplug_slot);
> +       }
>
>         return rc;
>  }
> @@ -703,8 +704,10 @@ static void __exit sn_pci_hotplug_exit(void)
>  {
>         struct hotplug_slot *bss_hotplug_slot;
>
> -       while ((bss_hotplug_slot = sn_hp_destroy()))
> +       while ((bss_hotplug_slot = sn_hp_destroy())) {
>                 pci_hp_deregister(bss_hotplug_slot);
> +               sn_release_slot(bss_hotplug_slot);
> +       }
>
>         if (!list_empty(&sn_hp_list))
>                 printk(KERN_ERR "%s: internal list is not empty\n", __FILE__);
> diff --git a/drivers/pci/hotplug/shpchp_core.c b/drivers/pci/hotplug/shpchp_core.c
> index e91be287f292..5990695d1737 100644
> --- a/drivers/pci/hotplug/shpchp_core.c
> +++ b/drivers/pci/hotplug/shpchp_core.c
> @@ -61,22 +61,6 @@ static struct hotplug_slot_ops shpchp_hotplug_slot_ops = {
>         .get_adapter_status =   get_adapter_status,
>  };
>
> -/**
> - * release_slot - free up the memory used by a slot
> - * @hotplug_slot: slot to free
> - */
> -static void release_slot(struct hotplug_slot *hotplug_slot)
> -{
> -       struct slot *slot = hotplug_slot->private;
> -
> -       ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
> -                __func__, slot_name(slot));
> -
> -       kfree(slot->hotplug_slot->info);
> -       kfree(slot->hotplug_slot);
> -       kfree(slot);
> -}
> -
>  static int init_slots(struct controller *ctrl)
>  {
>         struct slot *slot;
> @@ -125,7 +109,6 @@ static int init_slots(struct controller *ctrl)
>
>                 /* register this slot with the hotplug pci core */
>                 hotplug_slot->private = slot;
> -               hotplug_slot->release = &release_slot;
>                 snprintf(name, SLOT_NAME_SIZE, "%d", slot->number);
>                 hotplug_slot->ops = &shpchp_hotplug_slot_ops;
>
> @@ -171,6 +154,9 @@ void cleanup_slots(struct controller *ctrl)
>                 cancel_delayed_work(&slot->work);
>                 destroy_workqueue(slot->wq);
>                 pci_hp_deregister(slot->hotplug_slot);
> +               kfree(slot->hotplug_slot->info);
> +               kfree(slot->hotplug_slot);
> +               kfree(slot);
>         }
>  }
>
> diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
> index 3d523ca64694..d67f32a29bb4 100644
> --- a/drivers/platform/x86/asus-wmi.c
> +++ b/drivers/platform/x86/asus-wmi.c
> @@ -858,12 +858,6 @@ static int asus_get_adapter_status(struct hotplug_slot *hotplug_slot,
>         return 0;
>  }
>
> -static void asus_cleanup_pci_hotplug(struct hotplug_slot *hotplug_slot)
> -{
> -       kfree(hotplug_slot->info);
> -       kfree(hotplug_slot);
> -}
> -
>  static struct hotplug_slot_ops asus_hotplug_slot_ops = {
>         .owner = THIS_MODULE,
>         .get_adapter_status = asus_get_adapter_status,
> @@ -905,7 +899,6 @@ static int asus_setup_pci_hotplug(struct asus_wmi *asus)
>                 goto error_info;
>
>         asus->hotplug_slot->private = asus;
> -       asus->hotplug_slot->release = &asus_cleanup_pci_hotplug;
>         asus->hotplug_slot->ops = &asus_hotplug_slot_ops;
>         asus_get_adapter_status(asus->hotplug_slot,
>                                 &asus->hotplug_slot->info->adapter_status);
> @@ -1051,8 +1044,11 @@ static void asus_wmi_rfkill_exit(struct asus_wmi *asus)
>          * asus_unregister_rfkill_notifier()
>          */
>         asus_rfkill_hotplug(asus);
> -       if (asus->hotplug_slot)
> +       if (asus->hotplug_slot) {
>                 pci_hp_deregister(asus->hotplug_slot);
> +               kfree(asus->hotplug_slot->info);
> +               kfree(asus->hotplug_slot);
> +       }
>         if (asus->hotplug_workqueue)
>                 destroy_workqueue(asus->hotplug_workqueue);
>
> diff --git a/drivers/platform/x86/eeepc-laptop.c b/drivers/platform/x86/eeepc-laptop.c
> index 4c38904a8a32..a4bbf6ecd1f0 100644
> --- a/drivers/platform/x86/eeepc-laptop.c
> +++ b/drivers/platform/x86/eeepc-laptop.c
> @@ -726,12 +726,6 @@ static int eeepc_get_adapter_status(struct hotplug_slot *hotplug_slot,
>         return 0;
>  }
>
> -static void eeepc_cleanup_pci_hotplug(struct hotplug_slot *hotplug_slot)
> -{
> -       kfree(hotplug_slot->info);
> -       kfree(hotplug_slot);
> -}
> -
>  static struct hotplug_slot_ops eeepc_hotplug_slot_ops = {
>         .owner = THIS_MODULE,
>         .get_adapter_status = eeepc_get_adapter_status,
> @@ -758,7 +752,6 @@ static int eeepc_setup_pci_hotplug(struct eeepc_laptop *eeepc)
>                 goto error_info;
>
>         eeepc->hotplug_slot->private = eeepc;
> -       eeepc->hotplug_slot->release = &eeepc_cleanup_pci_hotplug;
>         eeepc->hotplug_slot->ops = &eeepc_hotplug_slot_ops;
>         eeepc_get_adapter_status(eeepc->hotplug_slot,
>                                  &eeepc->hotplug_slot->info->adapter_status);
> @@ -837,8 +830,11 @@ static void eeepc_rfkill_exit(struct eeepc_laptop *eeepc)
>                 eeepc->wlan_rfkill = NULL;
>         }
>
> -       if (eeepc->hotplug_slot)
> +       if (eeepc->hotplug_slot) {
>                 pci_hp_deregister(eeepc->hotplug_slot);
> +               kfree(eeepc->hotplug_slot->info);
> +               kfree(eeepc->hotplug_slot);
> +       }
>
>         if (eeepc->bluetooth_rfkill) {
>                 rfkill_unregister(eeepc->bluetooth_rfkill);
> diff --git a/include/linux/pci_hotplug.h b/include/linux/pci_hotplug.h
> index cf5e22103f68..a6d6650a0490 100644
> --- a/include/linux/pci_hotplug.h
> +++ b/include/linux/pci_hotplug.h
> @@ -80,15 +80,12 @@ struct hotplug_slot_info {
>   * @ops: pointer to the &struct hotplug_slot_ops to be used for this slot
>   * @info: pointer to the &struct hotplug_slot_info for the initial values for
>   * this slot.
> - * @release: called during pci_hp_deregister to free memory allocated in a
> - * hotplug_slot structure.
>   * @private: used by the hotplug pci controller driver to store whatever it
>   * needs.
>   */
>  struct hotplug_slot {
>         struct hotplug_slot_ops         *ops;
>         struct hotplug_slot_info        *info;
> -       void (*release) (struct hotplug_slot *slot);
>         void                            *private;
>
>         /* Variables below this are for use only by the hotplug pci core. */
> @@ -104,13 +101,23 @@ static inline const char *hotplug_slot_name(const struct hotplug_slot *slot)
>  int __pci_hp_register(struct hotplug_slot *slot, struct pci_bus *pbus, int nr,
>                       const char *name, struct module *owner,
>                       const char *mod_name);
> -int pci_hp_deregister(struct hotplug_slot *slot);
> +int __pci_hp_initialize(struct hotplug_slot *slot, struct pci_bus *bus, int nr,
> +                       const char *name, struct module *owner,
> +                       const char *mod_name);
> +int pci_hp_add(struct hotplug_slot *slot);
> +
> +void pci_hp_del(struct hotplug_slot *slot);
> +void pci_hp_destroy(struct hotplug_slot *slot);
> +void pci_hp_deregister(struct hotplug_slot *slot);
> +
>  int __must_check pci_hp_change_slot_info(struct hotplug_slot *slot,
>                                          struct hotplug_slot_info *info);
>
>  /* use a define to avoid include chaining to get THIS_MODULE & friends */
>  #define pci_hp_register(slot, pbus, devnr, name) \
>         __pci_hp_register(slot, pbus, devnr, name, THIS_MODULE, KBUILD_MODNAME)
> +#define pci_hp_initialize(slot, bus, nr, name) \
> +       __pci_hp_initialize(slot, bus, nr, name, THIS_MODULE, KBUILD_MODNAME)
>
>  /* PCI Setting Record (Type 0) */
>  struct hpp_type0 {
> --
> 2.17.1
>


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/32] PCI: pciehp: Convert to threaded IRQ
  2018-06-16 19:25 ` [PATCH 09/32] PCI: pciehp: Convert to threaded IRQ Lukas Wunner
@ 2018-06-19 23:16   ` Keith Busch
  2018-06-20 11:01     ` Lukas Wunner
  0 siblings, 1 reply; 66+ messages in thread
From: Keith Busch @ 2018-06-19 23:16 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Wysocki, Rafael J, Raj, Ashok,
	Yinghai Lu, Sinan Kaya, linux-pci, Thomas Gleixner, Patel,
	Mayurkumar, Kenji Kaneshige

On Sat, Jun 16, 2018 at 12:25:00PM -0700, Lukas Wunner wrote:
> pciehp's IRQ handler queues up a work item for each event signaled by
> the hardware.  A more modern alternative is to let a long running
> kthread service the events.  The IRQ handler's sole job is then to check
> whether the IRQ originated from the device in question, acknowledge its
> receipt to the hardware to quiesce the interrupt and wake up the kthread.
> 
> One benefit is reduced latency to handle the IRQ, which is a necessity
> for realtime environments.  Another benefit is that we can make pciehp
> simpler and more robust by handling events synchronously in process
> context, rather than asynchronously by queueing up work items.  pciehp's
> usage of work items is a historic artifact, it predates the introduction
> of threaded IRQ handlers by two years.  (The former was introduced in
> 2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
> latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
> handler support").)
> 
> Convert pciehp to threaded IRQ handling by retrieving the pending events
> in pciehp_isr(), saving them for later consumption by the thread handler
> pciehp_ist() and clearing them in the Slot Status register.
> 
> By clearing the Slot Status (and thereby acknowledging the events) in
> pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
> would have the unpleasant side effect of starving devices sharing the
> IRQ until pciehp_ist() has finished.
> 
> pciehp_isr() does not count how many times each event occurred, but
> merely records the fact *that* an event occurred.  If the same event
> occurs a second time before pciehp_ist() is woken, that second event
> will not be recorded separately, which is problematic according to
> commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
> looking for new ones") because we may miss removal of a card in-between
> two back-to-back insertions.  We're about to make pciehp_ist() resilient
> to missed events.  The present commit regresses the driver's behavior
> temporarily in order to separate the changes into reviewable chunks.
> This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
> operations that happen in a timespan shorter than wakeup of the IRQ
> thread.

I like the series over-all. Definitely an improvement.

I am a little concered about what may happen if we need to remove the
bridge while its irq thread is running. The task removing the bridge
is holding the pci_rescan_remove_lock so when it tries to free the
bridge IRQ, the IRQ subsystem may not be able to progress because the
action->thread may be waiting to take the same lock.

It actually looks like the same deadlock already exists in the current
implementation when it takes down its workqueue, but it's a lot harder to
follow all the different work tasks before this clean-up. Maybe removing
bridges isn't very common, but it's just something I noticed.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/32] PCI: pciehp: Convert to threaded IRQ
  2018-06-19 23:16   ` Keith Busch
@ 2018-06-20 11:01     ` Lukas Wunner
  0 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-06-20 11:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bjorn Helgaas, Mika Westerberg, Wysocki, Rafael J, Raj, Ashok,
	Yinghai Lu, Sinan Kaya, linux-pci, Thomas Gleixner, Patel,
	Mayurkumar, Kenji Kaneshige, Xiongfeng Wang

[+cc Xiongfeng Wang]

On Tue, Jun 19, 2018 at 05:16:46PM -0600, Keith Busch wrote:
> I am a little concered about what may happen if we need to remove the
> bridge while its irq thread is running. The task removing the bridge
> is holding the pci_rescan_remove_lock so when it tries to free the
> bridge IRQ, the IRQ subsystem may not be able to progress because the
> action->thread may be waiting to take the same lock.
> 
> It actually looks like the same deadlock already exists in the current
> implementation when it takes down its workqueue, but it's a lot harder to
> follow all the different work tasks before this clean-up. Maybe removing
> bridges isn't very common, but it's just something I noticed.

In patch [03/32], "PCI: pciehp: Fix deadlock on unplug", I've fixed
this deadlock in case the lock is contended by two pciehp instances.

But when browsing patchwork yesterday, I came across Xiongfeng Wang's
patch "pciehp: fix a race between pciehp and removing operations by sysfs".
It deals with the same deadlock, but the contenders for the lock are a
pciehp instance and a sysfs "remove" request:
https://patchwork.ozlabs.org/patch/877835/

We need a generic solution which works regardless of the contenders'
type, so I'm withdrawing patch [03/32] and I'll try to come up with
something better.

It seems this is about the hierarchy, we need to prevent that the lock
is acquired to remove a device which is a child of another device which
is already being removed, wherefore the lock is currently held.  An idea
would be to change the API such that a struct pci_dev pointer is passed
in to pci_lock_rescan_remove().  That's the device being removed and
for which the lock is handed out.  If the lock is requested for a child,
the request is denied.  The invocation would thus look like this:

 void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev)
 {
-	pci_lock_rescan_remove();
+	if (!pci_trylock_rescan_remove(dev));
+		return;
 	pci_stop_and_remove_bus_device(dev);
 	pci_unlock_rescan_remove();
 }

Thoughts?

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3
  2018-06-16 19:25 ` [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3 Lukas Wunner
@ 2018-06-21 11:13   ` Mika Westerberg
  2018-07-18 19:30     ` Lukas Wunner
  0 siblings, 1 reply; 66+ messages in thread
From: Mika Westerberg @ 2018-06-21 11:13 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Andreas Noever

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> +		if (bridge->is_thunderbolt)
> +			return true;

I have a small concern here. In PC world native PCIe hotplug used with
Thunderbolt comes in two flavors:

  - Native PCI hotplug without runtime PM
  - Native PCI hotplug with runtime PM

The former works so that even if it uses native PCIe hotplug, the power
management is done so that the Thunderbolt host router is hot-removed
when there is nothing connected (and that brings the power savings).
With the above change we start putting all Thunderbolt PCIe hotplug
ports to D3 runtime. While this probably works (and I tested it on the
same Dell) I don't think Windows does it and it may lead to a path that
has not been tested very thoroughly by OEMs.

So I wonder if it makes sense to restrict this particular check to Apple
hardware at this point?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 17/32] PCI: pciehp: Enable/disable exclusively from IRQ thread
  2018-06-16 19:25 ` [PATCH 17/32] PCI: pciehp: Enable/disable exclusively from IRQ thread Lukas Wunner
@ 2018-06-21 11:58   ` Mika Westerberg
  0 siblings, 0 replies; 66+ messages in thread
From: Mika Westerberg @ 2018-06-21 11:58 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> @@ -240,13 +240,19 @@ static int pciehp_probe(struct pcie_device *dev)
>  	}
>  
>  	/* Check if slot is occupied */
> +	mutex_lock(&slot->lock);
>  	pciehp_get_adapter_status(slot, &occupied);
>  	pciehp_get_power_status(slot, &poweron);
> -	if (occupied && pciehp_force)
> -		pciehp_enable_slot(slot);
> +	if (pciehp_force &&
> +	    ((occupied && (slot->state == OFF_STATE ||
> +			   slot->state == BLINKINGON_STATE)) ||
> +	     (!occupied && (slot->state == ON_STATE ||
> +			    slot->state == BLINKINGOFF_STATE))))
> +		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);

This...

>  	/* If empty slot's power status is on, turn power off */
>  	if (!occupied && poweron && POWER_CTRL(ctrl))
>  		pciehp_power_off_slot(slot);
> +	mutex_unlock(&slot->lock);
>  
>  	return 0;
>  
> @@ -290,10 +296,14 @@ static int pciehp_resume(struct pcie_device *dev)
>  
>  	/* Check if slot is occupied */
>  	pciehp_get_adapter_status(slot, &status);
> -	if (status)
> -		pciehp_enable_slot(slot);
> -	else
> -		pciehp_disable_slot(slot);
> +	mutex_lock(&slot->lock);
> +	if ((status && (slot->state == OFF_STATE ||
> +			slot->state == BLINKINGON_STATE)) ||
> +	    (!status && (slot->state == ON_STATE ||
> +			 slot->state == BLINKINGOFF_STATE)))
> +		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);

... and this look the same (I think there are other places as well).
Perhaps you could factor this to a helper function?


> +	mutex_unlock(&slot->lock);
> +
>  	return 0;
>  }
>  #endif /* PM */
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 627e846df802..70bad847a450 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -122,22 +122,26 @@ static void remove_board(struct slot *p_slot)
>  	pciehp_green_led_off(p_slot);
>  }
>  
> +void pciehp_request(struct controller *ctrl, int action)
> +{
> +	atomic_or(action, &ctrl->pending_events);
> +	if (!pciehp_poll_mode)
> +		irq_wake_thread(ctrl->pcie->irq, ctrl);
> +}
> +
>  void pciehp_queue_pushbutton_work(struct work_struct *work)
>  {
>  	struct slot *p_slot = container_of(work, struct slot, work.work);
> +	struct controller *ctrl = p_slot->ctrl;
>  
>  	mutex_lock(&p_slot->lock);
>  	switch (p_slot->state) {
>  	case BLINKINGOFF_STATE:
> -		p_slot->state = POWEROFF_STATE;
> -		mutex_unlock(&p_slot->lock);
> -		pciehp_disable_slot(p_slot);
> -		return;
> +		pciehp_request(ctrl, DISABLE_SLOT);
> +		break;
>  	case BLINKINGON_STATE:
> -		p_slot->state = POWERON_STATE;
> -		mutex_unlock(&p_slot->lock);
> -		pciehp_enable_slot(p_slot);
> -		return;
> +		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
> +		break;
>  	default:
>  		break;
>  	}
> @@ -186,16 +190,6 @@ void pciehp_handle_button_press(struct slot *p_slot)
>  		ctrl_info(ctrl, "Slot(%s): Action canceled due to button press\n",
>  			  slot_name(p_slot));
>  		break;
> -	case POWEROFF_STATE:
> -	case POWERON_STATE:
> -		/*
> -		 * Ignore if the slot is on power-on or power-off state;
> -		 * this means that the previous attention button action
> -		 * to hot-add or hot-remove is undergoing
> -		 */
> -		ctrl_info(ctrl, "Slot(%s): Button ignored\n",
> -			  slot_name(p_slot));
> -		break;
>  	default:
>  		ctrl_err(ctrl, "Slot(%s): Ignoring invalid state %#x\n",
>  			 slot_name(p_slot), p_slot->state);
> @@ -204,6 +198,22 @@ void pciehp_handle_button_press(struct slot *p_slot)
>  	mutex_unlock(&p_slot->lock);
>  }
>  
> +void pciehp_handle_disable_request(struct slot *slot)
> +{
> +	struct controller *ctrl = slot->ctrl;
> +
> +	mutex_lock(&slot->lock);
> +	switch (slot->state) {
> +	case BLINKINGON_STATE:
> +	case BLINKINGOFF_STATE:
> +		cancel_delayed_work(&slot->work);

Missing break here (even though it does nothing) but if someone extends
this in future it may cause problems.

> +	}
> +	slot->state = POWEROFF_STATE;
> +	mutex_unlock(&slot->lock);
> +
> +	ctrl->request_result = pciehp_disable_slot(slot);
> +}
> +
>  void pciehp_handle_link_change(struct slot *p_slot)
>  {
>  	struct controller *ctrl = p_slot->ctrl;
> @@ -232,32 +242,6 @@ void pciehp_handle_link_change(struct slot *p_slot)
>  		}
>  		return;
>  		break;
> -	case POWERON_STATE:
> -		if (link_active) {
> -			ctrl_info(ctrl, "Slot(%s): Link Up event ignored; already powering on\n",
> -				  slot_name(p_slot));
> -		} else {
> -			p_slot->state = POWEROFF_STATE;
> -			mutex_unlock(&p_slot->lock);
> -			ctrl_info(ctrl, "Slot(%s): Link Down event queued; currently getting powered on\n",
> -				  slot_name(p_slot));
> -			pciehp_disable_slot(p_slot);
> -			return;
> -		}
> -		break;
> -	case POWEROFF_STATE:
> -		if (link_active) {
> -			p_slot->state = POWERON_STATE;
> -			mutex_unlock(&p_slot->lock);
> -			ctrl_info(ctrl, "Slot(%s): Link Up event queued; currently getting powered off\n",
> -				  slot_name(p_slot));
> -			pciehp_enable_slot(p_slot);
> -			return;
> -		} else {
> -			ctrl_info(ctrl, "Slot(%s): Link Down event ignored; already powering off\n",
> -				  slot_name(p_slot));
> -		}
> -		break;
>  	default:
>  		ctrl_err(ctrl, "Slot(%s): Ignoring invalid state %#x\n",
>  			 slot_name(p_slot), p_slot->state);
> @@ -272,6 +256,12 @@ void pciehp_handle_presence_change(struct slot *slot)
>  	u8 present;
>  
>  	mutex_lock(&slot->lock);
> +	switch (slot->state) {
> +	case BLINKINGON_STATE:
> +	case BLINKINGOFF_STATE:
> +		cancel_delayed_work(&slot->work);

Ditto.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 23/32] PCI: pciehp: Avoid slot access during reset
  2018-06-16 19:25 ` [PATCH 23/32] PCI: pciehp: Avoid slot access during reset Lukas Wunner
@ 2018-06-21 12:06   ` Mika Westerberg
  2018-06-22  9:23     ` Lukas Wunner
  0 siblings, 1 reply; 66+ messages in thread
From: Mika Westerberg @ 2018-06-21 12:06 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Rajat Jain, Alex Williamson

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
>  struct controller {
>  	struct mutex ctrl_lock;
>  	struct pcie_device *pcie;
> +	struct rw_semaphore reset_lock;

Is there any particular reason why you can't use a regular mutex here?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (31 preceding siblings ...)
  2018-06-16 19:25 ` [PATCH 27/32] PCI: pciehp: Support interrupts sent from D3hot Lukas Wunner
@ 2018-06-21 12:19 ` Mika Westerberg
  2018-06-27 13:35   ` Patel, Mayurkumar
  2018-07-12 22:28 ` Bjorn Helgaas
  2018-07-16 14:20 ` Bjorn Helgaas
  34 siblings, 1 reply; 66+ messages in thread
From: Mika Westerberg @ 2018-06-21 12:19 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Mayurkumar Patel, Kenji Kaneshige, Stefan Roese,
	Rajat Jain, Alex Williamson, Andreas Noever

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
> on and off synchronously in the IRQ thread, no indirection via a work
> queue anymore.
> 
> When the slot is enabled/disabled by the user via sysfs or an Attention
> Button press, a request is sent to the IRQ thread.  The IRQ thread is
> thus the sole entity enabling/disabling the slot.
> 
> The IRQ thread can cope with missed events, e.g. if a card is inserted
> and immediately pulled out before the IRQ thread had a chance to react.
> It also tolerates an initially unstable link as observed in the wild by
> Stefan Roese.
> 
> Finally, runtime PM support is added.  This was the original motivation
> of the series because runtime suspending hotplug ports is needed to power
> down Thunderbolt controllers on idle, which saves ~1.5W per controller.
> Runtime resuming ports takes tenths of milliseconds during which events
> may be missed, this in turn necessitated the event handling rework.
> 
> I've pushed the series to GitHub to ease reviewing/fetching:
> https://github.com/l1k/linux/commits/pciehp_runpm_v2
> 
> Please review and test.

I've tested this series on every native PCIe hotplug system I have here,
including one non-Thunderbolt (just hotplugging NVMes) and I did not
observe any issues. There are few minor comments for few patches but in
general this is really nice cleanup.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 23/32] PCI: pciehp: Avoid slot access during reset
  2018-06-21 12:06   ` Mika Westerberg
@ 2018-06-22  9:23     ` Lukas Wunner
  2018-06-25 13:10       ` Mika Westerberg
  0 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-06-22  9:23 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Rajat Jain, Alex Williamson

On Thu, Jun 21, 2018 at 03:06:00PM +0300, Mika Westerberg wrote:
> On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> >  struct controller {
> >  	struct mutex ctrl_lock;
> >  	struct pcie_device *pcie;
> > +	struct rw_semaphore reset_lock;
> 
> Is there any particular reason why you can't use a regular mutex here?

It would unnecessarily serialize pciehp_probe() against pciehp_ist()
even though it's legal for the two to run in parallel.

I only want to serialize pciehp_reset_slot() with the rest of the driver,
more specifically those portions accessing the Presence Detect State bit
in the Slot Status register and the Data Link Layer Link Active bit in
the Link Status register.  And the only functions remaining that access
those bits are pciehp_probe() and pciehp_ist().  The tool to achieve that
kind of serialization is an rw_semaphore.

That said, I don't know for sure that the two above-mentioned bits flap
on performing a slot reset.  But I highly suspect it, so the code looked
unsafe.  It would be good if Rajat Jain and Alex Williamson, who
introduced pciehp_reset_slot(), could voice their opinion as to the
patch's necessity.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 23/32] PCI: pciehp: Avoid slot access during reset
  2018-06-22  9:23     ` Lukas Wunner
@ 2018-06-25 13:10       ` Mika Westerberg
  0 siblings, 0 replies; 66+ messages in thread
From: Mika Westerberg @ 2018-06-25 13:10 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Rajat Jain, Alex Williamson

On Fri, Jun 22, 2018 at 11:23:19AM +0200, Lukas Wunner wrote:
> On Thu, Jun 21, 2018 at 03:06:00PM +0300, Mika Westerberg wrote:
> > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > >  struct controller {
> > >  	struct mutex ctrl_lock;
> > >  	struct pcie_device *pcie;
> > > +	struct rw_semaphore reset_lock;
> > 
> > Is there any particular reason why you can't use a regular mutex here?
> 
> It would unnecessarily serialize pciehp_probe() against pciehp_ist()
> even though it's legal for the two to run in parallel.
> 
> I only want to serialize pciehp_reset_slot() with the rest of the driver,
> more specifically those portions accessing the Presence Detect State bit
> in the Slot Status register and the Data Link Layer Link Active bit in
> the Link Status register.  And the only functions remaining that access
> those bits are pciehp_probe() and pciehp_ist().  The tool to achieve that
> kind of serialization is an rw_semaphore.

In that case I think it may be good idea to put the above to the
changelog so it is clear why rw_semaphore is used here.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-06-21 12:19 ` [PATCH 00/32] Rework pciehp event handling & add runtime PM Mika Westerberg
@ 2018-06-27 13:35   ` Patel, Mayurkumar
  0 siblings, 0 replies; 66+ messages in thread
From: Patel, Mayurkumar @ 2018-06-27 13:35 UTC (permalink / raw)
  To: Mika Westerberg, Lukas Wunner
  Cc: Bjorn Helgaas, Wysocki, Rafael J, Raj, Ashok, Busch, Keith,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Kenji Kaneshige, Stefan Roese, Rajat Jain,
	Alex Williamson, Andreas Noever



>-----Original Message-----
>From: Mika Westerberg [mailto:mika.westerberg@linux.intel.com]
>Sent: Thursday, June 21, 2018 2:20 PM
>To: Lukas Wunner <lukas@wunner.de>
>Cc: Bjorn Helgaas <bhelgaas@google.com>; Wysocki, Rafael J <rafael.j.wysocki@intel.com>; Raj, Ashok <ashok.raj@intel.com>; Busch,
>Keith <keith.busch@intel.com>; Yinghai Lu <yinghai@kernel.org>; Sinan Kaya <okaya@kernel.org>; linux-pci@vger.kernel.org; Greg Kroah-
>Hartman <greg@kroah.com>; Thomas Gleixner <tglx@linutronix.de>; Patel, Mayurkumar <mayurkumar.patel@intel.com>; Kenji Kaneshige
><kaneshige.kenji@jp.fujitsu.com>; Stefan Roese <sr@denx.de>; Rajat Jain <rajatja@google.com>; Alex Williamson
><alex.williamson@redhat.com>; Andreas Noever <andreas.noever@gmail.com>
>Subject: Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
>
>On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
>> Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
>> on and off synchronously in the IRQ thread, no indirection via a work
>> queue anymore.
>>
>> When the slot is enabled/disabled by the user via sysfs or an Attention
>> Button press, a request is sent to the IRQ thread.  The IRQ thread is
>> thus the sole entity enabling/disabling the slot.
>>
>> The IRQ thread can cope with missed events, e.g. if a card is inserted
>> and immediately pulled out before the IRQ thread had a chance to react.
>> It also tolerates an initially unstable link as observed in the wild by
>> Stefan Roese.
>>
>> Finally, runtime PM support is added.  This was the original motivation
>> of the series because runtime suspending hotplug ports is needed to power
>> down Thunderbolt controllers on idle, which saves ~1.5W per controller.
>> Runtime resuming ports takes tenths of milliseconds during which events
>> may be missed, this in turn necessitated the event handling rework.
>>
>> I've pushed the series to GitHub to ease reviewing/fetching:
>> https://github.com/l1k/linux/commits/pciehp_runpm_v2
>>
>> Please review and test.
>
>I've tested this series on every native PCIe hotplug system I have here,
>including one non-Thunderbolt (just hotplugging NVMes) and I did not
>observe any issues. There are few minor comments for few patches but in
>general this is really nice cleanup.


I have not found any issues with this test series on my test setup related to PCIe hotplug.

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/32] genirq: Synchronize only with single thread on free_irq()
  2018-06-16 19:25 ` [PATCH 08/32] genirq: Synchronize only with single thread on free_irq() Lukas Wunner
@ 2018-07-12 22:21   ` Bjorn Helgaas
  2018-07-13  7:21     ` Lukas Wunner
  0 siblings, 1 reply; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-12 22:21 UTC (permalink / raw)
  To: Lukas Wunner, Thomas Gleixner
  Cc: Bjorn Helgaas, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci, linux-kernel

[+cc linux-kernel]

I assume this would need to be merged along with the rest of the
series, which should probably go through the PCI tree, but I'm
definitely not qualified to review this IRQ patch.  And it would need
an ack from Thomas in any case.

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> When pciehp is converted to threaded IRQ handling, removal of unplugged
> devices below a PCIe hotplug port happens synchronously in the IRQ
> thread.  Removal of devices typically entails a call to free_irq() by
> their drivers.
> 
> If those devices share their IRQ with the hotplug port, free_irq()
> deadlocks because it calls synchronize_irq() to wait for all hard IRQ
> handlers as well as all threads sharing the IRQ to finish.
> 
> Actually it's sufficient to wait only for the IRQ thread of the removed
> device, so call synchronize_hardirq() to wait for all hard IRQ handlers
> to finish, but no longer for any threads.  Compensate by rearranging the
> control flow in irq_wait_for_interrupt() such that the device's thread
> is allowed to run one last time after kthread_stop() has been called.
> 
> Stack trace for posterity:
>     INFO: task irq/17-pciehp:94 blocked for more than 120 seconds.
>     schedule+0x28/0x80
>     synchronize_irq+0x6e/0xa0
>     __free_irq+0x15a/0x2b0
>     free_irq+0x33/0x70
>     pciehp_release_ctrl+0x98/0xb0
>     pcie_port_remove_service+0x2f/0x40
>     device_release_driver_internal+0x157/0x220
>     bus_remove_device+0xe2/0x150
>     device_del+0x124/0x340
>     device_unregister+0x16/0x60
>     remove_iter+0x1a/0x20
>     device_for_each_child+0x4b/0x90
>     pcie_port_device_remove+0x1e/0x30
>     pci_device_remove+0x36/0xb0
>     device_release_driver_internal+0x157/0x220
>     pci_stop_bus_device+0x7d/0xa0
>     pci_stop_bus_device+0x3d/0xa0
>     pci_stop_and_remove_bus_device+0xe/0x20
>     pciehp_unconfigure_device+0xb8/0x160
>     pciehp_disable_slot+0x84/0x130
>     pciehp_ist+0x158/0x190
>     irq_thread_fn+0x1b/0x50
>     irq_thread+0x143/0x1a0
>     kthread+0x111/0x130
> 
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  kernel/irq/manage.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index daeabd791d58..0531f645bfea 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -790,9 +790,19 @@ static irqreturn_t irq_forced_secondary_handler(int irq, void *dev_id)
>  
>  static int irq_wait_for_interrupt(struct irqaction *action)
>  {
> -	set_current_state(TASK_INTERRUPTIBLE);
> +	for (;;) {
> +		set_current_state(TASK_INTERRUPTIBLE);
>  
> -	while (!kthread_should_stop()) {
> +		if (kthread_should_stop()) {
> +			/* may need to run one last time. */
> +			if (test_and_clear_bit(IRQTF_RUNTHREAD,
> +					       &action->thread_flags)) {
> +				__set_current_state(TASK_RUNNING);
> +				return 0;
> +			}
> +			__set_current_state(TASK_RUNNING);
> +			return -1;
> +		}
>  
>  		if (test_and_clear_bit(IRQTF_RUNTHREAD,
>  				       &action->thread_flags)) {
> @@ -800,10 +810,7 @@ static int irq_wait_for_interrupt(struct irqaction *action)
>  			return 0;
>  		}
>  		schedule();
> -		set_current_state(TASK_INTERRUPTIBLE);
>  	}
> -	__set_current_state(TASK_RUNNING);
> -	return -1;
>  }
>  
>  /*
> @@ -1024,7 +1031,7 @@ static int irq_thread(void *data)
>  	/*
>  	 * This is the regular exit path. __free_irq() is stopping the
>  	 * thread via kthread_stop() after calling
> -	 * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the
> +	 * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the
>  	 * oneshot mask bit can be set. We cannot verify that as we
>  	 * cannot touch the oneshot mask at this point anymore as
>  	 * __setup_irq() might have given out currents thread_mask
> @@ -1629,7 +1636,7 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
>  	unregister_handler_proc(irq, action);
>  
>  	/* Make sure it's not being used on another CPU: */
> -	synchronize_irq(irq);
> +	synchronize_hardirq(irq);
>  
>  #ifdef CONFIG_DEBUG_SHIRQ
>  	/*
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (32 preceding siblings ...)
  2018-06-21 12:19 ` [PATCH 00/32] Rework pciehp event handling & add runtime PM Mika Westerberg
@ 2018-07-12 22:28 ` Bjorn Helgaas
  2018-07-13  7:54   ` Lukas Wunner
  2018-07-16 14:20 ` Bjorn Helgaas
  34 siblings, 1 reply; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-12 22:28 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci,
	Greg Kroah-Hartman, Thomas Gleixner, Mayurkumar Patel,
	Kenji Kaneshige, Stefan Roese, Rajat Jain, Alex Williamson,
	Andreas Noever

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
> on and off synchronously in the IRQ thread, no indirection via a work
> queue anymore.
> 
> When the slot is enabled/disabled by the user via sysfs or an Attention
> Button press, a request is sent to the IRQ thread.  The IRQ thread is
> thus the sole entity enabling/disabling the slot.
> 
> The IRQ thread can cope with missed events, e.g. if a card is inserted
> and immediately pulled out before the IRQ thread had a chance to react.
> It also tolerates an initially unstable link as observed in the wild by
> Stefan Roese.
> 
> Finally, runtime PM support is added.  This was the original motivation
> of the series because runtime suspending hotplug ports is needed to power
> down Thunderbolt controllers on idle, which saves ~1.5W per controller.
> Runtime resuming ports takes tenths of milliseconds during which events
> may be missed, this in turn necessitated the event handling rework.
> 
> I've pushed the series to GitHub to ease reviewing/fetching:
> https://github.com/l1k/linux/commits/pciehp_runpm_v2

These patches all showed up in random order in my inbox, I think
because they were sent with zero delay between them, which made them a
bit of a hassle to read and apply via email.  If you repost these (I
think I saw something about an issue with patch 3), can you use a
second or two of delay between them?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 27/32] PCI: pciehp: Support interrupts sent from D3hot
  2018-06-16 19:25 ` [PATCH 27/32] PCI: pciehp: Support interrupts sent from D3hot Lukas Wunner
@ 2018-07-12 23:03   ` Bjorn Helgaas
  0 siblings, 0 replies; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-12 23:03 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci, Thomas Gleixner

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> If a hotplug port is able to send an interrupt, one would naively assume
> that it is accessible at that moment.  After all, if its parent is in
> D3hot and the link to the hotplug port is thus down, how should an
> interrupt come through?
> 
> It turns out that assumption is wrong at least for Thunderbolt:  Even
> though its parents are in D3hot, a Thunderbolt hotplug port is able to
> signal interrupts.  Because the port's config space is inaccessible and
> resuming the parents may sleep, the hard IRQ handler has to defer
> runtime resuming the parents and reading the Slot Status register to the
> IRQ thread.
> 
> If the hotplug port uses a level-triggered INTx interrupt, it needs to
> be masked until the IRQ thread has cleared the signaled events.  For
> simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
> Note that if the interrupt is shared (which can only happen for INTx),
> other devices are starved from receiving interrupts until the IRQ thread
> is scheduled, has runtime resumed the hotplug port's parents and has
> read and cleared the Slot Status register.
> 
> That delay is dominated by the 10 ms D3hot->D0 transition time of each
> parent port.  The worst case is a Thunderbolt downstream port at the
> end of a daisy chain:  There may be up to six Thunderbolt controllers
> in-between it and the root port, each comprising an upstream and
> downstream port, plus its own upstream port.  That's 13 x 10 = 130 ms.
> Possible mitigations are polling the interrupt while it's disabled or
> reducing the d3_delay of Thunderbolt ports if possible.
> 
> Open code masking of the interrupt instead of requesting it with the
> IRQF_ONESHOT flag to minimize the period during which it is masked.
> (IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
> 
> PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
> associated form factor specification, a hotplug capable Downstream Port
> must support generation of a wakeup event (using the PME mechanism) on
> hotplug events that occur when the system is in a sleep state or the
> Port is in device state D1, D2, or D3Hot."
> 
> This would seem to imply that PME needs to be enabled on the hotplug
> port when it is runtime suspended.  pci_enable_wake() currently doesn't
> enable PME on bridges, it may be necessary to add an exemption for
> hotplug bridges there.  On "Light Ridge" Thunderbolt controllers, the
> PME_Status bit is not set when an interrupt occurs while the hotplug
> port is in D3hot, even if PME is enabled.  (I've tested this on a Mac
> and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
> negotiate_os_control(), modifying it to 1 didn't change the behavior.)
> 
> (Confusingly, section 6.7.3.4 also states that "PME and Hot-Plug Event
> interrupts (when both are implemented) always share the same MSI or
> MSI-X vector".  That would only seem to apply to Root Ports, however
> the section never mentions Root Ports, only Downstream Ports.)

FWIW, based on the "Downstream" entry in the "Terms and Acronyms"
section, I think the "Downstream Port" term would include both Root
Ports and Switch Downstream Ports:

  ... The Ports on a Switch that are not the Upstream Port are
  Downstream Ports. All Ports on a Root Complex are Downstream Ports.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/32] genirq: Synchronize only with single thread on free_irq()
  2018-07-12 22:21   ` Bjorn Helgaas
@ 2018-07-13  7:21     ` Lukas Wunner
  2018-07-13 11:44       ` Bjorn Helgaas
  2018-07-16 12:37       ` Bjorn Helgaas
  0 siblings, 2 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-07-13  7:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Gleixner, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci, linux-kernel

On Thu, Jul 12, 2018 at 05:21:09PM -0500, Bjorn Helgaas wrote:
> On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > When pciehp is converted to threaded IRQ handling, removal of unplugged
> > devices below a PCIe hotplug port happens synchronously in the IRQ
> > thread.  Removal of devices typically entails a call to free_irq() by
> > their drivers.
> > 
> > If those devices share their IRQ with the hotplug port, free_irq()
> > deadlocks because it calls synchronize_irq() to wait for all hard IRQ
> > handlers as well as all threads sharing the IRQ to finish.
> > 
> > Actually it's sufficient to wait only for the IRQ thread of the removed
> > device, so call synchronize_hardirq() to wait for all hard IRQ handlers
> > to finish, but no longer for any threads.  Compensate by rearranging the
> > control flow in irq_wait_for_interrupt() such that the device's thread
> > is allowed to run one last time after kthread_stop() has been called.
> 
> I assume this would need to be merged along with the rest of the
> series, which should probably go through the PCI tree, but I'm
> definitely not qualified to review this IRQ patch.  And it would need
> an ack from Thomas in any case.

A v2 of this patch has already been merged through the tip tree on June 24,
it's in linux-next as commit 519cc8652b3a, and ISTR that I marked this patch
either as "Obsoleted" or "Not Applicable" in pci-patchwork.  There was no
build-dependency of the succeeding patches in the series on this patch,
hence merging through a different tree was possible.

Thanks!

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-07-12 22:28 ` Bjorn Helgaas
@ 2018-07-13  7:54   ` Lukas Wunner
  2018-07-13 11:43     ` Bjorn Helgaas
  0 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-07-13  7:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Mayurkumar Patel, Kenji Kaneshige, Stefan Roese,
	Rajat Jain, Alex Williamson, Andreas Noever

On Thu, Jul 12, 2018 at 05:28:16PM -0500, Bjorn Helgaas wrote:
> On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
> > on and off synchronously in the IRQ thread, no indirection via a work
> > queue anymore.
> > 
> > When the slot is enabled/disabled by the user via sysfs or an Attention
> > Button press, a request is sent to the IRQ thread.  The IRQ thread is
> > thus the sole entity enabling/disabling the slot.
> > 
> > The IRQ thread can cope with missed events, e.g. if a card is inserted
> > and immediately pulled out before the IRQ thread had a chance to react.
> > It also tolerates an initially unstable link as observed in the wild by
> > Stefan Roese.
> > 
> > Finally, runtime PM support is added.  This was the original motivation
> > of the series because runtime suspending hotplug ports is needed to power
> > down Thunderbolt controllers on idle, which saves ~1.5W per controller.
> > Runtime resuming ports takes tenths of milliseconds during which events
> > may be missed, this in turn necessitated the event handling rework.
> > 
> > I've pushed the series to GitHub to ease reviewing/fetching:
> > https://github.com/l1k/linux/commits/pciehp_runpm_v2
> 
> These patches all showed up in random order in my inbox, I think
> because they were sent with zero delay between them, which made them a
> bit of a hassle to read and apply via email.

I habitually use a 20 second delay between each message, but I got
unlucky and the mail servers delivered them in an incorrect order.

Since you're using Mutt, you can work around this problem by first
limiting the displayed messages to just this series by pressing the
"l" key and typing:
~f "Lukas Wunner" ~d 16/06

This will only show messages with a From header matching my name and
a Date header of June 16.  (You may need to type 06/16 depending on
your locale.)  You can go back to seeing all messages by pressing "l"
again and typing "." or "~a".

Now press the "o" key and then the "s" key.  This sorts by subject,
so you've got the order that you need to apply patches.  I think the
default is to order by reverse-date-received, you can go back to that
by pressing "O" (shift-O) and then "r".

FWIW you can verify my claim that the messages were sent out 20 seconds
apart by looking at the bottom-most Received: header.  (Press "h" to
see the full header in Mutt.)  This is the one of patch [00/32]:

Received: from localhost (unknown [redacted])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by h08.hostsharing.net (Postfix) with ESMTPSA id 8479E603E110;
	Sat, 16 Jun 2018 21:25:00 +0200 (CEST)

And this is patch [01/32]:

Received: from localhost (unknown [redacted])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by h08.hostsharing.net (Postfix) with ESMTPSA id 25AE9603E110;
	Sat, 16 Jun 2018 21:25:21 +0200 (CEST)

It was actually 21 seconds here, I guess the TLS setup took a little
longer.  Each subsequent patch is likewise spaced apart by 20 secs.


> If you repost these (I think I saw something about an issue with
> patch 3), can you use a second or two of delay between them?

Would you like me to resend the whole series?  The ones at the
beginning of the series (with the exception of patch 3) are probably
uncontroversial, I was hoping that you could maybe apply some of them,
or comment on some of the patches in the series so that I can rework
them before reposting, thereby avoiding spamming everyone with identical
patches a second time.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-07-13  7:54   ` Lukas Wunner
@ 2018-07-13 11:43     ` Bjorn Helgaas
  0 siblings, 0 replies; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-13 11:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Mayurkumar Patel, Kenji Kaneshige, Stefan Roese,
	Rajat Jain, Alex Williamson, Andreas Noever

On Fri, Jul 13, 2018 at 09:54:49AM +0200, Lukas Wunner wrote:
> On Thu, Jul 12, 2018 at 05:28:16PM -0500, Bjorn Helgaas wrote:

> > These patches all showed up in random order in my inbox, I think
> > because they were sent with zero delay between them, which made them a
> > bit of a hassle to read and apply via email.
> 
> I habitually use a 20 second delay between each message, but I got
> unlucky and the mail servers delivered them in an incorrect order.

Strange, I wonder what happened.

> > If you repost these (I think I saw something about an issue with
> > patch 3), can you use a second or two of delay between them?
> 
> Would you like me to resend the whole series?  The ones at the
> beginning of the series (with the exception of patch 3) are probably
> uncontroversial, I was hoping that you could maybe apply some of them,
> or comment on some of the patches in the series so that I can rework
> them before reposting, thereby avoiding spamming everyone with identical
> patches a second time.

No need to resend for that, I had already sorted (but I didn't think to use
mutt sorting, so thanks for that!) and applied them on a test branch (see
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=pci/06-16-lukas-pciehp)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/32] genirq: Synchronize only with single thread on free_irq()
  2018-07-13  7:21     ` Lukas Wunner
@ 2018-07-13 11:44       ` Bjorn Helgaas
  2018-07-16 12:37       ` Bjorn Helgaas
  1 sibling, 0 replies; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-13 11:44 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Thomas Gleixner, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci, linux-kernel

On Fri, Jul 13, 2018 at 09:21:09AM +0200, Lukas Wunner wrote:
> On Thu, Jul 12, 2018 at 05:21:09PM -0500, Bjorn Helgaas wrote:
> > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > > When pciehp is converted to threaded IRQ handling, removal of unplugged
> > > devices below a PCIe hotplug port happens synchronously in the IRQ
> > > thread.  Removal of devices typically entails a call to free_irq() by
> > > their drivers.
> > > 
> > > If those devices share their IRQ with the hotplug port, free_irq()
> > > deadlocks because it calls synchronize_irq() to wait for all hard IRQ
> > > handlers as well as all threads sharing the IRQ to finish.
> > > 
> > > Actually it's sufficient to wait only for the IRQ thread of the removed
> > > device, so call synchronize_hardirq() to wait for all hard IRQ handlers
> > > to finish, but no longer for any threads.  Compensate by rearranging the
> > > control flow in irq_wait_for_interrupt() such that the device's thread
> > > is allowed to run one last time after kthread_stop() has been called.
> > 
> > I assume this would need to be merged along with the rest of the
> > series, which should probably go through the PCI tree, but I'm
> > definitely not qualified to review this IRQ patch.  And it would need
> > an ack from Thomas in any case.
> 
> A v2 of this patch has already been merged through the tip tree on June 24,
> it's in linux-next as commit 519cc8652b3a, and ISTR that I marked this patch
> either as "Obsoleted" or "Not Applicable" in pci-patchwork.  There was no
> build-dependency of the succeeding patches in the series on this patch,
> hence merging through a different tree was possible.

Perfect, thanks!

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/32] genirq: Synchronize only with single thread on free_irq()
  2018-07-13  7:21     ` Lukas Wunner
  2018-07-13 11:44       ` Bjorn Helgaas
@ 2018-07-16 12:37       ` Bjorn Helgaas
  2018-07-16 13:37         ` Lukas Wunner
  1 sibling, 1 reply; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-16 12:37 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Thomas Gleixner, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci, linux-kernel

On Fri, Jul 13, 2018 at 09:21:09AM +0200, Lukas Wunner wrote:
> On Thu, Jul 12, 2018 at 05:21:09PM -0500, Bjorn Helgaas wrote:
> > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > > When pciehp is converted to threaded IRQ handling, removal of unplugged
> > > devices below a PCIe hotplug port happens synchronously in the IRQ
> > > thread.  Removal of devices typically entails a call to free_irq() by
> > > their drivers.
> > > 
> > > If those devices share their IRQ with the hotplug port, free_irq()
> > > deadlocks because it calls synchronize_irq() to wait for all hard IRQ
> > > handlers as well as all threads sharing the IRQ to finish.
> > > 
> > > Actually it's sufficient to wait only for the IRQ thread of the removed
> > > device, so call synchronize_hardirq() to wait for all hard IRQ handlers
> > > to finish, but no longer for any threads.  Compensate by rearranging the
> > > control flow in irq_wait_for_interrupt() such that the device's thread
> > > is allowed to run one last time after kthread_stop() has been called.
> > 
> > I assume this would need to be merged along with the rest of the
> > series, which should probably go through the PCI tree, but I'm
> > definitely not qualified to review this IRQ patch.  And it would need
> > an ack from Thomas in any case.
> 
> A v2 of this patch has already been merged through the tip tree on June 24,
> it's in linux-next as commit 519cc8652b3a, and ISTR that I marked this patch
> either as "Obsoleted" or "Not Applicable" in pci-patchwork.  There was no
> build-dependency of the succeeding patches in the series on this patch,
> hence merging through a different tree was possible.

Great!  Do I need to make sure the tip tree is merged before the PCI
tree during the merge window?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core
  2018-06-17 16:44   ` Andy Shevchenko
@ 2018-07-16 12:46     ` Bjorn Helgaas
  2018-07-16 14:14       ` Andy Shevchenko
  0 siblings, 1 reply; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-16 12:46 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Lukas Wunner, Bjorn Helgaas, Mika Westerberg, Rafael J. Wysocki,
	ashok.raj, Keith Busch, Yinghai Lu, okaya, linux-pci, Len Brown,
	scott, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	gwshan, Sebastian Ott, gerald.schaefer, Corentin Chary,
	Darren Hart, Andy Shevchenko, ACPI Devel Maling List, linux-s390,
	Platform Driver, acpi4asus-user

On Sun, Jun 17, 2018 at 07:44:03PM +0300, Andy Shevchenko wrote:
> On Sat, Jun 16, 2018 at 10:37 PM Lukas Wunner <lukas@wunner.de> wrote:
> >
> > When a hotplug driver calls pci_hp_register(), all steps necessary for
> > registration are carried out in one go, including creation of a kobject
> > and addition to sysfs.  That's a problem for pciehp once it's converted
> > to enable/disable the slot exclusively from the IRQ thread:  The thread
> > needs to be spawned after creation of the kobject (because it uses the
> > kobject's name), but before addition to sysfs (because it will handle
> > enable/disable requests submitted via sysfs).
> >
> > pci_hp_deregister() does offer a ->release callback that's invoked
> > after deletion from sysfs and before destruction of the kobject.  But
> > because pci_hp_register() doesn't offer a counterpart, hotplug drivers'
> > ->probe and ->remove code becomes asymmetric, which is error prone
> > as recently discovered use-after-free bugs in pciehp's ->remove hook
> > have shown.
> >
> > In a sense, this appears to be a case of the midlayer antipattern:
> >
> >    "The core thesis of the "midlayer mistake" is that midlayers are
> >     bad and should not exist.  That common functionality which it is
> >     so tempting to put in a midlayer should instead be provided as
> >     library routines which can [be] used, augmented, or ignored by
> >     each bottom level driver independently.  Thus every subsystem
> >     that supports multiple implementations (or drivers) should
> >     provide a very thin top layer which calls directly into the
> >     bottom layer drivers, and a rich library of support code that
> >     eases the implementation of those drivers.  This library is
> >     available to, but not forced upon, those drivers."
> >         --  Neil Brown (2009), https://lwn.net/Articles/336262/
> >
> > The presence of midlayer traits in the PCI hotplug core might be ascribed
> > to its age:  When it was introduced in February 2002, the blessings of a
> > library approach might not have been well known:
> > https://git.kernel.org/tglx/history/c/a8a2069f432c
> >
> > For comparison, the driver core does offer split functions for creating
> > a kobject (device_initialize()) and addition to sysfs (device_add()) as
> > an alternative to carrying out everything at once (device_register()).
> > This was introduced in October 2002:
> > https://git.kernel.org/tglx/history/c/8b290eb19962
> >
> > The odd ->release callback in the PCI hotplug core was added in 2003:
> > https://git.kernel.org/tglx/history/c/69f8d663b595
> >
> > Clearly, a library approach would not force every hotplug driver to
> > implement a ->release callback, but rather allow the driver to remove
> > the sysfs files, release its data structures and finally destroy the
> > kobject.  Alternatively, a driver may choose to remove everything with
> > pci_hp_deregister(), then release its data structures.
> >
> > To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a
> > split-up version of pci_hp_register().  Likewise, offer pci_hp_del()
> > and pci_hp_destroy() as a split-up version of pci_hp_deregister().
> >
> > Eliminate the ->release callback and move its code into each driver's
> > teardown routine.
> >
> > Declare pci_hp_deregister() void, in keeping with the usual kernel
> > pattern that enablement can fail, but disablement cannot.  It only
> > returned an error if the caller passed in a NULL pointer or a slot which
> > has never or is no longer registered or is sharing its name with another
> > slot.  Those would be bugs, so WARN about them.  Few hotplug drivers
> > actually checked the return value and those that did only printed a
> > useless error message to dmesg.  Remove that.
> >
> > For most drivers the conversion was straightforward since it doesn't
> > matter whether the code in the ->release callback is executed before or
> > after destruction of the kobject.  But in the case of ibmphp, it was
> > unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to
> > NULL needs to happen before the kobject is destroyed, so I erred on
> > the side of caution and ensured that the order stays the same.  Another
> > nontrivial case is pnv_php, I've found the list and kref logic difficult
> > to understand, however my impression was that it is safe to delete the
> > list element and drop the references until after the kobject is
> > destroyed.
> >
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Scott Murray <scott@spiteful.org>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
> > Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> > Cc: Corentin Chary <corentin.chary@gmail.com>
> > Cc: Darren Hart <dvhart@infradead.org>
> > Cc: Andy Shevchenko <andy@infradead.org>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/hotplug/acpiphp_core.c      |  22 +---
> >  drivers/pci/hotplug/cpci_hotplug_core.c |  14 +--
> >  drivers/pci/hotplug/cpqphp_core.c       |  16 +--
> >  drivers/pci/hotplug/ibmphp_core.c       |  15 ++-
> >  drivers/pci/hotplug/ibmphp_ebda.c       |  20 ----
> >  drivers/pci/hotplug/pci_hotplug_core.c  | 139 +++++++++++++++++-------
> >  drivers/pci/hotplug/pciehp_core.c       |  19 +---
> >  drivers/pci/hotplug/pcihp_skeleton.c    |  16 +--
> >  drivers/pci/hotplug/pnv_php.c           |   5 +-
> >  drivers/pci/hotplug/rpaphp_core.c       |   2 +-
> >  drivers/pci/hotplug/rpaphp_slot.c       |  13 +--
> >  drivers/pci/hotplug/s390_pci_hpc.c      |  13 +--
> >  drivers/pci/hotplug/sgi_hotplug.c       |   9 +-
> >  drivers/pci/hotplug/shpchp_core.c       |  20 +---
> 
> >  drivers/platform/x86/asus-wmi.c         |  12 +-
> >  drivers/platform/x86/eeepc-laptop.c     |  12 +-
> 
> So, as far as I can see these are just mechanical changes due to
> internal API change.
> Thus,
> 
> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> 
> for PDx86 bits only.

Hi Andy, I'm not sure what you intend by "PDx86".  I'm happy to add a note
so your ack looks like

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>  # drivers/platform/x86

or something similar.  Is that what you have in mind?

> >  include/linux/pci_hotplug.h             |  15 ++-
> >  17 files changed, 171 insertions(+), 191 deletions(-)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/32] genirq: Synchronize only with single thread on free_irq()
  2018-07-16 12:37       ` Bjorn Helgaas
@ 2018-07-16 13:37         ` Lukas Wunner
  0 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-07-16 13:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Gleixner, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci, linux-kernel

On Mon, Jul 16, 2018 at 07:37:19AM -0500, Bjorn Helgaas wrote:
> On Fri, Jul 13, 2018 at 09:21:09AM +0200, Lukas Wunner wrote:
> > On Thu, Jul 12, 2018 at 05:21:09PM -0500, Bjorn Helgaas wrote:
> > > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > > > When pciehp is converted to threaded IRQ handling, removal of unplugged
> > > > devices below a PCIe hotplug port happens synchronously in the IRQ
> > > > thread.  Removal of devices typically entails a call to free_irq() by
> > > > their drivers.
> > > > 
> > > > If those devices share their IRQ with the hotplug port, free_irq()
> > > > deadlocks because it calls synchronize_irq() to wait for all hard IRQ
> > > > handlers as well as all threads sharing the IRQ to finish.
> > > > 
> > > > Actually it's sufficient to wait only for the IRQ thread of the removed
> > > > device, so call synchronize_hardirq() to wait for all hard IRQ handlers
> > > > to finish, but no longer for any threads.  Compensate by rearranging the
> > > > control flow in irq_wait_for_interrupt() such that the device's thread
> > > > is allowed to run one last time after kthread_stop() has been called.
> > > 
> > > I assume this would need to be merged along with the rest of the
> > > series, which should probably go through the PCI tree, but I'm
> > > definitely not qualified to review this IRQ patch.  And it would need
> > > an ack from Thomas in any case.
> > 
> > A v2 of this patch has already been merged through the tip tree on June 24,
> > it's in linux-next as commit 519cc8652b3a, and ISTR that I marked this patch
> > either as "Obsoleted" or "Not Applicable" in pci-patchwork.  There was no
> > build-dependency of the succeeding patches in the series on this patch,
> > hence merging through a different tree was possible.
> 
> Great!  Do I need to make sure the tip tree is merged before the PCI
> tree during the merge window?

In theory the tip tree would have to be merged beforehand, but in practice
the odds are extremely low that someone with an affected machine (primarily
a Mac introduced 2011/2012) happens to unplug a PCI device during a bisect
while running a kernel which has the pci tree merged but not the tip tree.
So I wouldn't worry about it.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core
  2018-07-16 12:46     ` Bjorn Helgaas
@ 2018-07-16 14:14       ` Andy Shevchenko
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Shevchenko @ 2018-07-16 14:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lukas Wunner, Bjorn Helgaas, Mika Westerberg, Rafael J. Wysocki,
	ashok.raj, Keith Busch, Yinghai Lu, okaya, linux-pci, Len Brown,
	scott, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	gwshan, Sebastian Ott, gerald.schaefer, Corentin Chary,
	Darren Hart, Andy Shevchenko, ACPI Devel Maling List, linux-s390,
	Platform Driver, acpi4asus-user

On Mon, Jul 16, 2018 at 3:46 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Sun, Jun 17, 2018 at 07:44:03PM +0300, Andy Shevchenko wrote:
>> On Sat, Jun 16, 2018 at 10:37 PM Lukas Wunner <lukas@wunner.de> wrote:

>> Thus,
>>
>> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
>>
>> for PDx86 bits only.
>
> Hi Andy, I'm not sure what you intend by "PDx86".  I'm happy to add a note
> so your ack looks like
>
> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>  # drivers/platform/x86
>
> or something similar.  Is that what you have in mind?

Ah, sorry, not everyone gets that abbreviation. It stands for Platform
Drivers x86.
What you crafted above is quite good fit to what I meant. Thanks!

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
                   ` (33 preceding siblings ...)
  2018-07-12 22:28 ` Bjorn Helgaas
@ 2018-07-16 14:20 ` Bjorn Helgaas
  2018-07-19  9:43   ` Lukas Wunner
  34 siblings, 1 reply; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-16 14:20 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mika Westerberg, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci,
	Greg Kroah-Hartman, Thomas Gleixner, Mayurkumar Patel,
	Kenji Kaneshige, Stefan Roese, Rajat Jain, Alex Williamson,
	Andreas Noever

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
> on and off synchronously in the IRQ thread, no indirection via a work
> queue anymore.
> 
> When the slot is enabled/disabled by the user via sysfs or an Attention
> Button press, a request is sent to the IRQ thread.  The IRQ thread is
> thus the sole entity enabling/disabling the slot.
> 
> The IRQ thread can cope with missed events, e.g. if a card is inserted
> and immediately pulled out before the IRQ thread had a chance to react.
> It also tolerates an initially unstable link as observed in the wild by
> Stefan Roese.
> 
> Finally, runtime PM support is added.  This was the original motivation
> of the series because runtime suspending hotplug ports is needed to power
> down Thunderbolt controllers on idle, which saves ~1.5W per controller.
> Runtime resuming ports takes tenths of milliseconds during which events
> may be missed, this in turn necessitated the event handling rework.
> 
> I've pushed the series to GitHub to ease reviewing/fetching:
> https://github.com/l1k/linux/commits/pciehp_runpm_v2

My current test branch:

  https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=pci/06-16-lukas-pciehp

has this series with these changes:

  - Drop the genirq patch (already merged via tip)

  - Add one blank line (pcie_cleanup_slot())

  - A few trivial changelog updates (mostly to use lkml.kernel.org
    links to reduce dependency on 3rd party archives)

Do you plan any other updates?  The open questions I see are:

  - You mentioned withdrawing "03/32 PCI: pciehp: Fix deadlock on
    unplug".  I tried simply dropping that, but that caused a conflict
    that I didn't try to resolve.

  - Mika had a few questions/comments that are still dangling.

  - Whether to include "02/32 PCI: pciehp: Fix UAF on unplug" in the
    v4.19 merge window or in v4.18.

Bjorn

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3
  2018-06-21 11:13   ` Mika Westerberg
@ 2018-07-18 19:30     ` Lukas Wunner
  2018-07-20 15:23       ` Mika Westerberg
  0 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-07-18 19:30 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Andreas Noever

Hi Mika,

sorry for the delay!

On Thu, Jun 21, 2018 at 02:13:54PM +0300, Mika Westerberg wrote:
> On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> > +		if (bridge->is_thunderbolt)
> > +			return true;
> 
> I have a small concern here. In PC world native PCIe hotplug used with
> Thunderbolt comes in two flavors:
> 
>   - Native PCI hotplug without runtime PM
>   - Native PCI hotplug with runtime PM
> 
> The former works so that even if it uses native PCIe hotplug, the power
> management is done so that the Thunderbolt host router is hot-removed
> when there is nothing connected (and that brings the power savings).
> With the above change we start putting all Thunderbolt PCIe hotplug
> ports to D3 runtime. While this probably works (and I tested it on the
> same Dell) I don't think Windows does it and it may lead to a path that
> has not been tested very thoroughly by OEMs.
> 
> So I wonder if it makes sense to restrict this particular check to Apple
> hardware at this point?

I'm not familiar at all with Windows, so this your call.  Normally I'd say,
if there's no known case which forces us to constrain this to Apple, we
probably shouldn't be overzealous.  OTOH I imagine you have to deal with
Windows and broken BIOSes on a daily basis and can anticipate if problems
are to be expected.

The "host router" is the NHI, right?  So IIUC all downstream ports and
the upstream port of the TB PCIe switch may go to D3hot, same for the
root port.  And your concern is that hotplug addition or removal of the
NHI might not work if those ports are in D3hot?

What I don't get is, if we constrain this to Apple, TB ports on non-Apple
systems may not runtime suspend (e.g. if the OEM botched the BIOS date
to be < 2015) and then the "Native PCI hotplug with runtime PM" case
will be broken, won't it?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-07-16 14:20 ` Bjorn Helgaas
@ 2018-07-19  9:43   ` Lukas Wunner
  2018-07-19 19:05     ` Bjorn Helgaas
  2018-07-19 22:50     ` Bjorn Helgaas
  0 siblings, 2 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-07-19  9:43 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Mayurkumar Patel, Kenji Kaneshige, Stefan Roese,
	Rajat Jain, Alex Williamson, Andreas Noever

On Mon, Jul 16, 2018 at 09:20:54AM -0500, Bjorn Helgaas wrote:
> On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
> > on and off synchronously in the IRQ thread, no indirection via a work
> > queue anymore.
> > 
> > When the slot is enabled/disabled by the user via sysfs or an Attention
> > Button press, a request is sent to the IRQ thread.  The IRQ thread is
> > thus the sole entity enabling/disabling the slot.
> > 
> > The IRQ thread can cope with missed events, e.g. if a card is inserted
> > and immediately pulled out before the IRQ thread had a chance to react.
> > It also tolerates an initially unstable link as observed in the wild by
> > Stefan Roese.
> > 
> > Finally, runtime PM support is added.  This was the original motivation
> > of the series because runtime suspending hotplug ports is needed to power
> > down Thunderbolt controllers on idle, which saves ~1.5W per controller.
> > Runtime resuming ports takes tenths of milliseconds during which events
> > may be missed, this in turn necessitated the event handling rework.
> > 
> > I've pushed the series to GitHub to ease reviewing/fetching:
> > https://github.com/l1k/linux/commits/pciehp_runpm_v2
> 
> My current test branch:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=pci/06-16-lukas-pciehp
> 
> has this series with these changes:
> 
>   - Drop the genirq patch (already merged via tip)
> 
>   - Add one blank line (pcie_cleanup_slot())
> 
>   - A few trivial changelog updates (mostly to use lkml.kernel.org
>     links to reduce dependency on 3rd party archives)

I've reviewed the branch and diffed every commit with the original patch
and this all LGTM.  I'll try to adhere more closely to the desired style
in the future, i.e. use kernel.org links and order the Cc: to the bottom.


> Do you plan any other updates?  The open questions I see are:
> 
>   - You mentioned withdrawing "03/32 PCI: pciehp: Fix deadlock on
>     unplug".  I tried simply dropping that, but that caused a conflict
>     that I didn't try to resolve.

Yes, a single patch succeeding it won't apply cleanly if patch 03/32 is
omitted, namely "06/32 PCI: pciehp: Declare pciehp_unconfigure_device()
void".  However resolving the conflict is straightforward, I'm including
a replacement patch below.


>   - Mika had a few questions/comments that are still dangling.

I could resolve those with two further replacement patches:

- "17/32 PCI: pciehp: Enable/disable exclusively from IRQ thread"
  => Deduplicate code to detect a change in slot occupancy
     by introducing a small helper.

- "23/32 PCI: pciehp: Avoid slot access during reset"
  => Amend commit message to justify usage of rw_semaphore.

However ISTR that you dislike replacement patches because they're
more complicated for you to handle.  Would you prefer me to repost
the full series instead?

Further open points:

- Mika suggested adding a few breaks to switch/case statements to avoid
  unintentional fall-throughs if the code is later extended.  I think
  it might be good to do that in a separate commit that is applied on
  top of this series, and at the same time mark all intentional
  fall-throughs as such for -Wimplicit-fallthrough.
  BTW, you may see a merge conflict between the pci/06-16-lukas-pciehp
  and the pci/misc branch because you've already applied Gustavo's patch
  to the latter and it touches pciehp_ctrl.c.

- The commit message of "27/32 PCI: pciehp: Support interrupts sent from
  D3hot" could optionally be extended by your comment that the "Downstream
  Port" term includes both Root Ports and Switch Downstream Ports.

- Mika voiced a concern that "32/32 PCI: Whitelist Thunderbolt ports for
  runtime D3" should probably be constrained to Apple systems, this is
  pending a reply to the mail I sent yesterday evening.


>   - Whether to include "02/32 PCI: pciehp: Fix UAF on unplug" in the
>     v4.19 merge window or in v4.18.

Personally I think submitting the fix during the 4.19 merge window is
sufficient, considering that it'll already open in two to three weeks
and the bug has been present for years.

Thanks,

Lukas

-- >8 --
Subject: [PATCH] PCI: pciehp: Declare pciehp_unconfigure_device() void

Since commit 0f4bd8014db5 ("PCI: hotplug: Drop checking of PCI_BRIDGE_
CONTROL in *_unconfigure_device()"), pciehp_unconfigure_device() can no
longer fail, so declare it and its sole caller remove_board() void, in
keeping with the usual kernel pattern that enablement can fail, but
disablement cannot.  No functional change intended.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/hotplug/pciehp.h      |  2 +-
 drivers/pci/hotplug/pciehp_ctrl.c | 11 ++++-------
 drivers/pci/hotplug/pciehp_pci.c  |  4 +---
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 79b9b5f..9bb9931 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -129,7 +129,7 @@ struct controller {
 int pciehp_sysfs_disable_slot(struct slot *slot);
 void pciehp_queue_interrupt_event(struct slot *slot, u32 event_type);
 int pciehp_configure_device(struct slot *p_slot);
-int pciehp_unconfigure_device(struct slot *p_slot);
+void pciehp_unconfigure_device(struct slot *p_slot);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
 struct controller *pcie_init(struct pcie_device *dev);
 int pcie_init_notification(struct controller *ctrl);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 5bbd28d..163947b 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -133,14 +133,11 @@ static int board_added(struct slot *p_slot)
  * remove_board - Turns off slot and LEDs
  * @p_slot: slot where board is being removed
  */
-static int remove_board(struct slot *p_slot)
+static void remove_board(struct slot *p_slot)
 {
-	int retval;
 	struct controller *ctrl = p_slot->ctrl;
 
-	retval = pciehp_unconfigure_device(p_slot);
-	if (retval)
-		return retval;
+	pciehp_unconfigure_device(p_slot);
 
 	if (POWER_CTRL(ctrl)) {
 		pciehp_power_off_slot(p_slot);
@@ -155,7 +152,6 @@ static int remove_board(struct slot *p_slot)
 
 	/* turn off Green LED */
 	pciehp_green_led_off(p_slot);
-	return 0;
 }
 
 struct power_work_info {
@@ -435,7 +431,8 @@ int pciehp_disable_slot(struct slot *p_slot)
 		}
 	}
 
-	return remove_board(p_slot);
+	remove_board(p_slot);
+	return 0;
 }
 
 int pciehp_sysfs_enable_slot(struct slot *p_slot)
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index acc360d..ec3f065 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -76,9 +76,8 @@ int pciehp_configure_device(struct slot *p_slot)
 	return ret;
 }
 
-int pciehp_unconfigure_device(struct slot *p_slot)
+void pciehp_unconfigure_device(struct slot *p_slot)
 {
-	int rc = 0;
 	u8 presence = 0;
 	struct pci_dev *dev, *temp;
 	struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate;
@@ -121,5 +120,4 @@ int pciehp_unconfigure_device(struct slot *p_slot)
 	}
 
 	pci_unlock_rescan_remove();
-	return rc;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-07-19  9:43   ` Lukas Wunner
@ 2018-07-19 19:05     ` Bjorn Helgaas
  2018-07-19 22:50     ` Bjorn Helgaas
  1 sibling, 0 replies; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-19 19:05 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Mayurkumar Patel, Kenji Kaneshige, Stefan Roese,
	Rajat Jain, Alex Williamson, Andreas Noever

On Thu, Jul 19, 2018 at 11:43:15AM +0200, Lukas Wunner wrote:
> On Mon, Jul 16, 2018 at 09:20:54AM -0500, Bjorn Helgaas wrote:

> >   - You mentioned withdrawing "03/32 PCI: pciehp: Fix deadlock on
> >     unplug".  I tried simply dropping that, but that caused a conflict
> >     that I didn't try to resolve.
> 
> Yes, a single patch succeeding it won't apply cleanly if patch 03/32 is
> omitted, namely "06/32 PCI: pciehp: Declare pciehp_unconfigure_device()
> void".  However resolving the conflict is straightforward, I'm including
> a replacement patch below.

Thanks, I'll pull that in shortly.

> >   - Mika had a few questions/comments that are still dangling.
> 
> I could resolve those with two further replacement patches:
> 
> - "17/32 PCI: pciehp: Enable/disable exclusively from IRQ thread"
>   => Deduplicate code to detect a change in slot occupancy
>      by introducing a small helper.
> 
> - "23/32 PCI: pciehp: Avoid slot access during reset"
>   => Amend commit message to justify usage of rw_semaphore.
> 
> However ISTR that you dislike replacement patches because they're
> more complicated for you to handle.  Would you prefer me to repost
> the full series instead?

No need, if you want to post updates for those, I can incorporate
those, too.

This is an extremely well-done series; thanks for all the work you've
done on it!

Bjorn

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-07-19  9:43   ` Lukas Wunner
  2018-07-19 19:05     ` Bjorn Helgaas
@ 2018-07-19 22:50     ` Bjorn Helgaas
  2018-07-28  5:44       ` Lukas Wunner
  1 sibling, 1 reply; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-19 22:50 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Greg Kroah-Hartman,
	Thomas Gleixner, Mayurkumar Patel, Kenji Kaneshige, Stefan Roese,
	Rajat Jain, Alex Williamson, Andreas Noever

On Thu, Jul 19, 2018 at 11:43:15AM +0200, Lukas Wunner wrote:
> On Mon, Jul 16, 2018 at 09:20:54AM -0500, Bjorn Helgaas wrote:
> > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > > Rework pciehp to use modern, threaded IRQ handling.  The slot is powered
> > > on and off synchronously in the IRQ thread, no indirection via a work
> > > queue anymore.
> > > 
> > > When the slot is enabled/disabled by the user via sysfs or an Attention
> > > Button press, a request is sent to the IRQ thread.  The IRQ thread is
> > > thus the sole entity enabling/disabling the slot.
> > > 
> > > The IRQ thread can cope with missed events, e.g. if a card is inserted
> > > and immediately pulled out before the IRQ thread had a chance to react.
> > > It also tolerates an initially unstable link as observed in the wild by
> > > Stefan Roese.
> > > 
> > > Finally, runtime PM support is added.  This was the original motivation
> > > of the series because runtime suspending hotplug ports is needed to power
> > > down Thunderbolt controllers on idle, which saves ~1.5W per controller.
> > > Runtime resuming ports takes tenths of milliseconds during which events
> > > may be missed, this in turn necessitated the event handling rework.
> > > 
> > > I've pushed the series to GitHub to ease reviewing/fetching:
> > > https://github.com/l1k/linux/commits/pciehp_runpm_v2
> > 
> > My current test branch:
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=pci/06-16-lukas-pciehp
> > 
> > has this series with these changes:
> > 
> >   - Drop the genirq patch (already merged via tip)
> > 
> >   - Add one blank line (pcie_cleanup_slot())
> > 
> >   - A few trivial changelog updates (mostly to use lkml.kernel.org
> >     links to reduce dependency on 3rd party archives)
> 
> I've reviewed the branch and diffed every commit with the original patch
> and this all LGTM.  I'll try to adhere more closely to the desired style
> in the future, i.e. use kernel.org links and order the Cc: to the bottom.
> 
> 
> > Do you plan any other updates?  The open questions I see are:
> > 
> >   - You mentioned withdrawing "03/32 PCI: pciehp: Fix deadlock on
> >     unplug".  I tried simply dropping that, but that caused a conflict
> >     that I didn't try to resolve.
> 
> Yes, a single patch succeeding it won't apply cleanly if patch 03/32 is
> omitted, namely "06/32 PCI: pciehp: Declare pciehp_unconfigure_device()
> void".  However resolving the conflict is straightforward, I'm including
> a replacement patch below.

I applied the replacement and put everything on pci/hotplug for v4.19.

This is fantastic.  I really appreciate all your work, and especially
your clear, concise changelogs.

> >   - Mika had a few questions/comments that are still dangling.
> 
> I could resolve those with two further replacement patches:
> 
> - "17/32 PCI: pciehp: Enable/disable exclusively from IRQ thread"
>   => Deduplicate code to detect a change in slot occupancy
>      by introducing a small helper.
> 
> - "23/32 PCI: pciehp: Avoid slot access during reset"
>   => Amend commit message to justify usage of rw_semaphore.
> 
> However ISTR that you dislike replacement patches because they're
> more complicated for you to handle.  Would you prefer me to repost
> the full series instead?
> 
> Further open points:
> 
> - Mika suggested adding a few breaks to switch/case statements to avoid
>   unintentional fall-throughs if the code is later extended.  I think
>   it might be good to do that in a separate commit that is applied on
>   top of this series, and at the same time mark all intentional
>   fall-throughs as such for -Wimplicit-fallthrough.
>   BTW, you may see a merge conflict between the pci/06-16-lukas-pciehp
>   and the pci/misc branch because you've already applied Gustavo's patch
>   to the latter and it touches pciehp_ctrl.c.
> 
> - The commit message of "27/32 PCI: pciehp: Support interrupts sent from
>   D3hot" could optionally be extended by your comment that the "Downstream
>   Port" term includes both Root Ports and Switch Downstream Ports.
> 
> - Mika voiced a concern that "32/32 PCI: Whitelist Thunderbolt ports for
>   runtime D3" should probably be constrained to Apple systems, this is
>   pending a reply to the mail I sent yesterday evening.

If you send any of the above updates, I'll gladly update the
pci/hotplug branch.  You can either send replacement patches or
incremental ones that I can fold into existing commits.

Bjorn

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3
  2018-07-18 19:30     ` Lukas Wunner
@ 2018-07-20 15:23       ` Mika Westerberg
  2018-07-20 16:00         ` Mika Westerberg
  0 siblings, 1 reply; 66+ messages in thread
From: Mika Westerberg @ 2018-07-20 15:23 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Andreas Noever

On Wed, Jul 18, 2018 at 09:30:53PM +0200, Lukas Wunner wrote:
> Hi Mika,
> 
> sorry for the delay!
> 
> On Thu, Jun 21, 2018 at 02:13:54PM +0300, Mika Westerberg wrote:
> > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > > +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> > > +		if (bridge->is_thunderbolt)
> > > +			return true;
> > 
> > I have a small concern here. In PC world native PCIe hotplug used with
> > Thunderbolt comes in two flavors:
> > 
> >   - Native PCI hotplug without runtime PM
> >   - Native PCI hotplug with runtime PM
> > 
> > The former works so that even if it uses native PCIe hotplug, the power
> > management is done so that the Thunderbolt host router is hot-removed
> > when there is nothing connected (and that brings the power savings).
> > With the above change we start putting all Thunderbolt PCIe hotplug
> > ports to D3 runtime. While this probably works (and I tested it on the
> > same Dell) I don't think Windows does it and it may lead to a path that
> > has not been tested very thoroughly by OEMs.
> > 
> > So I wonder if it makes sense to restrict this particular check to Apple
> > hardware at this point?
> 
> I'm not familiar at all with Windows, so this your call.  Normally I'd say,
> if there's no known case which forces us to constrain this to Apple, we
> probably shouldn't be overzealous.  OTOH I imagine you have to deal with
> Windows and broken BIOSes on a daily basis and can anticipate if problems
> are to be expected.
> 
> The "host router" is the NHI, right?  So IIUC all downstream ports and
> the upstream port of the TB PCIe switch may go to D3hot, same for the
> root port.  And your concern is that hotplug addition or removal of the
> NHI might not work if those ports are in D3hot?

Host router is the PCIe switch + NHI + xHCI. My concern is that now we
start putting the downstream hotplug ports of the host router (the ports
where TBT PCIe daisy chain is to be extented) to D3hot even when the
whole power management of the "Native PCI hotplug without runtime PM" is
done pretty much by hot-removing the whole host router if there is
nothing connected.

> What I don't get is, if we constrain this to Apple, TB ports on non-Apple
> systems may not runtime suspend (e.g. if the OEM botched the BIOS date
> to be < 2015) and then the "Native PCI hotplug with runtime PM" case
> will be broken, won't it?

Yes, but it needs additional patches anyway and we are going to use ACPI
_DSD with "HotplugInD3" property to selectively enable this on such
systems.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3
  2018-07-20 15:23       ` Mika Westerberg
@ 2018-07-20 16:00         ` Mika Westerberg
  2018-07-20 20:33           ` Bjorn Helgaas
  0 siblings, 1 reply; 66+ messages in thread
From: Mika Westerberg @ 2018-07-20 16:00 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Andreas Noever

On Fri, Jul 20, 2018 at 06:23:49PM +0300, Mika Westerberg wrote:
> On Wed, Jul 18, 2018 at 09:30:53PM +0200, Lukas Wunner wrote:
> > Hi Mika,
> > 
> > sorry for the delay!
> > 
> > On Thu, Jun 21, 2018 at 02:13:54PM +0300, Mika Westerberg wrote:
> > > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > > > +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> > > > +		if (bridge->is_thunderbolt)
> > > > +			return true;
> > > 
> > > I have a small concern here. In PC world native PCIe hotplug used with
> > > Thunderbolt comes in two flavors:
> > > 
> > >   - Native PCI hotplug without runtime PM
> > >   - Native PCI hotplug with runtime PM
> > > 
> > > The former works so that even if it uses native PCIe hotplug, the power
> > > management is done so that the Thunderbolt host router is hot-removed
> > > when there is nothing connected (and that brings the power savings).
> > > With the above change we start putting all Thunderbolt PCIe hotplug
> > > ports to D3 runtime. While this probably works (and I tested it on the
> > > same Dell) I don't think Windows does it and it may lead to a path that
> > > has not been tested very thoroughly by OEMs.
> > > 
> > > So I wonder if it makes sense to restrict this particular check to Apple
> > > hardware at this point?
> > 
> > I'm not familiar at all with Windows, so this your call.  Normally I'd say,
> > if there's no known case which forces us to constrain this to Apple, we
> > probably shouldn't be overzealous.  OTOH I imagine you have to deal with
> > Windows and broken BIOSes on a daily basis and can anticipate if problems
> > are to be expected.
> > 
> > The "host router" is the NHI, right?  So IIUC all downstream ports and
> > the upstream port of the TB PCIe switch may go to D3hot, same for the
> > root port.  And your concern is that hotplug addition or removal of the
> > NHI might not work if those ports are in D3hot?
> 
> Host router is the PCIe switch + NHI + xHCI. My concern is that now we
> start putting the downstream hotplug ports of the host router (the ports
> where TBT PCIe daisy chain is to be extented) to D3hot even when the
> whole power management of the "Native PCI hotplug without runtime PM" is
> done pretty much by hot-removing the whole host router if there is
> nothing connected.

I see Bjorn already merged these so please ignore the above. We can
add the Apple check later if this turns out causing problems.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3
  2018-07-20 16:00         ` Mika Westerberg
@ 2018-07-20 20:33           ` Bjorn Helgaas
  0 siblings, 0 replies; 66+ messages in thread
From: Bjorn Helgaas @ 2018-07-20 20:33 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Lukas Wunner, Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj,
	Keith Busch, Yinghai Lu, Sinan Kaya, linux-pci, Andreas Noever

On Fri, Jul 20, 2018 at 07:00:38PM +0300, Mika Westerberg wrote:
> On Fri, Jul 20, 2018 at 06:23:49PM +0300, Mika Westerberg wrote:
> > On Wed, Jul 18, 2018 at 09:30:53PM +0200, Lukas Wunner wrote:
> > > Hi Mika,
> > > 
> > > sorry for the delay!
> > > 
> > > On Thu, Jun 21, 2018 at 02:13:54PM +0300, Mika Westerberg wrote:
> > > > On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > > > > +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> > > > > +		if (bridge->is_thunderbolt)
> > > > > +			return true;
> > > > 
> > > > I have a small concern here. In PC world native PCIe hotplug used with
> > > > Thunderbolt comes in two flavors:
> > > > 
> > > >   - Native PCI hotplug without runtime PM
> > > >   - Native PCI hotplug with runtime PM
> > > > 
> > > > The former works so that even if it uses native PCIe hotplug, the power
> > > > management is done so that the Thunderbolt host router is hot-removed
> > > > when there is nothing connected (and that brings the power savings).
> > > > With the above change we start putting all Thunderbolt PCIe hotplug
> > > > ports to D3 runtime. While this probably works (and I tested it on the
> > > > same Dell) I don't think Windows does it and it may lead to a path that
> > > > has not been tested very thoroughly by OEMs.
> > > > 
> > > > So I wonder if it makes sense to restrict this particular check to Apple
> > > > hardware at this point?
> > > 
> > > I'm not familiar at all with Windows, so this your call.  Normally I'd say,
> > > if there's no known case which forces us to constrain this to Apple, we
> > > probably shouldn't be overzealous.  OTOH I imagine you have to deal with
> > > Windows and broken BIOSes on a daily basis and can anticipate if problems
> > > are to be expected.
> > > 
> > > The "host router" is the NHI, right?  So IIUC all downstream ports and
> > > the upstream port of the TB PCIe switch may go to D3hot, same for the
> > > root port.  And your concern is that hotplug addition or removal of the
> > > NHI might not work if those ports are in D3hot?
> > 
> > Host router is the PCIe switch + NHI + xHCI. My concern is that now we
> > start putting the downstream hotplug ports of the host router (the ports
> > where TBT PCIe daisy chain is to be extented) to D3hot even when the
> > whole power management of the "Native PCI hotplug without runtime PM" is
> > done pretty much by hot-removing the whole host router if there is
> > nothing connected.
> 
> I see Bjorn already merged these so please ignore the above. We can
> add the Apple check later if this turns out causing problems.

I merged these just to try to make forward progress and dig out from
my backlog.  If you want to tweak this before the merge window, I'd be
glad to update the branch.

Bjorn

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/32] Rework pciehp event handling & add runtime PM
  2018-07-19 22:50     ` Bjorn Helgaas
@ 2018-07-28  5:44       ` Lukas Wunner
  0 siblings, 0 replies; 66+ messages in thread
From: Lukas Wunner @ 2018-07-28  5:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci, Mayurkumar Patel,
	Kenji Kaneshige, Stefan Roese, Rajat Jain, Alex Williamson,
	Andreas Noever

On Thu, Jul 19, 2018 at 05:50:16PM -0500, Bjorn Helgaas wrote:
> On Thu, Jul 19, 2018 at 11:43:15AM +0200, Lukas Wunner wrote:
> > On Mon, Jul 16, 2018 at 09:20:54AM -0500, Bjorn Helgaas wrote:
> > >   - Mika had a few questions/comments that are still dangling.
> > 
> > I could resolve those with two further replacement patches:
> > 
> > - "17/32 PCI: pciehp: Enable/disable exclusively from IRQ thread"
> >   => Deduplicate code to detect a change in slot occupancy
> >      by introducing a small helper.
> > 
> > - "23/32 PCI: pciehp: Avoid slot access during reset"
> >   => Amend commit message to justify usage of rw_semaphore.
> > 
> > Further open points:
> > 
> > - Mika suggested adding a few breaks to switch/case statements to avoid
> >   unintentional fall-throughs if the code is later extended.  I think
> >   it might be good to do that in a separate commit that is applied on
> >   top of this series, and at the same time mark all intentional
> >   fall-throughs as such for -Wimplicit-fallthrough.
> >   BTW, you may see a merge conflict between the pci/06-16-lukas-pciehp
> >   and the pci/misc branch because you've already applied Gustavo's patch
> >   to the latter and it touches pciehp_ctrl.c.
> > 
> > - The commit message of "27/32 PCI: pciehp: Support interrupts sent from
> >   D3hot" could optionally be extended by your comment that the "Downstream
> >   Port" term includes both Root Ports and Switch Downstream Ports.
> 
> If you send any of the above updates, I'll gladly update the
> pci/hotplug branch.  You can either send replacement patches or
> incremental ones that I can fold into existing commits.

Bjorn, Mika, everyone, please excuse the delay.

I've just posted 4 patches to linux-pci@ to address all the above-listed
review comments that are still outstanding:

* 2 replacement patches for existing commits on Bjorn's pci/hotplug branch:
    PCI: pciehp: Avoid slot access during reset
    PCI: pciehp: Support interrupts sent from D3hot
  No code changes, only the commit messages have been updated.
  Specifics are below the three dash separator in each patch.

* 2 patches that are intended to be applied on top of the branch:
    PCI: pciehp: Avoid implicit fallthroughs in switch statements
    PCI: pciehp: Deduplicate presence check on probe & resume

With this approach I hope to minimize the work for Bjorn by avoiding
any rebase conflicts.

Thanks!

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug
  2018-06-16 19:25 ` [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug Lukas Wunner
@ 2018-09-06 16:01   ` Mika Westerberg
  2018-09-06 16:26     ` Lukas Wunner
  0 siblings, 1 reply; 66+ messages in thread
From: Mika Westerberg @ 2018-09-06 16:01 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

Hi Lukas,

On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> When pciehp was conceived it apparently wasn't considered that one
> hotplug port may be the parent of another, but with Thunderbolt this is
> par for the course.
> 
> If the sysfs interface is used to initiate a simultaneous card removal
> of two hotplug ports where one is a parent of the other, or if the two
> ports simultaneously signal interrupts, e.g. because a link or presence
> change was detected on both, a deadlock may occur if:
> 
> - The parent acquires pci_lock_rescan_remove(), starts removal of the
>   child and waits for it to be unbound.
> - The child waits to acquire pci_lock_rescan_remove() in order to
>   service the pending sysfs command or interrupt before it can unbind.
> 
> Fix by using pci_dev_is_disconnected() as an indicator whether a parent
> is removing the child, and avoid acquiring the lock in the child if so.
> Should the child happen to acquire the lock first, there's no deadlock.

This patch got dropped from your pciehp series but I wonder if you have
any plans to re-send it or rework it?

I'm asking because this is pretty real issue if you have longer chains
of devices and I've been using this patch to make the issue go away. So
even if this is not solving all possible problems, I think it may be
good idea to get included.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug
  2018-09-06 16:01   ` Mika Westerberg
@ 2018-09-06 16:26     ` Lukas Wunner
  2018-09-06 18:08       ` Mika Westerberg
  0 siblings, 1 reply; 66+ messages in thread
From: Lukas Wunner @ 2018-09-06 16:26 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

On Thu, Sep 06, 2018 at 07:01:00PM +0300, Mika Westerberg wrote:
> On Sat, Jun 16, 2018 at 09:25:00PM +0200, Lukas Wunner wrote:
> > When pciehp was conceived it apparently wasn't considered that one
> > hotplug port may be the parent of another, but with Thunderbolt this is
> > par for the course.
> > 
> > If the sysfs interface is used to initiate a simultaneous card removal
> > of two hotplug ports where one is a parent of the other, or if the two
> > ports simultaneously signal interrupts, e.g. because a link or presence
> > change was detected on both, a deadlock may occur if:
> > 
> > - The parent acquires pci_lock_rescan_remove(), starts removal of the
> >   child and waits for it to be unbound.
> > - The child waits to acquire pci_lock_rescan_remove() in order to
> >   service the pending sysfs command or interrupt before it can unbind.
> > 
> > Fix by using pci_dev_is_disconnected() as an indicator whether a parent
> > is removing the child, and avoid acquiring the lock in the child if so.
> > Should the child happen to acquire the lock first, there's no deadlock.
> 
> This patch got dropped from your pciehp series but I wonder if you have
> any plans to re-send it or rework it?
> 
> I'm asking because this is pretty real issue if you have longer chains
> of devices and I've been using this patch to make the issue go away. So
> even if this is not solving all possible problems, I think it may be
> good idea to get included.

I asked for it to be dropped because while it fixes the problem,
it's not a good solution, it would just add technical debt to the
code base and come back to haunt us later.

Xiongfeng Wang found a similar race a few months ago that wouldn't
be fixed by my patch:
https://patchwork.ozlabs.org/patch/877835/

The PCI core nicely separates unbinding of drivers from destruction
of the pci_dev (pci_stop_bus_device() + pci_remove_bus_device()).
Problem is we currently protect both steps with pci_lock_rescan_remove().
We should only be protecting the second step.  The first step (unbinding)
needs to run lockless.

As a start, I've moved the call to pcie_aspm_exit_link_state() out
of pci_stop_dev().  This also fixes an ASPM bug.  Bjorn merged it
the day before yesterday:
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/aspm&id=1f3934b1d5e5

The call to device_release_driver() already uses locking internally
and can be run lockless.  The calls to pci_proc_detach_device() and
pci_remove_sysfs_dev_files() need to be amended with locking, I've
got some preliminary patches for this on my development branch that
I'll have to rework.  This is very hairy, historically grown code
that requires great care to avoid breakage.  It'll take a little
more time I'm afraid.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug
  2018-09-06 16:26     ` Lukas Wunner
@ 2018-09-06 18:08       ` Mika Westerberg
  0 siblings, 0 replies; 66+ messages in thread
From: Mika Westerberg @ 2018-09-06 18:08 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Ashok Raj, Keith Busch,
	Yinghai Lu, Sinan Kaya, linux-pci

On Thu, Sep 06, 2018 at 06:26:34PM +0200, Lukas Wunner wrote:
> I asked for it to be dropped because while it fixes the problem,
> it's not a good solution, it would just add technical debt to the
> code base and come back to haunt us later.
> 
> Xiongfeng Wang found a similar race a few months ago that wouldn't
> be fixed by my patch:
> https://patchwork.ozlabs.org/patch/877835/
> 
> The PCI core nicely separates unbinding of drivers from destruction
> of the pci_dev (pci_stop_bus_device() + pci_remove_bus_device()).
> Problem is we currently protect both steps with pci_lock_rescan_remove().
> We should only be protecting the second step.  The first step (unbinding)
> needs to run lockless.
> 
> As a start, I've moved the call to pcie_aspm_exit_link_state() out
> of pci_stop_dev().  This also fixes an ASPM bug.  Bjorn merged it
> the day before yesterday:
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/aspm&id=1f3934b1d5e5
> 
> The call to device_release_driver() already uses locking internally
> and can be run lockless.  The calls to pci_proc_detach_device() and
> pci_remove_sysfs_dev_files() need to be amended with locking, I've
> got some preliminary patches for this on my development branch that
> I'll have to rework.  This is very hairy, historically grown code
> that requires great care to avoid breakage.  It'll take a little
> more time I'm afraid.

OK, thanks for the explanation. I see you have a good plan going forward
with this.

I can reproduce the issue easily so can assist in testing if needed.

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2018-09-06 22:45 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-16 19:25 [PATCH 00/32] Rework pciehp event handling & add runtime PM Lukas Wunner
2018-06-16 19:25 ` [PATCH 04/32] PCI: pciehp: Fix unprotected list iteration in IRQ handler Lukas Wunner
2018-06-16 19:25 ` [PATCH 08/32] genirq: Synchronize only with single thread on free_irq() Lukas Wunner
2018-07-12 22:21   ` Bjorn Helgaas
2018-07-13  7:21     ` Lukas Wunner
2018-07-13 11:44       ` Bjorn Helgaas
2018-07-16 12:37       ` Bjorn Helgaas
2018-07-16 13:37         ` Lukas Wunner
2018-06-16 19:25 ` [PATCH 31/32] PCI: Whitelist native hotplug ports for runtime D3 Lukas Wunner
2018-06-16 19:25 ` [PATCH 19/32] PCI: pciehp: Declare pciehp_enable/disable_slot() static Lukas Wunner
2018-06-16 19:25 ` [PATCH 11/32] PCI: pciehp: Stop blinking on slot enable failure Lukas Wunner
2018-06-16 19:25 ` [PATCH 23/32] PCI: pciehp: Avoid slot access during reset Lukas Wunner
2018-06-21 12:06   ` Mika Westerberg
2018-06-22  9:23     ` Lukas Wunner
2018-06-25 13:10       ` Mika Westerberg
2018-06-16 19:25 ` [PATCH 01/32] PCI: hotplug: Don't leak pci_slot on registration failure Lukas Wunner
2018-06-16 19:25 ` [PATCH 26/32] PCI: pciehp: Obey compulsory command delay after resume Lukas Wunner
2018-06-16 19:25 ` [PATCH 15/32] PCI: pciehp: Publish to user space last on probe Lukas Wunner
2018-06-16 19:25 ` [PATCH 16/32] PCI: pciehp: Track enable/disable status Lukas Wunner
2018-06-16 19:25 ` [PATCH 32/32] PCI: Whitelist Thunderbolt ports for runtime D3 Lukas Wunner
2018-06-21 11:13   ` Mika Westerberg
2018-07-18 19:30     ` Lukas Wunner
2018-07-20 15:23       ` Mika Westerberg
2018-07-20 16:00         ` Mika Westerberg
2018-07-20 20:33           ` Bjorn Helgaas
2018-06-16 19:25 ` [PATCH 06/32] PCI: pciehp: Declare pciehp_unconfigure_device() void Lukas Wunner
2018-06-16 19:25 ` [PATCH 28/32] PCI: pciehp: Resume to D0 on enable/disable Lukas Wunner
2018-06-16 19:25 ` [PATCH 22/32] PCI: pciehp: Always enable occupied slot on probe Lukas Wunner
2018-06-16 19:25 ` [PATCH 10/32] PCI: pciehp: Convert to threaded polling Lukas Wunner
2018-06-16 19:25 ` [PATCH 13/32] PCI: pciehp: Drop slot workqueue Lukas Wunner
2018-06-16 19:25 ` [PATCH 12/32] PCI: pciehp: Handle events synchronously Lukas Wunner
2018-06-16 19:25 ` [PATCH 21/32] PCI: pciehp: Become resilient to missed events Lukas Wunner
2018-06-16 19:25 ` [PATCH 25/32] PCI: pciehp: Clear spurious events earlier on resume Lukas Wunner
2018-06-16 19:25 ` [PATCH 02/32] PCI: pciehp: Fix UAF on unplug Lukas Wunner
2018-06-16 19:25 ` [PATCH 29/32] PCI: pciehp: Resume parent to D0 on config space access Lukas Wunner
2018-06-16 19:25 ` [PATCH 17/32] PCI: pciehp: Enable/disable exclusively from IRQ thread Lukas Wunner
2018-06-21 11:58   ` Mika Westerberg
2018-06-16 19:25 ` [PATCH 09/32] PCI: pciehp: Convert to threaded IRQ Lukas Wunner
2018-06-19 23:16   ` Keith Busch
2018-06-20 11:01     ` Lukas Wunner
2018-06-16 19:25 ` [PATCH 07/32] PCI: pciehp: Document struct slot and struct controller Lukas Wunner
2018-06-16 19:25 ` [PATCH 20/32] PCI: pciehp: Tolerate initially unstable link Lukas Wunner
2018-06-16 19:25 ` [PATCH 05/32] PCI: pciehp: Drop unnecessary NULL pointer check Lukas Wunner
2018-06-16 19:25 ` [PATCH 30/32] PCI: sysfs: Resume to D0 on function reset Lukas Wunner
2018-06-16 19:25 ` [PATCH 03/32] PCI: pciehp: Fix deadlock on unplug Lukas Wunner
2018-09-06 16:01   ` Mika Westerberg
2018-09-06 16:26     ` Lukas Wunner
2018-09-06 18:08       ` Mika Westerberg
2018-06-16 19:25 ` [PATCH 18/32] PCI: pciehp: Drop enable/disable lock Lukas Wunner
2018-06-16 19:25 ` [PATCH 14/32] PCI: hotplug: Demidlayer registration with the core Lukas Wunner
2018-06-17 16:44   ` Andy Shevchenko
2018-07-16 12:46     ` Bjorn Helgaas
2018-07-16 14:14       ` Andy Shevchenko
2018-06-16 19:25 ` [PATCH 24/32] PCI: portdrv: Deduplicate PM callback iterator Lukas Wunner
2018-06-16 19:25 ` [PATCH 27/32] PCI: pciehp: Support interrupts sent from D3hot Lukas Wunner
2018-07-12 23:03   ` Bjorn Helgaas
2018-06-21 12:19 ` [PATCH 00/32] Rework pciehp event handling & add runtime PM Mika Westerberg
2018-06-27 13:35   ` Patel, Mayurkumar
2018-07-12 22:28 ` Bjorn Helgaas
2018-07-13  7:54   ` Lukas Wunner
2018-07-13 11:43     ` Bjorn Helgaas
2018-07-16 14:20 ` Bjorn Helgaas
2018-07-19  9:43   ` Lukas Wunner
2018-07-19 19:05     ` Bjorn Helgaas
2018-07-19 22:50     ` Bjorn Helgaas
2018-07-28  5:44       ` Lukas Wunner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.