* [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (11 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 12/13] thunderbolt: Support runtime pm on upstream bridge Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-06-17 22:52   ` Bjorn Helgaas
  2016-05-21  9:48 ` [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Andreas Noever
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC
  To: linux-pci, linux-pm; +Cc: Andreas Noever

If a hotplug port is suspended to D3cold, its slot status register
cannot be read.  If that hotplug port happens to share its IRQ with
other devices, then whenever an interrupt occurs for one of these
devices, a "no response from device" message is logged with level
KERN_INFO.  Apart from this annoyance, CPU time is needlessly spent
trying to read the slot status register even though we know in advance
that it will fail.

On MacBook Pros introduced in 2011 and 2012, the IRQ of a Thunderbolt
hotplug port is unfortunately shared with a wireless card, an audio card
and an SDXC controller.  When the Thunderbolt controller is powered
down, the machine carries out at least one unneeded slot status register
read for each wireless packet received and prints a corresponding error
message to the system log.

The hotplug port's current_state will be D3cold when it's powered down,
so ignore interrupts that occur during that power state.
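
As an illustration only (not part of the patch), the effect of the early
return can be modelled in user space.  The types and names below
(port_model, port_isr, reg_reads) are hypothetical stand-ins, not kernel
definitions; the sketch merely shows that a port in D3cold declines the
shared interrupt without attempting a register read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types, for illustration only. */
enum irq_ret { IRQ_NONE, IRQ_HANDLED };
enum power_state { D0, D3HOT, D3COLD };

struct port_model {
	enum power_state current_state;
	bool event_pending;     /* what Slot Status would report */
	unsigned int reg_reads; /* number of Slot Status reads attempted */
};

/* Models the ISR of a hotplug port that shares its IRQ with other devices. */
enum irq_ret port_isr(struct port_model *port)
{
	assert(port != NULL);

	/* A powered-down port cannot have raised the interrupt. */
	if (port->current_state == D3COLD)
		return IRQ_NONE;

	/* Only a powered-up port pays for the register read. */
	port->reg_reads++;
	if (!port->event_pending)
		return IRQ_NONE;

	port->event_pending = false;
	return IRQ_HANDLED;
}
```

In this model, an interrupt raised by another device on the same line
costs the suspended port nothing: reg_reads stays at zero until the port
is back in D0.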

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_hpc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5c24e93..08e84d6 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -546,6 +546,10 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 	u8 present;
 	bool link;
 
+	/* Interrupts cannot originate from a controller that's asleep */
+	if (pdev->current_state == PCI_D3cold)
+		return IRQ_NONE;
+
 	/*
 	 * In order to guarantee that all interrupt events are
 	 * serviced, we need to re-inspect Slot Status register after
-- 
2.8.1



* [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (6 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 08/13] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-06-17 21:09   ` Bjorn Helgaas
  2016-05-13 11:15 ` [PATCH v2 06/13] PCI: pciehp: Support runtime pm Lukas Wunner
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC
  To: linux-pci, linux-pm; +Cc: Andreas Noever, Rafael J. Wysocki

There are devices which are not power-managed by the platform, yet can be
runtime suspended to D3cold with some other mechanism.  When putting the
system to sleep, we currently handle such devices improperly by trying
to transition them from D3cold to D3hot (the default power state defined
at the beginning of pci_target_state()).  Avoid that.

Examples of affected devices are Thunderbolt controllers built into
Macs, which can be put into D3cold with nonstandard ACPI methods.
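
The decision logic can be sketched as a simplified user-space model.
This is not the kernel function: the struct fields and the
platform_choice shortcut are assumptions made for illustration, and only
the order of the branches follows the hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PS_D0 = 0, PS_D1, PS_D2, PS_D3HOT, PS_D3COLD };

/* Hypothetical, simplified view of the fields the decision depends on. */
struct dev_model {
	int current_state;
	bool platform_managed;    /* platform (e.g. ACPI _PSx) controls power */
	int platform_choice;      /* state the platform would pick */
	bool has_pm_cap;          /* device has a PM capability */
	bool may_wakeup;
	unsigned int pme_support; /* bit n set: PME possible from state n */
};

int target_state(const struct dev_model *dev)
{
	int target = PS_D3HOT; /* the default */

	assert(dev != NULL);

	if (dev->platform_managed) {
		target = dev->platform_choice;
	} else if (!dev->has_pm_cap) {
		target = PS_D0;
	} else if (dev->may_wakeup) {
		/* deepest state from which the device can signal wakeup */
		if (dev->pme_support)
			while (target && !(dev->pme_support & (1u << target)))
				target--;
	} else if (dev->current_state == PS_D3COLD) {
		/* new branch: leave a device that is already in D3cold alone */
		target = PS_D3COLD;
	}

	return target;
}
```

Without the last branch, the model returns D3hot for a device that is
already in D3cold, which is the improper transition described above.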

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 791dfe7..6af9911 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
 			      && !(dev->pme_support & (1 << target_state)))
 				target_state--;
 		}
+	} else if (dev->current_state == PCI_D3cold) {
+		target_state = PCI_D3cold;
 	}
 
 	return target_state;
-- 
2.8.1



* [PATCH v2 12/13] thunderbolt: Support runtime pm on upstream bridge
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (10 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 02/13] PCI: Allow D3 for Thunderbolt ports Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold Lukas Wunner
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC
  To: linux-pci, linux-pm; +Cc: Andreas Noever

Document and implement Apple's ACPI-based (but nonstandard) mechanism to
power the controller up and down as needed.  Briefly, an ACPI method
provided by Apple is used to cut power to the controller.  A GPE is
enabled while the controller is powered down which side-band signals a
plug event, whereupon we reinstate power using the ACPI method.
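
The sequence can be modelled in user space as follows.  This is a sketch
under stated assumptions: ctrl_model and its fields are hypothetical, the
ACPI calls are replaced by plain state changes, and only the ordering
(cut power, poll until the controller reports it is down, then arm the
GPE; disarm the GPE and restore power on a plug event) mirrors
upstream_runtime_suspend() and upstream_runtime_resume() in the diff
below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_POLLS 300 /* same bound as the polling loop in the patch */

/* Hypothetical model of the controller; no real ACPI calls are made. */
struct ctrl_model {
	bool powered;
	bool gpe_enabled;
	int polls_left; /* polls until power-down completes; negative: never */
};

/* Stands in for querying the power state via the get_power method. */
bool ctrl_powered_down(struct ctrl_model *c)
{
	if (c->polls_left == 0)
		c->powered = false;
	else if (c->polls_left > 0)
		c->polls_left--;
	return !c->powered;
}

int ctrl_suspend(struct ctrl_model *c)
{
	int i;

	assert(c != NULL);

	/* power has been requested off; wait for it to take effect */
	for (i = 0; i < MAX_POLLS; i++)
		if (ctrl_powered_down(c))
			break;

	if (c->powered)
		return -EAGAIN; /* refused to power down, stay resumed */

	/* arm the wake GPE only once the controller is really down */
	c->gpe_enabled = true;
	return 0;
}

/* Plug event while powered down: disarm the GPE, then restore power. */
void ctrl_resume(struct ctrl_model *c)
{
	assert(c != NULL);
	c->gpe_enabled = false;
	c->powered = true;
}
```

The GPE is armed only after the poll succeeds because, on gen 2
controllers, it fires for as long as the controller is powered up.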

This saves 1.7 W on machines with a Light Ridge controller and is
reported to save 4 W on Cactus Ridge 4C and Falcon Ridge 4C.  It fixes
(at least partially) a power regression introduced in Linux 3.17 by
commit 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").

A Thunderbolt controller appears to the OS as a set of PCI devices:  One
NHI (Native Host Interface) and multiple bridges.  Power is cut to the
entire set of devices.  The Linux pm model is hierarchical and assumes
that a child cannot resume before its parent.  To conform to this model,
power control must be governed by the Thunderbolt controller's topmost
device, which is the upstream bridge.  This is achieved by binding to it
as a Thunderbolt port service driver:

  (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
                                     +-- Downstream Bridge 1 --
                                     +-- Downstream Bridge 2 --
                                     ...

There are no Thunderbolt specs publicly available from Intel or Apple,
so I've included documentation to the extent that I was able to reverse-
engineer things.  Documentation on the Go2Sx and Ok2Go2Sx pins is
tentative as those are missing on my Light Ridge.  Apple only uses them
on Cactus Ridge 4C.  Someone with such a controller needs to find out
through experimentation if the documentation is accurate and amend it if
necessary.

To maximize power saving, the controller is left asleep during the
system suspend process ("direct-complete" in runtime pm lingo).  We also
opt out of the mandatory runtime resume after system suspend which was
introduced with 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might
have been reset by firmware").  We're better than OS X there, which
always wakes the controller after system sleep for no apparent reason.
Finally, we also do not wake the controller on system shutdown to avoid
stalling the shutdown procedure by 2 seconds (that's how long it takes
for the controller to power up).

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92111
Cc: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/thunderbolt/Kconfig    |   4 +-
 drivers/thunderbolt/Makefile   |   4 +-
 drivers/thunderbolt/nhi.c      |  21 ++-
 drivers/thunderbolt/upstream.c | 345 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 369 insertions(+), 5 deletions(-)
 create mode 100644 drivers/thunderbolt/upstream.c

diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index c121acc..79f53fc 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,7 +1,9 @@
 menuconfig THUNDERBOLT
 	tristate "Thunderbolt support for Apple devices"
-	depends on PCI
+	depends on PCI && ACPI
+	select PCIEPORTBUS
 	select CRC32
+	select PM
 	help
 	  Cactus Ridge Thunderbolt Controller driver
 	  This driver is required if you want to hotplug Thunderbolt devices on
diff --git a/drivers/thunderbolt/Makefile b/drivers/thunderbolt/Makefile
index 5d1053c..8cae413 100644
--- a/drivers/thunderbolt/Makefile
+++ b/drivers/thunderbolt/Makefile
@@ -1,3 +1,3 @@
 obj-${CONFIG_THUNDERBOLT} := thunderbolt.o
-thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o eeprom.o
-
+thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o \
+		    eeprom.o upstream.o
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 9c15344..d54666e 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/pci.h>
+#include <linux/pcieport_if.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
 #include <linux/dmi.h>
@@ -631,7 +632,7 @@ static const struct dev_pm_ops nhi_pm_ops = {
 	.restore_noirq = nhi_resume_noirq,
 };
 
-static struct pci_device_id nhi_ids[] = {
+struct pci_device_id nhi_ids[] = {
 	/*
 	 * We have to specify class, the TB bridges use the same device and
 	 * vendor (sub)id on gen 1 and gen 2 controllers.
@@ -668,16 +669,32 @@ static struct pci_driver nhi_driver = {
 	.driver.pm = &nhi_pm_ops,
 };
 
+extern struct pcie_port_service_driver upstream_driver;
+
 static int __init nhi_init(void)
 {
+	int res;
+
 	if (!dmi_match(DMI_BOARD_VENDOR, "Apple Inc."))
 		return -ENOSYS;
-	return pci_register_driver(&nhi_driver);
+
+	res = pcie_port_service_register(&upstream_driver);
+	if (res)
+		return res;
+
+	res = pci_register_driver(&nhi_driver);
+	if (res) {
+		pcie_port_service_unregister(&upstream_driver);
+		return res;
+	}
+
+	return 0;
 }
 
 static void __exit nhi_unload(void)
 {
 	pci_unregister_driver(&nhi_driver);
+	pcie_port_service_unregister(&upstream_driver);
 }
 
 module_init(nhi_init);
diff --git a/drivers/thunderbolt/upstream.c b/drivers/thunderbolt/upstream.c
new file mode 100644
index 0000000..d69422b
--- /dev/null
+++ b/drivers/thunderbolt/upstream.c
@@ -0,0 +1,345 @@
+/*
+ * upstream.c - thunderbolt upstream bridge driver (powers controller up/down)
+ * Copyright (C) 2016 Lukas Wunner <lukas@wunner.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Apple provides the following means for power control in ACPI:
+ *
+ * * On Macs with Thunderbolt 1 Gen 1 controllers (Light Ridge, Eagle Ridge):
+ *   * XRPE method ("Power Enable"), takes argument 1 or 0, toggles a GPIO pin
+ *     to switch the controller on or off.
+ *   * XRIN named object (alternatively _GPE), contains number of a GPE which
+ *     fires as long as something is plugged in (regardless of power state).
+ *   * XRIL method ("Interrupt Low"), returns 0 as long as something is
+ *     plugged in, 1 otherwise.
+ *   * XRIP and XRIO methods, unused by OS X driver.
+ *
+ * * On Macs with Thunderbolt 1 Gen 2 controllers (Cactus Ridge 4C):
+ *   * XRIN not only fires as long as something is plugged in, but also as long
+ *     as the controller's CIO switch is powered up.
+ *   * XRIL method changed its meaning, it returns 0 as long as the CIO switch
+ *     is powered up, 1 otherwise.
+ *   * Additional SXFP method ("Force Power"), accepts only argument 0,
+ *     switches the controller off. This carries out just the raw power change,
+ *     unlike XRPE which disables the link on the PCIe Root Port in an orderly
+ *     fashion before switching off the controller.
+ *   * Additional SXLV, SXIO, SXIL methods to utilize the Go2Sx and Ok2Go2Sx
+ *     pins (see background below). Apparently SXLV toggles the value given to
+ *     the POC via Go2Sx (0 or 1), SXIO changes the direction (0 or 1) and SXIL
+ *     returns the value received from the POC via Ok2Go2Sx.
+ *   * On some Macs, additional XRST method, takes argument 1 or 0, asserts or
+ *     deasserts a GPIO pin to reset the controller.
+ *   * On Macs introduced 2013, XRPE was renamed TRPE.
+ *
+ * * On Macs with Thunderbolt 2 controllers (Falcon Ridge 4C and 2C):
+ *   * SXLV, SXIO, SXIL methods to utilize Go2Sx and Ok2Go2Sx are gone.
+ *   * On the MacPro6,1 which has multiple Thunderbolt controllers, each NHI
+ *     device has a separate XRIN GPE and separate TRPE, SXFP and XRIL methods.
+ *
+ * Background:
+ *
+ * * Gen 1 controllers (Light Ridge, Eagle Ridge) had no power management
+ *   and no ability to distinguish whether a DP or Thunderbolt device is
+ *   plugged in. Apple put an ARM Cortex MCU (NXP LPC1112A) on the logic board
+ *   which snoops on the connector lines and, depending on the type of device,
+ *   sends an HPD signal to the GPU or rings the Thunderbolt XRIN doorbell
+ *   interrupt. The switches for the 3.3V and 1.05V power rails to the
+ *   Thunderbolt controller are toggled by a GPIO pin on the southbridge.
+ *
+ * * On gen 2 controllers (Cactus Ridge 4C), Intel integrated the MCU into the
+ *   controller and called it POC. This caused a change of semantics for XRIN
+ *   and XRIL. The POC is powered by a separate 3.3V rail which is active even
+ *   in sleep state S4. It only draws a very small current. The regular 3.3V
+ *   and 1.05V power rails are no longer controlled by the southbridge but by
+ *   the POC. In other words the controller powers *itself* up and down! It is
+ *   instructed to do so with the Go2Sx pin. Another pin, Ok2Go2Sx, allows the
+ *   controller to indicate if it is ready to power itself down. Apple wires
+ *   Go2Sx and Ok2Go2Sx to the same GPIO pin on the southbridge, hence the pin
+ *   is used bidirectionally. A third pin, Force Power, is intended by Intel
+ *   for debug only but Apple abuses it for XRPE/TRPE and SXFP. Perhaps it
+ *   leads to larger power saving gains. They utilize Go2Sx and Ok2Go2Sx only
+ *   on Cactus Ridge, presumably because the controller somehow requires that.
+ *   On Falcon Ridge they forego these pins and rely solely on Force Power.
+ *
+ * Implementation Notes:
+ *
+ * * The controller is powered down once all child devices have suspended and
+ *   its autosuspend delay has elapsed. The delay is user configurable via
+ *   sysfs and should be lower or equal to that of the NHI since hotplug events
+ *   are not acted upon if the NHI has suspended but the controller has not yet
+ *   powered down. The delay should not be zero to avoid frequent power changes
+ *   (e.g. multiple times just for lspci -vv) since powering up takes 2 sec.
+ *   (Powering down is almost instantaneous.)
+ */
+
+#include <linux/acpi.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/pcieport_if.h>
+#include <linux/pm_runtime.h>
+
+struct tb_upstream {
+	struct pci_dev *nhi;
+	unsigned long long wake_gpe; /* hotplug interrupt during powerdown */
+	acpi_handle set_power; /* method to power controller up/down */
+	acpi_handle get_power; /* method to query power state */
+};
+
+static int upstream_prepare(struct pcie_device *dev)
+{
+	struct tb_upstream *upstream = get_service_data(dev);
+
+	/* prevent interrupts during system sleep transition */
+	if (pm_runtime_suspended(&dev->port->dev) &&
+	    ACPI_FAILURE(acpi_disable_gpe(NULL, upstream->wake_gpe))) {
+		dev_err(&dev->device, "cannot disable wake GPE, resuming\n");
+		pm_request_resume(&dev->port->dev);
+	}
+
+	return 0;
+}
+
+static int upstream_complete(struct pcie_device *dev)
+{
+	struct tb_upstream *upstream = get_service_data(dev);
+
+	/*
+	 * If the controller was powered up before system sleep, we'll find it
+	 * automatically powered up afterwards.
+	 */
+	if (pm_runtime_active(&dev->port->dev))
+		return 0;
+
+	/*
+	 * If the controller was powered down before system sleep, calling XRPE
+	 * to power it up will fail on the next runtime resume. An additional
+	 * call to XRPE is necessary to reset the power switch first.
+	 */
+	dev_info(&dev->device, "resetting power switch\n");
+	if (ACPI_FAILURE(acpi_execute_simple_method(upstream->set_power, NULL,
+						    0))) {
+		dev_err(&dev->device, "cannot call set_power method\n");
+		dev->port->dev.power.runtime_error = -ENODEV;
+	}
+
+	if (ACPI_FAILURE(acpi_enable_gpe(NULL, upstream->wake_gpe))) {
+		dev_err(&dev->device, "cannot enable wake GPE, resuming\n");
+		pm_request_resume(&dev->port->dev);
+	}
+
+	return 0;
+}
+
+static int pm_set_d3cold_cb(struct pci_dev *pdev, void *ptr)
+{
+	pdev->current_state = PCI_D3cold;
+	return 0;
+}
+
+static int pm_set_d3hot_and_resume_cb(struct pci_dev *pdev, void *ptr)
+{
+	pdev->current_state = PCI_D3hot;
+	WARN_ON(pm_request_resume(&pdev->dev) < 0);
+	return 0;
+}
+
+static int upstream_runtime_suspend(struct pcie_device *dev)
+{
+	struct tb_upstream *upstream = get_service_data(dev);
+	unsigned long long powered_down;
+	int i;
+
+	if (!dev->port->d3cold_allowed)
+		return -EAGAIN;
+
+	pci_save_state(dev->port);
+	pci_walk_bus(dev->port->bus, pm_set_d3cold_cb, NULL);
+
+	dev_info(&dev->device, "powering down\n");
+	if (ACPI_FAILURE(acpi_execute_simple_method(upstream->set_power, NULL,
+						    0))) {
+		dev_err(&dev->device, "cannot call set_power method, resuming\n");
+		goto err;
+	}
+
+	/*
+	 * On gen 2 controllers, the wake GPE fires as long as the controller
+	 * is powered up. Poll until it's powered down before enabling the GPE.
+	 */
+	for (i = 0; i < 300; i++) {
+		if (ACPI_FAILURE(acpi_evaluate_integer(upstream->get_power,
+						       NULL, NULL,
+						       &powered_down))) {
+			dev_err(&dev->device, "cannot call get_power method, resuming\n");
+			goto err;
+		}
+		if (powered_down)
+			break;
+		usleep_range(800, 1600);
+	}
+	if (!powered_down) {
+		dev_err(&dev->device, "refused to power down, resuming\n");
+		goto err;
+	}
+
+	if (ACPI_FAILURE(acpi_enable_gpe(NULL, upstream->wake_gpe))) {
+		dev_err(&dev->device, "cannot enable wake GPE, resuming\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	acpi_execute_simple_method(upstream->set_power, NULL, 1);
+	dev->port->current_state = PCI_D0;
+	pci_restore_state(dev->port);
+	pci_walk_bus(dev->port->subordinate, pm_set_d3hot_and_resume_cb, NULL);
+	return -EAGAIN;
+}
+
+static int upstream_runtime_resume(struct pcie_device *dev)
+{
+	struct tb_upstream *upstream = get_service_data(dev);
+
+	if (system_state >= SYSTEM_HALT)
+		return -ESHUTDOWN;
+
+	if (ACPI_FAILURE(acpi_disable_gpe(NULL, upstream->wake_gpe))) {
+		dev_err(&dev->device, "cannot disable wake GPE, disabling runtime pm\n");
+		pm_runtime_get_noresume(&upstream->nhi->dev);
+	}
+
+	dev_info(&dev->device, "powering up\n");
+	if (ACPI_FAILURE(acpi_execute_simple_method(upstream->set_power, NULL,
+						    1))) {
+		dev_err(&dev->device, "cannot call set_power method\n");
+		return -ENODEV;
+	}
+
+	dev->port->current_state = PCI_D0;
+	pci_restore_state(dev->port);
+
+	/* wake children to force pci_restore_state() after D3cold */
+	pci_walk_bus(dev->port->subordinate, pm_set_d3hot_and_resume_cb, NULL);
+	return 0;
+}
+
+static u32 upstream_wake_nhi(acpi_handle gpe_device, u32 gpe_number, void *ctx)
+{
+	struct pci_dev *nhi = ctx;
+	WARN_ON(pm_request_resume(&nhi->dev) < 0);
+	return ACPI_INTERRUPT_HANDLED;
+}
+
+static int pm_init_cb(struct pci_dev *pdev, void *ptr)
+{
+	/* opt out of mandatory runtime resume after system sleep */
+	pdev->dev.power.direct_complete_noresume = true;
+	return 0;
+}
+
+extern struct pci_device_id nhi_ids[];
+
+static int upstream_probe(struct pcie_device *dev)
+{
+	struct tb_upstream *upstream;
+	struct acpi_handle *nhi_handle;
+	struct pci_dev *dsb0;
+
+	/* host controllers only */
+	if (!dev->port->bus->self ||
+	    pci_pcie_type(dev->port->bus->self) != PCI_EXP_TYPE_ROOT_PORT)
+		return -ENODEV;
+
+	upstream = devm_kzalloc(&dev->device, sizeof(*upstream), GFP_KERNEL);
+	if (!upstream)
+		return -ENOMEM;
+
+	/* find Downstream Bridge 0 and NHI */
+	dsb0 = pci_get_slot(dev->port->subordinate, 0);
+	if (!dsb0 || !dsb0->subordinate) {
+		pci_dev_put(dsb0);
+		return -ENODEV;
+	}
+	upstream->nhi = pci_get_slot(dsb0->subordinate, 0);
+	pci_dev_put(dsb0);
+	if (!upstream->nhi || !pci_match_id(nhi_ids, upstream->nhi))
+		goto err;
+	nhi_handle = ACPI_HANDLE(&upstream->nhi->dev);
+	if (!nhi_handle)
+		goto err;
+
+	/* Macs introduced 2011/2012 have XRPE, 2013+ have TRPE */
+	if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRPE",
+					 &upstream->set_power)) &&
+	    ACPI_FAILURE(acpi_get_handle(nhi_handle, "TRPE",
+					 &upstream->set_power))) {
+		dev_err(&dev->device, "cannot find set_power method\n");
+		goto err;
+	}
+
+	if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRIL",
+					 &upstream->get_power))) {
+		dev_err(&dev->device, "cannot find get_power method\n");
+		goto err;
+	}
+
+	if (ACPI_FAILURE(acpi_evaluate_integer(nhi_handle, "XRIN", NULL,
+					       &upstream->wake_gpe))) {
+		dev_err(&dev->device, "cannot find wake GPE\n");
+		goto err;
+	}
+
+	if (ACPI_FAILURE(acpi_install_gpe_handler(NULL, upstream->wake_gpe,
+						  ACPI_GPE_LEVEL_TRIGGERED,
+						  upstream_wake_nhi,
+						  upstream->nhi))) {
+		dev_err(&dev->device, "cannot install GPE handler\n");
+		goto err;
+	}
+
+	set_service_data(dev, upstream);
+	pci_walk_bus(dev->port->bus, pm_init_cb, NULL);
+	return 0;
+
+err:
+	pci_dev_put(upstream->nhi);
+	return -ENODEV;
+}
+
+static void upstream_remove(struct pcie_device *dev)
+{
+	struct tb_upstream *upstream = get_service_data(dev);
+
+	if (ACPI_FAILURE(acpi_remove_gpe_handler(NULL, upstream->wake_gpe,
+						 upstream_wake_nhi)))
+		dev_err(&dev->device, "cannot remove GPE handler\n");
+
+	pci_dev_put(upstream->nhi);
+	set_service_data(dev, NULL);
+}
+
+struct pcie_port_service_driver upstream_driver = {
+	.name			= "thunderbolt_upstream",
+	.port_type		= PCI_EXP_TYPE_UPSTREAM,
+	.service		= PCIE_PORT_SERVICE_TBT,
+	.probe			= upstream_probe,
+	.remove			= upstream_remove,
+	.prepare		= upstream_prepare,
+	.complete		= upstream_complete,
+	.runtime_suspend	= upstream_runtime_suspend,
+	.runtime_resume		= upstream_runtime_resume,
+};
-- 
2.8.1



* [PATCH v2 06/13] PCI: pciehp: Support runtime pm
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (7 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 05/13] PCI: Use portdrv pm iterator on further callbacks Lukas Wunner
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC
  To: linux-pci, linux-pm; +Cc: Andreas Noever

Resume the port for the duration of board_added() and remove_board().
When plugging a device in, the port stays active as long as it has
active children.  When nothing is plugged in, the port may suspend since
hotplug detection continues to work during D3hot.  Thus further runtime
refs need not be acquired.
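
The reference handling can be illustrated with a deliberately simplified
user-space model.  The names (port_pm, port_get_sync, port_put) are
hypothetical, and the model suspends immediately on the last put, whereas
the real runtime PM core goes through an idle check and an optional
autosuspend delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a hotplug port's runtime PM bookkeeping. */
struct port_pm {
	int usage;           /* runtime PM references currently held */
	int active_children; /* devices below the port that are active */
	bool suspended;
};

void port_get_sync(struct port_pm *p)
{
	assert(p != NULL);
	p->usage++;
	p->suspended = false; /* resume before touching the slot */
}

void port_put(struct port_pm *p)
{
	assert(p != NULL && p->usage > 0);
	/* the port may suspend only if unreferenced and childless */
	if (--p->usage == 0 && p->active_children == 0)
		p->suspended = true;
}

void enable_slot(struct port_pm *p)
{
	port_get_sync(p);
	p->active_children++; /* stands in for board_added() */
	port_put(p);
}

void disable_slot(struct port_pm *p)
{
	port_get_sync(p);
	p->active_children--; /* stands in for remove_board() */
	port_put(p);
}
```

After enable_slot() the reference is dropped again, yet the port stays
active because it now has an active child; after disable_slot() nothing
keeps it awake and it may suspend.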

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 880978b..edf8b0e 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -31,6 +31,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/slab.h>
+#include <linux/pm_runtime.h>
 #include <linux/pci.h>
 #include "../pci.h"
 #include "pciehp.h"
@@ -432,7 +433,9 @@ int pciehp_enable_slot(struct slot *p_slot)
 
 	pciehp_get_latch_status(p_slot, &getstatus);
 
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
 	rc = board_added(p_slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
 	if (rc)
 		pciehp_get_latch_status(p_slot, &getstatus);
 
@@ -445,6 +448,7 @@ int pciehp_enable_slot(struct slot *p_slot)
 int pciehp_disable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
+	int rc;
 	struct controller *ctrl = p_slot->ctrl;
 
 	if (!p_slot->ctrl)
@@ -459,7 +463,10 @@ int pciehp_disable_slot(struct slot *p_slot)
 		}
 	}
 
-	return remove_board(p_slot);
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
+	rc = remove_board(p_slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
+	return rc;
 }
 
 int pciehp_sysfs_enable_slot(struct slot *p_slot)
-- 
2.8.1



* [PATCH v2 01/13] PCI: Recognize Thunderbolt devices
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete Lukas Wunner
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC
  To: linux-pci, linux-pm; +Cc: Andreas Noever

We're about to add a Thunderbolt port service in pcie/portdrv_core.c,
allow runtime PM for Thunderbolt ports on Macs in pcie/portdrv_pci.c
and allow power state D3 for Thunderbolt ports in pci.c.  In each case
we need to reliably determine whether a PCI device belongs to a
Thunderbolt controller.

We also have the need to detect presence of a Thunderbolt controller in
drivers/platform/x86/apple-gmux.c because dual GPU MacBook Pros cannot
switch external DP/HDMI ports between GPUs if they have Thunderbolt.

Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234
on all Thunderbolt devices which allows us to recognize them.

Detect presence of this VSEC on device probe and cache it in a newly
added is_thunderbolt bit in struct pci_dev which can then be queried by
pci.c, portdrv, apple-gmux and possibly others.
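
For readers unfamiliar with the extended capability list, the detection
can be sketched in user space over a copy of the 4096-byte config space.
The function name and the array representation are assumptions for
illustration; the header layout (capability ID in bits 15:0, next
pointer in bits 31:20, VSEC ID in bits 15:0 of the dword that follows)
is the standard PCIe one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SIZE          4096   /* PCIe extended config space, in bytes */
#define EXT_CAP_START     0x100  /* first extended capability */
#define EXT_CAP_ID_VNDR   0x000b /* Vendor-Specific Extended Capability */
#define VENDOR_ID_INTEL   0x8086
#define VSEC_ID_INTEL_TBT 0x1234

/* cfg holds the config space as CFG_SIZE / 4 little-endian dwords. */
bool has_thunderbolt_vsec(const uint32_t *cfg)
{
	unsigned int pos = EXT_CAP_START;
	int ttl = (CFG_SIZE - EXT_CAP_START) / 8; /* guards against loops */

	assert(cfg != NULL);

	if ((cfg[0] & 0xffff) != VENDOR_ID_INTEL)
		return false;

	while (pos >= EXT_CAP_START && pos + 4 < CFG_SIZE && ttl-- > 0) {
		uint32_t header = cfg[pos / 4];

		if (header == 0)
			break; /* empty list */

		/* the VSEC header is the dword right after the cap header */
		if ((header & 0xffff) == EXT_CAP_ID_VNDR &&
		    (cfg[pos / 4 + 1] & 0xffff) == VSEC_ID_INTEL_TBT)
			return true;

		pos = (header >> 20) & 0xffc; /* next capability, 0 ends */
	}

	return false;
}
```

Caching the result in a single bit at probe time, as the patch does,
avoids repeating this walk every time a caller needs the answer.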

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.h   |  2 ++
 drivers/pci/probe.c | 17 +++++++++++++++++
 include/linux/pci.h |  1 +
 3 files changed, 20 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9730c47..14ceed7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -6,6 +6,8 @@
 
 #define PCI_FIND_CAP_TTL	48
 
+#define PCI_VSEC_ID_INTEL_TBT	0x1234	/* Thunderbolt */
+
 extern const unsigned char pcie_link_speed[];
 
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ae7daeb..57f175e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1078,6 +1078,20 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
 		pdev->is_hotplug_bridge = 1;
 }
 
+static void parse_pcie_vendor_specific_caps(struct pci_dev *dev)
+{
+	int vsec = 0;
+	u32 header;
+
+	while ((vsec = pci_find_next_ext_capability(dev, vsec,
+						    PCI_EXT_CAP_ID_VNDR))) {
+		pci_read_config_dword(dev, vsec + PCI_VNDR_HEADER, &header);
+		if (dev->vendor == PCI_VENDOR_ID_INTEL &&
+		    PCI_VNDR_HEADER_ID(header) == PCI_VSEC_ID_INTEL_TBT)
+			dev->is_thunderbolt = 1;
+	}
+}
+
 /**
  * pci_ext_cfg_is_aliased - is ext config space just an alias of std config?
  * @dev: PCI device
@@ -1230,6 +1244,9 @@ int pci_setup_device(struct pci_dev *dev)
 	/* need to have dev->class ready */
 	dev->cfg_size = pci_cfg_space_size(dev);
 
+	/* need to have dev->cfg_size ready */
+	parse_pcie_vendor_specific_caps(dev);
+
 	/* "Unknown power state" */
 	dev->current_state = PCI_UNKNOWN;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7081db3f..267564d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -346,6 +346,7 @@ struct pci_dev {
 	unsigned int	is_virtfn:1;
 	unsigned int	reset_fn:1;
 	unsigned int    is_hotplug_bridge:1;
+	unsigned int	is_thunderbolt:1;
 	unsigned int    __aer_firmware_first_valid:1;
 	unsigned int	__aer_firmware_first:1;
 	unsigned int	broken_intx_masking:1;
-- 
2.8.1



* [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
@ 2016-05-13 11:15 Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 01/13] PCI: Recognize Thunderbolt devices Lukas Wunner
                   ` (15 more replies)
  0 siblings, 16 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC
  To: linux-pci, linux-pm
  Cc: Andreas Noever, Rafael J. Wysocki, Alan Stern, Huang Ying

This series powers Thunderbolt controllers on Macs down when nothing is
plugged in, saving 1.7 W on machines with a Light Ridge controller and
reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.

Briefly, a custom ACPI method provided by Apple is used to cut power to
the controller.  A GPE is enabled while the controller is powered down
which side-band signals a plug event, whereupon power is reinstated using
the ACPI method.  Note that even though this mechanism is ACPI-based,
it does not use _PSx methods and is thus entirely nonstandard.


A Thunderbolt controller appears to the OS as a set of PCI devices:  One
NHI (Native Host Interface) and multiple bridges.  Power is cut to the
entire set of devices:

  (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
                                     +-- Downstream Bridge 1 --
                                     +-- Downstream Bridge 2 --
                                     ...

v1 of this series shoehorned power control into the NHI driver.  This
violated the Linux pm model which assumes that a child cannot resume
before its parent.  As seen above, the NHI is a child, so in v1 a child
device cut power to the bridges above it.

v2 resolves this by positioning power control on the controller's
topmost device, which is the upstream bridge.  That is achieved by
binding to it as a Thunderbolt port service driver.  portdrv already
calls down to each service driver on ->suspend and ->resume and I
extended that scheme to further PM callbacks.  E.g. when the upstream
bridge is runtime suspended, portdrv invokes the ->runtime_suspend
callback of each attached service driver, and the Thunderbolt service
driver's callback in turn invokes the ACPI method to cut power to the
controller.
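
The dispatch scheme can be pictured with a small user-space model.  All
names here are hypothetical, and stopping at the first error is an
assumption of the sketch rather than something taken from the portdrv
patches: the port walks its attached services and invokes the callback
of each service that provides one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a port service with an optional PM callback. */
struct service_model {
	const char *name;
	int (*runtime_suspend)(struct service_model *svc); /* may be NULL */
	int calls; /* how often the callback ran */
};

/* Example callback; the real one would invoke the ACPI method. */
int tbt_runtime_suspend(struct service_model *svc)
{
	svc->calls++;
	return 0;
}

/* Models the port driver calling down to each attached service. */
int port_runtime_suspend(struct service_model *svcs, size_t n)
{
	size_t i;

	assert(svcs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int ret;

		if (!svcs[i].runtime_suspend)
			continue; /* service does not implement the callback */

		ret = svcs[i].runtime_suspend(&svcs[i]);
		if (ret)
			return ret; /* assumed: first failure aborts the walk */
	}

	return 0;
}
```

A service without a ->runtime_suspend callback is simply skipped, so
existing services need no changes to coexist with the Thunderbolt one.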


For such a nonstandard ACPI-based PM mechanism one would normally assign
a dev_pm_domain to the upstream bridge which overrides the PCI subsystem
PM callbacks.  But that's not an option here because dev_pm_domain_set()
can only be called during driver probe.  The driver is portdrv and
obviously loads earlier than the thunderbolt port service driver.
So one has to make do with the PCI subsystem PM callbacks.

The PCI core currently assumes that devices can only be put into D3cold
by the platform, i.e. using the standard ACPI _PSx methods.  I extended
the PCI core so that it can deal with devices which are put into D3cold
by the driver callbacks.  It turns out only two changes are needed to
make this work, and they are in patches [09/13] and [10/13].  Runtime
suspend works out of the box, but runtime resume tries to set the device
power state *before* invoking the driver callback, and this goes awry
since the device is still in D3cold.  I solved this by returning an error
in pci_raw_set_power_state() if the device's current_state is D3cold
(patch [09/13]).
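
Since patch [09/13] is not quoted in this part of the thread, the
following is only a guess at the shape of that check, written as a
user-space model.  The struct, the write counter and the -EIO return
value are assumptions; the point is merely that no PM control register
write is attempted while the device is recorded as being in D3cold.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { PS_D0 = 0, PS_D1, PS_D2, PS_D3HOT, PS_D3COLD };

/* Hypothetical model; pmcsr_writes counts PM control register writes. */
struct dev_model {
	int current_state;
	unsigned int pmcsr_writes;
};

int raw_set_power_state(struct dev_model *dev, int state)
{
	assert(dev != NULL);

	/* config space is inaccessible in D3cold, so do not write to it */
	if (dev->current_state == PS_D3COLD)
		return -EIO; /* assumed error code */

	dev->pmcsr_writes++;
	dev->current_state = state;
	return 0;
}
```

In this model, the attempt made before the driver callback fails
harmlessly; once the callback has powered the device up and set
current_state to D0, later state changes go through as usual.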

Theoretically it would also be possible to patch the missing _PSx methods
into the ACPI namespace at runtime but I suspect it wouldn't be pretty:
I think I'd have to include pre-compiled AML methods in the kernel and
modify those blobs at runtime (adjust GPE number etc) before patching
them into the namespace.


To make direct-complete work for such non-platform-power-managed devices,
I also had to modify pci_target_state() (patch [10/13]).

Finally, I wanted to avoid the mandatory runtime resume after direct-
complete which was introduced by Rafael with 58a1fbbb2ee8 ("PM / PCI /
ACPI: Kick devices that might have been reset by firmware"), so I added
the possibility to opt out of it (patch [11/13]).


I've pushed these patches to GitHub where they can be reviewed more
comfortably with green/red highlighting:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v2

For reference, here's a link to v1:
http://thread.gmane.org/gmane.linux.power-management.general/73197

Thanks in advance for your comments.

Lukas


Lukas Wunner (13):
  PCI: Recognize Thunderbolt devices
  PCI: Allow D3 for Thunderbolt ports
  PCI: Add Thunderbolt portdrv service type
  PCI: Generalize portdrv pm iterator
  PCI: Use portdrv pm iterator on further callbacks
  PCI: pciehp: Support runtime pm
  PCI: pciehp: Ignore interrupts during D3cold
  PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs
  PCI: Do not write to PM control register while in D3cold
  PCI: Avoid going from D3cold to D3hot for system sleep
  PM / sleep: Allow opt-out from runtime resume after direct-complete
  thunderbolt: Support runtime pm on upstream bridge
  thunderbolt: Support runtime pm on NHI

 drivers/base/power/generic_ops.c  |   3 +-
 drivers/pci/hotplug/pciehp_ctrl.c |   9 +-
 drivers/pci/hotplug/pciehp_hpc.c  |   4 +
 drivers/pci/pci.c                 |  50 ++----
 drivers/pci/pci.h                 |   2 +
 drivers/pci/pcie/portdrv.h        |   6 +-
 drivers/pci/pcie/portdrv_core.c   |  47 +-----
 drivers/pci/pcie/portdrv_pci.c    |  88 ++++++++--
 drivers/pci/probe.c               |  17 ++
 drivers/thunderbolt/Kconfig       |   4 +-
 drivers/thunderbolt/Makefile      |   4 +-
 drivers/thunderbolt/nhi.c         |  32 +++-
 drivers/thunderbolt/switch.c      |   9 +
 drivers/thunderbolt/tb.c          |  13 ++
 drivers/thunderbolt/upstream.c    | 345 ++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h               |   1 +
 include/linux/pcieport_if.h       |   7 +
 include/linux/pm.h                |   1 +
 18 files changed, 539 insertions(+), 103 deletions(-)
 create mode 100644 drivers/thunderbolt/upstream.c

-- 
2.8.1



* [PATCH v2 04/13] PCI: Generalize portdrv pm iterator
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (4 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 13/13] thunderbolt: Support runtime pm on NHI Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 08/13] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs Lukas Wunner
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever

Move ->suspend and ->resume callbacks from portdrv_core.c to
portdrv_pci.c (where ->resume_noirq, ->runtime_suspend, ->runtime_resume
and ->runtime_idle already are), allowing us to drop their prototypes
from portdrv.h.

Replace suspend_iter() and resume_iter() with a single function
generic_iter() with the intention of using it in further pm callbacks.

Rename pcie_port_device_(suspend|resume) to pcie_port_(suspend|resume)
to be consistent with the existing pm callbacks.

Replace the somewhat terse kerneldoc for pcie_port_(suspend|resume) with
generic documentation which applies to all pm callbacks.

No functional change intended.

(Okay, there *is* one functional change: generic_iter() returns the
result of the service driver's callback whereas suspend_iter() and
resume_iter() always returned 0.  That was a bug since we never
propagated errors that occurred in the service driver callbacks back to
the pm core.  The bug never manifested itself because PME's and
Hotplug's pm callbacks always return 0, AER doesn't declare pm callbacks
and VC has no service driver.  So there's no *behavioral* change right
now.)
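
For readers unfamiliar with the offsetof() trick used by generic_iter(),
here is a minimal userspace sketch of the same idea.  It is not the
kernel code: struct svc_device, struct svc_driver and all function names
below are hypothetical stand-ins, and the loop merely imitates how
device_for_each_child() stops at the first non-zero return value.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct pcie_device and
 * struct pcie_port_service_driver. */
struct svc_device {
	int id;
};

typedef int (*svc_pm_callback_t)(struct svc_device *);

struct svc_driver {
	const char *name;
	svc_pm_callback_t suspend;
	svc_pm_callback_t resume;
};

/* Fetch the callback stored at @offset inside @drv and call it if set. */
static int call_at_offset(struct svc_driver *drv, struct svc_device *dev,
			  size_t offset)
{
	svc_pm_callback_t cb;

	cb = *(svc_pm_callback_t *)((char *)drv + offset);
	return cb ? cb(dev) : 0;
}

/* Walk all drivers and abort on the first non-zero return value. */
int for_each_service(struct svc_driver **drvs, size_t n,
		     struct svc_device *dev, size_t offset)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = call_at_offset(drvs[i], dev, offset);
		if (ret)
			return ret;
	}
	return 0;
}

/* Sample callbacks: one succeeds and counts its calls, one fails. */
int ok_cb(struct svc_device *dev)
{
	dev->id++;
	return 0;
}

int fail_cb(struct svc_device *dev)
{
	(void)dev;
	return -5;
}
```

A caller selects the callback with e.g. offsetof(struct svc_driver,
suspend), so one iterator serves every callback type that shares the
same signature.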

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/portdrv.h      |  4 ----
 drivers/pci/pcie/portdrv_core.c | 45 -------------------------------------
 drivers/pci/pcie/portdrv_pci.c  | 49 ++++++++++++++++++++++++++++++++++++-----
 3 files changed, 43 insertions(+), 55 deletions(-)

diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index a0d9973..9f21926 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -22,10 +22,6 @@
 
 extern struct bus_type pcie_port_bus_type;
 int pcie_port_device_register(struct pci_dev *dev);
-#ifdef CONFIG_PM
-int pcie_port_device_suspend(struct device *dev);
-int pcie_port_device_resume(struct device *dev);
-#endif
 void pcie_port_device_remove(struct pci_dev *dev);
 int __must_check pcie_port_bus_register(void);
 void pcie_port_bus_unregister(void);
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 8cd9db8..3621f96 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -414,51 +414,6 @@ error_disable:
 	return status;
 }
 
-#ifdef CONFIG_PM
-static int suspend_iter(struct device *dev, void *data)
-{
-	struct pcie_port_service_driver *service_driver;
-
-	if ((dev->bus == &pcie_port_bus_type) && dev->driver) {
-		service_driver = to_service_driver(dev->driver);
-		if (service_driver->suspend)
-			service_driver->suspend(to_pcie_device(dev));
-	}
-	return 0;
-}
-
-/**
- * pcie_port_device_suspend - suspend port services associated with a PCIe port
- * @dev: PCI Express port to handle
- */
-int pcie_port_device_suspend(struct device *dev)
-{
-	return device_for_each_child(dev, NULL, suspend_iter);
-}
-
-static int resume_iter(struct device *dev, void *data)
-{
-	struct pcie_port_service_driver *service_driver;
-
-	if ((dev->bus == &pcie_port_bus_type) &&
-	    (dev->driver)) {
-		service_driver = to_service_driver(dev->driver);
-		if (service_driver->resume)
-			service_driver->resume(to_pcie_device(dev));
-	}
-	return 0;
-}
-
-/**
- * pcie_port_device_resume - resume port services associated with a PCIe port
- * @dev: PCI Express port to handle
- */
-int pcie_port_device_resume(struct device *dev)
-{
-	return device_for_each_child(dev, NULL, resume_iter);
-}
-#endif /* PM */
-
 static int remove_iter(struct device *dev, void *data)
 {
 	if (dev->bus == &pcie_port_bus_type)
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index cd41360..acbd1d2 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -79,6 +79,43 @@ static int pcie_portdrv_restore_config(struct pci_dev *dev)
 }
 
 #ifdef CONFIG_PM
+typedef int (*pcie_pm_callback_t)(struct pcie_device *);
+
+static int generic_iter(struct device *dev, void *data)
+{
+	struct pcie_port_service_driver *service_driver;
+	size_t offset = *(size_t *)data;
+	pcie_pm_callback_t cb;
+
+	if ((dev->bus == &pcie_port_bus_type) && dev->driver) {
+		service_driver = to_service_driver(dev->driver);
+		cb = *(pcie_pm_callback_t *)((void *)service_driver + offset);
+		if (cb)
+			return cb(to_pcie_device(dev));
+	}
+	return 0;
+}
+
+/*
+ * The PM callbacks iterate over the port services allocated for the PCIe port
+ * and call down to each of them. Execution is aborted as soon as one of them
+ * returns a non-zero value.
+ *
+ * The return value is 0 if all port services' callbacks returned 0, otherwise
+ * it is the return value of the last callback executed.
+ */
+static int pcie_port_suspend(struct device *dev)
+{
+	size_t o = offsetof(struct pcie_port_service_driver, suspend);
+	return device_for_each_child(dev, &o, generic_iter);
+}
+
+static int pcie_port_resume(struct device *dev)
+{
+	size_t o = offsetof(struct pcie_port_service_driver, resume);
+	return device_for_each_child(dev, &o, generic_iter);
+}
+
 static int pcie_port_resume_noirq(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -114,12 +151,12 @@ static int pcie_port_runtime_idle(struct device *dev)
 }
 
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
-	.suspend	= pcie_port_device_suspend,
-	.resume		= pcie_port_device_resume,
-	.freeze		= pcie_port_device_suspend,
-	.thaw		= pcie_port_device_resume,
-	.poweroff	= pcie_port_device_suspend,
-	.restore	= pcie_port_device_resume,
+	.suspend	= pcie_port_suspend,
+	.resume		= pcie_port_resume,
+	.freeze		= pcie_port_suspend,
+	.thaw		= pcie_port_resume,
+	.poweroff	= pcie_port_suspend,
+	.restore	= pcie_port_resume,
 	.resume_noirq	= pcie_port_resume_noirq,
 	.runtime_suspend = pcie_port_runtime_suspend,
 	.runtime_resume	= pcie_port_runtime_resume,
-- 
2.8.1



* [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 01/13] PCI: Recognize Thunderbolt devices Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-06-17 22:51   ` Bjorn Helgaas
  2016-05-13 11:15 ` [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold Lukas Wunner
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever

A Thunderbolt controller is a PCIe switch which, as defined in the PCIe
spec, appears to the OS "as a collection of virtual PCI-to-PCI bridges".

We're about to add support for Apple's nonstandard ACPI methods to power
Thunderbolt controllers up and down.  To facilitate that, allocate a
port service for every PCI bridge belonging to a Thunderbolt controller.

This port service might come in handy for other use cases, e.g. device
initialization of Thunderbolt controllers.

To understand when and how this port service will be allocated, consider
the PCI devices exposed by a Thunderbolt host controller:

  (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
                                     +-- Downstream Bridge 1 --
                                     +-- Downstream Bridge 2 --
                                     ...

The upstream and downstream bridges represent the PCIe switch and a
Thunderbolt port service will be allocated for each of them.  Hotplugged
devices will appear behind the downstream bridges.  The NHI (Native Host
Interface) is a virtual PCI device to manage the switch fabric and is
not relevant here.  It uses class 0x88000, so it is not a PCIe port.

Next, consider the PCI devices exposed by Thunderbolt controllers built
into hotplugged devices:

  -- Upstream Bridge ---- Downstream Bridge ---- Hotplugged device

Again, Thunderbolt port services will be allocated for the upstream and
downstream bridge, but not for the hotplugged device, which might use
e.g. class 0x20000 if it's a Thunderbolt Ethernet adapter.
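
As a rough illustration of how the new service bit interacts with the
bumped PCIE_PORT_DEVICE_MAXSERVICES, consider the following userspace
sketch.  The DPC and TBT defines are copied from this patch; struct
fake_port and both functions are hypothetical, and the assumption that
portdrv allocates one service device per set bit below
PCIE_PORT_DEVICE_MAXSERVICES is a simplification of what portdrv_core.c
does.

```c
#include <stdbool.h>

/* Values as defined by this patch. */
#define PCIE_PORT_SERVICE_DPC_SHIFT	4	/* Downstream Port Containment */
#define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
#define PCIE_PORT_SERVICE_TBT_SHIFT	5	/* Thunderbolt */
#define PCIE_PORT_SERVICE_TBT		(1 << PCIE_PORT_SERVICE_TBT_SHIFT)
#define PCIE_PORT_DEVICE_MAXSERVICES	6

/* Hypothetical, reduced stand-in for struct pci_dev. */
struct fake_port {
	bool has_dpc_cap;
	bool is_thunderbolt;
};

/* Build the service bitmask, loosely modelled on
 * get_port_device_capability(). */
int fake_port_capability(const struct fake_port *port)
{
	int services = 0;

	if (port->has_dpc_cap)
		services |= PCIE_PORT_SERVICE_DPC;
	if (port->is_thunderbolt)
		services |= PCIE_PORT_SERVICE_TBT;

	return services;
}

/* Count the service devices that would be allocated: one per set bit. */
int fake_port_count_services(int services)
{
	int i, n = 0;

	for (i = 0; i < PCIE_PORT_DEVICE_MAXSERVICES; i++)
		if (services & (1 << i))
			n++;

	return n;
}
```

With the limit left at 5, the loop in this sketch would never reach
bit 5, which is why the define has to grow along with the new shift.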

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/portdrv.h      | 2 +-
 drivers/pci/pcie/portdrv_core.c | 2 ++
 include/linux/pcieport_if.h     | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 587aef3..a0d9973 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -11,7 +11,7 @@
 
 #include <linux/compiler.h>
 
-#define PCIE_PORT_DEVICE_MAXSERVICES   5
+#define PCIE_PORT_DEVICE_MAXSERVICES	6
 /*
  * According to the PCI Express Base Specification 2.0, the indices of
  * the MSI-X table entries used by port services must not exceed 31
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index d04fb58..8cd9db8 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -310,6 +310,8 @@ static int get_port_device_capability(struct pci_dev *dev)
 	}
 	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC))
 		services |= PCIE_PORT_SERVICE_DPC;
+	if (dev->is_thunderbolt)
+		services |= PCIE_PORT_SERVICE_TBT;
 
 	return services;
 }
diff --git a/include/linux/pcieport_if.h b/include/linux/pcieport_if.h
index afcd130..d205bd6 100644
--- a/include/linux/pcieport_if.h
+++ b/include/linux/pcieport_if.h
@@ -23,6 +23,8 @@
 #define PCIE_PORT_SERVICE_VC		(1 << PCIE_PORT_SERVICE_VC_SHIFT)
 #define PCIE_PORT_SERVICE_DPC_SHIFT	4	/* Downstream Port Containment */
 #define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
+#define PCIE_PORT_SERVICE_TBT_SHIFT	5	/* Thunderbolt */
+#define PCIE_PORT_SERVICE_TBT		(1 << PCIE_PORT_SERVICE_TBT_SHIFT)
 
 struct pcie_device {
 	int		irq;	    /* Service IRQ/MSI/MSI-X Vector */
-- 
2.8.1



* [PATCH v2 08/13] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (5 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 04/13] PCI: Generalize portdrv pm iterator Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-06-14  9:08   ` [PATCH v2 08/13 REBASED] " Lukas Wunner
  2016-06-17 21:53   ` [PATCH v2 08/13] " Bjorn Helgaas
  2016-05-13 11:15 ` [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep Lukas Wunner
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever

Thunderbolt controllers have a pin to signal plug events while the
controller is powered down.  On Macs this pin is wired to the
southbridge and causes a GPE to be fired.  The OS may then power up the
controller to probe the newly connected device.  It is thus okay to let
Thunderbolt hotplug ports go to D3 on Macs.
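
The resulting condition can be written out as a small predicate.  This
is only a userspace sketch: struct fake_port, vendor_is_apple() and
port_may_runtime_suspend() are hypothetical names, and the string
compare stands in for dmi_match(DMI_SYS_VENDOR, "Apple Inc.").

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct pci_dev. */
struct fake_port {
	bool is_hotplug_bridge;
	bool is_thunderbolt;
};

/* Stand-in for dmi_match(): exact match on the system vendor string. */
static bool vendor_is_apple(const char *sys_vendor)
{
	return sys_vendor && strcmp(sys_vendor, "Apple Inc.") == 0;
}

/* Non-hotplug ports may suspend; hotplug ports only if they are
 * Thunderbolt ports on a Mac. */
bool port_may_runtime_suspend(const struct fake_port *port,
			      const char *sys_vendor)
{
	return !port->is_hotplug_bridge ||
	       (port->is_thunderbolt && vendor_is_apple(sys_vendor));
}
```

Note that a Thunderbolt hotplug port on a non-Apple machine is still
kept awake, since nothing is known about its side-band signaling there.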

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/portdrv_pci.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index f75d4b5..7860ab3 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -224,9 +224,12 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 	 * to enumerate devices behind this port properly (the port is
 	 * powered down preventing all config space accesses to the
 	 * subordinate devices).  We can't be sure for native PCIe hotplug
-	 * either so prevent that as well.
+	 * either so prevent that as well.  However Thunderbolt controllers
+	 * on Macs are capable of side-band signaling plug events while
+	 * powered down, so allow them to suspend.
 	 */
-	if (!dev->is_hotplug_bridge) {
+	if (!dev->is_hotplug_bridge ||
+	    (dev->is_thunderbolt && dmi_match(DMI_SYS_VENDOR, "Apple Inc."))) {
 		/*
 		 * Keep the port resumed 10ms to make sure things like
 		 * config space accesses from userspace (lspci) will not
@@ -243,7 +246,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
-	if (!dev->is_hotplug_bridge) {
+	if (!dev->is_hotplug_bridge ||
+	    (dev->is_thunderbolt && dmi_match(DMI_SYS_VENDOR, "Apple Inc."))) {
 		pm_runtime_forbid(&dev->dev);
 		pm_runtime_get_noresume(&dev->dev);
 		pm_runtime_dont_use_autosuspend(&dev->dev);
-- 
2.8.1



* [PATCH v2 13/13] thunderbolt: Support runtime pm on NHI
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (3 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 04/13] PCI: Generalize portdrv pm iterator Lukas Wunner
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever

Runtime suspend the NHI when no Thunderbolt devices have been plugged in
for 10 sec (user-configurable via autosuspend_delay_ms in sysfs).

The NHI is not able to detect plug events while suspended; it relies on
the upstream bridge to resume it on hotplug.

After the NHI resumes, it takes about 700 ms until a hotplug event
appears on the RX ring.  In case autosuspend_delay_ms has been reduced
to 0 by the user, we need to wait in tb_resume() to avoid going back to
sleep before we have had a chance to detect the hotplugged device.  A runtime
pm ref is held for the duration of tb_handle_hotplug() to keep the NHI
awake while the hotplug event is processed.

Apart from that we acquire a runtime pm ref for each newly allocated
switch (except for the root switch) and drop one when a switch is freed,
thereby ensuring the NHI stays active as long as devices are plugged in.
This behaviour is identical to that of the OS X driver.
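
The reference counting scheme can be modelled in a few lines of
userspace C.  This is a sketch only: struct fake_nhi, struct fake_switch
and the functions below are hypothetical, a plain counter stands in for
the runtime pm usage count, and an is_root flag replaces the
comparisons against tb->root_switch used in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the NHI's runtime pm usage count. */
struct fake_nhi {
	int usage_count;
};

struct fake_switch {
	bool is_root;
	struct fake_nhi *nhi;
};

/* Every switch except the root switch pins the NHI. */
void fake_switch_alloc(struct fake_switch *sw, struct fake_nhi *nhi,
		       bool is_root)
{
	sw->nhi = nhi;
	sw->is_root = is_root;
	if (!is_root)
		nhi->usage_count++;
}

/* Drop the reference taken at allocation time. */
void fake_switch_free(struct fake_switch *sw)
{
	if (!sw->is_root)
		sw->nhi->usage_count--;
}

/* The NHI may only autosuspend once no plugged-in switch pins it. */
bool fake_nhi_may_autosuspend(const struct fake_nhi *nhi)
{
	return nhi->usage_count == 0;
}
```

The root switch is exempt because it is always present; if it held a
reference, the NHI could never suspend.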

Cc: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/thunderbolt/nhi.c    | 11 +++++++++++
 drivers/thunderbolt/switch.c |  9 +++++++++
 drivers/thunderbolt/tb.c     | 13 +++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index d54666e..b44415c 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -606,6 +606,12 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 	pci_set_drvdata(pdev, tb);
 
+	pm_runtime_allow(&pdev->dev);
+	pm_runtime_set_autosuspend_delay(&pdev->dev, 10000);
+	pm_runtime_use_autosuspend(&pdev->dev);
+	pm_runtime_mark_last_busy(&pdev->dev);
+	pm_runtime_put_autosuspend(&pdev->dev);
+
 	return 0;
 }
 
@@ -613,6 +619,9 @@ static void nhi_remove(struct pci_dev *pdev)
 {
 	struct tb *tb = pci_get_drvdata(pdev);
 	struct tb_nhi *nhi = tb->nhi;
+
+	pm_runtime_get(&pdev->dev);
+	pm_runtime_forbid(&pdev->dev);
 	thunderbolt_shutdown_and_free(tb);
 	nhi_shutdown(nhi);
 }
@@ -630,6 +639,8 @@ static const struct dev_pm_ops nhi_pm_ops = {
 					    * pci-tunnels stay alive.
 					    */
 	.restore_noirq = nhi_resume_noirq,
+	.runtime_suspend = nhi_suspend_noirq,
+	.runtime_resume = nhi_resume_noirq,
 };
 
 struct pci_device_id nhi_ids[] = {
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index 1e116f5..89b4622 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/pm_runtime.h>
 #include <linux/slab.h>
 
 #include "tb.h"
@@ -326,6 +327,11 @@ void tb_switch_free(struct tb_switch *sw)
 	if (!sw->is_unplugged)
 		tb_plug_events_active(sw, false);
 
+	if (sw != sw->tb->root_switch) {
+		pm_runtime_mark_last_busy(&sw->tb->nhi->pdev->dev);
+		pm_runtime_put_autosuspend(&sw->tb->nhi->pdev->dev);
+	}
+
 	kfree(sw->ports);
 	kfree(sw->drom);
 	kfree(sw);
@@ -418,6 +424,9 @@ struct tb_switch *tb_switch_alloc(struct tb *tb, u64 route)
 	if (tb_plug_events_active(sw, true))
 		goto err;
 
+	if (tb->root_switch)
+		pm_runtime_get(&tb->nhi->pdev->dev);
+
 	return sw;
 err:
 	kfree(sw->ports);
diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index 24b6d30..a3fedf9 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -7,6 +7,7 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/delay.h>
+#include <linux/pm_runtime.h>
 
 #include "tb.h"
 #include "tb_regs.h"
@@ -217,8 +218,11 @@ static void tb_handle_hotplug(struct work_struct *work)
 {
 	struct tb_hotplug_event *ev = container_of(work, typeof(*ev), work);
 	struct tb *tb = ev->tb;
+	struct device *dev = &tb->nhi->pdev->dev;
 	struct tb_switch *sw;
 	struct tb_port *port;
+
+	pm_runtime_get(dev);
 	mutex_lock(&tb->lock);
 	if (!tb->hotplug_active)
 		goto out; /* during init, suspend or shutdown */
@@ -274,6 +278,8 @@ static void tb_handle_hotplug(struct work_struct *work)
 out:
 	mutex_unlock(&tb->lock);
 	kfree(ev);
+	pm_runtime_mark_last_busy(dev);
+	pm_runtime_put_autosuspend(dev);
 }
 
 /**
@@ -433,4 +439,11 @@ void thunderbolt_resume(struct tb *tb)
 	tb->hotplug_active = true;
 	mutex_unlock(&tb->lock);
 	tb_info(tb, "resume finished\n");
+
+	/*
+	 * If runtime resuming due to a hotplug event (rather than resuming
+	 * from system sleep), wait for it to arrive. May take about 700 ms.
+	 */
+	if (tb->nhi->pdev->dev.power.runtime_status == RPM_RESUMING)
+		msleep(1000);
 }
-- 
2.8.1



* [PATCH v2 05/13] PCI: Use portdrv pm iterator on further callbacks
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (8 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 06/13] PCI: pciehp: Support runtime pm Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 02/13] PCI: Allow D3 for Thunderbolt ports Lukas Wunner
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever

We already allow service drivers to declare ->suspend and ->resume
callbacks which get called one by one when the port is suspended or
resumed.

Allow the same for ->prepare, ->complete, ->resume_noirq,
->runtime_suspend and ->runtime_resume.

Call pcie_port_resume_noirq() also on ->restore_noirq.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/portdrv_pci.c | 29 ++++++++++++++++++++++++++---
 include/linux/pcieport_if.h    |  5 +++++
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index acbd1d2..f75d4b5 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -104,6 +104,18 @@ static int generic_iter(struct device *dev, void *data)
  * The return value is 0 if all port services' callbacks returned 0, otherwise
  * it is the return value of the last callback executed.
  */
+static int pcie_port_prepare(struct device *dev)
+{
+	size_t o = offsetof(struct pcie_port_service_driver, prepare);
+	return device_for_each_child(dev, &o, generic_iter);
+}
+
+static void pcie_port_complete(struct device *dev)
+{
+	size_t o = offsetof(struct pcie_port_service_driver, complete);
+	device_for_each_child(dev, &o, generic_iter);
+}
+
 static int pcie_port_suspend(struct device *dev)
 {
 	size_t o = offsetof(struct pcie_port_service_driver, suspend);
@@ -118,6 +130,7 @@ static int pcie_port_resume(struct device *dev)
 
 static int pcie_port_resume_noirq(struct device *dev)
 {
+	size_t o = offsetof(struct pcie_port_service_driver, resume_noirq);
 	struct pci_dev *pdev = to_pci_dev(dev);
 
 	/*
@@ -127,17 +140,24 @@ static int pcie_port_resume_noirq(struct device *dev)
 	 */
 	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
 		pcie_clear_root_pme_status(pdev);
-	return 0;
+
+	return device_for_each_child(dev, &o, generic_iter);
 }
 
 static int pcie_port_runtime_suspend(struct device *dev)
 {
-	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
+	size_t o = offsetof(struct pcie_port_service_driver, runtime_suspend);
+
+	if (!to_pci_dev(dev)->bridge_d3)
+		return -EBUSY;
+
+	return device_for_each_child(dev, &o, generic_iter);
 }
 
 static int pcie_port_runtime_resume(struct device *dev)
 {
-	return 0;
+	size_t o = offsetof(struct pcie_port_service_driver, runtime_resume);
+	return device_for_each_child(dev, &o, generic_iter);
 }
 
 static int pcie_port_runtime_idle(struct device *dev)
@@ -151,6 +171,8 @@ static int pcie_port_runtime_idle(struct device *dev)
 }
 
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
+	.prepare	= pcie_port_prepare,
+	.complete	= pcie_port_complete,
 	.suspend	= pcie_port_suspend,
 	.resume		= pcie_port_resume,
 	.freeze		= pcie_port_suspend,
@@ -158,6 +180,7 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.poweroff	= pcie_port_suspend,
 	.restore	= pcie_port_resume,
 	.resume_noirq	= pcie_port_resume_noirq,
+	.restore_noirq	= pcie_port_resume_noirq,
 	.runtime_suspend = pcie_port_runtime_suspend,
 	.runtime_resume	= pcie_port_runtime_resume,
 	.runtime_idle	= pcie_port_runtime_idle,
diff --git a/include/linux/pcieport_if.h b/include/linux/pcieport_if.h
index d205bd6..092517d 100644
--- a/include/linux/pcieport_if.h
+++ b/include/linux/pcieport_if.h
@@ -49,8 +49,13 @@ struct pcie_port_service_driver {
 	const char *name;
 	int (*probe) (struct pcie_device *dev);
 	void (*remove) (struct pcie_device *dev);
+	int (*prepare) (struct pcie_device *dev);
+	int (*complete) (struct pcie_device *dev);
 	int (*suspend) (struct pcie_device *dev);
 	int (*resume) (struct pcie_device *dev);
+	int (*resume_noirq) (struct pcie_device *dev);
+	int (*runtime_suspend) (struct pcie_device *dev);
+	int (*runtime_resume) (struct pcie_device *dev);
 
 	/* Service Error Recovery Handler */
 	const struct pci_error_handlers *err_handler;
-- 
2.8.1



* [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (2 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-06-17 21:18   ` Bjorn Helgaas
  2016-07-18 13:55   ` Rafael J. Wysocki
  2016-05-13 11:15 ` [PATCH v2 13/13] thunderbolt: Support runtime pm on NHI Lukas Wunner
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever, Huang Ying, Rafael J. Wysocki

The PM control register is not accessible in D3cold, so
pci_raw_set_power_state() should not try to write to it and should
return an error instead.

The current behaviour is fatal for devices which are not power-managed
by the platform, yet can be runtime suspended to D3cold with some other
mechanism by the driver:

- When the device is runtime resumed, pci_pm_runtime_resume() first
  calls pci_restore_standard_config() which calls pci_set_power_state()
  which calls pci_raw_set_power_state() to put the device into D0.
  This fails since the device is still in D3cold.  It will be powered up
  later on when pci_pm_runtime_resume() calls the driver's
  ->runtime_resume callback.

- pci_raw_set_power_state() prints a message that the device refused to
  change power state and returns 0.  Further up in the call stack,
  pci_restore_standard_config() calls pci_restore_state(), which fails
  since the device is in D3cold, but nevertheless invalidates the
  saved_state.

- When pci_pm_runtime_resume() finally calls the driver ->runtime_resume
  callback to power up the device, the saved_state is gone.

Return an error from pci_raw_set_power_state() to avoid this.
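
The failure sequence above can be reproduced with a simplified model.
Everything below is a hypothetical userspace sketch, not the code in
drivers/pci/pci.c: the fake_* names are invented, the state_saved flag
stands in for the saved_state, and the reject_d3cold parameter merely
switches between the old and the proposed behaviour.

```c
#include <errno.h>
#include <stdbool.h>

enum fake_state { FAKE_D0, FAKE_D3HOT, FAKE_D3COLD };

/* Hypothetical, reduced stand-in for struct pci_dev. */
struct fake_dev {
	enum fake_state current_state;
	bool state_saved;
};

/* Models pci_raw_set_power_state() for a transition to D0. */
int fake_raw_set_power_state(struct fake_dev *dev, bool reject_d3cold)
{
	if (dev->current_state == FAKE_D3COLD) {
		if (reject_d3cold)
			return -EIO;	/* proposed: report the failure */
		return 0;		/* old: complain in the log, return 0 */
	}
	dev->current_state = FAKE_D0;
	return 0;
}

/* Models pci_restore_standard_config(): restore only if the power
 * state change was reported as successful. */
int fake_restore_standard_config(struct fake_dev *dev, bool reject_d3cold)
{
	int error = fake_raw_set_power_state(dev, reject_d3cold);

	if (error)
		return error;

	/* Restoring consumes the saved state even if nothing was written. */
	dev->state_saved = false;
	return 0;
}
```

In this model the old behaviour loses the saved state for a device in
D3cold, whereas the proposed behaviour keeps it intact for the driver's
->runtime_resume callback.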

Examples of devices affected by this are Thunderbolt controllers
built into Macs which can be put into D3cold with nonstandard ACPI
methods.

Unfortunately we rely on pci_raw_set_power_state()'s current behaviour
in several places: When platform_pci_set_power_state() is called to wake
a device from D3cold, its current_state is not updated even though it is
no longer in D3cold.  Instead, pci_raw_set_power_state() is assumed to
clean up the resulting incongruence.  Fix by setting current_state to
PCI_UNKNOWN after platform_pci_set_power_state().

Also, when a bridge is put into D3cold, its children's current_state is
changed to D3cold in __pci_complete_power_transition().  (Introduced by
commit 448bd857d48e ("PCI/PM: add PCIe runtime D3cold support").) This
doesn't necessarily reflect the children's actual power state: They may
still be powered on, they're just no longer accessible.  However this
shouldn't be a concern because if the children are accessed, their
parent needs to resume anyway and the PM core takes care of this.
Again, pci_raw_set_power_state() is relied upon to clean up the
current_state when the children are resumed the next time.  We cannot
reliably reconstruct the children's current_state when resuming their
parent.  We also shouldn't blindly set it to PCI_UNKNOWN since some
children may actually be turned off and D3cold is their correct
current_state.  Therefore fix by no longer touching the children's
current_state at all.

Cc: Huang Ying <ying.huang@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.c | 43 ++++++++++---------------------------------
 1 file changed, 10 insertions(+), 33 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 95727b4..791dfe7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -612,6 +612,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
 	if (!dev->pm_cap)
 		return -EIO;
 
+	if (dev->current_state == PCI_D3cold)
+		return -EIO;
+
 	if (state < PCI_D0 || state > PCI_D3hot)
 		return -EINVAL;
 
@@ -728,8 +731,10 @@ void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
  */
 void pci_power_up(struct pci_dev *dev)
 {
-	if (platform_pci_power_manageable(dev))
+	if (platform_pci_power_manageable(dev)) {
 		platform_pci_set_power_state(dev, PCI_D0);
+		dev->current_state = PCI_UNKNOWN;
+	}
 
 	pci_raw_set_power_state(dev, PCI_D0);
 	pci_update_current_state(dev, PCI_D0);
@@ -746,8 +751,10 @@ static int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state)
 
 	if (platform_pci_power_manageable(dev)) {
 		error = platform_pci_set_power_state(dev, state);
-		if (!error)
+		if (!error) {
+			dev->current_state = PCI_UNKNOWN;
 			pci_update_current_state(dev, state);
+		}
 	} else
 		error = -ENODEV;
 
@@ -809,30 +816,6 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
 }
 
 /**
- * __pci_dev_set_current_state - Set current state of a PCI device
- * @dev: Device to handle
- * @data: pointer to state to be set
- */
-static int __pci_dev_set_current_state(struct pci_dev *dev, void *data)
-{
-	pci_power_t state = *(pci_power_t *)data;
-
-	dev->current_state = state;
-	return 0;
-}
-
-/**
- * __pci_bus_set_current_state - Walk given bus and set current state of devices
- * @bus: Top bus of the subtree to walk.
- * @state: state to be set
- */
-static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
-{
-	if (bus)
-		pci_walk_bus(bus, __pci_dev_set_current_state, &state);
-}
-
-/**
  * __pci_complete_power_transition - Complete power transition of a PCI device
  * @dev: PCI device to handle.
  * @state: State to put the device into.
@@ -841,15 +824,9 @@ static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
  */
 int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state)
 {
-	int ret;
-
 	if (state <= PCI_D0)
 		return -EINVAL;
-	ret = pci_platform_power_transition(dev, state);
-	/* Power off the bridge may power off the whole hierarchy */
-	if (!ret && state == PCI_D3cold)
-		__pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
-	return ret;
+	return pci_platform_power_transition(dev, state);
 }
 EXPORT_SYMBOL_GPL(__pci_complete_power_transition);
 
-- 
2.8.1



* [PATCH v2 02/13] PCI: Allow D3 for Thunderbolt ports
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (9 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 05/13] PCI: Use portdrv pm iterator on further callbacks Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 12/13] thunderbolt: Support runtime pm on upstream bridge Lukas Wunner
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever

Currently PCIe ports are only allowed to go to D3 if the BIOS is dated
2015 or newer to avoid potential issues with old chipsets.  However for
Thunderbolt we know that even the oldest controller, Light Ridge (2010),
is able to suspend its ports to D3 just fine.

We're about to add runtime PM for Thunderbolt on the Mac.  Apple has
released two EFI security updates in 2015 which encompass all machines
with Thunderbolt, but the achieved power saving should be made available
to users even if they haven't updated their BIOS.  To this end,
special-case Thunderbolt in pci_bridge_d3_possible().

This allows the Thunderbolt controller to power down but the root port
to which the Thunderbolt controller is attached remains in D0 unless
the EFI update is installed.  Users can pass pcie_port_pm=force on the
kernel command line if they cannot install the EFI update but still want
to benefit from the additional power saving of putting the root port to
D3.  In practice root ports can be suspended to D3 without problems at
least on 2012 Ivy Bridge machines.

If the BIOS cut-off date is ever lowered to 2010, the Thunderbolt
special case can be removed.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b2eb530f..95727b4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2176,7 +2176,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
  * @bridge: Bridge to check
  *
  * This function checks if it is possible to move the bridge to D3.
- * Currently we only allow D3 for recent enough PCIe ports.
+ * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt.
  */
 static bool pci_bridge_d3_possible(struct pci_dev *bridge)
 {
@@ -2202,6 +2202,9 @@ static bool pci_bridge_d3_possible(struct pci_dev *bridge)
 		    year >= 2015) {
 			return true;
 		}
+		/* Even the oldest 2010 Thunderbolt controller supports D3. */
+		if (bridge->is_thunderbolt)
+			return true;
 		break;
 	}
 
-- 
2.8.1
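
For readers who want the resulting policy without the diff context, it can
be sketched in userspace C.  This is only a model under stated assumptions:
struct bridge_info and bridge_d3_possible() are hypothetical names, the
pm_forced flag stands in for pcie_port_pm=force, and the real
pci_bridge_d3_possible() performs further checks that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few struct pci_dev fields the check reads. */
struct bridge_info {
	bool is_pcie_port;	/* root, upstream or downstream port */
	bool is_thunderbolt;	/* mirrors the is_thunderbolt flag used in the patch */
};

/*
 * Simplified model of the decision: a port may go to D3 if the user forced
 * it, if the BIOS is dated 2015 or newer, or if it is a Thunderbolt port.
 * bios_year == 0 models an unknown BIOS date.
 */
static bool bridge_d3_possible(const struct bridge_info *bridge,
			       bool pm_forced, int bios_year)
{
	if (!bridge->is_pcie_port)
		return false;
	if (pm_forced)
		return true;
	if (bios_year >= 2015)
		return true;
	/* Even the oldest 2010 Thunderbolt controller supports D3. */
	return bridge->is_thunderbolt;
}
```

With this model, a Thunderbolt port on a machine with a 2011 BIOS is
allowed to go to D3, whereas a non-Thunderbolt root port on the same
machine is not unless pm_forced is set.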



* [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
  2016-05-13 11:15 ` [PATCH v2 01/13] PCI: Recognize Thunderbolt devices Lukas Wunner
@ 2016-05-13 11:15 ` Lukas Wunner
  2016-07-18 13:18   ` Rafael J. Wysocki
  2016-05-13 11:15 ` [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type Lukas Wunner
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-05-13 11:15 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever, Rafael J. Wysocki, Alan Stern

Since commit aae4518b3124 ("PM / sleep: Mechanism to avoid resuming
runtime-suspended devices unnecessarily"), we no longer wake up devices
which are already runtime suspended upon entering system sleep
("direct-complete").

However commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might
have been reset by firmware") changed this to mandatorily runtime resume
such devices after the system is woken.  The motivation was to ensure
that devices do not remain in a reset-power-on state after system
resume, potentially preventing deep SoC-wide low-power states from being
entered on idle.

This is counter-productive for devices of which we know that the
mandatory runtime resume is unnecessary.  Thunderbolt on the Mac is a
case in point: Runtime resume not just powers up the controller, but
multiple adjacent chips, including a 15V boost converter, multiplexers
and an eeprom.  Gratuitously powering this up after every system sleep
burns a not insignificant amount of energy and needlessly strains the
hardware.

Perhaps it would have been better to carry out the mandatory runtime
resume only for those devices that actually need it, but at least we
should allow an opt-out.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/base/power/generic_ops.c | 3 ++-
 include/linux/pm.h               | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
index 07c3c4a..6e88f55 100644
--- a/drivers/base/power/generic_ops.c
+++ b/drivers/base/power/generic_ops.c
@@ -316,7 +316,8 @@ void pm_complete_with_resume_check(struct device *dev)
 	 * the sleep state it is going out of and it has never been resumed till
 	 * now, resume it in case the firmware powered it up.
 	 */
-	if (dev->power.direct_complete && pm_resume_via_firmware())
+	if (dev->power.direct_complete && pm_resume_via_firmware() &&
+	    !dev->power.direct_complete_noresume)
 		pm_request_resume(dev);
 }
 EXPORT_SYMBOL_GPL(pm_complete_with_resume_check);
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 6a5d654..023de94 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -596,6 +596,7 @@ struct dev_pm_info {
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
 	unsigned int		memalloc_noio:1;
+	unsigned int		direct_complete_noresume:1;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
 	int			runtime_error;
-- 
2.8.1
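
The condition changed in pm_complete_with_resume_check() can be modelled
as a small predicate.  The sketch below is a userspace illustration, not
kernel code: struct pm_flags and needs_resume_after_complete() are
hypothetical names, and the resumed_via_firmware argument stands in for
the result of pm_resume_via_firmware().

```c
#include <stdbool.h>

/* Hypothetical subset of struct dev_pm_info, reduced to the two flags. */
struct pm_flags {
	bool direct_complete;		/* device was left runtime suspended */
	bool direct_complete_noresume;	/* opt-out flag added by this patch */
};

/*
 * Returns true if a runtime resume should be requested after system resume:
 * the device went through direct-complete, the firmware was involved in
 * resuming the system, and the device has not opted out.
 */
static bool needs_resume_after_complete(const struct pm_flags *power,
					bool resumed_via_firmware)
{
	return power->direct_complete && resumed_via_firmware &&
	       !power->direct_complete_noresume;
}
```

A device that leaves direct_complete_noresume unset keeps the behavior
introduced by 58a1fbbb2ee8; only devices that set the flag skip the
resume request.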



* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (12 preceding siblings ...)
  2016-05-13 11:15 ` [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold Lukas Wunner
@ 2016-05-21  9:48 ` Andreas Noever
  2016-06-14 16:37   ` Bjorn Helgaas
  2016-06-13 20:58 ` Bjorn Helgaas
  2016-07-07 15:02 ` Lukas Wunner
  15 siblings, 1 reply; 65+ messages in thread
From: Andreas Noever @ 2016-05-21  9:48 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, Linux PM list, Rafael J. Wysocki, Alan Stern,
	Huang Ying, Bjorn Helgaas

[+CC: Bjorn]

Signed-off-by: Andreas Noever <andreas.noever@gmail.com>

Tested on MacBookPro10,1

On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
> This series powers Thunderbolt controllers on Macs down when nothing is
> plugged in, saving 1.7 W on machines with a Light Ridge controller and
> reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
>
> Briefly, a custom ACPI method provided by Apple is used to cut power to
> the controller.  A GPE is enabled while the controller is powered down
> which side-band signals a plug event, whereupon power is reinstated using
> the ACPI method.  Note that even though this mechanism is ACPI-based,
> it does not use _PSx methods and is thus entirely nonstandard.
>
>
> A Thunderbolt controller appears to the OS as a set of PCI devices:  One
> NHI (Native Host Interface) and multiple bridges.  Power is cut to the
> entire set of devices:
>
>   (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
>                                      +-- Downstream Bridge 1 --
>                                      +-- Downstream Bridge 2 --
>                                      ...
>
> v1 of this series shoehorned power control into the NHI driver.  This
> violated the Linux pm model which assumes that a child cannot resume
> before its parent. As seen above, the NHI is a child, so the child cut
> power to the bridges above it.
>
> v2 resolves this by positioning power control on the controller's
> topmost device, which is the upstream bridge.  That is achieved by
> binding to it as a Thunderbolt port service driver.  portdrv already
> calls down to each service driver on ->suspend and ->resume and I
> extended that scheme to further PM callbacks.  E.g. when the upstream
> bridge is runtime suspended, portdrv invokes the ->runtime_suspend
> callback of each attached service driver, and the Thunderbolt service
> driver's callback in turn invokes the ACPI method to cut power to the
> controller.
>
>
> For such a nonstandard ACPI-based PM mechanism one would normally assign
> a dev_pm_domain to the upstream bridge which overrides the PCI subsystem
> PM callbacks.  But that's not an option here because dev_pm_domain_set()
> can only be called during driver probe.  The driver is portdrv and
> obviously loads earlier than the thunderbolt port service driver.
> So one has to make do with the PCI subsystem PM callbacks.
>
> The PCI core currently assumes that devices can only be put into D3cold
> by the platform, i.e. using the standard ACPI _PSx methods.  I extended
> the PCI core so that it can deal with devices which are put into D3cold
> by the driver callbacks.  It turns out only two changes are needed to
> make this work, and they are in patches [09/13] and [10/13].  Runtime
> suspend works out of the box, but runtime resume tries to set the device
> power state *before* invoking the driver callback, and this goes awry
> since the device is still in D3cold.  I solved this by returning an error
> in pci_raw_set_power_state() if the device's current_state is D3cold
> (patch [09/13]).
>
> Theoretically it would also be possible to patch the missing _PSx methods
> into the ACPI namespace at runtime but I suspect it wouldn't be pretty:
> I think I'd have to include pre-compiled AML methods in the kernel and
> modify those blobs at runtime (adjust GPE number etc) before patching
> them into the namespace.
>
>
> To make direct-complete work for such non-platform-power-managed devices,
> I also had to modify pci_target_state() (patch [10/13]).
>
> Finally, I wanted to avoid the mandatory runtime resume after direct-
> complete which was introduced by Rafael with 58a1fbbb2ee8 ("PM / PCI /
> ACPI: Kick devices that might have been reset by firmware"), so I added
> the possibility to opt out of it (patch [11/13]).
>
>
> I've pushed these patches to GitHub where they can be reviewed more
> comfortably with green/red highlighting:
> https://github.com/l1k/linux/commits/thunderbolt_runpm_v2
>
> For reference, here's a link to v1:
> http://thread.gmane.org/gmane.linux.power-management.general/73197
>
> Thanks in advance for your comments.
>
> Lukas
>
>
> Lukas Wunner (13):
>   PCI: Recognize Thunderbolt devices
>   PCI: Allow D3 for Thunderbolt ports
>   PCI: Add Thunderbolt portdrv service type
>   PCI: Generalize portdrv pm iterator
>   PCI: Use portdrv pm iterator on further callbacks
>   PCI: pciehp: Support runtime pm
>   PCI: pciehp: Ignore interrupts during D3cold
>   PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs
>   PCI: Do not write to PM control register while in D3cold
>   PCI: Avoid going from D3cold to D3hot for system sleep
>   PM / sleep: Allow opt-out from runtime resume after direct-complete
>   thunderbolt: Support runtime pm on upstream bridge
>   thunderbolt: Support runtime pm on NHI
>
>  drivers/base/power/generic_ops.c  |   3 +-
>  drivers/pci/hotplug/pciehp_ctrl.c |   9 +-
>  drivers/pci/hotplug/pciehp_hpc.c  |   4 +
>  drivers/pci/pci.c                 |  50 ++----
>  drivers/pci/pci.h                 |   2 +
>  drivers/pci/pcie/portdrv.h        |   6 +-
>  drivers/pci/pcie/portdrv_core.c   |  47 +-----
>  drivers/pci/pcie/portdrv_pci.c    |  88 ++++++++--
>  drivers/pci/probe.c               |  17 ++
>  drivers/thunderbolt/Kconfig       |   4 +-
>  drivers/thunderbolt/Makefile      |   4 +-
>  drivers/thunderbolt/nhi.c         |  32 +++-
>  drivers/thunderbolt/switch.c      |   9 +
>  drivers/thunderbolt/tb.c          |  13 ++
>  drivers/thunderbolt/upstream.c    | 345 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h               |   1 +
>  include/linux/pcieport_if.h       |   7 +
>  include/linux/pm.h                |   1 +
>  18 files changed, 539 insertions(+), 103 deletions(-)
>  create mode 100644 drivers/thunderbolt/upstream.c
>
> --
> 2.8.1
>


* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (13 preceding siblings ...)
  2016-05-21  9:48 ` [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Andreas Noever
@ 2016-06-13 20:58 ` Bjorn Helgaas
  2016-06-14  9:27   ` Lukas Wunner
  2016-07-07 15:02 ` Lukas Wunner
  15 siblings, 1 reply; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-13 20:58 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki,
	Alan Stern, Huang Ying

On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> This series powers Thunderbolt controllers on Macs down when nothing is
> plugged in, saving 1.7 W on machines with a Light Ridge controller and
> reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.

Hi Lukas,

I suppose these depend on Mika's runtime PM patches (at least, I see
Mika's commits in your github tree).  Would you mind rebasing your
patches so they apply on my pci/pm branch?  I couldn't import them
directly into stg.

I'm sure git could pull them in easily, but I use git in a very
simple-minded way.

Bjorn


* [PATCH v2 08/13 REBASED] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs
  2016-05-13 11:15 ` [PATCH v2 08/13] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs Lukas Wunner
@ 2016-06-14  9:08   ` Lukas Wunner
  2016-06-17 21:53   ` [PATCH v2 08/13] " Bjorn Helgaas
  1 sibling, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-06-14  9:08 UTC (permalink / raw)
  To: linux-pci, linux-pm; +Cc: Andreas Noever

Thunderbolt controllers have a pin to signal plug events while the
controller is powered down.  On Macs this pin is wired to the
southbridge and causes a GPE to be fired.  The OS may then power up the
controller to probe the newly connected device.  It is thus okay to let
Thunderbolt hotplug ports go to D3 on Macs.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pcie/portdrv_pci.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index dc73a54..08761b8 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -224,9 +224,12 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 	 * to enumerate devices behind this port properly (the port is
 	 * powered down preventing all config space accesses to the
 	 * subordinate devices).  We can't be sure for native PCIe hotplug
-	 * either so prevent that as well.
+	 * either so prevent that as well.  However Thunderbolt controllers
+	 * on Macs are capable of side-band signaling plug events while
+	 * powered down, so allow them to suspend.
 	 */
-	if (!dev->is_hotplug_bridge) {
+	if (!dev->is_hotplug_bridge ||
+	    (dev->is_thunderbolt && dmi_match(DMI_SYS_VENDOR, "Apple Inc."))) {
 		/*
 		 * Keep the port resumed 100ms to make sure things like
 		 * config space accesses from userspace (lspci) will not
@@ -244,7 +247,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
-	if (!dev->is_hotplug_bridge) {
+	if (!dev->is_hotplug_bridge ||
+	    (dev->is_thunderbolt && dmi_match(DMI_SYS_VENDOR, "Apple Inc."))) {
 		pm_runtime_forbid(&dev->dev);
 		pm_runtime_get_noresume(&dev->dev);
 		pm_runtime_dont_use_autosuspend(&dev->dev);
-- 
2.8.1
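
The same condition now appears in both pcie_portdrv_probe() and
pcie_portdrv_remove().  As an illustration only, it can be written as a
single predicate in userspace C.  struct port_info and
port_may_runtime_suspend() are hypothetical names, and the sys_vendor
string stands in for what dmi_match(DMI_SYS_VENDOR, ...) compares
against; the exact string comparison is an assumption of this sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the two struct pci_dev flags the check reads. */
struct port_info {
	bool is_hotplug_bridge;
	bool is_thunderbolt;
};

/*
 * Non-hotplug ports may runtime suspend.  Hotplug ports may do so only if
 * they are Thunderbolt ports on a Mac, where plug events are signaled
 * side-band while the controller is powered down.
 */
static bool port_may_runtime_suspend(const struct port_info *port,
				     const char *sys_vendor)
{
	bool is_apple = sys_vendor && strcmp(sys_vendor, "Apple Inc.") == 0;

	return !port->is_hotplug_bridge ||
	       (port->is_thunderbolt && is_apple);
}
```

Probe and remove have to agree on this condition so that the runtime PM
calls made in probe are undone symmetrically in remove.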



* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-06-13 20:58 ` Bjorn Helgaas
@ 2016-06-14  9:27   ` Lukas Wunner
  0 siblings, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-06-14  9:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki,
	Alan Stern, Huang Ying

Hi Bjorn,

On Mon, Jun 13, 2016 at 03:58:23PM -0500, Bjorn Helgaas wrote:
> On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > This series powers Thunderbolt controllers on Macs down when nothing is
> > plugged in, saving 1.7 W on machines with a Light Ridge controller and
> > reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
> I suppose these depend on Mika's runtime PM patches (at least, I see
> Mika's commits in your github tree).  Would you mind rebasing your
> patches so they apply on my pci/pm branch?  I couldn't import them
> directly into stg.

Right, patch [08/13] no longer applies cleanly because the context has
changed slightly ("100 ms" instead of "10 ms").

I've just sent out a rebased version of that patch in-reply-to the
original version. So if you use threading in mutt (or limit the
messages displayed to e.g. "~f lukas@wunner.de" and order by subject),
you should be able to tag the rebased version and pipe that together
with the rest to stg.

If this doesn't work out for some reason and you'd prefer me to resend
the whole series, just shout.

Thanks!

Lukas


* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-05-21  9:48 ` [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Andreas Noever
@ 2016-06-14 16:37   ` Bjorn Helgaas
  2016-06-14 19:14     ` Andreas Noever
  0 siblings, 1 reply; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-14 16:37 UTC (permalink / raw)
  To: Andreas Noever
  Cc: Lukas Wunner, linux-pci, Linux PM list, Rafael J. Wysocki,
	Alan Stern, Huang Ying, linux-kernel

[+cc linux-kernel]

On Sat, May 21, 2016 at 11:48:42AM +0200, Andreas Noever wrote:
> 
> Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
>
> Tested on MacBookPro10,1
> 
> On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > This series powers Thunderbolt controllers on Macs down when nothing is
> > plugged in, saving 1.7 W on machines with a Light Ridge controller and
> > reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
> >
> > Briefly, a custom ACPI method provided by Apple is used to cut power to
> > the controller.  A GPE is enabled while the controller is powered down
> > which side-band signals a plug event, whereupon power is reinstated using
> > the ACPI method.  Note that even though this mechanism is ACPI-based,
> > it does not use _PSx methods and is thus entirely nonstandard.

I think the current arrangement was that Andreas would ack Thunderbolt
patches and I would merge them via the PCI tree.  That makes some sense
because Thunderbolt and PCIe are related, but the more I think about
it, the less I'm happy with it.

This series is a good example.  I'm sure it's good work and
worthwhile.  But I can't really say anything about the content of it
because most of it is Thunderbolt-specific and there's no public spec.
It seems like this is basically a collection of reverse-engineered
quirks that happen to work with the current state of Linux PM on
certain Macs.  We don't know what might change on future Macs.  We
don't know what might break when we make changes to Linux PM.

I can't test this series, nor do I want to.  I can't test most of the
patches I merge, but I can at least read the spec and see whether the
patches make sense.  What I would *like* is to have public Thunderbolt
specs and a kernel developer's guide so we know what to expect from
the hardware and the firmware and we can write code that should work
not just on current Macs, but also on non-Macs and future Macs.

I don't think the current situation is really maintainable, and I'm
not comfortable merging code that I can't maintain.

I know I don't understand the whole situation, so somebody please tell
me why I'm being unreasonable here.

> > A Thunderbolt controller appears to the OS as a set of PCI devices:  One
> > NHI (Native Host Interface) and multiple bridges.  Power is cut to the
> > entire set of devices:
> >
> >   (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> >                                      +-- Downstream Bridge 1 --
> >                                      +-- Downstream Bridge 2 --
> >                                      ...
> >
> > v1 of this series shoehorned power control into the NHI driver.  This
> > violated the Linux pm model which assumes that a child cannot resume
> > before its parent. As seen above, the NHI is a child, so the child cut
> > power to the bridges above it.
> >
> > v2 resolves this by positioning power control on the controller's
> > topmost device, which is the upstream bridge.  That is achieved by
> > binding to it as a Thunderbolt port service driver.  portdrv already
> > calls down to each service driver on ->suspend and ->resume and I
> > extended that scheme to further PM callbacks.  E.g. when the upstream
> > bridge is runtime suspended, portdrv invokes the ->runtime_suspend
> > callback of each attached service driver, and the Thunderbolt service
> > driver's callback in turn invokes the ACPI method to cut power to the
> > controller.
> >
> >
> > For such a nonstandard ACPI-based PM mechanism one would normally assign
> > a dev_pm_domain to the upstream bridge which overrides the PCI subsystem
> > PM callbacks.  But that's not an option here because dev_pm_domain_set()
> > can only be called during driver probe.  The driver is portdrv and
> > obviously loads earlier than the thunderbolt port service driver.
> > So one has to make do with the PCI subsystem PM callbacks.
> >
> > The PCI core currently assumes that devices can only be put into D3cold
> > by the platform, i.e. using the standard ACPI _PSx methods.  I extended
> > the PCI core so that it can deal with devices which are put into D3cold
> > by the driver callbacks.  It turns out only two changes are needed to
> > make this work, and they are in patches [09/13] and [10/13].  Runtime
> > suspend works out of the box, but runtime resume tries to set the device
> > power state *before* invoking the driver callback, and this goes awry
> > since the device is still in D3cold.  I solved this by returning an error
> > in pci_raw_set_power_state() if the device's current_state is D3cold
> > (patch [09/13]).
> >
> > Theoretically it would also be possible to patch the missing _PSx methods
> > into the ACPI namespace at runtime but I suspect it wouldn't be pretty:
> > I think I'd have to include pre-compiled AML methods in the kernel and
> > modify those blobs at runtime (adjust GPE number etc) before patching
> > them into the namespace.
> >
> >
> > To make direct-complete work for such non-platform-power-managed devices,
> > I also had to modify pci_target_state() (patch [10/13]).
> >
> > Finally, I wanted to avoid the mandatory runtime resume after direct-
> > complete which was introduced by Rafael with 58a1fbbb2ee8 ("PM / PCI /
> > ACPI: Kick devices that might have been reset by firmware"), so I added
> > the possibility to opt out of it (patch [11/13]).
> >
> >
> > I've pushed these patches to GitHub where they can be reviewed more
> > comfortably with green/red highlighting:
> > https://github.com/l1k/linux/commits/thunderbolt_runpm_v2
> >
> > For reference, here's a link to v1:
> > http://thread.gmane.org/gmane.linux.power-management.general/73197
> >
> > Thanks in advance for your comments.
> >
> > Lukas
> >
> >
> > Lukas Wunner (13):
> >   PCI: Recognize Thunderbolt devices
> >   PCI: Allow D3 for Thunderbolt ports
> >   PCI: Add Thunderbolt portdrv service type
> >   PCI: Generalize portdrv pm iterator
> >   PCI: Use portdrv pm iterator on further callbacks
> >   PCI: pciehp: Support runtime pm
> >   PCI: pciehp: Ignore interrupts during D3cold
> >   PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs
> >   PCI: Do not write to PM control register while in D3cold
> >   PCI: Avoid going from D3cold to D3hot for system sleep
> >   PM / sleep: Allow opt-out from runtime resume after direct-complete
> >   thunderbolt: Support runtime pm on upstream bridge
> >   thunderbolt: Support runtime pm on NHI
> >
> >  drivers/base/power/generic_ops.c  |   3 +-
> >  drivers/pci/hotplug/pciehp_ctrl.c |   9 +-
> >  drivers/pci/hotplug/pciehp_hpc.c  |   4 +
> >  drivers/pci/pci.c                 |  50 ++----
> >  drivers/pci/pci.h                 |   2 +
> >  drivers/pci/pcie/portdrv.h        |   6 +-
> >  drivers/pci/pcie/portdrv_core.c   |  47 +-----
> >  drivers/pci/pcie/portdrv_pci.c    |  88 ++++++++--
> >  drivers/pci/probe.c               |  17 ++
> >  drivers/thunderbolt/Kconfig       |   4 +-
> >  drivers/thunderbolt/Makefile      |   4 +-
> >  drivers/thunderbolt/nhi.c         |  32 +++-
> >  drivers/thunderbolt/switch.c      |   9 +
> >  drivers/thunderbolt/tb.c          |  13 ++
> >  drivers/thunderbolt/upstream.c    | 345 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h               |   1 +
> >  include/linux/pcieport_if.h       |   7 +
> >  include/linux/pm.h                |   1 +
> >  18 files changed, 539 insertions(+), 103 deletions(-)
> >  create mode 100644 drivers/thunderbolt/upstream.c
> >
> > --
> > 2.8.1
> >


* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-06-14 16:37   ` Bjorn Helgaas
@ 2016-06-14 19:14     ` Andreas Noever
  2016-06-14 20:22       ` Bjorn Helgaas
  0 siblings, 1 reply; 65+ messages in thread
From: Andreas Noever @ 2016-06-14 19:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lukas Wunner, linux-pci, Linux PM list, Rafael J. Wysocki,
	Alan Stern, Huang Ying, linux-kernel

On Tue, Jun 14, 2016 at 6:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> [+cc linux-kernel]
>
> On Sat, May 21, 2016 at 11:48:42AM +0200, Andreas Noever wrote:
>>
>> Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
>>
>> Tested on MacBookPro10,1
>>
>> On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> > This series powers Thunderbolt controllers on Macs down when nothing is
>> > plugged in, saving 1.7 W on machines with a Light Ridge controller and
>> > reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
>> >
>> > Briefly, a custom ACPI method provided by Apple is used to cut power to
>> > the controller.  A GPE is enabled while the controller is powered down
>> > which side-band signals a plug event, whereupon power is reinstated using
>> > the ACPI method.  Note that even though this mechanism is ACPI-based,
>> > it does not use _PSx methods and is thus entirely nonstandard.
>
> I think the current arrangement was that Andreas would ack Thunderbolt
> patches and I would merge them via the PCI tree.  That makes some sense
> because Thunderbolt and PCIe are related, but the more I think about
> it, the less I'm happy with it.
>
> This series is a good example.  I'm sure it's good work and
> worthwhile.  But I can't really say anything about the content of it
> because most of it is Thunderbolt-specific and there's no public spec.
> It seems like this is basically a collection of reverse-engineered
> quirks that happen to work with the current state of Linux PM on
> certain Macs.  We don't know what might change on future Macs.  We
> don't know what might break when we make changes to Linux PM.
>
> I can't test this series, nor do I want to.  I can't test most of the
> patches I merge, but I can at least read the spec and see whether the
> patches make sense.  What I would *like* is to have public Thunderbolt
> specs and a kernel developer's guide so we know what to expect from
> the hardware and the firmware and we can write code that should work
> not just on current Macs, but also on non-Macs and future Macs.
>
> I don't think the current situation is really maintainable, and I'm
> not comfortable merging code that I can't maintain.
Most of the code is contained within the thunderbolt driver. I think
there is quite some precedent for reverse-engineered drivers without
specs being part of the kernel. My understanding was that, since I am
listed in MAINTAINERS, I am responsible for the driver. Now our
changes often need improvements to the pci core, which is why I think
merging through your tree is a good idea (without transferring
responsibility). The changes to drivers/pci should be supported by
the PCI spec and make sense without knowing about thunderbolt (but it
might be the case that thunderbolt is the only user of these
features).

Specifically for this series we want to:
 - whitelist thunderbolt bridges for PM. Detecting those bridges is
   non-standard but I think this is acceptable, since this
   blacklist/whitelist is basically a quirk.
 - Load our portdrv on tb bridges. PCI just sees another portdriver
   and all the reverse engineered magic lives inside the driver.
 - Forward more PM callbacks to portdrivers (not tb specific)
 - hotplug D3cold fixes: resume around board_added/remove_board,
   ignore interrupts in d3cold (not tb specific and probably a general
   bugfix)
 - Make pci not fail if bridges have been put into D3cold by some
   external mechanism.

So maybe you could review the pci changes as a solution to the problem
"we want to load a custom portdriver which can put bridges into d3cold
in a device specific way". We certainly do not expect you to take
responsibility for the thunderbolt driver.
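
To make the "forward more PM callbacks to portdrivers" item concrete,
here is a rough userspace C sketch of a port handing one PM callback to
each attached service driver.  All names (struct service_ops,
port_for_each_service(), count_call()) are hypothetical, and stopping at
the first error is an assumption of this sketch; it is not how portdrv
in the series is actually implemented.

```c
#include <stddef.h>

struct service;
typedef int (*pm_callback)(struct service *);

/* Hypothetical per-service callback table; unset entries are skipped. */
struct service_ops {
	pm_callback suspend;
	pm_callback resume;
	pm_callback runtime_suspend;
	pm_callback runtime_resume;
};

struct service {
	const struct service_ops *ops;	/* NULL if no driver is bound */
	int calls;			/* demo counter, see count_call() */
};

/*
 * Invoke the callback found at cb_offset within struct service_ops on
 * every service that has a driver bound and implements that callback.
 */
static int port_for_each_service(struct service *services, size_t n,
				 size_t cb_offset)
{
	for (size_t i = 0; i < n; i++) {
		const struct service_ops *ops = services[i].ops;
		pm_callback cb;
		int ret;

		if (!ops)
			continue;
		cb = *(const pm_callback *)((const char *)ops + cb_offset);
		if (!cb)
			continue;
		ret = cb(&services[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Demo callback: only records that it was invoked. */
static int count_call(struct service *svc)
{
	svc->calls++;
	return 0;
}
```

A caller would pass offsetof(struct service_ops, runtime_suspend) to run
the runtime suspend callback of every service that provides one.  The
port itself does not need to know what a given service does in its
callback, which is where the device-specific power control can live.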


> I know I don't understand the whole situation, so somebody please tell
> me why I'm being unreasonable here.
>
>> > A Thunderbolt controller appears to the OS as a set of PCI devices:  One
>> > NHI (Native Host Interface) and multiple bridges.  Power is cut to the
>> > entire set of devices:
>> >
>> >   (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
>> >                                      +-- Downstream Bridge 1 --
>> >                                      +-- Downstream Bridge 2 --
>> >                                      ...
>> >
>> > v1 of this series shoehorned power control into the NHI driver.  This
>> > violated the Linux pm model which assumes that a child cannot resume
>> > before its parent. As seen above, the NHI is a child, so the child cut
>> > power to the bridges above it.
>> >
>> > v2 resolves this by positioning power control on the controller's
>> > topmost device, which is the upstream bridge.  That is achieved by
>> > binding to it as a Thunderbolt port service driver.  portdrv already
>> > calls down to each service driver on ->suspend and ->resume and I
>> > extended that scheme to further PM callbacks.  E.g. when the upstream
>> > bridge is runtime suspended, portdrv invokes the ->runtime_suspend
>> > callback of each attached service driver, and the Thunderbolt service
>> > driver's callback in turn invokes the ACPI method to cut power to the
>> > controller.
>> >
>> >
>> > For such a nonstandard ACPI-based PM mechanism one would normally assign
>> > a dev_pm_domain to the upstream bridge which overrides the PCI subsystem
>> > PM callbacks.  But that's not an option here because dev_pm_domain_set()
>> > can only be called during driver probe.  The driver is portdrv and
>> > obviously loads earlier than the thunderbolt port service driver.
>> > So one has to make do with the PCI subsystem PM callbacks.
>> >
>> > The PCI core currently assumes that devices can only be put into D3cold
>> > by the platform, i.e. using the standard ACPI _PSx methods.  I extended
>> > the PCI core so that it can deal with devices which are put into D3cold
>> > by the driver callbacks.  It turns out only two changes are needed to
>> > make this work, and they are in patches [09/13] and [10/13].  Runtime
>> > suspend works out of the box, but runtime resume tries to set the device
>> > power state *before* invoking the driver callback, and this goes awry
>> > since the device is still in D3cold.  I solved this by returning an error
>> > in pci_raw_set_power_state() if the device's current_state is D3cold
>> > (patch [09/13]).
>> >
>> > Theoretically it would also be possible to patch the missing _PSx methods
>> > into the ACPI namespace at runtime but I suspect it wouldn't be pretty:
>> > I think I'd have to include pre-compiled AML methods in the kernel and
>> > modify those blobs at runtime (adjust GPE number etc) before patching
>> > them into the namespace.
>> >
>> >
>> > To make direct-complete work for such non-platform-power-managed devices,
>> > I also had to modify pci_target_state() (patch [10/13]).
>> >
>> > Finally, I wanted to avoid the mandatory runtime resume after direct-
>> > complete which was introduced by Rafael with 58a1fbbb2ee8 ("PM / PCI /
>> > ACPI: Kick devices that might have been reset by firmware"), so I added
>> > the possibility to opt out of it (patch [11/13]).
>> >
>> >
>> > I've pushed these patches to GitHub where they can be reviewed more
>> > comfortably with green/red highlighting:
>> > https://github.com/l1k/linux/commits/thunderbolt_runpm_v2
>> >
>> > For reference, here's a link to v1:
>> > http://thread.gmane.org/gmane.linux.power-management.general/73197
>> >
>> > Thanks in advance for your comments.
>> >
>> > Lukas
>> >
>> >
>> > Lukas Wunner (13):
>> >   PCI: Recognize Thunderbolt devices
>> >   PCI: Allow D3 for Thunderbolt ports
>> >   PCI: Add Thunderbolt portdrv service type
>> >   PCI: Generalize portdrv pm iterator
>> >   PCI: Use portdrv pm iterator on further callbacks
>> >   PCI: pciehp: Support runtime pm
>> >   PCI: pciehp: Ignore interrupts during D3cold
>> >   PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs
>> >   PCI: Do not write to PM control register while in D3cold
>> >   PCI: Avoid going from D3cold to D3hot for system sleep
>> >   PM / sleep: Allow opt-out from runtime resume after direct-complete
>> >   thunderbolt: Support runtime pm on upstream bridge
>> >   thunderbolt: Support runtime pm on NHI
>> >
>> >  drivers/base/power/generic_ops.c  |   3 +-
>> >  drivers/pci/hotplug/pciehp_ctrl.c |   9 +-
>> >  drivers/pci/hotplug/pciehp_hpc.c  |   4 +
>> >  drivers/pci/pci.c                 |  50 ++----
>> >  drivers/pci/pci.h                 |   2 +
>> >  drivers/pci/pcie/portdrv.h        |   6 +-
>> >  drivers/pci/pcie/portdrv_core.c   |  47 +-----
>> >  drivers/pci/pcie/portdrv_pci.c    |  88 ++++++++--
>> >  drivers/pci/probe.c               |  17 ++
>> >  drivers/thunderbolt/Kconfig       |   4 +-
>> >  drivers/thunderbolt/Makefile      |   4 +-
>> >  drivers/thunderbolt/nhi.c         |  32 +++-
>> >  drivers/thunderbolt/switch.c      |   9 +
>> >  drivers/thunderbolt/tb.c          |  13 ++
>> >  drivers/thunderbolt/upstream.c    | 345 ++++++++++++++++++++++++++++++++++++++
>> >  include/linux/pci.h               |   1 +
>> >  include/linux/pcieport_if.h       |   7 +
>> >  include/linux/pm.h                |   1 +
>> >  18 files changed, 539 insertions(+), 103 deletions(-)
>> >  create mode 100644 drivers/thunderbolt/upstream.c
>> >
>> > --
>> > 2.8.1
>> >

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-06-14 19:14     ` Andreas Noever
@ 2016-06-14 20:22       ` Bjorn Helgaas
  2016-06-15 18:40         ` Lukas Wunner
  2016-07-07 17:39         ` Andreas Noever
  0 siblings, 2 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-14 20:22 UTC (permalink / raw)
  To: Andreas Noever
  Cc: Lukas Wunner, linux-pci, Linux PM list, Rafael J. Wysocki,
	Alan Stern, Huang Ying, linux-kernel

On Tue, Jun 14, 2016 at 09:14:27PM +0200, Andreas Noever wrote:
> On Tue, Jun 14, 2016 at 6:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > [+cc linux-kernel]
> >
> > On Sat, May 21, 2016 at 11:48:42AM +0200, Andreas Noever wrote:
> >>
> >> Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
> >>
> >> Tested on MacBookPro10,1
> >>
> >> On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
> >> > This series powers Thunderbolt controllers on Macs down when nothing is
> >> > plugged in, saving 1.7 W on machines with a Light Ridge controller and
> >> > reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
> >> >
> >> > Briefly, a custom ACPI method provided by Apple is used to cut power to
> >> > the controller.  A GPE is enabled while the controller is powered down
> >> > which side-band signals a plug event, whereupon power is reinstated using
> >> > the ACPI method.  Note that even though this mechanism is ACPI-based,
> >> > it does not use _PSx methods and is thus entirely nonstandard.
> >
> > I think the current arrangement was that Andreas would ack Thunderbolt
> > patches and I would merge them via the PCI tree.  That makes some sense
> > because Thunderbolt and PCIe are related, but the more I think about
> > it, the less I'm happy with it.
> >
> > This series is a good example.  I'm sure it's good work and
> > worthwhile.  But I can't really say anything about the content of it
> > because most of it is Thunderbolt-specific and there's no public spec.
> > It seems like this is basically a collection of reverse-engineered
> > quirks that happen to work with the current state of Linux PM on
> > certain Macs.  We don't know what might change on future Macs.  We
> > don't know what might break when we make changes to Linux PM.
> >
> > I can't test this series, nor do I want to.  I can't test most of the
> > patches I merge, but I can at least read the spec and see whether the
> > patches make sense.  What I would *like* is to have public Thunderbolt
> > specs and a kernel developer's guide so we know what to expect from
> > the hardware and the firmware and we can write code that should work
> > not just on current Macs, but also on non-Macs and future Macs.
> >
> > I don't think the current situation is really maintainable, and I'm
> > not comfortable merging code that I can't maintain.
> Most of the code is contained within the thunderbolt driver. I think
> there is quite some precedent for reverse engineered drivers without
> specs being part of the kernel. My understanding was that, since I am
> listed in MAINTAINERS, I am responsible for the driver. Now our
> changes often need improvements to the pci core, which is why I think
> merging through your tree is a good idea (without transferring
> responsibility). The changes to the drivers/pci should be supported by
> the PCI-spec and make sense without knowing about thunderbolt (but it
> might be the case that thunderbolt is the only user of these
> features).
> 
> Specifically for this series we want to:
>  - whitelist thunderbolt bridges for PM. Detecting those bridges is
> non-standard but I think this is acceptable, since this
> blacklist/whitelist is basically a quirk.
>  - Load our portdrv on tb bridges. PCI just sees another portdriver
> and all the reverse engineered magic lives inside the driver.
>  - Forward more PM callbacks to portdrivers (not tb specific)
>  - hotplug D3cold fixes: resume around board_added/remove_board,
> ignore interrupts in d3cold (not tb specific and probably a general
> bugfix)
>  - Make pci not fail if bridges have been put into D3cold by some
> external mechanism.
> 
> So maybe you could review the pci changes as a solution to the problem
> "we want to load a custom portdriver which can put bridges into d3cold
> in a device specific way". We certainly do not expect you to take
> responsibility for the thunderbolt driver.

That's a fine solution as far as I'm personally concerned.  I think
it's poor for Linux overall, because I think it's fragile, and it's
disappointing that a technology as important as Thunderbolt is so
poorly supported by the promulgators.  But if you're willing to work
in that environment, that's great.

You maintain the thunderbolt code and merge changes, and I'll review
the pieces that touch drivers/pci.  I do have a couple comments on
those pieces, but I don't think they'll be major.

I just want to get out of the business of merging drivers/thunderbolt
code that I can't maintain.

Bjorn

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-06-14 20:22       ` Bjorn Helgaas
@ 2016-06-15 18:40         ` Lukas Wunner
  2016-06-16  1:55           ` Linus Torvalds
  2016-07-07 17:39         ` Andreas Noever
  1 sibling, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-06-15 18:40 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, linux-pm, Rafael J. Wysocki,
	Alan Stern, Huang Ying, linux-kernel, Linus Torvalds, Greg KH

[+cc Linus, Greg KH]

On Tue, Jun 14, 2016 at 03:22:28PM -0500, Bjorn Helgaas wrote:
> On Tue, Jun 14, 2016 at 09:14:27PM +0200, Andreas Noever wrote:
> > On Tue, Jun 14, 2016 at 6:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Sat, May 21, 2016 at 11:48:42AM +0200, Andreas Noever wrote:
> > > > On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > > This series powers Thunderbolt controllers on Macs down when
> > > > > nothing is plugged in, saving 1.7 W on machines with a Light Ridge
> > > > > controller and reportedly 4 W on Cactus Ridge 4C and Falcon Ridge
> > > > > 4C.
> > > > >
> > > > > Briefly, a custom ACPI method provided by Apple is used to cut
> > > > > power to the controller.  A GPE is enabled while the controller
> > > > > is powered down which side-band signals a plug event, whereupon
> > > > > power is reinstated using the ACPI method.  Note that even though
> > > > > this mechanism is ACPI-based, it does not use _PSx methods and is
> > > > > thus entirely nonstandard.
> > > >
> > > > Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
> > > > Tested on MacBookPro10,1
> > >
> > > I think the current arrangement was that Andreas would ack Thunderbolt
> > > patches and I would merge them via the PCI tree.  That makes some sense
> > > because Thunderbolt and PCIe are related, but the more I think about
> > > it, the less I'm happy with it.
> > >
> > > This series is a good example.  I'm sure it's good work and
> > > worthwhile.  But I can't really say anything about the content of it
> > > because most of it is Thunderbolt-specific and there's no public spec.
> > > It seems like this is basically a collection of reverse-engineered
> > > quirks that happen to work with the current state of Linux PM on
> > > certain Macs.  We don't know what might change on future Macs.  We
> > > don't know what might break when we make changes to Linux PM.
> > >
> > > I can't test this series, nor do I want to.  I can't test most of the
> > > patches I merge, but I can at least read the spec and see whether the
> > > patches make sense.  What I would *like* is to have public Thunderbolt
> > > specs and a kernel developer's guide so we know what to expect from
> > > the hardware and the firmware and we can write code that should work
> > > not just on current Macs, but also on non-Macs and future Macs.
> > >
> > > I don't think the current situation is really maintainable, and I'm
> > > not comfortable merging code that I can't maintain.
> >
> > Most of the code is contained within the thunderbolt driver. I think
> > there is quite some precedent for reverse engineered drivers without
> > specs being part of the kernel. My understanding was that, since I am
> > listed in MAINTAINERS, I am responsible for the driver. Now our
> > changes often need improvements to the pci core, which is why I think
> > merging through your tree is a good idea (without transferring
> > responsibility). The changes to the drivers/pci should be supported by
> > the PCI-spec and make sense without knowing about thunderbolt (but it
> > might be the case that thunderbolt is the only user of these
> > features). [...]
> > 
> > So maybe you could review the pci changes as a solution to the problem
> > "we want to load a custom portdriver which can put bridges into d3cold
> > in a device specific way". We certainly do not expect you to take
> > responsibility for the thunderbolt driver.
> 
> That's a fine solution as far as I'm personally concerned.  I think
> it's poor for Linux overall, because I think it's fragile, and it's
> disappointing that a technology as important as Thunderbolt is so
> poorly supported by the promulgators.  But if you're willing to work
> in that environment, that's great.
> 
> You maintain the thunderbolt code and merge changes, and I'll review
> the pieces that touch drivers/pci.  I do have a couple comments on
> those pieces, but I don't think they'll be major.
> 
> I just want to get out of the business of merging drivers/thunderbolt
> code that I can't maintain.

So how should changes to drivers/thunderbolt/ be merged in the future?

Andreas could probably send pulls directly to Linus, but I'm not sure
what the requirements are. I believe Linus wants signed tags. The trust
path from Linus to me is 4 hops and I've signed Andreas' key today,
yielding a 5 hop trust path:
http://pgp.cs.uu.nl/mk_path.cgi?FROM=0x79BE3E4300411886&TO=0x2AAF22EB
http://pgp.surfnet.nl:11371/pks/lookup?op=vindex&search=0xB1FCD9A3

Is there an upper limit on the acceptable length of the trust path?
Does the key have to be signed by another maintainer?

I guess the alternative would be that Greg KH picks up the patches,
as he did with the initial version of the Thunderbolt driver back in
2014. I'm not sure if that makes sense as I assume he has numerous
other things on his plate. (Which is not to belittle your own or
Linus' workload.)

Most subsystems seem to practice a four-eyes principle, i.e. a
Reviewed-by should be provided by someone else if author and committer
are the same person. I'll be glad to provide that for Andreas' own
patches such as 2ffa9a5d76a7, or help otherwise if I can.


As concerns this particular series, 10 of the 13 patches, i.e. the
majority, concern the PCI core, 1 concerns the PM core and only 2
concern the Thunderbolt driver. Since the PCI core is generally seeing
a lot more activity than the Thunderbolt driver, the probability of
merge conflicts is much higher if this series is merged through a
different tree than yours. It seems to be common practice to just
accept an Acked-by from other subsystem maintainers as a green light
to merge without looking closer at those patches.


I agree with your assessment that the lack of public documentation on
Thunderbolt is deplorable. However the PCIe spec does define what a
PCIe switch is and how it functions, and Thunderbolt is precisely that.
I.e. it documents a portion of Thunderbolt without ever saying so
explicitly.

You cite the lack of a public spec as a reason for unmaintainability,
yet your subsystem contains code to support Thunderbolt on non-Macs,
in the form of acpiphp. Was the maintainability argument ever mounted
against acpiphp? Intel engineers with access to the spec contributed
the changes for acpiphp to make Thunderbolt work on non-Macs. Is their
code more maintainable than reverse-engineered code?

Best regards,

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-06-15 18:40         ` Lukas Wunner
@ 2016-06-16  1:55           ` Linus Torvalds
  0 siblings, 0 replies; 65+ messages in thread
From: Linus Torvalds @ 2016-06-16  1:55 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Andreas Noever, linux-pci, linux-pm,
	Rafael J. Wysocki, Alan Stern, Huang Ying,
	Linux Kernel Mailing List, Greg KH

On Wed, Jun 15, 2016 at 8:40 AM, Lukas Wunner <lukas@wunner.de> wrote:
>
> So how should changes to drivers/thunderbolt/ be merged in the future?
>
> Andreas could probably send pulls directly to Linus, but I'm not sure
> what the requirements are. I believe Linus wants signed tags. The trust
> path from Linus to me is 4 hops and I've signed Andreas' key today,
> yielding a 5 hop trust path:
>
> Is there an upper limit on the acceptable length of the trust path?
> Does the key have to be signed by another maintainer?

I care not one whit about the idiotic gpg "trust path" crap.

To me, signatures are not about technicalities. I absolutely abhor all
the crazy people who think that signatures are about automatic web of
trust, and spend a lot of time on things like subkeys that expire
every six months etc (you know who you are). To me, that is just
complete gpg masturbation, and completely misses the point about
"trust".

Trust is not about the gpg signature. Trust is about the *person*. And
the gpg signature is a good and reasonable approximation of an ID. But
it's not some kind of absolute thing.

I'd much rather get an email from a current maintainer that I trust,
saying "look, there's going to be a new maintainer for this part of
the tree, and I signed his gpg key, and the fingerprint of that is
so-and-so".

Then, I'll do a "gpg --fetch-key", so that I have that particular key
in my keyring, and can verify that "ok, yes, I recognize the key that
signed it".

At no point do I start counting hops.

And if you lose your key, screw the whole crazy "key revocation
protocol". It's a joke. Most people who lost their keys will not have
any revocation key either. Just let me and others know. I'll just
remove that key from my keychain.

What makes me look at a key is "I've never seen this key before". The
most common reason is the people who do that f*cking annoying "let's
refresh signing keys every six months whether I need it or not because
I auto-expire them". Then I'll have to look at why the hell I'm
getting a signed pull request with a new key.

So don't worry about technicalities. I've pulled from people who had
not a single signature on their keychain, because they just were in
the wrong spot. I'd rather have a signed pull even then, just so that
I see that I get the pull requests from the same person each time, and
hopefully in a week (or month, or two), that key will get signatures.

Obviously, if you can get five people I know personally signing your
key, that makes me worry less about your particular identity, and
that's fine.

But the *real* trust is something that builds up over time as people
are good maintainers. It has absolutely nothing to do with gpg key
details. And that *real* trust is what matters a whole lot more than a
few random bits that just happen to be part of a pgp key.

                Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-05-13 11:15 ` [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep Lukas Wunner
@ 2016-06-17 21:09   ` Bjorn Helgaas
  2016-06-17 22:14     ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-17 21:09 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki

On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> There are devices wich are not power-managed by the platform, yet can be

s/wich/which/

> runtime suspended to D3cold with some other mechanism.  When putting the
> system to sleep, we currently handle such devices improperly by trying
> to transition them from D3cold to D3hot (the default power state defined
> at the beginning of pci_target_state()).  Avoid that.
> 
> An example for devices affected by this are Thunderbolt controllers
> built into Macs which can be put into D3cold with nonstandard ACPI
> methods.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

This needs an ack from Rafael.

Naive question: why is the default target_state PCI_D3hot?

> ---
>  drivers/pci/pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 791dfe7..6af9911 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
>  			      && !(dev->pme_support & (1 << target_state)))
>  				target_state--;
>  		}
> +	} else if (dev->current_state == PCI_D3cold) {
> +		target_state = PCI_D3cold;
>  	}

This only covers the case of !device_may_wakeup().  So I guess
device_may_wakeup() is false for these Thunderbolt controllers.  Is
there a reason you don't want to do this check for devices that may
wakeup?

Sorry, more naive questions.  I don't know anything about power
management, and it all looks like black magic to me.
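
The branch structure in question can be modelled in user space.  What
follows is a simplified sketch, not the kernel function: the struct, the
enum and target_state_model() are hypothetical stand-ins, the
platform-power-managed branch is left out, and the ordering of the
remaining branches is an assumption based on the diff context above.

```c
#include <assert.h>
#include <stdbool.h>

/* Power states in ascending depth: D0 = 0 ... D3cold = 4. */
enum pci_power_model { PCI_D0, PCI_D1, PCI_D2, PCI_D3hot, PCI_D3cold };

/* Hypothetical stand-in for the few struct pci_dev fields involved. */
struct pci_dev_model {
	bool has_pm_cap;           /* device has a PM capability */
	bool may_wakeup;           /* result of device_may_wakeup() */
	unsigned int pme_support;  /* bit n set: PME possible from state n */
	enum pci_power_model current_state;
};

static enum pci_power_model target_state_model(const struct pci_dev_model *dev)
{
	int target = PCI_D3hot;    /* default target state */

	assert(dev);
	if (!dev->has_pm_cap) {
		target = PCI_D0;
	} else if (dev->may_wakeup) {
		/* Find the deepest state from which PME can be signalled. */
		if (dev->pme_support) {
			while (target && !(dev->pme_support & (1u << target)))
				target--;
		}
	} else if (dev->current_state == PCI_D3cold) {
		/* Branch added by the patch: leave the device in D3cold. */
		target = PCI_D3cold;
	}
	return (enum pci_power_model)target;
}
```

In this model a device that is in D3cold and not wakeup-enabled keeps
D3cold as its target, whereas a wakeup-enabled device in D3cold still
gets D3hot because the search starts at D3hot and only moves downwards.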

>  	return target_state;
> -- 
> 2.8.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold
  2016-05-13 11:15 ` [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold Lukas Wunner
@ 2016-06-17 21:18   ` Bjorn Helgaas
  2016-07-18 13:55   ` Rafael J. Wysocki
  1 sibling, 0 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-17 21:18 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, Andreas Noever, Huang Ying, Rafael J. Wysocki

On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> The PM control register is not accessible in D3cold so we shouldn't try
> writing to it in pci_raw_set_power_state() and return an error instead.
> 
> The current behaviour is fatal for devices which are not power-managed
> by the platform, yet can be runtime suspended to D3cold with some other
> mechanism by the driver:
> 
> - When the device is runtime resumed, pci_pm_runtime_resume() first
>   calls pci_restore_standard_config() which calls pci_set_power_state()
>   which calls pci_raw_set_power_state() to put the device into D0.
>   This fails since the device is still in D3cold.  It will be powered up
>   later on when pci_pm_runtime_resume() calls the driver's
>   ->runtime_resume callback.
> 
> - pci_raw_set_power_state() prints a message that the device refused to
>   change power state and returns 0.  Further up in the call stack,
>   pci_restore_standard_config() calls pci_restore_state(), which fails
>   since the device is in D3cold, but nevertheless invalidates the
>   saved_state.
> 
> - When pci_pm_runtime_resume() finally calls the driver ->runtime_resume
>   callback to power up the device, the saved_state is gone.
> 
> Return an error from pci_raw_set_power_state() to avoid this.
> 
> An example for devices affected by this are Thunderbolt controllers
> built into Macs which can be put into D3cold with nonstandard ACPI
> methods.
> 
> Unfortunately we rely on pci_raw_set_power_state()'s current behaviour
> in several places: When platform_pci_set_power_state() is called to wake
> a device from D3cold, its current_state is not updated even though it is
> no longer in D3cold.  Instead, pci_raw_set_power_state() is assumed to
> clean up the resulting incongruence.  Fix by setting current_state to
> PCI_UNKNOWN after platform_pci_set_power_state().
> 
> Also, when a bridge is put into D3cold, its children's current_state is
> changed to D3cold in __pci_complete_power_transition().  (Introduced by
> commit 448bd857d48e ("PCI/PM: add PCIe runtime D3cold support").) This
> doesn't necessarily reflect the children's actual power state: They may
> still be powered on, they're just no longer accessible.  However this
> shouldn't be a concern because if the children are accessed, their
> parent needs to resume anyway and the PM core takes care of this.
> Again, pci_raw_set_power_state() is relied upon to clean up the
> current_state when the children are resumed the next time.  We cannot
> reliably reconstruct the children's current_state when resuming their
> parent.  We also shouldn't blindly set it to PCI_UNKNOWN since some
> children may actually be turned off and D3cold is their correct
> current_state.  Therefore fix by no longer touching the children's
> current_state at all.
> 
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

It looks like this makes the code significantly simpler, but I haven't
the faintest idea whether this all makes sense or not.

It makes me a little nervous that there doesn't seem to be any spec to
guide this and we are so dependent on the current Linux code
structure.  There seem to be so many assumptions and dependencies that 
it's impractical to review changes in isolation.

So I guess I can only rely on Rafael's review here (as always for PM).

Since the current behavior is fatal to some devices, I guess this
fixes a bug?  Thunderbolt didn't work after system suspend/resume or
something?
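
The failure sequence in the commit message can be reduced to a small
user-space model.  This is only a sketch under assumptions: the struct
and both *_model() functions are hypothetical, the "patched" flag merely
toggles the new early return, and it is assumed (as the commit message
implies) that a non-zero return stops the caller before it reaches
pci_restore_state().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum pci_power_model { PCI_D0, PCI_D1, PCI_D2, PCI_D3hot, PCI_D3cold };

/* Hypothetical stand-in for the relevant struct pci_dev fields. */
struct pci_dev_model {
	enum pci_power_model current_state;
	bool state_saved;          /* saved config space is still valid */
};

/* Models the request to go to D0, with or without the new check. */
static int raw_set_power_state_model(struct pci_dev_model *dev, bool patched)
{
	assert(dev);
	if (patched && dev->current_state == PCI_D3cold)
		return -EIO;       /* PM control register is unreachable */
	/*
	 * Unpatched and in D3cold: the register write is lost, a message
	 * is logged and 0 is returned although nothing changed.
	 */
	if (dev->current_state != PCI_D3cold)
		dev->current_state = PCI_D0;
	return 0;
}

/* Models the caller that restores config space after powering up. */
static int restore_standard_config_model(struct pci_dev_model *dev,
					 bool patched)
{
	int error = raw_set_power_state_model(dev, patched);

	if (error)
		return error;      /* saved state is left untouched */
	dev->state_saved = false;  /* restore attempt invalidates it */
	return 0;
}
```

Without the check, the model loses state_saved while the device is
still in D3cold; with the check, the error propagates and the saved
state survives until the driver's callback has powered the device up.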

> ---
>  drivers/pci/pci.c | 43 ++++++++++---------------------------------
>  1 file changed, 10 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 95727b4..791dfe7 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -612,6 +612,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
>  	if (!dev->pm_cap)
>  		return -EIO;
>  
> +	if (dev->current_state == PCI_D3cold)
> +		return -EIO;
> +
>  	if (state < PCI_D0 || state > PCI_D3hot)
>  		return -EINVAL;
>  
> @@ -728,8 +731,10 @@ void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
>   */
>  void pci_power_up(struct pci_dev *dev)
>  {
> -	if (platform_pci_power_manageable(dev))
> +	if (platform_pci_power_manageable(dev)) {
>  		platform_pci_set_power_state(dev, PCI_D0);
> +		dev->current_state = PCI_UNKNOWN;
> +	}
>  
>  	pci_raw_set_power_state(dev, PCI_D0);
>  	pci_update_current_state(dev, PCI_D0);
> @@ -746,8 +751,10 @@ static int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state)
>  
>  	if (platform_pci_power_manageable(dev)) {
>  		error = platform_pci_set_power_state(dev, state);
> -		if (!error)
> +		if (!error) {
> +			dev->current_state = PCI_UNKNOWN;
>  			pci_update_current_state(dev, state);
> +		}
>  	} else
>  		error = -ENODEV;
>  
> @@ -809,30 +816,6 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
>  }
>  
>  /**
> - * __pci_dev_set_current_state - Set current state of a PCI device
> - * @dev: Device to handle
> - * @data: pointer to state to be set
> - */
> -static int __pci_dev_set_current_state(struct pci_dev *dev, void *data)
> -{
> -	pci_power_t state = *(pci_power_t *)data;
> -
> -	dev->current_state = state;
> -	return 0;
> -}
> -
> -/**
> - * __pci_bus_set_current_state - Walk given bus and set current state of devices
> - * @bus: Top bus of the subtree to walk.
> - * @state: state to be set
> - */
> -static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
> -{
> -	if (bus)
> -		pci_walk_bus(bus, __pci_dev_set_current_state, &state);
> -}
> -
> -/**
>   * __pci_complete_power_transition - Complete power transition of a PCI device
>   * @dev: PCI device to handle.
>   * @state: State to put the device into.
> @@ -841,15 +824,9 @@ static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
>   */
>  int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state)
>  {
> -	int ret;
> -
>  	if (state <= PCI_D0)
>  		return -EINVAL;
> -	ret = pci_platform_power_transition(dev, state);
> -	/* Power off the bridge may power off the whole hierarchy */
> -	if (!ret && state == PCI_D3cold)
> -		__pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
> -	return ret;
> +	return pci_platform_power_transition(dev, state);
>  }
>  EXPORT_SYMBOL_GPL(__pci_complete_power_transition);
>  
> -- 
> 2.8.1
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 08/13] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs
  2016-05-13 11:15 ` [PATCH v2 08/13] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs Lukas Wunner
  2016-06-14  9:08   ` [PATCH v2 08/13 REBASED] " Lukas Wunner
@ 2016-06-17 21:53   ` Bjorn Helgaas
  1 sibling, 0 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-17 21:53 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, Andreas Noever

On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> Thunderbolt controllers have a pin to signal plug events while the
> controller is powered down.  On Macs this pin is wired to the
> southbridge and causes a GPE to be fired.  The OS may then power up the
> controller to probe the newly connected device.  It is thus okay to let
> Thunderbolt hotplug ports go to D3 on Macs.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pcie/portdrv_pci.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index f75d4b5..7860ab3 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -224,9 +224,12 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  	 * to enumerate devices behind this port properly (the port is
>  	 * powered down preventing all config space accesses to the
>  	 * subordinate devices).  We can't be sure for native PCIe hotplug
> -	 * either so prevent that as well.
> +	 * either so prevent that as well.  However Thunderbolt controllers
> +	 * on Macs are capable of side-band signaling plug events while
> +	 * powered down, so allow them to suspend.
>  	 */
> -	if (!dev->is_hotplug_bridge) {
> +	if (!dev->is_hotplug_bridge ||
> +	    (dev->is_thunderbolt && dmi_match(DMI_SYS_VENDOR, "Apple Inc."))) {

I'd rather have a bit in the pci_dev to control this.  I don't know
how this would be different from the recently-added "bridge_d3" bit.
The bit could be set by a quirk.

It'd be nice if it were a single bit so we don't have to test
is_hotplug_bridge *and* another bit.

>  		/*
>  		 * Keep the port resumed 10ms to make sure things like
>  		 * config space accesses from userspace (lspci) will not
> @@ -243,7 +246,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
> -	if (!dev->is_hotplug_bridge) {
> +	if (!dev->is_hotplug_bridge ||
> +	    (dev->is_thunderbolt && dmi_match(DMI_SYS_VENDOR, "Apple Inc."))) {
>  		pm_runtime_forbid(&dev->dev);
>  		pm_runtime_get_noresume(&dev->dev);
>  		pm_runtime_dont_use_autosuspend(&dev->dev);
> -- 
> 2.8.1
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-06-17 21:09   ` Bjorn Helgaas
@ 2016-06-17 22:14     ` Lukas Wunner
  2016-07-18 13:39       ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-06-17 22:14 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki

On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > There are devices wich are not power-managed by the platform, yet can be
> 
> s/wich/which/

Oops.

> 
> > runtime suspended to D3cold with some other mechanism.  When putting the
> > system to sleep, we currently handle such devices improperly by trying
> > to transition them from D3cold to D3hot (the default power state defined
> > at the beginning of pci_target_state()).  Avoid that.
> > 
> > An example for devices affected by this are Thunderbolt controllers
> > built into Macs which can be put into D3cold with nonstandard ACPI
> > methods.
> > 
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> 
> This needs an ack from Rafael.
> 
> Naive question: why is the default target_state PCI_D3hot?

No idea. :-)

> 
> > ---
> >  drivers/pci/pci.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 791dfe7..6af9911 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
> >  			      && !(dev->pme_support & (1 << target_state)))
> >  				target_state--;
> >  		}
> > +	} else if (dev->current_state == PCI_D3cold) {
> > +		target_state = PCI_D3cold;
> >  	}
> 
> This only covers the case of !device_may_wakeup().  So I guess
> device_may_wakeup() is false for these Thunderbolt controllers.

Correct. device_may_wakeup() is defined in include/linux/pm_wakeup.h as:
dev->power.can_wakeup && !!dev->power.wakeup

The first one, dev->power.can_wakeup is true because PME is claimed to be
supported for all power states in the PMC register, so pci_pm_init() calls
device_set_wakeup_capable(&dev->dev, true):
  Capabilities: [80] Power Management version 3
    Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-

The second one, dev->power.wakeup is false because device_wakeup_enable()
is never called.
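
For illustration, the two conditions can be modelled outside the kernel
as below.  This is only a userspace sketch, not the pm_wakeup.h code:
struct dev_pm_info_model and device_may_wakeup_model() are made-up names,
and the wakeup source is reduced to a plain pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the wakeup-related fields of dev->power. */
struct dev_pm_info_model {
	bool can_wakeup;	/* device is capable of signalling wakeup */
	const void *wakeup;	/* non-NULL only once wakeup has been enabled */
};

/* Same shape as the expression quoted above: capable AND enabled. */
static bool device_may_wakeup_model(const struct dev_pm_info_model *power)
{
	return power->can_wakeup && !!power->wakeup;
}
```

With can_wakeup set and wakeup left NULL, which is the situation on
these controllers, the model returns false.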


> Is there a reason you don't want to do this check for devices that
> may wakeup?

Fear of breaking things. It would mean that a device would be left in
D3cold even though it may not be able to signal wakeup from that power
state. That's a change of behaviour the consequences of which I cannot
estimate. Intuitively, I would expect breakage from such a change.
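
To make the two branches concrete, here is a simplified userspace model
of the selection logic touched by the hunk above.  It is a sketch under
assumptions: the enum, struct dev_model and target_state_model() are
hypothetical names, and the platform-managed and no-PM-capability cases
of pci_target_state() are left out entirely.

```c
#include <stdbool.h>

/* Hypothetical model of the PCI power states, ordered shallow to deep. */
enum pm_state { ST_D0, ST_D1, ST_D2, ST_D3HOT, ST_D3COLD };

struct dev_model {
	bool may_wakeup;		/* what device_may_wakeup() would return */
	unsigned int pme_support;	/* bit n set: PME can be signalled from state n */
	enum pm_state current_state;
};

static enum pm_state target_state_model(const struct dev_model *dev)
{
	int target = ST_D3HOT;		/* the default questioned above */

	if (dev->may_wakeup) {
		/* Walk up to the deepest state from which PME is supported. */
		if (dev->pme_support) {
			while (target && !(dev->pme_support & (1u << target)))
				target--;
		}
	} else if (dev->current_state == ST_D3COLD) {
		/* The new branch: leave an already powered-off device alone. */
		target = ST_D3COLD;
	}

	return (enum pm_state)target;
}
```

In this model a wakeup-enabled device in D3cold is still moved to a
shallower state, which is the existing behaviour I am hesitant to change.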

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type
  2016-05-13 11:15 ` [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type Lukas Wunner
@ 2016-06-17 22:51   ` Bjorn Helgaas
  2016-07-20  0:30     ` Rafael J. Wysocki
  2016-07-20  6:59     ` Lukas Wunner
  0 siblings, 2 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-17 22:51 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, linux-pm, Andreas Noever

On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> A Thunderbolt controller is a PCIe switch which, as defined in the PCIe
> spec, appears to the OS "as a collection of virtual PCI-to-PCI bridges".
> 
> We're about to add support for Apple's nonstandard ACPI methods to power
> Thunderbolt controllers up and down.  To facilitate that, allocate a
> port service for every PCI bridge belonging to a Thunderbolt controller.
> 
> This port service might come in handy for other use cases, e.g. device
> initialization of Thunderbolt controllers.
> 
> To understand when and how this port service will be allocated, consider
> the PCI devices exposed by a Thunderbolt host controller:
> 
>   (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
>                                      +-- Downstream Bridge 1 --
>                                      +-- Downstream Bridge 2 --
>                                      ...
> 
> The upstream and downstream bridges represent the PCIe switch and a
> Thunderbolt port service will be allocated for each of them.  Hotplugged
> devices will appear behind the downstream bridges.  The NHI (Native Host
> Interface) is a virtual PCI device to manage the switch fabric and is
> not relevant here.  It uses class 0x88000, so it is not a PCIe port.
> 
> Next, consider the PCI devices exposed by Thunderbolt controllers built
> into hotplugged devices:
> 
>   -- Upstream Bridge ---- Downstream Bridge ---- Hotplugged device
> 
> Again, Thunderbolt port services will be allocated for the upstream and
> downstream bridge, but not for the hotplugged device, which might use
> e.g. class 0x20000 if it's a Thunderbolt Ethernet adapter.

I don't really *like* the portdrv infrastructure, even though we're
sort of stuck with it now.  It seems like all it really does is allow
multiple sub-drivers to attach to a single device and share interrupts
between them.  And we get some extra devices in sysfs that don't fit
the regular PCI model.  We used to support loadable sub-drivers
(pciehp, aer, etc.), but we decided that didn't really make sense
(though I notice you do support thunderbolt as a module).

I think we would be better off if the PCIe services (hotplug, AER,
etc.) were directly integrated into the PCI core without the portdrv
abstraction in the middle.  But anyway, we do have portdrv, and the
only question here is whether extending it for Thunderbolt is the
right thing.

So the question for Thunderbolt is what benefit you get from being a
portdrv sub-driver.  It seems like basically a way for you to hook on
to PCI bridges that happen to be Thunderbolt controllers.  I don't
think you really use any portdrv services (other than forwarding the
PM ops down to you, which a regular PCI device driver would get for
free).

upstream.c does a lot of ACPI stuff; I can't tell whether it has more
affinity with ACPI or with PCI.  I don't see any PNP IDs though, so I
guess you just look for the magic method names in the ACPI device
associated with some PCI device.  That seems a little bit "back-door"
to me; from an ASL point of view, I would think you'd want to start
from a _HID and interpret the device based on that.

> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pcie/portdrv.h      | 2 +-
>  drivers/pci/pcie/portdrv_core.c | 2 ++
>  include/linux/pcieport_if.h     | 2 ++
>  3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 587aef3..a0d9973 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -11,7 +11,7 @@
>  
>  #include <linux/compiler.h>
>  
> -#define PCIE_PORT_DEVICE_MAXSERVICES   5
> +#define PCIE_PORT_DEVICE_MAXSERVICES	6
>  /*
>   * According to the PCI Express Base Specification 2.0, the indices of
>   * the MSI-X table entries used by port services must not exceed 31
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index d04fb58..8cd9db8 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -310,6 +310,8 @@ static int get_port_device_capability(struct pci_dev *dev)
>  	}
>  	if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC))
>  		services |= PCIE_PORT_SERVICE_DPC;
> +	if (dev->is_thunderbolt)
> +		services |= PCIE_PORT_SERVICE_TBT;
>  
>  	return services;
>  }
> diff --git a/include/linux/pcieport_if.h b/include/linux/pcieport_if.h
> index afcd130..d205bd6 100644
> --- a/include/linux/pcieport_if.h
> +++ b/include/linux/pcieport_if.h
> @@ -23,6 +23,8 @@
>  #define PCIE_PORT_SERVICE_VC		(1 << PCIE_PORT_SERVICE_VC_SHIFT)
>  #define PCIE_PORT_SERVICE_DPC_SHIFT	4	/* Downstream Port Containment */
>  #define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
> +#define PCIE_PORT_SERVICE_TBT_SHIFT	5	/* Thunderbolt */
> +#define PCIE_PORT_SERVICE_TBT		(1 << PCIE_PORT_SERVICE_TBT_SHIFT)
>  
>  struct pcie_device {
>  	int		irq;	    /* Service IRQ/MSI/MSI-X Vector */
> -- 
> 2.8.1
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold
  2016-05-13 11:15 ` [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold Lukas Wunner
@ 2016-06-17 22:52   ` Bjorn Helgaas
  2016-08-02 16:27     ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Bjorn Helgaas @ 2016-06-17 22:52 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, Andreas Noever, Mika Westerberg, Rafael J. Wysocki

[+cc Mika, Rafael]

On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> If a hotplug port is suspended to D3cold, its slot status register
> cannot be read.  If that hotplug port happens to share its IRQ with
> other devices, then whenever an interrupt occurs for one of these
> devices, a "no response from device" message is logged with level
> KERN_INFO.  Apart from this annoyance, CPU time is needlessly spent
> trying to read the slot status register even though we know in advance
> that it will fail.

I guess this is a pretty generic problem that could affect any device
that shares an IRQ.

I think I'll queue this on my pci/pm branch, since it seems closely
related to Mika's "PCI: Add runtime PM support for PCIe ports".

Did you check for the same issue in other likely places, e.g., AER,
PME, etc.?
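
The pattern I have in mind for any handler on a shared line is roughly
the following.  This is a userspace sketch with made-up names
(port_model, isr_model, the IRQ_*_M values), not code from pciehp, AER
or PME; it only shows the early return and that no register read is
attempted while the device is powered off.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's irqreturn_t values. */
enum irqreturn_model { IRQ_NONE_M, IRQ_HANDLED_M };

struct port_model {
	bool in_d3cold;			/* models current_state == PCI_D3cold */
	unsigned int slot_status;	/* pending event bits */
	unsigned int reads;		/* number of register reads attempted */
};

static enum irqreturn_model isr_model(struct port_model *port)
{
	/* A powered-off port cannot have raised the interrupt. */
	if (port->in_d3cold)
		return IRQ_NONE_M;

	port->reads++;			/* models reading the status register */
	if (!port->slot_status)
		return IRQ_NONE_M;	/* interrupt came from another device */

	port->slot_status = 0;		/* acknowledge the pending events */
	return IRQ_HANDLED_M;
}
```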

> On MacBook Pros introduced 2011 and 2012, the IRQ of a Thunderbolt
> hotplug port is unfortunately shared with a wireless card, an audio card
> and an SDXC controller.  When the Thunderbolt controller is powered
> down, the machine carries out at least one unneeded slot status register
> read for each wireless packet received and prints a corresponding error
> message to the system log.
> 
> The hotplug port's current_state will be D3cold when it's powered down,
> so ignore interrupts that occur during that power state.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 5c24e93..08e84d6 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -546,6 +546,10 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
>  	u8 present;
>  	bool link;
>  
> +	/* Interrupts cannot originate from a controller that's asleep */
> +	if (pdev->current_state == PCI_D3cold)
> +		return IRQ_NONE;
> +
>  	/*
>  	 * In order to guarantee that all interrupt events are
>  	 * serviced, we need to re-inspect Slot Status register after
> -- 
> 2.8.1
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (14 preceding siblings ...)
  2016-06-13 20:58 ` Bjorn Helgaas
@ 2016-07-07 15:02 ` Lukas Wunner
  2016-07-08  1:28   ` Rafael J. Wysocki
  15 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-07-07 15:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pci, linux-pm, Andreas Noever, Alan Stern, Huang Ying

Dear Rafael,

The honor of your presence is requested at the review of this series
posted May 13.

Bjorn has requested an ack from you on patches 9 and 10, so the series
is essentially blocked until you find the time to comment. It would
also be good if you could look at the PM-related patches 4, 5, 6 and 11.
They're all fairly small. You do not have to bother about the larger
thunderbolt patches at the end of the series:

[01/13]  https://patchwork.kernel.org/patch/9090411/
[02/13]  https://patchwork.kernel.org/patch/9090421/
[03/13]  https://patchwork.kernel.org/patch/9090451/
[04/13]  https://patchwork.kernel.org/patch/9090471/
[05/13]  https://patchwork.kernel.org/patch/9090491/
[06/13]  https://patchwork.kernel.org/patch/9090511/
[07/13]  https://patchwork.kernel.org/patch/9090531/
[08/13]  https://patchwork.kernel.org/patch/9090541/
[09/13]  https://patchwork.kernel.org/patch/9090621/
[10/13]  https://patchwork.kernel.org/patch/9090641/
[11/13]  https://patchwork.kernel.org/patch/9090651/
[12/13]  https://patchwork.kernel.org/patch/9090571/
[13/13]  https://patchwork.kernel.org/patch/9090591/

There are also still unanswered questions about the architecture
I've chosen in this series: I'm attaching to the upstream bridge
of the Thunderbolt controller as a port service to be able to
power it down when nothing is plugged in. This architecture is
a workaround for the fact that dev_pm_domain_set() cannot be
called after a device is bound to a driver.

I've explained this in detail in an e-mail to you on June 17:
http://www.spinics.net/lists/linux-pci/msg52120.html

Near the end of that e-mail are two questions:
(1) Would it be possible to allow dev_pm_domain_set() for already
    bound devices? (It would allow me to simplify this series
    considerably.)
(2) How should the PCI core deal with devices that can be suspended
    to D3cold but not by the platform? Is it correct to solve this
    with dev_pm_domain_set()? (As is currently done for Optimus GPUs.)
    Is it also okay to suspend/resume them in the driver runtime PM
    callbacks? (This requires patch [09/13] of my series to work
    properly.)

Your help answering those questions and/or reviewing this series
is greatly appreciated.

Thank you!

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-06-14 20:22       ` Bjorn Helgaas
  2016-06-15 18:40         ` Lukas Wunner
@ 2016-07-07 17:39         ` Andreas Noever
  2016-07-09  5:23           ` Greg KH
  1 sibling, 1 reply; 65+ messages in thread
From: Andreas Noever @ 2016-07-07 17:39 UTC (permalink / raw)
  To: Greg KH, Bjorn Helgaas
  Cc: Lukas Wunner, linux-pci, Linux PM list, Rafael J. Wysocki,
	Alan Stern, Huang Ying, linux-kernel

On Tue, Jun 14, 2016 at 10:22 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Tue, Jun 14, 2016 at 09:14:27PM +0200, Andreas Noever wrote:
>> On Tue, Jun 14, 2016 at 6:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > [+cc linux-kernel]
>> >
>> > On Sat, May 21, 2016 at 11:48:42AM +0200, Andreas Noever wrote:
>> >>
>> >> Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
>> >>
>> >> Tested on MacBookPro10,1
>> >>
>> >> On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> >> > This series powers Thunderbolt controllers on Macs down when nothing is
>> >> > plugged in, saving 1.7 W on machines with a Light Ridge controller and
>> >> > reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
>> >> >
>> >> > Briefly, a custom ACPI method provided by Apple is used to cut power to
>> >> > the controller.  A GPE is enabled while the controller is powered down
>> >> > which side-band signals a plug event, whereupon power is reinstated using
>> >> > the ACPI method.  Note that even though this mechanism is ACPI-based,
>> >> > it does not use _PSx methods and is thus entirely nonstandard.
>> >
>> > I think the current arrangement was that Andreas would ack Thunderbolt
>> > patches and I would merge them via the PCI tree.  That makes some sense
>> > because Thunderbolt and PCIe are related, but the more I think about
>> > it, the less I'm happy with it.
>> >
>> > This series is a good example.  I'm sure it's good work and
>> > worthwhile.  But I can't really say anything about the content of it
>> > because most of it is Thunderbolt-specific and there's no public spec.
>> > It seems like this is basically a collection of reverse-engineered
>> > quirks that happen to work with the current state of Linux PM on
>> > certain Macs.  We don't know what might change on future Macs.  We
>> > don't know what might break when we make changes to Linux PM.
>> >
>> > I can't test this series, nor do I want to.  I can't test most of the
>> > patches I merge, but I can at least read the spec and see whether the
>> > patches make sense.  What I would *like* is to have public Thunderbolt
>> > specs and a kernel developer's guide so we know what to expect from
>> > the hardware and the firmware and we can write code that should work
>> > not just on current Macs, but also on non-Macs and future Macs.
>> >
>> > I don't think the current situation is really maintainable, and I'm
>> > not comfortable merging code that I can't maintain.
>> Most of the code is contained within the thunderbolt driver. I think
>> there is quite some precedent for reverse engineered drivers without
>> specs being part of the kernel. My understanding was that, since I am
>> listed in MAINTAINERS, I am responsible for the driver. Now our
>> changes often need improvements to the pci core, which is why I think
>> merging through your tree is a good idea (without transferring
>> responsibility). The changes to the drivers/pci should be supported by
>> the PCI-spec and make sense without knowing about thunderbolt (but it
>> might be the case that thunderbolt is the only user of these
>> features).
>>
>> Specifically for this series we want to:
>>  - whitelist thunderbolt bridges for PM. Detecting those bridges is
>> non-standard but I think this is acceptable, since this
>> blacklist/whitelist is basically a quirk.
>>  - Load our portdrv on tb bridges. PCI just sees another portdriver
>> and all the reverse engineered magic lives inside the driver.
>>  - Forward more PM callbacks to portdrivers (not tb specific)
>>  - hotplug D3cold fixes: resume around board_added/remove_board,
>> ignore interrupts in d3cold (not tb specific and probably a general
>> bugfix)
>>  - Make pci not fail if bridges have been put into D3cold by some
>> external mechanism.
>>
>> So maybe you could review the pci changes as a solution to the problem
>> "we want to load a custom portdriver which can put bridges into d3cold
>> in a device specific way". We certainly do not expect you to take
>> responsibility for the thunderbolt driver.
>
> That's a fine solution as far as I'm personally concerned.  I think
> it's poor for Linux overall, because I think it's fragile, and it's
> disappointing that a technology as important as Thunderbolt is so
> poorly supported by the promulgators.  But if you're willing to work
> in that environment, that's great.
>
> You maintain the thunderbolt code and merge changes, and I'll review
> the pieces that touch drivers/pci.  I do have a couple comments on
> those pieces, but I don't think they'll be major.
>
> I just want to get out of the business of merging drivers/thunderbolt
> code that I can't maintain.

[+ Greg]

Hi Greg,

do you mind if we revert to the old scheme and merge TB changes
through your tree?

Regards,
Andreas


> Bjorn

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-07-07 15:02 ` Lukas Wunner
@ 2016-07-08  1:28   ` Rafael J. Wysocki
  2016-07-20  7:23     ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-07-08  1:28 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, Andreas Noever,
	Alan Stern, Huang Ying

On Thu, Jul 7, 2016 at 5:02 PM, Lukas Wunner <lukas@wunner.de> wrote:
> Dear Rafael,
>
> The honor of your presence is requested at the review of this series
> posted May 13.
>
> Bjorn has requested an ack from you on patches 9 and 10, so the series
> is essentially blocked until you find the time to comment. It would
> also be good if you could look at the PM-related patches 4, 5, 6 and 11.
> They're all fairly small. You do not have to bother about the larger
> thunderbolt patches at the end of the series:

Sorry for being the bottleneck here.

This has been in my todo list all the time, but I get pulled away from
it on a regular basis due to regressions and similar.

> [01/13]  https://patchwork.kernel.org/patch/9090411/
> [02/13]  https://patchwork.kernel.org/patch/9090421/
> [03/13]  https://patchwork.kernel.org/patch/9090451/
> [04/13]  https://patchwork.kernel.org/patch/9090471/
> [05/13]  https://patchwork.kernel.org/patch/9090491/
> [06/13]  https://patchwork.kernel.org/patch/9090511/
> [07/13]  https://patchwork.kernel.org/patch/9090531/
> [08/13]  https://patchwork.kernel.org/patch/9090541/
> [09/13]  https://patchwork.kernel.org/patch/9090621/
> [10/13]  https://patchwork.kernel.org/patch/9090641/
> [11/13]  https://patchwork.kernel.org/patch/9090651/
> [12/13]  https://patchwork.kernel.org/patch/9090571/
> [13/13]  https://patchwork.kernel.org/patch/9090591/
>
> There are also still unanswered questions about the architecture
> I've chosen in this series: I'm attaching to the upstream bridge
> of the Thunderbolt controller as a port service to be able to
> power it down when nothing is plugged in. This architecture is
> a workaround for the fact that dev_pm_domain_set() cannot be
> called after a device is bound to a driver.
>
> I've explained this in detail in an e-mail to you on June 17:
> http://www.spinics.net/lists/linux-pci/msg52120.html

I've read this, but I still don't quite understand the problem to be
honest.  I'll have another look.

> Near the end of that e-mail are two questions:
> (1) Would it be possible to allow dev_pm_domain_set() for already
>     bound devices? (It would allow me to simplify this series
>     considerably.)

I don't think so, because setting a PM domain generally changes the
set of PM callbacks for the device and it may not be safe to call it
after the driver has been bound.
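
To make the concern concrete, here is a toy model of a setter that
refuses to swap the PM domain under a bound driver.  All names
(pm_domain_model, device_model, pm_domain_set_model) are hypothetical
and the -1 return is purely illustrative; how the driver core actually
reacts to such a call is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PM domain: a set of callbacks that override the driver's. */
struct pm_domain_model {
	int (*runtime_suspend)(void);
	int (*runtime_resume)(void);
};

struct device_model {
	bool bound;				/* a driver is attached */
	const struct pm_domain_model *pm_domain;
};

/* Returns 0 on success, -1 if the change is refused (illustrative only). */
static int pm_domain_set_model(struct device_model *dev,
			       const struct pm_domain_model *pd)
{
	if (dev->pm_domain == pd)
		return 0;		/* nothing to change */

	/* A bound driver may already rely on the current callbacks. */
	if (pd && dev->bound)
		return -1;

	dev->pm_domain = pd;
	return 0;
}
```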

> (2) How should the PCI core deal with devices that can be suspended
>     to D3cold but not by the platform? Is it correct to solve this
>     with dev_pm_domain_set()? (As is currently done for Optimus GPUs.)
>     Is it also okay to suspend/resume them in the driver runtime PM
>     callbacks? (This requires patch [09/13] of my series to work
>     properly.)
>
> Your help answering those questions and/or reviewing this series
> is greatly appreciated.
>
> Thank you!

Well, honestly, you may be underestimating the amount of time needed
for me to understand the problem you're trying to solve.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-07-07 17:39         ` Andreas Noever
@ 2016-07-09  5:23           ` Greg KH
  2016-07-12 21:46             ` Andreas Noever
  0 siblings, 1 reply; 65+ messages in thread
From: Greg KH @ 2016-07-09  5:23 UTC (permalink / raw)
  To: Andreas Noever
  Cc: Bjorn Helgaas, Lukas Wunner, linux-pci, Linux PM list,
	Rafael J. Wysocki, Alan Stern, Huang Ying, linux-kernel

On Thu, Jul 07, 2016 at 07:39:12PM +0200, Andreas Noever wrote:
> On Tue, Jun 14, 2016 at 10:22 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Jun 14, 2016 at 09:14:27PM +0200, Andreas Noever wrote:
> >> On Tue, Jun 14, 2016 at 6:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> > [+cc linux-kernel]
> >> >
> >> > On Sat, May 21, 2016 at 11:48:42AM +0200, Andreas Noever wrote:
> >> >>
> >> >> Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
> >> >>
> >> >> Tested on MacBookPro10,1
> >> >>
> >> >> On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
> >> >> > This series powers Thunderbolt controllers on Macs down when nothing is
> >> >> > plugged in, saving 1.7 W on machines with a Light Ridge controller and
> >> >> > reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
> >> >> >
> >> >> > Briefly, a custom ACPI method provided by Apple is used to cut power to
> >> >> > the controller.  A GPE is enabled while the controller is powered down
> >> >> > which side-band signals a plug event, whereupon power is reinstated using
> >> >> > the ACPI method.  Note that even though this mechanism is ACPI-based,
> >> >> > it does not use _PSx methods and is thus entirely nonstandard.
> >> >
> >> > I think the current arrangement was that Andreas would ack Thunderbolt
> >> > patches and I would merge them via the PCI tree.  That makes some sense
> >> > because Thunderbolt and PCIe are related, but the more I think about
> >> > it, the less I'm happy with it.
> >> >
> >> > This series is a good example.  I'm sure it's good work and
> >> > worthwhile.  But I can't really say anything about the content of it
> >> > because most of it is Thunderbolt-specific and there's no public spec.
> >> > It seems like this is basically a collection of reverse-engineered
> >> > quirks that happen to work with the current state of Linux PM on
> >> > certain Macs.  We don't know what might change on future Macs.  We
> >> > don't know what might break when we make changes to Linux PM.
> >> >
> >> > I can't test this series, nor do I want to.  I can't test most of the
> >> > patches I merge, but I can at least read the spec and see whether the
> >> > patches make sense.  What I would *like* is to have public Thunderbolt
> >> > specs and a kernel developer's guide so we know what to expect from
> >> > the hardware and the firmware and we can write code that should work
> >> > not just on current Macs, but also on non-Macs and future Macs.
> >> >
> >> > I don't think the current situation is really maintainable, and I'm
> >> > not comfortable merging code that I can't maintain.
> >> Most of the code is contained within the thunderbolt driver. I think
> >> there is quite some precedent for reverse engineered drivers without
> >> specs being part of the kernel. My understanding was that, since I am
> >> listed in MAINTAINERS, I am responsible for the driver. Now our
> >> changes often need improvements to the pci core, which is why I think
> >> merging through your tree is a good idea (without transferring
> >> responsibility). The changes to the drivers/pci should be supported by
> >> the PCI-spec and make sense without knowing about thunderbolt (but it
> >> might be the case that thunderbolt is the only user of these
> >> features).
> >>
> >> Specifically for this series we want to:
> >>  - whitelist thunderbolt bridges for PM. Detecting those bridges is
> >> non-standard but I think this is acceptable, since this
> >> blacklist/whitelist is basically a quirk.
> >>  - Load our portdrv on tb bridges. PCI just sees another portdriver
> >> and all the reverse engineered magic lives inside the driver.
> >>  - Forward more PM callbacks to portdrivers (not tb specific)
> >>  - hotplug D3cold fixes: resume around board_added/remove_board,
> >> ignore interrupts in d3cold (not tb specific and probably a general
> >> bugfix)
> >>  - Make pci not fail if bridges have been put into D3cold by some
> >> external mechanism.
> >>
> >> So maybe you could review the pci changes as a solution to the problem
> >> "we want to load a custom portdriver which can put bridges into d3cold
> >> in a device specific way". We certainly do not expect you to take
> >> responsibility for the thunderbolt driver.
> >
> > That's a fine solution as far as I'm personally concerned.  I think
> > it's poor for Linux overall, because I think it's fragile, and it's
> > disappointing that a technology as important as Thunderbolt is so
> > poorly supported by the promulgators.  But if you're willing to work
> > in that environment, that's great.
> >
> > You maintain the thunderbolt code and merge changes, and I'll review
> > the pieces that touch drivers/pci.  I do have a couple comments on
> > those pieces, but I don't think they'll be major.
> >
> > I just want to get out of the business of merging drivers/thunderbolt
> > code that I can't maintain.
> 
> [+ Greg]
> 
> Hi Greg,
> 
> do you mind if we revert to the old scheme and merge TB changes
> through your tree?

I will be glad to take them, feel free to send them on to me.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-07-09  5:23           ` Greg KH
@ 2016-07-12 21:46             ` Andreas Noever
  0 siblings, 0 replies; 65+ messages in thread
From: Andreas Noever @ 2016-07-12 21:46 UTC (permalink / raw)
  To: Greg KH
  Cc: Bjorn Helgaas, Lukas Wunner, linux-pci, Linux PM list,
	Rafael J. Wysocki, Alan Stern, Huang Ying, linux-kernel

On Sat, Jul 9, 2016 at 7:23 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Jul 07, 2016 at 07:39:12PM +0200, Andreas Noever wrote:
>> On Tue, Jun 14, 2016 at 10:22 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Tue, Jun 14, 2016 at 09:14:27PM +0200, Andreas Noever wrote:
>> >> On Tue, Jun 14, 2016 at 6:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> >> > [+cc linux-kernel]
>> >> >
>> >> > On Sat, May 21, 2016 at 11:48:42AM +0200, Andreas Noever wrote:
>> >> >>
>> >> >> Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
>> >> >>
>> >> >> Tested on MacBookPro10,1
>> >> >>
>> >> >> On Fri, May 13, 2016 at 1:15 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> >> >> > This series powers Thunderbolt controllers on Macs down when nothing is
>> >> >> > plugged in, saving 1.7 W on machines with a Light Ridge controller and
>> >> >> > reportedly 4 W on Cactus Ridge 4C and Falcon Ridge 4C.
>> >> >> >
>> >> >> > Briefly, a custom ACPI method provided by Apple is used to cut power to
>> >> >> > the controller.  A GPE is enabled while the controller is powered down
>> >> >> > which side-band signals a plug event, whereupon power is reinstated using
>> >> >> > the ACPI method.  Note that even though this mechanism is ACPI-based,
>> >> >> > it does not use _PSx methods and is thus entirely nonstandard.
>> >> >
>> >> > I think the current arrangement was that Andreas would ack Thunderbolt
>> >> > patches and I would merge them via the PCI tree.  That makes some sense
>> >> > because Thunderbolt and PCIe are related, but the more I think about
>> >> > it, the less I'm happy with it.
>> >> >
>> >> > This series is a good example.  I'm sure it's good work and
>> >> > worthwhile.  But I can't really say anything about the content of it
>> >> > because most of it is Thunderbolt-specific and there's no public spec.
>> >> > It seems like this is basically a collection of reverse-engineered
>> >> > quirks that happen to work with the current state of Linux PM on
>> >> > certain Macs.  We don't know what might change on future Macs.  We
>> >> > don't know what might break when we make changes to Linux PM.
>> >> >
>> >> > I can't test this series, nor do I want to.  I can't test most of the
>> >> > patches I merge, but I can at least read the spec and see whether the
>> >> > patches make sense.  What I would *like* is to have public Thunderbolt
>> >> > specs and a kernel developer's guide so we know what to expect from
>> >> > the hardware and the firmware and we can write code that should work
>> >> > not just on current Macs, but also on non-Macs and future Macs.
>> >> >
>> >> > I don't think the current situation is really maintainable, and I'm
>> >> > not comfortable merging code that I can't maintain.
>> >> Most of the code is contained within the thunderbolt driver. I think
>> >> there is quite some precedent for reverse engineered drivers without
>> >> specs being part of the kernel. My understanding was that, since I am
>> >> listed in MAINTAINERS, I am responsible for the driver. Now our
>> >> changes often need improvements to the pci core, which is why I think
>> >> merging through your tree is a good idea (without transferring
>> >> responsibility). The changes to the drivers/pci should be supported by
>> >> the PCI-spec and make sense without knowing about thunderbolt (but it
>> >> might be the case that thunderbolt is the only user of these
>> >> features).
>> >>
>> >> Specifically for this series we want to:
>> >>  - whitelist thunderbolt bridges for PM. Detecting those bridges is
>> >> non-standard but I think this is acceptable, since this
>> >> blacklist/whitelist is basically a quirk.
>> >>  - Load our portdrv on tb bridges. PCI just sees another portdriver
>> >> and all the reverse engineered magic lives inside the driver.
>> >>  - Forward more PM callbacks to portdrivers (not tb specific)
>> >>  - hotplug D3cold fixes: resume around board_added/remove_board,
>> >> ignore interrupts in d3cold (not tb specific and probably a general
>> >> bugfix)
>> >>  - Make pci not fail if bridges have been put into D3cold by some
>> >> external mechanism.
>> >>
>> >> So maybe you could review the pci changes as a solution to the problem
>> >> "we want to load a custom portdriver which can put bridges into d3cold
>> >> in a device specific way". We certainly do not expect you to take
>> >> responsibility for the thunderbolt driver.
>> >
>> > That's a fine solution as far as I'm personally concerned.  I think
>> > it's poor for Linux overall, because I think it's fragile, and it's
>> > disappointing that a technology as important as Thunderbolt is so
>> > poorly supported by the promulgators.  But if you're willing to work
>> > in that environment, that's great.
>> >
>> > You maintain the thunderbolt code and merge changes, and I'll review
>> > the pieces that touch drivers/pci.  I do have a couple comments on
>> > those pieces, but I don't think they'll be major.
>> >
>> > I just want to get out of the business of merging drivers/thunderbolt
>> > code that I can't maintain.
>>
>> [+ Greg]
>>
>> Hi Greg,
>>
>> do you mind if we revert to the old scheme and merge TB changes
>> through your tree?
>
> I will be glad to take them, feel free to send them on to me.

Thanks!
Andreas

> thanks,
>
> greg k-h


* Re: [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete
  2016-05-13 11:15 ` [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete Lukas Wunner
@ 2016-07-18 13:18   ` Rafael J. Wysocki
  2016-08-07  9:56     ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-07-18 13:18 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki, Alan Stern

On Friday, May 13, 2016 01:15:31 PM Lukas Wunner wrote:
> Since commit aae4518b3124 ("PM / sleep: Mechanism to avoid resuming
> runtime-suspended devices unnecessarily"), we no longer wake up devices
> which are already runtime suspended upon entering system sleep
> ("direct-complete").
> 
> However commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might
> have been reset by firmware") changed this to mandatorily runtime resume
> such devices after the system is woken.  The motivation was to ensure
> that devices do not remain in a reset-power-on state after system
> resume, potentially preventing deep SoC-wide low-power states from being
> entered on idle.
> 
> This is counter-productive for devices of which we know that the
> mandatory runtime resume is unnecessary.  Thunderbolt on the Mac is a
> case in point: Runtime resume not just powers up the controller, but
> multiple adjacent chips, including a 15V boost converter, multiplexers
> and an eeprom.  Gratuitously powering this up after every system sleep
> burns a not insignificant amount of energy and needlessly strains the
> hardware.
> 
> Perhaps it would have been better to carry out the mandatory runtime
> resume only for those devices that actually need it, but at least we
> should allow an opt-out.
> 
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

I don't like this patch and especially adding a new dev_pm_ops flag to
work around something that you're seeing as an issue in the generic ops.

It is sort of like saying "the generic ops don't work for me, so modify
them as well as struct dev_pm_ops", but maybe it's better to change the
PCI bus type to do something different from calling the generic function?

Or you can add a ->complete callback to your driver that will clear
power.direct_complete for the device in question.
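
To illustrate the second suggestion, here is a minimal userspace sketch, not kernel code.  All model_* names are hypothetical stand-ins, and it assumes the generic function invokes the driver's ->complete callback before it tests power.direct_complete:

```c
#include <stdbool.h>

/* Userspace stand-ins for kernel structures; not the real definitions. */
struct model_pm_info {
	bool direct_complete;	/* device stayed suspended across system sleep */
	int resume_requests;	/* counts simulated pm_request_resume() calls */
};

struct model_device {
	struct model_pm_info power;
	/* hypothetical driver ->complete hook, may be unset */
	void (*complete)(struct model_device *dev);
};

/* Models the flag test quoted from pm_complete_with_resume_check(). */
void model_resume_check(struct model_device *dev, bool via_firmware)
{
	if (dev->power.direct_complete && via_firmware)
		dev->power.resume_requests++;
}

/* Hypothetical driver callback: opt out by clearing the flag. */
void model_tb_complete(struct model_device *dev)
{
	dev->power.direct_complete = false;
}

/* Assumed ordering: driver callback first, then the generic check. */
void model_bus_complete(struct model_device *dev, bool via_firmware)
{
	if (dev->complete)
		dev->complete(dev);
	model_resume_check(dev, via_firmware);
}
```

With the callback installed, the model issues no resume request; without it, one request is issued per firmware-assisted resume.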

> ---
>  drivers/base/power/generic_ops.c | 3 ++-
>  include/linux/pm.h               | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
> index 07c3c4a..6e88f55 100644
> --- a/drivers/base/power/generic_ops.c
> +++ b/drivers/base/power/generic_ops.c
> @@ -316,7 +316,8 @@ void pm_complete_with_resume_check(struct device *dev)
>  	 * the sleep state it is going out of and it has never been resumed till
>  	 * now, resume it in case the firmware powered it up.
>  	 */
> -	if (dev->power.direct_complete && pm_resume_via_firmware())
> +	if (dev->power.direct_complete && pm_resume_via_firmware() &&
> +	    !dev->power.direct_complete_noresume)
>  		pm_request_resume(dev);
>  }
>  EXPORT_SYMBOL_GPL(pm_complete_with_resume_check);
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 6a5d654..023de94 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -596,6 +596,7 @@ struct dev_pm_info {
>  	unsigned int		use_autosuspend:1;
>  	unsigned int		timer_autosuspends:1;
>  	unsigned int		memalloc_noio:1;
> +	unsigned int		direct_complete_noresume:1;
>  	enum rpm_request	request;
>  	enum rpm_status		runtime_status;
>  	int			runtime_error;
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-06-17 22:14     ` Lukas Wunner
@ 2016-07-18 13:39       ` Rafael J. Wysocki
  2016-08-03 12:28         ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-07-18 13:39 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki

On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
> On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > > There are devices wich are not power-managed by the platform, yet can be
> > 
> > s/wich/which/
> 
> Oops.
> 
> > 
> > > runtime suspended to D3cold with some other mechanism.  When putting the
> > > system to sleep, we currently handle such devices improperly by trying
> > > to transition them from D3cold to D3hot (the default power state defined
> > > at the beginning of pci_target_state()).  Avoid that.
> > > 
> > > An example for devices affected by this are Thunderbolt controllers
> > > built into Macs which can be put into D3cold with nonstandard ACPI
> > > methods.
> > > 
> > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > 
> > This needs an ack from Rafael.
> > 
> > Naive question: why is the default target_state PCI_D3hot?
> 
> No idea. :-)

Because D3_hot is the deepest state you can *program* the device to go into
unless the platform can cut off power from it.

> > 
> > > ---
> > >  drivers/pci/pci.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 791dfe7..6af9911 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
> > >  			      && !(dev->pme_support & (1 << target_state)))
> > >  				target_state--;
> > >  		}
> > > +	} else if (dev->current_state == PCI_D3cold) {
> > > +		target_state = PCI_D3cold;
> > >  	}
> > 
> > This only covers the case of !device_may_wakeup().  So I guess
> > device_may_wakeup() is false for these Thunderbolt controllers.
> 
> Correct. device_may_wakeup() is defined in include/linux/pm_wakeup.h as:
> dev->power.can_wakeup && !!dev->power.wakeup
> 
> The first one, dev->power.can_wakeup is true because PME is claimed to be
> supported for all power states in the PMC register, so pci_pm_init() calls
> device_set_wakeup_capable(&dev->dev, true):
>   Capabilities: [80] Power Management version 3
>     Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>     Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> 
> The second one, dev->power.wakeup is false because device_wakeup_enable()
> is never called.
> 
> 
> > Is there a reason you don't want to do this check for devices that
> > may wakeup?
> 
> Fear of breaking things. It would mean that a device would be left in
> D3cold even though it may not be able to signal wakeup from that power
> state.

Then it should not be put into D3_cold at run time too if it is wakeup-capable.

> That's a change of behaviour the consequences of which I cannot
> estimate. Intuitively, I would expect breakage from such a change.

That would have been the case if the device had been capable of signaling
wakeup from D3_cold at run time, but not from system sleep.  However, that
can only happen when platform_pci_power_manageable() is true AFAICS.

So I'd change the switch () under the platform_pci_power_manageable() check to
return "state" in the default case and then do

	return dev->current_state < target_state ? target_state : dev->current_state;

at the end of the function.
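
The effect of that final return statement can be sketched in plain C.  This is a userspace model only: the enum is a hypothetical stand-in for pci_power_t, and it assumes the states are ordered from D0 (shallowest) to D3cold (deepest):

```c
/* Hypothetical stand-in for pci_power_t, ordered shallow to deep. */
enum model_pci_power {
	MODEL_D0,
	MODEL_D1,
	MODEL_D2,
	MODEL_D3HOT,
	MODEL_D3COLD,
};

/*
 * Never pick a state shallower than the one the device is already in:
 * return the deeper of the current state and the computed target.
 */
enum model_pci_power model_pick_state(enum model_pci_power current_state,
				      enum model_pci_power target_state)
{
	return current_state < target_state ? target_state : current_state;
}
```

A device already in D3cold therefore stays there even when the computed target is D3hot, while an active device still goes to the computed target.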

Thanks,
Rafael


* Re: [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold
  2016-05-13 11:15 ` [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold Lukas Wunner
  2016-06-17 21:18   ` Bjorn Helgaas
@ 2016-07-18 13:55   ` Rafael J. Wysocki
  1 sibling, 0 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-07-18 13:55 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, Andreas Noever, Huang Ying, Rafael J. Wysocki

On Friday, May 13, 2016 01:15:31 PM Lukas Wunner wrote:
> The PM control register is not accessible in D3cold so we shouldn't try
> writing to it in pci_raw_set_power_state() and return an error instead.
> 
> The current behaviour is fatal for devices which are not power-managed
> by the platform, yet can be runtime suspended to D3cold with some other
> mechanism by the driver:
> 
> - When the device is runtime resumed, pci_pm_runtime_resume() first
>   calls pci_restore_standard_config() which calls pci_set_power_state()
>   which calls pci_raw_set_power_state() to put the device into D0.
>   This fails since the device is still in D3cold.  It will be powered up
>   later on when pci_pm_runtime_resume() calls the driver's
>   ->runtime_resume callback.
> 
> - pci_raw_set_power_state() prints a message that the device refused to
>   change power state and returns 0.  Further up in the call stack,
>   pci_restore_standard_config() calls pci_restore_state(), which fails
>   since the device is in D3cold, but nevertheless invalidates the
>   saved_state.
> 
> - When pci_pm_runtime_resume() finally calls the driver ->runtime_resume
>   callback to power up the device, the saved_state is gone.
> 
> Return an error from pci_raw_set_power_state() to avoid this.
> 
> An example for devices affected by this are Thunderbolt controllers
> built into Macs which can be put into D3cold with nonstandard ACPI
> methods.
> 
> Unfortunately we rely on pci_raw_set_power_state()'s current behaviour
> in several places: When platform_pci_set_power_state() is called to wake
> a device from D3cold, its current_state is not updated even though it is
> no longer in D3cold.  Instead, pci_raw_set_power_state() is assumed to
> clean up the resulting incongruence.  Fix by setting current_state to
> PCI_UNKNOWN after platform_pci_set_power_state().
> 
> Also, when a bridge is put into D3cold, its children's current_state is
> changed to D3cold in __pci_complete_power_transition().  (Introduced by
> commit 448bd857d48e ("PCI/PM: add PCIe runtime D3cold support").) This
> doesn't necessarily reflect the children's actual power state: They may
> still be powered on, they're just no longer accessible.  However this
> shouldn't be a concern because if the children are accessed, their
> parent needs to resume anyway and the PM core takes care of this.
> Again, pci_raw_set_power_state() is relied upon to clean up the
> current_state when the children are resumed the next time.  We cannot
> reliably reconstruct the children's current_state when resuming their
> parent.  We also shouldn't blindly set it to PCI_UNKNOWN since some
> children may actually be turned off and D3cold is their correct
> current_state.  Therefore fix by no longer touching the children's
> current_state at all.
> 
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci.c | 43 ++++++++++---------------------------------
>  1 file changed, 10 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 95727b4..791dfe7 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -612,6 +612,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
>  	if (!dev->pm_cap)
>  		return -EIO;
>  
> +	if (dev->current_state == PCI_D3cold)
> +		return -EIO;
> +

I can agree with this.

>  	if (state < PCI_D0 || state > PCI_D3hot)
>  		return -EINVAL;
>  
> @@ -728,8 +731,10 @@ void pci_update_current_state(struct pci_dev *dev, pci_power_t state)
>   */
>  void pci_power_up(struct pci_dev *dev)
>  {
> -	if (platform_pci_power_manageable(dev))
> +	if (platform_pci_power_manageable(dev)) {
>  		platform_pci_set_power_state(dev, PCI_D0);
> +		dev->current_state = PCI_UNKNOWN;

Why is this necessary?

> +	}
>  
>  	pci_raw_set_power_state(dev, PCI_D0);
>  	pci_update_current_state(dev, PCI_D0);
> @@ -746,8 +751,10 @@ static int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state)
>  
>  	if (platform_pci_power_manageable(dev)) {
>  		error = platform_pci_set_power_state(dev, state);
> -		if (!error)
> +		if (!error) {
> +			dev->current_state = PCI_UNKNOWN;

Again, why is this necessary?

>  			pci_update_current_state(dev, state);
> +		}
>  	} else
>  		error = -ENODEV;
>  
> @@ -809,30 +816,6 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
>  }
>  
>  /**
> - * __pci_dev_set_current_state - Set current state of a PCI device
> - * @dev: Device to handle
> - * @data: pointer to state to be set
> - */
> -static int __pci_dev_set_current_state(struct pci_dev *dev, void *data)
> -{
> -	pci_power_t state = *(pci_power_t *)data;
> -
> -	dev->current_state = state;
> -	return 0;
> -}
> -
> -/**
> - * __pci_bus_set_current_state - Walk given bus and set current state of devices
> - * @bus: Top bus of the subtree to walk.
> - * @state: state to be set
> - */
> -static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
> -{
> -	if (bus)
> -		pci_walk_bus(bus, __pci_dev_set_current_state, &state);
> -}
> -
> -/**
>   * __pci_complete_power_transition - Complete power transition of a PCI device
>   * @dev: PCI device to handle.
>   * @state: State to put the device into.
> @@ -841,15 +824,9 @@ static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
>   */
>  int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state)
>  {
> -	int ret;
> -
>  	if (state <= PCI_D0)
>  		return -EINVAL;
> -	ret = pci_platform_power_transition(dev, state);
> -	/* Power off the bridge may power off the whole hierarchy */
> -	if (!ret && state == PCI_D3cold)
> -		__pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
> -	return ret;
> +	return pci_platform_power_transition(dev, state);
>  }
>  EXPORT_SYMBOL_GPL(__pci_complete_power_transition);

What about if powering off the bridge does remove power from the hierarchy
below?

Thanks,
Rafael



* Re: [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type
  2016-06-17 22:51   ` Bjorn Helgaas
@ 2016-07-20  0:30     ` Rafael J. Wysocki
  2016-07-20  6:59     ` Lukas Wunner
  1 sibling, 0 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-07-20  0:30 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner; +Cc: linux-pci, linux-pm, Andreas Noever

On Friday, June 17, 2016 05:51:52 PM Bjorn Helgaas wrote:
> On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > A Thunderbolt controller is a PCIe switch which, as defined in the PCIe
> > spec, appears to the OS "as a collection of virtual PCI-to-PCI bridges".
> > 
> > We're about to add support for Apple's nonstandard ACPI methods to power
> > Thunderbolt controllers up and down.  To facilitate that, allocate a
> > port service for every PCI bridge belonging to a Thunderbolt controller.
> > 
> > This port service might come in handy for other use cases, e.g. device
> > initialization of Thunderbolt controllers.
> > 
> > To understand when and how this port service will be allocated, consider
> > the PCI devices exposed by a Thunderbolt host controller:
> > 
> >   (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> >                                      +-- Downstream Bridge 1 --
> >                                      +-- Downstream Bridge 2 --
> >                                      ...
> > 
> > The upstream and downstream bridges represent the PCIe switch and a
> > Thunderbolt port service will be allocated for each of them.  Hotplugged
> > devices will appear behind the downstream bridges.  The NHI (Native Host
> > Interface) is a virtual PCI device to manage the switch fabric and is
> > not relevant here.  It uses class 0x88000, so it is not a PCIe port.
> > 
> > Next, consider the PCI devices exposed by Thunderbolt controllers built
> > into hotplugged devices:
> > 
> >   -- Upstream Bridge ---- Downstream Bridge ---- Hotplugged device
> > 
> > Again, Thunderbolt port services will be allocated for the upstream and
> > downstream bridge, but not for the hotplugged device, which might use
> > e.g. class 0x20000 if it's a Thunderbolt Ethernet adapter.
> 
> I don't really *like* the portdrv infrastructure, even though we're
> sort of stuck with it now.  It seems like all it really does is allow
> multiple sub-drivers to attach to a single device and share interrupts
> between them.  And we get some extra devices in sysfs that don't fit
> the regular PCI model.  We used to support loadable sub-drivers
> (pciehp, aer, etc.), but we decided that didn't really make sense
> (though I notice you do support thunderbolt as a module).
> 
> I think we would be better off if the PCIe services (hotplug, AER,
> etc.) were directly integrated into the PCI core without the portdrv
> abstraction in the middle.  But anyway, we do have portdrv, and the
> only question here is whether extending it for Thunderbolt is the
> right thing.
> 
> So the question for Thunderbolt is what benefit you get from being a
> portdrv sub-driver.  It seems like basically a way for you to hook on
> to PCI bridges that happen to be Thunderbolt controllers.  I don't
> think you really use any portdrv services (other than forwarding the
> PM ops down to you, which a regular PCI device driver would get for
> free).

Moreover, this approach creates sort of a priority inversion between
the Thunderbolt driver and the ports, because the driver, which really
is a superior entity (as it corresponds to the switch as a whole) is
caused to bind to children of the ports (ie. PCIe service devices).

That leads to ordering issues in probe and then suspend/resume etc.

> upstream.c does a lot of ACPI stuff; I can't tell whether it has more
> affinity with ACPI or with PCI.  I don't see any PNP IDs though, so I
> guess you just look for the magic method names in the ACPI device
> associated with some PCI device.  That seems a little bit "back-door"
> to me; from an ASL point of view, I would think you'd want to start
> from a _HID and interpret the device based on that.

Or from _ADR if the device object in question maps to a PCI device.

Thanks,
Rafael


* Re: [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type
  2016-06-17 22:51   ` Bjorn Helgaas
  2016-07-20  0:30     ` Rafael J. Wysocki
@ 2016-07-20  6:59     ` Lukas Wunner
  1 sibling, 0 replies; 65+ messages in thread
From: Lukas Wunner @ 2016-07-20  6:59 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki

On Fri, Jun 17, 2016 at 05:51:52PM -0500, Bjorn Helgaas wrote:
> On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > A Thunderbolt controller is a PCIe switch which, as defined in the PCIe
> > spec, appears to the OS "as a collection of virtual PCI-to-PCI bridges".
> > 
> > We're about to add support for Apple's nonstandard ACPI methods to power
> > Thunderbolt controllers up and down.  To facilitate that, allocate a
> > port service for every PCI bridge belonging to a Thunderbolt controller.
> > 
> > This port service might come in handy for other use cases, e.g. device
> > initialization of Thunderbolt controllers.
> > 
> > To understand when and how this port service will be allocated, consider
> > the PCI devices exposed by a Thunderbolt host controller:
> > 
> >   (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
> >                                      +-- Downstream Bridge 1 --
> >                                      +-- Downstream Bridge 2 --
> >                                      ...
> > 
> > The upstream and downstream bridges represent the PCIe switch and a
> > Thunderbolt port service will be allocated for each of them.  Hotplugged
> > devices will appear behind the downstream bridges.  The NHI (Native Host
> > Interface) is a virtual PCI device to manage the switch fabric and is
> > not relevant here.  It uses class 0x88000, so it is not a PCIe port.
> > 
> > Next, consider the PCI devices exposed by Thunderbolt controllers built
> > into hotplugged devices:
> > 
> >   -- Upstream Bridge ---- Downstream Bridge ---- Hotplugged device
> > 
> > Again, Thunderbolt port services will be allocated for the upstream and
> > downstream bridge, but not for the hotplugged device, which might use
> > e.g. class 0x20000 if it's a Thunderbolt Ethernet adapter.
> 
> I don't really *like* the portdrv infrastructure, even though we're
> sort of stuck with it now.  It seems like all it really does is allow
> multiple sub-drivers to attach to a single device and share interrupts
> between them.  And we get some extra devices in sysfs that don't fit
> the regular PCI model.  We used to support loadable sub-drivers
> (pciehp, aer, etc.), but we decided that didn't really make sense
> (though I notice you do support thunderbolt as a module).
> 
> I think we would be better off if the PCIe services (hotplug, AER,
> etc.) were directly integrated into the PCI core without the portdrv
> abstraction in the middle.  But anyway, we do have portdrv, and the
> only question here is whether extending it for Thunderbolt is the
> right thing.
> 
> So the question for Thunderbolt is what benefit you get from being a
> portdrv sub-driver.  It seems like basically a way for you to hook on
> to PCI bridges that happen to be Thunderbolt controllers.  I don't
> think you really use any portdrv services (other than forwarding the
> PM ops down to you, which a regular PCI device driver would get for
> free).

The assessment above is entirely correct, I'm sort of abusing the
portdrv infrastructure as a way to bind to the upstream bridge.

For comparison, Optimus GPUs are also suspended to D3cold with a
non-standard method (i.e., not by the ACPI platform). The way we
handle that is to assign a dev_pm_domain to the device using
dev_pm_domain_set(). You can just think of "dev_pm_domain" as a
fancy name for overriding the callbacks in pci_dev_pm_ops:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/vga/vga_switcheroo.c#n1014

I cannot use that method with Thunderbolt because dev_pm_domain_set()
can only be called for unbound devices. And the upstream bridge will
already have been bound (to portdrv) when thunderbolt.ko loads.

I'm waiting for Rafael to weigh in if the dev_pm_domain_set() method
is the right thing to do for devices which are suspended to D3cold
in a non-standard way, and whether the "device not bound" restriction
on dev_pm_domain_set() can be lifted. If so, I could rework this
series to use that instead of binding to portdrv.


> upstream.c does a lot of ACPI stuff; I can't tell whether it has more
> affinity with ACPI or with PCI.  I don't see any PNP IDs though, so I
> guess you just look for the magic method names in the ACPI device
> associated with some PCI device.  That seems a little bit "back-door"
> to me; from an ASL point of view, I would think you'd want to start
> from a _HID and interpret the device based on that.

Apple's ACPI methods to power the controller up/down are located
below the NHI device in the namespace. I just use the ACPI_HANDLE()
macro to get from the NHI's PCI device to its ACPI companion's
handle, then find the methods below that. This avoids the need to
search the namespace for a _HID or _ADR:
https://github.com/l1k/linux/commit/65f56e6c8446#diff-66575f0946b607aa866a23518687f8b1R281

Best regards,

Lukas


* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-07-08  1:28   ` Rafael J. Wysocki
@ 2016-07-20  7:23     ` Lukas Wunner
  2016-07-20 12:48       ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-07-20  7:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PCI, Linux PM, Andreas Noever, Alan Stern, Huang Ying

On Fri, Jul 08, 2016 at 03:28:12AM +0200, Rafael J. Wysocki wrote:
> On Thu, Jul 7, 2016 at 5:02 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > (1) Would it be possible to allow dev_pm_domain_set() for already
> >     bound devices? (It would allow me to simplify this series
> >     considerably.)
> 
> I don't think so, because setting a PM domain generally changes the
> set of PM callbacks for the device and it may not be safe to call it
> after the driver has been bound.

That sounds more like a locking problem than anything else.

If the system is awake and the device is active, it would seem safe
to change its set of PM callbacks. Am I missing something?

How about checking in dev_pm_domain_set() if pm_runtime_active(dev)
and calling lock_system_sleep() / unlock_system_sleep() to ensure that?

Thanks,

Lukas


* Re: [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs
  2016-07-20  7:23     ` Lukas Wunner
@ 2016-07-20 12:48       ` Rafael J. Wysocki
  0 siblings, 0 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-07-20 12:48 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Linux PCI, Linux PM, Andreas Noever,
	Alan Stern, Huang Ying

On Wednesday, July 20, 2016 09:23:59 AM Lukas Wunner wrote:
> On Fri, Jul 08, 2016 at 03:28:12AM +0200, Rafael J. Wysocki wrote:
> > On Thu, Jul 7, 2016 at 5:02 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > (1) Would it be possible to allow dev_pm_domain_set() for already
> > >     bound devices? (It would allow me to simplify this series
> > >     considerably.)
> > 
> > I don't think so, because setting a PM domain generally changes the
> > set of PM callbacks for the device and it may not be safe to call it
> > after the driver has been bound.
> 
> That sounds more like a locking problem than anything else.
> 
> If the system is awake and the device is active, it would seem safe
> to change its set of PM callbacks. Am I missing something?
> 
> How about checking in dev_pm_domain_set() if pm_runtime_active(dev)

Realistically, you'd need to disable runtime PM too or at least bump up
the usage count.

> and calling lock_system_sleep() / unlock_system_sleep() to ensure that?

That might work.
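
Putting the two remarks together, the sequence might look roughly like the userspace sketch below.  It is only a model: every model_* name is hypothetical, the flag merely stands in for lock_system_sleep() / unlock_system_sleep(), and the counter stands in for the runtime PM usage count:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's dev_pm_domain or device. */
struct model_pm_domain {
	const char *name;
};

struct model_dev {
	bool runtime_active;	/* models pm_runtime_active(dev) */
	int usage_count;	/* models the runtime PM usage count */
	const struct model_pm_domain *pm_domain;
};

/* Models the system sleep lock being held. */
bool model_system_sleep_locked;

/* Swap the PM domain of a bound device only while it is pinned active. */
int model_set_domain_on_bound_dev(struct model_dev *dev,
				  const struct model_pm_domain *pd)
{
	int ret = -1;

	model_system_sleep_locked = true;	/* block system sleep */
	dev->usage_count++;			/* block runtime suspend */
	if (dev->runtime_active) {
		dev->pm_domain = pd;
		ret = 0;
	}
	dev->usage_count--;
	model_system_sleep_locked = false;
	return ret;
}
```

The model refuses the swap for a suspended device rather than waking it, which is one possible policy and not something settled in this thread.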

Thanks,
Rafael



* Re: [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold
  2016-06-17 22:52   ` Bjorn Helgaas
@ 2016-08-02 16:27     ` Lukas Wunner
  2016-08-05  0:29       ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-02 16:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-pm, Andreas Noever, Mika Westerberg, Rafael J. Wysocki

On Fri, Jun 17, 2016 at 05:52:04PM -0500, Bjorn Helgaas wrote:
> On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > If a hotplug port is suspended to D3cold, its slot status register
> > cannot be read.  If that hotplug port happens to share its IRQ with
> > other devices, then whenever an interrupt occurs for one of these
> > devices, a "no response from device" message is logged with level
> > KERN_INFO.  Apart from this annoyance, CPU time is needlessly spent
> > trying to read the slot status register even though we know in advance
> > that it will fail.
> 
> I guess this is a pretty generic problem that could affect any device
> that shares an IRQ.
> 
> I think I'll queue this on my pci/pm branch, since it seems closely
> related to Mika's "PCI: Add runtime PM support for PCIe ports".
> 
> Did you check for the same issue in other likely places, e.g., AER,
> PME, etc.?

Apologies for the delay, I've checked all other port services now:

- Our AER and PME drivers bind only to root ports and I can't imagine
  how those could go to D3cold, they're part of the root complex or
  PCH and I'm not aware of a chipset that would allow turning off the
  power well for individual PCIe ports.

- DPC on the other hand also binds to downstream ports. I do have
  downstream ports in my machine (as part of the Thunderbolt switch)
  but they do not have the DPC capability. I've never seen devices
  with that capability and cannot estimate what the chances are of them
  going to D3cold and sharing an IRQ with other devices. It's probably
  not worth preparing for such a situation without knowing its likelihood.

- VC: We allocate a port service for this but do not have a driver.

Bottom line is that the patch for the PCIe hotplug driver seems to be
sufficient.

FWIW, on my machine I see numerous devices with AER, PME and VC
capabilities. The Nvidia GPU as well as network, Firewire and
Thunderbolt controllers all have those. AFAICS we ignore them
because their specific drivers do not care for the capabilities
and portdrv only binds to root ports.

This seems to support your argument that the PCIe capabilities
should be handled by the core rather than portdrv, as we could
then make use of the capabilities on endpoint devices in a
universal manner.

On the other hand, I think we cannot use a separate MSI for
AER, PME et al, can we? If we cannot, then AER and PME would
share the IRQ with an endpoint device's regular interrupt handler,
and that might ruin performance. E.g. the Broadcom wireless card
generates millions of interrupts on a sufficiently active WiFi.
Accessing the device's config space on every interrupt just to
check for AER or PME seems like a bad idea. So at the very least
we'd need some kind of opt-out.
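
To make the cost argument concrete, here is a small userspace model of a handler on a shared interrupt line.  All names are hypothetical and no real config space is touched; the counter merely records how often the (expensive) status register would have been read:

```c
/* Hypothetical stand-ins for irqreturn_t values. */
enum model_irqreturn {
	MODEL_IRQ_NONE,
	MODEL_IRQ_HANDLED,
};

struct model_port {
	int in_d3cold;		/* cached power state, cheap to test */
	int event_pending;	/* what the status register would report */
	int status_reads;	/* number of simulated register reads */
};

/*
 * Handler on a shared line: bail out on the cached state before
 * touching the device, so interrupts raised by other devices on the
 * same line cost no register access while the port is powered down.
 */
enum model_irqreturn model_port_isr(struct model_port *port)
{
	if (port->in_d3cold)
		return MODEL_IRQ_NONE;

	port->status_reads++;
	if (!port->event_pending)
		return MODEL_IRQ_NONE;

	port->event_pending = 0;
	return MODEL_IRQ_HANDLED;
}
```

In this model a powered-down port adds only an in-memory test per foreign interrupt, whereas a powered-up port still pays one register read each time the line fires.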

Best regards,

Lukas

> > On MacBook Pros introduced 2011 and 2012, the IRQ of a Thunderbolt
> > hotplug port is unfortunately shared with a wireless card, an audio card
> > and an SDXC controller.  When the Thunderbolt controller is powered
> > down, the machine carries out at least one unneeded slot status register
> > read for each wireless packet received and prints a corresponding error
> > message to the system log.
> > 
> > The hotplug port's current_state will be D3cold when it's powered down,
> > so ignore interrupts that occur during that power state.
> > 
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/hotplug/pciehp_hpc.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> > index 5c24e93..08e84d6 100644
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -546,6 +546,10 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
> >  	u8 present;
> >  	bool link;
> >  
> > +	/* Interrupts cannot originate from a controller that's asleep */
> > +	if (pdev->current_state == PCI_D3cold)
> > +		return IRQ_NONE;
> > +
> >  	/*
> >  	 * In order to guarantee that all interrupt events are
> >  	 * serviced, we need to re-inspect Slot Status register after
> > -- 
> > 2.8.1


* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-07-18 13:39       ` Rafael J. Wysocki
@ 2016-08-03 12:28         ` Lukas Wunner
  2016-08-03 23:50           ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-03 12:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, linux-pci, linux-pm, Andreas Noever, Rafael J. Wysocki

On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
> On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
> > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> > > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > > > There are devices which are not power-managed by the platform, yet can be
> > > > runtime suspended to D3cold with some other mechanism.  When putting the
> > > > system to sleep, we currently handle such devices improperly by trying
> > > > to transition them from D3cold to D3hot (the default power state defined
> > > > at the beginning of pci_target_state()).  Avoid that.
> > > > 
> > > > An example for devices affected by this are Thunderbolt controllers
> > > > built into Macs which can be put into D3cold with nonstandard ACPI
> > > > methods.
> > > > 
> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > > 
> > > This needs an ack from Rafael.
> > > 
> > > > ---
> > > >  drivers/pci/pci.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 791dfe7..6af9911 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
> > > >  			      && !(dev->pme_support & (1 << target_state)))
> > > >  				target_state--;
> > > >  		}
> > > > +	} else if (dev->current_state == PCI_D3cold) {
> > > > +		target_state = PCI_D3cold;
> > > >  	}
> > > 
> > > This only covers the case of !device_may_wakeup().  So I guess
> > > device_may_wakeup() is false for these Thunderbolt controllers.
> > > Is there a reason you don't want to do this check for devices that
> > > may wakeup?
> > 
> > Fear of breaking things. It would mean that a device would be left in
> > D3cold even though it may not be able to signal wakeup from that power
> > state.
> 
> Then it should not be put into D3_cold at run time too if it is wakeup-
> capable.
> 
> > That's a change of behaviour the consequences of which I cannot
> > estimate. Intuitively, I would expect breakage from such a change.
> 
> That would have been the case if the device had been capable of signaling
> wakeup from D3_cold at run time, but not from system sleep.  However, that
> can only happen when platform_pci_power_manageable() is true AFAICS.
> 
> So I'd change the switch () under the platform_pci_power_manageable() check
> to return "state" in the default case and then do
> 
> 	return dev->current_state < target_state ? target_state : dev->current_state;
> 
> at the end of the function.

That suggestion doesn't seem to be correct because there's another
value besides PCI_D3cold which is also greater than PCI_D3hot,
namely PCI_UNKNOWN. (If the device is in that state, e.g. after
pci_device_remove() has been called, and the system goes to sleep,
we'd leave the device as is and not put it into D3hot as we do now.)

I will update this patch with Bjorn's suggestion to also leave the
device in D3cold if it is wakeup-capable. The idea is to just change
the default state in the first line of the function like this:

-	pci_power_t target_state = PCI_D3hot;
+	pci_power_t target_state =
+		dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
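
To make the difference between the two variants concrete, here is a
minimal userspace sketch. It assumes the numeric values of pci_power_t
from include/linux/pci.h (D3hot = 3, D3cold = 4, PCI_UNKNOWN = 5) and
deliberately leaves out the platform and wakeup branches of
pci_target_state(); target_max() and target_keep_d3cold() are
hypothetical names used only for this illustration.

```c
/* Simplified model of pci_power_t; values assumed to mirror pci.h. */
typedef int pci_power_t;
#define PCI_D0       0
#define PCI_D3hot    3
#define PCI_D3cold   4
#define PCI_UNKNOWN  5

/* Variant 1: never pick a state shallower than the current one. */
static pci_power_t target_max(pci_power_t current_state)
{
	pci_power_t target_state = PCI_D3hot;

	return current_state < target_state ? target_state : current_state;
}

/* Variant 2: special-case D3cold only, default to D3hot otherwise. */
static pci_power_t target_keep_d3cold(pci_power_t current_state)
{
	return current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
}
```

Both variants agree for D0 and D3cold. They differ for PCI_UNKNOWN:
target_max() hands PCI_UNKNOWN back as the target state, whereas
target_keep_d3cold() still selects D3hot as the current code does.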

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-03 12:28         ` Lukas Wunner
@ 2016-08-03 23:50           ` Rafael J. Wysocki
  2016-08-04  0:45             ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-03 23:50 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Linux PCI, Linux PM,
	Andreas Noever, Rafael J. Wysocki

On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
> On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
>> On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
>> > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
>> > > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
>> > > > There are devices which are not power-managed by the platform, yet can be
>> > > > runtime suspended to D3cold with some other mechanism.  When putting the
>> > > > system to sleep, we currently handle such devices improperly by trying
>> > > > to transition them from D3cold to D3hot (the default power state defined
>> > > > at the beginning of pci_target_state()).  Avoid that.
>> > > >
>> > > > An example for devices affected by this are Thunderbolt controllers
>> > > > built into Macs which can be put into D3cold with nonstandard ACPI
>> > > > methods.
>> > > >
>> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>> > >
>> > > This needs an ack from Rafael.
>> > >
>> > > > ---
>> > > >  drivers/pci/pci.c | 2 ++
>> > > >  1 file changed, 2 insertions(+)
>> > > >
>> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> > > > index 791dfe7..6af9911 100644
>> > > > --- a/drivers/pci/pci.c
>> > > > +++ b/drivers/pci/pci.c
>> > > > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
>> > > >                               && !(dev->pme_support & (1 << target_state)))
>> > > >                                 target_state--;
>> > > >                 }
>> > > > +       } else if (dev->current_state == PCI_D3cold) {
>> > > > +               target_state = PCI_D3cold;
>> > > >         }
>> > >
>> > > This only covers the case of !device_may_wakeup().  So I guess
>> > > device_may_wakeup() is false for these Thunderbolt controllers.
>> > > Is there a reason you don't want to do this check for devices that
>> > > may wakeup?
>> >
>> > Fear of breaking things. It would mean that a device would be left in
>> > D3cold even though it may not be able to signal wakeup from that power
>> > state.
>>
>> Then it should not be put into D3_cold at run time too if it is wakeup-
>> capable.
>>
>> > That's a change of behaviour the consequences of which I cannot
>> > estimate. Intuitively, I would expect breakage from such a change.
>>
>> That would have been the case if the device had been capable of signaling
>> wakeup from D3_cold at run time, but not from system sleep.  However, that
>> can only happen when platform_pci_power_manageable() is true AFAICS.
>>
>> So I'd change the switch () under the platform_pci_power_manageable() check
>> to return "state" in the default case and then do
>>
>>       return dev->current_state < target_state ? target_state : dev->current_state;
>>
>> at the end of the function.
>
> That suggestion doesn't seem to be correct because there's another
> value besides PCI_D3cold which is also greater than PCI_D3hot,
> namely PCI_UNKNOWN. (If the device is in that state, e.g. after
> pci_device_remove() has been called, and the system goes to sleep,
> we'd leave the device as is and not put it into D3hot as we do now.)

Right, I obviously forgot about PCI_UNKNOWN.

> I will update this patch with Bjorn's suggestion to also leave the
> device in D3cold if it is wakeup-capable. The idea is to just change
> the default state in the first line of the function like this:
>
> -       pci_power_t target_state = PCI_D3hot;
> +       pci_power_t target_state =
> +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;

That should work (even though it is a little clumsy IMO).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-03 23:50           ` Rafael J. Wysocki
@ 2016-08-04  0:45             ` Lukas Wunner
  2016-08-04  1:07               ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-04  0:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
> >> On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
> >> > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> >> > > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> >> > > > There are devices which are not power-managed by the platform, yet can be
> >> > > > runtime suspended to D3cold with some other mechanism.  When putting the
> >> > > > system to sleep, we currently handle such devices improperly by trying
> >> > > > to transition them from D3cold to D3hot (the default power state defined
> >> > > > at the beginning of pci_target_state()).  Avoid that.
> >> > > >
> >> > > > An example for devices affected by this are Thunderbolt controllers
> >> > > > built into Macs which can be put into D3cold with nonstandard ACPI
> >> > > > methods.
> >> > > >
> >> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> >> > >
> >> > > This needs an ack from Rafael.
> >> > >
> >> > > > ---
> >> > > >  drivers/pci/pci.c | 2 ++
> >> > > >  1 file changed, 2 insertions(+)
> >> > > >
> >> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> > > > index 791dfe7..6af9911 100644
> >> > > > --- a/drivers/pci/pci.c
> >> > > > +++ b/drivers/pci/pci.c
> >> > > > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
> >> > > >                               && !(dev->pme_support & (1 << target_state)))
> >> > > >                                 target_state--;
> >> > > >                 }
> >> > > > +       } else if (dev->current_state == PCI_D3cold) {
> >> > > > +               target_state = PCI_D3cold;
> >> > > >         }
> >> > >
> >> > > This only covers the case of !device_may_wakeup().  So I guess
> >> > > device_may_wakeup() is false for these Thunderbolt controllers.
> >> > > Is there a reason you don't want to do this check for devices that
> >> > > may wakeup?
> >> >
> >> > Fear of breaking things. It would mean that a device would be left in
> >> > D3cold even though it may not be able to signal wakeup from that power
> >> > state.
> >>
> >> Then it should not be put into D3_cold at run time too if it is wakeup-
> >> capable.
> >>
> >> > That's a change of behaviour the consequences of which I cannot
> >> > estimate. Intuitively, I would expect breakage from such a change.
> >>
> >> That would have been the case if the device had been capable of signaling
> >> wakeup from D3_cold at run time, but not from system sleep.  However, that
> >> can only happen when platform_pci_power_manageable() is true AFAICS.
> >>
> >> So I'd change the switch () under the platform_pci_power_manageable() check
> >> to return "state" in the default case and then do
> >>
> >>       return dev->current_state < target_state ? target_state : dev->current_state;
> >>
> >> at the end of the function.
> >
> > That suggestion doesn't seem to be correct because there's another
> > value besides PCI_D3cold which is also greater than PCI_D3hot,
> > namely PCI_UNKNOWN. (If the device is in that state, e.g. after
> > pci_device_remove() has been called, and the system goes to sleep,
> > we'd leave the device as is and not put it into D3hot as we do now.)
> 
> Right, I obviously forgot about PCI_UNKNOWN.
> 
> > I will update this patch with Bjorn's suggestion to also leave the
> > device in D3cold if it is wakeup-capable. The idea is to just change
> > the default state in the first line of the function like this:
> >
> > -       pci_power_t target_state = PCI_D3hot;
> > +       pci_power_t target_state =
> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
> 
> That should work (even though it is a little clumsy IMO).

Not sure why that is clumsy but happy to use something else if you
have a suggestion?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-04  0:45             ` Lukas Wunner
@ 2016-08-04  1:07               ` Rafael J. Wysocki
  2016-08-04  8:14                 ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-04  1:07 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Thu, Aug 4, 2016 at 2:45 AM, Lukas Wunner <lukas@wunner.de> wrote:
> On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
>> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> > On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
>> >> On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
>> >> > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
>> >> > > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
>> >> > > > There are devices which are not power-managed by the platform, yet can be
>> >> > > > runtime suspended to D3cold with some other mechanism.  When putting the
>> >> > > > system to sleep, we currently handle such devices improperly by trying
>> >> > > > to transition them from D3cold to D3hot (the default power state defined
>> >> > > > at the beginning of pci_target_state()).  Avoid that.
>> >> > > >
>> >> > > > An example for devices affected by this are Thunderbolt controllers
>> >> > > > built into Macs which can be put into D3cold with nonstandard ACPI
>> >> > > > methods.
>> >> > > >
>> >> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>> >> > >
>> >> > > This needs an ack from Rafael.
>> >> > >
>> >> > > > ---
>> >> > > >  drivers/pci/pci.c | 2 ++
>> >> > > >  1 file changed, 2 insertions(+)
>> >> > > >
>> >> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> >> > > > index 791dfe7..6af9911 100644
>> >> > > > --- a/drivers/pci/pci.c
>> >> > > > +++ b/drivers/pci/pci.c
>> >> > > > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
>> >> > > >                               && !(dev->pme_support & (1 << target_state)))
>> >> > > >                                 target_state--;
>> >> > > >                 }
>> >> > > > +       } else if (dev->current_state == PCI_D3cold) {
>> >> > > > +               target_state = PCI_D3cold;
>> >> > > >         }
>> >> > >
>> >> > > This only covers the case of !device_may_wakeup().  So I guess
>> >> > > device_may_wakeup() is false for these Thunderbolt controllers.
>> >> > > Is there a reason you don't want to do this check for devices that
>> >> > > may wakeup?
>> >> >
>> >> > Fear of breaking things. It would mean that a device would be left in
>> >> > D3cold even though it may not be able to signal wakeup from that power
>> >> > state.
>> >>
>> >> Then it should not be put into D3_cold at run time too if it is wakeup-
>> >> capable.
>> >>
>> >> > That's a change of behaviour the consequences of which I cannot
>> >> > estimate. Intuitively, I would expect breakage from such a change.
>> >>
>> >> That would have been the case if the device had been capable of signaling
>> >> wakeup from D3_cold at run time, but not from system sleep.  However, that
>> >> can only happen when platform_pci_power_manageable() is true AFAICS.
>> >>
>> >> So I'd change the switch () under the platform_pci_power_manageable() check
>> >> to return "state" in the default case and then do
>> >>
>> >>       return dev->current_state < target_state ? target_state : dev->current_state;
>> >>
>> >> at the end of the function.
>> >
>> > That suggestion doesn't seem to be correct because there's another
>> > value besides PCI_D3cold which is also greater than PCI_D3hot,
>> > namely PCI_UNKNOWN. (If the device is in that state, e.g. after
>> > pci_device_remove() has been called, and the system goes to sleep,
>> > we'd leave the device as is and not put it into D3hot as we do now.)
>>
>> Right, I obviously forgot about PCI_UNKNOWN.
>>
>> > I will update this patch with Bjorn's suggestion to also leave the
>> > device in D3cold if it is wakeup-capable. The idea is to just change
>> > the default state in the first line of the function like this:
>> >
>> > -       pci_power_t target_state = PCI_D3hot;
>> > +       pci_power_t target_state =
>> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
>>
>> That should work (even though it is a little clumsy IMO).
>
> Not sure why that is clumsy but happy to use something else if you
> have a suggestion?

The clumsy thing is that we'd take the target_state as D3cold only if
the device already was in that state.

Otherwise, we'd take D3hot as the target state for the same device,
which doesn't seem particularly consistent to me.

Not that I have better ideas ATM, but then the current code works for
my use cases. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-04  1:07               ` Rafael J. Wysocki
@ 2016-08-04  8:14                 ` Lukas Wunner
  2016-08-04 15:30                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-04  8:14 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Thu, Aug 04, 2016 at 03:07:56AM +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 4, 2016 at 2:45 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
> >> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
> >> > On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
> >> >> On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
> >> >> > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> >> >> > > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> >> >> > > > There are devices which are not power-managed by the platform, yet can be
> >> >> > > > runtime suspended to D3cold with some other mechanism.  When putting the
> >> >> > > > system to sleep, we currently handle such devices improperly by trying
> >> >> > > > to transition them from D3cold to D3hot (the default power state defined
> >> >> > > > at the beginning of pci_target_state()).  Avoid that.
> >> >> > > >
> >> >> > > > An example for devices affected by this are Thunderbolt controllers
> >> >> > > > built into Macs which can be put into D3cold with nonstandard ACPI
> >> >> > > > methods.
> >> >> > > >
> >> >> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> >> >> > > > ---
> >> >> > > >  drivers/pci/pci.c | 2 ++
> >> >> > > >  1 file changed, 2 insertions(+)
> >> >> > > >
> >> >> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> >> > > > index 791dfe7..6af9911 100644
> >> >> > > > --- a/drivers/pci/pci.c
> >> >> > > > +++ b/drivers/pci/pci.c
> >> >> > > > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
> >> >> > > >                               && !(dev->pme_support & (1 << target_state)))
> >> >> > > >                                 target_state--;
> >> >> > > >                 }
> >> >> > > > +       } else if (dev->current_state == PCI_D3cold) {
> >> >> > > > +               target_state = PCI_D3cold;
> >> >> > > >         }
> >> >> > >
> >> > I will update this patch with Bjorn's suggestion to also leave the
> >> > device in D3cold if it is wakeup-capable. The idea is to just change
> >> > the default state in the first line of the function like this:
> >> >
> >> > -       pci_power_t target_state = PCI_D3hot;
> >> > +       pci_power_t target_state =
> >> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
> >>
> >> That should work (even though it is a little clumsy IMO).
> >
> > Not sure why that is clumsy but happy to use something else if you
> > have a suggestion?
> 
> The clumsy thing is that we'd take the target_state as D3cold only if
> the device already was in that state.
> 
> Otherwise, we'd take D3hot as the target state for the same device,
> which doesn't seem particularly consistent to me.
> 
> Not that I have better ideas ATM, but then the current code works for
> my use cases. :-)

The goal is to afford direct-complete to devices which are not power-
manageable by the platform but can still be runtime suspended to D3cold.
Right now we wake those devices up from D3cold to D3hot before going to
sleep, which is a waste of energy and prolongs the suspend sequence
(waking up the Thunderbolt controller takes 2 seconds).

The de facto standard to power manage such devices seems to be with
dev_pm_domain_set(). That's what vga_switcheroo does and I'll move
to that as well for v3 of this series.

I could add a "bool can_power_off" to struct dev_pm_domain.

Then I could change pci_target_state() like this:

 	pci_power_t target_state = PCI_D3hot;
 
 	if (platform_pci_power_manageable(dev)) {
 		[...]
+	} else if (dev->dev.pm_domain && dev->dev.pm_domain->can_power_off) {
+		target_state = PCI_D3cold;
 	} else if [...]

Another idea would be to add a ->choose_state hook to dev_pm_domain,
but that would have to return a PCI-specific power state, so we'd be
in clumsy territory again.

Thoughts?

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-04  8:14                 ` Lukas Wunner
@ 2016-08-04 15:30                   ` Rafael J. Wysocki
  2016-08-07  9:03                     ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-04 15:30 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Thu, Aug 4, 2016 at 10:14 AM, Lukas Wunner <lukas@wunner.de> wrote:
> On Thu, Aug 04, 2016 at 03:07:56AM +0200, Rafael J. Wysocki wrote:
>> On Thu, Aug 4, 2016 at 2:45 AM, Lukas Wunner <lukas@wunner.de> wrote:
>> > On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
>> >> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> >> > On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
>> >> >> On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
>> >> >> > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
>> >> >> > > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
>> >> >> > > > There are devices which are not power-managed by the platform, yet can be
>> >> >> > > > runtime suspended to D3cold with some other mechanism.  When putting the
>> >> >> > > > system to sleep, we currently handle such devices improperly by trying
>> >> >> > > > to transition them from D3cold to D3hot (the default power state defined
>> >> >> > > > at the beginning of pci_target_state()).  Avoid that.
>> >> >> > > >
>> >> >> > > > An example for devices affected by this are Thunderbolt controllers
>> >> >> > > > built into Macs which can be put into D3cold with nonstandard ACPI
>> >> >> > > > methods.
>> >> >> > > >
>> >> >> > > > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>> >> >> > > > ---
>> >> >> > > >  drivers/pci/pci.c | 2 ++
>> >> >> > > >  1 file changed, 2 insertions(+)
>> >> >> > > >
>> >> >> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> >> >> > > > index 791dfe7..6af9911 100644
>> >> >> > > > --- a/drivers/pci/pci.c
>> >> >> > > > +++ b/drivers/pci/pci.c
>> >> >> > > > @@ -1943,6 +1943,8 @@ static pci_power_t pci_target_state(struct pci_dev *dev)
>> >> >> > > >                               && !(dev->pme_support & (1 << target_state)))
>> >> >> > > >                                 target_state--;
>> >> >> > > >                 }
>> >> >> > > > +       } else if (dev->current_state == PCI_D3cold) {
>> >> >> > > > +               target_state = PCI_D3cold;
>> >> >> > > >         }
>> >> >> > >
>> >> > I will update this patch with Bjorn's suggestion to also leave the
>> >> > device in D3cold if it is wakeup-capable. The idea is to just change
>> >> > the default state in the first line of the function like this:
>> >> >
>> >> > -       pci_power_t target_state = PCI_D3hot;
>> >> > +       pci_power_t target_state =
>> >> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
>> >>
>> >> That should work (even though it is a little clumsy IMO).
>> >
>> > Not sure why that is clumsy but happy to use something else if you
>> > have a suggestion?
>>
>> The clumsy thing is that we'd take the target_state as D3cold only if
>> the device already was in that state.
>>
>> Otherwise, we'd take D3hot as the target state for the same device,
>> which doesn't seem particularly consistent to me.
>>
>> Not that I have better ideas ATM, but then the current code works for
>> my use cases. :-)
>
> The goal is to afford direct-complete to devices which are not power-
> manageable by the platform but can still be runtime suspended to D3cold.

Well, this is a bit misleading.

According to the PCI spec there are two ways to put a device into
D3cold: either by putting its bus into B3 (which for PCIe means
turning the link off IIRC) which happens when the bridge goes into
D3hot, or through the platform.

You aren't talking about any of those cases, though, so we go outside
of the spec here.

> Right now we wake those devices up from D3cold to D3hot before going to
> sleep, which is a waste of energy and prolongs the suspend sequence
> (waking up the Thunderbolt controller takes 2 seconds).

Understood.

> The de facto standard to power manage such devices seems to be with
> dev_pm_domain_set(). That's what vga_switcheroo does and I'll move
> to that as well for v3 of this series.

OK

> I could add a "bool can_power_off" to struct dev_pm_domain.

I'm not sure if dev_pm_domain is the right level.  The "can_power_off"
thing would be sort of specific to your particular use case.

Say you have something like

struct pci_pm_domain {
        struct dev_pm_domain pd;
        ...
};

> Then I could change pci_target_state() like this:
>
>         pci_power_t target_state = PCI_D3hot;
>
>         if (platform_pci_power_manageable(dev)) {
>                 [...]
> +       } else if (dev->dev.pm_domain && dev->dev.pm_domain->can_power_off) {

so you can do something like

          } else if (dev->dev.pm_domain) {
                    struct pci_pm_domain *pci_pd =
                            to_pci_pm_domain(dev->dev.pm_domain);

                    ....
          } else if [...]

and it may be a bit more PCI-oriented without expanding generic data types.
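
A minimal userspace sketch of that wrapper idea follows. Everything in
it is an assumption for illustration: struct dev_pm_domain is reduced to
a stub, the can_power_off member and default_target_state() are
hypothetical, and container_of() is re-implemented locally. It also
assumes that every pm_domain attached to a PCI device is embedded in a
struct pci_pm_domain, which real code would have to guarantee.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stub for the generic type; the real one carries PM callbacks. */
struct dev_pm_domain {
	int placeholder;
};

/* Hypothetical PCI-specific wrapper around the generic domain. */
struct pci_pm_domain {
	struct dev_pm_domain pd;
	bool can_power_off;  /* hypothetical: D3cold by nonstandard means */
};

static struct pci_pm_domain *to_pci_pm_domain(struct dev_pm_domain *pd)
{
	return container_of(pd, struct pci_pm_domain, pd);
}

enum { PCI_D3hot = 3, PCI_D3cold = 4 };

/* Default target state, chosen from the capability, not current state. */
static int default_target_state(struct dev_pm_domain *pm_domain)
{
	if (pm_domain && to_pci_pm_domain(pm_domain)->can_power_off)
		return PCI_D3cold;
	return PCI_D3hot;
}
```

Keeping the flag in the wrapper leaves struct dev_pm_domain untouched,
and the result no longer depends on which state the device happens to
be in when the system goes to sleep.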

> +               target_state = PCI_D3cold;
>         } else if [...]
>
> Another idea would be to add a ->choose_state hook to dev_pm_domain,
> but that would have to return a PCI-specific power state, so we'd be
> in clumsy territory again.

Right.

> Thoughts?

Essentially, the PCI PM code needs to be told that there is a way to
put the device into D3cold by non-standard means.  There are a couple
of ways to do that (a new flag in struct pci_dev, the above, probably
more), but in any case it needs to be clear that this is non-standard
IMO.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold
  2016-08-02 16:27     ` Lukas Wunner
@ 2016-08-05  0:29       ` Rafael J. Wysocki
  0 siblings, 0 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-05  0:29 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, linux-pm, Andreas Noever, Mika Westerberg

On Tuesday, August 02, 2016 06:27:40 PM Lukas Wunner wrote:
> On Fri, Jun 17, 2016 at 05:52:04PM -0500, Bjorn Helgaas wrote:
> > On Fri, May 13, 2016 at 01:15:31PM +0200, Lukas Wunner wrote:
> > > If a hotplug port is suspended to D3cold, its slot status register
> > > cannot be read.  If that hotplug port happens to share its IRQ with
> > > other devices, then whenever an interrupt occurs for one of these
> > > devices, a "no response from device" message is logged with level
> > > KERN_INFO.  Apart from this annoyance, CPU time is needlessly spent
> > > trying to read the slot status register even though we know in advance
> > > that it will fail.
> > 
> > I guess this is a pretty generic problem that could affect any device
> > that shares an IRQ.
> > 
> > I think I'll queue this on my pci/pm branch, since it seems closely
> > related to Mika's "PCI: Add runtime PM support for PCIe ports".
> > 
> > Did you check for the same issue in other likely places, e.g., AER,
> > PME, etc.?
> 
> Apologies for the delay, I've checked all other port services now:
> 
> - Our AER and PME drivers bind only to root ports and I can't imagine
>   how those could go to D3cold, they're part of the root complex or
>   PCH and I'm not aware of a chipset that would allow turning off the
>   power well for individual PCIe ports.

No, they don't go into D3cold.  They can go into D3hot, however.

> - DPC on the other hand also binds to downstream ports. I do have
>   downstream ports in my machine (as part of the Thunderbolt switch)
>   but they do not have the DPC capability. I've never seen devices
>   with that capability and cannot estimate what the chances are of them
>   going to D3cold and sharing an IRQ with other devices. It's probably
>   not worth preparing for such a situation without knowing its likelihood.
> 
> - VC: We allocate a port service for this but do not have a driver.
> 
> Bottom line is that the patch for the PCIe hotplug driver seems to be
> sufficient.
> 
> FWIW, on my machine I see numerous devices with AER, PME and VC
> capabilities. The Nvidia GPU as well as network, Firewire and
> Thunderbolt controllers all have those. AFAICS we ignore them
> because their specific drivers do not care for the capabilities
> and portdrv only binds to root ports.
> 
> This seems to support your argument that the PCIe capabilities
> should be handled by the core rather than portdrv, as we could
> then make use of the capabilities on endpoint devices in a
> universal manner.

PME, for one, is not an endpoint capability.  It very specifically is
defined as a port capability AFAICS, and the whole idea here is that
endpoints will not use their in-band interrupts to signal PME.  That
is supposed to be done by ports.

> On the other hand, I think we cannot use a separate MSI for
> AER, PME et al, can we?

We can, at least in principle.

More precisely, the spec requires PME and hotplug to use the same interrupt
(please see the comment in pcie_port_enable_msix() on that), but AER can use
a different one.

> If we cannot, then AER and PME would
> share the IRQ with an endpoint device's regular interrupt handler,
> and that might ruin performance. E.g. the Broadcom wireless card
> generates millions of interrupts on a sufficiently active WiFi.
> Accessing the device's config space on every interrupt just to
> check for AER or PME seems like a bad idea. So at the very least
> we'd need some kind of opt-out.

That is why AER, PME and hotplug are all supposed to be signaled by ports,
possibly among other things.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-04 15:30                   ` Rafael J. Wysocki
@ 2016-08-07  9:03                     ` Lukas Wunner
  2016-08-07 23:32                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-07  9:03 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Thu, Aug 04, 2016 at 05:30:47PM +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 4, 2016 at 10:14 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Thu, Aug 04, 2016 at 03:07:56AM +0200, Rafael J. Wysocki wrote:
> >> On Thu, Aug 4, 2016 at 2:45 AM, Lukas Wunner <lukas@wunner.de> wrote:
> >> > On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
> >> >> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
> >> >> > I will update this patch with Bjorn's suggestion to also leave the
> >> >> > device in D3cold if it is wakeup-capable. The idea is to just change
> >> >> > the default state in the first line of the function like this:
> >> >> >
> >> >> > -       pci_power_t target_state = PCI_D3hot;
> >> >> > +       pci_power_t target_state =
> >> >> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
> >> >>
> >> >> That should work (even though it is a little clumsy IMO).
> >> >
> >> > Not sure why that is clumsy but happy to use something else if you
> >> > have a suggestion?
> >>
> >> The clumsy thing is that we'd take the target_state as D3cold only if
> >> the device already was in that state.
> >>
> >> Otherwise, we'd take D3hot as the target state for the same device,
> >> which doesn't seem particularly consistent to me.
> >>
> >> Not that I have better ideas ATM, but then the current code works for
> >> my use cases. :-)
> >
> > The goal is to afford direct-complete to devices which are not power-
> > manageable by the platform but can still be runtime suspended to D3cold.
> 
> Well, this is a bit misleading.
> 
> According to the PCI spec there are two ways to put a device into
> D3cold: either by putting its bus into B3 (which for PCIe means
> turning the link off IIRC) which happens when the bridge goes into
> D3hot, or through the platform.
> 
> You aren't talking about any of those cases, though, so we go outside
> of the spec here.

Yes. With Nvidia Optimus / AMD PowerXpress hybrid graphics on non-Macs
and Thunderbolt on Macs, it could still be argued that D3cold is
facilitated by the platform, albeit with custom methods instead of _PS3.

With hybrid graphics on Macs, the discrete GPU is turned off by
a gmux controller on the LPC bus which is controlled via I/O ports.
So the ACPI platform isn't involved at all and at least then we're
in completely nonstandard territory.

> > Right now we wake those devices up from D3cold to D3hot before going to
> > sleep, which is a waste of energy and prolongs the suspend sequence
> > (waking up the Thunderbolt controller takes 2 seconds).
> 
> Understood.
> 
> > The de facto standard to power manage such devices seems to be with
> > dev_pm_domain_set(). That's what vga_switcheroo does and I'll move
> > to that as well for v3 of this series.
> 
> OK
> 
> > I could add a "bool can_power_off" to struct dev_pm_domain.
> 
> I'm not sure if dev_pm_domain is the right level.  The "can_power_off"
> thing would be sort of specific to your particular use case.
> 
> Say you have something like
> 
> struct pci_pm_domain {
>         struct dev_pm_domain pd;
>         ...
> };
> 
> > Then I could change pci_target_state() like this:
> >
> >         pci_power_t target_state = PCI_D3hot;
> >
> >         if (platform_pci_power_manageable(dev)) {
> >                 [...]
> > +       } else if (dev->dev.pm_domain && dev->dev.pm_domain->can_power_off) {
> 
> so you can do something like
> 
>           } else if (dev->dev.pm_domain) {
>                     struct pci_pm_domain *pci_pd =
>                             to_pci_pm_domain(dev->dev.pm_domain);
> 
>                     ....
>           } else if [...]
> 
> and it may be a bit more PCI-oriented without expanding generic data types.
> 
> > +               target_state = PCI_D3cold;
> >         } else if [...]
> >
> > Another idea would be to add a ->choose_state hook to dev_pm_domain,
> > but that would have to return a PCI-specific power state, so we'd be
> > in clumsy territory again.
> 
> Right.
> 
> > Thoughts?
> 
> Essentially, the PCI PM code needs to be told that there is a way to
> put the device into D3cold by non-standard means.  There are a couple
> of ways to do that (a new flag in struct pci_dev, the above, probably
> more), but in any case it needs to be clear that this is non-standard
> IMO.

The more I think about it, the more I lean towards the one-line change
at the top of this e-mail, even though you found it clumsy. It's small
and simple and fixes the problem without overengineering things.

The reasoning is that going from D3cold to D3hot before system sleep
just never makes sense, no matter if the device got there by standard
or nonstandard means. So the default target state should be D3cold if
the device is already there, and D3hot otherwise. I could perhaps try
to make this clearer by adding a comment.
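
The rule described above can be modelled in a few lines of userspace C.
This is only a sketch: the enum mirrors the kernel's pci_power_t names,
and default_target_state() is a hypothetical helper, not a function in
the PCI core. It covers just the default, before any platform or wakeup
checks refine it.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's PCI power states. */
enum pci_power { PCI_D0, PCI_D1, PCI_D2, PCI_D3hot, PCI_D3cold };

/*
 * Hypothetical helper: pick the default target state for system sleep.
 * A device that is already in D3cold stays there; everything else
 * defaults to D3hot, as before.
 */
static enum pci_power default_target_state(enum pci_power current_state)
{
	return current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
}
```

With this default a runtime-suspended device in D3cold is never woken
to D3hot merely because the system is about to sleep.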

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete
  2016-07-18 13:18   ` Rafael J. Wysocki
@ 2016-08-07  9:56     ` Lukas Wunner
  2016-08-07 15:33         ` Alan Stern
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-07  9:56 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pci, linux-pm, Andreas Noever, Alan Stern

On Mon, Jul 18, 2016 at 03:18:25PM +0200, Rafael J. Wysocki wrote:
> On Friday, May 13, 2016 01:15:31 PM Lukas Wunner wrote:
> > Since commit aae4518b3124 ("PM / sleep: Mechanism to avoid resuming
> > runtime-suspended devices unnecessarily"), we no longer wake up devices
> > which are already runtime suspended upon entering system sleep
> > ("direct-complete").
> > 
> > However commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might
> > have been reset by firmware") changed this to mandatorily runtime resume
> > such devices after the system is woken.  The motivation was to ensure
> > that devices do not remain in a reset-power-on state after system
> > resume, potentially preventing deep SoC-wide low-power states from being
> > entered on idle.
> > 
> > This is counter-productive for devices of which we know that the
> > mandatory runtime resume is unnecessary.  Thunderbolt on the Mac is a
> > case in point: Runtime resume not just powers up the controller, but
> > multiple adjacent chips, including a 15V boost converter, multiplexers
> > and an eeprom.  Gratuitously powering this up after every system sleep
> > burns a not insignificant amount of energy and needlessly strains the
> > hardware.
> > 
> > Perhaps it would have been better to carry out the mandatory runtime
> > resume only for those devices that actually need it, but at least we
> > should allow an opt-out.
> > 
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> 
> I don't like this patch and especially adding a new dev_pm_ops flag to
> work around something that you're seeing as an issue in the generic ops.
> 
> It is sort of like saying "the generic ops don't work for me, so modify
> them as well as struct dev_pm_ops", but maybe it's better to change the
> PCI bus type to do something different from calling the generic function?
> 
> Or you can add a ->complete callback to your driver that will clear
> power.direct_complete for the device in question.

First of all, the direct_complete flag is marked "Owned by the PM core"
in include/linux/pm.h. So I would have expected that a driver is not
supposed to fudge it.

Second, yes it's possible to make it work by clearing direct_complete
in the ->complete callback, but there's a catch: The device tree is
traversed bottom-up in dpm_complete(). Recall that a Thunderbolt
controller consists of multiple devices and that power control is
governed by its top-most device (upstream bridge). But because we're
going bottom-up, clearing the direct_complete flag must be done by
the bottom-most device (NHI)! So I've got all the power management
stuff nicely separated in functions executed for the upstream bridge,
but a small portion needs to be executed for the NHI. That's ugly.

Normally the device hierarchy is traversed bottom-up during suspend
and top-down during resume. However ->prepare and ->complete do it
the other way round. In the case of ->prepare, this is even documented
in Documentation/power/devices.txt but the reason thereof is not.
Could you explain this please?

Third, I'm irritated by your question "maybe it's better to change the
PCI bus type to do something different from calling the generic function".
What should that be? Under which circumstances can we leave a PCI device
asleep after direct-complete?

I'm generally irritated by commit 58a1fbbb2ee8, it's a significant change
to mandatorily wake all devices, it wastes a not insignificant amount of
energy, yet the reasoning in the commit message sounds vague and handwavy
("There is a concern [...] devices that are most likely to be affected").

Are there clear indications for or against a device requiring a resume?
E.g. the commit message names SoCs, perhaps those can be recognized by
having child devices of certain types?

Thanks,

Lukas

> 
> > ---
> >  drivers/base/power/generic_ops.c | 3 ++-
> >  include/linux/pm.h               | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
> > index 07c3c4a..6e88f55 100644
> > --- a/drivers/base/power/generic_ops.c
> > +++ b/drivers/base/power/generic_ops.c
> > @@ -316,7 +316,8 @@ void pm_complete_with_resume_check(struct device *dev)
> >  	 * the sleep state it is going out of and it has never been resumed till
> >  	 * now, resume it in case the firmware powered it up.
> >  	 */
> > -	if (dev->power.direct_complete && pm_resume_via_firmware())
> > +	if (dev->power.direct_complete && pm_resume_via_firmware() &&
> > +	    !dev->power.direct_complete_noresume)
> >  		pm_request_resume(dev);
> >  }
> >  EXPORT_SYMBOL_GPL(pm_complete_with_resume_check);
> > diff --git a/include/linux/pm.h b/include/linux/pm.h
> > index 6a5d654..023de94 100644
> > --- a/include/linux/pm.h
> > +++ b/include/linux/pm.h
> > @@ -596,6 +596,7 @@ struct dev_pm_info {
> >  	unsigned int		use_autosuspend:1;
> >  	unsigned int		timer_autosuspends:1;
> >  	unsigned int		memalloc_noio:1;
> > +	unsigned int		direct_complete_noresume:1;
> >  	enum rpm_request	request;
> >  	enum rpm_status		runtime_status;
> >  	int			runtime_error;
> > 
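
For reference, the condition changed by the hunk above can be written
as a standalone predicate. This is a userspace sketch under stated
assumptions: struct pm_flags and needs_resume_request() are made-up
names, and the resumed_via_firmware argument stands in for the result
of pm_resume_via_firmware().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-device PM flags used by the check. */
struct pm_flags {
	bool direct_complete;          /* suspend/resume callbacks were skipped */
	bool direct_complete_noresume; /* opt-out flag proposed by the patch */
};

/*
 * Returns true if a runtime resume should be requested after system
 * resume: the device went through direct-complete, firmware was
 * involved in the resume, and the device did not opt out.
 */
static bool needs_resume_request(const struct pm_flags *f,
				 bool resumed_via_firmware)
{
	return f->direct_complete && resumed_via_firmware &&
	       !f->direct_complete_noresume;
}
```

Without the opt-out flag set, the predicate reduces to the check that
is in the generic function today.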

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete
  2016-08-07  9:56     ` Lukas Wunner
@ 2016-08-07 15:33         ` Alan Stern
  0 siblings, 0 replies; 65+ messages in thread
From: Alan Stern @ 2016-08-07 15:33 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Rafael J. Wysocki, linux-pci, linux-pm, Andreas Noever

On Sun, 7 Aug 2016, Lukas Wunner wrote:

> Normally the device hierarchy is traversed bottom-up during suspend
> and top-down during resume. However ->prepare and ->complete do it
> the other way round. In the case of ->prepare, this is even documented
> in Documentation/power/devices.txt but the reason thereof is not.
> Could you explain this please?

The purpose of ->prepare is to tell drivers that a system sleep is
beginning and accordingly they should stop registering new children.  
This is necessary for the PM core to be able to traverse the entire
device tree safely; we want to avoid races where a new child is added
below a device concurrently with that device being suspended.  (Or if
you want to be more precise, races in which a new child is added below
a device while the PM core is acquiring the device's lock just prior to
invoking its ->suspend callback.)

Telling drivers to stop registering new children below a device has to
be done top-down, because if it were done bottom-up then it would be
subject to the same race described above.  Doing it top-down avoids 
problems; if a device registers new children while the PM core is 
acquiring its lock prior to invoking ->prepare, it doesn't matter.  The 
new children will be handled later, right along with the existing ones.
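
The two orderings can be illustrated with a toy device tree in
userspace C. This is a simplified model, not PM core code: the real
core walks a device list rather than recursing over a tree, and all
names below (struct node, walk_top_down(), walk_bottom_up()) are
invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

struct node {
	const char *name;
	struct node *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Records the order in which devices are visited. */
struct order_log {
	const char *names[MAX_LOG];
	size_t len;
};

static void log_visit(struct order_log *log, const struct node *n)
{
	if (log->len < MAX_LOG)
		log->names[log->len++] = n->name;
}

/* ->prepare order: a parent is visited before its children. */
static void walk_top_down(const struct node *n, struct order_log *log)
{
	log_visit(log, n);
	for (size_t i = 0; i < n->nr_children; i++)
		walk_top_down(n->children[i], log);
}

/* ->suspend order: all children are visited before their parent. */
static void walk_bottom_up(const struct node *n, struct order_log *log)
{
	for (size_t i = 0; i < n->nr_children; i++)
		walk_bottom_up(n->children[i], log);
	log_visit(log, n);
}
```

For a bridge with a single child, the top-down walk visits the bridge
first and the bottom-up walk visits it last, which is why a parent that
has already seen ->prepare can no longer gain unseen children by the
time ->suspend reaches it.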

Alan Stern


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-07  9:03                     ` Lukas Wunner
@ 2016-08-07 23:32                       ` Rafael J. Wysocki
  2016-08-11 13:20                         ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-07 23:32 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Sunday, August 07, 2016 11:03:47 AM Lukas Wunner wrote:
> On Thu, Aug 04, 2016 at 05:30:47PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Aug 4, 2016 at 10:14 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > > On Thu, Aug 04, 2016 at 03:07:56AM +0200, Rafael J. Wysocki wrote:
> > >> On Thu, Aug 4, 2016 at 2:45 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > >> > On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
> > >> >> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > >> >> > I will update this patch with Bjorn's suggestion to also leave the
> > >> >> > device in D3cold if it is wakeup-capable. The idea is to just change
> > >> >> > the default state in the first line of the function like this:
> > >> >> >
> > >> >> > -       pci_power_t target_state = PCI_D3hot;
> > >> >> > +       pci_power_t target_state =
> > >> >> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
> > >> >>
> > >> >> That should work (even though it is a little clumsy IMO).
> > >> >
> > >> > Not sure why that is clumsy but happy to use something else if you
> > >> > have a suggestion?
> > >>
> > >> The clumsy thing is that we'd take the target_state as D3cold only if
> > >> the device already was in that state.
> > >>
> > >> Otherwise, we'd take D3hot as the target state for the same device,
> > >> which doesn't seem particularly consistent to me.
> > >>
> > >> Not that I have better ideas ATM, but then the current code works for
> > >> my use cases. :-)
> > >
> > > The goal is to afford direct-complete to devices which are not power-
> > > manageable by the platform but can still be runtime suspended to D3cold.
> > 
> > Well, this is a bit misleading.
> > 
> > According to the PCI spec there are two ways to put a device into
> > D3cold: either by putting its bus into B3 (which for PCIe means
> > turning the link off IIRC) which happens when the bridge goes into
> > D3hot, or through the platform.
> > 
> > You aren't talking about any of those cases, though, so we go outside
> > of the spec here.
> 
> Yes. With Nvidia Optimus / AMD PowerXpress hybrid graphics on non-Macs
> and Thunderbolt on Macs, it could still be argued that D3cold is
> facilitated by the platform, albeit with custom methods instead of _PS3.

So you'd need a custom set of callbacks for that "platform", but that's
only a few devices in the system, so you would also need normal ACPI callbacks
for the rest.

Conceivably, that could be addressed with per-device platform callbacks,
but that is conceptually equivalent to adding a pm_domain pointer to the
devices in question.

> With hybrid graphics on Macs, the discrete GPU is turned off by
> a gmux controller on the LPC bus which is controlled via I/O ports.
> So the ACPI platform isn't involved at all and at least then we're
> in completely nonstandard territory.
> 
> > > Right now we wake those devices up from D3cold to D3hot before going to
> > > sleep, which is a waste of energy and prolongs the suspend sequence
> > > (waking up the Thunderbolt controller takes 2 seconds).
> > 
> > Understood.
> > 
> > > The de facto standard to power manage such devices seems to be with
> > > dev_pm_domain_set(). That's what vga_switcheroo does and I'll move
> > > to that as well for v3 of this series.
> > 
> > OK
> > 
> > > I could add a "bool can_power_off" to struct dev_pm_domain.
> > 
> > I'm not sure if dev_pm_domain is the right level.  The "can_power_off"
> > thing would be sort of specific to your particular use case.
> > 
> > Say you have something like
> > 
> > struct pci_pm_domain {
> >         struct dev_pm_domain pd;
> >         ...
> > };
> > 
> > > Then I could change pci_target_state() like this:
> > >
> > >         pci_power_t target_state = PCI_D3hot;
> > >
> > >         if (platform_pci_power_manageable(dev)) {
> > >                 [...]
> > > > +       } else if (dev->dev.pm_domain && dev->dev.pm_domain->can_power_off) {
> > 
> > so you can do something like
> > 
> >           } else if (dev->dev.pm_domain) {
> > >                     struct pci_pm_domain *pci_pd =
> > >                             to_pci_pm_domain(dev->dev.pm_domain);
> > 
> >                     ....
> >           } else if [...]
> > 
> > and it may be a bit more PCI-oriented without expanding generic data types.
> > 
> > > +               target_state = PCI_D3cold;
> > >         } else if [...]
> > >
> > > Another idea would be to add a ->choose_state hook to dev_pm_domain,
> > > but that would have to return a PCI-specific power state, so we'd be
> > > in clumsy territory again.
> > 
> > Right.
> > 
> > > Thoughts?
> > 
> > Essentially, the PCI PM code needs to be told that there is a way to
> > put the device into D3cold by non-standard means.  There are a couple
> > of ways to do that (a new flag in struct pci_dev, the above, probably
> > more), but in any case it needs to be clear that this is non-standard
> > IMO.
> 
> The more I think about it, the more I lean towards the one-line change
> at the top of this e-mail, even though you found it clumsy. It's small
> and simple and fixes the problem without overengineering things.

I still would like the default D3cold to only apply if the device has no
platform support, though.

> The reasoning is that going from D3cold to D3hot before system sleep
> just never makes sense, no matter if the device got there by standard
> or nonstandard means.

That may not be true in theory.

If this is a wakeup device, it may not be able to generate wakeup signals
from D3cold while the system is in the target system state, although it might
be able to generate those signals when the system is in S0 (in the ACPI case).

That's why I'd leave the platform support case as is.
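
As a rough userspace sketch of that restriction (the names are
hypothetical and the platform_power_manageable argument stands in for
the result of platform_pci_power_manageable()), the default would only
become D3cold when the platform is not managing the device:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's PCI power states. */
enum pci_power { PCI_D0, PCI_D1, PCI_D2, PCI_D3hot, PCI_D3cold };

/*
 * Hypothetical helper: default target state before platform and wakeup
 * refinements.  Devices with platform support keep the existing D3hot
 * default, so the platform path still decides whether D3cold is safe
 * for wakeup.
 */
static enum pci_power default_target_state(enum pci_power current_state,
					   bool platform_power_manageable)
{
	if (!platform_power_manageable && current_state == PCI_D3cold)
		return PCI_D3cold;

	return PCI_D3hot;
}
```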

> So the default target state should be D3cold if the device is already there,
> and D3hot otherwise. I could perhaps try to make this clearer by adding a
> comment.

A comment would certainly be useful.  In particular, about how the device can
get into D3cold at all if the default is D3cold only when the device is already
there.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-07 23:32                       ` Rafael J. Wysocki
@ 2016-08-11 13:20                         ` Lukas Wunner
  2016-08-12  0:50                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-11 13:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Mon, Aug 08, 2016 at 01:32:54AM +0200, Rafael J. Wysocki wrote:
> On Sunday, August 07, 2016 11:03:47 AM Lukas Wunner wrote:
> > On Thu, Aug 04, 2016 at 05:30:47PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, Aug 4, 2016 at 10:14 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > On Thu, Aug 04, 2016 at 03:07:56AM +0200, Rafael J. Wysocki wrote:
> > > >> On Thu, Aug 4, 2016 at 2:45 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > > >> > On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
> > > >> >> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > >> >> > I will update this patch with Bjorn's suggestion to also leave the
> > > >> >> > device in D3cold if it is wakeup-capable. The idea is to just change
> > > >> >> > the default state in the first line of the function like this:
> > > >> >> >
> > > >> >> > -       pci_power_t target_state = PCI_D3hot;
> > > >> >> > +       pci_power_t target_state =
> > > >> >> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
> > > >> >>
> > > >> >> That should work (even though it is a little clumsy IMO).
> > > >> >
> > > >> > Not sure why that is clumsy but happy to use something else if you
> > > >> > have a suggestion?
> > > >>
> > > >> The clumsy thing is that we'd take the target_state as D3cold only if
> > > >> the device already was in that state.
> > > >>
> > > >> Otherwise, we'd take D3hot as the target state for the same device,
> > > >> which doesn't seem particularly consistent to me.
> > > >>
> > > >> Not that I have better ideas ATM, but then the current code works for
> > > >> my use cases. :-)
> > > >
> > > > The goal is to afford direct-complete to devices which are not power-
> > > > manageable by the platform but can still be runtime suspended to D3cold.
> > > 
> > > Well, this is a bit misleading.
> > > 
> > > According to the PCI spec there are two ways to put a device into
> > > D3cold: either by putting its bus into B3 (which for PCIe means
> > > turning the link off IIRC) which happens when the bridge goes into
> > > D3hot, or through the platform.
> > > 
> > > You aren't talking about any of those cases, though, so we go outside
> > > of the spec here.
> > 
> > Yes. With Nvidia Optimus / AMD PowerXpress hybrid graphics on non-Macs
> > and Thunderbolt on Macs, it could still be argued that D3cold is
> > facilitated by the platform, albeit with custom methods instead of _PS3.
> 
> So you'd need a custom set of callbacks for that "platform", but that's
> only a few devices in the system, so you would also need normal ACPI callbacks
> for the rest.
> 
> Conceivably, that could be addressed with per-device platform callbacks,
> but that is conceptually equivalent to adding a pm_domain pointer to the
> devices in question.

Precisely.

> > > > The de facto standard to power manage such devices seems to be with
> > > > dev_pm_domain_set(). That's what vga_switcheroo does and I'll move
> > > > to that as well for v3 of this series.
> > > 
> > > OK
> > > 
> > > > I could add a "bool can_power_off" to struct dev_pm_domain.
> > > 
> > > I'm not sure if dev_pm_domain is the right level.  The "can_power_off"
> > > thing would be sort of specific to your particular use case.
> > > 
> > > Say you have something like
> > > 
> > > struct pci_pm_domain {
> > >         struct dev_pm_domain pd;
> > >         ...
> > > };

So I would like to find a common ground and something you feel
comfortable to ack. The problem I see with your suggested approach
of subclassing struct dev_pm_domain in a struct pci_pm_domain is
that I can easily envision Apple putting some custom methods in the
DSDT to power a non-PCI device up and down. They're starting to use
SPI and UART to attach devices in newer machines.

Hence my suggestion to add a flag to struct dev_pm_domain, even
though at the moment that flag would only be queried by the PCI core.
I don't care if this is called can_power_off or power_manageable or
whatever.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-11 13:20                         ` Lukas Wunner
@ 2016-08-12  0:50                           ` Rafael J. Wysocki
  2016-08-12 16:16                             ` Lukas Wunner
  0 siblings, 1 reply; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-12  0:50 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Thu, Aug 11, 2016 at 3:20 PM, Lukas Wunner <lukas@wunner.de> wrote:
> On Mon, Aug 08, 2016 at 01:32:54AM +0200, Rafael J. Wysocki wrote:
>> On Sunday, August 07, 2016 11:03:47 AM Lukas Wunner wrote:
>> > On Thu, Aug 04, 2016 at 05:30:47PM +0200, Rafael J. Wysocki wrote:
>> > > On Thu, Aug 4, 2016 at 10:14 AM, Lukas Wunner <lukas@wunner.de> wrote:
>> > > > On Thu, Aug 04, 2016 at 03:07:56AM +0200, Rafael J. Wysocki wrote:
>> > > >> On Thu, Aug 4, 2016 at 2:45 AM, Lukas Wunner <lukas@wunner.de> wrote:
>> > > >> > On Thu, Aug 04, 2016 at 01:50:39AM +0200, Rafael J. Wysocki wrote:
>> > > >> >> On Wed, Aug 3, 2016 at 2:28 PM, Lukas Wunner <lukas@wunner.de> wrote:
>> > > >> >> > I will update this patch with Bjorn's suggestion to also leave the
>> > > >> >> > device in D3cold if it is wakeup-capable. The idea is to just change
>> > > >> >> > the default state in the first line of the function like this:
>> > > >> >> >
>> > > >> >> > -       pci_power_t target_state = PCI_D3hot;
>> > > >> >> > +       pci_power_t target_state =
>> > > >> >> > +               dev->current_state == PCI_D3cold ? PCI_D3cold : PCI_D3hot;
>> > > >> >>
>> > > >> >> That should work (even though it is a little clumsy IMO).
>> > > >> >
>> > > >> > Not sure why that is clumsy but happy to use something else if you
>> > > >> > have a suggestion?
>> > > >>
>> > > >> The clumsy thing is that we'd take the target_state as D3cold only if
>> > > >> the device already was in that state.
>> > > >>
>> > > >> Otherwise, we'd take D3hot as the target state for the same device,
>> > > >> which doesn't seem particularly consistent to me.
>> > > >>
>> > > >> Not that I have better ideas ATM, but then the current code works for
>> > > >> my use cases. :-)
>> > > >
>> > > > The goal is to afford direct-complete to devices which are not power-
>> > > > manageable by the platform but can still be runtime suspended to D3cold.
>> > >
>> > > Well, this is a bit misleading.
>> > >
>> > > According to the PCI spec there are two ways to put a device into
>> > > D3cold: either by putting its bus into B3 (which for PCIe means
>> > > turning the link off IIRC) which happens when the bridge goes into
>> > > D3hot, or through the platform.
>> > >
>> > > You aren't talking about any of those cases, though, so we go outside
>> > > of the spec here.
>> >
>> > Yes. With Nvidia Optimus / AMD PowerXpress hybrid graphics on non-Macs
>> > and Thunderbolt on Macs, it could still be argued that D3cold is
>> > facilitated by the platform, albeit with custom methods instead of _PS3.
>>
>> So you'd need a custom set of callbacks for that "platform", but that's
>> only a few devices in the system, so you would also need normal ACPI callbacks
>> for the rest.
>>
>> Conceivably, that could be addressed with per-device platform callbacks,
>> but that is conceptually equivalent to adding a pm_domain pointer to the
>> devices in question.
>
> Precisely.
>
>> > > > The de facto standard to power manage such devices seems to be with
>> > > > dev_pm_domain_set(). That's what vga_switcheroo does and I'll move
>> > > > to that as well for v3 of this series.
>> > >
>> > > OK
>> > >
>> > > > I could add a "bool can_power_off" to struct dev_pm_domain.
>> > >
>> > > I'm not sure if dev_pm_domain is the right level.  The "can_power_off"
>> > > thing would be sort of specific to your particular use case.
>> > >
>> > > Say you have something like
>> > >
>> > > struct pci_pm_domain {
>> > >         struct dev_pm_domain pd;
>> > >         ...
>> > > };
>
> So I would like to find a common ground and something you feel
> comfortable to ack. The problem I see with your suggested approach
> of subclassing struct dev_pm_domain in a struct pci_pm_domain is
> that I can easily envision Apple putting some custom methods in the
> DSDT to power a non-PCI device up and down. They're starting to use
> SPI and UART to attach devices in newer machines.

Those devices have no standard power state definitions.

The problem you have here really is PCI-specific, because you want to
use PCI PM along with the non-standard methods.

> Hence my suggestion to add a flag to struct dev_pm_domain, even
> though at the moment that flag would only be queried by the PCI core.
> I don't care if this is called can_power_off or power_manageable or
> whatever.

struct dev_pm_domain is way too generic for that though, as I'm sure
there are users of it where the can_power_off thing wouldn't make any
sense whatever.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-12  0:50                           ` Rafael J. Wysocki
@ 2016-08-12 16:16                             ` Lukas Wunner
  2016-08-12 22:18                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-12 16:16 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Fri, Aug 12, 2016 at 02:50:04AM +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 11, 2016 at 3:20 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > So I would like to find a common ground and something you feel
> > comfortable to ack. The problem I see with your suggested approach
> > of subclassing struct dev_pm_domain in a struct pci_pm_domain is
> > that I can easily envision Apple putting some custom methods in the
> > DSDT to power a non-PCI device up and down. They're starting to use
> > SPI and UART to attach devices in newer machines.
> 
> Those devices have no standard power state definitions.
> 
> The problem you have here really is PCI-specific, because you want to
> use PCI PM along with the non-standard methods.

If I introduce a struct pci_pm_domain like you suggested, it would mean
that *all* PCI devices using dev_pm_domain_set() have to be changed,
else the container_of() wouldn't work. The resulting code bloat alone
inhibits me from implementing this. Plus, it's a tripwire for anyone
wishing to assign a dev_pm_domain to their PCI device.
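
To make the container_of() concern concrete, here is a userspace sketch
of the subclassing approach. Everything in it is an assumption for
illustration: struct dev_pm_domain is reduced to a placeholder, the
can_power_off member is the hypothetical flag discussed above, and
container_of() is a simplified version without the kernel's type
checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct dev_pm_domain (callbacks omitted). */
struct dev_pm_domain {
	int placeholder;
};

/* Hypothetical PCI-specific wrapper embedding the generic domain. */
struct pci_pm_domain {
	struct dev_pm_domain pd;
	bool can_power_off;
};

/* Simplified userspace container_of(): no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Only valid if pd really is the pd member of a struct pci_pm_domain. */
static struct pci_pm_domain *to_pci_pm_domain(struct dev_pm_domain *pd)
{
	return container_of(pd, struct pci_pm_domain, pd);
}
```

The conversion is pure pointer arithmetic, so it silently yields a
bogus pointer if a plain struct dev_pm_domain was assigned to the
device. That is the tripwire: every PCI user of dev_pm_domain_set()
would have to embed its domain in the wrapper.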

> > Hence my suggestion to add a flag to struct dev_pm_domain, even
> > though at the moment that flag would only be queried by the PCI core.
> > I don't care if this is called can_power_off or power_manageable or
> > whatever.
> 
> struct dev_pm_domain is way too generic for that though, as I'm sure
> there are users of it where the can_power_off thing wouldn't make any
> sense whatever.

That seems like a small tradeoff compared to introducing a struct
pci_pm_domain.

If you dislike a can_power_off flag in struct dev_pm_domain, that only
leaves the option to add a one-liner to pci_target_state(), unless I'm
missing something.

BTW there seems to be a contradiction in your statements on wakeup devices:

On Mon, Aug 08, 2016 at 01:32:54AM +0200, Rafael J. Wysocki wrote:
> On Sunday, August 07, 2016 11:03:47 AM Lukas Wunner wrote:
> > The reasoning is that going from D3cold to D3hot before system sleep
> > just never makes sense, no matter if the device got there by standard
> > or nonstandard means.
>
> That may not be true in theory.
>
> If this is a wakeup device, it may not be able to generate wakeup signals
> from D3cold while the system is in the target system state, although it might
> be able to generate those signals when the system is in S0 (in the ACPI case).

However earlier you wrote:

On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
> On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
> > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> > > Is there a reason you don't want to do this check for devices that
> > > may wakeup?
> >
> > Fear of breaking things. It would mean that a device would be left in
> > D3cold even though it may not be able to signal wakeup from that power
> > state.
>
> Then it should not be put into D3_cold at run time too if it is wakeup-capable.

So on the one hand, you warn that a wakeup-capable device may have been
put into D3cold at runtime but needs to be woken before system sleep
because it might otherwise not be able to signal wakeup.

On the other hand you say that such devices should not be put into D3cold
at runtime at all.

Which one is it?

Thanks,

Lukas

* Re: [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete
  2016-08-07 15:33         ` Alan Stern
  (?)
@ 2016-08-12 16:39         ` Lukas Wunner
  2016-08-12 17:30             ` Alan Stern
  -1 siblings, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-12 16:39 UTC (permalink / raw)
  To: Alan Stern; +Cc: Rafael J. Wysocki, linux-pci, linux-pm, Andreas Noever

On Sun, Aug 07, 2016 at 11:33:17AM -0400, Alan Stern wrote:
> On Sun, 7 Aug 2016, Lukas Wunner wrote:
> 
> > Normally the device hierarchy is traversed bottom-up during suspend
> > and top-down during resume. However ->prepare and ->complete do it
> > the other way round. In the case of ->prepare, this is even documented
> > in Documentation/power/devices.txt but the reason thereof is not.
> > Could you explain this please?
> 
> The purpose of ->prepare is to tell drivers that a system sleep is
> beginning and accordingly they should stop registering new children.  
> This is necessary for the PM core to be able to traverse the entire
> device tree safely; we want to avoid races where a new child is added
> below a device concurrently with that device being suspended.  (Or if
> you want to be more precise, races in which a new child is added below
> a device while the PM core is acquiring the device's lock just prior to
> invoking its ->suspend callback.)
> 
> Telling drivers to stop registering new children below a device has to
> be done top-down, because if it were done bottom-up then it would be
> subject to the same race described above.  Doing it top-down avoids 
> problems; if a device registers new children while the PM core is 
> acquiring its lock prior to invoking ->prepare, it doesn't matter.  The
> new children will be handled later, right along with the existing ones.

Thank you for explaining the motivation to carry out ->prepare top-down.
However my problem is really that ->complete is carried out bottom-up.
What's the motivation for that? Merely to mirror the behaviour of
->prepare? Would it be possible to change it to top-down? Note that
re-enablement of device addition is already allowed in ->resume,
which is called top-down.
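
To make the ordering concrete, here is a minimal user-space sketch (not
kernel code). It assumes devices are stored parent-first in an array, so
a forward walk is top-down and a backward walk is bottom-up. The names
visit_order() and enum phase are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phases of a system sleep transition, for illustration. */
enum phase { PHASE_PREPARE, PHASE_SUSPEND, PHASE_RESUME, PHASE_COMPLETE };

/*
 * devs[] is assumed to be ordered parent-first (index 0 is the root).
 * Writes the order in which devices are visited for phase p into out[].
 */
void visit_order(enum phase p, const char **devs, size_t n, const char **out)
{
	/* ->prepare and ->resume walk top-down; ->suspend and ->complete bottom-up */
	int top_down = (p == PHASE_PREPARE || p == PHASE_RESUME);
	size_t i;

	assert(devs != NULL && out != NULL);

	for (i = 0; i < n; i++)
		out[i] = top_down ? devs[i] : devs[n - 1 - i];
}
```

With a root, a bridge and an endpoint, PHASE_PREPARE visits the root
first, whereas PHASE_COMPLETE visits the endpoint first, which is the
asymmetry with ->resume that I'm asking about.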

By the way, neither the PCI nor USB bus-level ->prepare callbacks perform
any action that would stop device addition. Same for the pciehp driver
(we don't even have a ->prepare callback defined for PCIe port services).
So it *is* possible to hotplug PCI devices after ->prepare.

Best regards,

Lukas

* Re: [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete
  2016-08-12 16:39         ` Lukas Wunner
@ 2016-08-12 17:30             ` Alan Stern
  0 siblings, 0 replies; 65+ messages in thread
From: Alan Stern @ 2016-08-12 17:30 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Rafael J. Wysocki, linux-pci, linux-pm, Andreas Noever

On Fri, 12 Aug 2016, Lukas Wunner wrote:

> Thank you for explaining the motivation to carry out ->prepare top-down.
> However my problem is really that ->complete is carried out bottom-up.
> What's the motivation for that? Merely to mirror the behaviour of
> ->prepare? Would it be possible to change it to top-down? Note that
> re-enablement of device addition is already allowed in ->resume,
> which is called top-down.

I'm not aware of any particular reason why making ->complete run
top-down wouldn't work.  Of course, if you did then the environment at
the start of the ->complete callback wouldn't be the same as it was at
the end of the ->prepare callback.

I think originally the idea was just to mirror ->prepare.  Perhaps
Rafael will remember something that has escaped me.

> By the way, neither the PCI nor USB bus-level ->prepare callbacks perform
> any action that would stop device addition. Same for the pciehp driver
> (we don't even have a ->prepare callback defined for PCIe port services).
> So it *is* possible to hotplug PCI devices after ->prepare.

I don't know about PCI (although what you describe sounds like a bug).  

USB relies on a freezable workqueue for adding child devices, so it
stops adding children even before the prepare phase begins.

Alan Stern


* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-12 16:16                             ` Lukas Wunner
@ 2016-08-12 22:18                               ` Rafael J. Wysocki
  2016-08-12 22:37                                 ` Rafael J. Wysocki
  2016-08-14 10:27                                 ` Lukas Wunner
  0 siblings, 2 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-12 22:18 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Friday, August 12, 2016 06:16:09 PM Lukas Wunner wrote:
> On Fri, Aug 12, 2016 at 02:50:04AM +0200, Rafael J. Wysocki wrote:
> > On Thu, Aug 11, 2016 at 3:20 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > So I would like to find a common ground and something you feel
> > > comfortable to ack. The problem I see with your suggested approach
> > > of subclassing struct dev_pm_domain in a struct pci_pm_domain is
> > > that I can easily envision Apple putting some custom methods in the
> > > DSDT to power a non-PCI device up and down. They're starting to use
> > > SPI and UART to attach devices in newer machines.
> > 
> > Those devices have no standard power state definitions.
> > 
> > The problem you have here really is PCI-specific, because you want to
> > use PCI PM along with the non-standard methods.
> 
> If I introduce a struct pci_pm_domain like you suggested, it would mean
> that *all* PCI devices using dev_pm_domain_set() have to be changed,
> else the container_of() wouldn't work. The resulting code bloat alone
> inhibits me from implementing this. Plus, it's a tripwire for anyone
> wishing to assign a dev_pm_domain to their PCI device.
> 
> > > Hence my suggestion to add a flag to struct dev_pm_domain, even
> > > though at the moment that flag would only be queried by the PCI core.
> > > I don't care if this is called can_power_off or power_manageable or
> > > whatever.
> > 
> > struct dev_pm_domain is way too generic for that though, as I'm sure
> > there are users of it where the can_power_off thing wouldn't make any
> > sense whatever.
> 
> That seems like a small tradeoff compared to introducing a struct
> pci_pm_domain.

I'm not going to apply any patches adding can_power_off or similar flags to
struct dev_pm_domain.

> If you dislike a can_power_off flag in struct dev_pm_domain, that only
> leaves the option to add a one-liner to pci_target_state(), unless I'm
> missing something.

I'm not sure why you are insisting on setting target_state to D3cold
before taking the platform_pci_power_manageable() branch.  Why don't
you simply rearrange the routine like

	pci_power_t target_state = PCI_D3hot;

	if (platform_pci_power_manageable(dev)) {
		...
		return target_state;
	}

	if (!dev->pm_cap)
		return PCI_D0;

	if (dev->current_state == PCI_D3cold)
		target_state = PCI_D3cold;

	if (device_may_wakeup(&dev->dev)) {
		...
	}

	return target_state;

And that would be fine by me.
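
For illustration only, the control flow of that rearrangement can be
modelled in plain user-space C. This is a sketch under assumptions, not
the kernel routine: struct model_dev and model_target_state() are
hypothetical names, the platform branch is reduced to returning
whatever state the platform picked, and the wakeup branch (elided
above) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's pci_power_t values. */
enum power_state { D0, D1, D2, D3HOT, D3COLD };

/* Hypothetical, pared-down device model; not struct pci_dev. */
struct model_dev {
	bool platform_manageable;         /* platform_pci_power_manageable() */
	enum power_state platform_choice; /* state the platform would pick */
	bool has_pm_cap;                  /* dev->pm_cap != 0 */
	enum power_state current_state;   /* dev->current_state */
};

enum power_state model_target_state(const struct model_dev *dev)
{
	enum power_state target_state = D3HOT;

	assert(dev != NULL);

	/* Assumption: stands in for the elided platform branch. */
	if (dev->platform_manageable)
		return dev->platform_choice;

	if (!dev->has_pm_cap)
		return D0;

	/* Only reached without platform PM: stay in D3cold if already there. */
	if (dev->current_state == D3COLD)
		target_state = D3COLD;

	/* The device_may_wakeup() branch is elided above and not modelled. */
	return target_state;
}
```

The point of the ordering is that the D3cold check can no longer
influence the result for platform-managed devices.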

That said I'm not sure why you want to use pci_target_state() so badly?

If you are going to use a PM domain, why do you still need that function?

> BTW there seems to be a contradiction in your statements on wakeup devices:
> 
> On Mon, Aug 08, 2016 at 01:32:54AM +0200, Rafael J. Wysocki wrote:
> > On Sunday, August 07, 2016 11:03:47 AM Lukas Wunner wrote:
> > > The reasoning is that going from D3cold to D3hot before system sleep
> > > just never makes sense, no matter if the device got there by standard
> > > or nonstandard means.
> >
> > That may not be true in theory.
> >
> > If this is a wakeup device, it may not be able to generate wakeup signals
> > from D3cold while the system is in the target system state, although it might
> > be able to generate those signals when the system is in S0 (in the ACPI case).
> 
> However earlier you wrote:
> 
> On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
> > On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
> > > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> > > > Is there a reason you don't want to do this check for devices that
> > > > may wakeup?
> > >
> > > Fear of breaking things. It would mean that a device would be left in
> > > D3cold even though it may not be able to signal wakeup from that power
> > > state.
> >
> > Then it should not be put into D3_cold at run time too if it is wakeup-capable.
> 
> So on the one hand, you warn that a wakeup-capable device may have been
> put into D3cold at runtime but needs to be woken before system sleep
> because it might otherwise not be able to signal wakeup.

Yes, so specifically I'm concerned about the pci_target_state() invocation in
pci_dev_keep_suspended() which is done exactly for this purpose.

If you apply the "keep it in D3cold if already there" logic to that case, it
may lead to a wrong decision in theory.  Say the device is in D3cold and
platform_pci_choose_state() returns D1, but pci_no_d1d2() returns true,
the device will end up in D3cold, but it may not be able to signal wakeup
from that state after the system has been suspended.
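
A small user-space model of that scenario, as a sketch under
assumptions rather than the actual kernel code: struct model_dev and
model_platform_target() are hypothetical names, only the
platform-managed path is modelled, and the keep_d3cold parameter
selects whether the initial target is set to D3cold for a device that
is already there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's pci_power_t values. */
enum power_state { D0, D1, D2, D3HOT, D3COLD };

/* Hypothetical, pared-down device model; not struct pci_dev. */
struct model_dev {
	enum power_state current_state;   /* dev->current_state */
	enum power_state platform_choice; /* platform_pci_choose_state() result */
	bool no_d1d2;                     /* pci_no_d1d2() result */
};

enum power_state model_platform_target(const struct model_dev *dev,
				       bool keep_d3cold)
{
	enum power_state target_state = D3HOT;
	bool rejected;

	assert(dev != NULL);

	/* The "keep it in D3cold if already there" logic, applied up front. */
	if (keep_d3cold && dev->current_state == D3COLD)
		target_state = D3COLD;

	/* D1/D2 from the platform is discarded if the device cannot use it. */
	rejected = (dev->platform_choice == D1 || dev->platform_choice == D2) &&
		   dev->no_d1d2;
	if (!rejected)
		target_state = dev->platform_choice;

	return target_state;
}
```

For a device in D3cold with a platform choice of D1 and no_d1d2 set,
the model yields D3cold when keep_d3cold is true and D3hot otherwise.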

> On the other hand you say that such devices should not be put into D3cold
> at runtime at all.
>
> Which one is it?

I said the latter under the assumption that the device would not be able to
signal wakeup from D3cold at all (including at run time).  I may not have
understood the context of your conversation with Bjorn correctly.

Thanks,
Rafael

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-12 22:18                               ` Rafael J. Wysocki
@ 2016-08-12 22:37                                 ` Rafael J. Wysocki
  2016-08-14 10:27                                 ` Lukas Wunner
  1 sibling, 0 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-12 22:37 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Saturday, August 13, 2016 12:18:26 AM Rafael J. Wysocki wrote:
> On Friday, August 12, 2016 06:16:09 PM Lukas Wunner wrote:
> > On Fri, Aug 12, 2016 at 02:50:04AM +0200, Rafael J. Wysocki wrote:
> > > On Thu, Aug 11, 2016 at 3:20 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > > > So I would like to find a common ground and something you feel
> > > > comfortable to ack. The problem I see with your suggested approach
> > > > of subclassing struct dev_pm_domain in a struct pci_pm_domain is
> > > > that I can easily envision Apple putting some custom methods in the
> > > > DSDT to power a non-PCI device up and down. They're starting to use
> > > > SPI and UART to attach devices in newer machines.
> > > 
> > > Those devices have no standard power state definitions.
> > > 
> > > The problem you have here really is PCI-specific, because you want to
> > > use PCI PM along with the non-standard methods.
> > 
> > If I introduce a struct pci_pm_domain like you suggested, it would mean
> > that *all* PCI devices using dev_pm_domain_set() have to be changed,
> > else the container_of() wouldn't work. The resulting code bloat alone
> > inhibits me from implementing this. Plus, it's a tripwire for anyone
> > wishing to assign a dev_pm_domain to their PCI device.
> > 
> > > > Hence my suggestion to add a flag to struct dev_pm_domain, even
> > > > though at the moment that flag would only be queried by the PCI core.
> > > > I don't care if this is called can_power_off or power_manageable or
> > > > whatever.
> > > 
> > > struct dev_pm_domain is way too generic for that though, as I'm sure
> > > there are users of it where the can_power_off thing wouldn't make any
> > > sense whatever.
> > 
> > That seems like a small tradeoff compared to introducing a struct
> > pci_pm_domain.
> 
> I'm not going to apply any patches adding can_power_off or similar flags to
> struct dev_pm_domain.
> 
> > If you dislike a can_power_off flag in struct dev_pm_domain, that only
> > leaves the option to add a one-liner to pci_target_state(), unless I'm
> > missing something.
> 
> I'm not sure why you are insisting on setting target_state to D3cold
> before taking the platform_pci_power_manageable() branch.  Why don't
> you simply rearrange the routine like
> 
> 	pci_power_t target_state = PCI_D3hot;
> 
> 	if (platform_pci_power_manageable(dev)) {
> 		...
> 		return target_state;
> 	}
> 
> 	if (!dev->pm_cap)
> 		return PCI_D0;
> 
> 	if (dev->current_state == PCI_D3cold)
> 		target_state = PCI_D3cold;
> 
> 	if (device_may_wakeup(&dev->dev)) {
> 		...
> 	}
> 
> 	return target_state;
> 
> And that would be fine by me.
> 
> That said I'm not sure why you want to use pci_target_state() so badly?
> 
> If you are going to use a PM domain, why do you still need that function?
> 
> > BTW there seems to be a contradiction in your statements on wakeup devices:
> > 
> > On Mon, Aug 08, 2016 at 01:32:54AM +0200, Rafael J. Wysocki wrote:
> > > On Sunday, August 07, 2016 11:03:47 AM Lukas Wunner wrote:
> > > > The reasoning is that going from D3cold to D3hot before system sleep
> > > > just never makes sense, no matter if the device got there by standard
> > > > or nonstandard means.
> > >
> > > That may not be true in theory.
> > >
> > > If this is a wakeup device, it may not be able to generate wakeup signals
> > > from D3cold while the system is in the target system state, although it might
> > > be able to generate those signals when the system is in S0 (in the ACPI case).
> > 
> > However earlier you wrote:
> > 
> > On Mon, Jul 18, 2016 at 03:39:15PM +0200, Rafael J. Wysocki wrote:
> > > On Saturday, June 18, 2016 12:14:07 AM Lukas Wunner wrote:
> > > > On Fri, Jun 17, 2016 at 04:09:24PM -0500, Bjorn Helgaas wrote:
> > > > > Is there a reason you don't want to do this check for devices that
> > > > > may wakeup?
> > > >
> > > > Fear of breaking things. It would mean that a device would be left in
> > > > D3cold even though it may not be able to signal wakeup from that power
> > > > state.
> > >
> > > Then it should not be put into D3_cold at run time too if it is wakeup-capable.
> > 
> > So on the one hand, you warn that a wakeup-capable device may have been
> > put into D3cold at runtime but needs to be woken before system sleep
> > because it might otherwise not be able to signal wakeup.
> 
> Yes, so specifically I'm concerned about the pci_target_state() invocation in
> pci_dev_keep_suspended() which is done exactly for this purpose.
> 
> If you apply the "keep it in D3cold if already there" logic to that case, it
> may lead to a wrong decision in theory.  Say the device is in D3cold and
> platform_pci_choose_state() returns D1, but pci_no_d1d2() returns true,
> the device will end up in D3cold, but it may not be able to signal wakeup
> from that state after the system has been suspended.

Of course, I guess you'll say that it may not be able to signal wakeup from
D3hot as well in that case, which is correct. :-)

So I guess the one-liner change in pci_target_state() would be fine if Bjorn
likes it.

Thanks,
Rafael

* Re: [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete
  2016-08-12 17:30             ` Alan Stern
  (?)
@ 2016-08-12 22:40             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-12 22:40 UTC (permalink / raw)
  To: Alan Stern, Lukas Wunner; +Cc: linux-pci, linux-pm, Andreas Noever

On Friday, August 12, 2016 01:30:04 PM Alan Stern wrote:
> On Fri, 12 Aug 2016, Lukas Wunner wrote:
> 
> > Thank you for explaining the motivation to carry out ->prepare top-down.
> > However my problem is really that ->complete is carried out bottom-up.
> > What's the motivation for that? Merely to mirror the behaviour of
> > ->prepare? Would it be possible to change it to top-down? Note that
> > re-enablement of device addition is already allowed in ->resume,
> > which is called top-down.
> 
> I'm not aware of any particular reason why making ->complete run
> top-down wouldn't work.  Of course, if you did then the environment at
> the start of the ->complete callback wouldn't be the same as it was at
> the end of the ->prepare callback.
> 
> I think originally the idea was just to mirror ->prepare.  Perhaps
> Rafael will remember something that has escaped me.

Nothing specific off the top of my head.

> > By the way, neither the PCI nor USB bus-level ->prepare callbacks perform
> > any action that would stop device addition. Same for the pciehp driver
> > (we don't even have a ->prepare callback defined for PCIe port services).
> > So it *is* possible to hotplug PCI devices after ->prepare.

Not via ACPI, though.  The ACPI core blocks all hotplug events at the
beginning of the suspend sequence and releases them at the end of device
resume.

> I don't know about PCI (although what you describe sounds like a bug).  
> 
> USB relies on a freezable workqueue for adding child devices, so it
> stops adding children even before the prepare phase begins.

Right.

Thanks,
Rafael


* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-12 22:18                               ` Rafael J. Wysocki
  2016-08-12 22:37                                 ` Rafael J. Wysocki
@ 2016-08-14 10:27                                 ` Lukas Wunner
  2016-08-15 23:05                                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 65+ messages in thread
From: Lukas Wunner @ 2016-08-14 10:27 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Sat, Aug 13, 2016 at 12:18:26AM +0200, Rafael J. Wysocki wrote:
> Yes, so specifically I'm concerned about the pci_target_state() invocation
> in pci_dev_keep_suspended() which is done exactly for this purpose.
> 
> If you apply the "keep it in D3cold if already there" logic to that case,
> it may lead to a wrong decision in theory. Say the device is in D3cold and
> platform_pci_choose_state() returns D1, but pci_no_d1d2() returns true,
> the device will end up in D3cold, but it may not be able to signal wakeup
> from that state after the system has been suspended.

Ugh, I had missed those break statements in the platform-case.
I must be blind. You're right of course, that wouldn't be correct.

> Of course, I guess you'll say that it may not be able to signal wakeup from
> D3hot as well in that case, which is correct. :-)

Hm, what would be the correct power state in that case then? PCI_D0?

> Why don't you simply rearrange the routine like
> 
> 	pci_power_t target_state = PCI_D3hot;
> 
> 	if (platform_pci_power_manageable(dev)) {
> 		...
> 		return target_state;
> 	}
> 
> 	if (!dev->pm_cap)
> 		return PCI_D0;
> 
> 	if (dev->current_state == PCI_D3cold)
> 		target_state = PCI_D3cold;
> 
> 	if (device_may_wakeup(&dev->dev)) {
> 		...
> 	}
> 
> 	return target_state;
> 
> And that would be fine by me.

Looks good, I'll give that a try.

If the correct power state in the pci_no_d1d2() case is PCI_D0,
I could fix that up as well.

> That said I'm not sure why you want to use pci_target_state() so badly?
> 
> If you are going to use a PM domain, why do you still need that function?

The dev_pm_domain is only assigned to the topmost device exposed by
the Thunderbolt controller (the upstream bridge). I would like to avoid
having to assign separate dev_pm_domains to the downstream bridges.

So I let the NHI and downstream bridges go to D3hot. And when the
upstream bridge cuts power, it iterates over all child devices
and changes their current_state to D3cold to reflect reality.
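
As a rough user-space illustration of that walk (a sketch, not the
code from the series): struct model_dev and mark_children_d3cold() are
hypothetical names, and the fixed-size child array is an assumption
made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's pci_power_t values. */
enum power_state { D0, D3HOT, D3COLD };

#define MAX_CHILDREN 4

/* Hypothetical device node; not struct pci_dev. */
struct model_dev {
	enum power_state current_state;
	struct model_dev *children[MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Record D3cold on every device below @parent once power has been cut.
 * The parent's own state is left to the caller. Returns the number of
 * devices whose state was updated.
 */
size_t mark_children_d3cold(struct model_dev *parent)
{
	size_t i, updated = 0;

	assert(parent != NULL);

	for (i = 0; i < parent->nr_children; i++) {
		struct model_dev *child = parent->children[i];

		child->current_state = D3COLD;
		updated += 1 + mark_children_d3cold(child);
	}

	return updated;
}
```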

When the system is later put to sleep, this patch ensures that the
NHI and downstream bridges are not unnecessarily resumed to D3hot.

So why change the current_state of the children at all? I could just
leave the (incorrect) PCI_D3hot and everything would be peachy, right?
Well, there's another problem: The first few Thunderbolt chips had
broken MSI, so they have to use INTx to signal hotplug. Unfortunately on
some Macs built 2011/2012, the IRQ is shared with multiple other devices,
most importantly the wireless card which can generate thousands of
interrupts on a crowded WLAN. If power is cut to the Thunderbolt
controller, reading from the hotplug ports' config space in pcie_isr()
fails and results in a "no response from device" message logged with
KERN_INFO. Getting thousands of such messages is annoying, not to
mention the giant waste of CPU cycles to read from the config space
of a device which we *know* is powered down.

The solution I came up with is to add a tiny two-liner to pcie_isr()
with commit ed91de7e14fb ("PCI: pciehp: Ignore interrupts during D3cold").
But that requires that I update the children's current_state to D3cold,
and necessitates that pci_target_state() doesn't resume them to D3hot
for system sleep. Hence the need for this patch.
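
The idea behind that check can be modelled in user space as follows.
This is a sketch under assumptions, not the contents of the commit:
struct model_port, model_isr() and read_slot_status() are hypothetical
names, and a counter stands in for the config space accesses that
would fail on a powered-down port.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's pci_power_t values. */
enum power_state { D0, D3HOT, D3COLD };
enum irq_result { IRQ_NOT_HANDLED, IRQ_WAS_HANDLED };

/* Hypothetical hotplug port model. */
struct model_port {
	enum power_state current_state;
	unsigned int slot_status;  /* pending event bits, nonzero if any */
	unsigned int config_reads; /* counts simulated register reads */
};

/* Simulated slot status read; on real hardware this fails without power. */
static unsigned int read_slot_status(struct model_port *port)
{
	port->config_reads++;
	return port->slot_status;
}

enum irq_result model_isr(struct model_port *port)
{
	assert(port != NULL);

	/* Powered down: the interrupt belongs to another device on the line. */
	if (port->current_state == D3COLD)
		return IRQ_NOT_HANDLED;

	if (!read_slot_status(port))
		return IRQ_NOT_HANDLED;

	port->slot_status = 0; /* acknowledge the pending events */
	return IRQ_WAS_HANDLED;
}
```

In this model a port whose current_state is D3cold never triggers a
register read, no matter how often the shared interrupt fires.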

The approach has the additional benefit that hybrid graphics devices
are implicitly also afforded direct-complete without having to add a
->prepare hook that returns a positive int. They only need to set their
current_state to D3cold, which they already do, see azx_vs_set_state(),
nouveau_pmops_runtime_suspend(), radeon_pmops_runtime_suspend(),
amdgpu_pmops_runtime_suspend().

However this also means that adding a can_power_off flag to struct
dev_pm_domain wouldn't be a viable solution because then I'd have to
assign a dev_pm_domain to the downstream bridges. Another thing I've
missed. Ugh. This is so complicated it's easy to get tangled up in
all these intricate little details.

Thanks for your patience in dealing with these issues,

Lukas

* Re: [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep
  2016-08-14 10:27                                 ` Lukas Wunner
@ 2016-08-15 23:05                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 65+ messages in thread
From: Rafael J. Wysocki @ 2016-08-15 23:05 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Bjorn Helgaas, Linux PCI, Linux PM, Andreas Noever

On Sunday, August 14, 2016 12:27:25 PM Lukas Wunner wrote:
> On Sat, Aug 13, 2016 at 12:18:26AM +0200, Rafael J. Wysocki wrote:
> > Yes, so specifically I'm concerned about the pci_target_state() invocation
> > in pci_dev_keep_suspended() which is done exactly for this purpose.
> > 
> > If you apply the "keep it in D3cold if already there" logic to that case,
> > it may lead to a wrong decision in theory. Say the device is in D3cold and
> > platform_pci_choose_state() returns D1, but pci_no_d1d2() returns true,
> > the device will end up in D3cold, but it may not be able to signal wakeup
> > from that state after the system has been suspended.
> 
> Ugh, I had missed those break statements in the platform-case.
> I must be blind. You're right of course, that wouldn't be correct.
> 
> > Of course, I guess you'll say that it may not be able to signal wakeup from
> > D3hot as well in that case, which is correct. :-)
> 
> Hm, what would be the correct power state in that case then? PCI_D0?

D0 may not be a good choice here either.

The problem in this case is the discrepancy between what the platform firmware
tells us and what we know from other sources, so one way or another, something
may be broken.

I guess the safest option is to just keep the current behavior. :-)

> > Why don't you simply rearrange the routine like
> > 
> > 	pci_power_t target_state = PCI_D3hot;
> > 
> > 	if (platform_pci_power_manageable(dev)) {
> > 		...
> > 		return target_state;
> > 	}
> > 
> > 	if (!dev->pm_cap)
> > 		return PCI_D0;
> > 
> > 	if (dev->current_state == PCI_D3cold)
> > 		target_state = PCI_D3cold;
> > 
> > 	if (device_may_wakeup(&dev->dev)) {
> > 		...
> > 	}
> > 
> > 	return target_state;
> > 
> > And that would be fine by me.
> 
> Looks good, I'll give that a try.
> 
> If the correct power state in the pci_no_d1d2() case is PCI_D0,
> I could fix that up as well.
> 
> > That said I'm not sure why you want to use pci_target_state() so badly?
> > 
> > If you are going to use a PM domain, why do you still need that function?
> 
> The dev_pm_domain is only assigned to the topmost device exposed by
> the Thunderbolt controller (the upstream bridge). I would like to avoid
> having to assign separate dev_pm_domains to the downstream bridges.
> 
> So I let the NHI and downstream bridges go to D3hot. And when the
> upstream bridge cuts power, it iterates over all child devices
> and changes their current_state to D3cold to reflect reality.
> 
> When the system is later put to sleep, this patch ensures that the
> NHI and downstream bridges are not unnecessarily resumed to D3hot.
> 
> So why change the current_state of the children at all? I could just
> leave the (incorrect) PCI_D3hot and everything would be peachy, right?
> Well, there's another problem: The first few Thunderbolt chips had
> broken MSI, they have to use INTx to signal hotplug. Unfortunately on
> some Macs built 2011/2012, the IRQ is shared with multiple other devices,
> most importantly the wireless card which can generate thousands of
> interrupts on a crowded WLAN. If power is cut to the Thunderbolt
> controller, reading from the hotplug ports' config space in pcie_isr()
> fails and results in a "no response from device" message logged with
> KERN_INFO. Getting thousands of such messages is annoying, not to
> mention the giant waste of CPU cycles to read from the config space
> of a device which we *know* is powered down.
> 
> The solution I came up with is to add a tiny two-liner to pcie_isr()
> with commit ed91de7e14fb ("PCI: pciehp: Ignore interrupts during D3cold").
> But that requires that I update the children's current_state to D3cold,
> and necessitates that pci_target_state() doesn't resume them to D3hot
> for system sleep. Hence the need for this patch.
> 
> The approach has the additional benefit that hybrid graphics devices
> are implicitly also afforded direct-complete without having to add a
> ->prepare hook that returns a positive int. They only need to set their
> current_state to D3cold, which they already do, see azx_vs_set_state(),
> nouveau_pmops_runtime_suspend(), radeon_pmops_runtime_suspend(),
> amdgpu_pmops_runtime_suspend().

Sounds reasonable to me.

> However this also means that adding a can_power_off flag to struct
> dev_pm_domain wouldn't be a viable solution because then I'd have to
> assign a dev_pm_domain to the downstream bridges. Another thing I've
> missed. Ugh. This is so complicated it's easy to get tangled up in
> all these intricate little details.
> 
> Thanks for your patience in dealing with these issues,

No problem.

Thanks,
Rafael

end of thread, other threads:[~2016-08-15 23:05 UTC | newest]

Thread overview: 65+ messages
2016-05-13 11:15 [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 01/13] PCI: Recognize Thunderbolt devices Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 11/13] PM / sleep: Allow opt-out from runtime resume after direct-complete Lukas Wunner
2016-07-18 13:18   ` Rafael J. Wysocki
2016-08-07  9:56     ` Lukas Wunner
2016-08-07 15:33       ` Alan Stern
2016-08-07 15:33         ` Alan Stern
2016-08-12 16:39         ` Lukas Wunner
2016-08-12 17:30           ` Alan Stern
2016-08-12 17:30             ` Alan Stern
2016-08-12 22:40             ` Rafael J. Wysocki
2016-05-13 11:15 ` [PATCH v2 03/13] PCI: Add Thunderbolt portdrv service type Lukas Wunner
2016-06-17 22:51   ` Bjorn Helgaas
2016-07-20  0:30     ` Rafael J. Wysocki
2016-07-20  6:59     ` Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 09/13] PCI: Do not write to PM control register while in D3cold Lukas Wunner
2016-06-17 21:18   ` Bjorn Helgaas
2016-07-18 13:55   ` Rafael J. Wysocki
2016-05-13 11:15 ` [PATCH v2 13/13] thunderbolt: Support runtime pm on NHI Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 04/13] PCI: Generalize portdrv pm iterator Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 08/13] PCI: Allow runtime PM for Thunderbolt hotplug ports on Macs Lukas Wunner
2016-06-14  9:08   ` [PATCH v2 08/13 REBASED] " Lukas Wunner
2016-06-17 21:53   ` [PATCH v2 08/13] " Bjorn Helgaas
2016-05-13 11:15 ` [PATCH v2 10/13] PCI: Avoid going from D3cold to D3hot for system sleep Lukas Wunner
2016-06-17 21:09   ` Bjorn Helgaas
2016-06-17 22:14     ` Lukas Wunner
2016-07-18 13:39       ` Rafael J. Wysocki
2016-08-03 12:28         ` Lukas Wunner
2016-08-03 23:50           ` Rafael J. Wysocki
2016-08-04  0:45             ` Lukas Wunner
2016-08-04  1:07               ` Rafael J. Wysocki
2016-08-04  8:14                 ` Lukas Wunner
2016-08-04 15:30                   ` Rafael J. Wysocki
2016-08-07  9:03                     ` Lukas Wunner
2016-08-07 23:32                       ` Rafael J. Wysocki
2016-08-11 13:20                         ` Lukas Wunner
2016-08-12  0:50                           ` Rafael J. Wysocki
2016-08-12 16:16                             ` Lukas Wunner
2016-08-12 22:18                               ` Rafael J. Wysocki
2016-08-12 22:37                                 ` Rafael J. Wysocki
2016-08-14 10:27                                 ` Lukas Wunner
2016-08-15 23:05                                   ` Rafael J. Wysocki
2016-05-13 11:15 ` [PATCH v2 06/13] PCI: pciehp: Support runtime pm Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 05/13] PCI: Use portdrv pm iterator on further callbacks Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 02/13] PCI: Allow D3 for Thunderbolt ports Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 12/13] thunderbolt: Support runtime pm on upstream bridge Lukas Wunner
2016-05-13 11:15 ` [PATCH v2 07/13] PCI: pciehp: Ignore interrupts during D3cold Lukas Wunner
2016-06-17 22:52   ` Bjorn Helgaas
2016-08-02 16:27     ` Lukas Wunner
2016-08-05  0:29       ` Rafael J. Wysocki
2016-05-21  9:48 ` [PATCH v2 00/13] Runtime PM for Thunderbolt on Macs Andreas Noever
2016-06-14 16:37   ` Bjorn Helgaas
2016-06-14 19:14     ` Andreas Noever
2016-06-14 20:22       ` Bjorn Helgaas
2016-06-15 18:40         ` Lukas Wunner
2016-06-16  1:55           ` Linus Torvalds
2016-07-07 17:39         ` Andreas Noever
2016-07-09  5:23           ` Greg KH
2016-07-12 21:46             ` Andreas Noever
2016-06-13 20:58 ` Bjorn Helgaas
2016-06-14  9:27   ` Lukas Wunner
2016-07-07 15:02 ` Lukas Wunner
2016-07-08  1:28   ` Rafael J. Wysocki
2016-07-20  7:23     ` Lukas Wunner
2016-07-20 12:48       ` Rafael J. Wysocki
