linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 1/8] PCI: Recognize Thunderbolt devices
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (3 preceding siblings ...)
  2017-02-12 16:07 ` [PATCH v6 8/8] thunderbolt: Runtime suspend NHI when idle Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  2017-02-17 15:29   ` Bjorn Helgaas
  2017-02-12 16:07 ` [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports Lukas Wunner
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Andreas Noever, linux-pci

Detect on device probe whether a PCI device is part of a Thunderbolt
daisy chain (i.e. it is either part of a Thunderbolt controller or part
of the hierarchy below a Thunderbolt controller).

Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234
on devices belonging to a Thunderbolt controller.  Detect presence of
this VSEC on the device itself or on one of its parents and cache it in
a newly added is_thunderbolt bit in struct pci_dev.

The necessity arises from the following:

* To power off Thunderbolt controllers on Macs when nothing is plugged
  in, we need to allow runtime PM on their PCIe ports in
  pci_bridge_d3_possible().  For this we need a way to recognize them.

* Dual GPU MacBook Pros introduced 2011+ can no longer switch external
  DisplayPort ports between GPUs.  (They're no longer just used for DP
  but have become combined DP/Thunderbolt ports.)  The driver to switch
  the ports, drivers/platform/x86/apple-gmux.c, needs to detect presence
  of a Thunderbolt controller and, if found, keep external ports
  permanently switched to the discrete GPU.

* If an external Thunderbolt GPU is connected to a dual GPU laptop (Mac
  or not), that GPU is currently registered with vga_switcheroo even
  though it can neither drive the laptop's panel nor be powered off by
  the platform.  To vga_switcheroo it will appear as if two discrete
  GPUs are present.  As a result, when the external GPU is runtime
  suspended, vga_switcheroo will cut power to the internal discrete GPU
  which may not be runtime suspended at all at this moment.  The
  solution is to not register external GPUs with vga_switcheroo, which
  necessitates a way to recognize if they're part of a Thunderbolt daisy
  chain.

Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Amir Levy <amir.jer.levy@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.h   |  2 ++
 drivers/pci/probe.c | 32 ++++++++++++++++++++++++++++++++
 include/linux/pci.h |  1 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index cb17db242f30..45c2b8144911 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -3,6 +3,8 @@
 
 #define PCI_FIND_CAP_TTL	48
 
+#define PCI_VSEC_ID_INTEL_TBT	0x1234	/* Thunderbolt */
+
 extern const unsigned char pcie_link_speed[];
 
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 204960e70333..2fcbc535786e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1208,6 +1208,35 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
 		pdev->is_hotplug_bridge = 1;
 }
 
+static void set_pcie_thunderbolt(struct pci_dev *dev)
+{
+	struct pci_dev *parent = dev;
+	int vsec = 0;
+	u32 header;
+
+	/* Is the device part of a Thunderbolt controller? */
+	while ((vsec = pci_find_next_ext_capability(dev, vsec,
+						    PCI_EXT_CAP_ID_VNDR))) {
+		pci_read_config_dword(dev, vsec + PCI_VNDR_HEADER, &header);
+		if (dev->vendor == PCI_VENDOR_ID_INTEL &&
+		    PCI_VNDR_HEADER_ID(header) == PCI_VSEC_ID_INTEL_TBT) {
+			dev->is_thunderbolt = 1;
+			return;
+		}
+	}
+
+	/*
+	 * Is the device attached with Thunderbolt?  Walk upwards and check for
+	 * each encountered bridge if it's part of a Thunderbolt controller.
+	 * Reaching the host bridge means dev is soldered to the mainboard.
+	 */
+	while ((parent = pci_upstream_bridge(parent)))
+		if (parent->is_thunderbolt) {
+			dev->is_thunderbolt = 1;
+			return;
+		}
+}
+
 /**
  * pci_ext_cfg_is_aliased - is ext config space just an alias of std config?
  * @dev: PCI device
@@ -1360,6 +1389,9 @@ int pci_setup_device(struct pci_dev *dev)
 	/* need to have dev->class ready */
 	dev->cfg_size = pci_cfg_space_size(dev);
 
+	/* need to have dev->cfg_size ready */
+	set_pcie_thunderbolt(dev);
+
 	/* "Unknown power state" */
 	dev->current_state = PCI_UNKNOWN;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e2d1a124216a..3c775e8498f1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -358,6 +358,7 @@ struct pci_dev {
 	unsigned int	is_virtfn:1;
 	unsigned int	reset_fn:1;
 	unsigned int    is_hotplug_bridge:1;
+	unsigned int	is_thunderbolt:1; /* part of Thunderbolt daisy chain */
 	unsigned int    __aer_firmware_first_valid:1;
 	unsigned int	__aer_firmware_first:1;
 	unsigned int	broken_intx_masking:1;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 4/8] Revert "PM / Runtime: Remove the exported function pm_children_suspended()" Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 7/8] thunderbolt: Power down controller when idle Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  2017-02-14 20:38   ` Bjorn Helgaas
  2017-02-17 16:06   ` Bjorn Helgaas
  2017-02-12 16:07 ` [PATCH v6 8/8] thunderbolt: Runtime suspend NHI when idle Lukas Wunner
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Andreas Noever, linux-pci

PCIe ports are only allowed to go to D3 if the BIOS is dated 2015+ to
avoid potential issues with old chipsets.  However for Thunderbolt we
know that even the oldest controller, Light Ridge (2010), is able to
suspend its ports to D3 just fine.

We're about to add runtime PM for Thunderbolt on the Mac.  Apple has
released multiple EFI updates since 2015 but not everyone has installed
them (https://support.apple.com/en-us/HT201222).  Those users should
still benefit from the achievable power saving.  Thus, special-case
Thunderbolt in pci_bridge_d3_possible().

This allows the Thunderbolt controller to power down but the root port
to which the Thunderbolt controller is attached remains in D0 unless
an EFI update is installed.  Users can pass pcie_port_pm=force on the
kernel command line if they cannot install an EFI update but still want
to benefit from the additional power saving of putting the root port
into D3.  In practice, root ports can be suspended to D3 without issues
at least on 2012 Ivy Bridge machines.

If the BIOS cut-off date is ever lowered to 2010 and runtime PM is
allowed on all native hotplug ports, the Thunderbolt special case can be
removed.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Amir Levy <amir.jer.levy@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/pci.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7904d02ffdb9..441083a0d5b0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2224,7 +2224,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
  * @bridge: Bridge to check
  *
  * This function checks if it is possible to move the bridge to D3.
- * Currently we only allow D3 for recent enough PCIe ports.
+ * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt.
  */
 bool pci_bridge_d3_possible(struct pci_dev *bridge)
 {
@@ -2253,6 +2253,10 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 		if (pci_bridge_d3_force)
 			return true;
 
+		/* Even the oldest 2010 Thunderbolt controller supports D3. */
+		if (bridge->is_thunderbolt)
+			return true;
+
 		/*
 		 * It should be safe to put PCIe ports from 2015 or newer
 		 * to D3.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 7/8] thunderbolt: Power down controller when idle
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 4/8] Revert "PM / Runtime: Remove the exported function pm_children_suspended()" Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports Lukas Wunner
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Andreas Noever, linux-pci

Document and implement Apple's ACPI-based (but nonstandard) pm mechanism
for Thunderbolt.  Briefly, an ACPI method provided by Apple is used to
cut power to the controller.  A GPE is enabled while the controller is
powered down which sideband-signals a plug event, whereupon we reinstate
power using the ACPI method.

This saves 1.7 W on machines with a Light Ridge controller and is
reported to save 4 W on Cactus Ridge 4C and Falcon Ridge 4C.  (I believe
4 W includes the bus power drawn by Apple's Gigabit Ethernet adapter.)
It fixes a power regression introduced in 3.17 by commit 7bc5a2bad0b8
("ACPI: Support _OSI("Darwin") correctly").

A Thunderbolt controller appears to the OS as a set of virtual devices:
One upstream bridge, multiple downstream bridges and one NHI (Native
Host Interface).  The upstream and downstream bridges represent a PCIe
switch (see definition of a switch in the PCIe spec).  The NHI device is
used to manage the Converged I/O switch over which PCI tunnels are set
up to hotplugged devices.  Those devices appear behind the downstream
bridges:

  (Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
                                     +-- Downstream Bridge 1 --
                                     +-- Downstream Bridge 2 --
                                     ...

Power is cut to the entire set of devices.  The Linux pm model is
hierarchical and assumes that a child cannot resume before its parent.
To conform to this model, power control must be governed by the
Thunderbolt controller's topmost device, which is the upstream bridge.
The NHI and downstream bridges go to D3hot independently and the
upstream bridge goes to D3cold once all its children have suspended.
This commit only adds runtime pm for the upstream bridge.  Runtime pm
for the NHI is added in a separate commit to signify its independence.
Runtime pm for the downstream bridges is handled by the pcieport driver.

Because Apple's ACPI methods are nonstandard, a struct dev_pm_domain is
used to override the PCI bus pm_ops.  The thunderbolt driver binds to
the NHI, thus the dev_pm_domain is assigned to the upstream bridge when
its grandchild ->probes and evicted when it ->removes.

There are no Thunderbolt specs publicly available from Intel or Apple,
so I've included documentation to the extent that I was able to reverse-
engineer things.  Documentation on the Go2Sx and Ok2Go2Sx pins is
tentative as those are missing on my Light Ridge.  Apple only uses them
on Cactus Ridge 4C.  Someone with such a controller needs to find out
through experimentation if the documentation is accurate and amend it if
necessary.

To maximize power saving, the controller utilizes the PM core's direct-
complete procedure, i.e. it stays suspended during the system sleep
process.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92111
Cc: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/thunderbolt/Kconfig  |   3 +-
 drivers/thunderbolt/Makefile |   4 +-
 drivers/thunderbolt/nhi.c    |   3 +
 drivers/thunderbolt/power.c  | 350 +++++++++++++++++++++++++++++++++++++++++++
 drivers/thunderbolt/power.h  |  37 +++++
 drivers/thunderbolt/tb.h     |   2 +
 6 files changed, 396 insertions(+), 3 deletions(-)
 create mode 100644 drivers/thunderbolt/power.c
 create mode 100644 drivers/thunderbolt/power.h

diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index d35db16aa43f..41625cf3f461 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,9 +1,10 @@
 menuconfig THUNDERBOLT
 	tristate "Thunderbolt support for Apple devices"
-	depends on PCI
+	depends on PCI && ACPI
 	depends on X86 || COMPILE_TEST
 	select APPLE_PROPERTIES if EFI_STUB && X86
 	select CRC32
+	select PM
 	help
 	  Cactus Ridge Thunderbolt Controller driver
 	  This driver is required if you want to hotplug Thunderbolt devices on
diff --git a/drivers/thunderbolt/Makefile b/drivers/thunderbolt/Makefile
index 5d1053cdfa54..b22082596c42 100644
--- a/drivers/thunderbolt/Makefile
+++ b/drivers/thunderbolt/Makefile
@@ -1,3 +1,3 @@
 obj-${CONFIG_THUNDERBOLT} := thunderbolt.o
-thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o eeprom.o
-
+thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o \
+		    eeprom.o power.o
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index a8c20413dbda..88fb2fb8cf4e 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -605,6 +605,8 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 	pci_set_drvdata(pdev, tb);
 
+	thunderbolt_power_init(tb);
+
 	return 0;
 }
 
@@ -612,6 +614,7 @@ static void nhi_remove(struct pci_dev *pdev)
 {
 	struct tb *tb = pci_get_drvdata(pdev);
 	struct tb_nhi *nhi = tb->nhi;
+	thunderbolt_power_fini(tb);
 	thunderbolt_shutdown_and_free(tb);
 	nhi_shutdown(nhi);
 }
diff --git a/drivers/thunderbolt/power.c b/drivers/thunderbolt/power.c
new file mode 100644
index 000000000000..e17edab13e4c
--- /dev/null
+++ b/drivers/thunderbolt/power.c
@@ -0,0 +1,350 @@
+/*
+ * power.c - power thunderbolt controller down when idle
+ * Copyright (C) 2016 Lukas Wunner <lukas@wunner.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Apple provides the following means for power control in ACPI:
+ *
+ * * On Macs with Thunderbolt 1 Gen 1 controllers (Light Ridge, Eagle Ridge):
+ *   * XRPE method ("Power Enable"), takes argument 1 or 0, toggles a GPIO pin
+ *     to switch the controller on or off.
+ *   * XRIN named object (alternatively _GPE), contains number of a GPE which
+ *     fires as long as something is plugged in (regardless of power state).
+ *   * XRIL method ("Interrupt Low"), returns 0 as long as something is
+ *     plugged in, 1 otherwise.
+ *   * XRIP and XRIO methods, unused by macOS driver.
+ *
+ * * On Macs with Thunderbolt 1 Gen 2 controllers (Cactus Ridge 4C):
+ *   * XRIN not only fires as long as something is plugged in, but also as long
+ *     as the controller's CIO switch is powered up.
+ *   * XRIL method changed its meaning, it returns 0 as long as the CIO switch
+ *     is powered up, 1 otherwise.
+ *   * Additional SXFP method ("Force Power"), accepts only argument 0,
+ *     switches the controller off. This carries out just the raw power change,
+ *     unlike XRPE which disables the link on the PCIe Root Port in an orderly
+ *     fashion before switching off the controller.
+ *   * Additional SXLV, SXIO, SXIL methods to utilize the Go2Sx and Ok2Go2Sx
+ *     pins (see background below). Apparently SXLV toggles the value given to
+ *     the POC via Go2Sx (0 or 1), SXIO changes the direction (0 or 1) and SXIL
+ *     returns the value received from the POC via Ok2Go2Sx.
+ *   * On some Macs, additional XRST method, takes argument 1 or 0, asserts or
+ *     deasserts a GPIO pin to reset the controller.
+ *   * On Macs introduced 2013, XRPE was renamed TRPE.
+ *
+ * * On Macs with Thunderbolt 2 controllers (Falcon Ridge 4C and 2C):
+ *   * SXLV, SXIO, SXIL methods to utilize Go2Sx and Ok2Go2Sx are gone.
+ *   * On the MacPro6,1 which has multiple Thunderbolt controllers, each NHI
+ *     device has a separate XRIN GPE and separate TRPE, SXFP and XRIL methods.
+ *
+ * Background:
+ *
+ * * Gen 1 controllers (Light Ridge, Eagle Ridge) had no power management
+ *   and no ability to distinguish whether a DP or Thunderbolt device is
+ *   plugged in. Apple put an ARM Cortex MCU (NXP LPC1112A) on the logic board
+ *   which snoops on the connector lines and, depending on the type of device,
+ *   sends an HPD signal to the GPU or rings the Thunderbolt XRIN doorbell
+ *   interrupt. The switches for the 3.3V and 1.05V power rails to the
+ *   Thunderbolt controller are toggled by a GPIO pin on the southbridge.
+ *
+ * * On gen 2 controllers (Cactus Ridge 4C), Intel integrated the MCU into the
+ *   controller and called it POC. This caused a change of semantics for XRIN
+ *   and XRIL. The POC is powered by a separate 3.3V rail which is active even
+ *   in sleep state S4. It only draws a very small current. The regular 3.3V
+ *   and 1.05V power rails are no longer controlled by the southbridge but by
+ *   the POC. In other words the controller powers *itself* up and down! It is
+ *   instructed to do so with the Go2Sx pin. Another pin, Ok2Go2Sx, allows the
+ *   controller to indicate if it is ready to power itself down. Apple wires
+ *   Go2Sx and Ok2Go2Sx to the same GPIO pin on the southbridge, hence the pin
+ *   is used bidirectionally. A third pin, Force Power, is intended by Intel
+ *   for debug only but Apple abuses it for XRPE/TRPE and SXFP. Perhaps it
+ *   leads to larger power saving gains. They utilize Go2Sx and Ok2Go2Sx only
+ *   on Cactus Ridge, presumably because the controller somehow requires that.
+ *   On Falcon Ridge they forego these pins and rely solely on Force Power.
+ *
+ * Implementation Notes:
+ *
+ * * To conform to Linux' hierarchical power management model, power control
+ *   is governed by the topmost PCI device of the controller, which is the
+ *   upstream bridge. The controller is powered down once all child devices
+ *   of the upstream bridge have suspended and its autosuspend delay has
+ *   elapsed.
+ *
+ * * The autosuspend delay is user configurable via sysfs and should be lower
+ *   or equal to that of the NHI since hotplug events are not acted upon if
+ *   the NHI has suspended but the controller has not yet powered down.
+ *   However the delay should not be zero to avoid frequent power changes
+ *   (e.g. multiple times just for lspci -vv) since powering up takes 2 sec.
+ *   (Powering down is almost instantaneous.)
+ */
+
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+
+#include "power.h"
+
+/*
+ * The dev_pm_ops assigned to the upstream bridge use pr_*() instead of dev_*()
+ * to get a "thunderbolt" prefix on messages, rather than "pcieport".
+ */
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
+
+#define to_power(dev) container_of(dev->pm_domain, struct tb_power, pm_domain)
+
+static int upstream_prepare(struct device *dev)
+{
+	struct tb_power *power = to_power(dev);
+
+	if (pm_runtime_active(dev))
+		return 0;
+
+	/* prevent interrupts during system sleep transition */
+	if (ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
+		pr_err("cannot disable wake GPE, resuming\n");
+		pm_request_resume(dev);
+		return -EAGAIN;
+	}
+
+	return DPM_DIRECT_COMPLETE;
+}
+
+static void upstream_complete(struct device *dev)
+{
+	struct tb_power *power = to_power(dev);
+
+	if (pm_runtime_active(dev))
+		return;
+
+	/*
+	 * If the controller was powered down before system sleep, calling XRPE
+	 * to power it up will fail on the next runtime resume. An additional
+	 * call to XRPE is necessary to reset the power switch first.
+	 */
+	pr_info("resetting power switch\n");
+	if (ACPI_FAILURE(acpi_execute_simple_method(power->set, NULL, 0))) {
+		pr_err("cannot call power->set method\n");
+		dev->power.runtime_error = -EIO;
+	}
+
+	if (ACPI_FAILURE(acpi_enable_gpe(NULL, power->wake_gpe))) {
+		pr_err("cannot enable wake GPE, resuming\n");
+		pm_request_resume(dev);
+	}
+}
+
+static int set_d3cold(struct pci_dev *pdev, void *ptr)
+{
+	pdev->current_state = PCI_D3cold;
+	return 0;
+}
+
+static int request_resume(struct pci_dev *pdev, void *ptr)
+{
+	WARN_ON(pm_request_resume(&pdev->dev) < 0);
+	return 0;
+}
+
+static int upstream_runtime_suspend(struct device *dev)
+{
+	struct tb_power *power = to_power(dev);
+	struct pci_dev *pdev = to_pci_dev(dev);
+	unsigned long long powered_down;
+	int ret, i;
+
+	/* children are effectively in D3cold once upstream goes to D3hot */
+	pci_walk_bus(pdev->subordinate, set_d3cold, NULL);
+
+	ret = dev->bus->pm->runtime_suspend(dev);
+	if (ret) {
+		pci_walk_bus(pdev->subordinate, request_resume, NULL);
+		return ret;
+	}
+
+	pr_info("powering down\n");
+	pdev->current_state = PCI_D3cold;
+	if (ACPI_FAILURE(acpi_execute_simple_method(power->set, NULL, 0))) {
+		pr_err("cannot call power->set method, resuming\n");
+		goto err_resume;
+	}
+
+	/*
+	 * On gen 2 controllers, the wake GPE fires as long as the CIO switch
+	 * is powered up. Poll until it's powered down before enabling the GPE.
+	 * macOS polls up to 300 times with a 1 ms delay, just mimic that.
+	 */
+	for (i = 0; i < 300; i++) {
+		if (ACPI_FAILURE(acpi_evaluate_integer(power->get,
+					      NULL, NULL, &powered_down))) {
+			pr_err("cannot call power->get method, resuming\n");
+			goto err_resume;
+		}
+		if (powered_down)
+			break;
+		usleep_range(800, 1200);
+	}
+	if (!powered_down) {
+		pr_err("refused to power down, resuming\n");
+		goto err_resume;
+	}
+
+	if (ACPI_FAILURE(acpi_enable_gpe(NULL, power->wake_gpe))) {
+		pr_err("cannot enable wake GPE, resuming\n");
+		goto err_resume;
+	}
+
+	return 0;
+
+err_resume:
+	acpi_execute_simple_method(power->set, NULL, 1);
+	dev->bus->pm->runtime_resume(dev);
+	pci_walk_bus(pdev->subordinate, request_resume, NULL);
+	return -EAGAIN;
+}
+
+static int upstream_runtime_resume(struct device *dev)
+{
+	struct tb_power *power = to_power(dev);
+	struct pci_dev *pdev = to_pci_dev(dev);
+	int ret;
+
+	if (!dev->power.is_prepared &&
+	    ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
+		pr_err("cannot disable wake GPE, disabling runtime pm\n");
+		pm_runtime_disable(&power->tb->nhi->pdev->dev);
+	}
+
+	pr_info("powering up\n");
+	if (ACPI_FAILURE(acpi_execute_simple_method(power->set, NULL, 1))) {
+		pr_err("cannot call power->set method\n");
+		return -ENODEV;
+	}
+
+	ret = dev->bus->pm->runtime_resume(dev);
+
+	/* wake children to force pci_restore_state() after D3cold */
+	pci_walk_bus(pdev->subordinate, request_resume, NULL);
+
+	return ret;
+}
+
+static u32 nhi_wake(acpi_handle gpe_device, u32 gpe_number, void *ctx)
+{
+	struct device *nhi_dev = ctx;
+	WARN_ON(pm_request_resume(nhi_dev) < 0);
+	return ACPI_INTERRUPT_HANDLED;
+}
+
+static int disable_pme_poll(struct pci_dev *pdev, void *ptr)
+{
+	struct pci_bus *downstream_bus = (struct pci_bus *)ptr;
+
+	/* the wake GPE is used instead of PME */
+	if (pdev->bus == downstream_bus	||		/* downstream bridge */
+	    pdev->subordinate == downstream_bus ||	  /* upstream bridge */
+	    (pdev->bus->parent == downstream_bus &&
+	     pdev->class == PCI_CLASS_SYSTEM_OTHER << 8))	      /* NHI */
+		pdev->pme_poll = false;
+
+	return 0;
+}
+
+void thunderbolt_power_init(struct tb *tb)
+{
+	struct device *upstream_dev, *nhi_dev = &tb->nhi->pdev->dev;
+	struct acpi_handle *nhi_handle;
+	struct tb_power *power;
+
+	power = kzalloc(sizeof(*power), GFP_KERNEL);
+	if (!power) {
+		dev_err(nhi_dev, "cannot allocate power data\n");
+		goto err_free;
+	}
+
+	nhi_handle = ACPI_HANDLE(nhi_dev);
+	if (!nhi_handle) {
+		dev_err(nhi_dev, "cannot find ACPI handle\n");
+		goto err_free;
+	}
+
+	if (!nhi_dev->parent || !nhi_dev->parent->parent) {
+		dev_err(nhi_dev, "cannot find upstream bridge\n");
+		goto err_free;
+	}
+	upstream_dev = nhi_dev->parent->parent;
+
+	/* Macs introduced 2011/2012 have XRPE, 2013+ have TRPE */
+	if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRPE", &power->set)) &&
+	    ACPI_FAILURE(acpi_get_handle(nhi_handle, "TRPE", &power->set))) {
+		dev_err(nhi_dev, "cannot find power->set method\n");
+		goto err_free;
+	}
+
+	if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRIL", &power->get))) {
+		dev_err(nhi_dev, "cannot find power->get method\n");
+		goto err_free;
+	}
+
+	if (ACPI_FAILURE(acpi_evaluate_integer(nhi_handle, "XRIN", NULL,
+							&power->wake_gpe))) {
+		dev_err(nhi_dev, "cannot find wake GPE\n");
+		goto err_free;
+	}
+
+	if (ACPI_FAILURE(acpi_install_gpe_handler(NULL, power->wake_gpe,
+			     ACPI_GPE_LEVEL_TRIGGERED, nhi_wake, nhi_dev))) {
+		dev_err(nhi_dev, "cannot install GPE handler\n");
+		goto err_free;
+	}
+
+	pci_walk_bus(to_pci_dev(upstream_dev)->bus, disable_pme_poll,
+		     to_pci_dev(upstream_dev)->subordinate);
+
+	power->pm_domain.ops		     = *upstream_dev->bus->pm;
+	power->pm_domain.ops.prepare	     =  upstream_prepare;
+	power->pm_domain.ops.complete	     =  upstream_complete;
+	power->pm_domain.ops.runtime_suspend =  upstream_runtime_suspend;
+	power->pm_domain.ops.runtime_resume  =  upstream_runtime_resume;
+	power->tb			     =  tb;
+	dev_pm_domain_set(upstream_dev, &power->pm_domain);
+
+	tb->power = power;
+
+	return;
+
+err_free:
+	dev_err(nhi_dev, "controller will stay powered up permanently\n");
+	kfree(power);
+}
+
+void thunderbolt_power_fini(struct tb *tb)
+{
+	struct device *nhi_dev = &tb->nhi->pdev->dev;
+	struct device *upstream_dev = nhi_dev->parent->parent;
+	struct tb_power *power = tb->power;
+
+	if (!power)
+		return; /* thunderbolt_power_init() failed */
+
+	tb->power = NULL;
+	dev_pm_domain_set(upstream_dev, NULL);
+
+	if (ACPI_FAILURE(acpi_remove_gpe_handler(NULL, power->wake_gpe,
+						 nhi_wake)))
+		dev_err(nhi_dev, "cannot remove GPE handler\n");
+
+	kfree(power);
+}
diff --git a/drivers/thunderbolt/power.h b/drivers/thunderbolt/power.h
new file mode 100644
index 000000000000..4ab9b29a9136
--- /dev/null
+++ b/drivers/thunderbolt/power.h
@@ -0,0 +1,37 @@
+/*
+ * power.h - power thunderbolt controller down when idle
+ * Copyright (C) 2016 Lukas Wunner <lukas@wunner.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef THUNDERBOLT_POWER_H
+#define THUNDERBOLT_POWER_H
+
+#include <linux/acpi.h>
+#include <linux/pm_domain.h>
+
+#include "tb.h"
+
+struct tb_power {
+	struct tb *tb;
+	struct dev_pm_domain pm_domain; /* assigned to upstream bridge */
+	unsigned long long wake_gpe; /* hotplug interrupt during powerdown */
+	acpi_handle set; /* method to power controller up/down */
+	acpi_handle get; /* method to query power state */
+};
+
+void thunderbolt_power_init(struct tb *tb);
+void thunderbolt_power_fini(struct tb *tb);
+
+#endif
diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
index 61d57ba64035..43aed5832c9e 100644
--- a/drivers/thunderbolt/tb.h
+++ b/drivers/thunderbolt/tb.h
@@ -11,6 +11,7 @@
 
 #include "tb_regs.h"
 #include "ctl.h"
+#include "power.h"
 
 /**
  * struct tb_switch - a thunderbolt switch
@@ -103,6 +104,7 @@ struct tb {
 				 */
 	struct tb_nhi *nhi;
 	struct tb_ctl *ctl;
+	struct tb_power *power;
 	struct workqueue_struct *wq; /* ordered workqueue for plug events */
 	struct tb_switch *root_switch;
 	struct list_head tunnel_list; /* list of active PCIe tunnels */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 4/8] Revert "PM / Runtime: Remove the exported function pm_children_suspended()"
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 7/8] thunderbolt: Power down controller when idle Lukas Wunner
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Andreas Noever, linux-pci, Rafael J. Wysocki

This reverts commit 62006c1702b3b1be0c0726949e0ee0ea2326be9c which
removed pm_children_suspended() because it had only a single caller.
We're about to add a second caller, so establish the status quo ante.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/base/power/runtime.c | 3 +--
 include/linux/pm_runtime.h   | 7 +++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 872eac4cb1df..03293c32793c 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -243,8 +243,7 @@ static int rpm_check_suspend_allowed(struct device *dev)
 		retval = -EACCES;
 	else if (atomic_read(&dev->power.usage_count) > 0)
 		retval = -EAGAIN;
-	else if (!dev->power.ignore_children &&
-			atomic_read(&dev->power.child_count))
+	else if (!pm_children_suspended(dev))
 		retval = -EBUSY;
 
 	/* Pending resume requests take precedence over suspends. */
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index ca4823e675e2..7de2aa5c3692 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -66,6 +66,12 @@ static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
 	dev->power.ignore_children = enable;
 }
 
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
 static inline void pm_runtime_get_noresume(struct device *dev)
 {
 	atomic_inc(&dev->power.usage_count);
@@ -161,6 +167,7 @@ static inline void pm_runtime_allow(struct device *dev) {}
 static inline void pm_runtime_forbid(struct device *dev) {}
 
 static inline void pm_suspend_ignore_children(struct device *dev, bool enable) {}
+static inline bool pm_children_suspended(struct device *dev) { return false; }
 static inline void pm_runtime_get_noresume(struct device *dev) {}
 static inline void pm_runtime_put_noidle(struct device *dev) {}
 static inline bool device_run_wake(struct device *dev) { return false; }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 8/8] thunderbolt: Runtime suspend NHI when idle
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (2 preceding siblings ...)
  2017-02-12 16:07 ` [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 1/8] PCI: Recognize Thunderbolt devices Lukas Wunner
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Andreas Noever, linux-pci

Runtime suspend the NHI when no Thunderbolt devices have been plugged in
for 10 sec (user-configurable via autosuspend_delay_ms in sysfs).

The NHI is not able to detect plug events while suspended, it relies on
the GPE handler to resume it on hotplug.

After the NHI resumes, it takes about 700 ms until a hotplug event
appears on the RX ring.  In case autosuspend_delay_ms has been reduced
to 0 by the user, we need to wait in tb_resume() to avoid going back to
sleep before we had a chance to detect a hotplugged device.  A runtime
pm ref is held for the duration of tb_handle_hotplug() to keep the NHI
awake while the hotplug event is processed.

Apart from that we acquire a runtime pm ref for each newly allocated
switch (except for the root switch) and drop one when a switch is freed,
thereby ensuring the NHI stays active as long as devices are plugged in.
This behaviour is identical to the macOS driver.

Just as the rest of the Thunderbolt driver, runtime PM is an ongoing
work in progress.  In particular, two issues remain:

* When a daisy-chained device is hotplugged and a PCIe port between the
  end of the chain and the host controller is in D3hot, the pciehp
  hotplug interrupt cannot be expected to be delivered.  The plug event
  sent via the CIO switch to the NHI serves as a side-band signal and
  the NHI should runtime resume the PCIe hotplug port on which the plug
  event occurred.  We do not support daisy chaining yet so this is only
  an issue for PCI tunnels established by the EFI driver.

* On Cactus Ridge and newer, directly-attached DisplayPort devices are
  not wired to the GPU, but rather the DP signal is routed through the
  Thunderbolt controller.  It may thus be necessary to check presence of
  attached DisplayPort devices when scanning the root switch and
  acquire a runtime ref for each of them to prevent the controller
  from powering off while displays are attached.  Unfortunately my own
  machine predates Cactus Ridge and wires the DP signal directly to the
  GPU, so I cannot test this.

Cc: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/thunderbolt/nhi.c    |  2 ++
 drivers/thunderbolt/power.c  |  9 +++++++++
 drivers/thunderbolt/switch.c |  9 +++++++++
 drivers/thunderbolt/tb.c     | 13 +++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 88fb2fb8cf4e..319ed81422c5 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -632,6 +632,8 @@ static const struct dev_pm_ops nhi_pm_ops = {
 					    * pci-tunnels stay alive.
 					    */
 	.restore_noirq = nhi_resume_noirq,
+	.runtime_suspend = nhi_suspend_noirq,
+	.runtime_resume = nhi_resume_noirq,
 };
 
 static struct pci_device_id nhi_ids[] = {
diff --git a/drivers/thunderbolt/power.c b/drivers/thunderbolt/power.c
index e17edab13e4c..b9dd2a1ab96d 100644
--- a/drivers/thunderbolt/power.c
+++ b/drivers/thunderbolt/power.c
@@ -323,6 +323,12 @@ void thunderbolt_power_init(struct tb *tb)
 
 	tb->power = power;
 
+	pm_runtime_allow(nhi_dev);
+	pm_runtime_set_autosuspend_delay(nhi_dev, 10000);
+	pm_runtime_use_autosuspend(nhi_dev);
+	pm_runtime_mark_last_busy(nhi_dev);
+	pm_runtime_put_autosuspend(nhi_dev);
+
 	return;
 
 err_free:
@@ -339,6 +345,9 @@ void thunderbolt_power_fini(struct tb *tb)
 	if (!power)
 		return; /* thunderbolt_power_init() failed */
 
+	pm_runtime_get(nhi_dev);
+	pm_runtime_forbid(nhi_dev);
+
 	tb->power = NULL;
 	dev_pm_domain_set(upstream_dev, NULL);
 
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index c6f30b1695a9..422fe6e96f09 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/pm_runtime.h>
 #include <linux/slab.h>
 
 #include "tb.h"
@@ -326,6 +327,11 @@ void tb_switch_free(struct tb_switch *sw)
 	if (!sw->is_unplugged)
 		tb_plug_events_active(sw, false);
 
+	if (sw != sw->tb->root_switch) {
+		pm_runtime_mark_last_busy(&sw->tb->nhi->pdev->dev);
+		pm_runtime_put_autosuspend(&sw->tb->nhi->pdev->dev);
+	}
+
 	kfree(sw->ports);
 	kfree(sw->drom);
 	kfree(sw);
@@ -420,6 +426,9 @@ struct tb_switch *tb_switch_alloc(struct tb *tb, u64 route)
 	if (tb_plug_events_active(sw, true))
 		goto err;
 
+	if (tb->root_switch)
+		pm_runtime_get(&tb->nhi->pdev->dev);
+
 	return sw;
 err:
 	kfree(sw->ports);
diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index 24b6d30c3c86..a3fedf90e545 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -7,6 +7,7 @@
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/delay.h>
+#include <linux/pm_runtime.h>
 
 #include "tb.h"
 #include "tb_regs.h"
@@ -217,8 +218,11 @@ static void tb_handle_hotplug(struct work_struct *work)
 {
 	struct tb_hotplug_event *ev = container_of(work, typeof(*ev), work);
 	struct tb *tb = ev->tb;
+	struct device *dev = &tb->nhi->pdev->dev;
 	struct tb_switch *sw;
 	struct tb_port *port;
+
+	pm_runtime_get(dev);
 	mutex_lock(&tb->lock);
 	if (!tb->hotplug_active)
 		goto out; /* during init, suspend or shutdown */
@@ -274,6 +278,8 @@ static void tb_handle_hotplug(struct work_struct *work)
 out:
 	mutex_unlock(&tb->lock);
 	kfree(ev);
+	pm_runtime_mark_last_busy(dev);
+	pm_runtime_put_autosuspend(dev);
 }
 
 /**
@@ -433,4 +439,11 @@ void thunderbolt_resume(struct tb *tb)
 	tb->hotplug_active = true;
 	mutex_unlock(&tb->lock);
 	tb_info(tb, "resume finished\n");
+
+	/*
+	 * If runtime resuming due to a hotplug event (rather than resuming
+	 * from system sleep), wait for it to arrive. May take about 700 ms.
+	 */
+	if (tb->nhi->pdev->dev.power.runtime_status == RPM_RESUMING)
+		msleep(1000);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs
@ 2017-02-12 16:07 Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 4/8] Revert "PM / Runtime: Remove the exported function pm_children_suspended()" Lukas Wunner
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

Power down Thunderbolt controllers on Macs when nothing is plugged in
to save around 2W per controller and fix a power regression introduced
in v3.17 that affects millions of machines (all Macs with Thunderbolt)


Changes since v5:

- Patch [1/8] ("PCI: Recognize Thunderbolt devices"):
  Rename set_pcie_vendor_specific() (Bjorn Helgaas)

- Patch [3/8] ("PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug
  ports"):  Modified version of reverted patch

- Patch [7/8] ("thunderbolt: Power down controller when idle"):
  Add comment explaining use of pr_*() instead of dev_*(), drop unnecessary
  initialization (Bjorn Helgaas)

- Polish commit messages


Link to v5:
https://lkml.org/lkml/2017/1/15/180

Browse the patches on GitHub:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v6


Thanks,

Lukas


Chen Yu (1):
  PM / sleep: Define constant for direct_complete

Lukas Wunner (7):
  PCI: Recognize Thunderbolt devices
  PCI: Allow runtime PM on Thunderbolt ports
  PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports
  Revert "PM / Runtime: Remove the exported function
    pm_children_suspended()"
  PM: Make requirements of dev_pm_domain_set() more precise
  thunderbolt: Power down controller when idle
  thunderbolt: Runtime suspend NHI when idle

 drivers/base/power/common.c       |  15 +-
 drivers/base/power/runtime.c      |   3 +-
 drivers/pci/hotplug/pciehp_ctrl.c |  13 +-
 drivers/pci/pci.c                 |  20 ++-
 drivers/pci/pci.h                 |   2 +
 drivers/pci/probe.c               |  32 ++++
 drivers/thunderbolt/Kconfig       |   3 +-
 drivers/thunderbolt/Makefile      |   4 +-
 drivers/thunderbolt/nhi.c         |   5 +
 drivers/thunderbolt/power.c       | 359 ++++++++++++++++++++++++++++++++++++++
 drivers/thunderbolt/power.h       |  37 ++++
 drivers/thunderbolt/switch.c      |   9 +
 drivers/thunderbolt/tb.c          |  13 ++
 drivers/thunderbolt/tb.h          |   2 +
 include/linux/pci.h               |   1 +
 include/linux/pm.h                |   7 +
 include/linux/pm_runtime.h        |   7 +
 17 files changed, 515 insertions(+), 17 deletions(-)
 create mode 100644 drivers/thunderbolt/power.c
 create mode 100644 drivers/thunderbolt/power.h

-- 
2.11.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v6 6/8] PM / sleep: Define constant for direct_complete
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (6 preceding siblings ...)
  2017-02-12 16:07 ` [PATCH v6 5/8] PM: Make requirements of dev_pm_domain_set() more precise Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  7 siblings, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Andreas Noever, linux-pci, Rafael J. Wysocki

From: Chen Yu <yu.c.chen@intel.com>

The PM core introduced the ability to keep devices runtime suspended
during the entire system sleep process with commit aae4518b3124
("PM / sleep: Mechanism to avoid resuming runtime-suspended devices
unnecessarily").

Drivers opt in to this so-called "direct_complete" mechanism by
returning a positive value from their ->prepare hook.  Usually this is
achieved with a "return 1;" statement, which looks somewhat cryptic to
readers not intimately familiar with the PM core.

Improve clarity by defining a DPM_DIRECT_COMPLETE constant which drivers
may use instead.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[LW: separate this hunk out of mfd patch, add commit message]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 include/linux/pm.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/pm.h b/include/linux/pm.h
index f926af41e122..2d651a30b35c 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -788,4 +788,11 @@ enum dpm_order {
 	DPM_ORDER_DEV_LAST,
 };
 
+/*
+ * Return this from system suspend/hibernation ->prepare() callback to
+ * request the core to leave the device runtime-suspended during system
+ * suspend if possible.
+ */
+#define DPM_DIRECT_COMPLETE 1
+
 #endif /* _LINUX_PM_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (4 preceding siblings ...)
  2017-02-12 16:07 ` [PATCH v6 1/8] PCI: Recognize Thunderbolt devices Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  2017-02-14 22:59   ` Bjorn Helgaas
  2017-02-12 16:07 ` [PATCH v6 5/8] PM: Make requirements of dev_pm_domain_set() more precise Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 6/8] PM / sleep: Define constant for direct_complete Lukas Wunner
  7 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
However Yinghai Lu reported the following breakage caused by the commit:

* After disabling a slot on one of his servers, the slot signals PME
  which causes pciehp to immediately re-enable the card, thus defeating
  manual slot control via sysfs.

* On another server which has a Power Controller for PCIe hotplug ports
  (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
  disabling it via sysfs since commit 68db9bc81436 acquired a runtime
  ref only *after* enabling power on the slot, yet this particular
  SkyLake server requires that the port is in D0 when enabling power.

Hence the commit was reverted with d98e0929071e.  It is herewith
reinstated in modified form:

* The Power Controller issue is addressed by no longer acquiring the
  runtime ref in board_added() and remove_board(), but in their callers.
  (There's just a single one each.)

* An additional condition is added to pci_bridge_d3_possible() such that
  runtime PM is only enabled on *Thunderbolt* hotplug ports to constrain
  it to a few well understood chips.  In all other cases the feature can
  be enabled by booting with pcie_port_pm=force.  If confidence is high
  that it works well for everyone, it can be made the default by
  removing this condition:

  /*
   * Non-Thunderbolt hotplug ports need further testing before
   * enabling D3 on them.
   */
  if (bridge->is_hotplug_bridge)
	  return false;

Following is an updated recapitulation of the extra considerations
required for hotplug ports as laid out in 68db9bc81436:

* The configuration space of the port remains accessible in D3hot, so
  all the functions to read or modify the Slot Status and Slot Control
  registers need not be modified.

* However D0 is required to access devices on the secondary bus.  This
  happens in pciehp_check_link_status() and pciehp_configure_device()
  (both called from board_added()) and in pciehp_unconfigure_device()
  (called from remove_board()), so acquire a runtime PM ref for their
  invocation.

* The hotplug port stays active as long as it has active children.
  If all hotplugged devices below the port runtime suspend, the port
  is allowed to runtime suspend as well.  Plug and unplug detection
  continues to work in D3hot.

* Hotplug interrupts are delivered in-band, which requires parents of
  the hotplug port to stay in D0.  For hotplug-capable root ports this
  is a non-issue.  For cascaded hotplug ports, side-band signaling is
  required.  E.g. with Thunderbolt, a plug event at the end of the chain
  is signaled through the CIO switching fabric to the NHI regardless of
  PCIe ports on the chain being in D3.  It is then the NHI's job to
  runtime resume the PCIe port on which the plug event occurred.

* Runtime PM may only be allowed if the hotplug port is handled natively
  by the OS.  On ACPI systems, the port may alternatively be handled by
  the firmware and things break if the OS puts the port into D3 behind the
  firmware's back:  E.g. Thunderbolt hotplug ports on non-Macs are handled
  by Intel's firmware in System Management Mode and the firmware is known
  to access devices on the port's secondary bus without checking first if
  the port is in D0:  https://bugzilla.kernel.org/show_bug.cgi?id=53811

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=193951
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Erik Veijola <erik.veijola@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 13 +++++++++++--
 drivers/pci/pci.c                 | 14 +++++++++-----
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index ec0b4c11ccd9..d071aa63dac9 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -31,6 +31,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/slab.h>
+#include <linux/pm_runtime.h>
 #include <linux/pci.h>
 #include "../pci.h"
 #include "pciehp.h"
@@ -390,6 +391,7 @@ int pciehp_enable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
+	int retval;
 
 	pciehp_get_adapter_status(p_slot, &getstatus);
 	if (!getstatus) {
@@ -414,7 +416,10 @@ int pciehp_enable_slot(struct slot *p_slot)
 		}
 	}
 
-	return board_added(p_slot);
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
+	retval = board_added(p_slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
+	return retval;
 }
 
 /*
@@ -424,6 +429,7 @@ int pciehp_disable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
 	struct controller *ctrl = p_slot->ctrl;
+	int retval;
 
 	if (!p_slot->ctrl)
 		return 1;
@@ -437,7 +443,10 @@ int pciehp_disable_slot(struct slot *p_slot)
 		}
 	}
 
-	return remove_board(p_slot);
+	pm_runtime_get_sync(&ctrl->pcie->port->dev);
+	retval = remove_board(p_slot);
+	pm_runtime_put(&ctrl->pcie->port->dev);
+	return retval;
 }
 
 int pciehp_sysfs_enable_slot(struct slot *p_slot)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 441083a0d5b0..a1896df274d4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2241,13 +2241,10 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 			return false;
 
 		/*
-		 * Hotplug interrupts cannot be delivered if the link is down,
-		 * so parents of a hotplug port must stay awake. In addition,
-		 * hotplug ports handled by firmware in System Management Mode
+		 * Hotplug ports handled by firmware in System Management Mode
 		 * may not be put into D3 by the OS (Thunderbolt on non-Macs).
-		 * For simplicity, disallow in general for now.
 		 */
-		if (bridge->is_hotplug_bridge)
+		if (bridge->is_hotplug_bridge && !pciehp_is_native(bridge))
 			return false;
 
 		if (pci_bridge_d3_force)
@@ -2258,6 +2255,13 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 			return true;
 
 		/*
+		 * Non-Thunderbolt hotplug ports need further testing before
+		 * enabling D3 on them.
+		 */
+		if (bridge->is_hotplug_bridge)
+			return false;
+
+		/*
 		 * It should be safe to put PCIe ports from 2015 or newer
 		 * to D3.
 		 */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v6 5/8] PM: Make requirements of dev_pm_domain_set() more precise
  2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
                   ` (5 preceding siblings ...)
  2017-02-12 16:07 ` [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports Lukas Wunner
@ 2017-02-12 16:07 ` Lukas Wunner
  2017-02-12 16:07 ` [PATCH v6 6/8] PM / sleep: Define constant for direct_complete Lukas Wunner
  7 siblings, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-12 16:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Andreas Noever, linux-pci, Rafael J. Wysocki

Since commit 989561de9b51 ("PM / Domains: add setter for dev.pm_domain")
a PM domain may only be assigned to unbound devices.

The motivation was not made explicit in the changelog other than "in the
general case that can cause problems and also [...] we can simplify code
quite a bit if we can always assume that".  Rafael J. Wysocki elaborated
in a mailing list conversation that "setting a PM domain generally
changes the set of PM callbacks for the device and it may not be safe to
call it after the driver has been bound".

The concern seems to be that if a device is put to sleep and its PM
callbacks are changed, the device may end up in an undefined state or
not resume at all.  The real underlying requirement is thus to ensure
that the device is awake and execution of its PM callbacks is prevented
while the PM domain is assigned.  Unbound devices happen to fulfill this
requirement, but bound devices can be made to satisfy it as well:
The caller can prevent execution of PM ops with lock_system_sleep() and
by holding a runtime PM reference to the device.

Accordingly, adjust dev_pm_domain_set() to WARN only if the device is in
the midst of a system sleep transition, or runtime PM is enabled and the
device is either not active or may become inactive imminently (because
it has no active children or its refcount is zero).

The change is required to support runtime PM for Thunderbolt on the Mac,
which poses the unique issue that a child device (the NHI) needs to
assign a PM domain to its grandparent (the upstream bridge).  Because
the grandparent's driver is built-in and the child's driver is a module,
the grandparent is usually already bound when the child probes,
resulting in a WARN splat when calling dev_pm_domain_set().  However the
PM core guarantees both that the grandparent is active and that system
sleep is not commenced until the child has finished probing.  So in this
case it is safe to call dev_pm_domain_set() from the child's ->probe
hook and the WARN splat is entirely gratuitous.

Note that commit e79aee49bcf9 ("PM: Avoid false-positive warnings in
dev_pm_domain_set()") modified the WARN to not apply if a PM domain is
removed.  This is unsafe as it allows removal of the PM domain while
the device is asleep.  The present commit rectifies this.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/base/power/common.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/base/power/common.c b/drivers/base/power/common.c
index f6a9ad52cbbf..d02c1e08a7ed 100644
--- a/drivers/base/power/common.c
+++ b/drivers/base/power/common.c
@@ -13,6 +13,7 @@
 #include <linux/pm_clock.h>
 #include <linux/acpi.h>
 #include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
 
 #include "power.h"
 
@@ -136,8 +137,10 @@ EXPORT_SYMBOL_GPL(dev_pm_domain_detach);
  * @dev: Device whose PM domain is to be set.
  * @pd: PM domain to be set, or NULL.
  *
- * Sets the PM domain the device belongs to. The PM domain of a device needs
- * to be set before its probe finishes (it's bound to a driver).
+ * Sets the PM domain the device belongs to.  The PM domain of a device needs
+ * to be set while the device is awake.  This is guaranteed during ->probe.
+ * Otherwise the caller is responsible for ensuring wakefulness, e.g. by
+ * holding a runtime PM reference as well as invoking lock_system_sleep().
  *
  * This function must be called with the device lock held.
  */
@@ -146,8 +149,12 @@ void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd)
 	if (dev->pm_domain == pd)
 		return;
 
-	WARN(pd && device_is_bound(dev),
-	     "PM domains can only be changed for unbound devices\n");
+	WARN(dev->power.is_prepared || dev->power.is_suspended ||
+	     (pm_runtime_enabled(dev) &&
+	      (dev->power.runtime_status != RPM_ACTIVE ||
+	       (pm_children_suspended(dev) &&
+		!atomic_read(&dev->power.usage_count)))),
+	     "PM domains can only be changed for awake devices\n");
 	dev->pm_domain = pd;
 	device_pm_check_callbacks(dev);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports
  2017-02-12 16:07 ` [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports Lukas Wunner
@ 2017-02-14 20:38   ` Bjorn Helgaas
  2017-02-15 12:13     ` Lukas Wunner
  2017-02-17 16:06   ` Bjorn Helgaas
  1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2017-02-14 20:38 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Andreas Noever, linux-pci

On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> PCIe ports are only allowed to go to D3 if the BIOS is dated 2015+ to
> avoid potential issues with old chipsets.  However for Thunderbolt we
> know that even the oldest controller, Light Ridge (2010), is able to
> suspend its ports to D3 just fine.
> 
> We're about to add runtime PM for Thunderbolt on the Mac.  Apple has
> released multiple EFI updates since 2015 but not everyone has installed
> them (https://support.apple.com/en-us/HT201222).  Those users should
> still benefit from the achievable power saving.  Thus, special-case
> Thunderbolt in pci_bridge_d3_possible().
> 
> This allows the Thunderbolt controller to power down but the root port
> to which the Thunderbolt controller is attached remains in D0 unless
> an EFI update is installed.  Users can pass pcie_port_pm=force on the
> kernel command line if they cannot install an EFI update but still want
> to benefit from the additional power saving of putting the root port
> into D3.  In practice, root ports can be suspended to D3 without issues
> at least on 2012 Ivy Bridge machines.
> 
> If the BIOS cut-off date is ever lowered to 2010 and runtime PM is
> allowed on all native hotplug ports, the Thunderbolt special case can be
> removed.
> 
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Andreas Noever <andreas.noever@gmail.com>
> Cc: Tomas Winkler <tomas.winkler@intel.com>
> Cc: Amir Levy <amir.jer.levy@intel.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7904d02ffdb9..441083a0d5b0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2224,7 +2224,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
>   * @bridge: Bridge to check
>   *
>   * This function checks if it is possible to move the bridge to D3.
> - * Currently we only allow D3 for recent enough PCIe ports.
> + * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt.
>   */
>  bool pci_bridge_d3_possible(struct pci_dev *bridge)
>  {
> @@ -2253,6 +2253,10 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>  		if (pci_bridge_d3_force)
>  			return true;
>  
> +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> +		if (bridge->is_thunderbolt)
> +			return true;

Does this whole function connect with anything in a spec?  For
example, I see how dev->pme_support, pci_pme_capable(), etc., relate
to PCI_PM_PMC in the PM Capability (PCI PM r1.1, sec 3.2.3).  But I'm
not a PM guy, so I'm kind of at a loss to see how
pci_bridge_d3_possible() is related to that.

My larger question is about dev->is_thunderbolt.  After the previous
patch, every device downstream of a Thunderbolt controller will be
marked "is_thunderbolt".  If I understand correctly, a downstream
device could be an arbitrary PCIe device totally unrelated to
Thunderbolt except that some upstream bridge is a Thunderbolt device.

I guess I'm just unclear on the purpose of pci_bridge_d3_possible().
The comment says "is it possible to move the bridge to D3".  I would
have expected that to involve PCI_PM_PMC and other questions of what
the hardware can do.  But I can't connect the dots.

>  		/*
>  		 * It should be safe to put PCIe ports from 2015 or newer
>  		 * to D3.
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports
  2017-02-12 16:07 ` [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports Lukas Wunner
@ 2017-02-14 22:59   ` Bjorn Helgaas
  2017-02-18  9:25     ` Lukas Wunner
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2017-02-14 22:59 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> However Yinghai Lu reported the following breakage caused by the commit:
> 
> * After disabling a slot on one of his servers, the slot signals PME
>   which causes pciehp to immediately re-enable the card, thus defeating
>   manual slot control via sysfs.

I don't think we've clearly established this.  It surprises me that a
Downstream Port would signal PME after turning off power to the slot.
Wouldn't that make PME useless?

I'd like to look at this in a little more detail, either by turning
off PM and pciehp and poking things manually with setpci, or possibly
by adding some debug output in the PME ISR and the pciehp ISR and
"write command" paths.

> * On another server which has a Power Controller for PCIe hotplug ports
>   (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
>   disabling it via sysfs since commit 68db9bc81436 acquired a runtime
>   ref only *after* enabling power on the slot, yet this particular
>   SkyLake server requires that the port is in D0 when enabling power.

So this server requires that the Downstream Port must be in D0 before
we turn on power to the slot below the Downstream Port?  That seems
like it would be a requirement for *all* systems.  If the bridge is
not in D0, we can't generate any PCI transactions on the secondary bus
(PCI PM r1.1., Table 19), so we can't enumerate anything in the slot.

Bjorn

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports
  2017-02-14 20:38   ` Bjorn Helgaas
@ 2017-02-15 12:13     ` Lukas Wunner
  0 siblings, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-15 12:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg

[cc += Mika, Rafael]

On Tue, Feb 14, 2017 at 02:38:27PM -0600, Bjorn Helgaas wrote:
> On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > PCIe ports are only allowed to go to D3 if the BIOS is dated 2015+ to
> > avoid potential issues with old chipsets.  However for Thunderbolt we
> > know that even the oldest controller, Light Ridge (2010), is able to
> > suspend its ports to D3 just fine.
> > 
> > We're about to add runtime PM for Thunderbolt on the Mac.  Apple has
> > released multiple EFI updates since 2015 but not everyone has installed
> > them (https://support.apple.com/en-us/HT201222).  Those users should
> > still benefit from the achievable power saving.  Thus, special-case
> > Thunderbolt in pci_bridge_d3_possible().
> > 
> > This allows the Thunderbolt controller to power down but the root port
> > to which the Thunderbolt controller is attached remains in D0 unless
> > an EFI update is installed.  Users can pass pcie_port_pm=force on the
> > kernel command line if they cannot install an EFI update but still want
> > to benefit from the additional power saving of putting the root port
> > into D3.  In practice, root ports can be suspended to D3 without issues
> > at least on 2012 Ivy Bridge machines.
> > 
> > If the BIOS cut-off date is ever lowered to 2010 and runtime PM is
> > allowed on all native hotplug ports, the Thunderbolt special case can be
> > removed.
> > 
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Andreas Noever <andreas.noever@gmail.com>
> > Cc: Tomas Winkler <tomas.winkler@intel.com>
> > Cc: Amir Levy <amir.jer.levy@intel.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/pci.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 7904d02ffdb9..441083a0d5b0 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2224,7 +2224,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
> >   * @bridge: Bridge to check
> >   *
> >   * This function checks if it is possible to move the bridge to D3.
> > - * Currently we only allow D3 for recent enough PCIe ports.
> > + * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt.
> >   */
> >  bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >  {
> > @@ -2253,6 +2253,10 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >  		if (pci_bridge_d3_force)
> >  			return true;
> >  
> > +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> > +		if (bridge->is_thunderbolt)
> > +			return true;
> 
> Does this whole function connect with anything in a spec?  For
> example, I see how dev->pme_support, pci_pme_capable(), etc., relate
> to PCI_PM_PMC in the PM Capability (PCI PM r1.1, sec 3.2.3).  But I'm
> not a PM guy, so I'm kind of at a loss to see how
> pci_bridge_d3_possible() is related to that.

No, this function does not connect with anything in a spec but rather
encodes policy:

pci_bridge_d3_possible() determines whether runtime PM is allowed on a
PCIe port *at all*.  This is used to activate runtime PM on the port
(or not) in drivers/pci/pcie/portdrv_pci.c:pcie_portdrv_probe().
It is also used to initially set a pci_dev's ->bridge_d3 flag on probe
in drivers/pci/pci.c:pci_pm_init().  If this function returns false,
the port will never ever runtime suspend.  So right now "false" is the
default, and "true" is only returned if the BIOS is dated 2015+ or
pcie_port_pm=force is passed on the command line.  This patch allows
runtime PM on ports with is_thunderbolt set, i.e. on Thunderbolt
controllers and on PCIe switches on a Thunderbolt daisy chain.
(E.g. my external Thunderbolt chassis contains a PLX 8613 switch,
but I've verified that those ports runtime suspend and resume just fine.)

pci_dev_check_d3cold() determines whether a pci_dev blocks runtime PM
on its parent ports.  Unlike pci_bridge_d3_possible(), the return value
can change over time, e.g. because the user allowed or disallowed D3cold
via sysfs.  This function contains a mix of policy-driven conditions and
stuff mandated by the spec (e.g. the pci_dev is wakeup capable but not
from D3cold, so it has to block its parents from going to D3hot as it's
effectively in D3cold then).


> I guess I'm just unclear on the purpose of pci_bridge_d3_possible().
> The comment says "is it possible to move the bridge to D3".  I would
> have expected that to involve PCI_PM_PMC and other questions of what
> the hardware can do.  But I can't connect the dots.

That happens later when the port actually runtime suspends:

pci_pm_runtime_suspend
  pci_finish_runtime_suspend

This determines the appropriate power state, enables PME, etc.

Obviously if pci_bridge_d3_possible() returns false or one of the port's
children blocks runtime PM via pci_dev_check_d3cold(), the port will
never get this far.


> My larger question is about dev->is_thunderbolt.  After the previous
> patch, every device downstream of a Thunderbolt controller will be
> marked "is_thunderbolt".  If I understand correctly, a downstream
> device could be an arbitrary PCIe device totally unrelated to
> Thunderbolt except that some upstream bridge is a Thunderbolt device.

That is correct.  It allows quirks for devices which happen to be part
of a Thunderbolt daisy chain (e.g. not registering with vga_switcheroo
for Thunderbolt eGPUs) among other things (see the three use cases listed
in the changelog of patch [1/8]).

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/8] PCI: Recognize Thunderbolt devices
  2017-02-12 16:07 ` [PATCH v6 1/8] PCI: Recognize Thunderbolt devices Lukas Wunner
@ 2017-02-17 15:29   ` Bjorn Helgaas
  2017-02-18  9:12     ` Lukas Wunner
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2017-02-17 15:29 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

[+cc other recipients of cover letter]

Hi Lukas,

On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> Detect on device probe whether a PCI device is part of a Thunderbolt
> daisy chain (i.e. it is either part of a Thunderbolt controller or part
> of the hierarchy below a Thunderbolt controller).

My problem with this is that "is_thunderbolt" is not well-defined.
The PCI specs lay out the common vocabulary and framework for this
area.  To keep the code maintainable, we need to connect things back
to the specs somehow.

For example, it might make sense to have a bit that means "this device
can generate PME from D3hot", because PME and D3hot are part of that
common understanding of how PCI works, and we can use that information
to design the software.

An "is_thunderbolt" bit doesn't have any of that context.  It doesn't
say anything about how the device behaves, so I can't evaluate the
code that uses it.

A secondary issue is that I think whatever bit you define here should
only be set for the actual Thunderbolt device itself.  The bit will
presumably tell us something about how that particular device works.
This is an enumeration-time thing and could be done either as in this
path or as a quirk, since it only affects one device.

If runtime code needs to know whether any upstream bridge is
Thunderbolt, it can always search up the hierarchy.  I think that
would improve readability because the software model would map more
closely to the hardware situation.

> Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234
> on devices belonging to a Thunderbolt controller.  Detect presence of
> this VSEC on the device itself or on one of its parents and cache it in
> a newly added is_thunderbolt bit in struct pci_dev.
> 
> The necessity arises from the following:
> 
> * To power off Thunderbolt controllers on Macs when nothing is plugged
>   in, we need to allow runtime PM on their PCIe ports in
>   pci_bridge_d3_possible().  For this we need a way to recognize them.
> 
> * Dual GPU MacBook Pros introduced 2011+ can no longer switch external
>   DisplayPort ports between GPUs.  (They're no longer just used for DP
>   but have become combined DP/Thunderbolt ports.)  The driver to switch
>   the ports, drivers/platform/x86/apple-gmux.c, needs to detect presence
>   of a Thunderbolt controller and, if found, keep external ports
>   permanently switched to the discrete GPU.
> 
> * If an external Thunderbolt GPU is connected to a dual GPU laptop (Mac
>   or not), that GPU is currently registered with vga_switcheroo even
>   though it can neither drive the laptop's panel nor be powered off by
>   the platform.  To vga_switcheroo it will appear as if two discrete
>   GPUs are present.  As a result, when the external GPU is runtime
>   suspended, vga_switcheroo will cut power to the internal discrete GPU
>   which may not be runtime suspended at all at this moment.  The
>   solution is to not register external GPUs with vga_switcheroo, which
>   necessitates a way to recognize if they're part of a Thunderbolt daisy
>   chain.
> 
> Cc: Andreas Noever <andreas.noever@gmail.com>
> Cc: Tomas Winkler <tomas.winkler@intel.com>
> Cc: Amir Levy <amir.jer.levy@intel.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci.h   |  2 ++
>  drivers/pci/probe.c | 32 ++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  3 files changed, 35 insertions(+)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index cb17db242f30..45c2b8144911 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -3,6 +3,8 @@
>  
>  #define PCI_FIND_CAP_TTL	48
>  
> +#define PCI_VSEC_ID_INTEL_TBT	0x1234	/* Thunderbolt */
> +
>  extern const unsigned char pcie_link_speed[];
>  
>  bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 204960e70333..2fcbc535786e 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1208,6 +1208,35 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
>  		pdev->is_hotplug_bridge = 1;
>  }
>  
> +static void set_pcie_thunderbolt(struct pci_dev *dev)
> +{
> +	struct pci_dev *parent = dev;
> +	int vsec = 0;
> +	u32 header;
> +
> +	/* Is the device part of a Thunderbolt controller? */
> +	while ((vsec = pci_find_next_ext_capability(dev, vsec,
> +						    PCI_EXT_CAP_ID_VNDR))) {
> +		pci_read_config_dword(dev, vsec + PCI_VNDR_HEADER, &header);
> +		if (dev->vendor == PCI_VENDOR_ID_INTEL &&
> +		    PCI_VNDR_HEADER_ID(header) == PCI_VSEC_ID_INTEL_TBT) {
> +			dev->is_thunderbolt = 1;
> +			return;
> +		}
> +	}
> +
> +	/*
> +	 * Is the device attached with Thunderbolt?  Walk upwards and check for
> +	 * each encountered bridge if it's part of a Thunderbolt controller.
> +	 * Reaching the host bridge means dev is soldered to the mainboard.
> +	 */
> +	while ((parent = pci_upstream_bridge(parent)))
> +		if (parent->is_thunderbolt) {
> +			dev->is_thunderbolt = 1;
> +			return;
> +		}
> +}
> +
>  /**
>   * pci_ext_cfg_is_aliased - is ext config space just an alias of std config?
>   * @dev: PCI device
> @@ -1360,6 +1389,9 @@ int pci_setup_device(struct pci_dev *dev)
>  	/* need to have dev->class ready */
>  	dev->cfg_size = pci_cfg_space_size(dev);
>  
> +	/* need to have dev->cfg_size ready */
> +	set_pcie_thunderbolt(dev);
> +
>  	/* "Unknown power state" */
>  	dev->current_state = PCI_UNKNOWN;
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e2d1a124216a..3c775e8498f1 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -358,6 +358,7 @@ struct pci_dev {
>  	unsigned int	is_virtfn:1;
>  	unsigned int	reset_fn:1;
>  	unsigned int    is_hotplug_bridge:1;
> +	unsigned int	is_thunderbolt:1; /* part of Thunderbolt daisy chain */
>  	unsigned int    __aer_firmware_first_valid:1;
>  	unsigned int	__aer_firmware_first:1;
>  	unsigned int	broken_intx_masking:1;
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports
  2017-02-12 16:07 ` [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports Lukas Wunner
  2017-02-14 20:38   ` Bjorn Helgaas
@ 2017-02-17 16:06   ` Bjorn Helgaas
  1 sibling, 0 replies; 21+ messages in thread
From: Bjorn Helgaas @ 2017-02-17 16:06 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Andreas Noever, linux-pci

On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> PCIe ports are only allowed to go to D3 if the BIOS is dated 2015+ to
> avoid potential issues with old chipsets.  However for Thunderbolt we
> know that even the oldest controller, Light Ridge (2010), is able to
> suspend its ports to D3 just fine.
> 
> We're about to add runtime PM for Thunderbolt on the Mac.  Apple has
> released multiple EFI updates since 2015 but not everyone has installed
> them (https://support.apple.com/en-us/HT201222).  Those users should
> still benefit from the achievable power saving.  Thus, special-case
> Thunderbolt in pci_bridge_d3_possible().
> 
> This allows the Thunderbolt controller to power down but the root port
> to which the Thunderbolt controller is attached remains in D0 unless
> an EFI update is installed.  Users can pass pcie_port_pm=force on the
> kernel command line if they cannot install an EFI update but still want
> to benefit from the additional power saving of putting the root port
> into D3.  In practice, root ports can be suspended to D3 without issues
> at least on 2012 Ivy Bridge machines.
> 
> If the BIOS cut-off date is ever lowered to 2010 and runtime PM is
> allowed on all native hotplug ports, the Thunderbolt special case can be
> removed.
> 
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Andreas Noever <andreas.noever@gmail.com>
> Cc: Tomas Winkler <tomas.winkler@intel.com>
> Cc: Amir Levy <amir.jer.levy@intel.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Given that I'm hoping we can rework or rename is_thunderbolt to
something better-defined, which would affect this patch as well, it
doesn't really make sense for me to ack this one yet.

> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/pci.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7904d02ffdb9..441083a0d5b0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2224,7 +2224,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
>   * @bridge: Bridge to check
>   *
>   * This function checks if it is possible to move the bridge to D3.
> - * Currently we only allow D3 for recent enough PCIe ports.
> + * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt.
>   */
>  bool pci_bridge_d3_possible(struct pci_dev *bridge)
>  {
> @@ -2253,6 +2253,10 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>  		if (pci_bridge_d3_force)
>  			return true;
>  
> +		/* Even the oldest 2010 Thunderbolt controller supports D3. */
> +		if (bridge->is_thunderbolt)
> +			return true;
> +
>  		/*
>  		 * It should be safe to put PCIe ports from 2015 or newer
>  		 * to D3.
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/8] PCI: Recognize Thunderbolt devices
  2017-02-17 15:29   ` Bjorn Helgaas
@ 2017-02-18  9:12     ` Lukas Wunner
  2017-02-19  4:27       ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2017-02-18  9:12 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Fri, Feb 17, 2017 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > Detect on device probe whether a PCI device is part of a Thunderbolt
> > daisy chain (i.e. it is either part of a Thunderbolt controller or part
> > of the hierarchy below a Thunderbolt controller).
> 
> My problem with this is that "is_thunderbolt" is not well-defined.
> The PCI specs lay out the common vocabulary and framework for this
> area.  To keep the code maintainable, we need to connect things back
> to the specs somehow.
> 
> For example, it might make sense to have a bit that means "this device
> can generate PME from D3hot", because PME and D3hot are part of that
> common understanding of how PCI works, and we can use that information
> to design the software.
> 
> An "is_thunderbolt" bit doesn't have any of that context.  It doesn't
> say anything about how the device behaves, so I can't evaluate the
> code that uses it.

No, this *does* connect back to the spec, the Thunderbolt spec to be
specific, except that spec is not available publicly.  (I assume it's
available under NDA.)  FWIW the PCI SIG doesn't make its specs freely
available either, only to members or for $$$.  As Andreas Noever has
pointed out before, there is plenty of precedent for including
(reverse-engineered) code in the kernel for stuff that isn't public.

The rationale for this bit is that devices on a Thunderbolt daisy chain
differ from plain PCIe devices and as such require special treatment.

Thunderbolt encapsulates PCIe and other protocols such as DisplayPort
into Thunderbolt packets and tunnels them over a network of Converged I/O
switches, so what's transmitted on the wire differs significantly from
plain PCIe.  That should already make the necessity for special treatment
obvious.  I've named three use cases for the is_thunderbolt bit in the
commit message of this patch but expect there will be more in the future.
E.g. Apple has a traffic prioritization mechanism on the Converged I/O
layer which is afforded to specific devices on the PCIe layer above,
see US 9,015,384:
http://pimg-fpiw.uspto.gov/fdd/84/153/090/0.pdf


> A secondary issue is that I think whatever bit you define here should
> only be set for the actual Thunderbolt device itself.  The bit will
> presumably tell us something about how that particular device works.

I don't quite follow, the bit *does* tell us something about how the
device works, doesn't it?  That it's situated on a Thunderbolt daisy
chain, I've clearly spelled that out in the commit message as well as
in a code comment of this patch.


> This is an enumeration-time thing and could be done either as in this
> path or as a quirk, since it only affects one device.

You're contradicting yourself by suggesting to use a quirk here:

Back in 2014 an Intel engineer tried to add a PCI quirk with a long
list of Thunderbolt devices and none other than you objected that
you're "really not happy about checking a list of device IDs to identify
Thunderbolt devices.  Surely there's a better way, because a list
like this has to be updated regularly."
https://patchwork.ozlabs.org/patch/409994/

The better way you were asking for back then is to probe for presence
of the Thunderbolt VSEC, which is what this patch does.  Yet you're
still not happy.  I don't get it.

FWIW I've asked an Intel engineer for details on the contents of the
VSEC but he declined to comment:
https://lkml.org/lkml/2017/1/11/153


> If runtime code needs to know whether any upstream bridge is
> Thunderbolt, it can always search up the hierarchy.  I think that
> would improve readability because the software model would map more
> closely to the hardware situation.

That would be more expensive than just checking a bit that is searched
only once and then cached.  We have is_hotplug_bridge for just eight
places where we check presence of hotplug capability, why can't we have
is_thunderbolt?  Beats me.

Thanks,

Lukas

> 
> > Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234
> > on devices belonging to a Thunderbolt controller.  Detect presence of
> > this VSEC on the device itself or on one of its parents and cache it in
> > a newly added is_thunderbolt bit in struct pci_dev.
> > 
> > The necessity arises from the following:
> > 
> > * To power off Thunderbolt controllers on Macs when nothing is plugged
> >   in, we need to allow runtime PM on their PCIe ports in
> >   pci_bridge_d3_possible().  For this we need a way to recognize them.
> > 
> > * Dual GPU MacBook Pros introduced 2011+ can no longer switch external
> >   DisplayPort ports between GPUs.  (They're no longer just used for DP
> >   but have become combined DP/Thunderbolt ports.)  The driver to switch
> >   the ports, drivers/platform/x86/apple-gmux.c, needs to detect presence
> >   of a Thunderbolt controller and, if found, keep external ports
> >   permanently switched to the discrete GPU.
> > 
> > * If an external Thunderbolt GPU is connected to a dual GPU laptop (Mac
> >   or not), that GPU is currently registered with vga_switcheroo even
> >   though it can neither drive the laptop's panel nor be powered off by
> >   the platform.  To vga_switcheroo it will appear as if two discrete
> >   GPUs are present.  As a result, when the external GPU is runtime
> >   suspended, vga_switcheroo will cut power to the internal discrete GPU
> >   which may not be runtime suspended at all at this moment.  The
> >   solution is to not register external GPUs with vga_switcheroo, which
> >   necessitates a way to recognize if they're part of a Thunderbolt daisy
> >   chain.
> > 
> > Cc: Andreas Noever <andreas.noever@gmail.com>
> > Cc: Tomas Winkler <tomas.winkler@intel.com>
> > Cc: Amir Levy <amir.jer.levy@intel.com>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/pci.h   |  2 ++
> >  drivers/pci/probe.c | 32 ++++++++++++++++++++++++++++++++
> >  include/linux/pci.h |  1 +
> >  3 files changed, 35 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index cb17db242f30..45c2b8144911 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -3,6 +3,8 @@
> >  
> >  #define PCI_FIND_CAP_TTL	48
> >  
> > +#define PCI_VSEC_ID_INTEL_TBT	0x1234	/* Thunderbolt */
> > +
> >  extern const unsigned char pcie_link_speed[];
> >  
> >  bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 204960e70333..2fcbc535786e 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -1208,6 +1208,35 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
> >  		pdev->is_hotplug_bridge = 1;
> >  }
> >  
> > +static void set_pcie_thunderbolt(struct pci_dev *dev)
> > +{
> > +	struct pci_dev *parent = dev;
> > +	int vsec = 0;
> > +	u32 header;
> > +
> > +	/* Is the device part of a Thunderbolt controller? */
> > +	while ((vsec = pci_find_next_ext_capability(dev, vsec,
> > +						    PCI_EXT_CAP_ID_VNDR))) {
> > +		pci_read_config_dword(dev, vsec + PCI_VNDR_HEADER, &header);
> > +		if (dev->vendor == PCI_VENDOR_ID_INTEL &&
> > +		    PCI_VNDR_HEADER_ID(header) == PCI_VSEC_ID_INTEL_TBT) {
> > +			dev->is_thunderbolt = 1;
> > +			return;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Is the device attached with Thunderbolt?  Walk upwards and check for
> > +	 * each encountered bridge if it's part of a Thunderbolt controller.
> > +	 * Reaching the host bridge means dev is soldered to the mainboard.
> > +	 */
> > +	while ((parent = pci_upstream_bridge(parent)))
> > +		if (parent->is_thunderbolt) {
> > +			dev->is_thunderbolt = 1;
> > +			return;
> > +		}
> > +}
> > +
> >  /**
> >   * pci_ext_cfg_is_aliased - is ext config space just an alias of std config?
> >   * @dev: PCI device
> > @@ -1360,6 +1389,9 @@ int pci_setup_device(struct pci_dev *dev)
> >  	/* need to have dev->class ready */
> >  	dev->cfg_size = pci_cfg_space_size(dev);
> >  
> > +	/* need to have dev->cfg_size ready */
> > +	set_pcie_thunderbolt(dev);
> > +
> >  	/* "Unknown power state" */
> >  	dev->current_state = PCI_UNKNOWN;
> >  
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index e2d1a124216a..3c775e8498f1 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -358,6 +358,7 @@ struct pci_dev {
> >  	unsigned int	is_virtfn:1;
> >  	unsigned int	reset_fn:1;
> >  	unsigned int    is_hotplug_bridge:1;
> > +	unsigned int	is_thunderbolt:1; /* part of Thunderbolt daisy chain */
> >  	unsigned int    __aer_firmware_first_valid:1;
> >  	unsigned int	__aer_firmware_first:1;
> >  	unsigned int	broken_intx_masking:1;
> > -- 
> > 2.11.0
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports
  2017-02-14 22:59   ` Bjorn Helgaas
@ 2017-02-18  9:25     ` Lukas Wunner
  2017-02-19  0:16       ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2017-02-18  9:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Tue, Feb 14, 2017 at 04:59:26PM -0600, Bjorn Helgaas wrote:
> On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> > However Yinghai Lu reported the following breakage caused by the commit:
> > 
> > * After disabling a slot on one of his servers, the slot signals PME
> >   which causes pciehp to immediately re-enable the card, thus defeating
> >   manual slot control via sysfs.
> 
> I don't think we've clearly established this.  It surprises me that a
> Downstream Port would signal PME after turning off power to the slot.
> Wouldn't that make PME useless?
> 
> I'd like to look at this in a little more detail, either by turning
> off PM and pciehp and poking things manually with setpci, or possibly
> by adding some debug output in the PME ISR and the pciehp ISR and
> "write command" paths.

I'm in the To: header of your e-mail but I am not in a position to
debug this as I don't have the hardware available.  It would have
to be done by Yinghai Lu or Intel.  This may already be clear to
everyone involved, so apologies for stating the obvious.

Also, I hope it's clear that these Intel servers are not affected
by v6 of this patch as it only enables runtime PM on *Thunderbolt*
hotplug ports.


> > * On another server which has a Power Controller for PCIe hotplug ports
> >   (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
> >   disabling it via sysfs since commit 68db9bc81436 acquired a runtime
> >   ref only *after* enabling power on the slot, yet this particular
> >   SkyLake server requires that the port is in D0 when enabling power.
> 
> So this server requires that the Downstream Port must be in D0 before
> we turn on power to the slot below the Downstream Port?

Apparently.


> That seems like it would be a requirement for *all* systems.

Well, it's not mandated by the spec but indeed seems like a good idea.
Please note that the present patch does exactly that.


> If the bridge is
> not in D0, we can't generate any PCI transactions on the secondary bus
> (PCI PM r1.1., Table 19), so we can't enumerate anything in the slot.

Turning on slot power has nothing to do with accessing the secondary bus.
v5 of this patch resumed the hotplug port to D0 when accessing the
secondary bus. On top of that, v6 resumes it to D0 when turning on
slot power.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports
  2017-02-18  9:25     ` Lukas Wunner
@ 2017-02-19  0:16       ` Bjorn Helgaas
  2017-02-20 12:04         ` Lukas Wunner
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2017-02-19  0:16 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Sat, Feb 18, 2017 at 10:25:16AM +0100, Lukas Wunner wrote:
> On Tue, Feb 14, 2017 at 04:59:26PM -0600, Bjorn Helgaas wrote:
> > On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > > Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> > > However Yinghai Lu reported the following breakage caused by the commit:
> > > 
> > > * After disabling a slot on one of his servers, the slot signals PME
> > >   which causes pciehp to immediately re-enable the card, thus defeating
> > >   manual slot control via sysfs.
> > 
> > I don't think we've clearly established this.  It surprises me that a
> > Downstream Port would signal PME after turning off power to the slot.
> > Wouldn't that make PME useless?
> > 
> > I'd like to look at this in a little more detail, either by turning
> > off PM and pciehp and poking things manually with setpci, or possibly
> > by adding some debug output in the PME ISR and the pciehp ISR and
> > "write command" paths.
> 
> I'm in the To: header of your e-mail but I am not in a position to
> debug this as I don't have the hardware available.  It would have
> to be done by Yinghai Lu or Intel.  This may already be clear to
> everyone involved, so apologies for stating the obvious.

It is clear; I was just hoping you would make it as easy as possible
for the folks with the hardware by proposing a debugging patch or a
series of specific setpci commands to run.

> Also, I hope it's clear that these Intel servers are not affected
> by v6 of this patch as it only enables runtime PM on *Thunderbolt*
> hotplug ports.

Right.  I just don't understand what's happening, and I don't want
this to become folk wisdom before we have more details.

> > > * On another server which has a Power Controller for PCIe hotplug ports
> > >   (PCIe r3.1, sec 6.7.1.8), the slot cannot be re-enabled after manually
> > >   disabling it via sysfs since commit 68db9bc81436 acquired a runtime
> > >   ref only *after* enabling power on the slot, yet this particular
> > >   SkyLake server requires that the port is in D0 when enabling power.
> > 
> > So this server requires that the Downstream Port must be in D0 before
> > we turn on power to the slot below the Downstream Port?
> 
> Apparently.
> 
> > That seems like it would be a requirement for *all* systems.
> 
> Well, it's not mandated by the spec but indeed seems like a good idea.
> Please note that the present patch does exactly that.

A bridge in D3hot still accepts config cycles, so we should be able to
use Slot Control to turn slot power for its secondary bus on or off.

PCI PM r1.1, Table 16, says the only action we can expect from the
bridge is PME, so we can't rely on interrupts for Command Completion,
Link State Changed, etc.  Unless we wanted to manage the hotplug
controller by polling it, I think we have to put the bridge in D0
before sending any commands to it.

This seems like something that could be split into a separate patch
that wouldn't have anything to do with Thunderbolt.  Would that make
sense to you?

Bjorn

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/8] PCI: Recognize Thunderbolt devices
  2017-02-18  9:12     ` Lukas Wunner
@ 2017-02-19  4:27       ` Bjorn Helgaas
  2017-02-20 11:49         ` Lukas Wunner
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2017-02-19  4:27 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Sat, Feb 18, 2017 at 10:12:24AM +0100, Lukas Wunner wrote:
> On Fri, Feb 17, 2017 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> > On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > > Detect on device probe whether a PCI device is part of a Thunderbolt
> > > daisy chain (i.e. it is either part of a Thunderbolt controller or part
> > > of the hierarchy below a Thunderbolt controller).
> > 
> > My problem with this is that "is_thunderbolt" is not well-defined.
> > The PCI specs lay out the common vocabulary and framework for this
> > area.  To keep the code maintainable, we need to connect things back
> > to the specs somehow.
> > 
> > For example, it might make sense to have a bit that means "this device
> > can generate PME from D3hot", because PME and D3hot are part of that
> > common understanding of how PCI works, and we can use that information
> > to design the software.
> > 
> > An "is_thunderbolt" bit doesn't have any of that context.  It doesn't
> > say anything about how the device behaves, so I can't evaluate the
> > code that uses it.
> 
> No, this *does* connect back to the spec, the Thunderbolt spec to be
> specific, except that spec is not available publicly.  (I assume it's
> available under NDA.)  FWIW the PCI SIG doesn't make its specs freely
> available either, only to members or for $$$.  As Andreas Noever has
> pointed out before, there is plenty of precedent for including
> (reverse-engineered) code in the kernel for stuff that isn't public.

I'm not objecting to the fact that the Thunderbolt spec is not public.
What I want to know is specifically how the Thunderbolt bridge behaves
differently than a plain old PCIe bridge.  I don't even want to know
*all* the differences.  You're proposing to make the PCI core work a
slightly differently based on this bit, and in order to maintain that
in the future, we need to know the details of *why* we need to do
things differently.

Maybe D3hot means something different, maybe PME works differently,
maybe hotplug interrupts are signaled differently, I dunno.  If you
want us to treat these devices differently, we have to know *why* so
we can tell whether future changes in other areas also need to handle
them differently.

This might mean several bits instead of one single "is_thunderbolt"
bit, because there may be several differences that affect different
parts of the PCI core.

> The rationale for this bit is that devices on a Thunderbolt daisy chain
> differ from plain PCIe devices and as such require special treatment.
> 
> Thunderbolt encapsulates PCIe and other protocols such as DisplayPort
> into Thunderbolt packets and tunnels them over a network of Converged I/O
> switches, so what's transmitted on the wire differs significantly from
> plain PCIe.  That should already make the necessity for special treatment
> obvious.  I've named three use cases for the is_thunderbolt bit in the
> commit message of this patch but expect there will be more in the future.
> E.g. Apple has a traffic prioritization mechanism on the Converged I/O
> layer which is afforded to specific devices on the PCIe layer above,
> see US 9,015,384:
> http://pimg-fpiw.uspto.gov/fdd/84/153/090/0.pdf
> 
> 
> > A secondary issue is that I think whatever bit you define here should
> > only be set for the actual Thunderbolt device itself.  The bit will
> > presumably tell us something about how that particular device works.
> 
> I don't quite follow, the bit *does* tell us something about how the
> device works, doesn't it?  That it's situated on a Thunderbolt daisy
> chain, I've clearly spelled that out in the commit message as well as
> in a code comment of this patch.

No, it doesn't actually tell us anything about how it works.  The
commit message tells us some *consequences*, but it doesn't say
anything about the PCIe-level features that *lead* to those
consequences.

That means we can't reason about how Thunderbolt devices will work in
different scenarios or future platforms.

> > This is an enumeration-time thing and could be done either as in this
> > path or as a quirk, since it only affects one device.
> 
> You're contradicting yourself by suggesting to use a quirk here:
> 
> Back in 2014 an Intel engineer tried to add a PCI quirk with a long
> list of Thunderbolt devices and none other than you objected that
> you're "really not happy about checking a list of device IDs to identify
> Thunderbolt devices.  Surely there's a better way, because a list
> like this has to be updated regularly."
> https://patchwork.ozlabs.org/patch/409994/
> 
> The better way you were asking for back then is to probe for presence
> of the Thunderbolt VSEC, which is what this patch does.  Yet you're
> still not happy.  I don't get it.

I was wrong to suggest the possibility of a quirk, sorry.  You're
right: looking for the VSEC is a much better route, and that doesn't
fit well in a quirk because the quirk would have to run for all
devices.

> > If runtime code needs to know whether any upstream bridge is
> > Thunderbolt, it can always search up the hierarchy.  I think that
> > would improve readability because the software model would map more
> > closely to the hardware situation.
> 
> That would be more expensive than just checking a bit that is searched
> only once and then cached.  We have is_hotplug_bridge for just eight
> places where we check presence of hotplug capability, why can't we have
> is_thunderbolt?  Beats me.

Searching up the tree is more expensive, but it looks like we only
check it in enumeration-type situations, so I doubt this is a
performance issue.

is_hotplug_bridge is not quite comparable.  For one thing, it can be
set three ways: when PCI_EXP_SLTCAP contains PCI_EXP_SLTCAP_HPC, by a
quirk, and by acpiphp.  It maybe *should* also be set by other hotplug
drivers.  I don't think it would be practical to make an
is_hotplug_bridge() function to look up all those cases.

Bjorn

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/8] PCI: Recognize Thunderbolt devices
  2017-02-19  4:27       ` Bjorn Helgaas
@ 2017-02-20 11:49         ` Lukas Wunner
  2017-02-21 22:55           ` Bjorn Helgaas
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2017-02-20 11:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Sat, Feb 18, 2017 at 10:27:45PM -0600, Bjorn Helgaas wrote:
> On Sat, Feb 18, 2017 at 10:12:24AM +0100, Lukas Wunner wrote:
> > On Fri, Feb 17, 2017 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> > > On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > > > Detect on device probe whether a PCI device is part of a Thunderbolt
> > > > daisy chain (i.e. it is either part of a Thunderbolt controller or part
> > > > of the hierarchy below a Thunderbolt controller).
> > > 
> > > My problem with this is that "is_thunderbolt" is not well-defined.
> > > The PCI specs lay out the common vocabulary and framework for this
> > > area.  To keep the code maintainable, we need to connect things back
> > > to the specs somehow.
> > > 
> > > For example, it might make sense to have a bit that means "this device
> > > can generate PME from D3hot", because PME and D3hot are part of that
> > > common understanding of how PCI works, and we can use that information
> > > to design the software.
> > > 
> > > An "is_thunderbolt" bit doesn't have any of that context.  It doesn't
> > > say anything about how the device behaves, so I can't evaluate the
> > > code that uses it.
> > 
> > No, this *does* connect back to the spec, the Thunderbolt spec to be
> > specific, except that spec is not available publicly.  (I assume it's
> > available under NDA.)  FWIW the PCI SIG doesn't make its specs freely
> > available either, only to members or for $$$.  As Andreas Noever has
> > pointed out before, there is plenty of precedent for including
> > (reverse-engineered) code in the kernel for stuff that isn't public.
> 
> I'm not objecting to the fact that the Thunderbolt spec is not public.
> What I want to know is specifically how the Thunderbolt bridge behaves
> differently than a plain old PCIe bridge.  I don't even want to know
> *all* the differences.  You're proposing to make the PCI core work a
> slightly differently based on this bit, and in order to maintain that
> in the future, we need to know the details of *why* we need to do
> things differently.

Okay, I think I'm starting to understand what you're driving at.

Thunderbolt tunnels PCIe transparently, so in principle there should be
no difference behaviour-wise.

As far as the PCI core is concerned, I'm only using is_thunderbolt to
whitelist Thunderbolt ports for runtime PM in patch [2/8]. (This just
concerns like a dozen chips whose behaviour is well understood, hence
enabling runtime PM should be safe.)

The other two upcoming use cases merely concern whether a device is on
a Thunderbolt daisy chain (vs. soldered to the mainboard), and to detect
presence of a Thunderbolt controller in the machine (their PCI devices
use PCI_CLASS_SYSTEM_OTHER and PCI_CLASS_BRIDGE_PCI, which is not
sufficiently unique to identify Thunderbolt controllers).

I decided to put the is_thunderbolt flag in struct pci_dev because any
PCI device might end up on a Thunderbolt daisy chain.  The purpose of
the bit is merely to cache that status, it does not signify that the
device suffers from some particular PCI quirk.


> Maybe D3hot means something different, maybe PME works differently,
> maybe hotplug interrupts are signaled differently, I dunno.  If you
> want us to treat these devices differently, we have to know *why* so
> we can tell whether future changes in other areas also need to handle
> them differently.

This all works fine.  Once a Thunderbolt tunnel has been set up, the
hotplug port on the PCIe switch integrated into the Thunderbolt
controller signals "Card present" and "Link up" interrupts.  On surprise
removal of an attached device, the Presence Detect and Link State bits
are cleared and an interrupt is signaled for each of them.

There's a quirk wherein Thunderbolt controllers claim to support Command
Complete interrupts but they never send them, I don't think that's
intentional but rather a hardware bug. The kernel deals gracefully with
that though, no special treatment necessary.


> > > If runtime code needs to know whether any upstream bridge is
> > > Thunderbolt, it can always search up the hierarchy.  I think that
> > > would improve readability because the software model would map more
> > > closely to the hardware situation.
> > 
> > That would be more expensive than just checking a bit that is searched
> > only once and then cached.  We have is_hotplug_bridge for just eight
> > places where we check presence of hotplug capability, why can't we have
> > is_thunderbolt?  Beats me.
> 
> Searching up the tree is more expensive, but it looks like we only
> check it in enumeration-type situations, so I doubt this is a
> performance issue.

In patch [3/8] of v5 of this series, I used the is_thunderbolt bit in
pci_dev_check_d3cold(), which is not only executed during ->probe and
->remove but also whenver the D3cold status of a device is changed via
sysfs.

However I dropped that patch and the remaining use cases are indeed
limited to ->probe paths, so I no longer feel strongly about avoiding
walking up the hierarchy.

The set_pcie_thunderbolt() function in this commit essentially does
two things:  (a) Detect presence of the Thunderbolt VSEC on a device
and (b) walking up the hierarchy to detect whether that VSEC is present
on a parent.

Do you want me to set the is_thunderbolt bit only on devices belonging
to a Thunderbolt controller and use a separate function to walk up the
hierarchy?

Or do you want me to dispose of the is_thunderbolt bit altogether and
use another function to detect presence of the VSEC?  Searching for a
capability can require lots of non-posted transactions, so keeping
is_thunderbolt would seem reasonable to me.

I could probably add these as inlines to include/linux/pci.h, similar
to how acpi_find_root_bridge_handle() in include/linux/pci-acpi.h walks
up the hierarchy.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports
  2017-02-19  0:16       ` Bjorn Helgaas
@ 2017-02-20 12:04         ` Lukas Wunner
  0 siblings, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2017-02-20 12:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Sat, Feb 18, 2017 at 06:16:06PM -0600, Bjorn Helgaas wrote:
> On Sat, Feb 18, 2017 at 10:25:16AM +0100, Lukas Wunner wrote:
> > On Tue, Feb 14, 2017 at 04:59:26PM -0600, Bjorn Helgaas wrote:
> > > On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > > > Commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe
> > > > hotplug ports") extended runtime PM for PCIe ports to hotplug ports.
> > > > However Yinghai Lu reported the following breakage caused by the commit:
> > > > 
> > > > * After disabling a slot on one of his servers, the slot signals PME
> > > >   which causes pciehp to immediately re-enable the card, thus defeating
> > > >   manual slot control via sysfs.
> > > 
> > > I don't think we've clearly established this.  It surprises me that a
> > > Downstream Port would signal PME after turning off power to the slot.
> > > Wouldn't that make PME useless?
> > > 
> > > I'd like to look at this in a little more detail, either by turning
> > > off PM and pciehp and poking things manually with setpci, or possibly
> > > by adding some debug output in the PME ISR and the pciehp ISR and
> > > "write command" paths.
> > 
> > I'm in the To: header of your e-mail but I am not in a position to
> > debug this as I don't have the hardware available.  It would have
> > to be done by Yinghai Lu or Intel.  This may already be clear to
> > everyone involved, so apologies for stating the obvious.
> 
> It is clear; I was just hoping you would make it as easy as possible
> for the folks with the hardware by proposing a debugging patch or a
> series of specific setpci commands to run.
> 
> > Also, I hope it's clear that these Intel servers are not affected
> > by v6 of this patch as it only enables runtime PM on *Thunderbolt*
> > hotplug ports.
> 
> Right.  I just don't understand what's happening, and I don't want
> this to become folk wisdom before we have more details.

If I understand Yinghai Lu's analysis correctly, the port is in D3hot,
we write to config space to enable power on the slot and request a
Command Complete interrupt, the port signals PME to request being
transferred to D0 so that it can deliver the Command Complete interrupt.

So this patch just papers over the issue by disabling PME. :-)

The issue doesn't manifest itself on Thunderbolt controllers because
(a) they don't support power control and (b) they never send Command
Complete interrupts (even though they claim to support it).

Since patch [3/8] of v6 of my series enables runtime PM only on
*Thunderbolt* hotplug ports, the series should be fine as is, however
this isn't really beautiful so if you decide to apply patch [3/8] I'll
come up with a fix, or if you'd rather postpone it I'll rework the patch
to keep the port runtime resumed whenever and for as long as we're waiting
for command completion.  I'll not be able to start working on this until
next week though.  If you do decide to postpone patch [3/8], maybe we
can at least come up with an agreeable version of patch [1/8] to go
into your 4.11 pull.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v6 1/8] PCI: Recognize Thunderbolt devices
  2017-02-20 11:49         ` Lukas Wunner
@ 2017-02-21 22:55           ` Bjorn Helgaas
  0 siblings, 0 replies; 21+ messages in thread
From: Bjorn Helgaas @ 2017-02-21 22:55 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Andreas Noever, linux-pci, Rafael J. Wysocki, Mika Westerberg,
	Erik Veijola, Ashok Raj, Keith Busch, Yinghai Lu

On Mon, Feb 20, 2017 at 12:49:28PM +0100, Lukas Wunner wrote:
> On Sat, Feb 18, 2017 at 10:27:45PM -0600, Bjorn Helgaas wrote:
> > On Sat, Feb 18, 2017 at 10:12:24AM +0100, Lukas Wunner wrote:
> > > On Fri, Feb 17, 2017 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> > > > On Sun, Feb 12, 2017 at 05:07:45PM +0100, Lukas Wunner wrote:
> > > > > Detect on device probe whether a PCI device is part of a Thunderbolt
> > > > > daisy chain (i.e. it is either part of a Thunderbolt controller or part
> > > > > of the hierarchy below a Thunderbolt controller).
> > > > 
> > > > My problem with this is that "is_thunderbolt" is not well-defined.
> > > > The PCI specs lay out the common vocabulary and framework for this
> > > > area.  To keep the code maintainable, we need to connect things back
> > > > to the specs somehow.
> > > > 
> > > > For example, it might make sense to have a bit that means "this device
> > > > can generate PME from D3hot", because PME and D3hot are part of that
> > > > common understanding of how PCI works, and we can use that information
> > > > to design the software.
> > > > 
> > > > An "is_thunderbolt" bit doesn't have any of that context.  It doesn't
> > > > say anything about how the device behaves, so I can't evaluate the
> > > > code that uses it.
> > > 
> > > No, this *does* connect back to the spec, the Thunderbolt spec to be
> > > specific, except that spec is not available publicly.  (I assume it's
> > > available under NDA.)  FWIW the PCI SIG doesn't make its specs freely
> > > available either, only to members or for $$$.  As Andreas Noever has
> > > pointed out before, there is plenty of precedent for including
> > > (reverse-engineered) code in the kernel for stuff that isn't public.
> > 
> > I'm not objecting to the fact that the Thunderbolt spec is not public.
> > What I want to know is specifically how the Thunderbolt bridge behaves
> > differently than a plain old PCIe bridge.  I don't even want to know
> > *all* the differences.  You're proposing to make the PCI core work a
> > slightly differently based on this bit, and in order to maintain that
> > in the future, we need to know the details of *why* we need to do
> > things differently.
> 
> Okay, I think I'm starting to understand what you're driving at.
> 
> Thunderbolt tunnels PCIe transparently, so in principle there should be
> no difference behaviour-wise.
> 
> As far as the PCI core is concerned, I'm only using is_thunderbolt to
> whitelist Thunderbolt ports for runtime PM in patch [2/8]. (This just
> concerns like a dozen chips whose behaviour is well understood, hence
> enabling runtime PM should be safe.)

If there's no known PM-related behavior difference between Thunderbolt
and other PCIe ports, I'm hesitant to enable runtime PM only for
Thunderbolt, especially since the proposal also enables runtime PM for
non-Thunderbolt devices that happen to be downstream from a
Thunderbolt port.

If we think Thunderbolt ports and non-Thunderbolt ports work the same
with respect to PM, we should operate them the same way to simplify
the code and improve test coverage.

In other words, if there are problems with new functionality, I would
prefer not to just exclude the hardware platforms with the problems,
especially when it's the huge generic class of "all non-Thunderbolt
PCIe ports".  I'd rather figure out the problems so we can enable the
functionality everywhere (possibly with quirks for known defective
platforms).

> The other two upcoming use cases merely concern whether a device is on
> a Thunderbolt daisy chain (vs. soldered to the mainboard), and to detect
> presence of a Thunderbolt controller in the machine (their PCI devices
> use PCI_CLASS_SYSTEM_OTHER and PCI_CLASS_BRIDGE_PCI, which is not
> sufficiently unique to identify Thunderbolt controllers).

If I understand correctly, this is another case where there's no
actual functional difference because of Thunderbolt, but apple-gmux
would use the fact that a GPU is connected via Thunderbolt to infer
something about which GPUs vga_switcheroo can switch between.

Since the PCI core doesn't care at all about this, could apple-gmux
figure this out on its own by looking for the VSEC capability itself?
It seems like doing it closer to where it's used would make things
more understandable.

> I decided to put the is_thunderbolt flag in struct pci_dev because any
> PCI device might end up on a Thunderbolt daisy chain.  The purpose of
> the bit is merely to cache that status, it does not signify that the
> device suffers from some particular PCI quirk.

Assuming we need it, having it in struct pci_dev is fine.  There's no
point in looking up the VSEC capability more than once.

> > Maybe D3hot means something different, maybe PME works differently,
> > maybe hotplug interrupts are signaled differently, I dunno.  If you
> > want us to treat these devices differently, we have to know *why* so
> > we can tell whether future changes in other areas also need to handle
> > them differently.
> 
> This all works fine.  Once a Thunderbolt tunnel has been set up, the
> hotplug port on the PCIe switch integrated into the Thunderbolt
> controller signals "Card present" and "Link up" interrupts.  On surprise
> removal of an attached device, the Presence Detect and Link State bits
> are cleared and an interrupt is signaled for each of them.
> 
> There's a quirk wherein Thunderbolt controllers claim to support Command
> Complete interrupts but they never send them, I don't think that's
> intentional but rather a hardware bug. The kernel deals gracefully with
> that though, no special treatment necessary.

Many Intel ports have a similar issue (erratum CF118, see 3461a068661c
("PCI: pciehp: Wait for hotplug command completion lazily")) where
they generate Command Completion interrupts for some commands but not
all.

I think pciehp should work even without those interrupts, though we
emit timeout warnings (which maybe could be toned down or omitted).

> > > > If runtime code needs to know whether any upstream bridge is
> > > > Thunderbolt, it can always search up the hierarchy.  I think that
> > > > would improve readability because the software model would map more
> > > > closely to the hardware situation.
> > > 
> > > That would be more expensive than just checking a bit that is searched
> > > only once and then cached.  We have is_hotplug_bridge for just eight
> > > places where we check presence of hotplug capability, why can't we have
> > > is_thunderbolt?  Beats me.
> > 
> > Searching up the tree is more expensive, but it looks like we only
> > check it in enumeration-type situations, so I doubt this is a
> > performance issue.
> 
> In patch [3/8] of v5 of this series, I used the is_thunderbolt bit in
> pci_dev_check_d3cold(), which is not only executed during ->probe and
> ->remove but also whenver the D3cold status of a device is changed via
> sysfs.
> 
> However I dropped that patch and the remaining use cases are indeed
> limited to ->probe paths, so I no longer feel strongly about avoiding
> walking up the hierarchy.
> 
> The set_pcie_thunderbolt() function in this commit essentially does
> two things:  (a) Detect presence of the Thunderbolt VSEC on a device
> and (b) walking up the hierarchy to detect whether that VSEC is present
> on a parent.
> 
> Do you want me to set the is_thunderbolt bit only on devices belonging
> to a Thunderbolt controller and use a separate function to walk up the
> hierarchy?

Let's figure out what information we need and then figure out where to
store it.  If Thunderbolt devices have a functional difference we need
to know about, struct pci_dev seems like a good place to store that.

Bjorn

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-02-21 22:55 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-12 16:07 [PATCH v6 0/8] Runtime PM for Thunderbolt on Macs Lukas Wunner
2017-02-12 16:07 ` [PATCH v6 4/8] Revert "PM / Runtime: Remove the exported function pm_children_suspended()" Lukas Wunner
2017-02-12 16:07 ` [PATCH v6 7/8] thunderbolt: Power down controller when idle Lukas Wunner
2017-02-12 16:07 ` [PATCH v6 2/8] PCI: Allow runtime PM on Thunderbolt ports Lukas Wunner
2017-02-14 20:38   ` Bjorn Helgaas
2017-02-15 12:13     ` Lukas Wunner
2017-02-17 16:06   ` Bjorn Helgaas
2017-02-12 16:07 ` [PATCH v6 8/8] thunderbolt: Runtime suspend NHI when idle Lukas Wunner
2017-02-12 16:07 ` [PATCH v6 1/8] PCI: Recognize Thunderbolt devices Lukas Wunner
2017-02-17 15:29   ` Bjorn Helgaas
2017-02-18  9:12     ` Lukas Wunner
2017-02-19  4:27       ` Bjorn Helgaas
2017-02-20 11:49         ` Lukas Wunner
2017-02-21 22:55           ` Bjorn Helgaas
2017-02-12 16:07 ` [PATCH v6 3/8] PCI: pciehp: Reinstate runtime PM on Thunderbolt hotplug ports Lukas Wunner
2017-02-14 22:59   ` Bjorn Helgaas
2017-02-18  9:25     ` Lukas Wunner
2017-02-19  0:16       ` Bjorn Helgaas
2017-02-20 12:04         ` Lukas Wunner
2017-02-12 16:07 ` [PATCH v6 5/8] PM: Make requirements of dev_pm_domain_set() more precise Lukas Wunner
2017-02-12 16:07 ` [PATCH v6 6/8] PM / sleep: Define constant for direct_complete Lukas Wunner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).