* [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug
@ 2018-04-16 10:34 Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 1/9] PCI: Take all bridges into account when calculating bus numbers for extension Mika Westerberg
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

Hi,

When a Thunderbolt host router is configured for native enumeration
mode, it is only present in the system when something is connected to
its ports. This pretty much follows how the BIOS-assisted mode works.

In native enumeration mode the Thunderbolt host controller (NHI) and USB
host controller (xHCI) are not hot-added using native PCIe hotplug;
instead they are hotplugged via a BIOS-triggered ACPI Notify() to the
root port. This is done to preserve resources, since the NHI and xHCI
only need 1 MB of MMIO space and no additional buses. Currently Linux
does not support this very well and ends up failing the hotplug in one
way or another. A more detailed explanation is in the changelog of
patch [4/9].

This series fixes this issue and in addition includes fixes for a few
other issues found during testing on a system that has a Thunderbolt
controller in native PCIe enumeration mode. However, the fixes here are
not in any way Thunderbolt specific and should be applicable to other
systems as well.

The previous versions of the patch series can be found here:

  v4: https://www.spinics.net/lists/linux-pci/msg71006.html
  v3: https://www.spinics.net/lists/linux-acpi/msg80876.html
  v2: https://www.spinics.net/lists/linux-pci/msg69186.html
  v1: https://www.spinics.net/lists/linux-acpi/msg80607.html

Changes from v4:

  - Updated message in patch [8/9] following Rafael's suggestion
  - Added Rafael's tag to patches 6, 7, 9.

Changes from v3:

  - Added Andy's tag to patches [1-5]
  - Improved changelog and subject line of patch [1/9] to better match
    what it is trying to solve
  - Improved comment in patch [1/9]
  - Improved changelog of patches [2/9] and [4/9] to better explain why
    they are needed
  - Added cleanup patches [6-9/9]

Changes from v2:

  - Added Rafael's tag to patch [1/5]
  - Updated changelog of patch [1/5] to include more details about how the
    problem can be observed from dmesg and lspci output

Changes from v1:

  - Drop 'cmax - max ?: 1' and use a similar construct to the one used
    in the second pass loop in patch [1/5]
  - Drop unnecessary parentheses in patch [1/5]
  - Added Rafael's tag to patches [2-5/5]

Mika Westerberg (9):
  PCI: Take all bridges into account when calculating bus numbers for extension
  PCI: Take bridge window alignment into account when distributing resources
  PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used
  ACPI / hotplug / PCI: Mark stale PCI devices disconnected
  PCI: Move resource distribution for a single bridge outside of the loop
  PCI: Document return value of pci_scan_bridge() and pci_scan_bridge_extend()
  PCI: Improve "partially hidden behind bridge" log message
  ACPI / hotplug / PCI: Drop unnecessary parentheses

 drivers/pci/hotplug/acpiphp.h      |  1 +
 drivers/pci/hotplug/acpiphp_glue.c | 82 ++++++++++++++++++++++++++++++--------
 drivers/pci/hotplug/pciehp.h       |  2 +-
 drivers/pci/hotplug/pciehp_core.c  |  2 +-
 drivers/pci/hotplug/pciehp_hpc.c   | 13 +++++-
 drivers/pci/probe.c                | 33 +++++++++------
 drivers/pci/setup-bus.c            | 69 ++++++++++++++++++++++++++------
 7 files changed, 158 insertions(+), 44 deletions(-)

-- 
2.16.3



* [PATCH v5 1/9] PCI: Take all bridges into account when calculating bus numbers for extension
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources Mika Westerberg
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

When distributing extra buses between hotplug bridges for future
extension, we currently fail to take into account that there might be
non-hotplug bridges on the bus after the hotplug bridges. In this
particular system we have the following topology:

  01:00.0 --+- 02:00.0 -- Thunderbolt host controller
            +- 02:01.0 (HotPlug+)
            \- 02:02.0 -- xHCI host controller

Now, pci_scan_child_bus_extend() is supposed to distribute the remaining
bus numbers to the hotplug bridge at 02:01.0, but only after all bridges
on that bus have been accounted for. Since we don't check upfront that
there will be another non-hotplug bridge after the hotplug bridge
02:01.0, it will inadvertently extend over the bus space of the
non-hotplug bridge:

  pci 0000:01:00.0: PCI bridge to [bus 02-39]
  ...
  pci_bus 0000:04: [bus 04-39] extended by 0x35
  pci_bus 0000:04: bus scan returning with max=39
  pci_bus 0000:04: busn_res: [bus 04-39] end is updated to 39
  pci 0000:02:02.0: scanning [bus 00-00] behind bridge, pass 1
  pci_bus 0000:3a: scanning bus
  pci_bus 0000:3a: bus scan returning with max=3a
  pci_bus 0000:3a: busn_res: [bus 3a] end is updated to 3a
  pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:02 [bus 02-39]
  pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:01 [bus 01-39]
  pci_bus 0000:02: bus scan returning with max=3a
  pci_bus 0000:02: busn_res: [bus 02-39] end can not be updated to 3a

Resulting 'lspci -t' output looks like this:

  +-1b.0-[01-39]----00.0-[02-3a]--+-00.0-[03]----00.0
                             ^^   +-01.0-[04-39]--
                                  \-02.0-[3a]----00.0
                                          ^^
The xHCI host controller (3a:00.0) behind the downstream bridge at
02:02.0 is not usable anymore.

To fix this, reserve at least one bus for each bridge while scanning the
already configured bridges. Then use this information in the second scan
to correct the amount of extra bus space available to the hotplug
bridges.

After this change the 'lspci -t' output is what is expected:

  +-1b.0-[01-39]----00.0-[02-39]--+-00.0-[03]----00.0
                                  +-01.0-[04-38]--
                                  \-02.0-[39]----00.0

Fixes: 1c02ea810065 ("PCI: Distribute available buses to hotplug-capable bridges")
Reported-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/pci/probe.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac91b6fd0bcd..7c34a2ccb514 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2638,7 +2638,14 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 	for_each_pci_bridge(dev, bus) {
 		cmax = max;
 		max = pci_scan_bridge_extend(bus, dev, max, 0, 0);
-		used_buses += cmax - max;
+		/*
+		 * Reserve one bus for each bridge now to avoid
+		 * extending hotplug bridges too much during the second
+		 * scan below.
+		 */
+		used_buses++;
+		if (cmax - max > 1)
+			used_buses += cmax - max - 1;
 	}
 
 	/* Scan bridges that need to be reconfigured */
@@ -2661,12 +2668,14 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 			 * bridges if any.
 			 */
 			buses = available_buses / hotplug_bridges;
-			buses = min(buses, available_buses - used_buses);
+			buses = min(buses, available_buses - used_buses + 1);
 		}
 
 		cmax = max;
 		max = pci_scan_bridge_extend(bus, dev, cmax, buses, 1);
-		used_buses += max - cmax;
+		/* One bus is already accounted so don't add it again */
+		if (max - cmax > 1)
+			used_buses += max - cmax - 1;
 	}
 
 	/*
-- 
2.16.3



* [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 1/9] PCI: Take all bridges into account when calculating bus numbers for extension Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  2018-04-25 22:38   ` Bjorn Helgaas
  2018-04-16 10:34 ` [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume Mika Westerberg
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

When hot-adding a PCIe switch, the way we currently distribute resources
does not always work well, because devices connected to the switch might
need to have their MMIO resources aligned to something other than the
default 1 MB boundary. For example, the Intel Gigabit ET2 quad port
server adapter includes a PCIe switch leading to 4 x GbE NIC devices
that want their MMIO resources aligned to a 2 MB boundary instead.

The current resource distribution code does not take this alignment into
account and might try to add too many resources for the extension
hotplug bridge(s). The resulting bridge window is too big, which makes
the resource assignment operation fail, and we are left with a bridge
window containing only the minimal amount (1 MB) of MMIO space.

Here is what happens when an Intel Gigabit ET2 quad port server adapter
is hot-added:

  pci 0000:39:00.0: BAR 14: assigned [mem 0x53300000-0x6a0fffff]
                                          ^^^^^^^^^^
  pci 0000:3a:01.0: BAR 14: assigned [mem 0x53400000-0x547fffff]
                                          ^^^^^^^^^^
The above shows that the downstream bridge (3a:01.0) window is aligned
to a 2 MB boundary instead of the 1 MB boundary of the upstream bridge
(39:00.0) window. The remaining MMIO space (0x15a00000) is assigned to
the hotplug bridge (3a:04.0) but the assignment fails:

  pci 0000:3a:04.0: BAR 14: no space for [mem size 0x15a00000]
  pci 0000:3a:04.0: BAR 14: failed to assign [mem size 0x15a00000]

The MMIO resource is calculated as follows:

  start = 0x54800000
  end = 0x54800000 + 0x15a00000 - 1 = 0x6a1fffff

This results in a bridge window [mem 0x54800000-0x6a1fffff] which ends
beyond the upstream bridge window [mem 0x53300000-0x6a0fffff],
explaining the above failure. Because of this, Linux falls back to the
default allocation of 1 MB, as can be seen in the 'lspci' output:

 39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
   3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
   3a:04.0 Memory behind bridge: 53300000-533fffff [size=1M]

The hotplug bridge 3a:04.0 only occupies a 1 MB MMIO window, which is
clearly not enough for extending the PCIe topology later if more
devices are to be hot-added.

Fix this by subtracting the properly aligned non-hotplug downstream
bridge window size from the remaining resources used for extension.
After this change the resource allocation looks like this:

  39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
    3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
    3a:04.0 Memory behind bridge: 54800000-6a0fffff [size=345M]

This matches the expectation. All the extra MMIO resource space (345 MB)
is allocated to the extension hotplug bridge (3a:04.0).

Fixes: 1a5767725cec ("PCI: Distribute available resources to hotplug-capable bridges")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/pci/setup-bus.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 072784f55ea5..eb3059fb7f63 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1878,6 +1878,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 	resource_size_t available_mmio, resource_size_t available_mmio_pref)
 {
 	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
+	resource_size_t io_start, mmio_start, mmio_pref_start;
 	unsigned int normal_bridges = 0, hotplug_bridges = 0;
 	struct resource *io_res, *mmio_res, *mmio_pref_res;
 	struct pci_dev *dev, *bridge = bus->self;
@@ -1942,11 +1943,16 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 			remaining_mmio_pref -= resource_size(res);
 	}
 
+	io_start = io_res->start;
+	mmio_start = mmio_res->start;
+	mmio_pref_start = mmio_pref_res->start;
+
 	/*
 	 * Go over devices on this bus and distribute the remaining
 	 * resource space between hotplug bridges.
 	 */
 	for_each_pci_bridge(dev, bus) {
+		resource_size_t align;
 		struct pci_bus *b;
 
 		b = dev->subordinate;
@@ -1964,7 +1970,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 				available_io, available_mmio,
 				available_mmio_pref);
 		} else if (dev->is_hotplug_bridge) {
-			resource_size_t align, io, mmio, mmio_pref;
+			resource_size_t io, mmio, mmio_pref;
 
 			/*
 			 * Distribute available extra resources equally
@@ -1977,11 +1983,13 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 			io = div64_ul(available_io, hotplug_bridges);
 			io = min(ALIGN(io, align), remaining_io);
 			remaining_io -= io;
+			io_start += io;
 
 			align = pci_resource_alignment(bridge, mmio_res);
 			mmio = div64_ul(available_mmio, hotplug_bridges);
 			mmio = min(ALIGN(mmio, align), remaining_mmio);
 			remaining_mmio -= mmio;
+			mmio_start += mmio;
 
 			align = pci_resource_alignment(bridge, mmio_pref_res);
 			mmio_pref = div64_ul(available_mmio_pref,
@@ -1989,9 +1997,40 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 			mmio_pref = min(ALIGN(mmio_pref, align),
 					remaining_mmio_pref);
 			remaining_mmio_pref -= mmio_pref;
+			mmio_pref_start += mmio_pref;
 
 			pci_bus_distribute_available_resources(b, add_list, io,
 							       mmio, mmio_pref);
+		} else {
+			/*
+			 * For normal bridges, track start of the parent
+			 * bridge window to make sure we align the
+			 * remaining space which is distributed to the
+			 * hotplug bridges properly.
+			 */
+			resource_size_t aligned;
+			struct resource *res;
+
+			res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
+			io_start += resource_size(res);
+			aligned = ALIGN(io_start,
+					pci_resource_alignment(dev, res));
+			if (aligned > io_start)
+				remaining_io -= aligned - io_start;
+
+			res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
+			mmio_start += resource_size(res);
+			aligned = ALIGN(mmio_start,
+					pci_resource_alignment(dev, res));
+			if (aligned > mmio_start)
+				remaining_mmio -= aligned - mmio_start;
+
+			res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
+			mmio_pref_start += resource_size(res);
+			aligned = ALIGN(mmio_pref_start,
+					pci_resource_alignment(dev, res));
+			if (aligned > mmio_pref_start)
+				remaining_mmio_pref -= aligned - mmio_pref_start;
 		}
 	}
 }
-- 
2.16.3



* [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 1/9] PCI: Take all bridges into account when calculating bus numbers for extension Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  2018-05-01 21:52   ` Bjorn Helgaas
  2018-04-16 10:34 ` [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used Mika Westerberg
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

After a suspend/resume cycle the Presence Detect Changed or Data Link
Layer Status Changed bits might be set, and if we don't clear them those
events will not fire anymore; nothing then happens, for instance, when a
device is hot-unplugged.

Fix this by clearing those bits in a newly introduced function
pcie_reenable_notification(). This should be fine because immediately
afterwards we check whether the adapter is still present by reading
directly from the status register.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/pci/hotplug/pciehp.h      |  2 +-
 drivers/pci/hotplug/pciehp_core.c |  2 +-
 drivers/pci/hotplug/pciehp_hpc.c  | 13 ++++++++++++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 88e917c9120f..5f892065585e 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -121,7 +121,7 @@ struct controller *pcie_init(struct pcie_device *dev);
 int pcie_init_notification(struct controller *ctrl);
 int pciehp_enable_slot(struct slot *p_slot);
 int pciehp_disable_slot(struct slot *p_slot);
-void pcie_enable_notification(struct controller *ctrl);
+void pcie_reenable_notification(struct controller *ctrl);
 int pciehp_power_on_slot(struct slot *slot);
 void pciehp_power_off_slot(struct slot *slot);
 void pciehp_get_power_status(struct slot *slot, u8 *status);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 332b723ff9e6..44a6a63802d5 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -283,7 +283,7 @@ static int pciehp_resume(struct pcie_device *dev)
 	ctrl = get_service_data(dev);
 
 	/* reinitialize the chipset's event detection logic */
-	pcie_enable_notification(ctrl);
+	pcie_reenable_notification(ctrl);
 
 	slot = ctrl->slot;
 
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 18a42f8f5dc5..98ea75aa32c7 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -659,7 +659,7 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 	return handled;
 }
 
-void pcie_enable_notification(struct controller *ctrl)
+static void pcie_enable_notification(struct controller *ctrl)
 {
 	u16 cmd, mask;
 
@@ -697,6 +697,17 @@ void pcie_enable_notification(struct controller *ctrl)
 		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
 }
 
+void pcie_reenable_notification(struct controller *ctrl)
+{
+	/*
+	 * Clear both Presence and Data Link Layer Changed to make sure
+	 * those events still fire after we have re-enabled them.
+	 */
+	pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_SLTSTA,
+				   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
+	pcie_enable_notification(ctrl);
+}
+
 static void pcie_disable_notification(struct controller *ctrl)
 {
 	u16 mask;
-- 
2.16.3



* [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
                   ` (2 preceding siblings ...)
  2018-04-16 10:34 ` [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
       [not found]   ` <20180502204932.GG11698@bhelgaas-glaptop.roam.corp.google.com>
  2018-04-16 10:34 ` [PATCH v5 5/9] ACPI / hotplug / PCI: Mark stale PCI devices disconnected Mika Westerberg
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

When a system is using native PCIe hotplug for Thunderbolt and the
controller is not enabled for full RTD3 (runtime D3), the controller is
only present in the system when there is a device connected. This pretty
much follows the BIOS-assisted hotplug behaviour.

The Thunderbolt host router's integrated PCIe switch has two additional
PCIe downstream bridges that lead to the NHI (Thunderbolt host
controller) and the xHCI (USB 3 host controller) respectively. These
downstream bridges are not marked as hotplug capable. The reason is to
preserve resources: otherwise the OS would distribute the remaining
resources between all downstream bridges, making these two bridges
consume precious resources of the actual hotplug bridges.

Now, because these two bridges are not marked as hotplug capable, the OS
will not enable the hotplug interrupt for them either, and will not
receive an interrupt when devices behind them are hot-added. The
solution is that the BIOS sends an ACPI Notify() to the root port to let
the OS know it needs to rescan for added and/or removed devices.

Here is how the mechanism is supposed to work when a Thunderbolt
endpoint is connected to one of the ports. In the case of a standard
USB-C device only the xHCI is hot-added; otherwise the steps are the
same.

1. Initially there is only the PCIe root port that is controlled by
   the pciehp driver

  00:1b.0 (Hotplug+) --

2. Then we get a native PCIe hotplug interrupt and once it is handled
   the topology looks like the following

  00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 --
                                  +- 02:01.0 (HotPlug+)
                                  \- 02:02.0 --

3. Bridges 02:00.0 and 02:02.0 are not marked as hotplug capable and
   they don't currently have anything behind them. Bridge 02:01.0 is
   hotplug capable and used for extending the topology. At this point
   the required PCIe devices are enabled and an ACPI Notify() is sent
   to the root port. The resulting topology is expected to look like
   this

  00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 -- Thunderbolt host controller
                                  +- 02:01.0 (HotPlug+)
                                  \- 02:02.0 -- xHCI host controller

However, the current ACPI hotplug implementation scans the whole 00:1b.0
hotplug slot and everything behind it regardless of whether native PCIe
hotplug is used or not, and it expects the BIOS to have configured
bridge resources upfront. If that's not the case, it assigns resources
using minimal allocations (everything currently found just barely fits),
preventing future extension. In addition, if there is another native
PCIe hotplug operation going on, we may find the new PCIe switch only
partially ready (not all links are fully trained yet), confusing pciehp
when it finally starts to enumerate the new devices.

To make this work better with the native PCIe hotplug driver (pciehp),
we let it handle all slot management and resource allocation for hotplug
bridges and restrict ACPI hotplug to non-hotplug bridges.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/pci/hotplug/acpiphp.h      |  1 +
 drivers/pci/hotplug/acpiphp_glue.c | 73 ++++++++++++++++++++++++++++++--------
 2 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp.h b/drivers/pci/hotplug/acpiphp.h
index e438a2d734f2..8dcfd623ef1d 100644
--- a/drivers/pci/hotplug/acpiphp.h
+++ b/drivers/pci/hotplug/acpiphp.h
@@ -158,6 +158,7 @@ struct acpiphp_attention_info
 
 #define SLOT_ENABLED		(0x00000001)
 #define SLOT_IS_GOING_AWAY	(0x00000002)
+#define SLOT_IS_NATIVE		(0x00000004)
 
 /* function flags */
 
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index b45b375c0e6c..5efa21cdddc9 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -282,6 +282,9 @@ static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
 	slot->device = device;
 	INIT_LIST_HEAD(&slot->funcs);
 
+	if (pdev && pciehp_is_native(pdev))
+		slot->flags |= SLOT_IS_NATIVE;
+
 	list_add_tail(&slot->node, &bridge->slots);
 
 	/*
@@ -291,7 +294,7 @@ static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
 	 * expose slots to user space in those cases.
 	 */
 	if ((acpi_pci_check_ejectable(pbus, handle) || is_dock_device(adev))
-	    && !(pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev))) {
+	    && !(slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge)) {
 		unsigned long long sun;
 		int retval;
 
@@ -430,6 +433,29 @@ static int acpiphp_rescan_slot(struct acpiphp_slot *slot)
 	return pci_scan_slot(slot->bus, PCI_DEVFN(slot->device, 0));
 }
 
+static void acpiphp_native_scan_bridge(struct pci_dev *bridge)
+{
+	struct pci_bus *bus = bridge->subordinate;
+	struct pci_dev *dev;
+	int max;
+
+	if (!bus)
+		return;
+
+	max = bus->busn_res.start;
+	/* Scan already configured non-hotplug bridges */
+	for_each_pci_bridge(dev, bus) {
+		if (!dev->is_hotplug_bridge)
+			max = pci_scan_bridge(bus, dev, max, 0);
+	}
+
+	/* Scan non-hotplug bridges that need to be reconfigured */
+	for_each_pci_bridge(dev, bus) {
+		if (!dev->is_hotplug_bridge)
+			max = pci_scan_bridge(bus, dev, max, 1);
+	}
+}
+
 /**
  * enable_slot - enable, configure a slot
  * @slot: slot to be enabled
@@ -442,25 +468,42 @@ static void enable_slot(struct acpiphp_slot *slot)
 	struct pci_dev *dev;
 	struct pci_bus *bus = slot->bus;
 	struct acpiphp_func *func;
-	int max, pass;
-	LIST_HEAD(add_list);
 
-	acpiphp_rescan_slot(slot);
-	max = acpiphp_max_busnr(bus);
-	for (pass = 0; pass < 2; pass++) {
+	if (slot->flags & SLOT_IS_NATIVE) {
+		/*
+		 * If native PCIe hotplug is used, it will take care of
+		 * hotplug slot management and resource allocation for
+		 * hotplug bridges. However, ACPI hotplug may still be
+		 * used for non-hotplug bridges to bring in additional
+		 * devices such as Thunderbolt host controller.
+		 */
 		for_each_pci_bridge(dev, bus) {
-			if (PCI_SLOT(dev->devfn) != slot->device)
-				continue;
-
-			max = pci_scan_bridge(bus, dev, max, pass);
-			if (pass && dev->subordinate) {
-				check_hotplug_bridge(slot, dev);
-				pcibios_resource_survey_bus(dev->subordinate);
-				__pci_bus_size_bridges(dev->subordinate, &add_list);
+			if (PCI_SLOT(dev->devfn) == slot->device)
+				acpiphp_native_scan_bridge(dev);
+		}
+		pci_assign_unassigned_bridge_resources(bus->self);
+	} else {
+		LIST_HEAD(add_list);
+		int max, pass;
+
+		acpiphp_rescan_slot(slot);
+		max = acpiphp_max_busnr(bus);
+		for (pass = 0; pass < 2; pass++) {
+			for_each_pci_bridge(dev, bus) {
+				if (PCI_SLOT(dev->devfn) != slot->device)
+					continue;
+
+				max = pci_scan_bridge(bus, dev, max, pass);
+				if (pass && dev->subordinate) {
+					check_hotplug_bridge(slot, dev);
+					pcibios_resource_survey_bus(dev->subordinate);
+					__pci_bus_size_bridges(dev->subordinate,
+							       &add_list);
+				}
 			}
 		}
+		__pci_bus_assign_resources(bus, &add_list, NULL);
 	}
-	__pci_bus_assign_resources(bus, &add_list, NULL);
 
 	acpiphp_sanitize_bus(bus);
 	pcie_bus_configure_settings(bus);
-- 
2.16.3



* [PATCH v5 5/9] ACPI / hotplug / PCI: Mark stale PCI devices disconnected
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
                   ` (3 preceding siblings ...)
  2018-04-16 10:34 ` [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 6/9] PCI: Move resource distribution for a single bridge outside of the loop Mika Westerberg
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

Following pciehp, mark unplugged PCI devices as disconnected. This makes
sure the PCI core code leaves the now-missing hardware registers alone.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/pci/hotplug/acpiphp_glue.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 5efa21cdddc9..0aef35ee665a 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -651,6 +651,11 @@ static void trim_stale_devices(struct pci_dev *dev)
 		alive = pci_device_is_present(dev);
 
 	if (!alive) {
+		pci_dev_set_disconnected(dev, NULL);
+		if (pci_has_subordinate(dev))
+			pci_walk_bus(dev->subordinate, pci_dev_set_disconnected,
+				     NULL);
+
 		pci_stop_and_remove_bus_device(dev);
 		if (adev)
 			acpi_bus_trim(adev);
-- 
2.16.3



* [PATCH v5 6/9] PCI: Move resource distribution for a single bridge outside of the loop
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
                   ` (4 preceding siblings ...)
  2018-04-16 10:34 ` [PATCH v5 5/9] ACPI / hotplug / PCI: Mark stale PCI devices disconnected Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  2018-04-24 23:05   ` Bjorn Helgaas
  2018-04-16 10:34 ` [PATCH v5 7/9] PCI: Document return value of pci_scan_bridge() and pci_scan_bridge_extend() Mika Westerberg
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

There is a special case where there is only a single bridge on the bus.
In that case we just assign all resources to it. Currently this is done
as part of the resource distribution loop, but it does not have to be,
and moving it outside actually improves readability.

While at it, we can add the hotplug_bridges == 1 && normal_bridges == 0
case to the same block, because it is dealt with in the same way.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/setup-bus.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index eb3059fb7f63..a1b63c7745d0 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1947,6 +1947,22 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 	mmio_start = mmio_res->start;
 	mmio_pref_start = mmio_pref_res->start;
 
+	/*
+	 * There is only one bridge on the bus so it gets all available
+	 * resources which it can then distribute to the possible
+	 * hotplug bridges below.
+	 */
+	if ((hotplug_bridges == 0 && normal_bridges == 1) ||
+	    (hotplug_bridges == 1 && normal_bridges == 0)) {
+		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
+		if (dev->subordinate) {
+			pci_bus_distribute_available_resources(dev->subordinate,
+				add_list, available_io, available_mmio,
+				available_mmio_pref);
+		}
+		return;
+	}
+
 	/*
 	 * Go over devices on this bus and distribute the remaining
 	 * resource space between hotplug bridges.
@@ -1959,17 +1975,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 		if (!b)
 			continue;
 
-		if (!hotplug_bridges && normal_bridges == 1) {
-			/*
-			 * There is only one bridge on the bus (upstream
-			 * port) so it gets all available resources
-			 * which it can then distribute to the possible
-			 * hotplug bridges below.
-			 */
-			pci_bus_distribute_available_resources(b, add_list,
-				available_io, available_mmio,
-				available_mmio_pref);
-		} else if (dev->is_hotplug_bridge) {
+		if (dev->is_hotplug_bridge) {
 			resource_size_t io, mmio, mmio_pref;
 
 			/*
-- 
2.16.3



* [PATCH v5 7/9] PCI: Document return value of pci_scan_bridge() and pci_scan_bridge_extend()
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
                   ` (5 preceding siblings ...)
  2018-04-16 10:34 ` [PATCH v5 6/9] PCI: Move resource distribution for a single bridge outside of the loop Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 8/9] PCI: Improve "partially hidden behind bridge" log message Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 9/9] ACPI / hotplug / PCI: Drop unnecessary parentheses Mika Westerberg
  8 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

It is not immediately clear what the two functions actually return, so
add kernel-doc comments explaining it.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/probe.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 7c34a2ccb514..30a895bd9839 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -998,6 +998,8 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
  * already configured by the BIOS and after we are done with all of
  * them, we proceed to assigning numbers to the remaining buses in
  * order to avoid overlaps between old and new bus numbers.
+ *
+ * Return: New subordinate number covering all buses behind this bridge.
  */
 static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 				  int max, unsigned int available_buses,
@@ -1230,6 +1232,8 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
  * already configured by the BIOS and after we are done with all of
  * them, we proceed to assigning numbers to the remaining buses in
  * order to avoid overlaps between old and new bus numbers.
+ *
+ * Return: New subordinate number covering all buses behind this bridge.
  */
 int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
 {
-- 
2.16.3



* [PATCH v5 8/9] PCI: Improve "partially hidden behind bridge" log message
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
                   ` (6 preceding siblings ...)
  2018-04-16 10:34 ` [PATCH v5 7/9] PCI: Document return value of pci_scan_bridge() and pci_scan_bridge_extend() Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  2018-04-16 10:34 ` [PATCH v5 9/9] ACPI / hotplug / PCI: Drop unnecessary parentheses Mika Westerberg
  8 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

There is a sanity check at the end of pci_scan_bridge_extend() that
tries to detect badly configured bridges. For example, given the
topology below:

  +-1b.0-[01-39]----00.0-[02-3a]--+-00.0-[03]----00.0
                                  +-01.0-[04-39]--
                                  \-02.0-[3a]----00.0

The sanity check notices this and logs the following messages:

  pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:02 [bus 02-39]
  pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:01 [bus 01-39]

This is not really helpful to users, and the information above is not
even correct (0000:02 is a bus, not a bridge). Make this more
understandable by changing the sanity check to log the following message
in place of the above two:

  pci 0000:02:02.0: devices behind bridge are unusable, because [bus 3a] cannot be assigned for them

While at it, update the comment on top of the sanity check block to make
it clear that it is not restricted to CardBus only.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/probe.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 30a895bd9839..e71299ac1135 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1190,20 +1190,16 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
 		(is_cardbus ? "PCI CardBus %04x:%02x" : "PCI Bus %04x:%02x"),
 		pci_domain_nr(bus), child->number);
 
-	/* Has only triggered on CardBus, fixup is in yenta_socket */
+	/* Check that all devices are accessible */
 	while (bus->parent) {
 		if ((child->busn_res.end > bus->busn_res.end) ||
 		    (child->number > bus->busn_res.end) ||
 		    (child->number < bus->number) ||
 		    (child->busn_res.end < bus->number)) {
-			dev_info(&child->dev, "%pR %s hidden behind%s bridge %s %pR\n",
-				&child->busn_res,
-				(bus->number > child->busn_res.end &&
-				 bus->busn_res.end < child->number) ?
-					"wholly" : "partially",
-				bus->self->transparent ? " transparent" : "",
-				dev_name(&bus->dev),
-				&bus->busn_res);
+			dev_info(&dev->dev,
+				 "devices behind bridge are unusable, because %pR cannot be assigned for them\n",
+				 &child->busn_res);
+			break;
 		}
 		bus = bus->parent;
 	}
-- 
2.16.3



* [PATCH v5 9/9] ACPI / hotplug / PCI: Drop unnecessary parentheses
  2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
                   ` (7 preceding siblings ...)
  2018-04-16 10:34 ` [PATCH v5 8/9] PCI: Improve "partially hidden behind bridge" log message Mika Westerberg
@ 2018-04-16 10:34 ` Mika Westerberg
  8 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-04-16 10:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, Lukas Wunner, Mika Westerberg, linux-pci,
	linux-acpi

There is no need for them, and they make the code look uglier than it
should. Remove them.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/hotplug/acpiphp_glue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 0aef35ee665a..45e19615819e 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -524,7 +524,7 @@ static void enable_slot(struct acpiphp_slot *slot)
 		if (!dev) {
 			/* Do not set SLOT_ENABLED flag if some funcs
 			   are not added. */
-			slot->flags &= (~SLOT_ENABLED);
+			slot->flags &= ~SLOT_ENABLED;
 			continue;
 		}
 	}
@@ -553,7 +553,7 @@ static void disable_slot(struct acpiphp_slot *slot)
 	list_for_each_entry(func, &slot->funcs, sibling)
 		acpi_bus_trim(func_to_acpi_device(func));
 
-	slot->flags &= (~SLOT_ENABLED);
+	slot->flags &= ~SLOT_ENABLED;
 }
 
 static bool slot_no_hotplug(struct acpiphp_slot *slot)
-- 
2.16.3



* Re: [PATCH v5 6/9] PCI: Move resource distribution for a single bridge outside of the loop
  2018-04-16 10:34 ` [PATCH v5 6/9] PCI: Move resource distribution for a single bridge outside of the loop Mika Westerberg
@ 2018-04-24 23:05   ` Bjorn Helgaas
  2018-04-25  7:29     ` Mika Westerberg
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2018-04-24 23:05 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Mon, Apr 16, 2018 at 01:34:50PM +0300, Mika Westerberg wrote:
> There is a special case where there is only single bridge on the bus. In
> that case we just assign all resources to it. Currently this is done as
> a part of the resource distribution loop but it does not have to be
> there, and moving it outside actually improves readability.
> 
> While there we can add hotplug_bridges == 1 && normal_bridges == 0 to
> the same block because they are dealt the same way.
> 
> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/pci/setup-bus.c | 28 +++++++++++++++++-----------
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index eb3059fb7f63..a1b63c7745d0 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1947,6 +1947,22 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	mmio_start = mmio_res->start;
>  	mmio_pref_start = mmio_pref_res->start;
>  
> +	/*
> +	 * There is only one bridge on the bus so it gets all available
> +	 * resources which it can then distribute to the possible
> +	 * hotplug bridges below.
> +	 */
> +	if ((hotplug_bridges == 0 && normal_bridges == 1) ||
> +	    (hotplug_bridges == 1 && normal_bridges == 0)) {

  if (hotplug_bridges + normal_bridges == 1) {

Don't repost just for this; I can fold it in if you agree.

> +		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> +		if (dev->subordinate) {
> +			pci_bus_distribute_available_resources(dev->subordinate,
> +				add_list, available_io, available_mmio,
> +				available_mmio_pref);
> +		}
> +		return;
> +	}
> +
>  	/*
>  	 * Go over devices on this bus and distribute the remaining
>  	 * resource space between hotplug bridges.
> @@ -1959,17 +1975,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  		if (!b)
>  			continue;
>  
> -		if (!hotplug_bridges && normal_bridges == 1) {
> -			/*
> -			 * There is only one bridge on the bus (upstream
> -			 * port) so it gets all available resources
> -			 * which it can then distribute to the possible
> -			 * hotplug bridges below.
> -			 */
> -			pci_bus_distribute_available_resources(b, add_list,
> -				available_io, available_mmio,
> -				available_mmio_pref);
> -		} else if (dev->is_hotplug_bridge) {
> +		if (dev->is_hotplug_bridge) {
>  			resource_size_t io, mmio, mmio_pref;
>  
>  			/*
> -- 
> 2.16.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v5 6/9] PCI: Move resource distribution for a single bridge outside of the loop
  2018-04-24 23:05   ` Bjorn Helgaas
@ 2018-04-25  7:29     ` Mika Westerberg
  0 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-04-25  7:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Tue, Apr 24, 2018 at 06:05:58PM -0500, Bjorn Helgaas wrote:
> > +	/*
> > +	 * There is only one bridge on the bus so it gets all available
> > +	 * resources which it can then distribute to the possible
> > +	 * hotplug bridges below.
> > +	 */
> > +	if ((hotplug_bridges == 0 && normal_bridges == 1) ||
> > +	    (hotplug_bridges == 1 && normal_bridges == 0)) {
> 
>   if (hotplug_bridges + normal_bridges == 1) {
> 
> Don't repost just for this; I can fold it in if you agree.

I agree, looks better like that. Thanks!


* Re: [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources
  2018-04-16 10:34 ` [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources Mika Westerberg
@ 2018-04-25 22:38   ` Bjorn Helgaas
  2018-04-26 12:23     ` Mika Westerberg
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2018-04-25 22:38 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Mon, Apr 16, 2018 at 01:34:46PM +0300, Mika Westerberg wrote:
> When hot-adding a PCIe switch the way we currently distribute resources
> does not always work well because devices connected to the switch might
> need to have their MMIO resources aligned to something else than the
> default 1 MB boundary. For example Intel Gigabit ET2 quad port server
> adapter includes PCIe switch leading to 4 x GbE NIC devices that want
> to have their MMIO resources aligned to 2 MB boundary instead.
> 
> The current resource distribution code does not take this alignment into
> account and might try to add too much resources for the extension
> hotplug bridge(s). The resulting bridge window is too big which makes
> the resource assignment operation fail, and we are left with a bridge
> window with minimal amount (1 MB) of MMIO space.
> 
> Here is what happens when an Intel Gigabit ET2 quad port server adapter
> is hot-added:
> 
>   pci 0000:39:00.0: BAR 14: assigned [mem 0x53300000-0x6a0fffff]
>                                           ^^^^^^^^^^
>   pci 0000:3a:01.0: BAR 14: assigned [mem 0x53400000-0x547fffff]
>                                           ^^^^^^^^^^
> The above shows that the downstream bridge (3a:01.0) window is aligned
> to 2 MB instead of 1 MB as is the upstream bridge (39:00.0) window. The
> remaining MMIO space (0x15a00000) is assigned to the hotplug bridge
> (3a:04.0) but it fails:
> 
>   pci 0000:3a:04.0: BAR 14: no space for [mem size 0x15a00000]
>   pci 0000:3a:04.0: BAR 14: failed to assign [mem size 0x15a00000]
> 
> The MMIO resource is calculated as follows:
> 
>   start = 0x54800000
>   end = 0x54800000 + 0x15a00000 - 1 = 0x6a1fffff
> 
> This results bridge window [mem 0x54800000 - 0x6a1fffff] and it ends
> after the upstream bridge window [mem 0x53300000-0x6a0fffff] explaining
> the above failure. Because of this Linux falls back to the default
> allocation of 1 MB as can be seen from 'lspci' output:
> 
>  39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
>    3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
>    3a:04.0 Memory behind bridge: 53300000-533fffff [size=1M]
> 
> The hotplug bridge 3a:04.0 only occupies 1 MB MMIO window which is
> clearly not enough for extending the PCIe topology later if more devices
> are to be hot-added.
> 
> Fix this by substracting properly aligned non-hotplug downstream bridge
> window size from the remaining resources used for extension. After this
> change the resource allocation looks like:
> 
>   39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
>     3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
>     3a:04.0 Memory behind bridge: 54800000-6a0fffff [size=345M]
> 
> This matches the expectation. All the extra MMIO resource space (345 MB)
> is allocated to the extension hotplug bridge (3a:04.0).

Sorry, I've spent a lot of time trying to trace through this code, and
I'm still hopelessly confused.  Can you post the complete "lspci -vv"
output and the dmesg log (including the hot-add event) somewhere and
include a URL to it?

I think I understand the problem you're solving:

  - You have 366M, 1M-aligned, available for things on bus 3a
  - You assign 20M, 2M-aligned to 3a:01.0
  - This leaves 346M for other things on bus 3a, but it's not all
    contiguous because the 20M is in the middle.
  - The remaining 346M might be 1M on one side and 345M on the other
    (and there are many other possibilities, e.g., 3M + 343M, 5M +
    341M, ..., 345M + 1M).
  - The current code tries to assign all 346M to 3a:04.0, which
    fails because that space is not contiguous, so it falls back to
    allocating 1M, which works but is insufficient for future
    hot-adds.

Obviously this patch makes *this* situation work: it assigns 345M to
3a:04.0 and (I assume) leaves the 1M unused.  But I haven't been able
to convince myself that this patch works *in general*.

For example, what if we assigned the 20M from the end of the 366M
window instead of the beginning, so the 345M piece is below the 20M
and there's 1M left above it?  That is legal and should work, but I
suspect this patch would ignore the 345M piece and again assign 1M to
3a:04.0.

Or what if there are several hotplug bridges on bus 3a?  This example
has two, but there could be many more.

Or what if there are normal bridges as well as hotplug bridges on bus
3a?  Or if they're in arbitrary orders?

> Fixes: 1a5767725cec ("PCI: Distribute available resources to hotplug-capable bridges")
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: stable@vger.kernel.org

Given my confusion about this, I doubt this satisfies the stable
kernel "obviously correct" rule.

s/substracting/subtracting/ above

> ---
>  drivers/pci/setup-bus.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 072784f55ea5..eb3059fb7f63 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1878,6 +1878,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	resource_size_t available_mmio, resource_size_t available_mmio_pref)
>  {
>  	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
> +	resource_size_t io_start, mmio_start, mmio_pref_start;
>  	unsigned int normal_bridges = 0, hotplug_bridges = 0;
>  	struct resource *io_res, *mmio_res, *mmio_pref_res;
>  	struct pci_dev *dev, *bridge = bus->self;
> @@ -1942,11 +1943,16 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  			remaining_mmio_pref -= resource_size(res);
>  	}
>  
> +	io_start = io_res->start;
> +	mmio_start = mmio_res->start;
> +	mmio_pref_start = mmio_pref_res->start;
> +
>  	/*
>  	 * Go over devices on this bus and distribute the remaining
>  	 * resource space between hotplug bridges.
>  	 */
>  	for_each_pci_bridge(dev, bus) {
> +		resource_size_t align;
>  		struct pci_bus *b;
>  
>  		b = dev->subordinate;
> @@ -1964,7 +1970,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  				available_io, available_mmio,
>  				available_mmio_pref);
>  		} else if (dev->is_hotplug_bridge) {
> -			resource_size_t align, io, mmio, mmio_pref;
> +			resource_size_t io, mmio, mmio_pref;
>  
>  			/*
>  			 * Distribute available extra resources equally
> @@ -1977,11 +1983,13 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  			io = div64_ul(available_io, hotplug_bridges);
>  			io = min(ALIGN(io, align), remaining_io);
>  			remaining_io -= io;
> +			io_start += io;
>  
>  			align = pci_resource_alignment(bridge, mmio_res);
>  			mmio = div64_ul(available_mmio, hotplug_bridges);
>  			mmio = min(ALIGN(mmio, align), remaining_mmio);
>  			remaining_mmio -= mmio;
> +			mmio_start += mmio;
>  
>  			align = pci_resource_alignment(bridge, mmio_pref_res);
>  			mmio_pref = div64_ul(available_mmio_pref,
> @@ -1989,9 +1997,40 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  			mmio_pref = min(ALIGN(mmio_pref, align),
>  					remaining_mmio_pref);
>  			remaining_mmio_pref -= mmio_pref;
> +			mmio_pref_start += mmio_pref;
>  
>  			pci_bus_distribute_available_resources(b, add_list, io,
>  							       mmio, mmio_pref);
> +		} else {
> +			/*
> +			 * For normal bridges, track start of the parent
> +			 * bridge window to make sure we align the
> +			 * remaining space which is distributed to the
> +			 * hotplug bridges properly.
> +			 */
> +			resource_size_t aligned;
> +			struct resource *res;
> +
> +			res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
> +			io_start += resource_size(res);
> +			aligned = ALIGN(io_start,
> +					pci_resource_alignment(dev, res));
> +			if (aligned > io_start)
> +				remaining_io -= aligned - io_start;
> +
> +			res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
> +			mmio_start += resource_size(res);
> +			aligned = ALIGN(mmio_start,
> +					pci_resource_alignment(dev, res));
> +			if (aligned > mmio_start)
> +				remaining_mmio -= aligned - mmio_start;
> +
> +			res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
> +			mmio_pref_start += resource_size(res);
> +			aligned = ALIGN(mmio_pref_start,
> +					pci_resource_alignment(dev, res));
> +			if (aligned > mmio_pref_start)
> +				remaining_mmio_pref -= aligned - mmio_pref_start;
>  		}
>  	}
>  }
> -- 
> 2.16.3
> 


* Re: [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources
  2018-04-25 22:38   ` Bjorn Helgaas
@ 2018-04-26 12:23     ` Mika Westerberg
  2018-05-01 20:32       ` Bjorn Helgaas
  0 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-04-26 12:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Wed, Apr 25, 2018 at 05:38:54PM -0500, Bjorn Helgaas wrote:
> On Mon, Apr 16, 2018 at 01:34:46PM +0300, Mika Westerberg wrote:
> > When hot-adding a PCIe switch the way we currently distribute resources
> > does not always work well because devices connected to the switch might
> > need to have their MMIO resources aligned to something else than the
> > default 1 MB boundary. For example Intel Gigabit ET2 quad port server
> > adapter includes PCIe switch leading to 4 x GbE NIC devices that want
> > to have their MMIO resources aligned to 2 MB boundary instead.
> > 
> > The current resource distribution code does not take this alignment into
> > account and might try to add too much resources for the extension
> > hotplug bridge(s). The resulting bridge window is too big which makes
> > the resource assignment operation fail, and we are left with a bridge
> > window with minimal amount (1 MB) of MMIO space.
> > 
> > Here is what happens when an Intel Gigabit ET2 quad port server adapter
> > is hot-added:
> > 
> >   pci 0000:39:00.0: BAR 14: assigned [mem 0x53300000-0x6a0fffff]
> >                                           ^^^^^^^^^^
> >   pci 0000:3a:01.0: BAR 14: assigned [mem 0x53400000-0x547fffff]
> >                                           ^^^^^^^^^^
> > The above shows that the downstream bridge (3a:01.0) window is aligned
> > to 2 MB instead of 1 MB as is the upstream bridge (39:00.0) window. The
> > remaining MMIO space (0x15a00000) is assigned to the hotplug bridge
> > (3a:04.0) but it fails:
> > 
> >   pci 0000:3a:04.0: BAR 14: no space for [mem size 0x15a00000]
> >   pci 0000:3a:04.0: BAR 14: failed to assign [mem size 0x15a00000]
> > 
> > The MMIO resource is calculated as follows:
> > 
> >   start = 0x54800000
> >   end = 0x54800000 + 0x15a00000 - 1 = 0x6a1fffff
> > 
> > This results bridge window [mem 0x54800000 - 0x6a1fffff] and it ends
> > after the upstream bridge window [mem 0x53300000-0x6a0fffff] explaining
> > the above failure. Because of this Linux falls back to the default
> > allocation of 1 MB as can be seen from 'lspci' output:
> > 
> >  39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
> >    3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
> >    3a:04.0 Memory behind bridge: 53300000-533fffff [size=1M]
> > 
> > The hotplug bridge 3a:04.0 only occupies 1 MB MMIO window which is
> > clearly not enough for extending the PCIe topology later if more devices
> > are to be hot-added.
> > 
> > Fix this by substracting properly aligned non-hotplug downstream bridge
> > window size from the remaining resources used for extension. After this
> > change the resource allocation looks like:
> > 
> >   39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
> >     3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
> >     3a:04.0 Memory behind bridge: 54800000-6a0fffff [size=345M]
> > 
> > This matches the expectation. All the extra MMIO resource space (345 MB)
> > is allocated to the extension hotplug bridge (3a:04.0).
> 
> Sorry, I've spent a lot of time trying to trace through this code, and
> I'm still hopelessly confused.  Can you post the complete "lspci -vv"
> output and the dmesg log (including the hot-add event) somewhere and
> include a URL to it?

I sent you the logs and lspci output both with and without this patch
when I connect a full chain of 6 Thunderbolt devices, 3 of which include
those NICs with 4 Ethernet ports. The resulting topology includes a
total of 6 + 3 + 1 PCIe switches.

> I think I understand the problem you're solving:
> 
>   - You have 366M, 1M-aligned, available for things on bus 3a
>   - You assign 20M, 2M-aligned to 3a:01.0
>   - This leaves 346M for other things on bus 3a, but it's not all
>     contiguous because the 20M is in the middle.
>   - The remaining 346M might be 1M on one side and 345M on the other
>     (and there are many other possibilities, e.g., 3M + 343M, 5M +
>     341M, ..., 345M + 1M).
>   - The current code tries to assign all 346M to 3a:04.0, which
>     fails because that space is not contiguous, so it falls back to
>     allocating 1M, which works but is insufficient for future
>     hot-adds.

My understanding is that the 20M is aligned to 2M, so we need to take
that into account when we distribute the remaining space. That makes it
345 instead of the 346 it would be without the alignment.

> Obviously this patch makes *this* situation work: it assigns 345M to
> 3a:04.0 and (I assume) leaves the 1M unused.  But I haven't been able
> to convince myself that this patch works *in general*.

I've tested this patch with a full chain of devices, with all three of
my Intel Gigabit ET2 quad port server adapters connected along with
other devices, and the issue does not happen.

> For example, what if we assigned the 20M from the end of the 366M
> window instead of the beginning, so the 345M piece is below the 20M
> and there's 1M left above it?  That is legal and should work, but I
> suspect this patch would ignore the 345M piece and again assign 1M to
> 3a:04.0.

It should work so that resources are first allocated for the non-hotplug
bridges, and after that everything else is given to the hotplug bridges.

> Or what if there are several hotplug bridges on bus 3a?  This example
> has two, but there could be many more.
> 
> Or what if there are normal bridges as well as hotplug bridges on bus
> 3a?  Or if they're in arbitrary orders?

A Thunderbolt host router with two ports has such a configuration: there
are two hotplug ports and two normal ports (there could be more), and it
is hot-added as well. At least that works. With the other arbitrary
scenarios, it is hard to say without actually testing on real hardware.

> > Fixes: 1a5767725cec ("PCI: Distribute available resources to hotplug-capable bridges")
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: stable@vger.kernel.org
> 
> Given my confusion about this, I doubt this satisfies the stable
> kernel "obviously correct" rule.

Fair enough.

> s/substracting/subtracting/ above

OK, thanks.

Also, I'm fine with dropping this patch altogether and just filing a
kernel bugzilla with this information attached. Maybe someone else can
provide a better fix eventually. This is not really a common situation
anyway, because typically a Thunderbolt device only includes PCIe
endpoints (not PCIe switches with a bunch of endpoints connected).
Furthermore, I tried the same in Windows and it does not handle it
properly either ;-)


* Re: [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources
  2018-04-26 12:23     ` Mika Westerberg
@ 2018-05-01 20:32       ` Bjorn Helgaas
  2018-05-03 12:39         ` Mika Westerberg
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2018-05-01 20:32 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Thu, Apr 26, 2018 at 03:23:33PM +0300, Mika Westerberg wrote:
> On Wed, Apr 25, 2018 at 05:38:54PM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 16, 2018 at 01:34:46PM +0300, Mika Westerberg wrote:
> > > When hot-adding a PCIe switch the way we currently distribute resources
> > > does not always work well because devices connected to the switch might
> > > need to have their MMIO resources aligned to something else than the
> > > default 1 MB boundary. For example Intel Gigabit ET2 quad port server
> > > adapter includes PCIe switch leading to 4 x GbE NIC devices that want
> > > to have their MMIO resources aligned to 2 MB boundary instead.
> > > 
> > > The current resource distribution code does not take this alignment into
> > > account and might try to add too much resources for the extension
> > > hotplug bridge(s). The resulting bridge window is too big which makes
> > > the resource assignment operation fail, and we are left with a bridge
> > > window with minimal amount (1 MB) of MMIO space.
> > > 
> > > Here is what happens when an Intel Gigabit ET2 quad port server adapter
> > > is hot-added:
> > > 
> > >   pci 0000:39:00.0: BAR 14: assigned [mem 0x53300000-0x6a0fffff]
> > >                                           ^^^^^^^^^^
> > >   pci 0000:3a:01.0: BAR 14: assigned [mem 0x53400000-0x547fffff]
> > >                                           ^^^^^^^^^^
> > > The above shows that the downstream bridge (3a:01.0) window is aligned
> > > to 2 MB instead of 1 MB as is the upstream bridge (39:00.0) window. The
> > > remaining MMIO space (0x15a00000) is assigned to the hotplug bridge
> > > (3a:04.0) but it fails:
> > > 
> > >   pci 0000:3a:04.0: BAR 14: no space for [mem size 0x15a00000]
> > >   pci 0000:3a:04.0: BAR 14: failed to assign [mem size 0x15a00000]
> > > 
> > > The MMIO resource is calculated as follows:
> > > 
> > >   start = 0x54800000
> > >   end = 0x54800000 + 0x15a00000 - 1 = 0x6a1fffff
> > > 
> > > This results in a bridge window [mem 0x54800000-0x6a1fffff], which ends
> > > beyond the upstream bridge window [mem 0x53300000-0x6a0fffff], explaining
> > > the above failure. Because of this, Linux falls back to the default
> > > allocation of 1 MB, as can be seen from the 'lspci' output:
> > > 
> > >  39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
> > >    3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
> > >    3a:04.0 Memory behind bridge: 53300000-533fffff [size=1M]
> > > 
> > > The hotplug bridge 3a:04.0 only occupies 1 MB MMIO window which is
> > > clearly not enough for extending the PCIe topology later if more devices
> > > are to be hot-added.
> > > 
> > > Fix this by subtracting the properly aligned non-hotplug downstream bridge
> > > window size from the remaining resources used for extension. After this
> > > change the resource allocation looks like:
> > > 
> > >   39:00.0 Memory behind bridge: 53300000-6a0fffff [size=366M]
> > >     3a:01.0 Memory behind bridge: 53400000-547fffff [size=20M]
> > >     3a:04.0 Memory behind bridge: 54800000-6a0fffff [size=345M]
> > > 
> > > This matches the expectation. All the extra MMIO resource space (345 MB)
> > > is allocated to the extension hotplug bridge (3a:04.0).
> > 
> > Sorry, I've spent a lot of time trying to trace through this code, and
> > I'm still hopelessly confused.  Can you post the complete "lspci -vv"
> > output and the dmesg log (including the hot-add event) somewhere and
> > include a URL to it?
> 
> I sent you the logs and lspci output both with and without this patch
> when I connect a full chain of 6 Thunderbolt devices where 3 of them
> include those NICs with 4 ethernet ports. The resulting topology
> includes total of 6 + 3 + 1 PCIe switches.

Thanks, I opened https://bugzilla.kernel.org/show_bug.cgi?id=199581
and attached the info you sent.

> > I think I understand the problem you're solving:
> > 
> >   - You have 366M, 1M-aligned, available for things on bus 3a
> >   - You assign 20M, 2M-aligned to 3a:01.0
> >   - This leaves 346M for other things on bus 3a, but it's not all
> >     contiguous because the 20M is in the middle.
> >   - The remaining 346M might be 1M on one side and 345M on the other
> >     (and there are many other possibilities, e.g., 3M + 343M, 5M +
> >     341M, ..., 345M + 1M).
> >   - The current code tries to assign all 346M to 3a:04.0, which
> >     fails because that space is not contiguous, so it falls back to
> >     allocating 1M, which works but is insufficient for future
> >     hot-adds.
> 
> My understanding is that the 20M is aligned to 2M, so we need to take
> that into account when we distribute the remaining space, which makes it
> 345M instead of the 346M it would be without the alignment.

I think that's what I said above, or did I miss something?

> > Obviously this patch makes *this* situation work: it assigns 345M to
> > 3a:04.0 and (I assume) leaves the 1M unused.  But I haven't been able
> > to convince myself that this patch works *in general*.
> 
> I've tested this patch with a full chain of devices, with all three of my
> Intel Gigabit ET2 quad port server adapters connected along with other
> devices, and the issue does not happen.
> 
> > For example, what if we assigned the 20M from the end of the 366M
> > window instead of the beginning, so the 345M piece is below the 20M
> > and there's 1M left above it?  That is legal and should work, but I
> > suspect this patch would ignore the 345M piece and again assign 1M to
> > 3a:04.0.
> 
> It should work so that resources are first allocated for the non-hotplug
> bridges, and after that everything else is given to the hotplug bridges.
> 
> > Or what if there are several hotplug bridges on bus 3a?  This example
> > has two, but there could be many more.
> > 
> > Or what if there are normal bridges as well as hotplug bridges on bus
> > 3a?  Or if they're in arbitrary orders?
> 
> A Thunderbolt host router with two ports has such a configuration: there
> are two hotplug ports and two normal ports (there could be more), and it
> is hot-added as well. At least that works. For the other arbitrary
> scenarios, it is hard to say without actually testing on real hardware.

This is where it gets hard for me -- I'm not really comfortable if we
have to convince ourselves that code is correct by testing every
scenario.  It's a lot better if we can convince ourselves by reasoning
about what the code does.  That's not very reliable either, but if we
understand the code, we at least have a hope of being able to fix the
bugs we missed in our reasoning.

> Also, I'm fine with dropping this patch altogether and just filing a
> kernel bugzilla entry with this information attached. Maybe someone else
> can provide a better fix eventually. This is not really a common
> situation anyway, because typically only PCIe endpoints are included in
> a Thunderbolt device (not PCIe switches with a bunch of endpoints
> connected). Furthermore, I tried the same in Windows and it does not
> handle it properly either ;-)

OK, I opened the bugzilla and attached the info.  Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-04-16 10:34 ` [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume Mika Westerberg
@ 2018-05-01 21:52   ` Bjorn Helgaas
  2018-05-02 11:55     ` Mika Westerberg
  2018-05-04  7:18     ` Lukas Wunner
  0 siblings, 2 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2018-05-01 21:52 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

Hi Mika,

On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> After a suspend/resume cycle the Presence Detect or Data Link Layer
> Status Changed bits might be set, and if we don't clear them those
> events will not fire anymore and nothing happens when, for instance, a
> device is hot-unplugged.
> 
> Fix this by clearing those bits in a newly introduced function,
> pcie_reenable_notification(). This should be fine because immediately
> afterwards we check whether the adapter is still present by reading
> directly from the status register.
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: stable@vger.kernel.org

I want to understand why we need to handle this differently between
the boot-time probe path and the resume path.

This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
resume path:

  pciehp_resume
    pcie_reenable_notification
      # clear PDC DLLSC
      pcie_enable_notification
        # set DLLSCE ABPE/PDCE HPIE CCIE
    pciehp_get_adapter_status
      # read PCI_EXP_SLTSTA

We used to clear PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
probe path:

  pciehp_probe
    pcie_init
      # clear PDC ABP PFD MRLSC CC DLSCC
    pcie_init_notification
      pciehp_request_irq
      pcie_enable_notification
        # set DLLSCE ABPE/PDCE HPIE CCIE
    pciehp_get_adapter_status
      # read PCI_EXP_SLTSTA
    pciehp_get_power_status

db63d40017a5 ("PCI: pciehp: Do not clear Presence Detect Changed
during initialization") changed the probe path so we don't clear
PCI_EXP_SLTSTA_PDC there anymore.  This was so we wouldn't miss a PDC
interrupt that happened before probe.

Why can't we handle probe and resume the same way?  They look quite
similar.

You say this patch should be fine because we read SLTSTA immediately
after clearing PDC and DLLSC.  But we also read SLTSTA in the probe
path, so I'm not sure why we need to clear PDC and DLLSC for resume
but not for probe.

> ---
>  drivers/pci/hotplug/pciehp.h      |  2 +-
>  drivers/pci/hotplug/pciehp_core.c |  2 +-
>  drivers/pci/hotplug/pciehp_hpc.c  | 13 ++++++++++++-
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index 88e917c9120f..5f892065585e 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -121,7 +121,7 @@ struct controller *pcie_init(struct pcie_device *dev);
>  int pcie_init_notification(struct controller *ctrl);
>  int pciehp_enable_slot(struct slot *p_slot);
>  int pciehp_disable_slot(struct slot *p_slot);
> -void pcie_enable_notification(struct controller *ctrl);
> +void pcie_reenable_notification(struct controller *ctrl);
>  int pciehp_power_on_slot(struct slot *slot);
>  void pciehp_power_off_slot(struct slot *slot);
>  void pciehp_get_power_status(struct slot *slot, u8 *status);
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 332b723ff9e6..44a6a63802d5 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -283,7 +283,7 @@ static int pciehp_resume(struct pcie_device *dev)
>  	ctrl = get_service_data(dev);
>  
>  	/* reinitialize the chipset's event detection logic */
> -	pcie_enable_notification(ctrl);
> +	pcie_reenable_notification(ctrl);
>  
>  	slot = ctrl->slot;
>  
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 18a42f8f5dc5..98ea75aa32c7 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -659,7 +659,7 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
>  	return handled;
>  }
>  
> -void pcie_enable_notification(struct controller *ctrl)
> +static void pcie_enable_notification(struct controller *ctrl)
>  {
>  	u16 cmd, mask;
>  
> @@ -697,6 +697,17 @@ void pcie_enable_notification(struct controller *ctrl)
>  		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
>  }
>  
> +void pcie_reenable_notification(struct controller *ctrl)
> +{
> +	/*
> +	 * Clear both Presence and Data Link Layer Changed to make sure
> +	 * those events still fire after we have re-enabled them.
> +	 */
> +	pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_SLTSTA,
> +				   PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
> +	pcie_enable_notification(ctrl);
> +}
> +
>  static void pcie_disable_notification(struct controller *ctrl)
>  {
>  	u16 mask;
> -- 
> 2.16.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-01 21:52   ` Bjorn Helgaas
@ 2018-05-02 11:55     ` Mika Westerberg
  2018-05-02 13:41       ` Bjorn Helgaas
  2018-05-04  7:18     ` Lukas Wunner
  1 sibling, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-05-02 11:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Tue, May 01, 2018 at 04:52:11PM -0500, Bjorn Helgaas wrote:
> Hi Mika,
> 
> On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> > After a suspend/resume cycle the Presence Detect or Data Link Layer
> > Status Changed bits might be set, and if we don't clear them those
> > events will not fire anymore and nothing happens when, for instance, a
> > device is hot-unplugged.
> > 
> > Fix this by clearing those bits in a newly introduced function,
> > pcie_reenable_notification(). This should be fine because immediately
> > afterwards we check whether the adapter is still present by reading
> > directly from the status register.
> > 
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: stable@vger.kernel.org
> 
> I want to understand why we need to handle this differently between
> the boot-time probe path and the resume path.
> 
> This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> resume path:
> 
>   pciehp_resume
>     pcie_reenable_notification
>       # clear PDC DLLSC
>       pcie_enable_notification
>         # set DLLSCE ABPE/PDCE HPIE CCIE
>     pciehp_get_adapter_status
>       # read PCI_EXP_SLTSTA
> 
> We used to clear PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> probe path:
> 
>   pciehp_probe
>     pcie_init
>       # clear PDC ABP PFD MRLSC CC DLSCC
>     pcie_init_notification
>       pciehp_request_irq
>       pcie_enable_notification
>         # set DLLSCE ABPE/PDCE HPIE CCIE
>     pciehp_get_adapter_status
>       # read PCI_EXP_SLTSTA
>     pciehp_get_power_status
> 
> db63d40017a5 ("PCI: pciehp: Do not clear Presence Detect Changed
> during initialization") changed the probe path so we don't clear
> PCI_EXP_SLTSTA_PDC there anymore.  This was so we wouldn't miss a PDC
> interrupt that happened before probe.
> 
> Why can't we handle probe and resume the same way?  They look quite
> similar.
> 
> You say this patch should be fine because we read SLTSTA immediately
> after clearing PDC and DLLSC.  But we also read SLTSTA in the probe
> path, so I'm not sure why we need to clear PDC and DLLSC for resume
> but not for probe.

On the probe path we do read the status, but here is what the code does:

	pcie_init_notification()
	...
        pciehp_get_adapter_status(slot, &occupied);
	...
        if (occupied && pciehp_force) {
                mutex_lock(&slot->hotplug_lock);
                pciehp_enable_slot(slot);
                mutex_unlock(&slot->hotplug_lock);
        }

If you don't have "pciehp.pciehp_force=1" in your kernel command line
you miss the fact that the card is already there. Obviously you can't
expect ordinary users to pass random command line options to get their
already connected device detected by Linux.

So the reason why in probe we don't clear PDC is that once the interrupt
is unmasked, you get an interrupt and the card gets detected properly.

On resume path we already check whether the card is there or not and
handle it accordingly. However, if we don't clear PDC and DLLSC bits we
will never get hotplug interrupt again.

Now, you ask why can't we handle probe and resume the same way? I think
we can if we could get rid of that pciehp_force thing but it seems to
fix an issue on some HP hardware. See 9e5858244926 ("pciehp: don't
enable slot unless forced").

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-02 11:55     ` Mika Westerberg
@ 2018-05-02 13:41       ` Bjorn Helgaas
  2018-05-03 10:42         ` Mika Westerberg
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2018-05-02 13:41 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Wed, May 02, 2018 at 02:55:09PM +0300, Mika Westerberg wrote:
> On Tue, May 01, 2018 at 04:52:11PM -0500, Bjorn Helgaas wrote:
> > Hi Mika,
> > 
> > On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> > > After a suspend/resume cycle the Presence Detect or Data Link Layer
> > > Status Changed bits might be set, and if we don't clear them those
> > > events will not fire anymore and nothing happens when, for instance, a
> > > device is hot-unplugged.
> > > 
> > > Fix this by clearing those bits in a newly introduced function,
> > > pcie_reenable_notification(). This should be fine because immediately
> > > afterwards we check whether the adapter is still present by reading
> > > directly from the status register.
> > > 
> > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > Cc: stable@vger.kernel.org
> > 
> > I want to understand why we need to handle this differently between
> > the boot-time probe path and the resume path.
> > 
> > This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > resume path:
> > 
> >   pciehp_resume
> >     pcie_reenable_notification
> >       # clear PDC DLLSC
> >       pcie_enable_notification
> >         # set DLLSCE ABPE/PDCE HPIE CCIE
> >     pciehp_get_adapter_status
> >       # read PCI_EXP_SLTSTA
> > 
> > We used to clear PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > probe path:
> > 
> >   pciehp_probe
> >     pcie_init
> >       # clear PDC ABP PFD MRLSC CC DLSCC
> >     pcie_init_notification
> >       pciehp_request_irq
> >       pcie_enable_notification
> >         # set DLLSCE ABPE/PDCE HPIE CCIE
> >     pciehp_get_adapter_status
> >       # read PCI_EXP_SLTSTA
> >     pciehp_get_power_status
> > 
> > db63d40017a5 ("PCI: pciehp: Do not clear Presence Detect Changed
> > during initialization") changed the probe path so we don't clear
> > PCI_EXP_SLTSTA_PDC there anymore.  This was so we wouldn't miss a PDC
> > interrupt that happened before probe.
> > 
> > Why can't we handle probe and resume the same way?  They look quite
> > similar.
> > 
> > You say this patch should be fine because we read SLTSTA immediately
> > after clearing PDC and DLLSC.  But we also read SLTSTA in the probe
> > path, so I'm not sure why we need to clear PDC and DLLSC for resume
> > but not for probe.
> 
> On probe path we read status but here is what it does:
> 
> 	pcie_init_notification()
> 	...
>         pciehp_get_adapter_status(slot, &occupied);
> 	...
>         if (occupied && pciehp_force) {
>                 mutex_lock(&slot->hotplug_lock);
>                 pciehp_enable_slot(slot);
>                 mutex_unlock(&slot->hotplug_lock);
>         }
> 
> If you don't have "pciehp.pciehp_force=1" in your kernel command line
> you miss the fact that the card is already there. Obviously you can't
> expect ordinary users to pass random command line options to get their
> already connected device detected by Linux.

Yeah, definitely not, that's really ugly.

> So the reason why in probe we don't clear PDC is that once the interrupt
> is unmasked, you get an interrupt and the card gets detected properly.
> 
> On resume path we already check whether the card is there or not and
> handle it accordingly. However, if we don't clear PDC and DLLSC bits we
> will never get hotplug interrupt again.
> 
> Now, you ask why can't we handle probe and resume the same way? I think
> we can if we could get rid of that pciehp_force thing but it seems to
> fix an issue on some HP hardware. See 9e5858244926 ("pciehp: don't
> enable slot unless forced").

Ugh.  That's really ancient history.  I would *love* to get rid of
pciehp.pciehp_force.  There's not much detail in 9e5858244926, but
maybe with some digging we can figure out something.

I'd rather do the digging now and try to simplify this area instead of
adding another tweak.

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used
       [not found]   ` <20180502204932.GG11698@bhelgaas-glaptop.roam.corp.google.com>
@ 2018-05-03 10:22     ` Mika Westerberg
  2018-05-05  0:04       ` Bjorn Helgaas
  0 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-05-03 10:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Wed, May 02, 2018 at 03:49:32PM -0500, Bjorn Helgaas wrote:
> On Mon, Apr 16, 2018 at 01:34:48PM +0300, Mika Westerberg wrote:
> > When a system is using native PCIe hotplug for Thunderbolt and the
> > controller is not enabled for full RTD3 (runtime D3) it will be only
> > present in the system when there is a device connected. This pretty much
> > follows the BIOS assisted hotplug behaviour.
> 
> This is a tangent, but what exactly does "controller is not enabled
> for full RTD3 (runtime D3)" mean?  The way that's worded sounds like
> it *could* be a setting in a PCI config register, but I suspect it's
> really something in Linux, e.g., some bit in struct pci_dev or device?

It means that the controller can go into D3 at runtime. In BIOS assisted
mode and in native PCIe mode, the controller is hot-removed when there is
nothing connected. In RTD3 mode it is always present, and the OS is
expected to put it into D3 when idle using the standard PCI PM registers.

> > Thunderbolt host router integrated PCIe switch has two additional PCIe
> > downstream bridges that lead to NHI (Thunderbolt host controller) and xHCI
> > (USB 3 host controller) respectively. These downstream bridges are not
> > marked as hotplug capable. The reason for that is to preserve
> > resources. Otherwise the OS would distribute the remaining resources
> > between all downstream bridges, making these two bridges consume
> > precious resources of the actual hotplug bridges.
> > 
> > Now, because these two bridges are not marked as hotplug capable, the
> > OS will not enable the hotplug interrupt for them either and will not
> > receive an interrupt when devices behind them are hot-added. The
> > solution to this is that the BIOS sends ACPI Notify() to the root port
> > to let the OS know it needs to rescan for added and/or removed devices.
> 
> We're building stuff based on "is_hotplug_bridge", but I'm not sure
> that bit is really useful.
> 
> set_pcie_hotplug_bridge() sets it for downstream ports with
> PCI_EXP_SLTCAP_HPC, which is easy enough to understand.
> 
> In acpiphp, check_hotplug_bridge() sets in some scenario that I can't
> quite figure out.  I assume it's based on something in the namespace.
> 
> But it seems like the whole point of this patch is that even if a
> bridge doesn't have "is_hotplug_bridge" set, ACPI hotplug can hot-add
> devices below it.

Correct.

> So I'm not sure what "is_hotplug_bridge" is really telling us.  Is it
> just a hint about how many resources we want to assign to the bridge,
> i.e., we assign only a few when it's not set and more if it is set?

Good question. I did not invent the flag, so it is hard to say. I've
been using it because IIRC you prefer it.

> > Here is how the mechanism is supposed to work when a Thunderbolt
> > endpoint is connected to one of the ports. In case of a standard USB-C
> > device only the xHCI is hot-added otherwise steps are the same.
> > 
> > 1. Initially there is only the PCIe root port that is controlled by
> >    the pciehp driver
> > 
> >   00:1b.0 (Hotplug+) --
> > 
> > 2. Then we get native PCIe hotplug interrupt and once it is handled the
> >    topology looks as following
> > 
> >   00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 --
> >                                   +- 02:01.0 (HotPlug+)
> >                                   \- 02:02.0 --
> > 
> > 3. Bridges 02:00.0 and 02:02.0 are not marked as hotplug capable and
> 
> By "not marked as hotplug capable", I guess you mean
> PCI_EXP_SLTCAP_HPC is not set, right?

Right.

> >    they don't have anything behind them currently. Bridge 02:01.0 is
> >    hotplug capable and used for extending the topology. At this point
> >    the required PCIe devices are enabled and ACPI Notify() is sent to
> >    the root port. The resulting topology is expected to look like
> > 
> >   00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 -- Thunderbolt host controller
> >                                   +- 02:01.0 (HotPlug+)
> >                                   \- 02:02.0 -- xHCI host controller
> > 
> > However, the current ACPI hotplug implementation scans the whole 00:1b.0
> > hotplug slot and everything behind it, regardless of whether native PCIe
> > hotplug is used or not, and it expects that the BIOS has configured the
> > bridge resources upfront. If that's not the case, it assigns resources
> > using minimal allocation (everything currently found just barely fits),
> > preventing future extension. In addition, if there is another native
> > PCIe hotplug operation going on, we may find the new PCIe switch only
> > partially ready (not all links are fully trained yet), confusing pciehp
> > when it finally starts to enumerate the new devices.
> > 
> > To make this work better with the native PCIe hotplug driver (pciehp),
> > we let it handle all slot management and resource allocation for hotplug
> > bridges and restrict ACPI hotplug to non-hotplug bridges.
> > 
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/pci/hotplug/acpiphp.h      |  1 +
> >  drivers/pci/hotplug/acpiphp_glue.c | 73 ++++++++++++++++++++++++++++++--------
> >  2 files changed, 59 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/pci/hotplug/acpiphp.h b/drivers/pci/hotplug/acpiphp.h
> > index e438a2d734f2..8dcfd623ef1d 100644
> > --- a/drivers/pci/hotplug/acpiphp.h
> > +++ b/drivers/pci/hotplug/acpiphp.h
> > @@ -158,6 +158,7 @@ struct acpiphp_attention_info
> >  
> >  #define SLOT_ENABLED		(0x00000001)
> >  #define SLOT_IS_GOING_AWAY	(0x00000002)
> > +#define SLOT_IS_NATIVE		(0x00000004)
> >  
> >  /* function flags */
> >  
> > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > index b45b375c0e6c..5efa21cdddc9 100644
> > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > @@ -282,6 +282,9 @@ static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
> >  	slot->device = device;
> >  	INIT_LIST_HEAD(&slot->funcs);
> >  
> > +	if (pdev && pciehp_is_native(pdev))
> > +		slot->flags |= SLOT_IS_NATIVE;
> 
> If I understand correctly, pciehp_is_native() checks
> root->osc_control_set, so for everything under the same host bridge
> (PNP0A03 device), it gives the same answer.

Correct.

> If so, the name "SLOT_IS_NATIVE" seems a little misleading because
> there are very often several slots under the same host bridge, and
> they will be either all native or all non-native.

I used it because it works with pciehp_is_native(). I can rename it if
you have a better name.

> But I must be missing something because it seems like the whole point
> of this patch is to treat part of the Thunderbolt host router as
> native and another part as non-native.  Help, I'm confused :)

No, you are not confused. That is exactly what we are trying to fix with
this patch. These systems (already out there in the wild) are in
"native" mode. As I tried to explain in the changelog, hotplugging PCIe
devices to hotplug-capable PCIe downstream ports is done using native
PCIe hotplug. However, the xHCI (USB 3) controller and the NHI
(Thunderbolt controller) are hot-added using ACPI Notify(). Before you
ask: no, it is not violating any spec, because these two downstream
ports are not under the control of pciehp (they don't have the hotplug
slot capability set):
  00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 -- Thunderbolt host controller
                                  +- 02:01.0 (HotPlug+)
                                  \- 02:02.0 -- xHCI host controller

Hope this clarifies.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-02 13:41       ` Bjorn Helgaas
@ 2018-05-03 10:42         ` Mika Westerberg
  2018-05-03 23:01           ` Bjorn Helgaas
  0 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-05-03 10:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Wed, May 02, 2018 at 08:41:48AM -0500, Bjorn Helgaas wrote:
> On Wed, May 02, 2018 at 02:55:09PM +0300, Mika Westerberg wrote:
> > On Tue, May 01, 2018 at 04:52:11PM -0500, Bjorn Helgaas wrote:
> > > Hi Mika,
> > > 
> > > On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> > > > After a suspend/resume cycle the Presence Detect or Data Link Layer
> > > > Status Changed bits might be set, and if we don't clear them those
> > > > events will not fire anymore and nothing happens when, for instance,
> > > > a device is hot-unplugged.
> > > > 
> > > > Fix this by clearing those bits in a newly introduced function,
> > > > pcie_reenable_notification(). This should be fine because immediately
> > > > afterwards we check whether the adapter is still present by reading
> > > > directly from the status register.
> > > > 
> > > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > Cc: stable@vger.kernel.org
> > > 
> > > I want to understand why we need to handle this differently between
> > > the boot-time probe path and the resume path.
> > > 
> > > This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > > resume path:
> > > 
> > >   pciehp_resume
> > >     pcie_reenable_notification
> > >       # clear PDC DLLSC
> > >       pcie_enable_notification
> > >         # set DLLSCE ABPE/PDCE HPIE CCIE
> > >     pciehp_get_adapter_status
> > >       # read PCI_EXP_SLTSTA
> > > 
> > > We used to clear PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > > probe path:
> > > 
> > >   pciehp_probe
> > >     pcie_init
> > >       # clear PDC ABP PFD MRLSC CC DLSCC
> > >     pcie_init_notification
> > >       pciehp_request_irq
> > >       pcie_enable_notification
> > >         # set DLLSCE ABPE/PDCE HPIE CCIE
> > >     pciehp_get_adapter_status
> > >       # read PCI_EXP_SLTSTA
> > >     pciehp_get_power_status
> > > 
> > > db63d40017a5 ("PCI: pciehp: Do not clear Presence Detect Changed
> > > during initialization") changed the probe path so we don't clear
> > > PCI_EXP_SLTSTA_PDC there anymore.  This was so we wouldn't miss a PDC
> > > interrupt that happened before probe.
> > > 
> > > Why can't we handle probe and resume the same way?  They look quite
> > > similar.
> > > 
> > > You say this patch should be fine because we read SLTSTA immediately
> > > after clearing PDC and DLLSC.  But we also read SLTSTA in the probe
> > > path, so I'm not sure why we need to clear PDC and DLLSC for resume
> > > but not for probe.
> > 
> > On probe path we read status but here is what it does:
> > 
> > 	pcie_init_notification()
> > 	...
> >         pciehp_get_adapter_status(slot, &occupied);
> > 	...
> >         if (occupied && pciehp_force) {
> >                 mutex_lock(&slot->hotplug_lock);
> >                 pciehp_enable_slot(slot);
> >                 mutex_unlock(&slot->hotplug_lock);
> >         }
> > 
> > If you don't have "pciehp.pciehp_force=1" in your kernel command line
> > you miss the fact that the card is already there. Obviously you can't
> > expect ordinary users to pass random command line options to get their
> > already connected device detected by Linux.
> 
> Yeah, definitely not, that's really ugly.
> 
> > So the reason why in probe we don't clear PDC is that once the interrupt
> > is unmasked, you get an interrupt and the card gets detected properly.
> > 
> > On resume path we already check whether the card is there or not and
> > handle it accordingly. However, if we don't clear PDC and DLLSC bits we
> > will never get hotplug interrupt again.
> > 
> > Now, you ask why can't we handle probe and resume the same way? I think
> > we can if we could get rid of that pciehp_force thing but it seems to
> > fix an issue on some HP hardware. See 9e5858244926 ("pciehp: don't
> > enable slot unless forced").
> 
> Ugh.  That's really ancient history.  I would *love* to get rid of
> pciehp.pciehp_force.  There's not much detail in 9e5858244926, but
> maybe with some digging we can figure out something.
> 
> I'd rather do the digging now and try to simplify this area instead of
> adding another tweak.

I did some digging but unfortunately it is still not clear what issue
9e5858244926 is fixing. There is no bugzilla link and I was not able to
find any discussion around this either.

However, I think currently pciehp_force=1 does not actually do what it
was intended to do. The reason is that the portdrv core already asks the
BIOS whether native PCIe hotplug is allowed and, if not, it does not set
PCIE_PORT_SERVICE_HP, which prevents the whole device from being
created. So even if you specify pciehp_force=1 on the command line, it
does not bypass the BIOS setting.

Furthermore, we used to have a similar check in pciehp_resume() but it
was removed with 87683e22c646 ("PCI: pciehp: Always implement resume,
regardless of pciehp_force param") because it broke resume.

If I understand correctly you want me to change the driver so that I
remove pciehp_force check from probe. Then I can revert db63d40017a5
("PCI: pciehp: Do not clear Presence Detect Changed during
initialization") and that makes both probe path and resume path similar
(when this patch is included). Is that correct? I can do that in the
next version of the patch series.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources
  2018-05-01 20:32       ` Bjorn Helgaas
@ 2018-05-03 12:39         ` Mika Westerberg
  0 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-05-03 12:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Tue, May 01, 2018 at 03:32:46PM -0500, Bjorn Helgaas wrote:
> This is where it gets hard for me -- I'm not really comfortable if we
> have to convince ourselves that code is correct by testing every
> scenario.  It's a lot better if we can convince ourselves by reasoning
> about what the code does.  That's not very reliable either, but if we
> understand the code, we at least have a hope of being able to fix the
> bugs we missed in our reasoning.

I took another look at the code and we can calculate everything upfront
before resources get distributed to hotplug bridges. I also tried it out
and it still works on my test systems. Does the patch below look more
acceptable to you?

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 072784f55ea5..cbc80dd5e8d8 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1881,6 +1881,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 	unsigned int normal_bridges = 0, hotplug_bridges = 0;
 	struct resource *io_res, *mmio_res, *mmio_pref_res;
 	struct pci_dev *dev, *bridge = bus->self;
+	resource_size_t align, io_align, mem_align;
 
 	io_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
 	mmio_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
@@ -1919,27 +1920,43 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 			normal_bridges++;
 	}
 
+	io_align = window_alignment(bus, IORESOURCE_IO);
+	mem_align = window_alignment(bus, IORESOURCE_MEM);
+
 	for_each_pci_bridge(dev, bus) {
-		const struct resource *res;
+		struct resource *res;
 
 		if (dev->is_hotplug_bridge)
 			continue;
 
 		/*
 		 * Reduce the available resource space by what the
-		 * bridge and devices below it occupy.
+		 * bridge and devices below it occupy taking into
+		 * account alignment if it differs from the default.
 		 */
 		res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
-		if (!res->parent && available_io > resource_size(res))
+		if (!res->parent && available_io > resource_size(res)) {
 			remaining_io -= resource_size(res);
+			align = pci_resource_alignment(dev, res);
+			if (align > io_align)
+				remaining_io -= align - io_align;
+		}
 
 		res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
-		if (!res->parent && available_mmio > resource_size(res))
+		if (!res->parent && available_mmio > resource_size(res)) {
 			remaining_mmio -= resource_size(res);
+			align = pci_resource_alignment(dev, res);
+			if (align > mem_align)
+				remaining_mmio -= align - mem_align;
+		}
 
 		res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
-		if (!res->parent && available_mmio_pref > resource_size(res))
+		if (!res->parent && available_mmio_pref > resource_size(res)) {
 			remaining_mmio_pref -= resource_size(res);
+			align = pci_resource_alignment(dev, res);
+			if (align > mem_align)
+				remaining_mmio_pref -= align - mem_align;
+		}
 	}
 
 	/*
@@ -1964,7 +1981,7 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 				available_io, available_mmio,
 				available_mmio_pref);
 		} else if (dev->is_hotplug_bridge) {
-			resource_size_t align, io, mmio, mmio_pref;
+			resource_size_t io, mmio, mmio_pref;
 
 			/*
 			 * Distribute available extra resources equally
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-03 10:42         ` Mika Westerberg
@ 2018-05-03 23:01           ` Bjorn Helgaas
  2018-05-04  7:20             ` Mika Westerberg
  2018-05-30 10:40             ` Lukas Wunner
  0 siblings, 2 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2018-05-03 23:01 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Thu, May 03, 2018 at 01:42:50PM +0300, Mika Westerberg wrote:
> On Wed, May 02, 2018 at 08:41:48AM -0500, Bjorn Helgaas wrote:
> > On Wed, May 02, 2018 at 02:55:09PM +0300, Mika Westerberg wrote:
> > > On Tue, May 01, 2018 at 04:52:11PM -0500, Bjorn Helgaas wrote:
> > > > Hi Mika,
> > > > 
> > > > On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> > > > > After suspend/resume cycle the Presence Detect or Data Link Layer Status
> > > > > Changed bits might be set and if we don't clear them those events will
> > > > > not fire anymore and nothing happens for instance when a device is now
> > > > > hot-unplugged.
> > > > > 
> > > > > Fix this by clearing those bits in a newly introduced function
> > > > > pcie_reenable_notification(). This should be fine because immediately
> > > > > after we check if the adapter is still present by reading directly from
> > > > > the status register.
> > > > > 
> > > > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > > Cc: stable@vger.kernel.org
> > > > 
> > > > I want to understand why we need to handle this differently between
> > > > the boot-time probe path and the resume path.
> > > > 
> > > > This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > > > resume path:
> > > > 
> > > >   pciehp_resume
> > > >     pcie_reenable_notification
> > > >       # clear PDC DLLSC
> > > >       pcie_enable_notification
> > > >         # set DLLSCE ABPE/PDCE HPIE CCIE
> > > >     pciehp_get_adapter_status
> > > >       # read PCI_EXP_SLTSTA
> > > > 
> > > > We used to clear PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > > > probe path:
> > > > 
> > > >   pciehp_probe
> > > >     pcie_init
> > > >       # clear PDC ABP PFD MRLSC CC DLSCC
> > > >     pcie_init_notification
> > > >       pciehp_request_irq
> > > >       pcie_enable_notification
> > > >         # set DLLSCE ABPE/PDCE HPIE CCIE
> > > >     pciehp_get_adapter_status
> > > >       # read PCI_EXP_SLTSTA
> > > >     pciehp_get_power_status
> > > > 
> > > > db63d40017a5 ("PCI: pciehp: Do not clear Presence Detect Changed
> > > > during initialization") changed the probe path so we don't clear
> > > > PCI_EXP_SLTSTA_PDC there anymore.  This was so we wouldn't miss a PDC
> > > > interrupt that happened before probe.
> > > > 
> > > > Why can't we handle probe and resume the same way?  They look quite
> > > > similar.
> > > > 
> > > > You say this patch should be fine because we read SLTSTA immediately
> > > > after clearing PDC and DLLSC.  But we also read SLTSTA in the probe
> > > > path, so I'm not sure why we need to clear PDC and DLLSC for resume
> > > > but not for probe.
> > > 
> > > On probe path we read status but here is what it does:
> > > 
> > > 	pcie_init_notification()
> > > 	...
> > >         pciehp_get_adapter_status(slot, &occupied);
> > > 	...
> > >         if (occupied && pciehp_force) {
> > >                 mutex_lock(&slot->hotplug_lock);
> > >                 pciehp_enable_slot(slot);
> > >                 mutex_unlock(&slot->hotplug_lock);
> > >         }
> > > 
> > > If you don't have "pciehp.pciehp_force=1" in your kernel command line
> > > you miss the fact that the card is already there. Obviously you can't
> > > expect ordinary users to pass random command line options to get their
> > > already connected device detected by Linux.
> > 
> > Yeah, definitely not, that's really ugly.
> > 
> > > So the reason why in probe we don't clear PDC is that once the interrupt
> > > is unmasked, you get an interrupt and the card gets detected properly.
> > > 
> > > On resume path we already check whether the card is there or not and
> > > handle it accordingly. However, if we don't clear PDC and DLLSC bits we
> > > will never get hotplug interrupt again.
> > > 
> > > Now, you ask why can't we handle probe and resume the same way? I think
> > > we can if we could get rid of that pciehp_force thing but it seems to
> > > fix an issue on some HP hardware. See 9e5858244926 ("pciehp: don't
> > > enable slot unless forced").
> > 
> > Ugh.  That's really ancient history.  I would *love* to get rid of
> > pciehp.pciehp_force.  There's not much detail in 9e5858244926, but
> > maybe with some digging we can figure out something.
> > 
> > I'd rather do the digging now and try to simplify this area instead of
> > adding another tweak.
> 
> I did some digging but unfortunately it is still not clear what issue
> 9e5858244926 is fixing. There is no bugzilla link and I was not able to
> find any discussion around this either.
> 
> However, I think currently pciehp_force=1 does not actually do what it
> was intended to do. The reason is that the portdrv core already asks the
> BIOS whether native PCIe hotplug is allowed and, if not, it does not set
> PCIE_PORT_SERVICE_HP, which prevents the whole device from being
> created. So even if you specify pciehp_force=1 on the command line, it
> does not bypass the BIOS setting.
> 
> Furthermore, we used to have a similar check in pciehp_resume() but it
> was removed with 87683e22c646 ("PCI: pciehp: Always implement resume,
> regardless of pciehp_force param") because it broke resume.
> 
> If I understand correctly you want me to change the driver so that I
> remove pciehp_force check from probe. Then I can revert db63d40017a5
> ("PCI: pciehp: Do not clear Presence Detect Changed during
> initialization") and that makes both probe path and resume path similar
> (when this patch is included). Is that correct? I can do that in the
> next version of the patch series.

If you think we can remove pciehp_force, that would be awesome.  This
should be a separate patch all by itself, of course, and include your
reasoning above.

I would also love to revert db63d40017a5 ("PCI: pciehp: Do not clear
Presence Detect Changed during initialization") because I'm not
convinced that trying to handle interrupts that happened before
binding the driver makes sense.  It *would* make sense to me to enable
interrupts, clear the "changed" status bits so future changes will
cause interrupts, and check the "state" status bits and act on them.

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-01 21:52   ` Bjorn Helgaas
  2018-05-02 11:55     ` Mika Westerberg
@ 2018-05-04  7:18     ` Lukas Wunner
  2018-05-04  8:02       ` Mika Westerberg
  1 sibling, 1 reply; 30+ messages in thread
From: Lukas Wunner @ 2018-05-04  7:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Bjorn Helgaas, Rafael J. Wysocki, Len Brown,
	Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, linux-pci, linux-acpi

On Tue, May 01, 2018 at 04:52:11PM -0500, Bjorn Helgaas wrote:
> On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> > After suspend/resume cycle the Presence Detect or Data Link Layer Status
> > Changed bits might be set and if we don't clear them those events will
> > not fire anymore and nothing happens for instance when a device is now
> > hot-unplugged.
> > 
> > Fix this by clearing those bits in a newly introduced function
> > pcie_reenable_notification(). This should be fine because immediately
> > after we check if the adapter is still present by reading directly from
> > the status register.
> 
> I want to understand why we need to handle this differently between
> the boot-time probe path and the resume path.
> 
> This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> resume path:
> 
>   pciehp_resume
>     pcie_reenable_notification
>       # clear PDC DLLSC
>       pcie_enable_notification
>         # set DLLSCE ABPE/PDCE HPIE CCIE
>     pciehp_get_adapter_status
>       # read PCI_EXP_SLTSTA

The Slot Control register is already written during the ->resume_noirq
phase via:

  pci_pm_resume_noirq
    pci_pm_default_resume_early
    pci_restore_state
      pci_restore_pcie_state

>From that point on, slot events are enabled, but they're not received
because interrupts are disabled until the ->resume_early phase commences.

My guess is that if the hotplug port uses edge-triggered MSI/MSI-X,
an interrupt may have already been sent during ->resume_noirq, but
was lost because interrupts were disabled.  Clearing the Slot Status
register thus acknowledges any lost interrupts.

If this theory is correct, it should probably be included in the
commit message and/or a code comment so that it's clear what's
going on.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-03 23:01           ` Bjorn Helgaas
@ 2018-05-04  7:20             ` Mika Westerberg
  2018-05-30 10:40             ` Lukas Wunner
  1 sibling, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-05-04  7:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Thu, May 03, 2018 at 06:01:08PM -0500, Bjorn Helgaas wrote:
> If you think we can remove pciehp_force, that would be awesome.  This
> should be a separate patch all by itself, of course, and include your
> reasoning above.
> 
> I would also love to revert db63d40017a5 ("PCI: pciehp: Do not clear
> Presence Detect Changed during initialization") because I'm not
> convinced that trying to handle interrupts that happened before
> binding the driver makes sense.  It *would* make sense to me to enable
> interrupts, clear the "changed" status bits so future changes will
> cause interrupts, and check the "state" status bits and act on them.

OK, I'll make that a separate patch series completely.

Regarding the current patch set, do you have any additional comments /
concerns? I would like to send out v6 next week.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-04  7:18     ` Lukas Wunner
@ 2018-05-04  8:02       ` Mika Westerberg
  0 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-05-04  8:02 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Bjorn Helgaas, Rafael J. Wysocki, Len Brown,
	Mario.Limonciello, Michael Jamet, Yehezkel Bernat,
	Andy Shevchenko, linux-pci, linux-acpi

On Fri, May 04, 2018 at 09:18:27AM +0200, Lukas Wunner wrote:
> On Tue, May 01, 2018 at 04:52:11PM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> > > After suspend/resume cycle the Presence Detect or Data Link Layer Status
> > > Changed bits might be set and if we don't clear them those events will
> > > not fire anymore and nothing happens for instance when a device is now
> > > hot-unplugged.
> > > 
> > > Fix this by clearing those bits in a newly introduced function
> > > pcie_reenable_notification(). This should be fine because immediately
> > > after we check if the adapter is still present by reading directly from
> > > the status register.
> > 
> > I want to understand why we need to handle this differently between
> > the boot-time probe path and the resume path.
> > 
> > This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > resume path:
> > 
> >   pciehp_resume
> >     pcie_reenable_notification
> >       # clear PDC DLLSC
> >       pcie_enable_notification
> >         # set DLLSCE ABPE/PDCE HPIE CCIE
> >     pciehp_get_adapter_status
> >       # read PCI_EXP_SLTSTA
> 
> The Slot Control register is already written during the ->resume_noirq
> phase via:
> 
>   pci_pm_resume_noirq
>     pci_pm_default_resume_early
>     pci_restore_state
>       pci_restore_pcie_state
> 
From that point on, slot events are enabled, but they're not received
> because interrupts are disabled until the ->resume_early phase commences.
> 
> My guess is that if the hotplug port uses edge-triggered MSI/MSI-X,
> an interrupt may have already been sent during ->resume_noirq, but
> was lost because interrupts were disabled.  Clearing the Slot Status
> register thus acknowledges any lost interrupts.
> 
> If this theory is correct, it should probably be included in the
> commit message and/or a code comment so that it's clear what's
> going on.

I think I can check this by printing out the Slot status register in
pci_restore_pcie_state(). If it turns out that the interrupt happens
after we have restored Slot control register, I'll update the changelog
accordingly.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used
  2018-05-03 10:22     ` Mika Westerberg
@ 2018-05-05  0:04       ` Bjorn Helgaas
  2018-05-07 11:34         ` Mika Westerberg
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2018-05-05  0:04 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Thu, May 03, 2018 at 01:22:41PM +0300, Mika Westerberg wrote:
> On Wed, May 02, 2018 at 03:49:32PM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 16, 2018 at 01:34:48PM +0300, Mika Westerberg wrote:
> > > When a system is using native PCIe hotplug for Thunderbolt and the
> > > controller is not enabled for full RTD3 (runtime D3) it will be only
> > > present in the system when there is a device connected. This pretty much
> > > follows the BIOS assisted hotplug behaviour.
> > 
> > This is a tangent, but what exactly does "controller is not enabled
> > for full RTD3 (runtime D3)" mean?  The way that's worded sounds like
> > it *could* be a setting in a PCI config register, but I suspect it's
> > really something in Linux, e.g., some bit in struct pci_dev or device?
> 
> It means that the controller can go into D3 at runtime. With BIOS
> assisted mode and native PCIe mode, the controller is hot-removed when
> there is nothing connected. In RTD3 mode it is always there and the OS
> is expected to put it into D3 when idle using standard PCI PM
> registers.

How do I tell if a controller is enabled for runtime D3?

I'm still struggling to figure out exactly how this runtime D3 part is
relevant to this patch.  It might be behavior that happens in this
particular scenario, but I'm not sure yet that it is a required part
of the picture.

I think you're trying to separate enumeration handled by pciehp from
enumeration handled by acpiphp, and runtime D3 sounds like an
orthogonal issue.

> > > Thunderbolt host router integrated PCIe switch has two additional PCIe
> > > downstream bridges that lead to NHI (Thunderbolt host controller) and xHCI
> > > (USB 3 host controller) respectively. These downstream bridges are not
> > > marked being hotplug capable. Reason for that is to preserve resources.
> > > Otherwise the OS would distribute remaining resources between all
> > > downstream bridges making these two bridges consume precious resources
> > > of the actual hotplug bridges.
> > > 
> > > Now, because these two bridges are not marked being hotplug capable the OS
> > > will not enable hotplug interrupt for them either and will not receive
> > > interrupt when devices behind them are hot-added. Solution to this is
> > > that the BIOS sends ACPI Notify() to the root port let the OS know it
> > > needs to rescan for added and/or removed devices.
> > 
> > We're building stuff based on "is_hotplug_bridge", but I'm not sure
> > that bit is really useful.
> > 
> > set_pcie_hotplug_bridge() sets it for downstream ports with
> > PCI_EXP_SLTCAP_HPC, which is easy enough to understand.
> > 
> > In acpiphp, check_hotplug_bridge() sets it in some scenario that I can't
> > quite figure out.  I assume it's based on something in the namespace.
> > 
> > But it seems like the whole point of this patch is that even if a
> > bridge doesn't have "is_hotplug_bridge" set, ACPI hotplug can hot-add
> > devices below it.
> 
> Correct.
> 
> > So I'm not sure what "is_hotplug_bridge" is really telling us.  Is it
> > just a hint about how many resources we want to assign to the bridge,
> > i.e., we assign only a few when it's not set and more if it is set?
> 
> Good question. I did not invent the flag so hard to say. I've been using
> it because IIRC you prefer it.

It's not a question of whether I prefer it.  It's only useful if it
means something specific and we agree on what that is.

So what do you think it means?

In this code:

> > > @@ -291,7 +294,7 @@ static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
> > >          * expose slots to user space in those cases.
> > >          */
> > >         if ((acpi_pci_check_ejectable(pbus, handle) || is_dock_device(adev))
> > > -           && !(pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev))) {
> > > +           && !(slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge)) {

I *think* this part:

  slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge

means "this bridge supports hotplug but it is handled by pciehp, and
acpiphp should not expose the slot to user space".

If that's the case, maybe we should rework pciehp_is_native(bridge) so
instead of answering the question "does the OS have permission to
control PCIe hotplug in the hierarchy containing <bridge>?", it could
answer the specific question "does pciehp handle hotplug for
<bridge>?", e.g., something along these lines:

  bool pciehp_is_native(struct pci_dev *bridge)
  {
  #ifdef CONFIG_HOTPLUG_PCI_PCIE
    u32 slot_cap;
    struct pci_host_bridge *host;

    if (!pci_is_pcie(bridge))
      return false;

    pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
    if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
      return false;

    if (pcie_ports_native)
      return true;

    host = pci_find_host_bridge(bridge->bus);
    return host->native_hotplug;
  #else
    return false;
  #endif
  }

Then your test for whether acpiphp should expose the slot to user
space could be:

  -   && !(pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev))) {
  +   && !pciehp_is_native(pdev)) {
 
We could also use pciehp_is_native() directly in
get_port_device_capability(), which is essentially the condition that
allows pciehp to claim the device.

> > > Here is how the mechanism is supposed to work when a Thunderbolt
> > > endpoint is connected to one of the ports. In case of a standard USB-C
> > > device only the xHCI is hot-added otherwise steps are the same.
> > > 
> > > 1. Initially there is only the PCIe root port that is controlled by
> > >    the pciehp driver
> > > 
> > >   00:1b.0 (Hotplug+) --
> > > 
> > > 2. Then we get native PCIe hotplug interrupt and once it is handled the
> > >    topology looks as following
> > > 
> > >   00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 --
> > >                                   +- 02:01.0 (HotPlug+)
> > >                                   \- 02:02.0 --
> > > 
> > > 3. Bridges 02:00.0 and 02:02.0 are not marked as hotplug capable and
> > 
> > By "not marked as hotplug capable", I guess you mean
> > PCI_EXP_SLTCAP_HPC is not set, right?
> 
> Right.
> 
> > >    they don't have anything behind them currently. Bridge 02:01.0 is
> > >    hotplug capable and used for extending the topology. At this point
> > >    the required PCIe devices are enabled and ACPI Notify() is sent to
> > >    the root port. The resulting topology is expected to look like
> > > 
> > >   00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 -- Thunderbolt host controller
> > >                                   +- 02:01.0 (HotPlug+)
> > >                                   \- 02:02.0 -- xHCI host controller
> > > 
> > > However, the current ACPI hotplug implementation scans the whole 00:1b.0
> > > hotplug slot and everything behind it regardless whether native PCIe is
> > > used or not, and it expects that the BIOS has configured bridge
> > > resources upfront. If that's not the case it assigns resources using
> > > minimal allocation (everything currently found just barely fits)
> > > preventing future extension. In addition to that, if there is another
> > > native PCIe hotplug going on we may find the new PCIe switch only
> > > partially ready (all links are not fully trained yet) confusing pciehp
> > > when it finally starts to enumerate for new devices.
> > > 
> > > To make this work better with the native PCIe hotplug driver (pciehp),
> > > we let it handle all slot management and resource allocation for hotplug
> > > bridges and restrict ACPI hotplug to non-hotplug bridges.

I *think* the point of this patch is that:

  If X is a bridge, and pciehp manages hotplug on X, i.e., X has
  PCI_EXP_SLTCAP_HPC set and the OS owns PCIe hotplug in this
  hierarchy per _OSC, acpiphp should not enumerate devices on X's
  secondary bus.

  Furthermore, acpiphp should avoid this enumeration no matter where X
  is in the hierarchy.  So if the Notify() goes to a bridge Y far
  above X, acpiphp should look at every existing bridge in the
  hierarchy below Y.  If the bridge is managed by pciehp, acpiphp
  should ignore it.  Otherwise, acpiphp should scan for new devices
  below it.

This enumeration by acpiphp is recursive, and it has to avoid the
pciehp-managed bridges at every level.  Your changes to enable_slot()
check for SLOT_IS_NATIVE at the top level, but I don't see how they
avoid pciehp-managed bridges at lower levels that may be scanned by
pci_scan_bridge().

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used
  2018-05-05  0:04       ` Bjorn Helgaas
@ 2018-05-07 11:34         ` Mika Westerberg
  2018-05-07 20:37           ` Bjorn Helgaas
  0 siblings, 1 reply; 30+ messages in thread
From: Mika Westerberg @ 2018-05-07 11:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Fri, May 04, 2018 at 07:04:20PM -0500, Bjorn Helgaas wrote:
> On Thu, May 03, 2018 at 01:22:41PM +0300, Mika Westerberg wrote:
> > On Wed, May 02, 2018 at 03:49:32PM -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 16, 2018 at 01:34:48PM +0300, Mika Westerberg wrote:
> > > > When a system is using native PCIe hotplug for Thunderbolt and the
> > > > controller is not enabled for full RTD3 (runtime D3) it will be only
> > > > present in the system when there is a device connected. This pretty much
> > > > follows the BIOS assisted hotplug behaviour.
> > > 
> > > This is a tangent, but what exactly does "controller is not enabled
> > > for full RTD3 (runtime D3)" mean?  The way that's worded sounds like
> > > it *could* be a setting in a PCI config register, but I suspect it's
> > > really something in Linux, e.g., some bit in struct pci_dev or device?
> > 
> > It means that the controller can go into D3 at runtime. With BIOS
> > assisted mode and native PCIe mode, the controller is hot-removed
> > when there is nothing connected. In RTD3 mode it is always there and
> > the OS is expected to put it into D3 when idle using standard PCI PM
> > registers.
> 
> How do I tell if a controller is enabled for runtime D3?
> 
> I'm still struggling to figure out exactly how this runtime D3 part is
> relevant to this patch.  It might be behavior that happens in this
> particular scenario, but I'm not sure yet that it is a required part
> of the picture.
> 
> I think you're trying to separate enumeration handled by pciehp from
> enumeration handled by acpiphp, and runtime D3 sounds like an
> orthogonal issue.

It is not an issue at all. I just thought it would be good to mention
where this all comes from in the changelog. I can reword it to:

  When a system is using native PCIe hotplug for Thunderbolt it will be
  only present in the system when there is a device connected. This pretty
  much follows the BIOS assisted hotplug behaviour.

if that is more understandable.

> > > > Thunderbolt host router integrated PCIe switch has two additional PCIe
> > > > downstream bridges that lead to NHI (Thunderbolt host controller) and xHCI
> > > > (USB 3 host controller) respectively. These downstream bridges are not
> > > > marked being hotplug capable. Reason for that is to preserve resources.
> > > > Otherwise the OS would distribute remaining resources between all
> > > > downstream bridges making these two bridges consume precious resources
> > > > of the actual hotplug bridges.
> > > > 
> > > > Now, because these two bridges are not marked being hotplug capable the OS
> > > > will not enable hotplug interrupt for them either and will not receive
> > > > interrupt when devices behind them are hot-added. Solution to this is
> > > > that the BIOS sends ACPI Notify() to the root port to let the OS know it
> > > > needs to rescan for added and/or removed devices.
> > > 
> > > We're building stuff based on "is_hotplug_bridge", but I'm not sure
> > > that bit is really useful.
> > > 
> > > set_pcie_hotplug_bridge() sets it for downstream ports with
> > > PCI_EXP_SLTCAP_HPC, which is easy enough to understand.
> > > 
> > > In acpiphp, check_hotplug_bridge() sets it in some scenario that I can't
> > > quite figure out.  I assume it's based on something in the namespace.
> > > 
> > > But it seems like the whole point of this patch is that even if a
> > > bridge doesn't have "is_hotplug_bridge" set, ACPI hotplug can hot-add
> > > devices below it.
> > 
> > Correct.
> > 
> > > So I'm not sure what "is_hotplug_bridge" is really telling us.  Is it
> > > just a hint about how many resources we want to assign to the bridge,
> > > i.e., we assign only a few when it's not set and more if it is set?
> > 
> > Good question. I did not invent the flag so hard to say. I've been using
> > it because IIRC you prefer it.
> 
> It's not a question of whether I prefer it.  It's only useful if it
> means something specific and we agree on what that is.
> 
> So what do you think it means?

I think it currently means a bridge that may get new devices behind it
as a result of hot-add.

In the case of ACPI-based hotplug, it is not always possible to
determine and set it upfront, so it would make more sense to me if
is_hotplug_bridge meant that the bridge is used only as part of native
PCIe hotplug.

> In this code:
> 
> > > > @@ -291,7 +294,7 @@ static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
> > > >          * expose slots to user space in those cases.
> > > >          */
> > > >         if ((acpi_pci_check_ejectable(pbus, handle) || is_dock_device(adev))
> > > > -           && !(pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev))) {
> > > > +           && !(slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge)) {
> 
> I *think* this part:
> 
>   slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge
> 
> means "this bridge supports hotplug but it is handled by pciehp, and
> acpiphp should not expose the slot to user space".

Yes, I specifically added SLOT_IS_NATIVE flag there so we can get rid of
one condition. Note that:

  pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev)

and

  slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge

are equivalent.

There are no functional changes in that hunk.

> If that's the case, maybe we should rework pciehp_is_native(bridge) so
> instead of answering the question "does the OS have permission to
> control PCIe hotplug in the hierarchy containing <bridge>?", it could
> answer the specific question "does pciehp handle hotplug for
> <bridge>?", e.g., something along these lines:
> 
>   bool pciehp_is_native(struct pci_dev *bridge)
>   {
>   #ifdef CONFIG_HOTPLUG_PCI_PCIE
>     u32 slot_cap;
>     struct pci_host_bridge *host;
> 
>     if (!pci_is_pcie(bridge))
>       return false;
> 
>     pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
>     if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
>       return false;
> 
>     if (pcie_ports_native)
>       return true;
> 
>     host = pci_find_host_bridge(bridge->bus);
>     return host->native_hotplug;
>   #else
>     return false;
>   #endif
>   }
> 
> Then your test for whether acpiphp should expose the slot to user
> space could be:

[That test was there already before this patch.]

>   -   && !(pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev))) {
>   +   && !pciehp_is_native(pdev)) {
>  
> We could also use pciehp_is_native() directly in
> get_port_device_capability(), which is essentially the condition that
> allows pciehp to claim the device.

Fine, I agree it makes sense.

However, I intended *this* patch to be a *fix* and thus did not want to
start reworking things too much, especially code that is not related to
the issue at all (pciehp_is_native()).

> > > > Here is how the mechanism is supposed to work when a Thunderbolt
> > > > endpoint is connected to one of the ports. In case of a standard USB-C
> > > > device only the xHCI is hot-added otherwise steps are the same.
> > > > 
> > > > 1. Initially there is only the PCIe root port that is controlled by
> > > >    the pciehp driver
> > > > 
> > > >   00:1b.0 (Hotplug+) --
> > > > 
> > > > 2. Then we get native PCIe hotplug interrupt and once it is handled the
> > > >    topology looks as following
> > > > 
> > > >   00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 --
> > > >                                   +- 02:01.0 (HotPlug+)
> > > >                                   \- 02:02.0 --
> > > > 
> > > > 3. Bridges 02:00.0 and 02:02.0 are not marked as hotplug capable and
> > > 
> > > By "not marked as hotplug capable", I guess you mean
> > > PCI_EXP_SLTCAP_HPC is not set, right?
> > 
> > Right.
> > 
> > > >    they don't have anything behind them currently. Bridge 02:01.0 is
> > > >    hotplug capable and used for extending the topology. At this point
> > > >    the required PCIe devices are enabled and ACPI Notify() is sent to
> > > >    the root port. The resulting topology is expected to look like
> > > > 
> > > >   00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 -- Thunderbolt host controller
> > > >                                   +- 02:01.0 (HotPlug+)
> > > >                                   \- 02:02.0 -- xHCI host controller
> > > > 
> > > > However, the current ACPI hotplug implementation scans the whole 00:1b.0
> > > > hotplug slot and everything behind it regardless whether native PCIe is
> > > > used or not, and it expects that the BIOS has configured bridge
> > > > resources upfront. If that's not the case it assigns resources using
> > > >    minimal allocation (everything currently found just barely fits)
> > > > preventing future extension. In addition to that, if there is another
> > > > native PCIe hotplug going on we may find the new PCIe switch only
> > > > partially ready (all links are not fully trained yet) confusing pciehp
> > > > when it finally starts to enumerate for new devices.
> > > > 
> > > > To make this work better with the native PCIe hotplug driver (pciehp),
> > > > we let it handle all slot management and resource allocation for hotplug
> > > > bridges and restrict ACPI hotplug to non-hotplug bridges.
> 
> I *think* the point of this patch is that:
> 
>   If X is a bridge, and pciehp manages hotplug on X, i.e., X has
>   PCI_EXP_SLTCAP_HPC set and the OS owns PCIe hotplug in this
>   hierarchy per _OSC, acpiphp should not enumerate devices on X's
>   secondary bus.
> 
>   Furthermore, acpiphp should avoid this enumeration no matter where X
>   is in the hierarchy.  So if the Notify() goes to a bridge Y far
>   above X, acpiphp should look at every existing bridge in the
>   hierarchy below Y.  If the bridge is managed by pciehp, acpiphp
>   should ignore it.  Otherwise, acpiphp should scan for new devices
>   below it.
> 
> This enumeration by acpiphp is recursive, and it has to avoid the
> pciehp-managed bridges at every level.  Your changes to enable_slot()
> check for SLOT_IS_NATIVE at the top level, but I don't see how they
> avoid pciehp-managed bridges at lower levels that may be scanned by
> pci_scan_bridge().

This is a pretty special case. For example, below the code scans only
the two bridges (02:00.0 and 02:02.0), and they have only one bus
reserved. There won't be any further bridges downstream. The hotplug
bridge (02:01.0) is left for pciehp:

  00:1b.0 (Hotplug+) -- 01:00.0 --+- 02:00.0 --
                                  +- 02:01.0 (HotPlug+)
                                  \- 02:02.0 --

In theory it is of course possible that another hotplug bridge gets
added behind those two bridges, but in practice this mechanism is only
used in this special case to bring in the two devices (NHI, xHCI). I
don't think it makes much sense to complicate pci_scan_bridge() and
friends just to support a more "generic" case that in practice never
happens.


* Re: [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used
  2018-05-07 11:34         ` Mika Westerberg
@ 2018-05-07 20:37           ` Bjorn Helgaas
  0 siblings, 0 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2018-05-07 20:37 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, Lukas Wunner,
	linux-pci, linux-acpi

On Mon, May 07, 2018 at 02:34:49PM +0300, Mika Westerberg wrote:
> On Fri, May 04, 2018 at 07:04:20PM -0500, Bjorn Helgaas wrote:
> ...

> > In this code:
> > 
> > > > > @@ -291,7 +294,7 @@ static acpi_status acpiphp_add_context(acpi_handle handle, u32 lvl, void *data,
> > > > >          * expose slots to user space in those cases.
> > > > >          */
> > > > >         if ((acpi_pci_check_ejectable(pbus, handle) || is_dock_device(adev))
> > > > > -           && !(pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev))) {
> > > > > +           && !(slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge)) {
> > 
> > I *think* this part:
> > 
> >   slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge
> > 
> > means "this bridge supports hotplug but it is handled by pciehp, and
> > acpiphp should not expose the slot to user space".
> 
> Yes, I specifically added SLOT_IS_NATIVE flag there so we can get rid of
> one condition. Note that:
> 
>   pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev)
> 
> and
> 
>   slot->flags & SLOT_IS_NATIVE && pdev->is_hotplug_bridge
> 
> are equivalent.
> 
> There are no functional changes in that hunk.
> 
> > If that's the case, maybe we should rework pciehp_is_native(bridge) so
> > instead of answering the question "does the OS have permission to
> > control PCIe hotplug in the hierarchy containing <bridge>?", it could
> > answer the specific question "does pciehp handle hotplug for
> > <bridge>?", e.g., something along these lines:
> > 
> >   bool pciehp_is_native(struct pci_dev *bridge)
> >   {
> >   #ifdef CONFIG_HOTPLUG_PCI_PCIE
> >     u32 slot_cap;
> >     struct pci_host_bridge *host;
> > 
> >     if (!pci_is_pcie(bridge))
> >       return false;
> > 
> >     pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
> >     if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
> >       return false;
> > 
> >     if (pcie_ports_native)
> >       return true;
> > 
> >     host = pci_find_host_bridge(bridge->bus);
> >     return host->native_hotplug;
> >   #else
> >     return false;
> >   #endif
> >   }
> > 
> > Then your test for whether acpiphp should expose the slot to user
> > space could be:
> 
> [That test was there already before this patch.]
> 
> >   -   && !(pdev && pdev->is_hotplug_bridge && pciehp_is_native(pdev))) {
> >   +   && !pciehp_is_native(pdev)) {
> >  
> > We could also use pciehp_is_native() directly in
> > get_port_device_capability(), which is essentially the condition that
> > allows pciehp to claim the device.
> 
> Fine, I agree it makes sense.
> 
> However, I intended *this* patch to be a *fix* and thus did not want to
> start reworking things too much, especially code that is not related to
> the issue at all (pciehp_is_native()).

I think this pciehp_is_native() issue is definitely related.  You're
using it to set SLOT_IS_NATIVE.  But it's more complicated than I
first thought and there are several things that look wrong here.

  - We request OSC_PCI_EXPRESS_NATIVE_HP_CONTROL unconditionally, even
    if CONFIG_HOTPLUG_PCI_PCIE isn't set, which seems wrong.  If we
    request control, we should be prepared to handle hotplug events.

  - I think we should make CONFIG_HOTPLUG_PCI_SHPC a bool instead of a
    tristate, like we did for pciehp.  acpiphp is a bool and I don't
    think it can coordinate correctly with pciehp and SHPC unless
    they're all bools.

  - We probably should split "host->native_hotplug" into
    "native_pcie_hotplug" and "native_shpc_hotplug".

  - We should request OSC_PCI_SHPC_NATIVE_HP_CONTROL similarly to how
    we request OSC_PCI_EXPRESS_NATIVE_HP_CONTROL and adapt the
    is_shpc_capable() / acpi_get_hp_hw_control_from_firmware() path to
    use "host->native_shpc_hotplug".

  - The acpiphp_add_context() path probably should treat SHPC and
    pciehp the same.

I know this probably sounds like a tangent to you, but to me it seems
like we're building on a poor foundation and we should make the
foundation solid before extending it.

Here's an example of what I mean about the foundation being poor: it
seems fairly clear that the *intent* is that pciehp_is_native() means
"pciehp manages hotplug events on this bridge", but I don't think
that's true.

Today pciehp_is_native() returns "true" for every PCI device in a
hierarchy where _OSC says we can use pciehp.  That's obviously wrong
because in such a hierarchy, bridges without PCI_EXP_SLTCAP_HPC should
be managed by acpiphp.

There's only one caller (acpiphp_add_context()), which *tries* to make
it sensible by restricting it to bridges with "is_hotplug_bridge" set.
But "is_hotplug_bridge" tells us nothing about whether pciehp manages
the bridge because acpiphp sets it in check_hotplug_bridge() (under
conditions that are very difficult to figure out).

So evidently "pdev->is_hotplug_bridge && pciehp_is_native(pdev)" is
true for some bridges that are actually managed by acpiphp, which
prevents acpiphp_add_context() from registering some slots that it
should register.

Bjorn


* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-03 23:01           ` Bjorn Helgaas
  2018-05-04  7:20             ` Mika Westerberg
@ 2018-05-30 10:40             ` Lukas Wunner
  2018-05-30 13:27               ` Mika Westerberg
  1 sibling, 1 reply; 30+ messages in thread
From: Lukas Wunner @ 2018-05-30 10:40 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, linux-pci,
	linux-acpi

On Thu, May 03, 2018 at 06:01:08PM -0500, Bjorn Helgaas wrote:
> On Thu, May 03, 2018 at 01:42:50PM +0300, Mika Westerberg wrote:
> > On Wed, May 02, 2018 at 08:41:48AM -0500, Bjorn Helgaas wrote:
> > > On Wed, May 02, 2018 at 02:55:09PM +0300, Mika Westerberg wrote:
> > > > On Tue, May 01, 2018 at 04:52:11PM -0500, Bjorn Helgaas wrote:
> > > > > On Mon, Apr 16, 2018 at 01:34:47PM +0300, Mika Westerberg wrote:
> > > > > > After suspend/resume cycle the Presence Detect or Data Link Layer Status
> > > > > > Changed bits might be set and if we don't clear them those events will
> > > > > > not fire anymore and nothing happens for instance when a device is now
> > > > > > hot-unplugged.
> > > > > > 
> > > > > > Fix this by clearing those bits in a newly introduced function
> > > > > > pcie_reenable_notification(). This should be fine because immediately
> > > > > > after we check if the adapter is still present by reading directly from
> > > > > > the status register.
> > > > > 
> > > > > I want to understand why we need to handle this differently between
> > > > > the boot-time probe path and the resume path.
> > > > > 
> > > > > This patch clears PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > > > > resume path:
> > > > > 
> > > > >   pciehp_resume
> > > > >     pcie_reenable_notification
> > > > >       # clear PDC DLLSC
> > > > >       pcie_enable_notification
> > > > >         # set DLLSCE ABPE/PDCE HPIE CCIE
> > > > >     pciehp_get_adapter_status
> > > > >       # read PCI_EXP_SLTSTA
> > > > > 
> > > > > We used to clear PCI_EXP_SLTSTA_PDC and PCI_EXP_SLTSTA_DLLSC in the
> > > > > probe path:
> > > > > 
> > > > >   pciehp_probe
> > > > >     pcie_init
> > > > >       # clear PDC ABP PFD MRLSC CC DLSCC
> > > > >     pcie_init_notification
> > > > >       pciehp_request_irq
> > > > >       pcie_enable_notification
> > > > >         # set DLLSCE ABPE/PDCE HPIE CCIE
> > > > >     pciehp_get_adapter_status
> > > > >       # read PCI_EXP_SLTSTA
> > > > >     pciehp_get_power_status
> > > > > 
> > > > > db63d40017a5 ("PCI: pciehp: Do not clear Presence Detect Changed
> > > > > during initialization") changed the probe path so we don't clear
> > > > > PCI_EXP_SLTSTA_PDC there anymore.  This was so we wouldn't miss a PDC
> > > > > interrupt that happened before probe.
> > > > > 
> > > > > Why can't we handle probe and resume the same way?  They look quite
> > > > > similar.
> > > > > 
> > > > > You say this patch should be fine because we read SLTSTA immediately
> > > > > after clearing PDC and DLLSC.  But we also read SLTSTA in the probe
> > > > > path, so I'm not sure why we need to clear PDC and DLLSC for resume
> > > > > but not for probe.
> > > > 
> > > > On probe path we read status but here is what it does:
> > > > 
> > > > 	pcie_init_notification()
> > > > 	...
> > > >         pciehp_get_adapter_status(slot, &occupied);
> > > > 	...
> > > >         if (occupied && pciehp_force) {
> > > >                 mutex_lock(&slot->hotplug_lock);
> > > >                 pciehp_enable_slot(slot);
> > > >                 mutex_unlock(&slot->hotplug_lock);
> > > >         }
> > > > 
> > > > If you don't have "pciehp.pciehp_force=1" in your kernel command line
> > > > you miss the fact that the card is already there. Obviously you can't
> > > > expect ordinary users to pass random command line options to get their
> > > > already connected device detected by Linux.
> > > 
> > > Yeah, definitely not, that's really ugly.
> > > 
> > > > So the reason why in probe we don't clear PDC is that once the interrupt
> > > > is unmasked, you get an interrupt and the card gets detected properly.
> > > > 
> > > > On resume path we already check whether the card is there or not and
> > > > handle it accordingly. However, if we don't clear PDC and DLLSC bits we
> > > > will never get hotplug interrupt again.
> > > > 
> > > > Now, you ask why can't we handle probe and resume the same way? I think
> > > > we can if we could get rid of that pciehp_force thing but it seems to
> > > > fix an issue on some HP hardware. See 9e5858244926 ("pciehp: don't
> > > > enable slot unless forced").
> > > 
> > > Ugh.  That's really ancient history.  I would *love* to get rid of
> > > pciehp.pciehp_force.  There's not much detail in 9e5858244926, but
> > > maybe with some digging we can figure out something.
> > > 
> > > I'd rather do the digging now and try to simplify this area instead of
> > > adding another tweak.
> > 
> > I did some digging but unfortunately it is still not clear what issue
> > 9e5858244926 is fixing. There is no bugzilla link and I was not able to
> > find any discussion around this either.
> > 
> > However, I think currently pciehp_force=1 does not actually do what it
> > was intended to do. Reason is that portdrv core already asks BIOS
> > whether native PCIe is allowed and if not, it does not set
> > PCIE_PORT_SERVICE_HP and that prevents the whole device from being created.
> > So even if you specify pciehp_force=1 in the command line, it does not
> > bypass the BIOS setting.
> > 
> > Furthermore we have had similar check in pciehp_resume() but it was
> > removed with 87683e22c646 ("PCI: pciehp: Always implement resume,
> > regardless of pciehp_force param") because it broke resume.
> > 
> > If I understand correctly you want me to change the driver so that I
> > remove pciehp_force check from probe. Then I can revert db63d40017a5
> > ("PCI: pciehp: Do not clear Presence Detect Changed during
> > initialization") and that makes both probe path and resume path similar
> > (when this patch is included). Is that correct? I can do that in the
> > next version of the patch series.
> 
> If you think we can remove pciehp_force, that would be awesome.  This
> should be a separate patch all by itself, of course, and include your
> reasoning above.
> 
> I would also love to revert db63d40017a5 ("PCI: pciehp: Do not clear
> Presence Detect Changed during initialization") because I'm not
> convinced that trying to handle interrupts that happened before
> binding the driver makes sense.  It *would* make sense to me to enable
> interrupts, clear the "changed" status bits so future changes will
> cause interrupts, and check the "state" status bits and act on them.

JFYI, I've added the below patch to my upcoming series which reworks
pciehp event handling and adds runtime PM support:
https://github.com/l1k/linux/commits/pciehp_runpm_v2

The patch removes pciehp_force and reverts db63d40017a5 as requested.
I needed this for the series to work flawlessly.  Note, the patch
depends on the preceding patches in the series.  It cannot be applied
as is to pci.git branches.

The series has been in development for months but is now converging;
I hope to post it sometime during or after the merge window.
Comments/testing welcome.

Thanks,

Lukas

-- >8 --
Subject: [PATCH] PCI: pciehp: Always enable occupied slot on probe

Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
there are hot-plug events that occur while interrupt generation is
disabled, and interrupt generation is subsequently enabled."

On probe, we currently clear all event bits in the Slot Status register
with the notable exception of the Presence Detect Changed bit.  Thereby
we seek to receive an interrupt for an already occupied slot once event
notification is enabled.

But because the interrupt is optional, users may have to specify the
pciehp_force parameter on the command line, which is inconvenient.

Moreover, now that pciehp's event handling has become resilient to
missed events, a Presence Detect Changed interrupt for a slot which is
powered on is interpreted as removal of the card.  If the slot has
already been brought up by the BIOS, receiving such an interrupt on
probe causes the slot to be powered off and immediately back on, which
is likewise undesirable.

Avoid both issues by making the behavior of pciehp_force the default and
clearing the Presence Detect Changed bit on probe.

Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
("Force pciehp, even if OSHP is missing") seems nonsensical because the
OSHP control method is only relevant for SHPC slots according to the
PCI Firmware specification r3.0, sec 4.8.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/hotplug/pciehp_core.c | 12 ++++--------
 drivers/pci/hotplug/pciehp_hpc.c  |  9 ++-------
 2 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 6c77b6d7445f..64903f6fc8ef 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -30,7 +30,6 @@
 bool pciehp_debug;
 bool pciehp_poll_mode;
 int pciehp_poll_time;
-static bool pciehp_force;
 
 /*
  * not really modular, but the easiest way to keep compat with existing
@@ -39,11 +38,9 @@ static bool pciehp_force;
 module_param(pciehp_debug, bool, 0644);
 module_param(pciehp_poll_mode, bool, 0644);
 module_param(pciehp_poll_time, int, 0644);
-module_param(pciehp_force, bool, 0644);
 MODULE_PARM_DESC(pciehp_debug, "Debugging mode enabled or not");
 MODULE_PARM_DESC(pciehp_poll_mode, "Using polling mechanism for hot-plug events or not");
 MODULE_PARM_DESC(pciehp_poll_time, "Polling mechanism frequency, in seconds");
-MODULE_PARM_DESC(pciehp_force, "Force pciehp, even if OSHP is missing");
 
 #define PCIE_MODULE_NAME "pciehp"
 
@@ -243,11 +240,10 @@ static int pciehp_probe(struct pcie_device *dev)
 	mutex_lock(&slot->lock);
 	pciehp_get_adapter_status(slot, &occupied);
 	pciehp_get_power_status(slot, &poweron);
-	if (pciehp_force &&
-	    ((occupied && (slot->state == OFF_STATE ||
-			   slot->state == BLINKINGON_STATE)) ||
-	     (!occupied && (slot->state == ON_STATE ||
-			    slot->state == BLINKINGOFF_STATE)))) {
+	if ((occupied && (slot->state == OFF_STATE ||
+			  slot->state == BLINKINGON_STATE)) ||
+	    (!occupied && (slot->state == ON_STATE ||
+			   slot->state == BLINKINGOFF_STATE))) {
 		atomic_or(PCI_EXP_SLTSTA_PDC, &ctrl->pending_events);
 		irq_wake_thread(ctrl->pcie->irq, ctrl);
 	}
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3d402bd998df..8cde5c14f4e6 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -835,16 +835,11 @@ struct controller *pcie_init(struct pcie_device *dev)
 	if (link_cap & PCI_EXP_LNKCAP_DLLLARC)
 		ctrl->link_active_reporting = 1;
 
-	/*
-	 * Clear all remaining event bits in Slot Status register except
-	 * Presence Detect Changed. We want to make sure possible
-	 * hotplug event is triggered when the interrupt is unmasked so
-	 * that we don't lose that event.
-	 */
+	/* Clear all remaining event bits in Slot Status register. */
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 		PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
 		PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_CC |
-		PCI_EXP_SLTSTA_DLLSC);
+		PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC);
 
 	ctrl_info(ctrl, "Slot #%d AttnBtn%c PwrCtrl%c MRL%c AttnInd%c PwrInd%c HotPlug%c Surprise%c Interlock%c NoCompl%c LLActRep%c\n",
 		(slot_cap & PCI_EXP_SLTCAP_PSN) >> 19,
-- 
2.17.1


* Re: [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
  2018-05-30 10:40             ` Lukas Wunner
@ 2018-05-30 13:27               ` Mika Westerberg
  0 siblings, 0 replies; 30+ messages in thread
From: Mika Westerberg @ 2018-05-30 13:27 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mario.Limonciello,
	Michael Jamet, Yehezkel Bernat, Andy Shevchenko, linux-pci,
	linux-acpi

On Wed, May 30, 2018 at 12:40:50PM +0200, Lukas Wunner wrote:
> JFYI, I've added the below patch to my upcoming series which reworks
> pciehp event handling and adds runtime PM support:
> https://github.com/l1k/linux/commits/pciehp_runpm_v2
> 
> The patch removes pciehp_force and reverts db63d40017a5 as requested.

Thanks for taking care of it! I can remove it from my todo list now :-)


Thread overview: 30+ messages
2018-04-16 10:34 [PATCH v5 0/9] PCI: Fixes and cleanups for native PCIe and ACPI hotplug Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 1/9] PCI: Take all bridges into account when calculating bus numbers for extension Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 2/9] PCI: Take bridge window alignment into account when distributing resources Mika Westerberg
2018-04-25 22:38   ` Bjorn Helgaas
2018-04-26 12:23     ` Mika Westerberg
2018-05-01 20:32       ` Bjorn Helgaas
2018-05-03 12:39         ` Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 3/9] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume Mika Westerberg
2018-05-01 21:52   ` Bjorn Helgaas
2018-05-02 11:55     ` Mika Westerberg
2018-05-02 13:41       ` Bjorn Helgaas
2018-05-03 10:42         ` Mika Westerberg
2018-05-03 23:01           ` Bjorn Helgaas
2018-05-04  7:20             ` Mika Westerberg
2018-05-30 10:40             ` Lukas Wunner
2018-05-30 13:27               ` Mika Westerberg
2018-05-04  7:18     ` Lukas Wunner
2018-05-04  8:02       ` Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 4/9] ACPI / hotplug / PCI: Do not scan all bridges when native PCIe hotplug is used Mika Westerberg
     [not found]   ` <20180502204932.GG11698@bhelgaas-glaptop.roam.corp.google.com>
2018-05-03 10:22     ` Mika Westerberg
2018-05-05  0:04       ` Bjorn Helgaas
2018-05-07 11:34         ` Mika Westerberg
2018-05-07 20:37           ` Bjorn Helgaas
2018-04-16 10:34 ` [PATCH v5 5/9] ACPI / hotplug / PCI: Mark stale PCI devices disconnected Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 6/9] PCI: Move resource distribution for a single bridge outside of the loop Mika Westerberg
2018-04-24 23:05   ` Bjorn Helgaas
2018-04-25  7:29     ` Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 7/9] PCI: Document return value of pci_scan_bridge() and pci_scan_bridge_extend() Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 8/9] PCI: Improve "partially hidden behind bridge" log message Mika Westerberg
2018-04-16 10:34 ` [PATCH v5 9/9] ACPI / hotplug / PCI: Drop unnecessary parentheses Mika Westerberg
