Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug
@ 2020-01-29 15:29 Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 01/26] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
                   ` (26 more replies)
  0 siblings, 27 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

Currently PCI hotplug works on top of resources which are usually reserved
not by the kernel, but by BIOS, bootloader, firmware, etc. These resources
are gaps in the address space where BARs of new devices may fit, and extra
bus number per port, so bridges can be hot-added. This series aim the BARs
problem: it shows the kernel how to redistribute them on the run, so the
hotplug becomes predictable and cross-platform. A follow-up patchset will
propose a solution for bus numbers.

To arrange a space for BARs of new hotplugged devices, the kernel now pause
the drivers of working PCI devices and reshuffle the assigned BARs by the
same procedure as during the system boot. When a driver is un-paused by the
kernel, it should ioremap() the new addresses of its BARs.

Drivers indicate their support of the feature by implementing the new hooks
.rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
doesn't yet support the feature, BARs of its devices will be considered as
immovable and handled in the same way as resources with the PCI_FIXED flag:
they are guaranteed to remain untouched.

Tested on a number x86_64 machines with "pci=pcie_bus_peer2peer" command
line argument to specify a policy for Maximum Payload Size PCIe setting.

Also tested on a POWER8 PowerNV+OPAL+PHB3 ppc64le machine, but with extra
patches which are to be sent upstream after the PCIPOCALYPSE patchset is
merged.

First two patches of this series are bugfixes, not related directly to the
movable BARs feature.

Patches 03-14/26 implement the essentials of the feature.

Patch 15/26 enables the feature by default.

Patches 16-26/26 are performance improvements for hotplug.

This patchset is a part of our work on adding support for hotplugging
chains of chassis full of other bridges, NVME drives, SAS HBAs, GPUs, etc.
without special requirements such as Hot-Plug Controller, reservation of
bus numbers or memory regions by firmware, etc.

Changes since v6:
 - Added a fix for hotplug on AMD Epyc + Supermicro H11SSL-i by ignoring
   PCIBIOS_MIN_MEM;
 - Fixed a workaround which marks VGA BARs as immovables;
 - Fixed misleading "can't claim BAR ... no compatible bridge window" error
   messages;
 - Refactored the code, reduced the amount of patches;
 - Exclude PowerPC-specific arch patches, they will be sent separately;
 - Disabled for PowerNV by default - waiting for the PCIPOCALYPSE patchset.
 - Fixed reports from the kbuild test robot.

Changes since v5:
 - Simplified the disable flag, now it is "pci=no_movable_buses";
 - More deliberate marking the BARs as immovable;
 - Mark as immovable BARs which are used by unbound drivers;
 - Ignoring BAR assignment by non-kernel program components, so the kernel
   is able now to distribute BARs in optimal and predictable way;
 - Move here PowerNV-specific patches from the older "powerpc/powernv/pci:
   Make hotplug self-sufficient, independent of FW and DT" series;
 - Fix EEH cache rebuilding and PE allocation for PowerNV during rescan.

Changes since v4:
 - Feature is enabled by default (turned on by one of the latest patches);
 - Add pci_dev_movable_bars_supported(dev) instead of marking the immovable
   BARs with the IORESOURCE_PCI_FIXED flag;
 - Set up PCIe bridges during rescan via sysfs, so MPS settings are now
   configured not only during system boot or pcihp events;
 - Allow movement of switch's BARs if claimed by portdrv;
 - Update EEH address caches after rescan for powerpc;
 - Don't disable completely hot-added devices which can't have BARs being
   fit - just disable their BARs, so they are still visible in lspci etc;
 - Clearer names: fixed_range_hard -> immovable_range, fixed_range_soft ->
   realloc_range;
 - Drop the patch for pci_restore_config_space() - fixed by properly using
   the runtime PM.

Changes since v3:
 - Rebased to the upstream, so the patches apply cleanly again.

Changes since v2:
 - Fixed double-assignment of bridge windows;
 - Fixed assignment of fixed prefetched resources;
 - Fixed releasing of fixed resources;
 - Fixed a debug message;
 - Removed auto-enabling the movable BARs for x86 - let's rely on the
   "pcie_movable_bars=force" option for now;
 - Reordered the patches - bugfixes first.

Changes since v1:
 - Add a "pcie_movable_bars={ off | force }" command line argument;
 - Handle the IORESOURCE_PCI_FIXED flag properly;
 - Don't move BARs of devices which don't support the feature;
 - Guarantee that new hotplugged devices will not steal memory from working
   devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
   flag;
 - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
   using the reset_prepare()+reset_done() from struct pci_error_handlers;
 - Add a bugfix of a race condition;
 - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
 - Fix the compatibility of the feature with pm_runtime and D3-state;
 - Hotplug events from pciehp also can move BARs;
 - Add support of the feature to the NVME driver.

Sergei Miroshnichenko (26):
  PCI: Fix race condition in pci_enable/disable_device()
  PCI: Enable bridge's I/O and MEM access for hotplugged devices
  PCI: hotplug: Initial support of the movable BARs feature
  PCI: Add version of release_child_resources() aware of immovable BARs
  PCI: hotplug: Fix reassigning the released BARs
  PCI: hotplug: Recalculate every bridge window during rescan
  PCI: hotplug: Don't allow hot-added devices to steal resources
  PCI: hotplug: Try to reassign movable BARs only once
  PCI: hotplug: Calculate immovable parts of bridge windows
  PCI: Include fixed and immovable BARs into the bus size calculating
  PCI: hotplug: movable BARs: Compute limits for relocated bridge
    windows
  PCI: Make sure bridge windows include their fixed BARs
  PCI: hotplug: Add support of immovable BARs to pci_assign_resource()
  PCI: hotplug: Sort immovable BARs before assignment
  PCI: hotplug: Enable the movable BARs feature by default
  PCI: Ignore PCIBIOS_MIN_MEM
  PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader
  PCI: Treat VGA BARs as immovable
  PCI: hotplug: Configure MPS for hot-added bridges during bus rescan
  PNP: Don't reserve BARs for PCI when enabled movable BARs
  PCI: hotplug: Don't disable the released bridge windows immediately
  PCI: pciehp: Trigger a domain rescan on hp events when enabled movable
    BARs
  PCI: Don't claim immovable BARs
  PCI: hotplug: Don't reserve bus space when enabled movable BARs
  nvme-pci: Handle movable BARs
  PCI/portdrv: Declare support of movable BARs

 .../admin-guide/kernel-parameters.txt         |   1 +
 arch/powerpc/platforms/powernv/pci.c          |   2 +
 arch/powerpc/platforms/pseries/setup.c        |   2 +
 drivers/nvme/host/pci.c                       |  21 +-
 drivers/pci/bus.c                             |   2 +-
 drivers/pci/hotplug/pciehp_pci.c              |   5 +
 drivers/pci/pci.c                             |  38 +-
 drivers/pci/pci.h                             |  31 ++
 drivers/pci/pcie/portdrv_pci.c                |  11 +
 drivers/pci/probe.c                           | 331 +++++++++++++++++-
 drivers/pci/setup-bus.c                       | 311 ++++++++++++++--
 drivers/pci/setup-res.c                       |  47 ++-
 drivers/pnp/system.c                          |   6 +
 include/linux/pci.h                           |  20 ++
 14 files changed, 790 insertions(+), 38 deletions(-)


base-commit: b3a6082223369203d7e7db7e81253ac761377644
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 01/26] PCI: Fix race condition in pci_enable/disable_device()
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 02/26] PCI: Enable bridge's I/O and MEM access for hotplugged devices Sergei Miroshnichenko
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko,
	Srinath Mannam, Marta Rybczynska

This is a yet another approach to fix an old [1-2] concurrency issue, when:
 - two or more devices are being hot-added into a bridge which was
   initially empty;
 - a bridge with two or more devices is being hot-added;
 - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.

The problem is that a bridge is reported as enabled before the MEM/IO bits
are actually written to the PCI_COMMAND register, so another driver thread
starts memory requests through the not-yet-enabled bridge:

 CPU0                                        CPU1

 pci_enable_device_mem()                     pci_enable_device_mem()
   pci_enable_bridge()                         pci_enable_bridge()
     pci_is_enabled()
       return false;
     atomic_inc_return(enable_cnt)
     Start actual enabling the bridge
     ...                                         pci_is_enabled()
     ...                                           return true;
     ...                                     Start memory requests <-- FAIL
     ...
     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this

Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
while enabling upstream bridges"), but adding a per-device mutexes and
preventing the dev->enable_cnt from from incrementing early.

CC: Srinath Mannam <srinath.mannam@broadcom.com>
CC: Marta Rybczynska <mrybczyn@kalray.eu>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>

[1] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge

[2] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
    [RFC PATCH] nvme: avoid race-conditions when enabling devices
---
 drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  1 +
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index df21e3227b57..0366289c75e9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1672,6 +1672,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	struct pci_dev *bridge;
 	int retval;
 
+	mutex_lock(&dev->enable_mutex);
+
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
 		pci_enable_bridge(bridge);
@@ -1679,6 +1681,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	if (pci_is_enabled(dev)) {
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
+		mutex_unlock(&dev->enable_mutex);
 		return;
 	}
 
@@ -1687,11 +1690,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_err(dev, "Error enabling bridge (%d), continuing\n",
 			retval);
 	pci_set_master(dev);
+	mutex_unlock(&dev->enable_mutex);
 }
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
+	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
+	bool need_lock = !dev->subordinate;
 	int err;
 	int i, bars = 0;
 
@@ -1707,8 +1713,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
 	}
 
-	if (atomic_inc_return(&dev->enable_cnt) > 1)
+	if (need_lock)
+		mutex_lock(&dev->enable_mutex);
+	if (pci_is_enabled(dev)) {
+		if (need_lock)
+			mutex_unlock(&dev->enable_mutex);
 		return 0;		/* already enabled */
+	}
 
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
@@ -1723,8 +1734,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 			bars |= (1 << i);
 
 	err = do_pci_enable_device(dev, bars);
-	if (err < 0)
-		atomic_dec(&dev->enable_cnt);
+	if (err >= 0)
+		atomic_inc(&dev->enable_cnt);
+	if (need_lock)
+		mutex_unlock(&dev->enable_mutex);
 	return err;
 }
 
@@ -1968,15 +1981,20 @@ void pci_disable_device(struct pci_dev *dev)
 	if (dr)
 		dr->enabled = 0;
 
+	mutex_lock(&dev->enable_mutex);
 	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
 		      "disabling already-disabled device");
 
-	if (atomic_dec_return(&dev->enable_cnt) != 0)
+	if (atomic_dec_return(&dev->enable_cnt) != 0) {
+		mutex_unlock(&dev->enable_mutex);
 		return;
+	}
 
 	do_pci_disable_device(dev);
 
 	dev->is_busmaster = 0;
+
+	mutex_unlock(&dev->enable_mutex);
 }
 EXPORT_SYMBOL(pci_disable_device);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 512cb4312ddd..2a7e81f146e4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2169,6 +2169,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
 	INIT_LIST_HEAD(&dev->bus_list);
 	dev->dev.type = &pci_dev_type;
 	dev->bus = pci_bus_get(bus);
+	mutex_init(&dev->enable_mutex);
 
 	return dev;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c393dff2d66f..e473e70834b5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -424,6 +424,7 @@ struct pci_dev {
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
+	struct mutex	enable_mutex;
 
 	u32		saved_config_space[16]; /* Config space saved at suspend time */
 	struct hlist_head saved_cap_space;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 02/26] PCI: Enable bridge's I/O and MEM access for hotplugged devices
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 01/26] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-30 23:12   ` Bjorn Helgaas
  2020-01-29 15:29 ` [PATCH v7 03/26] PCI: hotplug: Initial support of the movable BARs feature Sergei Miroshnichenko
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

The PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits of the bridge must be
updated not only when enabling the bridge for the first time, but also if a
hotplugged device requests these types of resources.

For example, if a bridge had all its slots empty, the IO and MEM bits will
not be set, and a hot hotplugged device will fail.

Originally these bits were set by the pci_enable_device_flags() only, which
exits early if the bridge is already pci_is_enabled(). So let's check them
again when requested.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0366289c75e9..cb0042f28e6a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1679,6 +1679,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_enable_bridge(bridge);
 
 	if (pci_is_enabled(dev)) {
+		int i, bars = 0;
+
+		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
+			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
+				bars |= (1 << i);
+		}
+		do_pci_enable_device(dev, bars);
+
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
 		mutex_unlock(&dev->enable_mutex);
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 03/26] PCI: hotplug: Initial support of the movable BARs feature
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 01/26] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 02/26] PCI: Enable bridge's I/O and MEM access for hotplugged devices Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 04/26] PCI: Add version of release_child_resources() aware of immovable BARs Sergei Miroshnichenko
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko,
	Sam Bobroff, Rajat Jain, Lukas Wunner, Oliver O'Halloran,
	David Laight

When hot-adding a device, the involved bridge may have windows not big
enough (or fragmented too much) for newly requested BARs to fit in. But
expanding these bridge windows may be impossible if they are wedged between
"neighboring" BARs.

Still, it may be possible to allocate a memory region for new BARs, if at
least some utilized BARs can be moved, with the following procedure:

1) notify all the drivers which support movable BARs to pause and release
   the BARs; the rest of the drivers are guaranteed that their devices will
   not get BARs moved;

2) release all the bridge windows and movable BARs;

3) try to recalculate new bridge windows that will fit all the BAR types:
   - immovable (including those marked with PCI_FIXED);
   - movable;
   - newly requested by hot-added devices;

4) if the previous step fails, disable BARs for one of the hot-added
   devices and retry from step 3;

5) notify the drivers, so they remap BARs and resume.

If bridge calculation and BAR assignment fail with hot-added devices, this
patchset disables their BARs, falling back to the same amount and size of
BARs as they were before the hotplug event. The kernel succeeded then - so
the same BAR layout will be reproduced again.

This makes the prior reservation of memory by BIOS/bootloader/firmware not
required anymore for the PCI hotplug.

Drivers indicate their support of movable BARs by implementing the new
.rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
device's activity must be paused during a rescan, and iounmap()+ioremap()
must be applied to every used BAR.

If a device is not bound to a driver, its BARs are considered movable.

For a higher probability of the successful BAR reassignment, all the BARs
and bridge windows should be released during a rescan, not only those with
higher addresses.

One example when it is needed, BAR(I) is moved to free a gap for the new
BAR(II):

Before:

    ==================== parent bridge window ===============
                   ---- hotplug bridge window ----
    |   BAR(I)    |   fixed BAR   |   fixed BAR   | fixed BAR |
        ^^^^^^                 ^
                               |
                           new BAR(II)

After:

    ==================== parent bridge window =========================
     ----------- hotplug bridge window -----------
    | new BAR(II) |   fixed BAR   |   fixed BAR   | fixed BAR | BAR(I)  |
      ^^^^^^^^^^^                                               ^^^^^^

Another example is a fragmented bridge window jammed between fixed BARs:

Before:

     ===================== parent bridge window ========================
                 ---------- hotplug bridge window ----------
    | fixed BAR |   | BAR(I) |    | BAR(II) |    | BAR(III) | fixed BAR |
                      ^^^^^^   ^    ^^^^^^^        ^^^^^^^^
                               |
                           new BAR(IV)

After:

     ==================== parent bridge window =========================
                 ---------- hotplug bridge window ----------
    | fixed BAR | BAR(I) | BAR(II) | BAR(III) | new BAR(IV) | fixed BAR |
                  ^^^^^^   ^^^^^^^   ^^^^^^^^   ^^^^^^^^^^^

This patch is a preparation for future patches with actual implementation,
and for now it just does the following:
 - declares the feature;
 - defines bool pci_can_move_bars and bool pci_dev_bar_movable(dev, bar);
 - invokes the new .rescan_prepare() and .rescan_done() driver notifiers;
 - disables the feature for the powerpc (support will be added later, in
   another series).

This is disabled by default in this first patch of the series, until a
patch in the middle, which finalizes the actual implementation. Then it
can be overridden per-arch using the pci_can_move_bars=false flag or by the
following command line option:

    pci=no_movable_bars

CC: Sam Bobroff <sbobroff@linux.ibm.com>
CC: Rajat Jain <rajatja@google.com>
CC: Lukas Wunner <lukas@wunner.de>
CC: Oliver O'Halloran <oohall@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 .../admin-guide/kernel-parameters.txt         |  1 +
 arch/powerpc/platforms/powernv/pci.c          |  2 +
 arch/powerpc/platforms/pseries/setup.c        |  2 +
 drivers/pci/pci.c                             |  4 +
 drivers/pci/pci.h                             |  2 +
 drivers/pci/probe.c                           | 81 ++++++++++++++++++-
 include/linux/pci.h                           |  6 ++
 7 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ec92120a7952..9f42cf43bf08 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3610,6 +3610,7 @@
 				may put more devices in an IOMMU group.
 		force_floating	[S390] Force usage of floating interrupts.
 		nomio		[S390] Do not use MIO instructions.
+		no_movable_bars	Don't allow BARs to be moved during hotplug
 
 	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
 			Management.
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index c0bea75ac27b..ec772aa31ccf 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -939,6 +939,8 @@ void __init pnv_pci_init(void)
 {
 	struct device_node *np;
 
+	pci_can_move_bars = false;
+
 	pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
 
 	/* If we don't have OPAL, eg. in sim, just skip PCI probe */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0c8421dd01ab..2edb41f55237 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -927,6 +927,8 @@ static void __init pseries_init(void)
 {
 	pr_debug(" -> pseries_init()\n");
 
+	pci_can_move_bars = false;
+
 #ifdef CONFIG_HVC_CONSOLE
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		hvc_vio_init_early();
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cb0042f28e6a..6741c7f111ac 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -79,6 +79,8 @@ static void pci_dev_d3_sleep(struct pci_dev *dev)
 int pci_domains_supported = 1;
 #endif
 
+bool pci_can_move_bars;
+
 #define DEFAULT_CARDBUS_IO_SIZE		(256)
 #define DEFAULT_CARDBUS_MEM_SIZE	(64*1024*1024)
 /* pci=cbmemsize=nnM,cbiosize=nn can override this */
@@ -6477,6 +6479,8 @@ static int __init pci_setup(char *str)
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
 			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
 				disable_acs_redir_param = str + 18;
+			} else if (!strncmp(str, "no_movable_bars", 15)) {
+				pci_can_move_bars = false;
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a0a53bd05a0b..ce8c53ca7a1e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -289,6 +289,8 @@ void pci_disable_bridge_window(struct pci_dev *dev);
 struct pci_bus *pci_bus_get(struct pci_bus *bus);
 void pci_bus_put(struct pci_bus *bus);
 
+bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res);
+
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
 	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2a7e81f146e4..9b6d15587f5d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2935,6 +2935,7 @@ int pci_host_probe(struct pci_host_bridge *bridge)
 	 * or pci_bus_assign_resources().
 	 */
 	if (pci_has_flag(PCI_PROBE_ONLY)) {
+		pci_can_move_bars = false;
 		pci_bus_claim_resources(bus);
 	} else {
 		pci_bus_size_bridges(bus);
@@ -3127,6 +3128,68 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 	return max;
 }
 
+bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
+{
+	struct pci_bus_region region;
+
+	if (!pci_can_move_bars)
+		return false;
+
+	if (res->flags & IORESOURCE_PCI_FIXED)
+		return false;
+
+	if (dev->driver && dev->driver->rescan_prepare)
+		return true;
+
+	/* Workaround for the legacy VGA memory 0xa0000-0xbffff */
+	pcibios_resource_to_bus(dev->bus, &region, res);
+	if (region.start == 0xa0000)
+		return false;
+
+	if (!dev->driver && !res->child)
+		return true;
+
+	return false;
+}
+
+static void pci_bus_rescan_prepare(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (bus->self)
+		pci_config_pm_runtime_get(bus->self);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (child)
+			pci_bus_rescan_prepare(child);
+
+		if (dev->driver &&
+		    dev->driver->rescan_prepare)
+			dev->driver->rescan_prepare(dev);
+	}
+}
+
+static void pci_bus_rescan_done(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (dev->driver &&
+		    dev->driver->rescan_done)
+			dev->driver->rescan_done(dev);
+
+		if (child)
+			pci_bus_rescan_done(child);
+	}
+
+	if (bus->self)
+		pci_config_pm_runtime_put(bus->self);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3139,9 +3202,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
+	struct pci_bus *root = bus;
+
+	while (!pci_is_root_bus(root))
+		root = root->parent;
+
+	if (pci_can_move_bars) {
+		pci_bus_rescan_prepare(root);
+
+		max = pci_scan_child_bus(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		pci_bus_rescan_done(root);
+	} else {
+		max = pci_scan_child_bus(bus);
+		pci_assign_unassigned_bus_resources(bus);
+	}
 
-	max = pci_scan_child_bus(bus);
-	pci_assign_unassigned_bus_resources(bus);
 	pci_bus_add_devices(bus);
 
 	return max;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e473e70834b5..a543f647d337 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -817,6 +817,8 @@ struct module;
  *		e.g. drivers/net/e100.c.
  * @sriov_configure: Optional driver callback to allow configuration of
  *		number of VFs to enable via sysfs "sriov_numvfs" file.
+ * @rescan_prepare: Prepare to BAR movement - called before PCI rescan.
+ * @rescan_done: Remap BARs and restore after PCI rescan.
  * @err_handler: See Documentation/PCI/pci-error-recovery.rst
  * @groups:	Sysfs attribute groups.
  * @driver:	Driver model structure.
@@ -832,6 +834,8 @@ struct pci_driver {
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
+	void (*rescan_prepare)(struct pci_dev *dev);
+	void (*rescan_done)(struct pci_dev *dev);
 	const struct pci_error_handlers *err_handler;
 	const struct attribute_group **groups;
 	struct device_driver	driver;
@@ -1392,6 +1396,8 @@ void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
 
+extern bool pci_can_move_bars;
+
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 04/26] PCI: Add version of release_child_resources() aware of immovable BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (2 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 03/26] PCI: hotplug: Initial support of the movable BARs feature Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 05/26] PCI: hotplug: Fix reassigning the released BARs Sergei Miroshnichenko
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

When using the release_child_resources() to release a bridge window, it
drops the .start field of child's BARs to zero, but with the STARTALIGN
flag remaining set, which makes this resource invalid for re-assignment.

Immovable BARs must preserve their offset and size: those marked with the
PCI_FIXED or which are bound by drivers without support of the movable BARs
feature.

Add the pci_release_child_resources() to replace release_child_resources()
in handling the described PCI-specific cases.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 54 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index f279826204eb..4c464430478f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1500,6 +1500,54 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
 	 IORESOURCE_MEM_64)
 
+/*
+ * Similar to generic release_child_resources(), but aware of immovable BARs and
+ * PCI_FIXED and STARTALIGN flags
+ */
+static void pci_release_child_resources(struct pci_bus *bus, struct resource *r)
+{
+	struct pci_dev *dev;
+
+	if (!bus || !r)
+		return;
+
+	if (r->flags & IORESOURCE_PCI_FIXED)
+		return;
+
+	r->child = NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+			struct resource *tmp = &dev->resource[i];
+			resource_size_t size = resource_size(tmp);
+
+			if (!tmp->flags || tmp->parent != r)
+				continue;
+
+			tmp->parent = NULL;
+			tmp->sibling = NULL;
+
+			pci_release_child_resources(dev->subordinate, tmp);
+
+			tmp->flags &= ~IORESOURCE_STARTALIGN;
+			tmp->flags |= IORESOURCE_SIZEALIGN;
+
+			if (!pci_dev_bar_movable(dev, tmp)) {
+				pci_dbg(dev, "release immovable BAR %d %pR (%s), keep its flags, base and size\n",
+					i, tmp, tmp->name);
+				continue;
+			}
+
+			pci_dbg(dev, "release BAR %d %pR (%s)\n", i, tmp, tmp->name);
+
+			tmp->start = 0;
+			tmp->end = size - 1;
+		}
+	}
+}
+
 static void pci_bridge_release_resources(struct pci_bus *bus,
 					 unsigned long type)
 {
@@ -1540,7 +1588,11 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		return;
 
 	/* If there are children, release them all */
-	release_child_resources(r);
+	if (pci_can_move_bars)
+		pci_release_child_resources(bus, r);
+	else
+		release_child_resources(r);
+
 	if (!release_resource(r)) {
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 05/26] PCI: hotplug: Fix reassigning the released BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (3 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 04/26] PCI: Add version of release_child_resources() aware of immovable BARs Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 06/26] PCI: hotplug: Recalculate every bridge window during rescan Sergei Miroshnichenko
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

Bridge windows are temporarily released during a PCI rescan, and their old
size is not relevant anymore - it will be recreated in pbus_size_*() from
scratch. To make these functions work correctly, set zero size of released
windows.

If BAR assignment fails after a PCI hotplug event, the kernel will retry
with a fall-back reduced configuration, so don't apply reset_resource() for
non-window BARs to keep them valid for the next attempt.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4c464430478f..2cf07003d5fc 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -295,7 +295,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
 						    0 /* don't care */,
 						    0 /* don't care */);
 			}
-			reset_resource(res);
+			if (!pci_can_move_bars ||
+			    idx >= PCI_BRIDGE_RESOURCES)
+				reset_resource(res);
 		}
 	}
 }
@@ -1597,8 +1599,8 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
 			 PCI_BRIDGE_RESOURCES + idx, r);
-		/* Keep the old size */
-		r->end = resource_size(r) - 1;
+		/* Don't keep the old size if the bridge will be recalculated */
+		r->end = pci_can_move_bars ? 0 : (resource_size(r) - 1);
 		r->start = 0;
 		r->flags = 0;
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 06/26] PCI: hotplug: Recalculate every bridge window during rescan
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (4 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 05/26] PCI: hotplug: Fix reassigning the released BARs Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 07/26] PCI: hotplug: Don't allow hot-added devices to steal resources Sergei Miroshnichenko
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

When the movable BARs feature is enabled and a rescan has been requested,
release all the bridge windows and recalculate them from scratch, taking
into account all kinds for BARs: immovable+fixed, movable, new.

Comparing to simply trying to expand bridge windows, this also employs the
PCI ability to shuffle BARs within a bridge, increasing the chances to find
a memory space to fit BARs for newly hotplugged devices, especially if no
(not enough) gaps were reserved by the BIOS/bootloader/firmware.

The last step of writing the recalculated windows to the bridges is done
by the new pci_setup_bridges() function.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  1 +
 drivers/pci/probe.c     | 22 ++++++++++++++++++++++
 drivers/pci/setup-bus.c | 16 ++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ce8c53ca7a1e..fe995993b481 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -283,6 +283,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *realloc_head,
 				struct list_head *fail_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
+void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
 
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 9b6d15587f5d..daaaebdfd4c4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3190,6 +3190,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
 		pci_config_pm_runtime_put(bus->self);
 }
 
+static void pci_setup_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child;
+
+		if (!pci_dev_is_added(dev))
+			continue;
+
+		child = dev->subordinate;
+		if (child)
+			pci_setup_bridges(child);
+	}
+
+	if (bus->self)
+		pci_setup_bridge(bus);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3211,8 +3230,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+
+		pci_bus_release_root_bridge_resources(root);
 		pci_assign_unassigned_root_bus_resources(root);
 
+		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
 	} else {
 		max = pci_scan_child_bus(bus);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 2cf07003d5fc..7a9bf979908d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1654,6 +1654,22 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
 		pci_bridge_release_resources(bus, type);
 }
 
+void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
+{
+	int i;
+	struct resource *r;
+
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus,
+					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
+					 whole_subtree);
+
+	pci_bus_for_each_resource(root_bus, r, i) {
+		pci_release_child_resources(root_bus, r);
+	}
+}
+
 static void pci_bus_dump_res(struct pci_bus *bus)
 {
 	struct resource *res;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 07/26] PCI: hotplug: Don't allow hot-added devices to steal resources
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (5 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 06/26] PCI: hotplug: Recalculate every bridge window during rescan Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 08/26] PCI: hotplug: Try to reassign movable BARs only once Sergei Miroshnichenko
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

When movable BARs are enabled, the PCI subsystem at first releases all the
bridge windows and then attempts to assign resources both to previously
working devices and to the newly hotplugged ones, with the same priority.

If a hotplugged device gets its BARs first, this may lead to lack of space
for already working devices, which is unacceptable. If that happens, mark
one of the new devices with the newly introduced flag PCI_DEV_DISABLED_BARS
(if it is not yet marked) and retry the BAR recalculation.

The worst case would be no BARs for hotplugged devices, while all the rest
just continue working.

The algorithm is simple and it doesn't retry different subsets of hot-added
devices in case of a failure, e.g. if there are no space to allocate BARs
for both hotplugged devices A and B, but is enough for just A, the A will
be marked with PCI_DEV_DISABLED_BARS first, then (after the next failure) -
B. As a result, A will not get BARs while it could. This issue is only
relevant when hotplugging two and more devices simultaneously.

Add a new res_mask bitmask to the struct pci_dev for storing the indices of
assigned BARs.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  11 +++++
 drivers/pci/probe.c     | 102 ++++++++++++++++++++++++++++++++++++++--
 drivers/pci/setup-bus.c |  15 ++++++
 drivers/pci/setup-res.c |   3 ++
 include/linux/pci.h     |   1 +
 5 files changed, 129 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fe995993b481..3b4c982772d3 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -406,6 +406,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DEV_DISABLED_BARS 1
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
@@ -417,6 +418,16 @@ static inline bool pci_dev_is_added(const struct pci_dev *dev)
 	return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
 }
 
+static inline void pci_dev_disable_bars(struct pci_dev *dev)
+{
+	assign_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags, true);
+}
+
+static inline bool pci_dev_bars_enabled(const struct pci_dev *dev)
+{
+	return !test_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags);
+}
+
 #ifdef CONFIG_PCIEAER
 #include <linux/aer.h>
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index daaaebdfd4c4..1e8bbf5c99f6 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3152,6 +3152,23 @@ bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
 	return false;
 }
 
+static unsigned int pci_dev_count_res_mask(struct pci_dev *dev)
+{
+	unsigned int res_mask = 0;
+	int i;
+
+	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+			continue;
+
+		res_mask |= (1 << i);
+	}
+
+	return res_mask;
+}
+
 static void pci_bus_rescan_prepare(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3162,6 +3179,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child = dev->subordinate;
 
+		dev->res_mask = pci_dev_count_res_mask(dev);
+
 		if (child)
 			pci_bus_rescan_prepare(child);
 
@@ -3197,7 +3216,7 @@ static void pci_setup_bridges(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child;
 
-		if (!pci_dev_is_added(dev))
+		if (!pci_dev_is_added(dev) || !pci_dev_bars_enabled(dev))
 			continue;
 
 		child = dev->subordinate;
@@ -3209,6 +3228,83 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (!bus)
+		return NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child_bus = dev->subordinate;
+
+		if (child_bus) {
+			struct pci_dev *next_new_dev;
+
+			next_new_dev = pci_find_next_new_device(child_bus);
+			if (next_new_dev)
+				return next_new_dev;
+		}
+
+		if (!pci_dev_is_added(dev) && pci_dev_bars_enabled(dev))
+			return dev;
+	}
+
+	return NULL;
+}
+
+static bool pci_bus_check_all_bars_reassigned(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	bool ret = true;
+
+	if (!bus)
+		return false;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+		unsigned int res_mask = pci_dev_count_res_mask(dev);
+
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
+		if (dev->res_mask & ~res_mask) {
+			pci_err(dev, "Non-re-enabled resources found: 0x%x -> 0x%x\n",
+				dev->res_mask, res_mask);
+			ret = false;
+		}
+
+		if (child && !pci_bus_check_all_bars_reassigned(child))
+			ret = false;
+	}
+
+	return ret;
+}
+
+static void pci_reassign_root_bus_resources(struct pci_bus *root)
+{
+	do {
+		struct pci_dev *next_new_dev;
+
+		pci_assign_unassigned_root_bus_resources(root);
+
+		if (pci_bus_check_all_bars_reassigned(root))
+			break;
+
+		next_new_dev = pci_find_next_new_device(root);
+		if (!next_new_dev) {
+			dev_err(&root->dev, "failed to re-assign resources even after ignoring all the hotplugged devices\n");
+			break;
+		}
+
+		dev_warn(&root->dev, "failed to re-assign resources, disable the next hotplugged device %s and retry\n",
+			 dev_name(&next_new_dev->dev));
+
+		pci_dev_disable_bars(next_new_dev);
+		pci_bus_release_root_bridge_resources(root);
+	} while (true);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3228,11 +3324,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 	if (pci_can_move_bars) {
 		pci_bus_rescan_prepare(root);
+		pci_bus_release_root_bridge_resources(root);
 
 		max = pci_scan_child_bus(root);
 
-		pci_bus_release_root_bridge_resources(root);
-		pci_assign_unassigned_root_bus_resources(root);
+		pci_reassign_root_bus_resources(root);
 
 		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7a9bf979908d..b9521ca8b24c 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -128,6 +128,9 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
 	int i;
 
+	if (!pci_dev_bars_enabled(dev))
+		return;
+
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *r;
 		struct pci_dev_resource *dev_res, *tmp;
@@ -177,6 +180,9 @@ static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
 	u16 class = dev->class >> 8;
 
+	if (!pci_dev_bars_enabled(dev))
+		return;
+
 	/* Don't touch classless devices or host bridges or IOAPICs */
 	if (class == PCI_CLASS_NOT_DEFINED || class == PCI_CLASS_BRIDGE_HOST)
 		return;
@@ -278,6 +284,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
 	int idx;
 
 	list_for_each_entry(dev_res, head, list) {
+		if (!pci_dev_bars_enabled(dev_res->dev))
+			continue;
+
 		res = dev_res->res;
 		idx = res - &dev_res->dev->resource[0];
 		if (resource_size(res) &&
@@ -1012,6 +1021,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
 
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
 		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
@@ -1368,6 +1380,9 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 	pbus_assign_resources_sorted(bus, realloc_head, fail_head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (!pci_dev_bars_enabled(dev))
+			continue;
+
 		pdev_assign_fixed_resources(dev);
 
 		b = dev->subordinate;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index d8ca40a97693..17c79057f517 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -458,6 +458,9 @@ int pci_enable_resources(struct pci_dev *dev, int mask)
 	int i;
 	struct resource *r;
 
+	if (!pci_dev_bars_enabled(dev))
+		return -ENOSPC;
+
 	pci_read_config_word(dev, PCI_COMMAND, &cmd);
 	old_cmd = cmd;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a543f647d337..a8f2c26757fe 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -375,6 +375,7 @@ struct pci_dev {
 	 */
 	unsigned int	irq;
 	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
+	unsigned int	res_mask;		/* Bitmask of assigned resources */
 
 	bool		match_driver;		/* Skip attaching driver */
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 08/26] PCI: hotplug: Try to reassign movable BARs only once
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (6 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 07/26] PCI: hotplug: Don't allow hot-added devices to steal resources Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows Sergei Miroshnichenko
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

With enabled BAR movement, BARs and bridge windows can only be assigned to
their direct parents, so there can be only one variant of resource tree,
thus every retry within the pci_assign_unassigned_root_bus_resources() will
result in the same tree, and it is enough to try just once.

In case of failures the pci_reassign_root_bus_resources() disables BARs for
one of the hotplugged devices and tries the assignment again.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b9521ca8b24c..e87fefa12f62 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1827,6 +1827,13 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	int pci_try_num = 1;
 	enum enable_type enable_local;
 
+	if (pci_can_move_bars) {
+		__pci_bus_size_bridges(bus, NULL);
+		__pci_bus_assign_resources(bus, NULL, NULL);
+
+		goto dump;
+	}
+
 	/* Don't realloc if asked to do so */
 	enable_local = pci_realloc_detect(bus, pci_realloc_enable);
 	if (pci_realloc_enabled(enable_local)) {
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (7 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 08/26] PCI: hotplug: Try to reassign movable BARs only once Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-30 21:26   ` Bjorn Helgaas
  2020-01-29 15:29 ` [PATCH v7 10/26] PCI: Include fixed and immovable BARs into the bus size calculating Sergei Miroshnichenko
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

When movable BARs are enabled, and if a bridge contains a device with fixed
(IORESOURCE_PCI_FIXED) or immovable BARs, the corresponding windows can't
be moved too far away from their original positions - they must still
contain all the fixed/immovable BARs, like that:

1) Window position before a bus rescan:

   | <--                    root bridge window                        --> |
   |                                                                      |
   | | <--     bridge window    --> |                                     |
   | | movable BARs | **fixed BAR** |                                     |
       ^^^^^^^^^^^^

2) Possible valid outcome after rescan and move:

   | <--                    root bridge window                        --> |
   |                                                                      |
   |                | <--     bridge window    --> |                      |
   |                | **fixed BAR** | Movable BARs |                      |
                                      ^^^^^^^^^^^^

An immovable area of a bridge window is a range that covers all the fixed
and immovable BARs of direct children, and all the fixed area of children
bridges:

   | <--                    root bridge window                        --> |
   |                                                                      |
   |  | <--                  bridge window level 1                --> |   |
   |  | ******************** immovable area *******************       |   |
   |  |                                                               |   |
   |  | **fixed BAR** | <--      bridge window level 2     --> | BARs |   |
   |  |               | *********** immovable area *********** |      |   |
   |  |               |                                        |      |   |
   |  |               | **fixed BAR** |  BARs  | **fixed BAR** |      |   |
                                         ^^^^

To store these areas, the .immovable_range field has been added to struct
pci_bus for every bridge window type: IO, MEM and PREFETCH. It is filled
recursively from leaves to the root before a rescan.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h   | 14 ++++++++
 drivers/pci/probe.c | 88 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  6 ++++
 3 files changed, 108 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3b4c982772d3..5f2051c8531c 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -404,6 +404,20 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 	return dev->error_state == pci_channel_io_perm_failure;
 }
 
+static inline int pci_get_bridge_resource_idx(struct resource *r)
+{
+	int idx = 1;
+
+	if (r->flags & IORESOURCE_IO)
+		idx = 0;
+	else if (!(r->flags & IORESOURCE_PREFETCH))
+		idx = 1;
+	else if (r->flags & IORESOURCE_MEM_64)
+		idx = 2;
+
+	return idx;
+}
+
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DEV_DISABLED_BARS 1
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1e8bbf5c99f6..bb584038d5b4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -546,6 +546,7 @@ void pci_read_bridge_bases(struct pci_bus *child)
 static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 {
 	struct pci_bus *b;
+	int idx;
 
 	b = kzalloc(sizeof(*b), GFP_KERNEL);
 	if (!b)
@@ -562,6 +563,11 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 	if (parent)
 		b->domain_nr = parent->domain_nr;
 #endif
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		b->immovable_range[idx].start = 0;
+		b->immovable_range[idx].end = 0;
+	}
+
 	return b;
 }
 
@@ -3228,6 +3234,87 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static void pci_bus_update_immovable_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int idx;
+	resource_size_t start, end;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		bus->immovable_range[idx].start = 0;
+		bus->immovable_range[idx].end = 0;
+	}
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_immovable_range(dev->subordinate);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+		struct pci_bus *child = dev->subordinate;
+
+		for (i = 0; i < PCI_BRIDGE_RESOURCES; ++i) {
+			struct resource *r = &dev->resource[i];
+
+			if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+				continue;
+
+			if (!pci_dev_bar_movable(dev, r)) {
+				idx = pci_get_bridge_resource_idx(r);
+				start = bus->immovable_range[idx].start;
+				end = bus->immovable_range[idx].end;
+
+				if (!start || start > r->start)
+					start = r->start;
+				if (end < r->end)
+					end = r->end;
+
+				if (bus->immovable_range[idx].start != start ||
+				    bus->immovable_range[idx].end != end) {
+					dev_dbg(&bus->dev, "Found fixed BAR%d 0x%llx-0x%llx in %s, expand the fixed bridge window %d to 0x%llx-0x%llx\n",
+						i,
+						(unsigned long long)r->start,
+						(unsigned long long)r->end,
+						dev_name(&dev->dev), idx,
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->immovable_range[idx].start = start;
+					bus->immovable_range[idx].end = end;
+				}
+			}
+		}
+
+		if (child) {
+			for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+				struct resource *child_immovable_range =
+					&child->immovable_range[idx];
+
+				if (child_immovable_range->start >=
+				    child_immovable_range->end)
+					continue;
+
+				start = bus->immovable_range[idx].start;
+				end = bus->immovable_range[idx].end;
+
+				if (!start || start > child_immovable_range->start)
+					start = child_immovable_range->start;
+				if (end < child_immovable_range->end)
+					end = child_immovable_range->end;
+
+				if (start < bus->immovable_range[idx].start ||
+				    end > bus->immovable_range[idx].end) {
+					dev_dbg(&bus->dev, "Expand the fixed bridge window %d from %s to 0x%llx-0x%llx\n",
+						idx, dev_name(&child->dev),
+						(unsigned long long)start,
+						(unsigned long long)end);
+					bus->immovable_range[idx].start = start;
+					bus->immovable_range[idx].end = end;
+				}
+			}
+		}
+	}
+}
+
 static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3324,6 +3411,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 	if (pci_can_move_bars) {
 		pci_bus_rescan_prepare(root);
+		pci_bus_update_immovable_range(root);
 		pci_bus_release_root_bridge_resources(root);
 
 		max = pci_scan_child_bus(root);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a8f2c26757fe..d6eb498f7997 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -582,6 +582,12 @@ struct pci_bus {
 	struct list_head resources;	/* Address space routed to this bus */
 	struct resource busn_res;	/* Bus numbers routed to this bus */
 
+	/*
+	 * If there are fixed or immovable resources in the bridge window, this range
+	 * contains the lowest start address and highest end address of them.
+	 */
+	struct resource immovable_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 10/26] PCI: Include fixed and immovable BARs into the bus size calculating
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (8 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 11/26] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows Sergei Miroshnichenko
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

The only difference between the fixed/immovable and movable BARs is a size
and offset preservation after they are released (the corresponding struct
resource* detached from a bridge window for a while during a bus rescan).

Include fixed/immovable BARs into result of pbus_size_mem().

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index e87fefa12f62..5874c352b41e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -891,6 +891,11 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	resource_size_t children_add_size = 0;
 	resource_size_t min_align, align;
 
+	resource_size_t fixed_start = bus->immovable_range[0].start;
+	resource_size_t fixed_end = bus->immovable_range[0].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
 	if (!b_res)
 		return;
 
@@ -898,6 +903,9 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	if (b_res->parent)
 		return;
 
+	if (min_size < fixed_size)
+		min_size = fixed_size;
+
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
@@ -1006,6 +1014,15 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
 	resource_size_t add_align = 0;
+	int idx = pci_get_bridge_resource_idx(b_res);
+
+	resource_size_t fixed_start = bus->immovable_range[idx].start;
+	resource_size_t fixed_end = bus->immovable_range[idx].end;
+	resource_size_t fixed_size = (fixed_start < fixed_end) ?
+		(fixed_end - fixed_start + 1) : 0;
+
+	if (min_size < fixed_size)
+		min_size = fixed_size;
 
 	if (!b_res)
 		return -ENOSPC;
@@ -1028,12 +1045,22 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
 
-			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+			if (r->parent ||
 			    ((r->flags & mask) != type &&
 			     (r->flags & mask) != type2 &&
 			     (r->flags & mask) != type3))
 				continue;
 			r_size = resource_size(r);
+
+			if (pci_can_move_bars && !pci_dev_bar_movable(dev, r)) {
+				size += r_size;
+
+				continue;
+			} else if (!pci_can_move_bars &&
+				   (r->flags & IORESOURCE_PCI_FIXED)) {
+				continue;
+			}
+
 #ifdef CONFIG_PCI_IOV
 			/* Put SRIOV requested res to the optional list */
 			if (realloc_head && i >= PCI_IOV_RESOURCES &&
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 11/26] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (9 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 10/26] PCI: Include fixed and immovable BARs into the bus size calculating Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 12/26] PCI: Make sure bridge windows include their fixed BARs Sergei Miroshnichenko
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

With enabled movable BARs, bridge windows are recalculated during each PCI
rescan. Some of the BARs below a bridge may be immovable: these areas are
represented by the .immovable_range field in struct pci_bus.

If a bridge window size is equal to its immovable range, it can only be
assigned to the start of this range. But if a bridge window size is larger,
and this difference in size is denoted as "delta", the window can start
from (immovable_range.start - delta) to (immovable_range.start), and it can
end from (immovable_range.end) to (immovable_range.end + delta). This range
(the new .realloc_range field in struct pci_bus) must then be compared with
immovable ranges of neighbouring bridges to guarantee absence of
intersections.

This patch only calculates valid ranges for reallocated bridges during pci
rescan, and the next one will make use of these values during allocation.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  1 +
 drivers/pci/setup-bus.c | 86 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |  6 +++
 3 files changed, 93 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5f2051c8531c..ddd6fd33f9c3 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -291,6 +291,7 @@ struct pci_bus *pci_bus_get(struct pci_bus *bus);
 void pci_bus_put(struct pci_bus *bus);
 
 bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res);
+void pci_bus_update_realloc_range(struct pci_bus *bus);
 
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 5874c352b41e..71babb968830 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1837,6 +1837,91 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
 }
 #endif
 
+/*
+ * Calculate the address margins where the bridge windows may be allocated to fit all
+ * the fixed and immovable BARs beneath.
+ */
+void pci_bus_update_realloc_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	struct pci_bus *parent = bus->parent;
+	int idx;
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_realloc_range(dev->subordinate);
+
+	if (!parent || !bus->self)
+		return;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		struct resource *immovable_range = &bus->immovable_range[idx];
+		resource_size_t window_size = resource_size(bus->resource[idx]);
+		resource_size_t realloc_start, realloc_end;
+
+		bus->realloc_range[idx].start = 0;
+		bus->realloc_range[idx].end = 0;
+
+		/* Check if there any immovable BARs under the bridge */
+		if (immovable_range->start >= immovable_range->end)
+			continue;
+
+		/* The lowest possible address where the bridge window can start */
+		realloc_start = immovable_range->end - window_size + 1;
+		/* The highest possible address where the bridge window can end */
+		realloc_end = immovable_range->start + window_size - 1;
+
+		if (realloc_start > immovable_range->start)
+			realloc_start = immovable_range->start;
+
+		if (realloc_end < immovable_range->end)
+			realloc_end = immovable_range->end;
+
+		/*
+		 * Check that realloc range doesn't intersect with hard fixed ranges
+		 * of neighboring bridges
+		 */
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			struct pci_bus *neighbor = dev->subordinate;
+			struct resource *n_imm_range;
+
+			if (!neighbor) {
+				int i;
+
+				for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; ++i) {
+					struct resource *nr = &dev->resource[i];
+
+					if (!nr->flags ||
+					    !nr->start ||
+					    pci_dev_bar_movable(dev, nr))
+						continue;
+
+					if (nr->end < immovable_range->start &&
+					    nr->end > realloc_start)
+						realloc_start = nr->end;
+				}
+
+				continue;
+			}
+
+			if (neighbor == bus)
+				continue;
+
+			n_imm_range = &neighbor->immovable_range[idx];
+
+			if (n_imm_range->start >= n_imm_range->end)
+				continue;
+
+			if (n_imm_range->end < immovable_range->start &&
+			    n_imm_range->end > realloc_start)
+				realloc_start = n_imm_range->end;
+		}
+
+		bus->realloc_range[idx].start = realloc_start;
+		bus->realloc_range[idx].end = realloc_end;
+	}
+}
+
 /*
  * First try will not touch PCI bridge res.
  * Second and later try will clear small leaf bridge res.
@@ -1856,6 +1941,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 
 	if (pci_can_move_bars) {
 		__pci_bus_size_bridges(bus, NULL);
+		pci_bus_update_realloc_range(bus);
 		__pci_bus_assign_resources(bus, NULL, NULL);
 
 		goto dump;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d6eb498f7997..8830a254405f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -588,6 +588,12 @@ struct pci_bus {
 	 */
 	struct resource immovable_range[PCI_BRIDGE_RESOURCE_NUM];
 
+	/*
+	 * Acceptable address range, where the bridge window may reside, considering its
+	 * size, so it will cover all the fixed and immovable BARs below.
+	 */
+	struct resource realloc_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 12/26] PCI: Make sure bridge windows include their fixed BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (10 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 11/26] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 13/26] PCI: hotplug: Add support of immovable BARs to pci_assign_resource() Sergei Miroshnichenko
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

When choosing a start address for a bridge window, it should be not just a
lowest possible address: this window must cover every underlying immovable
BAR. The lowest address that satisfies this requirement is the
.realloc_range field of struct pci_bus.

After allocating a bridge window, validate that it covers all its immovable
BARs: this range is put to the .immovable_range field of struct pci_bus.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/bus.c       |  2 +-
 drivers/pci/setup-res.c | 31 +++++++++++++++++++++++++++++--
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 8e40b3e6da77..a1efa87e31b9 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -192,7 +192,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 		 * this is an already-configured bridge window, its start
 		 * overrides "min".
 		 */
-		if (avail.start)
+		if (min_used < avail.start)
 			min_used = avail.start;
 
 		max = avail.end;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 17c79057f517..3582ae68840b 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -248,9 +248,23 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	struct resource *res = dev->resource + resno;
 	resource_size_t min;
 	int ret;
+	resource_size_t start = (resource_size_t)-1;
+	resource_size_t end = 0;
 
 	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
+	if (pci_can_move_bars && dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
+		struct pci_bus *child_bus = dev->subordinate;
+		int b_resno = resno - PCI_BRIDGE_RESOURCES;
+		struct resource *immovable_range = &child_bus->immovable_range[b_resno];
+
+		if (immovable_range->start < immovable_range->end) {
+			start = immovable_range->start;
+			end = immovable_range->end;
+			min = child_bus->realloc_range[b_resno].start;
+		}
+	}
+
 	/*
 	 * First, try exact prefetching match.  Even if a 64-bit
 	 * prefetchable bridge window is below 4GB, we can't put a 32-bit
@@ -262,7 +276,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 				     IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
 				     pcibios_align_resource, dev);
 	if (ret == 0)
-		return 0;
+		goto check_fixed;
 
 	/*
 	 * If the prefetchable window is only 32 bits wide, we can put
@@ -274,7 +288,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 					     IORESOURCE_PREFETCH,
 					     pcibios_align_resource, dev);
 		if (ret == 0)
-			return 0;
+			goto check_fixed;
 	}
 
 	/*
@@ -287,6 +301,19 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 		ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
 					     pcibios_align_resource, dev);
 
+check_fixed:
+	if (pci_can_move_bars && ret == 0 && start < end) {
+		if (res->start > start || res->end < end) {
+			dev_err(&bus->dev, "fixed area 0x%llx-0x%llx for %s doesn't fit in the allocated %pR (0x%llx-0x%llx)",
+				(unsigned long long)start, (unsigned long long)end,
+				dev_name(&dev->dev),
+				res, (unsigned long long)res->start,
+				(unsigned long long)res->end);
+			release_resource(res);
+			return -1;
+		}
+	}
+
 	return ret;
 }
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 13/26] PCI: hotplug: Add support of immovable BARs to pci_assign_resource()
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (11 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 12/26] PCI: Make sure bridge windows include their fixed BARs Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 14/26] PCI: hotplug: Sort immovable BARs before assignment Sergei Miroshnichenko
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

The PCI_FIXED and other immovable BARs must be assigned after the bridge
windows of parent bridge and before "usual" BARs, but currently they are
assigned the last by the pdev_assign_fixed_resources().

Let the immovable BARs be handled by pci_assign_resource() in the same way
as it does for movable ones, assigning them in correct order, unifying the
code.

Allow matching IORESOURCE_PCI_FIXED prefetchable BARs to non-prefetchable
windows, so they follow the same rules as immovable BARs.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  2 ++
 drivers/pci/setup-bus.c | 25 ++++++++++++++++++-------
 drivers/pci/setup-res.c |  5 ++++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ddd6fd33f9c3..3ffbee2c3243 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -293,6 +293,8 @@ void pci_bus_put(struct pci_bus *bus);
 bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res);
 void pci_bus_update_realloc_range(struct pci_bus *bus);
 
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r);
+
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
 	((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 71babb968830..cdbf4afe1334 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1356,21 +1356,31 @@ void pci_bus_size_bridges(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_size_bridges);
 
-static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 {
 	int i;
 	struct resource *parent_r;
-	unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
-			     IORESOURCE_PREFETCH;
+	unsigned long mask = IORESOURCE_TYPE_BITS;
 
 	pci_bus_for_each_resource(b, parent_r, i) {
 		if (!parent_r)
 			continue;
 
-		if ((r->flags & mask) == (parent_r->flags & mask) &&
-		    resource_contains(parent_r, r))
-			request_resource(parent_r, r);
+		if ((r->flags & mask) != (parent_r->flags & mask))
+			continue;
+
+		if (parent_r->flags & IORESOURCE_PREFETCH &&
+		    !(r->flags & IORESOURCE_PREFETCH))
+			continue;
+
+		if (resource_contains(parent_r, r)) {
+			if (!request_resource(parent_r, r))
+				return 0;
+		}
 	}
+
+	dev_err(&b->dev, "failed to assign immovable %pR\n", r);
+	return -EBUSY;
 }
 
 /*
@@ -1410,7 +1420,8 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 		if (!pci_dev_bars_enabled(dev))
 			continue;
 
-		pdev_assign_fixed_resources(dev);
+		if (!pci_can_move_bars)
+			pdev_assign_fixed_resources(dev);
 
 		b = dev->subordinate;
 		if (!b)
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 3582ae68840b..a7d81816d1ea 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -339,7 +339,10 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
 	resource_size_t align, size;
 	int ret;
 
-	if (res->flags & IORESOURCE_PCI_FIXED)
+	if (pci_can_move_bars && !pci_dev_bar_movable(dev, res) &&
+	    resno < PCI_ROM_RESOURCE && res->start)
+		return assign_fixed_resource_on_bus(dev->bus, res);
+	else if (!pci_can_move_bars && res->flags & IORESOURCE_PCI_FIXED)
 		return 0;
 
 	res->flags |= IORESOURCE_UNSET;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 14/26] PCI: hotplug: Sort immovable BARs before assignment
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (12 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 13/26] PCI: hotplug: Add support of immovable BARs to pci_assign_resource() Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 15/26] PCI: hotplug: Enable the movable BARs feature by default Sergei Miroshnichenko
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

Immovable BARs and bridge windows containing immovable BARs must be
assigned before the "usual" ones.

When assigning an immovable BAR/bridge window, its start address is chosen
to be the lowest possible. To prevent conflicts, sort such resources based
on the start address of their immovable areas.

Add support of immovable BARs and bridge windows to pdev_sort_resources().

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 66 +++++++++++++++++++++++++++++++++++------
 1 file changed, 57 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index cdbf4afe1334..573d5c67c136 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -122,9 +122,9 @@ static resource_size_t get_res_add_align(struct list_head *head,
 	return dev_res ? dev_res->min_align : 0;
 }
 
-
 /* Sort resources by alignment */
-static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
+static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head,
+				struct list_head *head_immovables)
 {
 	int i;
 
@@ -136,17 +136,31 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		struct pci_dev_resource *dev_res, *tmp;
 		resource_size_t r_align;
 		struct list_head *n;
+		struct resource *fixed_res = NULL;
 
 		r = &dev->resource[i];
 
-		if (r->flags & IORESOURCE_PCI_FIXED)
+		if (!(r->flags) || r->parent)
 			continue;
 
-		if (!(r->flags) || r->parent)
+		if (pci_can_move_bars) {
+			if (i >= PCI_BRIDGE_RESOURCES &&
+			    dev->subordinate) {
+				int idx = i - PCI_BRIDGE_RESOURCES;
+
+				fixed_res = &dev->subordinate->immovable_range[idx];
+			} else if (!pci_dev_bar_movable(dev, r)) {
+				fixed_res = r;
+			}
+
+			if (fixed_res && fixed_res->start >= fixed_res->end)
+				fixed_res = NULL;
+		} else if (r->flags & IORESOURCE_PCI_FIXED) {
 			continue;
+		}
 
 		r_align = pci_resource_alignment(dev, r);
-		if (!r_align) {
+		if (!r_align && !fixed_res) {
 			pci_warn(dev, "BAR %d: %pR has bogus alignment\n",
 				 i, r);
 			continue;
@@ -158,6 +172,30 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		tmp->res = r;
 		tmp->dev = dev;
 
+		if (fixed_res) {
+			n = head_immovables;
+			list_for_each_entry(dev_res, head_immovables, list) {
+				struct resource *c_fixed_res = NULL;
+				int c_resno = dev_res->res - dev_res->dev->resource;
+				int br_idx = c_resno - PCI_BRIDGE_RESOURCES;
+				struct pci_bus *cbus = dev_res->dev->subordinate;
+
+				if (br_idx >= 0)
+					c_fixed_res = &cbus->immovable_range[br_idx];
+				else
+					c_fixed_res = dev_res->res;
+
+				if (fixed_res->start < c_fixed_res->start) {
+					n = &dev_res->list;
+					break;
+				}
+			}
+			/* Insert it just before n */
+			list_add_tail(&tmp->list, n);
+
+			continue;
+		}
+
 		/* Fallback is smallest one or list is empty */
 		n = head;
 		list_for_each_entry(dev_res, head, list) {
@@ -176,7 +214,8 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 	}
 }
 
-static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
+static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head,
+				 struct list_head *head_immovables)
 {
 	u16 class = dev->class >> 8;
 
@@ -195,7 +234,7 @@ static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
 			return;
 	}
 
-	pdev_sort_resources(dev, head);
+	pdev_sort_resources(dev, head, head_immovables);
 }
 
 static inline void reset_resource(struct resource *res)
@@ -493,8 +532,13 @@ static void pdev_assign_resources_sorted(struct pci_dev *dev,
 					 struct list_head *fail_head)
 {
 	LIST_HEAD(head);
+	LIST_HEAD(head_immovables);
+
+	__dev_sort_resources(dev, &head, &head_immovables);
+
+	if (!list_empty(&head_immovables))
+		__assign_resources_sorted(&head_immovables, NULL, NULL);
 
-	__dev_sort_resources(dev, &head);
 	__assign_resources_sorted(&head, add_head, fail_head);
 
 }
@@ -505,9 +549,13 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
 {
 	struct pci_dev *dev;
 	LIST_HEAD(head);
+	LIST_HEAD(head_immovables);
 
 	list_for_each_entry(dev, &bus->devices, bus_list)
-		__dev_sort_resources(dev, &head);
+		__dev_sort_resources(dev, &head, &head_immovables);
+
+	if (!list_empty(&head_immovables))
+		__assign_resources_sorted(&head_immovables, NULL, NULL);
 
 	__assign_resources_sorted(&head, realloc_head, fail_head);
 }
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 15/26] PCI: hotplug: Enable the movable BARs feature by default
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (13 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 14/26] PCI: hotplug: Sort immovable BARs before assignment Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM Sergei Miroshnichenko
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

This is the last patch in the series which implements the essentials of the
movable BARs feature, so it is turned on by default now. Tested on Intel
and AMD machines with "pci=pcie_bus_peer2peer" command line argument.

In case of problems it is still can be overridden by the following command
line option:

  pcie_movable_bars=off

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6741c7f111ac..db8cfec2baff 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -79,7 +79,7 @@ static void pci_dev_d3_sleep(struct pci_dev *dev)
 int pci_domains_supported = 1;
 #endif
 
-bool pci_can_move_bars;
+bool pci_can_move_bars = true;
 
 #define DEFAULT_CARDBUS_IO_SIZE		(256)
 #define DEFAULT_CARDBUS_MEM_SIZE	(64*1024*1024)
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (14 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 15/26] PCI: hotplug: Enable the movable BARs feature by default Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-30 23:52   ` Bjorn Helgaas
  2020-01-29 15:29 ` [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader Sergei Miroshnichenko
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

BARs and bridge windows are only allowed to be assigned to their
parent bus's bridge windows, going up to the root complex's resources.
So additional limitations on BAR address are not needed, and the
PCIBIOS_MIN_MEM can be ignored.

Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS 1.3 on
Supermicro H11SSL-i via e820__setup_pci_gap():

  [mem 0xebff1000-0xfe9fffff] available for PCI devices

is only suitable for a single RC out of four:

  pci_bus 0000:00: root bus resource [mem 0xec000000-0xefffffff window]
  pci_bus 0000:20: root bus resource [mem 0xeb800000-0xebefffff window]
  pci_bus 0000:40: root bus resource [mem 0xeb200000-0xeb5fffff window]
  pci_bus 0000:60: root bus resource [mem 0xe8b00000-0xeaffffff window]

, which makes the AMD EPYC 7251 unable to boot with this movable BARs
patchset.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-res.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index a7d81816d1ea..4043aab021dd 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -246,12 +246,13 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 		int resno, resource_size_t size, resource_size_t align)
 {
 	struct resource *res = dev->resource + resno;
-	resource_size_t min;
+	resource_size_t min = 0;
 	int ret;
 	resource_size_t start = (resource_size_t)-1;
 	resource_size_t end = 0;
 
-	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
+	if (!pci_can_move_bars)
+		min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
 	if (pci_can_move_bars && dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
 		struct pci_bus *child_bus = dev->subordinate;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (15 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-31 20:31   ` Bjorn Helgaas
  2020-01-29 15:29 ` [PATCH v7 18/26] PCI: Treat VGA BARs as immovable Sergei Miroshnichenko
                   ` (9 subsequent siblings)
  26 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

BAR allocation by BIOS/UEFI/bootloader/firmware may be non-optimal and
it may even clash with the kernel's BAR assignment algorithm.

For example, sometimes BIOS doesn't reserve space for SR-IOV BARs, and
this bridge window can neither extend (blocked by immovable BARs) nor
move (the device itself is immovable).

With this patch the kernel will use its own methods of BAR allocating
when possible, increasing the chances of successful boot and hotplug.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bb584038d5b4..f8f643dac6d1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 			 pos, (unsigned long long)region.start);
 	}
 
+	if (pci_can_move_bars && res->start && !(res->flags & IORESOURCE_IO)) {
+		pci_warn(dev, "ignore the current offset of BAR %llx-%llx\n",
+			 l64, l64 + sz64 - 1);
+		res->start = 0;
+		res->end = sz64 - 1;
+		res->flags |= IORESOURCE_SIZEALIGN;
+	}
+
 	goto out;
 
 
@@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus *child)
 		child->resource[i] = &dev->resource[PCI_BRIDGE_RESOURCES+i];
 
 	pci_read_bridge_io(child);
-	pci_read_bridge_mmio(child);
-	pci_read_bridge_mmio_pref(child);
+	if (!pci_can_move_bars) {
+		pci_read_bridge_mmio(child);
+		pci_read_bridge_mmio_pref(child);
+	}
 
 	if (dev->transparent) {
 		pci_bus_for_each_resource(child->parent, res, i) {
@@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge *bridge)
 		pci_bus_claim_resources(bus);
 	} else {
 		pci_bus_size_bridges(bus);
+		if (pci_can_move_bars)
+			pci_bus_update_realloc_range(bus);
 		pci_bus_assign_resources(bus);
 
 		list_for_each_entry(child, &bus->children, node)
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 18/26] PCI: Treat VGA BARs as immovable
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (16 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 19/26] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan Sergei Miroshnichenko
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

Some framebuffer drivers (efifb) don't act as a PCI driver (like nouveau),
but take information about BARs indirectly - from BIOS for example. This
makes them vulnerable to BAR movement during boot, when setting reported by
BIOS/bootloader differs from new addresses assigned by the kernel.

Until every such driver is aware of movable BARs, mark every VGA BAR as
immovable. Perhaps this is also useful for splash screens, so they don't
flicker.

This makes some BARs and bridge windows immovable during boot, so update
the parent's struct pci_bus->immovable_range when encountered.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f8f643dac6d1..b810d28ebf96 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -164,6 +164,20 @@ static inline unsigned long decode_bar(struct pci_dev *dev, u32 bar)
 
 #define PCI_COMMAND_DECODE_ENABLE	(PCI_COMMAND_MEMORY | PCI_COMMAND_IO)
 
+static void expand_immovable_range(struct pci_bus *bus, struct resource *res, int idx)
+{
+	struct resource *immovable_range = &bus->immovable_range[idx];
+
+	if (!immovable_range->start || immovable_range->start > res->start)
+		immovable_range->start = res->start;
+
+	if (immovable_range->end < res->end)
+		immovable_range->end = res->end;
+
+	if (bus->parent)
+		expand_immovable_range(bus->parent, immovable_range, idx);
+}
+
 /**
  * pci_read_base - Read a PCI BAR
  * @dev: the PCI device
@@ -307,11 +321,16 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 	}
 
 	if (pci_can_move_bars && res->start && !(res->flags & IORESOURCE_IO)) {
-		pci_warn(dev, "ignore the current offset of BAR %llx-%llx\n",
-			 l64, l64 + sz64 - 1);
-		res->start = 0;
-		res->end = sz64 - 1;
-		res->flags |= IORESOURCE_SIZEALIGN;
+		if ((dev->class >> 8) == PCI_CLASS_DISPLAY_VGA) {
+			expand_immovable_range(dev->bus, res,
+					       pci_get_bridge_resource_idx(res));
+		} else {
+			pci_warn(dev, "ignore the current offset of BAR %llx-%llx\n",
+				 l64, l64 + sz64 - 1);
+			res->start = 0;
+			res->end = sz64 - 1;
+			res->flags |= IORESOURCE_SIZEALIGN;
+		}
 	}
 
 	goto out;
@@ -3164,6 +3183,11 @@ bool pci_dev_bar_movable(struct pci_dev *dev, struct resource *res)
 	if (region.start == 0xa0000)
 		return false;
 
+	if (res->start &&
+	    !(res->flags & IORESOURCE_IO) &&
+	    (dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+		return false;
+
 	if (!dev->driver && !res->child)
 		return true;
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 19/26] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (17 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 18/26] PCI: Treat VGA BARs as immovable Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs Sergei Miroshnichenko
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

Assure that MPS settings are set up for bridges which are discovered during
manually triggered rescan via sysfs. This sequence of bridge init (using
pci_rescan_bus()) will be used for pciehp hot-add events when BARs are
movable.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b810d28ebf96..5f809ddd72b4 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3440,7 +3440,7 @@ static void pci_reassign_root_bus_resources(struct pci_bus *root)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
-	struct pci_bus *root = bus;
+	struct pci_bus *root = bus, *child;
 
 	while (!pci_is_root_bus(root))
 		root = root->parent;
@@ -3461,6 +3461,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_assign_unassigned_bus_resources(bus);
 	}
 
+	list_for_each_entry(child, &root->children, node)
+		pcie_bus_configure_settings(child);
+
 	pci_bus_add_devices(bus);
 
 	return max;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (18 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 19/26] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-30 14:39   ` Rafael J. Wysocki
  2020-01-30 21:14   ` Bjorn Helgaas
  2020-01-29 15:29 ` [PATCH v7 21/26] PCI: hotplug: Don't disable the released bridge windows immediately Sergei Miroshnichenko
                   ` (6 subsequent siblings)
  26 siblings, 2 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko,
	Rafael J . Wysocki

When the Movable BARs feature is supported, the PCI subsystem is able to
distribute existing BARs and allocate the new ones itself, without need to
reserve gaps by BIOS.

CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pnp/system.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pnp/system.c b/drivers/pnp/system.c
index 6950503741eb..16cd260a609d 100644
--- a/drivers/pnp/system.c
+++ b/drivers/pnp/system.c
@@ -12,6 +12,7 @@
 #include <linux/device.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/pci.h>
 #include <linux/kernel.h>
 #include <linux/ioport.h>
 
@@ -58,6 +59,11 @@ static void reserve_resources_of_dev(struct pnp_dev *dev)
 	struct resource *res;
 	int i;
 
+#ifdef CONFIG_PCI
+	if (pci_can_move_bars)
+		return;
+#endif
+
 	for (i = 0; (res = pnp_get_resource(dev, IORESOURCE_IO, i)); i++) {
 		if (res->flags & IORESOURCE_DISABLED)
 			continue;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 21/26] PCI: hotplug: Don't disable the released bridge windows immediately
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (19 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 22/26] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs Sergei Miroshnichenko
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

On a hotplug event with enabled BAR movement, calculating the new bridge
windows takes some time. During this procedure, the structures that
represent these windows are "released" - means marked for recalculation by
setting to zero.

When the new bridge windows are ready, they are written to the registers of
every bridge via pci_setup_bridges().

Currently, bridge's registers are updated immediately after releasing a
window to disable it. But if a driver doesn't yet support movable BARs, it
doesn't stop MEM transactions during the hotplug, so disabled bridge
windows will break them.

Let the bridge windows remain operating after releasing, as they will be
updated to the new values in the end of a hotplug event.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 573d5c67c136..dd94b4ed7358 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1708,7 +1708,8 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		/* Avoiding touch the one without PREF */
 		if (type & IORESOURCE_PREFETCH)
 			type = IORESOURCE_PREFETCH;
-		__pci_setup_bridge(bus, type);
+		if (!pci_can_move_bars)
+			__pci_setup_bridge(bus, type);
 		/* For next child res under same bridge */
 		r->flags = old_flags;
 	}
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 22/26] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (20 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 21/26] PCI: hotplug: Don't disable the released bridge windows immediately Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 23/26] PCI: Don't claim immovable BARs Sergei Miroshnichenko
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko, Lukas Wunner

With movable BARs, adding a hotplugged device is not local to its bridge
anymore, but it affects the whole RC: BARs, bridge windows and bus numbers
can be substantially rearranged. So instead of trying to fit the new
devices into preallocated reserved gaps, initiate a full domain rescan.

The pci_rescan_bus() covers all the operations of the replaced functions:
 - assigning new bus numbers, as the pci_hp_add_bridge() does it;
 - allocating BARs (pci_assign_unassigned_bridge_resources());
 - cofiguring MPS settings (pcie_bus_configure_settings());
 - binding devices to their drivers (pci_bus_add_devices()).

CC: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/hotplug/pciehp_pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index d17f3bf36f70..6d4c1ef38210 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -58,6 +58,11 @@ int pciehp_configure_device(struct controller *ctrl)
 		goto out;
 	}
 
+	if (pci_can_move_bars) {
+		pci_rescan_bus(parent);
+		goto out;
+	}
+
 	for_each_pci_bridge(dev, parent)
 		pci_hp_add_bridge(dev);
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 23/26] PCI: Don't claim immovable BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (21 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 22/26] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 24/26] PCI: hotplug: Don't reserve bus space when enabled movable BARs Sergei Miroshnichenko
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

Immovable BAR always has an address, but its parent bridge window can
be not yet calculated (during boot) or temporarily released for re-
calculation (during PCI rescan) when pci_claim_resource() is called.

Apart from that, immovable BARs now have separate guaranteed mechanism
of assigning comparing to usual BARs, so claiming them is not needed.

Return immediately from pci_claim_resource() to prevent misleading
"can't claim BAR ... no compatible bridge window" error messages

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-res.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 4043aab021dd..dc9b52f11922 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -138,6 +138,9 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
 		return -EINVAL;
 	}
 
+	if (pci_can_move_bars && !pci_dev_bar_movable(dev, res))
+		return 0;
+
 	/*
 	 * If we have a shadow copy in RAM, the PCI device doesn't respond
 	 * to the shadow range, so we don't need to claim it, and upstream
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 24/26] PCI: hotplug: Don't reserve bus space when enabled movable BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (22 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 23/26] PCI: Don't claim immovable BARs Sergei Miroshnichenko
@ 2020-01-29 15:29 ` Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 25/26] nvme-pci: Handle " Sergei Miroshnichenko
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

A hotplugged bridge with many hotplug-capable ports may request reserving
more IO space than the machine has. This could be overridden with the
"hpiosize=" kernel argument though.

But when BARs are movable, no need to reserve space anymore: new BARs are
allocated not from reserved gaps, but via rearranging the existing BARs.
Requesting a precise amount of space for bridge windows increases the
chances of adding the new bridge successfully.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index dd94b4ed7358..3ec01a74b7b6 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1316,7 +1316,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 
 	case PCI_HEADER_TYPE_BRIDGE:
 		pci_bridge_check_ranges(bus);
-		if (bus->self->is_hotplug_bridge) {
+		if (bus->self->is_hotplug_bridge && !pci_can_move_bars) {
 			additional_io_size  = pci_hotplug_io_size;
 			additional_mmio_size = pci_hotplug_mmio_size;
 			additional_mmio_pref_size = pci_hotplug_mmio_pref_size;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 25/26] nvme-pci: Handle movable BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (23 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 24/26] PCI: hotplug: Don't reserve bus space when enabled movable BARs Sergei Miroshnichenko
@ 2020-01-29 15:29 ` " Sergei Miroshnichenko
  2020-01-29 15:29 ` [PATCH v7 26/26] PCI/portdrv: Declare support of " Sergei Miroshnichenko
  2020-01-30 23:37 ` [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Bjorn Helgaas
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko,
	linux-nvme, Christoph Hellwig

Hotplugged devices can affect the existing ones by moving their BARs. The
PCI subsystem will inform the NVME driver about this by invoking the
.rescan_prepare() and .rescan_done() hooks, so the BARs can by re-mapped.

Tested under the "randrw" mode of the fio tool, and when using an NVME
drive as a root filesystem storage. Before the hotplugging:

  % sudo cat /proc/iomem
  ...
                3fe800000000-3fe8007fffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:18
                    3fe800000000-3fe8000fffff : 0020:18:00.0
                      3fe800000000-3fe8000fffff : nvme
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
                    3fe800100000-3fe80017ffff : 0020:18:00.0
  ...

Then another NVME drive was hot-added, so BARs of the 0020:18:00.0 are
moved:

  % sudo cat /proc/iomem
    ...
                3fe800000000-3fe800ffffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:10
                    3fe800000000-3fe800003fff : 0020:10:00.0
                      3fe800000000-3fe800003fff : nvme
                    3fe800010000-3fe80001ffff : 0020:10:00.0
                  3fe800800000-3fe800ffffff : PCI Bus 0020:18
                    3fe800800000-3fe8008fffff : 0020:18:00.0
                      3fe800800000-3fe8008fffff : nvme
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
                    3fe800900000-3fe80097ffff : 0020:18:00.0
    ...

During the rescanning, both READ and WRITE speeds drop to zero for a while
due to driver's pause, then restore.

Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/nvme/host/pci.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 365a2ddbeaa7..42976c13d3f7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1644,7 +1644,7 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
-	if (size <= dev->bar_mapped_size)
+	if (dev->bar && size <= dev->bar_mapped_size)
 		return 0;
 	if (size > pci_resource_len(pdev, 0))
 		return -ENOMEM;
@@ -3049,6 +3049,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 }
 
+static void nvme_rescan_prepare(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_disable(dev, false);
+	nvme_dev_unmap(dev);
+	dev->bar = NULL;
+}
+
+static void nvme_rescan_done(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_map(dev);
+	nvme_reset_ctrl_sync(&dev->ctrl);
+}
+
 static const struct pci_error_handlers nvme_err_handler = {
 	.error_detected	= nvme_error_detected,
 	.slot_reset	= nvme_slot_reset,
@@ -3126,6 +3143,8 @@ static struct pci_driver nvme_driver = {
 #endif
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
+	.rescan_prepare	= nvme_rescan_prepare,
+	.rescan_done	= nvme_rescan_done,
 };
 
 static int __init nvme_init(void)
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v7 26/26] PCI/portdrv: Declare support of movable BARs
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (24 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 25/26] nvme-pci: Handle " Sergei Miroshnichenko
@ 2020-01-29 15:29 ` " Sergei Miroshnichenko
  2020-01-30 23:37 ` [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Bjorn Helgaas
  26 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-29 15:29 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, Stefan Roese, linux, Sergei Miroshnichenko

Currently there are no reliable way to determine if a driver uses BARs of
its devices (their struct resource don't always have a child), so if it
doesn't explicitly support movable BARs, they are considered immovable.

The portdrv driver for PCI switches don't use BARs, so add empty hooks
.rescan_prepare() and .rescan_done() to increase chances to allocate new
BARs for new devices.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pcie/portdrv_pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 160d67c59310..df1faf2fed86 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -205,6 +205,14 @@ static const struct pci_error_handlers pcie_portdrv_err_handler = {
 	.resume = pcie_portdrv_err_resume,
 };
 
+static void pcie_portdrv_rescan_prepare(struct pci_dev *pdev)
+{
+}
+
+static void pcie_portdrv_rescan_done(struct pci_dev *pdev)
+{
+}
+
 static struct pci_driver pcie_portdriver = {
 	.name		= "pcieport",
 	.id_table	= &port_pci_ids[0],
@@ -215,6 +223,9 @@ static struct pci_driver pcie_portdriver = {
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
+	.rescan_prepare	= pcie_portdrv_rescan_prepare,
+	.rescan_done	= pcie_portdrv_rescan_done,
+
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
 };
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs
  2020-01-29 15:29 ` [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs Sergei Miroshnichenko
@ 2020-01-30 14:39   ` Rafael J. Wysocki
  2020-01-30 21:14   ` Bjorn Helgaas
  1 sibling, 0 replies; 48+ messages in thread
From: Rafael J. Wysocki @ 2020-01-30 14:39 UTC (permalink / raw)
  To: Sergei Miroshnichenko
  Cc: linux-pci, Bjorn Helgaas, Stefan Roese, linux, Rafael J . Wysocki

On Wednesday, January 29, 2020 4:29:31 PM CET Sergei Miroshnichenko wrote:
> When the Movable BARs feature is supported, the PCI subsystem is able to
> distribute existing BARs and allocate the new ones itself, without need to
> reserve gaps by BIOS.
> 
> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pnp/system.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/pnp/system.c b/drivers/pnp/system.c
> index 6950503741eb..16cd260a609d 100644
> --- a/drivers/pnp/system.c
> +++ b/drivers/pnp/system.c
> @@ -12,6 +12,7 @@
>  #include <linux/device.h>
>  #include <linux/init.h>
>  #include <linux/slab.h>
> +#include <linux/pci.h>
>  #include <linux/kernel.h>
>  #include <linux/ioport.h>
>  
> @@ -58,6 +59,11 @@ static void reserve_resources_of_dev(struct pnp_dev *dev)
>  	struct resource *res;
>  	int i;
>  
> +#ifdef CONFIG_PCI
> +	if (pci_can_move_bars)
> +		return;
> +#endif

Would it be a problem to define pci_can_move_bars() as a static inline
returning 'false' when CONFIG_PCI is unset?

The #ifdef wouldn't be needed here then.

> +
>  	for (i = 0; (res = pnp_get_resource(dev, IORESOURCE_IO, i)); i++) {
>  		if (res->flags & IORESOURCE_DISABLED)
>  			continue;
> 





^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs
  2020-01-29 15:29 ` [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs Sergei Miroshnichenko
  2020-01-30 14:39   ` Rafael J. Wysocki
@ 2020-01-30 21:14   ` Bjorn Helgaas
  2020-01-31 15:48     ` Sergei Miroshnichenko
  1 sibling, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-30 21:14 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, Stefan Roese, linux, Rafael J . Wysocki

On Wed, Jan 29, 2020 at 06:29:31PM +0300, Sergei Miroshnichenko wrote:
> When the Movable BARs feature is supported, the PCI subsystem is able to
> distribute existing BARs and allocate the new ones itself, without need to
> reserve gaps by BIOS.
> 
> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pnp/system.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/pnp/system.c b/drivers/pnp/system.c
> index 6950503741eb..16cd260a609d 100644
> --- a/drivers/pnp/system.c
> +++ b/drivers/pnp/system.c
> @@ -12,6 +12,7 @@
>  #include <linux/device.h>
>  #include <linux/init.h>
>  #include <linux/slab.h>
> +#include <linux/pci.h>
>  #include <linux/kernel.h>
>  #include <linux/ioport.h>
>  
> @@ -58,6 +59,11 @@ static void reserve_resources_of_dev(struct pnp_dev *dev)
>  	struct resource *res;
>  	int i;
>  
> +#ifdef CONFIG_PCI
> +	if (pci_can_move_bars)
> +		return;
> +#endif

I don't understand this.  The reason this function exists is so we
keep track of the resources consumed by PNP devices and we can keep
from assigning those resources to other things like PCI devices.

Admittedly we currently only do this for PNP0C01 and PNP0C02 devices,
but we really should do it for all PNP devices.

Why does Movable BARs mean that we no longer need this information?
The whole point is that this information is needed *during* PCI
resource allocation, so I don't understand the idea that "because the
PCI subsystem is able to distribute existing BARs and allocate the new
ones itself", we don't need to know about PNP resources to avoid.

>  	for (i = 0; (res = pnp_get_resource(dev, IORESOURCE_IO, i)); i++) {
>  		if (res->flags & IORESOURCE_DISABLED)
>  			continue;
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows
  2020-01-29 15:29 ` [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows Sergei Miroshnichenko
@ 2020-01-30 21:26   ` Bjorn Helgaas
  2020-01-31 16:59     ` Sergei Miroshnichenko
  0 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-30 21:26 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, Stefan Roese, linux

On Wed, Jan 29, 2020 at 06:29:20PM +0300, Sergei Miroshnichenko wrote:
> When movable BARs are enabled, and if a bridge contains a device with fixed
> (IORESOURCE_PCI_FIXED) or immovable BARs, the corresponding windows can't

What is the difference between fixed (IORESOURCE_PCI_FIXED) and
immovable BARs?  I'm hesitant to add a new concept ("immovable") with
a different name but very similar meaning.

I understand that in the case of bridge windows, you may need to track
only part of the window, as opposed to a BAR where the entire BAR has
to be either fixed or movable, but I think adding a new term will be
confusing.

> be moved too far away from their original positions - they must still
> contain all the fixed/immovable BARs, like that:
> 
> 1) Window position before a bus rescan:
> 
>    | <--                    root bridge window                        --> |
>    |                                                                      |
>    | | <--     bridge window    --> |                                     |
>    | | movable BARs | **fixed BAR** |                                     |
>        ^^^^^^^^^^^^
> 
> 2) Possible valid outcome after rescan and move:
> 
>    | <--                    root bridge window                        --> |
>    |                                                                      |
>    |                | <--     bridge window    --> |                      |
>    |                | **fixed BAR** | Movable BARs |                      |
>                                       ^^^^^^^^^^^^
> 
> An immovable area of a bridge window is a range that covers all the fixed
> and immovable BARs of direct children, and all the fixed area of children
> bridges:
> 
>    | <--                    root bridge window                        --> |
>    |                                                                      |
>    |  | <--                  bridge window level 1                --> |   |
>    |  | ******************** immovable area *******************       |   |
>    |  |                                                               |   |
>    |  | **fixed BAR** | <--      bridge window level 2     --> | BARs |   |
>    |  |               | *********** immovable area *********** |      |   |
>    |  |               |                                        |      |   |
>    |  |               | **fixed BAR** |  BARs  | **fixed BAR** |      |   |
>                                          ^^^^
> 
> To store these areas, the .immovable_range field has been added to struct
> pci_bus for every bridge window type: IO, MEM and PREFETCH. It is filled
> recursively from leaves to the root before a rescan.
> 
> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/pci.h   | 14 ++++++++
>  drivers/pci/probe.c | 88 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h |  6 ++++
>  3 files changed, 108 insertions(+)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 3b4c982772d3..5f2051c8531c 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -404,6 +404,20 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
>  	return dev->error_state == pci_channel_io_perm_failure;
>  }
>  
> +static inline int pci_get_bridge_resource_idx(struct resource *r)
> +{
> +	int idx = 1;
> +
> +	if (r->flags & IORESOURCE_IO)
> +		idx = 0;
> +	else if (!(r->flags & IORESOURCE_PREFETCH))
> +		idx = 1;
> +	else if (r->flags & IORESOURCE_MEM_64)
> +		idx = 2;

Random nit: No variables or elses required:

  if (r->flags & IORESOURCE_IO)
    return 0;
  if (!(r->flags & IORESOURCE_PREFETCH))
    return 1;
  ...

> +	return idx;
> +}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 02/26] PCI: Enable bridge's I/O and MEM access for hotplugged devices
  2020-01-29 15:29 ` [PATCH v7 02/26] PCI: Enable bridge's I/O and MEM access for hotplugged devices Sergei Miroshnichenko
@ 2020-01-30 23:12   ` Bjorn Helgaas
  0 siblings, 0 replies; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-30 23:12 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, Stefan Roese, linux

On Wed, Jan 29, 2020 at 06:29:13PM +0300, Sergei Miroshnichenko wrote:
> The PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits of the bridge must be
> updated not only when enabling the bridge for the first time, but also if a
> hotplugged device requests these types of resources.
> 
> For example, if a bridge had all its slots empty, the IO and MEM bits will
> not be set, and a hot hotplugged device will fail.

s/hot hotplugged/hot-added/ or something similar

> Originally these bits were set by the pci_enable_device_flags() only, which
> exits early if the bridge is already pci_is_enabled(). So let's check them
> again when requested.

s/by the/by/
s/ only,/,

IIUC, in the current tree (before this series), we do set
PCI_COMMAND_IO and PCI_COMMAND_MEMORY on bridges leading to hot-added
devices, but this patch is needed because [01/26] "PCI: Fix race
condition in pci_enable/disable_device()" makes
pci_enable_device_flags() exit early without setting those bits.

That would mean there's a bisection hole between [01/26] and [02/26]
where hot-added devices will fail.  We don't want a hole like that.

> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/pci.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 0366289c75e9..cb0042f28e6a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1679,6 +1679,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  		pci_enable_bridge(bridge);
>  
>  	if (pci_is_enabled(dev)) {
> +		int i, bars = 0;
> +
> +		for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
> +			if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
> +				bars |= (1 << i);
> +		}
> +		do_pci_enable_device(dev, bars);
> +
>  		if (!dev->is_busmaster)
>  			pci_set_master(dev);
>  		mutex_unlock(&dev->enable_mutex);
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug
  2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (25 preceding siblings ...)
  2020-01-29 15:29 ` [PATCH v7 26/26] PCI/portdrv: Declare support of " Sergei Miroshnichenko
@ 2020-01-30 23:37 ` Bjorn Helgaas
  2020-02-03  4:56   ` Oliver O'Halloran
  26 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-30 23:37 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, Stefan Roese, linux

On Wed, Jan 29, 2020 at 06:29:11PM +0300, Sergei Miroshnichenko wrote:
> Currently PCI hotplug works on top of resources which are usually reserved
> not by the kernel, but by BIOS, bootloader, firmware, etc. These resources
> are gaps in the address space where BARs of new devices may fit, and extra
> bus number per port, so bridges can be hot-added. This series aim the BARs
> problem: it shows the kernel how to redistribute them on the run, so the
> hotplug becomes predictable and cross-platform. A follow-up patchset will
> propose a solution for bus numbers.
> 
> To arrange a space for BARs of new hotplugged devices, the kernel now pause
> the drivers of working PCI devices and reshuffle the assigned BARs by the
> same procedure as during the system boot. When a driver is un-paused by the
> kernel, it should ioremap() the new addresses of its BARs.
> 
> Drivers indicate their support of the feature by implementing the new hooks
> .rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
> doesn't yet support the feature, BARs of its devices will be considered as
> immovable and handled in the same way as resources with the PCI_FIXED flag:
> they are guaranteed to remain untouched.
> 
> Tested on a number x86_64 machines with "pci=pcie_bus_peer2peer" command
> line argument to specify a policy for Maximum Payload Size PCIe setting.

What is the significance of "pci=pcie_bus_peer2peer"?  On the face of
it, that doesn't sound related to BAR assignment.

> Also tested on a POWER8 PowerNV+OPAL+PHB3 ppc64le machine, but with extra
> patches which are to be sent upstream after the PCIPOCALYPSE patchset is
> merged.

"PCIPOCALYPSE" is meaningless to me.  Not sure if it's relevant for me
or not, but if you mention it, it'd be nice to have some more context.

> First two patches of this series are bugfixes, not related directly to the
> movable BARs feature.
> 
> Patches 03-14/26 implement the essentials of the feature.
> 
> Patch 15/26 enables the feature by default.
> 
> Patches 16-26/26 are performance improvements for hotplug.
> 
> This patchset is a part of our work on adding support for hotplugging
> chains of chassis full of other bridges, NVME drives, SAS HBAs, GPUs, etc.
> without special requirements such as Hot-Plug Controller, reservation of
> bus numbers or memory regions by firmware, etc.

Thanks for reposting these.  This series was next on my list to look
at, so I'm glad for an updated version.

> Changes since v6:
>  - Added a fix for hotplug on AMD Epyc + Supermicro H11SSL-i by ignoring
>    PCIBIOS_MIN_MEM;
>  - Fixed a workaround which marks VGA BARs as immovables;
>  - Fixed misleading "can't claim BAR ... no compatible bridge window" error
>    messages;
>  - Refactored the code, reduced the amount of patches;
>  - Exclude PowerPC-specific arch patches, they will be sent separately;
>  - Disabled for PowerNV by default - waiting for the PCIPOCALYPSE patchset.
>  - Fixed reports from the kbuild test robot.
> 
> Changes since v5:
>  - Simplified the disable flag, now it is "pci=no_movable_buses";
>  - More deliberate marking the BARs as immovable;
>  - Mark as immovable BARs which are used by unbound drivers;
>  - Ignoring BAR assignment by non-kernel program components, so the kernel
>    is able now to distribute BARs in optimal and predictable way;
>  - Move here PowerNV-specific patches from the older "powerpc/powernv/pci:
>    Make hotplug self-sufficient, independent of FW and DT" series;
>  - Fix EEH cache rebuilding and PE allocation for PowerNV during rescan.
> 
> Changes since v4:
>  - Feature is enabled by default (turned on by one of the latest patches);
>  - Add pci_dev_movable_bars_supported(dev) instead of marking the immovable
>    BARs with the IORESOURCE_PCI_FIXED flag;
>  - Set up PCIe bridges during rescan via sysfs, so MPS settings are now
>    configured not only during system boot or pcihp events;
>  - Allow movement of switch's BARs if claimed by portdrv;
>  - Update EEH address caches after rescan for powerpc;
>  - Don't disable completely hot-added devices which can't have BARs being
>    fit - just disable their BARs, so they are still visible in lspci etc;
>  - Clearer names: fixed_range_hard -> immovable_range, fixed_range_soft ->
>    realloc_range;
>  - Drop the patch for pci_restore_config_space() - fixed by properly using
>    the runtime PM.
> 
> Changes since v3:
>  - Rebased to the upstream, so the patches apply cleanly again.
> 
> Changes since v2:
>  - Fixed double-assignment of bridge windows;
>  - Fixed assignment of fixed prefetched resources;
>  - Fixed releasing of fixed resources;
>  - Fixed a debug message;
>  - Removed auto-enabling the movable BARs for x86 - let's rely on the
>    "pcie_movable_bars=force" option for now;
>  - Reordered the patches - bugfixes first.
> 
> Changes since v1:
>  - Add a "pcie_movable_bars={ off | force }" command line argument;
>  - Handle the IORESOURCE_PCI_FIXED flag properly;
>  - Don't move BARs of devices which don't support the feature;
>  - Guarantee that new hotplugged devices will not steal memory from working
>    devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
>    flag;
>  - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
>    using the reset_prepare()+reset_done() from struct pci_error_handlers;
>  - Add a bugfix of a race condition;
>  - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
>  - Fix the compatibility of the feature with pm_runtime and D3-state;
>  - Hotplug events from pciehp also can move BARs;
>  - Add support of the feature to the NVME driver.
> 
> Sergei Miroshnichenko (26):
>   PCI: Fix race condition in pci_enable/disable_device()
>   PCI: Enable bridge's I/O and MEM access for hotplugged devices
>   PCI: hotplug: Initial support of the movable BARs feature
>   PCI: Add version of release_child_resources() aware of immovable BARs
>   PCI: hotplug: Fix reassigning the released BARs
>   PCI: hotplug: Recalculate every bridge window during rescan
>   PCI: hotplug: Don't allow hot-added devices to steal resources
>   PCI: hotplug: Try to reassign movable BARs only once
>   PCI: hotplug: Calculate immovable parts of bridge windows
>   PCI: Include fixed and immovable BARs into the bus size calculating
>   PCI: hotplug: movable BARs: Compute limits for relocated bridge
>     windows
>   PCI: Make sure bridge windows include their fixed BARs
>   PCI: hotplug: Add support of immovable BARs to pci_assign_resource()
>   PCI: hotplug: Sort immovable BARs before assignment
>   PCI: hotplug: Enable the movable BARs feature by default
>   PCI: Ignore PCIBIOS_MIN_MEM
>   PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader
>   PCI: Treat VGA BARs as immovable
>   PCI: hotplug: Configure MPS for hot-added bridges during bus rescan
>   PNP: Don't reserve BARs for PCI when enabled movable BARs
>   PCI: hotplug: Don't disable the released bridge windows immediately
>   PCI: pciehp: Trigger a domain rescan on hp events when enabled movable
>     BARs
>   PCI: Don't claim immovable BARs
>   PCI: hotplug: Don't reserve bus space when enabled movable BARs
>   nvme-pci: Handle movable BARs
>   PCI/portdrv: Declare support of movable BARs
> 
>  .../admin-guide/kernel-parameters.txt         |   1 +

Probably needs some text in Documentation/PCI/pci.rst as well about
what drivers need to do to support this.  We haven't been very good
about documenting all the hooks in struct pci_driver, but this would
be a good chance to start.

>  arch/powerpc/platforms/powernv/pci.c          |   2 +
>  arch/powerpc/platforms/pseries/setup.c        |   2 +
>  drivers/nvme/host/pci.c                       |  21 +-
>  drivers/pci/bus.c                             |   2 +-
>  drivers/pci/hotplug/pciehp_pci.c              |   5 +
>  drivers/pci/pci.c                             |  38 +-
>  drivers/pci/pci.h                             |  31 ++
>  drivers/pci/pcie/portdrv_pci.c                |  11 +
>  drivers/pci/probe.c                           | 331 +++++++++++++++++-
>  drivers/pci/setup-bus.c                       | 311 ++++++++++++++--
>  drivers/pci/setup-res.c                       |  47 ++-
>  drivers/pnp/system.c                          |   6 +
>  include/linux/pci.h                           |  20 ++
>  14 files changed, 790 insertions(+), 38 deletions(-)
> 
> 
> base-commit: b3a6082223369203d7e7db7e81253ac761377644
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-01-29 15:29 ` [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM Sergei Miroshnichenko
@ 2020-01-30 23:52   ` Bjorn Helgaas
  2020-01-31 18:19     ` Sergei Miroshnichenko
  0 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-30 23:52 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, Stefan Roese, linux

On Wed, Jan 29, 2020 at 06:29:27PM +0300, Sergei Miroshnichenko wrote:
> BARs and bridge windows are only allowed to be assigned to their
> parent bus's bridge windows, going up to the root complex's resources.
> So additional limitations on BAR address are not needed, and the
> PCIBIOS_MIN_MEM can be ignored.

This is theoretically true, but I don't think we have reliable
information about the host bridge windows in all cases, so
PCIBIOS_MIN_MEM/_IO is something of an approximation.

> Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS 1.3 on
> Supermicro H11SSL-i via e820__setup_pci_gap():
> 
>   [mem 0xebff1000-0xfe9fffff] available for PCI devices
> 
> is only suitable for a single RC out of four:
> 
>   pci_bus 0000:00: root bus resource [mem 0xec000000-0xefffffff window]
>   pci_bus 0000:20: root bus resource [mem 0xeb800000-0xebefffff window]
>   pci_bus 0000:40: root bus resource [mem 0xeb200000-0xeb5fffff window]
>   pci_bus 0000:60: root bus resource [mem 0xe8b00000-0xeaffffff window]
> 
> , which makes the AMD EPYC 7251 unable to boot with this movable BARs
> patchset.

Something's wrong if this system booted before this patch set but not
after.  We shouldn't be doing *anything* with the BARs until we need
to, i.e., until we hot-add a device where we have to move things to
find space for it.

(And we don't want a bisection hole where this system can't boot until
this patch is applied, but I assume that's obvious.)

> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/setup-res.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index a7d81816d1ea..4043aab021dd 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -246,12 +246,13 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
>  		int resno, resource_size_t size, resource_size_t align)
>  {
>  	struct resource *res = dev->resource + resno;
> -	resource_size_t min;
> +	resource_size_t min = 0;
>  	int ret;
>  	resource_size_t start = (resource_size_t)-1;
>  	resource_size_t end = 0;
>  
> -	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
> +	if (!pci_can_move_bars)
> +		min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;

I don't understand the connection here.  PCIBIOS_MIN_MEM and
PCIBIOS_MIN_IO are basically ways to say "we can't put PCI resources
below this address".

On ACPI systems, the devices in the ACPI namespace are supposed to
tell the OS what resources they use, and obviously the OS should not
assign those resources to anything else.  If Linux handled all those
ACPI resources correctly and in the absence of firmware defects, we
shouldn't need PCIBIOS_MIN_MEM/_IO at all.  But neither of those is
currently true.

It's true that we should be smarter about PCIBIOS_MIN_MEM/_IO, but I
don't think that has anything to do with whether we support *moving*
BARs.  We have to avoid the address space that's already in use in
*all* cases.

>  	if (pci_can_move_bars && dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
>  		struct pci_bus *child_bus = dev->subordinate;
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs
  2020-01-30 21:14   ` Bjorn Helgaas
@ 2020-01-31 15:48     ` Sergei Miroshnichenko
  2020-01-31 19:39       ` Bjorn Helgaas
  0 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-31 15:48 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, rafael.j.wysocki, sr

Hello Bjorn,

On Thu, 2020-01-30 at 15:14 -0600, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2020 at 06:29:31PM +0300, Sergei Miroshnichenko
> wrote:
> > When the Movable BARs feature is supported, the PCI subsystem is
> > able to
> > distribute existing BARs and allocate the new ones itself, without
> > need to
> > reserve gaps by BIOS.
> > 
> > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > ---
> >  drivers/pnp/system.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/pnp/system.c b/drivers/pnp/system.c
> > index 6950503741eb..16cd260a609d 100644
> > --- a/drivers/pnp/system.c
> > +++ b/drivers/pnp/system.c
> > @@ -12,6 +12,7 @@
> >  #include <linux/device.h>
> >  #include <linux/init.h>
> >  #include <linux/slab.h>
> > +#include <linux/pci.h>
> >  #include <linux/kernel.h>
> >  #include <linux/ioport.h>
> >  
> > @@ -58,6 +59,11 @@ static void reserve_resources_of_dev(struct
> > pnp_dev *dev)
> >  	struct resource *res;
> >  	int i;
> >  
> > +#ifdef CONFIG_PCI
> > +	if (pci_can_move_bars)
> > +		return;
> > +#endif
> 
> I don't understand this.  The reason this function exists is so we
> keep track of the resources consumed by PNP devices and we can keep
> from assigning those resources to other things like PCI devices.
> 
> Admittedly we currently only do this for PNP0C01 and PNP0C02 devices,
> but we really should do it for all PNP devices.
> 
> Why does Movable BARs mean that we no longer need this information?
> The whole point is that this information is needed *during* PCI
> resource allocation, so I don't understand the idea that "because the
> PCI subsystem is able to distribute existing BARs and allocate the
> new
> ones itself", we don't need to know about PNP resources to avoid.
> 

Oh. I've made this patch in assumption that non-PCI PNP devices should
not reside in the PCI address space, and PCI PNP devices behave like
usual PCI devices - with BARs handled by the common PCI subsystem.

Do I understand correctly after digging a bit into drivers/pnp, that
some of these resources are some kind of "invisible" BARs, which are
used by drivers, but the PCI subsystem can't "see" them, so that's why
the PNP reserves them?

In this case I need just to discard this patch and to modify the
pci_bus_release_root_bridge_resources() added in patch 06/26 - remove
the pci_bus_for_each_resource(root_bus, r, i) block there, which
releases such non-BAR resourses. I've just checked that it works, so
the next version - v8 - of this patchset will be a bit lighter. Thank
you for pointing that out!

Best regards,
Serge


> >  	for (i = 0; (res = pnp_get_resource(dev, IORESOURCE_IO, i));
> > i++) {
> >  		if (res->flags & IORESOURCE_DISABLED)
> >  			continue;
> > -- 
> > 2.24.1
> > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows
  2020-01-30 21:26   ` Bjorn Helgaas
@ 2020-01-31 16:59     ` Sergei Miroshnichenko
  2020-01-31 20:10       ` Bjorn Helgaas
  0 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-31 16:59 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, sr

On Thu, 2020-01-30 at 15:26 -0600, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2020 at 06:29:20PM +0300, Sergei Miroshnichenko
> wrote:
> > When movable BARs are enabled, and if a bridge contains a device
> > with fixed
> > (IORESOURCE_PCI_FIXED) or immovable BARs, the corresponding windows
> > can't
> 
> What is the difference between fixed (IORESOURCE_PCI_FIXED) and
> immovable BARs?  I'm hesitant to add a new concept ("immovable") with
> a different name but very similar meaning.
> 
> I understand that in the case of bridge windows, you may need to
> track
> only part of the window, as opposed to a BAR where the entire BAR has
> to be either fixed or movable, but I think adding a new term will be
> confusing.
> 

I thought the term "fixed BAR" has some legacy scent, and those marked
with flag IORESOURCE_PCI_FIXED are fixed forever. But a BAR may become
movable after rmmod-ing its driver that didn't support movable BARs.

Still, the IORESOURCE_PCI_FIXED is used just a few times in the kernel,
so the "fixed BAR" term is probably not so well-established, so I don't
mind referring all non-movables as "fixed": both marked with the flag
and not. Will change all of them in v8.

> > be moved too far away from their original positions - they must
> > still
> > contain all the fixed/immovable BARs, like that:
> > 
> > 1) Window position before a bus rescan:
> > 
> >    | <--                    root bridge
> > window                        --> |
> >    |                                                               
> >        |
> >    | | <--     bridge window    -->
> > |                                     |
> >    | | movable BARs | **fixed BAR**
> > |                                     |
> >        ^^^^^^^^^^^^
> > 
> > 2) Possible valid outcome after rescan and move:
> > 
> >    | <--                    root bridge
> > window                        --> |
> >    |                                                               
> >        |
> >    |                | <--     bridge window    -->
> > |                      |
> >    |                | **fixed BAR** | Movable BARs
> > |                      |
> >                                       ^^^^^^^^^^^^
> > 
> > An immovable area of a bridge window is a range that covers all the
> > fixed
> > and immovable BARs of direct children, and all the fixed area of
> > children
> > bridges:
> > 
> >    | <--                    root bridge
> > window                        --> |
> >    |                                                               
> >        |
> >    |  | <--                  bridge window level 1                -
> > -> |   |
> >    |  | ******************** immovable area
> > *******************       |   |
> >    |  |                                                            
> >    |   |
> >    |  | **fixed BAR** | <--      bridge window level 2     --> |
> > BARs |   |
> >    |  |               | *********** immovable area ***********
> > |      |   |
> >    |  |               |                                        |   
> >    |   |
> >    |  |               | **fixed BAR** |  BARs  | **fixed BAR**
> > |      |   |
> >                                          ^^^^
> > 
> > To store these areas, the .immovable_range field has been added to
> > struct
> > pci_bus for every bridge window type: IO, MEM and PREFETCH. It is
> > filled
> > recursively from leaves to the root before a rescan.
> > 
> > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > ---
> >  drivers/pci/pci.h   | 14 ++++++++
> >  drivers/pci/probe.c | 88
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h |  6 ++++
> >  3 files changed, 108 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 3b4c982772d3..5f2051c8531c 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -404,6 +404,20 @@ static inline bool
> > pci_dev_is_disconnected(const struct pci_dev *dev)
> >  	return dev->error_state == pci_channel_io_perm_failure;
> >  }
> >  
> > +static inline int pci_get_bridge_resource_idx(struct resource *r)
> > +{
> > +	int idx = 1;
> > +
> > +	if (r->flags & IORESOURCE_IO)
> > +		idx = 0;
> > +	else if (!(r->flags & IORESOURCE_PREFETCH))
> > +		idx = 1;
> > +	else if (r->flags & IORESOURCE_MEM_64)
> > +		idx = 2;
> 
> Random nit: No variables or elses required:
> 
>   if (r->flags & IORESOURCE_IO)
>     return 0;
>   if (!(r->flags & IORESOURCE_PREFETCH))
>     return 1;
>   ...
> 

Thank you!

Best regards,
Serge

> > +	return idx;
> > +}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-01-30 23:52   ` Bjorn Helgaas
@ 2020-01-31 18:19     ` Sergei Miroshnichenko
  2020-01-31 20:27       ` Bjorn Helgaas
  0 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-01-31 18:19 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, sr

On Thu, 2020-01-30 at 17:52 -0600, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2020 at 06:29:27PM +0300, Sergei Miroshnichenko
> wrote:
> > BARs and bridge windows are only allowed to be assigned to their
> > parent bus's bridge windows, going up to the root complex's
> > resources.
> > So additional limitations on BAR address are not needed, and the
> > PCIBIOS_MIN_MEM can be ignored.
> 
> This is theoretically true, but I don't think we have reliable
> information about the host bridge windows in all cases, so
> PCIBIOS_MIN_MEM/_IO is something of an approximation.
> 
> > Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS 1.3 on
> > Supermicro H11SSL-i via e820__setup_pci_gap():
> > 
> >   [mem 0xebff1000-0xfe9fffff] available for PCI devices
> > 
> > is only suitable for a single RC out of four:
> > 
> >   pci_bus 0000:00: root bus resource [mem 0xec000000-0xefffffff
> > window]
> >   pci_bus 0000:20: root bus resource [mem 0xeb800000-0xebefffff
> > window]
> >   pci_bus 0000:40: root bus resource [mem 0xeb200000-0xeb5fffff
> > window]
> >   pci_bus 0000:60: root bus resource [mem 0xe8b00000-0xeaffffff
> > window]
> > 
> > , which makes the AMD EPYC 7251 unable to boot with this movable
> > BARs
> > patchset.
> 
> Something's wrong if this system booted before this patch set but not
> after.  We shouldn't be doing *anything* with the BARs until we need
> to, i.e., until we hot-add a device where we have to move things to
> find space for it.
> 

The one breaking boot on this system initially was 17/26 of this
patchset: "PCI: hotplug: Ignore the MEM BAR offsets from
BIOS/bootloader"

Before it the kernel just took BARs pre-assigned by BIOS. In the same
time, the same BIOS reports 0xebff1000-0xfe9fffff as available for PCI
devices, but the real root bridge windows are 0xe8b00000-0xefffffff in
total (and also 64-bit windows) - which are also reported by the same
BIOS. So the kernel was only able to hanble the 0xec000000-0xefffffff
root bus.

With that patch reverted the kernel was able to boot, but unable to
rescan - to reassign BARs actually.

> (And we don't want a bisection hole where this system can't boot
> until
> this patch is applied, but I assume that's obvious.)
> 
> > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > ---
> >  drivers/pci/setup-res.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> > index a7d81816d1ea..4043aab021dd 100644
> > --- a/drivers/pci/setup-res.c
> > +++ b/drivers/pci/setup-res.c
> > @@ -246,12 +246,13 @@ static int __pci_assign_resource(struct
> > pci_bus *bus, struct pci_dev *dev,
> >  		int resno, resource_size_t size, resource_size_t align)
> >  {
> >  	struct resource *res = dev->resource + resno;
> > -	resource_size_t min;
> > +	resource_size_t min = 0;
> >  	int ret;
> >  	resource_size_t start = (resource_size_t)-1;
> >  	resource_size_t end = 0;
> >  
> > -	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > PCIBIOS_MIN_MEM;
> > +	if (!pci_can_move_bars)
> > +		min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > PCIBIOS_MIN_MEM;
> 
> I don't understand the connection here.  PCIBIOS_MIN_MEM and
> PCIBIOS_MIN_IO are basically ways to say "we can't put PCI resources
> below this address".
> 
> On ACPI systems, the devices in the ACPI namespace are supposed to
> tell the OS what resources they use, and obviously the OS should not
> assign those resources to anything else.  If Linux handled all those
> ACPI resources correctly and in the absence of firmware defects, we
> shouldn't need PCIBIOS_MIN_MEM/_IO at all.  But neither of those is
> currently true.
> 
> It's true that we should be smarter about PCIBIOS_MIN_MEM/_IO, but I
> don't think that has anything to do with whether we support *moving*
> BARs.  We have to avoid the address space that's already in use in
> *all* cases.
> 

This is connected to the approach of this feature: releasing,
recalculating and reassigning the BARs and bridge windows. If movable
BARs are disabled, this bug doesn't reproduce. And the bug doesn't let
the system boot when BARs are allowed to move. That's why I've tied
these together.

This line setting the "min" to PCIBIOS_MIN_* is there untouched since
the first kernel git commit in 2005 - could it be that all systems are
just fine now, having their root bridge windows set up correctly?

Best regards,
Serge

> >  	if (pci_can_move_bars && dev->subordinate && resno >=
> > PCI_BRIDGE_RESOURCES) {
> >  		struct pci_bus *child_bus = dev->subordinate;
> > -- 
> > 2.24.1
> > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs
  2020-01-31 15:48     ` Sergei Miroshnichenko
@ 2020-01-31 19:39       ` Bjorn Helgaas
  0 siblings, 0 replies; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-31 19:39 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, linux, rafael.j.wysocki, sr

On Fri, Jan 31, 2020 at 03:48:48PM +0000, Sergei Miroshnichenko wrote:
> Hello Bjorn,
> 
> On Thu, 2020-01-30 at 15:14 -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2020 at 06:29:31PM +0300, Sergei Miroshnichenko
> > wrote:
> > > When the Movable BARs feature is supported, the PCI subsystem is
> > > able to
> > > distribute existing BARs and allocate the new ones itself, without
> > > need to
> > > reserve gaps by BIOS.
> > > 
> > > CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > > ---
> > >  drivers/pnp/system.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/pnp/system.c b/drivers/pnp/system.c
> > > index 6950503741eb..16cd260a609d 100644
> > > --- a/drivers/pnp/system.c
> > > +++ b/drivers/pnp/system.c
> > > @@ -12,6 +12,7 @@
> > >  #include <linux/device.h>
> > >  #include <linux/init.h>
> > >  #include <linux/slab.h>
> > > +#include <linux/pci.h>
> > >  #include <linux/kernel.h>
> > >  #include <linux/ioport.h>
> > >  
> > > @@ -58,6 +59,11 @@ static void reserve_resources_of_dev(struct
> > > pnp_dev *dev)
> > >  	struct resource *res;
> > >  	int i;
> > >  
> > > +#ifdef CONFIG_PCI
> > > +	if (pci_can_move_bars)
> > > +		return;
> > > +#endif
> > 
> > I don't understand this.  The reason this function exists is so we
> > keep track of the resources consumed by PNP devices and we can keep
> > from assigning those resources to other things like PCI devices.
> > 
> > Admittedly we currently only do this for PNP0C01 and PNP0C02 devices,
> > but we really should do it for all PNP devices.
> > 
> > Why does Movable BARs mean that we no longer need this
> > information?  The whole point is that this information is needed
> > *during* PCI resource allocation, so I don't understand the idea
> > that "because the PCI subsystem is able to distribute existing
> > BARs and allocate the new ones itself", we don't need to know
> > about PNP resources to avoid.
> 
> Oh. I've made this patch in assumption that non-PCI PNP devices should
> not reside in the PCI address space, and PCI PNP devices behave like
> usual PCI devices - with BARs handled by the common PCI subsystem.

I don't think we can rely on that assumption.  I think all we should
assume is that address space described by _CRS of any PNP device is
unavailable for use by other devices (except for bridge windows, of
course).

> Do I understand correctly after digging a bit into drivers/pnp, that
> some of these resources are some kind of "invisible" BARs, which are
> used by drivers, but the PCI subsystem can't "see" them, so that's why
> the PNP reserves them?

ACPI/PNP _CRS is sort of a generalized BAR idea for devices that don't
have a native configuration protocol.  E.g., PCI has config space that
supports both enumeration and resource configuration (BARs).  There
may be other buses that have similar ideas and don't need PNP devices.

Generally, ACPI describes devices that can't be enumerated and
configured via native means.

> In this case I need just to discard this patch and to modify the
> pci_bus_release_root_bridge_resources() added in patch 06/26 - remove
> the pci_bus_for_each_resource(root_bus, r, i) block there, which
> releases such non-BAR resourses. I've just checked that it works, so
> the next version - v8 - of this patchset will be a bit lighter. Thank
> you for pointing that out!
> 
> Best regards,
> Serge
> 
> 
> > >  	for (i = 0; (res = pnp_get_resource(dev, IORESOURCE_IO, i));
> > > i++) {
> > >  		if (res->flags & IORESOURCE_DISABLED)
> > >  			continue;
> > > -- 
> > > 2.24.1
> > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows
  2020-01-31 16:59     ` Sergei Miroshnichenko
@ 2020-01-31 20:10       ` Bjorn Helgaas
  0 siblings, 0 replies; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-31 20:10 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, linux, sr

On Fri, Jan 31, 2020 at 04:59:44PM +0000, Sergei Miroshnichenko wrote:
> On Thu, 2020-01-30 at 15:26 -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2020 at 06:29:20PM +0300, Sergei Miroshnichenko
> > wrote:
> > > When movable BARs are enabled, and if a bridge contains a device
> > > with fixed
> > > (IORESOURCE_PCI_FIXED) or immovable BARs, the corresponding windows
> > > can't
> > 
> > What is the difference between fixed (IORESOURCE_PCI_FIXED) and
> > immovable BARs?  I'm hesitant to add a new concept ("immovable") with
> > a different name but very similar meaning.
> > 
> > I understand that in the case of bridge windows, you may need to
> > track
> > only part of the window, as opposed to a BAR where the entire BAR has
> > to be either fixed or movable, but I think adding a new term will be
> > confusing.
> 
> I thought the term "fixed BAR" has some legacy scent, and those marked
> with flag IORESOURCE_PCI_FIXED are fixed forever. But a BAR may become
> movable after rmmod-ing its driver that didn't support movable BARs.
> 
> Still, the IORESOURCE_PCI_FIXED is used just a few times in the kernel,
> so the "fixed BAR" term is probably not so well-established, so I don't
> mind referring all non-movables as "fixed": both marked with the flag
> and not. Will change all of them in v8.

Yeah, "fixed" is a relatively new idea and not part of the actual
spec.  And you're right that "immovable" is slightly different because
sometimes it's a consequence of lack of driver support.

This probably isn't a big deal either way.  I think there are two
cases: (a) IORESOURCE_PCI_FIXED resources (i.e., legacy things that
can't be changed because they don't have BARs) and (b) BARs that can't
be changed because there's a driver bound that doesn't support moving
them.  Either way, we have to treat them the same for resource
allocation, so I'm not sure it's worth differentiating at this level.

Bjorn

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-01-31 18:19     ` Sergei Miroshnichenko
@ 2020-01-31 20:27       ` Bjorn Helgaas
  2020-02-05 13:04         ` Sergei Miroshnichenko
  0 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-31 20:27 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, linux, sr

On Fri, Jan 31, 2020 at 06:19:48PM +0000, Sergei Miroshnichenko wrote:
> On Thu, 2020-01-30 at 17:52 -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2020 at 06:29:27PM +0300, Sergei Miroshnichenko
> > wrote:
> > > BARs and bridge windows are only allowed to be assigned to their
> > > parent bus's bridge windows, going up to the root complex's
> > > resources.
> > > So additional limitations on BAR address are not needed, and the
> > > PCIBIOS_MIN_MEM can be ignored.
> > 
> > This is theoretically true, but I don't think we have reliable
> > information about the host bridge windows in all cases, so
> > PCIBIOS_MIN_MEM/_IO is something of an approximation.
> > 
> > > Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS 1.3 on
> > > Supermicro H11SSL-i via e820__setup_pci_gap():
> > > 
> > >   [mem 0xebff1000-0xfe9fffff] available for PCI devices
> > > 
> > > is only suitable for a single RC out of four:
> > > 
> > >   pci_bus 0000:00: root bus resource [mem 0xec000000-0xefffffff
> > > window]
> > >   pci_bus 0000:20: root bus resource [mem 0xeb800000-0xebefffff
> > > window]
> > >   pci_bus 0000:40: root bus resource [mem 0xeb200000-0xeb5fffff
> > > window]
> > >   pci_bus 0000:60: root bus resource [mem 0xe8b00000-0xeaffffff
> > > window]
> > > 
> > > , which makes the AMD EPYC 7251 unable to boot with this movable
> > > BARs
> > > patchset.
> > 
> > Something's wrong if this system booted before this patch set but not
> > after.  We shouldn't be doing *anything* with the BARs until we need
> > to, i.e., until we hot-add a device where we have to move things to
> > find space for it.
> 
> The one breaking boot on this system initially was 17/26 of this
> patchset: "PCI: hotplug: Ignore the MEM BAR offsets from
> BIOS/bootloader"

I don't think that patch is a good idea.  I think we should read the
current BARs and windows at boot-time and leave them alone unless we
*must* change them.  I don't think we should change things
preemptively to make future hotplug events easier.

> Before it the kernel just took BARs pre-assigned by BIOS. In the same
> time, the same BIOS reports 0xebff1000-0xfe9fffff as available for PCI
> devices, but the real root bridge windows are 0xe8b00000-0xefffffff in
> total (and also 64-bit windows) - which are also reported by the same
> BIOS. So the kernel was only able to handle the 0xec000000-0xefffffff
> root bus.
> 
> With that patch reverted the kernel was able to boot, but unable to
> rescan - to reassign BARs actually.
> 
> > (And we don't want a bisection hole where this system can't boot
> > until
> > this patch is applied, but I assume that's obvious.)
> > 
> > > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > > ---
> > >  drivers/pci/setup-res.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> > > index a7d81816d1ea..4043aab021dd 100644
> > > --- a/drivers/pci/setup-res.c
> > > +++ b/drivers/pci/setup-res.c
> > > @@ -246,12 +246,13 @@ static int __pci_assign_resource(struct
> > > pci_bus *bus, struct pci_dev *dev,
> > >  		int resno, resource_size_t size, resource_size_t align)
> > >  {
> > >  	struct resource *res = dev->resource + resno;
> > > -	resource_size_t min;
> > > +	resource_size_t min = 0;
> > >  	int ret;
> > >  	resource_size_t start = (resource_size_t)-1;
> > >  	resource_size_t end = 0;
> > >  
> > > -	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > > PCIBIOS_MIN_MEM;
> > > +	if (!pci_can_move_bars)
> > > +		min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > > PCIBIOS_MIN_MEM;
> > 
> > I don't understand the connection here.  PCIBIOS_MIN_MEM and
> > PCIBIOS_MIN_IO are basically ways to say "we can't put PCI resources
> > below this address".
> > 
> > On ACPI systems, the devices in the ACPI namespace are supposed to
> > tell the OS what resources they use, and obviously the OS should not
> > assign those resources to anything else.  If Linux handled all those
> > ACPI resources correctly and in the absence of firmware defects, we
> > shouldn't need PCIBIOS_MIN_MEM/_IO at all.  But neither of those is
> > currently true.
> > 
> > It's true that we should be smarter about PCIBIOS_MIN_MEM/_IO, but I
> > don't think that has anything to do with whether we support *moving*
> > BARs.  We have to avoid the address space that's already in use in
> > *all* cases.
> 
> This is connected to the approach of this feature: releasing,
> recalculating and reassigning the BARs and bridge windows. If movable
> BARs are disabled, this bug doesn't reproduce. And the bug doesn't let
> the system boot when BARs are allowed to move. That's why I've tied
> these together.

My point is just that logically this has nothing to do with movable
BARs.

> This line setting the "min" to PCIBIOS_MIN_* is there untouched since
> the first kernel git commit in 2005 - could it be that all systems are
> just fine now, having their root bridge windows set up correctly?

I don't understand the question, sorry.

> > >  	if (pci_can_move_bars && dev->subordinate && resno >=
> > > PCI_BRIDGE_RESOURCES) {
> > >  		struct pci_bus *child_bus = dev->subordinate;
> > > -- 
> > > 2.24.1
> > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader
  2020-01-29 15:29 ` [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader Sergei Miroshnichenko
@ 2020-01-31 20:31   ` Bjorn Helgaas
  2020-02-05 11:01     ` Sergei Miroshnichenko
  0 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-01-31 20:31 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, Stefan Roese, linux

On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko wrote:
> BAR allocation by BIOS/UEFI/bootloader/firmware may be non-optimal and
> it may even clash with the kernel's BAR assignment algorithm.
> 
> For example, sometimes BIOS doesn't reserve space for SR-IOV BARs, and
> this bridge window can neither extend (blocked by immovable BARs) nor
> move (the device itself is immovable).
> 
> With this patch the kernel will use its own methods of BAR allocating
> when possible, increasing the chances of successful boot and hotplug.
> 
> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/probe.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index bb584038d5b4..f8f643dac6d1 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>  			 pos, (unsigned long long)region.start);
>  	}
>  
> +	if (pci_can_move_bars && res->start && !(res->flags & IORESOURCE_IO)) {
> +		pci_warn(dev, "ignore the current offset of BAR %llx-%llx\n",
> +			 l64, l64 + sz64 - 1);
> +		res->start = 0;
> +		res->end = sz64 - 1;
> +		res->flags |= IORESOURCE_SIZEALIGN;
> +	}
> +
>  	goto out;
>  
>  
> @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus *child)
>  		child->resource[i] = &dev->resource[PCI_BRIDGE_RESOURCES+i];
>  
>  	pci_read_bridge_io(child);
> -	pci_read_bridge_mmio(child);
> -	pci_read_bridge_mmio_pref(child);
> +	if (!pci_can_move_bars) {
> +		pci_read_bridge_mmio(child);
> +		pci_read_bridge_mmio_pref(child);
> +	}

I mentioned this in another response, but I'll repeat it here to
comment on the code directly: I don't think we should have feature
tests like this "!pci_can_move_bars" scattered around, and I don't
want basic behaviors like reading bridge windows during enumeration to
depend on it.

There's no obvious reason why we should ignore bridge windows if we
support movable BARs.

>  	if (dev->transparent) {
>  		pci_bus_for_each_resource(child->parent, res, i) {
> @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge *bridge)
>  		pci_bus_claim_resources(bus);
>  	} else {
>  		pci_bus_size_bridges(bus);
> +		if (pci_can_move_bars)
> +			pci_bus_update_realloc_range(bus);
>  		pci_bus_assign_resources(bus);
>  
>  		list_for_each_entry(child, &bus->children, node)
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug
  2020-01-30 23:37 ` [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Bjorn Helgaas
@ 2020-02-03  4:56   ` Oliver O'Halloran
  0 siblings, 0 replies; 48+ messages in thread
From: Oliver O'Halloran @ 2020-02-03  4:56 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Sergei Miroshnichenko, linux-pci, Stefan Roese, linux

On Fri, Jan 31, 2020 at 10:37 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jan 29, 2020 at 06:29:11PM +0300, Sergei Miroshnichenko wrote:
> >
> > Also tested on a POWER8 PowerNV+OPAL+PHB3 ppc64le machine, but with extra
> > patches which are to be sent upstream after the PCIPOCALYPSE patchset is
> > merged.
>
> "PCIPOCALYPSE" is meaningless to me.  Not sure if it's relevant for me
> or not, but if you mention it, it'd be nice to have some more context.

It's a big RFC series I posted to linuxppc-dev late last year that
removes most of the pseries PCI baggage that got carried over to
PowerNV. It also re-works EEH / DMA setup to be done on a per-device
rather than per-bridge basis and those changes are blocking Sergei's.
I didn't think it would be interesting to a general audience in it's
RFC state so I didn't CC it to linux-pci, but it's here if you want to
have a look: https://lore.kernel.org/linuxppc-dev/20191120012859.23300-1-oohall@gmail.com/

Oliver

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader
  2020-01-31 20:31   ` Bjorn Helgaas
@ 2020-02-05 11:01     ` Sergei Miroshnichenko
  2020-02-05 16:42       ` Bjorn Helgaas
  0 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-02-05 11:01 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, sr

On Fri, 2020-01-31 at 14:31 -0600, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko
> wrote:
> > BAR allocation by BIOS/UEFI/bootloader/firmware may be non-optimal
> > and
> > it may even clash with the kernel's BAR assignment algorithm.
> > 
> > For example, sometimes BIOS doesn't reserve space for SR-IOV BARs,
> > and
> > this bridge window can neither extend (blocked by immovable BARs)
> > nor
> > move (the device itself is immovable).
> > 
> > With this patch the kernel will use its own methods of BAR
> > allocating
> > when possible, increasing the chances of successful boot and
> > hotplug.
> > 
> > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > ---
> >  drivers/pci/probe.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index bb584038d5b4..f8f643dac6d1 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev, enum
> > pci_bar_type type,
> >  			 pos, (unsigned long long)region.start);
> >  	}
> >  
> > +	if (pci_can_move_bars && res->start && !(res->flags &
> > IORESOURCE_IO)) {
> > +		pci_warn(dev, "ignore the current offset of BAR %llx-
> > %llx\n",
> > +			 l64, l64 + sz64 - 1);
> > +		res->start = 0;
> > +		res->end = sz64 - 1;
> > +		res->flags |= IORESOURCE_SIZEALIGN;
> > +	}
> > +
> >  	goto out;
> >  
> >  
> > @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus
> > *child)
> >  		child->resource[i] = &dev-
> > >resource[PCI_BRIDGE_RESOURCES+i];
> >  
> >  	pci_read_bridge_io(child);
> > -	pci_read_bridge_mmio(child);
> > -	pci_read_bridge_mmio_pref(child);
> > +	if (!pci_can_move_bars) {
> > +		pci_read_bridge_mmio(child);
> > +		pci_read_bridge_mmio_pref(child);
> > +	}
> 
> I mentioned this in another response, but I'll repeat it here to
> comment on the code directly: I don't think we should have feature
> tests like this "!pci_can_move_bars" scattered around, and I don't
> want basic behaviors like reading bridge windows during enumeration
> to
> depend on it.
> 
> There's no obvious reason why we should ignore bridge windows if we
> support movable BARs.
> 


That patch solves a problem which is non-fatal during boot, but is
breaking this whole patchset when trying a PCI rescan. On a specific
machine we have a popular i350 network card:

$ lspci -tv
-[0000:00]-...
           +-01.1-[02]--+-00.0  Intel Corporation I350 Gigabit Network
Connection
           |            +-00.1  Intel Corporation I350 Gigabit Network
Connection
           |            +-00.2  Intel Corporation I350 Gigabit Network
Connection
           |            \-00.3  Intel Corporation I350 Gigabit Network
Connection

On the "ROG STRIX Z370-F GAMING, BIOS 0612 03/01/2018" motherboard and
vanilla kernel, not every BAR is allocated:

$ dmesg -t
  ...
  pci 0000:00:01.0: PCI bridge to [bus 01]
  pci 0000:00:01.0:   bridge window [mem 0xf7700000-0xf78fffff]
  pci 0000:02:00.0: BAR 7: assigned [mem 0xf7490000-0xf74affff]
  pci 0000:02:00.0: BAR 10: assigned [mem 0xf74b0000-0xf74cffff]
  pci 0000:02:00.1: BAR 7: assigned [mem 0xf74d0000-0xf74effff]
  pci 0000:02:00.1: BAR 10: no space for [mem size 0x00020000]
  pci 0000:02:00.1: BAR 10: failed to assign [mem size 0x00020000]
  pci 0000:02:00.2: BAR 7: no space for [mem size 0x00020000]
  pci 0000:02:00.2: BAR 7: failed to assign [mem size 0x00020000]
  pci 0000:02:00.2: BAR 10: no space for [mem size 0x00020000]
  pci 0000:02:00.2: BAR 10: failed to assign [mem size 0x00020000]
  pci 0000:02:00.3: BAR 7: no space for [mem size 0x00020000]
  pci 0000:02:00.3: BAR 7: failed to assign [mem size 0x00020000]
  pci 0000:02:00.3: BAR 10: no space for [mem size 0x00020000]
  pci 0000:02:00.3: BAR 10: failed to assign [mem size 0x00020000]
  pci 0000:00:01.1: PCI bridge to [bus 02]

$ sudo cat /proc/iomem
  ...
  f7000000-f74fffff : PCI Bus 0000:02
    f7000000-f70fffff : 0000:02:00.3
      f7000000-f70fffff : igb
    f7100000-f71fffff : 0000:02:00.2
      f7100000-f71fffff : igb
    f7200000-f72fffff : 0000:02:00.1
      f7200000-f72fffff : igb
    f7300000-f73fffff : 0000:02:00.0
      f7300000-f73fffff : igb
    f7400000-f747ffff : 0000:02:00.0
    f7480000-f7483fff : 0000:02:00.3
      f7480000-f7483fff : igb
    f7484000-f7487fff : 0000:02:00.2
      f7484000-f7487fff : igb
    f7488000-f748bfff : 0000:02:00.1
      f7488000-f748bfff : igb
    f748c000-f748ffff : 0000:02:00.0
      f748c000-f748ffff : igb
    f7490000-f74affff : 0000:02:00.0
    f74b0000-f74cffff : 0000:02:00.0
    f74d0000-f74effff : 0000:02:00.1

But when allowing the kernel to allocate BARs properly, the map is
full:

  c8200000-c87fffff : PCI Bus 0000:02
    c8200000-c82fffff : 0000:02:00.0
      c8200000-c82fffff : igb
    c8300000-c83fffff : 0000:02:00.1
      c8300000-c83fffff : igb
    c8400000-c84fffff : 0000:02:00.2
      c8400000-c84fffff : igb
    c8500000-c85fffff : 0000:02:00.3
      c8500000-c85fffff : igb
    c8600000-c867ffff : 0000:02:00.0
    c8680000-c8683fff : 0000:02:00.0
      c8680000-c8683fff : igb
    c8684000-c86a3fff : 0000:02:00.0
    c86a4000-c86c3fff : 0000:02:00.0
    c86c4000-c86c7fff : 0000:02:00.1
      c86c4000-c86c7fff : igb
    c86c8000-c86e7fff : 0000:02:00.1
    c86e8000-c8707fff : 0000:02:00.1
    c8708000-c870bfff : 0000:02:00.2
      c8708000-c870bfff : igb
    c870c000-c872bfff : 0000:02:00.2
    c872c000-c874bfff : 0000:02:00.2
    c874c000-c874ffff : 0000:02:00.3
      c874c000-c874ffff : igb
    c8750000-c876ffff : 0000:02:00.3
    c8770000-c878ffff : 0000:02:00.3

In this particular case the "repaired" BARs are not vital and are not
used by the igb driver, but in general such behavior of BIOS can lead
to a non-working setup.

So ignoring pre-set BARs and bridge windows may be useful on its own,
but it is also provides a working starting point required by this
patchset, otherwise it will need to track such BARs impossible to
assign, and don't try to assign them during a next PCI rescan.

The reason I've tied this feature to the "movable BARs" flag is that I
know at least one exception demanding a workaround - VGA. So I wanted
to provide a flag to disable it in case of other unforeseen
consequences, and the only feature depends on this - is movable BARs.

The [07/26] "PCI: hotplug: Don't allow hot-added devices to steal
resources" patch introduces an additional step in BAR assignment:
 - try to assign every existing BAR + BARs of the hot-added device;
 - it if fails, disable BARs for the hot-added device and retry without
   them.

A possible way to work-around non-working BARs could be adding more
steps:
 - first try to assign every existing BAR + BARs not worked previously
   + "hot-added" BARs;
 - if it fails, retry without those BARs which weren't working before;
 - if it still fails, retry without "hot-added" BARs.

Best regards,
Serge

> >  	if (dev->transparent) {
> >  		pci_bus_for_each_resource(child->parent, res, i) {
> > @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge
> > *bridge)
> >  		pci_bus_claim_resources(bus);
> >  	} else {
> >  		pci_bus_size_bridges(bus);
> > +		if (pci_can_move_bars)
> > +			pci_bus_update_realloc_range(bus);
> >  		pci_bus_assign_resources(bus);
> >  
> >  		list_for_each_entry(child, &bus->children, node)
> > -- 
> > 2.24.1
> > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-01-31 20:27       ` Bjorn Helgaas
@ 2020-02-05 13:04         ` Sergei Miroshnichenko
  2020-02-05 16:32           ` Bjorn Helgaas
  0 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-02-05 13:04 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, sr

On Fri, 2020-01-31 at 14:27 -0600, Bjorn Helgaas wrote:
> On Fri, Jan 31, 2020 at 06:19:48PM +0000, Sergei Miroshnichenko
> wrote:
> > On Thu, 2020-01-30 at 17:52 -0600, Bjorn Helgaas wrote:
> > > On Wed, Jan 29, 2020 at 06:29:27PM +0300, Sergei Miroshnichenko
> > > wrote:
> > > > BARs and bridge windows are only allowed to be assigned to
> > > > their
> > > > parent bus's bridge windows, going up to the root complex's
> > > > resources.
> > > > So additional limitations on BAR address are not needed, and
> > > > the
> > > > PCIBIOS_MIN_MEM can be ignored.
> > > 
> > > This is theoretically true, but I don't think we have reliable
> > > information about the host bridge windows in all cases, so
> > > PCIBIOS_MIN_MEM/_IO is something of an approximation.
> > > 
> > > > Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS 1.3
> > > > on
> > > > Supermicro H11SSL-i via e820__setup_pci_gap():
> > > > 
> > > >   [mem 0xebff1000-0xfe9fffff] available for PCI devices
> > > > 
> > > > is only suitable for a single RC out of four:
> > > > 
> > > >   pci_bus 0000:00: root bus resource [mem 0xec000000-0xefffffff
> > > > window]
> > > >   pci_bus 0000:20: root bus resource [mem 0xeb800000-0xebefffff
> > > > window]
> > > >   pci_bus 0000:40: root bus resource [mem 0xeb200000-0xeb5fffff
> > > > window]
> > > >   pci_bus 0000:60: root bus resource [mem 0xe8b00000-0xeaffffff
> > > > window]
> > > > 
> > > > , which makes the AMD EPYC 7251 unable to boot with this
> > > > movable
> > > > BARs
> > > > patchset.
> > > 
> > > Something's wrong if this system booted before this patch set but
> > > not
> > > after.  We shouldn't be doing *anything* with the BARs until we
> > > need
> > > to, i.e., until we hot-add a device where we have to move things
> > > to
> > > find space for it.
> > 
> > The one breaking boot on this system initially was 17/26 of this
> > patchset: "PCI: hotplug: Ignore the MEM BAR offsets from
> > BIOS/bootloader"
> 
> I don't think that patch is a good idea.  I think we should read the
> current BARs and windows at boot-time and leave them alone unless we
> *must* change them.  I don't think we should change things
> preemptively to make future hotplug events easier.
> 
> > Before it the kernel just took BARs pre-assigned by BIOS. In the
> > same
> > time, the same BIOS reports 0xebff1000-0xfe9fffff as available for
> > PCI
> > devices, but the real root bridge windows are 0xe8b00000-0xefffffff 
> > in
> > total (and also 64-bit windows) - which are also reported by the
> > same
> > BIOS. So the kernel was only able to handle the 0xec000000-
> > 0xefffffff
> > root bus.
> > 
> > With that patch reverted the kernel was able to boot, but unable to
> > rescan - to reassign BARs actually.
> > 
> > > (And we don't want a bisection hole where this system can't boot
> > > until
> > > this patch is applied, but I assume that's obvious.)
> > > 
> > > > Signed-off-by: Sergei Miroshnichenko <
> > > > s.miroshnichenko@yadro.com>
> > > > ---
> > > >  drivers/pci/setup-res.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> > > > index a7d81816d1ea..4043aab021dd 100644
> > > > --- a/drivers/pci/setup-res.c
> > > > +++ b/drivers/pci/setup-res.c
> > > > @@ -246,12 +246,13 @@ static int __pci_assign_resource(struct
> > > > pci_bus *bus, struct pci_dev *dev,
> > > >  		int resno, resource_size_t size,
> > > > resource_size_t align)
> > > >  {
> > > >  	struct resource *res = dev->resource + resno;
> > > > -	resource_size_t min;
> > > > +	resource_size_t min = 0;
> > > >  	int ret;
> > > >  	resource_size_t start = (resource_size_t)-1;
> > > >  	resource_size_t end = 0;
> > > >  
> > > > -	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > > > PCIBIOS_MIN_MEM;
> > > > +	if (!pci_can_move_bars)
> > > > +		min = (res->flags & IORESOURCE_IO) ?
> > > > PCIBIOS_MIN_IO :
> > > > PCIBIOS_MIN_MEM;
> > > 
> > > I don't understand the connection here.  PCIBIOS_MIN_MEM and
> > > PCIBIOS_MIN_IO are basically ways to say "we can't put PCI
> > > resources
> > > below this address".
> > > 
> > > On ACPI systems, the devices in the ACPI namespace are supposed
> > > to
> > > tell the OS what resources they use, and obviously the OS should
> > > not
> > > assign those resources to anything else.  If Linux handled all
> > > those
> > > ACPI resources correctly and in the absence of firmware defects,
> > > we
> > > shouldn't need PCIBIOS_MIN_MEM/_IO at all.  But neither of those
> > > is
> > > currently true.
> > > 
> > > It's true that we should be smarter about PCIBIOS_MIN_MEM/_IO,
> > > but I
> > > don't think that has anything to do with whether we support
> > > *moving*
> > > BARs.  We have to avoid the address space that's already in use
> > > in
> > > *all* cases.
> > 
> > This is connected to the approach of this feature: releasing,
> > recalculating and reassigning the BARs and bridge windows. If
> > movable
> > BARs are disabled, this bug doesn't reproduce. And the bug doesn't
> > let
> > the system boot when BARs are allowed to move. That's why I've tied
> > these together.
> 
> My point is just that logically this has nothing to do with movable
> BARs.
> 
> > This line setting the "min" to PCIBIOS_MIN_* is there untouched
> > since
> > the first kernel git commit in 2005 - could it be that all systems
> > are
> > just fine now, having their root bridge windows set up correctly?
> 
> I don't understand the question, sorry.
> 


I mean, every BAR assigned here can't reside outside of a host IO/MEM
bridge window, which is a bus->resource[n] set up by the platform code,
and their .start fields are seemed to be duplicated by the
PCIBIOS_MIN_* values - from the platform code as well. But the .start
fields are seem to be correct (aren't they?), and the PCIBIOS_MIN_*
values are sometimes definitely not.

What can be a reliable test to check if PCIBIOS_MIN_* are safe to
ignore unconditionally? Could it be a separate flag instead of the
pci_can_move_bars here?

Would it be fine for a start to ignore the PCIBIOS_MIN_* if it lies
completely outside of host bridge windows? So at least AMD EPYC can
obtain its hotplug power.

Best regards,
Serge

> > > >  	if (pci_can_move_bars && dev->subordinate && resno >=
> > > > PCI_BRIDGE_RESOURCES) {
> > > >  		struct pci_bus *child_bus = dev->subordinate;
> > > > -- 
> > > > 2.24.1
> > > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-02-05 13:04         ` Sergei Miroshnichenko
@ 2020-02-05 16:32           ` Bjorn Helgaas
  2020-02-12 12:16             ` Sergei Miroshnichenko
  0 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-02-05 16:32 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, linux, sr

On Wed, Feb 05, 2020 at 01:04:06PM +0000, Sergei Miroshnichenko wrote:
> On Fri, 2020-01-31 at 14:27 -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 31, 2020 at 06:19:48PM +0000, Sergei Miroshnichenko
> > wrote:
> > > On Thu, 2020-01-30 at 17:52 -0600, Bjorn Helgaas wrote:
> > > > On Wed, Jan 29, 2020 at 06:29:27PM +0300, Sergei Miroshnichenko
> > > > wrote:
> > > > > BARs and bridge windows are only allowed to be assigned to
> > > > > their parent bus's bridge windows, going up to the root
> > > > > complex's resources.  So additional limitations on BAR
> > > > > address are not needed, and the PCIBIOS_MIN_MEM can be
> > > > > ignored.
> > > > 
> > > > This is theoretically true, but I don't think we have reliable
> > > > information about the host bridge windows in all cases, so
> > > > PCIBIOS_MIN_MEM/_IO is something of an approximation.
> > > > 
> > > > > Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS
> > > > > 1.3 on Supermicro H11SSL-i via e820__setup_pci_gap():
> > > > > 
> > > > >   [mem 0xebff1000-0xfe9fffff] available for PCI devices
> > > > > 
> > > > > is only suitable for a single RC out of four:
> > > > > 
> > > > >   pci_bus 0000:00: root bus resource [mem 0xec000000-0xefffffff
> > > > > window]
> > > > >   pci_bus 0000:20: root bus resource [mem 0xeb800000-0xebefffff
> > > > > window]
> > > > >   pci_bus 0000:40: root bus resource [mem 0xeb200000-0xeb5fffff
> > > > > window]
> > > > >   pci_bus 0000:60: root bus resource [mem 0xe8b00000-0xeaffffff
> > > > > window]
> > > > > 
> > > > > , which makes the AMD EPYC 7251 unable to boot with this
> > > > > movable BARs patchset.
> > > > 
> > > > Something's wrong if this system booted before this patch set
> > > > but not after.  We shouldn't be doing *anything* with the BARs
> > > > until we need to, i.e., until we hot-add a device where we
> > > > have to move things to find space for it.
> > > 
> > > The one breaking boot on this system initially was 17/26 of this
> > > patchset: "PCI: hotplug: Ignore the MEM BAR offsets from
> > > BIOS/bootloader"
> > 
> > I don't think that patch is a good idea.  I think we should read the
> > current BARs and windows at boot-time and leave them alone unless we
> > *must* change them.  I don't think we should change things
> > preemptively to make future hotplug events easier.
> > 
> > > Before it the kernel just took BARs pre-assigned by BIOS. In the
> > > same time, the same BIOS reports 0xebff1000-0xfe9fffff as
> > > available for PCI devices, but the real root bridge windows are
> > > 0xe8b00000-0xefffffff in total (and also 64-bit windows) - which
> > > are also reported by the same BIOS. So the kernel was only able
> > > to handle the 0xec000000- 0xefffffff root bus.
> > > 
> > > With that patch reverted the kernel was able to boot, but unable to
> > > rescan - to reassign BARs actually.
> > > 
> > > > (And we don't want a bisection hole where this system can't
> > > > boot until this patch is applied, but I assume that's
> > > > obvious.)
> > > > 
> > > > > Signed-off-by: Sergei Miroshnichenko <
> > > > > s.miroshnichenko@yadro.com>
> > > > > ---
> > > > >  drivers/pci/setup-res.c | 5 +++--
> > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> > > > > index a7d81816d1ea..4043aab021dd 100644
> > > > > --- a/drivers/pci/setup-res.c
> > > > > +++ b/drivers/pci/setup-res.c
> > > > > @@ -246,12 +246,13 @@ static int __pci_assign_resource(struct
> > > > > pci_bus *bus, struct pci_dev *dev,
> > > > >  		int resno, resource_size_t size,
> > > > > resource_size_t align)
> > > > >  {
> > > > >  	struct resource *res = dev->resource + resno;
> > > > > -	resource_size_t min;
> > > > > +	resource_size_t min = 0;
> > > > >  	int ret;
> > > > >  	resource_size_t start = (resource_size_t)-1;
> > > > >  	resource_size_t end = 0;
> > > > >  
> > > > > -	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > > > > PCIBIOS_MIN_MEM;
> > > > > +	if (!pci_can_move_bars)
> > > > > +		min = (res->flags & IORESOURCE_IO) ?
> > > > > PCIBIOS_MIN_IO :
> > > > > PCIBIOS_MIN_MEM;
> > > > 
> > > > I don't understand the connection here.  PCIBIOS_MIN_MEM and
> > > > PCIBIOS_MIN_IO are basically ways to say "we can't put PCI
> > > > resources below this address".
> > > > 
> > > > On ACPI systems, the devices in the ACPI namespace are
> > > > supposed to tell the OS what resources they use, and obviously
> > > > the OS should not assign those resources to anything else.  If
> > > > Linux handled all those ACPI resources correctly and in the
> > > > absence of firmware defects, we shouldn't need
> > > > PCIBIOS_MIN_MEM/_IO at all.  But neither of those is currently
> > > > true.
> > > > 
> > > > It's true that we should be smarter about PCIBIOS_MIN_MEM/_IO,
> > > > but I don't think that has anything to do with whether we
> > > > support *moving* BARs.  We have to avoid the address space
> > > > that's already in use in *all* cases.
> > > 
> > > This is connected to the approach of this feature: releasing,
> > > recalculating and reassigning the BARs and bridge windows. If
> > > movable BARs are disabled, this bug doesn't reproduce. And the
> > > bug doesn't let the system boot when BARs are allowed to move.
> > > That's why I've tied these together.
> > 
> > My point is just that logically this has nothing to do with
> > movable BARs.
> > 
> > > This line setting the "min" to PCIBIOS_MIN_* is there untouched
> > > since the first kernel git commit in 2005 - could it be that all
> > > systems are just fine now, having their root bridge windows set
> > > up correctly?
> > 
> > I don't understand the question, sorry.
> 
> I mean, every BAR assigned here can't reside outside of a host
> IO/MEM bridge window, which is a bus->resource[n] set up by the
> platform code, and their .start fields are seemed to be duplicated
> by the PCIBIOS_MIN_* values - from the platform code as well. But
> the .start fields are seem to be correct (aren't they?), and the
> PCIBIOS_MIN_* values are sometimes definitely not.
> 
> What can be a reliable test to check if PCIBIOS_MIN_* are safe to
> ignore unconditionally? Could it be a separate flag instead of the
> pci_can_move_bars here?
> 
> Would it be fine for a start to ignore the PCIBIOS_MIN_* if it lies
> completely outside of host bridge windows? So at least AMD EPYC can
> obtain its hotplug power.

PCIBIOS_MIN_* has a long history and it touches every arch, so you'd
have to make sure this is safe for all of them.

On x86, PCIBIOS_MIN_MEM is computed from the e820 memory map and is
basically a guess because the e820 map is only incidentally related to
ACPI device resource usage.  I could imagine making the x86
computation smarter by looking at the PNP0A03/PNP0A08 _CRS
information.

> > > > >  	if (pci_can_move_bars && dev->subordinate && resno >=
> > > > > PCI_BRIDGE_RESOURCES) {
> > > > >  		struct pci_bus *child_bus = dev->subordinate;
> > > > > -- 
> > > > > 2.24.1
> > > > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader
  2020-02-05 11:01     ` Sergei Miroshnichenko
@ 2020-02-05 16:42       ` Bjorn Helgaas
  2020-02-12 12:29         ` Sergei Miroshnichenko
  0 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2020-02-05 16:42 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, linux, sr

On Wed, Feb 05, 2020 at 11:01:32AM +0000, Sergei Miroshnichenko wrote:
> On Fri, 2020-01-31 at 14:31 -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko
> > wrote:
> > > BAR allocation by BIOS/UEFI/bootloader/firmware may be
> > > non-optimal and it may even clash with the kernel's BAR
> > > assignment algorithm.
> > > 
> > > For example, sometimes BIOS doesn't reserve space for SR-IOV
> > > BARs, and this bridge window can neither extend (blocked by
> > > immovable BARs) nor move (the device itself is immovable).
> > > 
> > > With this patch the kernel will use its own methods of BAR
> > > allocating when possible, increasing the chances of successful
> > > boot and hotplug.
> > > 
> > > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > > ---
> > >  drivers/pci/probe.c | 16 ++++++++++++++--
> > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index bb584038d5b4..f8f643dac6d1 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev, enum
> > > pci_bar_type type,
> > >  			 pos, (unsigned long long)region.start);
> > >  	}
> > >  
> > > +	if (pci_can_move_bars && res->start && !(res->flags &
> > > IORESOURCE_IO)) {
> > > +		pci_warn(dev, "ignore the current offset of BAR %llx-
> > > %llx\n",
> > > +			 l64, l64 + sz64 - 1);
> > > +		res->start = 0;
> > > +		res->end = sz64 - 1;
> > > +		res->flags |= IORESOURCE_SIZEALIGN;
> > > +	}
> > > +
> > >  	goto out;
> > >  
> > >  
> > > @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus
> > > *child)
> > >  		child->resource[i] = &dev-
> > > >resource[PCI_BRIDGE_RESOURCES+i];
> > >  
> > >  	pci_read_bridge_io(child);
> > > -	pci_read_bridge_mmio(child);
> > > -	pci_read_bridge_mmio_pref(child);
> > > +	if (!pci_can_move_bars) {
> > > +		pci_read_bridge_mmio(child);
> > > +		pci_read_bridge_mmio_pref(child);
> > > +	}
> > 
> > I mentioned this in another response, but I'll repeat it here to
> > comment on the code directly: I don't think we should have feature
> > tests like this "!pci_can_move_bars" scattered around, and I don't
> > want basic behaviors like reading bridge windows during enumeration
> > to
> > depend on it.
> > 
> > There's no obvious reason why we should ignore bridge windows if we
> > support movable BARs.
> 
> That patch solves a problem which is non-fatal during boot, but is
> breaking this whole patchset when trying a PCI rescan. On a specific
> machine we have a popular i350 network card:
> 
> $ lspci -tv
> -[0000:00]-...
>            +-01.1-[02]--+-00.0  Intel Corporation I350 Gigabit Network
> Connection
>            |            +-00.1  Intel Corporation I350 Gigabit Network
> Connection
>            |            +-00.2  Intel Corporation I350 Gigabit Network
> Connection
>            |            \-00.3  Intel Corporation I350 Gigabit Network
> Connection
> 
> On the "ROG STRIX Z370-F GAMING, BIOS 0612 03/01/2018" motherboard and
> vanilla kernel, not every BAR is allocated:
> 
> $ dmesg -t
>   ...
>   pci 0000:00:01.0: PCI bridge to [bus 01]
>   pci 0000:00:01.0:   bridge window [mem 0xf7700000-0xf78fffff]
>   pci 0000:02:00.0: BAR 7: assigned [mem 0xf7490000-0xf74affff]
>   pci 0000:02:00.0: BAR 10: assigned [mem 0xf74b0000-0xf74cffff]
>   pci 0000:02:00.1: BAR 7: assigned [mem 0xf74d0000-0xf74effff]
>   pci 0000:02:00.1: BAR 10: no space for [mem size 0x00020000]
>   pci 0000:02:00.1: BAR 10: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 7: no space for [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 7: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 10: no space for [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 10: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 7: no space for [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 7: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 10: no space for [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 10: failed to assign [mem size 0x00020000]
>   pci 0000:00:01.1: PCI bridge to [bus 02]
> 
> $ sudo cat /proc/iomem
>   ...
>   f7000000-f74fffff : PCI Bus 0000:02
>     f7000000-f70fffff : 0000:02:00.3
>       f7000000-f70fffff : igb
>     f7100000-f71fffff : 0000:02:00.2
>       f7100000-f71fffff : igb
>     f7200000-f72fffff : 0000:02:00.1
>       f7200000-f72fffff : igb
>     f7300000-f73fffff : 0000:02:00.0
>       f7300000-f73fffff : igb
>     f7400000-f747ffff : 0000:02:00.0
>     f7480000-f7483fff : 0000:02:00.3
>       f7480000-f7483fff : igb
>     f7484000-f7487fff : 0000:02:00.2
>       f7484000-f7487fff : igb
>     f7488000-f748bfff : 0000:02:00.1
>       f7488000-f748bfff : igb
>     f748c000-f748ffff : 0000:02:00.0
>       f748c000-f748ffff : igb
>     f7490000-f74affff : 0000:02:00.0
>     f74b0000-f74cffff : 0000:02:00.0
>     f74d0000-f74effff : 0000:02:00.1
> 
> But when allowing the kernel to allocate BARs properly, the map is
> full:
> 
>   c8200000-c87fffff : PCI Bus 0000:02
>     c8200000-c82fffff : 0000:02:00.0
>       c8200000-c82fffff : igb
>     c8300000-c83fffff : 0000:02:00.1
>       c8300000-c83fffff : igb
>     c8400000-c84fffff : 0000:02:00.2
>       c8400000-c84fffff : igb
>     c8500000-c85fffff : 0000:02:00.3
>       c8500000-c85fffff : igb
>     c8600000-c867ffff : 0000:02:00.0
>     c8680000-c8683fff : 0000:02:00.0
>       c8680000-c8683fff : igb
>     c8684000-c86a3fff : 0000:02:00.0
>     c86a4000-c86c3fff : 0000:02:00.0
>     c86c4000-c86c7fff : 0000:02:00.1
>       c86c4000-c86c7fff : igb
>     c86c8000-c86e7fff : 0000:02:00.1
>     c86e8000-c8707fff : 0000:02:00.1
>     c8708000-c870bfff : 0000:02:00.2
>       c8708000-c870bfff : igb
>     c870c000-c872bfff : 0000:02:00.2
>     c872c000-c874bfff : 0000:02:00.2
>     c874c000-c874ffff : 0000:02:00.3
>       c874c000-c874ffff : igb
>     c8750000-c876ffff : 0000:02:00.3
>     c8770000-c878ffff : 0000:02:00.3
> 
> In this particular case the "repaired" BARs are not vital and are not
> used by the igb driver, but in general such behavior of BIOS can lead
> to a non-working setup.
> 
> So ignoring pre-set BARs and bridge windows may be useful on its own,
> but it is also provides a working starting point required by this
> patchset, otherwise it will need to track such BARs impossible to
> assign, and don't try to assign them during a next PCI rescan.
> 
> The reason I've tied this feature to the "movable BARs" flag is that I
> know at least one exception demanding a workaround - VGA. So I wanted
> to provide a flag to disable it in case of other unforeseen
> consequences, and the only feature depends on this - is movable BARs.

If we need exceptions for broken or legacy devices, we should check
for those explicitly.

If we fail to assign some BARs at boot, I think it's reasonable to try
to reassign things before drivers claim the devices.  But support for
that should be in a reassignment path, not in the general enumeration
path.  And, since drivers aren't involved yet, it probably doesn't
even depend on pci_can_move_bars.

> The [07/26] "PCI: hotplug: Don't allow hot-added devices to steal
> resources" patch introduces an additional step in BAR assignment:
>  - try to assign every existing BAR + BARs of the hot-added device;
>  - it if fails, disable BARs for the hot-added device and retry without
>    them.
> 
> A possible way to work-around non-working BARs could be adding more
> steps:
>  - first try to assign every existing BAR + BARs not worked previously
>    + "hot-added" BARs;
>  - if it fails, retry without those BARs which weren't working before;
>  - if it still fails, retry without "hot-added" BARs.
> 
> Best regards,
> Serge
> 
> > >  	if (dev->transparent) {
> > >  		pci_bus_for_each_resource(child->parent, res, i) {
> > > @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge
> > > *bridge)
> > >  		pci_bus_claim_resources(bus);
> > >  	} else {
> > >  		pci_bus_size_bridges(bus);
> > > +		if (pci_can_move_bars)
> > > +			pci_bus_update_realloc_range(bus);
> > >  		pci_bus_assign_resources(bus);
> > >  
> > >  		list_for_each_entry(child, &bus->children, node)
> > > -- 
> > > 2.24.1
> > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-02-05 16:32           ` Bjorn Helgaas
@ 2020-02-12 12:16             ` Sergei Miroshnichenko
  2020-02-12 14:13               ` Bjorn Helgaas
  0 siblings, 1 reply; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-02-12 12:16 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, sr

On Wed, 2020-02-05 at 10:32 -0600, Bjorn Helgaas wrote:
> On Wed, Feb 05, 2020 at 01:04:06PM +0000, Sergei Miroshnichenko
> wrote:
> > On Fri, 2020-01-31 at 14:27 -0600, Bjorn Helgaas wrote:
> > > On Fri, Jan 31, 2020 at 06:19:48PM +0000, Sergei Miroshnichenko
> > > wrote:
> > > > On Thu, 2020-01-30 at 17:52 -0600, Bjorn Helgaas wrote:
> > > > > On Wed, Jan 29, 2020 at 06:29:27PM +0300, Sergei
> > > > > Miroshnichenko
> > > > > wrote:
> > > > > > BARs and bridge windows are only allowed to be assigned to
> > > > > > their parent bus's bridge windows, going up to the root
> > > > > > complex's resources.  So additional limitations on BAR
> > > > > > address are not needed, and the PCIBIOS_MIN_MEM can be
> > > > > > ignored.
> > > > > 
> > > > > This is theoretically true, but I don't think we have
> > > > > reliable
> > > > > information about the host bridge windows in all cases, so
> > > > > PCIBIOS_MIN_MEM/_IO is something of an approximation.
> > > > > 
> > > > > > Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS
> > > > > > 1.3 on Supermicro H11SSL-i via e820__setup_pci_gap():
> > > > > > 
> > > > > >   [mem 0xebff1000-0xfe9fffff] available for PCI devices
> > > > > > 
> > > > > > is only suitable for a single RC out of four:
> > > > > > 
> > > > > >   pci_bus 0000:00: root bus resource [mem 0xec000000-
> > > > > > 0xefffffff
> > > > > > window]
> > > > > >   pci_bus 0000:20: root bus resource [mem 0xeb800000-
> > > > > > 0xebefffff
> > > > > > window]
> > > > > >   pci_bus 0000:40: root bus resource [mem 0xeb200000-
> > > > > > 0xeb5fffff
> > > > > > window]
> > > > > >   pci_bus 0000:60: root bus resource [mem 0xe8b00000-
> > > > > > 0xeaffffff
> > > > > > window]
> > > > > > 
> > > > > > , which makes the AMD EPYC 7251 unable to boot with this
> > > > > > movable BARs patchset.
> > > > > 
> > > > > Something's wrong if this system booted before this patch set
> > > > > but not after.  We shouldn't be doing *anything* with the
> > > > > BARs
> > > > > until we need to, i.e., until we hot-add a device where we
> > > > > have to move things to find space for it.
> > > > 
> > > > The one breaking boot on this system initially was 17/26 of
> > > > this
> > > > patchset: "PCI: hotplug: Ignore the MEM BAR offsets from
> > > > BIOS/bootloader"
> > > 
> > > I don't think that patch is a good idea.  I think we should read
> > > the
> > > current BARs and windows at boot-time and leave them alone unless
> > > we
> > > *must* change them.  I don't think we should change things
> > > preemptively to make future hotplug events easier.
> > > 
> > > > Before it the kernel just took BARs pre-assigned by BIOS. In
> > > > the
> > > > same time, the same BIOS reports 0xebff1000-0xfe9fffff as
> > > > available for PCI devices, but the real root bridge windows are
> > > > 0xe8b00000-0xefffffff in total (and also 64-bit windows) -
> > > > which
> > > > are also reported by the same BIOS. So the kernel was only able
> > > > to handle the 0xec000000- 0xefffffff root bus.
> > > > 
> > > > With that patch reverted the kernel was able to boot, but
> > > > unable to
> > > > rescan - to reassign BARs actually.
> > > > 
> > > > > (And we don't want a bisection hole where this system can't
> > > > > boot until this patch is applied, but I assume that's
> > > > > obvious.)
> > > > > 
> > > > > > Signed-off-by: Sergei Miroshnichenko <
> > > > > > s.miroshnichenko@yadro.com>
> > > > > > ---
> > > > > >  drivers/pci/setup-res.c | 5 +++--
> > > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-
> > > > > > res.c
> > > > > > index a7d81816d1ea..4043aab021dd 100644
> > > > > > --- a/drivers/pci/setup-res.c
> > > > > > +++ b/drivers/pci/setup-res.c
> > > > > > @@ -246,12 +246,13 @@ static int
> > > > > > __pci_assign_resource(struct
> > > > > > pci_bus *bus, struct pci_dev *dev,
> > > > > >  		int resno, resource_size_t size,
> > > > > > resource_size_t align)
> > > > > >  {
> > > > > >  	struct resource *res = dev->resource + resno;
> > > > > > -	resource_size_t min;
> > > > > > +	resource_size_t min = 0;
> > > > > >  	int ret;
> > > > > >  	resource_size_t start = (resource_size_t)-1;
> > > > > >  	resource_size_t end = 0;
> > > > > >  
> > > > > > -	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > > > > > PCIBIOS_MIN_MEM;
> > > > > > +	if (!pci_can_move_bars)
> > > > > > +		min = (res->flags & IORESOURCE_IO) ?
> > > > > > PCIBIOS_MIN_IO :
> > > > > > PCIBIOS_MIN_MEM;
> > > > > 
> > > > > I don't understand the connection here.  PCIBIOS_MIN_MEM and
> > > > > PCIBIOS_MIN_IO are basically ways to say "we can't put PCI
> > > > > resources below this address".
> > > > > 
> > > > > On ACPI systems, the devices in the ACPI namespace are
> > > > > supposed to tell the OS what resources they use, and
> > > > > obviously
> > > > > the OS should not assign those resources to anything
> > > > > else.  If
> > > > > Linux handled all those ACPI resources correctly and in the
> > > > > absence of firmware defects, we shouldn't need
> > > > > PCIBIOS_MIN_MEM/_IO at all.  But neither of those is
> > > > > currently
> > > > > true.
> > > > > 
> > > > > It's true that we should be smarter about
> > > > > PCIBIOS_MIN_MEM/_IO,
> > > > > but I don't think that has anything to do with whether we
> > > > > support *moving* BARs.  We have to avoid the address space
> > > > > that's already in use in *all* cases.
> > > > 
> > > > This is connected to the approach of this feature: releasing,
> > > > recalculating and reassigning the BARs and bridge windows. If
> > > > movable BARs are disabled, this bug doesn't reproduce. And the
> > > > bug doesn't let the system boot when BARs are allowed to move.
> > > > That's why I've tied these together.
> > > 
> > > My point is just that logically this has nothing to do with
> > > movable BARs.
> > > 
> > > > This line setting the "min" to PCIBIOS_MIN_* is there untouched
> > > > since the first kernel git commit in 2005 - could it be that
> > > > all
> > > > systems are just fine now, having their root bridge windows set
> > > > up correctly?
> > > 
> > > I don't understand the question, sorry.
> > 
> > I mean, every BAR assigned here can't reside outside of a host
> > IO/MEM bridge window, which is a bus->resource[n] set up by the
> > platform code, and their .start fields are seemed to be duplicated
> > by the PCIBIOS_MIN_* values - from the platform code as well. But
> > the .start fields are seem to be correct (aren't they?), and the
> > PCIBIOS_MIN_* values are sometimes definitely not.
> > 
> > What can be a reliable test to check if PCIBIOS_MIN_* are safe to
> > ignore unconditionally? Could it be a separate flag instead of the
> > pci_can_move_bars here?
> > 
> > Would it be fine for a start to ignore the PCIBIOS_MIN_* if it lies
> > completely outside of host bridge windows? So at least AMD EPYC can
> > obtain its hotplug power.
> 
> PCIBIOS_MIN_* has a long history and it touches every arch, so you'd
> have to make sure this is safe for all of them.
> 

Right, I should rework this change and make it x86-specific - the only
platform where I've encountered this issue (invalid PCIBIOS_MIN_MEM
from the e820 memory map, preventing hotplug).

> On x86, PCIBIOS_MIN_MEM is computed from the e820 memory map and is
> basically a guess because the e820 map is only incidentally related
> to
> ACPI device resource usage.  I could imagine making the x86
> computation smarter by looking at the PNP0A03/PNP0A08 _CRS
> information.
> 

Would it be acceptable to set PCIBIOS_MIN_MEM to zero for x86 in
arch/x86/include/asm/pci.h ? I've debugged PCs in my possession, and I
see there every ACPI_RESOURCE_TYPE_ADDRESS* and
ACPI_RESOURCE_TYPE_FIXED_MEMORY* resource extracted from the ACPI are
eventually represented in /proc/iomem and /proc/ioports, so the kernel
can't put device BARs in these reserved address ranges.

Thanks,
Serge

> > > > > >  	if (pci_can_move_bars && dev->subordinate && resno >=
> > > > > > PCI_BRIDGE_RESOURCES) {
> > > > > >  		struct pci_bus *child_bus = dev->subordinate;
> > > > > > -- 
> > > > > > 2.24.1
> > > > > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader
  2020-02-05 16:42       ` Bjorn Helgaas
@ 2020-02-12 12:29         ` Sergei Miroshnichenko
  0 siblings, 0 replies; 48+ messages in thread
From: Sergei Miroshnichenko @ 2020-02-12 12:29 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, linux, sr

On Wed, 2020-02-05 at 10:42 -0600, Bjorn Helgaas wrote:
> On Wed, Feb 05, 2020 at 11:01:32AM +0000, Sergei Miroshnichenko
> wrote:
> > On Fri, 2020-01-31 at 14:31 -0600, Bjorn Helgaas wrote:
> > > On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko
> > > wrote:
> > > > BAR allocation by BIOS/UEFI/bootloader/firmware may be
> > > > non-optimal and it may even clash with the kernel's BAR
> > > > assignment algorithm.
> > > > 
> > > > For example, sometimes BIOS doesn't reserve space for SR-IOV
> > > > BARs, and this bridge window can neither extend (blocked by
> > > > immovable BARs) nor move (the device itself is immovable).
> > > > 
> > > > With this patch the kernel will use its own methods of BAR
> > > > allocating when possible, increasing the chances of successful
> > > > boot and hotplug.
> > > > 
> > > > Signed-off-by: Sergei Miroshnichenko <
> > > > s.miroshnichenko@yadro.com>
> > > > ---
> > > >  drivers/pci/probe.c | 16 ++++++++++++++--
> > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > > index bb584038d5b4..f8f643dac6d1 100644
> > > > --- a/drivers/pci/probe.c
> > > > +++ b/drivers/pci/probe.c
> > > > @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev,
> > > > enum
> > > > pci_bar_type type,
> > > >  			 pos, (unsigned long
> > > > long)region.start);
> > > >  	}
> > > >  
> > > > +	if (pci_can_move_bars && res->start && !(res->flags &
> > > > IORESOURCE_IO)) {
> > > > +		pci_warn(dev, "ignore the current offset of BAR
> > > > %llx-
> > > > %llx\n",
> > > > +			 l64, l64 + sz64 - 1);
> > > > +		res->start = 0;
> > > > +		res->end = sz64 - 1;
> > > > +		res->flags |= IORESOURCE_SIZEALIGN;
> > > > +	}
> > > > +
> > > >  	goto out;
> > > >  
> > > >  
> > > > @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus
> > > > *child)
> > > >  		child->resource[i] = &dev-
> > > > > resource[PCI_BRIDGE_RESOURCES+i];
> > > >  
> > > >  	pci_read_bridge_io(child);
> > > > -	pci_read_bridge_mmio(child);
> > > > -	pci_read_bridge_mmio_pref(child);
> > > > +	if (!pci_can_move_bars) {
> > > > +		pci_read_bridge_mmio(child);
> > > > +		pci_read_bridge_mmio_pref(child);
> > > > +	}
> > > 
> > > I mentioned this in another response, but I'll repeat it here to
> > > comment on the code directly: I don't think we should have
> > > feature
> > > tests like this "!pci_can_move_bars" scattered around, and I
> > > don't
> > > want basic behaviors like reading bridge windows during
> > > enumeration
> > > to
> > > depend on it.
> > > 
> > > There's no obvious reason why we should ignore bridge windows if
> > > we
> > > support movable BARs.
> > 
> > That patch solves a problem which is non-fatal during boot, but is
> > breaking this whole patchset when trying a PCI rescan. On a
> > specific
> > machine we have a popular i350 network card:
> > 
> > $ lspci -tv
> > -[0000:00]-...
> >            +-01.1-[02]--+-00.0  Intel Corporation I350 Gigabit
> > Network
> > Connection
> >            |            +-00.1  Intel Corporation I350 Gigabit
> > Network
> > Connection
> >            |            +-00.2  Intel Corporation I350 Gigabit
> > Network
> > Connection
> >            |            \-00.3  Intel Corporation I350 Gigabit
> > Network
> > Connection
> > 
> > On the "ROG STRIX Z370-F GAMING, BIOS 0612 03/01/2018" motherboard
> > and
> > vanilla kernel, not every BAR is allocated:
> > 
> > $ dmesg -t
> >   ...
> >   pci 0000:00:01.0: PCI bridge to [bus 01]
> >   pci 0000:00:01.0:   bridge window [mem 0xf7700000-0xf78fffff]
> >   pci 0000:02:00.0: BAR 7: assigned [mem 0xf7490000-0xf74affff]
> >   pci 0000:02:00.0: BAR 10: assigned [mem 0xf74b0000-0xf74cffff]
> >   pci 0000:02:00.1: BAR 7: assigned [mem 0xf74d0000-0xf74effff]
> >   pci 0000:02:00.1: BAR 10: no space for [mem size 0x00020000]
> >   pci 0000:02:00.1: BAR 10: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 7: no space for [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 7: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 10: no space for [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 10: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 7: no space for [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 7: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 10: no space for [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 10: failed to assign [mem size 0x00020000]
> >   pci 0000:00:01.1: PCI bridge to [bus 02]
> > 
> > $ sudo cat /proc/iomem
> >   ...
> >   f7000000-f74fffff : PCI Bus 0000:02
> >     f7000000-f70fffff : 0000:02:00.3
> >       f7000000-f70fffff : igb
> >     f7100000-f71fffff : 0000:02:00.2
> >       f7100000-f71fffff : igb
> >     f7200000-f72fffff : 0000:02:00.1
> >       f7200000-f72fffff : igb
> >     f7300000-f73fffff : 0000:02:00.0
> >       f7300000-f73fffff : igb
> >     f7400000-f747ffff : 0000:02:00.0
> >     f7480000-f7483fff : 0000:02:00.3
> >       f7480000-f7483fff : igb
> >     f7484000-f7487fff : 0000:02:00.2
> >       f7484000-f7487fff : igb
> >     f7488000-f748bfff : 0000:02:00.1
> >       f7488000-f748bfff : igb
> >     f748c000-f748ffff : 0000:02:00.0
> >       f748c000-f748ffff : igb
> >     f7490000-f74affff : 0000:02:00.0
> >     f74b0000-f74cffff : 0000:02:00.0
> >     f74d0000-f74effff : 0000:02:00.1
> > 
> > But when allowing the kernel to allocate BARs properly, the map is
> > full:
> > 
> >   c8200000-c87fffff : PCI Bus 0000:02
> >     c8200000-c82fffff : 0000:02:00.0
> >       c8200000-c82fffff : igb
> >     c8300000-c83fffff : 0000:02:00.1
> >       c8300000-c83fffff : igb
> >     c8400000-c84fffff : 0000:02:00.2
> >       c8400000-c84fffff : igb
> >     c8500000-c85fffff : 0000:02:00.3
> >       c8500000-c85fffff : igb
> >     c8600000-c867ffff : 0000:02:00.0
> >     c8680000-c8683fff : 0000:02:00.0
> >       c8680000-c8683fff : igb
> >     c8684000-c86a3fff : 0000:02:00.0
> >     c86a4000-c86c3fff : 0000:02:00.0
> >     c86c4000-c86c7fff : 0000:02:00.1
> >       c86c4000-c86c7fff : igb
> >     c86c8000-c86e7fff : 0000:02:00.1
> >     c86e8000-c8707fff : 0000:02:00.1
> >     c8708000-c870bfff : 0000:02:00.2
> >       c8708000-c870bfff : igb
> >     c870c000-c872bfff : 0000:02:00.2
> >     c872c000-c874bfff : 0000:02:00.2
> >     c874c000-c874ffff : 0000:02:00.3
> >       c874c000-c874ffff : igb
> >     c8750000-c876ffff : 0000:02:00.3
> >     c8770000-c878ffff : 0000:02:00.3
> > 
> > In this particular case the "repaired" BARs are not vital and are
> > not
> > used by the igb driver, but in general such behavior of BIOS can
> > lead
> > to a non-working setup.
> > 
> > So ignoring pre-set BARs and bridge windows may be useful on its
> > own,
> > but it is also provides a working starting point required by this
> > patchset, otherwise it will need to track such BARs impossible to
> > assign, and don't try to assign them during a next PCI rescan.
> > 
> > The reason I've tied this feature to the "movable BARs" flag is
> > that I
> > know at least one exception demanding a workaround - VGA. So I
> > wanted
> > to provide a flag to disable it in case of other unforeseen
> > consequences, and the only feature depends on this - is movable
> > BARs.
> 
> If we need exceptions for broken or legacy devices, we should check
> for those explicitly.
> 
> If we fail to assign some BARs at boot, I think it's reasonable to
> try
> to reassign things before drivers claim the devices.  But support for
> that should be in a reassignment path, not in the general enumeration
> path.  And, since drivers aren't involved yet, it probably doesn't
> even depend on pci_can_move_bars.
> 

Thanks for the advice, it worked! Now I'm validating in the end of
__init pci_assign_unassigned_resources() that every device got desired
BARs, otherwise call to pci_rescan_bus(), which is allowed to move BARs
- indeed not yet captured by drivers at this stage. This will be
submitted in v8.

Serge

> > The [07/26] "PCI: hotplug: Don't allow hot-added devices to steal
> > resources" patch introduces an additional step in BAR assignment:
> >  - try to assign every existing BAR + BARs of the hot-added device;
> >  - it if fails, disable BARs for the hot-added device and retry
> > without
> >    them.
> > 
> > A possible way to work-around non-working BARs could be adding more
> > steps:
> >  - first try to assign every existing BAR + BARs not worked
> > previously
> >    + "hot-added" BARs;
> >  - if it fails, retry without those BARs which weren't working
> > before;
> >  - if it still fails, retry without "hot-added" BARs.
> > 
> > Best regards,
> > Serge
> > 
> > > >  	if (dev->transparent) {
> > > >  		pci_bus_for_each_resource(child->parent, res,
> > > > i) {
> > > > @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge
> > > > *bridge)
> > > >  		pci_bus_claim_resources(bus);
> > > >  	} else {
> > > >  		pci_bus_size_bridges(bus);
> > > > +		if (pci_can_move_bars)
> > > > +			pci_bus_update_realloc_range(bus);
> > > >  		pci_bus_assign_resources(bus);
> > > >  
> > > >  		list_for_each_entry(child, &bus->children,
> > > > node)
> > > > -- 
> > > > 2.24.1
> > > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM
  2020-02-12 12:16             ` Sergei Miroshnichenko
@ 2020-02-12 14:13               ` Bjorn Helgaas
  0 siblings, 0 replies; 48+ messages in thread
From: Bjorn Helgaas @ 2020-02-12 14:13 UTC (permalink / raw)
  To: Sergei Miroshnichenko; +Cc: linux-pci, linux, sr

On Wed, Feb 12, 2020 at 12:16:55PM +0000, Sergei Miroshnichenko wrote:
> On Wed, 2020-02-05 at 10:32 -0600, Bjorn Helgaas wrote:
> > On Wed, Feb 05, 2020 at 01:04:06PM +0000, Sergei Miroshnichenko
> > wrote:
> > > On Fri, 2020-01-31 at 14:27 -0600, Bjorn Helgaas wrote:
> > > > On Fri, Jan 31, 2020 at 06:19:48PM +0000, Sergei Miroshnichenko
> > > > wrote:
> > > > > On Thu, 2020-01-30 at 17:52 -0600, Bjorn Helgaas wrote:
> > > > > > On Wed, Jan 29, 2020 at 06:29:27PM +0300, Sergei
> > > > > > Miroshnichenko
> > > > > > wrote:
> > > > > > > BARs and bridge windows are only allowed to be assigned to
> > > > > > > their parent bus's bridge windows, going up to the root
> > > > > > > complex's resources.  So additional limitations on BAR
> > > > > > > address are not needed, and the PCIBIOS_MIN_MEM can be
> > > > > > > ignored.
> > > > > > 
> > > > > > This is theoretically true, but I don't think we have
> > > > > > reliable
> > > > > > information about the host bridge windows in all cases, so
> > > > > > PCIBIOS_MIN_MEM/_IO is something of an approximation.
> > > > > > 
> > > > > > > Besides, the value of PCIBIOS_MIN_MEM reported by the BIOS
> > > > > > > 1.3 on Supermicro H11SSL-i via e820__setup_pci_gap():
> > > > > > > 
> > > > > > >   [mem 0xebff1000-0xfe9fffff] available for PCI devices
> > > > > > > 
> > > > > > > is only suitable for a single RC out of four:
> > > > > > > 
> > > > > > >   pci_bus 0000:00: root bus resource [mem 0xec000000-
> > > > > > > 0xefffffff
> > > > > > > window]
> > > > > > >   pci_bus 0000:20: root bus resource [mem 0xeb800000-
> > > > > > > 0xebefffff
> > > > > > > window]
> > > > > > >   pci_bus 0000:40: root bus resource [mem 0xeb200000-
> > > > > > > 0xeb5fffff
> > > > > > > window]
> > > > > > >   pci_bus 0000:60: root bus resource [mem 0xe8b00000-
> > > > > > > 0xeaffffff
> > > > > > > window]
> > > > > > > 
> > > > > > > , which makes the AMD EPYC 7251 unable to boot with this
> > > > > > > movable BARs patchset.
> > > > > > 
> > > > > > Something's wrong if this system booted before this patch
> > > > > > set but not after.  We shouldn't be doing *anything* with
> > > > > > the BARs until we need to, i.e., until we hot-add a device
> > > > > > where we have to move things to find space for it.
> > > > > 
> > > > > The one breaking boot on this system initially was 17/26 of
> > > > > this patchset: "PCI: hotplug: Ignore the MEM BAR offsets
> > > > > from BIOS/bootloader"
> > > > 
> > > > I don't think that patch is a good idea.  I think we should
> > > > read the current BARs and windows at boot-time and leave them
> > > > alone unless we *must* change them.  I don't think we should
> > > > change things preemptively to make future hotplug events
> > > > easier.
> > > > 
> > > > > Before it the kernel just took BARs pre-assigned by BIOS. In
> > > > > the same time, the same BIOS reports 0xebff1000-0xfe9fffff
> > > > > as available for PCI devices, but the real root bridge
> > > > > windows are 0xe8b00000-0xefffffff in total (and also 64-bit
> > > > > windows) - which are also reported by the same BIOS. So the
> > > > > kernel was only able to handle the 0xec000000- 0xefffffff
> > > > > root bus.
> > > > > 
> > > > > With that patch reverted the kernel was able to boot, but
> > > > > unable to rescan - to reassign BARs actually.
> > > > > 
> > > > > > (And we don't want a bisection hole where this system can't
> > > > > > boot until this patch is applied, but I assume that's
> > > > > > obvious.)
> > > > > > 
> > > > > > > Signed-off-by: Sergei Miroshnichenko <
> > > > > > > s.miroshnichenko@yadro.com>
> > > > > > > ---
> > > > > > >  drivers/pci/setup-res.c | 5 +++--
> > > > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-
> > > > > > > res.c
> > > > > > > index a7d81816d1ea..4043aab021dd 100644
> > > > > > > --- a/drivers/pci/setup-res.c
> > > > > > > +++ b/drivers/pci/setup-res.c
> > > > > > > @@ -246,12 +246,13 @@ static int
> > > > > > > __pci_assign_resource(struct
> > > > > > > pci_bus *bus, struct pci_dev *dev,
> > > > > > >  		int resno, resource_size_t size,
> > > > > > > resource_size_t align)
> > > > > > >  {
> > > > > > >  	struct resource *res = dev->resource + resno;
> > > > > > > -	resource_size_t min;
> > > > > > > +	resource_size_t min = 0;
> > > > > > >  	int ret;
> > > > > > >  	resource_size_t start = (resource_size_t)-1;
> > > > > > >  	resource_size_t end = 0;
> > > > > > >  
> > > > > > > -	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO :
> > > > > > > PCIBIOS_MIN_MEM;
> > > > > > > +	if (!pci_can_move_bars)
> > > > > > > +		min = (res->flags & IORESOURCE_IO) ?
> > > > > > > PCIBIOS_MIN_IO :
> > > > > > > PCIBIOS_MIN_MEM;
> > > > > > 
> > > > > > I don't understand the connection here.  PCIBIOS_MIN_MEM and
> > > > > > PCIBIOS_MIN_IO are basically ways to say "we can't put PCI
> > > > > > resources below this address".
> > > > > > 
> > > > > > On ACPI systems, the devices in the ACPI namespace are
> > > > > > supposed to tell the OS what resources they use, and
> > > > > > obviously
> > > > > > the OS should not assign those resources to anything
> > > > > > else.  If
> > > > > > Linux handled all those ACPI resources correctly and in the
> > > > > > absence of firmware defects, we shouldn't need
> > > > > > PCIBIOS_MIN_MEM/_IO at all.  But neither of those is
> > > > > > currently
> > > > > > true.
> > > > > > 
> > > > > > It's true that we should be smarter about
> > > > > > PCIBIOS_MIN_MEM/_IO,
> > > > > > but I don't think that has anything to do with whether we
> > > > > > support *moving* BARs.  We have to avoid the address space
> > > > > > that's already in use in *all* cases.
> > > > > 
> > > > > This is connected to the approach of this feature: releasing,
> > > > > recalculating and reassigning the BARs and bridge windows. If
> > > > > movable BARs are disabled, this bug doesn't reproduce. And the
> > > > > bug doesn't let the system boot when BARs are allowed to move.
> > > > > That's why I've tied these together.
> > > > 
> > > > My point is just that logically this has nothing to do with
> > > > movable BARs.
> > > > 
> > > > > This line setting the "min" to PCIBIOS_MIN_* is there untouched
> > > > > since the first kernel git commit in 2005 - could it be that
> > > > > all
> > > > > systems are just fine now, having their root bridge windows set
> > > > > up correctly?
> > > > 
> > > > I don't understand the question, sorry.
> > > 
> > > I mean, every BAR assigned here can't reside outside of a host
> > > IO/MEM bridge window, which is a bus->resource[n] set up by the
> > > platform code, and their .start fields are seemed to be duplicated
> > > by the PCIBIOS_MIN_* values - from the platform code as well. But
> > > the .start fields are seem to be correct (aren't they?), and the
> > > PCIBIOS_MIN_* values are sometimes definitely not.
> > > 
> > > What can be a reliable test to check if PCIBIOS_MIN_* are safe to
> > > ignore unconditionally? Could it be a separate flag instead of the
> > > pci_can_move_bars here?
> > > 
> > > Would it be fine for a start to ignore the PCIBIOS_MIN_* if it lies
> > > completely outside of host bridge windows? So at least AMD EPYC can
> > > obtain its hotplug power.
> > 
> > PCIBIOS_MIN_* has a long history and it touches every arch, so you'd
> > have to make sure this is safe for all of them.
> 
> Right, I should rework this change and make it x86-specific - the
> only platform where I've encountered this issue (invalid
> PCIBIOS_MIN_MEM from the e820 memory map, preventing hotplug).
> 
> > On x86, PCIBIOS_MIN_MEM is computed from the e820 memory map and
> > is basically a guess because the e820 map is only incidentally
> > related to ACPI device resource usage.  I could imagine making the
> > x86 computation smarter by looking at the PNP0A03/PNP0A08 _CRS
> > information.
> 
> Would it be acceptable to set PCIBIOS_MIN_MEM to zero for x86 in
> arch/x86/include/asm/pci.h ? I've debugged PCs in my possession, and
> I see there every ACPI_RESOURCE_TYPE_ADDRESS* and
> ACPI_RESOURCE_TYPE_FIXED_MEMORY* resource extracted from the ACPI
> are eventually represented in /proc/iomem and /proc/ioports, so the
> kernel can't put device BARs in these reserved address ranges.

I think setting PCIBIOS_MIN_MEM to zero for all of x86 would be too
hard to validate.  On x86, PCIBIOS_MIN_MEM is actually a variable
(pci_mem_start), so you could adjust that as we process host bridge
_CRS methods, e.g., if you find an aperture that starts below
pci_mem_start, update pci_mem_start to the beginning of the aperture.

> > > > > > >  	if (pci_can_move_bars && dev->subordinate && resno >=
> > > > > > > PCI_BRIDGE_RESOURCES) {
> > > > > > >  		struct pci_bus *child_bus = dev->subordinate;
> > > > > > > -- 
> > > > > > > 2.24.1
> > > > > > > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, back to index

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-29 15:29 [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 01/26] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 02/26] PCI: Enable bridge's I/O and MEM access for hotplugged devices Sergei Miroshnichenko
2020-01-30 23:12   ` Bjorn Helgaas
2020-01-29 15:29 ` [PATCH v7 03/26] PCI: hotplug: Initial support of the movable BARs feature Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 04/26] PCI: Add version of release_child_resources() aware of immovable BARs Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 05/26] PCI: hotplug: Fix reassigning the released BARs Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 06/26] PCI: hotplug: Recalculate every bridge window during rescan Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 07/26] PCI: hotplug: Don't allow hot-added devices to steal resources Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 08/26] PCI: hotplug: Try to reassign movable BARs only once Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 09/26] PCI: hotplug: Calculate immovable parts of bridge windows Sergei Miroshnichenko
2020-01-30 21:26   ` Bjorn Helgaas
2020-01-31 16:59     ` Sergei Miroshnichenko
2020-01-31 20:10       ` Bjorn Helgaas
2020-01-29 15:29 ` [PATCH v7 10/26] PCI: Include fixed and immovable BARs into the bus size calculating Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 11/26] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 12/26] PCI: Make sure bridge windows include their fixed BARs Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 13/26] PCI: hotplug: Add support of immovable BARs to pci_assign_resource() Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 14/26] PCI: hotplug: Sort immovable BARs before assignment Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 15/26] PCI: hotplug: Enable the movable BARs feature by default Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 16/26] PCI: Ignore PCIBIOS_MIN_MEM Sergei Miroshnichenko
2020-01-30 23:52   ` Bjorn Helgaas
2020-01-31 18:19     ` Sergei Miroshnichenko
2020-01-31 20:27       ` Bjorn Helgaas
2020-02-05 13:04         ` Sergei Miroshnichenko
2020-02-05 16:32           ` Bjorn Helgaas
2020-02-12 12:16             ` Sergei Miroshnichenko
2020-02-12 14:13               ` Bjorn Helgaas
2020-01-29 15:29 ` [PATCH v7 17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader Sergei Miroshnichenko
2020-01-31 20:31   ` Bjorn Helgaas
2020-02-05 11:01     ` Sergei Miroshnichenko
2020-02-05 16:42       ` Bjorn Helgaas
2020-02-12 12:29         ` Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 18/26] PCI: Treat VGA BARs as immovable Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 19/26] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 20/26] PNP: Don't reserve BARs for PCI when enabled movable BARs Sergei Miroshnichenko
2020-01-30 14:39   ` Rafael J. Wysocki
2020-01-30 21:14   ` Bjorn Helgaas
2020-01-31 15:48     ` Sergei Miroshnichenko
2020-01-31 19:39       ` Bjorn Helgaas
2020-01-29 15:29 ` [PATCH v7 21/26] PCI: hotplug: Don't disable the released bridge windows immediately Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 22/26] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 23/26] PCI: Don't claim immovable BARs Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 24/26] PCI: hotplug: Don't reserve bus space when enabled movable BARs Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 25/26] nvme-pci: Handle " Sergei Miroshnichenko
2020-01-29 15:29 ` [PATCH v7 26/26] PCI/portdrv: Declare support of " Sergei Miroshnichenko
2020-01-30 23:37 ` [PATCH v7 00/26] PCI: Allow BAR movement during boot and hotplug Bjorn Helgaas
2020-02-03  4:56   ` Oliver O'Halloran

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git