linux-pci.vger.kernel.org archive mirror
* [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug
@ 2020-04-27 18:23 Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 01/24] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
                   ` (25 more replies)
  0 siblings, 26 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

Currently PCI hotplug works on top of resources which are usually reserved
not by the kernel, but by BIOS, bootloader, firmware, etc. These resources
are gaps in the address space where BARs of new devices may fit, and extra
bus numbers per port, so bridges can be hot-added. This series addresses the
BAR problem: it teaches the kernel to redistribute BARs at runtime, so
hotplug becomes predictable and cross-platform. A follow-up patchset will
propose a solution for bus numbers.

To make room for the BARs of newly hotplugged devices, the kernel can pause
the drivers of working PCI devices and reshuffle the assigned BARs. When the
kernel resumes a driver, the driver should ioremap() the new addresses of
its BARs.

Drivers indicate their support of the feature by implementing the new hooks
.rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
doesn't yet support the feature, the BARs of its devices are considered
immovable and handled the same way as resources with the
IORESOURCE_PCI_FIXED flag: they are guaranteed to remain untouched.
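The opt-in rule above, combined with the rule from patch 03 that BARs of devices without a bound driver are movable, can be modelled in a few lines of user-space C. This is only a sketch: `struct model_driver`, `struct model_bar`, `model_noop_hook()` and `model_bar_fixed()` are hypothetical stand-ins, not kernel API, and the real pci_dev_bar_fixed() in patch 03 applies extra checks (the legacy VGA range, resource children) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two new struct pci_driver hooks. */
struct model_driver {
	void (*rescan_prepare)(void *dev);
	void (*rescan_done)(void *dev);
};

/* Hypothetical stand-in for one BAR of a device. */
struct model_bar {
	bool pci_fixed;                    /* models IORESOURCE_PCI_FIXED */
	const struct model_driver *driver; /* NULL if no driver is bound */
};

/* An "empty" hook: enough to declare support when no BAR is mapped. */
static void model_noop_hook(void *dev)
{
	(void)dev;
}

/* Returns true if the BAR must not be moved during a rescan. */
static bool model_bar_fixed(const struct model_bar *bar)
{
	if (bar->pci_fixed)
		return true;  /* explicitly pinned */
	if (bar->driver == NULL)
		return false; /* unbound device: nobody has the BAR mapped */
	/* A bound driver opts in by providing .rescan_prepare() */
	return bar->driver->rescan_prepare == NULL;
}
```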

Tested on a number of x86_64 machines without any special kernel command
line arguments:
 - PC: i7-5930K + ASUS X99-A;
 - PC: i5-8500 + ASUS Z370-F;
 - Supermicro Super Server/H11SSL-i: AMD EPYC 7251;
 - HP ProLiant DL380 G5: Xeon X5460;
 - Dell Inspiron N5010: i5 M 480;
 - Dell Precision M6600: i7-2920XM.

Also tested on Power8 (Vesnin) and Power9 (Nicole) ppc64le machines, but
with an extra patchset, the next version of which will be sent upstream a
bit later.

The first two patches of this series are independent bugfixes. Neither is
directly related to the movable BARs feature, but without them the rest of
this series will not work as expected.

Patches 03-15 implement the essentials of the feature.

Patches 16-21 are performance improvements for movable BARs and pciehp.

Patch 22 enables the feature by default.

Patches 23-24 add movable BARs support to nvme and portdrv.

This patchset is a part of our work on adding support for hotplugging
chains of chassis full of other bridges, NVME drives, SAS HBAs, GPUs, etc.
without special requirements such as Hot-Plug Controller, reservation of
bus numbers or memory regions by firmware, etc.

Added Stefan Roese and Andy Lavr to CC, thank you for trying this on your
hardware!

Added Christian König and Ard Biesheuvel to CC, because of the recent
"PCI: allow pci_resize_resource() to be used on devices on the root bus"
thread, which covers a similar problem.

Changes since v7:
 - Added some documentation;
 - Replaced every occurrence of the word "immovable" with "fixed";
 - Don't touch PNP, ACPI resources anymore;
 - Replaced double rescan with triple rescan:
   * first try every BAR;
   * if that failed, retry without BARs which weren't assigned before;
   * if that failed, retry without BARs of hotplugged devices;
 - Reassign BARs during boot only if BIOS did not assign all requested BARs;
 - Fixed up PCIBIOS_MIN_MEM instead of ignoring it;
 - Now the feature auto-disables in the presence of a transparent bridge;
 - Improved support of runtime PM;
 - Fixed issues with incorrectly released bridge windows;
 - Fixed calculating bridge window size.
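The triple rescan can be sketched as a loop over progressively smaller sets of BARs. Everything here is hypothetical: the struct, enum and function names are made up, "fits" is reduced to comparing a total size against a single window (alignment and per-bridge windows are ignored), and each stage is assumed to drop everything the previous stage dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rescan_bar {
	unsigned long size;
	bool was_assigned; /* had an address before this rescan */
	bool hotplugged;   /* belongs to a hot-added device */
};

enum {
	STAGE_ALL_BARS = 0,        /* 1st try: every BAR */
	STAGE_SKIP_UNASSIGNED = 1, /* 2nd try: drop BARs not assigned before */
	STAGE_SKIP_HOTPLUGGED = 2, /* 3rd try: also drop hot-added devices' BARs */
	STAGE_FAILED = -1,
};

static bool stage_includes(const struct rescan_bar *bar, int stage)
{
	if (stage >= STAGE_SKIP_UNASSIGNED && !bar->was_assigned)
		return false;
	if (stage >= STAGE_SKIP_HOTPLUGGED && bar->hotplugged)
		return false;
	return true;
}

/* Crude fit check: the total size of included BARs vs. one window. */
static bool stage_fits(const struct rescan_bar *bars, size_t n, int stage,
		       unsigned long window)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		if (stage_includes(&bars[i], stage))
			total += bars[i].size;
	return total <= window;
}

/* Returns the first stage whose BAR set fits, or STAGE_FAILED. */
static int triple_rescan(const struct rescan_bar *bars, size_t n,
			 unsigned long window)
{
	for (int stage = STAGE_ALL_BARS; stage <= STAGE_SKIP_HOTPLUGGED; stage++)
		if (stage_fits(bars, n, stage, window))
			return stage;
	return STAGE_FAILED;
}
```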
 
Changes since v6:
 - Added a fix for hotplug on AMD Epyc + Supermicro H11SSL-i by ignoring
   PCIBIOS_MIN_MEM;
 - Fixed a workaround which marks VGA BARs as immovable;
 - Fixed misleading "can't claim BAR ... no compatible bridge window" error
   messages;
 - Refactored the code, reduced the amount of patches;
 - Exclude PowerPC-specific arch patches, they will be sent separately;
 - Disabled for PowerNV by default - waiting for the PCIPOCALYPSE patchset.
 - Fixed reports from the kbuild test robot.

Changes since v5:
 - Simplified the disable flag, now it is "pci=no_movable_buses";
 - More deliberate marking of BARs as immovable;
 - Mark as immovable BARs which are used by unbound drivers;
 - Ignore BAR assignment by non-kernel program components, so the kernel
   is now able to distribute BARs in an optimal and predictable way;
 - Moved PowerNV-specific patches here from the older "powerpc/powernv/pci:
   Make hotplug self-sufficient, independent of FW and DT" series;
 - Fix EEH cache rebuilding and PE allocation for PowerNV during rescan.

Changes since v4:
 - Feature is enabled by default (turned on by one of the latest patches);
 - Add pci_dev_movable_bars_supported(dev) instead of marking the immovable
   BARs with the IORESOURCE_PCI_FIXED flag;
 - Set up PCIe bridges during rescan via sysfs, so MPS settings are now
   configured not only during system boot or pcihp events;
 - Allow movement of switch's BARs if claimed by portdrv;
 - Update EEH address caches after rescan for powerpc;
 - Don't completely disable hot-added devices whose BARs can't be fit; just
   disable their BARs, so they are still visible in lspci etc;
 - Clearer names: fixed_range_hard -> immovable_range, fixed_range_soft ->
   realloc_range;
 - Drop the patch for pci_restore_config_space() - fixed by properly using
   the runtime PM.

Changes since v3:
 - Rebased to the upstream, so the patches apply cleanly again.

Changes since v2:
 - Fixed double-assignment of bridge windows;
 - Fixed assignment of fixed prefetched resources;
 - Fixed releasing of fixed resources;
 - Fixed a debug message;
 - Removed auto-enabling the movable BARs for x86 - let's rely on the
   "pcie_movable_bars=force" option for now;
 - Reordered the patches - bugfixes first.

Changes since v1:
 - Add a "pcie_movable_bars={ off | force }" command line argument;
 - Handle the IORESOURCE_PCI_FIXED flag properly;
 - Don't move BARs of devices which don't support the feature;
 - Guarantee that new hotplugged devices will not steal memory from working
   devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
   flag;
 - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
   using the reset_prepare()+reset_done() from struct pci_error_handlers;
 - Add a bugfix of a race condition;
 - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
 - Fix the compatibility of the feature with pm_runtime and D3-state;
 - Hotplug events from pciehp also can move BARs;
 - Add support of the feature to the NVME driver.

Sergei Miroshnichenko (24):
  PCI: Fix race condition in pci_enable/disable_device()
  PCI: Ensure a bridge has I/O and MEM access for hot-added devices
  PCI: hotplug: Initial support of the movable BARs feature
  PCI: Add version of release_child_resources() aware of fixed BARs
  PCI: hotplug: Fix reassigning the released BARs
  PCI: hotplug: Recalculate every bridge window during rescan
  PCI: hotplug: Don't allow hot-added devices to steal resources
  PCI: Reassign BARs if BIOS/bootloader had assigned not all of them
  PCI: hotplug: Try to reassign movable BARs only once
  PCI: hotplug: Calculate fixed parts of bridge windows
  PCI: Include fixed BARs into the bus size calculating
  PCI: hotplug: movable BARs: Compute limits for relocated bridge
    windows
  PCI: Make sure bridge windows include their fixed BARs
  PCI: hotplug: Add support of fixed BARs to pci_assign_resource()
  PCI: hotplug: Sort fixed BARs before assignment
  x86/PCI/ACPI: Fix up PCIBIOS_MIN_MEM if value computed from e820 is
    invalid
  PCI: hotplug: Configure MPS after manual bus rescan
  PCI: hotplug: Don't disable the released bridge windows immediately
  PCI: pciehp: Trigger a domain rescan on hp events when enabled movable
    BARs
  PCI: Don't claim fixed BARs
  PCI: hotplug: Don't reserve bus space when enabled movable BARs
  PCI: hotplug: Enable the movable BARs feature by default
  PCI/portdrv: Declare support of movable BARs
  nvme-pci: Handle movable BARs

 Documentation/PCI/pci.rst                     |  55 +++
 .../admin-guide/kernel-parameters.txt         |   1 +
 arch/powerpc/platforms/powernv/pci.c          |   2 +
 arch/powerpc/platforms/pseries/setup.c        |   2 +
 arch/x86/pci/acpi.c                           |  15 +
 drivers/nvme/host/pci.c                       |  21 +-
 drivers/pci/bus.c                             |   2 +-
 drivers/pci/hotplug/pciehp_pci.c              |   5 +
 drivers/pci/iov.c                             |   2 +
 drivers/pci/pci.c                             |  33 +-
 drivers/pci/pci.h                             |  33 ++
 drivers/pci/pcie/portdrv_pci.c                |  11 +
 drivers/pci/probe.c                           | 399 +++++++++++++++++-
 drivers/pci/setup-bus.c                       | 301 ++++++++++---
 drivers/pci/setup-res.c                       |  75 +++-
 include/linux/pci.h                           |  20 +
 16 files changed, 905 insertions(+), 72 deletions(-)


base-commit: 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c
-- 
2.24.1



* [PATCH v8 01/24] PCI: Fix race condition in pci_enable/disable_device()
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 02/24] PCI: Ensure a bridge has I/O and MEM access for hot-added devices Sergei Miroshnichenko
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko, Srinath Mannam, Marta Rybczynska

This is yet another approach to fixing an old [1-2] concurrency issue that
occurs when:
 - two or more devices are being hot-added into a bridge which was
   initially empty;
 - a bridge with two or more devices is being hot-added;
 - the system boots and BIOS/bootloader/firmware doesn't pre-enable bridges.

The problem is that a bridge is reported as enabled before the MEM/IO bits
are actually written to the PCI_COMMAND register, so another driver thread
starts memory requests through the not-yet-enabled bridge:

 CPU0                                        CPU1

 pci_enable_device_mem()                     pci_enable_device_mem()
   pci_enable_bridge()                         pci_enable_bridge()
     pci_is_enabled()
       return false;
     atomic_inc_return(enable_cnt)
     Start actual enabling the bridge
     ...                                         pci_is_enabled()
     ...                                           return true;
     ...                                     Start memory requests <-- FAIL
     ...
     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this

Protect pci_enable/disable_device() and pci_enable_bridge(), similarly to
the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race while
enabling upstream bridges"), but with a per-device mutex, and prevent
dev->enable_cnt from being incremented early.
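The ordering the fix relies on can be illustrated with a user-space model in which a pthread mutex plays the role of dev->enable_mutex. The names (`struct model_dev`, `model_enable()`, `model_driver_probe()`) are hypothetical. The point is that the "is it enabled?" check and the "PCI_COMMAND write" both happen under the mutex, and the counter that makes the bridge look enabled is bumped only after the write, so a second caller either waits or sees a fully enabled bridge.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge's enable state. */
struct model_dev {
	pthread_mutex_t enable_mutex; /* models dev->enable_mutex */
	atomic_int enable_cnt;        /* > 0 means "reported as enabled" */
	bool mem_decoding;            /* models PCI_COMMAND_MEMORY */
};

static bool model_is_enabled(struct model_dev *dev)
{
	return atomic_load(&dev->enable_cnt) > 0;
}

static void model_enable(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->enable_mutex);
	if (!model_is_enabled(dev)) {
		dev->mem_decoding = true;              /* "write PCI_COMMAND" */
		atomic_fetch_add(&dev->enable_cnt, 1); /* publish only afterwards */
	}
	pthread_mutex_unlock(&dev->enable_mutex);
}

/* What each driver thread does: enable the upstream bridge, then use it. */
static void *model_driver_probe(void *arg)
{
	struct model_dev *bridge = arg;

	model_enable(bridge);
	/* Start "memory requests" only if the bridge really decodes MEM. */
	return bridge->mem_decoding ? arg : NULL;
}
```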

CC: Srinath Mannam <srinath.mannam@broadcom.com>
CC: Marta Rybczynska <mrybczyn@kalray.eu>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>

[1] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge

[2] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
    [RFC PATCH] nvme: avoid race-conditions when enabling devices
---
 drivers/pci/pci.c   | 27 +++++++++++++++++++++++----
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  1 +
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 595fcf59843f..d7819be809a3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1733,6 +1733,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	struct pci_dev *bridge;
 	int retval;
 
+	mutex_lock(&dev->enable_mutex);
+
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
 		pci_enable_bridge(bridge);
@@ -1740,6 +1742,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
 	if (pci_is_enabled(dev)) {
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
+		mutex_unlock(&dev->enable_mutex);
 		return;
 	}
 
@@ -1748,11 +1751,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_err(dev, "Error enabling bridge (%d), continuing\n",
 			retval);
 	pci_set_master(dev);
+	mutex_unlock(&dev->enable_mutex);
 }
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
+	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
+	bool need_lock = !dev->subordinate;
 	int err;
 	int i, bars = 0;
 
@@ -1768,8 +1774,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
 	}
 
-	if (atomic_inc_return(&dev->enable_cnt) > 1)
+	if (need_lock)
+		mutex_lock(&dev->enable_mutex);
+	if (pci_is_enabled(dev)) {
+		if (need_lock)
+			mutex_unlock(&dev->enable_mutex);
 		return 0;		/* already enabled */
+	}
 
 	bridge = pci_upstream_bridge(dev);
 	if (bridge)
@@ -1784,8 +1795,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 			bars |= (1 << i);
 
 	err = do_pci_enable_device(dev, bars);
-	if (err < 0)
-		atomic_dec(&dev->enable_cnt);
+	if (err >= 0)
+		atomic_inc(&dev->enable_cnt);
+	if (need_lock)
+		mutex_unlock(&dev->enable_mutex);
 	return err;
 }
 
@@ -2029,15 +2042,21 @@ void pci_disable_device(struct pci_dev *dev)
 	if (dr)
 		dr->enabled = 0;
 
+	mutex_lock(&dev->enable_mutex);
+
 	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
 		      "disabling already-disabled device");
 
-	if (atomic_dec_return(&dev->enable_cnt) != 0)
+	if (atomic_dec_return(&dev->enable_cnt) != 0) {
+		mutex_unlock(&dev->enable_mutex);
 		return;
+	}
 
 	do_pci_disable_device(dev);
 
 	dev->is_busmaster = 0;
+
+	mutex_unlock(&dev->enable_mutex);
 }
 EXPORT_SYMBOL(pci_disable_device);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 77b8a145c39b..dd169e2935df 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2210,6 +2210,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
 	INIT_LIST_HEAD(&dev->bus_list);
 	dev->dev.type = &pci_dev_type;
 	dev->bus = pci_bus_get(bus);
+	mutex_init(&dev->enable_mutex);
 
 	return dev;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 83ce1cdf5676..d3c9fc20fc1f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -431,6 +431,7 @@ struct pci_dev {
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
+	struct mutex	enable_mutex;
 
 	u32		saved_config_space[16]; /* Config space saved at suspend time */
 	struct hlist_head saved_cap_space;
-- 
2.24.1



* [PATCH v8 02/24] PCI: Ensure a bridge has I/O and MEM access for hot-added devices
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 01/24] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-29  6:30   ` kbuild test robot
  2020-04-27 18:23 ` [PATCH v8 03/24] PCI: hotplug: Initial support of the movable BARs feature Sergei Miroshnichenko
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When a device is hot-added, its bridge may already be pci_is_enabled() but
lack the I/O and/or MEM access bits required by the new device. These bits
are set only when the bridge is first enabled, so they must be checked
again.

When hot-adding to the following bridge:

  +-[0020:00]---00.0-[01-0d]----00.0-[02-0d]----04.0-[03-0d]--   <-   00.0

this patch sets up the MEM bit in the downstream port 0020:02:04.0, needed
for 0020:08:00.0:

  [ 1037.698206] pci 0020:00:00.0: PCI bridge to [bus 01-0d]
  [ 1037.698785] pci 0020:00:00.0:   bridge window [mem 0x3fe800000000-0x3fe8017fffff]
  [ 1037.698874] pci 0020:00:00.0:   bridge window [mem 0x240000000000-0x2400ffffffff 64bit pref]
  [ 1037.699002] pcieport 0020:02:04.0: enabling device (0545 -> 0547)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [ 1037.699114] pcieport 0020:03:00.0: enabling device (0540 -> 0542)
  [ 1037.699198] pciehp 0020:04:09.0:pcie204: Slot #41 AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+
  [ 1037.699285] pciehp 0020:04:09.0:pcie204: Slot(41): Card present
  [ 1037.699346] pciehp 0020:04:09.0:pcie204: Slot(41): Already enabled
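A minimal sketch of the idea, using the register values from the log above: bit 0 of PCI_COMMAND enables I/O space decoding and bit 1 enables memory space decoding, so 0x0545 becomes 0x0547 once the MEM bit is added. The helper and macro names are hypothetical; in the patch itself the work is done by calling pci_reenable_device() on the already-enabled bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PCI_COMMAND_IO     0x1 /* I/O space decoding */
#define MODEL_PCI_COMMAND_MEMORY 0x2 /* memory space decoding */

/*
 * An already "enabled" bridge may still lack the decode bits a new child
 * needs, so OR in the missing bits instead of returning early.
 */
static uint16_t model_reenable_bridge(uint16_t cmd, bool child_needs_io,
				      bool child_needs_mem)
{
	if (child_needs_io)
		cmd |= MODEL_PCI_COMMAND_IO;
	if (child_needs_mem)
		cmd |= MODEL_PCI_COMMAND_MEMORY;
	return cmd;
}
```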

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d7819be809a3..2c0ae81d260d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1740,6 +1740,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
 		pci_enable_bridge(bridge);
 
 	if (pci_is_enabled(dev)) {
+		pci_reenable_device(dev);
+
 		if (!dev->is_busmaster)
 			pci_set_master(dev);
 		mutex_unlock(&dev->enable_mutex);
-- 
2.24.1



* [PATCH v8 03/24] PCI: hotplug: Initial support of the movable BARs feature
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 01/24] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 02/24] PCI: Ensure a bridge has I/O and MEM access for hot-added devices Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 04/24] PCI: Add version of release_child_resources() aware of fixed BARs Sergei Miroshnichenko
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When hot-adding a device, the involved bridge may have windows that are not
big enough (or too fragmented) for the newly requested BARs to fit. But
expanding these bridge windows may be impossible if they are wedged between
"neighboring" BARs.

Still, it may be possible to allocate a memory region for new BARs, if at
least some working BARs can be moved, using the following procedure:

1) Notify all the drivers which support movable BARs to pause and release
   the BARs; the rest of the drivers are guaranteed that their devices will
   not get BARs moved;

2) Release all the bridge windows and movable BARs;

3) Try to recalculate new bridge windows that will fit all the BAR types:
   - fixed (those marked with PCI_FIXED or bound by not-updated drivers);
   - movable;
   - newly requested by hot-added devices;

4) If that failed, disable BARs for one of the hot-added devices and repeat
   step 3;

5) Notify the drivers, so they remap BARs and resume.
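Steps 1-5, including the fallback of disabling hot-added devices one by one, can be modelled as follows. This is a hypothetical sketch: window space is reduced to a single byte count, driver notification is only indicated in comments, and the device dropped in step 4 is simply the last one in the array, which is an assumption since the text only says "one of the hot-added devices".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hot-added device asking for `need` bytes of BAR space. */
struct hp_dev {
	unsigned long need;
	bool enabled; /* cleared when step 4 disables its BARs */
};

/* Step 3 stand-in: do fixed + movable + still-enabled new BARs fit? */
static bool recalc_windows(unsigned long window, unsigned long fixed,
			   unsigned long movable, const struct hp_dev *hp,
			   size_t n)
{
	unsigned long total = fixed + movable;

	for (size_t i = 0; i < n; i++)
		if (hp[i].enabled)
			total += hp[i].need;
	return total <= window;
}

/* Returns how many hot-added devices keep their BARs. */
static size_t move_bars(unsigned long window, unsigned long fixed,
			unsigned long movable, struct hp_dev *hp, size_t n)
{
	size_t kept = n;

	/* 1) drivers with .rescan_prepare() pause and unmap their BARs */
	/* 2) bridge windows and movable BARs are released */
	while (!recalc_windows(window, fixed, movable, hp, n) && kept > 0)
		hp[--kept].enabled = false; /* 4) drop one device, retry 3) */
	/* With kept == 0 the pre-hotplug set remains, which fit before. */
	/* 5) .rescan_done(): drivers remap the new addresses and resume */
	return kept;
}
```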

If bridge window calculation and BAR assignment fail with the hot-added
devices, this patchset disables their BARs, falling back to the same number
and sizes of BARs as before the hotplug event. The kernel succeeded with
that set before, so the same BAR layout can be reproduced again.

This makes prior reservation of memory by BIOS/bootloader/firmware no
longer required for PCI hotplug.

Drivers indicate their support of movable BARs by implementing the new
.rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
device activity must be paused during a rescan, and iounmap()+ioremap()
must be applied to every used BAR.

If a device is not bound to a driver, its BARs are considered movable.

To increase the probability of a successful BAR reassignment, all the BARs
and bridge windows should be released during a rescan, not only those with
higher addresses.

One example where this is needed: BAR(I) is moved to free a gap for the new
BAR(II):

Before:

    ==================== parent bridge window ===============
                   ---- hotplug bridge window ----
    |   BAR(I)    |   fixed BAR   |   fixed BAR   | fixed BAR |
        ^^^^^^                 ^
                               |
                           new BAR(II)

After:

    ==================== parent bridge window =========================
     ----------- hotplug bridge window -----------
    | new BAR(II) |   fixed BAR   |   fixed BAR   | fixed BAR | BAR(I)  |
      ^^^^^^^^^^^                                               ^^^^^^

Another example is a fragmented bridge window jammed between fixed BARs:

Before:

     ===================== parent bridge window ========================
                 ---------- hotplug bridge window ----------
    | fixed BAR |   | BAR(I) |    | BAR(II) |    | BAR(III) | fixed BAR |
                      ^^^^^^   ^    ^^^^^^^        ^^^^^^^^
                               |
                           new BAR(IV)

After:

     ==================== parent bridge window =========================
                 ---------- hotplug bridge window ----------
    | fixed BAR | BAR(I) | BAR(II) | BAR(III) | new BAR(IV) | fixed BAR |
                  ^^^^^^   ^^^^^^^   ^^^^^^^^   ^^^^^^^^^^^
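The second example is essentially a placement problem: fixed BARs are obstacles inside the window, and the movable BARs are packed into the remaining gaps. Below is a toy first-fit placement, assuming power-of-two, naturally aligned BAR sizes processed largest first. It is not the algorithm in drivers/pci/setup-bus.c, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range {
	unsigned long start, size; /* [start, start + size) */
};

/* Round x up to a power-of-two alignment. */
static unsigned long align_up(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

static bool overlaps_any(unsigned long start, unsigned long size,
			 const struct range *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (start < r[i].start + r[i].size && r[i].start < start + size)
			return true;
	return false;
}

/*
 * Place n movable BARs (sorted largest first) at the lowest aligned offset
 * in a window of `win` bytes that avoids every range in `used`. `used` holds
 * the fixed BARs and must have room for nused + n entries, because each
 * placed BAR is appended and becomes an obstacle for the next one.
 */
static bool place_bars(unsigned long win, struct range *used, size_t nused,
		       const unsigned long *sizes, size_t n, unsigned long *out)
{
	for (size_t b = 0; b < n; b++) {
		unsigned long sz = sizes[b], best = win; /* win means "none" */

		/* Candidates: the window start and the end of each used range. */
		for (size_t c = 0; c <= nused; c++) {
			unsigned long end = c < nused ? used[c].start + used[c].size : 0;
			unsigned long s = align_up(end, sz);

			if (s + sz <= win && s < best && !overlaps_any(s, sz, used, nused))
				best = s;
		}
		if (best == win)
			return false;
		out[b] = best;
		used[nused++] = (struct range){ best, sz };
	}
	return true;
}
```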

This patch is a preparation for future patches with actual implementation,
and for now it just does the following:
 - declares the feature;
 - defines bool pci_can_move_bars and bool pci_dev_bar_fixed(dev, bar);
 - invokes the new .rescan_prepare() and .rescan_done() driver notifiers;
 - disables the feature for the powerpc (support will be added later, in
   another series).

The feature stays disabled in this first patch of the series until the
actual implementation is finalized by the following patches. Once enabled,
it can be turned off per-arch by setting pci_can_move_bars = false, or with
the following command line option:

    pci=no_movable_bars

The current approach doesn't support calculating windows for subtractive
decode bridges, so BAR movement is disabled if such a bridge is present.
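The conditions that switch the feature off can be summarized in a small model. It is a sketch with hypothetical names: it walks a comma-separated "pci=" string (matching "no_movable_bars" exactly, whereas pci_setup() in this patch uses a prefix strncmp()), then applies the transparent-bridge rule above and the PCI_PROBE_ONLY rule from the pci_host_probe() hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Walk "opt1,opt2,..." and clear the flag if "no_movable_bars" is present. */
static bool parse_pci_opts(const char *str, bool can_move_bars)
{
	while (str && *str) {
		const char *comma = strchr(str, ',');
		size_t len = comma ? (size_t)(comma - str) : strlen(str);

		if (len == strlen("no_movable_bars") &&
		    !strncmp(str, "no_movable_bars", len))
			can_move_bars = false;
		str = comma ? comma + 1 : NULL;
	}
	return can_move_bars;
}

/* arch_default models pci_can_move_bars before options are parsed. */
static bool movable_bars_allowed(bool arch_default, const char *pci_opts,
				 bool has_transparent_bridge, bool probe_only)
{
	bool allowed = parse_pci_opts(pci_opts, arch_default);

	if (has_transparent_bridge || probe_only)
		allowed = false;
	return allowed;
}
```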

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 Documentation/PCI/pci.rst                     | 55 +++++++++++
 .../admin-guide/kernel-parameters.txt         |  1 +
 arch/powerpc/platforms/powernv/pci.c          |  2 +
 arch/powerpc/platforms/pseries/setup.c        |  2 +
 drivers/pci/pci.c                             |  4 +
 drivers/pci/pci.h                             |  2 +
 drivers/pci/probe.c                           | 99 ++++++++++++++++++-
 include/linux/pci.h                           |  6 ++
 8 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
index 8c016d8c9862..2449f0b43c5a 100644
--- a/Documentation/PCI/pci.rst
+++ b/Documentation/PCI/pci.rst
@@ -576,3 +576,58 @@ handle the PCI master abort on all platforms if the PCI device is
 expected to not respond to a readl().  Most x86 platforms will allow
 MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
 (e.g. ~0). But many RISC platforms will crash (a.k.a."Hard Fail").
+
+
+Movable BARs
+============
+
+To increase the probability of finding a space for BARs of hot-added devices,
+the kernel requests the drivers to release used BARs, so they can be moved
+to free a gap for new BARs.
+
+This ability can be added to a driver by implementing the
+:c:type:`rescan_prepare()` and :c:type:`rescan_done()` hooks from the
+:c:type:`struct pci_driver`.
+
+Before a PCI bus rescan, the driver must pause its activity and unmap its
+BARs. Here is an example of how the NVMe driver can do this::
+
+    static struct pci_driver nvme_driver = {
+            ...
+            .rescan_prepare = nvme_rescan_prepare,
+            .rescan_done    = nvme_rescan_done,
+    };
+
+    static void nvme_rescan_prepare(struct pci_dev *pdev)
+    {
+            struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+            nvme_dev_disable(dev, false);
+            nvme_dev_unmap(dev);
+            dev->bar = NULL;
+    }
+
+After a PCI rescan, the driver must re-read new addresses of BARs, remap
+them and resume::
+
+    static void nvme_rescan_done(struct pci_dev *pdev)
+    {
+            struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+            nvme_dev_map(dev);
+            nvme_reset_ctrl_sync(&dev->ctrl);
+    }
+
+Currently there is no reliable way to determine if a driver uses BARs of
+its devices or not (their :c:type:`struct resource` don't always have a child),
+so if it doesn't explicitly support movable BARs, they are considered fixed.
+To let the PCI subsystem move unused BARs, a driver still must implement empty
+hooks::
+
+    static void pcie_portdrv_rescan_prepare(struct pci_dev *pdev)
+    {
+    }
+
+    static void pcie_portdrv_rescan_done(struct pci_dev *pdev)
+    {
+    }
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdf..0ef5635961ca 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3669,6 +3669,7 @@
 				may put more devices in an IOMMU group.
 		force_floating	[S390] Force usage of floating interrupts.
 		nomio		[S390] Do not use MIO instructions.
+		no_movable_bars	Don't allow BARs to be moved during hotplug
 
 	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
 			Management.
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 5bf818246339..20e2f4289c23 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -896,6 +896,8 @@ void __init pnv_pci_init(void)
 {
 	struct device_node *np;
 
+	pci_can_move_bars = false;
+
 	pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
 
 	/* If we don't have OPAL, eg. in sim, just skip PCI probe */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0c8421dd01ab..2edb41f55237 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -927,6 +927,8 @@ static void __init pseries_init(void)
 {
 	pr_debug(" -> pseries_init()\n");
 
+	pci_can_move_bars = false;
+
 #ifdef CONFIG_HVC_CONSOLE
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		hvc_vio_init_early();
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2c0ae81d260d..5ae07991edff 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -79,6 +79,8 @@ static void pci_dev_d3_sleep(struct pci_dev *dev)
 int pci_domains_supported = 1;
 #endif
 
+bool pci_can_move_bars;
+
 #define DEFAULT_CARDBUS_IO_SIZE		(256)
 #define DEFAULT_CARDBUS_MEM_SIZE	(64*1024*1024)
 /* pci=cbmemsize=nnM,cbiosize=nn can override this */
@@ -6553,6 +6555,8 @@ static int __init pci_setup(char *str)
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
 			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
 				disable_acs_redir_param = str + 18;
+			} else if (!strncmp(str, "no_movable_bars", 15)) {
+				pci_can_move_bars = false;
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f75867106..25e49c5b998b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -292,6 +292,8 @@ void pci_disable_bridge_window(struct pci_dev *dev);
 struct pci_bus *pci_bus_get(struct pci_bus *bus);
 void pci_bus_put(struct pci_bus *bus);
 
+bool pci_dev_bar_fixed(struct pci_dev *dev, struct resource *res);
+
 /* PCIe link information from Link Capabilities 2 */
 #define PCIE_LNKCAP2_SLS2SPEED(lnkcap2) \
 	((lnkcap2) & PCI_EXP_LNKCAP2_SLS_32_0GB ? PCIE_SPEED_32_0GT : \
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index dd169e2935df..1ac08b64ce83 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2443,6 +2443,11 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	/* Fix up broken headers */
 	pci_fixup_device(pci_fixup_header, dev);
 
+	if (dev->transparent && pci_can_move_bars) {
+		pci_info(dev, "Disable movable BARs in presence of a transparent bridge\n");
+		pci_can_move_bars = false;
+	}
+
 	pci_reassigndev_resource_alignment(dev);
 
 	dev->state_saved = false;
@@ -2977,6 +2982,7 @@ int pci_host_probe(struct pci_host_bridge *bridge)
 	 * or pci_bus_assign_resources().
 	 */
 	if (pci_has_flag(PCI_PROBE_ONLY)) {
+		pci_can_move_bars = false;
 		pci_bus_claim_resources(bus);
 	} else {
 		pci_bus_size_bridges(bus);
@@ -3169,6 +3175,81 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 	return max;
 }
 
+bool pci_dev_bar_fixed(struct pci_dev *dev, struct resource *res)
+{
+	struct pci_bus_region region;
+	int resno = res - dev->resource;
+
+	/* Bridge windows are never fixed */
+	if (resno >= PCI_BRIDGE_RESOURCES)
+		return false;
+
+	if (res->flags & IORESOURCE_PCI_FIXED)
+		return true;
+
+	if (!pci_can_move_bars)
+		return false;
+
+	if (dev->driver && dev->driver->rescan_prepare)
+		return false;
+
+	/* Workaround for the legacy VGA memory 0xa0000-0xbffff */
+	pcibios_resource_to_bus(dev->bus, &region, res);
+	if (region.start == 0xa0000)
+		return true;
+
+	if (!dev->driver && !res->child)
+		return false;
+
+	return true;
+}
+
+static void pci_bus_rescan_prepare(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (bus->self) {
+		pci_config_pm_runtime_get(bus->self);
+		pm_runtime_get_sync(&bus->self->dev);
+	}
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (child)
+			pci_bus_rescan_prepare(child);
+
+		if (dev->driver &&
+		    dev->driver->rescan_prepare)
+			dev->driver->rescan_prepare(dev);
+	}
+}
+
+static void pci_bus_rescan_done(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (bus->self && !pci_dev_is_added(bus->self))
+		return;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (dev->driver &&
+		    dev->driver->rescan_done)
+			dev->driver->rescan_done(dev);
+
+		if (child)
+			pci_bus_rescan_done(child);
+	}
+
+	if (bus->self) {
+		pci_save_state(bus->self);
+		pm_runtime_put(&bus->self->dev);
+		pci_config_pm_runtime_put(bus->self);
+	}
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3181,9 +3262,23 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
+	struct pci_bus *root = bus;
+
+	while (!pci_is_root_bus(root))
+		root = root->parent;
+
+	if (pci_can_move_bars) {
+		pci_bus_rescan_prepare(root);
+
+		max = pci_scan_child_bus(root);
+		pci_assign_unassigned_root_bus_resources(root);
+
+		pci_bus_rescan_done(root);
+	} else {
+		max = pci_scan_child_bus(bus);
+		pci_assign_unassigned_bus_resources(bus);
+	}
 
-	max = pci_scan_child_bus(bus);
-	pci_assign_unassigned_bus_resources(bus);
 	pci_bus_add_devices(bus);
 
 	return max;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d3c9fc20fc1f..adfc8dfdc87b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -831,6 +831,8 @@ struct module;
  *		e.g. drivers/net/e100.c.
  * @sriov_configure: Optional driver callback to allow configuration of
  *		number of VFs to enable via sysfs "sriov_numvfs" file.
+ * @rescan_prepare: Prepare for BAR movement - called before PCI rescan.
+ * @rescan_done: Remap BARs and restore after PCI rescan.
  * @err_handler: See Documentation/PCI/pci-error-recovery.rst
  * @groups:	Sysfs attribute groups.
  * @driver:	Driver model structure.
@@ -846,6 +848,8 @@ struct pci_driver {
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
+	void (*rescan_prepare)(struct pci_dev *dev);
+	void (*rescan_done)(struct pci_dev *dev);
 	const struct pci_error_handlers *err_handler;
 	const struct attribute_group **groups;
 	struct device_driver	driver;
@@ -1409,6 +1413,8 @@ void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 					 unsigned long type);
 
+extern bool pci_can_move_bars;
+
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 04/24] PCI: Add version of release_child_resources() aware of fixed BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (2 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 03/24] PCI: hotplug: Initial support of the movable BARs feature Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 05/24] PCI: hotplug: Fix reassigning the released BARs Sergei Miroshnichenko
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When release_child_resources() is applied to a bridge window, it resets the
.start field of child BARs to zero but leaves the STARTALIGN flag set,
putting the resource in a state that is not valid for later re-assignment.

Fixed BARs must preserve their offset and size. These are BARs marked with
the PCI_FIXED flag and BARs bound to drivers that don't support the movable
BARs feature.

Add pci_release_child_resources() to replace release_child_resources() in
handling these PCI-specific cases.
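The per-BAR release rule described above can be sketched as a small
user-space model. This is not the kernel code: struct model_bar, the
MODEL_* flag values and the is_fixed field are hypothetical stand-ins for
struct resource, the IORESOURCE_* alignment flags and pci_dev_bar_fixed(),
and the recursion into subordinate buses and the special handling of bridge
windows are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for IORESOURCE_STARTALIGN / IORESOURCE_SIZEALIGN. */
#define MODEL_STARTALIGN 0x1u
#define MODEL_SIZEALIGN  0x2u

struct model_bar {
	uint64_t start, end;    /* inclusive range, like struct resource */
	unsigned int flags;
	bool assigned;          /* models a non-NULL ->parent */
	bool is_fixed;          /* models pci_dev_bar_fixed() returning true */
};

/* Release one BAR: a fixed BAR keeps its base and size, a movable BAR
 * keeps only its size so that it can be placed again later. */
static void model_release_bar(struct model_bar *bar)
{
	uint64_t size;

	if (!bar->flags || !bar->assigned)
		return;

	size = bar->end - bar->start + 1;
	bar->assigned = false;

	/* Switch to size-based alignment so the BAR stays re-assignable. */
	bar->flags &= ~MODEL_STARTALIGN;
	bar->flags |= MODEL_SIZEALIGN;

	if (bar->is_fixed)
		return;         /* keep flags, base and size */

	bar->start = 0;
	bar->end = size - 1;
}
```

A released movable BAR at 0x1000-0x1fff becomes 0x0-0xfff with
SIZEALIGN set, while a fixed BAR keeps its original range.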

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 53 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index bbcef1a053ab..1370e798db30 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1510,6 +1510,56 @@ static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
 	 IORESOURCE_MEM_64)
 
+/*
+ * Similar to generic release_child_resources(), but aware of fixed BARs and the
+ * STARTALIGN flag.
+ */
+static void pci_release_child_resources(struct pci_bus *bus, struct resource *r)
+{
+	struct pci_dev *dev;
+
+	if (!pci_can_move_bars)
+		return release_child_resources(r);
+
+	if (!bus || !r)
+		return;
+
+	r->child = NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+
+		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+			struct resource *tmp = &dev->resource[i];
+			resource_size_t size = resource_size(tmp);
+
+			if (!tmp->flags || !tmp->parent)
+				continue;
+
+			tmp->parent = NULL;
+			tmp->sibling = NULL;
+
+			pci_release_child_resources(dev->subordinate, tmp);
+
+			tmp->flags &= ~IORESOURCE_STARTALIGN;
+			tmp->flags |= IORESOURCE_SIZEALIGN;
+
+			if (pci_dev_bar_fixed(dev, tmp)) {
+				pci_dbg(dev, "release fixed BAR %d %pR (%s), keep its flags, base and size\n",
+					i, tmp, tmp->name);
+				continue;
+			}
+
+			pci_dbg(dev, "release BAR %d %pR (%s)\n", i, tmp, tmp->name);
+
+			if (i >= PCI_BRIDGE_RESOURCES)
+				tmp->flags = 0;
+			tmp->start = 0;
+			tmp->end = size - 1;
+		}
+	}
+}
+
 static void pci_bridge_release_resources(struct pci_bus *bus,
 					 unsigned long type)
 {
@@ -1550,7 +1600,8 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		return;
 
 	/* If there are children, release them all */
-	release_child_resources(r);
+	pci_release_child_resources(bus, r);
+
 	if (!release_resource(r)) {
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
-- 
2.24.1


* [PATCH v8 05/24] PCI: hotplug: Fix reassigning the released BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (3 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 04/24] PCI: Add version of release_child_resources() aware of fixed BARs Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 06/24] PCI: hotplug: Recalculate every bridge window during rescan Sergei Miroshnichenko
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

Bridge windows are temporarily released during a PCI rescan, and their old
size is no longer relevant: pbus_size_*() recalculates it from scratch. For
these functions to work correctly, reset both .start and .end of released
windows to zero instead of keeping the old size.

If BAR assignment fails after a PCI hotplug event, the kernel retries with a
reduced fall-back configuration, so don't apply reset_resource() to
non-window BARs: this keeps them valid for the next attempt.
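Both rules of this patch can be sketched in isolation. This is a simplified
user-space model, not the kernel code: struct model_res and the model_*
functions are hypothetical, and MODEL_BRIDGE_RESOURCES is only a placeholder
for PCI_BRIDGE_RESOURCES, whose real value depends on the kernel
configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder index of the first bridge window (PCI_BRIDGE_RESOURCES). */
#define MODEL_BRIDGE_RESOURCES 7

struct model_res {
	uint64_t start, end;    /* inclusive range */
	unsigned long flags;
};

/* Releasing a bridge window: with movable BARs the old size is dropped,
 * because the window is recalculated from scratch; otherwise the old
 * size (end - start + 1) is kept. */
static void model_release_window(struct model_res *r, bool can_move_bars)
{
	r->end = can_move_bars ? 0 : (r->end - r->start);
	r->start = 0;
	r->flags = 0;
}

/* After a failed assignment: with movable BARs only bridge windows are
 * reset, so device BARs remain valid for the fall-back retry. */
static bool model_should_reset(int idx, bool can_move_bars)
{
	return !can_move_bars || idx >= MODEL_BRIDGE_RESOURCES;
}
```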

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1370e798db30..ffa81949a75f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -295,7 +295,9 @@ static void assign_requested_resources_sorted(struct list_head *head,
 						    0 /* don't care */,
 						    0 /* don't care */);
 			}
-			reset_resource(res);
+			if (!pci_can_move_bars ||
+			    idx >= PCI_BRIDGE_RESOURCES)
+				reset_resource(res);
 		}
 	}
 }
@@ -1606,8 +1608,8 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		type = old_flags = r->flags & PCI_RES_TYPE_MASK;
 		pci_info(dev, "resource %d %pR released\n",
 			 PCI_BRIDGE_RESOURCES + idx, r);
-		/* Keep the old size */
-		r->end = resource_size(r) - 1;
+		/* Don't keep the old size if the bridge will be recalculated */
+		r->end = pci_can_move_bars ? 0 : (resource_size(r) - 1);
 		r->start = 0;
 		r->flags = 0;
 
-- 
2.24.1


* [PATCH v8 06/24] PCI: hotplug: Recalculate every bridge window during rescan
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (4 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 05/24] PCI: hotplug: Fix reassigning the released BARs Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 07/24] PCI: hotplug: Don't allow hot-added devices to steal resources Sergei Miroshnichenko
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When the movable BARs feature is enabled and a rescan has been requested,
release all bridge windows and recalculate them from scratch, taking into
account all kinds of BARs: fixed, movable, new.

Compared to simply trying to expand bridge windows, this also exploits the
ability to shuffle BARs within a bridge, increasing the chances of finding
memory space for the BARs of newly hot-added devices, especially if the
BIOS/bootloader/firmware reserved no gaps (or not enough of them).

The last step of writing the recalculated windows to the bridges is done by
the new pci_setup_bridges() function.
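pci_setup_bridges() walks the bus tree depth-first and programs a bus's own
bridge only after all of its child buses, i.e. in post-order. The following
user-space model only illustrates that order: struct model_tree_bus and
model_setup_bridges() are hypothetical, and the check that skips devices
not yet added is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus tree node standing in for struct pci_bus. */
struct model_tree_bus {
	int id;
	int has_bridge;                      /* models bus->self != NULL */
	struct model_tree_bus *children[4];  /* subordinate buses */
	size_t nr_children;
};

/* Recurse into child buses first, then record this bus's bridge, the same
 * order in which pci_setup_bridges() calls pci_setup_bridge(). */
static void model_setup_bridges(const struct model_tree_bus *bus,
				int *out, size_t *n)
{
	size_t i;

	assert(bus != NULL);

	for (i = 0; i < bus->nr_children; i++)
		model_setup_bridges(bus->children[i], out, n);

	if (bus->has_bridge)
		out[(*n)++] = bus->id;
}
```

For a root bus (no bridge of its own) with bus 1 below it, and buses 2 and
3 below bus 1, the recorded order is 2, 3, 1.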

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |  1 +
 drivers/pci/probe.c     | 22 ++++++++++++++++++++++
 drivers/pci/setup-bus.c |  9 +++++++++
 3 files changed, 32 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 25e49c5b998b..c7d3c022bf35 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -286,6 +286,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *realloc_head,
 				struct list_head *fail_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
+void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
 
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 1ac08b64ce83..5baad5325b16 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3250,6 +3250,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
 	}
 }
 
+static void pci_setup_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child;
+
+		if (!pci_dev_is_added(dev))
+			continue;
+
+		child = dev->subordinate;
+		if (child)
+			pci_setup_bridges(child);
+	}
+
+	if (bus->self)
+		pci_setup_bridge(bus);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3271,8 +3290,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_bus_rescan_prepare(root);
 
 		max = pci_scan_child_bus(root);
+
+		pci_bus_release_root_bridge_resources(root);
 		pci_assign_unassigned_root_bus_resources(root);
 
+		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
 	} else {
 		max = pci_scan_child_bus(bus);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ffa81949a75f..00bdbc0ea817 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1663,6 +1663,15 @@ static void pci_bus_release_bridge_resources(struct pci_bus *bus,
 		pci_bridge_release_resources(bus, type);
 }
 
+void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
+{
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus,
+					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
+					 whole_subtree);
+}
+
 static void pci_bus_dump_res(struct pci_bus *bus)
 {
 	struct resource *res;
-- 
2.24.1


* [PATCH v8 07/24] PCI: hotplug: Don't allow hot-added devices to steal resources
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (5 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 06/24] PCI: hotplug: Recalculate every bridge window during rescan Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 08/24] PCI: Reassign BARs if BIOS/bootloader had assigned not all of them Sergei Miroshnichenko
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When movable BARs are enabled, the PCI subsystem first releases all the
bridge windows and then attempts to assign resources both to previously
working devices and to newly hot-added ones, with the same priority.

If a hot-added device gets its BARs first, this may leave no space for
already working devices, which is unacceptable. If that happens, mark one of
the new devices with the newly introduced flag PCI_DEV_DISABLED_BARS (if it
is not marked yet) and retry the BAR recalculation.

In the worst case, hot-added devices get no BARs, while all the other
devices just continue working.

The algorithm is simple and doesn't retry different subsets of hot-added
devices on failure. For example, if there is not enough space for the BARs
of both hot-added devices A and B, but enough for A alone, A will be marked
with PCI_DEV_DISABLED_BARS first, and then (after the next failure) B. As a
result, A gets no BARs even though it could have. This issue is only
relevant when hot-adding two or more devices simultaneously.

Add a new res_mask bitmask to the struct pci_dev for storing the indices of
assigned BARs.
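The idea behind res_mask can be modeled with a plain bitmask: bit i is set
when BAR i is assigned, and a BAR lost during the rescan shows up as a bit
that was set before and is clear afterwards. This is a hypothetical
user-space sketch loosely following pci_dev_count_res_mask() and the
"Non-re-enabled resources" check; the model_* names and MODEL_NUM_BARS are
not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_BARS 6   /* hypothetical: standard BARs only */

struct model_bar_state {
	bool present;     /* models res->flags != 0 */
	bool assigned;    /* models res->parent && !IORESOURCE_UNSET */
};

/* Bit i is set when BAR i is present and assigned. */
static unsigned int model_res_mask(const struct model_bar_state *bars)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < MODEL_NUM_BARS; i++)
		if (bars[i].present && bars[i].assigned)
			mask |= 1u << i;
	return mask;
}

/* True if some BAR assigned before the rescan is unassigned after it. */
static bool model_lost_bars(unsigned int before, unsigned int after)
{
	return (before & ~after) != 0;
}
```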

When preparing for the next rescan, all PCI_DEV_DISABLED_BARS marks are
cleared, so the kernel can retry assigning those BARs.

Before a rescan, some working devices may have only part of their BARs
assigned, for example if the BIOS didn't allocate them. With this patch, the
kernel assigns BARs in three steps:
  - first try every BAR, even those that weren't assigned before;
  - if that fails, retry without the BARs that weren't assigned before;
  - if that fails, disable the BARs of one more hot-added device and retry.
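The three steps, together with the A/B example above, can be reproduced in a
deliberately flat user-space model: a single address window of a given
capacity, where "assignment succeeds" simply means that the requested sizes
add up to no more than the capacity. This is an assumption made for
illustration only; the real code sizes and places BARs per bridge.
struct model_dev and the model_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	unsigned long size;         /* working: BARs assigned before; new: all BARs */
	unsigned long failed_size;  /* working: BARs the BIOS left unassigned */
	bool hot_added;
	bool disabled;              /* models PCI_DEV_DISABLED_BARS */
};

/* Total space requested in the current step. */
static unsigned long model_demand(const struct model_dev *devs, size_t n,
				  bool try_failed)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!try_failed && devs[i].disabled)
			continue;
		sum += devs[i].size;
		if (try_failed)
			sum += devs[i].failed_size;
	}
	return sum;
}

/* Returns true on success; disables hot-added devices in list order. */
static bool model_reassign(struct model_dev *devs, size_t n,
			   unsigned long capacity)
{
	size_t i;

	/* Step 1: every BAR, including those that were unassigned before. */
	if (model_demand(devs, n, true) <= capacity)
		return true;

	for (;;) {
		/* Step 2: only previously assigned BARs plus enabled new devices. */
		if (model_demand(devs, n, false) <= capacity)
			return true;

		/* Step 3: disable the next hot-added device and retry. */
		for (i = 0; i < n; i++)
			if (devs[i].hot_added && !devs[i].disabled)
				break;
		if (i == n)
			return false;   /* nothing left to disable */
		devs[i].disabled = true;
	}
}
```

With a capacity of 100, a working device of size 60, A of size 30 and B of
size 50, the model disables A first and then B, although A alone (60 + 30)
would have fit, which is exactly the limitation described above.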

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       |   3 +
 drivers/pci/probe.c     | 177 +++++++++++++++++++++++++++++++++++++++-
 drivers/pci/setup-bus.c |   6 +-
 drivers/pci/setup-res.c |   2 +
 include/linux/pci.h     |   1 +
 5 files changed, 186 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c7d3c022bf35..7483a5716317 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -294,6 +294,8 @@ struct pci_bus *pci_bus_get(struct pci_bus *bus);
 void pci_bus_put(struct pci_bus *bus);
 
 bool pci_dev_bar_fixed(struct pci_dev *dev, struct resource *res);
+bool pci_dev_bar_enabled(const struct pci_dev *dev, int idx);
+bool pci_bus_check_bars_assigned(struct pci_bus *bus, bool complete_set);
 
 /* PCIe link information from Link Capabilities 2 */
 #define PCIE_LNKCAP2_SLS2SPEED(lnkcap2) \
@@ -412,6 +414,7 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DEV_DISABLED_BARS 1
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5baad5325b16..5ca6e5887326 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -35,6 +35,13 @@ static struct resource busn_resource = {
 LIST_HEAD(pci_root_buses);
 EXPORT_SYMBOL(pci_root_buses);
 
+/*
+ * This flag is used during pci_rescan_bus(), protected by pci_rescan_remove_lock:
+ * it indicates which BARs should be reassigned: every one, or only those which
+ * were assigned before the rescan.
+ */
+static bool pci_try_failed_bars = true;
+
 static LIST_HEAD(pci_domain_busn_res_list);
 
 struct pci_domain_busn_res {
@@ -43,6 +50,41 @@ struct pci_domain_busn_res {
 	int domain_nr;
 };
 
+static void pci_dev_disable_bars(struct pci_dev *dev)
+{
+	assign_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags, true);
+}
+
+static void pci_dev_enable_bars(struct pci_dev *dev)
+{
+	assign_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags, false);
+}
+
+static bool pci_dev_bars_enabled(const struct pci_dev *dev)
+{
+	if (pci_try_failed_bars)
+		return true;
+
+	return !(test_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags));
+}
+
+bool pci_dev_bar_enabled(const struct pci_dev *dev, int idx)
+{
+	if (idx >= PCI_BRIDGE_RESOURCES)
+		return true;
+
+	if (pci_try_failed_bars)
+		return true;
+
+	if (test_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags))
+		return false;
+
+	if (!pci_dev_is_added(dev))
+		return true;
+
+	return dev->res_mask & (1 << idx);
+}
+
 static struct resource *get_pci_domain_busn_res(int domain_nr)
 {
 	struct pci_domain_busn_res *r;
@@ -3204,6 +3246,24 @@ bool pci_dev_bar_fixed(struct pci_dev *dev, struct resource *res)
 	return true;
 }
 
+static unsigned int pci_dev_count_res_mask(struct pci_dev *dev)
+{
+	unsigned int res_mask = 0;
+	int i;
+
+	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->flags || !r->parent ||
+		    (r->flags & IORESOURCE_UNSET))
+			continue;
+
+		res_mask |= (1 << i);
+	}
+
+	return res_mask;
+}
+
 static void pci_bus_rescan_prepare(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3216,6 +3276,9 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		struct pci_bus *child = dev->subordinate;
 
+		dev->res_mask = pci_dev_count_res_mask(dev);
+		pci_dev_enable_bars(dev);
+
 		if (child)
 			pci_bus_rescan_prepare(child);
 
@@ -3269,6 +3332,118 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (!bus)
+		return NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child_bus = dev->subordinate;
+
+		if (child_bus) {
+			struct pci_dev *next_new_dev;
+
+			next_new_dev = pci_find_next_new_device(child_bus);
+			if (next_new_dev)
+				return next_new_dev;
+		}
+
+		if (!pci_dev_is_added(dev) && pci_dev_bars_enabled(dev))
+			return dev;
+	}
+
+	return NULL;
+}
+
+/**
+ * pci_bus_check_bars_assigned - check BARs under the bridge
+ * @bus: Parent PCI bus
+ * @complete_set: check every BAR, otherwise only those assigned before
+ *
+ * Returns true if every BAR is assigned.
+ */
+bool pci_bus_check_bars_assigned(struct pci_bus *bus, bool complete_set)
+{
+	struct pci_dev *dev;
+	bool good = true;
+
+	if (!bus)
+		return false;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		struct pci_bus *child = dev->subordinate;
+
+		if (complete_set) {
+			int i;
+
+			for (i = 0; i < PCI_BRIDGE_RESOURCES; ++i) {
+				struct resource *r = &dev->resource[i];
+
+				if (!(r->flags & IORESOURCE_UNSET))
+					continue;
+
+				pci_warn(dev, "BAR %d: requested but not assigned: %pR\n",
+					 i, r);
+				good = false;
+			}
+		} else {
+			unsigned int res_mask;
+
+			if (!pci_dev_bars_enabled(dev))
+				continue;
+
+			res_mask = pci_dev_count_res_mask(dev);
+
+			if (dev->res_mask & ~res_mask) {
+				pci_err(dev, "Non-re-enabled resources found: 0x%x -> 0x%x\n",
+					dev->res_mask, res_mask);
+				good = false;
+			}
+		}
+
+		if (child && !pci_bus_check_bars_assigned(child, complete_set))
+			good = false;
+	}
+
+	return good;
+}
+
+static void pci_reassign_root_bus_resources(struct pci_bus *root)
+{
+	do {
+		struct pci_dev *next_new_dev;
+
+		pci_assign_unassigned_root_bus_resources(root);
+
+		if (pci_bus_check_bars_assigned(root, pci_try_failed_bars))
+			break;
+
+		if (pci_try_failed_bars) {
+			dev_warn(&root->dev, "failed to assign all BARs, retry without those failed before\n");
+
+			pci_bus_release_root_bridge_resources(root);
+			pci_try_failed_bars = false;
+			continue;
+		}
+
+		next_new_dev = pci_find_next_new_device(root);
+		if (!next_new_dev) {
+			dev_err(&root->dev, "failed to reassign BARs even after ignoring all the hot-added devices, reload the kernel with pci=no_movable_bars\n");
+			break;
+		}
+
+		dev_warn(&root->dev, "failed to reassign BARs, disable the next hot-added device %s and retry\n",
+			 dev_name(&next_new_dev->dev));
+
+		pci_dev_disable_bars(next_new_dev);
+		pci_bus_release_root_bridge_resources(root);
+	} while (true);
+
+	pci_try_failed_bars = true;
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3292,7 +3467,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		max = pci_scan_child_bus(root);
 
 		pci_bus_release_root_bridge_resources(root);
-		pci_assign_unassigned_root_bus_resources(root);
+		pci_reassign_root_bus_resources(root);
 
 		pci_setup_bridges(root);
 		pci_bus_rescan_done(root);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 00bdbc0ea817..3bde8fdb9aa0 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -139,7 +139,7 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		if (r->flags & IORESOURCE_PCI_FIXED)
 			continue;
 
-		if (!(r->flags) || r->parent)
+		if (!(r->flags) || r->parent || !pci_dev_bar_enabled(dev, i))
 			continue;
 
 		r_align = pci_resource_alignment(dev, r);
@@ -897,7 +897,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 			struct resource *r = &dev->resource[i];
 			unsigned long r_size;
 
-			if (r->parent || !(r->flags & IORESOURCE_IO))
+			if (r->parent || !(r->flags & IORESOURCE_IO) ||
+			    !pci_dev_bar_enabled(dev, i))
 				continue;
 			r_size = resource_size(r);
 
@@ -1017,6 +1018,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			resource_size_t r_size;
 
 			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+			    !pci_dev_bar_enabled(dev, i) ||
 			    ((r->flags & mask) != type &&
 			     (r->flags & mask) != type2 &&
 			     (r->flags & mask) != type3))
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index d8ca40a97693..51bc69d60791 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -472,6 +472,8 @@ int pci_enable_resources(struct pci_dev *dev, int mask)
 		if ((i == PCI_ROM_RESOURCE) &&
 				(!(r->flags & IORESOURCE_ROM_ENABLE)))
 			continue;
+		if (!pci_dev_bar_enabled(dev, i))
+			continue;
 
 		if (r->flags & IORESOURCE_UNSET) {
 			pci_err(dev, "can't enable device: BAR %d %pR not assigned\n",
diff --git a/include/linux/pci.h b/include/linux/pci.h
index adfc8dfdc87b..6a0a919a3cdb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -382,6 +382,7 @@ struct pci_dev {
 	 */
 	unsigned int	irq;
 	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
+	unsigned int	res_mask;		/* Bitmask of assigned resources */
 
 	bool		match_driver;		/* Skip attaching driver */
 
-- 
2.24.1


* [PATCH v8 08/24] PCI: Reassign BARs if BIOS/bootloader had assigned not all of them
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (6 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 07/24] PCI: hotplug: Don't allow hot-added devices to steal resources Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 09/24] PCI: hotplug: Try to reassign movable BARs only once Sergei Miroshnichenko
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

Some BIOSes don't allocate all the requested BARs, leaving some of them
(for example, SR-IOV BARs) unassigned, without gaps for bridge windows to
extend.

If that happens, let the kernel use its own BAR allocation at an early init
stage, when drivers aren't yet bound to their devices and it is safe to
shuffle BARs that are not yet in use.

Not every BAR can be safely moved: some framebuffer drivers (e.g. efifb) are
not PCI drivers (unlike nouveau) and obtain BAR locations indirectly, for
example via ACPI. Until every such driver is aware of movable BARs, mark
every assigned memory BAR of a VGA-class device as fixed. This may also
keep splash screens from flickering.
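A simplified user-space model of the "is this BAR fixed?" decision is shown
below. The legacy 0xa0000 check, the VGA memory-BAR check and the "no driver
and no children means movable" rule follow the pci_dev_bar_fixed() hunks
visible in this thread; the PCI_FIXED check and the driver_movable rule are
taken from the commit text of patch 04, and their placement in this order is
an assumption. struct model_bar_info and model_bar_fixed() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CLASS_DISPLAY_VGA 0x0300   /* base class 0x03, subclass 0x00 */

struct model_bar_info {
	uint64_t bus_start;      /* models region.start (bus address) */
	bool is_io;
	bool pci_fixed;          /* models the PCI_FIXED flag */
	unsigned int dev_class;  /* 24-bit class code, as in dev->class */
	bool has_driver;
	bool driver_movable;     /* assumed: driver implements the rescan hooks */
	bool has_children;       /* models res->child != NULL */
};

static bool model_bar_fixed(const struct model_bar_info *b)
{
	if (b->pci_fixed)
		return true;

	/* Legacy VGA memory never moves. */
	if (b->bus_start == 0xa0000)
		return true;

	/* Assigned memory BARs of VGA devices may back a firmware framebuffer. */
	if (b->bus_start && !b->is_io &&
	    (b->dev_class >> 8) == MODEL_CLASS_DISPLAY_VGA)
		return true;

	/* Unused BARs are movable; BARs with claimed children are not. */
	if (!b->has_driver)
		return b->has_children;

	/* Drivers unaware of BAR movement pin their BARs. */
	return !b->driver_movable;
}
```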

If this reassignment fails, fall back to the BAR layout proposed by the
BIOS. Since those BARs are marked with IORESOURCE_UNSET during init, the new
flag pci_init_done is introduced to work around this: until it is set, a BAR
that has a parent counts as assigned even if IORESOURCE_UNSET is set.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h       | 2 ++
 drivers/pci/probe.c     | 8 +++++++-
 drivers/pci/setup-bus.c | 7 +++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 7483a5716317..2ef72741e8e5 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -297,6 +297,8 @@ bool pci_dev_bar_fixed(struct pci_dev *dev, struct resource *res);
 bool pci_dev_bar_enabled(const struct pci_dev *dev, int idx);
 bool pci_bus_check_bars_assigned(struct pci_bus *bus, bool complete_set);
 
+extern bool pci_init_done;
+
 /* PCIe link information from Link Capabilities 2 */
 #define PCIE_LNKCAP2_SLS2SPEED(lnkcap2) \
 	((lnkcap2) & PCI_EXP_LNKCAP2_SLS_32_0GB ? PCIE_SPEED_32_0GT : \
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5ca6e5887326..0c681bb767cc 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -41,6 +41,7 @@ EXPORT_SYMBOL(pci_root_buses);
  * were assigned before the rescan.
  */
 static bool pci_try_failed_bars = true;
+bool pci_init_done;
 
 static LIST_HEAD(pci_domain_busn_res_list);
 
@@ -3240,6 +3241,11 @@ bool pci_dev_bar_fixed(struct pci_dev *dev, struct resource *res)
 	if (region.start == 0xa0000)
 		return true;
 
+	if (res->start &&
+	    !(res->flags & IORESOURCE_IO) &&
+	    (dev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+		return true;
+
 	if (!dev->driver && !res->child)
 		return false;
 
@@ -3255,7 +3261,7 @@ static unsigned int pci_dev_count_res_mask(struct pci_dev *dev)
 		struct resource *r = &dev->resource[i];
 
 		if (!r->flags || !r->parent ||
-		    (r->flags & IORESOURCE_UNSET))
+		    (pci_init_done && (r->flags & IORESOURCE_UNSET)))
 			continue;
 
 		res_mask |= (1 << i);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 3bde8fdb9aa0..d265db4c746d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1909,7 +1909,14 @@ void __init pci_assign_unassigned_resources(void)
 		/* Make sure the root bridge has a companion ACPI device */
 		if (ACPI_HANDLE(root_bus->bridge))
 			acpi_ioapic_add(ACPI_HANDLE(root_bus->bridge));
+
+		if (pci_can_move_bars && !pci_bus_check_bars_assigned(root_bus, true)) {
+			dev_err(&root_bus->dev, "Not all requested BARs are assigned, triggering a rescan with movable BARs");
+			pci_rescan_bus(root_bus);
+		}
 	}
+
+	pci_init_done = true;
 }
 
 static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
-- 
2.24.1


* [PATCH v8 09/24] PCI: hotplug: Try to reassign movable BARs only once
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (7 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 08/24] PCI: Reassign BARs if BIOS/bootloader had assigned not all of them Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 10/24] PCI: hotplug: Calculate fixed parts of bridge windows Sergei Miroshnichenko
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When BAR movement is enabled, BARs and bridge windows can only be assigned
to their direct parents, so only one resource tree layout is possible. Every
retry within pci_assign_unassigned_root_bus_resources() would produce the
same tree, so a single attempt is enough.

On failure, pci_reassign_root_bus_resources() disables the BARs of one of
the hot-added devices and tries the assignment again.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index d265db4c746d..92517275fc06 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1816,6 +1816,13 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	int pci_try_num = 1;
 	enum enable_type enable_local;
 
+	if (pci_can_move_bars) {
+		__pci_bus_size_bridges(bus, NULL);
+		__pci_bus_assign_resources(bus, NULL, NULL);
+
+		goto dump;
+	}
+
 	/* Don't realloc if asked to do so */
 	enable_local = pci_realloc_detect(bus, pci_realloc_enable);
 	if (pci_realloc_enabled(enable_local)) {
-- 
2.24.1


* [PATCH v8 10/24] PCI: hotplug: Calculate fixed parts of bridge windows
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (8 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 09/24] PCI: hotplug: Try to reassign movable BARs only once Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 11/24] PCI: Include fixed BARs into the bus size calculating Sergei Miroshnichenko
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When movable BARs are enabled and a bridge contains a device with fixed
BARs, the corresponding bridge windows can't be moved arbitrarily far from
their original positions: they must still contain all the fixed BARs, like
this:

1) Window position before a bus rescan:

   | <--                    root bridge window                        --> |
   |                                                                      |
   | | <--     bridge window    --> |                                     |
   | | movable BARs | **fixed BAR** |                                     |
       ^^^^^^^^^^^^

2) Possible valid outcome after rescan and move:

   | <--                    root bridge window                        --> |
   |                                                                      |
   |                | <--     bridge window    --> |                      |
   |                | **fixed BAR** | Movable BARs |                      |
                                      ^^^^^^^^^^^^

A fixed area of a bridge window is the range that covers all the fixed BARs
of its direct children and the fixed areas of all its child bridges:

   | <--                    root bridge window                        --> |
   |                                                                      |
   |  | <--                  bridge window level 1                --> |   |
   |  | ********************** fixed area *********************       |   |
   |  |                                                               |   |
   |  | **fixed BAR** | <--      bridge window level 2     --> | BARs |   |
   |  |               | ************* fixed area ************* |      |   |
   |  |               |                                        |      |   |
   |  |               | **fixed BAR** |  BARs  | **fixed BAR** |      |   |
                                         ^^^^

To store these areas, add a .fixed_range field to struct pci_bus for every
bridge window type: IO, MEM and PREFETCH. It is filled in recursively, from
the leaves up to the root, before a rescan.
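The leaves-to-root computation of a fixed area can be sketched for a single
window type. Like pci_set_fixed_range(), the model marks an empty range
with start > end, so the first real range simply replaces it. This is a
hypothetical user-space model: struct model_bus and the model_* helpers are
not kernel code, and only one window type is tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_range { uint64_t start, end; };   /* inclusive */

/* Empty marker, as in pci_set_fixed_range(): start > end. */
static void model_range_reset(struct model_range *r)
{
	r->start = UINT64_MAX;
	r->end = 0;
}

static bool model_range_valid(const struct model_range *r)
{
	return r->start <= r->end;
}

/* Grow @acc so that it also covers @r (ignored if @r is empty). */
static void model_range_extend(struct model_range *acc,
			       const struct model_range *r)
{
	if (!model_range_valid(r))
		return;
	if (r->start < acc->start)
		acc->start = r->start;
	if (r->end > acc->end)
		acc->end = r->end;
}

struct model_bus {
	struct model_range fixed_bars[4];   /* fixed BARs of direct children */
	size_t nr_fixed_bars;
	struct model_bus *children[4];      /* child bridges' buses */
	size_t nr_children;
	struct model_range fixed_range;     /* result for this bus */
};

/* Compute child buses first, then merge their areas with own fixed BARs. */
static void model_update_fixed_range(struct model_bus *bus)
{
	size_t i;

	model_range_reset(&bus->fixed_range);
	for (i = 0; i < bus->nr_children; i++) {
		model_update_fixed_range(bus->children[i]);
		model_range_extend(&bus->fixed_range,
				   &bus->children[i]->fixed_range);
	}
	for (i = 0; i < bus->nr_fixed_bars; i++)
		model_range_extend(&bus->fixed_range, &bus->fixed_bars[i]);
}
```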

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.h   | 23 ++++++++++++
 drivers/pci/probe.c | 89 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h |  6 +++
 3 files changed, 118 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2ef72741e8e5..869398a62e5f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -297,6 +297,17 @@ bool pci_dev_bar_fixed(struct pci_dev *dev, struct resource *res);
 bool pci_dev_bar_enabled(const struct pci_dev *dev, int idx);
 bool pci_bus_check_bars_assigned(struct pci_bus *bus, bool complete_set);
 
+static inline void pci_set_fixed_range(struct resource *res)
+{
+	res->start = (resource_size_t)-1;
+	res->end = 0;
+}
+
+static inline bool pci_fixed_range_valid(struct resource *res)
+{
+	return res->start <= res->end;
+}
+
 extern bool pci_init_done;
 
 /* PCIe link information from Link Capabilities 2 */
@@ -414,6 +425,18 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 	return dev->error_state == pci_channel_io_perm_failure;
 }
 
+static inline int pci_get_bridge_resource_idx(struct resource *r)
+{
+	if (r->flags & IORESOURCE_IO)
+		return 0;
+	else if (!(r->flags & IORESOURCE_PREFETCH))
+		return 1;
+	else if (r->flags & IORESOURCE_MEM_64)
+		return 2;
+
+	return 1;
+}
+
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DEV_DISABLED_BARS 1
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0c681bb767cc..2ec3f80f2711 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -589,6 +589,7 @@ void pci_read_bridge_bases(struct pci_bus *child)
 static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 {
 	struct pci_bus *b;
+	int idx;
 
 	b = kzalloc(sizeof(*b), GFP_KERNEL);
 	if (!b)
@@ -605,6 +606,13 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 	if (parent)
 		b->domain_nr = parent->domain_nr;
 #endif
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx)
+		pci_set_fixed_range(&b->fixed_range[idx]);
+
+	b->fixed_range[0].flags = IORESOURCE_IO;
+	b->fixed_range[1].flags = IORESOURCE_MEM;
+	b->fixed_range[2].flags = IORESOURCE_MEM | IORESOURCE_PREFETCH;
+
 	return b;
 }
 
@@ -3338,6 +3346,86 @@ static void pci_setup_bridges(struct pci_bus *bus)
 		pci_setup_bridge(bus);
 }
 
+static void pci_bus_update_fixed_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int idx;
+	resource_size_t start, end;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx)
+		pci_set_fixed_range(&bus->fixed_range[idx]);
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_fixed_range(dev->subordinate);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+		struct pci_bus *child = dev->subordinate;
+
+		for (i = 0; i < PCI_BRIDGE_RESOURCES; ++i) {
+			struct resource *r = &dev->resource[i];
+			struct resource *fixed_range;
+
+			if (!r->flags || (r->flags & IORESOURCE_UNSET) ||
+			    !r->parent || !pci_dev_bar_fixed(dev, r))
+				continue;
+
+			idx = pci_get_bridge_resource_idx(r);
+			fixed_range = &bus->fixed_range[idx];
+			start = fixed_range->start;
+			end = fixed_range->end;
+
+			if (!pci_fixed_range_valid(fixed_range) ||
+			    start > r->start)
+				start = r->start;
+
+			if (end < r->end)
+				end = r->end;
+
+			if (fixed_range->start != start ||
+			    fixed_range->end != end) {
+				fixed_range->start = start;
+				fixed_range->end = end;
+				dev_dbg(&bus->dev, "Found fixed BAR %d %pR in %s, expand the fixed bridge window %d to %pR\n",
+					i, r, dev_name(&dev->dev), idx,
+					fixed_range);
+			}
+		}
+
+		if (child) {
+			for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+				struct resource *fixed_range = &bus->fixed_range[idx];
+				struct resource *child_fixed_range =
+					&child->fixed_range[idx];
+
+				if (!pci_fixed_range_valid(child_fixed_range))
+					continue;
+
+				start = fixed_range->start;
+				end = fixed_range->end;
+
+				if (!pci_fixed_range_valid(fixed_range) ||
+				    start > child_fixed_range->start)
+					start = child_fixed_range->start;
+
+				if (end < child_fixed_range->end)
+					end = child_fixed_range->end;
+
+				if (start < fixed_range->start ||
+				    end > fixed_range->end) {
+					dev_dbg(&bus->dev, "Expand the fixed bridge window %d from %s to 0x%llx-0x%llx\n",
+						idx, dev_name(&child->dev),
+						(unsigned long long)start,
+						(unsigned long long)end);
+					fixed_range->start = start;
+					fixed_range->end = end;
+				}
+			}
+		}
+	}
+}
+
 static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
@@ -3469,6 +3557,7 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 
 	if (pci_can_move_bars) {
 		pci_bus_rescan_prepare(root);
+		pci_bus_update_fixed_range(root);
 
 		max = pci_scan_child_bus(root);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 6a0a919a3cdb..b2d766ed425c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -596,6 +596,12 @@ struct pci_bus {
 	struct list_head resources;	/* Address space routed to this bus */
 	struct resource busn_res;	/* Bus numbers routed to this bus */
 
+	/*
+	 * If there are fixed resources in the bridge window, this range contains
+	 * the lowest start address and the highest end address of them.
+	 */
+	struct resource fixed_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 11/24] PCI: Include fixed BARs into the bus size calculating
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (9 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 10/24] PCI: hotplug: Calculate fixed parts of bridge windows Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 12/24] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows Sergei Miroshnichenko
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

The only difference between fixed and movable BARs is that fixed BARs keep
their size and offset after being released (the corresponding struct
resource is detached from its bridge window for a while during a bus
rescan). Therefore, fixed BARs should not be skipped in pbus_size_{mem,io}().

Bridge window size calculation uses pci_{,sriov_}resource_alignment(),
which apply only to not-yet-assigned BARs and make no sense for fixed ones.
The original alignment of a fixed BAR is lost after assignment, so return 1
in this case as a neutral value.

A window must additionally be extended if it has distant fixed BARs at its
edges, so that it spans all of them:

    | <--          bridge window          --> |
    | **fixed BAR** |         | **fixed BAR** |
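Below is a simplified model of the resulting lower bound on the window size. For illustration only, it ignores alignment, add_size, and the question of whether the movable BARs actually fit into the gaps between fixed BARs. Under those assumptions the window must be at least as large as the sum of its BARs, and also at least as large as the span from the lowest fixed BAR start to the highest fixed BAR end. window_min_size() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical model: the minimum window size is the larger of
 * (a) the total size of all BARs and (b) the span of the fixed BARs,
 * which may be far apart inside the window.
 */
static uint64_t window_min_size(const struct range *fixed, size_t nfixed,
				uint64_t movable_total)
{
	uint64_t sum = movable_total;
	uint64_t lo = UINT64_MAX, hi = 0;
	size_t i;

	for (i = 0; i < nfixed; i++) {
		sum += fixed[i].end - fixed[i].start + 1;
		if (fixed[i].start < lo)
			lo = fixed[i].start;
		if (fixed[i].end > hi)
			hi = fixed[i].end;
	}

	/* Distant fixed BARs at the edges stretch the window beyond the sum. */
	if (nfixed && hi - lo + 1 > sum)
		return hi - lo + 1;
	return sum;
}
```

Take two 4 KiB fixed BARs at 0x10000 and 0x1f000. Their sizes add up to only 8 KiB, but the window must span 64 KiB to cover both of them.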

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/iov.c       |  2 ++
 drivers/pci/pci.h       |  2 ++
 drivers/pci/setup-bus.c | 17 ++++++++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4d1f392b05f9..481cb8257a8e 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -860,6 +860,8 @@ resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
  */
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
+	if (pci_dev_bar_fixed(dev, dev->resource + resno))
+		return 1;
 	return pcibios_iov_resource_alignment(dev, resno);
 }
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 869398a62e5f..124b88398075 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -558,6 +558,8 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 	if (resno >= PCI_IOV_RESOURCES && resno <= PCI_IOV_RESOURCE_END)
 		return pci_sriov_resource_alignment(dev, resno);
 #endif
+	if (pci_dev_bar_fixed(dev, res))
+		return 1;
 	if (dev->class >> 8 == PCI_CLASS_BRIDGE_CARDBUS)
 		return pci_cardbus_resource_alignment(res);
 	return resource_alignment(res);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 92517275fc06..1e52dd71f02a 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -882,6 +882,10 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	resource_size_t children_add_size = 0;
 	resource_size_t min_align, align;
 
+	struct resource *fixed_range = &bus->fixed_range[0];
+	resource_size_t fixed_size = pci_fixed_range_valid(fixed_range) ?
+		resource_size(fixed_range) : 0;
+
 	if (!b_res)
 		return;
 
@@ -917,6 +921,9 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 		}
 	}
 
+	if (size1 < fixed_size)
+		size1 = fixed_size;
+
 	size0 = calculate_iosize(size, min_size, size1, 0, 0,
 			resource_size(b_res), min_align);
 	size1 = (!realloc_head || (realloc_head && !add_size && !children_add_size)) ? size0 :
@@ -998,6 +1005,14 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
 	resource_size_t add_align = 0;
+	int idx = pci_get_bridge_resource_idx(b_res);
+
+	struct resource *fixed_range = &bus->fixed_range[idx];
+	resource_size_t fixed_size = pci_fixed_range_valid(fixed_range) ?
+		resource_size(fixed_range) : 0;
+
+	if (min_size < fixed_size)
+		min_size = fixed_size;
 
 	if (!b_res)
 		return -ENOSPC;
@@ -1017,7 +1032,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 			struct resource *r = &dev->resource[i];
 			resource_size_t r_size;
 
-			if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+			if (r->parent ||
 			    !pci_dev_bar_enabled(dev, i) ||
 			    ((r->flags & mask) != type &&
 			     (r->flags & mask) != type2 &&
-- 
2.24.1



* [PATCH v8 12/24] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (10 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 11/24] PCI: Include fixed BARs into the bus size calculating Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 13/24] PCI: Make sure bridge windows include their fixed BARs Sergei Miroshnichenko
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

With movable BARs enabled, bridge windows are recalculated during each PCI
rescan. Some of the BARs below a bridge may be fixed: the area they occupy
is represented by the .fixed_range field in struct pci_bus.

If a bridge window is exactly as large as its fixed range, it can only be
placed at the start of that range. If the window is larger by some amount,
denoted "delta", it can start anywhere from (fixed_range.start - delta) to
(fixed_range.start), and correspondingly end anywhere from (fixed_range.end)
to (fixed_range.end + delta). This range (stored in the new .realloc_range
field of struct pci_bus) must then be checked against the fixed ranges of
neighbouring bridges to guarantee that they do not intersect.
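The bounds described above can be sketched in userspace. realloc_bounds() and avoid_neighbor() are hypothetical, illustrative names, not kernel functions. Unlike a plain subtraction, the sketch clamps the lower bound at 0 and the upper bound at UINT64_MAX. As a further assumption of this sketch, it trims the range against neighbouring fixed ranges on both sides, so the result never overlaps them.

```c
#include <stdbool.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute where a window of 'window_size' bytes may lie while still
 * covering 'fixed'. Returns false if the window is too small to hold it.
 */
static bool realloc_bounds(const struct range *fixed, uint64_t window_size,
			   struct range *out)
{
	uint64_t fixed_size = fixed->end - fixed->start + 1;
	uint64_t delta;

	if (window_size < fixed_size)
		return false;
	delta = window_size - fixed_size;

	/* Lowest start: the window's end touches fixed->end. */
	out->start = fixed->start >= delta ? fixed->start - delta : 0;
	/* Highest end: the window's start touches fixed->start. */
	out->end = UINT64_MAX - fixed->end < delta ? UINT64_MAX
						   : fixed->end + delta;
	return true;
}

/* Shrink 'out' so it does not intersect a neighbour's fixed range 'nb'. */
static void avoid_neighbor(struct range *out, const struct range *fixed,
			   const struct range *nb)
{
	if (nb->end < fixed->start && nb->end >= out->start)
		out->start = nb->end + 1;
	if (nb->start > fixed->end && nb->start <= out->end)
		out->end = nb->start - 1;
}
```

Take a fixed range of 0x5000-0x5fff and a 16 KiB window. Then delta is 0x3000, and the window may lie anywhere within 0x2000-0x8fff. A neighbour occupying 0x1000-0x2fff pushes the lower bound up to 0x3000.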

This patch only calculates the valid ranges for reallocated bridges during
a PCI rescan; the next patch makes use of these values during allocation.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c     |  4 ++
 drivers/pci/setup-bus.c | 85 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |  6 +++
 3 files changed, 95 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2ec3f80f2711..765b2883755a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -613,6 +613,10 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 	b->fixed_range[1].flags = IORESOURCE_MEM;
 	b->fixed_range[2].flags = IORESOURCE_MEM | IORESOURCE_PREFETCH;
 
+	b->realloc_range[0].flags = IORESOURCE_IO;
+	b->realloc_range[1].flags = IORESOURCE_MEM;
+	b->realloc_range[2].flags = IORESOURCE_MEM | IORESOURCE_PREFETCH;
+
 	return b;
 }
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1e52dd71f02a..a6d8bb5ed43d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1814,6 +1814,90 @@ static enum enable_type pci_realloc_detect(struct pci_bus *bus,
 }
 #endif
 
+/*
+ * Calculate the address margins where the bridge windows may be allocated to fit all
+ * the fixed BARs beneath.
+ */
+static void pci_bus_update_realloc_range(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	struct pci_bus *parent = bus->parent;
+	int idx;
+
+	if (!pci_can_move_bars)
+		return;
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->subordinate)
+			pci_bus_update_realloc_range(dev->subordinate);
+
+	if (!parent || !bus->self)
+		return;
+
+	for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		struct resource *fixed_range = &bus->fixed_range[idx];
+		struct resource *realloc_range = &bus->realloc_range[idx];
+		resource_size_t window_size = resource_size(bus->resource[idx]);
+		resource_size_t realloc_start, realloc_end;
+
+		pci_set_fixed_range(realloc_range);
+
+		/* Check if there any fixed BARs under the bridge */
+		if (!pci_fixed_range_valid(fixed_range))
+			continue;
+
+		/* The lowest possible address where the bridge window can start */
+		realloc_start = fixed_range->end - window_size + 1;
+		if (realloc_start > fixed_range->start)
+			realloc_start = fixed_range->start;
+
+		/* The highest possible address where the bridge window can end */
+		realloc_end = fixed_range->start + window_size - 1;
+		if (realloc_end < fixed_range->end)
+			realloc_end = fixed_range->end;
+
+		/*
+		 * Check that realloc range doesn't intersect with hard fixed ranges
+		 * of neighboring bridges
+		 */
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			struct pci_bus *neighbor = dev->subordinate;
+			struct resource *n_imm_range;
+			int i;
+
+			if (neighbor == bus)
+				continue;
+
+			for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; ++i) {
+				struct resource *nr = &dev->resource[i];
+
+				if (!nr->flags ||
+				    !pci_dev_bar_fixed(dev, nr))
+					continue;
+
+				if (nr->end < fixed_range->start &&
+				    nr->end > realloc_start)
+					realloc_start = nr->end;
+			}
+
+			if (!neighbor)
+				continue;
+
+			n_imm_range = &neighbor->fixed_range[idx];
+
+			if (!pci_fixed_range_valid(n_imm_range))
+				continue;
+
+			if (n_imm_range->end < fixed_range->start &&
+			    n_imm_range->end > realloc_start)
+				realloc_start = n_imm_range->end;
+		}
+
+		realloc_range->start = realloc_start;
+		realloc_range->end = realloc_end;
+	}
+}
+
 /*
  * First try will not touch PCI bridge res.
  * Second and later try will clear small leaf bridge res.
@@ -1833,6 +1917,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 
 	if (pci_can_move_bars) {
 		__pci_bus_size_bridges(bus, NULL);
+		pci_bus_update_realloc_range(bus);
 		__pci_bus_assign_resources(bus, NULL, NULL);
 
 		goto dump;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b2d766ed425c..9f34b932dac6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -602,6 +602,12 @@ struct pci_bus {
 	 */
 	struct resource fixed_range[PCI_BRIDGE_RESOURCE_NUM];
 
+	/*
+	 * Acceptable address range, where the bridge window may reside, considering its
+	 * size, so it will cover all the fixed BARs below.
+	 */
+	struct resource realloc_range[PCI_BRIDGE_RESOURCE_NUM];
+
 	struct pci_ops	*ops;		/* Configuration access functions */
 	struct msi_controller *msi;	/* MSI controller */
 	void		*sysdata;	/* Hook for sys-specific extension */
-- 
2.24.1



* [PATCH v8 13/24] PCI: Make sure bridge windows include their fixed BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (11 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 12/24] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 14/24] PCI: hotplug: Add support of fixed BARs to pci_assign_resource() Sergei Miroshnichenko
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

When choosing a start address for a bridge window, it is not enough to pick
the lowest possible address: the window must also cover every fixed BAR
beneath it. The lowest start address that satisfies this requirement is
stored in the .realloc_range field of struct pci_bus.

After allocating a bridge window, validate that it covers all of its fixed
BARs, whose span is stored in the .fixed_range field of struct pci_bus.
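Both steps can be sketched with two helpers; the names are illustrative only. place_window() models a first-fit search that starts at the higher of the requested minimum and the start of the available region, which is the comparison the bus.c hunk below switches to. window_covers_fixed() is the post-allocation check. The alignment is assumed to be a power of two, and the size is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;
};

/* Lowest aligned placement of 'size' bytes at or above max(min, avail->start). */
static bool place_window(const struct range *avail, uint64_t min,
			 uint64_t size, uint64_t align, struct range *out)
{
	uint64_t start = min > avail->start ? min : avail->start;

	start = (start + align - 1) & ~(align - 1);	/* round up */
	if (start < avail->start || start > avail->end ||
	    avail->end - start + 1 < size)
		return false;
	out->start = start;
	out->end = start + size - 1;
	return true;
}

/* After allocation: the window must contain the whole fixed area. */
static bool window_covers_fixed(const struct range *win,
				const struct range *fixed)
{
	return win->start <= fixed->start && win->end >= fixed->end;
}
```

Passing the realloc_range start as 'min' matters here. Starting from the beginning of the available region would place a 16 KiB window at 0x1000-0x4fff, which misses a fixed area at 0x5000-0x5fff.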

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/bus.c       |  2 +-
 drivers/pci/setup-res.c | 29 +++++++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 8e40b3e6da77..a1efa87e31b9 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -192,7 +192,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, struct resource *res,
 		 * this is an already-configured bridge window, its start
 		 * overrides "min".
 		 */
-		if (avail.start)
+		if (min_used < avail.start)
 			min_used = avail.start;
 
 		max = avail.end;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 51bc69d60791..494eb5a2e98c 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -248,9 +248,21 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 	struct resource *res = dev->resource + resno;
 	resource_size_t min;
 	int ret;
+	struct resource *fixed_range = NULL;
 
 	min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
+	if (pci_can_move_bars && dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
+		struct pci_bus *child_bus = dev->subordinate;
+		int win_no = resno - PCI_BRIDGE_RESOURCES;
+
+		fixed_range = &child_bus->fixed_range[win_no];
+		if (pci_fixed_range_valid(fixed_range))
+			min = child_bus->realloc_range[win_no].start;
+		else
+			fixed_range = NULL;
+	}
+
 	/*
 	 * First, try exact prefetching match.  Even if a 64-bit
 	 * prefetchable bridge window is below 4GB, we can't put a 32-bit
@@ -262,7 +274,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 				     IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
 				     pcibios_align_resource, dev);
 	if (ret == 0)
-		return 0;
+		goto check_fixed;
 
 	/*
 	 * If the prefetchable window is only 32 bits wide, we can put
@@ -274,7 +286,7 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 					     IORESOURCE_PREFETCH,
 					     pcibios_align_resource, dev);
 		if (ret == 0)
-			return 0;
+			goto check_fixed;
 	}
 
 	/*
@@ -287,6 +299,19 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
 		ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
 					     pcibios_align_resource, dev);
 
+check_fixed:
+	if (ret == 0 && fixed_range &&
+	    (res->start > fixed_range->start ||
+	     res->end < fixed_range->end)) {
+		dev_err(&bus->dev, "fixed area %pR for %s doesn't fit in the allocated %pR (0x%llx-0x%llx)",
+			fixed_range,
+			dev_name(&dev->dev),
+			res, (unsigned long long)res->start,
+			(unsigned long long)res->end);
+		release_resource(res);
+		return -1;
+	}
+
 	return ret;
 }
 
-- 
2.24.1



* [PATCH v8 14/24] PCI: hotplug: Add support of fixed BARs to pci_assign_resource()
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (12 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 13/24] PCI: Make sure bridge windows include their fixed BARs Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 15/24] PCI: hotplug: Sort fixed BARs before assignment Sergei Miroshnichenko
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

Fixed BARs must be assigned within a bridge window first, before movable
BARs and neighboring bridge windows. Currently they are assigned last, by
pdev_assign_fixed_resources().

Let pci_assign_resource() handle fixed BARs the same way it handles movable
ones, so that they are assigned in the correct order and the code is
unified.

Allow prefetchable IORESOURCE_PCI_FIXED BARs to be matched to
non-prefetchable windows, so they follow the same rules as fixed BARs
without that flag.
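The matching rule for fixed BARs can be sketched as a predicate. The RES_* flag values and struct res are hypothetical simplifications of IORESOURCE_* and struct resource; the real type mask also covers other resource types. The predicate checks three things:

- the resource type must match;
- a non-prefetchable BAR is never placed in a prefetchable window;
- because a fixed BAR cannot move, the window must already contain it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for IORESOURCE_IO/MEM/PREFETCH. */
#define RES_IO       0x1u
#define RES_MEM      0x2u
#define RES_PREFETCH 0x4u
#define RES_TYPE     (RES_IO | RES_MEM)

struct res {
	uint64_t start;
	uint64_t end;
	unsigned int flags;
};

static bool fixed_bar_fits(const struct res *win, const struct res *bar)
{
	/* IO BARs only go into IO windows, MEM BARs into MEM windows. */
	if ((win->flags & RES_TYPE) != (bar->flags & RES_TYPE))
		return false;
	/* A prefetchable BAR may use a non-prefetchable window, not vice versa. */
	if ((win->flags & RES_PREFETCH) && !(bar->flags & RES_PREFETCH))
		return false;
	/* The BAR cannot be moved, so it must already lie inside the window. */
	return win->start <= bar->start && bar->end <= win->end;
}
```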

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 43 -----------------------------------------
 drivers/pci/setup-res.c | 41 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index a6d8bb5ed43d..1f76a4dffb7d 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1344,47 +1344,6 @@ void pci_bus_size_bridges(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_size_bridges);
 
-static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
-{
-	int i;
-	struct resource *parent_r;
-	unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
-			     IORESOURCE_PREFETCH;
-
-	pci_bus_for_each_resource(b, parent_r, i) {
-		if (!parent_r)
-			continue;
-
-		if ((r->flags & mask) == (parent_r->flags & mask) &&
-		    resource_contains(parent_r, r))
-			request_resource(parent_r, r);
-	}
-}
-
-/*
- * Try to assign any resources marked as IORESOURCE_PCI_FIXED, as they are
- * skipped by pbus_assign_resources_sorted().
- */
-static void pdev_assign_fixed_resources(struct pci_dev *dev)
-{
-	int i;
-
-	for (i = 0; i <  PCI_NUM_RESOURCES; i++) {
-		struct pci_bus *b;
-		struct resource *r = &dev->resource[i];
-
-		if (r->parent || !(r->flags & IORESOURCE_PCI_FIXED) ||
-		    !(r->flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-			continue;
-
-		b = dev->bus;
-		while (b && !r->parent) {
-			assign_fixed_resource_on_bus(b, r);
-			b = b->parent;
-		}
-	}
-}
-
 void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *realloc_head,
 				struct list_head *fail_head)
@@ -1395,8 +1354,6 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 	pbus_assign_resources_sorted(bus, realloc_head, fail_head);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
-		pdev_assign_fixed_resources(dev);
-
 		b = dev->subordinate;
 		if (!b)
 			continue;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 494eb5a2e98c..a1e61e74ce00 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -331,14 +331,51 @@ static int _pci_assign_resource(struct pci_dev *dev, int resno,
 	return ret;
 }
 
+static int assign_fixed_resource_on_bus(struct pci_dev *dev, int resno)
+{
+	int i;
+	struct resource *parent_r;
+	unsigned long mask = IORESOURCE_TYPE_BITS;
+	struct resource *r = dev->resource + resno;
+
+	/*
+	 * If we have a shadow copy in RAM, the PCI device doesn't respond
+	 * to the shadow range
+	 */
+	if (r->flags & IORESOURCE_ROM_SHADOW)
+		return 0;
+
+	pci_bus_for_each_resource(dev->bus, parent_r, i) {
+		if (!parent_r)
+			continue;
+
+		if ((r->flags & mask) != (parent_r->flags & mask))
+			continue;
+
+		if (parent_r->flags & IORESOURCE_PREFETCH &&
+		    !(r->flags & IORESOURCE_PREFETCH))
+			continue;
+
+		if (resource_contains(parent_r, r)) {
+			if (!request_resource(parent_r, r)) {
+				pci_info(dev, "BAR %d: assigned fixed %pR\n", resno, r);
+				return 0;
+			}
+		}
+	}
+
+	pci_err(dev, "BAR %d: failed to assign fixed %pR\n", resno, r);
+	return -ENOSPC;
+}
+
 int pci_assign_resource(struct pci_dev *dev, int resno)
 {
 	struct resource *res = dev->resource + resno;
 	resource_size_t align, size;
 	int ret;
 
-	if (res->flags & IORESOURCE_PCI_FIXED)
-		return 0;
+	if (res->flags && pci_dev_bar_fixed(dev, res))
+		return assign_fixed_resource_on_bus(dev, resno);
 
 	res->flags |= IORESOURCE_UNSET;
 	align = pci_resource_alignment(dev, res);
-- 
2.24.1



* [PATCH v8 15/24] PCI: hotplug: Sort fixed BARs before assignment
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (13 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 14/24] PCI: hotplug: Add support of fixed BARs to pci_assign_resource() Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 16/24] x86/PCI/ACPI: Fix up PCIBIOS_MIN_MEM if value computed from e820 is invalid Sergei Miroshnichenko
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

Fixed BARs and bridge windows that contain fixed BARs must be assigned
before the movable ones.

When a fixed BAR or bridge window is assigned, its start address is chosen
to be the lowest possible. To prevent conflicts, sort such resources by the
start address of their fixed areas.

Add support for fixed BARs and bridge windows to pdev_sort_resources().
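The ordering can be illustrated with a hypothetical array-based model of how entries are inserted into the fixed list. Each new entry goes before the first entry whose fixed area starts strictly later. The list therefore stays sorted by start address, and entries with equal starts keep their arrival order. struct fixed_entry and insert_sorted() are illustrative names. The patch itself works on a struct list_head of struct pci_dev_resource.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: a fixed BAR, or a bridge window with a fixed area. */
struct fixed_entry {
	int id;
	uint64_t fixed_start;	/* start of the BAR, or of the bus fixed area */
};

/*
 * Insert 'e' before the first entry whose fixed area starts later.
 * 'list' holds 'n' entries and must have room for one more.
 * Returns the new number of entries.
 */
static size_t insert_sorted(struct fixed_entry *list, size_t n,
			    struct fixed_entry e)
{
	size_t pos = n;		/* default: append at the tail */
	size_t i;

	for (i = 0; i < n; i++) {
		if (e.fixed_start < list[i].fixed_start) {
			pos = i;
			break;
		}
	}
	for (i = n; i > pos; i--)
		list[i] = list[i - 1];	/* shift the tail to make room */
	list[pos] = e;
	return n + 1;
}
```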

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 63 +++++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1f76a4dffb7d..4cadfa1f9519 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -124,7 +124,8 @@ static resource_size_t get_res_add_align(struct list_head *head,
 
 
 /* Sort resources by alignment */
-static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
+static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head,
+				struct list_head *head_fixed)
 {
 	int i;
 
@@ -133,17 +134,27 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		struct pci_dev_resource *dev_res, *tmp;
 		resource_size_t r_align;
 		struct list_head *n;
+		struct resource *fixed_res = NULL;
 
 		r = &dev->resource[i];
 
-		if (r->flags & IORESOURCE_PCI_FIXED)
-			continue;
-
 		if (!(r->flags) || r->parent || !pci_dev_bar_enabled(dev, i))
 			continue;
 
+		if (i >= PCI_BRIDGE_RESOURCES &&
+		    dev->subordinate) {
+			int idx = i - PCI_BRIDGE_RESOURCES;
+
+			fixed_res = &dev->subordinate->fixed_range[idx];
+		} else if (pci_dev_bar_fixed(dev, r)) {
+			fixed_res = r;
+		}
+
+		if (fixed_res && !pci_fixed_range_valid(fixed_res))
+			fixed_res = NULL;
+
 		r_align = pci_resource_alignment(dev, r);
-		if (!r_align) {
+		if (!r_align && !fixed_res) {
 			pci_warn(dev, "BAR %d: %pR has bogus alignment\n",
 				 i, r);
 			continue;
@@ -155,6 +166,30 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 		tmp->res = r;
 		tmp->dev = dev;
 
+		if (fixed_res) {
+			n = head_fixed;
+			list_for_each_entry(dev_res, head_fixed, list) {
+				struct resource *c_fixed_res = NULL;
+				int c_resno = dev_res->res - dev_res->dev->resource;
+				int br_idx = c_resno - PCI_BRIDGE_RESOURCES;
+				struct pci_bus *cbus = dev_res->dev->subordinate;
+
+				if (br_idx >= 0)
+					c_fixed_res = &cbus->fixed_range[br_idx];
+				else
+					c_fixed_res = dev_res->res;
+
+				if (fixed_res->start < c_fixed_res->start) {
+					n = &dev_res->list;
+					break;
+				}
+			}
+			/* Insert it just before n */
+			list_add_tail(&tmp->list, n);
+
+			continue;
+		}
+
 		/* Fallback is smallest one or list is empty */
 		n = head;
 		list_for_each_entry(dev_res, head, list) {
@@ -173,7 +208,8 @@ static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 	}
 }
 
-static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
+static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head,
+				 struct list_head *head_fixed)
 {
 	u16 class = dev->class >> 8;
 
@@ -189,7 +225,7 @@ static void __dev_sort_resources(struct pci_dev *dev, struct list_head *head)
 			return;
 	}
 
-	pdev_sort_resources(dev, head);
+	pdev_sort_resources(dev, head, head_fixed);
 }
 
 static inline void reset_resource(struct resource *res)
@@ -484,8 +520,13 @@ static void pdev_assign_resources_sorted(struct pci_dev *dev,
 					 struct list_head *fail_head)
 {
 	LIST_HEAD(head);
+	LIST_HEAD(head_fixed);
+
+	__dev_sort_resources(dev, &head, &head_fixed);
+
+	if (!list_empty(&head_fixed))
+		__assign_resources_sorted(&head_fixed, NULL, NULL);
 
-	__dev_sort_resources(dev, &head);
 	__assign_resources_sorted(&head, add_head, fail_head);
 
 }
@@ -496,9 +537,13 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
 {
 	struct pci_dev *dev;
 	LIST_HEAD(head);
+	LIST_HEAD(head_fixed);
 
 	list_for_each_entry(dev, &bus->devices, bus_list)
-		__dev_sort_resources(dev, &head);
+		__dev_sort_resources(dev, &head, &head_fixed);
+
+	if (!list_empty(&head_fixed))
+		__assign_resources_sorted(&head_fixed, NULL, NULL);
 
 	__assign_resources_sorted(&head, realloc_head, fail_head);
 }
-- 
2.24.1



* [PATCH v8 16/24] x86/PCI/ACPI: Fix up PCIBIOS_MIN_MEM if value computed from e820 is invalid
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (14 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 15/24] PCI: hotplug: Sort fixed BARs before assignment Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 17/24] PCI: hotplug: Configure MPS after manual bus rescan Sergei Miroshnichenko
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko, Thomas Gleixner

The value of PCIBIOS_MIN_MEM reported by BIOS 1.3 on Supermicro H11SSL-i
via e820__setup_pci_gap():

  [mem 0xebff1000-0xfe9fffff] available for PCI devices

is only suitable for a single root complex out of four (0000:00):

  pci_bus 0000:00: root bus resource [mem 0xec000000-0xefffffff window]
  pci_bus 0000:20: root bus resource [mem 0xeb800000-0xebefffff window]
  pci_bus 0000:40: root bus resource [mem 0xeb200000-0xeb5fffff window]
  pci_bus 0000:60: root bus resource [mem 0xe8b00000-0xeaffffff window]

As a result, the AMD EPYC 7251 system cannot assign BARs to devices
hot-added under the other three root complexes (0000:20, 0000:40 and
0000:60).

If there are memory apertures that end below the current PCIBIOS_MIN_MEM
(which is the variable pci_mem_start on x86), lower it to the aperture's
start.
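Below is a hypothetical userspace model of the fix-up condition. struct aperture and fixup_min_mem() are illustrative names, and min_mem plays the role of pci_mem_start. The model skips three kinds of aperture: non-memory apertures, apertures that end above the current value, and the legacy VGA window at 0xa0000. The four root bus windows from the log above, fed through it in that order, lower the value from 0xebff1000 to 0xe8b00000.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a root bus aperture. */
struct aperture {
	uint64_t start;
	uint64_t end;
	int is_mem;
};

/*
 * Lower 'min_mem' to the start of each memory aperture that ends at or
 * below it, mirroring the condition used in the patch below.
 */
static uint64_t fixup_min_mem(uint64_t min_mem, const struct aperture *ap,
			      size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ap[i].is_mem || ap[i].end > min_mem ||
		    ap[i].start == 0xa0000)	/* legacy VGA window */
			continue;
		min_mem = ap[i].start;
	}
	return min_mem;
}
```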

CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 arch/x86/pci/acpi.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 948656069cdd..9eccb26d0bf3 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -299,6 +299,21 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
 	int status;
 
 	status = acpi_pci_probe_root_resources(ci);
+
+	resource_list_for_each_entry(entry, &ci->resources) {
+		struct resource *res = entry->res;
+
+		if (!(res->flags & IORESOURCE_MEM) ||
+		    res->end > pci_mem_start ||
+		    res->start == 0xa0000)
+			continue;
+
+		dev_warn(&ci->root->device->dev, "Fix up PCI start address: %lx -> %llx\n",
+			 pci_mem_start, res->start);
+
+		pci_mem_start = res->start;
+	}
+
 	if (pci_use_crs) {
 		resource_list_for_each_entry_safe(entry, tmp, &ci->resources)
 			if (resource_is_pcicfg_ioport(entry->res))
-- 
2.24.1
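The fix-up loop in the patch above can be modeled in user space and checked against the H11SSL-i windows from the commit message. This is a minimal sketch, not kernel code: `struct aperture` and `fixup_min_mem()` are hypothetical stand-ins for the `struct resource` entries and for the body of the `resource_list_for_each_entry()` loop, and the `IORESOURCE_MEM` flag test is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one root bus window (struct resource). */
struct aperture {
	uint64_t start;
	uint64_t end;		/* inclusive, like struct resource */
	bool is_mem;		/* models res->flags & IORESOURCE_MEM */
};

/*
 * Models the loop added to pci_acpi_root_prepare_resources(): a MEM
 * aperture that ends at or below the current minimum lowers the minimum
 * to its start; the legacy VGA window at 0xa0000 is ignored. The kernel
 * also prints a dev_warn() at that point, which is omitted here.
 */
static uint64_t fixup_min_mem(uint64_t min_mem,
			      const struct aperture *ap, size_t n)
{
	assert(ap != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!ap[i].is_mem || ap[i].end > min_mem ||
		    ap[i].start == 0xa0000)
			continue;
		min_mem = ap[i].start;
	}
	return min_mem;
}
```

Given the VGA window followed by the windows of 0000:00, 0000:20, 0000:40 and 0000:60 in that order, the function lowers the minimum from 0xebff1000 to 0xe8b00000, the start of the 0000:60 window, and leaves the VGA window alone. The kernel runs the loop once per root bridge rather than over one merged list; since pci_mem_start is a single global, the outcome should be the same, assuming the root bridges are probed in that order.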



* [PATCH v8 17/24] PCI: hotplug: Configure MPS after manual bus rescan
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (15 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 16/24] x86/PCI/ACPI: Fix up PCIBIOS_MIN_MEM if value computed from e820 is invalid Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 18/24] PCI: hotplug: Don't disable the released bridge windows immediately Sergei Miroshnichenko
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

Ensure that MPS (Max Payload Size) settings are configured for bridges
discovered during a manually triggered rescan via sysfs. This bridge
initialization sequence (using pci_rescan_bus()) will later be used for
pciehp hot-add events when BAR movement is enabled.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 765b2883755a..d01d93c6bfa2 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3554,7 +3554,7 @@ static void pci_reassign_root_bus_resources(struct pci_bus *root)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
 	unsigned int max;
-	struct pci_bus *root = bus;
+	struct pci_bus *root = bus, *child;
 
 	while (!pci_is_root_bus(root))
 		root = root->parent;
@@ -3575,6 +3575,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 		pci_assign_unassigned_bus_resources(bus);
 	}
 
+	list_for_each_entry(child, &root->children, node)
+		pcie_bus_configure_settings(child);
+
 	pci_bus_add_devices(bus);
 
 	return max;
-- 
2.24.1
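Some background on why MPS must be configured for newly found bridges: the PCIe Max_Payload_Size Supported field (Device Capabilities) and the Max_Payload_Size field (Device Control) both encode a value n meaning 128 << n bytes, and a receiver treats a TLP whose payload exceeds its own MPS as malformed, so devices on a path have to agree. The sketch below is a simplified model loosely following two kernel policies: "safe" (pci=pcie_bus_safe, the largest MPS every device in the hierarchy supports) and "peer2peer" (pci=pcie_bus_peer2peer, 128 bytes everywhere). `struct node` and all helper names are hypothetical; the kernel's real tuning logic has more cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node: a PCIe function and the functions below it. */
struct node {
	unsigned int mpss;	/* Max_Payload_Size Supported encoding, 0..5 */
	unsigned int mps;	/* configured Max_Payload_Size encoding */
	struct node **children;
	size_t nr_children;
};

/* Encoding n means 128 << n bytes (0 -> 128 B ... 5 -> 4096 B). */
static unsigned int mps_bytes(unsigned int enc)
{
	assert(enc <= 5);
	return 128u << enc;
}

/* Smallest MPSS encoding found in the subtree rooted at @n. */
static unsigned int min_mpss(const struct node *n)
{
	unsigned int min = n->mpss;

	for (size_t i = 0; i < n->nr_children; i++) {
		unsigned int c = min_mpss(n->children[i]);

		if (c < min)
			min = c;
	}
	return min;
}

/* Program every function in the subtree with the same encoding. */
static void set_mps(struct node *n, unsigned int enc)
{
	n->mps = enc;
	for (size_t i = 0; i < n->nr_children; i++)
		set_mps(n->children[i], enc);
}

/* "safe": the largest MPS that every function in the subtree supports. */
static void configure_safe(struct node *root)
{
	set_mps(root, min_mpss(root));
}

/* "peer2peer": 128 bytes everywhere, which every function supports. */
static void configure_peer2peer(struct node *root)
{
	set_mps(root, 0);
}
```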



* [PATCH v8 18/24] PCI: hotplug: Don't disable the released bridge windows immediately
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (16 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 17/24] PCI: hotplug: Configure MPS after manual bus rescan Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 19/24] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs Sergei Miroshnichenko
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

On a hotplug event with BAR movement enabled, calculating the new BAR
layout and bridge windows takes some time. During this procedure, the
structures representing these windows are released, i.e. marked for
recalculation.

When the new bridge window values are ready, they are written to the
bridge registers via pci_setup_bridges().

Currently, a bridge's registers are updated immediately after a window is
released, which disables the window. But a driver that doesn't yet support
movable BARs doesn't stop its MEM transactions during the hotplug, so
disabled bridge windows would break them.

Let the bridge windows keep operating after they are released, as they
will be updated to the new values at the end of the hotplug event.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 4cadfa1f9519..3081f2d2a48a 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1635,7 +1635,8 @@ static void pci_bridge_release_resources(struct pci_bus *bus,
 		/* Avoiding touch the one without PREF */
 		if (type & IORESOURCE_PREFETCH)
 			type = IORESOURCE_PREFETCH;
-		__pci_setup_bridge(bus, type);
+		if (!pci_can_move_bars)
+			__pci_setup_bridge(bus, type);
 		/* For next child res under same bridge */
 		r->flags = old_flags;
 	}
-- 
2.24.1
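The effect of this change can be illustrated with a toy model of one bridge memory window that keeps the kernel's view (the released `struct resource`) separate from what the bridge's base/limit registers actually decode. This is a hypothetical sketch, not the kernel's data structures: `struct bridge_window`, `release_window()`, `setup_window()` and `decodes()` are made-up names, and a disabled window (base programmed above limit in real hardware) is reduced to a flag. With `can_move_bars` set, releasing only clears the software copy, so addresses in the old window keep being forwarded until the new values are written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bridge memory window. */
struct bridge_window {
	uint64_t sw_start, sw_end;	/* kernel's view; 0/0 = released */
	uint64_t hw_base, hw_limit;	/* what the bridge registers decode */
	bool hw_enabled;		/* false models base > limit */
};

/* Release the window for recalculation (cf. pci_bridge_release_resources()). */
static void release_window(struct bridge_window *w, bool can_move_bars)
{
	w->sw_start = 0;
	w->sw_end = 0;
	/* Old behavior: reprogram the bridge at once, which disables decoding. */
	if (!can_move_bars)
		w->hw_enabled = false;
}

/* Write the newly computed window (cf. pci_setup_bridges() at the end). */
static void setup_window(struct bridge_window *w, uint64_t start, uint64_t end)
{
	w->sw_start = start;
	w->sw_end = end;
	w->hw_base = start;
	w->hw_limit = end;
	w->hw_enabled = true;
}

/* Would a MEM transaction to @addr be forwarded through the bridge? */
static bool decodes(const struct bridge_window *w, uint64_t addr)
{
	return w->hw_enabled && addr >= w->hw_base && addr <= w->hw_limit;
}
```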



* [PATCH v8 19/24] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (17 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 18/24] PCI: hotplug: Don't disable the released bridge windows immediately Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 20/24] PCI: Don't claim fixed BARs Sergei Miroshnichenko
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

With movable BARs, hot-adding a device is no longer local to its bridge;
it affects the whole RC, since BARs and bridge windows can be
substantially rearranged. So instead of trying to fit the new devices into
preallocated reserved gaps, initiate a full domain rescan.

pci_rescan_bus() covers all the operations of the replaced functions:
 - assigning new bus numbers, as pci_hp_add_bridge() does;
 - allocating BARs (pci_assign_unassigned_bridge_resources());
 - configuring MPS settings (pcie_bus_configure_settings());
 - binding devices to their drivers (pci_bus_add_devices()).

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/hotplug/pciehp_pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index d17f3bf36f70..6d4c1ef38210 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -58,6 +58,11 @@ int pciehp_configure_device(struct controller *ctrl)
 		goto out;
 	}
 
+	if (pci_can_move_bars) {
+		pci_rescan_bus(parent);
+		goto out;
+	}
+
 	for_each_pci_bridge(dev, parent)
 		pci_hp_add_bridge(dev);
 
-- 
2.24.1



* [PATCH v8 20/24] PCI: Don't claim fixed BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (18 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 19/24] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 21/24] PCI: hotplug: Don't reserve bus space when enabled movable BARs Sergei Miroshnichenko
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

A fixed BAR always has an address, but when pci_claim_resource() is
called, its parent bridge window may not be calculated yet (during boot)
or may be temporarily released for recalculation (during a PCI rescan).

Apart from that, fixed BARs now have a separate, guaranteed assignment
mechanism, unlike usual BARs, so claiming them is not needed.

Return immediately from pci_claim_resource() for fixed BARs to prevent
misleading "can't claim BAR ... no compatible bridge window" error
messages.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-res.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index a1e61e74ce00..98051edd7eef 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -138,6 +138,9 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
 		return -EINVAL;
 	}
 
+	if (pci_dev_bar_fixed(dev, res))
+		return 0;
+
 	/*
 	 * If we have a shadow copy in RAM, the PCI device doesn't respond
 	 * to the shadow range, so we don't need to claim it, and upstream
-- 
2.24.1



* [PATCH v8 21/24] PCI: hotplug: Don't reserve bus space when enabled movable BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (19 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 20/24] PCI: Don't claim fixed BARs Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 22/24] PCI: hotplug: Enable the movable BARs feature by default Sergei Miroshnichenko
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

A hot-added bridge with many hotplug-capable ports may request more I/O
space than the machine has. This can be overridden with the "hpiosize="
option of the "pci=" kernel argument, though.

But when BARs are movable, there is no need to reserve space anymore: new
BARs are allocated not from reserved gaps, but by rearranging the existing
BARs. Requesting a precise amount of space for bridge windows increases
the chances of adding the new bridge successfully.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 3081f2d2a48a..e37976551a05 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1302,7 +1302,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 
 	case PCI_HEADER_TYPE_BRIDGE:
 		pci_bridge_check_ranges(bus);
-		if (bus->self->is_hotplug_bridge) {
+		if (bus->self->is_hotplug_bridge && !pci_can_move_bars) {
 			additional_io_size  = pci_hotplug_io_size;
 			additional_mmio_size = pci_hotplug_mmio_size;
 			additional_mmio_pref_size = pci_hotplug_mmio_pref_size;
-- 
2.24.1
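A back-of-the-envelope model shows how the I/O reservation runs out. PCI-to-PCI bridge I/O windows have 4 KB granularity (1 KB on some bridges), and x86 port I/O space is only 64 KB, so every hotplug-capable port that gets an I/O window consumes at least 4 KB, however small pci_hotplug_io_size is (256 bytes by default). The helper names below are hypothetical, and the model ignores legacy ranges, alignment between siblings and the kernel's actual sizing in __pci_bus_size_bridges().

```c
#include <stdint.h>

#define IO_WINDOW_GRANULARITY	0x1000u		/* bridge I/O windows: 4 KB steps */
#define X86_IO_SPACE_SIZE	0x10000u	/* ports 0x0000-0xffff */

/* Round @size up to the bridge I/O window granularity. */
static uint32_t io_window_size(uint32_t size)
{
	return (size + IO_WINDOW_GRANULARITY - 1) &
	       ~(IO_WINDOW_GRANULARITY - 1);
}

/*
 * I/O space reserved below a hot-added switch whose @ports downstream
 * ports are all hotplug-capable and each reserve @hp_io_size bytes.
 */
static uint32_t io_reserved_for_switch(uint32_t ports, uint32_t hp_io_size)
{
	return ports * io_window_size(hp_io_size);
}
```

With 16 such ports, the default 256-byte reservation already adds up to the entire 64 KB x86 I/O space before any actual I/O BAR is counted. Dropping the additional_io_size term when BARs are movable means, roughly, that a port's I/O window is sized only for what sits below it.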



* [PATCH v8 22/24] PCI: hotplug: Enable the movable BARs feature by default
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (20 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 21/24] PCI: hotplug: Don't reserve bus space when enabled movable BARs Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 23/24] PCI/portdrv: Declare support of movable BARs Sergei Miroshnichenko
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

This is the last patch in the series that implements the essentials of the
movable BARs feature, so it is now turned on by default. Tested on Intel
and AMD machines with the "pci=pcie_bus_peer2peer" command line argument.

In case of problems, it can still be disabled with the following command
line option:

  pcie_movable_bars=off

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 5ae07991edff..4aa6c22264a3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -79,7 +79,7 @@ static void pci_dev_d3_sleep(struct pci_dev *dev)
 int pci_domains_supported = 1;
 #endif
 
-bool pci_can_move_bars;
+bool pci_can_move_bars = true;
 
 #define DEFAULT_CARDBUS_IO_SIZE		(256)
 #define DEFAULT_CARDBUS_MEM_SIZE	(64*1024*1024)
-- 
2.24.1



* [PATCH v8 23/24] PCI/portdrv: Declare support of movable BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (21 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 22/24] PCI: hotplug: Enable the movable BARs feature by default Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-27 18:23 ` [PATCH v8 24/24] nvme-pci: Handle " Sergei Miroshnichenko
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko

Currently there is no reliable way to determine whether a driver uses the
BARs of its devices (their struct resource doesn't always have a child),
so if a driver doesn't explicitly support movable BARs, its BARs are
considered fixed.

The portdrv driver for PCI switches doesn't use BARs, so add empty
.rescan_prepare() and .rescan_done() hooks to increase the chances of
allocating BARs for new devices.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/pcie/portdrv_pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 160d67c59310..df1faf2fed86 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -205,6 +205,14 @@ static const struct pci_error_handlers pcie_portdrv_err_handler = {
 	.resume = pcie_portdrv_err_resume,
 };
 
+static void pcie_portdrv_rescan_prepare(struct pci_dev *pdev)
+{
+}
+
+static void pcie_portdrv_rescan_done(struct pci_dev *pdev)
+{
+}
+
 static struct pci_driver pcie_portdriver = {
 	.name		= "pcieport",
 	.id_table	= &port_pci_ids[0],
@@ -215,6 +223,9 @@ static struct pci_driver pcie_portdriver = {
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
+	.rescan_prepare	= pcie_portdrv_rescan_prepare,
+	.rescan_done	= pcie_portdrv_rescan_done,
+
 	.driver.pm	= PCIE_PORTDRV_PM_OPS,
 };
 
-- 
2.24.1
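The opt-in rule from the commit message, where a bound driver declares support by providing both hooks, can be summarized as below. This is a hypothetical sketch with made-up types (`struct fake_pci_driver`, `struct fake_pci_dev`, `bars_movable()`), not the helper the series actually uses, which does not appear in this part of the thread; in particular, treating devices without a bound driver as movable is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_pci_dev;

/* Hypothetical subset of struct pci_driver: just the two new hooks. */
struct fake_pci_driver {
	const char *name;
	void (*rescan_prepare)(struct fake_pci_dev *dev);
	void (*rescan_done)(struct fake_pci_dev *dev);
};

struct fake_pci_dev {
	const struct fake_pci_driver *driver;	/* NULL if no driver is bound */
};

/* Empty hooks are enough: portdrv does not use the BARs of its ports. */
static void port_rescan_prepare(struct fake_pci_dev *dev) { (void)dev; }
static void port_rescan_done(struct fake_pci_dev *dev) { (void)dev; }

static const struct fake_pci_driver fake_portdrv = {
	.name = "pcieport",
	.rescan_prepare = port_rescan_prepare,
	.rescan_done = port_rescan_done,
};

/* A driver written before the feature: no hooks, so BARs stay fixed. */
static const struct fake_pci_driver legacy_drv = { .name = "legacy" };

/*
 * A bound driver that lacks either hook may be using the BARs, so they
 * are treated as fixed. Unbound devices count as movable (assumption).
 */
static bool bars_movable(const struct fake_pci_dev *dev)
{
	if (!dev->driver)
		return true;
	return dev->driver->rescan_prepare && dev->driver->rescan_done;
}
```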



* [PATCH v8 24/24] nvme-pci: Handle movable BARs
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (22 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 23/24] PCI/portdrv: Declare support of movable BARs Sergei Miroshnichenko
@ 2020-04-27 18:23 ` Sergei Miroshnichenko
  2020-04-28 12:59 ` [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Christian König
  2020-08-10 22:21 ` Bjorn Helgaas
  25 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-04-27 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux, Sergei Miroshnichenko, linux-nvme, Christoph Hellwig

Hot-added devices can affect the existing ones by moving their BARs. The
PCI subsystem will inform the NVME driver about this by invoking the
.rescan_prepare() and .rescan_done() hooks, so the BARs can be remapped.

Tested with the fio tool in "randrw" mode, and with an NVME drive used as
the root filesystem storage. Before hot-adding:

  % sudo cat /proc/iomem
  ...
                3fe800000000-3fe8007fffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:18
                    3fe800000000-3fe8000fffff : 0020:18:00.0
                      3fe800000000-3fe8000fffff : nvme
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
                    3fe800100000-3fe80017ffff : 0020:18:00.0
  ...

Then another NVME drive was hot-added, and the BARs of 0020:18:00.0 were
moved:

  % sudo cat /proc/iomem
    ...
                3fe800000000-3fe800ffffff : PCI Bus 0020:0b
                  3fe800000000-3fe8007fffff : PCI Bus 0020:10
                    3fe800000000-3fe800003fff : 0020:10:00.0
                      3fe800000000-3fe800003fff : nvme
                    3fe800010000-3fe80001ffff : 0020:10:00.0
                  3fe800800000-3fe800ffffff : PCI Bus 0020:18
                    3fe800800000-3fe8008fffff : 0020:18:00.0
                      3fe800800000-3fe8008fffff : nvme
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
                    3fe800900000-3fe80097ffff : 0020:18:00.0
    ...

During the rescan, both READ and WRITE speeds drop to zero for a while
because the driver is paused, then recover.

Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/nvme/host/pci.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4e79e412b276..677ded2d4dd4 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1621,7 +1621,7 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
-	if (size <= dev->bar_mapped_size)
+	if (dev->bar && size <= dev->bar_mapped_size)
 		return 0;
 	if (size > pci_resource_len(pdev, 0))
 		return -ENOMEM;
@@ -3032,6 +3032,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 }
 
+static void nvme_rescan_prepare(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_disable(dev, false);
+	nvme_dev_unmap(dev);
+	dev->bar = NULL;
+}
+
+static void nvme_rescan_done(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_map(dev);
+	nvme_reset_ctrl_sync(&dev->ctrl);
+}
+
 static const struct pci_error_handlers nvme_err_handler = {
 	.error_detected	= nvme_error_detected,
 	.slot_reset	= nvme_slot_reset,
@@ -3110,6 +3127,8 @@ static struct pci_driver nvme_driver = {
 #endif
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
+	.rescan_prepare	= nvme_rescan_prepare,
+	.rescan_done	= nvme_rescan_done,
 };
 
 static int __init nvme_init(void)
-- 
2.24.1
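The reason the patch also touches nvme_remap_bar(): after nvme_rescan_prepare() unmaps the BAR and clears dev->bar, bar_mapped_size still holds the old size, so the original `size <= dev->bar_mapped_size` shortcut would return early and leave the driver without a mapping. The user-space model below uses malloc() in place of ioremap() and records which bus address each mapping was made for; `struct fake_nvme_dev`, `fake_remap_bar()` and the other names are hypothetical, and quiescing the controller is omitted. The example addresses are taken from the /proc/iomem listing above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ioremap()ed region. */
struct mapping {
	uint64_t phys;	/* bus address the mapping was created for */
	size_t size;
};

struct fake_nvme_dev {
	uint64_t bar0_start;	/* current BAR0 address, may move on rescan */
	uint64_t bar0_len;
	struct mapping *bar;	/* NULL while unmapped */
	size_t bar_mapped_size;	/* NOT cleared on unmap, as in the patch */
};

/* Same shape as the patched nvme_remap_bar(). */
static int fake_remap_bar(struct fake_nvme_dev *dev, size_t size)
{
	if (dev->bar && size <= dev->bar_mapped_size)
		return 0;		/* existing mapping is still valid */
	if (size > dev->bar0_len)
		return -1;
	free(dev->bar);
	dev->bar = malloc(sizeof(*dev->bar));
	if (!dev->bar) {
		dev->bar_mapped_size = 0;
		return -1;
	}
	dev->bar->phys = dev->bar0_start;
	dev->bar->size = size;
	dev->bar_mapped_size = size;
	return 0;
}

/* Models nvme_rescan_prepare(): unmap and forget the pointer. */
static void fake_rescan_prepare(struct fake_nvme_dev *dev)
{
	free(dev->bar);
	dev->bar = NULL;
}

/* Models nvme_rescan_done(): map the BAR at its possibly new address. */
static int fake_rescan_done(struct fake_nvme_dev *dev, size_t size)
{
	return fake_remap_bar(dev, size);
}
```

Without the `dev->bar &&` test, fake_rescan_done() would return 0 while dev->bar is still NULL, which is the situation the patch avoids.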



* Re: [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (23 preceding siblings ...)
  2020-04-27 18:23 ` [PATCH v8 24/24] nvme-pci: Handle " Sergei Miroshnichenko
@ 2020-04-28 12:59 ` Christian König
  2020-05-04  9:30   ` Sergei Miroshnichenko
  2020-08-10 22:21 ` Bjorn Helgaas
  25 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2020-04-28 12:59 UTC (permalink / raw)
  To: Sergei Miroshnichenko, linux-pci
  Cc: Bjorn Helgaas, Lukas Wunner, Stefan Roese, Andy Lavr,
	Ard Biesheuvel, David Laight, Rajat Jain, linux

Well, that is a really nice surprise. Just FYI, the situation with GPUs is 
essentially this:

a) The BAR to access video memory with the CPU is by default only 256MB 
in size for backward compatibility with 32bit Windows 7 and older.

b) Modern GPUs easily have 16GB of video memory, but most of that used 
to be accessed only rarely by the CPU. Unfortunately this has changed 
recently with more modern graphics APIs in userspace (Vulkan).

c) Both NVidia and AMD used to have a mechanism to map different 
stuff into the 256MB window, but AMD dropped this ability quite some 
time ago because it was rather inefficient.

d) Instead, for roughly the last 5 years, AMD has implemented the PCI 
standard for dynamic BAR resizing. So what we do is extend the 256MB BAR 
to 16GB (or whatever is needed) once the OS is started and the driver 
loads.

The problem with this approach is that sometimes bridges can't be 
reconfigured and BARs resized because we have other BARs currently in 
use under the same bridge.
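Two details explain why such a resize can need other BARs to move. The Resizable BAR capability encodes a BAR size as a value n meaning 2^(n+20) bytes (0 = 1 MB, 8 = 256 MB, 14 = 16 GB), and a memory BAR is naturally aligned to its own size, so growing a BAR from 256 MB to 16 GB needs a free, 16 GB-aligned range inside the upstream bridge windows. The sketch below only illustrates that arithmetic; the helper names are hypothetical and the example addresses are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Resizable BAR size encoding: n means 2^(n + 20) bytes. */
static uint64_t rebar_size_to_bytes(unsigned int n)
{
	return 1ULL << (n + 20);
}

/* Smallest encoding whose size is at least @bytes (capped at 2^63). */
static unsigned int rebar_bytes_to_size(uint64_t bytes)
{
	unsigned int n = 0;

	while (n < 43 && rebar_size_to_bytes(n) < bytes)
		n++;
	return n;
}

/*
 * Can a naturally aligned BAR of @size bytes be placed somewhere in the
 * free range [free_start, free_end] (inclusive)?
 */
static bool fits_aligned(uint64_t free_start, uint64_t free_end, uint64_t size)
{
	uint64_t base;

	assert(size != 0 && (size & (size - 1)) == 0);	/* power of two */
	base = (free_start + size - 1) & ~(size - 1);
	return base >= free_start && base <= free_end &&
	       free_end - base >= size - 1;
}
```

For instance, a 20 GB hole starting at 0x410000000 (16 GB + 256 MB) cannot hold a 16 GB BAR: the first 16 GB-aligned base inside it is 0x800000000, leaving only about 4.25 GB. A hole of exactly 16 GB starting at 0x400000000 can.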

So long story short you have fixed my BAR resizing problem with this 
patchset as well :D

Am 27.04.20 um 20:23 schrieb Sergei Miroshnichenko:
> Currently PCI hotplug works on top of resources which are usually reserved
> not by the kernel, but by BIOS, bootloader, firmware, etc. These resources
> are gaps in the address space where BARs of new devices may fit, and extra
> bus number per port, so bridges can be hot-added. This series aim the BARs
> problem: it shows the kernel how to redistribute them on the run, so the
> hotplug becomes predictable and cross-platform. A follow-up patchset will
> propose a solution for bus numbers.
>
> To arrange a space for BARs of new hotplugged devices, the kernel can pause
> the drivers of working PCI devices and reshuffle the assigned BARs. When a
> driver is un-paused by the kernel, it should ioremap() the new addresses of
> its BARs.
>
> Drivers indicate their support of the feature by implementing the new hooks
> .rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
> doesn't yet support the feature, BARs of its devices will be considered as
> immovable and handled in the same way as resources with the PCI_FIXED flag:
> they are guaranteed to remain untouched.

Could we let rescan_prepare() optionally return an error and then 
consider the BARs in question not movable for the current rescan? 
Alternatively, would the implementation of the rescan_prepare() callback 
be allowed to update the PCI_FIXED flags on the BARs?

The problem is that we can't know beforehand whether a BAR is currently 
in use, or whether we can block its users until the rescan is completed.

In addition, I'm not an expert on the PCI code outside of the stuff that 
I wrote/touched. I'll still try to go over the set in the next couple of 
days, but don't expect more than an Acked-by from me.

Cheers,
Christian.




* Re: [PATCH v8 02/24] PCI: Ensure a bridge has I/O and MEM access for hot-added devices
  2020-04-27 18:23 ` [PATCH v8 02/24] PCI: Ensure a bridge has I/O and MEM access for hot-added devices Sergei Miroshnichenko
@ 2020-04-29  6:30   ` kbuild test robot
  0 siblings, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2020-04-29  6:30 UTC (permalink / raw)
  To: Sergei Miroshnichenko, linux-pci
  Cc: kbuild-all, clang-built-linux, Bjorn Helgaas, Lukas Wunner,
	Stefan Roese, Andy Lavr, Christian König, Ard Biesheuvel,
	David Laight, Rajat Jain, linux

[-- Attachment #1: Type: text/plain, Size: 2505 bytes --]

Hi Sergei,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on powerpc/next linus/master v5.7-rc3 next-20200428]
[If your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to specify the
base tree in git format-patch; please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Sergei-Miroshnichenko/PCI-Allow-BAR-movement-during-boot-and-hotplug/20200428-105625
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: powerpc-defconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project f30416fdde922eaa655934e050026930fefbd260)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/pci/pci.c:1745:3: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
                   pci_reenable_device(dev);
                   ^~~~~~~~~~~~~~~~~~~ ~~~
   1 warning generated.

vim +/warn_unused_result +1745 drivers/pci/pci.c

  1732	
  1733	static void pci_enable_bridge(struct pci_dev *dev)
  1734	{
  1735		struct pci_dev *bridge;
  1736		int retval;
  1737	
  1738		mutex_lock(&dev->enable_mutex);
  1739	
  1740		bridge = pci_upstream_bridge(dev);
  1741		if (bridge)
  1742			pci_enable_bridge(bridge);
  1743	
  1744		if (pci_is_enabled(dev)) {
> 1745			pci_reenable_device(dev);
  1746	
  1747			if (!dev->is_busmaster)
  1748				pci_set_master(dev);
  1749			mutex_unlock(&dev->enable_mutex);
  1750			return;
  1751		}
  1752	
  1753		retval = pci_enable_device(dev);
  1754		if (retval)
  1755			pci_err(dev, "Error enabling bridge (%d), continuing\n",
  1756				retval);
  1757		pci_set_master(dev);
  1758		mutex_unlock(&dev->enable_mutex);
  1759	}
  1760	
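This is a minimal sketch of one way to resolve the warning. It assumes the intent is to log a failure and carry on, as the function already does for pci_enable_device(): keep the return value of pci_reenable_device() and report it.

This is a userspace model, not kernel code:
- struct pci_dev is reduced to three fields.
- pci_reenable_device() and pci_set_master() are mocks.
- fprintf() stands in for pci_err().
- enable_bridge_already_enabled() is a hypothetical name.
- The enable_mutex locking and the upstream-bridge recursion are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified userspace stand-in for struct pci_dev (not the kernel definition). */
struct pci_dev {
	bool enabled;
	bool is_busmaster;
	int reenable_result;	/* value the mocked pci_reenable_device() returns */
};

/* Mock: in the kernel this function is __must_check; here it returns a preset value. */
static int pci_reenable_device(struct pci_dev *dev)
{
	return dev->reenable_result;
}

static void pci_set_master(struct pci_dev *dev)
{
	dev->is_busmaster = true;
}

/*
 * The "already enabled" branch of pci_enable_bridge(), with the result of
 * pci_reenable_device() checked instead of discarded. The error is returned
 * here only so it can be observed; the kernel function returns void and
 * would just log it and continue.
 */
static int enable_bridge_already_enabled(struct pci_dev *dev)
{
	int retval;

	assert(dev->enabled);

	retval = pci_reenable_device(dev);
	if (retval)
		fprintf(stderr, "Error re-enabling bridge (%d), continuing\n",
			retval);

	if (!dev->is_busmaster)
		pci_set_master(dev);
	return retval;
}
```

Bus mastering is still set on failure. This mirrors the existing error path after pci_enable_device(), which logs and continues.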

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26104 bytes --]


* Re: [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug
  2020-04-28 12:59 ` [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Christian König
@ 2020-05-04  9:30   ` Sergei Miroshnichenko
  0 siblings, 0 replies; 29+ messages in thread
From: Sergei Miroshnichenko @ 2020-05-04  9:30 UTC
  To: christian.koenig, linux-pci
  Cc: David.Laight, rajatja, ardb, helgaas, linux, sr, lukas, andy.lavr

Hello Christian,

On Tue, 2020-04-28 at 14:59 +0200, Christian König wrote:
> Well that is a really nice surprise. Just FYI the situation with GPUs is
> essentially this:
> 
> a) The BAR to access video memory with the CPU is by default only 256MB
> in size for backward compatibility with 32bit Windows 7 and older.
> 
> b) Modern GPUs easily have 16GB of video memory, but most of that used
> to be accessed only rarely by the CPU. Unfortunately this has changed
> recently with more modern graphics APIs in userspace (Vulkan).
> 
> c) Both NVidia and AMD used to have a mechanism to map different stuff
> into the 256MB window, but AMD dropped this ability quite some time ago
> because it was rather inefficient.
> 
> d) Instead, for about the last 5 years AMD has implemented the PCI
> standard for dynamic BAR resizing. So what we do is extend the 256MB
> BAR to 16GB (or whatever is needed) once the OS is started and the
> driver loads.
> 
> The problem with this approach is that sometimes bridges can't be
> reconfigured and BARs resized because we have other BARs currently in
> use under the same bridge.
> 
> So long story short you have fixed my BAR resizing problem with this
> patchset as well :D
> 
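Some background on the mechanism Christian describes. In the PCIe Resizable BAR capability, a BAR size is encoded as a power of two starting at 1 MB: encoding n means (1 MB << n). So 256 MB is encoding 8 and 16 GB is encoding 14.

The sketch below also shows why resizing collides with bridge windows. Memory BARs are naturally aligned to their size, so a window that exactly fits a 256 MB BAR cannot hold the 16 GB version unless it is grown or moved.

The helper names are hypothetical, not kernel APIs. The fit check deliberately ignores sibling BARs sharing the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Resizable BAR size encoding: value n stands for (1 MB << n) bytes. */
static uint64_t rebar_size_to_bytes(int size)
{
	assert(size >= 0 && size + 20 < 64);
	return UINT64_C(1) << (size + 20);
}

/* Smallest encoding whose BAR size is at least `bytes`. */
static int rebar_bytes_to_size(uint64_t bytes)
{
	int size = 0;

	while (rebar_size_to_bytes(size) < bytes)
		size++;
	return size;
}

/*
 * Memory BARs are naturally aligned to their power-of-two size. Ignoring
 * sibling BARs, can a BAR of `bytes` be placed inside [win_start, win_end]?
 */
static bool bar_fits_window(uint64_t win_start, uint64_t win_end, uint64_t bytes)
{
	uint64_t start;

	assert(bytes && (bytes & (bytes - 1)) == 0);
	start = (win_start + bytes - 1) & ~(bytes - 1);	/* round up to alignment */
	return start >= win_start && start + (bytes - 1) <= win_end;
}
```

For example, consider a window at 0x400000000-0x40fffffff. It holds a 256 MB BAR exactly, but bar_fits_window() reports that a 16 GB BAR does not fit.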

Thanks for introducing me to this problem. It is not yet covered by
this code, so I'll modify pci_resize_resource(): let it try as it
does now, and if that doesn't work, try pci_rescan_bus(), which
moves BARs.
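The fallback order described above can be modelled as a toy: try the existing in-place resize first, and only if that fails, rescan in a way that may move BARs, then retry.

Everything below is hypothetical and simplified:
- struct toy_bus is invented for the model.
- A single "free bytes" counter stands in for a bridge window.
- -ENOSPC is assumed as the failure code.

None of this reflects the actual behaviour of pci_resize_resource() or pci_rescan_bus().

```c
#include <assert.h>
#include <errno.h>

/* Toy model of one bridge window (hypothetical, not kernel structures). */
struct toy_bus {
	unsigned long long window_free;	/* bytes still free in the bridge window */
	unsigned long long reclaimable;	/* bytes that repacking movable BARs would free */
	int rescans;			/* how often the fallback ran */
};

/* Stand-in for today's in-place resize: works only if the window has room. */
static int try_resize_in_place(struct toy_bus *bus, unsigned long long grow)
{
	if (grow > bus->window_free)
		return -ENOSPC;
	bus->window_free -= grow;
	return 0;
}

/* Stand-in for a rescan with movable BARs: siblings are repacked. */
static void rescan_moving_bars(struct toy_bus *bus)
{
	bus->window_free += bus->reclaimable;
	bus->reclaimable = 0;
	bus->rescans++;
}

/* Proposed order: try as today, then rescan (moving BARs) and retry once. */
static int resize_with_fallback(struct toy_bus *bus, unsigned long long grow)
{
	int ret;

	assert(bus);
	ret = try_resize_in_place(bus, grow);
	if (ret == -ENOSPC) {
		rescan_moving_bars(bus);
		ret = try_resize_in_place(bus, grow);
	}
	return ret;
}
```

The rescan runs only when the cheap path fails. Devices whose BARs cannot move are never disturbed for a resize that would have succeeded anyway.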

To test this, do I need to trigger BAR resizing manually, or will
drm/amdgpu do it automatically during init?

May these resized BARs change their start addresses during init?

> On 27.04.20 at 20:23, Sergei Miroshnichenko wrote:
> > ...
> > 
> > Drivers indicate their support of the feature by implementing the new hooks
> > .rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
> > doesn't yet support the feature, BARs of its devices will be considered as
> > immovable and handled in the same way as resources with the PCI_FIXED flag:
> > they are guaranteed to remain untouched.
> 
> Could we let rescan_prepare() optionally return an error and then
> consider the BARs in question not movable for the current rescan?
> Alternatively, would it be allowed in the implementation of the
> rescan_prepare() callback to update the PCI_FIXED flags on the BARs?
> 
> The problem is that we can't know beforehand whether a BAR is currently
> in use, or whether we can block its users until the rescan is completed.
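Christian's first suggestion could look roughly like the toy below: let rescan_prepare() fail, and treat that device's BARs as immovable for this rescan only.

It is a sketch under stated assumptions:
- The int-returning hook is hypothetical.
- struct toy_driver, struct toy_dev and core_prepare_device() are hypothetical.
- -EBUSY is only an illustrative error code.

A driver without the hook is treated as in the cover letter: its BARs stay fixed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_BARS 6

struct toy_dev;

/* Hypothetical variant of the proposed hook that may refuse with an error. */
struct toy_driver {
	int (*rescan_prepare)(struct toy_dev *dev);
};

struct toy_dev {
	const struct toy_driver *drv;	/* bound driver (required in this model) */
	bool busy;			/* driver cannot block BAR users right now */
	bool bar_fixed[TOY_NUM_BARS];	/* core's view for the current rescan only */
};

/*
 * Core side: a missing hook, or a hook that fails, pins every BAR of the
 * device for this rescan. Presumably such a device would also be skipped
 * when rescan_done() is called afterwards.
 */
static void core_prepare_device(struct toy_dev *dev)
{
	bool pin;

	assert(dev && dev->drv);
	pin = !dev->drv->rescan_prepare || dev->drv->rescan_prepare(dev) != 0;
	for (int i = 0; i < TOY_NUM_BARS; i++)
		dev->bar_fixed[i] = pin;
}

/* Example driver hook: refuse while BAR accesses cannot be blocked. */
static int toy_rescan_prepare(struct toy_dev *dev)
{
	return dev->busy ? -EBUSY : 0;
}

static const struct toy_driver toy_drv = { .rescan_prepare = toy_rescan_prepare };
```

The pinning is recomputed on every rescan. A refusal therefore only affects the current attempt, which matches the "for the current rescan" wording of the question.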

I guess one more optional hook may be added to the pci_driver:

  bool (*bar_fixed)(struct pci_dev *dev, struct resource *res);

So a driver can mark some BARs as fixed and others as movable at
runtime, depending on current conditions.

If the rescan_prepare() and rescan_done() hooks are set but bar_fixed()
isn't, consider every BAR as movable. If bar_fixed() is set and returns
false for a BAR, the driver must not use that BAR between
rescan_prepare() and rescan_done().

Best regards,
Serge


* Re: [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug
  2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
                   ` (24 preceding siblings ...)
  2020-04-28 12:59 ` [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Christian König
@ 2020-08-10 22:21 ` Bjorn Helgaas
  25 siblings, 0 replies; 29+ messages in thread
From: Bjorn Helgaas @ 2020-08-10 22:21 UTC
  To: Sergei Miroshnichenko
  Cc: linux-pci, Lukas Wunner, Stefan Roese, Andy Lavr,
	Christian König, Ard Biesheuvel, David Laight, Rajat Jain,
	linux

On Mon, Apr 27, 2020 at 09:23:34PM +0300, Sergei Miroshnichenko wrote:
> Currently PCI hotplug works on top of resources which are usually reserved
> not by the kernel, but by BIOS, bootloader, firmware, etc. These resources
> are gaps in the address space where BARs of new devices may fit, and extra
> bus numbers per port, so bridges can be hot-added. This series addresses
> the BAR problem: it shows the kernel how to redistribute BARs on the fly,
> so hotplug becomes predictable and cross-platform. A follow-up patchset
> will propose a solution for bus numbers.
> 
> To arrange a space for BARs of new hotplugged devices, the kernel can pause
> the drivers of working PCI devices and reshuffle the assigned BARs. When a
> driver is un-paused by the kernel, it should ioremap() the new addresses of
> its BARs.
> 
> Drivers indicate their support of the feature by implementing the new hooks
> .rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
> doesn't yet support the feature, BARs of its devices will be considered as
> immovable and handled in the same way as resources with the PCI_FIXED flag:
> they are guaranteed to remain untouched.
> 
> Tested on a number of x86_64 machines without any special kernel command
> line arguments:
>  - PC: i7-5930K + ASUS X99-A;
>  - PC: i5-8500 + ASUS Z370-F;
>  - Supermicro Super Server/H11SSL-i: AMD EPYC 7251;
>  - HP ProLiant DL380 G5: Xeon X5460;
>  - Dell Inspiron N5010: i5 M 480;
>  - Dell Precision M6600: i7-2920XM.
> ...
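The driver-side contract quoted above can be illustrated with a userspace toy. The driver pauses in .rescan_prepare() and maps the possibly new BAR address in .rescan_done(), as the cover letter says it should ioremap() after being un-paused.

The following are hypothetical stand-ins:
- struct toy_dev stands in for struct pci_dev.
- struct toy_priv stands in for the driver's private data.
- toy_map()/toy_unmap() stand in for ioremap()/iounmap().

The exact hook signatures used in the series are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the device: the core may change bar0_start while drivers are paused. */
struct toy_dev {
	uint64_t bar0_start;
};

/* Stand-in for driver private data. */
struct toy_priv {
	struct toy_dev *dev;
	uint64_t mapped_at;	/* address last "ioremap()ed", 0 if unmapped */
	bool paused;
};

static void toy_map(struct toy_priv *p)   { p->mapped_at = p->dev->bar0_start; }
static void toy_unmap(struct toy_priv *p) { p->mapped_at = 0; }

/* .rescan_prepare(): stop issuing I/O and drop the old mapping. */
static void toy_rescan_prepare(struct toy_priv *p)
{
	assert(!p->paused);
	p->paused = true;
	toy_unmap(p);
}

/* .rescan_done(): the BAR may have moved, so map its current address and resume. */
static void toy_rescan_done(struct toy_priv *p)
{
	assert(p->paused);
	toy_map(p);
	p->paused = false;
}
```

The key design point is that the driver never caches the old address across the pause. It re-reads the BAR start after the core has finished reassigning resources.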

There's a lot of good work here, and I apologize that we haven't made
much progress on merging it.  I suspect this will become more and more
important with Thunderbolt.

It does touch a lot of the ugliest and least maintainable code under
drivers/pci, which is *good* if we can clean it up a little bit in the
process, but it is also risky.

I expect that a few problems are inevitable because of BIOS issues,
driver issues, and devices that can't tolerate their BARs being moved.
We've tripped over a few of those devices in the past.

Those can be really hard to debug and fix since we won't have the
hardware in question.  To make them tractable, I think we will really
need some way to test at least the resource assignment pieces of this
"in vitro" without needing the actual hardware.  E.g., maybe we could
add enough diagnostics so that a dmesg log would contain all the
information needed to reproduce a PCI hierarchy, the initial resource
assignments, and subsequent hotplug events in some sort of test
fixture, maybe a qemu boot or similar.

Bjorn
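One small building block toward the "in vitro" testing idea is recovering BAR assignments from a kernel log for use in a test fixture.

The parser below is a sketch under these assumptions:
- It expects the `pci <BDF>: BAR <n>: assigned [mem <start>-<end> ...]` wording printed by kernels of this era. Later releases changed the message format.
- It handles only memory BARs.
- struct bar_assignment and parse_bar_assignment() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One BAR assignment recovered from a kernel log line. */
struct bar_assignment {
	char bdf[16];			/* e.g. "0000:01:00.0" */
	int bar;
	unsigned long long start, end;
};

/*
 * Parse a line such as
 *   [    0.612345] pci 0000:01:00.0: BAR 0: assigned [mem 0xf7000000-0xf7003fff 64bit]
 * Returns 1 for a memory BAR assignment, 0 for anything else.
 */
static int parse_bar_assignment(const char *line, struct bar_assignment *out)
{
	const char *p = strstr(line, "pci ");
	size_t len;

	assert(line && out);
	if (!p || sscanf(p, "pci %15s BAR %d: assigned [mem %llx-%llx",
			 out->bdf, &out->bar, &out->start, &out->end) != 4)
		return 0;

	len = strlen(out->bdf);		/* drop the ':' that ends the device name */
	if (len && out->bdf[len - 1] == ':')
		out->bdf[len - 1] = '\0';
	return 1;
}
```

%llx accepts the `0x` prefix, so the addresses are parsed directly. I/O BARs and "failed to assign" lines are rejected because they do not match the `[mem` literal.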


end of thread, other threads:[~2020-08-10 22:21 UTC | newest]

Thread overview: 29+ messages
2020-04-27 18:23 [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 01/24] PCI: Fix race condition in pci_enable/disable_device() Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 02/24] PCI: Ensure a bridge has I/O and MEM access for hot-added devices Sergei Miroshnichenko
2020-04-29  6:30   ` kbuild test robot
2020-04-27 18:23 ` [PATCH v8 03/24] PCI: hotplug: Initial support of the movable BARs feature Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 04/24] PCI: Add version of release_child_resources() aware of fixed BARs Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 05/24] PCI: hotplug: Fix reassigning the released BARs Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 06/24] PCI: hotplug: Recalculate every bridge window during rescan Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 07/24] PCI: hotplug: Don't allow hot-added devices to steal resources Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 08/24] PCI: Reassign BARs if BIOS/bootloader had assigned not all of them Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 09/24] PCI: hotplug: Try to reassign movable BARs only once Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 10/24] PCI: hotplug: Calculate fixed parts of bridge windows Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 11/24] PCI: Include fixed BARs into the bus size calculating Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 12/24] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 13/24] PCI: Make sure bridge windows include their fixed BARs Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 14/24] PCI: hotplug: Add support of fixed BARs to pci_assign_resource() Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 15/24] PCI: hotplug: Sort fixed BARs before assignment Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 16/24] x86/PCI/ACPI: Fix up PCIBIOS_MIN_MEM if value computed from e820 is invalid Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 17/24] PCI: hotplug: Configure MPS after manual bus rescan Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 18/24] PCI: hotplug: Don't disable the released bridge windows immediately Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 19/24] PCI: pciehp: Trigger a domain rescan on hp events when enabled movable BARs Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 20/24] PCI: Don't claim fixed BARs Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 21/24] PCI: hotplug: Don't reserve bus space when enabled movable BARs Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 22/24] PCI: hotplug: Enable the movable BARs feature by default Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 23/24] PCI/portdrv: Declare support of movable BARs Sergei Miroshnichenko
2020-04-27 18:23 ` [PATCH v8 24/24] nvme-pci: Handle " Sergei Miroshnichenko
2020-04-28 12:59 ` [PATCH v8 00/24] PCI: Allow BAR movement during boot and hotplug Christian König
2020-05-04  9:30   ` Sergei Miroshnichenko
2020-08-10 22:21 ` Bjorn Helgaas
